Implementing System on a Chip Designs
The following is a condensed version of the notes for this class as it was taught during spring 2015.
Hardware simulation is meant to catch bugs in various levels of the design:
- Gate Level
- Logic of the system (Functional)
- Timing across the design
- Transistor level
- Electrical properties
- Thermal/Physical properties
For some of these, the simulation could continue beyond the level of the chip and even up to entire computer systems.
Verification & Debugging
- Verification - process of determining that what was built is what was specified
- Validation - process of determining if the final system meets the requirements
- Testing - covers all processes, but is used specifically for post-production process of determining if an instance of the design is fault-free
Exhaustive tests can be too complicated and infeasible. Representative samples can be generated instead:
- By hand
- Through an algorithm
- Biased randomness
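As an illustration of the biased-randomness option above, here is a minimal sketch of a stimulus generator for a two-operand block. The function names, the 8-bit width, the choice of corner values, and the 30% corner weighting are all assumptions made for the example, not something from the course.

```python
import random

def biased_operand(rng, width=8, corner_weight=0.3):
    """Return a width-bit value, picking a corner case with probability corner_weight."""
    max_value = (1 << width) - 1
    # Hypothetical corner cases: all zeros, one, all ones, near-max, top bit only.
    corners = [0, 1, max_value - 1, max_value, 1 << (width - 1)]
    if rng.random() < corner_weight:
        return rng.choice(corners)
    return rng.randint(0, max_value)

def generate_tests(count, seed=0, width=8):
    """Generate a reproducible list of (a, b) operand pairs."""
    rng = random.Random(seed)  # fixed seed so a failing test can be replayed
    return [(biased_operand(rng, width), biased_operand(rng, width))
            for _ in range(count)]
```

Seeding the generator keeps the sample repeatable, which matters when a failure has to be reproduced during debugging.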
Implementation and Design Trade-offs
Various factors influence the type of design used for a given application:
- Speed: Going significantly faster than required could require larger, more complex, power-hungry circuits.
- Power: Consumption is very important in battery-powered devices. Loosely related to size and speed of the device. Power needs to be dissipated as heat, so more usage requires better cooling systems.
- Size: Smaller chips are usually cheaper because more of them fit on a silicon wafer, but the cost per working chip rises when the yield of functional chips is low.
- Simplicity: Simple designs are likely to be small, but are also easier to build, test, debug, etc., which reduces development time and cost.
Synchronous designs (synchronized to a clock) are simpler to implement and are well optimized in the toolchains, and the implementation technologies (ASICs, FPGAs) are designed for synchronous circuits. Most importantly, the logic functions can be designed statically, and glitches that settle before the clock edge don't matter.
With asynchronous design, care must be taken to ensure no glitches occur during logic transitions.
- Increases speed:
- If two or more units can process at the same time, then the total time is reduced
- Increases resources required:
- flip-flops for pipelining
- pre-evaluation of logic
- Can increase OR decrease complexity
- more simultaneous things to worry about
- logical separation of concerns can make it easier to debug and design
- CAN decrease power
- possibly employ several units running at a lower voltage
- increased clock speed
- increased latency
- if a stage has to stall, it might slow/halt the entire pipeline
- small area overhead
- MAY save power
- simplify designs
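The pipelining trade-off listed above (increased clock speed, increased latency) can be illustrated with a small calculation. This is a sketch under simplifying assumptions: each stage has a fixed logic delay, every register adds the same overhead, and stalls are ignored. The function name and the example delays are hypothetical.

```python
def pipeline_timing(stage_delays_ns, register_overhead_ns):
    """Compare one unpipelined block with a pipelined version of the same logic.

    Returns (unpipelined_period, pipelined_period, pipelined_latency) in ns.
    """
    # Unpipelined: all the logic must settle within a single clock period.
    unpipelined_period = sum(stage_delays_ns) + register_overhead_ns
    # Pipelined: the clock is limited by the slowest stage only.
    pipelined_period = max(stage_delays_ns) + register_overhead_ns
    # A result needs one clock period per stage to travel through the pipeline.
    latency = pipelined_period * len(stage_delays_ns)
    return unpipelined_period, pipelined_period, latency
```

For stages of 2.0, 3.0 and 2.5 ns with 0.5 ns of register overhead, the clock period falls from 8.0 ns to 3.5 ns, while the latency of a single result rises from 8.0 ns to 10.5 ns.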
SIMD - Single Instruction, Multiple Data
- old technique
- if memory is slow, can use two memories and alternate between them
Some assumptions in synchronous model:
- T(su) - time the data needs to be stable before the clock
- T(hold) - time the data needs to stay stable after the clock
- T(pd) - output lag after the clock
Clock Skew - difference in the clock's arrival times.
Clock Jitter - variation of a clock's frequency around its specified value.
- High fan-out
- edges should be fast
- signals are repetitive and regular, so latency doesn't matter as much
Reset Skew - activation of the reset doesn't matter as it's usually held for a significant amount of time, but the removal of reset can be a problem.
Maximum clock speed is set by the critical path.
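As a rough sketch of how the critical path sets the clock, the minimum period can be estimated as T(pd) plus the logic delay plus T(su), plus any clock skew that works against the path. The function names and the path delays below are assumptions for illustration, and the model ignores hold-time checks and jitter.

```python
def critical_path(path_delays_ns):
    """Return the (name, delay) pair of the slowest register-to-register path."""
    return max(path_delays_ns.items(), key=lambda item: item[1])

def max_clock_mhz(t_pd_ns, t_logic_ns, t_su_ns, t_skew_ns=0.0):
    """Estimate the maximum clock frequency in MHz for one path.

    t_pd_ns:    output lag of the launching flip-flop after the clock
    t_logic_ns: combinational delay along the path
    t_su_ns:    setup time of the capturing flip-flop
    t_skew_ns:  worst-case skew that shortens the usable period (assumed)
    """
    min_period_ns = t_pd_ns + t_logic_ns + t_su_ns + t_skew_ns
    return 1000.0 / min_period_ns
```

With hypothetical paths `{"alu": 4.0, "decode": 2.5}`, the ALU path is critical, and with T(pd) = T(su) = 0.5 ns the design is limited to 200 MHz.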
Static Timing Analysis (STA) will identify significantly bad sections of logic at a low cost. Best to optimize logic early to avoid wasting effort later on.
Time Stealing - in a pipeline, deliberate clock skew can be added to delay the running of a faster stage so that the system can be clocked faster
Time Borrowing - same as stealing, but by exploiting transparent latches instead of edge-triggered flip-flops
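Time stealing can be sketched for a two-stage pipeline. Delaying the clock of the register between the stages gives the slower first stage more time and the faster second stage less. The function name and delays are hypothetical, and the model ignores register overhead and the hold-time problems that deliberate skew can create.

```python
def min_period(stage_a_ns, stage_b_ns, skew_ns=0.0):
    """Minimum clock period when the middle register is clocked skew_ns late.

    Stage A (before the register) gains skew_ns of time;
    stage B (after the register) loses the same amount.
    """
    return max(stage_a_ns - skew_ns, stage_b_ns + skew_ns)
```

With stage delays of 6 ns and 4 ns, the period is 6 ns without skew and 5 ns with 1 ns of deliberate skew, because both stages then have exactly the time they need.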
Wave Pipeline - a pipeline without latches. Could be the fastest way to implement logic, but it is nearly impossible to ensure parallel logic takes exactly the same amount of time
Period - T, the time between two identical points on a wave (e.g. successive rising edges of a clock)
Frequency - 1/T
Harmonic frequencies - fixed phase relationship
Non-Harmonic frequencies - drifting phase relationship
Clocks are built from crystal oscillators, and no two oscillators are exactly the same. If a constant phase relationship is required, a single oscillator must be used.
To reduce power usage, clock gating may be used to stop the clock over an unused block. BUT USE CAUTION
Crossing clock domains can be problematic:
- Synchronous clocks avoid the problem
- Isochronous circuits have a harmonic relationship
- Asynchronous clocks cause problems
- There's a need for arbitration
Metastability - position halfway between a flip-flop registering 0 or 1, typically as a result of violating setup/hold conditions. Can stay here indefinitely.
No need to synchronize every signal crossing the boundary. When crossing these boundaries, there's always latency and a chance of failure due to metastability.
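The trade between latency and chance of failure can be made concrete with the commonly used synchronizer estimate MTBF = e^(t_r / tau) / (T_w * f_clk * f_data). This formula and its parameters are not from these notes: t_r is the time allowed for a metastable flip-flop to resolve, tau and T_w are device-dependent constants, and the values used in the usage note are purely hypothetical.

```python
import math

def synchronizer_mtbf_seconds(resolve_time_s, tau_s, window_s, f_clk_hz, f_data_hz):
    """Estimate mean time between synchronizer failures, in seconds.

    resolve_time_s: time given to the flip-flop to leave metastability
    tau_s:          resolution time constant of the flip-flop (assumed value)
    window_s:       width of the vulnerable window around the clock edge
    f_clk_hz:       sampling clock frequency
    f_data_hz:      rate of asynchronous input transitions
    """
    return math.exp(resolve_time_s / tau_s) / (window_s * f_clk_hz * f_data_hz)
```

Each extra synchronizer flip-flop adds roughly one clock period to `resolve_time_s`, so the MTBF grows exponentially while the latency grows by only one cycle.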
Increasing the clock frequency on chip requires a Phase-Locked Loop (PLL). PLLs are circuits capable of locking onto an input frequency. A phase detector pulses its output based on which of its two input clocks' edges came first. A Low Pass Filter (LPF) smooths those pulses into a control voltage. A Voltage Controlled Oscillator (VCO) oscillates naturally in a certain range of frequencies, tuned by an input voltage (from the LPF in a PLL).
There's a need to connect different parts of a SoC with some sort of data interface.
Advanced Microcontroller Bus Architecture (AMBA):
- Advanced Peripheral Bus (APB) - simple/slow
- Advanced High-performance Bus (AHB)
- Advanced eXtensible Interface (AXI) - high bandwidth
On Chip buses are limited by distance rather than width, but are basically unidirectional these days.
Buffers - used to amplify electrical signal, and have a specific input/output which makes the wires unidirectional.
- transaction based
- independent, unidirectional channels
- many-cycle latency
- throughput improved through bursts
- out-of-order transactions are possible
Tightly Coupled Memory (TCM) - fast SRAM mapped to specific addresses. Sometimes preferred on microcontrollers over caches due to its consistent and predictable access times.
Bus Bridge - converts between bus protocols
Network on Chip - Typically either 2D grids, or random (like conventional networks)
Globally Async, Locally Sync (GALS) - frees from many timing constraints. Can run each block at its own 'best' frequency. Biggest disadvantage is the need for synchronization of signals when they arrive at their destination, and are thus slowed down.
Block Transfers - allow for higher bandwidth by buffering multiple data elements for a single sync cycle, which again increases latency, but also increases average bandwidth.
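The effect of block transfers on bandwidth can be sketched with a simple cycle count. The model assumes a fixed synchronization cost paid once per transfer, then one word per cycle. The function name and the numbers are illustrative only.

```python
def transfer_stats(words, sync_cycles, cycles_per_word=1):
    """Return (total_cycles, words_per_cycle) for one transfer across a boundary.

    sync_cycles is the synchronization overhead paid once per transfer.
    """
    total_cycles = sync_cycles + words * cycles_per_word
    return total_cycles, words / total_cycles
```

With a 3-cycle synchronization cost, a single word takes 4 cycles (0.25 words per cycle), while a block of 8 words takes 11 cycles (about 0.73 words per cycle). The block takes longer to complete, but the average bandwidth is almost three times higher.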
Enhancement Mode Device - because the device is normally off and requires a voltage on the gate to make it conduct.
- High voltage on, low voltage off
- electron-poor, positively-doped substrate
- negative charge repels more electrons, removing conductivity
- positive charge attracts electrons, creating a conductive path to turn on the gate
- Good at passing 0, but bad with 1
- Low voltage on, high voltage off
- Good at passing 1, but bad with 0
- Less conductive than NMOS
- typically need to be twice as wide to provide similar impedance
- Metal - the gate material
- Oxide - insulator between the gate and the substrate (modern processes may not use an oxide, but the role is the same)
- Silicon - substrate doped with various elements (Boron, Arsenic, Phosphorus, etc)
- Field - electric field created when the gate is charged
- Effect - caused by the field which 'sucks in' or 'drives away' electrons to control the
Transistor Length - distance through the channel between the source and drain diffusions, typically the smaller dimension.
Transistor Width - wider means 'more' channel and more transconductance (drive strength)
CMOS (Complementary MOS) - combination of PMOS/NMOS to take advantage of the complementing pull directions in the two (though they aren't quite complementary). It's always inverting (NAND, NOR, NOT). Gates present a load on their inputs and drive their outputs.
Transistor Load - imposed by inputs of other gates (each transistor is a capacitor) and wires, which are both affected by fanout
Fanout - when big, makes edges slow. Possibly solved by bigger transistors, but those require more space and increase load on inputs
Standard Cells - used in the majority of tools.
Technology Mapping - converting the design into cells to be laid out on the chip
Transistor Series Stack - the deeper a transistor is in the stack, the larger its width must be to give the gate equal switching edges
Transmission Gate - is a switch, will pass current in either direction
Macrocells - made from transistors rather than normal gates due to efficiency concerns
Power and power distribution
- Need to get power to where it's needed, but there will be a voltage drop.
- Waste heat needs to be removed
Vdd - positive supply voltage (named for the transistor drain)
Vss - negative supply voltage or ground (named for the transistor source)
Wires have inherent resistance, which causes a voltage drop from the supply.
- Dominant in active logic (because of gate switching)
- Scales with the square of the supply voltage
- Also has a component that depends on the supply voltage, the transistor threshold, and the input edge speed
- Can occur due to short circuits when one transistor turns on before the other turns off
- Need to keep edges fast (increase transistor widths, but that increases capacitive load)
- use high threshold transistors (slows switching and increases propagation delay. Best used on gates OFF the critical path)
- total: E = C.V^2
- occurs continuously
- depend on supply voltage and transistor thresholds
- can be reduced by power gating
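The switching-energy relation E = C.V^2 from the list above extends naturally to a power estimate by multiplying by how often a node switches. This is a minimal sketch: the activity factor (average charge/discharge cycles per clock cycle) and the example values are assumptions, and short-circuit and leakage power are deliberately left out.

```python
def switching_energy_j(capacitance_f, vdd_v):
    """Energy drawn from the supply for one full charge/discharge cycle: C * V^2."""
    return capacitance_f * vdd_v ** 2

def dynamic_power_w(capacitance_f, vdd_v, f_clk_hz, activity=1.0):
    """Dynamic power of a node that switches `activity` times per clock cycle."""
    return activity * switching_energy_j(capacitance_f, vdd_v) * f_clk_hz
```

For a hypothetical 1 pF of switched capacitance at 1 V and 1 GHz, the estimate is 1 mW. Halving the supply voltage cuts it to a quarter, which is the square-law dependence noted above.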
Low Threshold Transistors - fast but power hungry (leaky)
High threshold Transistors - low power but slow
Transistor vs Gate threshold - gate threshold is the input voltage where the whole gate changes output logic state, whereas the transistor thresholds inside the gate can vary
Power Gating - low-leakage transistors can be used to cut power to sections of unneeded logic. Turned off sections are Dark Silicon.
Power Domains - power different parts of a SoC with different voltages (lower voltages are used in slower regions to save power). This needs more supply distribution networks, and signals crossing between domains must have their levels converted carefully (using a Level Shifter), otherwise they can cause problems.
Dynamic Voltage/Frequency Scaling - vary the clock frequency and supply voltage based on workload
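A DVFS policy can be sketched as choosing the slowest operating point that still meets the current demand. The table of frequency/voltage pairs below is entirely hypothetical, and the power comparison assumes dynamic power proportional to f.V^2, ignoring leakage.

```python
# Hypothetical operating points: (clock frequency in Hz, supply voltage in V).
OPERATING_POINTS = [(200e6, 0.8), (400e6, 0.9), (800e6, 1.1)]

def pick_operating_point(required_cycles_per_s, points=OPERATING_POINTS):
    """Return the lowest-frequency point that meets demand, else the fastest point."""
    for freq_hz, vdd_v in sorted(points):
        if freq_hz >= required_cycles_per_s:
            return freq_hz, vdd_v
    return max(points)

def relative_power(point, reference):
    """Dynamic power of `point` relative to `reference`, assuming P ~ f * V^2."""
    freq_hz, vdd_v = point
    ref_freq_hz, ref_vdd_v = reference
    return (freq_hz * vdd_v ** 2) / (ref_freq_hz * ref_vdd_v ** 2)
```

With these numbers, a workload needing 300 million cycles per second runs at the 400 MHz, 0.9 V point, which uses roughly a third of the dynamic power of the 800 MHz, 1.1 V point.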
Design -> Production
Current trend moves away from peripheral pad rings to flip-chip mounting, where solder bumps spread across the face of the die are used for contacts.
Simulation Program with Integrated Circuit Emphasis (SPICE) - simulates circuit activity by solving differential equations. Takes a long time, and thus is typically only used to characterize standard cells. Need to simulate edge cases like:
- low voltages
- high temperatures
- variability in doping
Typically classed as PVT (Process, Voltage, Temperature)
Placement - act of mapping cells onto surface of silicon so they are close but do not overlap
Routeing - tries to interconnect the cells with wires, preferring lower layers to minimize parasitic load capacitance
Floorplanning - placement of large blocks rather than cells. The blocks are pre-generated and copy-pasted, like CPUs in a multiprocessor
Insert Buffers where wires have excessive capacitance, to speed up edges. Saves time overall and improves electrical integrity.
Other layout considerations:
- Power Analysis - potential hotspots, sufficient grid wiring
- Design Rule Checks (DRC) - make sure the layout obeys the geometric rules of the manufacturing process (minimum widths, spacings, etc.)
- Electrical Rule Checks (ERC) - ensure legal electrical connections
- Antenna checks - verify the device will not be destroyed during manufacture
- Layout Versus Schematic (LVS) - verify the laid out hardware is the same as what was wanted
- want 100% coverage
- simplest tests are for 'stuck at' faults
- desire controllability, to set block inputs regardless of the block's location
- desire observability, to see a block's output
- Built-In Self Test (BIST)
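Stuck-at testing and coverage, as listed above, can be illustrated by simulating a tiny circuit with each node forced to 0 or 1 in turn. The circuit y = (a AND b) OR c, the node names, and the function names are all made up for this sketch.

```python
NODES = ["a", "b", "c", "n1", "y"]  # n1 is the output of the AND gate
ALL_FAULTS = [(node, stuck) for node in NODES for stuck in (0, 1)]

def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c, optionally with one node stuck at 0 or 1."""
    def force(name, value):
        # Override the node's value if it is the faulty node.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value
    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)

def detected_faults(vectors):
    """Faults for which some vector gives an output different from the good circuit."""
    return {fault for fault in ALL_FAULTS for vec in vectors
            if simulate(*vec, fault=fault) != simulate(*vec)}

def coverage(vectors):
    """Fraction of all single stuck-at faults detected by the vectors."""
    return len(detected_faults(vectors)) / len(ALL_FAULTS)
```

The single vector (1, 1, 0) detects 4 of the 10 faults. The four vectors (1, 1, 0), (0, 1, 0), (1, 0, 0) and (0, 0, 1) detect all 10, so 100% stuck-at coverage is reached here without applying all 8 input combinations.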
Scan Chains - make flip-flops controllable and observable by adding a path through most or all of them to allow clocking data in and out. There is some area and performance cost.
Boundary scan chains can be used in things like FPGAs, to program the contents, and in software debugging, to get better CPU statistics.
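The shifting behaviour of a scan chain can be modelled as a simple shift register. This is a behavioural sketch only: the class and method names are hypothetical, and real scan flip-flops also have a normal functional mode that is not modelled here.

```python
class ScanChain:
    """Behavioural model of flip-flops linked into a single scan path."""

    def __init__(self, length):
        self.flops = [0] * length  # flops[0] is nearest the scan input

    def shift(self, scan_in):
        """One scan clock: each flop takes its neighbour's value; return the scan-out bit."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out

    def load(self, bits):
        """Shift in a whole pattern and return the bits shifted out while doing so."""
        return [self.shift(bit) for bit in bits]
```

Loading a new pattern sets every flip-flop (controllability), and the bits returned by `load` are the previous contents of the chain (observability), so one pass can read out the last test result while setting up the next.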
Atoms are about 0.1nm in radius, so 22nm transistors are only about 100 atoms long.
Photolithography - process to make chips by exposing selected areas to light, but there are limits to this given that the wavelength used, ~193nm, is much larger than the transistors' features.
With smaller chips have come lower voltages, but that reduces the allowed noise margin and makes it harder to distinguish between the transistor's on/off states.
High-K dielectric - a gate insulator with a high dielectric constant (K, the relative permittivity), which gives the same gate capacitance from a thicker, less leaky insulating layer.
Larger variability in silicon makes smaller transistors more error prone.