Modern vlsi design by wayne wolf free download
Using a separate cell for the memory element would simply take up routing resources. Many FPGAs also incorporate specialized adder logic in the logic element.
The critical component of an adder is the carry chain, which can be implemented much more efficiently in specialized logic than it can using standard lookup table techniques. The next two examples describe the logic elements in two FPGAs.
They illustrate both the commonality between FPGA structures and the varying approaches to the design of logic elements. The foundation of a logic cell is the pair of four-bit lookup tables. Their inputs are F1-F4 and G1-G4. Each lookup table can also be used as a synchronous RAM or as a shift register. Each slice also contains carry logic for each LUT so that additions can be performed. The arithmetic logic also includes an XOR gate.
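The split between carry logic and the XOR gate can be sketched behaviorally. The following is a minimal model, not the circuit of any particular FPGA: we assume each bit position computes a propagate signal and a generate signal, the carry logic passes the carry to the next position, and the XOR gate forms the sum bit. The function names are our own.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists, least significant bit first."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                    # simple function of the two operand bits
        generate = a & b
        sum_bits.append(propagate ^ carry)   # the XOR gate forms the sum bit
        carry = generate | (propagate & carry)  # one step of the carry chain
    return sum_bits, carry


def to_bits(value, width):
    """Convert a non-negative integer to a bit list, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert a bit list (least significant bit first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Only the last line of the loop depends on the previous bit position, which is why a fast dedicated path for that one signal speeds up the whole adder.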
Each slice includes a multiplexer that is used to combine the results of the two function generators in a slice. Another multiplexer combines the outputs of the multiplexers in the two slices, generating a result for the entire CLB. The registers can be configured either as D-type flip-flops or as latches. Each register has clock and clock enable signals. Each LAB includes 10 logic elements.
The logic elements in an LAB share some logic, such as a carry chain and some control signal generation. The output of the LUT is fed to a carry chain. The cascade chain is used for cascading large fanin functions. For example, an AND function with a large number of inputs can be built using the cascade chain.
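A wide AND built on a cascade chain can be sketched as follows. This is an illustrative model only: we assume each logic element contributes a four-input lookup table programmed as AND and that the cascade chain ANDs each LUT output with the result from the previous logic element. The function names are hypothetical.

```python
def lut4_and(inputs):
    """A four-input lookup table programmed to compute AND."""
    if len(inputs) != 4:
        raise ValueError("this model assumes four-input lookup tables")
    return int(all(inputs))


def cascaded_and(bits):
    """AND any number of inputs by chaining logic elements, four inputs per element."""
    padded = list(bits) + [1] * (-len(bits) % 4)  # tie unused LUT inputs to 1
    cascade = 1
    for i in range(0, len(padded), 4):
        cascade &= lut4_and(padded[i:i + 4])  # cascade step between adjacent elements
    return cascade
```

A ten-input AND occupies three logic elements in this model, and the result never has to leave the chain for the general-purpose interconnect until the final element.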
The output of the cascade chain goes to a register. To use all this logic, an LE can be operated in normal, arithmetic, or counter mode. As a result, the interconnect can be reconfigured, just as the logic elements can. A programmable connection between two wires is made by a CMOS transistor (a pass transistor).
A CMOS transistor has a good off-state (though off-states are becoming worse as chip geometries shrink). However, the pass transistor is relatively slow, particularly on a signal path that includes several interconnection points in a row. As we will see in Section 3, alternative circuits provide higher performance at the cost of additional chip area. The pass transistor is not a perfect on-switch, so a programmable interconnection point is somewhat slower than a pair of wires permanently connected by a via.
In addition, FPGA wires are generally longer than would be necessary for a custom chip. In a custom layout, a wire can be made just as long as necessary. A net made of programmable interconnect may be longer, introducing extra capacitance and resistance that slow down the signals on the net.
As we saw in Section 3, the carry chains through the LEs are one example of short interconnect. Global wires, like high-speed highways with widely spaced exits, have fewer connection points than local connections. This reduces their impedance. Global wires may also include built-in electrical repeaters to reduce the effects of delay. Similarly, the ends of each wire must be able to connect to several different wires. Each of these choices requires its own connection box. This adds up to a large amount of circuitry and wiring that is devoted to programmable interconnection.
If we add too many choices, we end up devoting too much of the chip to programmable interconnect and not enough to logic. One of the key questions in the design of an FPGA fabric is how rich the programmable interconnect fabric should be.
Connections vary both in length and in speed. Most FPGAs offer several different types of wiring so that the type most suited to a particular wire can be chosen for that wire.
For example, the carry signal in an adder can be passed from LE to LE by wires that are designed for strictly local interconnections; longer connections may be made in a more general interconnect structure that uses segmented wiring. The next examples describe the interconnect systems in the two FPGAs we discussed earlier. It connects the LUTs, flip-flops, and general purpose interconnect.
It also provides internal CLB feedback. Finally, it includes some direct paths for high-speed connections between horizontally adjacent CLBs. These paths can be used for arithmetic, shift registers, or other functions that need structured layout and short connections. Hex lines provide longer interconnect. The hex lines include buffers to drive the longer wires. There are 96 hex lines, one third bidirectional and the rest unidirectional.
Four partitionable busses are available per CLB row. Another type of dedicated routing resource is the wires connecting the carry chain logic in the CLBs. The global routing system is designed to distribute high-fanout signals, including both clocks and logic signals. The primary global routing network is a set of four dedicated global nets with dedicated input pins. The clock distribution network is buffered to provide low delay and low skew. [Figure: clock distribution network, showing the clock pin, clock rows, and clock spine.] The secondary global routing network includes 24 backbone lines, half along the top of the chip and half along the bottom.
The chip also includes a delay-locked loop (DLL) to regulate the internal clock. A column line can also drive a row line; columns can be used to connect wires in two rows.
Some dedicated signals with buffers are provided for high-fanout signals such as clocks. Because FPGAs are reconfigured relatively infrequently, configuration lines are usually bit-serial. However, it is possible to send several bits in parallel if configuration time is important. During prototyping and debugging, we change the configuration frequently. A download cable can be used to download the configuration directly from a PC.
When we move the design into production, we do not want to rely on a download cable and a PC. The FPGA upon power-up runs through a protocol on its configuration pins. However, there are cases when configuration time is important. This is particularly true when the FPGA will be dynamically reconfigured (reconfigured on-the-fly while the system is operating), as in the Radius monitor described in the next example. Example: The Radius monitors for the Apple Macintosh computer [Tri94] operated in horizontal (landscape) and vertical (portrait) modes.
When the monitor was rotated from horizontal to vertical or vice versa, the monitor contents changed so that the display contents did not rotate. Because long shift registers to hold the display bits were easily built on the FPGA, the part made sense even without reconfiguration. A mercury switch sensed the rotation and caused a new personality to be downloaded to the FPGA, implementing the mode switch. Glitches in the power supply voltage can cause memory circuits to change state.
Changing the state of a configuration memory cell changes the function of the chip and can even cause electrical problems if two circuits are shorted together. As a result, the memory cells used in configuration memory use more conservative designs than would be used in bulk SRAM. Configuration memory is slower to read or write than commodity SRAM in order to make the memory state more stable.
Although the configuration data is typically presented to the chip in serial mode in order to conserve pins, configuration is not shifted into the chip serially in modern FPGAs.
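If the configuration were shifted directly through the configuration memory, the chip would pass through a series of intermediate states as the bits moved along. Many of these intermediate states will cause drivers to be shorted together, damaging the chip. Instead, configuration bits are shifted into a temporary register and then written in parallel to a block of configuration memory [Tri98].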
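The shift-then-write scheme can be sketched in a few lines. This is a simplified model under our own assumptions: the block width (called `frame_width` here) is a made-up parameter, and a real device adds addressing, error checking, and a start-up sequence that we omit.

```python
def load_configuration(serial_bits, frame_width):
    """Shift serial bits into a temporary register, then commit whole blocks at once."""
    config_memory = []   # committed blocks; each is written in a single parallel step
    shift_register = []  # temporary register, isolated from the logic and interconnect
    for bit in serial_bits:
        shift_register.append(bit)
        if len(shift_register) == frame_width:
            config_memory.append(tuple(shift_register))  # parallel write
            shift_register = []
    if shift_register:
        raise ValueError("bitstream length is not a multiple of the block width")
    return config_memory
```

The fabric only ever sees complete blocks, so partially shifted data never controls a pass transistor or a driver.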
Manufacturing test circuitry is used to ensure that the chip was properly manufactured and that the board on which the chip is placed is properly manufactured.
JTAG is often called boundary scan because it is designed to scan the pins at the boundary between the chip and the board. During testing, the pins can be decoupled from their normal functions and used as a shift register. The process is controlled by the test access port (TAP) controller.
The standard also allows an optional test reset pin known as TRST. Each pin on the chip is modified to include the JTAG shift register logic.
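The shift-register view of the pins can be modeled as follows. This is a behavioral sketch only: it omits the TAP controller state machine and the instruction register, and the class and method names are our own. The names TDI and TDO in the comments refer to the standard's serial test data input and output pins.

```python
class BoundaryScanChain:
    """One scan cell per pin, connected end to end as a shift register."""

    def __init__(self, num_pins):
        self.cells = [0] * num_pins

    def capture(self, pin_values):
        """Copy the current pin values into the scan cells."""
        if len(pin_values) != len(self.cells):
            raise ValueError("one value is required per pin")
        self.cells = list(pin_values)

    def shift(self, tdi_bit):
        """One test clock: take a bit in at TDI, return the bit leaving at TDO."""
        tdo_bit = self.cells[-1]
        self.cells = [tdi_bit] + self.cells[:-1]
        return tdo_bit


chain = BoundaryScanChain(4)
chain.capture([1, 0, 1, 1])                          # observe the pins
observed = [chain.shift(bit) for bit in [0, 1, 1, 0]]  # read out while loading new values
```

After four clocks, `observed` holds the captured pin values (last cell first) and the chain holds the new values to be driven onto the pins, which is how one serial path gives both observation and control.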
Using this relatively small amount of logic, an outside unit can control and observe all the pins on the chip. Several pins are dedicated to configuration. The configuration mode is controlled by three pins M0, M1, and M2.
The DONE pin signals when configuration is finished. There are two technologies used to build FPGAs that need to be configured only once: antifuses and flash. In this section we will survey methods for using both to build FPGAs. When a programming voltage is applied across the antifuse, it makes a connection between the metal line above it and the via to the metal line below. The antifuse has several advantages over a fuse, a major one being that most connections in an FPGA should be open, so the antifuse leaves most programming points in the proper state.
Each antifuse must be programmed separately. The FPGA must include circuitry that allows each antifuse to be separately addressed and the programming voltage applied. Flash uses a floating gate structure in which a low-leakage capacitor holds a voltage that controls a transistor gate.
This memory cell can be used to control programming transistors. The accompanying figure shows the schematic of a flash-programmed cell. The memory cell controls two transistors. One is the programmable connection point. It can be used to interconnect electrical nodes in interconnect or logic. The other allows read-write access to the cell.
Figure: A single multiplexer used as a logic element. When the multiplexer control a is 0, the output is d0; when the control is 1, the output is d1. This logic element lets us configure which signal is copied to the logic element output. Now consider a more complex logic element. This element has two levels of multiplexing and four control signals.
The final multiplexer stage is controlled by the OR of two other control signals. This provides a significantly more complex function. Members of this family range in capacity from 80, to 1 million gates. Example: The Actel Axcelerator family has two types of logic elements: the C-cell for combinational logic and the R-cell for registers.
These cells are organized into SuperClusters, each of which has four C-cells, two R-cells, and some additional logic. The signals can also be connected in their uncomplemented form.
The S and X bits are used for fast addition and are not available outside the SuperCluster. The cell includes logic for fast addition. The two bits to be added arrive at the A0 and A1 inputs. The carry logic is active when the CFN signal is high. This logic performs a carry-skip operation for faster addition. The S0 and S1 inputs act as data enables. The flip-flop provides active-low clear and preset, with clear having higher priority.
A variety of clock sources can be selected by the CKS control signal; the CKP signal selects the polarity of the clock used to control the flip-flop. Each SuperCluster has two clusters. Each cluster has three cells in the pattern CCR. The multiplexer system can implement any function of three inputs except for three-input XOR. The feedback paths allow the logic element to be configured as a latch, in which case in2 is used as the clock and in3 as reset.
The logic element provides two output drivers, one for local interconnect and a larger driver for long lines. The next example describes the wiring organization of the Actel Axcelerator family [Ac02]. Example: The FastConnect system provides horizontal connections between logic modules within a SuperCluster or to the SuperCluster directly below. CarryConnects route the carry signals between SuperClusters. DirectConnect connects entirely within a SuperCluster: it connects a C-cell to the neighboring R-cell.
A DirectConnect signal path does not include any antifuses; because it has lower resistance it runs faster than programmable wiring.
Generic global wiring is implemented using segmented wiring channels. Routing tracks run across the entire chip both horizontally and vertically. Although most of the wires are segmented with segments of several different lengths, a few wires run the length of the chip.
The chip provides three types of global signals. Four routed clocks can drive the clock, clear, preset, or enable pin of an R-cell or any input of a C-cell.
Example: The ProASIC K provides local wires that allow the output of each tile to be directly connected to the eight adjacent tiles. The chip also provides very long lines that run the length of the chip. The voltage is applied through the wires connected by the antifuse. The FPGA is architected so that all the antifuses are in the interconnect channels; this allows the wiring system to be used to address the antifuses for programming.
The gates of the pass transistors are controlled by programming signals that select the appropriate row and column for the desired antifuse, as shown in the accompanying figure. The programming voltage is applied across the row and column such that only the desired antifuse receives the voltage and is programmed [ElG98].
Because the antifuses are permanently programmed, an antifuse-based FPGA does not need to be configured when it is powered up. No pins need to be dedicated to configuration and no time is required to load the configuration.
The pins on an FPGA must be programmable to accommodate the requirements of the configured logic. A standard FPGA pin can be configured as an input, output, or three-state pin. Pins may also provide other features. Registers are typically provided at the pads so that input or output values may be held.
The slew rate of outputs may be programmable to reduce electromagnetic interference; lower slew rates on output signals generate less energetic high-frequency harmonics that show up as electromagnetic interference (EMI).
Example The Spartan-II 2. The pins on the chip are divided into eight banks, with each bank sharing the reference voltage pins. Pins within a bank must use standards that have the same VCCO. The IOB has three registers, one each for input, output, and three-state operation. These registers in the IOB can function either as flip-flops or latches.
The programmable delay element on the input path is used to eliminate variations in hold times from pin to pin. Propagation delays within the FPGA cause the IOB control signals to arrive at different times, causing the hold time for the pins to vary. The programmable delay element is matched to the internal clock propagation delay and, when enabled, eliminates skew-induced hold time variations. A weak keeper circuit monitors the output value and weakly drives it to the desired high or low value.
The weak keeper is useful for pins that are connected to multiple drivers; it keeps the signal at its last valid state after all the drivers have disconnected. The size of a logic element determines how many can be put on a chip; the delay through a wire helps to determine the interconnection architecture of the fabric. We will rely heavily on the results of Chapter 2 throughout this section. A CMOS gate needs to implement only one chosen logic function. The logic element of an FPGA, in contrast, must be able to implement a number of different functions.
Antifuse-based FPGAs program their logic elements by connecting various signals, either constants or variables, to the inputs of the logic elements. The logic element itself is not configured as an SRAM-based logic element would be.
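Programming by connection can be illustrated with the two-level multiplexer element described earlier, whose final stage is selected by the OR of two control signals. The sketch below is a behavioral model under our own assumptions: the port names (`a0`, `a1`, `sa`, and so on) are hypothetical, and the particular wirings for AND, OR, and XOR are examples we chose, not entries copied from the original table.

```python
def mux2(select, d0, d1):
    """Two-input multiplexer: output d0 when select is 0, d1 when select is 1."""
    return d1 if select else d0


def logic_element(a0, a1, sa, b0, b1, sb, s0, s1):
    """Two first-level muxes feed a final mux selected by (s0 OR s1)."""
    top = mux2(sa, a0, a1)
    bottom = mux2(sb, b0, b1)
    return mux2(s0 | s1, top, bottom)


# Each function is obtained only by tying inputs to constants or to x and y.
def and2(x, y):
    return logic_element(0, x, y, 0, 0, 0, 0, 0)


def or2(x, y):
    return logic_element(0, 0, 0, 1, 1, 0, x, y)


def xor2(x, y):
    return logic_element(0, 1, y, 1, 0, y, x, 0)
```

The element itself is identical in all three cases; only the wiring at its inputs differs, which is exactly what the antifuses select.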
As a result, the logic element for an antifuse-based FPGA can be fairly small. The accompanying figure shows the schematic for a multiplexer-based logic element used in early antifuse-based FPGAs. The accompanying table shows how to program some functions into the logic element by connecting its inputs to constants or signal variables. The logic element can also be programmed as a dynamic latch. The next example compares lookup tables and static gates in some detail. Example: lookup table vs. static gate. The number of transistors in a static CMOS gate depends on both the number of inputs to the gate and the function to be implemented.
In contrast, the SRAM cell in the lookup table requires eight transistors, including the configuration logic. In addition, we need decoding circuitry for each bit in the lookup table. A straightforward decoder for the four-bit lookup table would be a multiplexer with 96 transistors, though smaller designs are possible. The delay of a static gate depends not only on the number of inputs and the function to be implemented, but also on the sizes of transistors used.
By changing the sizes of transistors, we can change the delay through the gate. The slowest gate uses the smallest transistors. The delay of a lookup table is independent of the function implemented and dominated by the delay through the SRAM addressing logic.
The power consumption of a CMOS static gate is, ignoring leakage, dependent on the capacitance connected to its output. The CMOS gate consumes no energy while the inputs are stable (once again, ignoring leakage).
The SRAM, in contrast, consumes power even when its inputs do not change. The stored charge in the SRAM cell dissipates slowly (in a mechanism independent of transistor leakage); that charge must be replaced by the cross-coupled inverters in the SRAM cell. As we can see, the lookup table logic element is considerably more expensive than a static CMOS gate.
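The transistor-count side of this comparison can be tallied directly from the figures quoted above: eight transistors per SRAM cell and a 96-transistor multiplexer. The sketch assumes that the four-bit lookup table has four inputs and therefore stores 2^4 = 16 bits, and it uses a four-input static CMOS NAND (two transistors per input) as the static gate for comparison; both choices are ours.

```python
def lut_transistor_count(num_inputs, transistors_per_cell, decoder_transistors):
    """Storage cells plus the decoding multiplexer."""
    num_cells = 2 ** num_inputs  # one stored bit per input combination
    return num_cells * transistors_per_cell + decoder_transistors


def static_nand_transistor_count(num_inputs):
    """A static CMOS NAND uses one pullup and one pulldown transistor per input."""
    return 2 * num_inputs


lut_total = lut_transistor_count(4, 8, 96)    # 16 * 8 + 96 = 224
nand_total = static_nand_transistor_count(4)  # 8
```

Under these assumptions the lookup table needs 224 transistors against 8 for the NAND gate, a factor of 28.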
Because the logic element is so complex, its design requires careful attention to circuit characteristics. The lookup table for an SRAM-based logic element incorporates both the memory and the configuration circuit for that memory. There are two possible organizations for the lookup table, as shown in the accompanying figure: a demultiplexer that causes one bit to drive the output, or a multiplexer that selects the proper bit.
These organizations are logically equivalent but have different implications for circuitry. The demultiplexer selects a row to be addressed, and the shared bit lines are used to read or write the memory cells in that row. The shared bit line is very efficient in large memories but less so in small memories like those used in logic elements. Most FPGA logic elements use a multiplexer to select the desired bit. Should that multiplexer be made of static gates or pass transistors?
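Before turning to the circuit question, the multiplexer organization can be stated behaviorally: the configuration memory holds one bit per input combination, and the logic inputs act as the multiplexer's select lines. The sketch below is a functional model only, with function names of our own choosing; it says nothing about how the multiplexer is built.

```python
def program_lut(function, num_inputs=4):
    """Compute the configuration bits: one stored bit per input combination."""
    table = []
    for address in range(2 ** num_inputs):
        inputs = [(address >> i) & 1 for i in range(num_inputs)]
        table.append(int(bool(function(*inputs))))
    return table


def read_lut(table, inputs):
    """Multiplexer read: the inputs select which stored bit reaches the output."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# Configure the lookup table as a four-input XOR, then evaluate it.
xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
result = read_lut(xor4, [1, 0, 1, 1])
```

Changing the function changes only the stored bits, never the read path, which is why the delay of a lookup table does not depend on the function it implements.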
The design alternatives for the case of a two-input multiplexer are shown in the accompanying figure. But as the number of series pass transistors grows, the delay from the data input to the data output grows considerably. The delay through a series of pass transistors, in fact, grows as the square of the number of pass transistors in the chain, for reasons similar to those given by Elmore.
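The quadratic growth can be checked with a small Elmore delay calculation. We model the chain as a uniform RC ladder, assuming every pass transistor contributes the same on-resistance and every node the same capacitance; real transistors are nonlinear, so this is an estimate of the trend, not of absolute delay.

```python
def elmore_chain_delay(num_transistors, resistance, capacitance):
    """Elmore delay to the far end of a uniform RC ladder.

    Each node's capacitance is charged through all of the resistance
    between the driver and that node.
    """
    delay = 0.0
    for node in range(1, num_transistors + 1):
        upstream_resistance = node * resistance
        delay += upstream_resistance * capacitance
    return delay


# Delay in units of RC for chains of 1, 2, 4, and 8 pass transistors.
delays = [elmore_chain_delay(n, 1.0, 1.0) for n in (1, 2, 4, 8)]
```

The sum works out to RC * n(n + 1)/2, so doubling the chain length roughly quadruples the delay once n is large.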
The choice between static gates and pass transistors therefore depends on the size of the lookup table. The next example compares the delay through static gate and pass transistor multiplexers. Example: We want to build a b-input multiplexer that selects one of the b possible input bits. We will call the data input bits i0, etc. In our drawings we will show four-input multiplexers; these are smaller than the multiplexers we want to use for lookup tables but they are large enough to show the form of the multiplexer.