
Unit # 5

DYNAMIC CMOS AND CLOCKING

CONTENTS

5.1 Advantages of CMOS Over nMOS

5.2 CMOS Technologies

5.2.1 CMOS/SOI Technology

5.2.1.1 The CMOS/SOS Technology

5.2.2 CMOS/bulk Technology

5.2.2.1 p-well CMOS/Bulk process

5.2.2.2 n-well CMOS/Bulk process

5.2.2.3 Twin-tub CMOS/Bulk process

5.2.3 Latch-up in Bulk CMOS

5.2.3.1 Parasitic SCR structure

5.3 Static CMOS Design

5.4 Domino CMOS Structures

5.4.1 Domino CMOS logic examples

5.4.2 Cascaded domino CMOS logic gates

5.5 Charge Sharing

5.5.1 Solutions for charge sharing

5.6 Clocking

5.6.1 Clock generation

5.6.2 Clock distribution

5.6.3 Clocked storage elements

5.1 ADVANTAGES OF CMOS OVER nMOS

The advantages of CMOS over nMOS are as follows:

• The most important advantage of CMOS is its very low static power dissipation compared with nMOS technology.

• Reduced power requirements lead to reduced cost and improved reliability of the final

circuit.

• Low power allows smaller, lower-cost power supplies and simplified power

distribution.

• Desirable speed-power product.

Today, digital circuits are mainly CMOS circuits. nMOS is used only when fast, low-cost, simple circuits are to be fabricated. CMOS is preferred because of its desirable speed-power product.

Disadvantages of CMOS over nMOS are as follows:

• The larger number of process steps required to fabricate CMOS circuits.

• Larger die size.

• CMOS has lower gate density.

• Cost tends to increase more with die size than with the number of process steps.

Larger area is due to:

• The need to prevent or minimize latch-up.

• CMOS logic structures use about twice the number of transistors.

• CMOS has more layout rules.

The speed/power performance of the available technologies is shown in figure 5.1.

Figure 5.1: Speed/Power Performance

5.2 CMOS TECHNOLOGIES

The categories of CMOS technologies are:

a. CMOS/SOI structures

b. CMOS/bulk (CMOS) on silicon substrate

The benefits of CMOS/SOI structures are the reduced load due to the absence of well-to-substrate capacitance and the very small interconnect-to-substrate capacitance. CMOS/SOI design rules are simpler than CMOS/bulk rules, and CMOS/SOI offers the nMOS designer an easy transition to CMOS technology. CMOS/bulk requires a well (tub or island) for at least one type of FET to provide electrical isolation.

5.2.1 CMOS Silicon-on-Insulator (SOI) Technology

The SOI CMOS technology uses an insulating substrate to improve process characteristics

such as speed and latch-up susceptibility. The SOI CMOS technology allows the creation of

independent, completely isolated nMOS and pMOS transistors virtually side-by-side on an

insulating substrate.

The main advantages of this technology are:

• Higher integration density (because of the absence of well regions).

• Complete avoidance of the latch-up problem.

• Lower parasitic capacitances compared with the conventional p-well, n-well or twin-tub CMOS processes.

A cross-section of nMOS and pMOS devices using SOI process is shown in figure 5.2.

Figure 5.2: SOI process

The SOI CMOS process is considerably more costly than the standard p-well and n-well CMOS processes. However, the improved device performance and the absence of latch-up problems can justify its use, especially for deep-submicron devices.

5.2.1.1 The CMOS/SOS Technology

Silicon-on-sapphire (SOS) is the highest-performance SOI technology today. In this approach, silicon is grown on a sapphire substrate, and islands are formed by implant or diffusion. The n-channel and p-channel transistors are built on the islands, as shown in figure 5.3. High performance results from a significant reduction in parasitic capacitance, and high gate density is also achieved.

Figure 5.3: SOS process

Sapphire (Al2O3) is a good insulator, and the lattice constants of silicon and sapphire match well. When sapphire is used as the substrate, epitaxial growth of silicon yields monocrystalline silicon. Unlike bulk silicon, sapphire is not affected by radiation, which makes SOS a preferred technology for military applications that require radiation-hardened devices.

Disadvantages are:

• Manufacturing difficulty.

• High cost of sapphire wafers.

• Not competitive in high-volume, low-cost markets.

5.2.2 CMOS/bulk Technology

The CMOS/Bulk technologies are classified as follows:

a. p-well CMOS/Bulk process

b. n-well CMOS/Bulk process

c. twin-tub CMOS/Bulk process

5.2.2.1 p-well CMOS/Bulk process

The p-well CMOS/bulk process uses p-type diffusion into an n-type bulk silicon substrate to form a p-well for the n-channel transistors. The p-channel transistors are built directly into the n-substrate, as shown in figure 5.4.

Figure 5.4: p-well process

5.2.2.2 n-well CMOS/Bulk process

The n-well CMOS/bulk process uses n-type diffusion into a p-type bulk silicon substrate to form an n-well for the p-channel transistors. The n-channel devices are built directly into the bulk p-substrate, as shown in figure 5.5; hence the nMOS devices give better performance than the pMOS devices. This process provides faster circuits than the p-well CMOS process.

Figure 5.5: n-well process

Both the p-well and n-well processes need well contacts and must leave a minimum spacing (dead space) at the edges of their wells.

5.2.2.3 Twin-tub CMOS/Bulk process

The starting material is an n+ or p+ substrate, with a lightly doped epitaxial layer on top. This

epitaxial layer provides the actual substrate on which the n-well and the p-well are formed.

Two independent doping steps are performed to create the well regions, so the dopant concentrations can be carefully optimized to produce the desired device characteristics. In the p-well and n-well CMOS processes, the doping density of the well region is higher than that of the substrate, which, among other effects, results in unbalanced drain parasitics. The twin-tub process avoids this problem, but it is costlier and more complex.

The twin-tub process combines n-well and p-well technologies as shown in figure 5.6.

Figure 5.6: twin-well process

The twin-tub process has the highest overall performance of the three bulk processes because it gives the designer full freedom to optimize both the n-channel and p-channel devices. The nMOS and pMOS transistors can be optimized separately, so that the threshold voltage, body effect and channel transconductance of each type of transistor can be tuned independently.

5.2.3 Latch-up in Bulk CMOS

CMOS devices contain parasitic bipolar transistors that can cause latch-up. Latch-up is a condition in which a large current flows between VDD and GND. In latch-up, the collector of each parasitic BJT feeds the base of the other parasitic BJT in a positive-feedback configuration, forming a silicon-controlled rectifier (SCR). When the chip is powered up, these parasitic SCRs can turn on, creating a low-resistance path from power to ground. Latch-up can cause malfunction and can even destroy the device. Latch-up is terminated when power to the SCR is interrupted.

Latch-up can occur in both p-well and n-well CMOS processes. Causes of latch-up include internal transient currents or voltages during power-up, external glitches on I/O pads, and external radiation.

Latch-up can be triggered by current injected into the npn emitter, current injected into the pnp emitter, or drastic current/voltage changes on any node.

5.2.3.1 Parasitic SCR structure

Parasitic bipolar transistors (npn and pnp) exist in a CMOS structure, as shown in figure 5.7.

The well and the substrate have resistances Rw and Rs respectively.

Figure 5.7: Parasitic SCR structure

Latch-up Prevention

• Two basic concepts (for reducing the loop gain):

  – Reduce Rwell and Rsubstrate.

  – Reduce the currents of the parasitic npn and pnp transistors (i.e., reduce Ic1 and Ic2).

• Decrease the current gains of the parasitic transistors.

• Two basic ways:

  – A latch-up-resistant CMOS process

  – Layout techniques

• Internal latch-up prevention techniques:

  – Every well must have a substrate contact of the appropriate type.

  – Every substrate contact should be connected by metal directly to a supply pad (i.e., no diffusion or polysilicon underpasses in the supply rails).

• Use guard rings around the p- and/or n-wells, and make frequent contacts to the rings.

• Place substrate contacts as close as possible to the source connections of transistors connected to the supply rails (i.e., Vss in n-devices, Vdd in p-devices).

  – This reduces the values of Rsubstrate and Rwell.

  – A very conservative rule is to place one substrate contact for every supply (Vss or Vdd) connection.

• Otherwise, a less conservative rule is to place a substrate contact for every 5-10 transistors or every 25-100 µm.
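Why reducing Rwell and Rsubstrate helps can be shown with a first-order estimate. The sketch below assumes a base-emitter turn-on voltage of about 0.7 V and uses hypothetical resistance and current-gain values, none of which come from figure 5.7. A parasitic BJT starts to conduct once the lateral current through Rw or Rs develops roughly V_BE(on) across it, and the positive-feedback loop can sustain itself only if the product βnpn·βpnp exceeds 1. The function names are illustrative only.

```python
"""First-order latch-up estimate: an illustrative sketch, not a device model."""

# Assumed base-emitter turn-on voltage of the parasitic BJTs, in volts.
V_BE_ON = 0.7


def trigger_current(r_ohms: float, v_be_on: float = V_BE_ON) -> float:
    """Lateral current (A) through a well or substrate resistance that
    develops enough voltage to forward-bias the base-emitter junction it
    shunts: I = V_BE(on) / R. A lower R means a larger current is needed."""
    return v_be_on / r_ohms


def loop_can_regenerate(beta_npn: float, beta_pnp: float) -> bool:
    """Necessary (not sufficient) condition for SCR latch-up: the loop
    gain of the cross-coupled npn/pnp pair must exceed 1."""
    return beta_npn * beta_pnp > 1.0


if __name__ == "__main__":
    # Hypothetical values: a sparsely contacted well versus one with
    # substrate contacts placed close to the transistor sources.
    for label, r in (("sparse contacts", 5_000.0), ("dense contacts", 200.0)):
        print(f"{label}: R = {r:.0f} ohm, trigger ~ {trigger_current(r) * 1e3:.2f} mA")
    print(loop_can_regenerate(beta_npn=10.0, beta_pnp=0.5))  # 10 * 0.5 = 5 > 1
```

Denser well and substrate contacts (and guard rings) lower R, so a much larger injected current is needed before either parasitic transistor turns on. Process measures that lower the parasitic current gains attack the loop-gain condition instead.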

5.3 STATIC CMOS DESIGN

Static CMOS Design is discussed in unit 3.

5.4 DOMINO CMOS STRUCTURES

Domino CMOS is a special form of precharge-evaluate dynamic CMOS logic with an inverting static buffer at the output. The problem of faulty discharge of precharged nodes in cascaded dynamic CMOS logic circuits can be solved by placing an inverter in series with the output of each gate. All inputs to the nMOS logic blocks are then at zero volts during precharge and remain at zero until the evaluation phase supplies logic inputs that discharge the precharged node. However, such circuits provide only non-inverted outputs. The generalized circuit diagram of a domino CMOS gate is shown in figure 5.8.

Figure 5.8: Domino CMOS gate

During precharge phase (when Φ = 0) the output node of the dynamic CMOS stage is

precharged to a high level, and the output of the CMOS inverter becomes low. During

evaluation phase (when Φ = 1) there are two possibilities:

– The output node is discharged to a low level through the nMOS circuitry (and the inverter output goes high), or

– It remains high (and the inverter output stays low).
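The two phases can be captured in a small switch-level sketch. It assumes ideal switches with no leakage and no charge sharing, and it models the nMOS block only as a Boolean "conducts or not" function; the helper names (series, parallel, domino_stage) are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A pull-down network maps the input logic levels to "conducts or not".
PullDown = Callable[[Sequence[int]], bool]


def series(inputs: Sequence[int]) -> bool:
    """Series nMOS chain: conducts only if every input is 1 (AND)."""
    return all(inputs)


def parallel(inputs: Sequence[int]) -> bool:
    """Parallel nMOS transistors: conduct if any input is 1 (OR)."""
    return any(inputs)


def domino_stage(phi: int, inputs: Sequence[int], pulldown: PullDown,
                 dyn_node: int) -> Tuple[int, int]:
    """Advance one domino gate by one clock phase (ideal switch-level model).

    phi = 0 (precharge): the pMOS precharge device pulls the dynamic node
    to 1, so the static inverter output is 0.
    phi = 1 (evaluate): the node is discharged to 0 if the nMOS network
    conducts; otherwise it keeps its previous value, held on its capacitance.
    Returns (dynamic_node, inverter_output).
    """
    if phi == 0:
        dyn_node = 1
    elif pulldown(inputs):
        dyn_node = 0
    return dyn_node, 1 - dyn_node


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            node, _ = domino_stage(0, [a, b], series, dyn_node=0)  # precharge
            node, out = domino_stage(1, [a, b], series, node)      # evaluate
            print(a, b, "->", out)  # a series pull-down gives a non-inverting AND
```

Once the dynamic node has been discharged during evaluation it stays low until the next precharge, which is why the output of each stage can only make a single 0-to-1 transition per evaluation phase.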

5.4.1 Domino CMOS logic examples

Domino CMOS logic examples are given in figure 5.9. In each, a dynamic CMOS logic stage is cascaded with a static CMOS inverter stage.

Figure 5.9: Domino CMOS logic example

5.4.2 Cascaded domino CMOS logic gates

Cascaded domino CMOS logic stages are shown in figure 5.10.

Figure 5.10: Cascaded domino CMOS logic gates

Cascading domino CMOS logic gates with static CMOS logic gates is shown in figure 5.11.

Figure 5.11: Cascading domino CMOS logic gates with static CMOS logic gates

Domino circuits are fast, draw no quiescent power and produce no glitches at the output, but they require a reasonable (minimum) clock rate, because the charge on the precharged nodes leaks away over time.

The limitations are that the number of inverting static logic stages between cascaded domino stages must be even, so that the inputs of the next domino CMOS stage experience only 0-to-1 transitions during evaluation; that only non-inverting structures can be implemented; and that these circuits are prone to charge-sharing problems.
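The even-inversion rule can be demonstrated with a small timing sketch of two cascaded domino buffers. It assumes that stage 1's output changes one time step after evaluation starts and that stage 2 reacts to its input at every step; the function name is hypothetical. With one static inverter between the stages, stage 2's input starts high (because every domino output is low at the start of evaluation) and discharges the dynamic node before stage 1 has settled.

```python
def cascade_output(a: int, inverter_between: bool, steps: int = 3) -> int:
    """Final output of the second of two cascaded domino buffers.

    Stage 1 is a domino buffer of input a; stage 2 is a domino buffer whose
    input is stage 1's output, optionally passed through one static inverter.
    Timing assumption: stage 1's output changes one time step after the
    evaluation phase starts, and stage 2 reacts to its input at every step.
    """
    node2 = 1   # stage-2 dynamic node, precharged high
    out1 = 0    # every domino output is low at the start of evaluation
    for t in range(steps):
        x = 1 - out1 if inverter_between else out1   # input seen by stage 2
        if x:
            node2 = 0        # discharge; cannot recover until the next precharge
        if t == 0:
            out1 = a         # stage 1 has now finished evaluating
    return 1 - node2         # stage-2 static inverter output


if __name__ == "__main__":
    for a in (0, 1):
        print(a, "buffer:", cascade_output(a, False),
              "| with inverter (ideal NOT a =", 1 - a, "):", cascade_output(a, True))
```

Without the inverter, stage 2 sees only a 0-to-1 transition and always produces the correct result. With a single inverter, the input a = 1 gives a wrong output of 1 instead of 0, because the premature discharge cannot be undone within the same evaluation phase.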


5.5 CHARGE SHARING

Charge-sharing problems occur when two capacitive nodes charged to different voltages are connected through a pass transistor. When the pass transistor is turned on, it connects the two nodes, and the charge is redistributed between them. Charge sharing is a serious problem in precharge circuits and must be carefully guarded against. One solution is to make any charge-holding capacitor much larger than any capacitor it shares charge with. Charge sharing between the dynamic stage output node and the intermediate nodes of the nMOS logic block during the evaluation phase may cause erroneous outputs.

Charge sharing between the output capacitance C1 and an intermediate node

capacitance C2 during the evaluation cycle may reduce the output voltage level as shown in

figure 5.12.

During the precharge phase, the output node capacitance C1 is charged up to its logic-high level of VDD through the pMOS transistor. In the next phase, the clock signal goes high and evaluation begins. If the input signal of the uppermost nMOS transistor switches from low to high during this evaluation phase while the path to ground remains off, as shown in figure 5.12, the charge initially stored in the output capacitance C1 is now shared with C2; this is the charge-sharing phenomenon. Assuming C2 was initially discharged, charge conservation gives an output voltage of VDD·C1/(C1 + C2), so the output node voltage becomes VDD/2 if C1 = C2. Thus it is important to have C2 much smaller than C1.
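The charge-sharing result follows directly from charge conservation, C1·V1 + C2·V2 = (C1 + C2)·Vfinal. The sketch below assumes ideal capacitors, an ideal switch and a hypothetical 5 V supply; the function name is illustrative. Setting v2 = VDD models the case where the intermediate node has also been precharged high, which removes the voltage drop.

```python
def shared_voltage(c1: float, v1: float, c2: float, v2: float = 0.0) -> float:
    """Voltage after two capacitors (initially at v1 and v2) are connected.

    Charge is conserved: Q = C1*V1 + C2*V2 = (C1 + C2) * V_final.
    """
    return (c1 * v1 + c2 * v2) / (c1 + c2)


if __name__ == "__main__":
    VDD = 5.0  # assumed supply voltage, volts
    for ratio in (1, 4, 10, 100):  # C1 / C2
        v = shared_voltage(c1=float(ratio), v1=VDD, c2=1.0)
        print(f"C1/C2 = {ratio:>3}: output node falls to {v:.2f} V")
    # Intermediate node precharged to VDD as well: no loss of output level.
    print(shared_voltage(c1=1.0, v1=VDD, c2=1.0, v2=VDD))
```

The output only stays safely above the inverter's switching threshold when C1 is several times larger than C2, which is the quantitative reason behind the rule that C2 must be much smaller than C1.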

Figure 5.12: Charge sharing

5.5.1 Solutions for charge sharing

A weak pMOS device (with a small W/L ratio) can be added at the output of the dynamic CMOS stage to compensate for charge lost through charge sharing and leakage during low-frequency clock operation, as shown in figure 5.13 (a). Since this weak pMOS device is always on, the static power dissipation increases. Alternatively, a weak pMOS pull-up device can be placed in a feedback loop to prevent the loss of output voltage level due to charge sharing, as shown in figure 5.13 (b). Here the weak pMOS device conducts only when the output of the static gate is low, i.e., when the precharged node voltage is high.

Figure 5.13: A weak p device compensates for charge sharing.

Another possible solution for charge sharing is to use separate pMOS transistors to precharge all intermediate nodes of the nMOS logic block high, as shown in figure 5.14.

Figure 5.14: Precharge-high all intermediate nodes of nMOS transistors

A further solution is graded sizing of the nMOS transistors in series structures, where the nMOS transistor closest to the output node has the smallest (W/L) ratio and the nMOS transistor closest to ground has the largest (W/L) ratio.

5.6 CLOCKING

Synchronous systems use a clock to keep operations in sequence: the clock distinguishes each operation from the previous or next one and determines the speed at which the machine operates. The clock must be distributed to all the sequencing elements, such as flip-flops and latches, and also to other clocked elements such as domino circuits and memories.

There are three requirements of the clocking system:

• Signals must occur at the correct time

• Clock must be able to drive the fan-out

• Rise & fall times of the clock pulses must be as short as possible

Long transition times not only slow the circuit but also increase power consumption. Clocks must be laid out so that the delays from the source of each clock to the clocked bistable elements are identical. Clock signals switch between VDD and GND. Two-phase, non-overlapping clocking avoids timing errors due to races or hazards.

Clock Skew

– Absolute clock skew: difference in arrival of the edge of a clock phase at a destination

in the circuit, with respect to the clock edge at the source of the clock signal.

– Relative clock skew: the difference in clock lag (arrival time) between two destinations in the circuit.
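Both definitions reduce to simple differences of arrival times. The sketch below assumes the clock-edge arrival time at each destination is already known (for example, from extraction or simulation); the destination names and times are hypothetical.

```python
from typing import Dict


def absolute_skew(arrival_ns: Dict[str, float], source_ns: float = 0.0) -> Dict[str, float]:
    """Absolute skew of each destination: its clock-edge arrival time
    relative to the edge at the clock source."""
    return {name: t - source_ns for name, t in arrival_ns.items()}


def relative_skew(arrival_ns: Dict[str, float], a: str, b: str) -> float:
    """Relative skew between two destinations: the difference in their lags."""
    return arrival_ns[a] - arrival_ns[b]


def worst_case_skew(arrival_ns: Dict[str, float]) -> float:
    """Largest relative skew between any two destinations."""
    return max(arrival_ns.values()) - min(arrival_ns.values())


if __name__ == "__main__":
    # Hypothetical rising-edge arrival times (ns) at three latches.
    arrivals = {"latch_A": 1.20, "latch_B": 1.35, "latch_C": 1.10}
    print(absolute_skew(arrivals, source_ns=0.0))
    print(f"A vs B: {relative_skew(arrivals, 'latch_A', 'latch_B'):+.2f} ns")
    print(f"worst case: {worst_case_skew(arrivals):.2f} ns")
```

Because rising and falling edges can have different skews, the same calculation would normally be repeated with separate arrival-time tables for each edge.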

Clock skew for rising and falling clock signals need not be the same. Careful layout design is required to avoid skew problems. Set-up time, hold time and minimum pulse width are very important parameters for the clocks. Clock delays can be treated like any other bus-delay problem: the fastest clocking is obtained by driving the clock bus with suitable super-buffers (clock drivers), or by scaling successive clock-driver stages by a factor of about 2.7. The clock bus must be kept as short as possible and routed in metal as much as possible.
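The factor of about 2.7 comes from the classic result that a chain of drivers driving a large load has minimum total delay when each stage is about e ≈ 2.7 times larger than the one before it. This assumes each stage's delay is proportional to its fan-out ratio and neglects self-loading. The sketch below sizes such a chain; the function name and the capacitance values (in arbitrary units) are hypothetical.

```python
import math
from typing import List, Tuple


def tapered_driver_chain(c_in: float, c_load: float,
                         target_taper: float = math.e) -> Tuple[int, float, List[float]]:
    """Size a chain of inverting clock drivers (a simple super-buffer model).

    Each stage drives a load `taper` times its own input capacitance. The
    number of stages is chosen so that the taper is close to target_taper
    (about 2.7 by default); the taper is then adjusted so that
    c_in * taper**n equals c_load exactly.
    Returns (number_of_stages, taper, input_capacitance_of_each_stage).
    """
    ratio = c_load / c_in
    n = max(1, round(math.log(ratio) / math.log(target_taper)))
    taper = ratio ** (1.0 / n)
    return n, taper, [c_in * taper ** i for i in range(n)]


if __name__ == "__main__":
    # Hypothetical: a minimum-size gate (1 unit) must drive a clock bus of 1000 units.
    n, f, caps = tapered_driver_chain(c_in=1.0, c_load=1000.0)
    print(n, round(f, 3), [round(c, 1) for c in caps])
```

An odd number of stages inverts the clock, so a real design might round to the nearest even number of stages and accept a taper slightly away from e; the total delay is fairly flat around the optimum.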

5.6.1 Clock Generation

All clock signals can be derived from a system clock signal, which is a square wave. Multiphase clocks can be generated from a single square-wave input with two toggle flip-flops and two AND gates, as shown in figure 5.15.

Figure 5.15: Generation of two-phase clocking from a primary clock

Another way to generate the clocks is shown in figure 5.16.

Figure 5.16: Non-overlapping clocks
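Figure 5.16 is not reproduced here, so the following sketch assumes one common non-overlapping clock generator: two cross-coupled NOR gates with delay elements in their feedback paths (this is not the toggle flip-flop circuit of figure 5.15). It works on sampled waveforms, and the function name and delay value are hypothetical. Because phi1 can only be high when clk is low and phi2 only when clk is high, the two phases can never overlap, and the feedback delay sets the gap during which both are low.

```python
from typing import List, Tuple


def nonoverlap_clocks(clk: List[int], delay: int = 2) -> Tuple[List[int], List[int]]:
    """Discrete-time model of a cross-coupled NOR non-overlapping clock generator.

    phi1 = NOR(clk,     phi2 delayed)
    phi2 = NOR(not clk, phi1 delayed)
    Each NOR output updates one time step after its inputs; `delay` extra steps
    model the delay elements in the feedback paths, which set the gap during
    which both phases are low.
    """
    n = len(clk)
    phi1 = [0] * n
    phi2 = [0] * n
    for t in range(1, n):
        k = t - 1 - delay                      # index of the delayed feedback sample
        fb1 = phi1[k] if k >= 0 else 0
        fb2 = phi2[k] if k >= 0 else 0
        phi1[t] = int(not (clk[t - 1] or fb2))
        phi2[t] = int(not ((1 - clk[t - 1]) or fb1))
    return phi1, phi2


if __name__ == "__main__":
    clk = ([1] * 8 + [0] * 8) * 3             # square-wave system clock
    p1, p2 = nonoverlap_clocks(clk)
    print("clk :", "".join(map(str, clk)))
    print("phi1:", "".join(map(str, p1)))
    print("phi2:", "".join(map(str, p2)))
```

Increasing `delay` widens the non-overlap interval at the cost of shorter active phases.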

5.6.2 Clock distribution

On a small chip, the clock distribution network is just a wire and possibly an inverter to produce clkb. On practical chips, the RC delay due to the wire resistance and the gate load is very long. Variations in this delay cause the clock to reach different elements at different times; this is called clock skew. Clock skew can be minimized by placing all gates of a tree on the same chip. Most chips use repeaters to buffer the clock and equalize the delays, which reduces skew, as shown in figure 5.17.

The physical layout of the clock network must conform to design rules that ensure the

integrity of the clock signal by minimizing electrical coupling, switching currents, and

impedance mismatches. Equalizing path delays also helps to reduce the skew.

Figure 5.17: H-tree.
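The H-tree equalizes path delays geometrically: every leaf is reached through the same total wire length. The sketch below generates a simplified square H-tree (equal horizontal and vertical arms, halved at each level) and checks this property; it assumes uniform wire and identical loads at every leaf, and the function name is hypothetical.

```python
from typing import Iterator, Tuple


def h_tree_leaves(x: float, y: float, half: float, levels: int,
                  wire: float = 0.0) -> Iterator[Tuple[float, float, float]]:
    """Yield (x, y, wire_length_from_root) for every leaf of a simplified H-tree.

    At each level the clock travels `half` along the horizontal bar of an "H"
    and then `half` along one of its vertical bars to reach four sub-centres;
    the next level repeats the pattern at half the size.
    """
    if levels == 0:
        yield x, y, wire
        return
    for sx in (-1, 1):            # left / right end of the horizontal bar
        for sy in (-1, 1):        # bottom / top end of the vertical bar
            yield from h_tree_leaves(x + sx * half, y + sy * half, half / 2,
                                     levels - 1, wire + 2 * half)


if __name__ == "__main__":
    leaves = list(h_tree_leaves(0.0, 0.0, half=4.0, levels=2))
    lengths = {round(w, 9) for _, _, w in leaves}
    print(len(leaves), "leaves, distinct root-to-leaf lengths:", lengths)
```

Equal wire length gives equal delay only to first order; in practice, differences in local load and process variation still produce some skew, which is why repeaters and careful layout are used as well.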

The clock signals can cross under power lines using diffusion as shown in figure 5.18.

Figure 5.18: Clock-line crossing under a power line using diffusion

A clock distribution network that reduces clock skew requires plenty of metal wiring resources. Local clock gaters receive the global clock and produce the physical clocks required by the clocked elements.

Clock gaters are often used to stop or gate the clock to unused blocks of logic to save power.

Different clock gaters are:

– Enabled or Gated clock

– Stretched clocks

– Nonoverlapping clocks

– Complementary clocks

– Delayed, Pulsed clocks

– Clock Doubler

– Clock Buffer

Some of the clock gaters, with their output waveforms, are shown in figure 5.19.

Figure 5.19: Examples of Clock gaters with output waveforms.
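An enabled (gated) clock is commonly built from a latch and an AND gate. The sketch below assumes that structure (figure 5.19 may show a different gater) and works on sampled waveforms; the function names are hypothetical. The latch is transparent only while clk is low, so an enable change during the high phase cannot truncate a pulse, whereas a plain AND gate passes such a glitch straight to the gated clock.

```python
from typing import List


def naive_gated_clock(clk: List[int], enable: List[int]) -> List[int]:
    """Plain AND gate: an enable change while clk is high truncates the pulse."""
    return [c & e for c, e in zip(clk, enable)]


def latch_gated_clock(clk: List[int], enable: List[int]) -> List[int]:
    """Enabled clock with a latch: the latch is transparent while clk is low,
    so enable is frozen during the high phase and gclk = clk AND latched_enable
    only ever produces full-width pulses."""
    latched = 0
    gclk = []
    for c, en in zip(clk, enable):
        if c == 0:
            latched = en          # latch transparent during the low phase
        gclk.append(c & latched)
    return gclk


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    enable = [1, 1, 1, 0, 0, 0, 0, 0]   # enable drops in the middle of a high phase
    print("naive:", naive_gated_clock(clk, enable))
    print("latch:", latch_gated_clock(clk, enable))
```

Stopping the clock this way saves the switching power of the idle block while keeping every delivered clock pulse full width.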

5.6.3 Clocked storage elements

A two-phase clocking scheme with combinational logic inserted between every pair of registers yields a simple pipelined structure. A feedback path added around a cascade of two combinational-logic blocks is shown in figure 5.20.

Figure 5.20: Feedback path around a cascade of two combinational-logic blocks
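The pipelined structure can be sketched at the behavioral level. The model below assumes ideal latches and that each combinational block settles within one clock cycle; the stage functions and the function name are hypothetical. Because phi1 and phi2 never overlap, each phi1 latch samples data that the previous phi2 latch held from the last cycle, so a value advances through exactly one logic block per cycle instead of racing through several.

```python
from typing import Callable, List, Optional, Sequence

Stage = Callable[[int], int]


def two_phase_pipeline(inputs: Sequence[int], logic: Sequence[Stage],
                       cycles: int) -> List[Optional[int]]:
    """Simulate registers built from a phi1 latch and a phi2 latch, with a
    combinational block in front of each register.

    During phi1 every phi1 latch samples its logic block, whose input comes
    from the previous register's phi2 latch; during phi2 (which never overlaps
    phi1) the phi2 latches copy the phi1 latches. None marks an empty slot.
    Returns the last register's output at the end of each cycle.
    """
    n = len(logic)
    slave: List[Optional[int]] = [None] * n
    outputs: List[Optional[int]] = []
    for cycle in range(cycles):
        x = inputs[cycle] if cycle < len(inputs) else None
        master: List[Optional[int]] = []
        for i in range(n):                       # phi1 phase
            src = x if i == 0 else slave[i - 1]
            master.append(None if src is None else logic[i](src))
        slave = master                           # phi2 phase
        outputs.append(slave[-1])
    return outputs


if __name__ == "__main__":
    stages = [lambda v: v + 1, lambda v: v * 2]   # hypothetical logic blocks
    print(two_phase_pipeline([1, 2, 3], stages, cycles=5))
```

With two stages, each result appears one cycle after its input enters the pipeline and a new input can be accepted every cycle.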
