
Domino Logic Circuits


High speed Domino Logic Circuits-Sahil Bansal (2010CS10244)


Abstract: Static CMOS is the most widely used logic style today. The complementary metal oxide semiconductor (CMOS) circuit style belongs to a broad class of logic circuits called static circuits, in which at every point in time (except during switching transients) each gate output is connected to either VDD or VSS via a low-resistance path. This is in contrast to the dynamic circuit class, which relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes. Domino logic circuits are among the most preferred circuits in present-day high-performance processors because they have a speed and area advantage over static logic circuits.

Index Terms: CMOS, Domino logic, Dynamic logic circuits, noise immunity

I. INTRODUCTION
The static CMOS style is really an extension of the static CMOS inverter to multiple inputs. A static CMOS gate is a combination of two networks, called the pull-up network (PUN) and the pull-down network (PDN). The function of the PUN is to provide a connection between the output and VDD whenever the output of the logic gate is meant to be 1 (based on the inputs). Similarly, the function of the PDN is to connect the output to VSS when the output of the logic gate is meant to be 0. The PUN and PDN are constructed in a mutually exclusive fashion such that one and only one of the networks is conducting in steady state.

In review, the primary advantages of the CMOS structure are robustness (i.e., low sensitivity to noise), good performance, and low power consumption (with no static power consumption). Moreover, it is easy to translate logic into MOSFETs.

However, the static CMOS structure has a major drawback: its propagation delay degrades as the logic structure grows. Figure 1.1 shows the two-input NAND gate and its equivalent RC switch-level model. If both inputs are driven low, the two PMOS devices are on. The delay in this case is $0.69\,(R_p/2)\,C_L$, since the two resistors are in parallel. This is not the worst-case low-to-high transition, which occurs when only one device turns on and is given by $0.69\,R_p\,C_L$. For the pull-down path, the output is discharged only if both A and B are switched high, and the delay is given by $0.69\,(2R_N)\,C_L$ to a first order.

Moreover, the number of transistors required to implement a gate with a fan-in of N is 2N. This can result in significant implementation area. The large number of transistors (2N) also increases the overall capacitance of the gate. For an N-input NAND gate, the output capacitance increases linearly with the fan-in, since the number of PMOS devices connected to the output node increases linearly with the fan-in. The fan-out has a large impact on the delay of complementary CMOS logic as well. Each input to a CMOS gate connects to both an NMOS and a PMOS device, and presents a load to the driving gate equal to the sum of the gate capacitances. These observations are summarized by the following formula, which approximates the influence of fan-in and fan-out on the propagation delay of the complementary CMOS gate:

$$t_p = a_1\,FI + a_2\,FI^2 + a_3\,FO$$

where FI and FO are the fan-in and fan-out of the gate, respectively, and $a_1$, $a_2$ and $a_3$ are weighting factors that are a function of the technology.

Figure 1.2 shows the propagation delay for both transitions as a function of fan-in, assuming a fixed fan-out (NMOS: 0.5 µm and PMOS: 1.5 µm). As predicted above, $t_{pLH}$ increases linearly due to the linearly increasing value of the output capacitance. The simultaneous increase in the pull-down resistance and the load capacitance results in an approximately quadratic relationship for $t_{pHL}$. Gates with a fan-in greater than or equal to 4 become excessively slow and must be avoided.
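As a rough numerical illustration of the first-order expressions above, the following Python sketch evaluates the three two-input NAND delays and the fan-in/fan-out formula. The function names and all resistance, capacitance and weighting values are illustrative assumptions, not figures taken from this paper.

```python
RC_FACTOR = 0.69  # ln(2), rounded as in the text


def nand2_delays(r_p, r_n, c_load):
    """First-order RC delays (seconds) of a two-input static CMOS NAND gate."""
    return {
        # Both PMOS devices on: two pull-up resistors in parallel.
        "tplh_best": RC_FACTOR * (r_p / 2) * c_load,
        # Only one PMOS device on: worst-case low-to-high transition.
        "tplh_worst": RC_FACTOR * r_p * c_load,
        # Two NMOS devices in series in the pull-down path.
        "tphl": RC_FACTOR * (2 * r_n) * c_load,
    }


def fanin_fanout_delay(fan_in, fan_out, a1, a2, a3):
    """tp = a1*FI + a2*FI^2 + a3*FO with technology-dependent weights."""
    return a1 * fan_in + a2 * fan_in ** 2 + a3 * fan_out


if __name__ == "__main__":
    # Assumed values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 10 fF.
    for name, value in nand2_delays(10e3, 5e3, 10e-15).items():
        print(f"{name}: {value * 1e12:.1f} ps")
    # Assumed unit weights, only to show the quadratic growth with fan-in.
    for fan_in in (2, 4, 8):
        print(fan_in, fanin_fanout_delay(fan_in, 1, 1.0, 1.0, 1.0))
```

The quadratic term dominates quickly as the fan-in grows, which is consistent with the advice to avoid gates with large fan-in.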

 

High Speed Domino Logic Circuits (Nov. 2011) Sahil Bansal


II. DYNAMIC LOGIC CIRCUIT
It was noted earlier that static CMOS logic with a fan-in of N requires 2N devices. An alternative logic style called dynamic logic obtains the same result using N + 2 devices for a fan-in of N. With the addition of a clock input, it uses a sequence of pre-charge and conditional evaluation phases to realize complex logic functions.

The basic construction of an N-type dynamic logic gate is shown in Figure 2.1. The PDN is constructed in exactly the same fashion as in complementary CMOS. The operation of this circuit can be divided into two major phases, pre-charge and evaluation, with the mode of operation determined by the clock signal.

2.1 N-type Network

A number of important properties can be derived for the dynamic logic gate:

• The number of transistors (for complex gates) is substantially lower than in the static case: N + 2 versus 2N.

• It is non-ratioed. The sizing of the PMOS pre-charge device is not important for realizing proper functionality of the gate. The pre-charge device can be made large to improve the low-to-high transition time (of course, at a cost to the high-to-low transition time). There is, however, a trade-off with power dissipation, since a larger pre-charge device directly increases clock power dissipation.

• It only consumes dynamic power. Ideally, no static current path ever exists between VDD and GND. The overall power dissipation, however, can be significantly higher than that of a static logic gate.

• The logic gates have faster switching speeds. There are two main reasons for this. The first (obvious) reason is the reduced load capacitance, attributable to the lower number of transistors per gate and the single-transistor load per fan-in. Second, the dynamic gate does not have short-circuit current, so all the current provided by the pull-down devices goes into discharging the load capacitance.
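The device-count property in the list above can be checked with a few lines of Python. This is only a small sketch of the N + 2 versus 2N comparison; the function names are hypothetical.

```python
def static_cmos_transistors(fan_in):
    """Static CMOS: one NMOS and one PMOS device per input."""
    return 2 * fan_in


def dynamic_transistors(fan_in):
    """Dynamic gate: N pull-down devices plus pre-charge and evaluate devices."""
    return fan_in + 2


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(n, static_cmos_transistors(n), dynamic_transistors(n))
```

For N = 2 both styles need four devices, so the saving only appears for fan-in greater than 2, which is why the property is stated for complex gates.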

However, dynamic logic circuits do have certain limitations, such as charge leakage, charge sharing, back-gate (and in general capacitive) coupling, and clock feedthrough. Figure 2.2 illustrates the effect of leakage on the waveforms of a dynamic circuit. Moreover, dynamic logic gates cannot be cascaded directly. To illustrate this, consider two simple N-type dynamic inverters cascaded together (Figure 2.3).

 2.2 Waveform

2.3 Cascading of Dynamic N-type block

During the pre-charge phase (i.e., CLK = 0), the outputs of both inverters are pre-charged to VDD. Assume that the primary input In makes a 0 to 1 transition. On the rising edge of the clock, output Out1 starts to discharge. The second output should remain in the pre-charged state of VDD, since its input Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge. As long as Out1 exceeds the switching threshold of the second gate, which approximately equals VTn, a conducting path exists between Out2 and GND. Out2 therefore discharges as well, resulting in incorrect evaluation.
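The size of this erroneous discharge can be estimated with a deliberately crude model, sketched below in Python. It assumes (these are assumptions, not statements from the paper) that each pull-down path behaves like a fixed resistor, so that Out1 and Out2 decay exponentially with time constants `tau1` and `tau2`, and that the second pull-down conducts exactly while Out1 is above VTn. All numeric values are illustrative.

```python
import math


def out2_after_evaluation(vdd, vtn, tau1, tau2):
    """Voltage left on Out2 once Out1 has fallen below VTn (crude RC model)."""
    # Time during which Out1 = VDD * exp(-t / tau1) stays above VTn.
    t_on = tau1 * math.log(vdd / vtn)
    # Out2 discharges with time constant tau2 for that long.
    return vdd * math.exp(-t_on / tau2)


if __name__ == "__main__":
    # Assumed VDD = 2.5 V and VTn = 0.5 V; time constants in arbitrary units.
    for tau2 in (1.0, 2.0, 10.0):
        print(tau2, round(out2_after_evaluation(2.5, 0.5, 1.0, tau2), 3))
```

Under these assumptions, two identical stages leave Out2 at about VTn instead of VDD, which shows why the loss of charge cannot be ignored.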

III. DOMINO LOGIC CIRCUIT
A Domino logic module consists of an N-type dynamic logic circuit followed by a static inverter, as illustrated in Figure 3.1. During pre-charge, the output of the N-type dynamic gate is charged up to VDD and the output of the inverter is set to 0. During evaluation, the dynamic gate conditionally discharges and the output of the inverter makes a conditional transition from 0 to 1. The input to a Domino gate always comes from the output of another Domino gate.


3.1 Domino Logic

This ensures that all inputs to the Domino gate are set to 0 at the end of the pre-charge period. Hence, the only possible transition for an input during the evaluation period is the 0 to 1 transition, so the rule that inputs to a dynamic gate may only rise during evaluation is obeyed. The introduction of the static inverter has the additional advantage that the fan-out of the gate is driven by a static inverter with a low-impedance output, which increases noise immunity. The buffer furthermore reduces the capacitance of the dynamic output node by separating internal and load capacitances.

During pre-charge, all inputs are set to 0. During evaluation, the output of the first Domino block either stays at 0 or makes a 0 to 1 transition, affecting the second Domino block. This effect might ripple through the whole chain, one gate after the other, as with a line of falling dominoes, hence the name. Figure 3.2 illustrates the switching behavior of a static and a domino buffer as the data input to the cell rises. The lower switching voltage of a domino cell leads to a speedup, since the driving cell reaches the lower NMOS threshold voltage sooner than it would reach a higher voltage level. The speed advantage of a domino cell over an equivalent static design is a factor of 1.5 to 2.5.

3.2 The output voltage of a static and a domino buffer as the input switches from 0 to 1
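The ripple behavior described in this section can be sketched at the logic level in Python. The model below is a simplification under stated assumptions: each stage is a domino buffer (a one-input pull-down followed by an inverter), stages evaluate one after the other, and the function name is hypothetical.

```python
def evaluate_domino_chain(primary_input, n_stages):
    """Return the history of stage outputs as evaluation ripples down the chain."""
    dynamic = [1] * n_stages  # pre-charge: every dynamic node at logic 1 (VDD)
    outputs = [0] * n_stages  # so every static inverter output starts at 0
    history = [list(outputs)]
    for stage in range(n_stages):
        drive = primary_input if stage == 0 else outputs[stage - 1]
        if drive:
            dynamic[stage] = 0  # pull-down conducts, dynamic node discharges
        outputs[stage] = 1 - dynamic[stage]  # static inverter
        history.append(list(outputs))
    return history


if __name__ == "__main__":
    for step in evaluate_domino_chain(1, 3):
        print(step)
```

With the input at 1, the outputs rise one after the other; with the input at 0, every output stays at its pre-charged value of 0. In both cases each output makes at most one 0 to 1 transition.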

IV. OPTIMIZING DOMINO CIRCUITS

• Since domino gate outputs are low during the pre-charge phase, gates whose inputs are all domino outputs do not need the “evaluate” NFET, since all the NFETs in the pull-down network will be off anyway.

• With the inclusion of the evaluation devices in Domino circuits, all gates pre-charge in parallel, and the pre-charge operation takes only two gate delays: the output of the dynamic gate charges to VDD and the inverter output is driven low. The critical path during evaluation runs through the pull-down path of the dynamic gate and the PMOS pull-up path of the static inverter. Therefore, to speed up the circuit, the beta ratio of the static inverter should be made high so that its switching threshold is close to VDD. This can be accomplished by using a small (minimum-sized) NMOS and a large PMOS device. The minimum-sized NMOS does not affect performance, since the pre-charge happens in parallel. The only disadvantage of using a large beta ratio is a reduced noise margin.

• One optimization that reduces area is Multiple Output Domino Logic. The basic concept is illustrated in Figure 4.1. The idea is to exploit the partial trees in the pull-down network and the fact that certain outputs are subsets of other outputs.

4.1 Multiple Circuit Domino
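The effect of the beta ratio on the inverter switching threshold, mentioned in the list above, can be illustrated with the textbook long-channel (square-law) expression. This model is an assumption introduced here for illustration and is not taken from the paper; the supply and threshold voltages are likewise assumed values.

```python
import math


def switching_threshold(vdd, vtn, vtp_abs, kp_over_kn):
    """Inverter switching threshold VM for a long-channel square-law model.

    kp_over_kn is the ratio of PMOS to NMOS gain factors (set by device sizing).
    """
    r = math.sqrt(kp_over_kn)
    return (vtn + r * (vdd - vtp_abs)) / (1.0 + r)


if __name__ == "__main__":
    # Assumed VDD = 2.5 V and |VT| = 0.5 V for both devices.
    for ratio in (1.0, 4.0, 9.0, 25.0):
        print(ratio, round(switching_threshold(2.5, 0.5, 0.5, ratio), 3))
```

In this model a stronger PMOS relative to the NMOS moves the threshold toward VDD, so the inverter reacts earlier to the falling dynamic node, at the cost of less margin against noise on that node.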

V. LIMITATIONS AND MODIFICATIONS

A. Leakage Currents
Direct tunneling of electrons and holes through the gate insulator causes gate oxide leakage current. The tunneling probability of carriers increases dramatically as the gate oxide thickness (tox) is scaled down in each new technology generation. In general, it is hard to avoid charge sharing and leakage problems in a dynamic circuit design. Therefore, a weak P device (staticizing gate) is added to compensate for charge loss due to leakage. Although it does affect performance, the gate is still faster than static CMOS.

5.1 Staticized gates

B. Charge Sharing
Charge sharing is a phenomenon in which the charge stored on a pre-charged dynamic node is redistributed to intermediate nodes of the pull-down network, leaving the dynamic node with less charge than required. In Figure 5.2, when CLK goes high, the voltage on the dynamic node goes to

$$\frac{3C}{3C + 6C}\,V_{DD} = \frac{1}{3}\,V_{DD} \approx 0.33\,V_{DD}$$

which is low enough to switch the output inverter. This situation can be resolved by adding pre-charge devices to the intermediate nodes, or by increasing the size of the output buffer, which increases the capacitance of the dynamic node (the faster output buffer may compensate for the larger internal capacitance).

5.2 Charge Sharing
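The charge-sharing voltage follows from charge conservation and can be reproduced with a short Python sketch. It assumes that the internal nodes start fully discharged and that no charge is lost; the function name is hypothetical, and capacitances are given in units of C.

```python
def shared_voltage(vdd, c_dynamic, c_internal):
    """Dynamic-node voltage after its charge spreads onto discharged internal nodes."""
    return vdd * c_dynamic / (c_dynamic + c_internal)


if __name__ == "__main__":
    # The case from the text: 3C on the dynamic node, 6C of internal capacitance.
    print(shared_voltage(1.0, 3.0, 6.0))
    # A larger dynamic-node capacitance (for example a bigger output buffer) reduces the droop.
    print(shared_voltage(1.0, 12.0, 6.0))
```

With VDD normalized to 1, the first call gives one third of VDD, matching the expression in the text, and the second shows how extra capacitance on the dynamic node limits the voltage drop.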

C. Inverting Logic
A major limitation of Domino logic is that only non-inverting logic can be implemented. This is due to the inclusion of the static inverter at the output of each dynamic gate.

5.3 Dual-rail Domino

This issue can be resolved by implementing a dual-rail Domino circuit, which generates both polarities of the output.
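A logic-level Python sketch of the dual-rail idea is given below for a two-input AND. It is an illustration under assumptions: each signal X is carried on a true rail `x_t` and a complement rail `x_f`, both rails are 0 during pre-charge, and the function names are hypothetical. Each rail is then a non-inverting function of the input rails, so it can be built from ordinary domino gates.

```python
from itertools import product


def dual_rail_and(a_t, a_f, b_t, b_f):
    """Dual-rail AND: both output rails use only AND/OR of the input rails."""
    out_t = a_t & b_t  # true rail: A AND B
    out_f = a_f | b_f  # complement rail: (NOT A) OR (NOT B), by De Morgan
    return out_t, out_f


def dual_rail_not(x_t, x_f):
    """Inversion needs no gate: it is a swap of the two rails."""
    return x_f, x_t


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, dual_rail_and(a, 1 - a, b, 1 - b))
```

The price is roughly a doubling of the logic, since every function is implemented twice, once per polarity.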

D. Capacitive Coupling

5.4 Capacitive Coupling

When a multiple-input gate is used as the Domino buffer, changes on the “other” input during the evaluate phase can cause the dynamic node voltage to sag due to capacitive coupling, leading to an unintended transition.

E. Technology Mapping For Domino Logic
One tricky part of designing Domino logic circuits is that the design cannot be automated as efficiently as for normal logic circuits. This is because extra timing checks are needed while designing the circuit. Also, since the number of possible cells for domino logic gates is extremely large, the layout of a cell is produced on the fly instead of being taken from a parameterized library (a collection of gates in a simulation software).

VI. CONCLUSION AND FURTHER SCOPE OF STUDY
Domino logic circuit techniques have been extensively applied in recent high-performance microprocessors due to the superior speed and area characteristics of domino circuits compared to static CMOS circuits. However, domino logic circuits have an inherent drawback of increased noise sensitivity, along with a lack of design automation and increased power dissipation. Aggressive circuit design techniques that focus only on enhancing circuit speed without considering power are no longer an acceptable approach in most high-complexity digital systems. Thus, the focus should be on further modifications such as Low Swing or Reduced Dynamic Voltage Swing Domino logic circuits.
