
Error-Detecting Code-Based Minimum Logic of Guaranteed Timing/Soft Error Resilience


Journal of Computer Science and Engineering, ISSN 2043-9091, Volume 17, Issue 1, January 2013, http://www.journalcse.co.uk


Bao Liu, Lu Wang and Fiona Teshome

Abstract—Nanoscale VLSI systems are subject to increasingly prevalent catastrophic defects, soft errors, and significant parametric variations, which cannot be reduced below certain levels according to quantum physics, and must be handled by new design methods. In this paper, we leverage the existing fault-secure logic design techniques, and propose design methodologies for (1) group-sliced logic (GSL) networks with outputs in group distance-two code for guaranteed single soft error resilience, and (2) inversion-free logic (IFL) networks with outputs in delay-insensitive (DI) code, which clears all timing errors and achieves adaptive maximum performance in the absence of external soft errors at a higher area/power cost compared with the existing logic paradigms. Our experimental results demonstrate that (1) GSL achieves guaranteed single soft error resilience at the cost of an average of 1.63× area, 1.63× critical path delay, and 2.17× power consumption, while DMR achieves an average of 2.12× area, 1.26× critical path delay, and 2.79× power consumption compared with the minimum area design; (2) dual-rail static (Domino) IFL outperforms alternative delay-insensitive (DI) code-based static (Domino) IFL with less area, higher performance and lower power consumption in all test cases, and achieves an average of 2.29(2.41)× performance boost, 2.12(1.91)× layout area and 2.38(2.34)× power consumption compared with the traditional minimum area static logic based on the Nangate 45nm open cell library.

 Index Terms—VLSI; Reliability; Performance; Fault Tolerance.


1 INTRODUCTION

As VLSI technology scales into the nanometer domain, VLSI systems are subject to increasingly prevalent catastrophic defects and soft errors. They are further subject to significant parametric variations that cannot be reduced below certain levels at nanometer scale according to quantum physics [3]. Such parametric variations lead to signal propagation delay variations at the component level, which may accumulate and lead to timing errors. As a result, nanoscale VLSI design faces an unprecedented challenge: achieving a reliable system based on unreliable components [5].

The existing fault-tolerant or soft/timing error-resilient VLSI design techniques bear limitations. Asynchronous design is inherently insensitive to performance variation. However, practical asynchronous design is limited due to the lack of an efficient design methodology [18]. BISER design [22,35], Razor logic [14] and other concurrent-checking design techniques [16] provide limited timing/soft error-detecting/correcting capabilities based on dual-module redundancy (DMR) or partial logic duplication. Triple-module redundancy (TMR) [2] and n-module redundancy (NMR) [12] provide the basis for traditional error-correcting fault-tolerant systems. Modern wireless/on-chip communication and nanoscale memory systems [13,25,31] rely on more complex and efficient error-detecting/correcting codes (EDCC) [23] for reliability enhancement. EDCC have also been applied in VLSI systems [1,8,15,36].

In this paper, we leverage the existing totally self-checking (TSC) / strongly fault-secure (SFS) logic design techniques [28], and propose design methodologies for (1) group-sliced logic (GSL) networks with outputs in a group distance-two code, and (2) inversion-free logic (IFL) networks with outputs in a delay-insensitive (DI) code for soft/timing error resilience and maximum adaptive performance in nanoscale computing. Our experimental results show that (1) GSL achieves guaranteed single soft error resilience at the cost of an average of 1.63× area, 1.63× critical path delay, and 2.17× power consumption, while DMR achieves an average of 2.12× area, 1.26× critical path delay, and 2.79× power consumption compared with the minimum area design; (2) dual-rail static (Domino) IFL outperforms alternative DI code-based static (Domino) IFL with less area, higher performance and lower power consumption for the large test cases, and achieves an average of 2.29(2.41)× performance boost, 2.12(1.91)× layout area and 2.38(2.34)× power consumption compared with the traditional minimum area static logic based on the Nangate 45nm open cell library.

The rest of the paper is organized as follows. We review the background of the problem and briefly go through the existing techniques in Section 2. We present our preliminary works and the theoretical foundation of the proposed logic styles, namely the strongly fault-secure logic design techniques, in Section 3, before presenting our proposed GSL and IFL design methodologies and their analysis and synthesis flows in Section 4. We evaluate GSL and IFL by experiments in Section 5, and conclude in Section 6.

• B. Liu and L. Wang are with the Electrical and Computer Engineering Department, the University of Texas at San Antonio, San Antonio, TX 78249.
• F. Teshome is with IBM, Tucson, AZ.

2 BACKGROUND

2.1 Nanoscale VLSI Reliability Challenge

Nanoscale VLSI reliability is challenged by a number of phenomena. In the manufacturing process, parametric variations include geometric variations in the lateral (transistor channel length) and vertical (gate oxide thickness) dimensions, dopant fluctuation, mechanical stress, etc. Severe process variations lead to catastrophic defects, e.g., open/short interconnects, transistor oxide breakdown, transistor channel punchthrough, etc., which result in yield loss. Less severe process variations may accumulate over time (through aging) or space (across a chip, i.e., volume defects), and result in system failure.

During system runtime, VLSI parametric variations include on-chip temperature variation, supply voltage degradation, stress, and radiation or cosmic ray strike-induced single-event transients (SETs). Severe runtime parametric variations lead to soft errors, which are transient logic errors during system runtime, including race/hazard-induced circuit intrinsic glitches, capacitive or inductive interconnect crosstalk noise, radiation or cosmic ray strike-induced single-event upsets (SEUs), and setup/hold time constraint violation-induced timing errors. Less severe runtime parametric variations may accumulate (e.g., along a signal propagation path) and lead to an error (e.g., a timing error).

Such significant parametric variations have led to increasingly significant performance variability at recent technology nodes, which has severely hindered performance scaling. Further, VLSI systems will be subject to increasingly prevalent defects and soft errors in the long run. Such prevalent defects, soft errors, and significant parametric variations cannot be reduced below certain levels at nanometer scale (e.g., by improving the manufacturing process) due to the uncertainty principle of quantum physics. Consequently, design techniques are much needed for defect-, soft error-, and performance variation-tolerant nanoscale VLSI circuits.

2.2 Existing Timing/Soft Error-Resilient and Other Fault-Tolerant Techniques

Asynchronous circuits are insensitive to unlimited performance variations, making them an ideal delay-insensitive circuit paradigm in nanometer technologies [24,28,32]. Practical application of asynchronous circuit design has been limited due to the lack of an efficient asynchronous design methodology (e.g., for finite state machines) [18]. Complex asynchronous design and verification may not guarantee functional correctness for all possible delay variations and resultant glitches. As a result, globally-asynchronous-locally-synchronous (GALS) design finds increasing application in practice today.

The first BISER (built-in soft error-resilient) design technique was proposed to enhance soft error immunity for sequential elements [35], wherein a Muller C-element combines the outputs of two duplicate flip-flops. A Muller C-element is a basic building block in asynchronous circuit design, which holds its output unchanged if its inputs differ. When a soft error corrupts one of the two flip-flops, the Muller C-element holds the output unchanged, achieving single soft error resilience.
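The holding behavior described above can be sketched behaviorally in Python. This is only an illustration, not a circuit-level model; the names `c_element` and `biser_output` are hypothetical and are not taken from [35].

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Behavioral Muller C-element: pass the common input value, else hold."""
    return a if a == b else prev


def biser_output(ff_a: int, ff_b: int, prev: int) -> int:
    """Combine two duplicate flip-flop outputs through a C-element
    (hypothetical helper for illustration)."""
    return c_element(ff_a, ff_b, prev)


# Both flip-flops store 1, so the output follows them.
out = biser_output(1, 1, prev=0)

# A soft error flips one flip-flop: the inputs differ, so the output is held.
out_after_upset = biser_output(0, 1, prev=out)
```

A single upset in either flip-flop therefore never reaches the output; only an upset in both flip-flops at once could change it.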

To correct soft errors in combinational logic networks, an extension of the technique deploys a Muller C-element that combines the outputs of two latches that sample the combinational logic outputs non-simultaneously, or simultaneously if the combinational logic network is duplicated along with the latch [22].

A similar technique is Razor logic [14] or Intel error detection sequential (EDS) design [7], wherein each flip-flop is accompanied by a shadow latch. The flip-flop and the shadow latch sample the same combinational logic network output. Their contents are compared. Any mismatch indicates either a timing error or a soft error. Upon error detection, a higher-level response mechanism triggers recomputation, e.g., by re-fetching instructions at the architecture level.
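The comparison between the flip-flop and its shadow latch can be sketched as follows. The sketch assumes that the shadow latch samples the same signal a fixed delay after the flip-flop, and it models the logic output as a piecewise-constant waveform; the names `sample`, `razor_error` and `shadow_delay` are hypothetical.

```python
def sample(waveform, t):
    """Value at time t of a piecewise-constant waveform given as a
    time-sorted list of (time, value) transitions."""
    value = waveform[0][1]
    for time, v in waveform:
        if time <= t:
            value = v
    return value


def razor_error(waveform, t_clk, shadow_delay):
    """Flag a mismatch between the main flip-flop and the shadow latch."""
    main = sample(waveform, t_clk)                    # flip-flop samples at the clock edge
    shadow = sample(waveform, t_clk + shadow_delay)   # shadow latch samples later (assumption)
    return main != shadow


# The output settles at t = 1.2, after the clock edge at t = 1.0: a timing error.
late = razor_error([(0.0, 0), (1.2, 1)], t_clk=1.0, shadow_delay=0.5)

# The output settles at t = 0.8, before the clock edge: no error.
on_time = razor_error([(0.0, 0), (0.8, 1)], t_clk=1.0, shadow_delay=0.5)
```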

BISER design, Razor logic, and other concurrent-checking schemes [16] achieve limited error-detecting/correcting capability based on dual-module redundancy (DMR) or partial duplication. Triple-module redundancy (TMR) [2] and n-module redundancy (NMR) [12] provide the basis for traditional error-correcting fault-tolerant systems. Modern fault-tolerant systems, e.g., modern wireless/on-chip communication [26] and nanoscale memory systems [13,25,31], rely on more efficient error-detecting/correcting codes (EDCC) [23] for reliability enhancement.

EDCC have also been applied in VLSI design. A number of error-detecting codes, including parity code, group parity code [11,30], Hamming code [1,15,36], one-hot code [36], m-out-of-n code and Berger code [4,19], have been applied in self-checking control logic design. AN codes and residue codes have been applied in self-checking arithmetic circuits [8]. In these circuits, error detection triggers a correction mechanism at a higher level, e.g., instruction re-fetching at the architecture level. ECC form the basis of fault-tolerant error-correcting logic. Armstrong proposed forward error-correcting logic in 1961 [1], wherein an error-correcting module intercepts the sequential element outputs, and sends the corrected signals to the next stage. Some of the state-of-the-art implementations can be found in recent publications [15,36].

Alternatively, in backward error-correcting logic, corrected signals are sent back to the sequential elements in the current stage, without necessarily lengthening the timing critical path [20]. However, such ECC-based error-correcting logic is inefficient in correcting soft errors in a combinational logic network, because a single-bit soft error in a combinational logic network may propagate along multiple paths and corrupt a number of combinational logic outputs, which may exceed the ECC's error-correcting capacity. For example, a code with minimum Hamming distance d corrects at most ⌊(d − 1)/2⌋ error bits, so a distance-three Hamming code corrects only one.

3 PRELIMINARIES

3.1 Error Detection-Based Timing/Soft Error-Resilient Logic

A recent publication has proposed error detection rather than error correction for timing/soft error resilience [20]. The motivating observation is that a timing/soft error lasts only for a few nanoseconds. Consequently, an on-the-fly error-detecting scheme which halts the circuit at the occurrence of a timing/soft error suffices to achieve timing/soft error resilience. The circuit resumes operation once the timing/soft error is no longer present. Relying on error detection rather than error correction leads to much reduced hardware overhead.

For example, error detection can be combined with clock gating for timing/soft error resilience (Fig. 1). In this logic paradigm, the combinational logic outputs are encoded in an error-detecting code (EDC). An error-detecting logic module checks the combinational logic outputs and gates the clock. The flip-flops only latch codewords at the logic outputs. If a timing/soft error leads to a non-codeword at the combinational logic outputs, the error-detecting logic blocks the clock, and prevents the flip-flops from latching the incorrect signals. Once the timing/soft error is no longer present, and a codeword appears at the combinational logic outputs, the circuit resumes operation at the earliest moment, achieving minimum performance degradation.
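The clock-gating idea can be sketched behaviorally in Python. The sketch assumes an even-parity code as the EDC purely for illustration (the paradigm does not prescribe a particular code at this point), and the class `GatedRegister` and its method `clock_edge` are hypothetical names.

```python
def is_even_parity(word):
    """Assumed EDC for illustration: codewords have an even number of ones."""
    return sum(word) % 2 == 0


class GatedRegister:
    """Flip-flops whose clock is gated by an error-detecting checker."""

    def __init__(self, initial):
        self.state = tuple(initial)

    def clock_edge(self, logic_outputs, is_codeword=is_even_parity):
        """Latch the outputs only if they form a codeword.
        Returns True if the clock got through, False if it was blocked."""
        if is_codeword(logic_outputs):
            self.state = tuple(logic_outputs)
            return True
        return False  # clock blocked, previous state kept


reg = GatedRegister((0, 0, 0))
blocked = reg.clock_edge((1, 0, 0))   # transient error: non-codeword at the outputs
state_during_error = reg.state        # the register still holds its old content
latched = reg.clock_edge((1, 0, 1))   # error gone: a codeword appears and is latched
```

The register never stores a non-codeword; the only cost of the transient error is the delay until a codeword reappears.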

 Fig. 1. Error detection and clock gating-based soft error-resilient logic.

A critical problem for such an error detection-based timing/soft error-resilient logic paradigm is as follows. Because the error-detecting logic only detects non-codewords at the combinational logic outputs, we must prevent the combinational logic outputs from undergoing an undetected transition to an incorrect codeword at the occurrence of a timing/soft error. For example, if a single-bit soft error at an internal node propagates through the combinational logic network and corrupts two output bits, it would not be detected by a parity checker. If we encode the combinational logic outputs in a Hamming code, we must guarantee that any soft error in the combinational logic network corrupts fewer output bits than the code's minimum Hamming distance. Similarly, for an error detection-based self-timing logic, we must guarantee that any incomplete combinational logic computation does not lead to an incorrect codeword at the combinational logic outputs.
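The parity example above can be made concrete with a toy network. The two networks below are hypothetical and chosen only for illustration: in `shared_network` one internal node fans out to two outputs, while in `sliced_network` each output has its own copy of the logic.

```python
def parity_ok(word):
    """Even-parity checker over all output bits."""
    return sum(word) % 2 == 0


def shared_network(a, b, upset=False):
    """Two outputs fan out from one internal node n = a AND b."""
    n = a & b
    if upset:            # single soft error on the shared internal node
        n ^= 1
    y0, y1 = n, n
    parity = 0           # y0 == y1 when fault-free, so the parity bit is constant
    return (y0, y1, parity)


def sliced_network(a, b, upset_slice=None):
    """Each output bit is produced by its own copy of the logic."""
    n0, n1 = a & b, a & b
    if upset_slice == 0:
        n0 ^= 1
    elif upset_slice == 1:
        n1 ^= 1
    return (n0, n1, 0)


good = shared_network(0, 0)
bad_shared = shared_network(0, 0, upset=True)     # both outputs flip together
bad_sliced = sliced_network(0, 0, upset_slice=0)  # only one output flips
```

With the shared node, the single upset produces a wrong word that still passes the parity check; with independent copies, the same upset produces a non-codeword.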

To solve this problem, we revisit the TSC/SFS logic design techniques in the literature [29].

3.2. Totally Self-Checking (TSC) / Strongly Fault-Secure (SFS) Logic Networks

Definition 1. A logic network G of input code space A and output code space B is fault secure with respect to a fault set F if for all faults f ∈ F and all code inputs a ∈ A, the output is either correct, G(a, f) = G(a, ∅), or is a non-codeword, i.e., beyond the output code space, G(a, f) ∉ B.

Definition 2. A logic network G of input code space A and output code space B is self-testing with respect to a fault set F if for each fault f ∈ F, there is at least one code input that produces a non-codeword output: ∃ a ∈ A | G(a, f) ∉ B.

Definition 3. A logic network G is totally self-checking (TSC) with respect to a fault set F if it is fault secure and self-testing with respect to F.

Definition 4. A logic network G of input code space A and output code space B is strongly fault-secure (SFS) with respect to a fault set F if for all fault sequences <f1, f2, ..., fn>, fi ∈ F, and all code inputs a ∈ A, the output is either correct, G(a, <f1, f2, ..., fn>) = G(a, ∅), or is a non-codeword, G(a, <f1, f2, ..., fn>) ∉ B.

The set of strongly fault-secure (SFS) logic networks includes all totally self-checking (TSC) logic networks [29]. Literature [29] further gives two specific types of TSC/SFS logic networks as follows.

3.2.1. b-Byte Distance-Two Code-Based b-Byte Sliced Networks

Definition 5. A b-byte distance-two code is a code in which the codeword bits are in groups of size b, and any two codewords differ in at least two groups.

Definition 6. A logic network is b-byte sliced if the outputs are in groups of size b, and each group is realized by an independent logic network having only the primary inputs in common.

Theorem 1. A b-byte sliced logic network with outputs in a b-byte distance-two code is TSC/SFS with respect to single faults.

For example, the simplest b-byte distance-two code is achieved by duplicating the codeword bits (or including the complements, which forms a dual-rail code), while the simplest b-byte sliced logic network is achieved by duplicating the combinational logic network (or including the complementary logic network) (Fig. 2). Such a logic network is TSC/SFS to any single-bit soft error, because any single-bit soft error can only corrupt output bits in one of the two independent logic networks. The resultant non-codeword will be detected by a codeword checker, which is a comparator in this case.

3.2.2. DI Code-Based Inversion-Free Networks

Definition 7. A binary vector x covers binary vector y (x ≥ y) if x has a 1 in every position that y has a 1, or y[i] = 1 ⇒ x[i] = 1 ∀i. A code C is unordered, all unidirectional error-detecting (AUED) [Bose and Rao 1982], or delay-insensitive (DI) if no codeword covers any other codeword [Verhoeff 1988].

Examples of unordered/AUED/DI codes include m-out-of-n or m-hot codes, Berger codes, and dual-rail codes [33]. An m-out-of-n or m-hot codeword has a fixed number of logic one bits. A Berger codeword includes two parts: the information bits and the check bits, wherein the check bits are the binary representation of the number of logic zeros in the information bits. An n-bit dual-rail codeword includes n/2 information bits and n/2 complementary bits; it therefore contains n/2 bits of logic one and n/2 bits of logic zero. In summary, an unordered/AUED/DI codeword includes a fixed number of logic one bits (in an m-hot or dual-rail code), or a pre-defined number of logic one bits (in a Berger code).
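The covering relation of Definition 7 and the example codes can be checked by brute force on small instances. The following Python sketch is an illustration only; the generator names (`m_out_of_n`, `dual_rail`, `berger`) and the bit ordering of the Berger check bits are assumptions made here.

```python
from itertools import combinations, product


def covers(x, y):
    """x covers y if x has a 1 in every position where y has a 1."""
    return all(xi >= yi for xi, yi in zip(x, y))


def is_unordered(code):
    """No codeword covers any other codeword."""
    return not any(covers(x, y) or covers(y, x)
                   for x, y in combinations(code, 2))


def m_out_of_n(m, n):
    """All n-bit words with exactly m ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]


def dual_rail(k):
    """k information bits followed by their k complements."""
    return [bits + tuple(1 - b for b in bits)
            for bits in product((0, 1), repeat=k)]


def berger(k):
    """k information bits followed by the binary count of their zeros."""
    width = k.bit_length()  # enough bits to represent 0..k
    code = []
    for bits in product((0, 1), repeat=k):
        zeros = bits.count(0)
        check = tuple((zeros >> i) & 1 for i in reversed(range(width)))
        code.append(bits + check)
    return code


plain_binary = list(product((0, 1), repeat=3))  # ordinary binary code, for contrast
```

Calling `is_unordered` on each of the three generated codes returns `True`, while the plain binary code fails the check because, for instance, 111 covers every other word.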

Fig. 2. DMR error detection and clock gating-based soft error-resilient logic.

Definition 8. Unidirectional faults are either all stuck-at-0 or all stuck-at-1 faults.

Theorem 2. Any inversion-free logic network with outputs in a DI code is TSC/SFS with respect to unidirectional faults [29].

This is because, in an inversion-free network, unidirectional faults can only change output bits in one direction, which changes the number of logic one bits at the logic outputs. The corrupted output covers or is covered by the correct codeword, so it is a non-DI codeword and is detected. Domino logic is a specific inversion-free logic.
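Theorem 2 can be illustrated by exhaustive fault injection on a very small network. The sketch below assumes a hypothetical dual-rail AND gate built only from AND and OR (hence inversion-free) and checks the fault-secure property of Definition 1 for every set of stuck-at-0 faults and every set of stuck-at-1 faults; it does not check the self-testing property.

```python
from itertools import combinations, product

# Nodes of the toy network: four input rails and two output rails.
NODES = ("a_t", "a_f", "b_t", "b_f", "y_t", "y_f")


def dual_rail_and(a, b, stuck=(), value=0):
    """Evaluate y = a AND b in dual-rail form, with every node listed in
    `stuck` forced to `value` (0 for stuck-at-0, 1 for stuck-at-1)."""
    def node(name, fault_free):
        return value if name in stuck else fault_free

    a_t, a_f = node("a_t", a), node("a_f", 1 - a)
    b_t, b_f = node("b_t", b), node("b_f", 1 - b)
    y_t = node("y_t", a_t & b_t)   # true rail: AND only
    y_f = node("y_f", a_f | b_f)   # false rail: OR only
    return (y_t, y_f)


def is_codeword(out):
    """A dual-rail pair is a codeword when exactly one rail is 1."""
    return out[0] != out[1]


def fault_secure(value):
    """True if no set of stuck-at-`value` faults yields a wrong codeword."""
    for a, b in product((0, 1), repeat=2):
        good = dual_rail_and(a, b)
        for r in range(1, len(NODES) + 1):
            for stuck in combinations(NODES, r):
                out = dual_rail_and(a, b, stuck, value)
                if out != good and is_codeword(out):
                    return False
    return True


secure_sa0 = fault_secure(0)
secure_sa1 = fault_secure(1)
```

Every faulty output is either the correct pair or one of the non-codewords (0, 0) and (1, 1), as the covering argument predicts.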

4 GROUP-SLICED LOGIC (GSL)

In this work, we study the following problem:

Problem 1 (Minimum Logic of Single Error Detection): Given a combinational logic network, construct the minimum cost logic network of guaranteed single error detection.

A reduced problem is as follows based on Theorem 1.

Problem 2 (Minimum Cost Group-Sliced Logic Network): Given a combinational logic network, construct an equivalent minimum cost group-sliced logic network with outputs in a group distance-two code.

A slightly different problem is as follows.

Problem 3 (Minimum Cost Group Distance-Two Code): What is the minimum length of a group distance-two code consisting of at least $2^n$ codewords?

A few well-known linear code bounds include the Hamming bound, which gives the lower bound for this problem, and the Gilbert-Varshamov bound, which gives the upper bound (i.e., existence results) for this problem. We study a more flexible problem wherein bit groups are allowed to have different sizes. We propose a minimum group distance-two code construction method (Algorithm 1).

Algorithm 1. Construct a Group Distance-2 Code
Input: n.
Output: Group distance-2 code consisting of at least $2^n$ codewords.
1. Construct a binary code of length n.
2. Partition the binary codeword bits into k groups $g_i$, where $0 \le i < k$, $k \ge 2$.
3. Compute the (k+1)-th bit group $g_k = \bigoplus_{0 \le i < k} g_i$, which has $|g_k|$ bits, where $|g_k| = \max_{0 \le i < k} |g_i|$.
4. Concatenate the k + 1 bit groups.

Theorem 2. Algorithm 1 gives a group distance-2 code.

Proof. If two codewords differ in only one group among the first k groups, they must also differ in the (k+1)-th group. As a consequence, they differ in two groups and have a group distance of two. Otherwise, the two codewords differ in at least two groups among the first k groups, and they have a group distance of at least two. In summary, any two codewords constructed by Algorithm 1 have a group distance of at least two. □
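Algorithm 1 and the property proved above can be sketched in Python and checked by brute force for small n. The sketch assumes that groups shorter than the widest one are zero-padded before the XOR (the algorithm statement does not spell this out), and the function names are hypothetical.

```python
from itertools import combinations, product


def xor_groups(groups, width):
    """Bitwise XOR of the groups; shorter groups are zero-padded (assumption)."""
    out = [0] * width
    for g in groups:
        for j, bit in enumerate(g):
            out[j] ^= bit
    return tuple(out)


def group_distance2_code(group_sizes):
    """Sketch of Algorithm 1: enumerate all n-bit words, split each into
    k groups of the given sizes, and append the XOR group."""
    n, width = sum(group_sizes), max(group_sizes)
    code = []
    for bits in product((0, 1), repeat=n):
        groups, pos = [], 0
        for size in group_sizes:
            groups.append(bits[pos:pos + size])
            pos += size
        code.append(tuple(groups) + (xor_groups(groups, width),))
    return code


def min_group_distance(code):
    """Smallest number of groups in which two distinct codewords differ."""
    return min(sum(gx != gy for gx, gy in zip(x, y))
               for x, y in combinations(code, 2))


code = group_distance2_code((2, 2))             # n = 4, k = 2, equal group sizes
uneven_code = group_distance2_code((3, 1, 2))   # n = 6, k = 3, unequal group sizes
```

In both cases `min_group_distance` evaluates to 2, in agreement with the proof; the unequal group sizes of the second call exercise the more flexible problem setting.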

For example, Table I gives a group distance-two code, wherein the codeword bits are in three groups and each group includes two bits. The first two groups give a binary code, while the third group is the XOR result of the first two groups. An answer to Problem 3 is as follows.

Corollary 1. The minimum length of a group distance-two code of k + 1 bit groups consisting of at least $2^n$ codewords is no more than $n + \lceil n/k \rceil$.
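The bound of Corollary 1 follows from a short count of the bits produced by Algorithm 1. In the sketch below, $L(k)$ denotes the length of a code built by Algorithm 1 with k data groups, and $L_{\min}(k)$ its minimum over all partitions; both symbols are introduced here for illustration only.

```latex
\begin{align*}
L(k) &= \sum_{0 \le i < k} |g_i| + |g_k|
      = n + \max_{0 \le i < k} |g_i|, \\
\max_{0 \le i < k} |g_i| &\ge \left\lceil \frac{n}{k} \right\rceil
      \quad \text{since } \sum_{0 \le i < k} |g_i| = n, \\
L_{\min}(k) &= n + \left\lceil \frac{n}{k} \right\rceil
      \quad \text{attained by a balanced partition.}
\end{align*}
```

Since Algorithm 1 with a balanced partition yields a valid group distance-two code of this length, the true minimum length cannot exceed it.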

For all possible group numbers k, the minimum-length group distance-two code consisting of at least $2^n$ codewords is the parity code of length n + 1, which includes only one bit in each group. However, Problem 2 is more difficult to solve than Problem 3. Algorithm 1 further helps here.

Corollary 2. The legality of a group distance-two code constructed by Algorithm 1 can be verified by taking the bitwise exclusive OR of all k + 1 bit groups, which is zero for every codeword.
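The legality check of Corollary 2 amounts to a bitwise XOR across the groups followed by an OR of the result. A minimal Python sketch, again assuming zero-padding of shorter groups and using the hypothetical names `syndrome` and `error_flag`:

```python
def syndrome(groups):
    """Bitwise XOR of all k + 1 groups (shorter groups zero-padded, an assumption)."""
    width = max(len(g) for g in groups)
    bits = [0] * width
    for g in groups:
        for j, bit in enumerate(g):
            bits[j] ^= bit
    return bits


def error_flag(groups):
    """OR of the syndrome bits: 1 means non-codeword (block the clock)."""
    return int(any(syndrome(groups)))


legal = ((0, 1), (1, 1), (1, 0))          # third group is the XOR of the first two
one_bit_hit = ((1, 1), (1, 1), (1, 0))    # one bit of the first group flipped
one_group_hit = ((1, 0), (1, 1), (1, 0))  # both bits of the first group flipped
```

Any corruption confined to a single group leaves a non-zero syndrome, whether it touches one bit or every bit of that group.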

 Fig. 3. A group-sliced logic network with XOR gates and an OR gate checking the group distance-two code at the logic output and gating the clock.

Corollary 2 provides an efficient codeword legality check method for a group distance-2 code constructed by Algorithm 1. Subsequently, we propose Algorithm 2, which constructs a group-sliced network with the outputs in a group distance-2 code (Fig. 3).

Algorithm 2. Construct a GSL Network with Outputs in a Group Distance-2 Code
Input: Logic network N.
Output: GSL network N' with outputs in a group distance-two code.
1. Construct a group distance-2 code for the outputs, e.g., by Algorithm 1.
2. Synthesize an independent logic network for each of the k + 1 output bit groups.
3. Include XOR gates which check the output codeword legality and gate the clock.

Theorem 3. A logic stage constructed by Algorithm 2 combined with an ECC or BISER scheme achieves guaranteed logic correctness in the presence of any single soft error.

Proof. (1) Any single soft error at the combinational logic network is detected (Theorem 1). (2) A false positive (reporting an error while there is no error) at the error-detecting and clock-gating logic only delays the function of the circuit. A false negative (reporting no error while there is an error) requires two simultaneous soft errors, which is beyond the scope of our study. (3) Any ECC or BISER scheme achieves guaranteed correction for any single soft error at the sequential elements. □

A further knob of optimization is in grouping the binary codeword bits. We try a few groupings for the minimum cost logic network in our experiment.

5 INVERSION-FREE LOGIC (IFL)

We propose to construct Inversion-Free Logic (IFL) networks with outputs in a DI code for timing error resilience and maximum adaptive performance in nanoscale computing. An IFL stage includes:

1) an inversion-free functional combinational logic network with outputs in a DI code,
2) a DI codeword legality checker which checks the DI codewords at the combinational logic outputs, and generates a clock signal for the receiving flip-flops/latches, and
3) the receiving flip-flops/latches.

IFL runs in a sequence of two phases: (1) pre-charge and (2) evaluate in a Domino logic implementation, or (1) reset and (2) compute in a static logic implementation.

Fig. 4. A standard Domino IFL stage, including a standard Domino logic block, which outputs a DI code, and a DI codeword checker, which generates the clock and pre-charge signals.

4.1. Domino Logic

In a Domino logic stage, the DI codeword checker generates a clock signal for the flip-flops/latches, which also provides the pre-charge signal for the dynamic gates (Fig. 4). Domino RAP logic runs as follows.

1) A complete Domino logic computation yields a DI codeword at the logic outputs.

2) The DI codeword legality checker outputs a rising clock edge.

3) The flip-flops/latches sample the inputs.

4) The Domino logic pre-charges.

5) The Domino logic outputs return to zero.

6) The DI codeword legality checker outputs a falling clock edge.

7) The Domino logic evaluates.
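The sequence above can be illustrated with a small behavioral model. The following Python sketch assumes a dual-rail output code and models the DI codeword legality checker as a completion detector (an OR per rail pair, ANDed across all pairs). The function names and the snapshot sequence are illustrative assumptions, not part of the design described here.

```python
from typing import List, Tuple

# One (true_rail, false_rail) pair per output bit (assumed dual-rail encoding).
Rails = List[Tuple[int, int]]


def dual_rail_complete(rails: Rails) -> int:
    """Completion detector model: 1 once every rail pair has one rail high."""
    return int(all(t | f for t, f in rails))


def clock_trace(states: List[Rails]) -> List[int]:
    """Checker output (the generated clock) for a sequence of output snapshots."""
    return [dual_rail_complete(s) for s in states]


# Snapshots of a 2-bit output over one pre-charge/evaluate cycle.
snapshots = [
    [(0, 0), (0, 0)],  # after pre-charge: all rails low
    [(1, 0), (0, 0)],  # bit 0 has resolved, bit 1 is still pending
    [(1, 0), (0, 1)],  # complete codeword: bit 0 = 1, bit 1 = 0
    [(0, 0), (0, 0)],  # pre-charge again: outputs return to zero
]
print(clock_trace(snapshots))  # [0, 0, 1, 0]
```

In this model the generated clock rises only at the snapshot in which every output bit has resolved, and falls again once pre-charge returns all rails to zero.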

 


Fig. 5. A standard static IFL logic stage, including (1) inverting static logic gates (in place of dynamic logic gates in Domino logic), and (2) static NOR gates with a side input of reset (in place of inverters in Domino logic).

4.2. Static IFL

Domino logic suffers in reliability. When a soft error occurs, Domino logic needs to re-compute, while static logic needs only to wait for the soft error to subside, after which a valid codeword appears at the outputs.

Analogous to Domino logic, we propose static IFL. In standard static IFL (analogous to standard Domino logic), each stage includes an inverting logic gate (analogous to a dynamic logic gate in standard Domino logic), and a NOR gate with a side input of reset (analogous to an inverter in standard Domino logic) (Fig. 5). In complex static IFL (analogous to complex Domino logic), each stage includes inverting logic gates (analogous to dynamic logic gates in complex Domino logic), and an additional inverting gate with a side input of reset giving the stage output (analogous to a static logic gate in complex Domino logic). At the stage level, a static IFL network is inversion-free.

Analogous to Domino logic, which operates in a two-phase sequence of pre-charge and evaluate, static IFL runs in a two-phase sequence of reset and compute. During compute, all signals are either stable or rising at the stage level. That is, static IFL satisfies the monotonicity requirement. Static IFL runs as follows.

1) A complete logic computation yields a DI codeword at the logic outputs.

2) The DI codeword legality checker gives a rising output.

3) The flip-flops/latches sample the inputs.

4) The combinational logic network resets. All the stage outputs return to zero.

5) The combinational logic outputs return to zero.

6) The DI codeword legality checker gives a falling output. The clock returns to zero.

7) The combinational logic network computes.
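As a minimal sketch of a standard static IFL stage, the Python model below pairs an inverting NAND2 gate with a NOR gate that has a side input of reset, following the description of Fig. 5. The choice of NAND2 and the function names are illustrative assumptions.

```python
from itertools import product


def nand2(a: int, b: int) -> int:
    """Inverting static logic gate."""
    return 1 - (a & b)


def nor2(a: int, b: int) -> int:
    """Static NOR gate; its second input serves as the reset side input."""
    return 1 - (a | b)


def static_ifl_stage(a: int, b: int, reset: int) -> int:
    """One stage: an inverting gate followed by a NOR with a reset side input."""
    return nor2(nand2(a, b), reset)


for a, b in product((0, 1), repeat=2):
    # Reset phase: the stage output is forced to zero regardless of the inputs.
    assert static_ifl_stage(a, b, reset=1) == 0
    # Compute phase: the stage as a whole is non-inverting (here, an AND).
    assert static_ifl_stage(a, b, reset=0) == (a & b)
```

Because the stage function is non-inverting, inputs that only rise after reset can only make the stage output rise, which is the monotonicity requirement stated above.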

4.3. Analysis

Theorem 4. In the absence of external soft errors (e.g., noises and SEUs), IFL is self-timing (i.e., it detects logic computation completion).

Proof.

1) Logic computation in an inversion-free logic network (e.g., at the stage level in RAP logic) after reset is monotonic, i.e., all signals are either stable or rising (hence glitch-free), in the absence of external soft errors.

2) Timing errors are unidirectional (stuck-at-0) faults in monotonic (rising) logic computation.

3) An inversion-free logic network with outputs in DI code detects all unidirectional faults (Theorem 3.10), including all timing errors in logic computation after reset in the absence of external soft errors. □
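The detection argument in step 3 rests on a property of DI codes: no codeword can be turned into another codeword by clearing some of its 1s (the code is unordered). The Python sketch below checks this property by brute force for small dual-rail and m-hot codes, and shows that a plain binary code lacks it. The helper names are illustrative assumptions.

```python
from itertools import combinations, product


def dual_rail_code(k):
    """All dual-rail codewords for k data bits: each bit b becomes the pair (b, 1 - b)."""
    return [tuple(r for b in bits for r in (b, 1 - b))
            for bits in product((0, 1), repeat=k)]


def m_hot_code(n, m):
    """All n-bit words with exactly m ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]


def late_variants(word):
    """Words obtained by clearing at least one 1 of `word` (signals that have not risen yet)."""
    ones = [i for i, bit in enumerate(word) if bit]
    for r in range(1, len(ones) + 1):
        for drop in combinations(ones, r):
            yield tuple(0 if i in drop else bit for i, bit in enumerate(word))


def detects_all_late_outputs(code):
    """True if no incomplete (stuck-at-0) version of a codeword is itself a codeword."""
    legal = set(code)
    return all(v not in legal for w in code for v in late_variants(w))


print(detects_all_late_outputs(dual_rail_code(3)))                # True
print(detects_all_late_outputs(m_hot_code(5, 2)))                 # True
print(detects_all_late_outputs(list(product((0, 1), repeat=3))))  # False
```

For the binary code, an output such as 111 that is still at 011 because one signal is late is indistinguishable from the legal word 011, which is why a DI code is needed for completion detection.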

In the presence of external soft errors (e.g., noises and SEUs), we have further observations as follows.

1) In the presence of stuck-at-0 external soft errors, IFL is self-timing and error-resilient.

2) In the presence of bidirectional errors, e.g., a stuck-at-1 external soft error and a timing error, the error rate of IFL is given by the probability that an incorrect codeword appears at the logic output.

3) To remove stuck-at-1 external soft errors (e.g., noises and SEUs), we need other techniques, e.g., shielding to remove coupling noises, and enhancing of the NMOSFETs in dynamic gates and the PMOSFETs in static gates to resist all stuck-at-1 SEUs.

4.4. IFL Synthesis

Algorithm 3 gives our IFL synthesis algorithm. To prevent logic synthesis from removing the functionally-redundant check logic, we synthesize the DI codeword check logic separately from the function logic.

Algorithm 4 gives an inversion two-colorable logic network for complex static or Domino logic. Applying bubble pushing based on De Morgan's theorem subsequently gives an inversion-free logic network for standard static or Domino logic. If complex logic gates are available, merging two layers of inverting gates to form a non-inverting stage also gives a stage-level inversion-free logic network.

In an inversion two-colorable logic network, each node can be colored in one of two colors, such that nodes across an inverting gate are in different colors, while nodes across a non-inverting gate are in the same color. If a logic network is not inversion two-colorable, it must contain a loop with an odd number of inverting gates, for example, a reconvergent fanout where a signal propagation path with an even number of inverting gates and another signal propagation path with an odd number of inverting gates diverge and then reconverge. We need to break such a loop by including duplicate logic, which brings hardware overhead. Note that each gate needs at most one duplicate, which is in a different color than the original gate. To resolve subsequent color conflicts involving either of the two gates, we only need to wire to the gate of the needed color.
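The two-colorability condition can be checked with a breadth-first traversal that tracks inversion parity. In the Python sketch below, the network is modeled as a list of (driver, sink, inverting) connections. This representation, the function name, and the two toy networks are illustrative assumptions and not the authors' implementation.

```python
from collections import defaultdict, deque


def inversion_two_coloring(edges):
    """Return a node -> color (0/1) map, or None if a color conflict exists.

    edges: iterable of (driver, sink, inverting) triples. Nodes across an
    inverting connection must differ in color; others must match.
    """
    adj = defaultdict(list)
    for u, v, inverting in edges:
        adj[u].append((v, inverting))
        adj[v].append((u, inverting))

    color = {}
    for start in list(adj):
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, inverting in adj[u]:
                wanted = color[u] ^ int(inverting)
                if v not in color:
                    color[v] = wanted
                    queue.append(v)
                elif color[v] != wanted:
                    return None  # loop with an odd number of inversions
    return color


# Reconvergent fanout a -> x -> y and a -> y.
colorable = [("a", "x", True), ("x", "y", True), ("a", "y", False)]  # even vs even
conflict = [("a", "x", True), ("x", "y", True), ("a", "y", True)]    # even vs odd
print(inversion_two_coloring(colorable))  # {'a': 0, 'x': 1, 'y': 0}
print(inversion_two_coloring(conflict))   # None
```

A `None` result corresponds to the case described above in which duplicate logic is required to break the odd-inversion loop.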

Any Boolean logic can be realized in an inversion-free or inversion two-colorable logic network by Algorithm 4. This is also proven in the literature.

Theorem 5. Any logic can be realized by inversion-free logic if the inputs and the outputs are in DI codes [21].

For example, any logic can be realized in an inversion-free network if all inputs and outputs are given with their complements, i.e., in dual-rail code.
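To illustrate the dual-rail case of Theorem 5, the Python sketch below realizes XOR with AND and OR operations only. Each signal is carried as a (true rail, false rail) pair, and logical inversion becomes a swap of the two rails, which needs no gate. The function names are illustrative assumptions.

```python
from itertools import product


def encode(bit):
    """Dual-rail encoding of a bit: (true rail, false rail)."""
    return (bit, 1 - bit)


def dr_not(x):
    """Inversion is a rail swap; no inverting gate is needed."""
    return (x[1], x[0])


def dr_and(x, y):
    """True rail: AND of true rails. False rail: OR of false rails (De Morgan)."""
    return (x[0] & y[0], x[1] | y[1])


def dr_or(x, y):
    """True rail: OR of true rails. False rail: AND of false rails (De Morgan)."""
    return (x[0] | y[0], x[1] & y[1])


def dr_xor(x, y):
    """XOR built from the non-inverting primitives above."""
    return dr_or(dr_and(x, dr_not(y)), dr_and(dr_not(x), y))


for a, b in product((0, 1), repeat=2):
    assert dr_xor(encode(a), encode(b)) == encode(a ^ b)

# With all rails at zero (the reset state), the output rails stay at zero.
assert dr_xor((0, 0), (0, 0)) == (0, 0)
```

Since every primitive uses only AND and OR on the rails, the outputs can only rise as input rails rise after reset.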


Algorithm 3. IFL Logic Synthesis
Input: Design specifications.
Output: IFL logic stage.
1. Synthesize functional logic with inputs and outputs in DI codes
2. Synthesize and include codeword check logic, which generates the clock signal
3. Transform to an inversion-free or inversion two-colorable logic network by Algorithm 4

Algorithm 4. Generate an Inversion Two-Colorable Logic Network
Input: Logic network N.
Output: Inversion two-colorable network N′ of equal logic.
1. Traverse network N from the outputs;
2. Color each node according to logic inversion;
for each node i of color conflict do
3.   Find or create a duplicate node i′ and its fanin cone;
4.   Re-wire to the duplicate node i′;
5.   Color the original and duplicate nodes i and i′ differently;
end

To synthesize dual-rail RAP logic based on a minimum area logic network, we apply Algorithm 4 with all nodes marked as having a color conflict, which duplicates the entire minimum area logic network, before including the dual-rail codeword check logic. Signal transitions in the dual-rail code check logic are monotonically rising and glitch-free during evaluation/computation.

6 EXPERIMENTS

In this section, we evaluate GSL and IFL and compare them with some of the existing logic styles such as minimum area CMOS logic, DMR, and Domino logic in terms of area, timing performance and power consumption. We perform logic synthesis based on Synopsys Design Vision and the 45nm Nangate Open Cell Library [27]. Our test cases include a 10-state FSM (fsm cc8 2.v in [10]), a 16-state FSM (prep4.v in [17]), a 30-state FSM (modified from Control Unit.v in [9]), and a 30/70-state asynchronous up/down counter.

We first evaluate GSL. For each FSM, we perform logic synthesis for a number of finite state encoding schemes as follows.

1) Binary finite state encoding, achieving minimum area with no soft error correction capability.

2) DMR, including a duplicate of the next state logic, a group of XOR gates and an OR gate for identity check at the next state logic outputs, and a clock gate. We include a single copy of the output logic.

3) Group distance-2 finite state encoding, with the next state logic sliced in k groups, a group of XOR gates and an OR gate for codeword legality check and a clock gate.

Table 1 gives the layout areas, the critical path delays, and the power consumptions of these FSM implementations (in absolute numbers and in ratios to the minimum area implementations), respectively. Our observations are as follows.

1) The minimum logic network area (besides the minimum codeword length) is achieved with the group size |gk| = 1 among all the group-sliced logic networks with outputs in a group distance-two code for these 5 FSM test circuits.

2) GSL achieves guaranteed single soft error resilient logic networks of an average of 1.63× area, 1.63× critical path delay, and 2.17× power consumption, while DMR achieves an average of 2.12× area, 1.26× critical path delay, and 2.79× power consumption compared with the minimum area design for these 5 FSM test circuits.

TABLE 1. AREA (µm²), CRITICAL PATH DELAY (ns), AND POWER CONSUMPTION (µW) (IN ABSOLUTE NUMBERS AND RATIOS TO THE MINIMUM AREA IMPLEMENTATION) OF FSM IMPLEMENTATIONS WITH BINARY, DMR, AND GROUP DISTANCE-TWO FINITE STATE ENCODINGS.

Encoding                    Area (µm²)   Delay (ns)   Power (µW)

10-state FSM (fsm cc8 2.v [10])
Binary                          81.56         2.60         2.26
DMR                            166.98         3.26         4.51
[1,1,1,1,1] dist-2             117.24         3.30         3.17
[2,2,2] dist-2                 131.30         3.76         3.31

16-state FSM (prep4.v [17])
Binary                         183.21         2.78         2.12
DMR                            354.42         3.16         8.17
[1,1,1,1,1] dist-2             329.90         5.02         6.99
[2,2,2] dist-2                 356.60         4.86         7.44

30-state counter
Binary                         124.87         2.46         2.29
DMR                            257.89         3.22         5.48
[1,1,1,1,1,1] dist-2           184.41         3.71         4.02
[2,2,2,2] dist-2               209.70         4.46         4.00
[2,3,3] dist-2                 208.72         4.38         4.40

30-state FSM (Control Unit.v [9])
Binary                         178.52         2.59         2.40
DMR                            316.33         3.37         6.75
[1,1,1,1,1,1] dist-2           310.24         4.93         6.11
[2,2,2,2] dist-2               344.51         5.24         7.10
[2,3,3] dist-2                 324.61         4.60         6.47

70-state counter
Binary                         194.65         3.13         3.12
DMR                            544.44         4.14         9.08
[1,1,1,1,1,1,1,1] dist-2       334.41         5.16         5.72
[2,2,2,2,2] dist-2             403.15         5.53         5.99
[3,4,4] dist-2                 456.58         5.67         5.93
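The average overheads quoted for GSL and DMR can be reproduced from the Table 1 entries. The Python sketch below assumes that each average is the arithmetic mean, over the five circuits, of the per-circuit ratio to the binary implementation, and that the GSL figures use the rows with all groups of size 1. Both assumptions are inferred from the numbers rather than stated in the text.

```python
# (area in um^2, delay in ns, power in uW), copied from Table 1.
# "gsl" is the group distance-two encoding with all groups of size 1.
TABLE1 = {
    "10-state FSM":     {"binary": (81.56, 2.60, 2.26), "dmr": (166.98, 3.26, 4.51), "gsl": (117.24, 3.30, 3.17)},
    "16-state FSM":     {"binary": (183.21, 2.78, 2.12), "dmr": (354.42, 3.16, 8.17), "gsl": (329.90, 5.02, 6.99)},
    "30-state counter": {"binary": (124.87, 2.46, 2.29), "dmr": (257.89, 3.22, 5.48), "gsl": (184.41, 3.71, 4.02)},
    "30-state FSM":     {"binary": (178.52, 2.59, 2.40), "dmr": (316.33, 3.37, 6.75), "gsl": (310.24, 4.93, 6.11)},
    "70-state counter": {"binary": (194.65, 3.13, 3.12), "dmr": (544.44, 4.14, 9.08), "gsl": (334.41, 5.16, 5.72)},
}


def average_ratios(style):
    """Mean over the five circuits of (style metric / binary metric), per metric."""
    sums = [0.0, 0.0, 0.0]
    for row in TABLE1.values():
        for k in range(3):
            sums[k] += row[style][k] / row["binary"][k]
    return tuple(round(s / len(TABLE1), 2) for s in sums)


print("GSL:", average_ratios("gsl"))  # GSL: (1.63, 1.63, 2.17)
print("DMR:", average_ratios("dmr"))  # DMR: (2.12, 1.26, 2.79)
```

The tuples are ordered as (area, critical path delay, power) and match the averages reported in the text.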

Next, we evaluate IFL. We have implemented Algorithm 4 with (or without) bubble pushing in C. By running this C program, we transform a minimum area logic network synthesized by Synopsys Design Vision to an inversion-free (or inversion two-colorable) logic network for the standard (or complex) static/Domino logic. We verify the logic correctness with Synopsys VCS, and evaluate its area, timing performance and power consumption with Synopsys Design Vision. We use the following cell libraries.

1) For standard static logic, we use the Nangate 45nm open cell library [27].

2) For complex static logic, we have generated a cell library by merging the static logic Nangate cell library and a resettable static cell library. We generated the resettable static cell library by including an additional side input of reset for each cell, and updating its area and delays based on the closest cell in the Nangate cell library (e.g., a NOR2 in the resettable static cell library is based on a NOR3 in the Nangate cell library). The power consumption estimates are left unchanged.

3) For standard Domino logic, we have generated a footed standard Domino logic cell library based on the Nangate 45nm open cell library. We updated the area of a Domino logic cell according to the transistor count (which drops from 2n to n+2 for a footed inverting dynamic gate), because cell area is largely proportional to the transistor count in the Nangate cell library. We updated the delays of a Domino logic cell as follows. The rising (falling) delays of an inverting (non-inverting) cell are given by those of an inverter (buffer). The falling (rising) delays of an inverting (non-inverting) cell are given by those of the closest cell with an extra NMOSFET (e.g., the falling delays of a NAND2 gate in Domino logic are approximated by those of a NAND3 gate in static logic). The power consumption estimates are left unchanged.

4) For complex Domino logic, we have generated a cell library by merging the standard Domino logic cell library and the static logic Nangate cell library.
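The area update in item 3 can be sketched as a simple scaling by transistor count. The Python function below assumes strict proportionality between cell area and transistor count, which the text describes only as largely true. The function name and the input area value are placeholders, not entries from the Nangate library.

```python
def footed_domino_area(static_area_um2: float, n_inputs: int) -> float:
    """Scale a static cell area by the ratio of transistor counts.

    A static n-input inverting gate has 2n transistors; a footed dynamic
    gate has n + 2 (n pull-down NMOSFETs, one pre-charge PMOSFET, one foot).
    Assumes area is proportional to transistor count (an approximation).
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be at least 1")
    static_transistors = 2 * n_inputs
    domino_transistors = n_inputs + 2
    return static_area_um2 * domino_transistors / static_transistors


# Placeholder area of 6.0 um^2 for a hypothetical 3-input static cell.
print(footed_domino_area(6.0, 3))  # 5.0
```

Under this model a 2-input gate sees no area change (4 transistors in both styles), and the saving grows with the number of inputs.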

We count only rising signal delays in a standard static/Domino logic network, and only delays of a path which ends with a rising signal transition in a complex static/Domino logic network. For timing analysis, we keep an external clock, and treat the generated clock as a critical path endpoint.

For each of the FSM test circuits, we compare 7 logic implementations:

1) The traditional minimum area static logic implementation where finite states are encoded in a binary code.

2) IFL based on dual-rail complex static logic.

3) IFL based on standard static logic with an inversion-free logic network with outputs in an m-hot code.

4) IFL based on complex static logic with an inversion two-colorable logic network with outputs in an m-hot code.

5) IFL based on dual-rail complex Domino logic.

6) IFL based on standard Domino logic with an inversion-free logic network with outputs in an m-hot code.

7) IFL based on complex Domino logic with an inversion two-colorable logic network with outputs in an m-hot code.

Table 2 gives the total layout area (µm²), critical path delay (ns), and power consumption (µW) of these implementations.

The binary encoding static logic implementations are evaluated based on the worst case (the slow cell library) while the IFL implementations are evaluated based on the average case (the typical cell library). This is because the traditional design methodology relies on guardbanding, i.e., the worst case critical path delay needs to be less than the external clock cycle time, while IFL achieves adaptive high performance. Our observations are as follows.

1) Standard (complex) Domino logic achieves less area for all the test cases and higher performance for most of the test cases with comparable power consumption compared with standard (complex) static logic. On the other hand, static logic achieves enhanced reliability compared with Domino logic: at a soft error occurrence, static logic needs only to hold, while Domino logic needs to re-compute.

2) Complex static (Domino) logic achieves less area, higher performance and comparable power consumption for all the test cases compared with standard static (Domino) logic.

3) While m-hot complex static (Domino) IFL achieves the minimum area for the two smallest test cases, dual-rail static (Domino) IFL achieves the minimum area, the maximum performance and the minimum power consumption for the two largest test cases. This is partly because (1) m-hot finite state encoding requires a larger area than binary finite state encoding for certain designs, e.g., a counter, and (2) constructing an inversion-free logic network needs to duplicate more gates for a larger and more complex logic network. On average, dual-rail static (Domino) IFL achieves 2.29(2.41)× performance boost, 2.12(1.91)× layout area and 2.38(2.34)× power consumption, while m-hot complex static (Domino) IFL achieves 2.19(2.22)× performance boost, 1.98(1.94)× layout area and 4.73(4.75)× power consumption compared with the traditional minimum area static logic synchronous design for the 5 FSMs based on the Nangate 45nm open cell library.

4) Dual-rail static/Domino logic provides further enhanced soft error resilience. Recall that dual-rail code is not only a DI code, but also a b-byte distance-two code. Although a dual-rail static/Domino logic network is not a b-byte sliced logic network [29], it is a symmetrically intertwined logic network. A single error cannot alter an internal logic function and its complement at the same time. As a result, a dual-rail static/Domino logic network detects all single errors and the majority of multiple errors.

5) Besides achieving performance boost and reliability enhancement, these logic paradigms can also lead to power reduction by scaling down the power supply voltage and trading performance for power consumption reduction. The achieved reliability enhancement enables such power supply voltage scaling.

6) IFL clears all timing errors in the absence of external soft errors, albeit at a higher area/power cost compared with Razor logic.

TABLE 2. COMPARISON OF TRADITIONAL STATIC LOGIC AND IFL LOGIC DESIGNS IN TOTAL LAYOUT AREA (µm²), CRITICAL PATH DELAY (ns), AND TOTAL POWER CONSUMPTION (µW) FOR 5 FSMS BASED ON THE NANGATE 45nm OPEN CELL LIBRARY.

Implementation                #FF   Area (µm²)   Delay (ns)   Power (µW)

10-state FSM (fsm cc8 2.v [10])
Binary Static                  4        81.56         2.60         1.60
Dual-Rail Static IFL           4       167.52         1.02         3.20
2-Hot Standard Static IFL      5       269.78         1.28         5.14
2-Hot Complex Static IFL       5       242.47         1.24         4.24
Dual-Rail Domino IFL           4       150.76         1.01         3.13
2-Hot Standard Domino IFL      5       263.84         1.26         4.76
2-Hot Complex Domino IFL       5       216.67         1.22         4.22

16-state FSM (prep4.v [17])
Binary Static                  4       183.21         2.78         2.12
Dual-Rail Static IFL           4       390.14         1.01         6.76
3-Hot Standard Static IFL      6       566.88         1.27         9.58
3-Hot Complex Static IFL       6       507.58         1.18         8.56
Dual-Rail Domino IFL           4       350.75         0.93         6.71
3-Hot Standard Domino IFL      6       523.79         1.25         9.38
3-Hot Complex Domino IFL       6       453.05         1.15         8.37

30-state counter
Binary Static                  5       124.87         2.46         2.29
Dual-Rail Static IFL           5       254.80         1.17         4.57
3-Hot Standard Static IFL      7       517.77         1.68         7.91
3-Hot Complex Static IFL       7       498.48         1.48         7.42
Dual-Rail Domino IFL           5       230.06         1.15         4.50
3-Hot Standard Domino IFL      7       457.53         1.55         7.69
3-Hot Complex Domino IFL       7       425.60         1.32         6.83

30-state FSM (Control Unit.v [9])
Binary Static                  5       178.52         2.59         2.40
Dual-Rail Static IFL           5       395.67         1.38         5.27
3-Hot Standard Static IFL      7       473.17         1.59        15.58
3-Hot Complex Static IFL       7       428.90         1.40        12.87
Dual-Rail Domino IFL           5       354.17         1.26         5.15
3-Hot Standard Domino IFL      7       467.35         1.46        15.57
3-Hot Complex Domino IFL       7       422.25         1.42        12.97

70-state counter
Binary Static                  7       194.65         3.13         3.12
Dual-Rail Static IFL           7       421.13         1.45         7.84
4-Hot Standard Static IFL      8       480.62         1.95        15.85
4-Hot Complex Static IFL       8       435.32         1.66        16.01
Dual-Rail Domino IFL           7       379.37         1.36         7.67
4-Hot Standard Domino IFL      8       460.85         1.60        15.85
4-Hot Complex Domino IFL       8       425.51         1.57        16.06
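The dual-rail averages quoted in the observations can be reproduced from the Table 2 entries. The Python sketch below assumes that the performance boost is the mean of (binary delay / IFL delay) over the five circuits, and that the area and power figures are means of (IFL / binary). These definitions are inferred from the numbers and are not stated explicitly in the text.

```python
# (area in um^2, delay in ns, power in uW), copied from Table 2.
TABLE2 = {
    "10-state FSM":     {"binary": (81.56, 2.60, 1.60), "static": (167.52, 1.02, 3.20), "domino": (150.76, 1.01, 3.13)},
    "16-state FSM":     {"binary": (183.21, 2.78, 2.12), "static": (390.14, 1.01, 6.76), "domino": (350.75, 0.93, 6.71)},
    "30-state counter": {"binary": (124.87, 2.46, 2.29), "static": (254.80, 1.17, 4.57), "domino": (230.06, 1.15, 4.50)},
    "30-state FSM":     {"binary": (178.52, 2.59, 2.40), "static": (395.67, 1.38, 5.27), "domino": (354.17, 1.26, 5.15)},
    "70-state counter": {"binary": (194.65, 3.13, 3.12), "static": (421.13, 1.45, 7.84), "domino": (379.37, 1.36, 7.67)},
}


def dual_rail_summary(style):
    """Return (performance boost, area ratio, power ratio) averaged over the circuits."""
    rows = list(TABLE2.values())
    n = len(rows)
    boost = sum(r["binary"][1] / r[style][1] for r in rows) / n
    area = sum(r[style][0] / r["binary"][0] for r in rows) / n
    power = sum(r[style][2] / r["binary"][2] for r in rows) / n
    return round(boost, 2), round(area, 2), round(power, 2)


print("Dual-rail static IFL:", dual_rail_summary("static"))  # (2.29, 2.12, 2.38)
print("Dual-rail Domino IFL:", dual_rail_summary("domino"))  # (2.41, 1.91, 2.34)
```

Note that the binary static delays are worst-case values while the IFL delays are typical-case values, as explained above, so the performance boost includes the benefit of removing the guardband.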

7 CONCLUSION

Our contributions in this work are as follows.

1) We achieve minimum logic networks of guaranteed single soft error resilience. We propose construction methods for minimum group distance-two codes and for group-sliced logic (GSL) networks with outputs in a group distance-two code. We achieve guaranteed single soft error resilient logic networks of an average of 1.63× area, 1.63× critical path delay, and 2.17× power consumption, while DMR achieves an average of 2.12× area, 1.26× critical path delay, and 2.79× power consumption compared with the minimum area design for the 5 FSM test circuits.

2) We propose IFL, which provides a reliable high performance nanoscale computing paradigm by achieving adaptive high performance and soft error resilience in the presence of performance variability and soft errors. IFL is easy to implement in the existing VLSI design methodologies. An IFL stage forms a synchronous system, or a synchronous module in a GALS system. We expect continuing VLSI performance scaling with the proposed soft/timing error-resilient VLSI design methodologies.

ACKNOWLEDGMENT

This work was supported in part by NSF under grant CCF-1117975.

REFERENCES

[1] D. B. Armstrong, “A General Method of Applying Error Correction to Synchronous Digital Systems,” The Bell System Technical Journal, 40(2), pp. 557-593, 1961.

[2] A. Avizienis, H. Kopetz, and J.-C. Laprie. “The evolution of fault-tolerant computing,” Dependable Computing and Fault Tolerant Systems, 1, 1987.

[3] R. Blish, T. Dellin, S. Huber et al., “Critical Reliability Challenges for the International Technology Roadmap for Semiconductors (ITRS),” International SEMATECH. Technology Transfer No. 03024377A-TR, 2003.

[4] C. Bolchini and D. Sciuto, “An output/state encoding for self-checking finite state machine,” In Proc. IEEE Intl. Symp. Circuits and Systems, pp. 2136–2139, 1995.

[5] S. Borkar, “Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation,” IEEE Micro, pp. 10-16, 2005.

[6] B. Bose and T. R. N. Rao, “Theory of unidirectional error correcting/detecting codes,” IEEE Trans. Computers, C-31(6):521–530, 1982.


[7] K. A. Bowman, J. W. Tschanz, S.-L. L. Lu, P. A. Aseron, M. M. Khellah, A. Raychowdhury, B. M. Geuskens, C. Tokunaga, C. B. Wilkerson, T. Karnik, and V. K. De, “A 45nm resilient microprocessor core for dynamic variation tolerance,” IEEE J. Solid State Circuits, 46(1):194–208, 2011.

[8] T. J. Brosnan and N. R. Strader II. “Modular error detection for bit-serial multiplication,” IEEE Trans. Computers, 37(9):1043–1052, 1988.

[9] M. D. Ciletti. Advanced Digital Design with the Verilog HDL. Prentice Hall, 2003.

[10] C. E. Cummings, “The fundamentals of efficient synthesizable finite state machine design using NC-Verilog and BuildGates,” In Proc. International Cadence Usergroup Conference, 2002.

[11] K. De, C. Natarajan, D. Nair, and P. Banerjee, “RSYN: A system for automated synthesis of reliable multilevel circuits,” IEEE Trans. VLSI Systems, 2:186–195, 1994.

[12] P. G. Depledge, “Fault-tolerant computer systems,” IEE Proc. A, 128(4):257–272, 1981.

[13] S. A. Elkind and D. P. Siewiorek, “Reliability and performance of error correcting memory and register arrays,” IEEE Trans. Computers, C-29(10):920–927, 1980.

[14] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N. S. Kim, and K. Flautner, “Razor: Circuit-level correction of timing errors for low-power operation,” IEEE MICRO special issue on Top Picks From Microarchitecture Conferences of 2004, 24(6):10–20, 2004.

[15] R. Garg, P. Mathews, and D. Zacher, “Synthesis of fault tolerant circuits for FSMs and RAMs,” In Proc. MAPLD International Conference, 2009.

[16] M. Goessel, V. Ocheretny, E. Sogomonyan, and D. Marienfeld. New Methods of Concurrent Checking, Springer, 2008.

[17] S. Golson. “State machine design techniques for Verilog and VHDL,” In Synopsys User Group Conference (SNUG), 1994.

[18] S. Hauck, “Asynchronous design methodologies: An overview,” Proceedings of the IEEE, 83(1):69–93, 1995.

[19] N. K. Jha and S. J. Wang, “Design and synthesis of self-checking VLSI circuits,” IEEE Trans. Computer-Aided Design, 12:878–887, 1993.

[20] B. Liu, “Error-detecting/correcting-code based robust nanoelectronic circuits,” In Proc. NASA/ESA Conference on Adaptive Hardware and Systems, 2010.

[21] G. Mago, “Monotone functions in sequential circuits,” IEEE Trans. Computers, C-22:928–933, 1973.

[22] S. Mitra, M. Zhang, N. Seifert, T. M. Mak, and K. S. Kim, “Soft error resilient system design through error correction,” In IFIP VLSI-SoC, 2006.

[23] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, Wiley-Interscience, 2005.

[24] D. E. Muller and W. S. Bartky, “A theory of asynchronous circuits,” In Proc. International Symposium on the Theory of Switching, pp. 204–243, 1959.

[25] K. Pagiamtzis, N. Azizi, and F. N. Najm, “A soft-error tolerant content-addressable memory (CAM) using an error-correcting-match scheme,” In Proc. Custom Integrated Circuits Conf., pp. 301–304, 2006.

[26] C. B. Schlegel and M. A. Herro, “A burst-error-correcting Viterbi algorithm,” IEEE Trans. Communications, 38(3):285–291, 1990.

[27] Silicon Integration Initiative (SI2), Nangate Open Cell Library. www.si2.org/openeda.si2.org/projects/nangateli.

[28] M. Singh and S. M. Nowick, “MOUSETRAP: Ultra-high-speed transition signaling asynchronous pipelines,” 15(6):684–698, 2007.

[29] J. E. Smith and G. Metze, “Strongly fault secure logic networks,” IEEE Trans. Computers, 27(6):491–499, 1978.

[30] E. S. Sonomonyan, “Design of built-in self-checking monitoring circuits for combinational devices,” Automation and Remote Control, 35(2):280–289, 1974.

[31] F. Sun, S. Devarajan, K. Rose, and T. Zhang, “Multilevel flash memory on-chip error correction based on trellis coded modulation,” 2006.

[32] I. E. Sutherland, “Micropipelines,” Communications of the ACM, 32(6):720–738, 1989.

[33] T. Verhoeff, “Delay-insensitive codes – an overview,” Distributed Computing, 3:1–8, 1988.

[34] N. H. E. Weste and D. M. Harris. CMOS VLSI Design: A Circuits and Systems Perspective, 4th Edition, Addison-Wesley, 2011.

[35] M. Zhang, S. Mitra, et al., “Sequential element design with built-in soft error resilience,” IEEE Trans. VLSI Systems, 14(12):1368–1378, 2006.

[36] J. Zheng, S. Katanyoutanant, and M. Le, “Safe and efficient one-hot state machine,” In Proc. MAPLD International Conference, 2005.

B. Liu (BS'93, MS'96, Ph.D.'03) is an assistant professor at the University of Texas at San Antonio. He has served as co-chair for the Emerging/Innovative Process & Device Technologies and Design Issues (EDT) session in International Symposium on Quality Electronic Design (ISQED) since 2006, co-chair of the Photovoltaics Technology session in Asia Symposium on Quality Electronic Design (ASQED) since 2010, and invited session chair for Emerging Nano-Circuits and Systems in IEEE International Midwest Symposium on Circuits and Systems (MWSCAS) in 2010. Dr. Liu is the recipient of a Best Paper Award in International Conference on Computer Design in 2005, and a Best Research Award in UCSD Research Review 2002. He has published over 50 journal articles and conference papers. His research interests include VLSI variability and reliability analysis, robust, high performance and low power design, nanoelectronic architecture, and emerging technologies. Dr. Liu is a senior member of IEEE.

L. Wang (BS'09, MS'11) is currently pursuing her Ph.D. degree at the University of Texas at San Antonio. Her research interests include VLSI statistical timing analysis, delay test, and performance optimization. She is a student member of the IEEE.

F. Teshome (BS'09, MS'11) currently works with IBM at Tucson, AZ. She is a member of the IEEE.