14.3: A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications

Paul Whatmough, S. K. Lee, H. Lee, S. Rama, D. Brooks, G.-Y. Wei
Harvard University, Cambridge, MA

© 2017 IEEE International Solid-State Circuits Conference


Motivation

•  Deep Neural Networks (DNNs)
  – A universal classifier with many diverse applications
  – Interpret noisy real-world data from sensor-rich systems
  – Large memory footprint and compute load
  – Perform DNN inference on device in micro-Joules


Motivation

•  DNN ENGINE
  – A Shared Programmable Classifier
  – Specialized Architecture and Efficient Digital Circuits
  – Parallelism: Balanced memory bandwidth and throughput
  – Sparsity: HW support for sparse data and small datatypes
  – Resilience: Razor adaptation to minimize PVT margins

[Figure: DNN ENGINE as a classifier. Labels: Sensor Data, Inputs, Input Vector, Hidden Layers, Outputs, Argmax, Output Classes, Classification; programmed with +DNN Topology and +Weights; legend: Weight, Neuron.]

Outline

•  Motivation
•  DNN Engine
  – Parallelism
  – Sparsity
  – Resilience
•  Measurement Results
•  Summary

Fully-Connected DNN Graph

[Figure: animated fully-connected DNN graph from the Input Vector (Input Layer) through the Hidden Layers to the Output Classes (Output Layer); successive builds are labeled Neuron, Weight, SIMD Window and Weights.]


Balancing Efficiency and Bandwidth

•  Parallelism increases throughput and data reuse
  – But also increases Memory Bandwidth demands
•  8-Way SIMD is efficient at reasonable memory BW
  – 10x Activation reuse, with 128b AXI channel

[Figure: Weight Reads / Cycle and Data Reads (Norm.) versus SIMD Width (Words) for widths 1, 4, 8, 16 and 32. Annotations: "No Data Reuse", "Bandwidth too high", "128b/cycle".]
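The trade-off on this slide can be approximated with a first-order model. The sketch below assumes that an N-way SIMD datapath computes N neurons in parallel, so each fetched activation is shared by N MACs while N weights are fetched every cycle. The 16-bit weight width and the function name `simd_tradeoff` are assumptions made for illustration. The model shows the trend only and is not expected to reproduce the 10x reuse figure quoted above.

```python
def simd_tradeoff(width, weight_bits=16):
    """First-order model of an N-way SIMD MAC array (illustrative assumptions only)."""
    if width < 1:
        raise ValueError("SIMD width must be at least 1")
    return {
        # Assumption: one weight word is consumed per lane per cycle.
        "weight_reads_per_cycle": width,
        # Weight memory bandwidth demand in bits per cycle.
        "weight_bits_per_cycle": width * weight_bits,
        # Activation reads relative to a 1-wide datapath (each read shared by all lanes).
        "data_reads_norm": 1.0 / width,
    }


for w in (1, 4, 8, 16, 32):
    m = simd_tradeoff(w)
    print(f"width={w:2d}  weight bits/cycle={m['weight_bits_per_cycle']:4d}  "
          f"data reads (norm.)={m['data_reads_norm']:.3f}")
```

Under the 16-bit assumption, a width of 8 needs 128 bits of weights per cycle, which is consistent with the 128b/cycle annotation on the plot.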

DNN ENGINE Micro-Architecture

[Figure: DNN ENGINE block diagram. Blocks: Host Processor, CFG, DMA, Data In, Data Out, IPBUF, XBUF, NBUF, W-MEM, Weight Load, Data Read, MAC Datapath, Activation, Writeback.]

•  Host Processor loads configuration and input data
•  Accumulate products of Activation and Weights
•  Add bias, apply ReLU activation and writeback
•  IRQ to host, which retrieves prediction data
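The per-layer steps listed above (multiply-accumulate, bias, ReLU, writeback) followed by the argmax from the Motivation figure can be sketched in plain Python. This is a functional model only, not the hardware datapath: the function names are hypothetical, values are unquantized integers, and leaving the output layer without ReLU is an assumption.

```python
def relu(x):
    """Rectified linear unit: negative values are clamped to zero."""
    return x if x > 0 else 0


def fc_layer(activations, weights, biases, apply_relu=True):
    """One fully-connected layer; weights[j][i] connects input i to neuron j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        acc = 0
        for x, w in zip(activations, w_row):
            acc += x * w              # accumulate products of activation and weight
        acc += b                      # add bias
        outputs.append(relu(acc) if apply_relu else acc)  # activation, then "writeback"
    return outputs


def predict(x, layers):
    """Run all layers, then return the index of the largest output (argmax)."""
    for idx, (weights, biases) in enumerate(layers):
        is_last = idx == len(layers) - 1
        x = fc_layer(x, weights, biases, apply_relu=not is_last)  # assumption: linear output layer
    return max(range(len(x)), key=lambda k: x[k])


# Tiny made-up network: 2 inputs, 2 hidden neurons, 2 output classes.
layers = [
    ([[1, -1], [-1, 1]], [0, 0]),
    ([[1, 0], [0, 1]], [0, 0]),
]
print(predict([3, 1], layers), predict([1, 3], layers))
```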


Exploiting Sparse Data in DNNs

[Figure: animated fully-connected DNN graph from Input Vector to Output Classes, illustrating sparse data.]

DNN ENGINE Micro-Architecture

[Figure: DNN ENGINE block diagram with Index and SKIP blocks added. Blocks: Host Processor, CFG, DMA, Index, Data In, Data Out, IPBUF, XBUF, NBUF, W-MEM, Weight Load, Data Read, SKIP, MAC Datapath, Activation, Writeback.]
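The slides name the Index and SKIP blocks but do not describe how they work. One common way to exploit activation sparsity after ReLU, shown here purely as an assumed illustration and not as the authors' design, is to keep only the non-zero activations together with their indices and to skip the multiply-accumulates for the zeros. All function names are hypothetical.

```python
def compress(activations):
    """Keep only non-zero activations as (index, value) pairs."""
    return [(i, x) for i, x in enumerate(activations) if x != 0]


def sparse_neuron(pairs, w_row, bias):
    """Accumulate only over stored pairs; the index selects the matching weight."""
    acc = bias
    for i, x in pairs:
        acc += x * w_row[i]
    return acc


def dense_neuron(activations, w_row, bias):
    """Reference computation that visits every activation, zero or not."""
    return bias + sum(x * w for x, w in zip(activations, w_row))


# Made-up example: most activations are zero, as is typical after ReLU.
acts = [0, 3, 0, 0, 2, 0, 0, 1]
w_row = [5, -1, 4, 2, 3, 7, -6, 2]
pairs = compress(acts)
print(len(pairs), "of", len(acts), "MACs executed")
print(sparse_neuron(pairs, w_row, 1) == dense_neuron(acts, w_row, 1))
```

The sparse and dense paths give the same sum, while the sparse path performs one MAC per non-zero activation only.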


Razor Hardware Accelerators

[Figure: clock period (CLK) broken into timing margins: voltage, process, temp, coupling, jitter, ageing, safety.]

•  Occasionally allow real critical paths to fail timing
  – Explicitly check for late-arriving signals
  – Adapt Vdd/Fclk to track near-zero error-rate
•  Must ensure timing errors do not affect system
  – Roll-back and replay [Das et al., CICC'13]
  – Algorithmic error tolerance [Whatmough et al., ISSCC'13]
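The idea of adapting Vdd to track a near-zero error rate can be sketched as a simple bang-bang control loop. Everything in this sketch is hypothetical: the function names, the voltage range, the step size, the target rate and the toy error model are invented for illustration and are not measurements from the chip.

```python
def adapt_vdd_mv(measure_error_rate, vdd_mv=900, vdd_min_mv=500, vdd_max_mv=1000,
                 step_mv=10, target_rate=1e-4, iterations=100):
    """Lower Vdd while the observed timing-error rate stays at or below the target,
    and raise it again as soon as the rate exceeds the target."""
    for _ in range(iterations):
        rate = measure_error_rate(vdd_mv)
        if rate > target_rate:
            vdd_mv = min(vdd_max_mv, vdd_mv + step_mv)   # too many late signals: back off
        else:
            vdd_mv = max(vdd_min_mv, vdd_mv - step_mv)   # margin left: scale down
    return vdd_mv


def toy_error_rate(vdd_mv, first_failure_mv=720):
    """Toy model (assumption): no errors at or above 720 mV, rising linearly below it."""
    return 0.0 if vdd_mv >= first_failure_mv else (first_failure_mv - vdd_mv) * 1e-3


print(adapt_vdd_mv(toy_error_rate))
```

With this toy model the loop settles into a small oscillation around the first failing voltage, which is the behavior the bullet describes: the supply tracks the point where errors just start to appear instead of sitting at a fixed worst-case margin.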

DNN Engine Micro-Architecture

[Figure: DNN ENGINE block diagram with Razor (RZ) stages marked. Blocks: Host Processor, CFG, DMA, Index, Data In, Data Out, IPBUF, XBUF, NBUF, W-MEM, Weight Load, Data Read, SKIP, MAC Datapath, Activation, Writeback.]

•  Two Razor stages: W Memory and MAC Datapath

Memory: Algorithm-Level Tolerance

•  Weights are zero-mean Gaussians (regularization)
  – Small magnitude, random sign (worst-case in TC)
•  Sign-Magnitude (SM) reduces switching in weights
  – Lowers switched cap (power) and bit error rate

Two's Complement (TC)    -1: 11111111    +1: 00000001
Sign-Magnitude (SM)      -1: 10000001    +1: 00000001

[Figure: histogram of weight values, Count (a.u.) versus Weight Value from -0.3 to 0.3.]
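The switching argument can be checked by counting how many bits toggle between consecutive weights in each encoding. The sketch below uses 8-bit words, as in the -1/+1 examples above; the weight sequence and function names are made up for illustration.

```python
def twos_complement(value, bits=8):
    """Two's complement code of a signed integer as an unsigned bit pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range")
    return value & ((1 << bits) - 1)


def sign_magnitude(value, bits=8):
    """Sign-magnitude code: MSB is the sign, remaining bits hold |value|."""
    if abs(value) >= (1 << (bits - 1)):
        raise ValueError("value out of range")
    sign = (1 << (bits - 1)) if value < 0 else 0
    return sign | abs(value)


def toggles(values, encode, bits=8):
    """Total number of bit flips when the values are read back to back."""
    codes = [encode(v, bits) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


# Small-magnitude weights with random sign (made-up sequence).
weights = [1, -1, 2, -2, 1, -3, 0, -1]
print("TC -1:", format(twos_complement(-1), "08b"), " SM -1:", format(sign_magnitude(-1), "08b"))
print("TC toggles:", toggles(weights, twos_complement))
print("SM toggles:", toggles(weights, sign_magnitude))
```

For small values that change sign often, two's complement flips nearly every bit on each sign change, while sign-magnitude flips the sign bit and a few low-order bits.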

Page 52: A 28nm SoC with a 1.2GHz 568nJ/ Prediction Sparse Deep ...vlsiarch.eecs.harvard.edu/wp-content/uploads/2017/02/whatmough... · – Specialized$Architecture$and$Efficient$Digital$Circuits$

14.3: A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine

with >0.1 Timing Error Rate Tolerance for IoT Applications © 2017 IEEE International Solid-State Circuits Conference 52 of 86

RZ

RZ

X

W

Y

Datapath: Circuit-Level Tolerance

•  Algorithmic  error  tolerance  in  datapath  is  poor  –  Errors  are  persistent  in  accumulator  

•  Circuit-­‐level  error  tolerance  using  5me  borrowing  –  Single-­‐sided  cri5cal  path  at  accumulator  register  

Page 53: A 28nm SoC with a 1.2GHz 568nJ/ Prediction Sparse Deep ...vlsiarch.eecs.harvard.edu/wp-content/uploads/2017/02/whatmough... · – Specialized$Architecture$and$Efficient$Digital$Circuits$

14.3: A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine

with >0.1 Timing Error Rate Tolerance for IoT Applications © 2017 IEEE International Solid-State Circuits Conference 53 of 86

RZ

RZ

X

W

Y

Datapath: Circuit-Level Tolerance

•  Algorithmic  error  tolerance  in  datapath  is  poor  –  Errors  are  persistent  in  accumulator  

•  Circuit-­‐level  error  tolerance  using  5me  borrowing  –  Single-­‐sided  cri5cal  path  at  accumulator  register  

Cri5cal  Path  

Page 54: A 28nm SoC with a 1.2GHz 568nJ/ Prediction Sparse Deep ...vlsiarch.eecs.harvard.edu/wp-content/uploads/2017/02/whatmough... · – Specialized$Architecture$and$Efficient$Digital$Circuits$

14.3: A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine

with >0.1 Timing Error Rate Tolerance for IoT Applications © 2017 IEEE International Solid-State Circuits Conference 54 of 86

Datapath: Circuit-Level Tolerance

• Algorithmic error tolerance in datapath is poor
  – Errors are persistent in accumulator

• Circuit-level error tolerance using time borrowing
  – Single-sided critical path at accumulator register

[Figure: datapath diagram with inputs X and W, output Y, and blocks labeled RZ; annotations mark the critical path and the slack]
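To illustrate the two bullets on this slide, here is a minimal Python sketch. It is not the chip's circuit: the function names, delays, and borrow window are all hypothetical. The first function shows how one corrupted accumulator value carries through every later accumulation step. The second checks a simplified time-borrowing condition, in which a late arrival at the accumulator register is absorbed by slack in the path that follows it.

```python
def accumulate(products, flip_step=None, bit=0):
    """Sum products; optionally flip one accumulator bit after a given step."""
    acc = 0
    for step, p in enumerate(products):
        acc += p
        if step == flip_step:
            acc ^= 1 << bit  # hypothetical model of one timing error
    return acc


def borrow_ok(path_ns, next_path_ns, period_ns, window_ns):
    """Simplified time-borrowing check (assumed model, not the silicon).

    The late path may overrun the clock edge by at most window_ns, and the
    following path must still finish within what is left of its own cycle.
    """
    overrun = max(0.0, path_ns - period_ns)
    if overrun > window_ns:
        return False
    return overrun + next_path_ns <= period_ns


products = [3, 5, 7, 11, 13]           # illustrative values only
clean = accumulate(products)
faulty = accumulate(products, flip_step=1, bit=4)
error = abs(faulty - clean)            # the flipped bit never washes out
print(clean, faulty, error)

# 1.5 ns is roughly one cycle at 667MHz; the other delays are made up.
print(borrow_ok(1.6, 1.0, 1.5, 0.3))   # overrun absorbed by downstream slack
print(borrow_ok(1.6, 1.45, 1.5, 0.3))  # downstream path has too little slack
```

The final error equals the weight of the flipped bit however many products follow, which is the sense in which accumulator errors persist.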

Outline

• Motivation

• DNN Engine
  – Parallelism
  – Sparsity
  – Resilience

• Measurement Results

• Summary

28nm SoC for IoT Applications

[Figure: block diagram of the 28nm SoC test chip. Cortex-M0 Subsystem labels: ARM Cortex-M0, I-MEM 64KB, D-MEM 64KB, 32b AHB, Bridge, 32b APB, Low-BW Peripherals, SYSCTL, GPIO, Timers, BIST, Wdog, UARTs, RTC. Accelerator Subsystem labels: DNN Engine, W-MEM 1MB, 128b AXI. Other labels: USB, Scan, DCO, OSC, ASYNC, FCLK, HCLK. Supply legend: VACC – Accelerator Logic, VMEMP – SRAM Periphery, VMEMC – SRAM Core; also VDCO and VSOC. Bus ports are marked M (master) and S (slave).]

Razor Voltage Scaling

• FC-DNN 784-256-256-256-10 (98.36% MNIST)

[Figure: timing violation rate and power (mW) vs. VDD scaling at FCLK = 667MHz; sign-off point marked; series labels TC, SM]
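As a worked example of the 784-256-256-256-10 topology named on this slide, the short Python sketch below counts the weights in a dense fully connected network of that shape. It assumes one multiply-accumulate per weight, ignores biases, and assumes 8-bit weight storage for the footprint estimate (an "8b" configuration is labeled on the energy summary slide, but the storage format used here is an assumption). The 1MB figure is the W-MEM size shown on the SoC block diagram.

```python
# Layer widths as given on the slide: input, three hidden layers, output.
layers = [784, 256, 256, 256, 10]

# Dense layer i -> i+1 needs layers[i] * layers[i+1] weights (biases ignored).
weights_per_layer = [a * b for a, b in zip(layers, layers[1:])]
total_weights = sum(weights_per_layer)

# Assumption: 8-bit weights, so one byte per weight, with no sparsity encoding.
bytes_at_8b = total_weights
w_mem_bytes = 1 * 1024 * 1024  # W-MEM 1MB from the block diagram
fits_in_w_mem = bytes_at_8b <= w_mem_bytes

print(weights_per_layer)   # per-layer weight counts
print(total_weights)       # also the dense MAC count per prediction
print(fits_in_w_mem)
```

Under these assumptions the dense model has 334,336 weights, about a third of the 1MB weight memory.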

Razor Voltage Scaling

• Operation at PoFF (770mV) lowers power by 30%

[Figure: timing violation rate and power (mW) vs. VDD scaling at FCLK = 667MHz; sign-off and PoFF points marked, with a 30% power reduction between them; series labels TC, SM]
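A rough sanity check on the 30% figure, as a minimal Python sketch: it uses the textbook first-order model in which dynamic power scales with V² × f. That model is an assumption, not something stated on the slide, and it ignores leakage. The 0.9V sign-off voltage is taken from the operating points listed on the energy/throughput slides; the function name is hypothetical.

```python
def dynamic_power_ratio(v_new, v_ref, f_new=1.0, f_ref=1.0):
    """First-order dynamic power ratio, P ~ C * V^2 * f (leakage ignored)."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# Sign-off at 0.9V, PoFF at 0.77V, same 667MHz clock at both points.
ratio = dynamic_power_ratio(0.77, 0.90)
reduction_pct = (1.0 - ratio) * 100.0
print(f"{reduction_pct:.1f}% lower dynamic power")
```

The model gives a reduction of roughly 27%, in the same range as the measured 30%. The slide does not break down where the difference comes from.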

Razor Voltage Scaling

• Margin for fast variations down to 715mV

[Figure: timing violation rate and power (mW) vs. VDD scaling at FCLK = 667MHz; sign-off, PoFF, and zero-margin points marked, with the margin region and the 30% power reduction annotated; series labels TC, SM]

Timing Error Tolerance Summary

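• Aggregate timing violation rate greater than 10^-1

[Figure: tolerated timing violation rate (log axis with ticks at 10^-7, 10^-5, 10^-3, 10^-1) for MNIST @ 98.36% accuracy; bars labeled TC, SM, BM, TC, SM, TB, TC, ALL across the Memory, Datapath, and Combined groups; annotated "> 1,000,000X"]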
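The "> 1,000,000X" annotation is consistent with the span of the axis ticks, as this small Python sketch shows. Treating the lowest tick (10^-7) as the baseline tolerated rate and the highest (10^-1) as the final one is an assumption read off the figure labels, not a statement from the slide.

```python
# Exponents of the lowest and highest axis ticks (assumed endpoints).
baseline_exp = -7
final_exp = -1

orders_of_magnitude = final_exp - baseline_exp
improvement_factor = 10 ** orders_of_magnitude  # integer arithmetic, exact

print(orders_of_magnitude, improvement_factor)
```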

Energy/Throughput Summary
MNIST @ 98.36% Accuracy

[Figure: normalized throughput (Pred/Sec, axis 1.0 to 8.0) and normalized energy (Energy/Pred, axis 0.2 to 1.0). Bar labels as extracted: TC, SM, 8b, AVS, SKIP, AVS, AFS, SKIP, AFS. Operating points: 0.9V 667MHz (sign-off), 0.77V 667MHz (PoFF), 0.715V 667MHz, 0.9V 1GHz (PoFF), 0.9V 1.2GHz. Annotations: -30%, -4X, +4X, +50%, +20%, 260nJ, 570nJ.]
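The +50% and +20% annotations match the clock-frequency steps between the listed operating points, if throughput is assumed to scale linearly with clock frequency. The Python sketch below checks that arithmetic. Both the linear-scaling assumption and the pairing of each annotation with a particular frequency step are readings of the figure labels, and the function name is hypothetical.

```python
def pct_gain(f_new_mhz, f_ref_mhz):
    """Percentage throughput gain if throughput is proportional to clock."""
    return (f_new_mhz / f_ref_mhz - 1.0) * 100.0


gain_667_to_1000 = pct_gain(1000, 667)    # 667MHz -> 1GHz at 0.9V
gain_1000_to_1200 = pct_gain(1200, 1000)  # 1GHz -> 1.2GHz at 0.9V

print(round(gain_667_to_1000), round(gain_1000_to_1200))
```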

Summary and Conclusion

• DNN Engine
  – A Shared Programmable Classifier
  – Parallelism: 10X Data Reuse @ 128b/cycle BW
  – Sparsity: +4X Throughput, -4X Energy
  – Resilience: +50% Throughput or -30% Energy w/Razor

• Energy Efficient DNN
  – 568 nJ/pred @ 1.2 GHz

• Timing Error Tolerance
  – Pe > 10^-1 @ 98.36% (MNIST)
