Neuromorphic Computing based Processors - ISQED · 2/12/2015 3 5 Brain‐like Neuromorphic Circuits...

2/12/2015

Hao Jiang

A collaboration among San Francisco State University, EI-Lab at the University of Pittsburgh, HP Labs, and AFRL

Neuromorphic Computing based Processors

Outline

• Why Neuromorphic Computing?

• Challenges and New Opportunities

• Spiking Neuromorphic Design

• A Framework of Heterogeneous Computing Systems

• Conclusion


Why Neuromorphic Computing?

• The von Neumann architecture is facing severe challenges
– The von Neumann bottleneck
– Inefficiency in cognitive computations

• Human brain: high efficiency
– 100 TFLOPS vs. 20 W
– Highly connected: 50B neurons and 10^14 synapses
– Very light: 3 lbs

[Figure: head; computation & control]

Neuromorphic Design by Leveraging Memristor Technology

Brain – The Most Efficient Computing Machine

• Brain: 15–30B neurons; extremely complex (4 km/mm^3)
• Neocortex: 6 layers; signals travel within and between layers
• Neuron: processes signals from other neurons
• Synapse: memory; weights signals

[Figure: gray matter and white matter; neural network]


Brain-like Neuromorphic Circuits

• Slow progress in neuromorphic hardware implementation
– Lack of efficient synapse design
– Poor support for massive connectivity

• Target properties: highly parallel, ultra power efficient, flexible, extremely robust
• Real-world input, human-friendly output, data friendly


Challenges in the Traditional Approach

Developing and implementing neural network models on large-scale computer clusters or supercomputers:

• Performance challenge: 100M MIPS
• Energy challenge: 10–20 megawatts, comparable to 11,000 U.S. households

Traditional Analog Approach

Weight matrix W: $Y = f(W X)$

• Implementation
– Weight: floating gates, capacitors, etc.
– Compute: op-amps, analog voltage multipliers and differentiators

• Difficulties: intrinsic, hard to overcome
– Weight: volatile data, low precision, control signals
– Compute: voltage offset, noise generation, voltage saturation

• Scaling: successful in small-scale systems, but design complexity, power, and area grow very fast
– Weight: O(N^2) for the weight carrier
– Compute: O(N^2) for the voltage multipliers
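The model $Y = f(W X)$ is a matrix-vector product followed by an elementwise activation. The C sketch below makes the O(N^2) cost explicit: every weight is stored once and used in exactly one multiplication, which is what the analog weight carriers and voltage multipliers must implement in hardware. The function name layer_forward and the saturating choice of f are illustrative assumptions; the slide does not specify f.

```c
#include <stddef.h>

/* Assumed activation f: saturate to [-1, 1] (f is unspecified on the slide). */
static double f_sat(double x) {
    if (x > 1.0) return 1.0;
    if (x < -1.0) return -1.0;
    return x;
}

/* Y = f(W X) for an n_out x n_in row-major weight matrix W.
   The double loop touches each weight once: n_out * n_in multiplications,
   i.e. O(N^2) for an N x N layer. */
void layer_forward(const double *W, const double *X, double *Y,
                   size_t n_out, size_t n_in) {
    for (size_t j = 0; j < n_out; ++j) {
        double acc = 0.0;                 /* weighted sum for output j */
        for (size_t i = 0; i < n_in; ++i)
            acc += W[j * n_in + i] * X[i];
        Y[j] = f_sat(acc);
    }
}
```

In an analog realization each product in the inner loop needs its own multiplier, so hardware cost grows with the weight count rather than the neuron count.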


Latest Progress

• Numenta HTM
• Micron Automata
• IBM TrueNorth

Memristor – Rebirth of the Analog Approach

Memristor: natural weight carriers
• Non-volatility, high density
• Analog resistance states
• Two-terminal programming

Memristor crossbar
• Natural weight summation
• MIMO: avoids sneak paths
• Cost ~O(N), not O(N^2)

$M = R_L \alpha + R_H (1 - \alpha)$

$I = \frac{V_{M_1}}{M_1} + \frac{V_{M_2}}{M_2} + \cdots + \frac{V_{M_n}}{M_n}$
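Both equations can be evaluated directly. This is a minimal C sketch under ideal assumptions (no wire resistance, no sneak-path current); the function names memristance and column_current are hypothetical.

```c
#include <stddef.h>

/* Memristance for internal state alpha in [0, 1]:
   M = R_L * alpha + R_H * (1 - alpha).
   alpha = 1 gives the low-resistance state R_L, alpha = 0 gives R_H. */
double memristance(double r_low, double r_high, double alpha) {
    return r_low * alpha + r_high * (1.0 - alpha);
}

/* Current collected on one crossbar column: I = sum_k V_k / M_k.
   Kirchhoff's current law performs the weighted summation; parasitics
   are ignored in this sketch. */
double column_current(const double *v, const double *m, size_t n) {
    double i_total = 0.0;
    for (size_t k = 0; k < n; ++k)
        i_total += v[k] / m[k];
    return i_total;
}
```

For example, memristance(1e3, 1e5, 0.5) gives 50.5 kΩ, and inputs of 1 V and 0.5 V across 1 kΩ and 500 Ω devices sum to about 2 mA on the column.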

[Plot: device characteristics vs. pulse number (0–70 pulses)]

Memristor – Rebirth of Neuromorphic Circuits

• Two-terminal, high density
• Non-volatility
• Analog/multi-level states

• Natural matrix function
• A MIMO system
• Good combination with memristors

Memristor ↔ Synapse; Crossbar ↔ Network

[Figure panels: TaN1+x device (HP Lab, 2012); EI Lab, DAC'12 (VO); crossbar with neurons N1 … Nn (EI Lab, APL'13); EI Lab & HP Lab TiN-TaOx device, programmed with pulses that grow linearly in amplitude]


Spiking‐based Neuromorphic Computing

• Why spiking?

– Inspired by human brains

– Minimized transition electrical charge

– Reduced data communication distance

– High parallelism in processing

• Approaches in hardware systems

– Analog and digital circuit blocks, capacitors

– Crossbar arrays based on SRAM, PCM, or memristor cells

– In this work: a memristor-based crossbar array as the synapses

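Spiking Neuromorphic System

[Figure labels: integrate and fire; 100 counters]

• Spiking-based computation system: closer to biological systems, power efficient, highly reliable

• Matrix computation transformation: the mathematical matrix computation is mapped onto a memristor-based crossbar array, where input voltages V_i drive the rows and each column collects the current

$$I_j = \sum_{i=1}^{M} g_{ij} V_i, \qquad [I_1\ I_2\ \cdots\ I_n] = [V_1\ V_2\ \cdots\ V_M] \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{M1} & g_{M2} & \cdots & g_{Mn} \end{bmatrix}$$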


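Computation Methodology

Traditional integrate-and-fire model with rate coding

[Figure labels: Cm, Vth, mi = 2]

Ideal: $n_j \propto \sum_{i=1}^{N} g_{ij} m_i$, where m_i is the number of input spikes on row i and n_j the number of output spikes on column j.

Working mechanism:
• Vm < Vth: Cm integrates
  Vm ≥ Vth: a spike is fired out, then Cm is reset to 0
• Spike occurs: weighted current flows to the integrator
  No spike: no current to or from the integrator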
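The working mechanism can be sketched as a discrete-time simulation of one column. This is an illustrative model, not the H-IFC netlist: the name ifc_spike_count, the schedule in which all rows with remaining spikes fire together in each time slot, and the fixed sub-step dt are hypothetical assumptions.

```c
#include <stddef.h>

/* Rate-coded integrate-and-fire for one crossbar column j (simplified).
   Row i carries m[i] input spikes of width t_pulse. In each slot, every row
   that still has spikes injects v_in * g_col[i] into Cm; rows without a
   spike inject nothing. */
int ifc_spike_count(const double *g_col, const int *m, size_t n_rows,
                    double v_in, double t_pulse, double dt,
                    double cm, double vth) {
    int max_m = 0;
    for (size_t i = 0; i < n_rows; ++i)
        if (m[i] > max_m) max_m = m[i];

    int steps = (int)(t_pulse / dt + 0.5);  /* sub-steps per pulse */
    double vm = 0.0;                        /* capacitor voltage Vm */
    int n_out = 0;                          /* output spike count n_j */
    for (int slot = 0; slot < max_m; ++slot) {
        double i_in = 0.0;                  /* weighted current this slot */
        for (size_t i = 0; i < n_rows; ++i)
            if (m[i] > slot) i_in += v_in * g_col[i];
        for (int s = 0; s < steps; ++s) {
            vm += i_in * dt / cm;           /* Vm < Vth: Cm integrates */
            if (vm >= vth) {                /* Vm >= Vth: fire a spike */
                ++n_out;
                vm = 0.0;                   /* then reset Cm to 0 */
            }
        }
    }
    return n_out;
}
```

With unit-valued parameters, g = {1, 2} and m = {2, 1} deliver $\sum_i g_{ij} m_i = 4$ units of charge but yield 3 output spikes: in this model, charge above Vth at each firing is discarded by the reset, so the count can fall below the ideal.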

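Memristor Crossbar Array Structure

[Figure: memristor and selector characteristics vs. voltage (V), -3 to 3]

• 1S1M memristor-based cell
– Alleviates the impact of sneak-path leakage
– A thin-film-based selector after each memristor
– Minimal unit cell area of 4F^2

• Selector property and operation
– Spike occurs: selector ON, and R_s_on << R_M
– No spike: selector OFF, and R_s_off >> R_M

With the selector ON (g_s >> g_ij), the series conductance of the cell approximates that of the memristor:

$\tilde{g}_{ij} = \frac{g_{ij}\, g_s}{g_{ij} + g_s} \approx g_{ij}$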
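The 1S1M approximation can be checked numerically. The hypothetical helper below computes the series combination of the selector and memristor conductances; the conductance values in the usage note are illustrative assumptions.

```c
/* Effective conductance of a selector (g_s) in series with a memristor (g_ij):
   g_eff = g_ij * g_s / (g_ij + g_s).
   Selector ON  (g_s >> g_ij): g_eff is close to g_ij.
   Selector OFF (g_s << g_ij): g_eff is close to g_s, i.e. nearly zero. */
double series_conductance(double g_ij, double g_s) {
    return g_ij * g_s / (g_ij + g_s);
}
```

With g_ij = 100 μS and an ON selector of 10 mS, g_eff is about 99 μS, within 1% of g_ij; with an OFF selector of 10 nS, g_eff drops below 10 nS, so unselected rows contribute almost no current.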


High-speed Integrate and Fire Circuit (H-IFC)

[Figure: integrate-and-fire circuit with capacitor voltage VCm, comparator, and output Vout; VREF = 33% of Vin]

• Structure and properties
– Integrating capacitor, reset transistor, comparator with positive feedback
– High speed
– Vth is much smaller than Vdd

• Power and area
– ~100 μW
– 28 μm × 12 μm (180 nm technology)


System Verification

[Figure: n × n crossbar (g11 … gnn) with an I&F circuit (Cm, fire circuit) on each column; output pulse number vs. sensing capacitance, pulse duration, input voltage, comparator triggering voltage, memristor conductance, and selected row]


Theoretical Computation vs. Simulation Results

[Plot: ideal linear curve vs. simulation result over $\sum_{i=1}^{N} m_i g_{ij}$ from 0 to 3.5 × 10^-4]

Parameters: Vth = 0.5 V, Cm = 50 fF, Vdd = 1.8 V

• Real: nonlinearity
• Dependent on the sum of weighted signals

Reasons:
• Reset time
• IFC delay

Optimization:
• Larger Cm
• Higher-speed IFC

The circuit can still be used in neural networks.
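The effect of reset time and IFC delay can be estimated analytically. For a constant input current I, charging Cm to Vth takes Cm·Vth/I; if every firing also costs a fixed dead time, the spike count over a window T becomes floor(T / (Cm·Vth/I + t_dead)), which is linear in I only when t_dead = 0. The sketch uses Cm = 50 fF and Vth = 0.5 V from the slide; the function name, the lumped t_dead, and the example currents are assumptions.

```c
/* Spike count for constant input current i_in over a window t_window,
   assuming each firing adds a fixed dead time t_dead (reset time plus
   comparator delay, lumped together). t_dead = 0 gives the ideal,
   linear count t_window * i_in / (cm * vth). */
int spikes_with_dead_time(double i_in, double t_window, double cm,
                          double vth, double t_dead) {
    double t_charge = cm * vth / i_in;   /* time to charge Cm to Vth */
    return (int)(t_window / (t_charge + t_dead));
}
```

Over 260 ns with a 5 ns dead time, doubling the current from 1 μA to 2 μA raises the count from 8 to 14 rather than 16. A larger Cm lengthens t_charge relative to t_dead and so shrinks the relative error, at the cost of fewer spikes per window.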

Adaptability in Neural Network

[Figure: output spike number (nj) vs. time (0–250 ns), panels (a), (b), (c)]

Good adaptability in neural networks


Our Approach

• A framework of heterogeneous computing systems enhanced with neuromorphic computing accelerators (NCAs).

• Purpose: to combine the flexibility of conventional architecture in logic and scientific computation with the efficiency of neuromorphic architecture for ANN applications.


Frontend: Prepare Data & Instructions

Instruction   Type           Description
setp reg      Configuration  Send the routing information stored in register reg to the central router
movd #(reg)   I/O            Load the data from memory to the NCA
launch        Configuration  Notify the central router to start transmitting
deq reg       I/O            Dequeue the head data of the Out-queue and write it to register reg
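The deq instruction pops the head of an NCA's Out-queue into a register. A minimal C model of that FIFO behavior follows, assuming a fixed depth and float data; OutQueue, OUTQ_CAP, outq_enq, and outq_deq are hypothetical names, not part of the described ISA or hardware.

```c
#include <stdbool.h>
#include <stddef.h>

#define OUTQ_CAP 16  /* hypothetical Out-queue depth */

/* Fixed-size ring buffer standing in for an NCA Out-queue. */
typedef struct {
    float data[OUTQ_CAP];
    size_t head;   /* index of the oldest element */
    size_t count;  /* number of valid elements */
} OutQueue;

/* Models the NCA pushing a result; fails when the queue is full. */
bool outq_enq(OutQueue *q, float v) {
    if (q->count == OUTQ_CAP) return false;
    q->data[(q->head + q->count) % OUTQ_CAP] = v;
    ++q->count;
    return true;
}

/* Models "deq reg": remove the head element and write it to *reg. */
bool outq_deq(OutQueue *q, float *reg) {
    if (q->count == 0) return false;
    *reg = q->data[q->head];
    q->head = (q->head + 1) % OUTQ_CAP;
    --q->count;
    return true;
}
```

Returning false on an empty queue is a simplification; real hardware might stall the pipeline until data arrives.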

bool RecallBSB(float *vec, float *wm)
{
    /* simulate the synapse network: wx = W * vec */
    for (i = 0; i < BsbSize; ++i)
        for (j = 0; j < BsbSize; ++j)
            wx[i] += wm[i*BsbSize + j] * vec[j];
    ……
    /* activation function */
    for (i = 0; i < BsbSize; ++i)
        wx[i] = ALPHA*wx[i] + LAMDA*vec[i];
    ……
    /* check convergence: every element must sit at +1 or -1 */
    for (i = 0; i < BsbSize; ++i)
        if (fabsf(vec[i]) != 1.0f) return false;
    return true;
}

bool RecallBSB(float *vec)
{
    /* inputs to NCA */
    Send(NCA.id, vec);
    /* outputs from NCA */
    return Receive(NCA.id);
}

; send each input from register to input
; buffer associated with specific NCA
LW R1, $(vec)
MOVD NCA.id, R1
……
; launch the NCA
SET NCA.id, #VAL LAUNCH
; put the output from output buffer of
; NCA to register, here is only one output
DEQ R1, NCA.id
RET

Steps shown: source-to-source translation, training, NCA-aware compilation
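For reference, a self-contained version of the BSB recall loop can be written without the elided parts, assuming the common Brain-State-in-a-Box update x ← S(α W x + λ x), where S clips each element to [-1, 1] (the clipping step is an assumption; it is not visible in the listing). BSB_N, bsb_sat, bsb_recall, and max_iter are hypothetical choices for this sketch, not the authors' code.

```c
#include <stdbool.h>
#include <stddef.h>

#define BSB_N 4  /* hypothetical vector length */

/* Assumed "box" nonlinearity: clip to [-1, 1]. */
static float bsb_sat(float x) {
    return x > 1.0f ? 1.0f : (x < -1.0f ? -1.0f : x);
}

/* Iterates x <- S(alpha * W x + lambda * x) until every element sits on a
   corner of the box (|x_i| == 1) or max_iter iterations have run. */
bool bsb_recall(float *x, const float *w, float alpha, float lambda,
                int max_iter) {
    float wx[BSB_N];
    for (int it = 0; it < max_iter; ++it) {
        for (size_t i = 0; i < BSB_N; ++i) {      /* wx = W x (old x) */
            wx[i] = 0.0f;
            for (size_t j = 0; j < BSB_N; ++j)
                wx[i] += w[i * BSB_N + j] * x[j];
        }
        bool converged = true;
        for (size_t i = 0; i < BSB_N; ++i) {      /* synchronous update */
            x[i] = bsb_sat(alpha * wx[i] + lambda * x[i]);
            if (x[i] != 1.0f && x[i] != -1.0f) converged = false;
        }
        if (converged) return true;
    }
    return false;
}
```

The inner matrix-vector product is the part the NCA version offloads to the crossbar; only the send, launch, and dequeue steps remain on the host.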

Backend: System Design

[Figure: general-purpose processor (conventional processing: RF, pipeline, $) linked through bridges to neuromorphic computing accelerators (NCAs), each with an arbiter, I/O configuration, and buffers]

• Tightly coupled design

• Invoked by special instructions


NCA Architecture

• A hierarchical structure of MBC (memristor-based crossbar) arrays

• Mixed-signal NoC
– Analog computation
– Digital control and routing signal transition
– MBC arrays connected in a metamorphous centralized mesh (MCMesh) manner

[Figure labels: SUM AMP, group router, central router]

System Level Evaluation

• Two implementations representing trade-offs between computation performance and accuracy
• 7 classification benchmarks
• Classification rate is used as the reliability metric

Network models: multi-layer perceptron (MLP) and auto-associative memory (AAM)

Benchmark      Description
cancer         breast cancer diagnosis
connect-4      connect-4 game
gene           nucleotide sequence detection
lymphography   lymph diagnosis
MNIST          digit recognition
mushroom       poisonous mushroom discrimination
thyroid        thyroid diagnosis
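The reliability metric is a plain ratio of correct predictions to total samples. A minimal C helper, assuming integer class labels; the name classification_rate is hypothetical.

```c
#include <stddef.h>

/* Classification rate = correctly classified samples / total samples. */
double classification_rate(const int *predicted, const int *label, size_t n) {
    if (n == 0) return 0.0;                 /* no samples: report 0 */
    size_t correct = 0;
    for (size_t k = 0; k < n; ++k)
        if (predicted[k] == label[k]) ++correct;
    return (double)correct / (double)n;
}
```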


Experimental Setup

[Table: The Design Parameters of NCA Components]

[Table: The Benchmark Implementation Details]

Impact of Deficient Hardware

• Programming precision limited by device resolution

• Device variations and signal fluctuations

• AAM is more robust than MLP
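Limited programming precision can be modeled by snapping each trained weight to one of a few evenly spaced levels, optionally scaled by a deviation factor standing in for device variation. This C sketch assumes uniform levels and a deterministic deviation argument; program_weight and its parameters are hypothetical, and a real robustness study would draw the deviation from a measured distribution.

```c
/* Maps an ideal weight w in [w_min, w_max] onto one of n_levels (>= 2)
   evenly spaced levels (limited device resolution), then applies a
   multiplicative deviation var (e.g. 0.1 for +10%) as a stand-in for
   device variation. */
double program_weight(double w, double w_min, double w_max,
                      int n_levels, double var) {
    if (w < w_min) w = w_min;               /* clip to programmable range */
    if (w > w_max) w = w_max;
    double step = (w_max - w_min) / (n_levels - 1);
    int level = (int)((w - w_min) / step + 0.5);  /* round to nearest level */
    double q = w_min + level * step;
    return q * (1.0 + var);
}
```

With only 3 levels on [0, 1], an ideal weight of 0.3 is programmed as 0.5, and a +10% deviation turns it into 0.55; such errors accumulate across every weight in a column sum.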


MBC Size Exploration

• A larger array size is preferable from a performance perspective.
• However, as the array size increases, the classification rate degrades because of aggravated variations.
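The performance side of the trade-off can be made concrete by counting how many s × s arrays a layer needs when its weight matrix is tiled onto fixed-size MBCs. The ceiling-division mapping and the name mbc_tiles are illustrative assumptions, not the paper's partitioning scheme.

```c
#include <stddef.h>

/* Number of s x s MBC arrays needed for a rows x cols weight matrix
   (rows = inputs, cols = outputs): ceil(rows/s) * ceil(cols/s).
   Tiles stacked along the input dimension produce partial sums for the
   same outputs, which must be added afterwards. */
size_t mbc_tiles(size_t rows, size_t cols, size_t s) {
    size_t tr = (rows + s - 1) / s;   /* tiles along the input dimension */
    size_t tc = (cols + s - 1) / s;   /* tiles along the output dimension */
    return tr * tc;
}
```

For a hypothetical 784 × 100 layer, 64 × 64 arrays need 26 tiles, while 256 × 256 arrays need 4, so larger arrays cut partial-sum traffic even though each array suffers more variation.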

Comparison w/ Other Designs

• Baseline: CPU as the general-purpose processor

• D-NPU: a popular digital neuromorphic accelerator (MICRO'12)

• MBCs+D-Net: MBC arrays with a digital NoC, used to evaluate the efficiency of the mixed-signal NoC

• NCA: our design with MBC arrays and a mixed-signal NoC

[Figure: D-NPU processing engines (PEs) with input, weight, and output buffers, a multiply-add unit, an accumulator register, and a sigmoid unit]


Comparison w/ Other Designs

• D-NPU is limited by computational bandwidth.

• MBCs+D-Net is limited by costly AD/DA conversions.

NCA improvement
                 MLP      AAM
Speedup          177.67   27.20
Energy saving    184.71   25.18


Conclusion & Perspective

• The invention of new devices inspires the study of next-generation neuromorphic computing systems.

• A spiking neuromorphic computing system leveraging the memristor crossbar array is demonstrated.

• We propose a heterogeneous system that combines the flexibility of conventional architecture with the efficiency of neuromorphic architecture.

• In future research, we plan to extend the investigation to larger-scale ANN applications.

• Techniques to enhance run-time robustness in the training and testing procedures will be studied.

Thank You and Questions?
