
Neuromorphic Computing and Learning: A Stochastic Signal Processing Perspective

Osvaldo Simeone, joint work with Hyeryung Jang (KCL), Bipin Rajendran (NJIT), Brian Gardner and Andre Gruning (Univ. of Surrey)

King’s College London

December 11, 2018


Overview

Motivation

Models

Algorithms

Examples


Machine Learning Today

[Rajendran ’18]

Breakthroughs in ML have come at the expense of massive memory, energy, and time requirements...

Machine Learning Today

... making many state-of-the-art solutions not suitable for mobile or embedded devices.

[Rajendran ’18]


Machine Learning at the Edge

A solution is mobile edge or cloud computing: offload computations to an edge or cloud server.

Possible privacy and latency issues


Machine Learning on Mobile Devices

Scaling down energy and memory requirements for implementation on mobile devices requires exploring trade-offs between accuracy and complexity.

Active field: many new chips released by established players and start-ups to implement artificial neural networks...

Human vs Machine

Beyond artificial neural networks...

[Figure: a supercomputer vs the human brain — 13 million Watts, 5600 sq. ft. & 340 tons, ∼ 10^10 ops/J, versus ∼ 20 Watts, 2 sq. ft. & 1.4 kg, ∼ 10^15 ops/J. Source: https://www.olcf.ornl.gov, Google Images]

Neuromorphic Computing

Neurons in the brain process and communicate over time using sparse binary signals (spikes or action potentials). This results in a dynamic, sparse, and event-driven operation.

[Gerstner]

Spiking Neural Networks

Spiking Neural Networks (SNNs) are networks of spiking neurons.

Topic at the intersection of computational neuroscience (focused on biological plausibility) and machine learning (focused on accuracy and efficiency).

[Figure: an Artificial Neural Network (ANN) neuron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and output y, compared with a Spiking Neural Network (SNN) neuron with spike-train inputs x1(t), x2(t), ..., xn(t), weights w1, w2, ..., wn, and spiking output y(t) over time.]


Spiking Neural Networks

Proof-of-concept hardware implementations of SNNs have demonstrated significant energy savings as compared to ANNs...

Spiking Neural Networks

... generating significant (and perhaps premature) press coverage and positive market predictions.


I/O Interfaces

[Figure: a neuromorphic sensor (e.g., silicon cochlea, retina) feeds the SNN, which drives a neuromorphic actuator.]

I/O Interfaces

[Figure: a source (e.g., the digit 5) is encoded into spikes, processed by the SNN, and decoded to drive an actuator.]

I/O Interfaces

Rate encoding, time encoding, population encoding,...

Rate decoding, first-to-spike decoding,...
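As a concrete illustration of the simplest of these interfaces, here is a minimal sketch (not from the talk; the intensity value, horizon T, and seed are illustrative) of rate encoding a real-valued input into a Bernoulli spike train and rate decoding by counting spikes:

```python
import numpy as np

rng = np.random.default_rng(0)

def rate_encode(intensity, T):
    """Rate encoding: emit i.i.d. Bernoulli spikes with probability equal
    to the (normalized) input intensity at each of T time steps."""
    p = np.clip(intensity, 0.0, 1.0)
    return (rng.random(T) < p).astype(int)

def rate_decode(spikes):
    """Rate decoding: estimate the encoded intensity as the empirical spike rate."""
    return spikes.mean()

spikes = rate_encode(0.3, T=100)   # e.g., a pixel with normalized intensity 0.3
print(spikes[:20], rate_decode(spikes))
```

First-to-spike decoding would instead report the index of the earliest-spiking output neuron.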


Internal Operation

An SNN is a network of spiking neurons.

Unlike ANNs, in SNNs neurons operate sequentially over time by processing and communicating via spikes.

Discrete-time vs continuous-time: most hardware implementations follow the former (e.g., Intel's Loihi).



Internal Operation

Internal operation defined by:
- topology (connectivity)
- spiking mechanism

Topology

Two types of connections between neurons:
- Synaptic links
  - Causal dependency of a post-synaptic neuron on pre-synaptic neurons
  - Possibly recurrent: long-term memory

Topology

Two types of connections between neurons:
- Synaptic links
- Lateral dependencies:
  - Instantaneous correlation between the spiking of two neurons
  - Excitatory or inhibitory

Topology

An important example is a multi-layer SNN with lateral connections within each layer.

Focus on this topology in the following, although many considerations generalize.

Spiking Mechanism

Each neuron is characterized by an internal state known as the membrane potential $u^{(l)}_{i,t}$ [Gerstner and Kistler '02].

Generally, a higher membrane potential entails a larger probability of spiking. It evolves over time as a function of the past behavior of pre-synaptic neurons and of the neuron itself.


Membrane Potential

$$u^{(l)}_{i,t} = \sum_{j \in \mathcal{V}^{(l-1)}} w^{(l)}_{j,i} \left(a_t * s^{(l-1)}_{j,t}\right) + w^{(l)}_{i} \left(b_t * s^{(l)}_{i,t}\right) + \gamma^{(l)}_{i}$$

- Feedforward filter (kernel) $a_t$ with learnable synaptic weight $w^{(l)}_{j,i}$

- Feedback filter (kernel) $b_t$ with learnable weight $w^{(l)}_{i}$ (e.g., refractory period)
- Bias (threshold) $\gamma^{(l)}_{i}$

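As a rough illustration of the membrane potential above, the sketch below evaluates it in discrete time; the exponential kernel shapes, layer sizes, and weight values are assumptions for the example, not the specific choices in the talk.

```python
import numpy as np

def causal_filter(kernel, spikes):
    """Causal convolution: (kernel * spikes)_t uses only spikes up to time t-1."""
    T = len(spikes)
    out = np.zeros(T)
    for t in range(T):
        d = np.arange(1, min(t, len(kernel)) + 1)   # delays d = 1, 2, ...
        out[t] = np.dot(kernel[d - 1], spikes[t - d])
    return out

def membrane_potential(w_ff, a, s_prev_layer, w_fb, b, s_self, gamma):
    """u_{i,t} = sum_j w_ff[j] (a * s_j)_t + w_fb (b * s_i)_t + gamma."""
    u = gamma * np.ones(s_self.shape[0])
    for j in range(s_prev_layer.shape[0]):          # pre-synaptic neurons
        u += w_ff[j] * causal_filter(a, s_prev_layer[j])
    u += w_fb * causal_filter(b, s_self)            # self-feedback (e.g., refractoriness)
    return u

# Illustrative kernels: decaying feedforward filter a_t, negative feedback filter b_t.
T, n_pre = 50, 3
a = np.exp(-np.arange(10) / 5.0)
b = -2.0 * np.exp(-np.arange(10) / 2.0)
s_pre = (np.random.rand(n_pre, T) < 0.2).astype(float)   # pre-synaptic spike trains
s_i = (np.random.rand(T) < 0.1).astype(float)             # the neuron's own past spikes
u = membrane_potential(np.array([0.5, -0.3, 0.8]), a, s_pre, 1.0, b, s_i, gamma=-1.0)
```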

Membrane Potential

Kernels can more generally be parameterized via multiple basis functions and learnable weights [Pillow et al '08].

This allows learning of temporal processing, e.g., by adapting synaptic delays.
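For instance, a hedged sketch of this idea (notation assumed here, not taken verbatim from the talk): each synaptic kernel is written as a weighted sum of $K$ fixed basis functions $\alpha_{k,t}$, so that the learnable parameters become per-basis weights,

$$w^{(l)}_{j,i}\big(a_t * s^{(l-1)}_{j,t}\big) \;\to\; \sum_{k=1}^{K} w^{(l)}_{j,i,k}\big(\alpha_{k,t} * s^{(l-1)}_{j,t}\big).$$

Choosing basis functions shifted in time lets the learned weights effectively adapt synaptic delays.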

Deterministic Models

The most common model is leaky integrate-and-fire [Gerstner and Kistler '02]:
- Spike when the membrane potential is positive
- Non-differentiable with respect to model parameters
- Heuristic training algorithms based on ideas such as surrogate gradient [Neftci '18] [Anwani and Rajendran '18]

Probabilistic models are more flexible and yield principled differentiable learning rules [Koller and Friedman '09].


Probabilistic Models

Basic probabilistic model: Generalized Linear Model (GLM)
- There are no lateral connections, and the conditional spiking probability is [Pillow et al '08]

$$p\big(s^{(l)}_{i,t} = 1 \,\big|\, s^{(l-1)}_{\le t-1}, s^{(l)}_{\le t-1}\big) = \sigma\big(u^{(l)}_{i,t}\big)$$
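In other words, at each time step the neuron spikes with probability given by a sigmoid of its membrane potential; a minimal sampling sketch (with illustrative values) is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glm_spike(u, rng=np.random.default_rng()):
    """Sample s_{i,t} ~ Bernoulli(sigma(u_{i,t})): the higher the membrane
    potential, the higher the spiking probability."""
    return (rng.random(np.shape(u)) < sigmoid(u)).astype(int)

u_t = np.array([-2.0, 0.0, 3.0])   # illustrative membrane potentials of three neurons
print(glm_spike(u_t))              # e.g., [0 0 1] (random)
```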

Probabilistic Models

More general energy-based model (e.g., Boltzmann machines for time series [Osogami '17]):
- With lateral correlations defined by parameters $r^{(l)}_{i,j}$, the joint probability of spiking for layer $l$ is

$$p_{\theta^{(l)}}\big(s^{(l)}_t \,\big|\, s^{(l-1)}_{\le t-1}, s^{(l)}_{\le t-1}\big) \propto \exp\Big\{ \sum_{i \in \mathcal{V}^{(l)}} u^{(l)}_{i,t}\, s^{(l)}_{i,t} + \sum_{i,j \in \mathcal{V}^{(l)}} r^{(l)}_{i,j}\, s^{(l)}_{i,t}\, s^{(l)}_{j,t} \Big\}$$
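As a sketch of what this distribution looks like numerically (the layer size, potentials, and lateral parameters below are illustrative, and exact normalization is only tractable for small layers), one can score a candidate spike pattern by its unnormalized log-probability:

```python
import numpy as np
from itertools import product

def unnorm_logp(s, u, R):
    """Unnormalized log-probability of a layer spike vector s given membrane
    potentials u and lateral parameters R: sum_i u_i s_i + sum_{i,j} r_ij s_i s_j."""
    return float(u @ s + s @ R @ s)

def layer_prob(s, u, R):
    """Exact (normalized) probability by enumerating all 2^N spike patterns;
    feasible only for small layers."""
    logZ = np.log(sum(np.exp(unnorm_logp(np.array(x), u, R))
                      for x in product([0, 1], repeat=len(u))))
    return np.exp(unnorm_logp(s, u, R) - logZ)

u = np.array([0.5, -1.0, 0.2])                 # membrane potentials
R = np.array([[0.0, 0.8, 0.0],                 # lateral (excitatory/inhibitory) couplings
              [0.8, 0.0, -0.5],
              [0.0, -0.5, 0.0]])
print(layer_prob(np.array([1, 1, 0]), u, R))
```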


Training SNNs

Supervised learning:
training set = {(input, output)} → generalization

Unsupervised learning:
training set = {(input or output)} → compression, sample generation, clustering, ...

Reinforcement learning:
active (iterative) training = state (input) → action (output) → reward and new input


Training SNNs

Training is carried out by following a learning rule.

A learning rule describes how the model parameters are updated on the basis of data in order to carry out a given task.

Online or batch

Local vs global information


Learning Rules

The general form of many learning rules for synaptic weights follows the three-factor format [Fremaux and Gerstner '16]:

θ ← θ + η × learning signal × pre-syn × post-syn

Pre-synaptic and post-synaptic terms are local to each neuron.

The learning signal, aka neuromodulator, is global.

The product pre-syn × post-syn tends to be large when the two neurons spike at nearly the same time.

“Neurons that fire together, wire together” (Hebbian theory, STDP, BCM theory) [Hebb '49] [Bienenstock et al '82].


Deriving the Learning Rules

Three-factor rules can be derived as a form of stochastic gradient descent for the probabilistic model:

θ ← θ + η × M × ∇ ln p(output | input)

where η is the learning rate and M is the learning signal.

Deriving the Learning Rules

Gradient for synaptic weights under the GLM (no lateral connections), obtained by summing over time t:

$$\nabla_{w^{(l)}_{j,i}} \log p_\theta\big(s^{(l)}_{i,t} \,\big|\, s^{(l-1)}_{\le t-1}, s^{(l)}_{\le t-1}\big) = \underbrace{\big(a_t * s^{(l-1)}_{j,t}\big)}_{\text{pre-synaptic trace}} \; \underbrace{\big(s^{(l)}_{i,t} - \sigma\big(u^{(l)}_{i,t}\big)\big)}_{\text{post-synaptic error}}$$

Post-synaptic error = desired/observed behavior − model-averaged behavior [Bienenstock et al '82]
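Putting the two factors together with a global learning signal gives a per-synapse update of the three-factor form; the sketch below (learning rate, trace, and signal values are illustrative) applies one such step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glm_weight_update(w, eta, learning_signal, pre_trace, s_post, u_post):
    """Three-factor update for one synapse at one time step:
       w <- w + eta * learning_signal * pre_trace * (s_post - sigma(u_post))
    pre_trace = (a * s_pre)_t, s_post = observed/desired spike, u_post = membrane potential."""
    post_error = s_post - sigmoid(u_post)   # desired/observed minus model-averaged behavior
    return w + eta * learning_signal * pre_trace * post_error

# Illustrative single step (learning signal = 1, as for fully observed neurons)
w = 0.2
w = glm_weight_update(w, eta=0.05, learning_signal=1.0,
                      pre_trace=0.7, s_post=1, u_post=-0.5)
```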

Deriving the Learning Rules: Supervised Learning

θ ← θ + η × M × ∇ ln p(output | input)

with M = 1 or obtained from VI; output ← training data; input ← training data.

Variational Inference (VI) is needed if there are intermediate layers [Rezende et al '11] [Osogami '17] [Jang et al '18].

Deriving the Learning Rules: Unsupervised Learning

For generative (unsupervised) models:

θ ← θ + η × M × ∇ ln p(output | input)

with M obtained from VI; output ← training data; input ← ∅.

Unsupervised learning models always have hidden layers.

Deriving the Learning Rules: Reinforcement Learning

Using policy gradient to learn an SNN policy:

θ ← θ + η × M × ∇ ln p(output | input)

with M = reward/return (and VI); output ← action; input ← state.

Variational Inference (VI) is needed if there are intermediate layers [Rezende et al '11] [Osogami '17] [Jang et al '18].
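A highly simplified sketch of this policy-gradient specialization is given below, assuming a single-layer GLM policy whose output spike is the binary action and using the episode return as the global learning signal; the environment, reward function, and dimensions are hypothetical placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def episode_policy_gradient(W, states, rng):
    """Run one episode with a one-layer GLM policy (spike = action),
    accumulating the REINFORCE gradient sum_t grad log p(a_t | s_t)."""
    grad, total_reward = np.zeros_like(W), 0.0
    for x in states:                          # x: encoded (spiking) state at time t
        u = W @ x                             # membrane potential of the output neuron
        a = int(rng.random() < sigmoid(u))    # sample action from spiking probability
        grad += (a - sigmoid(u)) * x          # pre-synaptic input times post-synaptic error
        total_reward += reward_fn(x, a)       # placeholder reward function
    return grad, total_reward

def reward_fn(x, a):                          # hypothetical reward: spike on "active" inputs
    return float(a == int(x.sum() > len(x) / 2))

rng = np.random.default_rng(1)
W = np.zeros(8)
for _ in range(200):                          # policy-gradient training loop
    states = (rng.random((20, 8)) < 0.3).astype(float)
    grad, R = episode_policy_gradient(W, states, rng)
    W += 0.1 * R * grad                       # learning signal = episode return R
```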


Supervised Learning

[Figure: supervised learning setup — a source (the digit 5) is encoded into spikes, processed by the SNN, and passed to two decoders, one of which outputs a rotated version.]

Supervised Learning

From [Jang et al ’18-2]


Unsupervised Learning

[Figure: a decoder SNN.]

Unsupervised Learning

[Figure: a decoder SNN paired with a variational SNN encoder.]


Reinforcement Learning

[Figure: reinforcement learning setup — the state is encoded into spikes, and the SNN output is decoded into an action (e.g., “up”) for the actuator.]

Reinforcement Learning

From [Rosenfeld et al ’18]


Concluding Remarks

Statistical signal processing review of neuromorphic computing via Spiking Neural Networks.

Additional topics:
- recurrent SNNs for long-term memory [Maass '11]
- neural sampling: information encoded in steady-state behavior [Buesing et al '11]
- Bayesian learning via Langevin dynamics [Pecevski et al '11] [Kappel et al '15]

Some open problems:
- meta-learning, life-long learning, transfer learning [Bellec et al '18]
- learning I/O interfaces [Lazar and Toth '03]

Acknowledgements

This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 725731) and from the US National Science Foundation (NSF) under grant ECCS 1710009.

References

[Gerstner and Kistler '02] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

[Pillow et al '08] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. Chichilnisky, and E. P. Simoncelli, "Spatio-temporal correlations and visual signalling in a complete neuronal population," Nature, vol. 454, no. 7207, p. 995, 2008.

[Osogami '17] T. Osogami, "Boltzmann machines for time-series," arXiv preprint arXiv:1708.06004, 2017.

[Ibnkahla '00] M. Ibnkahla, "Applications of neural networks to digital communications – a survey," Signal Processing, vol. 80, no. 7, pp. 1185–1215, 2000.

[Koller and Friedman '09] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

References

[Fremaux and Gerstner '16] N. Fremaux and W. Gerstner, "Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules," Frontiers in Neural Circuits, vol. 9, p. 85, 2016.

[Jang et al '18] H. Jang, O. Simeone, B. Gardner, and A. Gruning, "Spiking neural networks: A stochastic signal processing perspective," ...

[Rezende et al '11] D. J. Rezende, D. Wierstra, and W. Gerstner, "Variational learning for recurrent spiking networks," in Advances in Neural Information Processing Systems, 2011, pp. 136–144.

[Brea et al '13] J. Brea, W. Senn, and J.-P. Pfister, "Matching recall and storage in sequence learning with spiking neural networks," Journal of Neuroscience, vol. 33, no. 23, pp. 9565–9575, 2013.

[Hebb '49] D. Hebb, The Organization of Behavior. New York: Wiley and Sons, 1949.

References

[Bienenstock et al '82] E. L. Bienenstock, L. N. Cooper, and P. W. Munro, "Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex," Journal of Neuroscience, vol. 2, no. 1, pp. 32–48, 1982.

[Pecevski et al '11] D. Pecevski, L. Buesing, and W. Maass, "Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons," PLOS Computational Biology, vol. 7, no. 12, pp. 1–25, 2011.

[Rosenfeld et al '18] B. Rosenfeld, O. Simeone, and B. Rajendran, "Learning first-to-spike policies for neuromorphic control using policy gradients," arXiv preprint arXiv:1810.09977, 2018.

[Jang et al '18-2] H. Jang and O. Simeone, "Training dynamic exponential family models with causal and lateral dependencies for generalized neuromorphic computing," arXiv preprint arXiv:1810.08940, 2018.

References

[Bellec et al '18] G. Bellec, D. Salaj, A. Subramoney, R. Legenstein, and W. Maass, "Long short-term memory and learning-to-learn in networks of spiking neurons," arXiv preprint arXiv:1803.09574, 2018.

[Kappel et al '15] D. Kappel, S. Habenschuss, R. Legenstein, and W. Maass, "Synaptic sampling: a Bayesian approach to neural network plasticity and rewiring," in Advances in Neural Information Processing Systems, 2015, pp. 370–378.

[Buesing et al '11] L. Buesing, J. Bill, B. Nessler, and W. Maass, "Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons," PLoS Computational Biology, vol. 7, no. 11, p. e1002211, 2011.

[Lazar and Toth '03] A. A. Lazar and L. T. Toth, "Time encoding and perfect recovery of bandlimited signals," in Proc. ICASSP, 2003.

[Maass '11] W. Maass, "Liquid state machines: motivation, theory, and applications," in Computability in Context: Computation and Logic in the Real World, 2011, pp. 275–296.

References

[Neftci '18] E. O. Neftci, "Data and power efficient intelligence with neuromorphic learning machines," iScience, vol. 5, p. 52, 2018.

[Anwani and Rajendran '18] N. Anwani and B. Rajendran, "Training multilayer spiking neural networks using NormAD based spatio-temporal error backpropagation," arXiv preprint arXiv:1811.10678, 2018.