
Neuromorphic Computing and Learning: A Stochastic Signal Processing Perspective

Osvaldo Simeone, joint work with Hyeryung Jang (KCL), Bipin Rajendran (NJIT), Brian Gardner and Andre Gruning (Univ. of Surrey)

King’s College London

December 11, 2018


Overview

Motivation

Models

Algorithms

Examples


Machine Learning Today

[Rajendran ’18]

Breakthroughs in ML have come at the expense of massive memory, energy, and time requirements...

Machine Learning Today

... making many state-of-the-art solutions not suitable for mobile or embedded devices.

[Rajendran ’18]


Machine Learning at the Edge

A solution is mobile edge or cloud computing: offload computations to an edge or cloud server.

Possible privacy and latency issues


Machine Learning on Mobile Devices

Scaling down energy and memory requirements for implementation on mobile devices requires exploring trade-offs between accuracy and complexity.

Active field: many new chips released by established players and start-ups to implement artificial neural networks...

Human vs Machine

Beyond artificial neural networks...

[Figure: a supercomputer vs the human brain — 13 million Watts, 5600 sq. ft. & 340 tons, ∼ 10^10 ops/J, versus ∼ 20 Watts, 2 sq. ft. & 1.4 kg, ∼ 10^15 ops/J. Source: https://www.olcf.ornl.gov, Google Images]

Neuromorphic Computing

Neurons in the brain process and communicate over time using sparse binary signals (spikes or action potentials). This results in a dynamic, sparse, and event-driven operation.

[Gerstner]

Spiking Neural Networks

Spiking Neural Networks (SNNs) are networks of spiking neurons.

Topic at the intersection of computational neuroscience (focused on biological plausibility) and machine learning (focused on accuracy and efficiency).

[Figure: an Artificial Neural Network (ANN) neuron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and output y, compared with a Spiking Neural Network (SNN) neuron with spike-train inputs x1(t), x2(t), ..., xn(t), weights w1, w2, ..., wn, and spiking output y(t) over time.]


Spiking Neural Networks

Proof-of-concept hardware implementations of SNNs have demonstrated significant energy savings as compared to ANNs...

Spiking Neural Networks

... generating significant (and perhaps premature) press coverage and positive market predictions.


I/O Interfaces

[Figure: a neuromorphic sensor (e.g., silicon cochlea, retina) feeds the SNN, which drives a neuromorphic actuator.]

I/O Interfaces

[Figure: a source (e.g., the digit 5) is encoded into spikes, processed by the SNN, and decoded to drive an actuator.]

I/O Interfaces

Rate encoding, time encoding, population encoding,...

Rate decoding, first-to-spike decoding,...
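As a concrete illustration of the simplest of these interfaces, here is a minimal sketch (not from the talk; the intensity value, horizon T, and seed are illustrative) of rate encoding a real-valued input into a Bernoulli spike train and rate decoding by counting spikes:

```python
import numpy as np

rng = np.random.default_rng(0)

def rate_encode(intensity, T):
    """Rate encoding: emit i.i.d. Bernoulli spikes with probability equal
    to the (normalized) input intensity at each of T time steps."""
    p = np.clip(intensity, 0.0, 1.0)
    return (rng.random(T) < p).astype(int)

def rate_decode(spikes):
    """Rate decoding: estimate the encoded intensity as the empirical spike rate."""
    return spikes.mean()

spikes = rate_encode(0.3, T=100)   # e.g., a pixel with normalized intensity 0.3
print(spikes[:20], rate_decode(spikes))
```

First-to-spike decoding would instead report the index of the earliest-spiking output neuron.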


Internal Operation

An SNN is a network of spiking neurons.

Unlike ANNs, in SNNs neurons operate sequentially over time by processing and communicating via spikes.

Discrete-time vs continuous-time: most hardware implementations follow the former (e.g., Intel's Loihi).



Internal Operation

Internal operation defined by:
- topology (connectivity)
- spiking mechanism

Topology

Two types of connections between neurons:
- Synaptic links
  - Causal dependency of a post-synaptic neuron on pre-synaptic neurons
  - Possibly recurrent: long-term memory

Topology

Two types of connections between neurons:
- Synaptic links
- Lateral dependencies:
  - Instantaneous correlation between the spiking of two neurons
  - Excitatory or inhibitory

Topology

An important example is a multi-layer SNN with lateral connections within each layer.

Focus on this topology in the following, although many considerations generalize.

Spiking Mechanism

Each neuron is characterized by an internal state known as the membrane potential $u^{(l)}_{i,t}$ [Gerstner and Kistler '02].

Generally, a higher membrane potential entails a larger probability of spiking. It evolves over time as a function of the past behavior of pre-synaptic neurons and of the neuron itself.


Membrane Potential

$$u^{(l)}_{i,t} = \sum_{j \in \mathcal{V}^{(l-1)}} w^{(l)}_{j,i} \left(a_t * s^{(l-1)}_{j,t}\right) + w^{(l)}_{i} \left(b_t * s^{(l)}_{i,t}\right) + \gamma^{(l)}_{i}$$

- Feedforward filter (kernel) $a_t$ with learnable synaptic weight $w^{(l)}_{j,i}$

- Feedback filter (kernel) $b_t$ with learnable weight $w^{(l)}_{i}$ (e.g., refractory period)
- Bias (threshold) $\gamma^{(l)}_{i}$

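As a rough illustration of the membrane potential above, the sketch below evaluates it in discrete time; the exponential kernel shapes, layer sizes, and weight values are assumptions for the example, not the specific choices in the talk.

```python
import numpy as np

def causal_filter(kernel, spikes):
    """Causal convolution: (kernel * spikes)_t uses only spikes up to time t-1."""
    T = len(spikes)
    out = np.zeros(T)
    for t in range(T):
        d = np.arange(1, min(t, len(kernel)) + 1)   # delays d = 1, 2, ...
        out[t] = np.dot(kernel[d - 1], spikes[t - d])
    return out

def membrane_potential(w_ff, a, s_prev_layer, w_fb, b, s_self, gamma):
    """u_{i,t} = sum_j w_ff[j] (a * s_j)_t + w_fb (b * s_i)_t + gamma."""
    u = gamma * np.ones(s_self.shape[0])
    for j in range(s_prev_layer.shape[0]):          # pre-synaptic neurons
        u += w_ff[j] * causal_filter(a, s_prev_layer[j])
    u += w_fb * causal_filter(b, s_self)            # self-feedback (e.g., refractoriness)
    return u

# Illustrative kernels: decaying feedforward filter a_t, negative feedback filter b_t.
T, n_pre = 50, 3
a = np.exp(-np.arange(10) / 5.0)
b = -2.0 * np.exp(-np.arange(10) / 2.0)
s_pre = (np.random.rand(n_pre, T) < 0.2).astype(float)   # pre-synaptic spike trains
s_i = (np.random.rand(T) < 0.1).astype(float)             # the neuron's own past spikes
u = membrane_potential(np.array([0.5, -0.3, 0.8]), a, s_pre, 1.0, b, s_i, gamma=-1.0)
```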

Membrane Potential

Kernels can more generally be parameterized via multiple basis functions and learnable weights [Pillow et al '08].

This allows learning of temporal processing, e.g., by adapting synaptic delays.
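For instance, a hedged sketch of this idea (notation assumed here, not taken verbatim from the talk): each synaptic kernel is written as a weighted sum of $K$ fixed basis functions $\alpha_{k,t}$, so that the learnable parameters become per-basis weights,

$$w^{(l)}_{j,i}\big(a_t * s^{(l-1)}_{j,t}\big) \;\to\; \sum_{k=1}^{K} w^{(l)}_{j,i,k}\big(\alpha_{k,t} * s^{(l-1)}_{j,t}\big).$$

Choosing basis functions shifted in time lets the learned weights effectively adapt synaptic delays.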

Deterministic Models

The most common model is leaky integrate-and-fire [Gerstner and Kistler '02]:
- Spike when the membrane potential is positive
- Non-differentiable with respect to model parameters
- Heuristic training algorithms based on ideas such as surrogate gradient [Neftci '18] [Anwani and Rajendran '18]

Probabilistic models are more flexible and yield principled differentiable learning rules [Koller and Friedman '09].


Probabilistic Models

Basic probabilistic model: Generalized Linear Model (GLM)
- There are no lateral connections, and the conditional spiking probability is [Pillow et al '08]

$$p\big(s^{(l)}_{i,t} = 1 \,\big|\, s^{(l-1)}_{\le t-1}, s^{(l)}_{\le t-1}\big) = \sigma\big(u^{(l)}_{i,t}\big)$$
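In other words, at each time step the neuron spikes with probability given by a sigmoid of its membrane potential; a minimal sampling sketch (with illustrative values) is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glm_spike(u, rng=np.random.default_rng()):
    """Sample s_{i,t} ~ Bernoulli(sigma(u_{i,t})): the higher the membrane
    potential, the higher the spiking probability."""
    return (rng.random(np.shape(u)) < sigmoid(u)).astype(int)

u_t = np.array([-2.0, 0.0, 3.0])   # illustrative membrane potentials of three neurons
print(glm_spike(u_t))              # e.g., [0 0 1] (random)
```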

Probabilistic Models

More general energy-based model (e.g., Boltzmann machines for time series [Osogami '17]):
- With lateral correlations defined by parameters $r^{(l)}_{i,j}$, the joint probability of spiking for layer $l$ is

$$p_{\theta^{(l)}}\big(s^{(l)}_t \,\big|\, s^{(l-1)}_{\le t-1}, s^{(l)}_{\le t-1}\big) \propto \exp\Big\{ \sum_{i \in \mathcal{V}^{(l)}} u^{(l)}_{i,t}\, s^{(l)}_{i,t} + \sum_{i,j \in \mathcal{V}^{(l)}} r^{(l)}_{i,j}\, s^{(l)}_{i,t}\, s^{(l)}_{j,t} \Big\}$$
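As a sketch of what this distribution looks like numerically (the layer size, potentials, and lateral parameters below are illustrative, and exact normalization is only tractable for small layers), one can score a candidate spike pattern by its unnormalized log-probability:

```python
import numpy as np
from itertools import product

def unnorm_logp(s, u, R):
    """Unnormalized log-probability of a layer spike vector s given membrane
    potentials u and lateral parameters R: sum_i u_i s_i + sum_{i,j} r_ij s_i s_j."""
    return float(u @ s + s @ R @ s)

def layer_prob(s, u, R):
    """Exact (normalized) probability by enumerating all 2^N spike patterns;
    feasible only for small layers."""
    logZ = np.log(sum(np.exp(unnorm_logp(np.array(x), u, R))
                      for x in product([0, 1], repeat=len(u))))
    return np.exp(unnorm_logp(s, u, R) - logZ)

u = np.array([0.5, -1.0, 0.2])                 # membrane potentials
R = np.array([[0.0, 0.8, 0.0],                 # lateral (excitatory/inhibitory) couplings
              [0.8, 0.0, -0.5],
              [0.0, -0.5, 0.0]])
print(layer_prob(np.array([1, 1, 0]), u, R))
```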


Training SNNs

Supervised learning:
training set = {(input, output)} → generalization

Unsupervised learning:
training set = {(input or output)} → compression, sample generation, clustering, ...

Reinforcement learning:
active (iterative) training = state (input) → action (output) → reward and new input


Training SNNs

Training is carried out by following a learning rule.

A learning rule describes how the model parameters are updated on the basis of data in order to carry out a given task.

Online or batch

Local vs global information


Learning Rules

The general form of many learning rules for synaptic weights follows the three-factor format [Fremaux and Gerstner '16]:

θ ← θ + η × learning signal × pre-syn × post-syn

Pre-synaptic and post-synaptic terms are local to each neuron.

The learning signal, aka neuromodulator, is global.

The product pre-syn × post-syn tends to be large when the two neurons spike at nearly the same time.

“Neurons that fire together, wire together” (Hebbian theory, STDP, BCM theory) [Hebb '49] [Bienenstock et al '82].


Deriving the Learning Rules

Three-factor rules can be derived as a form of stochastic gradient descent for the probabilistic model:

θ ← θ + η × M × ∇ ln p(output | input)

where η is the learning rate and M is the learning signal.

Deriving the Learning Rules

Gradient for synaptic weights under the GLM (no lateral connections), obtained by summing over time t:

$$\nabla_{w^{(l)}_{j,i}} \log p_\theta\big(s^{(l)}_{i,t} \,\big|\, s^{(l-1)}_{\le t-1}, s^{(l)}_{\le t-1}\big) = \underbrace{\big(a_t * s^{(l-1)}_{j,t}\big)}_{\text{pre-synaptic trace}} \; \underbrace{\big(s^{(l)}_{i,t} - \sigma\big(u^{(l)}_{i,t}\big)\big)}_{\text{post-synaptic error}}$$

Post-synaptic error = desired/observed behavior − model-averaged behavior [Bienenstock et al '82]
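Putting the two factors together with a global learning signal gives a per-synapse update of the three-factor form; the sketch below (learning rate, trace, and signal values are illustrative) applies one such step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glm_weight_update(w, eta, learning_signal, pre_trace, s_post, u_post):
    """Three-factor update for one synapse at one time step:
       w <- w + eta * learning_signal * pre_trace * (s_post - sigma(u_post))
    pre_trace = (a * s_pre)_t, s_post = observed/desired spike, u_post = membrane potential."""
    post_error = s_post - sigmoid(u_post)   # desired/observed minus model-averaged behavior
    return w + eta * learning_signal * pre_trace * post_error

# Illustrative single step (learning signal = 1, as for fully observed neurons)
w = 0.2
w = glm_weight_update(w, eta=0.05, learning_signal=1.0,
                      pre_trace=0.7, s_post=1, u_post=-0.5)
```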

Deriving the Learning Rules: Supervised Learning

θ ← θ + η × M × ∇ ln p(output | input)

with M = 1 or obtained from VI; output ← training data; input ← training data.

Variational Inference (VI) is needed if there are intermediate layers [Rezende et al '11] [Osogami '17] [Jang et al '18].

Deriving the Learning Rules: Unsupervised Learning

For generative (unsupervised) models:

θ ← θ + η × M × ∇ ln p(output | input)

with M obtained from VI; output ← training data; input ← ∅.

Unsupervised learning models always have hidden layers.

Deriving the Learning Rules: Reinforcement Learning

Using policy gradient to learn an SNN policy:

θ ← θ + η × M × ∇ ln p(output | input)

with M = reward/return (and VI); output ← action; input ← state.

Variational Inference (VI) is needed if there are intermediate layers [Rezende et al '11] [Osogami '17] [Jang et al '18].
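A highly simplified sketch of this policy-gradient specialization is given below, assuming a single-layer GLM policy whose output spike is the binary action and using the episode return as the global learning signal; the environment, reward function, and dimensions are hypothetical placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def episode_policy_gradient(W, states, rng):
    """Run one episode with a one-layer GLM policy (spike = action),
    accumulating the REINFORCE gradient sum_t grad log p(a_t | s_t)."""
    grad, total_reward = np.zeros_like(W), 0.0
    for x in states:                          # x: encoded (spiking) state at time t
        u = W @ x                             # membrane potential of the output neuron
        a = int(rng.random() < sigmoid(u))    # sample action from spiking probability
        grad += (a - sigmoid(u)) * x          # pre-synaptic input times post-synaptic error
        total_reward += reward_fn(x, a)       # placeholder reward function
    return grad, total_reward

def reward_fn(x, a):                          # hypothetical reward: spike on "active" inputs
    return float(a == int(x.sum() > len(x) / 2))

rng = np.random.default_rng(1)
W = np.zeros(8)
for _ in range(200):                          # policy-gradient training loop
    states = (rng.random((20, 8)) < 0.3).astype(float)
    grad, R = episode_policy_gradient(W, states, rng)
    W += 0.1 * R * grad                       # learning signal = episode return R
```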


Supervised Learning

[Figure: supervised learning setup — a source (the digit 5) is encoded into spikes, processed by the SNN, and passed to two decoders, one of which outputs a rotated version.]

Supervised Learning

From [Jang et al ’18-2]


Unsupervised Learning

[Figure: a decoder SNN.]

Unsupervised Learning

[Figure: a decoder SNN paired with a variational SNN encoder.]


Reinforcement Learning

[Figure: reinforcement learning setup — the state is encoded into spikes, and the SNN output is decoded into an action (e.g., “up”) for the actuator.]

Reinforcement Learning

From [Rosenfeld et al ’18]


Concluding Remarks

Statistical signal processing review of neuromorphic computing via Spiking Neural Networks.

Additional topics:
- recurrent SNNs for long-term memory [Maass '11]
- neural sampling: information encoded in steady-state behavior [Buesing et al '11]
- Bayesian learning via Langevin dynamics [Pecevski et al '11] [Kappel et al '15]

Some open problems:
- meta-learning, life-long learning, transfer learning [Bellec et al '18]
- learning I/O interfaces [Lazar and Toth '03]

Acknowledgements

This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 725731) and from the US National Science Foundation (NSF) under grant ECCS 1710009.

References

[Gerstner and Kistler '02] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

[Pillow et al '08] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. Chichilnisky, and E. P. Simoncelli, "Spatio-temporal correlations and visual signalling in a complete neuronal population," Nature, vol. 454, no. 7207, p. 995, 2008.

[Osogami '17] T. Osogami, "Boltzmann machines for time-series," arXiv preprint arXiv:1708.06004, 2017.

[Ibnkahla '00] M. Ibnkahla, "Applications of neural networks to digital communications – a survey," Signal Processing, vol. 80, no. 7, pp. 1185–1215, 2000.

[Koller and Friedman '09] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

References

[Fremaux and Gerstner '16] N. Fremaux and W. Gerstner, "Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules," Frontiers in Neural Circuits, vol. 9, p. 85, 2016.

[Jang et al '18] H. Jang, O. Simeone, B. Gardner, and A. Gruning, "Spiking neural networks: A stochastic signal processing perspective," ...

[Rezende et al '11] D. J. Rezende, D. Wierstra, and W. Gerstner, "Variational learning for recurrent spiking networks," in Advances in Neural Information Processing Systems, 2011, pp. 136–144.

[Brea et al '13] J. Brea, W. Senn, and J.-P. Pfister, "Matching recall and storage in sequence learning with spiking neural networks," Journal of Neuroscience, vol. 33, no. 23, pp. 9565–9575, 2013.

[Hebb '49] D. Hebb, The Organization of Behavior. New York: Wiley and Sons, 1949.

References

[Bienenstock et al '82] E. L. Bienenstock, L. N. Cooper, and P. W. Munro, "Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex," Journal of Neuroscience, vol. 2, no. 1, pp. 32–48, 1982.

[Pecevski et al '11] D. Pecevski, L. Buesing, and W. Maass, "Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons," PLOS Computational Biology, vol. 7, no. 12, pp. 1–25, 2011.

[Rosenfeld et al '18] B. Rosenfeld, O. Simeone, and B. Rajendran, "Learning first-to-spike policies for neuromorphic control using policy gradients," arXiv preprint arXiv:1810.09977, 2018.

[Jang et al '18-2] H. Jang and O. Simeone, "Training dynamic exponential family models with causal and lateral dependencies for generalized neuromorphic computing," arXiv preprint arXiv:1810.08940, 2018.

References

[Bellec et al '18] G. Bellec, D. Salaj, A. Subramoney, R. Legenstein, and W. Maass, "Long short-term memory and learning-to-learn in networks of spiking neurons," arXiv preprint arXiv:1803.09574, 2018.

[Kappel et al '15] D. Kappel, S. Habenschuss, R. Legenstein, and W. Maass, "Synaptic sampling: a Bayesian approach to neural network plasticity and rewiring," in Advances in Neural Information Processing Systems, 2015, pp. 370–378.

[Buesing et al '11] L. Buesing, J. Bill, B. Nessler, and W. Maass, "Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons," PLoS Computational Biology, vol. 7, no. 11, p. e1002211, 2011.

[Lazar and Toth '03] A. A. Lazar and L. T. Toth, "Time encoding and perfect recovery of bandlimited signals," in Proc. ICASSP, 2003.

[Maass '11] W. Maass, "Liquid state machines: motivation, theory, and applications," in Computability in Context: Computation and Logic in the Real World, 2011, pp. 275–296.

References

[Neftci '18] E. O. Neftci, "Data and power efficient intelligence with neuromorphic learning machines," iScience, vol. 5, p. 52, 2018.

[Anwani and Rajendran '18] N. Anwani and B. Rajendran, "Training multilayer spiking neural networks using NormAD based spatio-temporal error backpropagation," arXiv preprint arXiv:1811.10678, 2018.