
A theory of sequence memory in the neocortex
Jeff Hawkins, Subutai Ahmad, Yuwei Cui and Chetan Surpur

[email protected], [email protected], [email protected]

Acknowledgements
We thank Scott Purdy, Alex Lavin, Matthew Larkum, and Ahmed El Hady for helpful discussions.

Continuous learning from sequence streams

4. Works well on real-world problems

References
Poirazi, P., Brannon, T., and Mel, B. W. (2003). Neuron 37, 989–999.
Chklovskii, D. B., Mel, B. W., and Svoboda, K. (2004). Nature 431, 782–788.
Hawkins, J., and Ahmad, S. (2015). arXiv:1511.00083 [q-bio.NC].
Holtmaat, A., and Svoboda, K. (2009). Nat. Rev. Neurosci. 10, 647–658.
Kanerva, P. (1988). Sparse Distributed Memory. The MIT Press.
Lamme, V. A., Supèr, H., and Spekreijse, H. (1998). Curr. Opin. Neurobiol. 8, 529–535.
Losonczy, A., Makara, J. K., and Magee, J. C. (2008). Nature 452, 436–441.
Major, G., Larkum, M. E., and Schiller, J. (2013). Annu. Rev. Neurosci. 36, 1–24.
Piuri, V. (2001). J. Parallel Distrib. Comput. 61, 18–48.
Smith, S. L., Smith, I. T., Branco, T., and Häusser, M. (2013). Nature 503, 115–120.
Vinje, W. E., and Gallant, J. L. (2002). J. Neurosci. 22, 2904–2915.

Streams of sensory inputs ...

1. Sequence learning is ubiquitous in cortex


Active Dendrites

Small sets of synapses in close proximity act as independent pattern detectors.

Detection of a pattern causes an NMDA spike and depolarization at the soma.

Depolarization acts as a prediction, causing the cell to fire earlier.

Learning occurs by growing new synapses via a Hebbian learning rule.
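The role of a dendritic segment as a pattern detector can be illustrated with a minimal sketch (ours, not the NuPIC implementation; the threshold value is an assumption for illustration):

# Minimal sketch (not the NuPIC implementation) of a dendritic segment acting as
# a coincidence detector: enough coactive synapses trigger an NMDA-spike-like
# event that puts the cell into the depolarized (predictive) state.

SEGMENT_THRESHOLD = 15  # assumed number of coactive synapses needed for an NMDA spike

def segment_detects_pattern(segment_synapses, active_cells):
    """A segment fires when its overlap with the currently active cells crosses threshold."""
    return len(segment_synapses & active_cells) >= SEGMENT_THRESHOLD

def cell_is_depolarized(distal_segments, active_cells):
    """A cell is depolarized (predictive) if ANY of its distal segments recognizes the pattern."""
    return any(segment_detects_pattern(seg, active_cells) for seg in distal_segments)

segment = set(range(20))    # synapses onto presynaptic cells 0..19
active = set(range(16))     # 16 of those cells are currently active
print(cell_is_depolarized([segment], active))  # True: 16 >= 15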

Our code is open source
We believe in open research and full transparency. Numenta's research and algorithm code is part of the open-source project Numenta Platform for Intelligent Computing (NuPIC). As a fast-growing project, NuPIC currently has more than 4,000 followers and more than 1,000 forks on GitHub.

What is the neural mechanism for sequence learning?

HTM Sequence Memory:
1. Neurons learn to recognize hundreds of patterns using active dendrites.

2. Recognition of a pattern acts as a prediction by depolarizing the cell without generating an immediate action potential.

3. A network of neurons with active dendrites forms a powerful sequence memory.


Sequence recognition · Sequence prediction · Behavior generation

Figure: network activity for the sequences A→B→C→D and X→B→C→Y, before and after learning. Before learning, every cell in an active column fires; after learning the same columns are active, but only one cell per column fires, giving distinct representations (A, B', C', D' versus X, B'', C'', Y''). Legend: active cells, depolarized (predictive) cells, inactive cells; time runs left to right.

3. HTM network model for sequence learning

A sparse set of columns becomes active due to intercolumn inhibition.

Stream structure: high-order sequence, noise, high-order sequence, noise, ...

Task: sequence prediction with streams of high-order sequences

Check prediction at the end of each sequence

Learning complex high-order sequences

ABCD vs XBCY

Figure: prediction accuracy (0–1.0) versus number of elements seen (0–25,000) for HTM and for LSTM with window sizes 1000, 3000, and 9000.

Example sequences:
G, I, H, E, C, D, A
B, I, H, E, C, D, F
A, J, H, I, F, D, E, B
C, J, H, I, F, D, E, G
...

Modified sequences:
G, I, H, E, C, D, F
B, I, H, E, C, D, A
A, J, H, I, F, D, E, G
C, J, H, I, F, D, E, B
...
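A rough reconstruction of this evaluation stream (our sketch, not the exact benchmark code; the noise alphabet and noise length are assumptions):

import random

# High-order sequences that share long subsequences; the correct ending depends
# on the first element, so the model must retain long-range context.
SEQUENCES = [
    ["G", "I", "H", "E", "C", "D", "A"],
    ["B", "I", "H", "E", "C", "D", "F"],
    ["A", "J", "H", "I", "F", "D", "E", "B"],
    ["C", "J", "H", "I", "F", "D", "E", "G"],
]
NOISE_SYMBOLS = [str(i) for i in range(10)]  # assumed noise alphabet

def sequence_stream(n_sequences, noise_len=5):
    """Yield a continuous stream: sequence, noise, sequence, noise, ..."""
    for _ in range(n_sequences):
        for element in random.choice(SEQUENCES):
            yield element
        for _ in range(noise_len):
            yield random.choice(NOISE_SYMBOLS)

# Prediction is checked only at the end of each sequence: after "... C, D" the
# model must predict "A" or "F" depending on whether the sequence began with G or B.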

HTM learns continuously, no batch training required. HTM is more robust and recovers more quickly.

High fault tolerance to neuron death

Figure: accuracy after cell death versus fraction of cells killed (0–0.6), for HTM and LSTM.

HTM is fault tolerant due to the properties of sparse distributed representations (Kanerva, 1988) and the nonlinear dendritic properties of HTM neurons (Hawkins & Ahmad, 2015).
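The robustness argument can be illustrated with a small sketch (parameters are illustrative, not taken from the poster): killing a random fraction of cells removes only a proportional fraction of the bits in a sparse distributed representation, so recognition degrades gradually rather than failing outright.

import random

N_CELLS = 2048
N_ACTIVE = 40            # ~2% sparsity, typical for SDRs
MATCH_THRESHOLD = 20     # assumed number of surviving bits needed to recognize the SDR

sdr = set(random.sample(range(N_CELLS), N_ACTIVE))

for death_fraction in (0.0, 0.3, 0.6):
    dead = set(random.sample(range(N_CELLS), int(death_fraction * N_CELLS)))
    surviving = sdr - dead
    print(death_fraction, len(surviving), len(surviving) >= MATCH_THRESHOLD)
# On average a fraction (1 - death_fraction) of the 40 active bits survives, so
# accuracy falls off gradually with cell death instead of collapsing.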

Task: predict taxi passenger count in NYC
Data sources: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Figure: taxi passenger count per time step over one week (Monday 2015-04-20 through Sunday 2015-04-26), ranging from 0k to 25k passengers.

Figure: prediction error on the taxi stream, measured as mean absolute percent error and negative log-likelihood, for Shift, ARIMA, ESN, LSTM1000, LSTM3000, LSTM6000, and HTM.

Figure: prediction error over time (Apr 01 – May 06) for LSTM3000, LSTM6000, and HTM.

Figure: mean absolute percent error and negative log-likelihood for LSTM1000, LSTM3000, LSTM6000, and HTM.

HTM has comparable performance to state-of-the-art algorithms

20% increase of weekday night traffic (9pm-11pm)

20% decrease of weekday morning traffic (7am-11am)

HTM quickly adapts to changes due to its ability to learn continuously

(Major, Larkum, and Schiller, 2013)

Activation rules:
· Select the top 2% of columns with the strongest input on their proximal dendrites as active columns.
· If any cell in an active column is predicted, only the predicted cells fire.
· If no cell in an active column is predicted, all cells in the column fire.

Unsupervised Hebbian-like learning rules:
· If a depolarized cell subsequently becomes active, its active dendritic segment is reinforced.
· If a depolarized cell does not become active, a small decay is applied to the active segments of that cell.
· If no cell in an active column is predicted, the cell with the most activated segment is reinforced.
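The activation rules above can be condensed into a short sketch (illustrative only; the names and parameters are ours, and the full temporal memory implementation is in NuPIC):

import numpy as np

def compute_active_columns(proximal_overlaps, sparsity=0.02):
    """Intercolumn inhibition: keep the top 2% of columns by proximal-dendrite input."""
    n_winners = max(1, int(sparsity * len(proximal_overlaps)))
    return set(np.argsort(proximal_overlaps)[-n_winners:])

def compute_active_cells(active_columns, predictive_cells, cells_per_column=32):
    """Predicted cells fire alone; columns with no predicted cell burst (all cells fire)."""
    active_cells = set()
    for col in active_columns:
        column_cells = {(col, i) for i in range(cells_per_column)}
        predicted = column_cells & predictive_cells
        active_cells |= predicted if predicted else column_cells  # burst on surprise
    return active_cells

# Learning (sketch of the Hebbian-like rules above): segments on cells that were
# depolarized and then became active are reinforced; segments on cells whose
# prediction did not come true are slightly decayed; in a bursting column, the
# cell with the best-matching segment is reinforced so it predicts this context next time.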

6. Testable predictions

Apical inputs predict entire sequences

Figure: feedback to the apical dendrites biases the network toward the sequence B' C' D'. Feedforward input C then activates representation C' (it matches the expectation), whereas feedforward input Y does not match the expectation.

It has been speculated that feedback connections implement expectation or bias (Lamme et al., 1998). Our neuron model suggests a mechanism for top-down expectation in the cortex.

Feedback to the apical dendrites can predict multiple elements simultaneously. New feedforward input will be interpreted as part of the predicted sequence.
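A minimal sketch (our illustration, not the poster's implementation) of how apical feedback could produce this behavior:

def apical_predictions(expected_sequence_reps):
    """Top-down feedback depolarizes the union of the cell sets for every element
    of the expected sequence (B', C', D', ...), so all of them are predictive at once."""
    predicted = set()
    for representation in expected_sequence_reps:
        predicted |= representation
    return predicted

# A feedforward input matching any predicted element (e.g. C) activates only the
# cells of its contextual representation (C'); an unexpected input (Y) finds no
# predicted cells and causes column bursting, signaling a violated expectation.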

1) Sparser activations during a predictable sensory stream (Vinje & Gallant, 2002).
2) Unanticipated inputs lead to a burst of activity that is correlated vertically within mini-columns.
3) Neighboring mini-columns will not be correlated.
4) Predicted cells need fast inhibition to inhibit nearby cells within a mini-column.
5) For predictable stimuli, dendritic NMDA spikes will be much more frequent than somatic action potentials (Smith et al., 2013).
6) Strong LTP in distal dendrites requires both a back-propagating action potential (bAP) and an NMDA spike (Losonczy et al., 2008).
7) Weak LTP (in the absence of NMDA spikes) in dendritic segments if a cluster of synapses becomes active followed by a bAP.
8) Localized weak LTD when an NMDA spike is not followed by a bAP.

· Unsupervised learning
· Quickly adapts to changes in data
· Learns high-order structure in sequences
· Robust and fault tolerant
· Makes multiple simultaneous predictions
· Works well on real-world problems
· Accurate biological model

2. HTM neuron models active dendrites

Learning and activation rules

Before learning (unexpected input) vs. after learning (predicted input)

Two separate sparse representations

Only one cell active per column, due to intracolumn inhibition.

HTM exhibits many desirable features for sequence learning:

Active Dendrites

Pattern detectors: each cell can recognize 100's of unique patterns.

Feedforward: 100's of synapses form the "classic" receptive field.
Context: 1,000's of synapses depolarize the neuron into the "predicted state".

Pyramidal neuron HTM neuron

Each segment on a distal dendrite can recognize a particular pattern and activate the dendrite (Poirazi et al., 2003; Palmer et al., 2014).

The proximal dendrite can recognize a particular pattern and activate the cell.

Switch to modified dataset

Figure: synapse permanence is a scalar in [0.0, 1.0]; permanences at or above the 0.3 threshold map to a synapse weight of 1, and those below map to 0.

Learning with synaptogenesis: new synapses are grown between a dendrite and nearby axons (Chklovskii, Mel, and Svoboda, 2004; Holtmaat and Svoboda, 2009).
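A sketch of the permanence rule in the figure above (the increment and decrement values are assumptions; the 0.3 connection threshold is taken from the figure):

CONNECTED_THRESHOLD = 0.3                      # permanence at or above this counts as connected
PERMANENCE_INC, PERMANENCE_DEC = 0.05, 0.02    # assumed Hebbian increments

def synapse_weight(permanence):
    """Binary weight: a potential synapse is either connected (1) or not (0)."""
    return 1 if permanence >= CONNECTED_THRESHOLD else 0

def update_permanence(permanence, presynaptic_was_active):
    """Hebbian-like update: growing a permanence past the threshold effectively
    creates a synapse (synaptogenesis); decaying it below the threshold removes it."""
    delta = PERMANENCE_INC if presynaptic_was_active else -PERMANENCE_DEC
    return min(1.0, max(0.0, permanence + delta))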

5. Summary

Further implementation details can be found at https://github.com/numenta/nupic.

Injected change to the data

In contrast, LSTM and most other artificial neural networks are sensitive to the loss of neurons or synapses (Piuri, 2001).

February 2016