Reservoir Computing Methods: Basics and Recent Advances
Claudio Gallicchio
Contact Information
Claudio Gallicchio, Ph.D., Assistant Professor
Computational Intelligence and Machine Learning Group, http://www.di.unipi.it/groups/ciml/
Dipartimento di Informatica - Universita' di Pisa
Largo Bruno Pontecorvo 3, 56127 Pisa, Italy (Polo Fibonacci)
Research on Reservoir Computing
Chair of the IEEE Task Force on RC, https://sites.google.com/view/reservoir-computing-tf/home
web: www.di.unipi.it/~gallicch
email: [email protected]
tel.: +39 050 2213145
Dynamical Recurrent Models
} Neural network architectures with feedback connections are able to deal with temporal data in a natural fashion
} Computation is based on dynamical systems
[figure: a recurrent architecture with a dynamical component and a feed-forward component]
Recurrent Neural Networks (RNNs)
} Feedbacks allow the representation of the temporal context in the state (neural memory)
} Discrete-time non-autonomous dynamical system
} Potentially the input history can be maintained for arbitrary periods of time
} Theoretically very powerful
} Universal approximation through learning
Learning with RNNs (recap)
} Universal approximation of RNNs (e.g. SRN, NARX) through learning
} Training algorithms involve some downsides that you already know
} Relatively high computational training costs and potentially slow convergence
} Local minima of the error function (which is generally non-convex)
} Vanishing of the gradients and the problem of learning long-term dependencies
} Alleviated by gated recurrent architectures (although training is made quite complex in this case)
Dynamical Recurrent Networks, Trained Easily
Question:
} Is it possible to train RNN architectures more efficiently?
} We can shift the focus from training algorithms to the study of initialization conditions and stability of the input-driven system
} To ensure stability of the dynamical part we must impose a contractive property on the system dynamics
Liquid State Machines
} W. Maass, T. Natschlaeger, H. Markram (2002)
} Originated from the study of biologically inspired spiking neurons
} The liquid should satisfy a pointwise separation property
} Dynamics provided by a pool of spiking neurons with bio-inspired architectures (e.g. integrate-and-fire, Izhikevich models)

W. Maass, T. Natschlaeger, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation 14(11), 2531-2560 (2002)
Fractal Prediction Machines
} P. Tino, G. Dorffner (2001)
} Contractive Iterated Function Systems
} Fractal Analysis

Tino, P., Dorffner, G.: Predicting the future of discrete sequences from fractal representations of the past. Machine Learning 45 (2001) 187-218
Echo State Networks
} H. Jaeger (2001)
} Control the spectral properties of the recurrence matrix
} Echo State Property

Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology (2001)
Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304 (2004) 78-80

[figure labels: target signal $\boldsymbol{d}(t)$; error $\boldsymbol{d}(t) - \boldsymbol{y}(t)$]
Reservoir Computing
} Reservoir: untrained non-linear recurrent hidden layer
} Readout: (linear) output layer
} Initialize $\mathbf{W}_{in}$ and $\mathbf{W}_h$ randomly
} Scale $\mathbf{W}_h$ to meet the contractive/stability property
} Drive the network with the input signal
} Discard an initial transient
} Train the readout

$\mathbf{x}(t) = \tanh(\mathbf{W}_{in}\,\boldsymbol{u}(t) + \mathbf{W}_h\,\mathbf{x}(t-1))$
$\mathbf{y}(t) = \mathbf{W}_{out}\,\mathbf{x}(t)$

Verstraeten, David, et al. "An experimental unification of reservoir computing methods." Neural Networks 20.3 (2007): 391-403.
Echo State Networks
Echo State Network: Architecture
[figure: input space, reservoir state space, output space]
} Reservoir: untrained, large, sparsely connected, non-linear layer
} Readout: trained, linear layer
Echo State Network: Architecture
[figure: input space, reservoir state space, output space]
Reservoir
} Non-linearly embeds the input into a higher dimensional feature space where the original problem is more likely to be solved linearly (Cover's Th.)
} Randomized basis expansion computed by a pool of randomized filters
} Provides a "rich" set of input-driven dynamics
} Contextualizes each new input given the previous state: memory
} Efficient: the recurrent part is left completely untrained
Echo State Network: Architecture
[figure: input space, reservoir state space, output space]
Readout
} Computes the output from the features in the reservoir state space
} Typically implemented by using linear models
Reservoir: State Computation
} The reservoir layer implements the state transition function of the dynamical system
} It is also useful to consider the iterated version of the state transition function
} the reservoir state after the presentation of an entire input sequence
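Written out (notation assumed here, consistent with the state equation used throughout the slides), for an input sequence $\boldsymbol{s} = [\boldsymbol{u}(1), \ldots, \boldsymbol{u}(N_s)]$ and initial state $\mathbf{x}_0$ the iterated map reads:
$\hat{F}(\boldsymbol{s}, \mathbf{x}_0) = F\big(\boldsymbol{u}(N_s), F(\boldsymbol{u}(N_s-1), \ldots, F(\boldsymbol{u}(1), \mathbf{x}_0)\ldots)\big)$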
Echo State Property (ESP)
A valid ESN should satisfy the "Echo State Property" (ESP)
} Def. An ESN satisfies the ESP whenever:
} The state of the network asymptotically depends only on the driving input signal
} Dependencies on the initial conditions are progressively lost
} Equivalent definitions: state contractivity, state forgetting and input forgetting
Conditions for the ESP
The ESP can be inspected by controlling the algebraic properties of the recurrent weight matrix $\mathbf{W}_h$
} Theorem. If the maximum singular value of $\mathbf{W}_h$ is less than 1, then the ESN satisfies the ESP.
} Sufficient condition for the ESP (contractive dynamics for every input)
} Theorem. If the spectral radius of $\mathbf{W}_h$ is greater than 1, then (under mild assumptions) the ESN does not satisfy the ESP.
} Necessary condition for the ESP (stable dynamics)
} recall: the spectral radius is the maximum among the absolute values of the eigenvalues
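A minimal numerical check of the two conditions (MATLAB-style sketch, assuming a square recurrent matrix Wh as in the algorithmic description below):

    sigma_max = max(svd(Wh));       % largest singular value
    rho       = max(abs(eig(Wh)));  % spectral radius
    if sigma_max < 1
        disp('sufficient condition holds: ESP guaranteed');
    elseif rho < 1
        disp('necessary condition holds: ESP not excluded');
    else
        disp('rho >= 1: ESP violated (under mild assumptions)');
    end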
ESN Initialization: How to Set Up the Reservoir
} Elements in $\mathbf{W}_{in}$ are selected randomly in $[-scale_{in}, scale_{in}]$
} $\mathbf{W}_h$ initialization procedure:
} Start with a randomly generated matrix $\mathbf{W}_{rand}$
} Scale $\mathbf{W}_{rand}$ to meet the condition for the ESP (usually: the necessary one)
ESN Training
} Run the network on the whole input sequence and collect the reservoir states into a matrix $\mathbf{X}$
} Discard an initial transient (washout)
} Solve the least squares problem defined by $\min \|\mathbf{W}_{out}\mathbf{X} - \mathbf{Y}_{target}\|_2^2$
Training the Readout
} On-line training is not the standard choice for ESNs
} Least Mean Squares is typically not suitable
} High eigenvalue spread (i.e. large condition number) of $\mathbf{X}$
} Recursive Least Squares is more suitable
} Off-line training is standard in most applications
} Closed form solution of the least squares problem by direct methods
} Moore-Penrose pseudo-inversion: $\mathbf{W}_{out} = \mathbf{Y}_{target}\,\mathbf{X}^{+}$
} Possible regularization using random noise in the states
} Ridge-regression: $\mathbf{W}_{out} = \mathbf{Y}_{target}\mathbf{X}^{T}(\mathbf{X}\mathbf{X}^{T} + \lambda_r \mathbf{I})^{-1}$
} $\lambda_r$ is a regularization coefficient (the higher, the more the readout is regularized)
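A minimal sketch of both direct solvers (MATLAB-style; X collects the reservoir states column-wise and Ytarget the corresponding targets, as in the algorithmic description below):

    % Moore-Penrose pseudo-inversion
    Wout = Ytarget * pinv(X);
    % Ridge regression; the slash solver is numerically preferable to inv()
    Wout = (Ytarget * X') / (X*X' + lambda_r * eye(size(X,1)));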
Training the Readout /2
} Multiple readouts for the same reservoir
} Solving more than 1 task with the same reservoir dynamics
} Other choices for the readout:
} Multi-layer Perceptron
} Support Vector Machine
} K-Nearest Neighbor
} ...
ESN Algorithmic Description: Training
} Initialization
    Win = 2*rand(Nr,Nu) - 1;  Win = scale_in * Win;                 % input weights in [-scale_in, scale_in]
    Wh = 2*rand(Nr,Nr) - 1;   Wh = rho * (Wh / max(abs(eig(Wh))));  % rescale to spectral radius rho
    state = zeros(Nr,1);  X = [];                                   % X collects the states column-wise
} Run the reservoir on the input stream
    for t = 1:trainingSteps
        state = tanh(Win * u(:,t) + Wh * state);   % u is the Nu-by-T input stream
        X(:,end+1) = state;
    end
} Discard the washout
    X = X(:,Nwashout+1:end);
} Train the readout (ridge regression, in closed form)
    Wout = Ytarget(:,Nwashout+1:end) * X' * inv(X*X' + lambda_r*eye(Nr));
} The ESN is now ready for operation (estimations/predictions)
ESN Algorithmic Description: Operation Phase
} Run the reservoir on the input stream (test part)
    for t = 1:testSteps
        state = tanh(Win * u(:,t) + Wh * state);
        output(:,end+1) = Wout * state;
    end
} Note: when working on a single time-series you do not need to
} re-initialize the state
} discard the initial transient
ESN Hyper-parameterization & Model Selection
Implement ESNs following good practice for model selection (like for any other ML/NN model)
} Careful selection of the network's hyper-parameters
} reservoir dimension
} spectral radius
} input scaling
} readout regularization
} ...
ESN Major Architectural Variants
} direct connections from the input to the readout
} feedback connections from the output to the reservoir
} might affect the stability of the network's dynamics
} small values are typically used
ESN for Sequence-to-Element Tasks
} The learning problem requires one single output for each input sequence
} Granularity of the task is on entire sequences (not on time-steps)
} example: sequence classification

[figure: an input sequence $\boldsymbol{s} = [\boldsymbol{u}(1), \ldots, \boldsymbol{u}(N_s)]$ drives the states $\boldsymbol{x}(1), \ldots, \boldsymbol{x}(N_s)$, from which a single mapped state $\boldsymbol{x}(s)$ and output $\boldsymbol{y}(s)$ are computed]

} Last state: $\boldsymbol{x}(s) = \boldsymbol{x}(N_s)$
} Mean state: $\boldsymbol{x}(s) = \frac{1}{N_s}\sum_{t=1}^{N_s} \boldsymbol{x}(t)$
} Sum state: $\boldsymbol{x}(s) = \sum_{t=1}^{N_s} \boldsymbol{x}(t)$
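A minimal sketch of the three mappings (MATLAB-style; assuming X is the Nr-by-Ns matrix of states collected for one input sequence, consistent with the training code above):

    xLast = X(:,end);      % last state mapping
    xMean = mean(X, 2);    % mean state mapping
    xSum  = sum(X, 2);     % sum state mapping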
Leaky Integrator ESN (LI-ESN)
} Use leaky integrator reservoir units
} Apply an exponential moving average to reservoir states
} low-pass filter to better handle input signals that change slowly with respect to the sampling frequency
} the leaking rate parameter $a \in (0,1]$
} controls the speed of reservoir dynamics in reaction to the input
} smaller values imply reservoirs that react more slowly to input changes
} if $a = 1$ then standard ESN dynamics are obtained (see the sketch below)
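As a sketch, with one common formulation of the leaky-integrator update (the exact variant used in the slides is not shown here), the state equation becomes:
$\mathbf{x}(t) = (1-a)\,\mathbf{x}(t-1) + a\tanh\big(\mathbf{W}_{in}\boldsymbol{u}(t) + \mathbf{W}_h\,\mathbf{x}(t-1)\big)$
In the MATLAB-style notation of the training code above:

    state = (1-a)*state + a*tanh(Win*u(:,t) + Wh*state);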
Examples of Applications
Applications of ESNs: Examples /1
} ESNs for modeling chaotic time series
} Mackey-Glass time series
} for $\alpha > 16.8$ the system has a chaotic attractor
} most used values are 17 and 30
[figure: ESN performance on the MG17 task, for varying contraction coefficient and reservoir dimension]
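For reference, the Mackey-Glass system with the usual parameter choices (a standard formulation, added here for completeness; $\alpha$ denotes the delay):
$\dot{x}(t) = \frac{0.2\, x(t-\alpha)}{1 + x(t-\alpha)^{10}} - 0.1\, x(t)$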
Applications of ESNs: Examples /2
} Forecasting of indoor user movements
} Generalization of predictive performance to unseen environments
} Deployed WSN: 4 fixed sensors (anchors) & 1 sensor worn by the user (mobile)
} Predict if the user will change room when she is in position M
} Input: received signal strength (RSS) data from the 4 anchors (10-dimensional vector for each time step, noisy data)
} Target: binary classification (change environmental context or not)

Dataset is available online on the UCI repository:
https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data
http://fp7rubicon.eu/
Applications of ESNs: Examples /2
} Forecasting of indoor user movements: input data
[figure: example of the RSS traces gathered from all the 4 anchors in the WSN, for different possible movement paths]
Applications of ESNs: Examples /3
} Human Activity Recognition (HAR) and Localization
} Input from heterogeneous sensor sources (data fusion)
} Predicting event occurrence and confidence
} High accuracy of event recognition / indoor localization: >90% on test data
} Effectiveness in learning a variety of HAR tasks
} Effectiveness in training on new events
Applications of ESNs: Examples /4
} Robotics
} Indoor localization estimation in a critical environment (Stella Maris Hospital)
} Precise robot localization estimation using noisy RSSI data (35 cm)
} Recalibration in case of environmental alterations or sensor malfunctions
} Input: temporal sequences of RSSI values (10-dimensional vector for each time step, noisy data)
} Target: temporal sequences of laser-based localization (x,y)
Applications of ESNs: Examples /7
} Human Activity Recognition
} Classification of human daily activities from RSS data generated by sensors worn by the user
} Input: temporal sequences of RSS values (6-dimensional vector for each time step, noisy data)
} Target: classification of human activity (bending, cycling, lying, sitting, standing, walking)
} Extremely good accuracy (≈0.99) and F1 score (≈0.96)
} 2nd Prize at the 2013 EvAAL International Competition

Dataset is available online on the UCI repository:
http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29
Applications of ESNs: Examples /8
} Autonomous Balance Assessment
} An unobtrusive automatic system for balance assessment in the elderly
} Berg Balance Scale (BBS) test: 14 exercises/items (~30 min.)
} Input: stream of pressure data gathered from the 4 corners of the Nintendo Wii Balance Board during the execution of just 1 (over the 14) BBS exercises
} Target: global BBS score of the user (0-56)
} The use of RNNs makes it possible to automatically exploit the richness of the signal dynamics
Applications of ESNs: Examples /8
} Autonomous Balance Assessment
} Excellent prediction performance using LI-ESNs
} Practical example of how performance can be improved in a real-world case
} By an appropriate design of the task, e.g. inclusion of clinical parameters in input
} By appropriate choices for the network design, e.g. by using a weight sharing approach on the input-to-reservoir connections

LI-ESN model         | Test MAE (BBS points) | Test R
standard             | 4.80 ± 0.40           | 0.68
+ weight             | 4.62 ± 0.30           | 0.69
weight sharing (ws)  | 4.03 ± 0.13           | 0.71
ws + weight          | 3.80 ± 0.17           | 0.76

} Very good comparison
} with related models (MLPs, TDNN, RNNs, NARX, ...)
} with literature approaches
Applications of ESNs: Examples /9
} Phone recognition with reservoir networks
} 2-layered ad-hoc reservoir architecture
} layers focus on different ranges of frequencies (using appropriate leaky parameters) and on different sub-problems

Triefenbach, Fabian, et al. "Phoneme recognition with large hierarchical reservoirs." Advances in Neural Information Processing Systems. 2010.
Triefenbach, Fabian, et al. "Acoustic modeling with hierarchical reservoirs." IEEE Transactions on Audio, Speech, and Language Processing 21.11 (2013): 2439-2450.
Extensions to Structured Data
Learning in Structured Domains
[figure: example tasks on sequences (MG chaotic time series prediction), trees (QSPR analysis of alkanes, target: boiling point), and graphs (Predictive Toxicology Challenge, targets in {-1, +1})]
} In many real-world application domains the information of interest can be naturally represented by the means of structured data representations
} The problems of interest can be modeled as regression or classification tasks on structured domains
Learning in Structured Domains
} Recursive Neural Networks extend the applicability of RNN methodologies to learning in domains of trees and graphs
} Randomized approaches enable efficient training and state-of-the-art performance
} Echo State Networks extended to discrete structures: Tree and Graph Echo State Networks
} Basic idea: the reservoir is applied to each node/vertex of the input structure

[C. Gallicchio, A. Micheli, Proceedings of IJCNN 2010] [C. Gallicchio, A. Micheli, Neurocomputing, 2013]
Learning in Structured Domains
} The reservoir operation is generalized from temporal sequences to discrete structures
} State transition systems on discrete tree/graph structures
[figure: standard ESN vs. Tree ESN vs. Graph ESN]
TreeESN: Reservoir
} Large, sparsely connected, untrained layer of non-linear recursive units
} Input-driven dynamical system on discrete tree structures
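As a sketch of the node-wise state transition (reconstructed from the cited TreeESN papers; $ch_i(n)$ denotes the $i$-th child of node $n$, $\boldsymbol{u}(n)$ its label, and $\hat{\mathbf{W}}$ the recurrent weight matrix shared across children):
$\mathbf{x}(n) = \tanh\Big(\mathbf{W}_{in}\boldsymbol{u}(n) + \sum_{i=1}^{k} \hat{\mathbf{W}}\,\mathbf{x}(ch_i(n))\Big)$
Missing children contribute a null state, and the states are computed bottom-up from the leaves to the root.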
TreeESN: State Mapping and Readout
} State mapping function for tree-to-element tasks
} one single output vector (unstructured) is required for each input tree
} example: document classification
} e.g. root state mapping (state of the root node) or mean state mapping (mean over all node states)
} Readout computation and training are as in the case of standard Reservoir Computing approaches
} tree-to-tree (isomorphic transductions)
} tree-to-element
TreeESN: Echo State Property
} The recursive reservoir dynamics can be left untrained provided that a stability property is satisfied
} Tree ESP: asymptotic stability conditions on tree structures
} for any two initial states, the state computed for the root of the input tree should converge for increasing height of the tree
} the influence of a perturbation in the label of a node will progressively fade away
} Sufficient condition for the ESP for trees: contractive dynamics (with a contraction bound involving the maximum degree)
} Necessary condition: stable dynamics

[C. Gallicchio, A. Micheli, Information Sciences, 2019]
TreeESN: Efficiency
Computational complexity: extremely efficient RC approach, only the linear readout parameters are trained.
Encoding process (for each tree t):
• Scales linearly with the number of nodes and the reservoir dimension (the cost per node depends on the maximum degree, the number of reservoir units and their degree of connectivity)
• The same cost for training and test
• Compares well with state-of-the-art methods for trees:
  • RecNNs: extra cost (time + memory) for gradient computations
  • Kernel methods: higher cost of encoding (e.g. quadratic in PT kernels)
Output computation:
• Depends on the method used (e.g. direct using SVD, or iterative)
• The cost of training the linear TreeESN readout is generally inferior to the cost of training MLPs or SVMs (used in RecNNs and kernels)
Reservoir in Graph Echo State Networks
} Input-driven dynamical system on discrete graphs
} The same reservoir architecture is applied to all the vertices in the structure
} Stability of the state update ensures a solution even in case of cyclic dependencies among state variables
[figure: standard ESN unfolding on a sequence vs. GraphESN unfolding on the vertices of a graph]
GraphESN: Echo State Property
} A stability constraint is required to achieve usable dynamics
} Foundational idea: resort to contractive dynamics
} Contractive dynamics ensure the convergence of the encoding process (Banach Th.)
} Enables an iterative computation of the encoding process
} Sufficient condition and necessary condition analogous to the sequence case
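A minimal sketch of the iterative encoding (MATLAB-style; the variables here are hypothetical: U holds the vertex labels column-wise, A is the adjacency matrix of the graph; contractivity guarantees convergence of the fixed-point iteration by Banach's theorem):

    X = zeros(Nr, Nv);                  % one state column per vertex
    for it = 1:maxIter
        Xold = X;
        X = tanh(Win*U + Wh*(X*A));     % each vertex aggregates its neighbors' states
        if norm(X - Xold, 'fro') < tol  % stop at convergence
            break;
        end
    end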
Research on Reservoir Computing (in our group)
} Applications to complex real-world tasks
} NLP, earthquake time-series, human monitoring, ...
} Gated Reservoir Computing models
} RC-based analysis of fully trained RNNs
} Unsupervised adaptation of reservoirs
} edge of stability/chaos
} Bayesian optimization
} Deep Echo State Networks
} Advanced mathematical analysis
} Architectural construction of hierarchical RC
} Deep RC for Structures
Echo State Network Dynamics
Echo State Property
} Assumption: input and state spaces are compact sets
} A reservoir network whose state update is ruled by $\mathbf{x}(t) = F(\boldsymbol{u}(t), \mathbf{x}(t-1))$ satisfies the ESP if initial conditions are asymptotically forgotten
} The state dynamics provide a pool of "echoes" of the driving input
} Essentially, this is a stability condition (global asymptotic stability in the sense of Lyapunov)
Echo State Property: Stability
} Why is a stable regime so important?
} An unstable network exhibits sensitivity to input perturbations
} Two slightly different (long) input sequences drive the network into (asymptotically very) different states
} Good for training
} The state vectors tend to be more and more linearly separable (for any given task)
} Bad for generalization: overfitting!
} No generalization ability if a temporal sequence similar to one in the training set drives the network into completely different states
ESP: Sufficient Condition
} The sufficient condition for the ESP analyzes the case of contractive dynamics of the state transition function
} Whatever the driving input signal is: if the system is contractive then it will exhibit stability
} In what follows, we assume state transition functions of the form:
$\mathbf{x}(t) = F(\boldsymbol{u}(t), \mathbf{x}(t-1)) = \tanh(\mathbf{W}_{in}\boldsymbol{u}(t) + \mathbf{W}_h\,\mathbf{x}(t-1))$
(new state; input and input weight matrix; previous state and recurrent weight matrix)
Contractivity
The reservoir state transition function rules the evolution of the corresponding dynamical system
} Def. The reservoir has contractive dynamics whenever its state transition function F is Lipschitz continuous with constant C < 1, i.e. for every input $\boldsymbol{u}$ and states $\mathbf{x}, \mathbf{x}'$: $\|F(\boldsymbol{u},\mathbf{x}) - F(\boldsymbol{u},\mathbf{x}')\| \le C\,\|\mathbf{x} - \mathbf{x}'\|$
Contractivity and the ESP
} Theorem. If an ESN has a contractive state transition function F (and bounded state space), then it satisfies the Echo State Property
} Assumption: F is contractive with parameter C < 1 (contractivity)
} We want to show that the ESP holds true, i.e. for any input sequence $\boldsymbol{s}_n$ of length $n$ and any initial states $\mathbf{x}_0, \mathbf{x}_0'$: $\|\hat{F}(\boldsymbol{s}_n,\mathbf{x}_0) - \hat{F}(\boldsymbol{s}_n,\mathbf{x}_0')\| \to 0$ as $n \to \infty$ (ESP)
} Applying contractivity once per time step of the driving sequence:
$\|\hat{F}(\boldsymbol{s}_n,\mathbf{x}_0) - \hat{F}(\boldsymbol{s}_n,\mathbf{x}_0')\| \le C^{\,n}\,\|\mathbf{x}_0 - \mathbf{x}_0'\|$
which goes to 0 as n goes to infinity, hence the ESP holds
Contractivity and Reservoir Initialization
} If the reservoir is initialized to implement a contractive mapping then the ESP is guaranteed (in any norm, for any input)
} Formulation of a sufficient condition for the ESP
} Assumptions:
} Euclidean distance as metric in the state space (use L2-norm)
} Reservoir units with tanh activation function (note: squashing nonlinearities bound the state space)
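Under these assumptions the condition follows in one line, since tanh is 1-Lipschitz (a reconstruction of the standard argument):
$\|F(\boldsymbol{u},\mathbf{x}) - F(\boldsymbol{u},\mathbf{x}')\|_2 \le \|\mathbf{W}_h(\mathbf{x}-\mathbf{x}')\|_2 \le \sigma_{max}(\mathbf{W}_h)\,\|\mathbf{x}-\mathbf{x}'\|_2$
so $\sigma_{max}(\mathbf{W}_h) = \|\mathbf{W}_h\|_2 < 1$ suffices for contractivity, and hence for the ESP.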
Markovian Nature of State Space Organizations
} Contractive dynamical systems are related to suffix-based state space organizations
} States assumed in correspondence of different input sequences sharing a common suffix are close to each other proportionally to the length of the common suffix
} similar sequences are mapped to close states
} different sequences are mapped to different states
} similarities and dissimilarities are intended in a suffix-based fashion
} RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929
Markovian Nature of State Space Organizations
} Markovian architectural bias of RNNs
} recurrent weights are typically initialized with small values
} this leads to a typically contractive initialization of recurrent dynamics
} Iterated Function Systems, fractal theory, architectural bias of RNNs
} RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines
} This characterization is a bias for fully trained RNNs: it holds in the early stages of learning

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929
Markovianity and ESNs
} Using dynamical systems with contractive state transition functions (in any norm) implies the Echo State Property (for any input)
} ESNs are featured by fixed contractive dynamics
} Relations with the universality of RC for bounded memory computation (LSM theory)
} ESNs with untrained contractive reservoirs are already able to distinguish input sequences in a suffix-based fashion
} In the RC framework this is no longer a bias, it is a fixed characterization of the RNN model

Maass, W., Natschlager, T., Markram, H.: Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation 14 (2002) 2531-2560
Gallicchio, C., Micheli, A.: Architectural and Markovian Factors of Echo State Networks. Neural Networks 24(5) (2011) 440-456
Why do Echo State Networks work?
} Because they exploit the Markovian state space organization
} The reservoir constructs a high-dimensional Markovian state space representation of the input history
} Input sequences sharing a common suffix drive the system into close states
} The states are close to each other proportionally to the length of the common suffix
} A simple output (readout) tool can then be sufficient to separate the different cases

Gallicchio, C., Micheli, A.: Architectural and Markovian Factors of Echo State Networks. Neural Networks 24(5) (2011) 440-456
When do Echo State Networks work?
} When the target matches the Markovian assumption behind the reservoir state space organization
} Markovianity can be used to characterize easy/hard tasks for ESNs
Example: easy task
[figure: sequences with targets +1/-1 assigned consistently with their suffixes]
When do Echo State Networks work?
} When the target matches the Markovian assumption behind the reservoir state space organization
} Markovianity can be used to characterize easy/hard tasks for ESNs
Example: hard task
[figure: sequences with targets +1/-1 that are not determined by their suffixes]
ESP: Necessary Condition
} Investigating the stability of reservoir dynamics from a dynamical system perspective
} Theorem. If an ESN has unstable dynamics around the zero state and the zero sequence is an admissible input, then the ESP is not satisfied.
} Approach this study by linearizing the state transition function, using its Jacobian matrix
ESP: Necessary Condition
} Linearization around the zero state and for null input
} Remember: $\mathbf{x}(t) = \tanh(\mathbf{W}_{in}\boldsymbol{u}(t) + \mathbf{W}_h\,\mathbf{x}(t-1))$
} The Jacobian with tanh neurons is given by
$\mathbf{J}(\boldsymbol{u}(t), \mathbf{x}(t-1)) = \mathrm{diag}\big(1 - \tanh^2(\mathbf{W}_{in}\boldsymbol{u}(t) + \mathbf{W}_h\,\mathbf{x}(t-1))\big)\,\mathbf{W}_h$
} Under the null input assumption, at the zero state the Jacobian reduces to $\mathbf{J}(\mathbf{0},\mathbf{0}) = \mathbf{W}_h$
ESP: Necessary Condition
} The linearized system now reads: $\mathbf{x}(t) = \mathbf{W}_h\,\mathbf{x}(t-1)$
} 0 is a fixed point. Is it stable?
} Linear dynamical systems theory tells us that if $\rho(\mathbf{W}_h) < 1$ then the fixed point is stable
} Otherwise 0 is not stable: if we start from a state near 0 and we drive the network with an (infinite-length) null sequence, we do not end up in 0
} The null sequence is a counter-example: the ESP does not hold!
} There are at least two different orbits resulting from the same input sequence
ESP: Necessary Condition
} A sufficient condition (under our assumptions) for the absence of the ESP is that $\rho(\mathbf{W}_h) \ge 1$
} Hence, a necessary condition for the ESP is that $\rho(\mathbf{W}_h) < 1$
} In general: $\rho(\mathbf{W}_h) \le \|\mathbf{W}_h\|_2$
} Typically, in applications:
} The sufficient condition is often too restrictive
} The necessary condition for the ESP is often used to scale the recurrent weight matrix (e.g. to a value of the spectral radius of 0.9)
} However, note that scaling the spectral radius below 1 in presence of driving input is neither sufficient nor necessary for ensuring echo states
ESP Index: A Concrete Perspective
} Driving input is not properly taken into account in conventional reservoir initialization strategies
} Idea: calculate average deviations of reservoir state orbits from different initial conditions and under the influence of the same driving external input
} The set of configurations that satisfy the ESP in real cases is well beyond those commonly adopted in ESN practice
} A large portion of "good" reservoirs is usually neglected in common practice

C. Gallicchio, "Chasing the Echo State Property", ESANN 2019.
Advances on Echo State Networks
Research on Echo State Networks
} The research on ESNs follows two major complementary objectives:
} Study the intrinsic properties of RNNs, taking aside the aspects related to training of the recurrent connections
} Develop efficiently trained RNNs
} Advances...
} Theoretical analysis
} Quality of reservoir dynamics
} Architectural studies
} Deep Echo State Networks
} Reservoir Computing for Structures
Echo State Property: Advances
} Much of the theoretical advances in the study of ESNs aim to establish simple conditions for reservoir initialization
} Focus of the theoretical investigations is shifted to the study of stability constraints
} Analysis of the system stability given the input
} Non-autonomous dynamical systems
} Mean Field Theory
} Local Lyapunov exponents

G. Manjunath and H. Jaeger. Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Computation, 25(3):671-696, 2013.
M. Massar and S. Massar. Mean-field theory of echo state networks. Physical Review E, 87(4):042809, 2013.
G. Wainrib and M.N. Galtier. A local echo state property through the largest Lyapunov exponent. Neural Networks, 76:39-45, 2016.
I.B. Yildiz, H. Jaeger, and S.J. Kiebel. Re-visiting the echo state property. Neural Networks, 35:1-9, 2012.
Applications to Real-world Problems
} Successful applications in several real-world problems
} Chaotic time-series modeling
} Non-linear system identification
} Speech recognition
} Financial forecasting
} Bio-medical applications
} Robot localization & control
} ...
} High dimensional reservoirs are often needed to achieve excellent performance in complex real-world tasks
} Question: can we combine training efficiency with compact (i.e. small size) ESNs?
Quality of Reservoir Dynamics
} How to establish the quality of a reservoir?
} If we can find a suitable way to characterize how good a reservoir is, we can try to optimize it
} Entropy of recurrent units activations
} Unsupervised adaptation of reservoirs using Intrinsic Plasticity
} Study the short-term memory ability of the system
} Memory Capacity and relations to linearity
} Edge of stability/chaos: reservoir dynamics at the border of stability
} Recurrent systems close to instability show optimal performance whenever the task at hand requires long short-term memory
Short-term Memory Capacity
} An aspect of great importance in the study of dynamical systems is the analysis of their memory abilities
} Jaeger introduced a learning task, called Memory Capacity (MC), to quantify it
} Train individual output units to recall increasingly delayed versions of a univariate i.i.d. input signal
} MC is the sum of squared correlation coefficients between the delayed signals and the outputs

Jaeger, Herbert. Short term memory in echo state networks. Vol. 5. GMD-Forschungszentrum Informationstechnik, 2001.
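In formulas (the standard definition from the cited report, with $y_k(t)$ the output unit trained to recall $u(t-k)$):
$MC = \sum_{k=1}^{\infty} MC_k, \qquad MC_k = \frac{\mathrm{cov}^2\big(u(t-k),\, y_k(t)\big)}{\mathrm{var}\big(u(t)\big)\,\mathrm{var}\big(y_k(t)\big)}$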
Short-term Memory Capacity
} Forgetting curves to study the memory structure
} Plot the squared correlation (Y-axis, i.e. the determination coefficient) with respect to individual delays (X-axis)
[figure: forgetting curves for linear reservoir units vs. tanh reservoir units]
Short-term Memory Capacity
Some fundamental theoretical results:
} The MC of a network with N recurrent units is upper-bounded by N: MC ≤ N
} It is impossible to train an ESN on tasks which require unbounded-time memory
} Linear reservoirs can achieve the maximum bound
} e.g. sufficient condition: the matrix $[\mathbf{W}_h\mathbf{w}_{in}, \mathbf{W}_h^{2}\mathbf{w}_{in}, \ldots, \mathbf{W}_h^{N}\mathbf{w}_{in}]$ has full rank
} example: unitary recurrent matrices (i.e. orthogonal matrices in the real case)
} Memory versus non-linearity dilemma
} Linear reservoirs are featured by longer short-term memories, but non-linear reservoirs are required to solve complex real-world problems...
} Memory Capacity vs Predictive Capacity
Architectural Setup
} How to construct "better" reservoirs than just random reservoirs?
} Critical Echo State Networks: relevance of orthogonal recurrence weight matrices (e.g. permutation matrices) [M.A. Hajnal, A. Lorincz, 2006]
} Cyclic Reservoirs and Delay Line Reservoirs [A. Rodan, P. Tino, 2011] [T. Strauss et al., 2012] [J. Boedecker et al., 2009] [M. Cernansky, P. Tino 2008] [A. Rodan, P. Tino 2012]
Edge of Stability
} Reservoir dynamical regime close to the transition between stable and unstable dynamics
} E.g., study of Lyapunov exponents
} λ = 0 identifies the transition between (locally) stable (λ < 0) and unstable (λ > 0) dynamics

[J. Boedecker, O. Obst, J.T. Lizier, N.M. Mayer, and M. Asada. Information processing in echo state networks at the edge of chaos. Theory in Biosciences, 131(3):205-213, 2012.]
Deep Neural Networks
Deep Learning is an attractive research area
} Hierarchy of many non-linear models
} Ability to learn data representations at different (higher) levels of abstraction
} Deep Neural Networks (Deep NNs): feed-forward hierarchy of multiple hidden layers of non-linear units
} Impressive performance in real-world problems (especially in the cognitive area)
} Remember: deep learning has a strong biological plausibility
Deep Recurrent Neural Networks
Extension of the deep learning methodology to temporal processing. Aims at naturally capturing temporal feature representations at different time-scales
} Text, speech, language processing
} Multi-layered processing of temporal information with feedbacks has a strong biological plausibility
The analysis of deep RNNs is still young
} Deep Reservoir Computing: investigate the actual role of layering in deep recurrent architectures
} Stability: characterize the dynamics of hierarchically organized recurrent models
Deep Echo State Network
} What is the intrinsic role of layering in recurrent architectures?
} Develop novel efficient approaches to exploit
} Multiple time-scales representations
} Extreme efficiency of training (the stacked reservoirs are untrained)

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017
Deep RNN Architecture: The Role of Layering
Constraints to the architecture of a fully connected RNN, obtained by:
} Removing connections from input to higher layers
} Removing connections from higher layers to lower ones
} Removing connections to layers at levels higher than the next one (+1)
Fewer weights to store than a fully connected RNN
DeepESN: Output Computation
} The readout can modulate the (qualitatively different) temporal features developed at the different layers
DeepESN: Architecture and Dynamics
[figure: stacked reservoirs, first layer and l-th layer (l > 1)]
Each layer has its own:
- leaky integration constant
- input scaling
- spectral radius
- inter-layer scaling
The recurrent part of the system is hierarchically structured. Interestingly, this naturally entails a structure in the developed system dynamics.
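In formulas (reconstructed from the cited DeepESN papers; $a^{(l)}$ is the leaky constant of layer $l$, $\mathbf{W}^{(l)}$ the inter-layer weights, $\hat{\mathbf{W}}^{(l)}$ the recurrent weights of the layer):
$\mathbf{x}^{(1)}(t) = (1-a^{(1)})\,\mathbf{x}^{(1)}(t-1) + a^{(1)}\tanh\big(\mathbf{W}_{in}\boldsymbol{u}(t) + \hat{\mathbf{W}}^{(1)}\mathbf{x}^{(1)}(t-1)\big)$
$\mathbf{x}^{(l)}(t) = (1-a^{(l)})\,\mathbf{x}^{(l)}(t-1) + a^{(l)}\tanh\big(\mathbf{W}^{(l)}\mathbf{x}^{(l-1)}(t) + \hat{\mathbf{W}}^{(l)}\mathbf{x}^{(l)}(t-1)\big), \quad l > 1$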
Intrinsically Richer Dynamics
Layering in RNNs: a convenient way of architectural setup
} Multiple time-scales representations
} Richer dynamics closer to the edge of stability
} Longer short-term memory
DeepESN: Hierarchical Temporal Features
Structured representation of temporal data through the deep architecture
Empirical investigations:
} Effects of input perturbations last longer in higher layers
} Multiple time-scales representation
} Ordered along the network's hierarchy

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017
DeepESN: Hierarchical Temporal Features
Structured representation of temporal data through the deep architecture
Frequency analysis:
} Diversified magnitudes of FFT components
} Multiple frequency representation
} Ordered along the network's hierarchy
} Higher layers tend to focus on lower frequencies
[figure: FFT magnitudes at layers 1, 4, 7 and 10]

[C. Gallicchio, A. Micheli, L. Pedrelli, WIRN 2017]
DeepESN: Mathematical Background
Structured representation of temporal data through the deep architecture
Theoretical analysis:
} Higher layers intrinsically implement less contractive dynamics
} Echo State Property for DeepESNs
} Deeper networks naturally develop richer dynamics, closer to the edge of stability

[C. Gallicchio, A. Micheli. Cognitive Computation (2017).]
[C. Gallicchio, A. Micheli, L. Silvestri. Neurocomputing 2018.]
DeepESN: Performance in Applications
Formidable trade-off between performance and computational time

C. Gallicchio, A. Micheli, L. Pedrelli, "Comparison between DeepESNs and gated RNNs on multivariate time-series prediction", ESANN 2019.
Deep Tree Echo State Networks
} Untrained multi-layered recursive neural network

Gallicchio, Micheli, IJCNN, 2018.
Gallicchio, Micheli, Information Sciences, 2019.
Deep Tree Echo State Networks: Unfolding
Deep Tree Echo State Networks: Advantages
} Effective in applications
} Extremely efficient
} Layered recursive architecture compares favorably with the shallow case
Conclusions
} Reservoir Computing: paradigm for efficient modeling of RNNs
} Reservoir: non-linear dynamic component, untrained after contractive initialization
} Readout: linear feed-forward component, trained
} Easy to implement, fast to train
} Markovian flavour of reservoir state dynamics
} Successful applications
} Recent extensions toward:
} Deep Learning architectures
} Structured Domains
References
} Jaeger H. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. Tech. rep., GMD - German National Research Institute for Computer Science, 2001.
} Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78-80.
} Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440-456.
} Lukoševičius, Mantas, and Herbert Jaeger. "Reservoir computing approaches to recurrent neural network training." Computer Science Review 3.3 (2009): 127-149.
} Lukoševičius, Mantas. "A practical guide to applying echo state networks." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 659-686.
References
} Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
} Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Design of deep echo state networks." Neural Networks 108 (2018): 33-47.
} Gallicchio, Claudio, and Alessio Micheli. "Deep echo state network (DeepESN): A brief survey." arXiv preprint arXiv:1712.04323 (2017).
} Gallicchio, Claudio, and Alessio Micheli. "Deep Reservoir Neural Networks for Trees." Information Sciences 480 (2019): 174-193.
} Gallicchio, Claudio, and Alessio Micheli. "Graph echo state networks." The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 2010.