

Improving liquid state machines through iterative refinement of the reservoir

David Norton, Dan Ventura

Computer Science Department, Brigham Young University, Provo, Utah 84602, United States

Article info

Article history:
Received 17 December 2009
Received in revised form 23 July 2010
Accepted 8 August 2010
Communicated by I. Bojak
Available online 27 August 2010

Keywords:
Spiking neural network
Liquid state machine
Recurrent network


Abstract

Liquid state machines (LSMs) exploit the power of recurrent spiking neural networks (SNNs) without training the SNN. Instead, LSMs randomly generate this network and then use it as a filter for a generic machine learner. Previous research has shown that LSMs can yield competitive results; however, the process can require numerous time-consuming epochs before finding a viable filter. We have developed a method for iteratively refining these randomly generated networks so that the LSM will yield a more effective filter in fewer epochs than the traditional method. We define a new metric for evaluating the quality of a filter before calculating the accuracy of the LSM. The LSM then uses this metric to drive a novel algorithm founded on principles integral to both Hebbian and reinforcement learning. We compare this new method with traditional LSMs across two artificial pattern recognition problems and two simplified problems derived from the TIMIT dataset. Depending on the problem, our method demonstrates improvements in accuracy ranging from 15% to almost 600%.

1. Introduction

By transmitting a temporal pattern rather than a single quantity along synapses [1], spiking neural networks (SNNs) can process more information at each node than traditional artificial neural networks [2–5]. Furthermore, each node in an SNN has its own memory, adding to the computational potential of SNNs. Augmenting SNNs with recurrence allows the model to maintain a trace of past events that can aid in the analysis of later inputs. Despite these strengths, training recurrent SNNs is a problem without a satisfactory solution.

Liquid state machines (LSMs) are a model that can take advantage of recurrent SNNs without training the network. Instead, a randomly generated reservoir, or liquid, of spiking neurons is used as a filter for a simple learner, such as a perceptron. As a filter, the liquid has the potential to transform complex temporal patterns into more basic spatial patterns that are more amenable to machine learning. Unfortunately, randomly generated liquids often do not act as a useful filter, which requires researchers to generate many random liquids until a useful filter is found. Since there is no form of regression in this random creation process, there is no way of iterating upon the best liquid found so far. This results in an unsatisfactory termination criterion. Furthermore, in order to determine whether each filter is useful or not, a separate learner must be trained on the liquid and then evaluated.


We present an algorithm for training the randomly created liquid of an LSM, so that a useful filter can be found after fewer liquid generations than using a strictly random search. We show that this algorithm not only decreases the number of liquids that must be created to find a usable filter, it also evaluates each liquid without training a learner and allows for a satisfactory termination criterion for the search process. Since the LSM model is built with spiking neurons, our algorithm essentially trains a recurrent SNN.

The algorithm does not attempt to directly improve the accuracy of the system like most machine learners; rather, it attempts to improve the ability of the liquid to separate different classes of input into distinct patterns of behavior, a more general task. We define a metric to measure this ability to separate and aptly call it separation. Because separation drives the proposed algorithm, we call it separation driven synaptic modification (SDSM). In our results, SDSM yields LSMs with higher accuracy than traditional LSMs on all of the problems that we explore.

In order to motivate the subject of LSMs, we begin this paper with Section 2 by describing in detail the characteristics of recurrent SNNs and LSMs. In Section 3 we define separation and provide motivation for its use in driving a learning algorithm. In Section 4, we formally define SDSM. The next two sections show the implementation and results of several experiments comparing SDSM to traditional LSMs. For these experiments we choose two artificial problems and two simplified problems derived from the TIMIT dataset (commonly used in speech recognition) to evaluate SDSM. In addition to our results, Section 5 contains the description and motivation for the artificial problems we use. In Section 6 we describe the problems derived from the TIMIT dataset, how we convert speech data into spike trains for the LSM, and the results for those problems. In Section 7 we show some results demonstrating transfer learning occurring with SDSM. Finally, the conclusions and implications of our results are discussed in Section 8.

To facilitate reproduction of our results, we note that all of the LSMs used in this paper are created using CSIM [6,7].

Fig. 1. Outline of a liquid state machine: We transform the input signal (a) into a spike train (b) via some function. We then introduce the spike train into the recurrent SNN, or "liquid" (c). Next, the LSM takes snapshots of the state of the "liquid"; these are called state vectors (d). Finally, the LSM uses these state vectors as input to train a learning algorithm, the readout function (e).

2. Background

To appreciate the benefits of liquid state machines we first describe the strengths and weaknesses of recurrent spiking neural networks. We then explain, in detail, liquid state machines themselves.

2.1. Recurrent spiking neural networks

Spiking neural networks (SNNs) emulate biological neurons by transmitting signals in a sequence of spikes with constant amplitude. Information can be encoded in the varying frequencies of spikes produced by the neurons rather than being encoded only in the single rate value commonly used in non-spiking networks [1]. Thus, SNNs can, in theory, convey temporal information more accurately by maintaining non-uniform spiking frequencies. This allows SNNs to have greater computational power for problems in which timing is important [2,3], including such problems as speech recognition, biometrics, and robot control. Even in problems where timing is not an explicit factor, SNNs achieve competitive results with fewer neurons than traditional artificial neural networks [4,5].

Recurrent neural networks allow cycles to occur within the network. This property theoretically improves neural networks by allowing neural activity at one point in time to affect neural activity at a later time. In other words, context is incorporated indirectly into the network, endowing the network with the capability to solve timing-critical problems.

Despite the aforementioned qualities of recurrent SNNs, they suffer from a distinct lack of satisfactory training algorithms. Concerning the recurrent property alone, only a handful of established algorithms exist, all of which have very high computational costs, limiting them to very small networks [8]. All of these methods also require very specific and sensitive parameter settings for each application in which they are used. Training non-recurrent SNNs is also a developing area of research. Most SNN training algorithms currently proposed only allow for a single output spike from each neuron [4,5,9]. This is unrealistic and computationally limiting. Combining recurrence with SNNs compounds the difficulty of training, though biologically plausible models, such as those of cortical microcircuits, are showing promise [10].

2.2. Liquid state machines

The liquid state machine, or LSM, is one approach for harnessing the power of recurrent SNNs without actually training them [11]. LSMs are a type of reservoir computing comparable to echo state networks (ESN) [12] and backpropagation decorrelation (BPDC) [13], neither of which employs spiking neurons. We have chosen the LSM model for our research primarily because of its ability to exploit recurrent SNNs. While a similar study may be conducted with echo state networks, no contribution to the SNN community would be possible.

LSMs are composed of two parts: a reservoir featuring a highly recurrent SNN, and a readout function characterized by a simple learning function. Temporal input is fed into the reservoir, which acts as a filter. Then the state of the reservoir, or state vector, is used as input for the readout function (see Fig. 1). In essence, the readout function trains on the output of the reservoir. No training occurs within the reservoir itself. This process has been analogized with dropping objects into a container of liquid and subsequently reading the ripples created to classify the objects, hence the name liquid state machine.

The state vector is specifically a vector containing the binary state of each neuron in the reservoir at a given time. Since all of the experiments in this paper involve classification problems, with each example contained within a short temporal input sequence, we take a single sample at the end of this input sequence. We take the sample at the end in order to allow the reservoir to stabilize. To determine the binary state of a neuron, we use the firing state of the neuron as a natural threshold: either the neuron is firing or it is not. Since action potentials occur instantaneously when modeled, the state of each neuron is determined by analyzing a short duration of time for each sample. In summary, for this paper, the state vector indicates which neurons fired during a time slice at the end of the input sequence.

Because no training occurs in the reservoir, the quality of the LSM is dependent upon the ability of the liquid to effectively separate classes of input. Here the term "effectively separate" is defined as the ability of a liquid to yield a state vector with which a readout function can attain acceptable accuracy. Typically, the liquid is created randomly according to some carefully selected parameters specific to the problem at hand. This parameter selection has been the topic of much research [13,14], although the research has not yet led to a consistent and general method of generating liquids for all problems [15]. Even when adequate parameters for a given problem have been implemented, the creation of a useful liquid is not guaranteed. Typically hundreds or even thousands of liquids will be created to find a suitable filter. Fortunately, once such a liquid is discovered, the results of the LSM are comparable with the state of the art [13,16,17]. Such results, coupled with the lack of a sufficient algorithm to train these promising networks, fuel the exploration of LSMs.

3. Separation

Separation is a metric used to determine the effectiveness of a liquid. It measures how well the liquid separates different classes of input into different reservoir states, or state vectors, and is related to supervised clustering in that separation essentially measures how well state vectors are clustered into their respective classes. State vectors are described in detail in Section 2. A metric for separation was first devised by Goodman [16] and is inspired by the description of the properties of a liquid presented by Maass [11]. Goodman's definition of separation essentially finds the mean difference between the state vectors obtained from each of the classes in a problem. In order to more accurately perform the desired measure of separation, we have revised Goodman's definition to take into consideration the variance between state vectors of the same class.

Fig. 2. Correlation between accuracy and separation in 1000 different liquids run on an artificial problem. The correlation coefficient is 0.6876.

Separation is initialized by dividing a set of state vectors, O(t), into n subsets, O_l(t), one for each class, where n is the total number of classes. Individual state vectors are represented by o, and the more state vectors available for the metric, the more accurate the calculation of separation. In our notation, rather than strictly representing time, t represents the current iteration of the SDSM algorithm described below.

Separation is divided into two parts, the inter-class distance, c_d(t), and the intra-class variance, c_v(t). c_d(t) is defined by Eq. (1) and is the mean distance between the centers of mass for every pair of classes. The center of mass for each class, \mu(O_l(t)), is calculated with Eq. (2). For clarity, we use $|\cdot|$ notation for set cardinality and $\lVert\cdot\rVert_k$ notation for the $L_k$-norm.

$$c_d(t) = \frac{1}{n^2}\sum_{l=1}^{n}\sum_{m=1}^{n}\left\lVert \mu(O_l(t)) - \mu(O_m(t)) \right\rVert_2 \qquad (1)$$

$$\mu(O_l(t)) = \frac{\sum_{o_m \in O_l(t)} o_m}{|O_l(t)|} \qquad (2)$$

c_v(t) is defined by Eq. (3) and is the mean variance of every cluster, or class, of state vectors. \rho(O_l(t)) is the average amount of variance for each state vector within class l from the center of mass for that class. We calculate the mean variance as shown in Eq. (4).

$$c_v(t) = \frac{1}{n}\sum_{l=1}^{n} \rho(O_l(t)) \qquad (3)$$

$$\rho(O_l(t)) = \frac{\sum_{o_m \in O_l(t)} \left\lVert \mu(O_l(t)) - o_m \right\rVert_2}{|O_l(t)|} \qquad (4)$$

Separation can now be defined by Eq. (5). c_v(t) is incremented by one to ensure that separation never approaches infinity.

$$\mathrm{Sep}_C(O(t)) = \frac{c_d(t)}{c_v(t) + 1} \qquad (5)$$

Here C is the liquid that produced the set of state vectors, O(t). Separation essentially measures the mean distance between classes of state vectors divided by the average variance found within those classes. It should be noted that this separation metric is similar to the Fisher criterion used in Fisher discriminant analysis, a technique commonly used in feature selection [18].
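To make the metric concrete, here is a minimal NumPy sketch of Eqs. (1)–(5). The list-of-arrays data layout is our own convention, not part of the original CSIM implementation.

```python
import numpy as np

def separation(state_vectors_by_class):
    """Compute Sep_C(O(t)) from Eqs. (1)-(5): mean inter-class distance
    between class centers of mass divided by (mean intra-class variance + 1).

    state_vectors_by_class: list of 2-D arrays, one per class, each of shape
    (num_samples, num_neurons) with binary entries.
    """
    n = len(state_vectors_by_class)
    centers = [np.mean(O_l, axis=0) for O_l in state_vectors_by_class]   # Eq. (2)

    # Eq. (1): mean pairwise distance between class centers of mass
    c_d = sum(np.linalg.norm(centers[l] - centers[m])
              for l in range(n) for m in range(n)) / n**2

    # Eq. (4): average distance of each state vector from its class center
    rho = [np.mean(np.linalg.norm(O_l - centers[l], axis=1))
           for l, O_l in enumerate(state_vectors_by_class)]

    c_v = sum(rho) / n          # Eq. (3)
    return c_d / (c_v + 1.0)    # Eq. (5)

# Example: two classes, three sampled state vectors each
O = [np.array([[1, 0, 1, 0], [1, 0, 1, 1], [1, 0, 0, 0]]),
     np.array([[0, 1, 0, 1], [0, 1, 1, 1], [0, 0, 0, 1]])]
print(separation(O))
```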

In Fig. 2 we show that separation does correlate with the effectiveness of a liquid. Here, effectiveness is measured as the accuracy of the LSM at classifying inputs in an artificial problem. One thousand liquids were generated with varying parameter settings to create a large variety of separation values. The artificial problem consisted of five input classes expressed as template spiking patterns for four input neurons. The system needed to identify which class a variation of the template spiking pattern belonged to. These variations were created by jittering each spike in a template by an amount drawn from a zero-mean normal distribution. The accuracies displayed in Fig. 2 were obtained by training the readout with 2000 training instances and then testing with 500 test instances. Separation was calculated with only three examples from each class. Since we were not applying a synapse-modifying algorithm to the liquids, only one iteration, t, was observed. The correlation coefficient between accuracy and separation is a convincing 0.6876.

Another method for identifying the computational power of liquids, called the linear separation property, has been proposed by Maass et al. [19]. This metric, while geared towards more biologically plausible models than ours, is shown to generalize to a variety of neural microcircuits. Development of an algorithm for preparing effective reservoirs based on the linear separation property may prove useful in future research.

4. Separation driven synaptic modification

Separation driven synaptic modification, or SDSM, is an approach used to modify the synapses of the liquid by using the separation metric defined in Section 3. The underlying principles of SDSM are in part inspired by Hebbian learning and bear some resemblance to prior work [20]. However, SDSM is more akin to a linear regression model since it is guided by the separation metric. The following equation represents SDSM:

$$w_{ij}(t + \Delta t) = \mathrm{sgn}(w_{ij}(t))\,\big(|w_{ij}(t)| + e(t)\,\lambda\, f(t)\big) \qquad (6)$$

Here w_{ij}(t) is the weight of the synapse from neuron j to neuron i at iteration t, \lambda is the learning rate, sgn(w_{ij}(t)) is the sign of w_{ij}(t) (i.e. whether the synapse is excitatory or inhibitory), e(t) is a function of the effect of separation on the weight at iteration t, and f(t) is a function of the firing behavior of all neurons in the liquid at iteration t.

First we will look at the function e(t). To explain this function and its derivation it is first important to understand what we mean by relative synaptic strength, r_s, defined by

$$r_s = \frac{|w_{ij}(t)| - \mu_w}{m_w} \qquad (7)$$

Here \mu_w estimates the expected value of the magnitude of synaptic weights in the initial liquid, and m_w is an estimate of the maximum of the random values drawn from the same distribution used to generate synaptic weights in the initial liquid. (These approximations were obtained via simulation with 10,000 samples.) m_w normalizes the synaptic strength while \mu_w is used to differentiate weak synapses from strong synapses: a negative r_s is considered weak while a positive r_s is considered strong.

Too little distance between centers of mass, c_d(t) (Eq. (1)), or too much variance within classes, c_v(t) (Eq. (3)), can decrease separation and thus the overall effectiveness of the liquid. Generally speaking, if there is too little distance between centers of mass, it is because strong synapses are driving the liquid to behave a particular way regardless of input class. To rectify this, we want to strengthen weak synapses and weaken strong synapses. This will drive the liquid towards a more chaotic structure that will yield results more dependent on the input. On the other hand, if there is too much variance within classes, it is because the liquid is too chaotic to drive inputs of the same class to behave similarly. To relieve this problem, it is necessary to strengthen strong synapses and weaken weak synapses even more. This will polarize the liquid, requiring greater differences in input to cause a change in the liquid's behavior (in other words, the liquid will be less chaotic).

The motivation behind the function e(t) (see Eq. (6)) is to, at the level of an individual synapse, find a balance between differentiating classes of input and reducing variance within classes. The solution to the problem of differentiating classes of input, d_i(t), is implemented with Eq. (8).

$$d_i(t) = a_i(t)\left(1 - \frac{c_d(t)}{\mathrm{Sep}^{*}_{C}}\right) \qquad (8)$$

$$a_i(t) = \frac{\sum_{k=1}^{n} \mu_i(O_k(t))}{n} \qquad (9)$$

Here a_i(t) is the activity of a specific neuron i (the post-synaptic neuron of synapse w_{ij}) at iteration t and is defined by Eq. (9). a_i(t) contains \mu(O_k(t)), which is the mean of the state vectors in class k. Specifically, \mu_i(O_k(t)) is the value of the ith element of the mean state vector. This is also the fraction of state vectors belonging to class k in which neuron i fires. As in Section 3, n is the number of classes for the given problem. Sep*_C is the optimal separation value for the given problem and liquid. We evaluate Sep*_C by performing a one-time approximation of the maximum separation for an artificial set of n state vectors that have the same cardinality as state vectors from the given liquid. The artificial set of state vectors is constructed such that, for all vectors, the average Hamming distance to all other vectors is (approximately) maximized. Sep*_C must be approximated in this manner because we are not aware of a method to explicitly calculate optimal separation. Since the value of this approximation is much greater than the value of the inter-class distances that we have encountered in actual liquids, it is suitable as a reference point to calculate what is essentially an error function for inter-class distance.
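The paper does not specify how the maximally spread artificial state vectors are constructed, so the following sketch uses a simple greedy bit-flip search as one plausible way to approximate Sep*_C; the search procedure itself is our assumption.

```python
import numpy as np

def approximate_sep_star(n_classes, n_neurons, iters=5000, seed=0):
    """Rough approximation of Sep*_C: build n binary 'state vectors' whose
    average pairwise Hamming distance is (approximately) maximized by greedy
    bit flips, then return their separation (one vector per class, so the
    intra-class variance term is zero and Sep reduces to c_d)."""
    rng = np.random.default_rng(seed)
    V = rng.integers(0, 2, size=(n_classes, n_neurons))

    def mean_pairwise_hamming(V):
        return sum(np.sum(V[l] != V[m]) for l in range(len(V))
                   for m in range(len(V))) / len(V) ** 2

    best = mean_pairwise_hamming(V)
    for _ in range(iters):
        i = rng.integers(n_classes)
        j = rng.integers(n_neurons)
        V[i, j] ^= 1                      # flip one bit
        score = mean_pairwise_hamming(V)
        if score >= best:
            best = score                  # keep the flip
        else:
            V[i, j] ^= 1                  # revert

    # c_d of the artificial set (Eq. (1)); with one vector per class c_v = 0,
    # so Sep*_C = c_d / (0 + 1) = c_d
    c_d = sum(np.linalg.norm(V[l] - V[m]) for l in range(n_classes)
              for m in range(n_classes)) / n_classes ** 2
    return c_d
```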

In Eq. (8), the normalized value of c_d(t) is subtracted from one, so that d_i(t) will provide greater correction for smaller values of c_d(t). Essentially, Eq. (8) multiplies the activity of a particular neuron by the amount of correction necessary for too little distance between class centers of mass. We assume that neuron activity is to blame for this problem. This may or may not be the case; however, consistently assuming correlation between c_d(t) and neuron activity should eventually impose this correlation on the liquid and ultimately yield the desired results.

The solution to the problem of reducing variance within classes is implemented with

$$v_i(t) = \frac{\sum_{k=1}^{n} \mu_i(O_k(t))\,\rho(O_k(t))}{n} \qquad (10)$$

v_i(t) is calculated similarly to a_i(t) except that each instance of \mu_i(O_k(t)) is multiplied by the mean variance for class k, because mean variance is determined class by class. The end result is that Eq. (10) provides greater correction for larger values of c_v(t), which is desirable since we are trying to reduce intra-class variance. Like the equation for d_i(t) (Eq. (8)), the equation for v_i(t) assumes a correlation between the neuron's activity and c_v(t).

Eqs. (8)–(10) all depend upon a sample of state vectors produced by the liquid at iteration t. To maintain the efficiency of SDSM and avoid conventionally training the liquid, only three randomly selected state vectors per class are used.

The function e(t) is derived from Eqs. (7)–(10) as follows:

$$e(t) = r_s\,\big(v_i(t) - d_i(t)\big) \qquad (11)$$

Here d_i(t) is subtracted from v_i(t) because, as mentioned previously, we want the distance correction, d_i(t), to strengthen weak synapses and weaken strong synapses, while we want the variance correction, v_i(t), to strengthen strong synapses and weaken weak synapses. In other words, we want d_i(t) to increase the chaotic nature of the liquid and v_i(t) to decrease the chaotic nature of the liquid. Ultimately the goal of Eq. (11) is to find a balance between a liquid that is too chaotic and one that is too stable [21]. This goal is in accordance with the findings of other studies that have identified the importance of building systems that operate near the edge of chaos (referred to as the edge of stability by some) [22,23].

We now turn our attention to f(t), the function of the firing behavior of all neurons in the liquid at iteration t. This function is expressed in three parts as follows:

$$f(t) = \begin{cases} \dfrac{1}{\phi(t)} & \text{if } w_{ij}(t)\,e(t) \ge 0 \\ \phi(t) & \text{if } w_{ij}(t)\,e(t) < 0 \end{cases} \qquad (12)$$

$$\phi(t) = 2^{\,k\,(a(t) - 0.5)} \qquad (13)$$

$$a(t) = \frac{\sum_{o \in O(t)} \dfrac{\sum_{\eta \in o} \eta}{|o|}}{|O(t)|} \qquad (14)$$

Here a(t) is the activity of the entire liquid at iteration t and is calculated by finding the average fraction of neurons, \eta, that fire in each state vector in O(t). \phi(t) is a transformation of a(t) that reduces it to a function that will allow f(t) to be multiplied by e(t) in Eq. (6). \phi(t) contains the constant k, which indicates the extent of the transformation. For our experiments, k = 6 was found, through early preliminary tuning, to yield the highest separation values. However, the effects of modifying k have not been explored with a variety of problems, so the robustness of SDSM to changes in k is not clear. f(t) uses the state of the synapse and the results of e(t) to determine how the global activity of the liquid at iteration t will affect the change in weight. The effect of f(t) is to promote the overall strengthening of excitatory synapses while promoting the overall weakening of inhibitory synapses if less than half of the neurons in the liquid fire. If more than half of the neurons fire, the effect of f(t) is reversed. The goal of f(t) is to direct the liquid to a "useful" amount of activity. This assumes that half of the neurons firing for all state vectors is the desired fraction of activity to achieve maximum separation. This is not an unreasonable assumption, as such a condition would allow for quantifiably the most variation between states, and thus allow for the highest degree of separation possible between numerous states.
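Pulling Eqs. (6)–(14) together, the following is a minimal NumPy sketch of one SDSM iteration over a dense weight matrix. The matrix representation, the handling of absent (zero-weight) synapses, and the exponential form of \phi(t) as reconstructed above are assumptions rather than a transcription of the CSIM code.

```python
import numpy as np

def sdsm_update(W, state_vectors_by_class, mu_w, m_w, sep_star,
                lam=5e-10, k=6.0):
    """One SDSM iteration (Eqs. (6)-(14)), sketched for a dense weight matrix.

    W: (N, N) array, W[i, j] = weight of the synapse from neuron j to neuron i.
    state_vectors_by_class: list of (samples, N) binary arrays, one per class.
    mu_w, m_w: estimated mean and maximum magnitude of the initial weights.
    sep_star: approximation of the optimal separation Sep*_C.
    """
    n = len(state_vectors_by_class)
    centers = [O_l.mean(axis=0) for O_l in state_vectors_by_class]       # mu(O_k)
    rho = [np.mean(np.linalg.norm(O_l - centers[l], axis=1))             # Eq. (4)
           for l, O_l in enumerate(state_vectors_by_class)]
    c_d = sum(np.linalg.norm(centers[l] - centers[m])                    # Eq. (1)
              for l in range(n) for m in range(n)) / n**2

    a_i = sum(centers) / n                                               # Eq. (9)
    v_i = sum(c * r for c, r in zip(centers, rho)) / n                   # Eq. (10)
    d_i = a_i * (1.0 - c_d / sep_star)                                   # Eq. (8)

    r_s = (np.abs(W) - mu_w) / m_w                                       # Eq. (7)
    e = r_s * (v_i - d_i)[:, None]         # Eq. (11), per post-synaptic neuron i

    all_states = np.vstack(state_vectors_by_class)
    a = all_states.mean()                                                # Eq. (14)
    phi = 2.0 ** (k * (a - 0.5))                                         # Eq. (13)
    f = np.where(W * e >= 0, 1.0 / phi, phi)                             # Eq. (12)

    return np.sign(W) * (np.abs(W) + e * lam * f)                        # Eq. (6)
```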

5. Applying SDSM

We develop two artificial problems to test the effect of separation driven synaptic modification (SDSM) on the accuracy of an LSM given limited liquid creations. This section explores the results of these experiments and shows the strength of this algorithm in selecting ideal liquids for these problems.

We call the simpler of the two problems the frequency recognition problem. This problem has four input neurons and five classes. Each input neuron fires at a slow or fast frequency. The five classes are defined by specific combinations of fast and slow input neurons as shown in Table 1, where 1 represents a fast input neuron and 0 a slow one. These particular patterns are chosen to challenge the liquid with a variety of combinations as well as the task of ignoring one channel (input neuron 4).

Individual samples from each class are generated by following the class template and then jittering the frequencies. Since each class is distinctly defined by a particular pattern of behavior with only two options for each neuron, this is a fairly simple problem.

The second problem, which is more complex, we call the pattern recognition problem. This problem has eight input neurons and a variable number of classes. Each class is based on a template spike pattern randomly created for each input neuron. We generate this random pattern by plotting individual spikes with a random distance between one another. This distance is drawn from the absolute value of a normal distribution with a mean of 10 ms and a standard deviation of 20 ms. Once we create the template pattern for each input neuron in a class, individual instances from the class are created by jittering each spike in the template. The spikes are jittered by an amount drawn from a zero-mean normal distribution with a standard deviation of 5 ms, making the problem particularly difficult. We derive all of these values through empirical analysis to create a solvable but difficult problem. A simplified example with only two classes and three input neurons is shown in Fig. 3.
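As an illustration of the generation procedure just described, here is a brief sketch. The 100 ms template duration (taken from the Fig. 3 example) and the clipping of jittered spikes at zero are our own choices.

```python
import numpy as np

def make_template(n_inputs=8, duration=0.1, rng=None):
    """Template spike pattern: for each input neuron, place spikes with
    inter-spike gaps drawn from |N(10 ms, 20 ms)| until the duration is filled."""
    rng = rng or np.random.default_rng()
    template = []
    for _ in range(n_inputs):
        spikes, t = [], 0.0
        while True:
            t += abs(rng.normal(0.010, 0.020))   # gap in seconds
            if t >= duration:
                break
            spikes.append(t)
        template.append(np.array(spikes))
    return template

def make_instance(template, jitter_sd=0.005, rng=None):
    """Class instance: jitter each template spike with zero-mean N(0, 5 ms)."""
    rng = rng or np.random.default_rng()
    return [np.clip(s + rng.normal(0.0, jitter_sd, size=s.shape), 0.0, None)
            for s in template]
```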

5.1. Parameter settings

The models we used for the neurons and synapses for all of our experiments were taken from CSIM [6,7]. Specifically, we used the StaticSpikingSynapse model for synapses, and the SpikingInputNeuron and LifNeuron models for the input and reservoir neurons, respectively. We determined the best setting for each of the many parameters in the liquid by performing extensive preliminary tuning. The parameters that apply to traditional liquids were chosen based on the best tuning results prior to any application of SDSM. The different parameters we looked at in these preliminary efforts were the number of neurons, connection probability, the mean synaptic weight and delay, the standard deviation of the synaptic weight and delay, the number of samples per class used to determine separation at each instance, the number of iterations to run, the learning rate, the synaptic time constant, and the amount of noise present in each neuron. While excitatory and inhibitory neurons are not explicitly created, positive synaptic weights are excitatory and negative synaptic weights are inhibitory. Table 2 shows the parameters we use for all of the results presented in this section as well as Section 6. As the results of this section and the next section show, these parameters generalize very well as long as the temporal scale of the input is on the order of one second.

Table 1. Frequency patterns for each class in the frequency recognition problem. Each input represents one of four input neurons. 1 indicates a fast spiking frequency while 0 represents a slower spiking frequency.

          Input 1   Input 2   Input 3   Input 4
Class 1      1         0         0         0
Class 2      0         1         0         0
Class 3      1         1         0         0
Class 4      0         0         1         0
Class 5      1         0         1         0

Fig. 3. Simplified example templates for two classes, A and B, are shown in (a) and (d), respectively. Each class has three input neurons designated by the y-axis. The x-axis is time spanning 100 ms. (b) and (c) show examples of instances from class A created from jittered versions of the template. (e) and (f) show examples of instances from class B.

Table 2. Parameters used by SDSM for artificial and phonetic problems.

Neurons: 64
Connection probability: 0.3
Synaptic weight mean: 2 x 10^-8
Synaptic weight SD: 4 x 10^-8
Samples per class: 3
Training iterations: 500
lambda: 5 x 10^-10
I_noise: 5 x 10^-8
tau: 0.003
Synaptic delay mean: 0.01
Synaptic delay SD: 0.1
State vector length: 0.05

Some of the parameters presented in Table 2 require further explanation to more easily replicate the results. The connection probability is the probability that any given neuron (including input neurons) is connected to any other liquid neuron (liquid neurons cannot connect back to input neurons). The value for the connection probability indicated in Table 2 means that each neuron is connected to roughly one third of the other neurons. The mean and standard deviation for synaptic weight define the distribution from which the initial weight values are drawn for every synapse. The same applies to the mean and standard deviation for synaptic delay, although the synaptic delay values for each synapse remain constant during SDSM. The "samples per class" parameter refers to the number of training samples used from each class when calculating separation. This in turn is what drives the SDSM algorithm. The more samples used, the more accurate the separation calculation will be, but at the cost of speed. The number of iterations is simply how long to run the SDSM algorithm. In our experiments, by 500 iterations, most liquids had reached a plateau in separation improvement. lambda is the learning rate first shown in Section 4. tau is the synaptic time constant, which refers to the rate at which the membrane potential of each synapse decays. I_noise is the standard deviation of the noise added to each neuron and is necessary for an efficient liquid [24]. Finally, the "state vector length" parameter is the sample duration, in seconds, for acquiring state vectors from each liquid (see Section 2).
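For reference, the Table 2 settings can be collected in a single configuration object. The parameter names below are illustrative and do not correspond to CSIM's own identifiers.

```python
# Illustrative parameter set mirroring Table 2 (names are ours, not CSIM's).
SDSM_PARAMS = {
    "n_neurons": 64,
    "connection_probability": 0.3,
    "synaptic_weight_mean": 2e-8,
    "synaptic_weight_sd": 4e-8,
    "samples_per_class": 3,
    "training_iterations": 500,
    "learning_rate": 5e-10,            # lambda in Eq. (6)
    "i_noise_sd": 5e-8,
    "synaptic_time_constant": 0.003,   # seconds
    "synaptic_delay_mean": 0.01,       # seconds
    "synaptic_delay_sd": 0.1,
    "state_vector_length": 0.05,       # seconds sampled at the end of the input
}
```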

5.2. Results

Using the established parameters, we create LSMs with SDSM for both the pattern and the frequency recognition problems (explained above). For the pattern recognition problem we explore 4-, 8-, and 12-class problems. We only explore the specifically defined 5-class scenario for the frequency recognition problem. For each problem we run 50 experiments, each with a unique randomly generated initial liquid. State vectors obtained from both the initial liquid and the liquid after 500 iterations of SDSM are used as input to multiple perceptrons. Because we are only exploring classification problems in this paper, perceptrons are used rather than standard linear regression. We train each perceptron to classify members of one particular class, so there are n binary classifiers. We then compare the output of each perceptron, assigning the class of the perceptron with the greatest confidence to the state vector in question. This readout function is used because it is a very simple classification model, thus allowing us to more carefully scrutinize the quality of the liquid. For all of our experiments, the test size is 100 samples per class.
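The readout described above can be sketched as n one-vs-rest perceptrons whose most confident output determines the predicted class. The learning rate, epoch count, and update-rule details below are our own choices for illustration.

```python
import numpy as np

class OneVsRestPerceptronReadout:
    """Readout sketch: one perceptron per class trained on binary state
    vectors; prediction is the class whose perceptron is most confident."""

    def __init__(self, n_classes, n_neurons, lr=0.1, epochs=20):
        self.W = np.zeros((n_classes, n_neurons))
        self.b = np.zeros(n_classes)
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        # X: (samples, n_neurons) state vectors; y: integer class labels
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                targets = -np.ones(len(self.b))
                targets[label] = 1.0
                pred = np.sign(self.W @ x + self.b)
                update = self.lr * (targets - pred) / 2.0   # perceptron rule
                self.W += np.outer(update, x)
                self.b += update

    def predict(self, X):
        # Confidence = signed distance from each perceptron's hyperplane
        return np.argmax(X @ self.W.T + self.b, axis=1)
```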

Fig. 4(a) shows the mean accuracy (over all 50 experiments) of the LSM for each problem. Additionally, Fig. 5(a) shows the best accuracy obtained out of the 50 experiments for each problem. In practice, the liquid with the maximum accuracy is the one that we would select for future use. However, since we are interested in the potential of SDSM to decrease the number of liquids that must be created, we are also interested in the mean accuracy. Note that the initial liquid is a typical LSM scenario: a strictly randomly generated liquid. So, both of these figures are a comparison of standard LSMs to LSMs using SDSM. We did not perform a statistical analysis of these results since the improvement of SDSM over the traditional method is clearly substantial in all cases.

Fig. 4. A comparison of traditional liquids (initial) and those shaped by SDSM (final) across four problems. Results are the mean accuracy (a) and separation (b) of 50 LSMs.

Fig. 5. A comparison of traditional liquids (initial) and those shaped by SDSM (final) across four problems. Results are the best accuracy (a) and separation (b) obtained out of 50 LSMs.


In addition to the accuracy of the LSMs, Figs. 4 and 5 show the separation of the liquids. It should be noted that the liquid with the maximum separation is not necessarily the same liquid that accounts for the maximum accuracy. Overall these results support the correlation between separation and accuracy, though there is an anomaly in Fig. 5(b), where a higher separation occurs in an initial random liquid than in any SDSM-shaped liquid. Such anomalies are bound to occur occasionally due to the random nature of LSMs. The mean results in Fig. 4(b) confirm the abnormal nature of this event.

Fig. 7. The separation of a liquid at each iteration during training of SDSM. This is a single representative example out of 200 different liquids created in this set of experiments. This particular liquid was created using the pattern recognition problem with eight classes.

5.3. Discussion

These results show a substantial increase in the performance of liquids using SDSM, particularly in the problems with more classes. Note that SDSM's mean accuracy is higher than the traditional method's best accuracy for every problem (Figs. 4(a) and 5(a)).

Fig. 6 compares the trends in liquid exploration for both traditional and SDSM-generated liquids. The results in this figure are from the pattern recognition problem with eight classes. Fig. 6(a) shows the accuracy of the best liquid obtained so far, given an increasing number of total liquids created. (The mean accuracy is also shown.) Fig. 6(a) demonstrates that far fewer liquids are required to obtain a satisfactory liquid using SDSM when compared with the traditional method. Fig. 6(b) extends the graph further for traditional liquid creation, and compares the results to the best results obtained from SDSM after only 50 liquid creations. We see from this figure that even after 500 liquid creations, a liquid has not been created by the traditional means that can compete with SDSM after only 11 liquid creations. This figure illustrates that not only can SDSM find a suitable liquid in fewer iterations than traditional methods, but also it can potentially create a liquid that obtains a higher accuracy than what is possible with conventional LSMs in a reasonable amount of time.

Fig. 7 shows how separation changes with each iteration over the history of a typical SDSM trial. Fig. 8 shows the mean value of separation over 50 trials of SDSM. These results show that separation is clearly improving over time for the pattern recognition problems. For the frequency recognition problem, separation quickly increases and then slowly begins to decrease. This is somewhat distressing and may indicate an instability in the algorithm when confronted with certain problems; however, even after 500 iterations of SDSM, the average separation is still higher than that of the initial liquid. Also, the results in Fig. 4(a) indicate that even on the frequency recognition problem, the accuracy of liquids trained with SDSM greatly improved over that of traditional liquids. While we currently do not know the source of this anomaly, we are satisfied with the overall increase in liquid performance and will leave a deeper investigation of the abnormality for future research.

Overall, these results indicate that the SDSM algorithm is behaving as expected (i.e. improving separation). Since SDSM results in a significant improvement in accuracy, the notion that separation correlates with accuracy is also strengthened by these results. It is important to keep in mind that separation in both of these figures was calculated using only three samples per class for each iteration and thus is a very rough estimate. The initial and final separation values indicated in Figs. 4(b) and 5(b) are much more accurate approximations of separation; here, we use 100 samples per class.

Though the correlation between accuracy and separation is not perfect, it is satisfactory as a metric for both evaluating the quality of a liquid and revealing beneficial changes that can be made to a liquid. As the algorithm in Section 4 reveals, the relationship between separation and synaptic modification within the SDSM algorithm is complicated and cannot necessarily be predicted. Some flaws of the separation metric become apparent when studying Fig. 5(b). While the accuracy for the frequency recognition problem and the pattern recognition problem is similar, the best liquid separation is distinctly higher in liquids created with the frequency recognition problem. This demonstrates that different problems will yield different separation values due to their unique properties. In this particular case the disparity is probably at least partly a result of the differing number of input neurons present in each problem. The pattern recognition problems all have twice as many input neurons as the frequency recognition problem. This characteristic of the separation metric does not necessarily call for a change in the metric. It does mean, however, that separation values between different problems cannot be adequately compared to one another.

Fig. 6. A comparison of accuracy trends in traditional liquids and those generated with SDSM. These results show how the accuracy of the best liquid obtained so far improves as more liquids are created. The results also show that the mean accuracy stabilizes as more liquids are created. (a) shows the trend up to 50 liquid creations. (b) shows the trend up to 500 liquid creations for traditional liquids only (the accuracy for SDSM-generated liquids is shown as a baseline only).

Fig. 8. The mean separation history using SDSM for the four problems explored in our experiment. Each history is an average of 50 trials. (a) Frequency recognition: 5 classes; (b) pattern recognition: 4 classes; (c) pattern recognition: 8 classes and (d) pattern recognition: 12 classes.

Finally, for our experiments we fixed the number of training iterations to 500 in order to bring attention to the limitations of traditional liquid creation methods. However, since SDSM typically yields an increase in separation, surpassing a specific separation threshold could be used as a desirable termination criterion in practical applications. Such a threshold would not be as reliable in the traditional approach, as there is not an increase in separation over time.

6. TIMIT classification with SDSM

Liquids created with SDSM show significant improvement over traditional LSMs in tightly controlled artificial problems. Now we will explore the use of SDSM in classifying phonemes found within the TIMIT dataset [25].

6.1. TIMIT

TIMIT consists of 6300 spoken sentences, sampled at 16 kHz, read by 630 people employing various English dialects. Since we are interested in identifying context-independent phonemes, we break each sentence down into its individual phonemes using phonetic indices included with the sound files. This results in 177,389 training instances. We then convert each of these phoneme WAV files into its 13 Mel frequency cepstral coefficients (mfccs) [26] sampled at 200 Hz. Finally, we convert the mfccs into 13 spike trains (one for each mfcc) with a firing rate calculated using the following equation (taken from Goodman [16]):

$$\mathrm{Rate}_i(t) = \frac{\mathrm{mfcc}_i(t) - o_i}{O_i - o_i}\,\mathrm{MaxRate} \qquad (15)$$

Here mfcc_i(t) is the ith mfcc value at time t, which corresponds with the firing rate for input neuron i; o_i is the minimum value of the ith mfcc, and O_i is its maximum value. MaxRate is the maximum rate of firing possible for a given input neuron. MaxRate is a constant that, as per Goodman, we defined as 200 spikes per second. The spike trains are generated by emitting each spike at uniform time intervals governed by the rate, obtained from Eq. (15), with no noise injected. The neuron model we use already incorporates the injection of noise into each neuron via the I_noise parameter (see Table 2). Because noise is already injected in the model, we assume a clean input signal is preferable.

Preliminary work shows that in order for the liquid to achieve any appreciable level of separation, the time span of the input needs to be on the order of one second due to the operating timescale of the liquid. Since single phonemes have a length on the order of 50 ms, we temporally stretch each of the spike trains using the following equation:

$$t_{\mathrm{new}} = \frac{1}{1 + e^{-k \cdot t_{\mathrm{old}}}} \qquad (16)$$

Here t_old is the original length of a spike train while t_new is the new stretched length. The variable k is a sigmoid gain that we set to five, based on preliminary tuning.
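The conversion pipeline of Eqs. (15) and (16) might be sketched as follows for a single mfcc channel. The per-frame uniform spike placement and the proportional rescaling of spike times to the stretched length are our interpretation of the description above, not a transcription of the original code.

```python
import numpy as np

def mfcc_to_spike_train(mfcc, mfcc_min, mfcc_max, max_rate=200.0,
                        frame_rate=200.0, k=5.0):
    """Sketch of Eqs. (15)-(16): map one mfcc channel to a spike train with
    uniformly spaced spikes per 5 ms frame, then stretch the whole train.

    mfcc: 1-D array of coefficient values sampled at frame_rate Hz.
    mfcc_min, mfcc_max: o_i and O_i, the channel's minimum and maximum.
    """
    frame_len = 1.0 / frame_rate
    spikes = []
    for f, value in enumerate(mfcc):
        rate = (value - mfcc_min) / (mfcc_max - mfcc_min) * max_rate  # Eq. (15)
        if rate <= 0:
            continue
        # Uniformly spaced spikes within this frame, no noise injected
        t = f * frame_len
        while t < (f + 1) * frame_len:
            spikes.append(t)
            t += 1.0 / rate
    spikes = np.array(spikes)
    if spikes.size == 0:
        return spikes
    t_old = len(mfcc) * frame_len                 # original train length
    t_new = 1.0 / (1.0 + np.exp(-k * t_old))      # Eq. (16)
    return spikes * (t_new / t_old)               # rescale spike times
```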

The TIMIT dataset contains roughly 52 phonemes. Out of context, correctly identifying all of these phonemes is a daunting task. Using six natural classes of phonemes, we reduce this problem to two simpler problems. The first is a general problem that involves identifying phonemes as either consonants or vowels. For this problem "stops", "affricates", and "fricatives" are considered consonants. "Nasals" and "semivowels" are removed to avoid ambiguous sounds. The training data consists of 1000 instances of each class and the test data contains 100 instances of each class. The second problem is more specific and involves identifying one of four distinct "vowel" phonemes. The phonemes used in this problem are e as in beet; e as in bet; u as in but; and the er sound in butter. For this problem, the training data consist of 150 instances of each class while the test data contain 50 instances.

6.2. Results

The two problems outlined above are run on LSMs using SDSM, and on traditional LSMs. The mean results can be found in Fig. 9 and the best results are shown in Fig. 10. These results are obtained by running the problems on 50 liquids either generated with SDSM or created randomly, with both the accuracy of the LSMs as well as the separation of the liquids displayed (keep in mind that the liquids showing the best separation do not necessarily correspond to the LSMs with the highest accuracy). The parameter settings we use in these algorithms are the same as those outlined in Section 5 with the exception of the number of training iterations. In experiments with TIMIT, we run SDSM for only 200 iterations, since liquids tend to reach a plateau in separation improvement by this point. Again, we did not perform a statistical analysis of these results since the improvement of SDSM over the traditional method is clear in all cases.

Fig. 9. A comparison of SDSM and traditional LSMs across two problems derived from the TIMIT dataset. Results are the mean accuracy (a) and separation (b) of 50 LSMs.

Fig. 10. A comparison of SDSM and traditional LSMs across two problems derived from the TIMIT dataset. Results are the best accuracy (a) and separation (b) obtained out of 50 LSMs.

Fig. 11. We used SDSM to generate 50 unique liquids for each of four different source problems. The corresponding liquid types are indicated on the X-axis. Each of the four problems was then applied to each of these liquids as indicated by the colored bars. The mean accuracy of the readouts trained on each liquid-problem combination is reported.

6.3. Discussion

Although the results of SDSM on TIMIT data are not as distinguished as the results obtained with the artificial template-matching problems (Section 5), SDSM still shows a significant improvement over traditional liquids. Based on Fig. 9(a), on average traditional liquids do no better than guessing with either of the phoneme recognition problems, while LSMs using SDSM improve over this baseline.

While neither of these phoneme recognition problems is performed at an immediately applicable level, even when only looking at the best liquids created with SDSM (Fig. 10(a)), it should be reiterated that these results are obtained by classifying individual phonemes completely out of context. Most of these sounds last one twentieth of a second and span a diverse range of accents. Speech recognition tasks usually involve classifying phonemes as they are heard in a stream of data, thus providing context for each sound (i.e. preceding phonemes and pauses). Even in cases where words are being classified (such as classifying spoken digits), the fact that each word consists of multiple phonemes aids in the classification process. We chose to perform only out-of-context experiments in order to keep these problems parallel to the artificial ones in Section 5 and to focus on the separation properties of the liquid. The poor accuracy obtained in classifying phonemes may also lie with our process of converting mfccs into spike trains. However, encoding input into spike trains is a whole domain of research in and of itself, so we will leave that to future work.

Even on these difficult real-data problems, the improvement of using SDSM over traditional methods is clear. Also, the fact that the results from both of these real-data problems were obtained using essentially the same parameters as those obtained from both of the artificial-data problems in Section 5 emphasizes another potential strength of the SDSM algorithm: a robustness with respect to algorithm parameters. It was our observation that extensive parameter exploration of the liquids for the TIMIT problems did not show a marked difference over the settings already obtained in Section 5. Parameter exploration on a problem-by-problem basis is a ubiquitous and time-consuming component of machine learning. While these observations do not exclude the possibility of the necessity for parameter exploration on other problems, they show that for the four problems that we investigate in this paper (two artificial and two TIMIT), multiple sessions of parameter exploration were not necessary. This is an indication that SDSM is, in general, robust with respect to parameter variation; but further research will be necessary to verify this.

7. Transfer learning with SDSM

Transfer learning refers to the idea of transferring acquired knowledge from one domain to another similar, but distinctly different, domain. In machine learning it often refers to the approach of using a model already trained for success on one problem to realize instantaneous performance gains on another related problem [27,28]. One of the strengths of LSMs is the ability of the liquid to be transferred to different problems while maintaining effectiveness as a filter, which is effectively transfer learning. This is an important property since useful liquids can be difficult to find. Because SDSM essentially uses training data to create the liquid, there is a concern that the liquid's ability to transfer may be compromised. In order to test the transfer-learning capability of the SDSM-generated liquids, we run each of the problems addressed in Figs. 4 and 5 on every liquid used to generate those figures. In other words, we are using the same liquid generated for one problem as the reservoir for a new readout on a different problem. The results are shown in Fig. 11. When discussing the problem used to create a specific liquid, we refer to the problem as the source problem.

Input neurons are considered part of the liquid, thus their synapses are modified as a part of SDSM. When a liquid is created from a source problem, it has a number of input neurons equal to the number of spike trains that encode the source problem's input. Because different problems have differing numbers of spike trains, when transferring problems across liquids, discrepancies between the number of spike trains and the number of input neurons must be resolved.

When introducing a new problem to a liquid that was trained with a different source problem, we use the following approach. If the new problem's input is encoded in fewer spike trains than the source problem, then we arbitrarily map the spike trains to a subset of the liquid's input neurons. The excess input neurons receive a null signal as input. If the new problem's input is encoded in more spike trains than the source problem, then we arbitrarily map multiple spike trains from the new problem's input to individual input neurons in the liquid. We combine the spiking patterns by simply including all spike instances from multiple inputs into a single spike train. For each experiment, we use the same arbitrary mapping for every input to maintain consistency.
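The mapping just described can be sketched as follows, with spike trains represented as lists of spike times. The particular round-robin assignment of extra trains to input neurons is an arbitrary choice standing in for the paper's fixed arbitrary mapping.

```python
def map_spike_trains(new_trains, n_input_neurons):
    """Sketch of the transfer mapping described above: pad with empty (null)
    trains when the new problem has fewer spike trains than the liquid has
    input neurons, or merge several trains into one when it has more.
    The specific (arbitrary but fixed) assignment below is our own choice."""
    n_new = len(new_trains)
    if n_new <= n_input_neurons:
        # Fewer trains than input neurons: excess neurons get a null signal
        return new_trains + [[] for _ in range(n_input_neurons - n_new)]
    # More trains than input neurons: merge several trains per neuron by
    # pooling all of their spike times into a single sorted train
    merged = [[] for _ in range(n_input_neurons)]
    for idx, train in enumerate(new_trains):
        merged[idx % n_input_neurons].extend(train)
    return [sorted(t) for t in merged]
```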

The results shown in Fig. 11 are obtained using two types of problems. All of the pattern recognition problems use eight spike trains for each instance while the frequency recognition problem uses only four spike trains. When the frequency recognition problem is introduced to a source liquid that was generated by applying SDSM to a pattern recognition problem, four of the input neurons have no signal. When a pattern recognition problem is introduced to a source liquid that was generated by applying SDSM to the frequency recognition problem, each input neuron combines the signals of two spike trains.

Fig. 12 shows the results using initial random liquids as the source liquid. Since all of the pattern recognition problems have the same number of spike trains, and since the initial liquids are untrained, the three data points showing the results for these initial liquids are actually redundant. However, they are included to allow a side-by-side comparison with Fig. 11.

Fig. 12. We randomly generate 50 unique liquids for each of four different source problems. The corresponding liquid types are indicated on the X-axis. Each of the four problems was then applied to each of these liquids as indicated by the colored bars. The mean accuracy of the readouts trained on each liquid-problem combination is reported.


These results demonstrate the ability of liquids to transfer knowledge to different problems. Fig. 12 emphasizes this by the distinct lack of variation in behavior between liquids generated for frequency recognition and pattern recognition problems. One might expect a difference in behavior between these two different liquid states since liquids created for frequency recognition only have four input neurons while liquids created for pattern recognition problems have eight. The fact that the results show no significant difference indicates that the liquid is indeed acting as a general temporal filter.

SDSM uses training data to create new liquids from those randomly generated for Fig. 12. Since the new liquids that are created depend upon the problem used to train them (the liquid source), one might expect that the efficacy of the learning transfer would be compromised. Interestingly, the results shown in Fig. 11 clearly demonstrate that this is not the case. In fact, liquids not created with frequency recognition as the source problem performed better on that problem than liquids actually created with frequency recognition as the source problem. However, liquids created with various pattern recognition source problems did perform better on those problems than liquids generated with frequency recognition as the source problem. In both cases, SDSM still performed significantly better than traditional LSMs (compare Figs. 11 and 12). The fact that liquids created with pattern recognition performed better on both problems indicates that the source problem used to create the liquid can make a difference. Looking at Fig. 11, we see that all of the liquids created with pattern recognition source problems performed better on all of the problems than liquids created with frequency recognition as the source problem. Pattern recognition is arguably the more complicated of the two problems, given its larger number of inputs and the more distinct separation of classes in the frequency recognition problem; by applying SDSM with the more complicated source problem, the liquid may be primed for overall better performance. It should be noted that transferring learning across different numbers of classes and input neurons alone is a difficult problem that SDSM apparently overcomes. Future work should investigate a greater variety of problems to determine the extent of generalization that is possible using SDSM.

8. Conclusions

For a variety of temporal-pattern-recognition problems, we have shown that by training the liquid, our new algorithm, SDSM, reduces the number of liquids that must be created in order to find one that acts as a useful filter for the LSM. Additionally, separation provides a metric for directly evaluating the quality of the liquid, and can thus act as a satisfactory termination criterion in the search for a useful filter.
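To make the role of separation as a termination criterion concrete, the overall search can be outlined roughly as follows. This is an illustrative sketch only: generate_random_liquid, sdsm_iteration, and separation are hypothetical stand-ins for the components described in Sections 2-4, and the threshold and epoch limit are arbitrary placeholders, not values from the paper.

def find_useful_liquid(train_data, sep_threshold, max_epochs=500):
    """Search for a liquid whose separation exceeds a threshold,
    refining the same liquid with SDSM instead of discarding it
    and generating a new one from scratch.  All helper functions
    are assumed stand-ins for the components described in the paper."""
    liquid = generate_random_liquid()               # random reservoir (Section 2)
    for epoch in range(max_epochs):
        sep = separation(liquid, train_data)        # liquid quality metric (Section 3)
        if sep >= sep_threshold:                    # termination criterion: no readout
            return liquid                           # training is needed to evaluate the liquid
        liquid = sdsm_iteration(liquid, train_data) # refine synapses with SDSM (Section 4)
    return liquid                                   # best effort after max_epochs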

Also, despite training the liquid for a specific problem, we have demonstrated the ability of liquids generated with SDSM to transfer between distinct problems effectively. This requires individual liquids to elicit variable numbers of distinct responses to account for differing numbers of classes between problems. Furthermore, transfer learning occurs despite a naive translation from one input representation to another in cases with differing numbers of input neurons.

SDSM may allow researchers to more effectively create liquids for most LSM research, both improving results and cutting down on experimental run-time. More generally, since the liquid in an LSM is a recurrent SNN, and since SDSM trains this liquid, our algorithm may present a useful option for researchers of recurrent SNNs to explore.

References

[1] W. Gerstner, W. Kistler (Eds.), Spiking Neuron Models, Cambridge University Press, New York, 2002.

[2] S.M. Bohte, Spiking neural networks, Ph.D. Thesis, Centre for Mathematics and Computer Science, 2003.

[3] W. Maass, Networks of spiking neurons: the third generation of neural network models, Neural Networks 10 (9) (1997) 1659–1671.

[4] S.M. Bohte, J.N. Kok, H.L. Poutre, Error-backpropagation in temporally encoded networks of spiking neurons, Neurocomputing 48 (2001) 17–37.

[5] O. Booij, H.T. Nguyen, A gradient descent rule for spiking neurons emitting multiple spikes, Information Processing Letters 95 (2005) 552–558.

[6] T. Natschlager, Neural micro circuits, http://www.lsm.tugraz.at/index.html, 2005.

[7] T. Natschlager, W. Maass, H. Markram, Computer models and analysis tools for neural microcircuits, in: Neuroscience Databases. A Practical Guide, Kluwer Academic Publishers, 2003, pp. 123–138 (Chapter 9).

[8] H. Jaeger, A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach, Technical Report, International University Bremen, 2005.

[9] W. Maass, Networks of spiking neurons: the third generation of neural network models, Transactions of the Society for Computer Simulation International 14 (1997) 1659–1671.

[10] W. Maass, P. Joshi, E.D. Sontag, Computational aspects of feedback in neural circuits, PLoS Computational Biology 3 (1:e165) (2007) 1–20.

[11] W. Maass, T. Natschlager, H. Markram, Real-time computing without stable states: a new framework for neural computations based on perturbations, Neural Computation 14 (11) (2002) 2531–2560.

[12] H. Jaeger, H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science (2004) 78–80.

[13] D. Verstraeten, B. Schrauwen, M. D’Haene, D. Stroobandt, An experimental unification of reservoir computing methods, Neural Networks 20 (2007) 391–403.

[14] E. Goodman, D. Ventura, Effectively using recurrently connected spiking neural networks, in: Proceedings of the International Joint Conference on Neural Networks, vol. 3, 2005, pp. 1542–1547.

[15] D. Verstraeten, B. Schrauwen, D. Stroobandt, Adapting reservoirs to get Gaussian distributions, in: European Symposium on Artificial Neural Networks, 2007, pp. 495–500.

[16] E. Goodman, D. Ventura, Spatiotemporal pattern recognition via liquid state machines, in: Proceedings of the International Joint Conference on Neural Networks, 2006, pp. 3848–3853.

[17] D. Verstraeten, B. Schrauwen, D. Stroobandt, J.V. Campenhout, Isolated word recognition with liquid state machine: a case study, Information Processing Letters 95 (2005) 521–528.

[18] R.A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 (1936) 179–188.

[19] W. Maass, N. Bertschinger, R. Legenstein, Methods for estimating the computational power and generalization capability of neural microcircuits, Advances in Neural Information Processing Systems 17 (2005) 865–872.

[20] S. Bornholdt, T. Rohl, Self-organized critical neural networks, Physical Review E 67 (2003).

[21] N. Brodu, Quantifying the effect of learning on recurrent spiking neurons, in: Proceedings of the International Joint Conference on Neural Networks, 2007, pp. 512–517.

[22] N. Bertschinger, T. Natschlager, Real-time computation at the edge of chaos in recurrent neural networks, Neural Computation 16 (7) (2004) 1413–1436.


[23] T. Natschlager, N. Bertschinger, R. Legenstein, At the edge of chaos: real-time computations and self-organized criticality in recurrent neural networks, Advances in Neural Information Processing Systems 17 (2005) 145–152.

[24] K. Jim, C.L. Giles, B.G. Horne, An analysis of noise in recurrent neural networks: convergence and generalization, IEEE Transactions on Neural Networks 7 (1996) 1424–1438.

[25] J.S. Garofolo, et al., TIMIT acoustic-phonetic continuous speech corpus, http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1, Linguistic Data Consortium, Philadelphia, 1993.

[26] ETSI ES 202 212 STQ: DSR, Extended advanced front-end feature extraction algorithm; compression algorithms; back-end speech reconstruction algorithm, Technical Report, European Telecommunications Standards Institute (ETSI), 2003.

[27] S. Thrun, L. Pratt (Eds.), Learning to Learn, Kluwer Academic Publishers, 1998.

[28] K. Yu, V. Tresp, Learning to learn and collaborative filtering, in: Neural Information Processing Systems Workshop "Inductive Transfer: 10 Years Later", 2005.

David Norton recently received a Master's Degree from Brigham Young University. His thesis was titled Improving Liquid State Machines Through Iterative Refinement of the Reservoir. It explores a relatively new neural network model, the Liquid State Machine. Prior to his Master's Degree, he received Bachelor Degrees in both Computer Science and Zoology. David is currently working on his Ph.D. in the Computer Science Department at Brigham Young University. His current research is in the budding field of computational creativity. His dissertation will look at the importance of intention and self-evaluation in the creative process.

Dan Ventura is an Associate Professor in the Computer Science Department at Brigham Young University. His research focuses on creating artificially intelligent systems that incorporate robustness, adaptation and creativity in their approaches to problem solving, and it encompasses neural models, machine learning techniques, and evolutionary computation. Prior to joining the faculty at BYU, he was a Member of the Information Sciences and Technology Division of the Applied Research Laboratory and a Member of the Graduate Faculty of Computer Science and Engineering at Penn State University. Dan has also spent time in industry as a Research Scientist with fonix Corporation, working on the development of state-of-the-art technology for large vocabulary continuous speech recognition.