Kargupta and Ray


Complex Systems 9 (1995) 305–327

A Temporal Sequence Processor Based on the Biological Reaction-diffusion Process

Sylvian R. Ray
Hillol Kargupta
Department of Computer Science,
University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Abstract. Temporal sequences are a fundamental form of information and communication in both natural and engineered systems. The biological control process that directs the generation of iterative structures from undifferentiated tissue is a type of temporal sequential process. A quantitative explanation of this temporal process is reaction-diffusion, initially proposed in [1] and later widely studied and elaborated.

We have adapted the reaction-diffusion mechanism to create a temporal sequence processor (TSP) composed of cells in a diffusion-supporting medium that performs storage, associative retrieval, and prediction for temporal sequences. The TSP has several interesting attributes: (1) its achievable depth or degree is constrained only by storage capacity, (2) it tolerates substantial time warp, and (3) it supports user-specified flexible groupings of stored sequences into coarser classes at retrieval time. The TSP is also capable of preserving the time extent of stored symbols, as in a musical melody, and permits retrieval of both the symbols and their temporal extent. Experimental verification of the properties of the TSP was performed with Reber grammar sentences and musical melodies.

1. Introduction: Temporal sequences

Temporal sequences arise naturally when one attempts to process any time-domain signal for the purpose of recognizing or predicting the presence of features relevant to the particular application. Human speech, biomedical signals, music, or any function of time originating from a sensor constitutes a temporal sequence whose meaning or intelligence content depends not only on the existence of certain features, but also on their particular temporal order. A time-domain signal is most conveniently abstracted as a temporal sequence of feature vectors, often called a spatiotemporal sequence,

X = {x(t_i)},  i = 1, 2, . . . ,


where each vector x(t_i) is a feature vector, a condensed but adequately veridical representation of the signal in the vicinity of time t_i.

The feature vectors X originating directly from signal applications usually span a space of high dimensionality, for example, 15. In many practical applications, the infinite number of possible feature vectors in a real space are reduced to a finite number of classes by a clustering operation such as a self-organizing feature map or k-means clustering. Upon representing each of K clusters or equivalence classes with a symbol s_i, i = 1, . . . , K, we obtain a manageable set of symbols forming the assumed input for the temporal sequence processor. Abstracted spatiotemporal sequences are therefore represented as strings such as cjkltprmmbos without significant loss of representational power compared with the actual signal information. The basic sequences can be supplemented with time extent information by minor extension of the fundamental representation, as we discuss in section 4.7. Also, the actual cardinality K of the finite alphabet, if too large, can become a major consideration, but this is a question of scale that we will ignore.

2. The objectives of temporal sequence processing

There are two distinct sequence processing tasks that we are particularly interested in pursuing here. These tasks arise in the context of semantic content discovery in signals [2].
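The clustering step described above (feature vectors reduced to a finite symbol alphabet) can be sketched with a plain k-means quantizer. The data, cluster count, and symbol naming below are illustrative assumptions, not the paper's actual preprocessing.

```python
import random

def kmeans_symbols(vectors, k=4, iters=20, seed=0):
    """Quantize feature vectors to symbols a, b, c, ... via plain k-means."""
    rnd = random.Random(seed)
    centers = [list(v) for v in rnd.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assign each vector to its nearest cluster center
        for i, v in enumerate(vectors):
            labels[i] = min(range(k),
                            key=lambda j: sum((a - b) ** 2
                                              for a, b in zip(v, centers[j])))
        # move each center to the mean of its members
        for j in range(k):
            members = [vectors[i] for i in range(len(vectors)) if labels[i] == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    # one symbol per cluster, so the signal becomes a string
    return "".join(chr(ord("a") + l) for l in labels)

rnd = random.Random(1)
vecs = [[rnd.gauss(0, 1) for _ in range(15)] for _ in range(30)]  # 15-dim features
s = kmeans_symbols(vecs)
print(s)  # a 30-symbol string over {a, b, c, d}
```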

1. Embedded sequence recognition (ESR). A number of short pattern sequences, PS1, PS2, . . . = PS, are stored in the device as shown in Figure 1. An unbounded argument sequence, ARGSEQ, is compared with the set, PS, contained in the sequence memory. When a subsequence of ARGSEQ is found that matches any member of the set PS within a prespecified accuracy, a successful recognition is indicated. As an example of this case in speech recognition, PS is some transformed representation of phonemes or quasiphonemes, and the ARGSEQ is a similarly preprocessed speech signal.

2. Sequentially addressed sequence memory (SASM). Long sequences, ST, are stored (see Figure 2). Short address sequences are compared with ST for the purpose of locating regions of ST that match the applied address sequence. If a sufficiently close match is found, we want to recognize the condition and possibly read out the continuation of ST from the endpoint of the successful match or, possibly, find the beginning of ST and read the whole stored sequence. This case is the traditional CAM problem, modified by the requirement of sequential presentation of the address and sequential readout from the match point.

Two distinct algorithmic procedures (ESR and SASM) are needed to solve these two problems.



Figure 1: Embedded sequence retrieval is the identification of short stored subsequences within a long or unbounded argument sequence. It is performed by the guided sequence retrieval algorithm.


Figure 2: The task defined as sequence-addressed sequential memory is the location of short sequences within long stored strings, followed possibly by retrieval of the identified sequence.

For ESR and for the addressing phase of SASM, an algorithm is used to guide the comparison of an external address sequence with internal sequences. We will call this algorithm guided sequence retrieval (GSR).

For the readout phase of SASM, after a unique stored sequence has been identified, the algorithm used is free sequence retrieval (FSR). It depends solely on internal stored states. FSR is also viewed as time-series prediction.

Thus, our main objectives from an application viewpoint (ESR and SASM) will be seen to reduce to the two algorithms, GSR and FSR, appropriately applied. If the effectiveness of GSR and FSR can be demonstrated convincingly, the success of the applications follows. The GSR and FSR algorithms will be covered later.

2.1 Desired qualities of a temporal sequence processor

Let us briefly review the qualities we desire for a TSP from an engineering viewpoint. These qualities will affect how well and how completely the TSP can achieve GSR and FSR over a range of problems, that is, its versatility. First, some definitions are needed. A complex sequence is one constructed from any number and order of symbols from a permissible alphabet. We assume only complex sequences in this paper.

The depth or degree of a sequence is the minimum number of symbols preceding s_n required to predict s_n accurately. By a system having depth n we mean that any symbol s_k requiring no more than the preceding n symbols s_{k-n}, . . . , s_{k-2}, s_{k-1} to predict it can be predicted with certainty. For example, if the system has depth ≥ 8, then having learned the two sequences xabcbaahz and axbcbaahy, the terminals z and y are infallibly predicted.

Quality 1: Depth. The TSP should be capable of minimum depth n > 1; n should be a design parameter. Note that depth n for complex sequences implies the ability to count up to n repetitions of the same symbol, which is a strong requirement.

Quality 2: Temporal flexibility. The TSP should tolerate temporal scaling and warp during storage, addressing, and retrieval. Scaling refers to the absolute time units; warp is the nonuniformity of intersymbol spacing. However, it should be possible to retrieve the relative presentation length of the symbols if that information is stored.

Quality 3: Equivalence class flexibility. The TSP should be capable of flexibly adjusting the radius of the equivalence classes of stored sequences. Thus, similar sequences learned as distinct may be treated as equivalent without altering the stored data. This quality implicitly permits limited errors, such as missing or added symbols, to be either tolerated or not tolerated, depending on the objective during retrieval.

Quality 4: Minimum storage. We define the absolute storage requirement as the total number of weights needed to store a particular collection of sequences. The total number of weights required depends on the number of common subsequences in the data. In the present context, the required storage capacity is N_sym × N_t, where N_sym is the number of symbols in the alphabet and N_t is the number of symbol transitions that must be distinguished. Obviously we want to minimize this requirement.

Quality 5: Sequential content addressability. The TSP should provide for content addressability with a sequentially supplied address.¹

¹The terms retrieval and recognition are used synonymously here.


2.2 Implications of desired qualities

What exactly are the properties that the desired qualities and our application objectives imply about the TSP, or network, in general terms? A careful analysis of this question yields instructive insights on the subject of the TSP.

First, in order to simplify the addressing algorithm, it is very desirable to have only one cell, or neuron, per symbol of the alphabet. This choice defeats the potential problem of having to manage multiple active paths while searching for a unique responsive sequence.

Second, to be able to distinguish sequences such as cbbbh from cbbbbg, each transition must produce a distinct internal state; for example, the transition from the second to the third b must result in a different internal state from that produced by the transition from the third to the fourth b. This implies that the one cell identified with the symbol b must be able to resolve the difference, however it may be represented, between v and v + 1 previous occurrences of the symbol, up to some maximum depth, n. See [3] for a thorough discussion. But, third, to satisfy temporal flexibility (quality 2), the ability of a cell to distinguish between the vth and (v + 1)th previous occurrences of a symbol must not be rigidly dependent on the absolute or relative time scale.

Finally, the desire for equivalence class resolution (quality 3) can be satisfied by an acceptance window diameter that is dynamically adjustable at retrieval time, so that the transition from the vth to the (v + 1)th previous occurrence of a symbol can be either resolved or ignored according to a global control parameter.

Given these crystallized requirements for the TSP, the merits of various proposed systems can be judged for breadth of capabilities.

We will show that the TSP proposed here does satisfy all of these requirements, with some limitations on time flexibility (quality 2) and on addressing of internal subsequences (quality 5).

3. Previous work on sequence recognition by networks

For an excellent summary article on temporal sequence processing, see [4]. The conclusion regarding simple recurrent network architectures (SRNs) [5, 6] is that they are extremely limited in depth, to the degree that they are impractical for many applications. [7] concludes that there is no straightforward way for these networks to exhibit the property of counting. Our own experiments support this conclusion. The limitation on SRNs is centered on the fact that there is only one vector to represent all past contexts, which severely limits depth. In addition, there is no known design formula that relates achievable depth to architecture parameters.

The recurrent cascade correlation network (RCC) in [8] is another interesting approach to TSP that is actively constructive, as is its parent, the cascade correlation network. It learns a Reber grammar efficiently and processes the test set with a very low error rate. A major limitation of RCC is that its time delays are fixed, which limits its usefulness with time-variable inputs (see quality 2 of section 2.1).

The ability to train for depth and for variable intersymbol delay is afforded by the gamma delay arrangement explored in detail in [9]. It is difficult to compare their approach with the present work since the gamma delay assumes numerical signals, for which algebraic operations are defined, whereas we are treating the sequences as nominal symbols.

[10] also presents an interesting mechanism for retaining and distributing the memory of past events in their TEMPO 2 model. They make use of an adaptive gaussian kernel to capture history by distributing the trace of a symbol in time. The reaction-diffusion method of history retention used here is essentially a superset of the diffusion idea, since the reaction process adds to the flexibility which can be exploited.

The temporal sequence processing algorithm studied in [ ] addresses many of the same goals as the current effort. They achieve the depth objective by assuming an element with multiple terminals. Terminal 1 stores the state of the most recent input to the element, terminal 2 the second most recent external input to the element, and so on. Thus, by providing an n-terminal element representing symbol S_k, with each terminal corresponding to a distinct history of the sequence prior to k, a depth of n can be accurately stored. This design meets all of the desired qualities specified in section 2.1 except, possibly, for some limitations on one quality. Minor properties of the design that are less than ideal are the fact that a maximum depth must be specified in advance, and that the biological basis for multiple distinct terminals is not supported. But the design in [ ] squarely faces the need for sufficient storage to represent the complexity required, which is the primary problem with SRNs and most other attempts. It overcomes or ameliorates its need for prespecified maximum depth by providing a mechanism for chunking, that is, grouping subsequences iteratively into higher-level representations. We will return to a comparative discussion of the present reaction-diffusion method with the approach of [ ] in a later section.

The biological basis: Reaction-diffusion

A well-studied biological experiment consists of the surgical removal of an internal segment of a cockroach tibia followed by regrafting of the distal and proximal parts, as illustrated in Figure 3 [12]. If the original tibia consisted of a sequence of similar but not identical segments numbered 123456789, and the segments 4567 are removed, it is observed that after one or two moults the internal segment is regenerated in its original order. This experiment implies the existence of memory of the tibia segment sequence and a controlled growth process. How is this to be explained?

A quantitative explanation for this, as well as myriad other pattern growth processes (e.g., zebra stripes, leaf capillary patterns), was set forth some 40 years ago [1] in the form of a set of partial differential equations that describe the self-stabilized increase and decay of growth-stimulating morphogens in a reaction-diffusion process.


Here is an example system.

Reaction-diffusion equations:

    ∂g_p(z)/∂t = ε g_p²(z)/r(z) + D_g ∂²g_p(z)/∂z² − α g_p(z)        (1)

    ∂r(z)/∂t = γ Σ_p g_p²(z) + D_r ∂²r(z)/∂z² − β r(z)               (2)

where g_p represents the concentration of the pth morphogen that excites the growth of the pth segment of the tibia, r is the concentration of the common or global reactant, and the coefficients are constants, the Ds being diffusion coefficients.

To give a sketch of the biological meaning, assume that segment 8 emits morphogen g_7. If r = 1, g_7 will grow by positive feedback (term 1 of equation (1)) for some time, and the diffusion term containing D_g will diffuse the chemical into the segment 8–segment 3 interface, stimulating growth of segment 7 material. At distant locations, the reactant r will diffuse and suppress growth of segment 7. Segment 7 material at the interface with segment 3 triggers the emission of g_6, which causes segment 6 growth. Notice that the reactant concentration is generated in proportion to the sum of squares of the morphogens present, thus suppressing and stabilizing the process through the appearance of r in the denominator of the first term in equation (1). Thus the reaction-diffusion equations permit calculation of the growth, diffusion, and decay of each morphogen in turn, which effectively carries the history of previous activity forward in time while distributing the information spatially by diffusion. This is the natural biological process that we will simulate as the basis for the intercommunication of cells (elements) and the distribution of sequence information.
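Equations (1) and (2) can be integrated numerically with a simple explicit Euler scheme. The sketch below uses the constants reported later in the paper (D_g = D_r = 0.3, α = β = 0.3, ε = 0.2, γ = 0.1), but the grid size, time step, and initial conditions are illustrative assumptions, not the paper's simulation setup.

```python
# One morphogen g(z) and the global reactant r(z) on a 1D grid,
# integrated by explicit Euler. Parameters from the paper; the rest
# (N, dt, steps, seeding) are made-up illustration values.
Dg, Dr, alpha, beta, eps, gamma = 0.3, 0.3, 0.3, 0.3, 0.2, 0.1
N, dt, steps = 50, 0.05, 200

g = [0.0] * N          # morphogen concentration g_p(z)
g[N // 2] = 1.0        # seed a pulse in the middle of the domain
r = [1.0] * N          # global reactant r(z)

def lap(u, i):
    """Discrete Laplacian with no-flux boundaries (unit spacing)."""
    left = u[i - 1] if i > 0 else u[i]
    right = u[i + 1] if i < len(u) - 1 else u[i]
    return left - 2 * u[i] + right

for _ in range(steps):
    # growth term eps*g^2/r (self-excitation damped by the reactant),
    # diffusion, and linear decay, as in equations (1) and (2)
    g_new = [g[i] + dt * (eps * g[i] ** 2 / r[i] + Dg * lap(g, i) - alpha * g[i])
             for i in range(N)]
    r_new = [r[i] + dt * (gamma * g[i] ** 2 + Dr * lap(r, i) - beta * r[i])
             for i in range(N)]
    g, r = g_new, r_new

print(max(g), min(g))  # the seeded pulse has spread and been damped
```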

Figure 3 illustrates the experiment [13, 14]. A cockroach tibia is originally constructed as a sequence of similar but not identical segments numbered 123456789 in proximodistal order. If segments 4567 are surgically removed and the remaining portions of the tibia grafted together to form the sequence 12389, after one or two moults the missing elements are found to regenerate. Further experimental evidence shows that growth morphogens are emitted sequentially starting from the distal segment, 8, inducing growth in the order 7654.

The existence of this biological sequence reproduction, being a form of sequential memory, attracted our interest in using an analogous process in the storage and retrieval of symbol sequences. The reaction-diffusion temporal sequence processor (ReDi TSP) emerged from this observation. The primary benefit of this starting point is a well-defined, stable system for communicating conditions from one cell by a means other than discrete conductive paths (e.g., axons or wires).


Figure 3: An experiment that demonstrates the regeneration of sequential segments of cockroach tibia. Correctly ordered regeneration of segments is explained by the reaction-diffusion model. (From [12].)

The computational model

Architecturally, the ReDi TSP model consists of cells randomly distributed in two- or three-dimensional space (Figure 4). We will assume a two-dimensional geometry, without loss of generality, for presentation simplicity. Cells (Figure 4, upper left) are fixed in position. Each of the N cells is uniquely labeled C_i, i = 1, . . . , N, which is also the label of the output that the cell emits when activated.

The cells are interconnected through a diffusion-supporting medium. At any time t, and at any position (x, y), the internal signal is a vector

G(x, y, t) = (G_1, G_2, . . . , G_k, . . . , G_K),

where K is the total number of distinct symbols.² G is the symbol concentration vector at location (x, y), and G_k is the concentration of the kth symbol at (x, y).

Each cell has a number of weight registers V_i, i = 1, . . . , P. The length of each V register is K real numbers.

A cell can be activated by two means. First, the cell may act as a radial basis function cell in response to the local value of the G vector. If

    Δ_j = min_{i=1,...,P} |V_i − G(x, y)| < σ_s,

the cell is internally excited by level Δ_j. The excited cells form a competitive network resulting in a single winner, namely, the cell C_j which satisfies min_j Δ_j, being the most strongly excited cell.

Second, a cell C_M may also be excited by the external input directly, by having

²Analogous to morphogens in the biological tissue growth model.



Figure 4: Architecture of the ReDi TSP. Identical cells are distributed randomly in a three-dimensional or two-dimensional volume supporting diffusion.


where S is the input symbol and Z_M is the external input weight vector of cell M. In this case we are assuming nominal (as opposed to linear) input symbols, so that the input symbols are an orthogonal set.
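The internal (radial-basis) activation path and the winner-take-all competition just described can be sketched as follows. The cell labels, register contents, and tolerance value are made-up illustrative data, not taken from the paper.

```python
import math

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def winner(cells, G_local, sigma_s):
    """cells: {label: list of V registers}. Each cell responds with the
    distance from G(x, y) to its nearest V register (radial-basis style);
    the cell with the smallest distance below sigma_s wins, else None."""
    best_label, best_d = None, sigma_s
    for label, v_regs in cells.items():
        d = min(euclid(G_local, v) for v in v_regs)
        if d < best_d:
            best_label, best_d = label, d
    return best_label

# three cells, one V register each (hypothetical stored contexts)
cells = {"a": [[0.9, 0.1, 0.0]],
         "b": [[0.1, 0.8, 0.1]],
         "c": [[0.0, 0.1, 0.9]]}
print(winner(cells, [0.85, 0.15, 0.0], sigma_s=0.5))   # cell "a" wins
print(winner(cells, [0.85, 0.15, 0.0], sigma_s=0.01))  # nothing within tolerance
```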

The single winning cell, C_j, emits a unit impulse of its label symbol, G_j, after which the symbols react and diffuse according to the reaction-diffusion equations.

Let us briefly touch on the principal aspects of operation of the TSP. In general terms, the external input path with weights Z_M provides the means for learning the sequences originally and for guiding the sequence of activations in guided sequence retrieval operation.

The weights V_{Mj} specify the P different internal conditions when cell C_M corresponds to the next sequential input symbol. The V registers, after training, contain the memory of all learned transitions, including context, back to a depth determined by σ_s. Setting σ_s = 0 produces theoretically infinite depth, or eidetic memory, limited only by P, the number of V registers, since every unique context of a symbol is learned. As σ_s grows, similar sequence contexts are treated as equivalent, which reduces depth but also reduces the storage capacity required. The states stored in the V registers are the sole source of sequential operation during free sequence retrieval. The fuller discussion of how the desired qualities of the TSP are achieved will be presented after the complete algorithms are described.


4. Sequence storage and recognition

4.1 Training the ReDi TSP to store sequences

A training or storage sequence s_1 s_2 s_3 . . . is presented at integral time points.³ The input symbols are maximally sparse and therefore form an orthogonal set.

The number of cells is N ≥ K, where K is the number of distinct symbols in the input symbol alphabet. P is the number of V registers per cell. The V registers are the weight vectors that can respond to internal symbol vectors in the diffusion-supporting medium.

Initially all V registers are in an unassigned state. In order to simplify the algorithm statement, without loss of generality, assume cells are uniquely assigned to each symbol of the alphabet and labeled C_{s_i}. The cell assignment is accomplished by setting the external weight vector Z_i = s_i.

The Storage (Training) Algorithm

1. Set i = 1.
2. Apply s_i. The winning (and active) cell will be C_{s_i}, abbreviated AC (active cell).
3. For the AC: if

       min_j |V_j − G(x, y)| < σ_s,

   the current transition (from s_{i−1} to s_i) is already known. Else, recruit an unused V register V_j in cell C_{s_i} and set (one-shot learning)

       V_j = G(x, y).

4. Emit a unit impulse of symbol G_{s_i}.
5. Evaluate the reaction-diffusion equations for one symbol step, growing and diffusing all extant symbol concentrations. (A constant number of integration steps are used per symbol step, depending on the dimensions of the cell array.)
6. Increment i and go to step 2 until the input sequence ends.

During storage, each symbol in the applied sequence causes the emission of a symbol fluid, which diffuses spatially. The successive symbols build a vector G, which varies with both time and space. History of the sequence

³The time points can be set by the appearance of input symbols or treated as absolute time units.


Figure 5: How the TSP works. Nine cells with three V registers each are shown. Bar graphs represent the symbol concentration after the sequence addefea is learned. An activated cell traps the current symbol concentration in one of its V registers. For example, the first d traps only some G_a in its V_1 register. Each register represents one vector that can activate (fire) the cell.

is distributed throughout the volume, where it can be associated to the successor symbols. Later, during testing to detect the existence of a stored sequence, the sequence of emission and reaction-diffusion that occurred during training will be recapitulated, exactly or approximately depending on the tolerance, by guided sequence retrieval.

4.2 An example of the training algorithm in use

An example of the stored V vectors that would exist after a sequence is stored is shown in Figure 5. The sequence addefea. . . is stored.

When a first occurs, the environment is blank, so a activates cell a (through the external Z inputs, not shown), which records the current local symbol concentration in its first V register.

When the first d occurs, the symbol a has diffused to the vicinity of cell d, which becomes activated from the external inputs through its Z weight vector. Being the AC, it stores the local symbol vector, G, in its V_1 register, consisting only of some a symbol concentration.

When the second d occurs, cell d is reactivated and again stores the local G, which contains some a and some d symbol concentrations. Cell d then emits a unit impulse of d symbol.

Upon reaching the second occurrence of a in the input sequence, the current symbol concentration at the (x, y) position of cell a is stored in the second V register of the cell, which then retains the state. The second V register of cell a shows the successively smaller values for e, d, and a, which are approximately proportional (in this case⁴) to the recency of occurrence of the symbols in the input sequence.
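The worked example can be mimicked in a drastically simplified sketch: spatial diffusion is collapsed to a single shared concentration vector that decays once per symbol step. This zero-dimensional stand-in for the reaction-diffusion medium is an assumption for illustration only, but it is enough to show how V registers are recruited only for contexts not seen before.

```python
# Simplified one-shot storage: a shared decaying concentration vector
# replaces the spatial medium G(x, y). DECAY and SIGMA_S are made up.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
DECAY = 0.5
SIGMA_S = 0.01

def store(sequence, sigma_s=SIGMA_S):
    G = {c: 0.0 for c in ALPHABET}          # shared concentration vector
    v_registers = {c: [] for c in ALPHABET}
    for s in sequence:
        regs = v_registers[s]
        snapshot = dict(G)
        # step 3: is the current context within sigma_s of a stored register?
        known = any(
            sum((snapshot[c] - reg[c]) ** 2 for c in ALPHABET) ** 0.5 < sigma_s
            for reg in regs
        )
        if not known:
            regs.append(snapshot)            # recruit a V register (one-shot)
        G[s] += 1.0                          # step 4: emit a unit impulse
        for c in ALPHABET:                   # step 5: one symbol step of dynamics
            G[c] *= DECAY
    return v_registers

v = store("addefea")
print(sum(len(regs) for regs in v.values()))  # total V registers recruited
```

Every symbol of addefea arrives in a distinct context here, so each occurrence recruits a fresh register, matching the spirit of the Figure 5 discussion (though the register counts per cell differ from the figure's full spatial simulation).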

⁴Note that symbol concentrations may grow as well as decay, depending on values of the free parameters in equations (1) and (2). This allows more degrees of freedom than diffusion alone.


The actual magnitudes of the symbols in the symbol vector, G(x, y), depend on the relative position of cells in the volume as well as on the constants used in the reaction-diffusion equations. The magnitudes of symbol concentrations do not even have to decay monotonically; it is essential only that the same subsequence produce the same concentrations on repeated appearance.

Storage with σ_s = 0 is analogous to eidetic memory and requires a different V register for every unique symbol transition. This condition is very profligate with memory usage. For increasing σ_s, however, an increasing number of transitions are treated as equivalent by using an existing V vector that is close enough to a subsequence already seen. In this case, a new V vector is not required, and this reduces storage space.

We have chosen to use a one-shot training algorithm here. Incremental training, typically used in neural networks, could be used to average out variations in timing of the symbols, but we will not pursue that avenue in the current investigation.

4.3 Demonstration of depth and counting quality

The depth and counting ability of the TSP, referred to in section 2.1, was demonstrated in a simple experiment simulating only the storage algorithm. This property was demonstrated by monitoring the growth of V register usage while storing a sequence with repetitions of a single symbol. The experiment consisted of storing the sequence ddddddd. . . and noting the symbol number, n, where the storage algorithm first did not invoke a new V register to store the state. This would correspond to a depth of n − 1 as well as the ability to predict the different successor symbols for the sequences cd^{n−1}e and cd^n f, for example.

This experiment was performed with various values of tolerance, σ_s. The results are tabulated below. Unless otherwise noted, experiments were run with D_g = 0.3, D_r = 0.3, α = β = 0.3, ε = 0.2, and γ = 0.1 in equations (1) and (2).

Counting Test

    σ_s      n (largest depth)
    0.05     3
    0.04     4
    0.02     6
    0.01     10

When the recognition or retrieval algorithm, discussed later, uses the same value of σ, the storage process is recapitulated deterministically, so we can conclude that the depth during retrieval will be the same as that found by this experiment.
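The counting experiment can be imitated in a toy one-symbol version of the storage algorithm, again collapsing diffusion to a single decaying concentration (so the decay rate and tolerance values below are illustrative and not comparable to the paper's): as σ_s shrinks, the first reuse of a V register occurs later, i.e., the achievable depth grows.

```python
def first_reuse(sigma_s, decay=0.5, max_len=30):
    """Store d, d, d, ... and return the symbol number n at which no new
    V register is recruited (the paper's depth is then n - 1)."""
    conc, registers = 0.0, []
    for n in range(1, max_len + 1):
        context = conc                       # local d-concentration before emission
        if all(abs(context - reg) >= sigma_s for reg in registers):
            registers.append(context)        # unseen context: recruit a register
        else:
            return n                         # context matched: transition reused
        conc = (conc + 1.0) * decay          # emit one unit, then decay
    return None

for s in (0.3, 0.1, 0.05, 0.01):
    print(s, first_reuse(s))
```

The successive contexts converge geometrically under this toy dynamics, so a smaller tolerance resolves more of them before two become indistinguishable, mirroring the trend in the counting-test table.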


Metaphorically, larger values of σ_s correspond to paying less attention to the precise count. As the needed count or depth increases, the penalty is an increase in the required storage capacity, the total number of occupied V registers in the TSP.

4.4 The network in embedded sequence recognition mode

Suppose many short sequences are stored in a ReDi TSP (as in Figure 1). Let a long argument sequence be presented beginning at its kth symbol, S_k, for the purpose of detecting whether the argument subsequence is stored in the network. This is guided sequence retrieval; we are using the known external sequence to guide the search for its stored equivalent.

It is assumed that (1) we want to search for the entire stored short string and (2) the period of symbol presentation is the same as during the storage operation. σ_T is the tolerance measure during test operations, setting the radius of acceptance of matches between the stored and argument sequences.

The algorithm is as follows.

Guided Sequence Retrieval

1. Set the argument string pointer k = 1. Clear all symbol concentrations, G(x, y), to zero.
2. Using the external input S_k, activate the cell C_{S_k} whose label is G_{S_k}. If none, terminate with failure.

   If there exists a V register of C_{S_k} such that

       |G(x, y) − V_i| ≤ σ_T,

   then C_{S_k} becomes the new AC, and the transition S_{k−1} → S_k in the argument sequence is a known (previously encountered) transition. Emit one unit of S_k at the location of C_{S_k}, and perform one symbol step of the reaction-diffusion equations. Else, if no cell satisfies the inequality, then the external sequence is known only from S_1 to S_{k−1}. Terminate the test with "unknown sequence."
3. If a known transition occurred, increment k and return to step 2, iterating until an end mark of the stored sequence is encountered, implying a known sequence.

The foregoing algorithm of the test phase recapitulates the training steps in the sense that the symbol conditions present at storage are revivified during the successful matching of the argument sequence to an originally stored sequence. Notice, however, that we are permitting a different activation tolerance measure, σ_T, than we used during training. As σ_T rises from 0, we are requiring increasingly less strict matching of the symbol concentrations with the original conditions during storage of the sequence. The interesting feature here is that the precision of matching the external sequence to the internal sequence is controllable during recognition.
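A companion sketch of guided sequence retrieval, under the same zero-dimensional simplification used earlier (a single decaying concentration vector in place of the spatial medium; the dynamics and tolerances are illustrative, not the paper's): because retrieval deterministically recapitulates storage, a stored string and its prefixes match, while a perturbed string fails at the first unknown transition.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
DECAY = 0.5

def dist(u, v):
    return sum((u[c] - v[c]) ** 2 for c in ALPHABET) ** 0.5

def blank():
    return {c: 0.0 for c in ALPHABET}

def step(G, s):
    G[s] += 1.0                  # emit one unit of symbol s
    for c in ALPHABET:
        G[c] *= DECAY            # one symbol step of the simplified dynamics

def store(seq, sigma_s=0.01):
    G, V = blank(), {c: [] for c in ALPHABET}
    for s in seq:
        snap = dict(G)
        if not any(dist(snap, reg) < sigma_s for reg in V[s]):
            V[s].append(snap)    # recruit a V register for an unseen context
        step(G, s)
    return V

def gsr(arg, V, sigma_t=0.01):
    """Guided sequence retrieval: does arg recapitulate known transitions?"""
    G = blank()                  # step 1: clear all concentrations
    for s in arg:
        if not any(dist(G, reg) <= sigma_t for reg in V[s]):
            return False         # unknown transition: terminate
        step(G, s)               # known transition: emit and advance
    return True

V = store("addefea")
print(gsr("addefea", V), gsr("adde", V), gsr("adef", V))
```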

Figure 6: Transition diagram for the Reber grammar and the embedded Reber grammar. Only the penultimate symbol is predictable, given the second symbol.

4.5 An embedded sequence recognition experiment

Embedded sequence recognition provides a testbed to demonstrate how equivalence class flexibility and storage efficiency are traded off.

A finite-state Reber grammar has been used as a test for depth and time-series prediction in recurrent neural nets. The generated sequences all have the second and penultimate symbols in a deterministic relationship, generating sequences such as TXPSSTTTXT or PVTPSP (Figure 6). All interior sequence positions, other than the penultimate, correspond to transitions having an equal probability of either of two symbols. The average sentence length generated is about 10 symbols, with a maximum of about 30 symbols in a sample of 500 sentences.

A ReDi TSP with 25 cells in a 5 x 5 grid was defined. Five hundred strings were generated and applied to the TSP in the training (or storage) mode. Since the grammar uses only 7 symbols, there were 18 unused cells.

Varying the storage tolerance measure, σs, demonstrates the range of generalization possible during storage. In the limit as σs -> 0, every transition with a unique history is stored as a unique transition, consuming one V register. The first appearance of a distinct transition is signaled as an unknown transition, which requires recruitment of a V register. As σs increases, some transitions are treated as equivalent by the system. For example, when σs = 0.05, bpptttvvpe is stored as equivalent to bppttvvpe. Higher values of σs result in generalization, or enlargement of the equivalence classes, eventually to the point that every transition is context-free.
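The recruitment rule just described can be sketched as follows. This is an illustrative reconstruction (the register data structure is our assumption, not taken from the paper): a new V register is consumed only when the current concentration vector falls outside σs of every register the cell already holds.

```python
import numpy as np

def store_transition(local_g, registers, sigma_s):
    """Store-time rule for one cell: recruit a new V register unless the
    current concentration vector local_g matches a stored register within
    sigma_s.  Returns True when a register is recruited, i.e. when the
    transition (in this context) was previously unknown."""
    for v in registers:
        if np.linalg.norm(local_g - v) <= sigma_s:
            return False                  # equivalent context: reuse register
    registers.append(local_g.copy())      # unique context: recruit a register
    return True
```

As σs -> 0 every distinct context recruits its own register (maximal depth, maximal storage); a larger σs merges nearby contexts into one equivalence class, which is how bpptttvvpe and bppttvvpe become equivalent at σs = 0.05.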

Similarly, increased values of the activation tolerance during retrieval, σT, permit more tolerance in the affirmative decision that a particular transition, with its (possibly) extensive history, is permissible.

The accuracy as a function of σT is plotted in Figure 7 for σs fixed at 0.0001. Cases where σs > σT have relatively little meaning.


Figure 7: Accuracy of predicting embedded Reber grammar sentences as a function of σT, the test-phase tolerance (σs = 0.0001).

Figure 8: Number of activation condition registers (V registers) used as a function of σs, the storage activation tolerance. Symbol transitions totaled 4492.

To retrieve a sequence, a randomly generated Reber sentence is presented symbol-sequentially, and matching is attempted by the guided sequence retrieval algorithm. The retrieval is counted as accurate only if the penultimate symbol transition in the sentence is found to be a known transition. The number of activation condition registers (V registers) required as a function of σs is displayed in Figure 8. This number is a function only of the storage tolerance, σs, not the retrieval tolerance.

Footnote 5: The a priori probability is 0.50.


Table 1: False-positive rate (percent) for recognition of embedded Reber grammar sentences.

    σs        σT = 0.0001   σT = 0.001   σT = 0.005
    0.0001    16.4          25.8         75.6
    0.001     15.8          25.8         75.6
    0.005      8.0           7.0         71.0

Under the condition that practically every distinct transition in the training set is known (σs = 0.0001), the retrieval accuracy for 500 newly generated strings was found to vary from 91 percent with σT = 0.001 to 98.6 percent with σT = 0.008, a relatively relaxed matching requirement.

The storage efficiency is suggested by the fact that, for the total of 4501 symbol transitions in the 500 stored sequences, the required number of V registers ranged from 872 with σs = 0.0001 to 476 with σs = 0.005.

To study the ability of the TSP to reject false sentences, we trained the TSP as before, storing embedded Reber sentences containing the T…T and P…P deterministic correlation between the second and penultimate symbols. For the GSR (or recognition) stage, however, the embedded Reber grammar source was reversed in the penultimate symbol only, guaranteeing that none of the test sequences had been seen completely during training, but that all sequences were grammatically identical up to the penultimate symbol. Upon performing the experiment with this modification, all recognized sentences (false knowns) corresponded to false positives, as indicated in Table 1.

When σT is small, the narrow acceptance window during retrieval easily blocks acceptance of most nonstored cases, lowering the false-positive rate to the 8 to 16 percent range. With a wider acceptance window, however, the false-positive rate soars. These results suggest a good compromise in the vicinity of σs = σT = 0.001, although this choice would interact with the free constants used in the reaction-diffusion equations.

4.6 Demonstration of tolerance to warp

In the preceding tests, one step per symbol was used during both storage and testing. This condition permits the symbol concentrations appearing during GSR steps to reproduce exactly the same sequence of symbol concentrations, G(t), as the network experienced during storage, assuming the tolerances σs and σT are equal. The desirability of tolerating intersymbol time variations, or warp (noted as quality 2 in section 2), is obvious, particularly if the original source of the symbols is a biological one, with its typical random property.

Consider the effects of warp in the ReDi TSP system. Let the spacing between symbols Sn and Sn+1 be Δt during training. Next consider the introduction of 50 percent warp during recognition by GSR. After matching Sn, the test for Sn+1 occurs (1.5)Δt seconds later. The additional 0.5Δt seconds


Table 2: Percentage accuracy in predicting the penultimate symbol in an embedded Reber sentence when warp is introduced before the critical symbol. All sentences were stored with σs = 0.0001. σT is the test (or retrieval) tolerance. The term 100% warp means that the time between the test symbol and its predecessor is twice as long as during storage. When retrieval is attempted at wider tolerances, for example σT = 0.005, the accuracy is still excellent up to 200% warp.

    σT       No Warp   100% Warp   200% Warp
    0.001    91.0      69.6        47.8
    0.003    92.8      77.6        63.6
    0.005    96.4      95.0        93.2
    0.008    98.6      98.8        99.0
    0.01     99.6      99.6        99.8

will result in a difference of δG between the current G and the value stored for the transition Sn -> Sn+1 in the same context. The magnitude of the error, ||δG||, will depend on the free parameters in the reaction-diffusion equations, which are not linear. But ||δG|| will increase, in general, with warp. When ||δG|| > σT, the warp will have exceeded the ability of the TSP to recognize the transition Sn -> Sn+1 as valid.

An experiment was conducted to measure the relationship of the error to warp. The sequences generated by the embedded Reber grammar generator were stored using one step per symbol. During retrieval, however, two or three times as many symbol steps were applied for the transition to the penultimate symbol, effecting a warp of 100 to 200 percent between the nth and (n-1)th symbols. The correct or incorrect recognition (known or unknown) of the (n-1)th symbol was the error criterion. The retrieval error rate was measured as a function of the tolerance, σT (see Table 2). As σT increases, we permit an increasing tolerance for acceptance of the comparison of a stored sequence with the external sequence.
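The effect being measured can be illustrated with a toy stand-in for the reaction-diffusion step (a plain exponential decay; the numbers are ours, chosen only to show the mechanism): warp applies extra steps before the test, so the field drifts away from the stored register by ||δG||, and acceptance depends on whether that drift stays within σT.

```python
import numpy as np

# Toy decay-only stand-in for one reaction-diffusion symbol step.
decay = lambda G: 0.9 * G

def transition_known(g_local, v, sigma_t):
    """Accept the transition while the drift stays within the tolerance."""
    return np.linalg.norm(g_local - v) <= sigma_t

g0 = np.array([0.5, 0.2])       # concentrations when the symbol was stored
v = decay(g0)                   # stored V register: one step after g0

no_warp = decay(g0)             # tested one step later, as during storage
warp100 = decay(decay(g0))      # 100% warp: twice the intersymbol interval

assert transition_known(no_warp, v, 0.001)        # exact timing: accepted
assert not transition_known(warp100, v, 0.001)    # tight sigma_t rejects warp
assert transition_known(warp100, v, 0.1)          # relaxed sigma_t accepts it
```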

The results show the error to be rather high when the retrieval tolerance is tight (σT <= 0.001). Relaxation of σT largely overcomes the warp but decreases depth and the corresponding resolution of sentence distinctions. Somewhat greater warp tolerance might be obtained with the use of slower learning rather than one-shot learning. This is one of many trade-offs that can be juggled by the choice of storage and test tolerances.

4.7 Sequentially addressed sequence memory

We defined the SASM mode in section 2 as the case in which the stored sequences are relatively long and the content address is sequentially presented. The responsive sequence(s) are then retrieved. Addressing follows

Footnote 6: Four integrations per symbol step were used. This permitted each symbol's effect to reach anywhere in a 5 x 5 array of cells.


the guided sequence retrieval (GSR) algorithm, leading the TSP to recapitulate the training steps and set up the symbol concentrations that occurred during training. Subsequent retrieval of the responsive sequence is performed by free sequence retrieval (FSR).

In FSR, there is no external symbol input. At the end of GSR (the addressing operation), the symbol concentration, G(x,y), has been established, and the processor then follows FSR, which is a sequence of cell activations determined solely by global-maximum activation (global WTA), analogous to a falling-domino pattern.

The FSR algorithm follows. Its basic simplicity is complicated by the rather stringent demands we will place on it by choice of task, which will be discussed shortly.

Free Sequence Retrieval Algorithm

1. Find the winning cell (most highly activated globally), C_i, corresponding to

   min over k of ||V_k - G(x,y)|| = MIN,

   where k extends over all V registers in all cells and MIN is the global minimum.

2. Tentative path A: Emit a unit of symbol S_Ci at (x,y) and diffuse one symbol step. Find the most highly activated cell, C_ja, and the corresponding minimum activation, MIN_A.

3. Tentative path B: Diffuse one step without an Emit. Find the most highly activated cell, C_jb, and the corresponding minimum activation, MIN_B.

4. Select path A if MIN_A < MIN_B and MIN_A < σT. Otherwise, select path B if MIN_B < MIN_A and MIN_B < σT.


The sequences were addressed by GSR up to the "|", and the remainder of the sequences were retrieved by the FSR algorithm. The objective was to test the ability of the TSP to distinguish the F FF case from the F case based on the initial symbol of the address, or G.

The retrieved strings were exactly as stored. The sequence of G vectors experienced during retrieval was exactly the same as occurred during storage, since σs was small enough to give a depth of more than 4, the distance from the last unique symbol to the point where the sequences diverge. In this case, not only does the difference among the initial four symbols of each sequence predict the fifth symbol correctly, but the more difficult task of distinguishing F FF from F is achieved. Note that there is no problem due to the common C subsequences here or, in general, as long as the effect of the first symbol is not confounded by the depth being too small (i.e., by σs being too large).

Test 2: Variable-length symbols in the address

Suppose the same number of appearances of a symbol occurs in two addresses, but the time lengths of symbol appearance are different. Can the TSP avoid confusion? To test this, we store the following two sequences.

A B C F F-2 F | E D C
A B E F F F F | G F E

Again, using σs = 0.0001, which is small enough to ensure that the context of each transition is uniquely represented, there was no problem at all in retrieving the correct sequence following the "|" by FSR. All parameters were as in test 1, except that Dg = 0.3 proved to be a better choice.

Test 3: Sequences with long common subsequences

Finally, we stored sequences having common subsequences to test the ability of the TSP to avoid confounding the sequence during retrieval. The stored sequences, with the address sections shown, are

F A-4 | B D E F G B C D-5
F-2 D | E F G A-3 B-4
F-3 C | E F G A-2 B-3
B-2 C-3 E-2 D E F C-3 B-2 A C
B-2 C-3 E-2 D E C-4 B-2 C A
B-3 C-3 E-2 D E C-2 E-4

The common subsequences were in the free-retrieved section in some cases and in the address section in other cases. Using the same parameters as before, with the exception that Dg = 0.2, all sequences could be retrieved exactly, meaning both the symbols (tones) and their duration.

4.8 Comments on the FSR algorithm

We can now appropriately discuss the complication in the FSR algorithm involving the need to test two possible actions at each step (see steps 2 and 3


in the FSR algorithm, section 4.7). If one permitted only sentences having a single symbol per step, the simple rule of the emission of a unit of a symbol on each transition would be adequate. The need for the test arises from the fact that we want to distinguish cases of held notes versus reattacked notes. For example, in the subsequences ...C-4 B... and ...C B..., each step through the region of Cs requires testing whether the active cell supports a path corresponding to (Emit + Diffuse) = reattacked note or a path corresponding to (Diffuse only) = held note. If the sequences were restricted to only an attacked note for each symbol transition, the test of alternative possibilities seen in the FSR algorithm would be unnecessary. But with the allowance of successive appearances of the same symbol, either held or reattacked, the way to distinguish the preferred path is by testing the two possibilities corresponding to path A and path B in the FSR algorithm.
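The two-path test can be sketched in Python as follows; the register representation (a list of (cell, V) pairs) and the precomputed candidate fields are our assumptions for illustration, with the reaction-diffusion update itself elided.

```python
import numpy as np

def global_winner(field_at_cells, registers):
    """Global WTA: the V register (over all cells) minimizing ||V - G(cell)||.
    field_at_cells : cell id -> local concentration vector
    registers      : list of (cell id, V vector)
    Returns (minimum distance, winning cell id)."""
    best_d, best_cell = float('inf'), None
    for cell, v in registers:
        d = np.linalg.norm(field_at_cells[cell] - v)
        if d < best_d:
            best_d, best_cell = d, cell
    return best_d, best_cell

def choose_path(field_a, field_b, registers, sigma):
    """Steps 2-4 of FSR: field_a is the field after Emit + Diffuse
    (reattacked note), field_b after Diffuse only (held note)."""
    min_a, cell_a = global_winner(field_a, registers)
    min_b, cell_b = global_winner(field_b, registers)
    if min_a < min_b and min_a < sigma:
        return 'A', cell_a                # reattacked note
    if min_b <= min_a and min_b < sigma:
        return 'B', cell_b                # held note
    return None, None                     # no admissible continuation
```

When neither tentative path brings any register within the tolerance, retrieval terminates, mirroring the end of a stored sequence.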

Also note that the TSP does not support addressing that begins at an internal subsequence of a stored string. The reason for this limitation is that an approximate value of the G vector at the internal symbol where addressing begins would have to be known in order to continue the GSR correctly. For addressing with an initial substring, the addressing always begins with G = 0, and the GSR develops the complex distribution of symbol concentrations.

5. Discussion

We have proposed and studied a temporal sequence processing system that uses the reaction-diffusion process to provide interconnection of all cells and convey memory of past events.

The cells are provided with a multiplicity of inputs that are capable of activation due to the symbol concentration at the location of the cell, producing a cell like a multi-input radial basis function cell. Reaction-diffusion provides for controlled growth and decay processes, rather than just decay and diffusion, for the concentrations, which adds another dimension of possibilities to the sequence representations contained in the TSP and is a unique feature. Other designs having similar objectives [10, 11] limit themselves to monotonic decay of memory of past events.

The experimental study of the ReDi TSP was keyed to five qualities, which are, briefly: depth, flexibility of equivalence class representation, warp tolerance, minimum storage, and content addressability. Various combinations of these qualities are preferred for various applications.

The present system permits any depth desired, but at an increasing cost in storage capacity for increasing depth. A unique feature of the system is that it allows flexible class equivalence at retrieval time. Once the sequences are stored with a specific degree of uniqueness of the transitions, they may be retrieved (or recognized) using the same or a lesser degree of stringency in symbol matching.

Storage efficiency, in relative terms, is controllable by a single parameter, σs, which sets the radius in symbol concentration space within which sequence transitions are recognized and stored as unique. Thus, storage efficiency and its inverse, depth, are easily controlled by selection of the parameter σs.

The system shows a moderate amount of warp tolerance, but this is achieved at the cost of accuracy in identifying the stored sequence.

Two principal tasks were examined for the purpose of evaluating the system: embedded sequence recognition (ESR) and sequentially addressed sequential memory (SASM). In ESR, stored short sequences are examined for their occurrence in a longer argument string. Both tasks are accomplished by combinations of the more fundamental algorithms, guided and free sequence retrieval (GSR and FSR).

For the ESR problem, the external sequence guides the inquiry (addressing), which is, operationally, guided sequence retrieval. This algorithm was tested using an embedded Reber grammar to generate short sequences for storage. Other sequences were applied as arguments that, although not necessarily physically longer, provided an equivalent test to the case of an extensive argument string. Successful events were those that correctly reached and predicted the penultimate symbol. This test was fully successful, reaching accuracies up to 98 percent for particular values of σs and σT. A test of false-positive responses showed that the accuracies attributed to true-positive cases were largely valid.

For the SASM problem, the external sequences are short and the internal sequences are long. The external sequence is used as an address, applied by the GSR algorithm, and the remainder of the internally stored string is retrieved by FSR, in a method analogous to following the lowest-energy path from the point where the address ends. We confined ourselves to addressing from the initial subsequences of the stored sequences only; addressing beginning internal to the stored sequences is not possible, in general, with the TSP.(8)

The TSP has no problem at all performing the SASM task for ordinary complex sequences that are stored and retrieved with one symbol per time step. In order to demonstrate the greater capabilities of the system, we introduced and stored simple melody-like sequences, requiring that both the symbols and their time durations be correctly retrieved. This experiment was also fully successful, although it required an unwanted complication in the FSR algorithm in the form of a test-and-choose procedure to correctly identify the held versus the reattacked case when the same symbol recurs.
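As an aside on notation: in the melody tests of section 4.7, a token such as F-2 denotes (on our reading, which is an assumption) the symbol F held for two symbol steps. A small parser sketch:

```python
def expand_melody(tokens):
    """Expand duration-annotated tokens ('F-2' = F for 2 steps,
    bare 'F' = 1 step) into one symbol per time step."""
    out = []
    for tok in tokens.split():
        sym, _, dur = tok.partition('-')
        out.extend([sym] * (int(dur) if dur else 1))
    return out
```

Note that after expansion, a held note and the same note reattacked are indistinguishable as a bare symbol stream; recovering that distinction is precisely why the FSR algorithm must test the Emit + Diffuse path against the Diffuse-only path.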

Comparing the TSP to the method presented in [11], their approach appears to be more tolerant to warp than ours. Their approach requires preliminary knowledge of the necessary depth before commencing storage of sequences. The feature of the TSP method that allows adjustment of the recognition tolerance during the recognition phase is not supported, as far as we can determine, in the method of [11]. The important property of chunking, however, is one we have not yet addressed, although we do not see any inherent difficulty in extending the TSP to perform chunking.

Footnote 8: A modified system that permits addressing from internal sequences has been studied by us and will be presented elsewhere.


References


[1] Turing, Alan, "The Chemical Basis of Morphogenesis," Phil. Trans. of the Royal Soc. of London, Ser. B 237 (1952) 37-72.

[2] Ray, S.R. and Gyaw, T.A., "Universal Multichannel Signal Interpretation Using Connectionist Architecture," in Proceedings of Artificial Neural Networks in Engineering Conf., Vol. 4 (ASME Press, New York, 1994).

[3] Cleeremans, Axel, Mechanisms of Implicit Learning (MIT Press, Cambridge, MA, 1993).

[4] Mozer, Michael C., "Neural Net Architectures for Temporal Sequence Processing," in Predicting the Future and Understanding the Past, edited by A. Weigend and N. Gershenfeld (Addison-Wesley, Redwood City, CA, 1993).

[5] Elman, Jeffrey, "Finding Structure in Time," Cognitive Science 14 (1990) 179-211.

[6] Jordan, M.I., "Attractor Dynamics and Parallelism in a Connectionist Sequential Machine," in Proceedings of the 8th Annual Conference of the Cognitive Science Society (1987), 531-546.

[7] Cummings, F., "Representation of Temporal Patterns in Recurrent Neural Networks," in Proceedings of the 15th Annual Conference of the Cognitive Science Society (1993), 377-382.

[8] Fahlman, S.E., "The Recurrent Cascade-Correlation Architecture," in Advances in Neural Information Processing Systems 3 (NIPS-3), Proceedings of the 1990 Conference, held at Denver on November 26-29, 1990, edited by Lippmann, Moody, and Touretzky (Morgan Kaufmann, Palo Alto, CA, 1990).

[9] deVries, B. and Principe, J.C., "The Gamma Model: A New Neural Net Model for Temporal Processing," Neural Networks 5 (1992) 565-576.

[10] Bodenhausen, Ulrich and Waibel, Alex, "Learning the Architecture of Neural Networks for Speech Recognition," IEEE Proceedings of the ICASSP 1 (1991) 117-124.

[11] Wang, DeLiang and Arbib, Michael, "Timing and Chunking in Processing Temporal Order," IEEE Transactions on Systems, Man, and Cybernetics 23 (1993) 993-1009.

[12] Meinhardt, Hans, Models of Biological Pattern Formation (Academic Press, 1982).

[13] Bohn, H., "Interkalare Regeneration und segmentale Gradienten bei den Extremitäten von Leucophaea-Larven (Blattaria). III. Die Herkunft des interkalaren Regenerats," Wilhelm Roux Arch. 167 (1971) 209-221.

[14] French, V., "Leg Regeneration in the Cockroach, Blattella germanica. I. Re