
Research Article

A Self-Organizing Incremental Spatiotemporal Associative Memory Networks Model for Problems with Hidden State

Zuo-wei Wang

Department of Computer Science and Software, Tianjin Polytechnic University, Tianjin 300387, China

Correspondence should be addressed to Zuo-wei Wang; wangzuowei@126.com

Received 31 May 2016; Revised 23 July 2016; Accepted 27 July 2016

Academic Editor: Manuel Graña

Copyright © 2016 Zuo-wei Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Identifying the hidden state is important for solving problems with hidden state. We prove that any deterministic partially observable Markov decision process (POMDP) can be represented by a minimal, looping hidden state transition model and propose a heuristic algorithm for constructing this state transition model. A new spatiotemporal associative memory network (STAMN) is proposed to realize the minimal, looping hidden state transition model. STAMN utilizes neuroactivity decay to realize short-term memory, connection weights between different nodes to represent long-term memory, and presynaptic potentials and a synchronized activation mechanism to complete identifying and recalling simultaneously. Finally, we give empirical illustrations of the STAMN and compare the performance of the STAMN model with that of other methods.

1. Introduction

The real environment in which agents operate is generally an unknown environment containing partially observable hidden states, as the large partially observable Markov decision process (POMDP) and hidden Markov model (HMM) literatures attest. The first problem in solving a POMDP is identifying the hidden states. In many papers, the method of using a k-step short-term memory to identify hidden states has been proposed. The k-step memory is generally implemented through tree-based models, finite state automata, and recurrent neural networks.

The most classic tree-based model is the U-tree model [1]. This model is a variable-length suffix tree model; however, it can only obtain task-related experiences rather than general knowledge of the environment. A feature reinforcement learning (FRL) framework [2, 3] has been proposed, which considers maps from the past observation-reward-action history to an MDP state. Nguyen et al. [4] introduced a practical context tree search algorithm for realizing the MDP. Veness et al. [5] introduced a new Monte-Carlo tree search algorithm integrated with the context tree weighting algorithm to realize general reinforcement learning. Because the depth of the suffix tree is restricted,

these tree-based methods cannot efficiently handle long-term dependent tasks. Holmes and Isbell Jr. [6] first proposed looping prediction suffix trees (LPST) in the deterministic POMDP environment, which can map long-term dependent histories onto a finite LPST. Daswani et al. [7] extended the feature reinforcement learning framework to the space of looping suffix trees, which is efficient in representing long-term dependencies and performs well in stochastic environments. Daswani et al. [8] introduced a squared Q-learning algorithm for history-based reinforcement learning; this algorithm used a value-based cost function. Another similar work is by Timmer and Riedmiller [9], who presented the identify-and-exploit algorithm to realize reinforcement learning with history lists, a model-free reinforcement learning algorithm for solving POMDPs. Talvitie [10] proposed temporally abstract decision trees for learning partially observable models. These k-step memory representations based on multidimensional trees require additional computation models, resulting in poor time performance and more storage space. These models also have poor tolerance to faults and noise because of the exact matching of each item.

More related to our work, finite state automata (FSA) have been proved to approximate the optimal policy on belief

Hindawi Publishing Corporation, Computational Intelligence and Neuroscience, Volume 2016, Article ID 7158507, 14 pages. http://dx.doi.org/10.1155/2016/7158507


states arbitrarily well. McCallum [1] and Mahmud [11] both introduced incremental search algorithms for learning probabilistic deterministic finite automata, but these methods learn extremely slowly and carry some other restrictions. Other scholars use recurrent neural networks (RNN) to acquire memory capability. A well-known architecture for RNN is Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber [12]. Deep reinforcement learning (DRL) [13] was first proposed by Mnih et al., using deep neural networks to capture and infer hidden states, but this method still applies only to MDPs. Recently, deep recurrent Q-learning was proposed [14], where a recurrent LSTM model is used to capture the long-term dependencies in the history. Similar methods were proposed to learn hidden states for solving POMDPs [15, 16]. A hybrid recurrent reinforcement learning approach that combines supervised learning with RL was introduced to solve customer relationship management [17]. These methods can capture and identify hidden states in an automatic way. However, because these networks use shared weights and a fixed structure, it is difficult for them to achieve incremental learning. These networks are suited to spatiotemporal pattern recognition (STPR), which is the extraction of spatiotemporal invariances from the input stream. For temporal sequence learning and recalling in a more exact fashion, such as trajectory planning, decision making, robot navigation, and singing, special neural network models for temporal sequence learning may be more suitable.

Biologically inspired associative memory networks (AMN) have shown some success in this temporal sequence learning and recalling. These networks are not limited to a specific structure and realize incremental sequence learning in an unsupervised fashion. Wang and Yuwono [18] established a model to recognize and learn complex sequences, which is also capable of incremental learning but requires different identifiers to be provided artificially for each sequence. Sudo et al. [19] proposed the self-organizing incremental associative memory (SOIAM) to realize incremental learning. Keysermann and Vargas [20] proposed a novel incremental associative learning architecture for multidimensional real-valued data. However, these methods cannot address temporal sequences. Using a time-delayed Hebb learning mechanism, a self-organizing neural network for learning and recall of complex temporal sequences with repeated items and shared items is presented in [21, 22], which was successfully applied to robot trajectory planning. Tangruamsub et al. [23] presented a new self-organizing incremental associative memory for robot navigation, but this method only deals with simple temporal sequences. Nguyen et al. [24] proposed a long-term memory architecture characterized by three features: hierarchical structure, anticipation, and one-shot learning. Shen et al. [25, 26] provided a general self-organizing incremental associative memory network. This model not only learned binary and nonbinary information but also realized one-to-one and many-to-many associations. Khouzam [27] presented a taxonomy of temporal sequence processing methods. Although these models realize heteroassociative memory for complex temporal sequences, the memory length k is still decided by the designer and cannot vary in a self-adaptive fashion.

Moreover, these models are unable to handle complex sequences with looped hidden states.

The rest of this paper is organized as follows. In Section 2, we introduce the problem setup, present the theoretical analysis for a minimal, looping hidden state transition model, and derive a heuristic constructing algorithm for this model. In Section 3, the STAMN model is analyzed in detail, including its short-term memory (STM), long-term memory (LTM), and the heuristic constructing process. In Section 4, we present detailed simulations and analysis of the STAMN model and compare its performance with that of other methods. Finally, a brief discussion and conclusion are given in Sections 5 and 6, respectively.

2. Problem Setup

A deterministic POMDP environment can be represented by a tuple E = ⟨S, A, O, f_s, f_o⟩, where S is the finite set of hidden world states, A is the set of actions that can be taken by the agent, O is the set of possible observations, f_s is a deterministic transition function f_s: S × A → S, and f_o is a deterministic observation function f_o: S × A → O. In this paper we only consider the special observation function f_o: S → O that depends solely on the state s. A history sequence h is defined as a sequence of past observations and actions o_1, a_1, o_2, a_2, ..., a_(t-1), o_t, which can be generated by the deterministic transition function f_s and the deterministic observation function f_o. The length |h| of a history sequence is defined as the number of observations in this history sequence.
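As a minimal sketch (the two-state environment, its action names, and the `rollout` helper below are illustrative assumptions, not from the paper), a deterministic POMDP E = ⟨S, A, O, f_s, f_o⟩ and the generation of a history sequence can be written as:

```python
# A deterministic POMDP: f_s and f_o are plain dictionaries, so a history
# sequence o_1, a_1, o_2, ..., a_(t-1), o_t is produced by a simple rollout.
f_s = {("s1", "a1"): "s2", ("s1", "a2"): "s1",
       ("s2", "a1"): "s1", ("s2", "a2"): "s2"}
f_o = {"s1": "o1", "s2": "o2"}  # observation depends solely on the state

def rollout(start, actions):
    """Generate the history sequence for an action sequence from `start`."""
    s, history = start, [f_o[start]]
    for a in actions:
        s = f_s[(s, a)]          # deterministic transition
        history += [a, f_o[s]]   # record action taken and observation seen
    return history

h = rollout("s1", ["a1", "a2", "a1"])
print(h)                 # ['o1', 'a1', 'o2', 'a2', 'o2', 'a1', 'o1']
print(len(h) // 2 + 1)   # |h| = number of observations = 4
```

The history alternates observations and actions, starting and ending with an observation, so |h| counts only the observation items.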

The environment that we discuss is deterministic, and the state space is finite. We also assume the environment is strongly connected. Although the environment is deterministic, it can be highly complicated and nondeterministic at the level of observation. The hidden state can be fully identified by a finite history sequence in a deterministic POMDP, which is proved in [2]. Several notations are defined as follows.

trans(h, a) = {o | o is a possible observation following h by taking action a}. γ(s_i, d_ij) denotes the observation sequence o_1, o_2, ..., o_n generated by taking the action sequence d_ij from state s_i.
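As an illustrative sketch (the `trans` helper and the sample database below are assumptions, not the paper's implementation), trans(h, a) can be estimated from a set of recorded history sequences by scanning for occurrences of h followed by action a:

```python
def trans(history_db, h, a):
    """Observations seen to follow history h after taking action a,
    collected over a database of recorded history sequences."""
    outs, n = set(), len(h)
    for seq in history_db:
        # Histories alternate o, a, o, ...; candidate matches start at
        # even (observation) positions.
        for i in range(0, len(seq) - n - 1, 2):
            if tuple(seq[i:i + n]) == tuple(h) and seq[i + n] == a:
                outs.add(seq[i + n + 1])
    return outs

db = [["o1", "a1", "o2", "a1", "o1"]]
print(trans(db, ("o1",), "a1"))  # {'o2'}
print(trans(db, ("o2",), "a1"))  # {'o1'}
```

In a deterministic POMDP, trans(h, a) is a singleton whenever h identifies a single hidden state, which is exactly the property the criteria below exploit.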

Our goal is to construct a minimal, looping hidden state transition model by use of sufficient history sequences. First, we present the theoretical analysis showing that any deterministic POMDP environment can be represented by a minimal, looping hidden state transition model. Then we present a heuristic constructing algorithm for this model. The corresponding definitions and lemmas are proposed as follows.

Definition 1 (identifying history sequence). An identifying history sequence h_i is one that uniquely identifies the hidden state s_i. In the rest of this paper, the hidden state s_i is regarded as equivalent to its identifying history sequence h_i, so s_i and h_i can replace each other.

A simple example of a deterministic POMDP is illustrated in Figure 1. We conclude easily that an identifying history


Figure 1: A simple example of deterministic POMDP (states s_1, s_2, s_3; actions a_1, a_2; observations f_o(s_1) = o_1, f_o(s_2) = o_2, f_o(s_3) = o_1; transition arrows not recoverable from the text).

sequence h_2 for s_2 is expressed by o_2, and an identifying history sequence h_1 for s_1 is expressed by o_2, a_1, o_1, a_1, o_1. Note that there may exist infinitely many identifying history sequences h_i for s_i because the environment is strongly connected, and there may exist unboundedly long identifying history sequences h_i for s_i because of uninformative looping. This leads us to determine the minimal identifying history sequence length k.
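This can be checked mechanically. The transition arrows of Figure 1 were lost in extraction, so the transition table below is an assumed completion consistent with the identifying sequences quoted above; the consistency filter itself is a straightforward sketch:

```python
# Figure 1 observations; the transition table is an assumed completion.
f_o = {"s1": "o1", "s2": "o2", "s3": "o1"}
f_s = {("s1", "a1"): "s2", ("s2", "a1"): "s3", ("s3", "a1"): "s1",
       ("s1", "a2"): "s1", ("s2", "a2"): "s2", ("s3", "a2"): "s3"}

def possible_states(history):
    """Hidden states the agent could occupy after seeing `history`
    (a list o_1, a_1, ..., o_t); a singleton means h is identifying."""
    ends = set()
    for s0 in f_o:
        s, ok = s0, (f_o[s0] == history[0])
        for i in range(1, len(history), 2):       # (action, observation) pairs
            s = f_s[(s, history[i])]
            ok = ok and f_o[s] == history[i + 1]
        if ok:
            ends.add(s)
    return ends

print(sorted(possible_states(["o2"])))                          # ['s2']
print(sorted(possible_states(["o1"])))                          # ['s1', 's3']: o1 is aliased
print(sorted(possible_states(["o2", "a1", "o1", "a1", "o1"])))  # ['s1']
```

The single observation o_2 already identifies s_2, while o_1 alone leaves two candidate states hidden behind the same observation.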

Definition 2 (minimal history sequence length k). The minimal identifying history sequence length k for the hidden state s_i is defined as follows. The minimal identifying history sequence h for the hidden state s_i is an identifying history sequence h such that no suffix h' of h also identifies s_i. However, the minimal identifying history sequence may have unbounded length, because an identifying history sequence can include a loop arbitrarily many times, which merely lengthens the history sequence. For example, two identifying history sequences h_1 and h'_1 for s_1 are expressed as o_2, a_1, o_1, a_1, o_1 and o_2, a_1, o_1, a_1, o_1, a_1, o_1, a_1, o_1, respectively. In this situation, o_1, a_1, o_1, a_1, o_1 is treated as the looped item. So the minimal identifying history sequence length k for the hidden state s_i is the minimal length over all identifying history sequences after excising the looping portion.

Definition 3 (a criterion for identifying history sequences). Given sufficient history sequences, consider the set of history sequences for the hidden state s: if each history sequence h for the hidden state s satisfies that trans(h, a) is the same observation for each action a, then we consider the history sequence h an identifying history sequence. Definition 3 is correct iff a fully sufficient history sequence is given.

Lemma 4 (backward identifying history sequence). Assume an identifying history sequence h is for a single hidden state s. If h' is a backward extension of h by adding an action a ∈ A and an observation o ∈ O, then h' is a new identifying history sequence for the single state s' = f_s(s, a); s' is called the prediction state of s.

Proof. We assume h' is a history sequence o_1, a_1, ..., a_(t-1), o_t generated by the transition function f_s and observation function f_o, and h is the history sequence o_1, a_1, ..., a_(t-2), o_(t-1), a prefix of h'. Since h is an identifying history sequence for s, it must be that s_(t-1) = s. Because the environment is a deterministic POMDP, the next state s' is uniquely determined by f_s(s_(t-1), a_(t-1)) = s_t after the action a_(t-1) has been executed. Then h' is a new identifying history sequence for the single state s_t = f_s(s_(t-1), a_(t-1)) = f_s(s, a_(t-1)) = s'.

Lemma 5 (forward identifying history sequence). Assume an identifying history sequence h' is for a single hidden state s'. If h is the prefix of h' obtained by removing the last action a ∈ A and the last observation o ∈ O, then h is the identifying history sequence for the single state s such that s' = f_s(s, a); s is called the previous state of s'.

Proof. We assume h' is a history sequence o_1, a_1, ..., a_(t-1), o_t generated by the transition function f_s and observation function f_o, and h is the history sequence o_1, a_1, ..., a_(t-2), o_(t-1), the prefix of h' obtained by removing the action a_(t-1) and the observation o_t. Since h' is an identifying history sequence for the state s', it must be that s_t = s'. Because the environment is a deterministic POMDP, the current state s' is uniquely determined by the previous state s such that f_s(s_(t-1), a_(t-1)) = s'. Then h is the identifying history sequence for the state s such that f_s(s_(t-1), a_(t-1)) = f_s(s, a_(t-1)) = s'.

Theorem 6. Given sufficient history sequences, a finite, strongly connected, deterministic POMDP environment can be represented soundly by a minimal, looping hidden state transition model.

Proof. We assume E = ⟨S, A, O, f_s, f_o⟩, where S is the finite set of hidden world states and |S| is the number of hidden states. First, for every hidden state there exists at least one finite-length identifying history sequence h for some state s. Because the environment is strongly connected, there must exist a transition history h' from s to s'; according to Lemma 4, backward extending h by h' yields the identifying history sequence for s'. Since the hidden state transition model can identify the looping hidden state, the maximal transition history sequence length from s to s' is |S|. Thus this model has a minimal hidden state space of size |S|.

Since the hidden state transition model can correctly realize the hidden state transition s' = f_s(s, a), this model can correctly express all identifying history sequences for every hidden state s, possesses perfect transition prediction capacity, and has a minimal hidden state space of size |S|. This model is a k-step variable history length model, and the memory depth k is a variable value for different hidden states. This model is realized by the spatiotemporal associative memory networks in Section 3.

If we want to construct a correct hidden state transition model, we first need to collect sufficient history


sequences to perform a statistical test to determine the minimal identifying history sequence length k. However, in practice it can be difficult to obtain sufficient histories. So we propose a heuristic state transition model constructing algorithm. Without sufficient history sequences, the algorithm may produce a premature state transition model, but this model is at least correct for the past experience. We realize the heuristic constructing algorithm by way of two definitions and two lemmas as follows.

Definition 7 (current history sequence). A current history sequence is represented by the k-step history sequence o_(t-k+1), a_(t-k+1), o_(t-k+2), ..., a_(t-1), o_t generated by the transition function f_s and the observation function f_o. At t = 0 the current history sequence h_0 is empty; o_t is the observation vector at the current time t, and o_(t-k+1) is the observation vector k steps before time t.

Definition 8 (transition instance). The history sequence associated with time t is captured as a transition instance. The transition instance is represented by the tuple h_(t+1) = (h_t, a_t, o_(t+1)), where h_t and h_(t+1) are the current history sequences occurring at times t and t + 1 in an episode. A set of transition instances is denoted by the symbol F, which may contain transitions from different episodes.
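A small bookkeeping sketch of Definitions 7 and 8 (the class and method names are assumptions): a sliding k-step window holds the current history sequence, and each step appends a transition instance (h_t, a_t, o_(t+1)) to the chain F:

```python
from collections import deque

class HistoryTracker:
    """Keep the k-step current history sequence and the transition chain F."""
    def __init__(self, k):
        # A k-step history holds k observations and k - 1 actions.
        self.window = deque(maxlen=2 * k - 1)
        self.F = []  # transition instances, possibly from several episodes

    def start_episode(self, o0):
        self.window.clear()
        self.window.append(o0)

    def step(self, a, o_next):
        h_t = tuple(self.window)          # current history sequence h_t
        self.window.extend([a, o_next])   # slide the k-step window
        self.F.append((h_t, a, o_next))   # instance h_(t+1) = (h_t, a_t, o_(t+1))

tr = HistoryTracker(k=2)
tr.start_episode("o1")
tr.step("a1", "o2")
tr.step("a2", "o1")
print(tr.F[-1])           # (('o1', 'a1', 'o2'), 'a2', 'o1')
print(tuple(tr.window))   # ('o2', 'a2', 'o1') -- the 2-step current history
```

The bounded deque implements the "k steps before time t" truncation automatically: older items fall off the front as new ones arrive.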

Lemma 9. For any two hidden states s_i and s_j, s_i ≠ s_j if trans(s_i, a) ≠ trans(s_j, a) for some action a.

This lemma gives us a sufficient condition to determine s_i ≠ s_j. However, this condition is not necessary.

Proof by contradiction. We assume that s_i = s_j; thus for all action sequences d_ij = a_1, a_2, ..., a_n, there holds γ(s_i, d_ij) = γ(s_j, d_ij). But γ(s_i, d_ij) = γ(s_j, d_ij) contradicts trans(s_i, a) ≠ trans(s_j, a). Thus the original proposition is true.

Lemma 10. For any two identifying history sequences h_i and h_j, for s_i and s_j, respectively, h_i ≠ h_j if s_i ≠ s_j. However, the converse of Lemma 10 does not always hold.

Proof by contradiction. Since an identifying history sequence h uniquely identifies the hidden state s, the identifying history sequence h_i uniquely identifies the hidden state s_i, and the identifying history sequence h_j uniquely identifies the hidden state s_j. We assume h_i = h_j; then there must hold s_i = s_j, which contradicts s_i ≠ s_j. Thus the original proposition is true.

However, the converse of Lemma 10 is not always true, because there may exist several different identifying history sequences for the same hidden state.

Algorithm 11 (a heuristic constructing algorithm for the state transition model). The initial state transition model is constructed using the minimal identifying history sequence length k = 1, and the model is empty initially.

The transition instance h is empty and t = 0.

(1) We assume the given model can perfectly identify the hidden state. The agent makes a step in the environment (according to the history sequence definition, the first item in h is the observation vector). It records the current transition instance at the end of the chain of transition instances. At each time t, for the current history sequence h_t, execute Algorithm 12: if the current state s is looped to the hidden state s_i, then go to step (2); otherwise, a new node s_i is created in the model; go to step (4).

(2) According to Algorithm 13, if trans(h_t, a_j) ≠ trans(h_i, a_j), then the current state s is distinguished from the identifying state s_i; go to step (3). If trans(h_t, a_j) = trans(h_i, a_j), then go to step (4).

(3) According to Lemma 10, for any two identifying history sequences h_i and h_j, for s_i and s_j, respectively, h_i ≠ h_j if s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j must increase to k + 1 until h_i and h_j are discriminated. We must reconstruct the model based on the new minimal identifying history sequence length k and go to step (1).

(4) If the action node and corresponding Q function for the current identifying state node exist in the model, the agent chooses its next action node a_j based on the exhaustive-exploration Q function. If the action node and corresponding Q function do not exist, the agent chooses a random action a_j instead, and a new action node is created in the model. After the action control signals are delivered, the agent obtains the new observation vector by trans(s_i, a_j); go to step (1).

(5) Steps (1)-(4) continue until all identifying history sequences h for the same hidden state s can correctly predict trans(h, a) for each action.

Note that we apply the k-step variable history length from k = 1 to n, and the history length k is a variable value for different hidden states. We adopt the minimalist-hypothesis model in the state transition model constructing process. This constructing algorithm is a heuristic. If we adopted the exhaustiveness-assumption model, the probability of missing looped hidden states would increase exponentially, and many valid loops could be rejected, yielding a larger redundant state space and poor generalization.
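A drastically simplified sketch of the overall idea (a single global k, a fixed action policy, and no Q-function exploration or STAMN machinery; all of these are simplifying assumptions): grow k until every recurring k-step history predicts a unique next observation for each action, which is the consistency test of step (2):

```python
# States s1 and s3 share observation o1, so k = 1 is ambiguous but k = 2 is not.
f_s = {("s1", "a1"): "s2", ("s2", "a1"): "s3", ("s3", "a1"): "s1"}
f_o = {"s1": "o1", "s2": "o2", "s3": "o1"}

def find_k(start, actions, k_max=8):
    """Smallest global k whose k-step histories predict trans(h, a) uniquely."""
    s, hist = start, [f_o[start]]
    for a in actions:                          # roll out the experience once
        s = f_s[(s, a)]
        hist += [a, f_o[s]]
    for k in range(1, k_max + 1):
        trans, ok = {}, True
        for i in range(0, len(hist) - 2, 2):   # i indexes observations
            h = tuple(hist[max(0, i - 2 * k + 2): i + 1])   # k-step window
            key, o_next = (h, hist[i + 1]), hist[i + 2]
            if trans.setdefault(key, o_next) != o_next:
                ok = False                     # same history+action, different outcome
                break
        if ok:
            return k
    return None

print(find_k("s1", ["a1"] * 30))  # 2
```

With k = 1 the aliased observation o_1 is sometimes followed by o_2 and sometimes by o_1 under the same action, so the test fails; k = 2 disambiguates, matching the paper's point that the needed memory depth is a property of the aliasing, not of the designer's choice.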

Algorithm 12 (the current history sequence identifying algorithm). In order to construct the minimal, looping hidden state transition model, we need to identify the looped state by identifying the hidden state; the current history sequence with k steps is needed. According to Definition 7, the current k-step history sequence at time t is expressed by o_(t-k+1), a_(t-k+1), o_(t-k+2), ..., a_(t-1), o_t. There exist three identifying processes.


(1) k-Step History Sequence Identifying. If the identifying history sequence for s_i is o_(i-k+1), a_(i-k+1), o_(i-k+2), ..., a_(i-1), o_i, the current history sequence satisfies

o_t = o_i,
a_(t-1) = a_(i-1),
...,
o_(t-k+2) = o_(i-k+2),
a_(t-k+1) = a_(i-k+1),
o_(t-k+1) = o_(i-k+1).    (1)

Thus the current state s is looped to the hidden state s_i. In the STAMN, the k-step history sequence identifying activation value m^s_j(t) is computed.

(2) Following Identifying. If the current state s is identified as the hidden state s_i, the next transition history h_(t+1) is represented by h_(t+1) = (h_t, a_t, o_(t+1)) through trans(s_i, a_t). According to Lemma 4, the next transition history h_(t+1) is identified as the transition prediction state s_j = f_s(s_i, a_t).

In the STAMN, the following activation value q^s_j(t) is computed.

(3) Previous Identifying. If there exists a transition instance h_(t+1) = (h_t, a_t, o_(t+1)), then h_(t+1) is an identifying history sequence for state s_j. According to Lemma 5, the previous state s_i is uniquely identified by h_t such that s_j = f_s(s_i, a_t). So if there exists a transition prediction state s_j with s_j = f_s(s_i, a_t), then the previous state s_i is uniquely identified.

In the STAMN, the previous activation value d^s_j(t) is computed.
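The three identifying processes can be sketched as plain predicate functions (the function names and dictionary-based model are assumptions; in the STAMN they are realized by the activation values m^s_j(t), q^s_j(t), and d^s_j(t)):

```python
def k_step_identify(current, identifying):
    """(1) Exact k-step match of the current history against a state
    node's stored identifying history sequence."""
    return tuple(current[-len(identifying):]) == tuple(identifying)

def following_identify(s_i, a_t, f_s):
    """(2) After s_i is identified, Lemma 4 names the transition
    prediction state s_j = f_s(s_i, a_t)."""
    return f_s.get((s_i, a_t))

def previous_identify(s_j, a_t, f_s):
    """(3) By Lemma 5, the previous state is any s_i with
    f_s(s_i, a_t) = s_j (unique here because f_s is deterministic)."""
    return {s for (s, a), nxt in f_s.items() if a == a_t and nxt == s_j}

f_s = {("s1", "a1"): "s2", ("s2", "a1"): "s1"}
print(k_step_identify(["o2", "a1", "o1", "a1", "o1"], ["a1", "o1"]))  # True
print(following_identify("s1", "a1", f_s))                            # s2
print(previous_identify("s2", "a1", f_s))                             # {'s1'}
```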

Algorithm 13 (transition prediction criterion). If the model correctly represents the environment as a Markov chain, then for every state s and every identifying history sequence h for that state, trans(h, a) is the same observation for the same action a.

If h_1 and h_2 are identifying history sequences with k-step memory putatively for the same hidden state, but trans(h_1, a) ≠ trans(h_2, a), then according to Lemma 9, s_i is discriminated from s_j.

In the STAMN, the transition prediction criterion is realized by the transition prediction value p^s_j(t).

3. Spatiotemporal Associative Memory Networks

In this section, we relate the spatiotemporal sequence problem to the idea of identifying the hidden state by a sequence of past observations and actions. An associative memory network (AMN) is expected to have several characteristics: (1) an AMN must memorize incrementally, that is, it must be able to learn new knowledge without forgetting the learned knowledge; (2) an AMN must be able not only to record the temporal order but also to record sequence items' duration times in continuous time; (3) an AMN must be able to realize heteroassociative recall; (4) an AMN must be able to process real-valued feature vectors in a bottom-up manner, not only symbolic items; (5) an AMN must be robust and able to recall a sequence correctly from incomplete or noisy input; (6) an AMN should realize learning and recalling simultaneously; (7) an AMN should realize interaction between STM and LTM; dual-trace theory suggests that the persistent neural activities of STM can lead to LTM.

The first thing to be defined is the temporal dimension (discrete or continuous). Previous research is mostly based on regular intervals Δt, and few AMN have been proposed to deal with not only the sequential order but also sequence items' duration times in continuous time. However, this characteristic is important for many problems. In speech production, writing, music generation, motor planning, and so on, an item's duration time and an item's repeated emergence have essentially different meanings. For example, "B4" and "B-B-B-B" are entirely different: the former represents that item B is sustained for 4 timesteps, and the latter represents that item B repeatedly emerges 4 times. STAMN can explicitly distinguish these two temporal characteristics. So a history of past observations and actions can be expressed by a special spatiotemporal sequence h: h = o_(t-k+1), a_(t-k+1), o_(t-k+2), ..., a_(t-1), o_t, where k is the length of the sequence h. The items of h include the observation items and the action items, where o_t denotes the real-valued observation vector generated by taking action a_(t-1), and a_t denotes the action taken by the agent at time t. Here t does not represent a discrete temporal dimension sampled at regular intervals Δt but the t-th step in the continuous-time dimension, whose duration is the time between the current item and the next one.
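The distinction can be made concrete with a run-length view (a small illustrative sketch, not part of the STAMN itself):

```python
from itertools import groupby

def with_durations(seq):
    """Collapse consecutive repeats into (item, duration) pairs, so a
    sustained item ("B4") differs from repeated emergences ("B-B-B-B")."""
    return [(item, len(list(g))) for item, g in groupby(seq)]

print(with_durations(["B", "B", "B", "B"]))       # [('B', 4)] -- B sustained 4 steps
print(with_durations(["B", "A", "B", "A", "B"]))  # B emerges repeatedly, duration 1 each
```

A network that only records order collapses both inputs to the same pattern; recording (item, duration) pairs keeps them distinct.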

A spatiotemporal sequence can be classified as a simple sequence or a complex sequence. A simple sequence is a sequence without repeated items; for example, the sequence "B-A-E-L" is a simple sequence, whereas sequences containing repeated items are defined as complex sequences. In a complex sequence, the repeated items can be classified as looped items or discriminative items by identifying the hidden state; for example, in the history sequence "X-Y-Z-Y-Z", "Y" and "Z" may be looped items or discriminative items. Identifying the hidden state requires introducing contextual information, resolved by the $k$-step memory. The memory depth $k$ is not fixed; $k$ is a variable value in different parts of the state space.

3.1. Spatiotemporal Associative Memory Network Architecture. We build a new spatiotemporal associative memory network (STAMN). This model makes use of the neural activity decay of nodes to achieve short-term memory, connection weights between different nodes to represent long-term memory, presynaptic potentials and a neuron synchronized activation mechanism to realize identifying and recalling, and a time-delayed Hebb learning mechanism to fulfil one-shot learning.

6 Computational Intelligence and Neuroscience

Figure 2: STAMN structure diagram (state nodes with activations $n_i^s$, $b_i^s$; action nodes with activations $n_j^a$, $b_j^a$; connection weights $w_{ij}$, $\omega_{ij}$, $\omega'_{ij}$).

The STAMN is an incremental, possibly looping, non-fully-connected, asymmetric associative memory network. Its nonhomogeneous nodes correspond to hidden state nodes and action nodes.

For a state node $j$, the input values are defined as follows: (1) the current observation activation value $b_j^s(t)$, responsible for the matching degree between state node $j$ and the current observed value, which is obtained from the preprocessing neural networks (if the matching degree is greater than a threshold value, then $b_j^s(t) = 1$); (2) the observation activation values $b_i(t)$ of the presynaptic node set $\tau_j$ of the current state node $j$; (3) the identifying activation value $n_i^s(t)$ of the previous state node $i$ of state node $j$; (4) the activation value $n_i^a(t)$ of the previous action node $i$ of state node $j$. The output values are defined as follows: (1) the identifying activation value $n_j^s(t)$ represents whether the current state $s$ is identified as the hidden state $s_j$ or not; (2) the transition prediction value $p_j^s(t)$ represents whether state node $j$ is the current state transition prediction node or not.

For an action node $j$, the input value is defined as the current action activation value $b_j^a(t)$, responsible for the matching degree between action node $j$ and the current motor vector. The output value is the activation value $n_j^a(t)$ of the action node, which indicates that action node $j$ has been selected by the agent to control the robot's current action.

For the STAMN, all nodes and connection weights do not necessarily exist initially. The weights $\omega_{ij}(t)$ and $\omega'_{ij}(t)$ connected to state nodes can be learned incrementally by time-delayed Hebb learning rules, representing the LTM. The weights $w_{ij}(t)$ connected to action nodes can be learned by reinforcement learning. All nodes have an activity self-decay mechanism to record the duration time of the node, representing the STM. The output of the STAMN is the winner state node or winner action node, selected by winner-takes-all.

The STAMN architecture is shown in Figure 2, where black nodes represent action nodes and concentric-circle nodes represent state nodes.

3.2. Short-Term Memory. We use the self-decay of neural activity to accomplish short-term memory, for both the observation activation $b_i^s(t)$ and the identifying activation $n_i^s(t)$. The activity of each state node has a self-decay mechanism to record the temporal order and the duration time of the node. Supposing the self-decay factor is $\gamma_i^s \in (0,1)$ and the activation values $b_i^s(t), n_i^s(t) \in [0,1]$, the self-decay process is

$$b_i^s(t),\ n_i^s(t) = \begin{cases} 0, & b_i^s(t),\ n_i^s(t) \le \eta \\ \gamma_i^s\, b_i^s(t-1),\ \gamma_i^s\, n_i^s(t-1), & \text{else.} \end{cases} \quad (2)$$

The activity of an action node also has the self-decay characteristic. Supposing the self-decay factor is $\delta_i^a \in (0,1)$ and the activation values $b_i^a(t), n_i^a(t) \in [0,1]$, the self-decay process is

$$b_i^a(t),\ n_i^a(t) = \begin{cases} 0, & b_i^a(t),\ n_i^a(t) \le \mu \\ \delta_i^a\, b_i^a(t-1),\ \delta_i^a\, n_i^a(t-1), & \text{else,} \end{cases} \quad (3)$$

wherein $\eta$ and $\mu$ are active thresholds and $\gamma_i^s$ and $\delta_i^a$ are self-decay factors. Together they determine the depth $k$ of short-term memory, where $t$ is a discrete time point obtained by sampling at regular intervals $\Delta t$, and $\Delta t$ is a very small regular interval.
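The way the decay factor and the active threshold jointly set the effective memory depth can be sketched as follows; the function and variable names are illustrative assumptions, and we apply the threshold after one decay step, which is one plausible reading of (2) and (3).

```python
# Sketch of the short-term-memory self-decay of (2)/(3): an activation
# fades geometrically and is cut to zero once it falls to the threshold.
def decay(value, factor, threshold):
    """One self-decay step for an activation value in [0, 1]."""
    v = factor * value
    return v if v > threshold else 0.0

# A node activated at t = 0 with value 1.0, decay factor 0.9, threshold 0.3:
trace = [1.0]
for _ in range(12):
    trace.append(decay(trace[-1], 0.9, 0.3))

# The number of steps during which the activation stays nonzero is the
# effective short-term memory depth k for these parameters.
depth = sum(1 for v in trace if v > 0.0)
```

Lowering the threshold (or raising the decay factor) lengthens the trace, which is how decreasing $\eta$ and $\mu$ increases $k$ later in the paper.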

3.3. Long-Term Memory. Long-term memory can be classified into semantic memory and episodic memory. Learning of a history sequence is considered episodic memory, generally realized by one-shot learning. We use time-delayed Hebb learning rules to fulfil the one-shot learning.

(1) The k-Step Long-Term Memory Weight $\omega_{ij}(t)$. The weight $\omega_{ij}(t)$ connected to state node $j$ represents $k$-step long-term memory. This is a past-oriented behaviour in the STAMN. The weight $\omega_{ij}(t)$ is adjusted according to

$$\omega_{ij}(t)\big|_{i \in \tau_j,\; n_j^s(t)=1} = \begin{cases} n_j^s(t)\, n_i^X(t), & \omega_{ij}(t-1) = 0,\ X \in \{s, a\} \\ \omega_{ij}(t-1) + \alpha\,\big(n_i^X(t) - \omega_{ij}(t-1)\big), & \text{else,} \end{cases} \quad (4)$$

where $j$ is the current identifying activation state node, $n_j^s(t) = 1$. Because the identifying activation process is a neuron synchronized activation process, when $n_j^s(t) = 1$, all nodes whose $n_i^s(t)$ and $n_i^a(t)$ are not zero constitute the contextual information related to state node $j$. These nodes are considered the presynaptic node set of the current state node $j$, expressed by $\tau_j$, where $n_i^s(t)$ and $n_i^a(t)$ are the activation values at time $t$ and record not only the temporal order but also the duration time, because of the self-decay of neural activity. The smaller $n_i^s(t)$ or $n_i^a(t)$ is, the earlier node $i$ occurred relative to the current state node $j$. $\omega_{ij}(t)$ is the activation weight between presynaptic node $i$ and state node $j$; it records the contextual information related to state node $j$, to be used in identifying and recalling, where $\omega_{ij}(t) \in [0,1]$, $\omega_{ij}(0) = 0$, and $\alpha$ is the learning rate.

The weight $\omega_{ij}(t)$ is time-related contextual information; the update process is shown in Figure 3.
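The rule in (4) can be sketched in a few lines; the dictionary layout and names are our own illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the time-delayed Hebb rule (4) for the k-step LTM
# weights omega_ij, applied only when state node j is identified (n_j = 1).
def update_ltm_weights(omega_j, presynaptic, alpha=0.9):
    """omega_j: dict node_id -> omega_ij(t-1).
    presynaptic: dict node_id -> decayed activation n_i(t); nonzero
    entries form the presynaptic (contextual) set tau_j."""
    n_j = 1.0  # the identifying activation of node j
    for i, n_i in presynaptic.items():
        if n_i == 0.0:
            continue  # inactive nodes are not part of tau_j
        if omega_j.get(i, 0.0) == 0.0:
            omega_j[i] = n_j * n_i                    # one-shot imprint
        else:
            omega_j[i] += alpha * (n_i - omega_j[i])  # move toward n_i(t)
    return omega_j

# First presentation imprints the decayed context in one shot:
w = update_ltm_weights({}, {"o1": 0.81, "a1": 0.9})
```

Because the decayed activations already encode order and duration, the imprinted weights store the whole $k$-step context in a single pass, which is the one-shot character the text describes.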

(2) One-Step Transition Prediction Weight $\omega'_{ij}(t)$. The weight $\omega'_{ij}(t)$ connected to state node $j$ represents the one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rules, the weight $\omega'_{ij}(t)$ is adjusted according to

$$\omega'_{ij}(t)\big|_{i = \arg\max n_i^X(t),\; n_j^s(t)=1} = \begin{cases} n_j^s(t)\, n_i^X(t), & \omega'_{ij}(t-1) = 0,\ X \in \{s, a\} \\ \omega'_{ij}(t-1) + \beta\,\big(n_i^X(t) - \omega'_{ij}(t-1)\big), & \text{else.} \end{cases} \quad (5)$$

The transition activation of the current state node $j$ is only associated with the previous winning state node and action node, where $j$ is the current identifying activation state node, $n_j^s(t) = 1$. The previous winning state node and action node are given by $i = \arg\max n_i^X(t)$, $X \in \{s, a\}$, where $\omega'_{ij}(t) \in [0,1]$, $\omega'_{ij}(0) = 0$, and $\beta$ is the learning rate.

The weight $\omega'_{ij}(t)$ is one-step transition prediction information; the update process is shown in Figure 4.

(3) The Weight $w_{ij}(t)$ Connected to Action Nodes. The activation of an action node is only associated with the corresponding state node, which selects this action node directly. We assume the state node with the maximal identifying activation value at time $t$ is state node $i$, so the connection weight $w_{ij}(t)$ connected to the currently selected action node $j$ is adjusted by

$$w_{ij}(t)\big|_{i = \arg\max n_i^s(t),\; n_j^a(t)=1} = \begin{cases} Q_0(s_i, a_j), & w_{ij}(t-1) = 0 \\ -Q_{\text{maxconst}}, & a_j \text{ is an invalid action} \\ Q_t(s_i, a_j), & \text{else,} \end{cases} \quad (6)$$

where $w_{ij}(t)$ is represented by the $Q$ function $Q_t(s_i, a_j)$. This paper only discusses how to build a generalized environment model, not how to learn the optimal policy, so this value is set to be the exhaustive exploration $Q$ function based on the curiosity reward. The curiosity reward is described by (7). When action $a_j$ is an invalid action, $w_{ij}(t)$ is defined to be a large negative constant $-Q_{\text{maxconst}}$, which avoids entering dead ends:

$$r(t) = \begin{cases} C - \dfrac{n_t}{n_{\text{ave}}}, & n_{\text{ave}} > 0 \\ C, & n_{\text{ave}} = 0, \end{cases} \quad (7)$$

where $C$ is a constant value, $n_t$ represents the count of explorations of the current action by the agent, and $n_{\text{ave}}$ is the average count of explorations over all actions by the agent; $n_t / n_{\text{ave}}$ represents the degree of familiarity with the currently selected action. The curiosity reward is updated when each action is finished. The $Q_t(s_i, a_j)$ update equation is given by (8), where $\lambda$ is the learning rate:

$$Q_t(s_i, a_j) = \begin{cases} (1-\lambda)\,Q_{t-1}(s_i, a_j) + \lambda\, r(t), & i = \arg\max n_i^s(t),\ n_j^a(t) = 1 \\ Q_{t-1}(s_i, a_j), & \text{else.} \end{cases} \quad (8)$$

The update process is shown in Figure 5. The action node $j$ is selected by $Q_t(s_i, a_j)$ according to

$$a_j = \arg\max_{j \in A_i^{\text{vl}},\ n_i^s(t)=1} Q(s_i, a_j), \quad (9)$$

where $A_i^{\text{vl}}$ is the valid action set of state node $i$; $n_i^s(t) = 1$ represents that state node $i$ is identified, and the action node $j$ is selected by the agent to control the robot's current action, setting $n_j^a(t) = 1$.

3.4. The Constructing Process in the STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of the STAMN. There is a chicken-and-egg problem during the constructing process: building the STAMN depends on state identifying; conversely, state identifying depends on the current structure of the STAMN. Thus, exhaustive exploration and a $k$-step variable memory length (dependent on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of a state node $s_j$ depends on three identifying processes: $k$-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process.

(1) k-Step History Sequence Identifying. The matching degree of the current $k$-step history sequence with the identifying history sequence $h_j$ for $s_j$ is computed to identify the looping state node $j$. First we compute the presynaptic potential $V_j(t)$ for state node $j$ according to

$$V_j(t) = \sum_{i \in \tau_j} C_{ij}\, \phi\big(b_i^X(t), \omega_{ij}\big), \quad X \in \{s, a\}, \quad (10)$$

where $b_i^X(t)$ is the current activation value of a node in the presynaptic node set $\tau_j$, and $C_{ij}$ is a confidence parameter expressing node $i$'s importance degree in the presynaptic node set $\tau_j$. The value of $C_{ij}$ can be set in advance, with $\sum_{i \in \tau_j} C_{ij} = 1$. The function $\phi$ represents the degree of similarity between the activation value of the presynaptic node and the contextual information $\omega_{ij}$ in LTM; $\phi(x, y) \in [0, 1]$, and when the similarity between $x$ and $y$ is high, $\phi(x, y)$ is close to 1. According to winner-takes-all, among all nodes whose presynaptic potentials exceed the threshold $\vartheta$, the node with the largest presynaptic potential is selected.

Figure 3: Past-oriented weights $\omega_{ij}(t)$ update process associated with state node $j$.

Figure 4: Future-oriented weights $\omega'_{ij}(t)$ update process associated with state node $j$.

Figure 5: Weights $w_{ij}(t)$ update process associated with action node $j$.

The presynaptic potential $V_j(t)$ represents the synchronous activation process of the presynaptic node set $\tau_j$, which captures the previous $k-1$ steps of contextual information matching for state node $j$. To realize full $k$-step history sequence matching, the $k$-step history sequence identifying activation value of state node $j$ is given below:

$$m_j^s(t) = H\big(V_j(t), \vartheta\big) \cdot H\Big(V_j(t), \max_{k: 1 \to N_c} V_k(t)\Big) \cdot b_j^s(t), \qquad H(x, y) = \begin{cases} 0, & x < y \\ 1, & x \ge y, \end{cases} \quad (11)$$

where $\max_{k: 1 \to N_c} V_k(t)$ is the maximum potential value over the nodes $k$. The product $H(V_j(t), \vartheta) \cdot H(V_j(t), \max_{k: 1 \to N_c} V_k(t))$ means that the node with the largest potential value is selected among all nodes whose presynaptic potentials exceed the threshold $\vartheta$, and $b_j^s(t)$ represents the matching degree of state node $j$ and the current observed value. If $m_j^s(t) = 1$, the current state $s$ is identified as the looped state node $j$ by the $k$-step memory.
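A compact sketch of (10) and (11): confidence-weighted presynaptic potentials followed by a thresholded winner-takes-all. The concrete similarity function $\phi$ and every name below are illustrative assumptions (the paper leaves $\phi$ abstract beyond $\phi(x,y) \in [0,1]$).

```python
# Sketch of the k-step identifying of (10)-(11). Names are illustrative.
def phi(x, y):
    """Similarity in [0, 1], close to 1 when x and y are close (assumed form)."""
    return 1.0 - abs(x - y)

def presynaptic_potential(activations, omega_j, confidence):
    """V_j(t) = sum over the presynaptic set of C_ij * phi(b_i(t), omega_ij)."""
    return sum(confidence[i] * phi(activations[i], omega_j[i]) for i in omega_j)

def identify(potentials, observation_match, theta=0.9):
    """Winner-takes-all of (11): return the node with the largest potential
    if it exceeds theta and its current observation matches, else None."""
    j, v = max(potentials.items(), key=lambda kv: kv[1])
    return j if v >= theta and observation_match[j] == 1 else None

# Current decayed activations of the presynaptic nodes:
acts = {"o1": 0.81, "a1": 0.9}
# s1's stored context matches the activations exactly; s2's does not:
v1 = presynaptic_potential(acts, {"o1": 0.81, "a1": 0.9}, {"o1": 0.5, "a1": 0.5})
v2 = presynaptic_potential(acts, {"o1": 0.20, "a1": 0.3}, {"o1": 0.5, "a1": 0.5})
winner = identify({"s1": v1, "s2": v2}, {"s1": 1, "s2": 1})
```

Because the comparison is a graded similarity rather than an exact match, small perturbations of the decayed activations still select the same winner, which is the noise tolerance claimed for the STAMN later in the paper.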

(2) Following Identifying. If the current state $s$ is identified as the state $s_i$, then the next transition prediction state is identified as the state $s_j = f_s(s_i, a_i)$. First we compute the transition prediction value for state node $j$ according to

$$p_j^s(t) = \sum_{i = \arg\max n_i^X(t)} D_{ij}\, \phi\big(n_i^X(t), \omega'_{ij}\big), \quad X \in \{s, a\}, \quad (12)$$

where $n_i^X(t)$ is the identifying activation value of the state node or action node at time $t$, and $i = \arg\max n_i^X(t)$ indicates that state node $i$ and action node $i$ are the previous winner nodes. $D_{ij}$ is a confidence parameter expressing node $i$'s importance degree; its value can be set in advance, with $\sum_{i = \arg\max n_i^X(t)} D_{ij} = 1$. $\omega'_{ij}$ records the one-step transition prediction information related to state node $j$, to be used in the identifying and recalling phases. If the current state $s$ is identified as the hidden state $s_i$, then $p_j^s(t)$ represents the probability that the next transition prediction state is $j$.

If the next prediction state node $j$ matches the current observation value, the current history sequence is identified as the state node $j$; the following identifying value of state node $j$ is given below:

$$q_j^s(t) = H\big(p_j^s(t), \vartheta\big) \cdot H\Big(p_j^s(t), \max_{k: 1 \to N_c} p_k^s(t)\Big) \cdot b_j^s(t). \quad (13)$$

If $q_j^s(t) = 1$, the current history sequence $h_t$ is identified as the looped state node $j$ by the following identifying. If $p_j^s(t) \ge \max_{k: 1 \to N_c} p_k^s(t)$ and $p_j^s(t) \ge \vartheta$ but $q_j^s(t) \ne 1$, then $b_j^s(t) \ne 1$, representing a mismatch between state node $j$ and the current observed value; there exists $\mathrm{trans}(h_t, a_i) \ne \mathrm{trans}(h_i, a_i)$. According to the transition prediction criterion (Algorithm 13), the current history sequence $h_t$ is not identified with the hidden state $s_i$, so the identifying history sequence lengths for $h_t$ and $h_i$ must each increase to $k + 1$.

(3) Previous Identifying. If the current history sequence is identified as the state $s_j$, then the previous state $s_i$ is identified such that $s_j = f_s(s_i, a_i)$. First we compute the identifying activation values for all states $s$; if there exists $n_j^s(t) = 1$, then the current state $s_t$ is identified as state $s_j$, and the previous state $s_{t-1}$ is identified as the previous state $s_i$ of state $s_j$. The previous identifying value of the state node is defined as $d_i^s(t)$. The previous state $s_i$ is $i = \arg\max b_i^s(t)$ satisfying Condition 1 and Condition 2; then we set $d_i^s(t) = 1$.

Condition 1:
$$n_j^s(t) = 1. \quad (14)$$

Condition 2:
$$\phi\big(b_i^s(t), \omega_{ij}\big) > \vartheta. \quad (15)$$

According to the above three identifying processes, the identifying activation value of state node $j$ is defined as follows:

$$n_j^s(t) = \max\big(m_j^s(t),\, q_j^s(t),\, d_j^s(t)\big). \quad (16)$$

According to Algorithm 11, we give Pseudocode 1 for Algorithm 11.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4. Simulation

4.1. A Simple Example of a Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First we obtain the transition instance $h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1$. According to Algorithm 11, set the initial identifying history sequence length $k = 1$. The STAMN model is constructed incrementally using $h_t$, as in Figure 6.

When $h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1, a_2, o_1$, because $k = 1$, the looped state is identified by the current observation vector. Thus $h_1 = o_1$ and $h_2 = o_1$ are the identifying history sequences for the state $s_1$. Since $o_1 = \mathrm{trans}(h_1, a_2)$ and $o_2 = \mathrm{trans}(h_2, a_2)$ exist in $h_t$, so that $\mathrm{trans}(h_1, a_2) \ne \mathrm{trans}(h_2, a_2)$, according to Lemma 9, $h_1$ and $h_2$ are the identifying history sequences for the states $s_i$ and $s_j$, respectively, with $s_i \ne s_j$. So $o_1$ is distinguished as $o'_1$ and $o''_1$, and $h_t = o_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o''_1, a_2, o_1$.

According to Lemma 10, for any two identifying history sequences $h_i$ and $h_j$, for $o'_1$ and $o''_1$ respectively, there must exist $h_i \ne h_j$ iff $s_i \ne s_j$. So the identifying history sequence lengths for $h_i$ and $h_j$, for $s_i$ and $s_j$ respectively, must increase to 2. For $o'_1$ the identifying history sequence is $o_1, a_1, o'_1$, and for $o''_1$ the identifying history sequence is $o_1, a_1, o''_1$. These two identifying history sequences are observationally identical, so a longer transition instance is needed.

Figure 6: The STAMN model with memory depth 1 (state nodes $s_1$, $s_2$ with $f_o(s_1) = o_1$, $f_o(s_2) = o_2$ and transitions labeled $a_1$, $a_2$).

Table 1: The identifying history sequences for $o'_1$ and $o''_1$.

Hidden state | Identifying history sequences
$o'_1$ ($k = 2$) | $o_1, a_1, o'_1$; $o''_1, a_2, o'_1$; $o_2, a_1, o'_1$
$o''_1$ ($k = 3$) | $o_2, a_1, o_1, a_1, o''_1$

When $h_t = o_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o''_1, a_2, o'_1, a_2, o_2, a_1, o'_1, a_2, o_2, a_1, o_1, a_1, o_1, a_1, o'_1, a_2, o_2$, we obtain in $h_t$ the identifying history sequences $h_i$ and $h_j$ for the states $o'_1$ and $o''_1$. For $k = 2$ the distinguished identifying history sequences for $o'_1$ are $o''_1, a_2, o'_1$ and $o_2, a_1, o'_1$. However, the distinguished identifying history sequence for $o''_1$ is $o_1, a_1, o''_1$, which is not discriminated from the identifying history sequences for $o'_1$. So the identifying history sequence length for $o''_1$ must increase to 3. The identifying history sequences for $o'_1$ and $o''_1$ are described in Table 1.

According to $k$-step history identifying, following identifying, and previous identifying, $h_t$ can be represented as $h_t = o''_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o'_1, a_1, o''_1, a_2, o'_1, a_2, o_2, a_1, o'_1, a_2, o_2, a_1, o'_1, a_1, o''_1, a_1, o'_1, a_2, o_2$.

According to Pseudocode 1, the STAMN model is constructed incrementally using $h_t$, as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, while the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After 10 timesteps, the algorithms use the currently learned model to realize the transition prediction, and the transition prediction criterion is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.


Set initial memory depth $k = 1$, $t = 0$.
All nodes and connection weights do not exist initially in the STAMN.
The transition instance $h_0 = \phi$.
A STAMN is constructed incrementally through interaction with the environment by the agent, expanding the STAMN until the transition prediction contradicts the current minimal hypothesis model; then reconstruct the hypothesis model by increasing the memory depth $k$. In this paper the constructing process includes identifying and recalling simultaneously.
while one pattern $o_t$ or $a_t$ can be activated from the pattern cognition layer do
    $h_t = (h_{t-1}, o_t, a_t)$
    for all existing state nodes
        compute the identifying activation value by executing Algorithm 12
        if there exists $n_j^s(t) = 1$ then
            the looped state is identified, and the weights $\omega_{ij}$, $\omega'_{ij}$ are adjusted according to (4), (5)
        else
            a new state node $j$ is created; set $n_j^s(t) = 1$, and the weights $\omega_{ij}$, $\omega'_{ij}$ are adjusted according to (4), (5)
        end if
    end for
    for all existing state nodes
        compute the transition prediction criterion by executing Algorithm 13
        if IsNotTrans($s_j$) then
            for the previous winner state node $i$ of the state node $j$
                set memory depth $k = k + 1$ until each $h_i$ is discriminated
            end for
            reconstruct the hypothesis model with the new memory depth $k$ according to Algorithm 11
        end if
    end for
    for all existing action nodes
        compute the activation value according to (9)
        if there exists $n_j^a(t) = 1$ then
            the weight $w_{ij}(t)$ is adjusted according to (6)
        else
            a new action node $j$ is created; set $n_j^a(t) = 1$, and the weight $w_{ij}(t)$ is adjusted according to (6)
        end if
    end for
    the agent obtains the new observation vector by $o_{t+1} = \mathrm{trans}(s_t, a_t)$
    $t = t + 1$
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.

IsNotTrans($s_j$):
    if $p_j^s(t) > \vartheta$ according to (12) and $q_j^s(t) = 1$ according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).
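The recalling-phase criterion reduces to a one-line predicate; a minimal sketch under our own naming:

```python
# Sketch of Pseudocode 2: the hypothesis model is contradicted (and the
# memory depth k must grow) unless the transition prediction is both
# confident (p_j > theta) and confirmed by the next observation (q_j = 1).
def is_not_trans(p_j, q_j, theta=0.9):
    """True when state node j fails the transition prediction test."""
    return not (p_j > theta and q_j == 1)
```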

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the better performance and the faster convergence speed because of the exhaustive exploration $Q$ function.

4.2. Experimental Comparison in the 4×3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4×3 grid problem, shown in Figure 9, is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall at the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4×3 grid problem, the hidden state space size $|S|$ is 11, the action space size $|A|$ is 4 for each state, and


Figure 7: (a) The STAMN model (state nodes $s_1$, $s_2$, $s_3$ with $f_o(s_1) = o_1$, $f_o(s_2) = o_2$, $f_o(s_3) = o_1$ and transitions labeled $a_1$, $a_2$). (b) The LPST model (a suffix tree rooted at $r$ over observations $o_1$, $o_2$ with action-labeled branches).

Figure 8: Comparative performance results for a simple deterministic POMDP (prediction error versus timesteps, averaged over 10 runs; STAMN with exhaustive exploration, STAMN with random exploration, and LPST).

the observation space size $|O|$ is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration still has the better performance and the faster convergence speed in the 4×3 grid problem.

Figure 9: The 4×3 grid problem (the upper figure in each cell is the observation state; the bottom figure is the hidden state).

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm | Number of state nodes | Number of action nodes
LPST | 70 | 50
STAMN | 11 | 44

The number of state nodes and action nodes in the STAMN and the LPST is described in Table 2. In the STAMN, a state node represents a hidden state, but in the LPST a state node represents an observation value, and a hidden state is expressed by the $k$-step past observations and actions; thus more observation nodes and action nodes are repeatedly created, and most of those observation nodes and action nodes are the same. The LPST has the same perfect observation prediction capability as the STAMN but is not a state transition model.

Figure 10: Comparative performance results for the 4×3 grid problem (prediction error versus timesteps; STAMN with exhaustive exploration, STAMN with random exploration, and LPST).

In the STAMN, the setting of the parameters $\mu$, $\eta$, $\gamma_i^s$, and $\delta_i^a$ is very important, since they determine the depth $k$ of short-term memory. We assume $\gamma_i^s = \delta_i^a = 0.9$ and initially $\mu = \eta = 0.9$, representing $k = 1$. When we need to increase $k$ to $k + 1$, we only need to decrease $\mu$ and $\eta$ according to $\eta = \eta - 0.2$, $\mu = \mu - 0.1$. The other parameters are determined relatively easily. We set the learning rates $\alpha = \beta = \lambda = 0.9$ and the confidence parameters $C_{ij} = 1/|\tau_j|$ and $D_{ij} = 1/2$, representing equal importance degrees. We set the constant values $C = 5$ and $\vartheta = 0.9$.

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By wall-following behaviour, the agent can recognize the left-wall, right-wall, and corridor landmarks. These observation landmarks are different from the observation vectors in the previous grid problem: every observation landmark has a different duration time. For analysis of the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, it has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the path 2-3-4-5-6 and the path 12-13-14-15-16 are identical, and the memory depth $k = 6$ is necessary to identify hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which bears the fault and noise in the sequence item duration time and can realize reliable recalling. However, the LPST cannot realize correct convergence in reasonable time because of its accurate matching.

5. Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST, the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive accurate matching, which is sensitive to noise and faults, whereas the STAMN offers the neuron synchronized activation mechanism to realize recalling; even in a noisy and disturbed environment, the STAMN can still realize reliable recalling. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network and uses the synchronized activation mechanism, so its performance does not become poor as history sequences and scale increase.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. First, almost all existing AMN models are unable to handle complex sequences with looped hidden states; the STAMN indeed realizes identification of the looped hidden states and can therefore be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments, whereas the STAMN offers a self-organizing incremental memory depth learning method in which the memory depth k is variable in different parts of the state space. Finally, the existing AMN models generally only record temporal orders at discrete intervals Δt rather than sequence item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, the STAMN is proposed to identify the looped hidden states only from transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth k-Markov model. We proposed a heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. These transition instances can

Computational Intelligence and Neuroscience 13


Figure 11: (a) Complex symmetrical simulation environment. (b) The disturbed environment.

[Figure 12 plots prediction error (0 to 1) against timesteps (0 to 500) for the STAMN and the LPST: (a) a complex symmetrical environment; (b) a disturbed environment.]

Figure 12: (a) Comparative results for a noise-free environment. (b) Comparative results for a disturbed environment.

be obtained through the agent's interaction with the environment or from a body of training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as simultaneous localization and mapping (SLAM), robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by a heuristic statistical test; how to incorporate reinforcement learning to produce a new distributed reinforcement learning model; and how it should be applied to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there is no conflict of interest regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I. Unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao, et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.



states arbitrarily well. McCallum [1] and Mahmud [11] both introduced incremental search algorithms for learning probabilistic deterministic finite automata, but these methods learn extremely slowly and carry other restrictions. Other scholars use recurrent neural networks (RNN) to acquire memory capability. A well-known RNN architecture is Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber [12]. Deep reinforcement learning (DRL) [13] was first proposed by Mnih et al., who used deep neural networks to capture and infer hidden states, but this method still applies only to MDPs. Recently, deep recurrent Q-learning was proposed [14], where a recurrent LSTM model is used to capture the long-term dependencies in the history. Similar methods were proposed to learn hidden states for solving POMDP [15, 16]. A hybrid recurrent reinforcement learning approach that combines supervised learning with RL was introduced to solve customer relationship management [17]. These methods can capture and identify hidden states in an automatic way, but because the networks use shared weights and a fixed structure, it is difficult for them to achieve incremental learning. These networks are suited to spatiotemporal pattern recognition (STPR), the extraction of spatiotemporal invariances from the input stream. For learning and recalling temporal sequences in a more exact fashion, as in trajectory planning, decision making, robot navigation, and singing, special neural network models for temporal sequence learning may be more suitable.

Biologically inspired associative memory networks (AMN) have shown some success for this temporal sequence learning and recalling. These networks are not limited to a specific structure and realize incremental sequence learning in an unsupervised fashion. Wang and Yuwono [18] established a model that recognizes and learns complex sequences and is also capable of incremental learning, but it needs different identifiers to be provided for each sequence artificially. Sudo et al. [19] proposed the self-organizing incremental associative memory (SOIAM) to realize incremental learning. Keysermann and Vargas [20] proposed a novel incremental associative learning architecture for multidimensional real-valued data. However, these methods cannot address temporal sequences. Using a time-delayed Hebbian learning mechanism, a self-organizing neural network for learning and recall of complex temporal sequences with repeated items and shared items is presented in [21, 22], which was successfully applied to robot trajectory planning. Tangruamsub et al. [23] presented a new self-organizing incremental associative memory for robot navigation, but this method only dealt with simple temporal sequences. Nguyen et al. [24] proposed a long-term memory architecture characterized by three features: hierarchical structure, anticipation, and one-shot learning. Shen et al. [25, 26] provided a general self-organizing incremental associative memory network; this model not only learned binary and nonbinary information but also realized one-to-one and many-to-many associations. Khouzam [27] presented a taxonomy of temporal sequence processing methods. Although these models realized heteroassociative memory for complex temporal sequences, the memory length k is still decided by the designer and cannot vary in a self-adaptive fashion.

Moreover, these models are unable to handle complex sequences with looped hidden states.

The rest of this paper is organized as follows. In Section 2 we introduce the problem setup, present the theoretical analysis for a minimal, looping hidden state transition model, and derive a heuristic constructing algorithm for this model. In Section 3 the STAMN model is analyzed in detail, including its short-term memory (STM), long-term memory (LTM), and the heuristic constructing process. In Section 4 we present detailed simulations and analysis of the STAMN model and compare its performance with that of other methods. Finally, a brief discussion and conclusion are given in Sections 5 and 6, respectively.

2. Problem Setup

A deterministic POMDP environment can be represented by a tuple E = <S, A, O, f_s, f_o>, where S is the finite set of hidden world states, A is the set of actions that can be taken by the agent, O is the set of possible observations, f_s is a deterministic transition function f_s: S x A -> S, and f_o is a deterministic observation function f_o: S x A -> O. In this paper we only consider the special observation function f_o: S -> O that depends solely on the state s. A history sequence h is defined as a sequence of past observations and actions o_1, a_1, o_2, a_2, ..., a_{t-1}, o_t, which can be generated by the deterministic transition function f_s and the deterministic observation function f_o. The length |h| of a history sequence is defined as the number of observations in the history sequence.
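As a minimal sketch, the tuple E = <S, A, O, f_s, f_o> can be written down directly as lookup tables. The concrete three-state transition table below is an assumption made in the spirit of Figure 1 (the paper does not spell out every arrow), chosen so that rolling out from s2 reproduces the identifying sequence o2, a1, o1, a1, o1 discussed later in this section:

```python
# Minimal sketch of a deterministic POMDP E = <S, A, O, f_s, f_o> as
# lookup tables.  The concrete transition arrows are an illustrative
# assumption, not necessarily the exact layout of Figure 1.

f_s = {  # deterministic transition function f_s: S x A -> S
    ("s1", "a1"): "s2", ("s1", "a2"): "s3",
    ("s2", "a1"): "s3", ("s2", "a2"): "s1",
    ("s3", "a1"): "s1", ("s3", "a2"): "s2",
}
f_o = {"s1": "o1", "s2": "o2", "s3": "o1"}  # observation depends only on s

def generate_history(s0, actions):
    """Roll out a history sequence o_1, a_1, o_2, ..., a_{t-1}, o_t."""
    s, history = s0, [f_o[s0]]
    for a in actions:
        s = f_s[(s, a)]
        history += [a, f_o[s]]
    return history

# Under this assumed table, starting from s2 and taking a1 twice yields
# o2, a1, o1, a1, o1 -- a history with |h| = 3 observations.
h = generate_history("s2", ["a1", "a1"])
length = sum(1 for x in h if x.startswith("o"))
```

Note how the two states s1 and s3 share the observation o1, which is exactly what makes the state hidden at the level of a single observation.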

The environment that we discuss is deterministic, and the state space is finite. We also assume the environment is strongly connected. Although the environment is deterministic, it can be highly complicated and appear nondeterministic at the level of observation. The hidden state can be fully identified by a finite history sequence in a deterministic POMDP, which is proved in [2]. Several notations are defined as follows.

trans(h, a) = {o | o is a possible observation following h by taking action a}.

gamma(s_i, d_ij) denotes the observation sequence o_1, o_2, ..., o_n generated by taking the action sequence d_ij from the state s_i.

Our goal is to construct a minimal, looping hidden state transition model from sufficient history sequences. First we present the theoretical analysis showing that any deterministic POMDP environment can be represented by a minimal, looping hidden state transition model. Then we present a heuristic constructing algorithm for this model. The corresponding definitions and lemmas are proposed as follows.

Definition 1 (identifying history sequence). An identifying history sequence h_i is one that uniquely identifies the hidden state s_i. In the rest of this paper the hidden state s_i is treated interchangeably with its identifying history sequence h_i, so s_i and h_i can replace each other.

A simple example of a deterministic POMDP is illustrated in Figure 1. We conclude easily that an identifying history


[Figure 1 depicts three hidden states s1, s2, and s3 linked by the actions a1 and a2, with observation function f_o(s1) = o1, f_o(s2) = o2, f_o(s3) = o1.]

Figure 1: A simple example of a deterministic POMDP.

sequence h_2 for s_2 is expressed by o_2, and an identifying history sequence h_1 for s_1 is expressed by o_2, a_1, o_1, a_1, o_1. Note that there may exist infinitely many identifying history sequences h_i for s_i, because the environment is strongly connected, and there may exist unboundedly long identifying history sequences h_i for s_i, because of uninformative looping. This leads us to determine the minimal identifying history sequence length k.

Definition 2 (minimal history sequence length k). The minimal identifying history sequence length k for the hidden state s_i is defined as follows. The minimal identifying history sequence h for the hidden state s_i is an identifying history sequence h such that no suffix h' of h also identifies s_i. However, the minimal identifying history sequence may have unbounded length, because an identifying history sequence can include a loop arbitrarily many times, which merely lengthens the history sequence. For example, two identifying history sequences h_1 and h_1' for s_1 are expressed by o_2, a_1, o_1, a_1, o_1 and o_2, a_1, o_1, a_1, o_1, a_1, o_1, a_1, o_1, respectively; in this situation o_1, a_1, o_1, a_1, o_1 is treated as the looped item. So the minimal identifying history sequence length k for the hidden state s_i is the minimal length over all its identifying history sequences after excising the looping portion.

Definition 3 (a criterion for identifying history sequences). Given a sufficient history sequence, consider the set of history sequences for the hidden state s. If every occurrence of a history sequence h for the hidden state s satisfies that trans(h, a) is the same observation for each action a, then we consider the history sequence h an identifying history sequence. Definition 3 is correct iff a fully sufficient history sequence is given.
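The criterion of Definition 3 can be checked directly against recorded experience. The sketch below (the flat-list episode layout and the helper name are assumptions, not the paper's implementation) scans episodes for every occurrence of a candidate suffix h and tests whether each action is always followed by the same observation:

```python
from collections import defaultdict

def is_identifying(episodes, h):
    """Definition 3 as a sketch: a candidate suffix h (a flat tuple
    o, a, ..., o) is treated as identifying if, everywhere it occurs in
    the recorded episodes, each action is always followed by the same
    observation."""
    trans = defaultdict(set)  # action -> observations seen right after h
    L = len(h)
    for ep in episodes:
        for i in range(0, len(ep) - 2, 2):      # i indexes an observation
            if i - L + 1 >= 0 and tuple(ep[i - L + 1:i + 1]) == tuple(h):
                trans[ep[i + 1]].add(ep[i + 2])
    # identifying: seen at least once, and unambiguous for every action
    return bool(trans) and all(len(obs) == 1 for obs in trans.values())

# Toy episode: the bare observation o1 is ambiguous (it is followed by
# both o1 and o2 under a1), while o2 and the longer suffix o1, a1, o1
# satisfy the criterion.
episode = ["o2", "a1", "o1", "a1", "o1", "a1", "o2"]
```

This mirrors the caveat in Definition 3: with too little history the test can accept a suffix that a fuller history would reject.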

Lemma 4 (backward identifying history sequence). Assume an identifying history sequence h is for a single hidden state s. If h' is a backward extension of h obtained by appending an action a ∈ A and an observation o ∈ O, then h' is a new identifying history sequence for the single state s' = f_s(s, a); s' is called the prediction state of s.

Proof. Assume h' is a history sequence o_1, a_1, ..., a_{t-1}, o_t generated by the transition function f_s and the observation function f_o, and h is the history sequence o_1, a_1, ..., a_{t-2}, o_{t-1}, a prefix of h'. Since h is an identifying history sequence for s, it must be that s_{t-1} = s. Because the environment is a deterministic POMDP, the next state is uniquely determined by f_s(s_{t-1}, a_{t-1}) = s_t after the action a_{t-1} has been executed. Then h' is a new identifying history sequence for the single state s_t = f_s(s_{t-1}, a_{t-1}) = f_s(s, a_{t-1}) = s'.

Lemma 5 (forward identifying history sequence). Assume an identifying history sequence h' is for a single hidden state s'. If h is the prefix of h' obtained by removing the last action a ∈ A and the last observation o ∈ O, then h is the identifying history sequence for the single state s such that s' = f_s(s, a); s is called the previous state of s'.

Proof. Assume h' is a history sequence o_1, a_1, ..., a_{t-1}, o_t generated by f_s and f_o, and h is the history sequence o_1, a_1, ..., a_{t-2}, o_{t-1}, the prefix of h' obtained by removing the action a_{t-1} and the observation o_t. Since h' is an identifying history sequence for the state s', it must be that s_t = s'. Because the environment is a deterministic POMDP, the current state s' is uniquely determined by the previous state s via f_s(s_{t-1}, a_{t-1}) = s'. Then h is the identifying history sequence for the state s such that f_s(s_{t-1}, a_{t-1}) = f_s(s, a_{t-1}) = s'.

Theorem 6. Given sufficient history sequences, a finite, strongly connected, deterministic POMDP environment can be represented soundly by a minimal, looping hidden state transition model.

Proof. Let E = <S, A, O, f_s, f_o>, where S is the finite set of hidden world states and |S| is the number of hidden states. First, for every hidden state at least one finite-length identifying history sequence h for a state s exists. Because the environment is strongly connected, there must exist a transition history h' from s to s'; according to Lemma 4, backward extending h by appending h' yields an identifying history sequence for s'. Since the hidden state transition model can identify the looped hidden states, the maximal transition history sequence length from s to s' is bounded by |S|. Thus this model has a minimal hidden state space of size |S|.

Since the hidden state transition model correctly realizes the hidden state transition s' = f_s(s, a), this model correctly expresses all identifying history sequences for every hidden state s, possesses perfect transition prediction capacity, and has a minimal hidden state space of size |S|. This model is a k-step variable history length model, and the memory depth k is a variable value for different hidden states. The model is realized by the spatiotemporal associative memory networks in Section 3.

To construct a correct hidden state transition model, we first need to collect sufficient history


sequences and perform a statistical test to determine the minimal identifying history sequence length k. However, in practice it can be difficult to obtain a sufficient history, so we propose a heuristic state transition model constructing algorithm. Without sufficient history sequences the algorithm may produce a premature state transition model, but this model is at least correct for the past experience. We realize the heuristic constructing algorithm by way of two definitions and two lemmas, as follows.

Definition 7 (current history sequence). A current history sequence is the k-step history sequence o_{t-k+1}, a_{t-k+1}, o_{t-k+2}, ..., a_{t-1}, o_t generated by the transition function f_s and the observation function f_o. At t = 0 the current history sequence h_0 is empty; o_t is the observation vector at the current time t, and o_{t-k+1} is the observation vector k - 1 steps earlier.

Definition 8 (transition instance). The history sequence associated with time t is captured as a transition instance. A transition instance is represented by the tuple h_{t+1} = (h_t, a_t, o_{t+1}), where h_t and h_{t+1} are the current history sequences occurring at times t and t + 1 in an episode. A set of transition instances is denoted by the symbol F, which may contain transitions from different episodes.
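Definitions 7 and 8 amount to a sliding k-step window plus a log of (h_t, a_t, o_{t+1}) tuples. A minimal sketch (the data layout and helper names are assumptions, not the paper's data structures) might look as follows:

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class TransitionInstance:
    """Definition 8: h_{t+1} = (h_t, a_t, o_{t+1})."""
    h_t: tuple   # k-step current history at time t (Definition 7)
    a_t: str     # action taken at time t
    o_next: str  # observation received at time t + 1

def record_episode(observations, actions, k):
    """Turn one episode into the set F of transition instances, keeping
    only a k-step current history in a bounded window."""
    F = []
    h = deque(maxlen=2 * k - 1)  # at most k observations and k - 1 actions
    h.append(observations[0])
    for a, o in zip(actions, observations[1:]):
        F.append(TransitionInstance(tuple(h), a, o))
        h.append(a)
        h.append(o)
    return F
```

The `deque` with `maxlen = 2k - 1` silently discards the oldest observation-action pair once the window is full, which is exactly the forgetting behavior a k-step current history requires.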

Lemma 9. For any two hidden states s_i and s_j, s_i != s_j if trans(s_i, a) != trans(s_j, a).

This lemma gives us a sufficient condition to determine s_i != s_j; however, it is not a necessary condition.

Proof by contradiction. Assume that s_i = s_j. Then for every action sequence d_ij = a_1, a_2, ..., a_n we have gamma(s_i, d_ij) = gamma(s_j, d_ij), which contradicts trans(s_i, a) != trans(s_j, a). Thus the original proposition is true.

Lemma 10. For any two identifying history sequences h_i and h_j, for s_i and s_j, respectively, h_i != h_j if s_i != s_j. However, Lemma 10 does not state a necessary condition.

Proof by contradiction. Since an identifying history sequence h uniquely identifies a hidden state s, the identifying history sequence h_i uniquely identifies the hidden state s_i, and h_j uniquely identifies s_j. Assume h_i = h_j; then it must be that s_i = s_j, which contradicts s_i != s_j. Thus the original proposition is true.

The converse of Lemma 10 does not always hold, because several different identifying history sequences may exist for the same hidden state.

Algorithm 11 (a heuristic constructing algorithm for the state transition model). The initial state transition model is constructed using the minimal identifying history sequence length k = 1, and the model is initially empty.

The transition instance h is empty and t = 0.

(1) We assume the given model can perfectly identify the hidden state. The agent takes a step in the environment (according to the history sequence definition, the first item in h is an observation vector). It records the current transition instance at the end of the chain of transition instances. At each time t, for the current history sequence h_t, execute Algorithm 12: if the current state s is looped to the hidden state s_i, go to step (2); otherwise a new node s_i is created in the model and we go to step (4).

(2) According to Algorithm 13, if trans(h_t, a_j) != trans(h_i, a_j), then the current state s is distinguished from the identifying state s_i; go to step (3). If trans(h_t, a_j) = trans(h_i, a_j) for each tried action, go to step (4).

(3) According to Lemma 10, for any two identifying history sequences h_i and h_j, for s_i and s_j, respectively, h_i != h_j if s_i != s_j. So the identifying history sequence lengths for h_i and h_j must increase to k + 1 until h_i and h_j are discriminated. We then reconstruct the model based on the new minimal identifying history sequence length k and go to step (1).

(4) If the action node and the corresponding Q function for the current identifying state node exist in the model, the agent chooses its next action node a_j based on the exhaustive-exploration Q function. If they do not exist, the agent chooses a random action a_j instead, and a new action node is created in the model. After the action control signals are delivered, the agent obtains the new observation vector by trans(s_i, a_j); go to step (1).

(5) Steps (1)-(4) continue until all identifying history sequences h for the same hidden state s correctly predict trans(h, a) for each action.

Note that we apply the k-step variable history length from k = 1 to n, and the history length k is a variable value for different hidden states. We adopt the minimalist-hypothesis model in the state transition model constructing process; this constructing algorithm is a heuristic. If we adopted the exhaustiveness-assumption model instead, the probability of missing a looped hidden state would increase exponentially and many valid loops could be rejected, yielding a larger, redundant state space and poor generalization.
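The core depth-growth loop of Algorithm 11 can be sketched compactly. The version below is a deliberate simplification (a single global k instead of the per-state variable k, batch episodes instead of online interaction, and all names are assumptions): it grows k until every k-step current history predicts a unique next observation for each action, which is the conflict test of steps (2)-(3).

```python
from collections import defaultdict

def minimal_global_depth(episodes, k_max=10):
    """Simplified sketch of Algorithm 11's depth growth: increase k until
    every k-step current history predicts a unique next observation per
    action.  Episodes are flat lists o, a, o, a, ..., o."""
    for k in range(1, k_max + 1):
        trans = defaultdict(set)  # (k-step window, action) -> next obs
        for ep in episodes:
            for i in range(0, len(ep) - 2, 2):  # i indexes an observation
                window = tuple(ep[max(0, i - 2 * (k - 1)):i + 1])
                trans[(window, ep[i + 1])].add(ep[i + 2])
        if all(len(v) == 1 for v in trans.values()):
            return k  # depth k resolves every observed conflict
    return None

# Toy episode: with k = 1 the two occurrences of o1 predict different
# next observations under a1, so the depth must grow to k = 2.
episode = ["o2", "a1", "o1", "a1", "o1", "a1", "o2", "a1", "o1"]
```

The paper's algorithm is finer-grained: it raises k only for the states whose identifying sequences collide, which is why the learned model has a variable memory depth.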

Algorithm 12 (the current history sequence identifying algorithm). In order to construct the minimal, looping hidden state transition model, we need to identify the looped states by identifying the hidden states, for which the current history sequence with k steps is needed. According to Definition 7, the current k-step history sequence at time t is expressed by o_{t-k+1}, a_{t-k+1}, o_{t-k+2}, ..., a_{t-1}, o_t. There exist three identifying processes.


(1) k-Step History Sequence Identifying. If the identifying history sequence for s_i is o_{i-k+1}, a_{i-k+1}, o_{i-k+2}, ..., a_{i-1}, o_i and the current history sequence satisfies

o_t = o_i,
a_{t-1} = a_{i-1},
...
o_{t-k+2} = o_{i-k+2},
a_{t-k+1} = a_{i-k+1},
o_{t-k+1} = o_{i-k+1},
(1)

Thus the current state s is looped to the hidden state s_i. In the STAMN, the k-step history sequence identifying activation value m_{s_j}(t) is computed.
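Condition (1) is an item-by-item comparison of the current k-step window against a stored identifying sequence. The sketch below expresses it with a match-fraction threshold: theta = 1 is the exact test of (1), while theta < 1 only hints at the noise-tolerant activation m_{s_j}(t) the STAMN uses (the exact activation rule is defined later in the paper, so the fraction here is purely an assumption):

```python
def k_step_identify(current, identifying, theta=1.0):
    """Sketch of the k-step identifying test (1) on flat (o, a, ..., o)
    tuples.  theta = 1.0 reproduces exact matching; smaller values give
    a crude stand-in for the STAMN's noise-tolerant activation."""
    if len(current) != len(identifying):
        return False
    matched = sum(c == i for c, i in zip(current, identifying))
    return matched / len(identifying) >= theta
```

With theta = 1 a single corrupted item rejects the loop, which is precisely the brittleness the STAMN's synchronized activation mechanism is designed to avoid.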

(2) Following Identifying. If the current state s is identified as the hidden state s_i, the next transition history h_{t+1} is represented by h_{t+1} = (h_t, a_t, o_{t+1}) through trans(s_i, a_t). According to Lemma 4, the next transition history h_{t+1} is identified as the transition prediction state s_j = f_s(s_i, a_t).

In the STAMN, the following activation value q_{s_j}(t) is computed.

(3) Previous Identifying. If there exists a transition instance h_{t+1} = (h_t, a_t, o_{t+1}), then h_{t+1} is an identifying history sequence for the state s_j. According to Lemma 5, the previous state s_i is uniquely identified by h_t such that s_j = f_s(s_i, a_t). So if there exists a transition prediction state s_j with s_j = f_s(s_i, a_t), then the previous state s_i is uniquely identified.

In the STAMN, the previous activation value d_{s_j}(t) is computed.

Algorithm 13 (transition prediction criterion). If the model correctly represents the environment as a Markov chain, then for every state s and every identifying history sequence h for the same state, trans(h, a) is the same observation for the same action.

If h_1 and h_2 are identifying history sequences with k-step memory for the hidden states s_i and s_j, respectively, and trans(h_1, a) != trans(h_2, a), then according to Lemma 9, s_i is discriminated from s_j.

In the STAMN, the transition prediction criterion is realized by the transition prediction value P_{s_j}(t).

3. Spatiotemporal Associative Memory Networks

In this section we relate the spatiotemporal sequence problem to the idea of identifying the hidden state by a sequence of past observations and actions. An associative memory network (AMN) is expected to have several characteristics: (1) an AMN must memorize incrementally, that is, learn new knowledge without forgetting the learned knowledge; (2) an AMN must be able to record not only the temporal order but also each sequence item's duration in continuous time; (3) an AMN must be able to realize heteroassociative recall; (4) an AMN must be able to process real-valued feature vectors in a bottom-up manner, not only symbolic items; (5) an AMN must be robust and able to recall a sequence correctly from incomplete or noisy input; (6) an AMN should realize learning and recalling simultaneously; (7) an AMN should realize interaction between STM and LTM; dual-trace theory suggests that persistent neural activity in STM can lead to LTM.

The first thing to be defined is the temporal dimension (discrete or continuous). Previous research is mostly based on regular intervals Δt, and few AMNs have been proposed to deal with not only the sequential order but also each item's duration time in continuous time. However, this characteristic is important for many problems. In speech production, writing, music generation, motor planning, and so on, the duration time of a sequence item and the repeated emergence of a sequence item have essentially different meanings. For example, "B4" and "B-B-B-B" are different: the former represents that item B is sustained for 4 timesteps, and the latter represents that item B emerges repeatedly 4 times. The STAMN can explicitly distinguish these two temporal characteristics. So a history of past observations and actions can be expressed by a special spatiotemporal sequence h, h = o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t, where k is the length of the sequence h; the items of h include observation items and action items, where o_t denotes the real-valued observation vector generated by taking action a_{t−1}, and a_t denotes the action taken by the agent at time t. Here t does not represent a discrete time obtained by sampling at regular intervals Δt but the t-th step in the continuous-time dimension, which is the duration time between the current item and the next one.

A spatiotemporal sequence can be classified as a simple sequence or a complex sequence. A simple sequence is a sequence without repeated items; for example, the sequence "B-A-E-L" is a simple sequence, whereas sequences containing repeated items are defined as complex sequences. In a complex sequence, a repeated item can be classified as a looped item or a discriminative item by identifying the hidden state; for example, in the history sequence "X-Y-Z-Y-Z", "Y" and "Z" may be looped items or discriminative items. Identifying the hidden state requires introducing contextual information, resolved by the k-step memory. The memory depth k is not fixed; k takes different values in different parts of the state space.
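A minimal sketch of this simple/complex classification, with illustrative names not taken from the paper:

```python
# Illustrative helper: a sequence is "simple" when no item repeats;
# otherwise it is "complex", and its repeated items are the candidates
# for looped vs. discriminative items under hidden state identification.
def classify_sequence(seq):
    repeated = {x for x in seq if seq.count(x) > 1}
    kind = "simple" if not repeated else "complex"
    return kind, repeated
```

Which of the repeated items are genuinely looped can only be decided by the k-step contextual identification described below, not by the surface sequence alone.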

3.1. Spatiotemporal Associative Memory Network Architecture. We build a new spatiotemporal associative memory network (STAMN). This model makes use of the activity decay of nodes to achieve short-term memory, connection weights between different nodes to represent long-term memory, presynaptic potentials and a neuron synchronized activation mechanism to realize identifying and recalling, and a time-delayed Hebb learning mechanism to fulfil one-shot learning.

6 Computational Intelligence and Neuroscience

Figure 2: STAMN structure diagram (observation activations b_i^s, b_j^a, identifying activations n_i^s, n_j^a, and connection weights w_ij, ω_ij, ϖ_ij).

The STAMN is an incremental, possibly looping, not fully connected, asymmetric associative memory network. Its nonhomogeneous nodes comprise hidden state nodes and action nodes.

For the state node j, the input values are defined as follows: (1) the current observation activation value b_j^s(t), responsible for the matching degree between state node j and the current observed value, which is obtained from the preprocessing neural networks (if the matching degree is greater than a threshold value, then b_j^s(t) = 1); (2) the activation values b_i^X(t), X ∈ {s, a}, of the nodes in the presynaptic node set τ_j of the current state node j; (3) the identifying activation value n_i^s(t) of the previous state node i of state node j; (4) the activation value n_i^a(t) of the previous action node i of state node j. The output values are defined as follows: (1) the identifying activation value n_j^s(t) represents whether the current state s is identified as the hidden state s_j or not; (2) the transition prediction value p_j^s(t) represents whether state node j is the current state transition prediction node or not.

For the action node j, the input value is defined as the current action activation value b_j^a(t), responsible for the matching degree between action node j and the current motor vector. The output value is the activation value n_j^a(t) of the action node, indicating that action node j has been selected by the agent to control the robot's current action.

In the STAMN, not all nodes and connection weights necessarily exist initially. The weights ω_ij(t) and ϖ_ij(t) connected to state nodes can be learned incrementally by time-delayed Hebb learning rules, representing the LTM. The weights w_ij(t) connected to action nodes can be learned by reinforcement learning. All nodes have an activity self-decay mechanism to record the duration time of the node, representing the STM. The output of the STAMN is the winner state node or winner action node determined by winner-takes-all.

The STAMN architecture is shown in Figure 2, where black nodes represent action nodes and concentric-circle nodes represent state nodes.

3.2. Short-Term Memory. We use the self-decay of neuron activity to accomplish short-term memory, for both the observation activation b_i^s(t) and the identifying activation n_i^s(t). The activity of each state node has a self-decay mechanism to record the temporal order and the duration time of this node. Supposing the self-decay factor is γ_i^s ∈ (0, 1) and the activation values b_i^s(t), n_i^s(t) ∈ [0, 1], the self-decay process is given by

    b_i^s(t), n_i^s(t) =
        { 0,                                    b_i^s(t), n_i^s(t) ≤ η,
        { γ_i^s b_i^s(t−1), γ_i^s n_i^s(t−1),   else.
    (2)

The activity of an action node also has the self-decay characteristic. Supposing the self-decay factor is δ_i^a ∈ (0, 1) and the activation values b_i^a(t), n_i^a(t) ∈ [0, 1], the self-decay process is given by

    b_i^a(t), n_i^a(t) =
        { 0,                                    b_i^a(t), n_i^a(t) ≤ μ,
        { δ_i^a b_i^a(t−1), δ_i^a n_i^a(t−1),   else.
    (3)

wherein η and μ are activity thresholds and γ_i^s and δ_i^a are self-decay factors; together they determine the depth k of short-term memory. Here t is a discrete time point obtained by sampling at regular intervals Δt, where Δt is a very small regular interval.
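The decay rule of (2) and (3) can be sketched as follows; the function names, and the way a depth k emerges from the factor and threshold, are illustrative assumptions of this sketch rather than the paper's exact parameter mapping.

```python
# Sketch of the self-decay STM of (2)-(3): a node's activation is
# multiplied by its decay factor each step and cut to 0 once it falls
# to the threshold; the number of steps before cutoff bounds how long
# the node stays in short-term memory.
def decay_step(activation, factor, threshold):
    a = factor * activation
    return a if a > threshold else 0.0

def memory_depth(factor, threshold, a0=1.0):
    """Count decay steps until an activation starting at a0 is cut off."""
    k, a = 0, a0
    while a > threshold:
        a = factor * a
        k += 1
    return k
```

Raising the threshold or lowering the decay factor shortens the window, which is how the paper later tunes k by adjusting η and μ.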

3.3. Long-Term Memory. Long-term memory can be classified into semantic memory and episodic memory. Learning of a history sequence is considered episodic memory, generally realized by one-shot learning. We use time-delayed Hebb learning rules to fulfil the one-shot learning.

(1) The k-Step Long-Term Memory Weight ω_ij(t). The weight ω_ij(t) connected to state node j represents k-step long-term memory. This is a past-oriented behaviour in the STAMN. The weight ω_ij(t) is adjusted according to

    ω_ij(t) |_{i ∈ τ_j, n_j^s(t)=1} =
        { n_j^s(t) n_i^X(t),                       ω_ij(t−1) = 0, X ∈ {s, a},
        { ω_ij(t−1) + α (n_i^X(t) − ω_ij(t−1)),    else,
    (4)

where j is the current identifying activation state node, n_j^s(t) = 1. Because the identifying activation process is a neuron synchronized activation process, when n_j^s(t) = 1 all nodes whose n_i^s(t) or n_i^a(t) are nonzero constitute the contextual information related to state node j. These nodes are considered the presynaptic node set of the current state node j, expressed by τ_j, where n_i^s(t), n_i^a(t) represent the activation values at time t. Because of the self-decay of neuron activity, n_i^s(t), n_i^a(t) record not only the temporal order but also the duration time: the smaller n_i^s(t), n_i^a(t) are, the earlier node i occurred relative to the current state node j. ω_ij(t) is the activation weight between presynaptic node i and state node j; it records the contextual information related to state node j, to be used in identifying and recalling, where ω_ij(t) ∈ [0, 1], ω_ij(0) = 0, and α is the learning rate.
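Under the stated assumptions (winning node j with n_j^s(t) = 1, presynaptic activation n_i^X(t)), the one-shot rule of (4) might be sketched as below; the function name is illustrative.

```python
# Sketch of the time-delayed Hebb rule (4): a fresh weight (0) is set to
# the presynaptic activation in a single shot; an existing weight moves
# toward the current presynaptic activation at learning rate alpha.
def hebb_update(w_ij, n_i, n_j=1.0, alpha=0.9):
    if w_ij == 0.0:                        # first encounter: one-shot learning
        return n_j * n_i
    return w_ij + alpha * (n_i - w_ij)     # otherwise exponential tracking
```

Because the presynaptic activations have already decayed by different amounts, the stored weight encodes how long before the winning node each context node fired, not merely that it fired.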

The weight ω_ij(t) is time-related contextual information; the update process is shown in Figure 3.

(2) One-Step Transition Prediction Weight ϖ_ij(t). The weight ϖ_ij(t) connected to state node j represents one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rules, the weight ϖ_ij(t) is adjusted according to

    ϖ_ij(t) |_{i = argmax n_i^X(t), n_j^s(t)=1} =
        { n_j^s(t) n_i^X(t),                       ϖ_ij(t−1) = 0, X ∈ {s, a},
        { ϖ_ij(t−1) + β (n_i^X(t) − ϖ_ij(t−1)),    else.
    (5)

The transition activation of the current state node j is only associated with the previous winning state node and action node, where j is the current identifying activation state node, n_j^s(t) = 1. The previous winning state node and action node are given by i = argmax n_i^X(t), X ∈ {s, a}, where ϖ_ij(t) ∈ [0, 1], ϖ_ij(0) = 0, and β is the learning rate.

The weight ϖ_ij(t) is one-step transition prediction information; the update process is shown in Figure 4.

(3) The Weight w_ij(t) Connected to Action Nodes. The activation of an action node is only associated with the corresponding state node that selects this action node directly. We assume the state node with the maximal identifying activation value at time t is state node i, so the connection weight w_ij(t) connected to the currently selected action node j is adjusted by

    w_ij(t) |_{i = argmax n_i^s(t), n_j^a(t)=1} =
        { Q_0(s_i, a_j),     w_ij(t−1) = 0,
        { −Q_maxconst,       a_j is an invalid action,
        { Q_t(s_i, a_j),     else,
    (6)

where w_ij(t) is represented by the Q function Q_t(s_i, a_j). This paper only discusses how to build a generalized environment model, not how to learn the optimal policy, so this value is set to the exhaustive-exploration Q function based on the curiosity reward. The curiosity reward is described by (7). When action a_j is an invalid action, w_ij(t) is defined to be a large negative constant, which avoids going into dead ends:

    r(t) =
        { C − n_t / n_ave,   n_ave > 0,
        { C,                 n_ave = 0,
    (7)

where C is a constant value, n_t represents the count of explorations of the current action by the agent, and n_ave is the average count of explorations over all actions by the agent; n_t / n_ave represents the degree of familiarity with the currently selected action. The curiosity reward is updated when each action is finished. The Q_t(s_i, a_j) update equation is given by (8), where λ is the learning rate:

    Q_t(s_i, a_j) =
        { (1 − λ) Q_{t−1}(s_i, a_j) + λ r(t),   i = argmax n_i^s(t), n_j^a(t) = 1,
        { Q_{t−1}(s_i, a_j),                    else.
    (8)

The update process is shown in Figure 5. The action node j is selected by Q_t(s_i, a_j) according to

    a_j = argmax_{j ∈ A_i^vl, n_i^s(t)=1} Q(s_i, a_j),   (9)

where A_i^vl is the valid action set of state node i, n_i^s(t) = 1 represents that state node i is identified, and n_j^a(t) = 1 indicates that action node j was selected by the agent to control the robot's current action.
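The exploration machinery of (7), (8), and (9) can be sketched as below; all names are illustrative, and ties or unvisited actions are handled naively in this sketch.

```python
# Compact sketch of the curiosity-driven exhaustive exploration:
# counts[a] is the per-action exploration count, C the curiosity constant,
# and Q[(s, a)] the exhaustive-exploration Q values.
def curiosity_reward(counts, a, C=5.0):
    n_ave = sum(counts.values()) / len(counts) if counts else 0.0
    return C if n_ave == 0 else C - counts[a] / n_ave     # eq. (7)

def q_update(Q, s, a, r, lam=0.9):
    Q[(s, a)] = (1 - lam) * Q.get((s, a), 0.0) + lam * r  # eq. (8)

def select_action(Q, s, valid_actions):
    # eq. (9): greedy over the valid action set of the identified state
    return max(valid_actions, key=lambda a: Q.get((s, a), 0.0))
```

Rarely tried actions receive larger curiosity rewards and therefore larger Q values, so the greedy selection of (9) systematically steers the agent toward unexplored transitions.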

3.4. The Constructing Process in STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of the STAMN. There is a chicken-and-egg problem during the constructing process: building the STAMN depends on state identifying; conversely, state identifying depends on the current structure of the STAMN. Thus, exhaustive exploration and a variable k-step memory length (dependent on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of state node s_j depends on three identifying processes: k-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process below.

(1) k-Step History Sequence Identifying. The matching degree of the current k-step history sequence with the identifying


Figure 3: Past-oriented weight ω_ij(t) update process associated with state node j.

Figure 4: Future-oriented weight ϖ_ij(t) update process associated with state node j.

Figure 5: Weight w_ij(t) update process associated with action node j.

history sequence h_j for s_j is computed to identify the looping state node j. First, we compute the presynaptic potential V_j(t) for state node j according to

    V_j(t) = Σ_{i ∈ τ_j} C_ij φ(b_i^X(t), ω_ij),   X ∈ {s, a},   (10)

where b_i^X(t) is the current activation value of node i in the presynaptic node set τ_j, and C_ij is the confidence parameter expressing node i's importance degree in the presynaptic node set τ_j. The value of C_ij can be set in advance, with Σ_{i ∈ τ_j} C_ij = 1. The function φ represents the degree of similarity between the activation value of the presynaptic node and the contextual information ω_ij in LTM: φ(x, y) ∈ [0, 1], and when the similarity between x and y is high, φ(x, y) is close to 1. According to winner-takes-all, among all nodes whose presynaptic potentials exceed the threshold θ, the node with the largest presynaptic potential is selected.

The presynaptic potential V_j(t) represents the synchronized activation process of the presynaptic node set τ_j, which represents the previous (k − 1)-step contextual information matching of state node j. To realize full k-step history sequence matching, the k-step history sequence identifying activation value of state node j is given below:

    m_j^s(t) = H(V_j(t), θ) · H(V_j(t), max_{k=1,...,N_c} V_k(t)) · b_j^s(t),

    H(x, y) =
        { 0,   x < y,
        { 1,   x ≥ y,
    (11)

where max_{k=1,...,N_c} V_k(t) is the maximum potential value over the nodes k; H(V_j(t), θ) · H(V_j(t), max_{k=1,...,N_c} V_k(t)) means the node with the largest potential value is selected among all nodes whose presynaptic potentials exceed the threshold θ, and b_j^s(t) represents the matching degree of state node j with the current observed value. If m_j^s(t) = 1, the current state s is identified with looped state node j by the k-step memory.
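A hedged sketch of (10) and (11); the similarity function φ is not given in closed form by the paper, so a simple 1 − |x − y| stand-in is assumed here, and all names are illustrative.

```python
# Sketch of the k-step identifying activation: the presynaptic potential
# V_j is a confidence-weighted similarity between the decayed presynaptic
# activations and the stored context weights, and the looped node must
# both exceed the threshold and win among all candidate nodes.
def H(x, y):
    return 1.0 if x >= y else 0.0

def potential(b, omega, conf):
    """b, omega, conf: dicts over the presynaptic node set τ_j."""
    phi = lambda x, y: 1.0 - abs(x - y)   # assumed similarity in [0, 1]
    return sum(conf[i] * phi(b[i], omega[i]) for i in b)   # eq. (10)

def k_step_identifying(V, j, theta, b_obs_j):
    vmax = max(V.values())
    return H(V[j], theta) * H(V[j], vmax) * b_obs_j        # eq. (11)
```

Because φ compares decayed activations against stored decayed weights, a context node that fired at the wrong time (wrong duration) lowers the potential just as a missing node does.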

(2) Following Identifying. If the current state s is identified as the state s_i, then the next transition prediction state is identified as the state s_j = f_s(s_i, a_i). First, we compute the transition prediction value for state node j according to

    p_j^s(t) = Σ_{i = argmax n_i^X(t)} D_ij φ(n_i^X(t), ϖ_ij),   X ∈ {s, a},   (12)

where n_i^X(t) is the identifying activation value of the state node and action node at time t, and i = argmax n_i^X(t) indicates that state node i and action node i are the previous winner nodes. D_ij is the confidence parameter expressing node i's importance degree. The value of D_ij can be set in advance, with Σ_{i = argmax n_i^X(t)} D_ij = 1. ϖ_ij records one-step transition prediction information related to state node j, used in the identifying and recalling phases. If the current state s is identified as the hidden state s_i, then p_j^s(t) represents the probability that the next transition prediction state is j.

If the next prediction state node j matches the current observation value, the current history sequence is identified with state node j; the following identifying value of state node j is given below:

    q_j^s(t) = H(p_j^s(t), θ) · H(p_j^s(t), max_{k=1,...,N_c} p_k^s(t)) · b_j^s(t).   (13)

If q_j^s(t) = 1, the current history sequence h_t is identified with looped state node j by following identifying. If p_j^s(t) ≥ max_{k=1,...,N_c} p_k^s(t) and p_j^s(t) ≥ θ but q_j^s(t) ≠ 1, then b_j^s(t) ≠ 1, representing a mismatch between state node j and the current observed value; that is, trans(h_t, a_i) ≠ trans(h_i, a_i). According to the transition prediction criterion (Algorithm 13), the current history sequence h_t is not identified with the hidden state s_i, so the identifying history sequence lengths for h_t and h_i must each increase to k + 1.

(3) Previous Identifying. If the current history sequence is identified as the state s_j, then the previous state s_i is identified such that s_j = f_s(s_i, a_i). First, we compute the identifying activation values for all states s; if there exists n_j^s(t) = 1, then the current state s_t is identified as state s_j, and the previous state s_{t−1} is identified as the previous state s_i of state s_j. The previous identifying value is defined as d_i^s(t). The previous state s_i is i = argmax b_i^s(t) satisfying Condition 1 and Condition 2; then we set d_i^s(t) = 1.

Condition 1:

    n_j^s(t) = 1.   (14)

Condition 2:

    φ(b_i^s(t), ω_ij) > θ.   (15)

According to the above three identifying processes, the identifying activation value of state node j is defined as follows:

    n_j^s(t) = max(m_j^s(t), q_j^s(t), d_j^s(t)).   (16)
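A minimal sketch of (16), returning both n_j^s(t) and which identifying route fired; the route labels are illustrative.

```python
# Sketch of eq. (16): a state node is identified when any of the three
# routes fires, i.e. k-step (m), following (q), or previous (d).
def identifying_activation(m_j, q_j, d_j):
    routes = {"k-step": m_j, "following": q_j, "previous": d_j}
    n_j = max(routes.values())
    fired = [name for name, v in routes.items() if v == n_j and n_j == 1.0]
    return n_j, fired
```

Taking the maximum means the three routes are redundant by design: even when the full k-step context is unavailable, a confirmed transition prediction or a later identification can still activate the node.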

The pseudocode for Algorithm 11 is given in Pseudocode 1.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4. Simulation

4.1. A Simple Example of a Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First, we obtain the transition instance h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1. According to Algorithm 11, we set the initial identifying history sequence length k = 1. The STAMN model is constructed incrementally using h_t as in Figure 6.

When h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1, a_2, o_1, because k = 1, the looped state is identified by the current observation vector. Thus h_1 = o_1 and h_2 = o_1 are the identifying history sequences for state s_1. Since o_1 = trans(h_1, a_2) and o_2 = trans(h_2, a_2) both occur in h_t, we have trans(h_1, a_2) ≠ trans(h_2, a_2); according to Lemma 9, h_1 and h_2 are identifying history sequences for states s_i and s_j, respectively, with s_i ≠ s_j. So o_1 is distinguished as o_1′ and o_1″, and h_t = o_1, a_1, o_1′, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1″, a_2, o_1.

According to Lemma 10, for any two identifying history sequences h_i and h_j, for o_1′ and o_1″ respectively, h_i ≠ h_j iff s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j, for s_i and s_j respectively, must increase to 2. For o_1′ the identifying history sequence is o_1, a_1, o_1′, and for o_1″ the identifying history sequence is o_1, a_1, o_1″. These two identifying history sequences are identical, so a longer transition instance is needed.

Figure 6: The STAMN model with memory depth 1 (states s_1 and s_2 with f_o(s_1) = o_1, f_o(s_2) = o_2 and transitions under actions a_1, a_2).

Table 1: The identifying history sequences for o_1′ and o_1″.

    Hidden state                  | o_1′ (k = 2)      | o_1″ (k = 3)
    Identifying history sequences | o_1, a_1, o_1′    | o_2, a_1, o_1, a_1, o_1″
                                  | o_1″, a_2, o_1′   |
                                  | o_2, a_1, o_1′    |

When h_t = o_1, a_1, o_1′, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1″, a_2, o_1′, a_2, o_2, a_1, o_1′, a_2, o_2, a_1, o_1, a_1, o_1, a_1, o_1′, a_2, o_2, in h_t we obtain the identical identifying history sequences h_i = h_j for o_1′ and o_1″. For k = 2, the distinguishing identifying history sequences for o_1′ are o_1″, a_2, o_1′ and o_2, a_1, o_1′. However, the distinguishing identifying history sequence for o_1″ is o_1, a_1, o_1″, which is not discriminated from the identifying history sequence for o_1′. So the identifying history sequence length for o_1″ must increase to 3. The identifying history sequences for o_1′ and o_1″ are described in Table 1.

According to k-step history identifying, following identifying, and previous identifying, h_t can be represented as h_t = o_1″, a_1, o_1′, a_2, o_2, a_2, o_2, a_1, o_1′, a_1, o_1″, a_2, o_1′, a_2, o_2, a_1, o_1′, a_2, o_2, a_1, o_1′, a_1, o_1″, a_1, o_1′, a_2, o_2.

According to Pseudocode 1, the STAMN model is constructed incrementally using h_t as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After every 10 timesteps, the algorithms use the currently learned model to realize transition prediction, and the prediction quality is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.


Set initial memory depth k = 1, t = 0.
No nodes or connection weights exist initially in the STAMN.
The transition instance h_0 = ∅.
A STAMN is constructed incrementally through interaction with the environment by the agent, expanding the STAMN until the transition prediction contradicts the current minimal hypothesis model; then the hypothesis model is reconstructed by increasing the memory depth k. In this paper the constructing process includes identifying and recalling simultaneously.
while one pattern o_t or a_t can be activated from the pattern cognition layer do
    h_t = (h_{t−1}, o_t, a_t)
    for all existing state nodes do
        compute the identifying activation value by executing Algorithm 12
        if there exists n_j^s(t) = 1 then
            the looped state is identified, and the weights ω_ij, ϖ_ij are adjusted according to (4), (5)
        else
            a new state node j is created; set n_j^s(t) = 1, and the weights ω_ij, ϖ_ij are adjusted according to (4), (5)
        end if
    end for
    for all existing state nodes do
        compute the transition prediction criterion by executing Algorithm 13
        if IsNotTrans(s_j) then
            for the previous winner state node i of state node j do
                set memory depth k = k + 1 until each h_i is discriminated
            end for
            reconstruct the hypothesis model with the new memory depth k according to Algorithm 11
        end if
    end for
    for all existing action nodes do
        compute the activation value according to (9)
        if there exists n_j^a(t) = 1 then
            the weight w_ij(t) is adjusted according to (6)
        else
            a new action node j is created; set n_j^a(t) = 1, and the weight w_ij(t) is adjusted according to (6)
        end if
    end for
    the agent obtains the new observation vector by o_{t+1} = trans(s_t, a_t)
    t = t + 1
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.

IsNotTrans(s_j):
    if p_j^s(t) > θ according to (12) and q_j^s(t) = 1 according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the best performance and the fastest convergence speed because of the exhaustive-exploration Q function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First, we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem, shown in Figure 9, is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall in the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem, the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and


Figure 7: (a) The STAMN model; (b) the LPST model.

Figure 8: Comparative performance results for a simple deterministic POMDP (average prediction error vs. timesteps for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black grid cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN. The results are shown in Figure 10, which shows that the STAMN with exhaustive exploration still has the better performance and the faster convergence speed in the 4 × 3 grid problem.

The numbers of state nodes and action nodes in the STAMN and the LPST are given in Table 2. In the STAMN, the state node

Figure 9: The 4 × 3 grid problem.

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

    Algorithm | Number of state nodes | Number of action nodes
    LPST      | 70                    | 50
    STAMN     | 11                    | 44

represents the hidden state, but in the LPST the state node represents an observation value, and the hidden state is expressed by k-step past observations and actions; thus observation nodes and action nodes are created repeatedly, and most of them are the same. The LPST has the same perfect observation prediction capability as the STAMN but is not a state transition model.

In the STAMN, the setting of the parameters μ, η, γ_i^s, and δ_i^a is very important, since it determines the depth k of short-term


Figure 10: Comparative performance results for the 4 × 3 grid problem (average prediction error vs. timesteps for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

memory. We assume γ_i^s = δ_i^a = 0.9 and initially μ = η = 0.9, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2, μ = μ − 0.1. The other parameters are determined relatively easily. We set the learning rates α = β = λ = 0.9 and set the confidence parameters C_ij = 1/|τ_j|, D_ij = 1/2, representing the same importance degree. We set the constant values C = 5 and θ = 0.9.

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By wall-following behaviour, the agent can recognize left-wall, right-wall, and corridor landmarks. These observation landmarks are different from the observation vectors in the previous grid problem: every observation landmark has a different duration time. To analyse the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The numbers in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, the robot has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the paths 2-3-4-5-6 and 12-13-14-15-16 are identical, and a memory depth k = 6 is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which tolerates faults and noise in the sequence item duration time and can realize reliable recalling. However, the LPST cannot realize correct convergence in reasonable time because of its reliance on accurate matching.

5. Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST, the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive accurate matching, which is sensitive to noise and faults; the STAMN offers the neuron synchronized activation mechanism to realize recalling, so even in a noisy and disturbed environment the STAMN can still recall reliably. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network and uses the synchronized activation mechanism, whose performance does not degrade as history sequences lengthen and the scale increases.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. In existing AMNs, firstly, almost all models are unable to handle complex sequences with looped hidden states; the STAMN realizes identifying the looped hidden state, so it can be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth k is variable in different parts of the state space. Finally, the existing AMN models generally only record temporal order at discrete intervals Δt rather than sequence item duration in continuous time; the STAMN explicitly deals with the duration time of each item.

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper the STAMN is proposed to identify the looped hidden state using only the transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth k-Markov model. We proposed the heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. These transition instances can


Figure 11: (a) The complex symmetrical simulation environment, with hidden states numbered 1–24. (b) The disturbed environment.

Figure 12: (a) Comparative results (prediction error over 0–500 timesteps, STAMN versus LPST) for a noise-free complex symmetrical environment. (b) Comparative results for a disturbed environment.

be obtained through the agent's interaction with the environment, or from a body of training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as simultaneous localization and mapping (SLAM), robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests; how to incorporate reinforcement learning to produce a new distributed reinforcement learning model; and how it should be applied to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there are no competing interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I. Unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.



Figure 1: A simple example of a deterministic POMDP, with hidden states s1, s2, s3, actions a1, a2, and observation function f_o(s1) = o1, f_o(s2) = o2, f_o(s3) = o1.

sequence h2 for s2 is expressed by o2, and an identifying history sequence h1 for s1 is expressed by o2, a1, o1, a1, o1. Note that there may exist infinitely many identifying history sequences h_i for s_i, because the environment is strongly connected, and there may exist unboundedly long identifying history sequences h_i for s_i because of uninformative looping. So this leads us to determine the minimal identifying history sequence length k.

Definition 2 (minimal history sequence length k). The minimal identifying history sequence length k for the hidden state s_i is defined as follows. The minimal identifying history sequence h for the hidden state s_i is an identifying history sequence h such that no suffix h′ of h also identifies s_i. However, the minimal identifying history sequence may have unbounded length, because an identifying history sequence can include a loop arbitrarily many times, which merely lengthens the history sequence. For example, two identifying history sequences h1 and h1′ for s1 are expressed by o2, a1, o1, a1, o1 and o2, a1, o1, a1, o1, a1, o1, a1, o1, respectively. In this situation o1, a1, o1, a1, o1 is treated as the looped item. So the minimal identifying history sequence length k for the hidden state s_i is the minimal length over all identifying history sequences after excising the looping portion.

Definition 3 (a criterion for identifying history sequences). Given a sufficient history sequence, consider the set of history sequences for the hidden state s: if every history sequence h for the hidden state s satisfies that trans(h, a) yields the same observation for each action a, then we consider the history sequence h an identifying history sequence. Definition 3 is correct iff a fully sufficient history sequence is given.

Lemma 4 (backward identifying history sequence). Assume an identifying history sequence h is for a single hidden state s. If h′ is a backward extension of h by adding an action a ∈ A and an observation o ∈ O, then h′ is a new identifying history sequence for the single state s′ = f_s(s, a); s′ is called the prediction state of s.

Proof. We assume h′ is a history sequence o1, a1, ..., a_{t−1}, o_t generated by the transition function f_s and observation function f_o, and h is the history sequence o1, a1, ..., a_{t−2}, o_{t−1}, a prefix of h′. Since h is an identifying history sequence for s, it must be that s_{t−1} = s. Because the environment is a deterministic POMDP, the next state s′ is uniquely determined by f_s(s_{t−1}, a_{t−1}) = s_t after the action a_{t−1} has been executed. Then h′ is a new identifying history sequence for the single state s_t = f_s(s_{t−1}, a_{t−1}) = f_s(s, a_{t−1}) = s′.

Lemma 5 (forward identifying history sequence). Assume an identifying history sequence h′ is for a single hidden state s′. If h is the prefix of h′ obtained by removing the last action a ∈ A and the last observation o ∈ O, then h is the identifying history sequence for the single state s such that s′ = f_s(s, a); s is called the previous state of s′.

Proof. We assume h′ is a history sequence o1, a1, ..., a_{t−1}, o_t generated by the transition function f_s and observation function f_o, and h is the history sequence o1, a1, ..., a_{t−2}, o_{t−1}, the prefix of h′ obtained by removing the action a_{t−1} and the observation o_t. Since h′ is an identifying history sequence for the state s′, it must be that s_t = s′. Because the environment is a deterministic POMDP, the current state s′ is uniquely determined by the previous state s such that f_s(s_{t−1}, a_{t−1}) = s′. Then h is the identifying history sequence for the state s such that f_s(s, a_{t−1}) = f_s(s_{t−1}, a_{t−1}) = s′.

Theorem 6. Given sufficient history sequences, a finite, strongly connected, deterministic POMDP environment can be represented soundly by a minimal, looping hidden state transition model.

Proof. We assume E = ⟨S, A, O, f_s, f_o⟩, where S is the finite set of hidden world states and |S| is the number of hidden states. First, for every hidden state there exists at least one finite-length identifying history sequence h for the state s. Because the environment is strongly connected, there must exist a transition history h′ from s to s′; according to Lemma 4, backward extension of h by appending h′ yields an identifying history sequence for s′. Since the hidden state transition model can identify the looping hidden state, the maximal transition history sequence length from s to s′ is |S|. Thus this model has a minimal hidden state space of size |S|.

Since the hidden state transition model can correctly realize the hidden state transition s′ = f_s(s, a), this model can correctly express all identifying history sequences for every hidden state s and possesses the perfect transition prediction capacity, with a minimal hidden state space of size |S|. This model is a k-step variable history length model, and the memory depth k is a variable value for different hidden states. This model is realized by the spatiotemporal associative memory networks in Section 3.

If we want to construct a correct hidden state transition model, we first need to collect sufficient history sequences to perform a statistical test that determines the minimal identifying history sequence length k. However, in practice it can be difficult to obtain a sufficient history, so we propose a heuristic state transition model constructing algorithm. Without sufficient history sequences the algorithm may produce a premature state transition model, but this model is at least correct for the past experience. We realize the heuristic constructing algorithm by way of two definitions and two lemmas, as follows.

Definition 7 (current history sequence). A current history sequence is the k-step history sequence o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t generated by the transition function f_s and the observation function f_o. At t = 0 the current history sequence h_0 is empty; o_t is the observation vector at the current time t, and o_{t−k+1} is the observation vector observed k − 1 steps before time t.

Definition 8 (transition instance). The history sequence associated with time t is captured as a transition instance. The transition instance is represented by the tuple h_{t+1} = (h_t, a_t, o_{t+1}), where h_t and h_{t+1} are the current history sequences occurring at times t and t + 1 in an episode. A set of transition instances is denoted by the symbol F, which may contain transitions from different episodes.
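Definitions 7 and 8 can be sketched in Python; the helper names are illustrative, not the paper's:

```python
def k_step_history(obs, acts, k):
    """Build the k-step current history sequence of Definition 7:
    o_{t-k+1}, a_{t-k+1}, ..., a_{t-1}, o_t -- the last k observations
    interleaved with the k - 1 actions taken between them."""
    os_ = obs[-k:]
    as_ = acts[-(k - 1):] if k > 1 else []
    h = [os_[0]]
    for a, o in zip(as_, os_[1:]):
        h += [a, o]
    return tuple(h)

def transition_instance(h_t, a_t, o_next):
    """Definition 8: the tuple h_{t+1} = (h_t, a_t, o_{t+1})."""
    return (h_t, a_t, o_next)

# With the observations/actions of Figure 1, k = 3 reproduces the
# identifying history o2, a1, o1, a1, o1 for s1.
h = k_step_history(['o2', 'o1', 'o1'], ['a1', 'a1'], 3)
```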

Lemma 9. For any two hidden states s_i and s_j, s_i ≠ s_j if trans(s_i, a) ≠ trans(s_j, a) for some action a.

This lemma gives us a sufficient condition to determine that s_i ≠ s_j. However, it is not a necessary condition.

Proof by contradiction. We assume that s_i = s_j. Then for every action sequence d_ij = a1, a2, ..., a_n we have γ(s_i, d_ij) = γ(s_j, d_ij), which contradicts trans(s_i, a) ≠ trans(s_j, a). Thus the original proposition is true.

Lemma 10. For any two identifying history sequences h_i and h_j, separately for s_i and s_j, h_i ≠ h_j if s_i ≠ s_j. However, the converse is not a necessary condition.

Proof by contradiction. Since an identifying history sequence h uniquely identifies a hidden state s, the identifying history sequence h_i uniquely identifies the hidden state s_i and the identifying history sequence h_j uniquely identifies the hidden state s_j. We assume h_i = h_j; then it must be that s_i = s_j, which contradicts s_i ≠ s_j. Thus the original proposition is true.

The converse of Lemma 10 is not always true, because there may exist several different identifying history sequences for the same hidden state.

Algorithm 11 (a heuristic constructing algorithm for the state transition model). The initial state transition model is constructed using the minimal identifying history sequence length k = 1. The model is empty initially, the transition instance h is empty, and t = 0.

(1) We assume the given model can perfectly identify the hidden state. The agent makes a step in the environment (according to the history sequence definition, the first item in h is the observation vector). It records the current transition instance at the end of the chain of transition instances. At each time t, for the current history sequence h_t, execute Algorithm 12: if the current state s is looped to the hidden state s_i, go to step (2); otherwise a new node s_i is created in the model, and go to step (4).

(2) According to Algorithm 13, if trans(h_t, a_j) ≠ trans(h_i, a_j), then the current state s is distinguished from the identifying state s_i; go to step (3). If trans(h_t, a_j) = trans(h_i, a_j), go to step (4).

(3) According to Lemma 10, for any two identifying history sequences h_i and h_j, separately for s_i and s_j, h_i ≠ h_j if s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j must increase to k + 1 until h_i and h_j are discriminated. We must reconstruct the model based on the new minimal identifying history sequence length k, and go to step (1).

(4) If the action node and the corresponding Q function for the current identifying state node exist in the model, the agent chooses its next action node a_j based on the exhaustive-exploration Q function. If the action node and the corresponding Q function do not exist, the agent chooses a random action a_j instead, and a new action node is created in the model. After the action control signals are delivered, the agent obtains the new observation vector by trans(s_i, a_j); go to step (1).

(5) Steps (1)–(4) continue until every identifying history sequence h for the same hidden state s correctly predicts trans(h, a) for each action.

Note: we apply the k-step variable history length from k = 1 to n, and the history length k is a variable value for different hidden states. We adopt the minimalist-hypothesis model in the state transition model constructing process; this constructing algorithm is a heuristic. If we adopted the exhaustiveness-assumption model, the probability of missing a looped hidden state would increase exponentially, and many valid loops could be rejected, yielding larger redundant state spaces and poor generalization.
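Step (1)'s loop-back-or-create decision can be sketched as follows, assuming a plain dictionary stands in for the network's state nodes (an illustration, not the paper's STAMN realization):

```python
def identify_or_create(model, h):
    """Loop back to an existing state node whose identifying history
    matches the current k-step history h, or create a new node.
    `model` maps identifying history tuples to state-node ids."""
    if h in model:
        return model[h], True        # looped to a known hidden state
    model[h] = len(model)            # create a new state node
    return model[h], False
```

The depth increase of step (3) would then clear `model` and rebuild it with the longer histories produced by the new k.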

Algorithm 12 (the current history sequence identifying algorithm). In order to construct the minimal, looping hidden state transition model, we need to identify the looped state by identifying the hidden state, which requires the current history sequence with k steps. According to Definition 7, the current k-step history sequence at time t is expressed by o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t. There exist three identifying processes.


(1) k-Step History Sequence Identifying. If the identifying history sequence for s_i is o_{i−k+1}, a_{i−k+1}, o_{i−k+2}, ..., a_{i−1}, o_i and it satisfies

o_t = o_i,
a_{t−1} = a_{i−1},
...,
o_{t−k+2} = o_{i−k+2},
a_{t−k+1} = a_{i−k+1},
o_{t−k+1} = o_{i−k+1},                (1)

then the current state s is looped to the hidden state s_i. In the STAMN, the k-step history sequence identifying activation value m_j^s(t) is computed.

(2) Following Identifying. If the current state s is identified as the hidden state s_i, the next transition history h_{t+1} is represented by h_{t+1} = (h_t, a_t, o_{t+1}) through trans(s_i, a_t). According to Lemma 4, the next transition history h_{t+1} is identified as the transition prediction state s_j = f_s(s_i, a_t).

In the STAMN, the following activation value q_j^s(t) is computed.

(3) Previous Identifying. If there exists a transition instance h_{t+1} = (h_t, a_t, o_{t+1}), then h_{t+1} is an identifying history sequence for the state s_j. According to Lemma 5, the previous state s_i is uniquely identified by h_t such that s_j = f_s(s_i, a_t). So if there exists a transition prediction state s_j with s_j = f_s(s_i, a_t), then the previous state s_i is uniquely identified.

In the STAMN, the previous activation value d_j^s(t) is computed.
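A minimal sketch of identifying processes (1) and (2), assuming a lookup table `identifying` of per-state identifying histories and a transition table `trans`; both are hypothetical stand-ins for the network computation:

```python
def identify(current_h, identifying):
    """Process (1): k-step history identifying -- the current k-step
    history matches a stored identifying sequence item by item
    (condition (1)), i.e. the tuples are equal."""
    for s, h in identifying.items():
        if h == current_h:
            return s
    return None

def follow(s, a, o, trans, identifying):
    """Process (2): following identifying -- extend the identifying
    history of s backward by (a, o); by Lemma 4 the extension
    identifies the prediction state s' = f_s(s, a)."""
    s_next = trans[(s, a)]
    identifying[s_next] = identifying[s] + (a, o)
    return s_next
```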

Algorithm 13 (transition prediction criterion). If the model correctly represents the environment as a Markov chain, then for every state s and every identifying history sequence h for the same state, trans(h, a) yields the same observation for the same action.

If h1 and h2 are identifying history sequences with k-step memory for hidden states s_i and s_j and there exists trans(h1, a) ≠ trans(h2, a), then according to Lemma 9, s_i is discriminated from s_j.

In the STAMN, the transition prediction criterion is realized by the transition prediction value p_j^s(t).

3. Spatiotemporal Associative Memory Networks

In this section we want to relate the spatiotemporal sequence problem to the idea of identifying the hidden state by a sequence of past observations and actions. An associative memory network (AMN) is expected to have several characteristics: (1) an AMN must memorize incrementally, that is, have the ability to learn new knowledge without forgetting the learned knowledge; (2) an AMN must be able to record not only the temporal order but also each sequence item's duration in continuous time; (3) an AMN must be able to realize heteroassociative recall; (4) an AMN must be able to process real-valued feature vectors in a bottom-up method, not just symbolic items; (5) an AMN must be robust and able to recall the sequence correctly with incomplete or noisy input; (6) an AMN should realize learning and recalling simultaneously; (7) an AMN should realize interaction between STM and LTM; dual-trace theory suggests that the persistent neural activities of STM can lead to LTM.

The first thing to be defined is the temporal dimension (discrete or continuous). Previous research is mostly based on regular intervals Δt, and few AMNs have been proposed to deal with not only the sequential characteristic but also sequence item duration in continuous time. However, this characteristic is important for many problems. In speech production, writing, music generation, motor planning, and so on, the duration of a sequence item and the repeated emergence of a sequence item have essentially different meanings. For example, "B4" and "B-B-B-B" are entirely different: the former represents that item B is sustained for 4 timesteps, and the latter represents that item B repeatedly emerges 4 times. The STAMN can explicitly distinguish these two temporal characteristics. So a history of past observations and actions can be expressed by a special spatiotemporal sequence h: h = o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t, where k is the length of the sequence h; the items of h include observation items and action items, where o_t denotes the real-valued observation vector generated by taking action a_{t−1}, and a_t denotes the action taken by the agent at time t. Here t does not represent a temporally discrete dimension sampled at regular intervals Δt, but the t-th step in the continuous-time dimension, whose length is the duration time between the current item and the next one.
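The distinction between an item sustained for several steps and an item re-emerging several times can be illustrated with a run-length view of a unit-step sampled sequence (a sketch of the encoding difference, not the STAMN's internal representation):

```python
def run_length(items):
    """Collapse consecutive duplicates into (item, duration) pairs:
    an item persisting for 4 unit steps becomes ('B', 4), while an
    item that re-emerges 4 times, separated by other items, yields
    four distinct runs of duration 1."""
    runs = []
    for x in items:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
    return [(x, d) for x, d in runs]
```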

A spatiotemporal sequence can be classified as a simple sequence or a complex sequence. A simple sequence is a sequence without repeated items; for example, the sequence "B-A-E-L" is a simple sequence, whereas sequences containing repeated items are defined as complex sequences. In a complex sequence the repeated items can be classified as looped items or discriminative items by identifying the hidden state; for example, in the history sequence "X-Y-Z-Y-Z", "Y" and "Z" may be looped items or discriminative items. Identifying the hidden state requires introducing contextual information, resolved by the k-step memory. The memory depth k is not fixed; k is a variable value in different parts of the state space.

3.1. Spatiotemporal Associative Memory Networks Architecture. We build a new spatiotemporal associative memory network (STAMN). This model makes use of the activity decay of nodes to achieve short-term memory, connection weights between different nodes to represent long-term memory, presynaptic potentials and a neuron synchronized activation mechanism to realize identifying and recalling, and a time-delayed Hebb learning mechanism to fulfil one-shot learning.


Figure 2: STAMN structure diagram (state nodes with activations n_i^s, b_i^s; action nodes with activations n_j^a, b_j^a; connection weights w_ij and ω_ij).

The STAMN is an incremental, possibly looping, non-fully-connected, asymmetric associative memory network. Its nonhomogeneous nodes correspond to hidden state nodes and action nodes.

For a state node j, the input values are defined as follows: (1) the current observation activation value b_j^s(t), responsible for the matching degree of state node j and the current observed value, which is obtained from the preprocessing neural networks (if the matching degree is greater than a threshold value, then b_j^s(t) = 1); (2) the observation activation values b_i^s(t) of the presynaptic node set τ_j of the current state node j; (3) the identifying activation value n_i^s(t) of the previous state node i of state node j; (4) the activation value n_i^a(t) of the previous action node i of state node j. The output values are defined as follows: (1) the identifying activation value n_j^s(t), which represents whether the current state s is identified as the hidden state s_j or not; (2) the transition prediction value p_j^s(t), which represents whether state node j is the current state transition prediction node or not.

For an action node j, the input value is defined as the current action activation value b^a_j(t), responsible for the matching degree between action node j and the current motor vectors. The output value is the activation value n^a_j(t) of the action node, indicating that action node j has been selected by the agent to control the robot's current action.

In the STAMN, not all nodes and connection weights exist initially. The weights ω_ij(t) connected to state nodes can be learned incrementally by time-delayed Hebb learning rules, representing the LTM. The weights w_ij(t) connected to action nodes can be learned by reinforcement learning. All nodes have an activity self-decay mechanism to record the duration time of the node, representing the STM. The output of the STAMN is the winner state node or winner action node, selected by winner-takes-all.

The STAMN architecture is shown in Figure 2, where black nodes represent action nodes and concentric-circle nodes represent state nodes.

3.2. Short-Term Memory. We use the self-decay of neuron activity to accomplish short-term memory, for both the observation activation b^s_i(t) and the identifying activation n^s_i(t). The activity of each state node has a self-decay mechanism to record the temporal order and the duration time of the node. Supposing the self-decay factor is γ^s_i ∈ (0, 1) and the activation values b^s_i(t), n^s_i(t) ∈ [0, 1], the self-decay process is given by

b^s_i(t), n^s_i(t) =
    0,                                      if b^s_i(t), n^s_i(t) ≤ η,
    γ^s_i·b^s_i(t−1), γ^s_i·n^s_i(t−1),     otherwise.
(2)

The activity of an action node also has the self-decay characteristic. Supposing the self-decay factor is δ^a_i ∈ (0, 1) and the activation values b^a_i(t), n^a_i(t) ∈ [0, 1], the self-decay process is given by

b^a_i(t), n^a_i(t) =
    0,                                      if b^a_i(t), n^a_i(t) ≤ μ,
    δ^a_i·b^a_i(t−1), δ^a_i·n^a_i(t−1),     otherwise.
(3)

wherein η and μ are the active thresholds and γ^s_i and δ^a_i are the self-decay factors; together they determine the depth k of short-term memory. Here t is a discrete time point obtained by sampling at regular intervals Δt, where Δt is a very small regular interval.
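The decay rule of (2)-(3) can be sketched as follows (a minimal illustration; the names are ours, and we read the rule as geometric decay with a hard cutoff at the active threshold):

```python
def decay(value, gamma, threshold):
    """Self-decay of a node's activity (STM), following (2)-(3):
    the activity decays geometrically at each step and is cut to 0
    once it falls to or below the active threshold."""
    v = gamma * value
    return v if v > threshold else 0.0

# A node activated at t = 0 with value 1.0, gamma = 0.9, threshold = 0.5:
v = 1.0
trace = []
for t in range(8):
    v = decay(v, 0.9, 0.5)
    trace.append(round(v, 3))
print(trace)  # [0.9, 0.81, 0.729, 0.656, 0.59, 0.531, 0.0, 0.0]
```

The decayed value thus encodes how long ago the node was activated: a smaller nonzero activity means an earlier activation, which is exactly what the identifying process later exploits.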

3.3. Long-Term Memory. Long-term memory can be classified into semantic memory and episodic memory. Learning of a history sequence is considered episodic memory, generally realized by one-shot learning. We use time-delayed Hebb learning rules to fulfil the one-shot learning.

(1) The k-Step Long-Term Memory Weight ω_ij(t). The weight ω_ij(t) connected to state node j represents the k-step long-term memory. This is a past-oriented behaviour in the STAMN. The weight ω_ij(t) is adjusted according to

For i ∈ Γ_j, when n^s_j(t) = 1:

ω_ij(t) =
    n^s_j(t)·n^X_i(t),                       if ω_ij(t−1) = 0, X ∈ {s, a},
    ω_ij(t−1) + α·(n^X_i(t) − ω_ij(t−1)),    otherwise.
(4)

where j is the current identifying activation state node with n^s_j(t) = 1. Because the identifying activation process is a neuron synchronized activation process, when n^s_j(t) = 1 all nodes whose n^s_i(t) or n^a_i(t) is nonzero constitute the contextual information related to state node j and are considered the presynaptic node set of the current state node j, expressed by Γ_j. Here n^s_i(t), n^a_i(t) represent the activation values at time t; because of the self-decay of neuron activity, they record not only the temporal order but also the duration time: the smaller n^s_i(t), n^a_i(t) are, the earlier node i precedes the current state node j. ω_ij(t) is the activation weight between presynaptic node i and state node j; it records the context information related to state node j, to be used in identifying and recalling, where ω_ij(t) ∈ [0, 1], ω_ij(0) = 0, and α is the learning rate.

The weight ω_ij(t) is time-related contextual information; its update process is shown in Figure 3.
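A minimal sketch of the time-delayed Hebb rule in (4) (names are ours; the first co-activation performs the one-shot imprint, and later co-activations move the weight toward the presynaptic activity):

```python
def hebb_update(w_prev, n_i, n_j=1.0, alpha=0.9):
    """Time-delayed Hebb rule of (4): if the weight is still zero, the
    first co-activation imprints the product of activities (one-shot
    learning); otherwise the weight tracks the presynaptic activity n_i
    with learning rate alpha."""
    if w_prev == 0.0:
        return n_j * n_i
    return w_prev + alpha * (n_i - w_prev)

w = 0.0
w = hebb_update(w, n_i=0.81)   # first visit: one-shot imprint -> 0.81
w = hebb_update(w, n_i=0.81)   # later visits: converge toward n_i
print(round(w, 3))  # 0.81
```

Because n_i is a decayed activity, the stored weight remembers not just *that* node i preceded node j, but *how long before* it did so.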

(2) One-Step Transition Prediction Weight ω_ij(t). The weight ω_ij(t) connected to state node j represents the one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rules, the weight ω_ij(t) is adjusted according to

For i = argmax_i n^X_i(t), when n^s_j(t) = 1:

ω_ij(t) =
    n^s_j(t)·n^X_i(t),                       if ω_ij(t−1) = 0, X ∈ {s, a},
    ω_ij(t−1) + β·(n^X_i(t) − ω_ij(t−1)),    otherwise.
(5)

The transition activation of the current state node j is associated only with the previous winning state node and action node, where j is the current identifying activation state node with n^s_j(t) = 1. The previous winning state node and action node are given by i = argmax_i n^X_i(t), X ∈ {s, a}, where ω_ij(t) ∈ [0, 1], ω_ij(0) = 0, and β is the learning rate.

The weight ω_ij(t) is one-step transition prediction information; its update process is shown in Figure 4.

(3) The Weight w_ij(t) Connected to Action Nodes. The activation of an action node is associated only with the corresponding state node that directly selects this action node. We assume the state node with the maximal identifying activation value at time t is state node i, so the connection weight w_ij(t) connected to the currently selected action node j is adjusted by

For i = argmax_i n^s_i(t), when n^a_j(t) = 1:

w_ij(t) =
    Q_0(s_i, a_j),        if w_ij(t−1) = 0,
    −Q_maxconst,          if a_j is an invalid action,
    Q_t(s_i, a_j),        otherwise.
(6)

where w_ij(t) is represented by the Q function Q_t(s_i, a_j). This paper discusses only how to build a generalized environment model, not how to learn the optimal policy, so this value is set to the exhaustive-exploration Q function based on the curiosity reward. The curiosity reward is described by (7). When action a_j is an invalid action, w_ij(t) is set to a large negative constant, which avoids going into dead ends:

r(t) =
    C − n_t/n_ave,        if n_ave > 0,
    C,                    if n_ave = 0,
(7)

where C is a constant value, n_t represents the count of explorations of the current action by the agent, and n_ave is the average count of explorations over all actions; n_t/n_ave represents the degree of familiarity with the currently selected action. The curiosity reward is updated each time an action is finished. The update equation for Q_t(s_i, a_j) is given by (8), where λ is the learning rate:

Q_t(s_i, a_j) =
    (1 − λ)·Q_{t−1}(s_i, a_j) + λ·r(t),    if i = argmax_i n^s_i(t) and n^a_j(t) = 1,
    Q_{t−1}(s_i, a_j),                     otherwise.
(8)

The update process is shown in Figure 5. The action node j is selected by Q_t(s_i, a_j) according to

a_j = argmax_{j ∈ A^vl_i, n^s_i(t) = 1} Q(s_i, a_j),
(9)

where A^vl_i is the valid action set of state node i, and n^s_i(t) = 1 represents that state node i is identified; the action node j selected by the agent to control the robot's current action satisfies n^a_j(t) = 1.

3.4. The Constructing Process in STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of the STAMN. There is a chicken-and-egg problem during the constructing process: building the STAMN depends on state identifying, and conversely, state identifying depends on the current structure of the STAMN. Thus exhaustive exploration and a k-step variable memory length (depending on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of a state node s_j depends on three identifying processes: k-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process.

(1) k-Step History Sequence Identifying. The matching degree of the current k-step history sequence with the identifying


Figure 3: Past-oriented weight ω_ij(t) update process associated with state node j.

Figure 4: Future-oriented weight ω_ij(t) update process associated with state node j.

Figure 5: Weight w_ij(t) update process associated with action node j.

history sequence h_j for s_j is computed to identify the looping state node j. First we compute the presynaptic potential V_j(t) for state node j according to

V_j(t) = Σ_{i ∈ Γ_j} C_ij·φ(b^X_i(t), ω_ij),    X ∈ {s, a},
(10)

where b^X_i(t) is the current activation value of a node in the presynaptic node set Γ_j, and C_ij is a confidence parameter expressing node i's importance degree in the presynaptic node set Γ_j. The value of C_ij can be set in advance, with Σ_{i ∈ Γ_j} C_ij = 1. The function φ(x, y) ∈ [0, 1] represents the degree of similarity between the activation value of the presynaptic node and the contextual information ω_ij in LTM: the more similar x and y are, the closer φ(x, y) is to 1. According to winner-takes-all, among all nodes whose presynaptic potentials exceed the threshold ϑ, the node with the largest presynaptic potential is selected.

The presynaptic potential V_j(t) represents the synchronous activation process of the presynaptic node set Γ_j, which represents the matching of the previous k − 1 steps of contextual information for state node j. To realize full k-step history sequence matching, the k-step history sequence identifying activation value of state node j is given below:

m^s_j(t) = H(V_j(t), ϑ)·H(V_j(t), max_{k=1..N_c} V_k(t))·b^s_j(t),

H(x, y) =
    0,    x < y,
    1,    x ≥ y,
(11)

where max_{k=1..N_c} V_k(t) is the maximum potential value over all nodes k. The product H(V_j(t), ϑ)·H(V_j(t), max_{k=1..N_c} V_k(t)) means that the node with the largest potential value is selected among all nodes whose presynaptic potentials exceed the threshold ϑ, and b^s_j(t) represents the matching degree of state node j and the current observed value. If m^s_j(t) = 1, the current state s is identified with the looped state node j by the k-step memory.
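A sketch of the k-step identifying computation in (10)-(11) (the similarity function φ is left abstract in the paper; here we instantiate it as 1 − |x − y|, and all names and numbers are ours):

```python
def similarity(x, y):
    """phi(x, y) in [0, 1]: close to 1 when x and y are similar.
    A simple illustrative choice is 1 - |x - y|."""
    return 1.0 - abs(x - y)

def presynaptic_potential(b, w, c):
    """V_j(t) of (10): confidence-weighted similarity between the current
    presynaptic activities b and the stored context weights w."""
    return sum(c_i * similarity(b_i, w_i) for b_i, w_i, c_i in zip(b, w, c))

# Two candidate state nodes, uniform confidences, threshold 0.9:
b = [0.9, 0.81, 0.729]            # current decayed STM trace
candidates = {
    "s1": [0.9, 0.8, 0.7],        # stored context close to b
    "s2": [0.2, 0.9, 0.1],        # stored context far from b
}
c = [1 / 3] * 3
V = {j: presynaptic_potential(b, w, c) for j, w in candidates.items()}
winner = max(V, key=V.get)
identified = winner if V[winner] >= 0.9 else None
print(identified)  # s1
```

Because the decayed activities in b encode both order and duration, a high potential means the whole k-step context matches, not just the latest observation.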

(2) Following Identifying. If the current state s is identified as the state s_i, then the next transition prediction state is identified as the state s_j = f_s(s_i, a_i). First we compute the transition prediction value for state node j according to

p^s_j(t) = Σ_{i = argmax n^X_i(t)} D_ij·φ(n^X_i(t), ω_ij),    X ∈ {s, a},
(12)

where n^X_i(t) is the identifying activation value of the state node and action node at time t, and i = argmax n^X_i(t) indicates that state node i and action node i are the previous winner nodes. D_ij is a confidence parameter expressing node i's importance degree; its value can be set in advance, with Σ_{i = argmax n^X_i(t)} D_ij = 1. ω_ij records the one-step transition prediction information related to state node j, to be used in the identifying and recalling phases. If the current state s is identified as the hidden state s_i, then p^s_j(t) represents the probability that the next transition prediction state is j.

If the next prediction state node j is the same as the current observation value, the current history sequence is identified as state node j; the following identifying value of state node j is given below:

q^s_j(t) = H(p^s_j(t), ϑ)·H(p^s_j(t), max_{k=1..N_c} p^s_k(t))·b^s_j(t).
(13)

If q^s_j(t) = 1, the current history sequence h_t is identified with the looped state node j by following identifying. If p^s_j(t) ≥ max_{k=1..N_c} p^s_k(t) and p^s_j(t) ≥ ϑ but q^s_j(t) ≠ 1, then b^s_j(t) ≠ 1, representing a mismatch between state node j and the current observed value; that is, trans(h_t, a_i) ≠ trans(h_i, a_i). According to the transition prediction criterion (Algorithm 13), the current history sequence h_t is not identified with the hidden state s_i, so the identifying history sequence lengths for h_t and h_i must separately increase to k + 1.

(3) Previous Identifying. If the current history sequence is identified as the state s_j, then the previous state s_i is identified such that s_j = f_s(s_i, a_i). First we compute the identifying activation values for all states s; if there exists n^s_j(t) = 1, then the current state s_t is identified as state s_j, and the previous state s_{t−1} is identified as the previous state s_i of state s_j. The previous identifying value of the state node is defined as d^s_i(t). The previous state s_i is i = argmax b^s_i(t) satisfying Condition 1 and Condition 2; then we set d^s_i(t) = 1.

Condition 1:

n^s_j(t) = 1.
(14)

Condition 2:

φ(b^s_i(t), ω_ij) > ϑ.
(15)

According to the above three identifying processes, the identifying activation value of state node j is defined as follows:

n^s_j(t) = max(m^s_j(t), q^s_j(t), d^s_j(t)).
(16)
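The combination rule in (16) is a simple maximum over the three identifying processes; as a one-line sketch (names are ours):

```python
def identifying_activation(m_j, q_j, d_j):
    """Equation (16): state node j is identified if any one of the three
    processes (k-step history, following, previous identifying) fires."""
    return max(m_j, q_j, d_j)

print(identifying_activation(0, 1, 0))  # 1: identified by following identifying
```

Taking the maximum means the three processes act as alternative evidence sources: any single one firing suffices to identify the node.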

According to Algorithm 11, we give Pseudocode 1 for Algorithm 11.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4. Simulation

4.1. A Simple Example of a Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First we obtain the transition instance h_t = o1, a1, o1, a2, o2, a2, o2, a1, o1, a1, o1. According to Algorithm 11, set the initial identifying history sequence length k = 1. The STAMN model is constructed incrementally using h_t, as in Figure 6.

When h_t = o1, a1, o1, a2, o2, a2, o2, a1, o1, a1, o1, a2, o1, because k = 1, the looped state is identified by the current observation vector. Thus h_1 = o1 and h_2 = o1 are the identifying history sequences for the state s_1. Since o1 = trans(h_1, a2) and o2 = trans(h_2, a2) exist in h_t, so that trans(h_1, a2) ≠ trans(h_2, a2), according to Lemma 9, h_1 and h_2 are the identifying history sequences for the states s_i and s_j, respectively, with s_i ≠ s_j. So o1 is distinguished as o′1 and o″1, and h_t = o1, a1, o′1, a2, o2, a2, o2, a1, o1, a1, o″1, a2, o1.

According to Lemma 10, for any two identifying history sequences h_i and h_j, separately for o′1 and o″1, there exists h_i ≠ h_j iff s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j, separately for s_i and s_j, must increase to 2. For o′1 the identifying history sequence is o1, a1, o′1, and for o″1 the identifying history sequence is o1, a1, o″1. These two identifying history sequences are identical in observation, so a longer transition instance is needed.

Figure 6: The STAMN model with memory depth 1 (f_o(s_1) = o1, f_o(s_2) = o2).

Table 1: The identifying history sequences for o′1 and o″1.

Hidden state | Identifying history sequences
o′1 (k = 2)  | o1, a1, o′1; o″1, a2, o′1; o2, a1, o′1
o″1 (k = 3)  | o2, a1, o1, a1, o″1

When h_t = o1, a1, o′1, a2, o2, a2, o2, a1, o1, a1, o″1, a2, o′1, a2, o2, a1, o′1, a2, o2, a1, o1, a1, o1, a1, o′1, a2, o2, we obtain in h_t the identifying history sequences h_i = h_j for states o′1 and o″1. For k = 2, the distinguished identifying history sequences for o′1 are o″1, a2, o′1 and o2, a1, o′1. However, the distinguished identifying history sequence for o″1 is o1, a1, o″1, which is not discriminated from the identifying history sequence for o′1. So the identifying history sequence length for o″1 must increase to 3. The identifying history sequences for o′1 and o″1 are described in Table 1.

According to k-step history identifying, following identifying, and previous identifying, h_t can be represented as h_t = o″1, a1, o′1, a2, o2, a2, o2, a1, o′1, a1, o″1, a2, o′1, a2, o2, a1, o′1, a2, o2, a1, o′1, a1, o″1, a1, o′1, a2, o2.

According to Pseudocode 1, the STAMN model is constructed incrementally using h_t, as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After 10 timesteps, the algorithms use the currently learned model to realize transition prediction, and the transition prediction criterion is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.


Set initial memory depth k = 1, t = 0.
All nodes and connection weights do not exist initially in the STAMN.
The transition instance h_0 = ∅.
A STAMN is constructed incrementally through the agent's interaction with the environment,
expanding the STAMN until the transition prediction contradicts the current minimal
hypothesis model; then the hypothesis model is reconstructed by increasing the memory depth k.
The constructing process includes identifying and recalling simultaneously.
while one pattern o_t or a_t can be activated from the pattern cognition layer do
    h_t = (h_{t−1}, o_t, a_t)
    for all existing state nodes
        compute the identifying activation value by executing Algorithm 12
        if there exists n^s_j(t) = 1 then
            the looped state is identified, and the weights ω_ij are adjusted according to (4), (5)
        else
            a new state node j is created; set n^s_j(t) = 1, and the weights ω_ij are adjusted according to (4), (5)
        end if
    end for
    for all existing state nodes
        compute the transition prediction criterion by executing Algorithm 13
        if IsNotTrans(s_j) then
            for the previous winner state node i of the state node j
                set memory depth k = k + 1 until each h_i is discriminated
            end for
            reconstruct the hypothesis model with the new memory depth k according to Algorithm 11
        end if
    end for
    for all existing action nodes
        compute the activation value according to (9)
        if there exists n^a_j(t) = 1 then
            the weight w_ij(t) is adjusted according to (6)
        else
            a new action node j is created; set n^a_j(t) = 1, and the weight w_ij(t) is adjusted according to (6)
        end if
    end for
    the agent obtains the new observation vector by o_{t+1} = trans(s_t, a_t)
    t = t + 1
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.

IsNotTrans(s_j):
    if p^s_j(t) > ϑ according to (12) and q^s_j(t) = 1 according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the better performance and the faster convergence speed because of the exhaustive-exploration Q function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem, shown in Figure 9, is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall at the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem, the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and


Figure 7: (a) The STAMN model (f_o(s_1) = o1, f_o(s_2) = o2, f_o(s_3) = o1). (b) The LPST model.

Figure 8: Comparative performance results for a simple deterministic POMDP (prediction error versus timesteps; STAMN with exhaustive exploration, STAMN with random exploration, and LPST).

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration again has the better performance and the faster convergence speed in the 4 × 3 grid problem.

The numbers of state nodes and action nodes in the STAMN and the LPST are described in Table 2. In the STAMN, a state node

Figure 9: The 4 × 3 grid problem.

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm | Number of state nodes | Number of action nodes
LPST      | 70                    | 50
STAMN     | 11                    | 44

represents a hidden state, but in the LPST a state node represents an observation value and the hidden state is expressed by k-step past observations and actions; thus many more observation nodes and action nodes are repeatedly created, and most of them are the same. The LPST has perfect observation prediction capability, the same as the STAMN, but it is not a state transition model.

In the STAMN, the setting of the parameters μ, η, γ^s_i, δ^a_i is very important, since it determines the depth k of short-term


Figure 10: Comparative performance results for the 4 × 3 grid problem (prediction error versus timesteps; STAMN with exhaustive exploration, STAMN with random exploration, and LPST).

memory. We assume γ^s_i = δ^a_i = 0.9 and initially μ = η = 0.9, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2, μ = μ − 0.1. The other parameters are determined relatively easily. We set the learning rates α = β = λ = 0.9 and set the confidence parameters C_ij = 1/|Γ_j| and D_ij = 1/2, representing equal importance degrees. We set the constant values C = 5 and ϑ = 0.9.

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment used in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By wall-following behaviour, the agent can recognize left-wall, right-wall, and corridor landmarks. These observation landmarks differ from the observation vectors in the previous grid problem: every observation landmark has a different duration time. To analyse the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, it has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the path 2-3-4-5-6 and the path 12-13-14-15-16 are observationally identical, and a memory depth of k = 6 is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and in the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which bears the faults and noise in the sequence items' duration times and can realize reliable recalling. The LPST, however, cannot realize correct convergence in reasonable time because of its accurate matching.

5. Discussion

In this section, we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST it has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive accurate matching, which is sensitive to noise and faults, whereas the STAMN offers the neuron synchronized activation mechanism to realize recalling: even in a noisy and disturbed environment, the STAMN can still recall reliably. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network using the synchronized activation mechanism, whose performance does not degrade as history sequences and scale increase.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. Firstly, almost all existing AMN models are unable to handle complex sequences with looped hidden states; the STAMN indeed realizes identification of the looped hidden states and can thus be applied to HMM and POMDP problems. Furthermore, most AMN models can only use a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth k is variable in different parts of the state space. Finally, existing AMN models generally record only the temporal order at discrete intervals Δt rather than sequence item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, the STAMN is proposed to identify the looped hidden states only from the transition instances in a deterministic POMDP environment. The learned STAMN is seen as a variable-depth k-Markov model. We proposed the heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. The transition instances can


Figure 11: (a) Complex symmetrical simulation environment. (b) The disturbed environment.

Figure 12: (a) Comparative results for a noise-free environment. (b) Comparative results for a disturbed environment (prediction error versus timesteps; STAMN and LPST).

be obtained through the agent's interaction with the environment, or from training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as the simultaneous localization and mapping (SLAM) problem, robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests; how to incorporate reinforcement learning to produce a new distributed reinforcement learning model; and how it should be applied to robot SLAM to resolve practical navigation problems.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I: unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao, et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.



sequences to perform statistical tests to determine the minimal identifying history sequence length k. However, in practice it can be difficult to obtain a sufficient history, so we propose a heuristic state transition model constructing algorithm. Without a sufficient history sequence, the algorithm may produce a premature state transition model, but this model is at least correct for the past experience. We realize the heuristic constructing algorithm by way of two definitions and two lemmas, as follows.

Definition 7 (current history sequence). A current history sequence is represented by the k-step history sequence o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t generated by the transition function f_s and the observation function f_o. At t = 0, the current history sequence h_0 is empty; o_t is the observation vector at the current time t, and o_{t−k+1} is the observation vector k steps before time t.

Definition 8 (transition instance). The history sequence associated with time t is captured as a transition instance. A transition instance is represented by the tuple h_{t+1} = (h_t, a_t, o_{t+1}), where h_t and h_{t+1} are the current history sequences occurring at times t and t+1 of an episode. A set of transition instances is denoted by the symbol F, which may contain transitions from different episodes.
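As a minimal sketch (the function and type names are ours, not the paper's), Definitions 7 and 8 map onto a pair of small data structures, with the k-step truncation keeping the last k observations and the k−1 actions between them:

```python
from collections import namedtuple

# Definition 8: a transition instance h_{t+1} = (h_t, a_t, o_{t+1}).
Transition = namedtuple("Transition", ["h_prev", "action", "obs"])

def extend_history(h, a, o, k):
    """Definition 7: append action a and observation o to the current
    history h = (o_{t-k+1}, a_{t-k+1}, ..., o_t) and truncate to k steps
    (the last 2k-1 items: k observations and the k-1 actions between them)."""
    return (h + (a, o))[-(2 * k - 1):]

h = ("o0",)                              # the first recorded item is an observation
h = extend_history(h, "a0", "o1", k=2)   # -> ("o0", "a0", "o1")
h = extend_history(h, "a1", "o2", k=2)   # -> ("o1", "a1", "o2"): o0 aged out
instance = Transition(h_prev=("o0", "a0", "o1"), action="a1", obs="o2")
```

The truncation is what makes the model a variable-depth k-Markov representation: only the most recent k observation/action steps are kept as context.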

Lemma 9. For any two hidden states s_i and s_j, s_i ≠ s_j if trans(s_i, a) ≠ trans(s_j, a).

This lemma gives us a sufficient condition to determine s_i ≠ s_j; however, it is not a necessary condition.

Proof (by contradiction). We assume that s_i = s_j; thus, for every action sequence d_ij = a_1, a_2, ..., a_n, there exists γ(s_i, d_ij) = γ(s_j, d_ij). This contradicts trans(s_i, a) ≠ trans(s_j, a). Thus the original proposition is true.

Lemma 10. For any two identifying history sequences h_i and h_j, separately for s_i and s_j, there exists h_i ≠ h_j if s_i ≠ s_j; however, this is not a necessary condition.

Proof (by contradiction). Since an identifying history sequence h is considered to uniquely identify the hidden state s, the identifying history sequence h_i uniquely identifies the hidden state s_i, and the identifying history sequence h_j uniquely identifies the hidden state s_j. We assume h_i = h_j; thus there must exist s_i = s_j, which contradicts s_i ≠ s_j. Thus the original proposition is true.

However, the converse of Lemma 10 does not always hold, because there may exist several different identifying history sequences for the same hidden state.

Algorithm 11 (a heuristic constructing algorithm for the state transition model). The initial state transition model is constructed using the minimal identifying history sequence length k = 1, and the model is empty initially. The transition instance h is empty and t = 0.

(1) We assume the given model can perfectly identify the hidden state. The agent makes a step in the environment (according to the history sequence definition, the first item in h is the observation vector). It records the current transition instance at the end of the chain of transition instances. At each time t, for the current history sequence h_t, execute Algorithm 12: if the current state s is looped to the hidden state s_i, go to step (2); otherwise a new node s_i is created in the model; go to step (4).

(2) According to Algorithm 13, if trans(h_t, a_j) ≠ trans(h_i, a_j), then the current state s is distinguished from the identifying state s_i; go to step (3). If trans(h_t, a_j) = trans(h_i, a_j), go to step (4).

(3) According to Lemma 10, for any two identifying history sequences h_i and h_j, separately for s_i and s_j, there exists h_i ≠ h_j if s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j, separately for s_i and s_j, must increase to k + 1 until h_i and h_j are discriminated. We must reconstruct the model based on the new minimal identifying history sequence length k; go to step (1).

(4) If the action node and the corresponding Q function for the current identifying state node exist in the model, the agent chooses its next action node a_j based on the exhaustive-exploration Q function. If the action node and the corresponding Q function do not exist, the agent chooses a random action a_j instead, and a new action node is created in the model. After the action control signal is delivered, the agent obtains the new observation vector by trans(s_i, a_j); go to step (1).

(5) Steps (1)–(4) continue until all identifying history sequences h for the same hidden state s correctly predict trans(h, a) for each action.

Note: we apply a k-step variable history length from k = 1 to n, and the history length k is a variable value for different hidden states. We adopt the minimalist-hypothesis model in the state-transition-model constructing process. This constructing algorithm is heuristic: if we adopted the exhaustiveness-assumption model instead, the probability of missing a looped hidden state would increase exponentially, and many valid loops could be rejected, yielding a larger redundant state space and poor generalization.
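The control flow of Algorithm 11 can be sketched in a few lines. This is a hedged, simplified rendering: `ToyEnv`, `identify`, and the flat `states` dictionary are our own placeholders for the environment interface, Algorithm 12, and the STAMN; the real algorithm also replays the stored transition instances when k grows, which this sketch omits by restarting from the latest observation.

```python
class ToyEnv:
    """Hypothetical two-observation world used only to exercise the loop."""
    def __init__(self):
        self.i = 0
    def reset(self):
        return "A"
    def choose_action(self, s, states):
        return "go"                          # stand-in for the Q-guided choice
    def step(self, a):
        self.i += 1
        return "A" if self.i % 2 == 0 else "B"

def construct_model(env, identify, max_steps, k=1):
    """Sketch of Algorithm 11. states maps node-id -> {action: predicted obs};
    a mismatched prediction (Algorithm 13) grows k and rebuilds the model."""
    states, h, t = {}, (env.reset(),), 0
    while t < max_steps:
        s = identify(h, states, k)           # Algorithm 12: looped node or None
        if s is None:                        # step (1): create a new state node
            s = len(states)
            states[s] = {}
        a = env.choose_action(s, states)     # step (4): choose the next action
        o = env.step(a)                      # trans(s_i, a_j): new observation
        if a in states[s] and states[s][a] != o:
            k += 1                           # step (3): grow k, rebuild model
            states, h = {}, (o,)
        else:
            states[s][a] = o                 # step (2): consistent transition
            h = (h + (a, o))[-(2 * k - 1):]
        t += 1
    return states, k

# With an identify stub that always loops back to node 0, the alternating
# observations repeatedly contradict the prediction, forcing k to grow:
_, depth = construct_model(ToyEnv(), lambda h, s, k: 0 if s else None, 4)
```

Here each contradiction raises the memory depth by one, mirroring step (3)'s move from k to k + 1.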

Algorithm 12 (the current history sequence identifying algorithm). In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. A current history sequence with k steps is needed. According to Definition 7, the current k-step history sequence at time t is expressed by o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t. There exist three identifying processes.


(1) k-Step History Sequence Identifying. If the identifying history sequence for s_i is o_{i−k+1}, a_{i−k+1}, o_{i−k+2}, ..., a_{i−1}, o_i, it must satisfy

o_t = o_i,
a_{t−1} = a_{i−1},
  ⋮
o_{t−k+2} = o_{i−k+2},
a_{t−k+1} = a_{i−k+1},
o_{t−k+1} = o_{i−k+1}.
(1)

Thus the current state s is looped to the hidden state s_i. In the STAMN, the k-step history sequence identifying activation value m_j^s(t) is computed.

(2) Following Identifying. If the current state s is identified as the hidden state s_i, the next transition history h_{t+1} is represented by h_{t+1} = (h_t, a_t, o_{t+1}) through trans(s_i, a_t). According to Lemma 4, the next transition history h_{t+1} is identified as the transition prediction state s_j = f_s(s_i, a_t).

In the STAMN, the following activation value q_j^s(t) is computed.

(3) Previous Identifying. If there exists a transition instance h_{t+1} = (h_t, a_t, o_{t+1}), then h_{t+1} is an identifying history sequence for state s_j. According to Lemma 5, the previous state s_i is uniquely identified by h_t such that s_j = f_s(s_i, a_t). So if there exists a transition prediction state s_j with s_j = f_s(s_i, a_t), then the previous state s_i is uniquely identified.

In the STAMN, the previous activation value d_j^s(t) is computed.

Algorithm 13 (transition prediction criterion). If the model correctly represents the environment model as Markov chains, then for every state s and every identifying history sequence h for the same state, trans(h, a) yields the same observation for the same action.

If h_1 and h_2 are identifying history sequences with k-step memory for the hidden state s and there exists trans(h_1, a) ≠ trans(h_2, a), then, according to Lemma 9, s_i is discriminated from s_j.

In the STAMN, the transition prediction criterion is realized by the transition prediction value p_j^s(t).

3. Spatiotemporal Associative Memory Networks

In this section we relate the spatiotemporal sequence problem to the idea of identifying the hidden state by a sequence of past observations and actions. An associative memory network (AMN) is expected to have several characteristics: (1) an AMN must memorize incrementally, which means the ability to learn new knowledge without forgetting the learned knowledge; (2) an AMN must record not only the temporal order but also each sequence item's duration in continuous time; (3) an AMN must be able to realize heteroassociative recall; (4) an AMN must process real-valued feature vectors in a bottom-up manner, not symbolic items; (5) an AMN must be robust and able to recall the sequence correctly from incomplete or noisy input; (6) an AMN can realize learning and recalling simultaneously; (7) an AMN can realize interaction between STM and LTM; dual-trace theory suggests that the persistent neural activity of STM can lead to LTM.

The first thing to be defined is the temporal dimension (discrete or continuous). Previous research is mostly based on regular intervals Δt, and few AMNs have been proposed to deal with not only the sequential order but also each item's duration in continuous time. However, this characteristic is important for many problems. In speech production, writing, music generation, motor planning, and so on, an item's duration and an item's repeated emergence have essentially different meanings. For example, "B4" and "B-B-B-B" are exactly different: the former represents that item B is sustained for 4 timesteps, and the latter represents that item B repeatedly emerges 4 times. STAMN can explicitly distinguish these two temporal characteristics. So a history of past observations and actions can be expressed by a special spatiotemporal sequence h: h = o_{t−k+1}, a_{t−k+1}, o_{t−k+2}, ..., a_{t−1}, o_t, where k is the length of the sequence h; the items of h include observation items and action items, where o_t denotes the real-valued observation vector generated by taking action a_{t−1}, and a_t denotes the action taken by the agent at time t. Here t does not index a discrete temporal dimension sampled at regular intervals Δt; rather, it represents the t-th step in the continuous-time dimension, whose duration is the time between the current item and the next one.
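To make the "B4" versus "B-B-B-B" distinction concrete, here is a small sketch (the run-length encoding is our own illustration, not the STAMN's internal representation): storing each item with its duration separates a sustained item from one that re-emerges between other items.

```python
def run_length_encode(seq):
    """Collapse consecutive repeats of the same item into (item, duration)
    pairs, so a sustained item and a re-emerging item encode differently."""
    out = []
    for x in seq:
        if out and out[-1][0] == x:
            out[-1] = (x, out[-1][1] + 1)   # same item continues: extend duration
        else:
            out.append((x, 1))              # new item: start a fresh run
    return out

# "B sustained for 4 timesteps" collapses to a single item with duration 4:
sustained = run_length_encode(["B", "B", "B", "B"])        # -> [("B", 4)]
# "B re-emerging 4 times" (with another item between) keeps four entries:
repeated = run_length_encode(["B", "X", "B", "X", "B", "X", "B"])
```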

A spatiotemporal sequence can be classified as a simple sequence or a complex sequence. A simple sequence is a sequence without repeated items; for example, the sequence "B-A-E-L" is a simple sequence, whereas sequences containing repeated items are defined as complex sequences. In a complex sequence, the repeated items can be classified as looped items or discriminative items by identifying the hidden state; for example, in the history sequence "X-Y-Z-Y-Z", "Y" and "Z" may be looped items or discriminative items. Identifying the hidden state requires introducing contextual information, resolved by the k-step memory. The memory depth k is not fixed; k is a variable value in different parts of the state space.

3.1. Spatiotemporal Associative Memory Network Architecture. We build a new spatiotemporal associative memory network (STAMN). This model makes use of neuron activity decay of nodes to achieve short-term memory, connection weights between different nodes to represent long-term memory, presynaptic potentials and a neuron synchronized activation mechanism to realize identifying and recalling, and a time-delayed Hebb learning mechanism to fulfil one-shot learning.


Figure 2: STAMN structure diagram.

STAMN is an incremental, possibly looping, non-fully connected, asymmetric associative memory network. Nonhomogeneous nodes correspond to hidden state nodes and action nodes.

For a state node j, the input values are defined as follows: (1) the current observation activation value b_j^s(t), responsible for the matching degree of state node j and the current observed value, which is obtained from the preprocessing neural networks (if the matching degree is greater than a threshold value, then b_j^s(t) = 1); (2) the observation activation values of the presynaptic node set τ_j of the current state node j; (3) the identifying activation value n_i^s(t) of the previous state node i of state node j; (4) the activation value n_i^a(t) of the previous action node i of state node j. The output values are defined as follows: (1) the identifying activation value n_j^s(t) represents whether the current state s is identified as the hidden state s_j or not; (2) the transition prediction value p_j^s(t) represents whether state node j is the current state transition prediction node or not.

For an action node j, the input value is defined as the current action activation value b_j^a(t), responsible for the matching degree of action node j and the current motor vector. The output value is the activation value n_j^a(t) of the action node, which indicates that action node j has been selected by the agent to control the robot's current action.

For the STAMN, all nodes and connection weights do not necessarily exist initially. The weights ω_ij(t) and ω′_ij(t) connected to state nodes can be learned incrementally by time-delayed Hebb learning rules, representing the LTM. The weight w_ij(t) connected to action nodes can be learned by reinforcement learning. All nodes have an activity self-decay mechanism to record the duration time of the node, representing the STM. The output of the STAMN is the winner state node or winner action node by winner-takes-all.

The STAMN architecture is shown in Figure 2, where black nodes represent action nodes and concentric-circle nodes represent state nodes.

3.2. Short-Term Memory. We use self-decay of neuron activity to accomplish short-term memory, for both the observation activation b_i^s(t) and the identifying activation n_i^s(t). The activity of each state node has a self-decay mechanism to record the temporal order and the duration time of the node. Supposing the factor of self-decay is γ_i^s ∈ (0, 1) and the activation values b_i^s(t), n_i^s(t) ∈ [0, 1], the self-decay process is shown as

b_i^s(t), n_i^s(t) =
    { 0,                                          b_i^s(t), n_i^s(t) ≤ η
    { γ_i^s · b_i^s(t−1),  γ_i^s · n_i^s(t−1),    else.
(2)

The activity of an action node also has the self-decay characteristic. Supposing the factor of self-decay is δ_i^a ∈ (0, 1) and the activation values b_i^a(t), n_i^a(t) ∈ [0, 1], the self-decay process is shown as

b_i^a(t), n_i^a(t) =
    { 0,                                          b_i^a(t), n_i^a(t) ≤ μ
    { δ_i^a · b_i^a(t−1),  δ_i^a · n_i^a(t−1),    else.
(3)

wherein η and μ are active thresholds and γ_i^s and δ_i^a are self-decay factors; together they determine the depth k of short-term memory. Here t is a discrete time point obtained by sampling at regular intervals Δt, where Δt is a very small regular interval.
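A minimal sketch of the decay rule of equations (2)-(3) (variable names ours): each tick multiplies an activation by its decay factor and cuts it to zero once it reaches the active threshold, so the pair (decay factor, threshold) fixes how many steps a trace survives, i.e. the STM depth k.

```python
def decay(activation, gamma, eta):
    """One self-decay tick: multiply the trace by the decay factor gamma,
    then clamp it to zero once it falls to the active threshold eta."""
    a = gamma * activation
    return a if a > eta else 0.0

# With gamma = 0.5 and eta = 0.1, a unit activation survives three ticks:
# 1.0 -> 0.5 -> 0.25 -> 0.125 -> 0.0 (since 0.0625 <= 0.1).  The decay
# factor and threshold thereby jointly set the short-term memory depth k.
```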

3.3. Long-Term Memory. Long-term memory can be classified into semantic memory and episodic memory. Learning of a history sequence is considered episodic memory, generally adopting one-shot learning. We use time-delayed Hebb learning rules to fulfil the one-shot learning.

(1) The k-Step Long-Term Memory Weight ω_ij(t). The weight ω_ij(t) connected to state node j represents k-step long-term memory. This is a past-oriented behaviour in the STAMN. The weight ω_ij(t) is adjusted according to

ω_ij(t) |_{i ∈ τ_j, n_j^s(t) = 1}
    = { n_j^s(t) · n_i^X(t),                       ω_ij(t−1) = 0, X ∈ {s, a}
      { ω_ij(t−1) + α (n_i^X(t) − ω_ij(t−1)),      else.
(4)

Here j is the current identifying activation state node, n_j^s(t) = 1. Because the identifying activation process is a neuron synchronized activation process, when n_j^s(t) = 1 all nodes whose n_i^s(t) and n_i^a(t) are not zero constitute the contextual information related to state node j. These nodes are considered the presynaptic node set of the current state node j, expressed by τ_j, where n_i^s(t), n_i^a(t) represent the activation values at time t and record not only the temporal order but also the duration time, because of the self-decay of neuron activity: the smaller n_i^s(t), n_i^a(t) are, the earlier node i precedes the current state node j. ω_ij(t) is the activation weight between presynaptic node i and state node j; it records the context information related to state node j, to be used in identifying and recalling, where ω_ij(t) ∈ [0, 1], ω_ij(0) = 0, and α is the learning rate.

The weight ω_ij(t) is time-related contextual information; the update process is shown in Figure 3.
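The update (4) can be sketched as follows (a hedged reading, names ours; `presyn` holds the nonzero activations n_i(t) of the presynaptic set τ_j captured at the moment node j wins, i.e. n_j^s(t) = 1):

```python
def hebb_update(w_j, presyn, alpha=0.1):
    """Time-delayed Hebb update for the k-step LTM weights into the winning
    state node j (n_j^s(t) = 1).
    w_j:    dict i -> omega_ij(t-1), the weights into node j
    presyn: dict i -> n_i(t), nonzero decayed activations over tau_j"""
    for i, n_i in presyn.items():
        if w_j.get(i, 0.0) == 0.0:
            w_j[i] = n_i                      # one-shot imprint on first co-activation
        else:
            w_j[i] += alpha * (n_i - w_j[i])  # afterwards track n_i(t) at rate alpha
    return w_j

w = hebb_update({}, {"s1": 0.5, "a1": 0.25})   # imprints the decayed traces as-is
w = hebb_update(w, {"s1": 0.9}, alpha=0.5)     # 0.5 + 0.5*(0.9 - 0.5) = 0.7
```

Because the presynaptic traces have already decayed by different amounts, the imprinted weights encode both which nodes preceded j and how long ago, which is the "time-delayed" aspect of the rule.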

(2) One-Step Transition Prediction Weight ω′_ij(t). The weight ω′_ij(t) (written with a prime here to distinguish it from the k-step weight ω_ij(t)) connected to state node j represents one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rules, the weight ω′_ij(t) is adjusted according to

ω′_ij(t) |_{i = argmax n_i^X(t), n_j^s(t) = 1}
    = { n_j^s(t) · n_i^X(t),                         ω′_ij(t−1) = 0, X ∈ {s, a}
      { ω′_ij(t−1) + β (n_i^X(t) − ω′_ij(t−1)),      else.
(5)

The transition activation of the current state node j is only associated with the previous winning state node and action node, where j is the current identifying activation state node, n_j^s(t) = 1. The previous winning state node and action node are given by i = argmax n_i^X(t), X ∈ {s, a}, where ω′(t) ∈ [0, 1], ω′(0) = 0, and β is the learning rate.

The weight ω′_ij(t) is one-step transition prediction information; the update process is shown in Figure 4.

(3) The Weight w_ij(t) Connected to Action Nodes. The activation of an action node is only associated with the corresponding state node, which selects this action node directly. We assume the state node with the maximal identifying activation value is state node i at time t, so the connection weight w_ij(t) connected to the currently selected action node j is adjusted by

w_ij(t) |_{i = argmax n_i^s(t), n_j^a(t) = 1}
    = { Q_0(s_i, a_j),            w_ij(t−1) = 0
      { −Q_max (a constant),      a_j is an invalid action
      { Q_t(s_i, a_j),            else.
(6)

Here w_ij(t) is represented by the Q function Q_t(s_i, a_j). This paper only discusses how to build a generalized environment model, not how to learn the optimal policy, so this value is set to the exhaustive-exploration Q function based on the curiosity reward. The curiosity reward is described by (7). When action a_j is an invalid action, w_ij(t) is defined to be a large negative constant, which avoids going into dead ends:

r(t) =
    { C − n_t / n_ave,   n_ave > 0
    { C,                 n_ave = 0.
(7)

Here C is a constant value, n_t represents the count of explorations of the current action by the agent, and n_ave is the average count of explorations over all actions by the agent; n_t / n_ave represents the degree of familiarity with the currently selected action. The curiosity reward is updated when each action is finished. The Q_t(s_i, a_j) update equation is shown in (8), where λ is the learning rate:

Q_t(s_i, a_j)
    = { (1 − λ) Q_{t−1}(s_i, a_j) + λ r(t),   i = argmax n_i^s(t), n_j^a(t) = 1
      { Q_{t−1}(s_i, a_j),                     else.
(8)

The update process is shown in Figure 5. The action node j is selected by Q_t(s_i, a_j) according to

a_j = argmax_{j ∈ A_i^vl, n_i^s(t) = 1} Q(s_i, a_j),
(9)

where A_i^vl is the valid action set of state node i, n_i^s(t) = 1 represents that state node i is identified, and the selected action node j is used by the agent to control the robot's current action, setting n_j^a(t) = 1.
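Equations (7)–(9) amount to a count-based curiosity bonus driving exhaustive exploration; a minimal sketch (names ours):

```python
def curiosity_reward(n_t, n_ave, C=1.0):
    """Equation (7): rarely tried actions earn a larger reward."""
    return C if n_ave == 0 else C - n_t / n_ave

def q_update(q, r, lam=0.1):
    """Equation (8): Q_t = (1 - lambda) * Q_{t-1} + lambda * r(t)."""
    return (1 - lam) * q + lam * r

def choose_action(q_row, valid_actions):
    """Equation (9): greedy choice among the valid actions of the
    currently identified state node."""
    return max(valid_actions, key=lambda a: q_row.get(a, 0.0))

q = {"left": 0.0, "right": 0.0}
r = curiosity_reward(n_t=0, n_ave=2.0)        # never-tried action -> reward 1.0
q["left"] = q_update(q["left"], r, lam=0.5)   # 0.5 * 0.0 + 0.5 * 1.0 = 0.5
best = choose_action(q, ["left", "right"])    # -> "left"
```

Unfamiliar actions (n_t well below n_ave) receive rewards near C, so the greedy rule of (9) systematically pulls the agent toward the least-explored valid actions.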

3.4. The Constructing Process in the STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of the STAMN. There is a chicken-and-egg problem during the constructing process: the building of the STAMN depends on state identifying; conversely, state identifying depends on the current structure of the STAMN. Thus, exhaustive exploration and the k-step variable memory length (which depends on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of state node s_j depends on three identifying processes: k-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process below.

(1) 119896-Step History Sequence IdentifyingThe matching degreeof current 119896-step history sequence with the identifying

8 Computational Intelligence and Neuroscience

ojminus1 ajminus1120596ij ojmiddot middot middot ns

j(t) = 1

i isin Γj nsi(t) = 0 na

i = 0

ojminusk+1 ajminusk+1

Figure 3 Past-oriented weights 120596119894119895(119905) update process associatedwith state node 119895

oi ojai

120596ij(t)nsj(t) = 1

i = arg max nXi (t)

Figure 4 Future-oriented weights 120596119894119895(119905) update process associatedwith state node 119895

oi ajQ0(si aj)

naj (t) = 1

i = arg max nsi (t)

Figure 5 Weights 119908119894119895(119905) updating process associated with actionnode 119895

history sequence $h_j$ for $s_j$ is computed to identify the looping state node $j$. First we compute the presynaptic potential $V_j(t)$ for the state node $j$ according to

$$V_j(t) = \sum_{i \in \tau_j} C_{ij}\,\phi\left(b^X_i(t), \omega_{ij}\right), \quad X \in \{s, a\} \tag{10}$$

where $b^X_i(t)$ is the current activation value of node $i$ in the presynaptic node set $\tau_j$, and $C_{ij}$ is a confidence parameter that expresses node $i$'s importance degree in the presynaptic node set $\tau_j$. The values $C_{ij}$ can be set in advance, with $\sum_{i \in \tau_j} C_{ij} = 1$. The function $\phi(x, y) \in [0, 1]$ represents the degree of similarity between the activation value of the presynaptic node and the contextual information $\omega_{ij}$ stored in LTM: if the similarity between $x$ and $y$ is high, $\phi(x, y)$ is close to 1. Following "winner takes all", among all nodes whose presynaptic potentials exceed the threshold $\vartheta$, the node with the largest presynaptic potential is selected.

The presynaptic potential $V_j(t)$ represents the synchronous activation process of the presynaptic node set $\tau_j$, which expresses the matching of the previous $k - 1$ steps of contextual information of state node $j$. To realize full $k$-step history sequence matching, the $k$-step history sequence identifying activation value of the state node $j$ is given below:

$$m^s_j(t) = H\left(V_j(t), \vartheta\right) \cdot H\left(V_j(t), \max_{k: 1 \to N_c} V_k(t)\right) \cdot b^s_j(t),
\qquad
H(x, y) = \begin{cases} 0, & x < y \\ 1, & x \ge y \end{cases} \tag{11}$$

where $\max_{k: 1 \to N_c} V_k(t)$ is the maximum potential value over the nodes $k$, so the factor $H(V_j(t), \vartheta) \cdot H(V_j(t), \max_{k: 1 \to N_c} V_k(t))$ selects the node with the largest potential value among all nodes whose presynaptic potentials exceed the threshold $\vartheta$, and $b^s_j(t)$ represents the matching degree of state node $j$ and the current observed value. If $m^s_j(t) = 1$, the current state $s$ is identified with the looped state node $j$ by the $k$-step memory.
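Equations (10) and (11) can be sketched as follows (the concrete similarity function φ below is an assumed illustrative form; the paper only requires φ(x, y) ∈ [0, 1]):

```python
def H(x, y):
    """Comparator of Eq. (11): 1 if x >= y, else 0."""
    return 1 if x >= y else 0

def presynaptic_potential(acts, weights, conf):
    """Eq. (10): V_j(t) = sum_i C_ij * phi(b_i(t), w_ij).
    phi here is one simple similarity in [0, 1] (assumed form)."""
    phi = lambda x, y: 1.0 - abs(x - y)
    return sum(c * phi(b, w) for b, w, c in zip(acts, weights, conf))

def k_step_identifying(V_j, all_V, b_j, theta=0.9):
    """Eq. (11): node j wins only if V_j exceeds the threshold AND is
    the maximum among candidates, gated by the observation match b_j."""
    return H(V_j, theta) * H(V_j, max(all_V)) * b_j

V1 = presynaptic_potential([1.0, 0.9], [1.0, 1.0], [0.5, 0.5])  # 0.95
V2 = presynaptic_potential([0.2, 0.1], [1.0, 1.0], [0.5, 0.5])  # 0.15
m1 = k_step_identifying(V1, [V1, V2], b_j=1)                    # -> 1
```

Note how the product of the two Heaviside factors implements "winner takes all over the threshold" with no explicit sorting.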

(2) Following Identifying. If the current state $s$ is identified as the state $s_i$, then the next transition prediction state is identified as the state $s_j = f_s(s_i, a_i)$. First we compute the transition prediction value for the state node $j$ according to

$$p^s_j(t) = \sum_{i = \arg\max n^X_i(t)} D_{ij}\,\phi\left(n^X_i(t), \omega_{ij}\right), \quad X \in \{s, a\} \tag{12}$$

where $n^X_i(t)$ is the identifying activation value of the state node and action node at time $t$, and $i = \arg\max n^X_i(t)$ indicates that state node $i$ and action node $i$ are the previous winner nodes. $D_{ij}$ is a confidence parameter that expresses node $i$'s importance degree; its value can be set in advance, with $\sum_{i = \arg\max n^X_i(t)} D_{ij} = 1$. Here $\omega_{ij}$ records the one-step transition prediction information related to state node $j$, to be used in the identifying and recalling phases. If the current state $s$ is identified as the hidden state $s_i$, then $p^s_j(t)$ represents the probability that the next transition prediction state is $j$.
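A sketch of the transition prediction value (12), reusing the similarity function of (10) (the concrete φ is an assumed form), together with the winner-takes-all gating that (13) applies to it:

```python
def transition_prediction(winner_acts, weights, conf):
    """Eq. (12): p_j(t) = sum over previous winner nodes i of
    D_ij * phi(n_i(t), w_ij)."""
    phi = lambda x, y: 1.0 - abs(x - y)   # assumed similarity in [0, 1]
    return sum(d * phi(n, w) for n, w, d in zip(winner_acts, weights, conf))

def following_identifying(p_j, all_p, b_j, theta=0.9):
    """Gating as in Eq. (13): threshold + maximum over candidates,
    multiplied by the observation match b_j."""
    H = lambda x, y: 1 if x >= y else 0
    return H(p_j, theta) * H(p_j, max(all_p)) * b_j

p1 = transition_prediction([1.0, 1.0], [1.0, 1.0], [0.5, 0.5])  # -> 1.0
# b_j = 0: the prediction wins numerically but the observation
# mismatches, so q = 0, which is what triggers a memory-depth increase.
q1 = following_identifying(p1, [p1, 0.3], b_j=0)                # -> 0
```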

If the next prediction state node $j$ is the same as the current observation value, the current history sequence is identified with the state node $j$. The following identifying value of the state node $j$ is given below:

$$q^s_j(t) = H\left(p^s_j(t), \vartheta\right) \cdot H\left(p^s_j(t), \max_{k: 1 \to N_c} p^s_k(t)\right) \cdot b^s_j(t) \tag{13}$$

If $q^s_j(t) = 1$, the current history sequence $h_t$ is identified with the looped state node $j$ by the following identifying. If $p^s_j(t) \ge \max_{k: 1 \to N_c} p^s_k(t)$ and $p^s_j(t) \ge \vartheta$ but $q^s_j(t) \ne 1$, then $b^s_j(t) \ne 1$, representing a mismatch between state node $j$ and the current observed value; hence $\mathrm{trans}(h_t, a_i) \ne \mathrm{trans}(h_i, a_i)$. According to the transition prediction criterion (Algorithm 13), the current history sequence $h_t$ is not identified with the hidden


state $s_i$, so the identifying history sequence lengths for $h_t$ and $h_i$ must separately increase to $k + 1$.

(3) Previous Identifying. If the current history sequence is identified as the state $s_j$, then the previous state $s_i$ is identified such that $s_j = f_s(s_i, a_i)$. First we compute the identifying activation value for all states $s$: if there exists $n^s_j(t) = 1$, then the current state $s_t$ is identified as state $s_j$, and the previous state $s_{t-1}$ is identified as the previous state $s_i$ of state $s_j$. The previous identifying value of the state node $j$ is defined as $d^s_i(t)$. The previous state $s_i$ is $i = \arg\max_i b^s_i(t)$ satisfying Condition 1 and Condition 2; then we set $d^s_i(t) = 1$.

Condition 1:
$$n^s_j(t) = 1 \tag{14}$$

Condition 2:
$$\phi\left(b^s_i(t), \omega_{ij}\right) > \vartheta \tag{15}$$

According to the above three identifying processes, the identifying activation value of the state node $j$ is defined as follows:

$$n^s_j(t) = \max\left(m^s_j(t), q^s_j(t), d^s_j(t)\right) \tag{16}$$

According to Algorithm 11, we give Pseudocode 1 for Algorithm 11.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4 Simulation

4.1. A Simple Example of Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First we obtain the transition instance $h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1$. According to Algorithm 11, set the initial identifying history sequence length $k = 1$. The STAMN model is constructed incrementally using $h_t$, as in Figure 6.

When $h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1, a_2, o_1$, because $k = 1$ the looped state is identified by the current observation vector. Thus $h_1 = o_1$ and $h_2 = o_1$ are the identifying history sequences for the state $s_1$. Since $o_1 = \mathrm{trans}(h_1, a_2)$ and $o_2 = \mathrm{trans}(h_2, a_2)$ exist in $h_t$, so that $\mathrm{trans}(h_1, a_2) \ne \mathrm{trans}(h_2, a_2)$, according to Lemma 9, $h_1$ and $h_2$ are the identifying history sequences for distinct states $s_i$ and $s_j$, respectively, with $s_i \ne s_j$. So $o_1$ is distinguished as $o'_1$ and $o''_1$, and $h_t = o_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o''_1, a_2, o_1$.

According to Lemma 10, for any two identifying history sequences $h_i$ and $h_j$, separately for $o'_1$ and $o''_1$, $h_i \ne h_j$ iff $s_i \ne s_j$. So the identifying history sequence length for $h_i$ and $h_j$, separately for $s_i$ and $s_j$, must increase to 2. For $o'_1$ the identifying history sequence is $o_1, a_1, o'_1$, and for $o''_1$ the identifying history sequence is $o_1, a_1, o''_1$. These two identifying history sequences are identical in observation, so a longer transition instance is needed.
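The splitting test used in this example can be sketched as follows (the dictionary `trans` is a toy stand-in for the observable transition function; the names are illustrative):

```python
def must_split(trans, h1, h2, actions):
    """Lemma 9 criterion: two identifying history sequences belong to
    different hidden states if some action yields different observations,
    i.e. trans(h1, a) != trans(h2, a) for some action a."""
    return any(trans.get((h1, a)) != trans.get((h2, a)) for a in actions)

# Toy table for the example above: taking a2 after h1 yields o1, but
# taking a2 after h2 yields o2, so the two occurrences of o1 must be
# distinguished into o1' and o1''.
trans = {("h1", "a2"): "o1", ("h2", "a2"): "o2"}
split = must_split(trans, "h1", "h2", ["a2"])  # -> True
```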

Figure 6: The STAMN model with memory depth 1 (state nodes $s_1$, $s_2$ with $f_o(s_1) = o_1$, $f_o(s_2) = o_2$, connected by actions $a_1$, $a_2$).

Table 1: The identifying history sequences for $o'_1$ and $o''_1$.

Hidden state $o'_1$ ($k = 2$): identifying history sequences $o_1, a_1, o'_1$; $o''_1, a_2, o'_1$; $o_2, a_1, o'_1$.
Hidden state $o''_1$ ($k = 3$): identifying history sequence $o_2, a_1, o_1, a_1, o''_1$.

When $h_t = o_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o''_1, a_2, o'_1, a_2, o_2, a_1, o'_1, a_2, o_2, a_1, o_1, a_1, o_1, a_1, o'_1, a_2, o_2$, we obtain in $h_t$ identical identifying history sequences $h_i = h_j$ for the states $o'_1$ and $o''_1$. For $k = 2$ the distinguished identifying history sequences for $o'_1$ are $o''_1, a_2, o'_1$ and $o_2, a_1, o'_1$. However, the distinguished identifying history sequence for $o''_1$ is $o_1, a_1, o''_1$, which is not discriminated from the identifying history sequence for $o'_1$. So the identifying history sequence length for $o''_1$ must increase to 3. The identifying history sequences for $o'_1$ and $o''_1$ are described in Table 1.

According to $k$-step history identifying, following identifying, and previous identifying, $h_t$ can be represented as $h_t = o''_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o'_1, a_1, o''_1, a_2, o'_1, a_2, o_2, a_1, o'_1, a_2, o_2, a_1, o'_1, a_1, o''_1, a_1, o'_1, a_2, o_2$.

According to Pseudocode 1, the STAMN model is constructed incrementally using $h_t$, as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After 10 timesteps, the algorithms use the currently learned model to perform transition prediction, and the transition prediction criterion is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.


Set initial memory depth $k = 1$, $t = 0$.
No nodes or connection weights exist initially in the STAMN; the transition instance $h_0 = \phi$.
A STAMN is constructed incrementally through the agent's interaction with the environment, expanding the STAMN until a transition prediction contradicts the current minimal hypothesis model; the hypothesis model is then reconstructed by increasing the memory depth $k$. In this paper the constructing process includes identifying and recalling simultaneously.
while one pattern $o_t$ or $a_t$ can be activated from the pattern cognition layer do
  $h_t = (h_{t-1}, o_t, a_t)$
  for all existing state nodes
    compute the identifying activation value by executing Algorithm 12
    if there exists $n^s_j(t) = 1$ then
      the looped state is identified, and the weights $\omega_{ij}$ (past- and future-oriented) are adjusted according to (4), (5)
    else
      a new state node $j$ is created; set $n^s_j(t) = 1$, and the weights $\omega_{ij}$ are adjusted according to (4), (5)
    end if
  end for
  for all existing state nodes
    compute the transition prediction criterion by executing Algorithm 13
    if IsNotTrans($s_j$) then
      for the previous winner state node $i$ of the state node $j$
        set memory depth $k = k + 1$ until each $h_i$ is discriminated
      end for
      reconstruct the hypothesis model with the new memory depth $k$ according to Algorithm 11
    end if
  end for
  for all existing action nodes
    compute the activation value according to (9)
    if there exists $n^a_j(t) = 1$ then
      the weight $w_{ij}(t)$ is adjusted according to (6)
    else
      a new action node $j$ is created; set $n^a_j(t) = 1$, and the weight $w_{ij}(t)$ is adjusted according to (6)
    end if
  end for
  the agent obtains the new observation vector by $o_{t+1} = \mathrm{trans}(s_t, a_t)$
  $t = t + 1$
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.

IsNotTrans($s_j$):
  if $p^s_j(t) > \vartheta$ according to (12) and $q^s_j(t) = 1$ according to (13) then
    return False
  else
    return True
  end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).
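Pseudocode 2 translates directly into plain Python (a sketch; the parameter names are illustrative):

```python
def is_not_trans(p_j: float, q_j: int, theta: float = 0.9) -> bool:
    """Transition prediction criterion (Pseudocode 2): the prediction
    for state node j is accepted only if its prediction value (Eq. (12))
    exceeds the threshold and its following-identifying value (Eq. (13))
    equals 1; otherwise the prediction is contradicted."""
    return not (p_j > theta and q_j == 1)

ok = is_not_trans(p_j=0.95, q_j=1)   # -> False: prediction confirmed
bad = is_not_trans(p_j=0.95, q_j=0)  # -> True: observation mismatch
```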

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the better performance and the faster convergence speed because of the exhaustive exploration $Q$-function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem shown in Figure 9 is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall at its current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.
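The two wall sensors admit a simple observation encoding. As a minimal sketch (the concrete 2-bit encoding below is an assumption for illustration; the paper does not specify the encoding), consistent with an observation space of size 4:

```python
def observation(left_wall: bool, right_wall: bool) -> int:
    """Encode the two wall-sensor readings of the 4 x 3 grid agent as
    one of 4 observation values (2 sensor bits -> |O| = 4).
    This concrete encoding is illustrative, not from the paper."""
    return 2 * int(left_wall) + int(right_wall)

# A corridor cell (walls on both sides) vs. an open cell:
corridor = observation(True, True)     # -> 3
open_cell = observation(False, False)  # -> 0
```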

In the 4 × 3 grid problem, the hidden state space size $|S|$ is 11, the action space size $|A|$ is 4 for each state, and


Figure 7: (a) The STAMN model (state nodes $s_1$, $s_2$, $s_3$ with $f_o(s_1) = o_1$, $f_o(s_2) = o_2$, $f_o(s_3) = o_1$, connected by actions $a_1$, $a_2$). (b) The LPST model.

Figure 8: Comparative performance results for a simple deterministic POMDP (prediction error versus timesteps for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

the observation space size $|O|$ is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black grid cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration still has the better performance and the faster convergence speed in the 4 × 3 grid problem.

The numbers of state nodes and action nodes in the STAMN and the LPST are described in Table 2. In the STAMN, a state node

Figure 9: The 4 × 3 grid problem.

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm: number of state nodes / number of action nodes
LPST: 70 / 50
STAMN: 11 / 44

represents a hidden state, but in the LPST a state node represents an observation value, and a hidden state is expressed by $k$-step past observations and actions; thus many observation nodes and action nodes are repeatedly created, and most of the observation nodes and action nodes are the same. The LPST has perfect observation prediction capability, the same as the STAMN, but it is not a state transition model.

In the STAMN, the setting of the parameters $\mu$, $\eta$, $\gamma^s_i$, $\delta^a_i$ is very important, since it determines the depth $k$ of the short-term


Figure 10: Comparative performance results for the 4 × 3 grid problem (prediction error versus timesteps for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

memory. We assume $\gamma^s_i = \delta^a_i = 0.9$ and initially $\mu = \eta = 0.9$, representing $k = 1$. When we need to increase $k$ to $k + 1$, we only need to decrease $\mu$ and $\eta$ according to $\eta = \eta - 0.2$, $\mu = \mu - 0.1$. The other parameters are determined relatively easily. We set the learning rates $\alpha = \beta = \gamma = 0.9$ and set the confidence parameters $C_{ij} = 1/|\tau_j|$ and $D_{ij} = 1/2$, representing equal importance degrees. We set the constant values $C = 5$ and $\vartheta = 0.9$.
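The memory-depth schedule above can be sketched directly (a toy illustration of the stated update rules $\eta \leftarrow \eta - 0.2$, $\mu \leftarrow \mu - 0.1$):

```python
def deepen_memory(mu: float, eta: float, k: int):
    """Increase the memory depth from k to k + 1 by decaying the
    short-term-memory parameters, following the update rules stated
    in the text: eta <- eta - 0.2, mu <- mu - 0.1."""
    return mu - 0.1, eta - 0.2, k + 1

mu, eta, k = 0.9, 0.9, 1                # initial setting, representing k = 1
mu, eta, k = deepen_memory(mu, eta, k)  # now k = 2
```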

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By following wall behaviour, the agent can recognize the left-wall, right-wall, and corridor landmarks. These observation landmarks differ from the observation vectors in the previous grid problem: every observation landmark has a different duration time. To analyse the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, the robot has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the path 2-3-4-5-6 and the path 12-13-14-15-16 are identical, and a memory depth $k = 6$ is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which tolerates faults and noise in the sequence item duration times and can realize reliable recalling. However, the LPST cannot realize correct convergence in reasonable time because of its exact matching.

5 Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive exact matching, which is sensitive to noise and faults; the STAMN offers the neuron synchronized activation mechanism to realize recalling, so even in a noisy and disturbed environment the STAMN can still realize reliable recalling. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network and uses the synchronized activation mechanism, so its performance does not become poor as history sequences and scale increase.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. In existing AMN, firstly, almost all models are unable to handle complex sequences with looped hidden states; the STAMN indeed realizes identifying of looped hidden states, so it can be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth $k$ is variable in different parts of the state space. Finally, the existing AMN models generally only record temporal orders with discrete intervals $\Delta t$ rather than sequence item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6 Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, the STAMN is proposed to identify looped hidden states using only transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth $k$-Markov model. We proposed the heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. The transition instances can


Figure 11: (a) The complex symmetrical simulation environment (the figures denote the hidden states). (b) The disturbed environment.

Figure 12: (a) Comparative results for a noise-free environment. (b) Comparative results for a disturbed environment (prediction error versus timesteps for the STAMN and the LPST).

be obtained through the agent's interaction with the environment, or from training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as simultaneous localization and mapping (SLAM), robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests, how to incorporate reinforcement learning to produce a new distributed reinforcement learning model, and how to apply it to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there is no conflict of interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State, Ph.D. thesis, University of Rochester, Rochester, NY, USA, 1996.
[2] M. Hutter, "Feature reinforcement learning: part I. Unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.
[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.
[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.
[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.
[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.
[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.
[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.
[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists, Ph.D. thesis, University of Osnabrück, Osnabrück, Germany, 2009.
[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.
[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.
[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[13] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.
[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.
[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.
[17] X. Li, L. Li, J. Gao et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.
[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.
[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.
[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.
[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.
[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.
[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.
[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.
[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.
[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.
[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.



(1) $k$-Step History Sequence Identifying. If the identifying history sequence for $s_i$ is $o_{i-k+1}, a_{i-k+1}, o_{i-k+2}, \ldots, a_{i-1}, o_i$, it satisfies

$$o_t = o_i,\quad a_{t-1} = a_{i-1},\quad \ldots,\quad o_{t-k+2} = o_{i-k+2},\quad a_{t-k+1} = a_{i-k+1},\quad o_{t-k+1} = o_{i-k+1} \tag{1}$$

Thus the current state $s$ is looped to the hidden state $s_i$. In the STAMN, the $k$-step history sequence identifying activation value $m^s_j(t)$ is computed.

(2) Following Identifying. If the current state $s$ is identified as the hidden state $s_i$, the next transition history $h_{t+1}$ is represented by $h_{t+1} = (h_t, a_t, o_{t+1})$ through $\mathrm{trans}(s_i, a_t)$. According to Lemma 4, the next transition history $h_{t+1}$ is identified as the transition prediction state $s_j = f_s(s_i, a_t)$.

In the STAMN the following activation value 119902119904119895(119905) iscomputed

(3) Previous Identifying. If there exists a transition instance $h_{t+1} = (h_t, a_t, o_{t+1})$, then $h_{t+1}$ is an identifying history sequence for state $s_j$. According to Lemma 5, the previous state $s_i$ is uniquely identified by $h_t$ such that $s_j = f_s(s_i, a_t)$. So if there exists a transition prediction state $s_j$ with $s_j = f_s(s_i, a_t)$, then the previous state $s_i$ is uniquely identified.

In the STAMN the previous activation value 119889119904119895(119905) iscomputed

Algorithm 13 (transition prediction criterion). If the model correctly represents the environment as a Markov chain, then for every state $s$ and for all identifying history sequences $h$ for the same state, $\mathrm{trans}(h, a)$ yields the same observation for the same action $a$.

If $h_1$ and $h_2$ are identifying history sequences with $k$-step memory for the hidden state $s$ and there exists $\mathrm{trans}(h_1, a) \ne \mathrm{trans}(h_2, a)$, then according to Lemma 9 the state $s_i$ is discriminated from $s_j$.

In the STAMN, the transition prediction criterion is realized by the transition prediction value $p^s_j(t)$.

3. Spatiotemporal Associative Memory Networks

In this section we relate the spatiotemporal sequence problem to the idea of identifying the hidden state by a sequence of past observations and actions. An associative memory network (AMN) is expected to have several characteristics: (1) an AMN must memorize incrementally, that is, learn new knowledge without forgetting the learned knowledge; (2) an AMN must be able not only to record temporal order but also to record each sequence item's duration time in continuous time; (3) an AMN must be able to realize heteroassociative recall; (4) an AMN must be able to process real-valued feature vectors in a bottom-up manner, not only symbolic items; (5) an AMN must be robust and able to recall a sequence correctly from incomplete or noisy input; (6) an AMN should realize learning and recalling simultaneously; (7) an AMN should realize interaction between STM and LTM; dual-trace theory suggests that the persistent neural activities of STM can lead to LTM.

The first thing to be defined is the temporal dimension (discrete or continuous). Previous research is mostly based on regular intervals of Δt, and few AMNs have been proposed to deal not only with the sequential characteristic but also with sequence item duration time in continuous time. However, this characteristic is important for many problems. In speech production, writing, music generation, motor planning, and so on, the sequence item duration time and repeated emergence of a sequence item have essentially different meanings. For example, "B4" and "B-B-B-B" are entirely different: the former represents that item B sustains for 4 timesteps, and the latter represents that item B repeatedly emerges 4 times. The STAMN can explicitly distinguish these two temporal characteristics. So a history of past observations and actions can be expressed by a special spatiotemporal sequence h, h = o_{t-k+1}, a_{t-k+1}, o_{t-k+2}, ..., a_{t-1}, o_t, where k is the length of the sequence h; the items of h include the observation items and the action items, where o_t denotes the real-valued observation vector generated by taking action a_{t-1}, and a_t denotes the action taken by the agent at time t. Here t does not represent a temporally discrete dimension sampled at regular intervals Δt but the t-th step in the continuous-time dimension, which is the duration time between the current item and the next one.
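The distinction between item duration and item repetition can be made concrete with a small sketch (a hypothetical encoding, not from the paper): a duration-aware sequence stores (item, duration) pairs, so "B4" and "B-B-B-B" expand to the same flat timestep stream yet remain distinguishable as representations.

```python
def expand(seq):
    """Expand a list of (item, duration) pairs into per-timestep items."""
    return [item for item, dur in seq for _ in range(dur)]

sustained = [("B", 4)]       # "B4": item B sustains for 4 timesteps
repeated = [("B", 1)] * 4    # "B-B-B-B": item B emerges 4 separate times

# Both produce the same flat timestep stream...
assert expand(sustained) == expand(repeated) == ["B", "B", "B", "B"]
# ...but the duration-aware representations stay distinguishable:
assert sustained != repeated
```

A purely discrete-time AMN only ever sees the flat stream and cannot tell the two cases apart; the STAMN keeps the duration information in the decaying activation values.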

A spatiotemporal sequence can be classified as a simple sequence or a complex sequence. A simple sequence is a sequence without repeated items; for example, the sequence "B-A-E-L" is a simple sequence. Sequences containing repeated items are defined as complex sequences. In a complex sequence, a repeated item can be classified as a looped item or a discriminative item by identifying the hidden state; for example, in the history sequence "X-Y-Z-Y-Z", "Y" and "Z" may be looped items or discriminative items. Identifying the hidden state requires introducing contextual information, resolved by the k-step memory. The memory depth k is not fixed; k takes different values in different parts of the state space.

3.1. Spatiotemporal Associative Memory Network Architecture. We build a new spatiotemporal associative memory network (STAMN). This model makes use of the neural activity decay of nodes to achieve short-term memory, connection weights between different nodes to represent long-term memory, presynaptic potentials and a neuron synchronized activation mechanism to realize identifying and recalling, and a time-delayed Hebb learning mechanism to fulfil one-shot learning.

6 Computational Intelligence and Neuroscience

Figure 2: STAMN structure diagram (state nodes carry activations n^s_i, b^s_i; action nodes carry n^a_j, b^a_j; connections carry the weights w_ij, ω_ij, and ω′_ij).

The STAMN is an incremental, possibly looping, non-fully-connected, asymmetric associative memory network. Its nonhomogeneous nodes correspond to hidden state nodes and action nodes.

For the state node j, the input values are defined as follows: (1) the current observation activation value b^s_j(t), responsible for the matching degree of state node j and the currently observed value, which is obtained from the preprocessing neural networks (if the matching degree is greater than a threshold value, then b^s_j(t) = 1); (2) the observation activation values of the presynaptic node set τ_j of the current state node j; (3) the identifying activation value n^s_i(t) of the previous state node i of state node j; (4) the activation value n^a_i(t) of the previous action node i of state node j. The output values are defined as follows: (1) the identifying activation value n^s_j(t) represents whether the current state s is identified with the hidden state s_j or not; (2) the transition prediction value p^s_j(t) represents whether the state node j is the current state transition prediction node or not.

For the action node j, the input value is defined as the current action activation value b^a_j(t), responsible for the matching degree of action node j and the current motor vectors. The output value is the activation value n^a_j(t) of the action node, indicating that action node j has been selected by the agent to control the robot's current action.

For the STAMN, not all nodes and connection weights necessarily exist initially. The weights ω_ij(t) and ω′_ij(t) connected to state nodes can be learned incrementally by time-delayed Hebb learning rules, representing the LTM. The weight w_ij(t) connected to action nodes can be learned by reinforcement learning. All nodes have an activity self-decay mechanism to record the duration time of the node, representing the STM. The output of the STAMN is the winner state node or winner action node, by winner-takes-all.

The STAMN architecture is shown in Figure 2, where black nodes represent action nodes and concentric-circle nodes represent state nodes.

3.2. Short-Term Memory. We use the self-decay of neuron activity to accomplish short-term memory, for both the observation activation b^s_i(t) and the identifying activation n^s_i(t). The activity of each state node has a self-decay mechanism to record the temporal order and the duration time of the node. Supposing the self-decay factor is γ^s_i ∈ (0, 1) and the activation values b^s_i(t), n^s_i(t) ∈ [0, 1], the self-decay process is

b^s_i(t), n^s_i(t) =
  { 0,  if b^s_i(t), n^s_i(t) ≤ η
  { γ^s_i b^s_i(t-1), γ^s_i n^s_i(t-1),  else  (2)

The activity of an action node also has the self-decay characteristic. Supposing the self-decay factor is δ^a_i ∈ (0, 1) and the activation values b^a_i(t), n^a_i(t) ∈ [0, 1], the self-decay process is

b^a_i(t), n^a_i(t) =
  { 0,  if b^a_i(t), n^a_i(t) ≤ μ
  { δ^a_i b^a_i(t-1), δ^a_i n^a_i(t-1),  else  (3)

wherein η and μ are active thresholds and γ^s_i and δ^a_i are self-decay factors; together they determine the depth k of the short-term memory. Here t is a discrete time point obtained by sampling at regular intervals Δt, where Δt is a very small regular interval.
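A minimal sketch of the self-decay STM of (2) and (3), assuming the decay is applied first and the threshold cutoff second (the equations leave this ordering implicit). With γ = 0.9 and threshold 0.5, a single activation survives six decayed steps before being cut to zero, which is how γ and the threshold jointly fix the STM depth k:

```python
def decay(activation, gamma, threshold):
    """One self-decay step (cf. (2)/(3)): the activation shrinks
    geometrically by gamma and is cut to 0 once it reaches the
    active threshold."""
    v = gamma * activation
    return 0.0 if v <= threshold else v

# A node activated once (value 1.0) stays active while gamma**t > threshold.
a, steps = 1.0, 0
while a > 0.0:
    a = decay(a, 0.9, 0.5)
    steps += 1
# steps counts decay steps until cutoff, since 0.9**7 <= 0.5 < 0.9**6
```

The surviving activation value itself then tells a postsynaptic node how long ago the presynaptic node fired, which is what the LTM weights below exploit.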

3.3. Long-Term Memory. Long-term memory can be classified into semantic memory and episodic memory. Learning of a history sequence is considered episodic memory, generally realized by one-shot learning. We use time-delayed Hebb learning rules to fulfil the one-shot learning.

(1) The k-Step Long-Term Memory Weight ω_ij(t). The weight ω_ij(t) connected to state node j represents the k-step long-term memory. This is a past-oriented behaviour in the STAMN. For i ∈ τ_j and n^s_j(t) = 1, the weight ω_ij(t) is adjusted according to

ω_ij(t) =
  { n^s_j(t) n^X_i(t),  if ω_ij(t-1) = 0, X ∈ {s, a}
  { ω_ij(t-1) + α (n^X_i(t) - ω_ij(t-1)),  else  (4)

where j is the current identifying activation state node, n^s_j(t) = 1. Because the identifying activation process is a neuron synchronized activation process, when n^s_j(t) = 1, all nodes whose n^s_i(t) and n^a_i(t) are not zero constitute the contextual information related to state node j. These nodes are considered the presynaptic node set of the current state node j, expressed by τ_j, where n^s_i(t), n^a_i(t) are the activation values at time t. Because of the self-decay of neuron activity, n^s_i(t), n^a_i(t) record not only the temporal order but also the duration time: the smaller n^s_i(t), n^a_i(t) are, the earlier node i precedes the current state node j. ω_ij(t) is the activation weight between presynaptic node i and state node j; it records the contextual information related to state node j, to be used in identifying and recalling, where ω_ij(t) ∈ [0, 1], ω_ij(0) = 0, and α is the learning rate.

The weight ω_ij(t) is time-related contextual information; the update process is shown in Figure 3.
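The time-delayed Hebb rule (4) can be sketched for a single weight as follows (a minimal version; the function and variable names are mine): a zero weight is imprinted in one shot from the presynaptic trace, and a nonzero weight tracks that trace with learning rate α.

```python
def hebb_update(w_prev, n_pre, n_post, alpha):
    """Time-delayed Hebb rule of (4) for one weight w_ij, applied
    only when the postsynaptic state node fires (n_post == 1)."""
    if n_post != 1:
        return w_prev                  # no postsynaptic firing: keep weight
    if w_prev == 0.0:
        return n_post * n_pre          # one-shot imprinting of the trace
    return w_prev + alpha * (n_pre - w_prev)

# The presynaptic trace has decayed for two steps (0.9**2 = 0.81), so the
# stored weight also encodes how long ago node i was active.
w = hebb_update(0.0, 0.81, 1, alpha=0.9)
assert w == 0.81                        # first visit: stored in one shot
w = hebb_update(w, 0.81, 1, alpha=0.9)  # later visits leave a stable trace
```

Because the imprinted value is the decayed activation itself, a single weight simultaneously stores "which node" and "how long ago", which is what lets equation (10) later match whole k-step contexts.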

(2) One-Step Transition Prediction Weight ω′_ij(t). The weight ω′_ij(t) connected to state node j represents the one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rules, for i = argmax n^X_i(t) and n^s_j(t) = 1, the weight ω′_ij(t) is adjusted according to

ω′_ij(t) =
  { n^s_j(t) n^X_i(t),  if ω′_ij(t-1) = 0, X ∈ {s, a}
  { ω′_ij(t-1) + β (n^X_i(t) - ω′_ij(t-1)),  else  (5)

The transition activation of the current state node j is only associated with the previous winning state node and action node, where j is the current identifying activation state node, n^s_j(t) = 1. The previous winning state node and action node are given by i = argmax n^X_i(t), X ∈ {s, a}, where ω′_ij(t) ∈ [0, 1], ω′_ij(0) = 0, and β is the learning rate.

The weight ω′_ij(t) is one-step transition prediction information; the update process is shown in Figure 4.

(3) The Weight w_ij(t) Connected to Action Nodes. The activation of an action node is only associated with the corresponding state node, which selects this action node directly. We assume the state node with maximal identifying activation value at time t is state node i, so for i = argmax n^s_i(t) and n^a_j(t) = 1, the connection weight w_ij(t) connected to the currently selected action node j is adjusted by

w_ij(t) =
  { Q_0(s_i, a_j),  if w_ij(t-1) = 0
  { -Q_maxconst,  if a_j is an invalid action
  { Q_t(s_i, a_j),  else  (6)

where w_ij(t) is represented by the Q function Q_t(s_i, a_j). This paper only discusses how to build the generalized environment model, not how to learn the optimal policy, so this value is set to the exhaustive-exploration Q function based on the curiosity reward. The curiosity reward is described by (7). When action a_j is an invalid action, w_ij(t) is set to a large negative constant, which avoids going into dead ends.

r(t) =
  { C - n_t / n_ave,  if n_ave > 0
  { C,  if n_ave = 0  (7)

where C is a constant value, n_t represents the count of explorations of the current action by the agent, and n_ave is the average count of explorations over all actions by the agent; n_t/n_ave represents the degree of familiarity with the currently selected action. The curiosity reward is updated when each action is finished. The Q_t(s_i, a_j) update equation is given by (8), where λ is the learning rate:

Q_t(s_i, a_j) =
  { (1 - λ) Q_{t-1}(s_i, a_j) + λ r(t),  if i = argmax n^s_i(t), n^a_j(t) = 1
  { Q_{t-1}(s_i, a_j),  else  (8)

The update process is shown in Figure 5. The action node j is selected by Q_t(s_i, a_j) according to

a_j = argmax_{j ∈ A^vl_i, n^s_i(t) = 1} Q(s_i, a_j)  (9)

where A^vl_i is the valid action set of state node i, and n^s_i(t) = 1 represents that state node i is identified; for the action node j selected by the agent to control the robot's current action we set n^a_j(t) = 1.
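Equations (7)-(9) can be sketched as follows (the function and variable names are mine, not the paper's):

```python
def curiosity_reward(n_t, n_ave, C=5.0):
    """Equation (7): the reward is high for rarely explored actions and
    falls as the familiarity ratio n_t / n_ave grows."""
    return C if n_ave == 0 else C - n_t / n_ave

def q_update(q_prev, r, lam=0.9):
    """Equation (8): exponential moving average of the curiosity reward
    for the (winner state node, selected action) pair."""
    return (1.0 - lam) * q_prev + lam * r

def select_action(q_row, valid_actions):
    """Equation (9): among the valid actions of the identified state
    node, pick the one with maximal Q."""
    return max(valid_actions, key=lambda a: q_row[a])

# An action explored twice as often as average earns a reduced reward:
assert curiosity_reward(4, 2.0) == 3.0
# Unvisited actions keep their initial Q_0, steering the agent toward
# exhaustive exploration:
assert select_action({"fwd": 2.7, "left": 5.0}, ["fwd", "left"]) == "left"
```

The point of the curiosity reward is coverage, not return: the agent is rewarded for visiting under-explored transitions, which is what the model-construction phase needs.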

3.4. The Constructing Process in STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of the STAMN. There is a chicken-and-egg problem during the constructing process: the building of the STAMN depends on state identifying; conversely, state identifying depends on the current structure of the STAMN. Thus exhaustive exploration and a k-step variable memory length (depending on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of state node s_j depends on three identifying processes: k-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process below.

(1) k-Step History Sequence Identifying. The matching degree of the current k-step history sequence with the identifying history sequence h_j for s_j is computed to identify the looping state node j. First we compute the presynaptic potential V_j(t) for the state node j according to

V_j(t) = Σ_{i ∈ τ_j} C_ij ϕ(b^X_i(t), ω_ij),  X ∈ {s, a}  (10)

Figure 3: Past-oriented weights ω_ij(t) update process associated with state node j.

Figure 4: Future-oriented weights ω′_ij(t) update process associated with state node j.

Figure 5: Weights w_ij(t) update process associated with action node j.

where b^X_i(t) is the current activation value of node i in the presynaptic node set τ_j, and C_ij is a confidence parameter representing node i's importance degree in the presynaptic node set τ_j. The value of C_ij can be set in advance, with Σ_{i ∈ τ_j} C_ij = 1. The function ϕ represents the degree of similarity between the activation value of a presynaptic node and the contextual information ω_ij in LTM: ϕ(x, y) ∈ [0, 1], and the more similar x and y are, the closer ϕ(x, y) is to 1. Following "winner takes all", among all nodes whose presynaptic potentials exceed the threshold θ, the node with the largest presynaptic potential is selected.

The presynaptic potential V_j(t) represents the synchronous activation process of the presynaptic node set τ_j, which corresponds to matching the previous k - 1 steps of contextual information for state node j. To realize full k-step history sequence matching, the k-step history sequence identifying activation value of the state node j is given below:

m^s_j(t) = H(V_j(t), θ) · H(V_j(t), max_{k ∈ 1..N_c} V_k(t)) · b^s_j(t)

H(x, y) =
  { 0,  x < y
  { 1,  x ≥ y  (11)

where max_{k ∈ 1..N_c} V_k(t) is the maximum potential value over all nodes k. H(V_j(t), θ) · H(V_j(t), max_{k ∈ 1..N_c} V_k(t)) means that the node with the largest potential value is selected among all nodes whose presynaptic potentials exceed the threshold θ, and b^s_j(t) represents the matching degree of state node j with the currently observed value. If m^s_j(t) = 1, the current state s is identified with the looped state node j by the k-step memory.
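A sketch of the k-step identifying computation of (10)-(11). The similarity function ϕ is only constrained to lie in [0, 1] and approach 1 for matching arguments, so the concrete form below (1 - |x - y|) is an assumption of this sketch:

```python
def H(x, y):
    """Threshold comparison used in equation (11)."""
    return 1 if x >= y else 0

def phi(x, y):
    """Assumed similarity in [0, 1] for inputs in [0, 1]; the paper only
    requires phi(x, y) to be close to 1 when x and y are close."""
    return 1.0 - abs(x - y)

def presynaptic_potential(traces, weights, conf):
    """Equation (10): confidence-weighted similarity of the decayed
    activation traces against the stored context weights."""
    return sum(c * phi(b, w) for b, w, c in zip(traces, weights, conf))

def k_step_identifying(potentials, j, theta, b_obs):
    """Equation (11): node j fires iff its potential exceeds theta, wins
    the winner-take-all competition, and the observation matches."""
    Vj = potentials[j]
    return H(Vj, theta) * H(Vj, max(potentials)) * b_obs

# Two candidate state nodes: node 0's stored context matches the current
# decayed traces exactly, node 1's does not.
traces = [0.9, 0.81, 1.0]
conf = [1 / 3, 1 / 3, 1 / 3]
V = [presynaptic_potential(traces, [0.9, 0.81, 1.0], conf),
     presynaptic_potential(traces, [0.2, 0.5, 0.1], conf)]
assert k_step_identifying(V, 0, theta=0.9, b_obs=1) == 1
assert k_step_identifying(V, 1, theta=0.9, b_obs=1) == 0
```

Because the traces carry decayed values rather than binary flags, the comparison in (10) implicitly checks temporal order and duration as well as identity of the context nodes.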

(2) Following Identifying. If the current state s is identified as the state s_i, then the next transition prediction state is identified as the state s_j = f_s(s_i, a_i). First we compute the transition prediction value for the state node j according to

p^s_j(t) = Σ_{i = argmax n^X_i(t)} D_ij ϕ(n^X_i(t), ω′_ij),  X ∈ {s, a}  (12)

where n^X_i(t) is the identifying activation value of the state node and action node at time t, and i = argmax n^X_i(t) indicates that state node i and action node i are the previous winner nodes. D_ij is a confidence parameter representing node i's importance degree; the value of D_ij can be set in advance, with Σ_{i = argmax n^X_i(t)} D_ij = 1. ω′_ij records the one-step transition prediction information related to state node j, to be used in the identifying and recalling phases. If the current state s is identified as the hidden state s_i, p^s_j(t) represents the probability that the next transition prediction state is j.

If the next prediction state node j is consistent with the current observation value, the current history sequence is identified with state node j; the following identifying value of the state node j is given below:

q^s_j(t) = H(p^s_j(t), θ) · H(p^s_j(t), max_{k ∈ 1..N_c} p^s_k(t)) · b^s_j(t)  (13)

If q^s_j(t) = 1, the current history sequence h_t is identified with the looped state node j by the following identifying. If p^s_j(t) ≥ max_{k ∈ 1..N_c} p^s_k(t) and p^s_j(t) ≥ θ but q^s_j(t) ≠ 1, then b^s_j(t) ≠ 1, representing a mismatch between state node j and the currently observed value: there exists trans(h_t, a_i) ≠ trans(h_i, a_i). According to the transition prediction criterion (Algorithm 13), the current history sequence h_t is not identified with the hidden state s_i, so the identifying history sequence lengths for h_t and h_i must separately increase to k + 1.

(3) Previous Identifying. If the current history sequence is identified as the state s_j, then the previous state s_i is identified such that s_j = f_s(s_i, a_i). First we compute the identifying activation values for all states s; if there exists n^s_j(t) = 1, then the current state s_t is identified as state s_j, and the previous state s_{t-1} is identified as the previous state s_i of state s_j. The previous identifying value of the state node is defined as d^s_i(t). The previous state s_i is given by i = argmax b^s_i(t) subject to Condition 1 and Condition 2; then we set d^s_i(t) = 1.

Condition 1:

n^s_j(t) = 1  (14)

Condition 2:

ϕ(b^s_i(t), ω_ij) > θ  (15)

According to the above three identifying processes, the identifying activation value of the state node j is defined as follows:

n^s_j(t) = max(m^s_j(t), q^s_j(t), d^s_j(t))  (16)

Pseudocode 1 gives the pseudocode for Algorithm 11.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is in Pseudocode 2

4. Simulation

4.1. A Simple Example of a Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First we obtain the transition instance h_t: o1, a1, o1, a2, o2, a2, o2, a1, o1, a1, o1. According to Algorithm 11, set the initial identifying history sequence length k = 1. The STAMN model is constructed incrementally using h_t, as in Figure 6.

When h_t = o1, a1, o1, a2, o2, a2, o2, a1, o1, a1, o1, a2, o1, because k = 1 the looped state is identified by the current observation vector. Thus h_1 = o1 and h_2 = o1 are the identifying history sequences for the state s_1. Since o1 = trans(h_1, a_2) and o2 = trans(h_2, a_2) exist in h_t, we have trans(h_1, a_2) ≠ trans(h_2, a_2); according to Lemma 9, h_1 and h_2 are the identifying history sequences for the states s_i and s_j, respectively, with s_i ≠ s_j. So o1 is distinguished as o1′ and o1″, and h_t = o1, a1, o1′, a2, o2, a2, o2, a1, o1, a1, o1″, a2, o1.

According to Lemma 10, for any two identifying history sequences h_i and h_j, separately for o1′ and o1″, there must exist h_i ≠ h_j iff s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j, separately for s_i and s_j, must increase to 2. For o1′ the identifying history sequence is o1, a1, o1′, and for o1″ the identifying history sequence is o1, a1, o1″. These two identifying history sequences are identical, so a longer transition instance is needed.

Figure 6: The STAMN model with memory depth 1 (state nodes s1, s2 with f_o(s1) = o1 and f_o(s2) = o2; transitions labelled a1, a2).

Table 1: The identifying history sequences for o1′ and o1″.

Hidden state                    o1′ (k = 2)       o1″ (k = 3)
Identifying history sequences   o1, a1, o1′       o2, a1, o1, a1, o1″
                                o1″, a2, o1′
                                o2, a1, o1′

When h_t = o1, a1, o1′, a2, o2, a2, o2, a1, o1, a1, o1″, a2, o1′, a2, o2, a1, o1′, a2, o2, a1, o1, a1, o1, a1, o1′, a2, o2, in h_t we obtain identical identifying history sequences h_i = h_j for the states o1′ and o1″. For k = 2, the distinguished identifying history sequences for o1′ are o1″, a2, o1′ and o2, a1, o1′. However, the distinguished identifying history sequence for o1″ is o1, a1, o1″, which is not discriminated from the identifying history sequence o1, a1, o1′ for o1′. So the identifying history sequence length for o1″ must increase to 3. The identifying history sequences for o1′ and o1″ are described in Table 1.

According to k-step history identifying, following identifying, and previous identifying, h_t can be represented as h_t = o1″, a1, o1′, a2, o2, a2, o2, a1, o1′, a1, o1″, a2, o1′, a2, o2, a1, o1′, a2, o2, a1, o1′, a1, o1″, a1, o1′, a2, o2.

According to Pseudocode 1, the STAMN model is constructed incrementally using h_t as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After 10 timesteps, the algorithms use the currently learned model to realize transition prediction, and the transition prediction quality is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.


Set initial memory depth k = 1, t = 0.
All nodes and connection weights do not exist initially in the STAMN.
The transition instance h_0 = ∅.
A STAMN is constructed incrementally through interaction with the environment by the agent, expanding the STAMN until the transition prediction contradicts the current minimal hypothesis model; then reconstruct the hypothesis model by increasing the memory depth k. In this paper the constructing process includes identifying and recalling simultaneously.
While one pattern o_t or a_t can be activated from the pattern cognition layer do
  h_t = (h_{t-1}, o_t, a_t)
  for all existing state nodes do
    compute the identifying activation value by executing Algorithm 12
    if there exists n^s_j(t) = 1 then
      the looped state is identified, and the weights ω_ij, ω′_ij are adjusted according to (4), (5)
    else
      a new state node j is created; set n^s_j(t) = 1, and the weights ω_ij, ω′_ij are adjusted according to (4), (5)
    end if
  end for
  for all existing state nodes do
    compute the transition prediction criterion by executing Algorithm 13
    if IsNotTrans(s_j) then
      for the previous winner state node i of the state node j
        set memory depth k = k + 1 until each h_i is discriminated
      end for
      reconstruct the hypothesis model with the new memory depth k according to Algorithm 11
    end if
  end for
  for all existing action nodes do
    compute the activation value according to (9)
    if there exists n^a_j(t) = 1 then
      the weight w_ij(t) is adjusted according to (6)
    else
      a new action node j is created; set n^a_j(t) = 1, and the weight w_ij(t) is adjusted according to (6)
    end if
  end for
  The agent obtains the new observation vector by o_{t+1} = trans(s_t, a_t)
  t = t + 1
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.
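The depth-growing loop at the heart of Pseudocode 1 can be illustrated with a toy reimplementation (mine, not the paper's code): grow k until every k-step identifying history predicts a unique next observation per action, i.e. until the transition prediction criterion of Algorithm 13 is no longer contradicted anywhere in the transition instance.

```python
def minimal_memory_depth(trace, k_max=8):
    """Toy version of the heuristic in Pseudocode 1: increase the memory
    depth k until trans(h, a) is a function of the k-step history h and
    the action a over the whole transition instance."""
    for k in range(1, k_max + 1):
        trans, consistent = {}, True
        # trace alternates o0, a0, o1, a1, ...; observations at even indices
        for i in range(0, len(trace) - 2, 2):
            h = tuple(trace[max(0, i - 2 * (k - 1)): i + 1])  # k-step history
            key = (h, trace[i + 1])                           # (history, action)
            nxt = trace[i + 2]                                # observed outcome
            if trans.setdefault(key, nxt) != nxt:
                consistent = False                            # prediction contradicted
                break
        if consistent:
            return k
    return None

# Transition instance from Section 4.1: o1 a1 o1 a2 o2 a2 o2 a1 o1 a1 o1 a2 o1
trace = ["o1", "a1", "o1", "a2", "o2", "a2", "o2",
         "a1", "o1", "a1", "o1", "a2", "o1"]
```

On this trace the toy returns 3. Note the simplification: the paper grows the depth per state (reaching k = 2 for o1′ and k = 3 for o1″), while the toy uses a single global k, so it settles at the largest depth any state needs.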

IsNotTrans(s_j):
  if p^s_j(t) > θ according to (12) and q^s_j(t) = 1 according to (13) then
    return False
  else
    return True
  end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the better performance and the faster convergence speed because of the exhaustive-exploration Q function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem is selected, which is shown in Figure 9. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall in the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem, the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and


Figure 7: (a) The STAMN model (state nodes s1, s2, s3 with f_o(s1) = o1, f_o(s2) = o2, f_o(s3) = o1; transitions labelled a1, a2). (b) The LPST model (a tree over observations o1, o2 with edges labelled a1, a2).

Figure 8: Comparative performance results for a simple deterministic POMDP (prediction error over 100 timesteps for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black grid cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN. The results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration still has the better performance and the faster convergence speed in the 4 × 3 grid problem.

The numbers of state nodes and action nodes in the STAMN and the LPST are described in Table 2. In the STAMN the state node

Figure 9: The 4 × 3 grid problem.

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm   Number of state nodes   Number of action nodes
LPST        70                      50
STAMN       11                      44

represents the hidden state, but in the LPST the state node represents an observation value and the hidden state is expressed by k-step past observations and actions; thus more observation nodes and action nodes are repeatedly created, and most of the observation nodes and action nodes are the same. The LPST has the same perfect observation prediction capability as the STAMN, but it is not a state transition model.

In the STAMN, the setting of the parameters μ, η, γ^s_i, δ^a_i is very important, since these determine the depth k of the short-term


Figure 10: Comparative performance results for the 4 × 3 grid problem (prediction error over 1000 timesteps for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

memory. We assume γ^s_i = δ^a_i = 0.9 and initially μ = η = 0.9, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η - 0.2, μ = μ - 0.1. The other parameters are determined relatively easily. We set the learning rates α = β = λ = 0.9 and set the confidence parameters C_ij = 1/|τ_j| and D_ij = 1/2, representing equal importance degrees. We set the constant values C = 5 and θ = 0.9.

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By wall-following behaviour, the agent can recognize the left-wall, right-wall, and corridor landmarks. These observation landmarks are different from the observation vectors in the previous grid problem: every observation landmark has a different duration time. To analyse the fault tolerance and robustness of STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, the robot has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the paths 2-3-4-5-6 and 12-13-14-15-16 are identical, and a memory depth $k = 6$ is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and in the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that STAMN has better noise tolerance and robustness than LPST because of the neuron synchronized activation mechanism, which tolerates faults and noise in the sequence items' duration times and can realize reliable recalling. The LPST, however, cannot realize correct convergence in reasonable time because of its exact-matching requirement.

5. Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all the infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive exact matching, which is sensitive to noise and faults; the STAMN instead offers the neuron synchronized activation mechanism to realize recalling, so even in a noisy and disturbed environment the STAMN can still recall reliably. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model; the STAMN is a distributed network using the synchronized activation mechanism, whose performance does not degrade as the history sequences and the scale increase.

5.2. Associative Memory Networks (AMN). STAMN is proposed based on the development of associative memory networks. In existing AMN, firstly, almost all models are unable to handle complex sequences with looped hidden states; the STAMN indeed realizes the identification of looped hidden states, and so can be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory-depth learning method, and the memory depth $k$ is variable in different parts of the state space. Finally, the existing AMN models generally only record the temporal order with discrete intervals $\Delta t$ rather than the sequence items' duration in continuous time; STAMN explicitly deals with the duration time of each item.

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, STAMN is proposed to identify the looped hidden states only from the transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth $k$-Markov model. We proposed a heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. These transition instances can be obtained through the agent's interaction with the environment, or from training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that STAMN outperforms some existing hidden-state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as the simultaneous localization and mapping (SLAM) problem, robot trajectory planning, sequential decision making, and music generation.

[Figure 11: (a) The complex symmetrical simulation environment, with numbered hidden states. (b) The disturbed environment.]

[Figure 12: Prediction error vs. timesteps for STAMN and LPST: (a) comparative results for the noise-free environment; (b) comparative results for the disturbed environment.]

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by a heuristic statistical test, how to incorporate reinforcement learning to produce a new distributed reinforcement learning model, and how it should be applied to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there is no conflict of interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State, Ph.D. thesis, University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I: unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3-24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075-1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95-142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409-416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11-24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213-228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists, Ph.D. thesis, University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818-826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727-734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao, et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465-1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964-972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414-433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45-57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1-6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415-2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971-983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1-8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57-71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.



[Figure 2: STAMN structure diagram: state nodes and action nodes with activation values $n^s_i$, $b^s_i$, $n^a_j$, $b^a_j$ and connection weights $w_{ij}$, $\omega_{ij}$, $\tilde{\omega}_{ij}$.]

STAMN is an incremental, possibly looping, non-fully-connected, asymmetric associative memory network. Its nonhomogeneous nodes correspond to hidden state nodes and action nodes.

For the state node $j$, the input values are defined as follows: (1) the current observation activation value $b^s_j(t)$, responsible for the matching degree between state node $j$ and the current observed value, obtained from the preprocessing neural networks (if the matching degree is greater than a threshold value, then $b^s_j(t) = 1$); (2) the observation activation values $b^X_i(t)$, $i \in \tau_j$, of the presynaptic node set $\tau_j$ of the current state node $j$; (3) the identifying activation value $n^s_i(t)$ of the previous state node $i$ of state node $j$; (4) the activation value $n^a_i(t)$ of the previous action node $i$ of state node $j$. The output values are defined as follows: (1) the identifying activation value $n^s_j(t)$, which represents whether the current state $s$ is identified with the hidden state $s_j$ or not; (2) the transition prediction value $p^s_j(t)$, which represents whether state node $j$ is the current state transition prediction node or not.

For the action node $j$, the input value is defined as the current action activation value $b^a_j(t)$, responsible for the matching degree between action node $j$ and the current motor vector. The output value is the activation value $n^a_j(t)$ of the action node, indicating that action node $j$ has been selected by the agent to control the robot's current action.

For the STAMN, all nodes and connection weights do not necessarily exist initially. The weights $\omega_{ij}(t)$ and $\tilde{\omega}_{ij}(t)$ connected to state nodes can be learned incrementally by time-delayed Hebb learning rules, representing the LTM. The weights $w_{ij}(t)$ connected to action nodes can be learned by reinforcement learning. All nodes have an activity self-decay mechanism to record the duration time of the node, representing the STM. The output of the STAMN is the winner state node or winner action node, by winner-takes-all.

The STAMN architecture is shown in Figure 2, where black nodes represent action nodes and concentric-circle nodes represent state nodes.

3.2. Short-Term Memory. We use the self-decay of neuron activity to accomplish short-term memory, for both the observation activation $b^s_i(t)$ and the identifying activation $n^s_i(t)$. The activity of each state node has a self-decay mechanism to record the temporal order and the duration time of this node. Supposing the self-decay factor is $\gamma^s_i \in (0, 1)$ and the activation values $b^s_i(t), n^s_i(t) \in [0, 1]$, the self-decay process is given as

$$ b^s_i(t),\; n^s_i(t) = \begin{cases} 0, & b^s_i(t),\; n^s_i(t) \le \eta, \\ \gamma^s_i\, b^s_i(t-1),\; \gamma^s_i\, n^s_i(t-1), & \text{else}. \end{cases} \tag{2} $$

The activity of an action node also has the self-decay characteristic. Supposing the self-decay factor is $\delta^a_i \in (0, 1)$ and the activation values $b^a_i(t), n^a_i(t) \in [0, 1]$, the self-decay process is given as

$$ b^a_i(t),\; n^a_i(t) = \begin{cases} 0, & b^a_i(t),\; n^a_i(t) \le \mu, \\ \delta^a_i\, b^a_i(t-1),\; \delta^a_i\, n^a_i(t-1), & \text{else}, \end{cases} \tag{3} $$

wherein $\eta$ and $\mu$ are activity thresholds and $\gamma^s_i$ and $\delta^a_i$ are self-decay factors; together they determine the depth $k$ of short-term memory. Here $t$ is a discrete time point obtained by sampling at regular intervals $\Delta t$, where $\Delta t$ is a very small regular interval.
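A minimal sketch of this self-decay mechanism (the list layout and names are our own; the paper specifies only (2) and (3)):

```python
def decay_activations(values, decay, threshold):
    """One step of the self-decay in Eqs. (2)-(3): an activation at or
    below the threshold is reset to 0; otherwise it is multiplied by
    the decay factor."""
    out = []
    for v in values:
        v = 0.0 if v <= threshold else decay * v
        out.append(v)
    return out

# Relative magnitudes encode temporal order: an item activated m steps
# ago has decayed m times, so smaller surviving activations are older.
trace = [1.0, 0.9, 0.7]  # newest first, gamma = 0.9
print(decay_activations(trace, 0.9, 0.8))  # the oldest entry (0.7 <= 0.8) drops to 0
```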

3.3. Long-Term Memory. Long-term memory can be classified into semantic memory and episodic memory. Learning of a history sequence is considered episodic memory and is generally realized by one-shot learning. We use time-delayed Hebb learning rules to fulfil the one-shot learning.

(1) The $k$-Step Long-Term Memory Weight $\omega_{ij}(t)$. The weight $\omega_{ij}(t)$ connected to state node $j$ represents $k$-step long-term memory. This is a past-oriented behaviour in the STAMN. The weight $\omega_{ij}(t)$ is adjusted according to

$$ \omega_{ij}(t)\Big|_{\,i \in \tau_j,\; n^s_j(t) = 1} = \begin{cases} n^s_j(t)\, n^X_i(t), & \omega_{ij}(t-1) = 0,\; X \in \{s, a\}, \\ \omega_{ij}(t-1) + \alpha \left( n^X_i(t) - \omega_{ij}(t-1) \right), & \text{else}, \end{cases} \tag{4} $$

where $j$ is the current identifying activation state node, $n^s_j(t) = 1$. Because the identifying activation process is a neuron synchronized activation process, when $n^s_j(t) = 1$ all nodes whose $n^s_i(t)$ and $n^a_i(t)$ are not zero constitute the contextual information related to state node $j$; these nodes are considered as the presynaptic node set of the current state node $j$, expressed by $\tau_j$. Here $n^s_i(t)$ and $n^a_i(t)$ are the activation values at time $t$; they record not only the temporal order but also the duration time, because of the self-decay of neuron activity: the smaller $n^s_i(t)$ or $n^a_i(t)$ is, the earlier node $i$ occurred relative to the current state node $j$. $\omega_{ij}(t)$ is the activation weight between presynaptic node $i$ and state node $j$; it records the context information related to state node $j$ to be used in identifying and recalling, where $\omega_{ij}(t) \in [0, 1]$, $\omega_{ij}(0) = 0$, and $\alpha$ is the learning rate.

The weight $\omega_{ij}(t)$ is time-related contextual information; the update process is shown in Figure 3.

(2) The One-Step Transition Prediction Weight $\tilde{\omega}_{ij}(t)$. The weight $\tilde{\omega}_{ij}(t)$ connected to state node $j$ represents one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rules, the weight $\tilde{\omega}_{ij}(t)$ is adjusted according to

$$ \tilde{\omega}_{ij}(t)\Big|_{\,i = \arg\max n^X_i(t),\; n^s_j(t) = 1} = \begin{cases} n^s_j(t)\, n^X_i(t), & \tilde{\omega}_{ij}(t-1) = 0,\; X \in \{s, a\}, \\ \tilde{\omega}_{ij}(t-1) + \beta \left( n^X_i(t) - \tilde{\omega}_{ij}(t-1) \right), & \text{else}. \end{cases} \tag{5} $$

The transition activation of the current state node $j$ is only associated with the previous winning state node and action node, where $j$ is the current identifying activation state node, $n^s_j(t) = 1$. The previous winning state node and action node are given by $i = \arg\max n^X_i(t)$, $X \in \{s, a\}$, where $\tilde{\omega}(t) \in [0, 1]$, $\tilde{\omega}(0) = 0$, and $\beta$ is the learning rate.

The weight $\tilde{\omega}_{ij}(t)$ is one-step transition prediction information; the update process is shown in Figure 4.
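Updates (4) and (5) share one time-delayed Hebb form: imprint the presynaptic activation on first visit (one-shot learning), then track it smoothly. A sketch under an assumed dictionary weight store (layout and names are ours):

```python
def hebb_update(w, i, j, n_i, n_j, lr):
    """Time-delayed Hebb rule of Eqs. (4)-(5): when the weight is
    still 0 it snapshots n_j * n_i (with the postsynaptic node
    identified, n_j = 1); afterwards it moves toward the presynaptic
    activation n_i with learning rate lr."""
    old = w.get((i, j), 0.0)
    if old == 0.0:
        w[(i, j)] = n_j * n_i                 # first visit: imprint
    else:
        w[(i, j)] = old + lr * (n_i - old)    # later visits: smooth tracking
    return w[(i, j)]

w = {}
hebb_update(w, "s1", "s2", n_i=0.81, n_j=1.0, lr=0.9)  # imprints 0.81
hebb_update(w, "s1", "s2", n_i=0.9, n_j=1.0, lr=0.9)   # moves toward 0.9
```

The same routine can serve both the $k$-step context weights $\omega_{ij}$ (presynaptic set $\tau_j$) and the one-step prediction weights $\tilde{\omega}_{ij}$ (previous winner only); only the choice of presynaptic nodes differs.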

(3) The Weight $w_{ij}(t)$ Connected to Action Nodes. The activation of an action node is only associated with the corresponding state node which directly selects this action node. We assume the state node with the maximal identifying activation value is state node $i$ at time $t$, so the connection weight $w_{ij}(t)$ connected to the currently selected action node $j$ is adjusted by

$$ w_{ij}(t)\Big|_{\,i = \arg\max n^s_i(t),\; n^a_j(t) = 1} = \begin{cases} Q_0(s_i, a_j), & w_{ij}(t-1) = 0, \\ -Q_{\mathrm{maxconst}}, & a_j \text{ is an invalid action}, \\ Q_t(s_i, a_j), & \text{else}, \end{cases} \tag{6} $$

where $w_{ij}(t)$ is represented by the $Q$ function $Q_t(s_i, a_j)$. This paper only discusses how to build a generalized environment model, not how to learn the optimal policy, so this value is set to be the exhaustive-exploration $Q$ function based on the curiosity reward. The curiosity reward is described by (7). When action $a_j$ is an invalid action, $w_{ij}(t)$ is defined to be a large negative constant, which avoids going into dead ends:

$$ r(t) = \begin{cases} C - \dfrac{n_t}{n_{\mathrm{ave}}}, & n_{\mathrm{ave}} > 0, \\ C, & n_{\mathrm{ave}} = 0, \end{cases} \tag{7} $$

where $C$ is a constant value, $n_t$ represents the count of exploration of the current action by the agent, and $n_{\mathrm{ave}}$ is the average count of exploration over all actions; $n_t / n_{\mathrm{ave}}$ represents the degree of familiarity with the currently selected action. The curiosity reward is updated when each action finishes. The $Q_t(s_i, a_j)$ update equation is given by (8), where $\lambda$ is the learning rate:

$$ Q_t(s_i, a_j) = \begin{cases} (1 - \lambda)\, Q_{t-1}(s_i, a_j) + \lambda\, r(t), & i = \arg\max n^s_i(t),\; n^a_j(t) = 1, \\ Q_{t-1}(s_i, a_j), & \text{else}. \end{cases} \tag{8} $$

The update process is shown in Figure 5. The action node $j$ is selected by $Q_t(s_i, a_j)$ according to

$$ a_j = \underset{j \in A^{\mathrm{vl}}_i,\; n^s_i(t) = 1}{\arg\max}\; Q(s_i, a_j), \tag{9} $$

where $A^{\mathrm{vl}}_i$ is the valid action set of state node $i$; $n^s_i(t) = 1$ represents that state node $i$ is identified, and the action node $j$ selected by the agent to control the robot's current action is set to $n^a_j(t) = 1$.
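The curiosity-driven exploration of (6)-(9) can be sketched as follows; the count bookkeeping and all names are our own assumptions, not the paper's code:

```python
def curiosity_reward(C, n_t, n_ave):
    """Eq. (7): the reward is high for rarely explored actions."""
    return C if n_ave == 0 else C - n_t / n_ave

def q_update(Q, s, a, reward, lam):
    """Eq. (8): exponential moving average of the curiosity reward
    for the winner state node s and selected action a."""
    Q[(s, a)] = (1 - lam) * Q.get((s, a), 0.0) + lam * reward
    return Q[(s, a)]

def select_action(Q, s, valid_actions):
    """Eq. (9): among the valid actions of the identified state node,
    pick the one with the largest Q value, i.e. the least familiar."""
    return max(valid_actions, key=lambda a: Q.get((s, a), 0.0))

Q, counts = {}, {"a1": 3, "a2": 1}          # hypothetical exploration counts
n_ave = sum(counts.values()) / len(counts)
for a, n_t in counts.items():
    q_update(Q, "s1", a, curiosity_reward(5, n_t, n_ave), lam=0.9)
# The rarely explored action a2 receives the higher Q value and is selected.
```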

3.4. The Constructing Process in STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of STAMN. There is a chicken-and-egg problem during the constructing process: the building of the STAMN depends on state identifying; conversely, state identifying depends on the current structure of the STAMN. Thus exhaustive exploration and a $k$-step variable memory length (depending on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of a state node $s_j$ depends on three identifying processes: $k$-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process.

[Figure 3: The past-oriented weight $\omega_{ij}(t)$ update process associated with state node $j$: presynaptic nodes $i \in \tau_j$ with $n^s_i(t) \ne 0$ or $n^a_i(t) \ne 0$ over the previous $k-1$ steps connect to the identified node $j$ ($n^s_j(t) = 1$).]

[Figure 4: The future-oriented weight $\tilde{\omega}_{ij}(t)$ update process associated with state node $j$: the previous winner node $i = \arg\max n^X_i(t)$ connects to the identified node $j$ ($n^s_j(t) = 1$).]

[Figure 5: The weight $w_{ij}(t)$ update process associated with action node $j$: the winner state node $i = \arg\max n^s_i(t)$ connects to the selected action node $j$ ($n^a_j(t) = 1$) with initial value $Q_0(s_i, a_j)$.]

(1) $k$-Step History Sequence Identifying. The matching degree of the current $k$-step history sequence with the identifying history sequence $h_j$ for $s_j$ is computed to identify the looping state node $j$. First we compute the presynaptic potential $V_j(t)$ for the state node $j$ according to

$$ V_j(t) = \sum_{i \in \tau_j} C_{ij}\, \phi \left( b^X_i(t), \omega_{ij} \right), \quad X \in \{s, a\}, \tag{10} $$

where $b^X_i(t)$ is the current activation value in the presynaptic node set $\tau_j$, and $C_{ij}$ is the confidence parameter, which expresses node $i$'s importance degree in the presynaptic node set $\tau_j$. The value of $C_{ij}$ can be set in advance, with $\sum_{i \in \tau_j} C_{ij} = 1$. The function $\phi$ represents the degree of similarity between the activation value of the presynaptic node and the contextual information $\omega_{ij}$ in LTM: $\phi(x, y) \in [0, 1]$, and when the similarity between $x$ and $y$ is high, $\phi(x, y)$ is close to 1. According to "winner takes all", among all nodes whose presynaptic potentials exceed the threshold $\vartheta$, the node with the largest presynaptic potential is selected.

The presynaptic potential $V_j(t)$ represents the synchronous activation process of the presynaptic node set $\tau_j$, which captures the previous $k - 1$ steps of contextual information matching for state node $j$. To realize full $k$-step history sequence matching, the $k$-step history sequence identifying activation value of the state node $j$ is given below:

$$ m^s_j(t) = H \left( V_j(t), \vartheta \right) \cdot H \left( V_j(t), \max_{k: 1 \to N_c} \left( V_k(t) \right) \right) \cdot b^s_j(t), \qquad H(x, y) = \begin{cases} 0, & x < y, \\ 1, & x \ge y, \end{cases} \tag{11} $$

where $\max_{k: 1 \to N_c}(V_k(t))$ is the maximum potential value over the nodes $k$. The product $H(V_j(t), \vartheta) \cdot H(V_j(t), \max_{k: 1 \to N_c}(V_k(t)))$ means that the node with the largest potential value is selected among all nodes whose presynaptic potentials exceed the threshold $\vartheta$, and $b^s_j(t)$ represents the matching degree of state node $j$ and the current observed value. If $m^s_j(t) = 1$, the current state $s$ is identified with looped state node $j$ by the $k$-step memory.
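A sketch of the $k$-step identifying computation (10)-(11); since the paper leaves the similarity function $\phi$ unspecified, we assume a simple $1 - |x - y|$, and the dictionary layout is our own:

```python
def potential(b, omega, C):
    """Eq. (10): presynaptic potential of one state node, a
    confidence-weighted similarity between the current presynaptic
    activations b[i] and the stored context weights omega[i]."""
    phi = lambda x, y: 1.0 - abs(x - y)   # one possible similarity phi
    return sum(C[i] * phi(b[i], omega[i]) for i in b)

def identify(potentials, b_obs, theta):
    """Eq. (11): winner-takes-all over nodes whose potential exceeds
    theta; the winner must also match the current observation
    (b_obs[j] == 1). Returns the identified node or None."""
    v_max = max(potentials.values())
    for j, v in potentials.items():
        if v >= theta and v >= v_max and b_obs.get(j, 0) == 1:
            return j
    return None
```

With perfectly matching context ($b_i = \omega_{ij}$ for every presynaptic node) the potential reaches its maximum $\sum_i C_{ij} = 1$, and identification additionally requires the current observation to match.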

(2) Following Identifying. If the current state $s$ is identified as state $s_i$, then the next transition prediction state is identified as state $s_j = f_s(s_i, a_i)$. First we compute the transition prediction value for state node $j$ according to

$$ p^s_j(t) = \sum_{i = \arg\max n^X_i(t)} D_{ij}\, \phi \left( n^X_i(t), \tilde{\omega}_{ij} \right), \quad X \in \{s, a\}, \tag{12} $$

where $n^X_i(t)$ is the identifying activation value of the state node and action node at time $t$, and $i = \arg\max n^X_i(t)$ indicates that state node $i$ and action node $i$ are the previous winner nodes. $D_{ij}$ is the confidence parameter, which expresses node $i$'s importance degree; its value can be set in advance, with $\sum_{i = \arg\max n^X_i(t)} D_{ij} = 1$. $\tilde{\omega}_{ij}$ records the one-step transition prediction information related to state node $j$, to be used in the identifying and recalling phases. If the current state $s$ is identified as the hidden state $s_i$, then $p^s_j(t)$ represents the probability that the next transition prediction state is $j$.

If the next prediction state node $j$ is the same as the current observation value, the current history sequence is identified with state node $j$; the following identifying value of state node $j$ is given below:

$$ q^s_j(t) = H \left( p^s_j(t), \vartheta \right) \cdot H \left( p^s_j(t), \max_{k: 1 \to N_c} \left( p^s_k(t) \right) \right) \cdot b^s_j(t). \tag{13} $$

If $q^s_j(t) = 1$, the current history sequence $h_t$ is identified with looped state node $j$ by the following identifying. If $p^s_j(t) \ge \max_{k: 1 \to N_c}(p^s_k(t))$ and $p^s_j(t) \ge \vartheta$ but $q^s_j(t) \ne 1$, then $b^s_j(t) \ne 1$, representing a mismatch between state node $j$ and the current observed value; there exists $\mathrm{trans}(h_t, a_i) \ne \mathrm{trans}(h_i, a_i)$. According to the transition prediction criterion (Algorithm 13), the current history sequence $h_t$ is not identified with the hidden state $s_i$, so the identifying history sequence lengths for $h_t$ and $h_i$ must separately increase to $k + 1$.

(3) Previous Identifying. If the current history sequence is identified as the state $s_j$, then the previous state $s_i$ is identified such that $s_j = f_s(s_i, a_i)$. First we compute the identifying activation value for all states $s$: if there exists $n^s_j(t) = 1$, then the current state $s_t$ is identified as state $s_j$, and the previous state $s_{t-1}$ is identified as the previous state $s_i$ of state $s_j$. The previous identifying value of the state node is defined as $d^s_i(t)$. The previous state $s_i$ is $i = \arg\max(b^s_i(t))$ satisfying Condition 1 and Condition 2; then we set $d^s_i(t) = 1$.

Condition 1:
$$ n^s_j(t) = 1. \tag{14} $$

Condition 2:
$$ \phi \left( b^s_i(t), \omega_{ij} \right) > \vartheta. \tag{15} $$

According to the above three identifying processes, the identifying activation value of the state node $j$ is defined as follows:
$$ n^s_j(t) = \max \left( m^s_j(t), q^s_j(t), d^s_j(t) \right). \tag{16} $$
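The following-identifying value (13) and the final combination (16) reduce to thresholding and a max; a sketch (names and dictionary layout are our own):

```python
def h_step(x, y):
    """Hard threshold H(x, y) from Eq. (11)."""
    return 1 if x >= y else 0

def following_identifying(p, j, b_obs, theta):
    """Eq. (13): node j is identified by transition prediction when
    its prediction value p[j] is both above theta and maximal, and
    the predicted node matches the current observation."""
    return h_step(p[j], theta) * h_step(p[j], max(p.values())) * b_obs.get(j, 0)

def identifying_activation(m, q, d):
    """Eq. (16): any one of the three identifying routes (k-step
    history m, following q, previous d) suffices to identify."""
    return max(m, q, d)
```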

According to Algorithm 11, we give Pseudocode 1 for Algorithm 11.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), computes the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4. Simulation

4.1. A Simple Example of Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First we obtain the transition instance $h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1$. According to Algorithm 11, we set the initial identifying history sequence length $k = 1$. The STAMN model is constructed incrementally using $h_t$, as in Figure 6.

When $h_t = o_1, a_1, o_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1, a_2, o_1$, because $k = 1$, the looped state is identified by the current observation vector. Thus $h_1 = o_1$ and $h_2 = o_1$ are the identifying history sequences for the state $s_1$. Since $o_1 = \mathrm{trans}(h_1, a_2)$ and $o_2 = \mathrm{trans}(h_2, a_2)$ exist in $h_t$, so that $\mathrm{trans}(h_1, a_2) \ne \mathrm{trans}(h_2, a_2)$, according to Lemma 9 $h_1$ and $h_2$ are identifying history sequences for states $s_i$ and $s_j$ respectively, with $s_i \ne s_j$. So $o_1$ is distinguished as $o_1'$ and $o_1''$, and $h_t = o_1, a_1, o_1', a_2, o_2, a_2, o_2, a_1, o_1, a_1, o_1'', a_2, o_1$.
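The Lemma 9 test used in this walkthrough, where the same observation-action pair leading to two different next observations forces a split, can be mechanized. A sketch (the list encoding of $h_t$ is our own representation):

```python
def find_aliased_observations(instance):
    """instance alternates observations and actions: [o, a, o, ..., o].
    Returns the set of observations o for which some action a yields
    two different successor observations, i.e. trans(h1, a) != trans(h2, a),
    so that o must be split into distinct hidden states."""
    seen, aliased = {}, set()
    for t in range(0, len(instance) - 2, 2):
        o, a, o_next = instance[t], instance[t + 1], instance[t + 2]
        prev = seen.setdefault((o, a), o_next)
        if prev != o_next:
            aliased.add(o)
    return aliased

h_t = ["o1", "a1", "o1", "a2", "o2", "a2", "o2", "a1",
       "o1", "a1", "o1", "a2", "o1"]
# o1 leads via a2 to both o2 and o1, so o1 must be split (o1', o1'').
print(find_aliased_observations(h_t))
```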

According to Lemma 10, for any two identifying history sequences h_i and h_j, separately for o'_1 and o''_1, there exists h_i ≠ h_j iff s_i ≠ s_j. So the identifying history sequence length for h_i and h_j, separately for s_i and s_j, must increase to 2. For o'_1 the identifying history sequence is o_1, a_1, o'_1, and for o''_1 the identifying history sequence is o_1, a_1, o''_1. These two identifying history sequences are observationally identical (both are seen as o_1, a_1, o_1), so a longer transition instance is needed.
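The state-splitting argument above can be sketched as a toy check (a hypothetical helper, not the paper's algorithm): if the same k-step history followed by the same action yields two different next observations, the two occurrences name different hidden states, so k must grow.

```python
# Toy sketch of why the identifying history length k must grow: two
# occurrences of the same k-step history that predict different next
# observations cannot name the same hidden state (Lemmas 9 and 10).

def needs_longer_history(transitions, k):
    """transitions: list of (history, action, next_obs) triples, where
    history is a tuple of past observation symbols. Returns True if one
    (k-step history, action) pair predicts two different next
    observations, i.e. trans(h_i, a) != trans(h_j, a)."""
    seen = {}
    for hist, act, nxt in transitions:
        key = (hist[-k:], act)
        if key in seen and seen[key] != nxt:
            return True
        seen[key] = nxt
    return False

# With k = 1 both occurrences of o1 look identical, but a2 leads to o1
# in one case and o2 in the other -- the hidden states must differ:
t = [(("o1",), "a2", "o1"), (("o1",), "a2", "o2")]
assert needs_longer_history(t, 1)
```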

Figure 6: The STAMN model with memory depth 1 (state nodes s_1, s_2 with f_o(s_1) = o_1 and f_o(s_2) = o_2, connected by actions a_1, a_2).

Table 1: The identifying history sequences for o'_1 and o''_1.

Hidden state      Identifying history sequences
o'_1  (k = 2)     o''_1, a_2, o'_1;  o_2, a_1, o'_1
o''_1 (k = 3)     o_2, a_1, o_1, a_1, o''_1

When h_t = o_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o_1, a_1, o''_1, a_2, o'_1, a_2, o_2, a_1, o'_1, a_2, o_2, a_1, o_1, a_1, o_1, a_1, o'_1, a_2, o_2: in h_t we obtain identifying history sequences h_i ≠ h_j for the states o'_1 and o''_1. For k = 2, the distinguished identifying history sequences for o'_1 are o''_1, a_2, o'_1 and o_2, a_1, o'_1. However, the distinguished identifying history sequence for o''_1 is o_1, a_1, o''_1, which is not discriminated from the identifying history sequence for o'_1. So the identifying history sequence length for o''_1 must increase to 3. The identifying history sequences for o'_1 and o''_1 are described in Table 1.

According to k-step history identifying, following identifying, and previous identifying, h_t can be represented as h_t = o''_1, a_1, o'_1, a_2, o_2, a_2, o_2, a_1, o'_1, a_1, o''_1, a_2, o'_1, a_2, o_2, a_1, o'_1, a_2, o_2, a_1, o'_1, a_1, o''_1, a_1, o'_1, a_2, o_2.

According to Pseudocode 1, the STAMN model is constructed incrementally using h_t, as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After 10 timesteps, the algorithms use the currently learned model to realize transition prediction, and the transition prediction criterion is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.

10 Computational Intelligence and Neuroscience

Set initial memory depth k = 1, t = 0.
All nodes and connection weights do not exist initially in the STAMN.
The transition instance h_0 = ∅.
A STAMN is constructed incrementally through the agent's interaction with the environment, expanding the STAMN until a transition prediction contradicts the current minimal hypothesis model; the hypothesis model is then reconstructed by increasing the memory depth k. The constructing process includes identifying and recalling simultaneously.

while one pattern o_t or a_t can be activated from the pattern cognition layer do
    h_t = (h_{t-1}, o_t, a_t)
    for all existing state nodes do
        compute the identifying activation value by executing Algorithm 12
        if there exists n^s_j(t) = 1 then
            the looped state is identified, and the weights ω_ij, ω̃_ij are adjusted according to (4), (5)
        else
            a new state node j is created; set n^s_j(t) = 1, and the weights ω_ij, ω̃_ij are adjusted according to (4), (5)
        end if
    end for
    for all existing state nodes do
        compute the transition prediction criterion by executing Algorithm 13
        if IsNotTrans(s_j) then
            for the previous winner state node i of the state node j do
                set memory depth k = k + 1 until each h_i is discriminated
            end for
            reconstruct the hypothesis model with the new memory depth k according to Algorithm 11
        end if
    end for
    for all existing action nodes do
        compute the activation value according to (9)
        if there exists n^a_j(t) = 1 then
            the weight w_ij(t) is adjusted according to (6)
        else
            a new action node j is created; set n^a_j(t) = 1, and the weight w_ij(t) is adjusted according to (6)
        end if
    end for
    the agent obtains the new observation vector by o_{t+1} = trans(s_t, a_t)
    t = t + 1
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.

IsNotTrans(s_j):
    if p^s_j(t) > θ according to (12) and q^s_j(t) = 1 according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).
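A direct Python transliteration of Pseudocode 2, with p^s_j(t) and q^s_j(t) passed in as plain numbers (an assumption for illustration):

```python
# Transliteration of Pseudocode 2: the transition prediction criterion.
# p_j is the transition prediction value from (12), q_j the following
# identifying value from (13), theta the threshold.

def is_not_trans(p_j, q_j, theta=0.9):
    """Return False only when the prediction fired (p_j > theta) and was
    confirmed by the observation (q_j == 1); otherwise the transition
    prediction is contradicted and the memory depth k must grow."""
    if p_j > theta and q_j == 1:
        return False
    return True

assert is_not_trans(0.95, 1) is False   # prediction confirmed
assert is_not_trans(0.95, 0) is True    # prediction contradicted
```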

Figure 8 shows that all three algorithms can finally produce the correct model with zero prediction error because there is no noise. However, the STAMN with exhaustive exploration has better performance and faster convergence speed because of the exhaustive exploration Q-function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem shown in Figure 9 is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall in the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem, the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and


Figure 7: (a) The STAMN model (state nodes s_1, s_2, s_3 with f_o(s_1) = o_1, f_o(s_2) = o_2, f_o(s_3) = o_1, connected by actions a_1, a_2). (b) The LPST model (a tree over observations o_1, o_2 branching on actions a_1, a_2).

Figure 8: Comparative performance results for a simple deterministic POMDP: prediction error (0 to 1) versus timesteps (0 to 100) for STAMN (exhaustive exploration), STAMN (random exploration), and LPST.

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black grid cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration still has better performance and faster convergence speed in the 4 × 3 grid problem.

The number of state nodes and action nodes in the STAMN and the LPST is described in Table 2. In the STAMN, the state node

Figure 9: The 4 × 3 grid problem (each cell is labeled with its observation state and hidden state).

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm    State nodes    Action nodes
LPST         70             50
STAMN        11             44

represents the hidden state, but in the LPST the state node represents an observation value, and the hidden state is expressed by k-step past observations and actions; thus more observation nodes and action nodes are repeatedly created, and most of the observation nodes and action nodes are the same. The LPST has perfect observation prediction capability, the same as the STAMN, but it is not a state transition model.

In the STAMN, the setting of the parameters μ, η, γ^s_i, δ^a_i is very important, because it determines the depth k of short-term


Figure 10: Comparative performance results for the 4 × 3 grid problem: prediction error (0 to 1) versus timesteps (0 to 1000) for STAMN (exhaustive exploration), STAMN (random exploration), and LPST.

memory. We assume γ^s_i = δ^a_i = 0.9 and μ = η = 0.9 initially, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2, μ = μ − 0.1. The other parameters are determined relatively easily. We set the learning rates α = β = γ = 0.9, and set the confidence parameters C_ij = 1/|τ_j| and D_ij = 1/2, representing equal importance degrees. We set the constant values C = 5 and θ = 0.9.
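The parameter schedule just described can be sketched as follows (a minimal reading of the text's decrements; the rounding is only to suppress floating-point noise):

```python
# Sketch of the short-term-memory parameter schedule: increasing the
# memory depth k from 1 is realized by decaying eta and mu.

def schedule(k):
    """Return (mu, eta) for memory depth k, starting from mu = eta = 0.9
    at k = 1 and applying eta -= 0.2, mu -= 0.1 per unit increase of k."""
    mu, eta = 0.9, 0.9
    for _ in range(k - 1):
        eta -= 0.2
        mu -= 0.1
    return round(mu, 2), round(eta, 2)

assert schedule(1) == (0.9, 0.9)
assert schedule(3) == (0.7, 0.5)
```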

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By wall-following behaviour, the agent can recognize the left-wall, right-wall, and corridor landmarks. These observation landmarks are different from the observation vectors in the previous grid problem: every observation landmark has a different duration time. For analysis of the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, the robot has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the paths 2-3-4-5-6 and 12-13-14-15-16 are identical, and a memory depth k = 6 is necessary to identify hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which bears the fault and noise in the sequence item duration time and can realize reliable recalling. However, the LPST cannot realize correct convergence in reasonable time because of its accurate matching.

5. Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST, the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive accurate matching, which is sensitive to noise and faults; the STAMN offers the neuron synchronized activation mechanism to realize recalling, so even in a noisy and disturbed environment the STAMN can still realize reliable recalling. Finally, the algorithm for learning a LPST is an additional computational model, not a distributed one. The STAMN is a distributed network and uses the synchronized activation mechanism, whose performance does not degrade as history sequences and scale increase.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. In existing AMN, firstly, almost all models are unable to handle complex sequences with looped hidden states; the STAMN indeed realizes identifying the looped hidden state, which can be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth k is variable in different parts of the state space. Finally, the existing AMN models generally only record temporal orders with discrete intervals Δt rather than sequence item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, the STAMN is proposed to identify the looped hidden states only by the transition instances in a deterministic POMDP environment. The learned STAMN is seen as a variable-depth k-Markov model. We proposed the heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a real self-organizing, incremental, unsupervised learning model. These transition instances can


Figure 11: (a) The complex symmetrical simulation environment (hidden states numbered 1–24). (b) The disturbed environment.

Figure 12: (a) Comparative results for a noise-free environment. (b) Comparative results for a disturbed environment. Both panels plot prediction error (0 to 1) versus timesteps (0 to 500) for STAMN and LPST.

be obtained through the agent's interaction with the environment, or from a quantity of training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden-state methods in a deterministic POMDP environment. The STAMN can generally be applied to almost all temporal sequence problems, such as simultaneous localization and mapping (SLAM), robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests; how to incorporate reinforcement learning to produce a new distributed reinforcement learning model; and how to apply it to robot SLAM to resolve the practical navigation problem.

Competing Interests

The author declares that there is no conflict of interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State, Ph.D. thesis, University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I. Unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists, Ph.D. thesis, University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.



set of current state node j. The presynaptic node set of current state node j is expressed by τ_j, where n^s_i(t), n^a_i(t) represent the activation values at time t; n^s_i(t), n^a_i(t) record not only the temporal order but also the duration time because of the self-decay of neuron activity. If n^s_i(t), n^a_i(t) are smaller, node i fired earlier relative to current state node j. ω_ij(t) is the activation weight between presynaptic node i and state node j; ω_ij(t) records context information related to state node j to be used in identifying and recalling, where ω_ij(t) ∈ [0, 1], ω_ij(0) = 0, and α is the learning rate.

The weight ω_ij(t) is time-related contextual information; the update process is shown in Figure 3.

(2) One-Step Transition Prediction Weight ω̃_ij(t). The weight ω̃_ij(t) (written with a tilde here to distinguish it from the past-oriented weight ω_ij(t)) connected to state node j represents the one-step transition prediction in LTM. This is a future-oriented behaviour in the STAMN. Using the time-delayed Hebb learning rule, the weight ω̃_ij(t) is adjusted according to

\tilde{\omega}_{ij}(t)\Big|_{\substack{i = \arg\max n^{X}_{i}(t)\\ n^{s}_{j}(t) = 1}} =
\begin{cases}
n^{s}_{j}(t)\, n^{X}_{i}(t), & \tilde{\omega}_{ij}(t-1) = 0,\ X \in \{s, a\},\\
\tilde{\omega}_{ij}(t-1) + \beta \left(n^{X}_{i}(t) - \tilde{\omega}_{ij}(t-1)\right), & \text{else}.
\end{cases}
\quad (5)

The transition activation of current state node j is only associated with the previous winning state node and action node, where j is the current identifying activation state node, n^s_j(t) = 1. The previous winning state node and action node are presented by i = argmax n^X_i(t), X ∈ {s, a}, where ω̃(t) ∈ [0, 1], ω̃(0) = 0, and β is the learning rate.
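A sketch of the update rule (5) in Python (variable names are illustrative; the activations are assumed to be plain numbers in [0, 1]):

```python
# Sketch of the time-delayed Hebb update (5) for the one-step transition
# prediction weight (written w_tilde here). n_i is the activation of the
# previous winner node, n_j that of the current identified state node;
# beta is the learning rate.

def update_transition_weight(w_tilde, n_i, n_j, beta=0.9):
    if w_tilde == 0:           # first time this transition is observed
        return n_j * n_i
    return w_tilde + beta * (n_i - w_tilde)

# The first observation initializes the weight; later ones move it toward n_i:
w = update_transition_weight(0, 1.0, 1.0)
assert w == 1.0
w = update_transition_weight(0.5, 1.0, 1.0)
assert abs(w - 0.95) < 1e-9
```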

The weight ω̃_ij(t) is one-step transition prediction information; the update process is shown in Figure 4.

(3) The Weight w_ij(t) Connected to Action Nodes. The activation of an action node is only associated with the corresponding state node, which selects this action node directly. We assume the state node with the maximal identifying activation value is state node i at time t, so the connection weight w_ij(t) connected to the currently selected action node j is adjusted by

w_{ij}(t)\Big|_{\substack{i = \arg\max n^{s}_{i}(t)\\ n^{a}_{j}(t) = 1}} =
\begin{cases}
Q_0(s_i, a_j), & w_{ij}(t-1) = 0,\\
-Q_{\max\text{const}}, & a_j \text{ is an invalid action},\\
Q_t(s_i, a_j), & \text{else},
\end{cases}
\quad (6)

where w_ij(t) is represented by the Q function Q_t(s_t, a_j). This paper only discusses how to build a generalized environment model, not how to learn the optimal policy, so this value is set to be the exhaustive exploration Q function based on the curiosity reward. The curiosity reward is described by (7). When action a_j is an invalid action, w_ij(t) is defined to be a large negative constant, which avoids going to dead ends:

r(t) =
\begin{cases}
C - \dfrac{n_t}{n_{\text{ave}}}, & n_{\text{ave}} > 0,\\
C, & n_{\text{ave}} = 0,
\end{cases}
\quad (7)

where C is a constant value, n_t represents the count of explorations of the current action by the agent, and n_ave is the average count of explorations of all actions by the agent; n_t/n_ave represents the degree of familiarity with the currently selected action. The curiosity reward is updated when each action is finished. The Q_t(s_i, a_j) update equation is shown in (8), where λ is the learning rate:

Q_t(s_i, a_j) =
\begin{cases}
(1 - \lambda)\, Q_{t-1}(s_i, a_j) + \lambda\, r(t), & i = \arg\max n^{s}_{i}(t),\ n^{a}_{j}(t) = 1,\\
Q_{t-1}(s_i, a_j), & \text{else}.
\end{cases}
\quad (8)

The update process is shown in Figure 5. The action node j is selected by Q_t(s_t, a_j) according to

a_j = \arg\max_{j \in A^{vl}_{i},\ n^{s}_{i}(t) = 1} Q(s_i, a_j), \quad (9)

where A^vl_i is the valid action set of state node i, and n^s_i(t) = 1 represents that the state node i is identified; for the action node j selected by the agent to control the robot's current action, we set n^a_j(t) = 1.
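Equations (7)–(9) together define the exhaustive-exploration Q-function; a compact sketch, with hypothetical dictionary-based bookkeeping (counts[a] is how often action a was tried in the current state):

```python
# A compact sketch of the exhaustive-exploration Q-function (7)-(9).
# All container names are illustrative assumptions.

def curiosity_reward(counts, a, C=5.0):
    """r(t) = C - n_t / n_ave  (equation (7)); r = C if nothing was tried."""
    n_ave = sum(counts.values()) / len(counts)
    return C if n_ave == 0 else C - counts[a] / n_ave

def q_update(Q, s, a, counts, lam=0.9):
    """Q_t(s, a) = (1 - lam) Q_{t-1}(s, a) + lam r(t)  (equation (8))."""
    Q[s, a] = (1 - lam) * Q.get((s, a), 0.0) + lam * curiosity_reward(counts, a)
    return Q[s, a]

def select_action(Q, s, valid_actions):
    """a_j = argmax over the valid action set of Q(s, a)  (equation (9))."""
    return max(valid_actions, key=lambda a: Q.get((s, a), 0.0))

Q, counts = {}, {"a1": 3, "a2": 0}
q_update(Q, "s1", "a1", counts)
q_update(Q, "s1", "a2", counts)
# a2 was never explored, so its curiosity reward (and hence Q value) is
# larger, and exhaustive exploration prefers it:
assert select_action(Q, "s1", ["a1", "a2"]) == "a2"
```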

3.4. The Constructing Process in STAMN. In order to construct the minimal looping hidden state transition model, we need to identify the looped states by identifying hidden states. The identifying phase and the recalling phase (transition prediction phase) exist simultaneously in the constructing process of the STAMN. There is a chicken-and-egg problem during the constructing process: the building of the STAMN depends on state identifying; conversely, state identifying depends on the current structure of the STAMN. Thus exhaustive exploration and a k-step variable memory length (depending on the prediction criterion) are used to try to avoid the local minima that this interdependence causes.

According to Algorithm 12, the identifying activation value of state node s_j depends on three identifying processes: k-step history sequence identifying, following identifying, and previous identifying. We provide calculation equations for each identifying process.

(1) k-Step History Sequence Identifying. The matching degree of the current k-step history sequence with the identifying


Figure 3: Past-oriented weights ω_ij(t) update process associated with state node j (presynaptic observation and action nodes o_{j−k+1}, a_{j−k+1}, …, o_{j−1}, a_{j−1} connect to the state node j with n^s_j(t) = 1).

Figure 4: Future-oriented weights ω̃_ij(t) update process associated with state node j (the previous winner nodes o_i, a_i connect to the current node o_j with n^s_j(t) = 1).

Figure 5: Weights w_ij(t) update process associated with action node j (the winner state node o_i connects to the action node a_j with n^a_j(t) = 1, initialized by Q_0(s_i, a_j)).

history sequence h_j for s_j is computed to identify the looping state node j. First we compute the presynaptic potential V_j(t) for the state node j according to

V_j(t) = \sum_{i \in \tau_j} C_{ij}\, \phi\left(b^{X}_{i}(t), \omega_{ij}\right), \quad X \in \{s, a\}, \quad (10)

where b^X_i(t) is the current activation value in the presynaptic node set τ_j, and C_ij is the confidence parameter, which expresses node i's importance degree in the presynaptic node set τ_j. The value of C_ij can be set in advance, and Σ_{i∈τ_j} C_ij = 1. The function ϕ represents the similarity degree between the activation value of the presynaptic node and the contextual information ω_ij in LTM; ϕ(x, y) ∈ [0, 1], and the more similar x and y are, the closer ϕ(x, y) is to 1. According to "winner takes all", among all nodes whose presynaptic potentials exceed the threshold θ, the node with the largest presynaptic potential is selected.

The presynaptic potential V_j(t) represents the synchronous activation process of the presynaptic node set τ_j, which represents the previous (k − 1)-step contextual information matching of state node j. To realize full k-step history sequence matching, the k-step history sequence identifying activation value of the state node j is given below:

m^{s}_{j}(t) = H\left(V_j(t), \theta\right) \cdot H\left(V_j(t), \max_{k: 1 \to N_c} V_k(t)\right) \cdot b^{s}_{j}(t),

H(x, y) =
\begin{cases}
0, & x < y,\\
1, & x \ge y,
\end{cases}
\quad (11)

where max_{k: 1→N_c}(V_k(t)) is the maximum potential value over all nodes k. The product H(V_j(t), θ) · H(V_j(t), max_{k: 1→N_c}(V_k(t))) means the node with the largest potential value is selected among all nodes whose synaptic potentials exceed the threshold θ, and b^s_j(t) represents the matching degree of state node j and the current observed value. If m^s_j(t) = 1, the current state s is identified to looped state node j by the k-step memory.
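A sketch of (10) and (11) under an assumed similarity function ϕ(x, y) = 1 − |x − y| (the paper only requires ϕ to lie in [0, 1] and approach 1 for similar arguments):

```python
# Sketch of (10) and (11): the presynaptic potential V_j is a
# confidence-weighted similarity between the stored context w_ij and the
# current presynaptic activations b_i; the node fires only if it exceeds
# the threshold theta AND wins among all candidate nodes.

def H(x, y):
    """Step function from (11): 1 if x >= y, else 0."""
    return 1 if x >= y else 0

def presynaptic_potential(b, w, C):
    """V_j(t) = sum_i C_ij * phi(b_i, w_ij)  -- equation (10)."""
    phi = lambda x, y: 1.0 - abs(x - y)          # assumed similarity
    return sum(C[i] * phi(b[i], w[i]) for i in range(len(b)))

def k_step_identifying(V_j, all_V, b_j, theta=0.9):
    """m^s_j(t) = H(V_j, theta) * H(V_j, max_k V_k) * b^s_j(t)  -- (11)."""
    return H(V_j, theta) * H(V_j, max(all_V)) * b_j

b, w, C = [0.9, 0.8], [0.9, 0.8], [0.5, 0.5]     # perfect context match
V = presynaptic_potential(b, w, C)
assert V == 1.0
assert k_step_identifying(V, [V, 0.4], 1) == 1   # winner above threshold fires
```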

(2) Following Identifying. If the current state s is identified as the state s_i, then the next transition prediction state is identified as the state s_j = f_s(s_i, a_i). First we compute the transition prediction value for the state node j according to

p^{s}_{j}(t) = \sum_{i = \arg\max n^{X}_{i}(t)} D_{ij}\, \phi\left(n^{X}_{i}(t), \tilde{\omega}_{ij}\right), \quad X \in \{s, a\}, \quad (12)

where n^X_i(t) is the identifying activation value of the state node and action node at time t, and i = argmax n^X_i(t) indicates that state node i and action node i are the previous winner nodes. D_ij is the confidence parameter, which expresses node i's importance degree; the value of D_ij can be set in advance, and Σ_{i = argmax n^X_i(t)} D_ij = 1. ω̃_ij records one-step transition prediction information related to state node j to be used in the identifying and recalling phases. If the current state s is identified as the hidden state s_i, p^s_j(t) represents the probability that the next transition prediction state is j.

If the next prediction state node j is the same as the current observation value, the current history sequence is identified with the state node j. The following identifying value of the state node j is given below:

\[
q_j^s(t) = H\bigl(p_j^s(t), \vartheta\bigr) \cdot H\Bigl(p_j^s(t),\ \max_{k=1\to N_c} p_k^s(t)\Bigr)\, b_j^s(t)
\tag{13}
\]
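A minimal sketch of (12) and (13), under assumed data structures: `winners` maps each previous winner node to its identifying activation value, `D` and `omega` hold the confidence parameters and one-step transition weights, and `phi` is a stand-in similarity measure (the paper defines its own φ earlier):

```python
def phi(x, y):
    # assumed similarity measure in [0, 1]; the paper's own phi differs
    return 1.0 - abs(x - y)

def transition_prediction(winners, D, omega):
    # p_j = sum over previous winner nodes i of D_ij * phi(n_i, omega_ij)  (12)
    return sum(D[i] * phi(n_i, omega[i]) for i, n_i in winners.items())

def following_identifying(p_j, theta, b_j, p_all):
    # q_j = H(p_j, theta) * H(p_j, max_k p_k) * b_j  (13)
    gate = (1 if p_j >= theta else 0) * (1 if p_j >= max(p_all) else 0)
    return gate * b_j

# one winner state node and one winner action node; confidences sum to 1
p = transition_prediction({'s3': 1.0, 'a1': 1.0},
                          D={'s3': 0.5, 'a1': 0.5},
                          omega={'s3': 1.0, 'a1': 1.0})
print(p, following_identifying(p, theta=0.9, b_j=1, p_all=[p, 0.2]))
# -> 1.0 1
```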

If q^s_j(t) = 1, the current history sequence h_t is identified with looped state node j by the following identifying. If p^s_j(t) ≥ max_{k=1→N_c}(p^s_k(t)) and p^s_j(t) ≥ ϑ but q^s_j(t) ≠ 1, then b^s_j(t) ≠ 1, representing a mismatch between state node j and the current observed value, so trans(h_t, a_i) ≠ trans(h_i, a_i). According to the transition prediction criterion (Algorithm 13), the current history sequence h_t is not identified with the hidden state s_i, so the identifying history sequence length for h_t and h_i separately must increase to k + 1.

(3) Previous Identifying. If the current history sequence is identified as the state s_j, then the previous state s_i is identified such that s_j = f^s(s_i, a_i). First we compute the identifying activation value for all states s. If there exists n^s_j(t) = 1, then the current state s_t is identified as state s_j, and the previous state s_{t−1} is identified as the previous state s_i of state s_j. The previous identifying value of the state node is defined as d^s_i(t). The previous state s_i is given by i = argmax(b^s_i(t)) subject to Condition 1 and Condition 2; then we set d^s_i(t) = 1.

Condition 1:

\[
n_j^s(t) = 1
\tag{14}
\]

Condition 2:

\[
\phi\bigl(b_i^s(t), \omega_{ij}\bigr) > \vartheta
\tag{15}
\]

According to the above three identifying processes, the identifying activation value of the state node j is defined as follows:

\[
n_j^s(t) = \max\bigl(m_j^s(t),\, q_j^s(t),\, d_j^s(t)\bigr)
\tag{16}
\]
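The previous identifying step and the final combination in (16) can be sketched as follows; the candidate dictionaries and the stand-in similarity `phi` are illustrative assumptions, not the paper's exact data structures:

```python
def phi(x, y):
    # assumed similarity measure in [0, 1]; the paper defines its own phi
    return 1.0 - abs(x - y)

def previous_identifying(n_j, b_candidates, omega_j, theta):
    # Choose i = argmax b_i(t) among candidates that satisfy
    # Condition 1: n_j(t) = 1 (14) and Condition 2: phi(b_i, omega_ij) > theta (15).
    if n_j != 1:                                    # Condition 1
        return None
    valid = {i: b for i, b in b_candidates.items()
             if phi(b, omega_j[i]) > theta}         # Condition 2
    return max(valid, key=valid.get) if valid else None

def identifying_activation(m_j, q_j, d_j):
    # n_j(t) = max(m_j(t), q_j(t), d_j(t))  (16): a state node is active
    # if any of the three identifying processes recognizes it
    return max(m_j, q_j, d_j)

print(previous_identifying(1, {'s1': 0.95, 's2': 0.4},
                           {'s1': 1.0, 's2': 1.0}, theta=0.9))  # -> s1
print(identifying_activation(0, 0, 1))                          # -> 1
```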

The pseudocode for Algorithm 11 is given in Pseudocode 1.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4. Simulation

4.1. A Simple Example of Deterministic POMDP. We want to construct the looping transition model shown in Figure 1. First we obtain the transition instance h_t: o1 a1 o1 a2 o2 a2 o2 a1 o1 a1 o1. According to Algorithm 11, set the initial identifying history sequence length k = 1. The STAMN model is constructed incrementally using h_t as in Figure 6.

When h_t = o1 a1 o1 a2 o2 a2 o2 a1 o1 a1 o1 a2 o1, because k = 1 the looped state is identified by the current observation vector. Thus h_1 = o1 and h_2 = o1 are the identifying history sequences for the state s_1. Since o1 = trans(h_1, a2) and o2 = trans(h_2, a2) both occur in h_t, we have trans(h_1, a2) ≠ trans(h_2, a2); according to Lemma 9, h_1 and h_2 are the identifying history sequences for the states s_i and s_j respectively, and s_i ≠ s_j. So o1 is distinguished as o1′ and o1′′, and h_t = o1 a1 o1′ a2 o2 a2 o2 a1 o1 a1 o1′′ a2 o1.

According to Lemma 10, for any two identifying history sequences h_i and h_j separately for o1′ and o1′′, h_i ≠ h_j must hold iff s_i ≠ s_j. So the identifying history sequence length for h_i and h_j, separately for s_i and s_j, must increase to 2. For o1′ the identifying history sequence is o1 a1 o1′, and for o1′′ the identifying history sequence is o1 a1 o1′′. These two identifying history sequences are identical (both read o1 a1 o1 before the split), so a longer transition instance is needed.
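The split of o1 into o1′ and o1′′ in this walk-through can be reproduced by scanning the transition instance for identifying-history clashes: the same k-step history followed, under the same action, by two different observations reveals two distinct hidden states (Lemma 9). Representing the instance as a flat list of alternating observations and actions is our illustrative assumption:

```python
def find_alias(trace):
    # trace alternates observations and actions: [o, a, o, a, ..., o].
    # With memory depth k = 1, an identifying history is a single
    # observation; record the successor observation of each
    # (history, action) pair and report the first clash.
    seen = {}
    for idx in range(0, len(trace) - 2, 2):
        history, action, nxt = trace[idx], trace[idx + 1], trace[idx + 2]
        if (history, action) in seen and seen[(history, action)] != nxt:
            return (history, action)   # same history and action, different successor
        seen[(history, action)] = nxt
    return None

# h_t from the text: o1 a1 o1 a2 o2 a2 o2 a1 o1 a1 o1 a2 o1
h = ['o1', 'a1', 'o1', 'a2', 'o2', 'a2', 'o2', 'a1', 'o1', 'a1', 'o1', 'a2', 'o1']
print(find_alias(h))  # -> ('o1', 'a2'): o1 must be split into o1' and o1''
```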

Figure 6: The STAMN model with memory depth 1 (state nodes s1 and s2, with f_o(s1) = o1 and f_o(s2) = o2, linked by the actions a1 and a2).

Table 1: The identifying history sequences for o1′ and o1′′.

Hidden state    Identifying history sequences
o1′ (k = 2)     o1 a1 o1′;  o1′′ a2 o1′;  o2 a1 o1′
o1′′ (k = 3)    o2 a1 o1 a1 o1′′

When h_t = o1 a1 o1′ a2 o2 a2 o2 a1 o1 a1 o1′′ a2 o1′ a2 o2 a1 o1′ a2 o2 a1 o1 a1 o1 a1 o1′ a2 o2, we obtain in h_t the identifying history sequences h_i and h_j for the states o1′ and o1′′. For k = 2, the distinguished identifying history sequences for o1′ are o1′′ a2 o1′ and o2 a1 o1′. However, the distinguished identifying history sequence for o1′′ is o1 a1 o1′′, which is not discriminated from the identifying history sequence for o1′. So the identifying history sequence length for o1′′ must increase to 3. The identifying history sequences for o1′ and o1′′ are described in Table 1.

According to the k-step history identifying, following identifying, and previous identifying, h_t can be represented as h_t = o1′′ a1 o1′ a2 o2 a2 o2 a1 o1′ a1 o1′′ a2 o1′ a2 o2 a1 o1′ a2 o2 a1 o1′ a1 o1′′ a1 o1′ a2 o2.

According to Pseudocode 1, the STAMN model is constructed incrementally using h_t as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

After 10 timesteps, the algorithms use the currently learned model to realize the transition prediction, and the transition prediction criterion is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.
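This evaluation protocol can be sketched as follows; `toy_run` is an invented stand-in for a learner whose prediction error drops to zero once its model is correct, not the STAMN itself:

```python
import random

def average_prediction_error(run_once, n_runs=10, seed=0):
    # run_once(rng) -> list of per-timestep prediction errors in [0, 1];
    # the plotted data points are elementwise averages over n_runs runs
    rng = random.Random(seed)
    runs = [run_once(random.Random(rng.random())) for _ in range(n_runs)]
    horizon = min(len(r) for r in runs)
    return [sum(r[t] for r in runs) / n_runs for t in range(horizon)]

def toy_run(rng, horizon=20):
    # stand-in learner: full error until a random convergence time, then zero
    learned_at = rng.randint(5, 10)
    return [1 if t < learned_at else 0 for t in range(horizon)]

curve = average_prediction_error(toy_run)
print(curve[0], curve[-1])  # -> 1.0 0.0
```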


Set initial memory depth k = 1, t = 0.
No nodes or connection weights exist initially in the STAMN; the transition instance h_0 = ∅.
A STAMN is constructed incrementally through the agent's interaction with the environment, expanding the STAMN until a transition prediction contradicts the current minimal hypothesis model; the hypothesis model is then reconstructed by increasing the memory depth k. In this paper the constructing process includes identifying and recalling simultaneously.

while one pattern o_t or a_t can be activated from the pattern cognition layer do
    h_t = (h_{t-1}, o_t, a_t)
    for all existing state nodes do
        compute the identifying activation value by executing Algorithm 12
        if there exists n^s_j(t) = 1 then
            the looped state is identified, and the past- and future-oriented weights ω_ij are adjusted according to (4), (5)
        else
            a new state node j is created; set n^s_j(t) = 1, and the past- and future-oriented weights ω_ij are adjusted according to (4), (5)
        end if
    end for
    for all existing state nodes do
        compute the transition prediction criterion by executing Algorithm 13
        if IsNotTrans(s_j) then
            for the previous winner state node i of the state node j do
                set memory depth k = k + 1 until each h_i is discriminated
            end for
            reconstruct the hypothesis model with the new memory depth k according to Algorithm 11
        end if
    end for
    for all existing action nodes do
        compute the activation value according to (9)
        if there exists n^a_j(t) = 1 then
            the weight w_ij(t) is adjusted according to (6)
        else
            a new action node j is created; set n^a_j(t) = 1, and the weight w_ij(t) is adjusted according to (6)
        end if
    end for
    the agent obtains the new observation vector by o_{t+1} = trans(s_t, a_t)
    t = t + 1
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.

IsNotTrans(s_j):
    if p^s_j(t) > ϑ according to (12) and q^s_j(t) = 1 according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).
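Pseudocode 2 translates directly; `p_j` and `q_j` below stand for the quantities computed in (12) and (13):

```python
def is_not_trans(p_j, q_j, theta=0.9):
    # True when the transition prediction for state node j fails, i.e. the
    # prediction value is too low or the following identifying value does
    # not confirm the predicted node (recalling phase)
    return not (p_j > theta and q_j == 1)

print(is_not_trans(0.95, 1))  # -> False: prediction confirmed
print(is_not_trans(0.95, 0))  # -> True: predicted node not identified
```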

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the better performance and the faster convergence speed because of the exhaustive exploration Q-function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem, shown in Figure 9, is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall in the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem, the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and

Figure 7: (a) The STAMN model (state nodes s1, s2, s3 with f_o(s1) = o1, f_o(s2) = o2, f_o(s3) = o1, connected by the actions a1 and a2). (b) The LPST model (a suffix tree with root r, observation nodes o1 and o2, and action-labeled branches).

Figure 8: Comparative performance results for a simple deterministic POMDP (prediction error, 0 to 1, versus timesteps, 0 to 100, for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state, and the bottom figure represents the hidden state. The black grid cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration still has the better performance and the faster convergence speed in the 4 × 3 grid problem.

The numbers of state nodes and action nodes in the STAMN and the LPST are described in Table 2. In the STAMN, the state node

Figure 9: The 4 × 3 grid problem (in each cell, the upper figure is the observation value and the lower figure is the hidden state; the black cell is an obstacle).

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm    Number of state nodes    Number of action nodes
LPST         70                       50
STAMN        11                       44

represents the hidden state, but in the LPST the state node represents an observation value and the hidden state is expressed by the k-step past observations and actions; thus many observation nodes and action nodes are repeatedly created, and most of them are the same. The LPST has the same perfect observation prediction capability as the STAMN, but it is not a state transition model.

In the STAMN, the setting of the parameters μ, η, γ^s_i, and δ^a_i is very important, since it determines the depth k of the short-term

Figure 10: Comparative performance results for the 4 × 3 grid problem (prediction error, 0 to 1, versus timesteps, 0 to 1000, for the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST).

memory. We assume γ^s_i = δ^a_i = 0.9 and μ = η = 0.9 initially, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2 and μ = μ − 0.1. The other parameters are determined relatively easily: we set the learning rates α = β = γ = 0.9 and the confidence parameters C_ij = 1/|τ_j| and D_ij = 1/2, representing equal importance degrees. We set the constant values C = 5 and ϑ = 0.9.
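The stated schedule (μ = η = 0.9 for k = 1, then η ← η − 0.2 and μ ← μ − 0.1 per unit increase of k) can be captured in a small helper; treating the decrements as applied once per increment of depth is our reading of the text:

```python
def decay_parameters(k):
    # mu, eta for memory depth k, starting from mu = eta = 0.9 at k = 1
    # and applying eta -= 0.2, mu -= 0.1 for each further unit of depth
    mu, eta = 0.9, 0.9
    for _ in range(k - 1):
        eta -= 0.2
        mu -= 0.1
    return round(mu, 2), round(eta, 2)

print(decay_parameters(1))  # -> (0.9, 0.9)
print(decay_parameters(3))  # -> (0.7, 0.5)
```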

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By following wall behaviour, the agent can recognize the left-wall, right-wall, and corridor landmarks. These observation landmarks differ from the observation vectors in the previous grid problem: every observation landmark has a different duration time. For analysis of the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, it has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the path 2-3-4-5-6 and the path 12-13-14-15-16 are identical, and a memory depth k = 6 is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and in the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which tolerates fault and noise in the sequence item duration times and can realize reliable recalling. However, the LPST cannot realize correct convergence in reasonable time because of its accurate matching.

5. Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations, not hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive accurate matching, which is sensitive to noise and faults; the STAMN offers the neuron synchronized activation mechanism to realize recalling, so even in a noisy and disturbed environment the STAMN can still realize reliable recalling. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network and uses the synchronized activation mechanism, whose performance does not degrade as the history sequences and the scale increase.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. In existing AMN, firstly, almost all models are unable to handle complex sequences with looped hidden states; the STAMN realizes identifying of the looped hidden states and so can be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth k is variable in different parts of the state space. Finally, the existing AMN models generally only record the temporal order with discrete intervals Δt rather than sequence item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, the STAMN is proposed to identify the looped hidden states only from the transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth k-Markov model. We proposed a heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. These transition instances can


Figure 11: (a) Complex symmetrical simulation environment (the figures label the hidden states). (b) The disturbed environment.


Figure 12: (a) Comparative results for the noise-free environment. (b) Comparative results for the disturbed environment (prediction error versus timesteps, 0 to 500, for the STAMN and the LPST).

be obtained through the agent's interaction with the environment, or from training data that do not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as the simultaneous localization and mapping (SLAM) problem, robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests, how to incorporate reinforcement learning to produce a new distributed reinforcement learning model, and how it should be applied to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there are no competing interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.
[2] M. Hutter, "Feature reinforcement learning: part I: unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.
[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.
[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.
[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.
[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.
[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.
[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.
[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.
[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.
[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.
[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[13] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.
[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.
[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.
[17] X. Li, L. Li, J. Gao et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.
[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.
[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.
[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.
[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.
[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.
[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.
[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.
[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.
[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.
[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article A Self-Organizing Incremental ...downloads.hindawi.com › journals › cin › 2016 › 7158507.pdfResearch Article A Self-Organizing Incremental Spatiotemporal Associative

8 Computational Intelligence and Neuroscience

ojminus1 ajminus1120596ij ojmiddot middot middot ns

j(t) = 1

i isin Γj nsi(t) = 0 na

i = 0

ojminusk+1 ajminusk+1

Figure 3 Past-oriented weights 120596119894119895(119905) update process associatedwith state node 119895

oi ojai

120596ij(t)nsj(t) = 1

i = arg max nXi (t)

Figure 4 Future-oriented weights 120596119894119895(119905) update process associatedwith state node 119895

oi ajQ0(si aj)

naj (t) = 1

i = arg max nsi (t)

Figure 5 Weights 119908119894119895(119905) updating process associated with actionnode 119895

history sequence ℎ119895 for 119904119895 is computed to identify the loopingstate node 119895 First we compute the presynaptic potential119881119895(119905)for the state node 119895 according to

119881119895 (119905) = sum119894isin120591119895

119862119894119895 (119887119883119894 (119905) 120596119894119895) 119883 isin 119904 119886 (10)

where 119887119883119894 (119905) is the current activation value in the presynapticnode set 120591119895 119862119894119895 is the confidence parameter which meansthe node 119894rsquos importance degree in the presynaptic node set120591119895 The value of 119862119894119895 can be set in advance and sum119894isin120591119895 119862119894119895 =1 The function represents the similar degree betweenactivation value of the presynaptic node and the contextualinformation 120596119894119895 in LTM (119909 119910) isin [0 1] the similar degreeis high between 119909 and 119910 thus(119909 119910) is close to 1 Accordingto ldquowinner takes allrdquo among all nodes whose presynapticpotentials exceed the threshold 120599 the node with the largestpresynaptic potential will be selected

The presynaptic potential 119881119895(119905) represents the syn-chronous activation process of presynaptic node set of 120591119895

which represents the previous 119896 minus 1 step contextual infor-mation matching of state node 119895 To realize all 119896-step historysequence matching the 119896-step history sequence identifyingactivation value of the state node 119895 is given below

119898119904119895 (119905) = 119867 (119881119895 (119905) 120599)

sdot 119867(119881119895 (119905) max1198961rarr119873119888

(119881119896 (119905))) 119887119904119895 (119905)

119867 (119909 119910) = 0 119909 lt 1199101 119909 ge 119910

(11)

where max1198961rarr119873119888(119881119896(119905)) is the maximum potential value ofnode 119896 119867(119881119895(119905) 120599) sdot 119867(119881119895(119905)max1198961rarr119873119888(119881119896(119905))) means thenode with the largest potential value is selected among allnodes whose synaptic potentials exceed the threshold 120599 119887119904119895(119905)represents the matching degree of state node 119895 and currentobserved value If 119898119904119895(119905) = 1 the current state 119904 is identifiedto looped state node 119895 by the 119896 step memory

(2) Following Identifying If the current state 119904 is identifiedas the state 119904119894 then the next transition prediction state isidentified as the state 119904119895 = 119891119904(119904119894 119886119894) First we compute thetransition prediction value for the state node 119895 according to

119901119904119895 (119905) = sum119894=argmax 119899119883119894 (119905)

119863119894119895120601 (119899119883119894 (119905) 120596119894119895) 119883 isin 119904 119886 (12)

where 119899119883119894 (119905) is the identifying activation value of the statenode and action node at time 119905 119894 = argmax 119899119883119894 (119905) indicatesstate node 119894 and action node 119894 are the previous winner nodes119863119894119895 is the confidence parameter which means the node 119894rsquosimportance degree The value of 119863119894119895 can be set in advanceand sum119894=arg max 119899119883

119894(119905)119863119894119895 = 1 120596119894119895 records one-step transition

prediction information related to state node 119895 to be usedin identifying and recalling phase If the current state 119904 isidentified as the hidden state 119904119894119901119904119895(119905) represent the probabilityof the next transition prediction state is 119895

If the next prediction state node 119895 is the same as thecurrent observation value the current history sequence isidentified as the state node 119895 the following identifying valueof the state node 119895 is given below

119902119904119895 (119905) = 119867 (119901119904119895 (119905) 120599)

sdot 119867(119901119904119895 (119905) max1198961rarr119873119888

(119901119904119896 (119905))) 119887119904119895 (119905) (13)

If 119902119904119895(119905) = 1 the current history sequence ℎ119905 is identifiedto looped state node 119895 by the following identifying If 119901119904119895(119905) gemax1198961rarr119873119888(119901119904119896(119905)) 119901119904119895(119905) ge 120599 and 119902119904119895(119905) = 1 then there exists119887119904119895(119905) = 1 representing mismatching of state node 119895 and cur-rent observed value There exist trans(ℎ119905 119886119894) = trans(ℎ119894 119886119894)According to transition prediction criterion (Algorithm 13)the current history sequence ℎ119905 is not identified by the hidden

Computational Intelligence and Neuroscience 9

sate 119904119894 so the identifying history sequence length for ℎ119905 andℎ119894 separately must increase to 119896 + 1(3) Previous Identifying If the current history sequence isidentified as the state 119904119895 then the previous state 119904119894 is identifiedsuch that 119904119895 = 119891119904(119904119894 119886119894) First we compute the identifyingactivation value for all states 119904 if there exists 119899119904119895(119905) = 1 thenthe current state 119904119905 is identified as state 119904119895 and the previousstate 119904119905minus1 is identified as the previous state 119904119894 of state 119904119895 Theprevious identifying value of the state node 119895 is defined as119889119904119894 (119905) The previous state 119904119894 is 119894 = argmax(119887119904119894 (119905)) satisfyingcondition 1 and condition 2 Then we set 119889119904119894 (119905)

Condition 1

n_j^s(t) = 1. (14)

Condition 2

φ(b_i^s(t), ω_ij) > θ. (15)

According to the above three identifying processes, the identifying activation value of the state node j is defined as follows:

n_j^s(t) = max(m_j^s(t), q_j^s(t), d_j^s(t)). (16)
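To make (13)–(16) concrete, the following sketch computes the following-identifying value and the combined identifying activation for a small set of state nodes. H is taken to be the binary comparison H(x, y) = 1 iff x ≥ y, and the array names (p, b, m, d) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

THETA = 0.9  # the confidence threshold used in (13) and (15)

def H(x, y):
    """Binary threshold comparison: 1 if x >= y, else 0."""
    return 1.0 if x >= y else 0.0

def following_identifying(p, b, j, theta=THETA):
    """q_j^s(t) from (13): node j is identified when its transition
    prediction p[j] exceeds theta, is maximal over all nodes, and the
    predicted node matches the current observation (b[j] = 1)."""
    return H(p[j], theta) * H(p[j], p.max()) * b[j]

def identifying_activation(m, q, d):
    """n_j^s(t) from (16): elementwise max of the k-step history
    identifying (m), following identifying (q), and previous
    identifying (d) values."""
    return np.maximum(np.maximum(m, q), d)

p = np.array([0.95, 0.30, 0.10])  # transition predictions p_k^s(t)
b = np.array([1.0, 1.0, 0.0])     # observation matches b_k^s(t)
q = np.array([following_identifying(p, b, j) for j in range(3)])
n = identifying_activation(np.zeros(3), q, np.zeros(3))
# only node 0 passes all three factors of (13), so n = [1, 0, 0]
```

Only node 0 clears the threshold, is maximal, and matches the observation, so it alone is activated.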

Following Algorithm 11, we give its pseudocode in Pseudocode 1.

The pseudocode for Algorithm 12, the current history sequence identifying algorithm (identifying phase), is as follows: compute the identifying activation value according to (16).

The pseudocode for Algorithm 13 is given in Pseudocode 2.

4 Simulation

4.1. A Simple Example of a Deterministic POMDP. We want to construct the looping transition model as in Figure 1. First we obtain the transition instance h_t = o_1 a_1 o_1 a_2 o_2 a_2 o_2 a_1 o_1 a_1 o_1. According to Algorithm 11, set the initial identifying history sequence length k = 1. The STAMN model is constructed incrementally using h_t, as in Figure 6.

When h_t = o_1 a_1 o_1 a_2 o_2 a_2 o_2 a_1 o_1 a_1 o_1 a_2 o_1: because k = 1, the looped state is identified by the current observation vector. Thus h_1 = o_1 and h_2 = o_1 are the identifying history sequences for the state s_1. Since o_1 = trans(h_1, a_2) and o_2 = trans(h_2, a_2) both occur in h_t, we have trans(h_1, a_2) ≠ trans(h_2, a_2); according to Lemma 9, h_1 and h_2 are the identifying history sequences for the states s_i and s_j, respectively, and s_i ≠ s_j. So o_1 is distinguished as o′_1 and o″_1, and h_t = o_1 a_1 o′_1 a_2 o_2 a_2 o_2 a_1 o_1 a_1 o″_1 a_2 o_1.

According to Lemma 10, for any two identifying history sequences h_i and h_j for o′_1 and o″_1, respectively, h_i ≠ h_j iff s_i ≠ s_j. So the identifying history sequence lengths for h_i and h_j must increase to 2. For o′_1 the identifying history sequence is then o_1 a_1 o′_1, and for o″_1 it is o_1 a_1 o″_1. These two identifying history sequences are still identical, so a longer transition instance is needed.
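The splitting step used here — two histories that look identical but lead to different successors under the same action must denote different hidden states (Lemma 9) — can be sketched as a conflict check over recorded transitions; the triple representation is an illustrative assumption.

```python
def find_aliased(transitions):
    """transitions: list of (history, action, next_obs) triples.
    Return the (history, action) keys for which the same history and
    action were observed with different successors: by Lemma 9 such a
    history's terminal observation is aliased and must be split
    (here, o1 into o1' and o1'')."""
    seen, conflicts = {}, []
    for hist, act, nxt in transitions:
        key = (hist, act)
        if seen.get(key, nxt) != nxt:
            conflicts.append(key)  # trans(h_i, a) != trans(h_j, a)
        seen[key] = nxt
    return conflicts

# The two k = 1 histories for o1 in the example are both ('o1',),
# yet a2 leads once to o1 and once to o2, so o1 must be split.
conflicts = find_aliased([(('o1',), 'a2', 'o1'),
                          (('o1',), 'a2', 'o2')])
```

The returned conflict key marks exactly the observation that must be distinguished by deeper histories.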

Figure 6: The STAMN model with memory depth 1: two state nodes s_1 (f_o(s_1) = o_1) and s_2 (f_o(s_2) = o_2), with transitions labelled a_1 and a_2.

Table 1: The identifying history sequences for o′_1 and o″_1.

Hidden state   | Identifying history sequences
o′_1 (k = 2)   | o_1 a_1 o′_1; o″_1 a_2 o′_1; o_2 a_1 o′_1
o″_1 (k = 3)   | o_2 a_1 o_1 a_1 o″_1

When h_t = o_1 a_1 o′_1 a_2 o_2 a_2 o_2 a_1 o_1 a_1 o″_1 a_2 o′_1 a_2 o_2 a_1 o′_1 a_2 o_2 a_1 o_1 a_1 o_1 a_1 o′_1 a_2 o_2: in h_t we obtain identifying history sequences h_i ≠ h_j for the states o′_1 and o″_1. For k = 2 the distinguished identifying history sequences for o′_1 are o″_1 a_2 o′_1 and o_2 a_1 o′_1. However, the distinguished identifying history sequence for o″_1 is o_1 a_1 o″_1, which is not discriminated from the identifying history sequence o_1 a_1 o′_1 for o′_1. So the identifying history sequence length for o″_1 must increase to 3. The identifying history sequences for o′_1 and o″_1 are described in Table 1.

According to k-step history identifying, following identifying, and previous identifying, h_t can be represented as h_t = o″_1 a_1 o′_1 a_2 o_2 a_2 o_2 a_1 o′_1 a_1 o″_1 a_2 o′_1 a_2 o_2 a_1 o′_1 a_2 o_2 a_1 o′_1 a_1 o″_1 a_1 o′_1 a_2 o_2.

According to Pseudocode 1, the STAMN model is constructed incrementally using h_t, as in Figure 7(a), and the LPST is constructed as in Figure 7(b). The STAMN has the fewest nodes because the state nodes in the STAMN represent hidden states, whereas the state nodes in the LPST represent observation states.

To illustrate the difference between the STAMN and the LPST, we present the comparison results in Figure 8.

Every 10 timesteps, the algorithms use the currently learned model to realize the transition prediction, and the transition prediction criterion is expressed by the prediction error. Each data point represents the average prediction error over 10 runs. We ran three algorithms: the STAMN with exhaustive exploration, the STAMN with random exploration, and the LPST.


Set initial memory depth k = 1, t = 0.
No nodes or connection weights exist initially in the STAMN; the transition instance h_0 = ∅.
A STAMN is constructed incrementally through the agent's interaction with the environment,
expanding the STAMN until a transition prediction contradicts the current minimal hypothesis
model; the hypothesis model is then reconstructed with an increased memory depth k. In this
paper the constructing process performs identifying and recalling simultaneously.

while one pattern o_t or a_t can be activated from the pattern cognition layer do
    h_t = (h_{t-1}, o_t, a_t)
    for all existing state nodes do
        compute the identifying activation value by executing Algorithm 12
        if there exists n_j^s(t) = 1 then
            the looped state is identified, and the weight ω_ij is adjusted according to (4), (5)
        else
            a new state node j is created; set n_j^s(t) = 1, and the weight ω_ij is adjusted according to (4), (5)
        end if
    end for
    for all existing state nodes do
        compute the transition prediction criterion by executing Algorithm 13
        if IsNotTrans(s_j) then
            for the previous winner state node i of the state node j do
                set memory depth k = k + 1 until each h_i is discriminated
            end for
            reconstruct the hypothesis model with the new memory depth k according to Algorithm 11
        end if
    end for
    for all existing action nodes do
        compute the activation value according to (9)
        if there exists n_j^a(t) = 1 then
            the weight w_ij(t) is adjusted according to (6)
        else
            a new action node j is created; set n_j^a(t) = 1, and the weight w_ij(t) is adjusted according to (6)
        end if
    end for
    the agent obtains the new observation vector by o_{t+1} = trans(s_t, a_t)
    t = t + 1
end while

Pseudocode 1: Algorithm for heuristically constructing the STAMN.
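A condensed executable sketch of Pseudocode 1, under strong simplifying assumptions: the activation equations (4)–(6) and (9) are replaced by exact matching of k-step identifying histories, and the loop is driven by a pre-recorded transition instance rather than live interaction. Only the core control flow is kept — create a node when no node is activated, and deepen k and rebuild when a transition prediction is contradicted.

```python
def construct_stamn(obs, acts, k_init=1, k_max=10):
    """obs[t], acts[t]: a recorded transition instance h_t with
    len(obs) == len(acts) + 1.  State nodes are keyed by their k-step
    identifying history; a contradicted transition prediction (the
    IsNotTrans criterion of Pseudocode 2) triggers k = k + 1 and a
    full rebuild of the hypothesis model."""
    k = k_init
    while k <= k_max:
        nodes, trans, consistent = {}, {}, True
        for t in range(len(acts)):
            # k-step identifying history: recent (o, a) pairs plus o_t
            lo = max(0, t - k + 1)
            suffix = tuple(zip(obs[lo:t], acts[lo:t])) + (obs[t],)
            j = nodes.setdefault(suffix, len(nodes))
            key = (j, acts[t])
            if trans.get(key, obs[t + 1]) != obs[t + 1]:
                consistent = False      # prediction contradicted
                break
            trans[key] = obs[t + 1]
        if consistent:
            return nodes, trans, k
        k += 1                          # deepen memory and rebuild
    return nodes, trans, k

# The transition instance of Section 4.1:
obs = ['o1', 'o1', 'o2', 'o2', 'o1', 'o1', 'o1']
acts = ['a1', 'a2', 'a2', 'a1', 'a1', 'a2']
nodes, trans, k = construct_stamn(obs, acts)
```

On this instance the sketch settles at memory depth k = 3, like the worked example's deepest state; the paper's per-state depths differ in detail, since in the STAMN k is variable across the state space while this sketch uses one global depth.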

IsNotTrans(s_j):
    if p_j^s(t) > θ according to (12) and q_j^s(t) = 1 according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).
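Pseudocode 2 reduces to a two-condition check on the prediction value from (12) and the following-identifying value from (13); a direct sketch, with argument names assumed:

```python
def is_not_trans(p_j, q_j, theta=0.9):
    """Transition prediction criterion (Pseudocode 2): the prediction
    for state node j holds only if p_j^s(t) > theta per (12) and
    q_j^s(t) = 1 per (13); otherwise the model must deepen k."""
    return not (p_j > theta and q_j == 1)
```

Here is_not_trans(0.95, 1) is False (the prediction holds), while is_not_trans(0.95, 0) is True, triggering the memory-depth increase in Pseudocode 1.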

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has the better performance and the faster convergence speed because of its exhaustive-exploration Q-function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN: the 4 × 3 grid problem, shown in Figure 9, is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall at the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and


Figure 7: (a) The STAMN model: three state nodes s_1, s_2, s_3 with f_o(s_1) = o_1, f_o(s_2) = o_2, f_o(s_3) = o_1 and transitions labelled a_1, a_2. (b) The LPST model: an observation-based suffix tree over o_1, o_2 with action-labelled branches.

Figure 8: Comparative performance results for a simple deterministic POMDP: prediction error (0 to 1) over 100 timesteps for the STAMN (exhaustive exploration), the STAMN (random exploration), and the LPST.

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state and the bottom figure represents the hidden state. The black cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration still has the better performance and the faster convergence speed in the 4 × 3 grid problem.
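The perceptual aliasing that makes this problem hard can be made explicit: two binary wall sensors give only |O| = 2 × 2 = 4 observation codes for |S| = 11 hidden states, so by pigeonhole several states must share an observation. The sensor packing below follows the description above; the particular state-to-sensor assignment is a hypothetical illustration, not the paper's map.

```python
def observe(left_wall, right_wall):
    """Pack the two binary wall sensors into one of |O| = 4 codes."""
    return 2 * int(left_wall) + int(right_wall)

def aliased_groups(obs_of_state):
    """Group hidden states that share an observation code; any group
    with more than one member cannot be identified without memory."""
    groups = {}
    for s, o in obs_of_state.items():
        groups.setdefault(o, []).append(s)
    return {o: ss for o, ss in groups.items() if len(ss) > 1}

# A hypothetical assignment of the 11 hidden states to sensor readings:
obs_of_state = {s: observe(s % 2 == 0, s % 3 == 0) for s in range(1, 12)}
groups = aliased_groups(obs_of_state)
# e.g. states 1, 5, 7, 11 all read (no wall, no wall) here
```

Any such shared group is exactly where the STAMN's k-step identifying histories are needed.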

The number of state nodes and action nodes in the STAMN and the LPST is described in Table 2. In the STAMN the state node

Figure 9: The 4 × 3 grid problem (each cell shows its observation value above its hidden state number).

Table 2: The number of state nodes and action nodes in the STAMN and the LPST.

Algorithm | State nodes | Action nodes
LPST      | 70          | 50
STAMN     | 11          | 44

represents the hidden state, but in the LPST the state node represents an observation value and the hidden state is expressed by k-step past observations and actions; thus more observation nodes and action nodes are repeatedly created, and most of these nodes are the same. The LPST has the same perfect observation prediction capability as the STAMN, but it is not a state transition model.

In the STAMN, the setting of the parameters μ, η, γ_i^s, and δ_i^a is very important, since it determines the depth k of short-term


Figure 10: Comparative performance results for the 4 × 3 grid problem: prediction error (0 to 1) over 1000 timesteps for the STAMN (exhaustive exploration), the STAMN (random exploration), and the LPST.

memory. We assume γ_i^s = δ_i^a = 0.9 and μ = η = 0.9 initially, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2, μ = μ − 0.1. The other parameters are determined relatively easily. We set the learning rates α = β = γ = 0.9 and the confidence parameters C_ij = 1/|τ_j|, D_ij = 1/2, representing equal degrees of importance. We set the constants C = 5 and θ = 0.9.
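The decay schedule just described — η and μ start at 0.9 for k = 1 and drop by 0.2 and 0.1, respectively, each time k grows — can be written as a small helper (a sketch of the stated settings only; the function name is ours):

```python
def decay_parameters(k):
    """Short-term-memory decay parameters (mu, eta) as a function of
    memory depth k, per the schedule eta -= 0.2, mu -= 0.1 starting
    from mu = eta = 0.9 at k = 1."""
    mu = round(0.9 - 0.1 * (k - 1), 10)   # round away float drift
    eta = round(0.9 - 0.2 * (k - 1), 10)
    return mu, eta

# decay_parameters(1) -> (0.9, 0.9); decay_parameters(3) -> (0.7, 0.5)
```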

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex, as shown in Figure 11(a). The robot wanders in the environment. By following a wall-following behaviour, the agent can recognize left-wall, right-wall, and corridor landmarks. These observation landmarks differ from the observation vectors in the previous grid problem: every observation landmark has a different duration time. To analyse the fault tolerance and robustness of the STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall, it has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the paths 2-3-4-5-6 and 12-13-14-15-16 are identical, and a memory depth k = 6 is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and in the disturbed environment; the results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that the STAMN has better noise tolerance and robustness than the LPST because of the neuron synchronized activation mechanism, which tolerates faults and noise in the sequence-item duration times and realizes reliable recalling. The LPST, however, cannot realize correct convergence in reasonable time because it relies on exact matching.
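The claim that k = 6 is required can be checked mechanically: the two aliased corridors need a memory depth just large enough that their observation histories differ. A sketch, with made-up landmark labels (the paths share five identical readings and differ only one step earlier):

```python
def depth_to_discriminate(seq_a, seq_b):
    """Smallest suffix length k at which two observation sequences
    differ; None if they are indistinguishable at every depth."""
    for k in range(1, max(len(seq_a), len(seq_b)) + 1):
        if tuple(seq_a[-k:]) != tuple(seq_b[-k:]):
            return k
    return None

# Hypothetical landmark readings along paths 2-3-4-5-6 and
# 12-13-14-15-16: identical for five steps, preceded by a
# distinguishing landmark.
path_a = ['corner_1', 'wall', 'wall', 'corridor', 'wall', 'wall']
path_b = ['corner_2', 'wall', 'wall', 'corridor', 'wall', 'wall']
```

With these assumed readings, depth_to_discriminate(path_a, path_b) returns 6, matching the depth the STAMN must reach in this environment.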

5 Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations rather than hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST: the LPST realizes recalling by successive exact matching, which is sensitive to noise and faults, whereas the STAMN offers the neuron synchronized activation mechanism to realize recalling, so even in a noisy and disturbed environment it can still recall reliably. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network using the synchronized activation mechanism, whose performance does not degrade as the history sequences and the scale increase.

5.2. Associative Memory Networks (AMN). The STAMN is proposed based on the development of associative memory networks. First, almost all existing AMN models are unable to handle complex sequences with looped hidden states; the STAMN does identify the looped hidden states, and so can be applied to HMM and POMDP problems. Furthermore, most AMN models use a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth k is variable in different parts of the state space. Finally, existing AMN models generally only record temporal order at discrete intervals Δt rather than sequence-item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6 Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper the STAMN is proposed to identify the looped hidden states using only transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth k-Markov model. We proposed a heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. The transition instances can


Figure 11: (a) Complex symmetrical simulation environment, with the hidden states numbered 1-24. (b) The disturbed environment.

Figure 12: Prediction error (0 to 1) over 500 timesteps for the STAMN and the LPST. (a) Comparative results for a noise-free environment. (b) Comparative results for a disturbed environment.

be obtained through the agent's interaction with the environment, or from a body of training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as the simultaneous localization and mapping (SLAM) problem, robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests; how to incorporate reinforcement learning to produce a new distributed

reinforcement learning model; and how to apply it to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there are no competing interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State, Ph.D. thesis, University of Rochester, Rochester, NY, USA, 1996.
[2] M. Hutter, "Feature reinforcement learning: part I. Unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3-24, 2009.
[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.
[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075-1082, Toronto, Canada, July 2012.
[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95-142, 2011.
[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409-416, ACM, Pittsburgh, Pa, USA, 2006.
[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11-24, Edinburgh, Scotland, 2012.
[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213-228, 2013.
[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists, Ph.D. thesis, University of Osnabrück, Osnabrück, Germany, 2009.
[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818-826, 2012.
[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727-734, June 2010.
[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.
[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.
[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.
[17] X. Li, L. Li, J. Gao, et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.
[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465-1481, 1996.
[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964-972, 2009.
[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414-433, 2014.
[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45-57, 2002.
[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1-6, October 2009.
[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415-2425, 2012.
[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971-983, 2012.
[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1-8, IEEE, Barcelona, Spain, July 2010.
[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57-71, 2013.
[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.


Page 9: Research Article A Self-Organizing Incremental ...downloads.hindawi.com › journals › cin › 2016 › 7158507.pdfResearch Article A Self-Organizing Incremental Spatiotemporal Associative

Computational Intelligence and Neuroscience 9

sate 119904119894 so the identifying history sequence length for ℎ119905 andℎ119894 separately must increase to 119896 + 1(3) Previous Identifying If the current history sequence isidentified as the state 119904119895 then the previous state 119904119894 is identifiedsuch that 119904119895 = 119891119904(119904119894 119886119894) First we compute the identifyingactivation value for all states 119904 if there exists 119899119904119895(119905) = 1 thenthe current state 119904119905 is identified as state 119904119895 and the previousstate 119904119905minus1 is identified as the previous state 119904119894 of state 119904119895 Theprevious identifying value of the state node 119895 is defined as119889119904119894 (119905) The previous state 119904119894 is 119894 = argmax(119887119904119894 (119905)) satisfyingcondition 1 and condition 2 Then we set 119889119904119894 (119905)

Condition 1

119899119904119895 (119905) = 1 (14)

Condition 2

120601 (119887119904119894 (119905) 120596119894119895) gt 120599 (15)

According to above three identifying processes the iden-tifying activation value of the state node 119895 is defined asfollows

119899119904119895 (119905) = max (119898119904119895 (119905) 119902119904119895 (119905) 119889119904119895 (119905)) (16)

According to Algorithm 11 we give Pseudocode 1 forAlgorithm 11

Thepseudocode forAlgorithm 12 is as follows the currenthistory sequence identifying algorithm (identifying phase)Compute the identifying activation value according to (16)

The pseudocode for Algorithm 13 is in Pseudocode 2

4 Simulation

41 A Simple Example of Deterministic POMDP We wantto construct the looping transition model as in Figure 1First we obtain the transition instance ℎ119905 1199001 1198861 1199001 1198862 11990021198862 1199002 1198861 1199001 1198861 1199001 According to Algorithm 11 set the initialidentifying history sequence length 119896 = 1 The STAMNmodel is constructed incrementally using ℎ119905 as in Figure 6

When ℎ119905 = 1199001 1198861 1199001 1198862 1199002 1198862 1199002 1198861 1199001 1198861 1199001 1198862 1199001because of 119896 = 1 the looped state is identified by the currentobservation vectorThus ℎ1 = 1199001 and ℎ2 = 1199001 are the iden-tifying history sequences for the state 1199041 Since 1199001 = tans(ℎ11198862) and 1199002 = tans(ℎ2 1198862) in ℎ119905 exist so as to trans(ℎ1 1198862) =trans(ℎ2 1198862) according to Lemma 9 ℎ1 and ℎ2 are the iden-tifying history sequences for the states 119904119894 and 119904119895 respectivelyand 119904119894 = 119904119895 So 1199001 is distinguished as 11990010158401 and 119900101584010158401 And ℎ119905 =1199001 1198861 11990010158401 1198862 1199002 1198862 1199002 1198861 1199001 1198861 119900101584010158401 1198862 1199001

According the Lemma 10 for any two identifying historysequences ℎ119894 and ℎ119895 separately for 11990010158401 and 119900101584010158401 there existsℎ119894 = ℎ119895 iff 119904119894 = 119904119895 So the identifying history sequencelength for ℎ119894 and ℎ119895 separately for 119904119894 and 119904119895 must increase to 2For 11990010158401 the identifying history sequence is 1199001 1198861 11990010158401 and for119900101584010158401 the identifying history sequence is 1199001 1198861 119900101584010158401 These twoidentifying history sequences are identical so longer transi-tion instance is needed

fo(s2) = o2

fo(s1) = o1

s1

s2

a1

a1

a2

a2

Figure 6 The STAMNmodel with memory depth 1

Table 1 The identifying history sequences for 11990010158401 and 119900101584010158401 Hidden state 11990010158401 (119896 = 2) 119900101584010158401 (119896 = 3)

Identifying history sequence1199001 1198861 11990010158401 1199002 1198861 1199001 1198861 119900101584010158401119900101584010158401 1198862 119900101584011199002 1198861 11990010158401

When ℎ119905 = 1199001 1198861 11990010158401 1198862 1199002 1198862 1199002 1198861 1199001 1198861 119900101584010158401 1198862 11990010158401 11988621199002 1198861 11990010158401 1198862 1199002 1198861 1199001 1198861 1199001 1198861 11990010158401 1198862 1199002 In ℎ119905 we obtain theidentifying history sequence ℎ119894 = ℎ119895 for states 11990010158401 and 119900101584010158401 For 119896 = 2 the distinguished identifying history sequence for11990010158401 is 119900101584010158401 1198862 11990010158401 and 1199002 1198861 11990010158401 However the distinguishedidentifying history sequence for 119900101584010158401 is 1199001 1198861 119900101584010158401 which is nota discriminated history sequence from the identifying historysequence for 11990010158401 So the identifying history sequence length for119900101584010158401 must increase to 3 The identifying history sequences for11990010158401 and 1199001 are described as in Table 1

According to 119896-step history identifying following iden-tifying and previous identifying ℎ119905 can be represented asℎ119905 = 119900101584010158401 1198861 11990010158401 1198862 1199002 1198862 1199002 1198861 11990010158401 1198861 119900101584010158401 1198862 11990010158401 1198862 1199002 1198861 119900101584011198862 1199002 1198861 11990010158401 1198861 119900101584010158401 1198861 11990010158401 1198862 1199002

According to Pseudocode 1 the STAMN model is con-structed incrementally using ℎ119905 as in Figure 7(a) and theLPST is constructed as Figure 7(b) The STAMN has thefewest nodes because the state nodes in STAMN representthe hidden state and the state nodes in LPST represent theobservation state

To illustrate the difference between the STAMN andLPST we present the comparison results in Figure 8

After 10 timesteps the algorithms use the current learnedmodel to realize the transition prediction and the transitionprediction criterion is expressed by the prediction error Thedata point represents the average prediction error over 10runs We ran three algorithms the STAMN with exhaustiveexploration the STAMN with random exploration and theLPST

10 Computational Intelligence and Neuroscience

Set initial memory depth 119896 = 1 119905 = 0All nodes and connections weight do not exist initially in STAMNThe transition instance ℎ0 = 120601A STAMN is constructed incrementally through interaction with the environment by the agentexpanding the SATAMN until the transition prediction is contradicted with the current minimalhypothesis model reconstruct the hypothesis model by increase the memory depth 119896 in thispaper the constructing process includes the identifying and recalling simultaneouslyWhile one pattern 119900119905 or 119886119905 can be activated from the pattern cognition layer do

ℎ119905 = (ℎ119905minus1 119900119905 119886119905)for all existed state nodescompute the identifying activation value by executing Algorithm 12If exist the 119899119904119895(119905) = 1 thenthe looped state is identified and the weight 120596119894119895120596119894119895 is adjusted according to (4) (5)

elsea new state node 119895 will be created set 119899119904119895(119905) = 1 and the weight 120596119894119895120596119894119895 is adjusted according to (4) (5)end ifend forfor all existed state nodescompute the transition prediction criterion by executing Algorithm 13If IsNotTrans(119904119895) then

for the previous winner state node 119894 of the state node 119895set memory depth 119896 = 119896 + 1 until each ℎ119894 are discriminated

end forreconstruct the hypothesis model by the new memory depth 119896 according to Algorithm 11

end ifend forfor all existed action nodescompute the activation value according to (9)

if exist the 119899119886119895 (119905) = 1 thenthe weight 119908119894119895(119905) is adjusted according to (6)else

a new action node 119895 will be created set 119899119886119895 (119905) = 1 and the weight 119908119894119895(119905) is adjusted according to (6)end ifend forThe agent obtains the new observation vectors by 119900119905+1 = trans(119904119905 119886119905)119905 = 119905 + 1endWhile

Pseudocode 1 Algorithm for heuristic constructing the STAMN

IsNotTrans(s_j):
    if p_sj(t) > ϑ according to (12) and q_sj(t) = 1 according to (13) then
        return False
    else
        return True
    end if

Pseudocode 2: Transition prediction criterion algorithm (recalling phase).
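The control flow of Pseudocodes 1 and 2 can be summarized in a short Python sketch. This is only an illustration: the class and argument names are invented here, and the activation values p_sj(t) and q_sj(t), which the paper computes with Algorithms 12 and 13 and equations (12)–(13), are passed in as plain numbers.

```python
# Illustrative skeleton of Pseudocodes 1 and 2. All names are hypothetical;
# the neural details (Algorithms 11-13, equations (4)-(13)) are abstracted
# behind the p_sj / q_sj arguments.

class STAMNSketch:
    def __init__(self, threshold=0.9):
        self.k = 1                  # memory depth, grows on prediction failure
        self.threshold = threshold  # the constant written as vartheta in (12)
        self.state_nodes = set()    # created incrementally, initially empty
        self.history = []           # transition instance h_t = (h_{t-1}, o_t, a_t)

    def is_not_trans(self, p_sj, q_sj):
        # Pseudocode 2: the prediction holds iff p_sj(t) > threshold and q_sj(t) = 1
        return not (p_sj > self.threshold and q_sj == 1)

    def step(self, o_t, a_t, identified_state=None, p_sj=1.0, q_sj=1):
        # one iteration of the while-loop in Pseudocode 1
        self.history.append((o_t, a_t))
        if identified_state in self.state_nodes:
            state = identified_state        # looped state re-identified
        else:
            state = (o_t, a_t, len(self.state_nodes))
            self.state_nodes.add(state)     # create a new state node
        if self.is_not_trans(p_sj, q_sj):
            self.k += 1                     # deepen memory and rebuild the model
        return state
```

The point of the sketch is only the outer loop: identification and node creation happen on every step, while the memory depth k grows lazily, only when the transition prediction criterion fails.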

Figure 8 shows that all three algorithms eventually produce the correct model with zero prediction error, because there is no noise. However, the STAMN with exhaustive exploration has better performance and faster convergence because of the exhaustive-exploration Q-function.

4.2. Experimental Comparison in the 4 × 3 Grid Problem. First we use a small POMDP problem to test the feasibility of the STAMN. The 4 × 3 grid problem shown in Figure 9 is selected. The agent wanders inside the grid and has only a left sensor and a right sensor to report the existence of a wall at the current position. The agent has four actions: forward, backward, turn left, and turn right. The reference direction of the agent is northward.

In the 4 × 3 grid problem, the hidden state space size |S| is 11, the action space size |A| is 4 for each state, and

Computational Intelligence and Neuroscience 11

Figure 7: (a) The STAMN model; (b) the LPST model. (State nodes s1–s3 with observations f_o(s1) = o1, f_o(s2) = o2, f_o(s3) = o1 and actions a1, a2.)

Figure 8: Comparative performance results (prediction error versus timesteps) for a simple deterministic POMDP: STAMN (exhaustive exploration), STAMN (random exploration), and LPST.

the observation space size |O| is 4. The upper figure in each grid cell represents the observation state and the bottom figure represents the hidden state. The black grid cell and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results are shown in Figure 10. Figure 10 shows that the STAMN with exhaustive exploration again has better performance and faster convergence in the 4 × 3 grid problem.
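The reason this small grid is already a hidden-state problem is that two wall sensors give at most four distinct observations for eleven hidden states, so several states must share an observation. The sketch below uses an invented wall layout (not the actual Figure 9 grid) purely to show this aliasing.

```python
from collections import defaultdict

# Hypothetical observation function f_o: hidden state -> (left_wall, right_wall).
# The layout is invented for illustration and does not reproduce Figure 9.
f_o = {
    1: (1, 0), 2: (0, 0), 3: (0, 0), 4: (0, 1),
    5: (1, 1), 6: (1, 0), 7: (0, 1), 8: (1, 1),
    9: (1, 0), 10: (0, 0), 11: (0, 1),
}

# group hidden states by the observation they emit
aliases = defaultdict(list)
for state, obs in f_o.items():
    aliases[obs].append(state)

# every observation is emitted by several hidden states, so a memoryless
# agent cannot distinguish them -- exactly the hidden-state problem
ambiguous = {obs: states for obs, states in aliases.items() if len(states) > 1}
```

With 11 states and only 4 observations, the pigeonhole argument guarantees aliasing regardless of the actual layout; history (memory depth k) is what disambiguates the states.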

The number of state nodes and action nodes in STAMN and LPST is given in Table 2. In STAMN the state node

Figure 9: The 4 × 3 grid problem (each cell is labeled with its observation state and hidden state).

Table 2: The number of state nodes and action nodes in STAMN and LPST.

Algorithm   Number of state nodes   Number of action nodes
LPST        70                      50
STAMN       11                      44

represents the hidden state, but in LPST the state node represents an observation value and the hidden state is expressed by the k-step past observations and actions; thus many observation nodes and action nodes are repeatedly created, and most of them are identical. LPST has the same perfect observation prediction capability as the STAMN, but it is not a state transition model.

In STAMN the setting of the parameters μ, η, γ_i^s, and δ_i^a is very important, since it determines the depth k of short-term


Figure 10: Comparative performance results (prediction error versus timesteps) for the 4 × 3 grid problem: STAMN (exhaustive exploration), STAMN (random exploration), and LPST.

memory. We assume γ_i^s = δ_i^a = 0.9 and μ = η = 0.9 initially, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2, μ = μ − 0.1. The other parameters are determined relatively easily. We set the learning rates α = β = γ = 0.9 and set the confidence parameters C_ij = 1/|τ_j| and D_ij = 1/2, representing equal importance. We set the constant values C = 5 and ϑ = 0.9.
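The decrement rule for μ and η can be written as a closed-form schedule. The clamping at zero below is an added assumption, since the text only states the per-step decrements:

```python
def decay_params(k):
    """Return (mu, eta) for memory depth k, starting from mu = eta = 0.9 at
    k = 1 and applying mu -= 0.1, eta -= 0.2 per unit increase of k
    (clamped at 0, an assumption not stated in the text)."""
    mu = 0.9 - 0.1 * (k - 1)
    eta = 0.9 - 0.2 * (k - 1)
    return max(mu, 0.0), max(eta, 0.0)
```

For example, depth k = 3 corresponds to μ = 0.7 and η = 0.5 under this schedule.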

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment in this paper is very complex and is shown in Figure 11(a). The robot wanders in the environment. By following the wall, the agent can recognize the left-wall, right-wall, and corridor landmarks. These observation landmarks differ from the observation vectors in the previous grid problem: every observation landmark has a different duration time. To analyze the fault tolerance and robustness of STAMN, we assume the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall in the environment, the robot has only one optional action in each state. The reference direction of an action is the robot's current orientation. Thus the paths 2-3-4-5-6 and 12-13-14-15-16 are identical, and a memory depth k = 6 is necessary to identify hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and the disturbed environment. The results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that STAMN has better noise tolerance and robustness than LPST because of the neuron synchronized activation mechanism, which tolerates faults and noise in the sequence-item duration time and can realize reliable recalling. The LPST, however, cannot realize correct convergence in reasonable time because of its exact matching.
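The claim that depth k = 6 is needed can be checked mechanically: two histories are discriminated only when their most recent k items differ somewhere. The landmark sequences below are invented stand-ins for the two aliased paths, arranged so that the first difference appears at depth 6.

```python
def min_discriminating_depth(hist_a, hist_b):
    """Smallest k such that the last k items of the two histories differ,
    or None if the histories are identical."""
    for k in range(1, max(len(hist_a), len(hist_b)) + 1):
        if hist_a[-k:] != hist_b[-k:]:
            return k
    return None

# two paths whose five most recent observations coincide and which first
# differ at depth 6 (stand-ins for paths 2-3-4-5-6 and 12-13-14-15-16)
path_a = ['corridor', 'L', 'L', 'L', 'L', 'L']
path_b = ['R',        'L', 'L', 'L', 'L', 'L']
```

Any shorter window over these two histories matches exactly, which is why a fixed shallow memory cannot separate the symmetric halves of the environment.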

5 Discussion

In this section we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given a sufficient history, this model can correctly capture all predictions of identifying histories and can map all infinite identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST the STAMN has fewer state nodes and action nodes, because the nodes in the LPST are based on observations rather than hidden states. Furthermore, the STAMN has better noise tolerance and robustness than the LPST. The LPST realizes recalling by successive exact matching, which is sensitive to noise and faults, whereas the STAMN offers the neuron synchronized activation mechanism to realize recalling; even in a noisy and disturbed environment, the STAMN can still realize reliable recalling. Finally, the algorithm for learning an LPST is an additional computation model, not a distributed computational model. The STAMN is a distributed network and uses the synchronized activation mechanism, so its performance does not degrade as history sequences and problem scale grow.
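The exact-matching recall that makes the LPST noise-sensitive can be sketched as a suffix walk over a toy tree (a plain dict here, not the actual LPST data structure): the history is consumed from the most recent symbol backwards, and any single corrupted symbol stops the descent.

```python
def pst_lookup(tree, history):
    """Walk the tree from the most recent history item backwards and return
    the prediction at the deepest matching node (None on an immediate miss)."""
    node = tree
    for item in reversed(history):
        children = node.get('children', {})
        if item not in children:
            break  # exact matching: any unseen or corrupted symbol stops here
        node = children[item]
    return node.get('prediction')

# toy suffix tree: "last saw o1" predicts a1; "saw o2 then o1" predicts a2
toy_tree = {'children': {
    'o1': {'prediction': 'a1', 'children': {
        'o2': {'prediction': 'a2', 'children': {}},
    }},
}}
```

A single flipped symbol in the history changes which node is reached, which is the brittleness the synchronized-activation recall of the STAMN is meant to avoid.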

5.2. Associative Memory Networks (AMN). STAMN is proposed based on developments in associative memory networks. In existing AMN, first, almost all models are unable to handle complex sequences with looped hidden states; the STAMN indeed realizes the identification of looped hidden states, so it can be applied to HMM and POMDP problems. Furthermore, most AMN models can only obtain a memory depth determined by experiments; the STAMN offers a self-organizing incremental memory depth learning method, and the memory depth k is variable in different parts of the state space. Finally, the existing AMN models generally only record temporal orders with discrete intervals Δt rather than sequence-item durations in continuous time; the STAMN explicitly deals with the duration time of each item.

6 Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, STAMN is proposed to identify looped hidden states only from transition instances in a deterministic POMDP environment. The learned STAMN can be seen as a variable-depth k-Markov model. We proposed the heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. These transition instances can


Figure 11: (a) Complex symmetrical simulation environment (the figures label the hidden states); (b) the disturbed environment.

Figure 12: (a) Comparative results (prediction error versus timesteps) for the noise-free complex symmetrical environment; (b) comparative results for the disturbed environment, comparing STAMN and LPST.

be obtained through the agent's interaction with the environment, or from training data that does not depend on the real agent. The STAMN is very fast, robust, and accurate. We have also shown that STAMN outperforms some existing hidden-state methods in deterministic POMDP environments. The STAMN can generally be applied to almost all temporal sequence problems, such as the simultaneous localization and mapping (SLAM) problem, robot trajectory planning, sequential decision making, and music generation.

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests, how to incorporate reinforcement learning to produce a new distributed reinforcement learning model, and how to apply it to robot SLAM to resolve practical navigation problems.

Competing Interests

The author declares that there is no conflict of interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I: unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao, et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.


Page 10: Research Article A Self-Organizing Incremental ...downloads.hindawi.com › journals › cin › 2016 › 7158507.pdfResearch Article A Self-Organizing Incremental Spatiotemporal Associative

10 Computational Intelligence and Neuroscience

Set initial memory depth 119896 = 1 119905 = 0All nodes and connections weight do not exist initially in STAMNThe transition instance ℎ0 = 120601A STAMN is constructed incrementally through interaction with the environment by the agentexpanding the SATAMN until the transition prediction is contradicted with the current minimalhypothesis model reconstruct the hypothesis model by increase the memory depth 119896 in thispaper the constructing process includes the identifying and recalling simultaneouslyWhile one pattern 119900119905 or 119886119905 can be activated from the pattern cognition layer do

ℎ119905 = (ℎ119905minus1 119900119905 119886119905)for all existed state nodescompute the identifying activation value by executing Algorithm 12If exist the 119899119904119895(119905) = 1 thenthe looped state is identified and the weight 120596119894119895120596119894119895 is adjusted according to (4) (5)

elsea new state node 119895 will be created set 119899119904119895(119905) = 1 and the weight 120596119894119895120596119894119895 is adjusted according to (4) (5)end ifend forfor all existed state nodescompute the transition prediction criterion by executing Algorithm 13If IsNotTrans(119904119895) then

for the previous winner state node 119894 of the state node 119895set memory depth 119896 = 119896 + 1 until each ℎ119894 are discriminated

end forreconstruct the hypothesis model by the new memory depth 119896 according to Algorithm 11

end ifend forfor all existed action nodescompute the activation value according to (9)

if exist the 119899119886119895 (119905) = 1 thenthe weight 119908119894119895(119905) is adjusted according to (6)else

a new action node 119895 will be created set 119899119886119895 (119905) = 1 and the weight 119908119894119895(119905) is adjusted according to (6)end ifend forThe agent obtains the new observation vectors by 119900119905+1 = trans(119904119905 119886119905)119905 = 119905 + 1endWhile

Pseudocode 1 Algorithm for heuristic constructing the STAMN

IsNotTrans(119904119895)if the 119901119904119895(119905) gt 120599 according to (12) and the 119902119904119895(119905) = 1 according to (13) thenreturn Falseelsereturn Trueend if

Pseudocode 2 Transition prediction criterion algorithm (recalling phase)

Figure 8 shows that three algorithms all can produce thecorrect model with zero prediction error at last because ofno noise However the STAMN with exhaustive explorationhas the better performance and the faster convergence speedbecause of the exhaustive exploration 119876-function

42 Experimental Comparison in 4 lowast 3 Grid Problem Firstwe use the small POMDP problem to test the feasibility of

the STAMN 4 lowast 3 gird problem is selected which is shownin Figure 9 The agent wanders inside the grid and has onlyleft sensor and right sensor to report the existence of a wallin the current position The agent has four actions forwardbackward turn left and turn rightThe reference direction ofthe agent is northward

In the 4 lowast 3 gird problem the hidden state space size|119878| is 11 the action space size |119860| is 4 for each state and

Computational Intelligence and Neuroscience 11

a1

fo(s3) = o1

fo(s2) = o2fo(s1) = o1 s1 s2

s3

a1a1

a1a2

a2

a2

(a)

a1a1

a1

a1

a1

a2

r

o1

o1

o1

o1

o1

o1

o2

o2

o2

(b)

Figure 7 (a) The STAMNmodel (b) The LPST model

0 20 40 60 80 1000

09

08

07

06

05

04

03

02

01

1 A simple deterministic POMDP

Timesteps

Pred

ictio

n er

ror

STAMN (exhaustive exploration)STAMN (random exploration)LPST

Figure 8 Comparative performance results for a simple determin-istic POMDP

the observation space size |119874| is 4 The upper figure in thegrid represents the observation state and the bottom figure inthe grid represents the hidden state The black grid and thewall are regarded as the obstacles We present a comparisonbetween the LPST and the STAMNThe results are shown inFigure 10 Figure 10 shows that the STAMN with exhaustiveexploration still has the better performance and the fasterconvergence speed in the 4 lowast 3 gird problem

The number of state nodes and action nodes in STAMNand LPST is described in Table 2 In STAMN the state node

20

34

01

08

27

13

02

110

09

16

25

Figure 9 4 lowast 3 gird problem

Table 2 The number of state nodes and action nodes in STAMNand LPST

Algorithm The number of statenodes The number of action nodes

LPST 70 50STAMN 11 44

represents the hidden state but in LPST the sate node repre-sents observation value and the hidden state is expressed by119896-step past observations and actions thus more observationnodes and action nodes are repeated created and most of theobservation nodes and action nodes are the same LPST hasthe perfect observation prediction capability and is the sameas the STAMN but is not the state transition model

In STAMN The setting of parameters 120583 120578 120574119904119894 120575119886119894 is veryimportant which determines the depth 119896 of short-term

12 Computational Intelligence and Neuroscience

0 200 400 600 800 1000Timesteps

STAMN (exhaustive exploration)STAMN (random exploration)LPST

0

09

08

07

06

05

04

03

02

01

1

Pred

ictio

n er

ror

4 lowast 3 grid problem

Figure 10 Comparative performance results for 4lowast3 gird problem

memory We assume 120574119904119894 = 120575119886119894 = 09 and 120583 = 120578 = 09 initialrepresenting 119896 = 1 When we need to increase 119896 to 119896 + 1 weonly need to decrease 120583 120578 according to 120578 = 120578minus02 120583 = 120583minus01The other parameters are determined relative easily We setlearning rates 120572 = 120573 = 120574 = 09 and set the confidenceparameters 119862119894119895 = 1|120591119895| 119863119894119895 = 12 representing the sameimportance degree Set constant values 119862 = 5 120599 = 09

43 Experimental Results in Complex Symmetrical Environ-ments The symmetrical environment in this paper is verycomplexwhich is shown in Figure 11(a)The robotwanders inthe environment By following wall behaviour the agent canrecognize the left wall right wall and corridor landmarksThese observation landmarks are different from the observa-tion vector in the previous grid problem every observationlandmark has different duration time For analysis of thefault tolerance and robustness of STAMN we assume thedisturbed environment is shown in Figure 11(b)

The figures in Figure 11(a) represent the hidden statesThe robot has four actions forward backward turn leftand turn right The initial orientation of the robot is shownin Figure 11(a) Since the robot follows the wall in theenvironment the robot has only one optional action in eachstate The reference direction of action is the robot currentorientation Thus the path 2-3-4-5-6 and path 12-13-14-15-16 are identical and the memory depth 119896 = 6 is necessaryto identify hidden states reliably We present a comparisonbetween the STAMNand the LPST in noise-free environmentand disturbed environment The results are shown as inFigures 12(a) and 12(b) Figure 12(b) shows that STAMN hasbetter noise tolerance and robustness than LPST because ofthe neuron synchronized activation mechanism which bearsthe fault and noise in the sequence item duration time andcan realize the reliable recalling However the LPST cannot

realize the correct convergence in reasonable time because ofaccurate matching

5 Discussion

In this section we compare the relatedworkwith the STAMNmodel The related work mainly includes the LPST and theAMN

51 Looping Prediction Suffix Tree (LPST) LPST is construc-ted incrementally by expanding branches until they are iden-tified or become looped using the observable terminationcriterion Given a sufficient history this model can correctlycapture all predictions of identifying histories and can mapthe all infinite identifying histories onto a finite LPST

The STAMNproposed in this paper is similar to the LPSTHowever the STAMN is looping hidden state transitionmodel so in comparison with LPST STAMN have lessstate nodes and action nodes because these nodes in LPSTare based on observation not hidden state FurthermoreSTAMNhas better noise tolerance and robustness than LPSTThe LPST realizes recalling by successive accurate matchingwhich is sensitive to noise and fault The STAMN offers theneuron synchronized activation mechanism to realize recall-ing Even if in noisy and disturbed environment STAMNcan still realize the reliable recalling Finally the algorithm forlearning a LPST is an additional computation model whichis not a distributed computational model The STAMN isa distributed network and uses the synchronized activationmechanism where performance cannot become poor withhistory sequences increasing and scale increasing

52 Associative Memory Networks (AMN) STAMN is pro-posed based on the development of the associative memorynetworks In existing AMN firstly almost all models areunable to handle complex sequence with looped hiddenstate The STAMN realizes the identifying the looped hiddenstate indeed which can be applied to HMM and POMDPproblems Furthermoremost AMNmodels can obtainmem-ory depth determined by experiments The STAMN offers aself-organizing incremental memory depth learningmethodand the memory depth 119896 is variable in different parts ofstate space Finally the existing AMN models general onlyrecord the temporal orders with discrete intervals Δ119905 ratherthan sequence items durationwith continuous-time STAMNexplicitly deals with the duration time of each item

6 Conclusion and Future Research

POMDP is the long-standing difficult problem in the mach-ine learning In this paper SATMN is proposed to identifythe looped hidden state only by the transition instances indeterministic POMDP environment The learned STAMN isseen as a variable depth 119896-Morkov model We proposed theheuristic constructing algorithm for the STAMN which isproved to be sound and complete given sufficient historysequences The STAMN is real self-organizing incrementalunsupervised learning model These transition instances can

Computational Intelligence and Neuroscience 13

2

3

6

7

8

9

10

11

12

54 14

13

21

22

23

24 18

19

20 16

171

15

(a) (b)

Figure 11 (a) Complex symmetrical simulation environment (b) The disturbed environment

0 100 200 300 400 500

A complex symmetrical environment

Timesteps

STAMNLPST

0

09

08

07

06

05

04

03

02

01

1

Pred

ictio

n er

ror

(a)

0 100 200 300 400 500

A disturbed environment

Timesteps

STAMNLPST

0

09

08

07

06

05

04

03

02

01

1Pr

edic

tion

erro

r

(b)

Figure 12 (a) Comparative results for a noise-free environment (b) Comparative results for a disturbed environment

be obtained by the interaction with the environment by theagent and can be obtained by a number of training datathat does not depend on the real agent The STAMN is veryfast robust and accurate We have also shown that STAMNoutperforms some existing hidden state methods in deter-ministic POMDP environment The STAMN can generallybe applied to almost all temporal sequences problem suchas simultaneous localization and mapping problem (SLAM)robot trajectory planning sequential decision making andmusic generation

We believe that the STAMN can serve as a starting pointfor integrating the associative memory networks with thePOMDP problem Further research will be carried out onthe following aspects how to scale our approach to thestochastic case by heuristic statistical test how to incorporatewith the reinforcement learning to produce a new distributed

reinforcement learning model and how it should be appliedto robot SLAM to resolve the practical navigation problem

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this article

References

[1] A K McCallum Reinforcement learning with selective percep-tion and hidden state [PhD thesis] University of RochesterRochester NY USA 1996

[2] MHutter ldquoFeature reinforcement learning part I unstructuredMDPsrdquo Journal of Artificial General Intelligence vol 1 no 1 pp3ndash24 2009

14 Computational Intelligence and Neuroscience

[3] M Daswani P Sunehag andMHutter ldquoFeature reinforcementlearning state of the artrdquo in Proceedings of the Workshops at the28th AAAI Conference on Artificial Intelligence 2014

[4] PNguyen P Sunehag andMHutter ldquoContext treemaximizingreinforcement learningrdquo in Proceedings of the 26th AAAI Con-ference onArtificial Intelligence pp 1075ndash1082 Toronto CanadaJuly 2012

[5] J Veness K S NgMHutterWUther andD Silver ldquoAMonte-Carlo AIXI approximationrdquo Journal of Artificial IntelligenceResearch vol 40 no 1 pp 95ndash142 2011

[6] M P Holmes and C L Isbell Jr ldquoLooping suffix tree-basedinference of partially observable hidden staterdquo in Proceedings ofthe 23rd International Conference on Machine Learning (ICMLrsquo06) pp 409ndash416 ACM Pittsburgh Pa USA 2006

[7] M Daswani P Sunehag andMHutter ldquoFeature reinforcementlearning using looping suffix treesrdquo in Proceedings of the 10thEuropeanWorkshop on Reinforcement Learning (EWRL rsquo12) pp11ndash24 Edinburgh Scotland 2012



Computational Intelligence and Neuroscience 11

Figure 7: (a) The STAMN model. (b) The LPST model. (Hidden states s1, s2, and s3 with observation function fo(s1) = o1, fo(s2) = o2, fo(s3) = o1 and action-labeled transitions a1, a2.)

Figure 8: Comparative performance results for a simple deterministic POMDP (prediction error versus timesteps; STAMN with exhaustive exploration, STAMN with random exploration, and LPST).

the observation space size |O| is 4. The upper figure in each grid cell represents the observation and the bottom figure represents the hidden state. The black grid cells and the walls are regarded as obstacles. We present a comparison between the LPST and the STAMN; the results, shown in Figure 10, indicate that the STAMN with exhaustive exploration again achieves better performance and faster convergence in the 4 × 3 grid problem.
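To make the perceptual aliasing concrete, the following is a minimal sketch of a deterministic POMDP whose observation function maps several hidden states to the same observation. The particular states, actions, and transitions are invented for illustration; they are not the paper's 4 × 3 grid layout.

```python
# Hidden states and a many-to-one observation function: s0 and s2 both
# emit o1, so a single observation cannot identify the hidden state.
fo = {"s0": "o1", "s1": "o2", "s2": "o1", "s3": "o3"}

# Deterministic transitions: (hidden state, action) -> next hidden state.
T = {("s0", "a1"): "s1", ("s1", "a1"): "s2",
     ("s2", "a1"): "s3", ("s3", "a2"): "s0"}

def observe(start, acts):
    """Roll out an action sequence and return the resulting observations."""
    s, obs = start, []
    for a in acts:
        s = T[(s, a)]
        obs.append(fo[s])
    return obs

aliased = sorted(s for s in fo if fo[s] == "o1")
print(aliased)                         # ['s0', 's2']
print(observe("s0", ["a1", "a1"]))     # ['o2', 'o1']
```

Because fo is many-to-one, the aliased states can only be told apart by remembering recent observation-action history, which is exactly the role of the k-step short-term memory.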

The number of state nodes and action nodes in STAMN and LPST is given in Table 2.

Figure 9: The 4 × 3 grid problem.

Table 2: The number of state nodes and action nodes in STAMN and LPST.

Algorithm    Number of state nodes    Number of action nodes
LPST         70                       50
STAMN        11                       44

In STAMN the state node represents a hidden state, whereas in LPST the state node represents an observation value and a hidden state is expressed by the k-step history of past observations and actions; consequently, many observation and action nodes are created repeatedly, and most of them are identical. LPST has the same perfect observation-prediction capability as STAMN, but it is not a state transition model.

In STAMN, the settings of the parameters μ, η, γ_si, and δ_ai are very important, since they determine the depth k of the short-term memory. We assume γ_si = δ_ai = 0.9 and initially set μ = η = 0.9, representing k = 1. When we need to increase k to k + 1, we only need to decrease μ and η according to η = η − 0.2 and μ = μ − 0.1. The other parameters are determined relatively easily: we set the learning rates α = β = γ = 0.9, set the confidence parameters C_ij = 1/|τ_j| and D_ij = 1/2 (representing equal importance), and set the constant values C = 5 and ϑ = 0.9.

Figure 10: Comparative performance results for the 4 × 3 grid problem (prediction error versus timesteps; STAMN with exhaustive exploration, STAMN with random exploration, and LPST).
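The depth schedule above (η decreased by 0.2 and μ by 0.1 per unit increase of k) can be written as a small helper. The clamping at zero is an assumption of this sketch; the paper does not say what happens when the values would go negative.

```python
# Short-term-memory depth schedule: mu = eta = 0.9 corresponds to k = 1,
# and each unit increase of k decreases eta by 0.2 and mu by 0.1.

def depth_parameters(k):
    """Return (mu, eta) giving short-term memory depth k (k >= 1)."""
    mu = 0.9 - 0.1 * (k - 1)
    eta = 0.9 - 0.2 * (k - 1)
    # Clamp at 0 (assumption: the decay parameters cannot go negative).
    return max(mu, 0.0), max(eta, 0.0)

for k in (1, 2, 6):
    print(k, depth_parameters(k))
# k = 1 -> (0.9, 0.9); by k = 6 eta has already bottomed out at 0.0
```

Note that under this linear schedule η reaches zero after five increments, which suggests the schedule is only meant for small k, as in the experiments.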

4.3. Experimental Results in Complex Symmetrical Environments. The symmetrical environment used in this paper, shown in Figure 11(a), is very complex. The robot wanders in the environment. By following a wall-following behaviour, the agent can recognize the left-wall, right-wall, and corridor landmarks. Unlike the observation vectors in the previous grid problem, every observation landmark has a different duration time. To analyse the fault tolerance and robustness of STAMN, we also consider the disturbed environment shown in Figure 11(b).

The figures in Figure 11(a) represent the hidden states. The robot has four actions: forward, backward, turn left, and turn right. The initial orientation of the robot is shown in Figure 11(a). Since the robot follows the wall, it has only one optional action in each state; the reference direction of an action is the robot's current orientation. Thus the paths 2-3-4-5-6 and 12-13-14-15-16 are identical, and a memory depth k = 6 is necessary to identify the hidden states reliably. We present a comparison between the STAMN and the LPST in the noise-free environment and in the disturbed environment; the results are shown in Figures 12(a) and 12(b). Figure 12(b) shows that STAMN has better noise tolerance and robustness than LPST because of the neuron synchronized activation mechanism, which tolerates faults and noise in the sequence-item duration times and can realize reliable recalling. The LPST, however, cannot converge correctly in reasonable time because it relies on exact matching.
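The need for memory depth k = 6 can be illustrated with a small sketch: two histories that share a five-item suffix are separated only by the sixth-oldest item. The landmark strings below are hypothetical stand-ins for the left-wall, right-wall, and corridor observations, not the paper's exact sequences.

```python
def min_disambiguating_depth(hist_a, hist_b):
    """Smallest suffix length at which two histories differ (None if equal)."""
    for k in range(1, max(len(hist_a), len(hist_b)) + 1):
        if hist_a[-k:] != hist_b[-k:]:
            return k
    return None

# Two symmetric wall-following paths that look alike for five steps and
# are distinguished only by the sixth-oldest observation.
path_a = ["corridor", "left", "left", "corridor", "left", "left"]
path_b = ["right",    "left", "left", "corridor", "left", "left"]
print(min_disambiguating_depth(path_a, path_b))  # 6
```

A variable-depth model such as STAMN only needs to grow k to 6 in this symmetric region, while keeping k small elsewhere.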

5. Discussion

In this section, we compare the related work with the STAMN model. The related work mainly includes the LPST and the AMN.

5.1. Looping Prediction Suffix Tree (LPST). The LPST is constructed incrementally by expanding branches until they are identified or become looped, using the observable termination criterion. Given sufficient history, this model can correctly capture all predictions of identifying histories and can map the infinitely many identifying histories onto a finite LPST.

The STAMN proposed in this paper is similar to the LPST. However, the STAMN is a looping hidden state transition model, so in comparison with the LPST it has fewer state nodes and action nodes, because the nodes in the LPST are based on observations rather than hidden states. Furthermore, STAMN has better noise tolerance and robustness than LPST: the LPST realizes recalling by successive exact matching, which is sensitive to noise and faults, whereas the STAMN offers the neuron synchronized activation mechanism, so that even in a noisy and disturbed environment it can still recall reliably. Finally, the algorithm for learning an LPST is an additional computational model rather than a distributed one; the STAMN is a distributed network using the synchronized activation mechanism, whose performance does not degrade as the history sequences lengthen and the scale grows.
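The node-count gap in Table 2 follows from this difference: an observation-based suffix tree may, in the worst case, need a branch per k-step observation-action history, while a hidden-state model stores each hidden state once. A toy count, with invented alphabet sizes:

```python
from itertools import product

# Hypothetical alphabet sizes; the paper's environments differ.
observations, actions, k = ["o1", "o2"], ["a1", "a2"], 3

# Worst case, an observation-indexed suffix tree distinguishes every
# k-step interleaving of observations and actions.
histories = list(product(*([observations, actions] * k)))
print(len(histories))        # (2 * 2) ** 3 = 64 candidate history branches

# A hidden-state transition model needs one node per hidden state.
hidden_states = {"s1", "s2", "s3"}
print(len(hidden_states))    # 3
```

The exponential growth in k is why the suffix-tree representation repeats the same observation and action nodes across many branches, while STAMN's hidden-state nodes stay few.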

5.2. Associative Memory Networks (AMN). STAMN builds on developments in associative memory networks. First, almost all existing AMN models are unable to handle complex sequences with looped hidden states; the STAMN does identify looped hidden states, so it can be applied to HMM and POMDP problems. Furthermore, most AMN models use a memory depth determined by experiments, whereas the STAMN offers a self-organizing incremental memory-depth learning method in which the memory depth k varies across different parts of the state space. Finally, existing AMN models generally record only the temporal order of items at discrete intervals Δt rather than item durations in continuous time; STAMN explicitly deals with the duration time of each item.
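The distinction between discrete-interval ordering and explicit durations can be sketched as follows; the tuple layout and the numbers are assumptions for illustration, not STAMN's actual data structure.

```python
# Discrete-interval AMN storage: only the order of items is kept.
discrete_sequence = ["left", "left", "corridor"]

# Duration-aware storage: each item carries its continuous duration,
# e.g. how long the landmark was observed (seconds, hypothetical values).
duration_sequence = [("left", 1.8), ("left", 2.1), ("corridor", 0.7)]

def total_time(seq):
    """Total duration of a sequence of (item, duration) pairs."""
    return sum(d for _, d in seq)

print(total_time(duration_sequence))  # -> 4.6 up to float rounding
```

A recall mechanism that tolerates small perturbations of these durations, rather than requiring exact matches, is what gives the duration-aware model its robustness in the disturbed environment.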

6. Conclusion and Future Research

POMDP is a long-standing difficult problem in machine learning. In this paper, STAMN is proposed to identify looped hidden states, using only transition instances, in deterministic POMDP environments. The learned STAMN can be seen as a variable-depth k-Markov model. We proposed a heuristic constructing algorithm for the STAMN, which is proved to be sound and complete given sufficient history sequences. The STAMN is a truly self-organizing, incremental, unsupervised learning model. The transition instances can be obtained through the agent's interaction with the environment, or from a body of training data that does not depend on a real agent. The STAMN is very fast, robust, and accurate. We have also shown that STAMN outperforms some existing hidden-state methods in deterministic POMDP environments. The STAMN can in general be applied to almost all temporal-sequence problems, such as simultaneous localization and mapping (SLAM), robot trajectory planning, sequential decision making, and music generation.

Figure 11: (a) Complex symmetrical simulation environment. (b) The disturbed environment.

Figure 12: (a) Comparative results for the noise-free environment. (b) Comparative results for the disturbed environment (prediction error versus timesteps; STAMN and LPST).

We believe that the STAMN can serve as a starting point for integrating associative memory networks with POMDP problems. Further research will address the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests; how to combine it with reinforcement learning to produce a new distributed reinforcement learning model; and how to apply it to robot SLAM to solve practical navigation problems.

Competing Interests

The author declares that there are no competing interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I. Unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao, et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.




Page 13: Research Article A Self-Organizing Incremental ...downloads.hindawi.com › journals › cin › 2016 › 7158507.pdfResearch Article A Self-Organizing Incremental Spatiotemporal Associative

Computational Intelligence and Neuroscience 13

[Figure: two maze diagrams with numbered states; graphical content not recoverable from the extraction.]

Figure 11: (a) Complex symmetrical simulation environment. (b) The disturbed environment.

[Figure: prediction error (0 to 1) versus timesteps (0 to 500), comparing STAMN and LPST in each panel.]

Figure 12: (a) Comparative results for a noise-free environment. (b) Comparative results for a disturbed environment.

be obtained through the agent's interaction with the environment, or from training data that does not depend on a real agent. The STAMN is fast, robust, and accurate. We have also shown that the STAMN outperforms some existing hidden-state methods in deterministic POMDP environments. The STAMN can be applied to almost all temporal-sequence problems, such as simultaneous localization and mapping (SLAM), robot trajectory planning, sequential decision making, and music generation.
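To make the mechanism concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of the two memory mechanisms the STAMN combines: short-term memory realized by neuroactivity decay, and long-term memory realized by connection weights strengthened between successively activated nodes. All class and parameter names here are illustrative assumptions.

```python
# Hypothetical sketch of a STAMN-style node model (illustrative only):
# short-term memory = per-step decay of node activity;
# long-term memory  = connection weights between successive nodes.

class STAMNSketch:
    def __init__(self, decay=0.5, lr=0.1):
        self.decay = decay     # per-step activity decay (short-term memory)
        self.lr = lr           # weight learning rate (long-term memory)
        self.activity = {}     # node -> current activation level
        self.weights = {}      # (prev_node, node) -> connection weight

    def step(self, observation):
        # Short-term memory fades: decay every node's activity.
        for n in self.activity:
            self.activity[n] *= self.decay
        # Long-term memory forms: strengthen weights from recently
        # active nodes to the newly observed node, scaled by activity.
        for n, a in self.activity.items():
            if a > 1e-3 and n != observation:
                key = (n, observation)
                self.weights[key] = self.weights.get(key, 0.0) + self.lr * a
        # Activate the node for the current observation.
        self.activity[observation] = 1.0

    def predict_next(self, node):
        # Recall: the successor with the strongest learned connection.
        cands = {b: w for (a, b), w in self.weights.items() if a == node}
        return max(cands, key=cands.get) if cands else None

net = STAMNSketch()
for obs in ["A", "B", "C", "A", "B", "C"]:
    net.step(obs)
print(net.predict_next("A"))  # strongest learned successor of "A"
```

After two passes over the sequence A, B, C, the weight from A to its immediate successor B dominates the weaker, decay-attenuated weight from A to C, so recall follows the learned temporal order.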

We believe that the STAMN can serve as a starting point for integrating associative memory networks with the POMDP problem. Further research will be carried out on the following aspects: how to scale our approach to the stochastic case by heuristic statistical tests, how to incorporate reinforcement learning to produce a new distributed reinforcement learning model, and how to apply it to robot SLAM to solve practical navigation problems.

Competing Interests

The author declares that there is no conflict of interests regarding the publication of this article.

References

[1] A. K. McCallum, Reinforcement Learning with Selective Perception and Hidden State [Ph.D. thesis], University of Rochester, Rochester, NY, USA, 1996.

[2] M. Hutter, "Feature reinforcement learning: part I: unstructured MDPs," Journal of Artificial General Intelligence, vol. 1, no. 1, pp. 3–24, 2009.

[3] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning: state of the art," in Proceedings of the Workshops at the 28th AAAI Conference on Artificial Intelligence, 2014.

[4] P. Nguyen, P. Sunehag, and M. Hutter, "Context tree maximizing reinforcement learning," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 1075–1082, Toronto, Canada, July 2012.

[5] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, "A Monte-Carlo AIXI approximation," Journal of Artificial Intelligence Research, vol. 40, no. 1, pp. 95–142, 2011.

[6] M. P. Holmes and C. L. Isbell Jr., "Looping suffix tree-based inference of partially observable hidden state," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 409–416, ACM, Pittsburgh, Pa, USA, 2006.

[7] M. Daswani, P. Sunehag, and M. Hutter, "Feature reinforcement learning using looping suffix trees," in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL '12), pp. 11–24, Edinburgh, Scotland, 2012.

[8] M. Daswani, P. Sunehag, and M. Hutter, "Q-learning for history-based reinforcement learning," in Proceedings of the Conference on Machine Learning, pp. 213–228, 2013.

[9] S. Timmer and M. Riedmiller, Reinforcement Learning with History Lists [Ph.D. thesis], University of Osnabrück, Osnabrück, Germany, 2009.

[10] E. Talvitie, "Learning partially observable models using temporally abstract decision trees," in Advances in Neural Information Processing Systems, pp. 818–826, 2012.

[11] M. Mahmud, "Constructing states for reinforcement learning," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 727–734, June 2010.

[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[13] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[14] M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," http://arxiv.org/abs/1507.06527.

[15] K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning," https://arxiv.org/abs/1506.08941.

[16] J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, "Action-conditional video prediction using deep networks in Atari games," http://arxiv.org/abs/1507.08750.

[17] X. Li, L. Li, J. Gao et al., "Recurrent reinforcement learning: a hybrid approach," http://arxiv.org/abs/1509.03044.

[18] D. Wang and B. Yuwono, "Incremental learning of complex temporal patterns," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1465–1481, 1996.

[19] A. Sudo, A. Sato, and O. Hasegawa, "Associative memory for online learning in noisy environments using self-organizing incremental neural network," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 964–972, 2009.

[20] M. U. Keysermann and P. A. Vargas, "Towards autonomous robots via an incremental clustering and associative learning architecture," Cognitive Computation, vol. 7, no. 4, pp. 414–433, 2014.

[21] A. F. R. Araujo and G. D. A. Barreto, "Context in temporal sequence processing: a self-organizing approach and its application to robotics," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 45–57, 2002.

[22] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, "Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study," in Proceedings of the 6th Latin American Robotics Symposium (LARS '09), pp. 1–6, October 2009.

[23] S. Tangruamsub, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Self-organizing incremental associative memory-based robot navigation," IEICE Transactions on Information and Systems, vol. 95, no. 10, pp. 2415–2425, 2012.

[24] V. A. Nguyen, J. A. Starzyk, W.-B. Goh, and D. Jachyra, "Neural network structure for spatio-temporal long-term memory," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 971–983, 2012.

[25] F. Shen, H. Yu, W. Kasai, and O. Hasegawa, "An associative memory system for incremental learning and temporal sequence," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE, Barcelona, Spain, July 2010.

[26] F. Shen, Q. Ouyang, W. Kasai, and O. Hasegawa, "A general associative memory based on self-organizing incremental neural network," Neurocomputing, vol. 104, pp. 57–71, 2013.

[27] B. Khouzam, "Neural networks as cellular computing models for temporal sequence processing," Supélec, 2014.
