
Hidden Markov Model for Control Strategy Learning

Jie Yang, Yangsheng Xu

CMU-RI-TR-94-11

The Robotics Institute Carnegie Mellon University

Pittsburgh, Pennsylvania 15213

May 1994

© 1994 Carnegie Mellon University


Contents

1 Introduction 1

2 Hidden Markov Modeling 2

2.1 Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2.2 Concept and Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3 HMM-Based Controller 4

3.1 Preprocessor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.2 Pattern Evaluation Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.3 Model Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.4 Control Signal Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9


4 Case Studies 10

4.1 Linear System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4.2 Inverted Pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5 Conclusion 15



List of Figures

1 An HMM-based control system . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 5-state left-right HMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 Block diagram of the sun-seeker control system . . . . . . . . . . . . . . . . 11

4 Step responses of the system . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5 Response of HMM-based control system to a square wave input . . . . . . . 13

6 The relation between m and Σ_{i=1}^{m} e²(i) . . . . . . . . . . . . . . . . . 13

7 Inverted pendulum system . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

8 Responses of HMM-based system and the teacher . . . . . . . . . . . . . . . 16


Abstract

This report presents a method for learning a control strategy using the hidden Markov model (HMM), i.e., developing a feedback controller based on HMMs. The HMM is a parametric model for non-stationary pattern recognition and is feasible to characterize a doubly stochastic process involving observable actions and a hidden decision pattern. The control strategy is encoded by HMMs through a training process. The trained models are then employed to control the system. The proposed method has been investigated by simulations of a linear system and an inverted pendulum system. The HMM-based controller provides a novel way to learn a control strategy and to model the human decision making process.



1 Introduction

An intelligent controller has the ability to comprehend, reason, and learn about processes and environments. Because an intelligent control system is complex, analytic methods in control theory are insufficient and inefficient for analyzing and designing such a system. Various methods have been proposed for designing intelligent control systems, such as pattern recognition methods [1, 2], fuzzy control [3, 4], and neurocontrol [5, 6, 7].

A controller maps its input onto an appropriate set of control actions. The correspondence between pattern recognition and control can be considered as the learned response of the control system to known patterns in the input data. The power of fuzzy sets and neural networks lies in their ability to represent the mapping of a controller. It has been proved that any continuous nonlinear mapping can be approximated as exactly as needed with a finite set of fuzzy variables, values, and rules [8]. In a typical neural network learning application, the desired mapping is static. The hidden assumption is that the nonlinear static map generated by the neural network can adequately represent the system's behavior [9]. Some mappings in control systems, however, are non-stationary. Human performance is an example. Human performance is the actions and/or reactions of humans under specified circumstances. Actions reflect the human skill of performing a certain task, and reactions reflect the control strategy toward the environment. A human associates responses with stimuli, actions with scenarios, labels with patterns, and effects with causes. Because both human decision and sensory processes are stochastic, human control actions differ even when inputs are the same. In short, the human control strategy is non-linear, non-deterministic, and non-stationary, and it is necessary to model such a mapping with an appropriate tool.

The hidden Markov model (HMM) is a powerful tool for pattern recognition of non-stationary stochastic processes [10]. An HMM is a doubly stochastic process: the hidden underlying stochastic process can only be observed through another set of stochastic processes. In addition, the HMM is a parametric model that can be optimized with efficient algorithms. HMMs have been successfully used in speech recognition [10, 11, 12, 13]. Recently their effectiveness has been studied in force analysis, task-context segmentation, and extraction of discrete finite-state Markov signals [14, 15, 16]. We have successfully applied HMMs to human action modeling [17, 18]. Action and reaction are two important aspects of human performance, and both play important roles in it. For example, playing tennis is a complicated process involving actions and reactions. A tennis player determines his strategy based on his experience and the immediate information, such as the ball's incoming speed and height. The player's actions will directly influence his reactions; hence, high-quality performance requires both good reactions and good actions. In this report, we discuss the problem of reaction modeling and the application of HMMs to control strategy learning.

The objective of a learning controller is to acquire a control strategy from experience to achieve certain desired goals. In general, an HMM-based controller can be regarded as a means of learning a control strategy from a teacher. The control strategy modeled with HMMs is used for controlling the system. For a given system, the control strategy is partitioned into certain decision patterns, and these patterns are described by corresponding


HMMs. During the training process, the HMMs learn a control strategy by adjusting their parameters. While controlling the system, at each sampling time the controller evaluates the feedback signal using the trained HMMs, and their scores (probabilities) are used to generate the controller output, i.e., the patterns with higher probabilities contribute more to the controller output.

2 Hidden Markov Modeling

2.1 Hidden Markov Model

A hidden Markov model is a collection of finite states connected by transitions. Each state is characterized by two sets of probabilities: a transition probability, and either a discrete output probability distribution or a continuous output probability density function which, given the state, defines the conditional probability of emitting each output symbol from a finite alphabet or a continuous random vector.

An HMM can be defined by:

• A set of states {S}, with an initial state S_I and a final state S_F

• The transition probability matrix A = {a_ij}, where a_ij is the probability of taking the transition from state i to state j

• The output probability matrix B. For a discrete HMM, B = {b_j(O_k)}, where O_k represents a discrete observation symbol. For a continuous HMM, B = {b_j(x)}, where x represents continuous observations of k-dimensional random vectors

In this report, we consider only discrete HMMs. For a discrete HMM, a_ij and b_j(O_k) have the following properties:

a_ij ≥ 0,  b_j(O_k) ≥ 0,  ∀ i, j, k,   (1)

Σ_{j=1}^{N} a_ij = 1,  ∀ i,   (2)

Σ_{k=1}^{M} b_j(O_k) = 1,  ∀ j.   (3)

With the initial state distribution π = {π_i}, the complete parameter set of the HMM can be expressed compactly as

λ = (A, B, π).   (4)

For a detailed description of the theory and computation of HMMs, readers are referred to [10, 12].
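The parameter set λ = (A, B, π) and the stochastic constraints (1) to (3) can be sketched in code. The dimensions, names, and random initialization below are illustrative assumptions, not taken from the report.

```python
import numpy as np

# A minimal sketch of the discrete HMM parameter set lambda = (A, B, pi).
# N states and M observation symbols are chosen arbitrarily for illustration.
N, M = 3, 4
rng = np.random.default_rng(0)

def random_stochastic(rows, cols, rng):
    """Return a row-stochastic matrix: nonnegative rows summing to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

A = random_stochastic(N, N, rng)      # transition probabilities a_ij
B = random_stochastic(N, M, rng)      # output probabilities b_j(O_k)
pi = random_stochastic(1, N, rng)[0]  # initial state distribution

# The stochastic constraints (1)-(3) hold by construction:
assert np.all(A >= 0) and np.all(B >= 0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```

Normalizing each row after random initialization is one simple way to satisfy the constraints; the training section below describes how these parameters are actually estimated from data.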



2.2 Concept and Approach

Pattern recognition can be used for classifying objects or processes of unknown origin into predetermined classes. The problem of pattern recognition is usually characterized by a description and classification of a set of objects or processes. Many control problems can be described as pattern recognition problems. For example, in an on-off room-temperature control system, the task of the controller is to maintain the room temperature based on two patterns. However, controllers usually have to identify an infinite number of patterns, and their outputs are continuous. Therefore, more complex techniques are required for controlling such systems based on a pattern recognition approach.

In a feedback system, the controller input is the error signal and the output is the control signal. The feedback control strategy, in general, is a function of the error signal, its history, and the control signal history. Usually, the feedback and control signals are multi-dimensional. If the control strategy depends only on a finite history of the feedback signal, the critical issue is to determine the decision patterns, i.e., the number of HMMs for characterizing the decision patterns, because no fixed patterns are available for a controller. In order to obtain finite decision patterns, we can partition the control space into finite patterns, model these patterns, and then generate control signals based on these models. We have developed the following technique to implement an HMM-based controller.

1. Record the controller input and output data, which are the feedback signals e(i) and control signals u(i). Assuming u(i) depends only on k samplings of e(i), the correspondence between u(i) and the feedback signals can be established. These data will be used for training the HMMs.

2. Partition the control patterns and quantize the feedback signals into finite levels.

(a) Partition [U_min, U_max] into N patterns:

U = {U_1, U_2, . . . , U_N}.   (5)

(b) At each time i = 1, 2, . . . , M, u(i) belongs to one of the patterns and corresponds to a set of sequences {E(i)}, where

E(i) = {e(i − k + 1), e(i − k + 2), . . . , e(i)}.   (6)

3. Handmark each {E(i)} to the corresponding one of {U_j}, where i = 1, 2, . . . , M and j = 1, 2, . . . , N.

4. Use an HMM to describe each U_j, j = 1, 2, . . . , N, and train the models with the data.

5. Put the trained models into the model bank for controlling the system. At each sampling time, {E(i)} is scored by all trained models to obtain the probabilities that the decision patterns U_j, j = 1, 2, . . . , N, match {E(i)}:

P(U_1), P(U_2), . . . , P(U_N).   (7)



Then P(U_j), j = 1, 2, . . . , N, are sorted in decreasing order and the first m values P(U_j) are employed as weights to compute the controller output:

u = [Σ_{j=1}^{m} P(U_j) u_j] / [Σ_{j=1}^{m} P(U_j)],   (8)

where U_j, j = 1, 2, . . . , m, stand for the sorted patterns and u_j stands for the corresponding control signal values.

In the next section, we discuss how to develop an HMM-based controller in detail.

3 HMM-Based Controller

Figure 1 shows the basic configuration of an HMM-based controller, which consists of four components: a signal preprocessor, a pattern evaluation unit, a model bank, and a control signal generator. The signal preprocessor measures the values of the feedback signals, maps the range of values of the measured signals onto corresponding universes of discourse, and converts the input data into suitable symbols to be used by the HMMs. The pattern evaluation unit estimates the probabilities that the input signals match the models in the model bank. The model bank is the kernel of an HMM-based controller; it contains the trained HMMs, which represent the most likely decision patterns of the controller. The models in the model bank are trained by training examples. The control signal generator generates control signals based on the pattern evaluations and converts the range of values of the control signals into corresponding universes of discourse.

3.1 Preprocessor

The measured feedback signals are preprocessed for appropriate enhancement. First they are filtered to eliminate noise. Then the resulting sample is converted into finite symbols, because we use discrete HMMs for describing the decision patterns. In a multi-input system, the feedback signals are sequential vectors. The vector quantization (VQ) technique [19] is a suitable tool to map real vectors onto finite symbols. A vector quantizer is completely decided by a codebook, which consists of a set of fixed prototype vectors. A description of the VQ process includes: (1) the distortion measure, and (2) the generation of a certain number of prototype vectors. In the implementation, the squared-error distortion measure is used, i.e.,

d(x, x̂) = ‖x − x̂‖² = Σ_{i=1}^{k} (x_i − x̂_i)².   (9)
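Mapping a feedback vector to the nearest codebook entry under the squared-error distortion of Equation (9) can be sketched as below; the codebook values are invented for illustration.

```python
import numpy as np

# Sketch of vector quantization with the squared-error distortion of Eq. (9):
# each feedback vector is mapped to the index of the nearest prototype vector.
def quantize(x, codebook):
    """Return the index of the prototype vector minimizing ||x - c||^2."""
    d = np.sum((codebook - x) ** 2, axis=1)  # squared-error distortion per row
    return int(np.argmin(d))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
assert quantize(np.array([0.9, 1.2]), codebook) == 1
assert quantize(np.array([0.1, -0.1]), codebook) == 0
```

The returned index serves as the discrete observation symbol O_k fed to the HMMs.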



Figure 1: An HMM-based control system

The codebook is generated by the LBG algorithm [20]. The LBG algorithm iteratively splits the training data into 2, 4, . . . partitions, with a centroid for each partition.

For single-input single-output (SISO) systems, we can simply use a scalar quantization technique to map the feedback signal onto finite symbols.

3.2 Pattern Evaluation Unit

Pattern evaluation is the problem of determining the probability that a given HMM λ generates a sequence O = O_1 O_2 · · · O_T. The most straightforward way to calculate the probability of the observation sequence O = O_1 O_2 · · · O_T is to enumerate every possible state sequence of length T. For every fixed state sequence S = s_1 s_2 · · · s_T, based on the assumptions we stated, the probability of the observation sequence O, P(O|S, λ), can be computed as follows:

P(O|S, λ) = b_{s_1}(O_1) b_{s_2}(O_2) · · · b_{s_T}(O_T).   (10)

On the other hand, the probability of a state sequence S can also be written as

P(S|λ) = π_{s_1} a_{s_1 s_2} a_{s_2 s_3} · · · a_{s_{T−1} s_T}.   (11)

The joint probability of O and S, i.e., the probability that O and S occur simultaneously, is simply the product of the above two terms:

P(O, S|λ) = P(O|S, λ) P(S|λ).   (12)

The probability P(O|λ) is the summation of this joint probability over all possible state sequences:

P(O|λ) = Σ_{all S} P(O|S, λ) P(S|λ).   (13)

Note that the computation for the above equation is on the order of O(N^T). We must go through N possible states at every time t = 1, 2, . . . , T. Even for small values of T and N, this computation is expensive. For example, if N = 5 and T = 100, the required computation is on the order of 10^72. To avoid this computational expense, we can use a more efficient algorithm known as the Forward-Backward algorithm [12].

Forward algorithm

The forward variable α_t(i) is defined as

α_t(i) = P(O_1 O_2 · · · O_t, s_t = i | λ).   (14)

This is the probability of the partial observation sequence up to time t and state S_i, which is reached at time t, given the model λ. This probability can be inductively computed by the following steps:

1. Initialization:

α_1(i) = π_i b_i(O_1),  1 ≤ i ≤ N.   (15)

2. Induction:

α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(O_{t+1}),  1 ≤ t ≤ T − 1, 1 ≤ j ≤ N.   (16)

3. Termination:

P(O|λ) = Σ_{i=1}^{N} α_T(i).   (17)

With this algorithm, the computation of α_t(j) is only on the order of O(N²T).
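The forward recursion of Equations (15) to (17) can be sketched directly; the two-state model parameters below are invented for illustration.

```python
import numpy as np

# A sketch of the forward algorithm of Eqs. (15)-(17) for a discrete HMM.
def forward(A, B, pi, obs):
    """Return P(O | lambda) for a discrete observation sequence `obs`."""
    alpha = pi * B[:, obs[0]]          # initialization, Eq. (15)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction, Eq. (16)
    return float(alpha.sum())          # termination, Eq. (17)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
p = forward(A, B, pi, [0, 1, 0])
assert 0.0 < p < 1.0
```

For this toy sequence, summing Equation (12) over all 2³ state sequences gives the same value, but in O(N²T) operations rather than O(N^T).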

Backward algorithm

In a similar way, the backward variable β_t(i) is defined as

β_t(i) = P(O_{t+1} O_{t+2} · · · O_T | s_t = i, λ),   (18)



i.e., the probability of the partial observation sequence from t + 1 to the final observation T, given state i at time t and the model λ. The backward variable is computed in the following manner.

1. Initialization:

β_T(i) = 1,  1 ≤ i ≤ N.   (19)

2. Induction:

β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j),  1 ≤ i ≤ N, t = T − 1, T − 2, . . . , 1.   (20)

3. Termination:

P(O|λ) = Σ_{i=1}^{N} π_i b_i(O_1) β_1(i).   (21)

The computational complexity of β_t(i) is approximately the same as that of α_t(i). Both the Forward and Backward algorithms can be used for computing P(O|λ). However, in the problem of recognition, we need to compute P(λ|O). Using Bayes' formula, P(λ|O) can be obtained by

P(λ|O) = P(O|λ) P(λ) / P(O).   (22)
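The backward recursion of Equations (19) to (21) is a mirror image of the forward pass; on the same toy two-state model as before (parameters invented for illustration), it must yield the same P(O|λ).

```python
import numpy as np

# A sketch of the backward algorithm of Eqs. (19)-(21) for a discrete HMM.
def backward(A, B, pi, obs):
    beta = np.ones(A.shape[0])                       # initialization, Eq. (19)
    for o in reversed(obs[1:]):
        beta = A @ (B[:, o] * beta)                  # induction, Eq. (20)
    return float(np.sum(pi * B[:, obs[0]] * beta))   # termination, Eq. (21)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
p = backward(A, B, pi, [0, 1, 0])
assert 0.0 < p < 1.0
```

Having both α and β is what makes the reestimation formulas of Section 3.3 cheap to evaluate.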

3.3 Model Bank

In order to characterize the decision patterns, each pattern is described by an HMM. Since the feedback signal is a time sequence, the underlying state sequence associated with the model has the property that, as time increases, the state index increases (or stays the same), i.e., the states proceed from left to right. Supposing the control actions depend on the most recent m feedback samples, we can use an n (n < m) state left-right HMM, or so-called Bakis model, to describe the pattern, as shown in Figure 2.

The transition matrix in this case is

        | a_11  a_12   0    ...    0            0         |
        |  0    a_22  a_23  ...    0            0         |
    A = |  .     .     .           .            .         |   (23)
        |  0     0     0    ...  a_{n-1,n-1}  a_{n-1,n}   |
        |  0     0     0    ...    0           a_nn       |



Figure 2: 5-state left-right HMM

Clearly, this model has fewer parameters than an ergodic, or fully connected, HMM. Furthermore, the initial state probabilities have the property

π_1 = 1,  π_i = 0,  i ≠ 1.   (24)

Moreover, the state transition coefficients of state n are specified as

a_nn = 1,  a_ni = 0,  i < n.   (25)

If we convert the feedback signals into p symbols by certain signal processing techniques, the B matrix is an n × p matrix.
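The left-right structure of Equations (23) to (25) can be sketched as follows; the choice that each state may only stay or advance one state, and the stay probability, are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of a left-right (Bakis) transition matrix: each state may
# stay or move one state to the right, and the final state is absorbing (Eq. (25)).
def left_right_A(n, stay=0.5):
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i] = stay           # a_ii: remain in state i
        A[i, i + 1] = 1 - stay   # a_{i,i+1}: advance one state
    A[n - 1, n - 1] = 1.0        # a_nn = 1: absorbing final state
    return A

A = left_right_A(5)
assert np.allclose(A.sum(axis=1), 1.0)  # rows are stochastic
assert np.allclose(A, np.triu(A))       # no right-to-left transitions
assert A[4, 4] == 1.0
```

The upper-triangular zero pattern is exactly what makes this model cheaper to train than an ergodic HMM: zero entries stay zero under the reestimation formulas below.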

Learning is achieved by adjusting the HMM parameters (A, B, π) to maximize the probability of the observation sequence. If the model parameters are known, we can compute the probabilities of an observation produced by the given model parameters and then update the model parameters based on the current probabilities. If the model parameters are unknown, however, no analytic method is available for updating the model parameters.

An iterative algorithm is used to update the model parameters. For any model λ with nonzero parameters, we first define the posterior probability γ_t(i, j) of a transition from state i to state j, given the model and the observation sequence,

γ_t(i, j) = P(s_t = i, s_{t+1} = j | O, λ).   (26)

Similarly, the posterior probability γ_t(i) of being in state i at time t, given the observation sequence and the model, is defined as

γ_t(i) = P(s_t = i | O, λ).   (27)

Page 17: Hidden Markov Model for Control Strategy Learning · For a discrete HMM, B = {hj(Ok)}, where Ok represents a discret,e observation symbol. For a continuous HMM, B = {bJ(x)}? where

The expression Σ_{t=1}^{T−1} γ_t(i) can be interpreted as the expected (over time) number of times that state S_i is visited or, with time slot t = T excluded from the summation, the expected number of transitions made from state S_i. Similarly, the summation of γ_t(i, j) from t = 1 to t = T − 1 can be interpreted as the expected number of transitions from state S_i to state S_j. Using the above formulas and the concept of counting event occurrences, a new model λ̄ = (Ā, B̄, π̄) can be created to improve the old model λ = (A, B, π). A set of reasonable reestimation formulas for π, A, and B is:

π̄_i = γ_1(i),  1 ≤ i ≤ N,   (28)

ā_ij = [Σ_{t=1}^{T−1} γ_t(i, j)] / [Σ_{t=1}^{T−1} γ_t(i)],   (29)

b̄_j(k) = [Σ_{t: O_t = v_k} γ_t(j)] / [Σ_{t=1}^{T} γ_t(j)],  j = 1, 2, . . . , N, k = 1, 2, . . . , M,   (30)

where v_k is the k-th observation symbol.

Equations (28) to (30) are extensions of the Baum-Welch reestimation algorithm [12]. The Baum-Welch algorithm gives the maximum likelihood estimate of the HMM and can be used to obtain the model which describes the most likely performance for a given pattern. It has been proved that either the initial model λ defines a critical point of the likelihood function, where the new estimates equal the old ones, or the model λ̄ is more likely to produce the given observation sequence, i.e., the model λ̄ is more likely than the model λ in the sense that P(O|λ̄) ≥ P(O|λ).

By repeating the above reestimation and replacing λ with λ̄, we ensure that P(O|λ) is improved until a limiting point is reached.
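One reestimation pass of Equations (26) to (30) can be sketched compactly; the toy parameters and observation sequence are invented, and a practical implementation would add scaling to avoid underflow on long sequences.

```python
import numpy as np

# A compact sketch of one Baum-Welch reestimation pass (Eqs. (26)-(30)).
def reestimate(A, B, pi, obs):
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    pO = alpha[T - 1].sum()
    gamma = alpha * beta / pO                   # Eq. (27)
    xi = np.array([np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / pO
                   for t in range(T - 1)])      # Eq. (26)
    new_pi = gamma[0]                           # Eq. (28)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # Eq. (29)
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                 # Eq. (30)
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi

A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
nA, nB, npi = reestimate(A, B, pi, [0, 0, 1, 1, 0])
assert np.allclose(nA.sum(axis=1), 1.0)
assert np.allclose(nB.sum(axis=1), 1.0)
assert np.isclose(npi.sum(), 1.0)
```

Iterating this pass and replacing (A, B, π) with the new estimates yields the monotone likelihood improvement P(O|λ̄) ≥ P(O|λ) stated above.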

3.4 Control Signal Generator

Equation (22) provides a way to evaluate how well the feedback signal matches the HMMs in the model bank. Because P(O) is a constant for a given input, only P(O|λ)P(λ) needs computing. P(λ) is the probability that the control pattern is used. This problem can be solved with a syntactic method [21].

Syntactic methods assume that patterns are composed of subpatterns in the same way that phrases and sentences are built by concatenating words and words are built by concatenating characters. The structure information of the patterns is provided by the so-called "pattern description language." The "grammar" of the pattern description language governs the composition of subpatterns into patterns. Once each subpattern within the pattern has been recognized, the recognition process is accomplished by performing a syntax analysis. The syntax analysis generates a structure description of the sentence representing the given pattern. Based on the syntactic approach, P(λ) is determined by the "grammar." The grammar determines the probability with which one control action is followed by another action. This structure information can introduce control history into the current control output to avoid unexpected control outputs. P(λ) can be learned from the training data or assigned by experience. In the simplest case, if all the control patterns are equally likely to be used, only the term P(O|λ) is a variable. In this case, Equation (8) can be rewritten as:

u = [Σ_{j=1}^{m} P(O|λ_j) u_j] / [Σ_{j=1}^{m} P(O|λ_j)].   (31)

Since only a single value u_j, j = 1, 2, . . . , m, is used for each pattern, there is a need to compute this single value. One way to do this is to simply take the center point of the pattern in the control space. An alternative method is to use the mean value of all the training data belonging to the pattern. We have tried both methods and found that both of them were effective.

4 Case Studies

To evaluate the validity and effectiveness of the proposed method, we have carried out two case studies. We first examined the HMM-based controller for a sun-seeker control system. The system is mounted on a space vehicle to track the sun. We then applied the HMM learning controller to a benchmark problem in intelligent control: inverted pendulum balancing. The problem is challenging because of its nonlinear, unstable behavior.

Conceptually, if we can obtain good results with these systems, we have sufficient reasons to expect the HMM controller to work well for other systems, because the only difference lies in the data measurement and processing. We performed simulations to examine the feasibility of the method for handling well-defined problems and to reveal various important issues such as parameterization and stability.

4.1 Linear System

The system is shown in Figure 3, and the transfer function of the plant is:

G_p(s) = 2500 / [s(s + 25)].   (32)



Figure 3: Block diagram of the sun-seeker control system

This is a second-order linear system. To obtain the training data, we employed a phase-lead controller as the "teacher." The controller transfer function is:

G_c(s) = (1 + 0.0342s) / (1 + 0.00588s).   (33)

The noise n(t) is Gaussian white noise with zero mean and variance 0.01, i.e.,

E{n(t)} = 0,  E{n²(t)} = 0.01.

The input/output data of the controller were recorded while the phase-lead controller controlled the system. We assumed that the decision patterns are related to ten consecutive samples of the error signal. This means that ten samples of the error signal are used for decision making. Because this is an SISO system, we can use a scalar quantization technique to convert the error signal into finite symbols. The feedback signal was quantized into 256 levels and the controller output was based on 26 decision patterns. Therefore, there is a total of 26 HMMs in the model bank and 256 symbols in the output probability distribution functions of each discrete HMM. Each input sampling corresponded to one of the 256 symbols, and each sequence of 10 symbols was associated with an HMM. To describe the patterns, we used five-state left-right HMMs. Letting n = 5, we can obtain the form of the transition matrix A, the initial state probabilities, and the state transition coefficients of state 5 from equations (23) to (25). The observation matrix B is a 256 × 5 matrix where each column represents the observation probability distribution for one state.

The training data were obtained by providing a unit square wave with a changeable width as the reference input to the system (see Figure 3). After collecting 9000 samples of controller input/output data, we partitioned each control command into one of the 26 patterns and converted each feedback sampling into one of the 256 symbols. Then we handmarked each of the 26 patterns with the corresponding feedback symbols for training. To initialize the model parameters, we set the output probabilities to 1/256, where 256 is the quantization level. The transition probabilities were initialized by uniformly distributed random numbers. With these initial parameters, the Forward-Backward algorithm was run recursively on the training data.
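A uniform scalar quantizer mapping the error signal onto 256 symbols, as used for this SISO example, can be sketched as follows; the signal range [-1, 1] is an assumption for illustration, not stated in the report.

```python
import numpy as np

# Illustrative uniform scalar quantizer: maps a scalar feedback value onto one
# of 256 symbols. The range [e_min, e_max] is an assumed normalization.
def quantize_scalar(e, e_min=-1.0, e_max=1.0, levels=256):
    """Map a scalar feedback value to a symbol index in [0, levels - 1]."""
    e = np.clip(e, e_min, e_max)
    idx = (e - e_min) / (e_max - e_min) * (levels - 1)
    return int(round(idx))

assert quantize_scalar(-1.0) == 0
assert quantize_scalar(1.0) == 255
assert 0 <= quantize_scalar(0.37) <= 255
```

Each quantized index is one discrete observation symbol, and ten consecutive symbols form the sequence scored by the 26 trained HMMs.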




Figure 4: Step responses of the system

The Baum-Welch algorithm was used iteratively to reestimate the parameters based on the forward and backward variables. After each iteration, the output probability distributions were smoothed using a floor between 0.0001 and 0.00001 and renormalized to meet the stochastic constraints. Twelve iterations were run for the training processes.

The trained HMMs were then put into the model bank and the HMM-based controller was employed to control the system. Figure 4 shows a comparison of step responses between the phase-lead controlled system and the HMM-based control system. We notice that the overshoot of the HMM-based system is less than that of the phase-lead controlled system, but the steady-state error of the HMM-based system is larger. Figure 5 shows the tracking performance of the HMM-based controller. The HMM-based control system tracks a square wave reference input well. In all simulations, we added Gaussian white noise with zero mean and variance 0.01. These results demonstrate that HMMs are able to learn a control strategy and to control systems.

In the HMM-based controller, the probability of an HMM matching the feedback signal pattern determines the contribution of that HMM to the controller output, i.e., the probability is a weight. However, there is no criterion for selecting m in Equation (8). The relation between m and the summation of square error for the system discussed in this report is shown in Figure 6. The summation of square error is a minimum when m = 8 and remains unchanged when m ≥ 13. This implies that, for computing control outputs, a large number of patterns does not guarantee better performance.

4.2 Inverted Pendulum

The inverted pendulum system shown in Figure 7 consists of an inverted pendulum of length L and mass m and a cart of mass M. The pivot of the pendulum is mounted on a cart,


Figure 6: The relation between m and Σ_{i=1}^{m} e²(i)



Figure 7: Inverted pendulum system

which can move in a horizontal direction. The cart is controlled by the horizontal force u, the input to the system.

We assume that the pendulum is a narrow, uniform rod and that there is no friction at the pivot and no actuator dynamics. The inverted pendulum system was simulated using the following equations of motion:

θ̈ = [3/(4L)] (g sin θ − ẍ cos θ),   (34)

ẍ = [u − f ẋ + mL(θ̇² sin θ − (3/8) g sin 2θ)] / [M + m(1 − (3/4) cos² θ)],   (35)

where M = 1 (kg), m = 0.1 (kg), L = 1 (m), f = 5 (kg/s), and g = 9.81 (m/s²).

This system was simulated numerically using Euler's approximation method, θ[k+1] = θ[k] + T θ̇[k], with a time step T = 0.02 second. The sampling rate of the inverted pendulum's states and the rate at which control forces are applied are the same as the simulation rate, i.e., 50 Hz. Our control goal here was to balance the pendulum by applying a sequence of right and left forces regardless of the cart's position and velocity. We assumed the pendulum angle θ to be measurable. To obtain the training data, we employed the nonlinear control law
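The Euler simulation can be sketched as below. This is a minimal illustration, assuming the standard uniform-rod cart-pendulum form of the equations of motion with the parameter values quoted above; the control force `u` is supplied by the caller (e.g., the teacher law).

```python
import math

# Parameters from the report: cart mass M, rod mass m, length L,
# friction coefficient f, gravity g, Euler time step T (50 Hz).
M, m, L, f, g, T = 1.0, 0.1, 1.0, 5.0, 9.81, 0.02

def step(state, u):
    """One Euler step of the cart / inverted-pendulum dynamics.

    state = (x, x_dot, theta, theta_dot), u = horizontal force on cart.
    """
    x, xd, th, thd = state
    # Cart acceleration from the equation of motion for x.
    xdd = (u - f * xd
           + m * (L * thd**2 * math.sin(th) - 0.375 * g * math.sin(2 * th))) \
          / (M + m * (1 - 0.75 * math.cos(th)**2))
    # Pendulum angular acceleration: theta_dd = 3/(4L)(g sin th - xdd cos th).
    thdd = 3.0 / (4.0 * L) * (g * math.sin(th) - xdd * math.cos(th))
    # Euler update, e.g. theta[k+1] = theta[k] + T * theta_dot[k].
    return (x + T * xd, xd + T * xdd, th + T * thd, thd + T * thdd)
```

Starting exactly upright with no force, the state stays at the (unstable) equilibrium; a positive force accelerates the cart and tips the rod backward, which is the qualitative behavior the 50 Hz left/right force sequence must counteract.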

a_1 = \frac{3g}{4L}\sin\theta, \qquad (38)

a_2 = \frac{3}{4L}\cos\theta, \qquad (39)

so that \ddot{\theta} = a_1 - a_2\ddot{x}, and the force

u = \frac{\left[ M + m\left( 1 - \frac{3}{4}\cos^{2}\theta \right) \right]\left( a_1 + k_1\theta + k_2\dot{\theta} \right)}{a_2} - m\left( L\dot{\theta}^{2}\sin\theta - \frac{3}{8}\,g\sin 2\theta \right) + f\dot{x}

yields the linear closed-loop error dynamics \ddot{\theta} = -k_1\theta - k_2\dot{\theta},



where k₁ = 25 and k₂ = 10.

In order to obtain training data, 80 iterations of simulation were conducted with the above control law and initial state [x, ẋ, θ, θ̇]ᵀ = [1.0, 0.0, 5.0, 0.0]ᵀ. Noise with zero mean and 0.01 variance was added to the initial value of θ to avoid identical data for all iterations. The values of θ and u were recorded for training the HMMs. The feedback signal θ was quantized into 256 levels and partitioned into 40 decision patterns. Therefore, a total of 40 HMMs, each of which contains 6 states, was employed to encode the decision patterns. Each HMM was trained on the corresponding data. Eight sequential samples of the feedback signal were evaluated each time to determine control patterns. Figure 8 compares the response of the HMM-based control system and the ideal response of the nonlinear control system.
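The preparation of discrete training sequences described above can be sketched as follows. This is a hypothetical illustration: the quantizer range `lo`/`hi` and the helper names are assumptions, not the report's code; only the 256-level quantization and 8-sample evaluation windows come from the text.

```python
def quantize(signal, lo, hi, levels=256):
    """Map each continuous feedback sample to one of `levels` symbols."""
    out = []
    for v in signal:
        # Clamp into [lo, hi], then scale to an integer symbol index.
        v = min(max(v, lo), hi)
        out.append(int((v - lo) / (hi - lo) * (levels - 1)))
    return out

def windows(symbols, length=8):
    """Split a symbol stream into the 8-sample observation sequences
    that are evaluated to determine a control pattern."""
    return [symbols[i:i + length] for i in range(len(symbols) - length + 1)]
```

Each 8-symbol window would then be scored against the 40 trained 6-state HMMs to pick the matching decision patterns.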

5 Conclusion

In many real-world practices, the control strategy cannot be explicitly expressed as a function, but can be given by examples. Learning from examples is a feasible way to develop a controller automatically. If the control strategy has uncertainties, the learning method must have a mechanism to cope with its stochastic nature. The HMM is a powerful parametric model and is well suited to characterizing a doubly stochastic process. This report presents a learning controller based on HMMs to characterize and learn decision patterns. We have developed a technique to build an HMM-based learning controller, and we demonstrated the feasibility of the proposed method by simulation studies. We investigated two cases: a linear system and an inverted pendulum system.

Although we have examined only cases with well-known controllers as teachers, the teacher can be any type of controller, including control strategies used by a human. For a MIMO system, it is possible to use a multi-dimensional HMM to encode the control strategy. A multi-dimensional HMM is an HMM which has more than one observable symbol at each time t and is capable of dealing with multiple feedback signals. A multi-dimensional HMM is also appropriate for fusing different sensory signals with different physical meanings. However, to use HMMs for a MIMO system, a more efficient method must be developed to generate multiple control outputs. A possible way is to use the VQ technique for partitioning the control space and then to use the same method as for an SISO system for generating each component of the control vector. Future research on HMM-based strategy modeling lies in both theory and practice. Because an HMM-based controller is nonlinear, it is difficult to analyze the HMM-based system; a theory is needed to analyze the stability and performance of such systems. In practice, more realistic problems should be investigated, which leads to the development of new methods for designing HMM-based controllers. One short-term extension is to develop a systematic procedure for optimally partitioning the control space.
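The suggested VQ partitioning of the control space can be sketched with a simple LBG/k-means-style codebook refinement. This is a hypothetical one-dimensional illustration, not the procedure proposed in the report; the codebook size and initialization are assumptions.

```python
def vq_codebook(samples, k, iters=20):
    """Crude 1-D vector quantizer: alternate nearest-neighbor partition
    and centroid update, in the spirit of the LBG algorithm."""
    lo, hi = min(samples), max(samples)
    # Initialize codewords uniformly over the sample range.
    code = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        # Partition step: assign each sample to its nearest codeword.
        cells = [[] for _ in range(k)]
        for s in samples:
            j = min(range(k), key=lambda i: abs(s - code[i]))
            cells[j].append(s)
        # Update step: move each codeword to the centroid of its cell,
        # keeping the old codeword when a cell is empty.
        code = [sum(c) / len(c) if c else code[i]
                for i, c in enumerate(cells)]
    return code
```

Each resulting codeword would label one cell of the control space, so that each component of the control vector can be generated exactly as in the SISO case.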



Figure 8: Responses of the HMM-based system and the teacher (time axis in seconds)



Acknowledgement

The authors would like to thank Dr. T. Kanade and Dr. C.S. Chen for their valuable suggestions and discussions.

References

[I] G . Saridis, "Application of pa.ttern recognition methods to control systems," IEEE Trans. on Automatic Con.trol, Vol. AC-26, No.3, pp. 638-645, 1981.

[ 2 ] K.S. Fii? "Thr theoretical background of recognition as applicable to iridostrial control," Learning System and Pattern. Recognition in Industrial Control (Pmceedin.ys of th,r ihhth. Annud Adoanced Con.lrol Conference): pp. 29-40, 1383.

[3] C.C. Lee, "Fuzzy logic in control systems: fuzzy logic controller ~~ part I:" IEEE Tmns. on. system.^. Mu.TI., an.d Cybernetics, Vol. 20! No. 2, pp. 404-418. 1990.

[ 3 ] C.C. Lee, "Fuzzy logic in control systems: fuzzy logic controller, part 11:' IKEE 7i-ans. on Syslern.s, M m , a,n,d Cybermtics, Vol. 20, No. 2, pp. 419-435, 1990.

[5] F'anos J . h t sak l i s , Guest Editor, Special issue 011 neural networks in control systerns, IEEE Con.tro[ System. Magagazine: Vol.10, No. 3 , pp. 3-87, 1930.

[6] W.'I.'. Miller, R. S. Sutton and P.I. Werbos, Edihrs, '' Neural network for c.ont,rol," ;\UT prcss, 1990.

[TI Panos J. Antsaklis, Guest E.ditor, Special issue on neural networks in control systems, IEEE Cont~-ol Systern. Mqazine, Vo1.12, No. 2: pp. 8-57, 1992.

[SI B. Kosko, "Neural Xetwork and Fuzzy Systems,'' Englewood Cliffs, NJ, Prent,ice Hall, 1992.

[Y] Panos J. .4ntsaklis, "Neural net,work in control systems: IEEE Con.trol System. M a g o - zinc: V01.12~ No. 2, pp. 8-10, 1992.

[ lo] L.R. Kabincr, "A Tutorial on hidden htlarkov models and select,ed applications in sprech recognilion." Proceedings or the fEEE. Vol. 77, No. 2, pp.257-286, 1989.

1111 K.F. Lee, H.W. Hon and R. Reddy. '.An overview of the SPIlINX speech recognition system," IEEE h n s . on ASSP, Vol, 38, Yo. 1, pp. 35-45, 1990.

(121 X.D. Huang, 1'. Arih-i and 16. A. .Jack, "Hidden Markov Models for Spcech Recognition,'' Ldinbwqh. C7nivemity PJY~S? 1990.

[13] X.D. Huang, "Phoneme classification using semicont,iniious hidden Markov models," IFEE Trans. on ASSP, Vol. 40, No. 5 . pp. 1062-1067! 1932.

17

Page 26: Hidden Markov Model for Control Strategy Learning · For a discrete HMM, B = {hj(Ok)}, where Ok represents a discret,e observation symbol. For a continuous HMM, B = {bJ(x)}? where

114) B. Ha,rinaford, P. Lee, “Hidden Markov model analysis of force/ torque information i n tekmanipulation,? Th.e Interna.tional Journa.l vf Robotics Research, Vol. 10, No. 5> pp.52Y-539, 1991.

(1.51 E. I<. Puok and D. Ballard, “Recognizing teleoperated manipulations,” ProcePdin.gs of 1.993 IEEE I n k r . Cunf, on Robdics and Automation? .4tlanta, Georgia, USA: Vol. Z1 pp.578-58.5> 1993.

[16] V. Krishnamurthyl J.B. Moorel and S.-II. Chung, “Hidden Markov niodel signal process- ing in prescnce of unknown deterministic interferences,” IEEE Trans. kutom. Contml, Vo1.38, No.1, pp. 146.152, 1993.

[17] J. Yang. Y . Xu and C.S. Chen, “Hidden Markov model approach to skill learning and its application in telerobotics,” Plvceedings of 1993 IEEE Internativn.aE C h j w e n c r OTJ.

Robotics and Aulontation, pp. 396-402: 1993.

[18] J. Yang, Y . Xu, and C.S. Chen, Gesture interface: modeling and learning, Procnedings of 1994 IEEE International Conjerence on Robotics and Au.tvm.ation, May 8-13> 1!)94.

[I91 R.M. Gray, “Vector quantization,” IEEE ASSP Magazinc, Vol. 1: Xo.2: pp. 4-29, 1981.

1201 Y. Linde, A. Buzo, and R.M. Gray; An algorithm for vector quantizer dcsign:” IEEE Trans. o’n. Conimun.? Vol. COM-28 pp.84-95, 1980.

1211 E<. S. Fu, “Syntactic Methods in Pattern Remgnition:” I.’olu.me I 1 2 of Mathrumtics in Science and Engineering, New York: Academic Press, 1974.

[Z] LE. Baum, T. Petrie, G. Soules, and N,Weiss, ‘‘ A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chins,’’ Ann. Math.. Stat.: Vol. 41, No. 1, pp. 164-171, 1970.

[23] A . Gucz and J. Selinsky. “A trainable neuromorphic controller?” Journal of Robotic .yystems> voib1.5, No.4, pp. 363-388, 1988.