
Emotional Control of Inverted Pendulum System
A soft switching from Imitative to emotional learning

Mehrsan Javan-Roshtkhari, Arash Arami and Caro Lucas

Control and Intelligent Processing Center of Excellence, School of ECE, University of Tehran, Tehran, Iran

[email protected] , [email protected] , [email protected]  

Abstract: Model-free control of unidentified systems with unstable equilibria raises serious problems. To overcome these difficulties, an existing model-based controller is first used as a mentor for an emotional-learning controller. This learning phase prepares the controller to behave like the mentor while preventing any instability. Next, control is softly switched from the model-based controller to the emotional one using a FIS¹. At the same time, the emotional stress is softly switched from the mentor-imitator output difference to a combination of objectives generated by a FIS that attentionally modulates the stresses. To evaluate the proposed model-free controller, a laboratory inverted pendulum² is employed.

Keywords: BELBIC, imitative learning, fuzzy inference system, model-free control, inverted pendulum system

I. INTRODUCTION

Imitation is a powerful mechanism for knowledge sharing, particularly among intelligent agents. Imitative learning is an approach in artificial intelligence for transferring knowledge from an expert agent to another agent without any direct brain-to-brain transfer [1]. Imitation accelerates learning and improves the performance of the learner. In learning by imitation, an agent that wants to learn (the imitator) tries to achieve the same result that the mentor has achieved. This differs from mimicking, because the imitator does not copy the mentor's actions; it only performs actions that lead to the same results in the environment. Imitative learning is employed in many tasks in which an expert agent already exists and a new agent must learn the proper action for the task, and it is widely used in robotics [2-5].

Developing new algorithms based on biological systems is an active area of research. The most important aspect of any intelligent system is its capability to learn. Although learning can be carried out in various ways, its main aim is to adapt parameters so as to improve the performance of the system and cope with changes in the environment [6]. Because the emotional behavior of humans and other animals is an important part of their intelligence, modeling this process can yield intelligent systems with fast learning ability [7]. Although evolution codes emotional reactions in animals, mammals can also learn them very quickly. In biological systems, emotional reactions are used for fast decision making in complex environments or emergency situations. The part of the mammalian brain mainly responsible for emotional processes is called the limbic system. Several attempts have been made to model the limbic system [8, 9]. Computational models of the Amygdala and the Orbitofrontal cortex, the main parts of the limbic system, were first introduced in [10]. Building on [10], the brain emotional learning based intelligent controller (BELBIC) was introduced in [11]. The fast learning ability of BELBIC makes it a powerful model-free controller for many tasks. BELBIC has been applied to several problems, such as the control of intelligent washing machines [12, 13]; in [14], a modified version of BELBIC is employed to control heating, ventilating and air conditioning (HVAC) systems. BELBIC has also been used for time series prediction [15] and sensor-data fusion [16]. A real-time implementation of BELBIC for interior permanent magnet synchronous motor (IPMSM) drives was first introduced in [17]. There, the controller was implemented in real time on a digital signal processor board for a laboratory 1-hp IPMSM, and the results show fast response, simple implementation, robustness with respect to uncertainties such as manufacturing imperfections, and good disturbance rejection.

¹ Fuzzy Inference Systems

² The Digital Pendulum Control System, crane system, manufactured by Feedback Instruments Limited, England

Another real-time implementation of BELBIC, for position tracking and swing damping of a laboratory overhead crane under computer control via MATLAB external mode, is described in [18]. The stability of the brain emotional learning (BEL) system on which BELBIC is based was discussed in [19], which also gives a general guideline for choosing the control parameters so as to ensure stability. In addition, nonlinear combinations of objectives have been used to design the emotional stresses of BELBIC for controlling an overhead crane under uncertainties and disturbances [20].

The main drawback of model-free controllers with learning ability, such as reinforcement-learning-based controllers and BELBIC, which start without any prior knowledge of the system's dynamics, is that in the early stages of learning they may produce wrong control signals and hence poor performance. In some cases this preliminary learning phase can even result in instability. If no instability occurs during this first period, the controller can gradually learn the proper control signals and improve its performance. Although BELBIC learns fast, it suffers from the same problem, only over a shorter period. If the system is inherently unstable, applying these controllers may make it unstable, and the process must be stopped to prevent damage. BELBIC therefore cannot simply be applied to such systems. To solve this problem, another approach is introduced here.

Proceedings of the 4th International Conference on Autonomous Robots and Agents, Feb 10-12, 2009, Wellington, New Zealand

978-1-4244-2713-0/09/$25.00 ©2009 IEEE 651

In this paper, BELBIC is employed to control an inverted pendulum. The pendulum angle is very sensitive to the control signal, and any wrong change in the control signal makes the system oscillate and the pendulum fall. Consequently, if BELBIC is used as a completely model-free controller that learns from scratch on its own, the learning phase becomes infeasible and the pendulum falls many times. To solve this problem and accelerate learning, a new approach is used. First, BELBIC learns imitatively from a simple classical controller of the system. This classical controller can be a simple one that only stabilizes the system, regardless of performance and robustness. The output of BELBIC is then gradually applied to the system until it replaces the initial controller. The key step in switching between the controllers is changing the emotional signal of BELBIC to reflect the change in objective: while BELBIC learns to imitate the initial controller, the objective is to reduce the difference between the two controllers' outputs, and once BELBIC replaces the initial controller, the objective becomes the reduction of the tracking and angle errors.

This paper is organized as follows: Section II briefly introduces BELBIC, Section III describes the proposed controller, Section IV discusses the simulation results, and Section V concludes the paper.

II. BELBIC

The BELBIC structure is a simple computational model of the most important parts of the limbic system of the brain, the Amygdala and the Orbitofrontal cortex. Fig. 1 shows the schematic diagram of the BELBIC structure; each of its parts is described briefly below [10].

As depicted in Fig. 1, the system consists of four main parts. Sensory input signals first enter the Thalamus, a simple model of the real thalamus in the brain, where some simple pre-processing of the sensory input signals is done. After pre-processing in the Thalamus, the signal is sent to the Amygdala and the Sensory Cortex. The Sensory Cortex is responsible for subdividing and discriminating the coarse output of the Thalamus, and it then sends the result to the Amygdala and the Orbitofrontal Cortex. The Amygdala is a small structure in the medial temporal lobe of the brain that is thought to be responsible for the emotional evaluation of stimuli. This evaluation is in turn used as the basis of emotional states and emotional reactions, and to signal attention and lay down long-term memories. The last part, the Orbitofrontal Cortex, is supposed to inhibit inappropriate responses of the Amygdala, based on the context given by the hippocampus [10]. In this section, the description of these parts and of the learning algorithm follows [10].

Figure 1. Structure of BELBIC [10]

Because the Thalamus must provide a fast response to stimuli, in this model the maximum signal over all sensory inputs S_i is sent directly to the Amygdala as an additional input (Eq. 1). Unlike the other inputs of the Amygdala, this thalamic input is not projected to the Orbitofrontal cortex, so it cannot by itself be inhibited.

$$S_{th} = \max_i \left( S_i \right) \qquad (1)$$

In the Amygdala, each A node has a plastic connection weight V_i. The sensory input is multiplied by this weight to form the output of the node:

$$A_i = S_i V_i \qquad (2)$$

In the Orbitofrontal cortex, each O node is similar to an A node, and its output is calculated by applying the connection weight W_i to the input signal:

$$O_i = S_i W_i \qquad (3)$$

The model output is computed as follows:

$$E = \sum_i A_i - \sum_i O_i \qquad (4)$$

where the A nodes produce their outputs in proportion to their contribution in predicting the stress, while the O nodes inhibit the output E when necessary.

As observed in Fig. 1, apart from the thalamic signal going directly to the Amygdala, the Amygdala and the Orbitofrontal Cortex receive the same input signals. The main difference between them lies in their learning rules.

The connection weights V_i are adjusted in proportion to the difference between the reinforcement signal and the activation of the A nodes, where α is a constant that sets the learning speed:

$$\Delta V_i = \alpha \, S_i \max\Big(0,\; \mathit{stress} - \sum_j A_j\Big) \qquad (5)$$

As mentioned before, the task of the Amygdala is to learn the associations between the sensory input and the emotional input in order to generate an output. Eq. (5) differs from similar associative learning systems, however, because this weight adaptation rule is monotonic, i.e., the weights V cannot decrease. At first this may seem a drawback of the learning rule, but it has a biological rationale: according to what occurs in the Amygdala, once an emotional reaction is learned it should become permanent. Inappropriate reactions of the Amygdala are instead inhibited by the Orbitofrontal cortex.
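The model above can be summarized as a minimal Python sketch, assuming scalar sensory inputs S_i and a scalar stress signal. It implements the thalamic maximum of Eq. (1), the node outputs of Eqs. (2) to (4), the monotonic Amygdala rule of Eq. (5), and, ahead of its formal statement in the next paragraph, the Orbitofrontal rule ΔW_i = β S_i (E − stress) of Eq. (6). The class name BELBIC, the default learning rates and the zero initial weights are illustrative assumptions, not values taken from this paper.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class BELBIC:
    """Sketch of one BEL unit following Eqs. (1)-(6).

    n_inputs: number of sensory signals S_i (e.g. an error and its derivative).
    alpha, beta: Amygdala and Orbitofrontal learning rates (hypothetical defaults).
    """
    n_inputs: int
    alpha: float = 0.1
    beta: float = 0.05
    V: List[float] = field(default_factory=list)  # Amygdala weights (last one: thalamic input)
    W: List[float] = field(default_factory=list)  # Orbitofrontal weights

    def __post_init__(self) -> None:
        if not self.V:
            self.V = [0.0] * (self.n_inputs + 1)
        if not self.W:
            self.W = [0.0] * self.n_inputs

    @staticmethod
    def _amygdala_inputs(S: Sequence[float]) -> List[float]:
        # Eq. (1): S_th = max_i S_i is appended; it feeds the Amygdala only
        return list(S) + [max(S)]

    def output(self, S: Sequence[float]) -> Tuple[float, List[float]]:
        A = [s * v for s, v in zip(self._amygdala_inputs(S), self.V)]  # Eq. (2)
        O = [s * w for s, w in zip(S, self.W)]                         # Eq. (3)
        return sum(A) - sum(O), A                                      # Eq. (4)

    def learn(self, S: Sequence[float], stress: float) -> float:
        """Compute E for input S, then update V and W; returns E before the update."""
        E, A = self.output(S)
        # Eq. (5): max(0, .) is never negative, so for non-negative S_i the V weights only grow
        gap = max(0.0, stress - sum(A))
        self.V = [v + self.alpha * s * gap
                  for v, s in zip(self.V, self._amygdala_inputs(S))]
        # Eq. (6): Orbitofrontal weights move up or down to track the mismatch E - stress
        self.W = [w + self.beta * s * (E - stress) for w, s in zip(self.W, S)]
        return E


unit = BELBIC(n_inputs=2)
first_E = unit.learn([1.0, 0.5], stress=1.0)
```

With two sensory inputs [1.0, 0.5], zero initial weights and stress = 1.0, the first call returns E = 0 and leaves V = [0.1, 0.05, 0.1] (the last entry belongs to the thalamic input) and W = [-0.05, -0.025]. Keeping the thalamic weight separate from W mirrors the statement that the thalamic input is not projected to the Orbitofrontal cortex.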

The Orbitofrontal cortex learning rule is very similar to the Amygdala rule:

$$\Delta W_i = \beta \, S_i \left( E - \mathit{stress} \right) \qquad (6)$$

The reinforcement signal for the O nodes is defined as the difference between the model output E and the stress signal. In other words, the O nodes compare the expected and the received reinforcement signals and inhibit the model output if there is a mismatch. The main difference between the adaptation rules of the Orbitofrontal cortex and the Amygdala is that the Orbitofrontal connection weights can be both increased and decreased as needed to track the required inhibition of the Amygdala. The parameter β is another learning-rate constant.

As discussed, BELBIC learns from its emotional signal and produces its output based on the sensory inputs and the connection weights. In [19], the stability of BELBIC is demonstrated using the cell-to-cell mapping method.

III. CONTROLLER DESIGN

As mentioned before, the test bed for evaluating the controller performance is an inverted pendulum, a well-known single-input multiple-output (SIMO) system. Controlling the inverted pendulum is a hard and interesting control task: the controller must track a reference signal with the cart while stabilizing the pendulum. The system used to evaluate the controller is a nonlinear model of a laboratory inverted pendulum system provided by Feedback Ltd.

Because of the nonlinear state equations of the system and the nonlinear properties of the driving motor and friction, designing a model-based controller is a hard task. We therefore used BELBIC as a model-free controller for the inverted pendulum. The main challenge in using BELBIC as a model-free controller for unstable systems, or for stable systems with an unstable equilibrium point such as our test bed, is the learning phase at the beginning.

Since BELBIC has no information about the system's dynamics, the performance of the controlled system may be very poor at the beginning of learning, and the pendulum falls. BELBIC learns fast, and in theory it should learn the proper control action from its sensory inputs and emotional stress within a short time. In this task, however, the pendulum angle is very sensitive to the control signal, and any wrong change in the control signal makes the system unstable. We first used BELBIC as the only controller of the system. Although BELBIC can in principle learn the proper control strategy this way, our simulations showed that the process takes too long and is probably infeasible in real applications.

To accelerate learning and avoid destabilizing the system, we propose a new approach consisting of two parts. In the first part, a simple stabilizing controller acts as the main control system and BELBIC learns to imitate its behavior. In the second part, after BELBIC has imitatively learned from the initial controller how to stabilize the system, it replaces that controller. Owing to its learning capability, BELBIC then learns to enhance the system's performance.

A. Proposed Controller

Following the idea of a hierarchical controller structure, to design a controller that satisfies several objectives it is first assumed that the objectives can be decoupled, and a separate controller is designed for each objective. The outputs of these controllers must then be fused. Fig. 2 shows the proposed BELBIC structure. As there are two major objectives, cart position tracking and pendulum angle regulation, two BELBICs are employed. The cart position error and its first derivative are the sensory signals of one BELBIC, and the pendulum angle and its first derivative are those of the other. Most previously reported BELBIC structures have only one neuron, because their sensory input signal was one-dimensional. In our structure each BELBIC has two sensory inputs, so it needs more than one neuron; for this task two neurons appear to be adequate.

Figure 2. Structure of controller

The emotional stress signal, described below, couples the two separate controllers. With this kind of stress signal there is no need for a complex fusion block to combine the controller outputs, and a simple summation is adequate [16]. The computational cost of output fusion is thereby reduced to the cost of fusing some main and auxiliary objectives in the stress generator block. Likewise, to change the control objectives, for example to switch from imitative learning to normal learning, there is no need to change the controller structure; changing the emotional stress signal is enough.

B. Stress generation

As stated before, BELBIC can show various behaviors when different stress signals are applied to it. To satisfy different control objectives, a proper stress signal must therefore be defined for each objective, and a wider range of objectives can be achieved by defining different stress signals.

1) Imitative learning

In imitative learning, the objective is for BELBIC to produce a control signal similar to that of the initial controller. Reducing the difference between these two control signals is therefore the main goal in this phase, and there are no other control objectives. The emotional stress signal is accordingly built from two signals: the error between the control signals, e_u, and its first derivative, ė_u.

$$\mathit{Stress} = w_1 e_u + w_2 \dot{e}_u + w_3 e_u \dot{e}_u \qquad (7)$$

The reason for including ė_u in the above sum is that e_u may momentarily become zero while the two control signals behave completely differently.

2) Improving the performance

After BELBIC has imitatively learned the control action from the initial controller, it becomes the main controller of the system and replaces the initial controller. At this point the control objective changes: reducing the position tracking error and the angle error become the new objectives, so the stress signal must be modified.

To satisfy more than two objectives, a more elaborate combination of the stress signals associated with each objective is necessary. To make BELBIC capable of satisfying more objectives, the emotional stress is formed by combining stresses, as described below. To generate a proper stress signal for all objectives, the more important objective must at any time receive more attention than the others. We use linguistic rules to implement this attention mechanism.

a) Fuzzy stress generation

To enhance the behavior of the system, the major objectives are defined, and some extra objectives are attended to as well. The tracking error of the cart position and the error of the pendulum angle are the main concerns and must be decreased as much as possible. One extra objective is to avoid reaching the edges of the rail, which would interrupt the operation; to impose this behavior on the cart, closeness to the edges of the rail is punished via the stress signal. Another important index in every control task is the energy of the control force and its variations. This objective is imposed on the stress generation unit as follows: when the stresses of the previous parts are small, BELBIC tries to decrease the control force, and when these stresses are significant, the limit on the control force is relaxed to make fast responses possible. Fig. 3 shows the stress generating function. Using a set of linguistic rules, the most salient objectives can be reinforced to generate an emotional stress that suits the current situation.

To generate the stress signal, we wrote linguistic rules and imported them into a Sugeno fuzzy inference system [21]. Generating the stress in this way lets BELBIC attend to the important parts of the stress at any time. Four variables, the cart position error, the pendulum angle error, the control force and its first derivative, are the inputs of the stress generation function, and 16 rules are employed to generate the emotional stress. Fig. 3 shows some of the resulting fuzzy surfaces.

Figure 3. Resulted fuzzy surfaces (panels: Stress vs. Position-Error and Angle-Error; Stress vs. Position-Error and Control-Force; Stress vs. Control-Force and Derivation-of-Control-Force)

C. Switching between controllers and stress signals

As we observed in the experimental results, hard switching between the controllers and the stress signals makes the BELBICs unstable. A soft switching must therefore be employed instead: the BELBIC control system gradually replaces the initial controller, and at the same time its emotional stress is gradually changed. To do this we employed a fuzzy inference system, a common solution for soft switching, since human linguistic rules can easily be imported into a Sugeno fuzzy inference system [21]. Fig. 4 shows the fuzzy surfaces for this task. The inputs of this fuzzy system are the two stresses mentioned above (for imitative learning and for improving performance) and the error in the imitative learning phase (the difference between the initial controller output and the BELBIC output). The same fuzzy switch is used for switching both the controllers and the stresses.

IV. RESULTS

To validate the proposed controller, its results are compared with those of the originally supplied controller, which consists of two PID controllers [22]. This original controller also serves as the initial controller for the imitative learning phase. In addition, without imitative learning, BELBIC did not learn the proper control signal within more than 150 seconds of training, and the pendulum fell many times during this period.

Figure 4. Resulted fuzzy surfaces of the switching FIS (panels: Stress vs. time and Imitative-Stress; Stress vs. time and Performance-Stress; Stress vs. time and error)
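The soft switch of Section III.C can be sketched as follows, under several stated assumptions. The imitative stress uses Eq. (7), Stress = w1·e_u + w2·ė_u + w3·e_u·ė_u, with hypothetical weights. The switching weight lam comes from a zero-order Sugeno system with three hypothetical rules on time and on the imitation error only; the paper's switch also takes the two stresses as inputs, and only its 30 to 50 s transition window is taken from Section IV. The membership ramps, thresholds and all function names are assumptions. As Section III.C states that one fuzzy switch serves both purposes, the same lam blends the control signals and the stresses.

```python
from typing import Tuple


def imitative_stress(e_u: float, de_u: float,
                     w: Tuple[float, float, float] = (1.0, 0.1, 0.1)) -> float:
    """Eq. (7): w1*e_u + w2*de_u + w3*e_u*de_u; the weights are hypothetical."""
    w1, w2, w3 = w
    return w1 * e_u + w2 * de_u + w3 * e_u * de_u


def ramp(x: float, lo: float, hi: float) -> float:
    """Membership rising linearly from 0 at lo to 1 at hi (hypothetical shape)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def switch_weight(t: float, imit_error: float,
                  t_on: float = 30.0, t_off: float = 50.0,
                  err_small: float = 0.01, err_large: float = 0.03) -> float:
    """Zero-order Sugeno FIS giving lam in [0, 1] (0: mentor only, 1: BELBIC only).

    Hypothetical rule base:
      R1: IF time is LATE AND error is SMALL THEN lam = 1
      R2: IF time is EARLY                   THEN lam = 0
      R3: IF error is LARGE                  THEN lam = 0
    """
    late = ramp(t, t_on, t_off)
    large = ramp(abs(imit_error), err_small, err_large)
    rules = [(min(late, 1.0 - large), 1.0),  # AND implemented as minimum
             (1.0 - late, 0.0),
             (large, 0.0)]
    total = sum(fire for fire, _ in rules)
    # Weighted average of constant rule outputs (zero-order Sugeno defuzzification)
    return sum(fire * z for fire, z in rules) / total if total > 0 else 0.0


def soft_switch(lam: float, u_mentor: float, u_belbic: float,
                stress_imit: float, stress_perf: float) -> Tuple[float, float]:
    """Blend the control signals and the stresses with the same weight lam."""
    u = (1.0 - lam) * u_mentor + lam * u_belbic
    stress = (1.0 - lam) * stress_imit + lam * stress_perf
    return u, stress
```

For example, with a small imitation error, lam stays at 0 before 30 s, rises linearly to 0.5 at 40 s and reaches 1 after 50 s, while a large imitation error pulls lam back to 0 so the mentor keeps control. A zero-order Sugeno system is used here because its output is a plain weighted average, which keeps the blend smooth and bounded in [0, 1].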

To evaluate the ability of the controller to reject disturbances, a random voltage drawn from a Gaussian distribution with zero mean and a variance of 0.1 is applied to the motor at several instants. The time of applying this voltage is a random variable drawn from a uniform distribution. The disturbance is applied 8 times, from the 55th second until the end of the simulation. The results of the original PID controller and of the proposed controller are depicted in Figs. 5 and 6, respectively. As can be seen, BELBIC fully imitates the behavior of the original controller within about 10 seconds of the start. After that, according to the fuzzy switch structure, both controllers act on the system from 30 s to 50 s, and afterwards BELBIC controls the system alone. After imitative learning, BELBIC is clearly better at reducing the tracking and angle errors.

BELBIC also shows better performance in tracking and disturbance rejection, which results from its learning capability.
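One disturbance realization of the kind described above can be sketched as a list of (time, voltage) events, assuming this event-based representation. Event times are drawn uniformly from 55 s to a hypothetical simulation end t_end, and amplitudes from a zero-mean Gaussian with variance 0.1; because Python's random.gauss takes a standard deviation, the variance is converted with a square root. How long each voltage is held on the motor is not stated in the text and is left to the caller; the function name and the seeding scheme are assumptions.

```python
import random
from typing import List, Optional, Tuple


def disturbance_schedule(n_events: int = 8, t_start: float = 55.0, t_end: float = 150.0,
                         variance: float = 0.1,
                         seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Return n_events (time, voltage) pairs sorted by time.

    Times ~ Uniform(t_start, t_end); voltages ~ Normal(0, variance).
    t_end is a hypothetical simulation length, not a value from the paper.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation, not a variance
    events = [(rng.uniform(t_start, t_end), rng.gauss(0.0, sigma))
              for _ in range(n_events)]
    return sorted(events)


# One independent schedule per repetition of the experiment (20 runs in Section IV)
schedules = [disturbance_schedule(seed=run) for run in range(20)]
```

Seeding each run separately keeps the 20 repetitions reproducible while still giving each one different disturbance times and amplitudes.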

Figure 5. Results of the originally supplied controller (Double PID) in the presence of disturbance (panels: Cart Position, desired vs. actual; Pendulum Angle)

Figure 6. Results of BELBIC with fuzzy stress in the presence of disturbance (panels: Cart Position, desired vs. actual; Pendulum Angle)

For a meaningful comparison of these controllers, four performance measures are defined as follows and calculated for both control systems, the originally supplied controller and BELBIC. Since the disturbance is applied at randomly selected times, the experiments were carried out 20 times, and the statistical moments (mean and standard deviation) of the following measures are calculated.

IAE: Integral of Absolute Error (for the cart position and the pendulum angle)

IACF: Integral of Absolute values of the Control Force

IADCF: Integral of Absolute values of the Derivative of the Control Force (reflects the fluctuations of the control force)
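These measures can be computed from logged, sampled signals roughly as follows. This is a minimal sketch that assumes the trapezoidal rule for the integrals and treats the control force as piecewise linear between samples, so that IADCF reduces to the sum of absolute sample-to-sample changes. The summarize helper reports the mean and the sample standard deviation over repeated runs; whether the paper used the sample or the population standard deviation is not stated, and all function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def _integral_abs(t: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal approximation of the integral of |y| over the sample times t."""
    return sum(0.5 * (abs(y[k]) + abs(y[k + 1])) * (t[k + 1] - t[k])
               for k in range(len(t) - 1))


def performance_measures(t: Sequence[float], e_pos: Sequence[float],
                         e_ang: Sequence[float], u: Sequence[float]) -> Dict[str, float]:
    """IAE (position, angle), IACF and IADCF for one run of sampled signals."""
    return {
        "IAE_position": _integral_abs(t, e_pos),
        "IAE_angle": _integral_abs(t, e_ang),
        "IACF": _integral_abs(t, u),
        # With u piecewise linear between samples, the integral of |du/dt|
        # equals the sum of absolute sample-to-sample changes
        "IADCF": sum(abs(u[k + 1] - u[k]) for k in range(len(u) - 1)),
    }


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs (needs at least 2 runs)."""
    return mean(values), stdev(values)
```

Calling performance_measures once per run and passing each measure's 20 values to summarize yields the E and STD columns of a table such as Table II.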

These performance measures were calculated for the two controllers in normal operation, without any applied disturbance, and the results are presented in Table I.

TABLE I. PERFORMANCE MEASURES OF VARIOUS CONTROLLERS WITHOUT DISTURBANCE

No disturbance          IAE (position)   IAE (angle)   IACF     IADCF
BELBIC (fuzzy stress)   3.301            0.367         9.432    9.262
Double PID              3.925            0.457         12.219   14.353


Table I shows that BELBIC achieves lower tracking and angle errors than the Double PID. In addition, its control force, which is penalized through the stress signal, is lower than that of the other controller and oscillates less.

In the presence of disturbance, the above measures were calculated for both controllers, and the results are presented in Table II. As can be seen, BELBIC again performs better in the presence of disturbance, although its performance degrades compared with normal operation; it also shows better disturbance rejection and robustness.

TABLE II. PERFORMANCE MEASURES OF VARIOUS CONTROLLERS WITH DISTURBANCE

With disturbance        IAE (position)    IAE (angle)      IACF              IADCF
                        E        STD      E       STD      E        STD      E        STD
BELBIC (fuzzy stress)   5.699    0.915    0.476   0.174    73.247   3.152    151.26   6.245
Double PID              11.725   2.141    1.692   0.371    82.705   3.247    160.58   7.831

(E: mean, STD: standard deviation over the 20 runs)
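To make the comparison in Tables I and II concrete, the relative reduction achieved by BELBIC, 100·(PID − BELBIC)/PID, can be computed directly from the tabulated means. The snippet below does only this arithmetic on the published numbers; the dictionary names are illustrative.

```python
# (BELBIC, Double PID) pairs copied from Tables I and II (means only)
TABLE_I = {
    "IAE (position)": (3.301, 3.925),
    "IAE (angle)": (0.367, 0.457),
    "IACF": (9.432, 12.219),
    "IADCF": (9.262, 14.353),
}
TABLE_II_MEANS = {
    "IAE (position)": (5.699, 11.725),
    "IAE (angle)": (0.476, 1.692),
    "IACF": (73.247, 82.705),
    "IADCF": (151.26, 160.58),
}


def reduction_percent(belbic: float, pid: float) -> float:
    """Percentage by which the BELBIC value is lower than the Double PID value."""
    return 100.0 * (pid - belbic) / pid


for name, table in (("Table I", TABLE_I), ("Table II", TABLE_II_MEANS)):
    for measure, (b, p) in table.items():
        print(f"{name}, {measure}: {reduction_percent(b, p):.1f}% lower with BELBIC")
```

For Table I this gives reductions of about 15.9%, 19.7%, 22.8% and 35.5% in IAE (position), IAE (angle), IACF and IADCF; for the Table II means the corresponding figures are about 51.4%, 71.9%, 11.4% and 5.8%. Under disturbance, the advantage is therefore concentrated in the tracking and angle errors rather than in the control-force measures.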

V.  CONCLUSIONS 

In this paper, a new approach to employing a model-free controller with learning ability is introduced, based on soft switching between two learning phases. Although BELBIC has a rapid and powerful learning capability, it cannot simply be used to control systems with unstable equilibria. The experimental results show that by employing imitative learning in the first phase, BELBIC can rapidly learn to produce a proper control signal for a system with an unstable equilibrium point. After BELBIC has learned imitatively from a simple, classically designed controller, gradually altering the objectives and the stress signal reduces the tracking and angle errors. BELBIC is also more robust to disturbances. Another advantage of BELBIC is that it produces a smoother control force with lower energy.
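The two-phase scheme summarized above can be illustrated with a small Python sketch. It is not the authors' implementation. The linear ramp in `switching_weight` stands in for the fuzzy inference system that the paper uses for the switch. The `Belbic` class follows one common form of the amygdala and orbitofrontal update rules from the BELBIC literature [7], [11]. All gains and timing parameters are hypothetical. During the transition, both the applied control force and the emotional stress are convex combinations of their mentor-phase and emotional-phase values.

```python
from dataclasses import dataclass, field
from typing import List


def switching_weight(t: float, t_start: float, t_end: float) -> float:
    """Blend factor in [0, 1]: 0 = mentor only, 1 = BELBIC only.

    A linear ramp is used purely for illustration; the paper performs this
    transition with a fuzzy inference system instead.
    """
    if t <= t_start:
        return 0.0
    if t >= t_end:
        return 1.0
    return (t - t_start) / (t_end - t_start)


def blend(lam: float, mentor_value: float, emotional_value: float) -> float:
    """Soft switch: convex combination of the mentor-phase and emotional-phase values."""
    return (1.0 - lam) * mentor_value + lam * emotional_value


@dataclass
class Belbic:
    """Minimal BELBIC-style learner with amygdala (v) and orbitofrontal (w) weights.

    For nonnegative sensory inputs the amygdala weights never decrease (max(0, .)
    term); the orbitofrontal weights correct an output that exceeds the stress.
    """
    n_inputs: int
    alpha: float = 0.1   # amygdala learning rate (hypothetical value)
    beta: float = 0.05   # orbitofrontal learning rate (hypothetical value)
    v: List[float] = field(init=False)
    w: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.v = [0.0] * self.n_inputs
        self.w = [0.0] * self.n_inputs

    def output(self, s: List[float]) -> float:
        """Model output: amygdala response minus orbitofrontal inhibition."""
        amygdala = sum(si * vi for si, vi in zip(s, self.v))
        orbitofrontal = sum(si * wi for si, wi in zip(s, self.w))
        return amygdala - orbitofrontal

    def learn(self, s: List[float], stress: float) -> None:
        """One update step driven by the emotional stress signal."""
        amygdala = sum(si * vi for si, vi in zip(s, self.v))
        model_output = self.output(s)
        for i, si in enumerate(s):
            self.v[i] += self.alpha * si * max(0.0, stress - amygdala)
            self.w[i] += self.beta * si * (model_output - stress)
```

In a control loop one would, at each sample, compute `lam = switching_weight(t, t_start, t_end)`, apply `blend(lam, u_mentor, belbic.output(s))` to the cart, and call `belbic.learn(s, blend(lam, stress_imitation, stress_objective))`. Here `s` holds the sensory inputs (for example, tracking and angle errors), and the two stress terms are the imitation-phase and objective-based stresses; all of these variable names are placeholders.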

Generating the stress with a fuzzy system yields lower tracking and angle errors than the alternative method of stress generation. Another interesting result is that, in the presence of disturbance, BELBIC performs better than the originally supplied controller, a model-based controller that is well tuned specifically for this task. This is an effect of BELBIC's learning capability, which allows it to produce a more appropriate control force under varying working conditions.
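A fuzzy stress generator of the kind described above can be sketched as a zero-order Takagi-Sugeno system in the style of [21] that shifts attention between objectives. In this hypothetical Python example, the membership functions on the angle error, the rule consequents (weights on position error, angle error and control force) and `ANGLE_SCALE` are all invented for illustration. When the pendulum is close to upright the stress emphasizes cart tracking, and when the angle error is large it emphasizes balancing. The control-force term reflects the penalty on control force mentioned in the discussion of Table I.

```python
from typing import Dict, Tuple

# Hypothetical rule consequents: weights on (|position error|, |angle error|, |control force|).
RULES: Dict[str, Tuple[float, float, float]] = {
    "angle_small": (1.0, 0.5, 0.05),  # pendulum near upright: attend mostly to cart tracking
    "angle_large": (0.2, 2.0, 0.01),  # pendulum far from upright: attend mostly to balancing
}
ANGLE_SCALE = 0.2  # rad; hypothetical width of the "small angle error" fuzzy set


def memberships(angle_error: float) -> Dict[str, float]:
    """Degrees of membership of |angle error| in the two fuzzy sets."""
    small = max(0.0, 1.0 - abs(angle_error) / ANGLE_SCALE)
    return {"angle_small": small, "angle_large": 1.0 - small}


def fuzzy_stress(position_error: float, angle_error: float, force: float) -> float:
    """Zero-order Takagi-Sugeno style stress: rule-weighted sum of objective terms."""
    mu = memberships(angle_error)
    total = sum(mu.values())  # equals 1 for these two sets, kept for generality
    weights = [sum(mu[r] * RULES[r][k] for r in RULES) / total for k in range(3)]
    components = (abs(position_error), abs(angle_error), abs(force))
    return sum(w * c for w, c in zip(weights, components))
```

Because the rule firing strengths change smoothly with the angle error, the emphasis of the stress also moves smoothly between objectives rather than switching abruptly.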

REFERENCES

[1] A. P. Shon, D. B. Grimes, C. L. Baker, R. P. N. Rao, A. N. Meltzoff, “A Model-Based Goal-Directed Bayesian Framework for Imitation Learning in Humans and Machines,” Cognitive Science, 2004.

[2] M.I. Kuniyoshi and I. Inoue, “Learning by watching: extracting reusable task knowledge from visual observation of human performance,” IEEE Trans. on Robotics and Automation, vol. 10, no. 6, pp. 799-822, 1994.

[3] Alissandrakis, C. L. Nehaniv and K. Dautenhahn, “Learning how to do things with imitation,” in Proc. Learning How to Do Things, AAAI Fall Symp. Series, Sea Crest Oceanfront Resort & Conf. Center, North Falmouth, MA, USA, pp. 1-8, Nov. 3-5, 2000.

[4] H. Mobahi, M. N. Ahmadabadi, B. N. Araabi, “Concept Oriented Imitation, Towards Verbal Human-Robot Interaction,” Proc. of IEEE International Conference on Robotics and Automation, Barcelona, Spain, pp. 1507-1512, April 2005.

[5] M. Lopes, J. Santos-Victor, “Visual learning by imitation with motor representations,” IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 35, no. 3, pp. 438-449, 2005.

[6] D. Shahmirzadi, “Computational Modeling of the Brain Limbic System and Its Application in Control Engineering,” MSc thesis, Texas A&M University, USA, 2005.

[7] C. Balkenius and J. Moren, “Emotional learning: a computational model of the Amygdala,” Cybernetics and Systems, 2001.

[8] C. Balkenius, J. Moren, “A Computational Model of Emotional Conditioning in the Brain,” Workshop on Grounding Emotions in Adaptive Systems, Zurich, 1998.

[9] J. Moren, “Emotion and Learning: A Computational Model of the Amygdala,” PhD thesis, Lund University, Lund, Sweden, 2002.

[10] J. Moren, C. Balkenius, “A Computational Model of Emotional Learning in the Amygdala,” From Animals to Animats 6: Proc. of 6th International Conference on the Simulation of Adaptive Behavior, Cambridge, MIT Press, pp. 383-391, 2000.

[11] C. Lucas, D. Shahmirzadi, N. Sheikholeslami, “Introducing BELBIC: Brain Emotional Learning Based Intelligent Controller,” International Journal of Intelligent Automation and Soft Computing, pp. 11-22, 2004.

[12] R. M. Milasi, C. Lucas, B. N. Araabi, “Intelligent Modeling and Control of Washing Machine Using Locally Linear Neuro-Fuzzy (LLNF) Modeling and Modified Brain Emotional Learning Based Intelligent Controller (BELBIC),” Asian Journal of Control, vol. 8, no. 4, 2005.

[13] R. M. Milasi, M. R. Jamali, C. Lucas, “Intelligent Washing Machine: A Bioinspired and Multi-Objective Approach,” accepted in International Journal of Control, Automation, and Systems.

[14] N. Sheikholeslami, D. Shahmirzadi, E. Semsar, C. Lucas, M. J. Yazdanpanah, “Applying Brain Emotional Learning Algorithm for Multivariable Control of HVAC Systems,” International Journal of Intelligent and Fuzzy Systems, vol. 17, no. 1, pp. 35-46, 2006.

[15] A. Gholipour, C. Lucas, D. Shahmirzadi, “Purposeful prediction of space weather phenomena by simulated emotional learning,” IASTED International Journal of Modelling and Simulation, vol. 24, no. 2, pp. 65-72, 2004.

[16] D. Shahmirzadi, C. Lucas, R. Langari, “Intelligent signal fusion algorithm using BEL - brain emotional learning,” 7th Joint Conference on Information Sciences, JCIS'03, 1st Symposium on Brain-Like Computer Architecture, Cary, NC, September 26-30, 2003.

[17] R. M. Milasi, C. Lucas, B. N. Araabi, T. S. Radwan, M. A. Rahman, “Implementation of Emotional Controller for Interior Permanent Magnet Synchronous Motor Drive,” IEEE/IAS 41st Annual Meeting: Industry Applications, Tampa, Florida, USA, October 8-12, 2006.

[18] M. R. Jamali, A. Arami, B. Hosseini, B. Moshiri, C. Lucas, “Real Time Emotional Control for Anti-Swing and Positioning Control of SIMO Overhead Travelling Crane,” Int. Journal of Innovative Computing, Information and Control (IJICIC), vol. 4, no. 9, pp. 2333-2344, September 2008.

[19] D. Shahmirzadi and R. Langari, “Stability of Amygdala Learning System using Cell-to-Cell Mapping Algorithm,” Journal of Intelligent System and Control, 497-119, 2005.

[20] A. Arami, M. Javan Roshtkhari and C. Lucas, “A Fast Model Free Intelligent Controller Based on Fused Emotions: A Practical Case Implementation,” 16th IEEE Mediterranean Conference on Control and Automation, Corsica, France, June 25-27, 2008.

[21] T. Takagi and M. Sugeno, “Derivation of fuzzy control rules from human operators' control actions,” Proc. of IFAC Symp. on Fuzzy Information, Knowledge Representation and Decision Analysis, pp. 55-60, July 1983.

[22] Feedback Instrument Ltd, Digital Pendulum Control Experiments, Manual: 33-935/936-1V60, 2002.
