SA-RL Algorithm Based Ship Steering Controller

Guang YE
Automation and Elec. Eng. College
Dalian Maritime University, Dalian 116026

Abstract: Based on the simulated annealing (SA) and reinforcement learning (RL) algorithms, a hybrid intelligent controller is proposed for ship steering. The SA algorithm is a powerful way to solve hard combinatorial optimization problems, and it is used in this paper to adjust the parameters of the controller. The RL algorithm shows its particular superiority in ship steering, since it needs only simple fuzzy information. With the advantages of the two algorithms, the controller can overcome the influence of wind, wave and flow, and the limitation that the data are not exactly accurate. Finally, the simulation results show that the ship course can be properly controlled in the presence of changeable wind, waves and measurement error.

Keywords: Simulated Annealing; Reinforcement Learning; Ship Steering Control
I. INTRODUCTION
The ship course control directly influences the maneuverability, economy and security of ship navigation and the combat capacity of warships [1], and it is in general a complicated control problem. In order to deal with the uncertainty in ship steering control, it is valuable in practical applications to find a new algorithm that adjusts the parameters of the controller on-line.

Fuzzy logic deals with linguistic and imprecise rules based on expert knowledge. Neural networks are applied in cases where there are no rules but plenty of data. The ANFIS (adaptive-network-based fuzzy inference system) realizes the functionality of fuzzy systems using neural networks [2], and it can self-adjust the parameters of the fuzzy rules using neural-network-based learning algorithms. As the real process plant becomes more and more complex, with a varying environment and the hydrodynamic nonlinearities of the system being considered, the demands on the learning algorithms of ANFIS become strict.

The simulated annealing (SA) algorithm is a powerful way to solve hard combinatorial optimization problems [3]. It was first used by Kirkpatrick and others for computer design and combinatorial optimization, and has also been used in image processing and neural network computing. It not only has the merit of global convergence, but also needs little domain knowledge during the search. It is therefore used to adjust the
Chen GUO
Automation and Elec. Eng. College
Dalian Maritime University, Dalian 116026
[email protected]
parameters of the ANFIS in this paper. But it is obvious that the accurate data needed by the SA algorithm to train the ANFIS are hard to obtain, because of disturbances and instrument measurement errors. In this circumstance, the reinforcement learning (RL) algorithm shows its particular superiority [4]: it needs only very simple evaluative and critical information, such as "right" or "wrong". This is meaningful in ship course control, for with fuzzy information such as "good", "normal" and "bad" easily obtained from the control result, the reinforcement learning algorithm can not only adjust the parameters on-line to satisfy the real-time requirement, but also improve the control result of the ship steering. Based on the simulated annealing and reinforcement learning algorithms, a hybrid intelligent controller is proposed for ship steering. The simulation results show that the ship course can be properly controlled in the presence of changeable wind and waves.
II. CONTROLLER BASED ON THE SA-RL ALGORITHM
Based on the simulated annealing and reinforcement learning algorithms, a hybrid intelligent controller is proposed as shown in Fig. 1. In Fig. 1, the evaluation network is a three-layered feedforward neural network with five inputs and a single output; the action network is an ANFIS with two inputs and one output. A calculated rudder angle δ(t), obtained from the ANFIS and added to the modified rudder angle Δδ computed from the evaluated reinforcement signal by the evaluation network, gives the actual rudder angle δr(t), which is used to control the ship motion. At every time step the ship motion system receives an input signal δr(t) and outputs three useful signals: the course angle ψ, the yaw rate γ, and the reinforcement signal r, with which the two networks are adjusted on-line.
There are two closed loops in the control system. One is: Action Network → δr(t) → Ship Motion System → (ψ, γ) → Action Network; the other is: Evaluation Network → Δδ → Ship Motion System → r → Evaluation Network. Although the evaluation network is used to predict the signal r, it is in practice an appended feedback controller.

0-7803-9422-4/05/$20.00 ©2005 IEEE
Fig.1. Structure of the Hybrid Intelligent Controller
A. Reinforcement Signal
Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal, which is also called the reinforcement signal in this paper. So it is vital to choose an appropriate signal as the reinforcement signal in the reinforcement learning algorithm.
Usually a reinforcement signal r can be defined in one of three forms:
(1) A two-valued number in {-1, 1}: r = -1 means "failure" and r = 1 means "success".
(2) A discrete number, e.g. r ∈ {-1, -0.5, 0, 0.5, 1}, denoting the degree of "success" or "failure" in several levels.
(3) A continuous number in the range [0, 1], corresponding to different degrees of success or failure; the larger r is, the better the control effect.
Based on the requirements of ship steering control and the experience of navigators [4], the reinforcement signal r is defined as in equation (1) for ship course control in this paper.
r = 1,    if |ψd - ψ| < 2°
r = 0.5,  if 2° ≤ |ψd - ψ| ≤ 5°        (1)
r = 0,    if |ψd - ψ| > 5°

where ψd is the set course, ψ is the actual course, and ψd - ψ is the course error.
In equation (1), if the course error is less than 2°, r = 1 and the control result can be considered good. If the error is more than 5°, r = 0 and the control effect should be considered bad. Otherwise r = 0.5. Obviously, r here takes the second form mentioned above.

The reinforcement signal r(t) produced by the current control variable δr(t) can only be obtained at time step t + 1. So the evaluation network has to be used to predict the reinforcement signal r(t).
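The piecewise signal of equation (1) is straightforward to compute; the following sketch (Python, courses in degrees) implements the three-level form with the 2° and 5° thresholds given above.

```python
def reinforcement_signal(psi_d, psi):
    """Three-level reinforcement signal r from equation (1).

    psi_d: set course in degrees, psi: actual course in degrees.
    Returns 1 (good), 0.5 (acceptable) or 0 (bad).
    """
    error = abs(psi_d - psi)   # course error |psi_d - psi|
    if error < 2.0:            # control result is good
        return 1.0
    if error <= 5.0:           # intermediate quality
        return 0.5
    return 0.0                 # control effect is bad
```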
B. Evaluation Network
A predicted reinforcement signal p(t) can be obtained from the evaluation network according to the current input information; with it, the computed rudder angle δ(t) can be modified in advance to improve the control performance.
In this paper, the evaluation network is a normal three-layered feedforward neural network with five inputs and a single output. The five inputs are the set course ψd, the actual course at time steps t-1 and t-2, ψ(t-1) and ψ(t-2), and the actual rudder angles δr(t-1) and δr(t-2). The error function of the network is taken as

E(t) = 0.5 · (r(t) - p(t))²        (2)

The BP algorithm is used to learn the weights of the evaluation network. Because of the length limit of the paper, the structure of the evaluation network and the details of the BP algorithm will not be discussed here.
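Since the paper omits the network details, the following is only a minimal sketch of such a predictor: a 5-input network with one tanh hidden layer, trained by gradient descent on E(t) = 0.5(r(t) - p(t))². The 8 hidden units and the 0.05 learning rate are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs: psi_d, psi(t-1), psi(t-2), delta_r(t-1), delta_r(t-2)
W1 = rng.normal(0.0, 0.1, (8, 5))   # hidden-layer weights (8 units, assumed)
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.1, 8)        # output-layer weights
b2 = 0.0
eta = 0.05                          # learning rate (assumed)

def predict(x):
    """Forward pass; p(t) is squashed to (0, 1) with a sigmoid."""
    h = np.tanh(W1 @ x + b1)
    p = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))
    return p, h

def train_step(x, r):
    """One BP step on the error function E(t) = 0.5 * (r - p)**2."""
    global W1, b1, W2, b2
    p, h = predict(x)
    delta2 = -(r - p) * p * (1.0 - p)        # dE/dz at the output node
    delta1 = delta2 * W2 * (1.0 - h ** 2)    # backprop through tanh
    W2 = W2 - eta * delta2 * h
    b2 = b2 - eta * delta2
    W1 = W1 - eta * np.outer(delta1, x)
    b1 = b1 - eta * delta1
    return p
```

Repeated `train_step` calls on (input, observed r) pairs drive the prediction p(t) toward the measured reinforcement r(t).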
C. Action Network
The structure of the action network is illustrated in Fig. 2 [2]. The simulated annealing algorithm is used to learn the parameters of the action network, which maps the inputs of the network (the ship motion situations) to the output rudder angle δ(t) that controls the ship motion system.
Fig.2. Structure of ANFIS (first to fifth layers)
The inputs of the ANFIS are Δψ = ψd - ψ (ψd is the set course, ψ is the actual course) and γ = dψ/dt. The output is the computed rudder angle δ. The total number of rules is nine, as illustrated in Table I. The i-th rule of the ANFIS is defined as follows:

If Δψ is AΨi and γ is Γi, then
fi = pi·Δψ + qi·γ + ri,  i = 1, 2, ..., 9        (3)
where AΨi and Γi respectively map the numerical values of the fuzzy variables Δψ and γ to fuzzy sets.

Layer 5 (overall output): Layer 5 consists of one fixed node denoted Σ, which aggregates all the qualified consequents:

δ = Σ (i=1..9) w̄i·fi        (8)
The ANFIS has five layers; the relationships among the layers are specified formally as follows.

Layer 1 (membership functions): In this paper, Δψ and γ are each divided into three fuzzy sub-areas, whose corresponding fuzzy membership functions μAΨ(Δψ) and μΓ(γ) are described by the generalized bell function

μ(x, aj, bj, cj) = 1 / [1 + ((x - cj)/aj)^(2bj)],  j = N, Z, P        (4)

where x is Δψ or γ, and the parameters aj, bj, cj are the premise parameters, which are set in Table II.
Layer 2 (fuzzy weights): Layer 2 consists of fixed nodes with the notation Π; it basically represents the rules by the weights, each of which is the product of the membership functions of the given inputs:

wi = μAi(x) · μBi(y)        (5)

where i is the rule number.
Layer 3 (normalized fuzzy weights): Layer 3 consists of fixed nodes denoted N. The outputs of this layer are the normalized firing strengths calculated for each node i as

w̄i = wi / Σ (j=1..9) wj        (6)

so w̄i is the ratio of the firing strength of rule i to the sum of the firing strengths of all rules.
Layer 4 (normalized values of the fuzzy rules): Layer 4 consists of function nodes, which represent the qualified consequent part of the ANFIS structure:

w̄i·fi = w̄i·(pi·x + qi·y + ri)        (7)

where w̄i is the output of layer 3 and {pi, qi, ri} is the parameter set representing the consequent part.
TABLE I
CONTROL RULES

         γ = N   γ = Z   γ = P
Δψ = N    f1      f4      f7
Δψ = Z    f2      f5      f8
Δψ = P    f3      f6      f9
TABLE II
PARAMETERS OF ANFIS

            aj      bj      cj
AΨ   N     20.0    2.0    -30.0
     Z     20.0    2.0      0
     P     20.0    2.0     30.0
Γ    N      2.0    2.0     -3.0
     Z      2.0    2.0      0
     P      2.0    2.0      3.0
The parameters pi, qi, ri in the rule consequents (3) are initially random numbers between 0 and 1, and they will be adjusted by the simulated annealing algorithm.
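Putting layers 1-5 together, a compact sketch of the ANFIS forward pass (Python) might look as follows. The premise parameters are those of Table II; the pairing of rule index i with the (Δψ, γ) label combination follows Table I but is an assumption about the original ordering.

```python
import numpy as np

# Premise parameters (a_j, b_j, c_j) from Table II
PARAMS_DPSI = {"N": (20.0, 2.0, -30.0), "Z": (20.0, 2.0, 0.0), "P": (20.0, 2.0, 30.0)}
PARAMS_GAMMA = {"N": (2.0, 2.0, -3.0), "Z": (2.0, 2.0, 0.0), "P": (2.0, 2.0, 3.0)}

def bell(x, a, b, c):
    """Generalized bell membership function, equation (4)."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

def anfis_output(dpsi, gamma, consequents):
    """Layers 1-5: memberships (4), products (5), normalization (6),
    first-order consequents (3)/(7), and aggregation (8).

    consequents: nine (p_i, q_i, r_i) tuples; rule ordering assumed.
    """
    w, f = [], []
    for j in "NZP":                  # fuzzy label of delta-psi
        for k in "NZP":              # fuzzy label of gamma
            p, q, r = consequents[len(w)]
            w.append(bell(dpsi, *PARAMS_DPSI[j]) * bell(gamma, *PARAMS_GAMMA[k]))
            f.append(p * dpsi + q * gamma + r)
    w = np.array(w)
    w_bar = w / w.sum()              # normalized firing strengths
    return float(np.sum(w_bar * np.array(f)))
```

With all nine consequents equal, the layer-3 normalization makes the output reduce to that single linear consequent, which is a convenient sanity check of equations (6)-(8).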
D. Simulated Annealing

An algorithm that simulates an annealing process on a computer is called a simulated annealing algorithm. The annealing process involves heating a solid to a high temperature and then letting it gradually cool down to a solidifying point to form crystals. An inappropriate annealing, for example reducing the temperature too fast, can result in flaws, defects, or glass without crystallization order. An improved SA algorithm [5] is used in this paper.
In this paper, the parameter vector of the simulated annealing algorithm is described by equation (9), and the energy function by equation (10):

Q = [p1, q1, r1, p2, q2, r2, ..., p9, q9, r9]^T        (9)

J(Q) = (1/2)·[λ1·(ψd - ψ)² + λ2·δ²]        (10)

where λ1 and λ2 are weighting coefficients.
The procedure of the simulated annealing algorithm can be described by the following pseudocode:

Simulated Annealing
  Initial(Q0, t0, L0);
  do {
    for l = 1 to Lk {
      Generate(Qj from Si);
      if J(Qj) < J(Qi) then Qi = Qj;
      else if exp(-(J(Qj) - J(Qi)) / tk) ≥ random(0, 1) then Qi = Qj;
    }
    k = k + 1;
    LengthCalculate(Lk);
    ControlCalculate(tk);
  } while (not StopIter);

In the pseudocode, several functions have to be explained in particular:

(1) Initial(Q0, t0, L0) is the function that initializes the program. Q0 is the initial value of the parameter vector; in this paper the parameters pi, qi, ri of Q0 are initialized to 0.1. t0 is the initial temperature, which should be high enough. L0 is the initial length of the Markov chain, which is set to 100.
(2) Generate(Qj from Si) is the generating function, which creates a new point Qj from Qi in the neighborhood Si. This function is illustrated in Fig. 3:

Qj = Qi + ΔQ        (11)

ΔQ = [ΔQ1, ΔQ2, ..., ΔQm, ..., ΔQN]^T        (12)

ΔQm = random(0, 1) · 2qm - qm,  m = 1, 2, ..., N        (13)

In equation (13), qm is the maximum of ΔQm.
Fig.3. Diagram of Generating Function
(3) LengthCalculate(Lk) is the function that produces the length Lk of the Markov chain. In the iteration process, the number of iterations should be large enough for the system to reach transient stabilization at temperature tk. Usually Lk is set to a constant; in this paper, where the simulated annealing algorithm is applied to ship steering control, an improvement has been made in the modification of Lk [5].

When the control parameter tk is high, the energy function J(Q) may fluctuate up and down in the Metropolis circulation, and the number of iterations should be decreased to reduce the amount of computation. When the control parameter tk is low, the optimization process has become convergent; if the length Lk is not large enough, the system cannot reach transient stabilization, so Lk should be increased. The modification of Lk is:

Lk+1 = β·Lk,  k = 1, 2, ...        (14)

where β > 1; in this paper β is set to 1.05.
(4) ControlCalculate(tk) is the function that computes the control parameter tk at every time step k. The control parameter often falls according to

tk+1 = α·tk,  k = 1, 2, ...        (15)

where α decides the decreasing velocity of the temperature and is usually set between 0.5 and 0.99. When α is a constant, the different requirements on the decreasing velocity of the temperature in the high- and low-temperature regions are obviously ignored. In this paper, the control parameter is therefore adjusted as [5]:

tk+1 = αk·tk        (16)

αk = α0 + η        (17)

In equation (17), α0 is the initial value of the decreasing velocity of the temperature, set to 0.9, and η is a regulating term whose value is increased a little whenever the control parameter tk is decreased; αk must always remain less than 1. It can be seen that when tk is low, the slower it decreases, the better the system reaches stabilization; when tk is high, the faster it decreases, the smaller the amount of computation.
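The complete loop, with the adaptive chain length of equation (14) and the adaptive cooling of equations (16)-(17), can be sketched as below (Python). The energy function J, the perturbation bound q_max, the stopping temperature, and the outer-iteration cap k_max are stand-ins for the paper's ship-specific choices; tracking the best point seen is a common practical addition not stated in the paper.

```python
import math
import random

def simulated_annealing(J, q0, t0=1.0, L0=100, q_max=0.5,
                        alpha0=0.9, beta=1.05, eta_step=0.002,
                        t_min=0.05, k_max=60, seed=0):
    """Improved SA: Metropolis acceptance, chain length grown by
    equation (14), cooling rate adapted by equations (16)-(17).
    """
    rng = random.Random(seed)
    q, Jq = list(q0), J(q0)
    best, Jbest = list(q0), Jq
    t, L, eta, k = t0, float(L0), 0.0, 0
    while t > t_min and k < k_max:        # k_max is a practical guard (assumed)
        for _ in range(int(L)):
            # Generating function, equations (11)-(13)
            qj = [x + rng.random() * 2.0 * q_max - q_max for x in q]
            Jj = J(qj)
            if Jj < Jq or math.exp(-(Jj - Jq) / t) >= rng.random():
                q, Jq = qj, Jj
                if Jq < Jbest:
                    best, Jbest = list(q), Jq
        L = beta * L                      # equation (14): lengthen the chain
        eta += eta_step                   # regulating term grows as t falls
        alpha = min(alpha0 + eta, 0.999)  # equation (17), kept below 1
        t = alpha * t                     # equation (16)
        k += 1
    return best
```

A call such as `simulated_annealing(lambda q: sum(x * x for x in q), [0.8, -0.6])` shows the shape of a minimization run on a toy quadratic energy.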
E. Calculation of the Actual Rudder Angle

In Fig. 1, the output δ(t) of the action network does not act on the ship directly. Instead, it is treated as a computed rudder angle. The actual rudder angle δr(t) is obtained by exploring around δ(t) with a modified angle Δδ(t):

δr(t) = δ(t) ± Δδ(t)        (18)

The modified rudder angle Δδ(t) can be described as:

Δδ(t) = range(p(t)) · (1 - p(t))        (19)

where the factor (1 - p(t)) can be explained as follows: if the predicted reinforcement signal p(t) is equal to 1, it can be thought that the control quality is good enough that the actual rudder angle should not be modified; otherwise, the smaller p(t) is, the larger Δδ(t) should be. range(p(t)) is a variable proportional coefficient that denotes the search range of Δδ(t) [4]:

range(p(t)) = k / (1 + e^(2p(t)))        (20)

where k is a scaling coefficient, set to 4 in this paper. If p(t) is large, range(p(t)) will be small, which means that the output δ(t) of the action network is very close to the best action. If p(t) is small, range(p(t)) will be large, so an actual rudder angle δr(t) quite different from δ(t) should be set.

The selection of the operator "+" or "-" in equation (18) is decided by two rules [5]:

Rule 1: if r(t-1) ≥ r(t-2), the operator at time step t remains the same as at the last time step t-1.

Rule 2: if r(t-1) < r(t-2), the operator at time step t is changed from "+" to "-" or from "-" to "+".
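Equations (18)-(20) and the two sign-selection rules can be sketched together as follows (Python; k = 4 as in the paper, while the explicit sign-state handling is an assumption about the bookkeeping).

```python
import math

def search_range(p, k=4.0):
    """Equation (20): search range of the modification, k = 4."""
    return k / (1.0 + math.exp(2.0 * p))

def actual_rudder_angle(delta, p, sign, r_prev1, r_prev2):
    """Equations (18)-(19) plus the sign-selection rules.

    delta: computed rudder angle delta(t); p: predicted reinforcement p(t);
    sign: +1 or -1, the operator used at the previous step;
    r_prev1, r_prev2: reinforcement signals at t-1 and t-2.
    Returns (delta_r, sign) for this step.
    """
    if r_prev1 < r_prev2:        # Rule 2: flip the operator
        sign = -sign             # (Rule 1: otherwise keep it)
    d_delta = search_range(p) * (1.0 - p)    # equation (19)
    return delta + sign * d_delta, sign       # equation (18)
```

Note that when p(t) = 1 the modification vanishes, so δr(t) = δ(t) exactly, matching the explanation of the factor (1 - p(t)) above.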
III. MATHEMATICAL MODEL OF SHIP STEERING

The linear Nomoto model has been widely accepted in designing ship course controllers:

ψ̈ + (1/T)·ψ̇ = (K/T)·δ        (21)

where ψ is the actual course, δ is the rudder angle, T is the time constant, and K is the rudder gain. Equation (21) is valid for small rudder angles and low frequencies of rudder action. But it is necessary to consider the hydrodynamic nonlinearities under some steering conditions, such as the course-changing operation. In order to better describe the ship steering dynamic behavior, so that the steering equation (21) is also valid for rapid and large rudder angles, the term (1/T)·ψ̇ must be replaced with the nonlinear term (K/T)·H(ψ̇), where H(ψ̇) is described by equation (22). The nonlinear response model is then expressed as equation (23):

H(ψ̇) = α·ψ̇ + β·ψ̇³        (22)

ψ̈ + (K/T)·H(ψ̇) = (K/T)·δ        (23)

where the parameters α, β and K, T are related to the ship's velocity.
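For simulation, the response model (23) can be stepped forward with simple Euler integration. This sketch uses the identified parameters quoted in Section IV; the step size dt is an assumption.

```python
def ship_step(psi, r, delta, dt=0.5,
              K=0.2511, T=206.7686, alpha=11.2944, beta=9.5982):
    """One Euler step of the nonlinear response model, equation (23).

    psi: course, r: yaw rate (psi-dot), delta: rudder angle (all in
    consistent angular units). Returns (psi, r) at the next time step.
    """
    H = alpha * r + beta * r ** 3      # nonlinear term, equation (22)
    r_dot = (K / T) * (delta - H)      # psi-double-dot from equation (23)
    return psi + dt * r, r + dt * r_dot
```

With a constant positive rudder angle applied from rest, the yaw rate rises toward the equilibrium where H(ψ̇) = δ and the course grows steadily.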
IV. SIMULATION RESULTS

This paper applies the above algorithm to ship steering control, with the ship response model of equation (23). The parameters of equation (23) used in the simulation study are K = 0.2511, T = 206.7686, α = 11.2944, β = 9.5982. These values are obtained from the identification results of a frigate at a speed of 12 m/s.

When accurate training data can be obtained, the SA algorithm performs well. But in fact it is hard to get precise information, and reinforcement learning, which needs only simple fuzzy feedback information, has practical meaning in this case. In this paper the following conditions are assumed: the set course is 40°, the wind force is Beaufort 5, and the wind direction is 30°; a constant disturbance of 2° is added to simulate instrument measurement error. The simulation results are shown in Figs. 4, 5 and 6. Fig. 4 is the control curve where the constant disturbance is not added and RL is not used. Fig. 5 is the control curve where the constant disturbance is added but the rudder angle is not adjusted by RL. Fig. 6 shows the control curve where the rudder angle δ(t) is adjusted by p(t) from time step 100 s. It is easy to see that reinforcement learning can reduce the static control error.
Fig.4. Control Curve Without Disturbance
[3] D. Adler, "Genetic algorithms and simulated annealing: A marriage proposal," Proc. IEEE Int. Conf. Neural Networks, San Francisco, 1993, vol. II, pp. 1104-1109.
[4] Guoxun Yang, Chen Guo, Xinle Jia, "Study on ship steering based on hybrid intelligent control," Proc. American Control Conference, 2002, pp. 2118-2123.
[5] Guoxun Yang, Chen Guo, Xinle Jia, Zhigang Meng, "Fuzzy ship course control based on simulated annealing algorithm," Ship Building of China, 2001, 42(4):42-45.
[6] Yan-Hwang Kuo, Jang-Pong Hsu, Cheng-Wen Wang, "A parallel fuzzy inference model with distributed prediction scheme for reinforcement learning," IEEE Transactions on Systems, Man and Cybernetics, 1998, 28(2):161-172.
Fig.5. Control Curve With Disturbance
Fig.6. Control Curve With Rudder Angle Modified
V. CONCLUSIONS
The simulated annealing algorithm not only has the merit of global convergence, but also needs little domain knowledge during the search, and it has been widely applied in image processing and neural network computing. Reinforcement learning, which needs only very simple evaluative and critical information such as "right" or "wrong", is meaningful in ship course control. Based on the simulated annealing and reinforcement learning algorithms, a hybrid intelligent controller has been proposed for ship steering. The simulation results show that the hybrid intelligent controller is feasible and has some potential for real application.
REFERENCES
[1] J. Q. Huang, "Adaptive Control Theories and Its Applications in Ship Systems," Beijing: National Defense Industry Press, 1992, pp. 168-175.
[2] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference systems," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665-685, May-June 1993.