
International Journal of Control, Automation, and Systems (2013) 11(3):496-502 DOI 10.1007/s12555-011-0243-y

ISSN:1598-6446 eISSN:2005-4092 http://www.springer.com/12555

Wavelet Reduced Order Observer based Adaptive Tracking Control for a Class of Uncertain Nonlinear Systems using Reinforcement Learning

Manish Sharma and Ajay Verma

Abstract: This paper investigates the means to design reduced-order observers and observer-based controllers for a class of uncertain nonlinear systems using reinforcement learning. A new design approach for a wavelet-based adaptive reduced-order observer is proposed. The proposed observer performs the task of identifying the unknown system dynamics in addition to reconstructing the states of the system. Reinforcement learning is used via two wavelet neural networks (WNN), a critic WNN and an action WNN, which are combined to form an adaptive WNN controller. The "strategic" utility function is approximated by the critic WNN and is minimized by the action WNN. Owing to their superior learning capabilities, wavelet networks are employed in this work for the identification of the unknown system dynamics. Using feedback control based on the reconstructed states, the behavior of the closed-loop system is investigated. By a Lyapunov approach, the uniform ultimate boundedness of the closed-loop tracking error is verified. A numerical example is provided to verify the effectiveness of the theoretical development.

Keywords: Adaptive control, Lyapunov functional, optimal control, reduced-order observer, reinforcement learning, wavelet neural networks.

1. INTRODUCTION

In many practical systems, the system model contains uncertain elements; these uncertainties may be due to additive unknown internal or external noise, environmental influences, nonlinearities such as hysteresis or friction, poor plant knowledge, reduced-order models, and uncertain or slowly varying parameters. Hence, a state observer for the uncertain system is useful for reconstructing the states of the dynamic system. The design of adaptive observers through estimation of states and parameters in linear and nonlinear systems has been actively studied in recent years [1-3]. Adaptive observers of nonlinear systems, especially reduced-order observers, have attracted much attention due to their wide use in theory and practice; unlike full-order observers, they need to estimate only the unmeasurable states of the studied system [4-7].

Reinforcement learning (RL) is a class of algorithms for solving multi-step, sequential decision problems by finding a policy for choosing sequences of actions that optimize the sum of some performance criterion over time [8-10]. In RL problems, an agent interacts with an unknown environment. At each time step, the agent observes the state, takes an action, and receives a reward. The goal of the agent is to learn a policy (i.e., a mapping from states to actions) that maximizes the long-term return. The actor-critic algorithm is an implementation of RL with separate structures for perception (critic) and action (actor) [11-14]. Given a specific state, the actor decides what action to take, and the critic evaluates the outcome of the action in terms of future reward (goal).

System identification plays a critical role in the design of controllers for uncertain nonlinear systems. A controller is expected to provide efficient, safe, and desired performance. Designing such a controller requires a highly accurate model of the system, which is difficult to obtain due to modeling inaccuracies. In such cases, intelligent control tools are integrated with the control strategies to obtain reliable and accurate control performance. Over the last decade, wavelet networks have attracted much attention from researchers. A wavelet network is constructed as an alternative to neural networks as a system identification tool. Wavelet networks integrate the space-frequency localization property of wavelets with the learning capabilities of neural networks to improve function approximation ability. Wavelet networks find application in multi-scale analysis and synthesis, time-frequency signal analysis in signal processing, and the identification of nonstationary signals [15,16]. Due to their multiresolution analysis property and suitability for the development of online tunable control laws, adaptive wavelet-based control strategies are cited in the literature [16,17].

Incorporating the advantages of WNN, adaptive actor-critic WNN-based control has emerged as a promising approach for nonlinear systems. In actor-critic WNN-based control, a long-term as well as a short-term system-performance measure can be optimized. While the role of the actor is to select actions, the role of the critic is to evaluate the performance of the actor. This evaluation is used to provide the actor with a signal that allows it to improve its performance, typically by updating its parameters along an estimate of the gradient of some measure of performance with respect to the actor's parameters. The critic WNN approximates a certain "strategic" utility function, similar to a standard Bellman equation, which is taken as the long-term performance measure of the system. The weights of the action WNN are tuned online by both the critic WNN signal and the filtered tracking error. The action WNN minimizes the strategic utility function and the uncertain system dynamics estimation errors so that the optimal control signal can be generated. This optimal action WNN control signal, combined with an additional outer-loop conventional control signal, is applied as the overall control input to the nonlinear system. The outer-loop conventional signal allows the action and critic WNNs to learn online while keeping the system stable. This conventional signal, which uses the tracking error, is viewed as the "supervisory" signal [7].

© ICROS, KIEE and Springer 2013
Manuscript received May 31, 2011; revised October 27, 2012; accepted February 26, 2013. Recommended by Editor Young Il Lee. Manish Sharma is with the Medicaps Institute of Management and Science, Rajiv Gandhi Technical University, Bhopal, India (e-mail: [email protected]). Ajay Verma is with the Institute of Engineering and Technology, DAVV, Indore, India (e-mail: [email protected]).

This motivates us to consider the design of a WNN reduced-order observer based adaptive tracking controller for a class of uncertain nonlinear systems using reinforcement learning. WNNs are used to approximate the system uncertainty as well as to optimize the performance of the control strategy.

The paper is organized as follows: Section 2 deals with the system preliminaries, and the system description is given in Section 3. Reduced-order observer and controller design are discussed in Sections 4 and 5, respectively. Section 6 deals with the tuning algorithm for the actor-critic wavelet networks. The stability analysis of the proposed control scheme and the observer is given in Section 7. The effectiveness of the proposed strategy is illustrated through an example in Section 8, while Section 9 concludes the paper.

2. SYSTEM PRELIMINARIES

2.1. Fundamentals of wavelet neural network

A wavelet network is a type of building block for function approximation. The building block is obtained by translating and dilating the mother wavelet function. Corresponding to a certain countable family of $a_m$ and $b_n$, the wavelet family can be expressed as [18]

$$\left\{ a_m^{-d/2}\,\psi\!\left(\frac{x - b_n}{a_m}\right) : m \in Z,\ n \in Z^d \right\}. \tag{1}$$

Considering

$$a_m = a_0^m, \quad b_{mn} = n a_0^m b_0, \quad m \in Z,\ n \in Z^d, \tag{2}$$

the wavelet family in (1) can be expressed as

$$\left\{ \psi_{mn} = a_0^{-md/2}\,\psi\!\left(a_0^{-m} x - n b_0\right) : m \in Z,\ n \in Z^d \right\}, \tag{3}$$

where the scalar parameters $a_0$ and $b_0$ define the step sizes of the dilation and translation discretizations (typically $a_0 = 2$ and $b_0 = 1$) and $x = [x_1, x_2, \ldots, x_n]^T \in R^n$ is the input vector.

The output of an $n$-dimensional WNN with $m$ wavelet nodes is [18]

$$f = \sum_{m \in Z} \sum_{n \in Z^d} \alpha_{mn} \psi_{mn}. \tag{4}$$
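The discretized family (3) and the weighted sum (4) can be sketched numerically for a scalar input. The Mexican-hat wavelet below is a stand-in mother wavelet for illustration only (the paper's simulation uses a discrete Shannon wavelet); the index pairs and weights are hypothetical.

```python
import numpy as np

# Mexican-hat mother wavelet used as a placeholder for psi in (3);
# any admissible mother wavelet fits the same structure.
def psi(t):
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wnn_output(x, alpha, scales, shifts, a0=2.0, b0=1.0):
    """Evaluate f = sum alpha_mn * a0^(-m/2) * psi(a0^(-m) x - n b0),
    i.e., (4) with nodes (3), for a scalar input x over (m, n) pairs."""
    f = 0.0
    for a_mn, m, n in zip(alpha, scales, shifts):
        f += a_mn * a0 ** (-m / 2.0) * psi(a0 ** (-m) * x - n * b0)
    return f
```

With a single node at scale $m = 0$, shift $n = 0$, and unit weight, the network output reduces to the mother wavelet itself.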

3. SYSTEM DESCRIPTION

Consider a single-input single-output (SISO) nonlinear system of the form

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \ldots, \quad \dot{x}_n = f(x) + u, \quad y = Cx, \tag{5}$$

where $x = [x_1, x_2, \ldots, x_n]^T$, $u$, and $y$ are the state vector, control input, and output, respectively, and $f(x): \Re^n \to \Re$ is a smooth unknown nonlinear function. The system (5) can be rewritten as

$$\dot{x} = Ax + B(f(x) + u), \quad y = Cx, \tag{6}$$

where $A$ is the $n \times n$ system matrix, $B$ is the input matrix of order $n \times 1$, and $C$ is the output matrix of order $1 \times n$. Also $x \in \Re^n$ and $y \in \Re^p$; for all real systems, $p \le n$. Since $C$ has full rank, it is possible to make a linear change of coordinates

$$\xi = \begin{bmatrix} \xi_m \\ \xi_u \end{bmatrix} = \Lambda x = \begin{bmatrix} Cx \\ Qx \end{bmatrix}, \tag{7}$$

where $Q$ is chosen so that $\Lambda$ is an invertible matrix. Also $\xi_m \in \Re^p$ and $\xi_u \in \Re^{n-p}$. Applying the coordinate transformation (7), the plant (6) takes the form

$$\begin{bmatrix} \dot{\xi}_m \\ \dot{\xi}_u \end{bmatrix} = \begin{bmatrix} F_1(\xi_m, \xi_u) \\ F_2(\xi_m, \xi_u) \end{bmatrix}, \quad y = \phi(\xi_m, \xi_u), \tag{8}$$

where $F(\xi) = f(\Lambda^{-1}\xi)$.
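The coordinate change (7) can be sketched numerically: stack the output matrix $C$ with a completing matrix $Q$ and verify that the resulting $\Lambda$ is invertible. The matrices below are hypothetical values chosen only for illustration, since the paper leaves $Q$ as a free design choice.

```python
import numpy as np

# Toy illustration of xi = Lambda x with Lambda = [C; Q] for n = 3, p = 1.
C = np.array([[1.0, 2.0, 1.0]])          # hypothetical output matrix
Q = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])          # completes Lambda to full rank
Lam = np.vstack([C, Q])                  # Lambda, invertible by construction

x = np.array([0.6, 0.2, 0.5])
xi = Lam @ x                             # xi_m = xi[:1], xi_u = xi[1:]
x_back = np.linalg.solve(Lam, xi)        # recover x = Lambda^{-1} xi
```

The round trip through $\Lambda$ and $\Lambda^{-1}$ returns the original state, which is exactly the invertibility requirement on $\Lambda$.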

The motivation for the reduced-order observer stems from the fact that in the plant model (6), the state $\xi_m$ is directly available for measurement; hence it suffices to build an observer that estimates only the unmeasured state $\xi_u$. The order of such an observer corresponds to the dimension of the unmeasured state, namely $n - p \le n$. This type of observer is called a reduced-order observer [6] and has many important applications in design problems.

The objective is to formulate a state feedback control law to achieve the desired tracking performance. The control law is formulated using the transformed system (6). Let $y_d = [y_d, \dot{y}_d, \ldots, y_d^{(n)}]^T$ be the vector of the desired tracking trajectory. The following assumptions are made for the systems under consideration.

Assumption 1: The desired trajectory $y_d(t)$ is assumed to be smooth, continuous ($C^n$), and available for measurement.

4. WAVELET REDUCED ORDER OBSERVER DESIGN

Applying the linear transformation (7), the system (6) takes the form

$$\begin{bmatrix} \dot{x}_m \\ \dot{x}_u \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_m \\ x_u \end{bmatrix} + \begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix} (f(x) + u), \quad y = \begin{bmatrix} C_{11} & C_{12} \end{bmatrix} \begin{bmatrix} x_m \\ x_u \end{bmatrix} = y_1 + y_2, \tag{9}$$

where $x_m \in \Re^p$ is the measured state, $x_u \in \Re^{n-p}$ is the unmeasured state, and $y \in \Re^p$ is the output of the system (5), which depends on the measurable as well as the unmeasurable states; it is assumed that both parts of the output, $y_1$ and $y_2$, are explicitly available for measurement.

The wavelet-based reduced-order observer that estimates the states of the system (6) is given by

$$\dot{\hat{x}}_u = A_{22}\hat{x}_u + mA_{12}(x_u - \hat{x}_u) + B_{21}u + B_{21}\hat{f}, \tag{10}$$

where $\hat{x}_u$ is the estimate of the state vector $x_u$ and $m = [m_1, m_2, \ldots, m_{n-p}]^T$ is the observer gain matrix, selected such that the matrix $A_{22} - mA_{12}$ is stable. In this work, a WNN is used for system identification. Substituting (9) into (10) results in

$$\dot{\hat{x}}_u = A_{22}\hat{x}_u + m\big(\dot{x}_m - A_{11}x_m - B_{11}(f(x) + u) - A_{12}\hat{x}_u\big) + B_{21}u + B_{21}\hat{f},$$

or equivalently

$$\dot{\hat{x}}_u = A_{22}\hat{x}_u + m\dot{x}_m - mA_{11}x_m - mB_{11}u - mA_{12}\hat{x}_u + \hat{f} + \varepsilon. \tag{11}$$

Note that the above equation contains $\dot{x}_m$, which is not available for the observer design. The following transformation is therefore applied to generate an intermediate state:

$$\hat{x}'_u = \hat{x}_u - mx_m.$$

Applying the above transformation, the observer takes the form

$$\dot{\hat{x}}'_u = A_{22}\hat{x}_u - mA_{11}x_m - mB_{11}u - mA_{12}\hat{x}_u + \hat{f} + \varepsilon. \tag{12}$$

Assumption 2:
a) $\|f(x(t)) - f(\hat{x}(t))\| \le \gamma_1 \|\tilde{x}_u\|$.
b) For a symmetric positive definite matrix $Q$ there exists a symmetric positive definite matrix $P$ such that

$$(A_{22} - mA_{12})^T P + P(A_{22} - mA_{12}) = -Q, \quad (PB_{21})^T = C,$$

where $\tilde{x}_u = x_u - \hat{x}_u$ is the unmeasurable state estimation error and $\gamma_1$ is a positive constant.

Now the error system is defined as

$$\dot{\tilde{x}}_u = (A_{22} - mA_{12})\tilde{x}_u + B_{21}\big(f(x) - f(\hat{x})\big), \quad \tilde{y} = C_{12}\tilde{x}_u = \tilde{y}_2. \tag{13}$$
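The stability requirement on $A_{22} - mA_{12}$ in (10) drives the error system (13) to zero when the $f$-mismatch term vanishes. The sketch below checks this with hypothetical $2 \times 2$ matrices and a hypothetical gain $m$ (none of these values come from the paper), integrating the homogeneous part of (13) with forward Euler.

```python
import numpy as np

# Hypothetical partition blocks and observer gain for n - p = 2.
A22 = np.array([[0.0, 1.0], [-2.0, -3.0]])
A12 = np.array([[1.0, 0.0]])
m = np.array([[2.0], [1.0]])
Ao = A22 - m @ A12                      # observer matrix from (13)

# The gain must place all eigenvalues of A22 - m*A12 in the left half-plane.
assert np.all(np.linalg.eigvals(Ao).real < 0)

dt = 0.01
x_t = np.array([1.0, -0.5])             # initial estimation error x_tilde
for _ in range(2000):                   # Euler integration of (13), f-term = 0
    x_t = x_t + dt * (Ao @ x_t)
```

After 20 s of simulated time, the estimation error has decayed to a negligible magnitude, which is the convergence behavior the Lyapunov condition of Assumption 2(b) certifies.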

With the help of the proposed tuning laws presented next, the error term $\tilde{f}(x)$ is reduced to an arbitrarily small value, which is further attenuated by the robust control term $v_r$. The adaptation laws for the wavelet network used to approximate $\tilde{f}(x)$ are

$$\dot{\hat{\alpha}}_f = -\beta_1 \tilde{y}_2 \big(\hat{A}_f^T \hat{w}_f - \hat{B}_f^T \hat{c}_f\big), \quad \dot{\hat{w}}_f = -\beta_2 \tilde{y}_2 \hat{A}_f, \quad \dot{\hat{c}}_f = -\beta_3 \tilde{y}_2 \hat{B}_f, \tag{14}$$

where $\beta_1$, $\beta_2$, and $\beta_3$ are positive constant learning rates.

5. BASIC CONTROLLER DESIGN USING FILTERED TRACKING ERROR

The state tracking error vector $\hat{e}(t)$ is defined as

$$\hat{e}(t) = \hat{x}(t) - y_d(t) = \hat{e}_m(t) + \hat{e}_u(t). \tag{15}$$

The filtered tracking error is defined as

$$\hat{r} = K_m \hat{e}_m + K_u \hat{e}_u, \tag{16}$$

where $K_m = [k_1, k_2, \ldots, k_{n-p}]$ and $K_u = [k_1, k_2, \ldots, k_p]$ are appropriately chosen coefficient vectors such that $\hat{e} \to 0$ exponentially as $\hat{r} \to 0$.

Applying the feedback linearization method, the control law is defined as

$$u = y_d^{(n)} - K_m \hat{e}_m - K_u \hat{e}_u - \hat{r} - \hat{f}. \tag{17}$$

Stability of the system (5) with the proposed observer and controller strategy will be analyzed in Section 7.
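The filtered error (16) and the control law (17) are a pair of inner products plus a feedforward term. The sketch below evaluates both for the $n = 3$ case; the gains, error values, $y_d^{(n)}$, and $\hat{f}$ are illustrative assumptions, not the paper's simulation data.

```python
import numpy as np

# Filtered tracking error (16) and control law (17) with hypothetical values.
Km = np.array([10.0, 5.0])      # gains on the measured-state error e_m
Ku = np.array([1.0])            # gain on the estimated unmeasured-state error e_u
e_m = np.array([0.2, -0.1])
e_u = np.array([0.05])
r_hat = Km @ e_m + Ku @ e_u     # scalar filtered tracking error, (16)

ydn = 0.4                       # assumed n-th derivative of y_d
f_hat = -1.2                    # assumed WNN estimate of f(x)
u = ydn - Km @ e_m - Ku @ e_u - r_hat - f_hat   # control law (17)
```

Note that (17) cancels the estimated nonlinearity $\hat{f}$ and the error feedback terms, which is the standard feedback-linearization structure.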

6. ADAPTIVE WNN CONTROLLER DESIGN

A novel strategic utility function is defined as the long-term performance measure for the system. It is approximated by the critic WNN signal. The action WNN signal is constructed to minimize this strategic utility function by using a quadratic optimization function. The critic WNN and action WNN weight tuning laws are then derived. Stability analysis using the Lyapunov direct method is carried out for the closed-loop system (6) with the novel weight tuning updates.


6.1. Strategic utility function

The utility function $p(k) = [p_i(k)]_{i=1}^m \in \Re^m$ is defined on the basis of the filtered tracking error $\hat{r}$ and is given by

$$p_i(k) = \begin{cases} 0 & \text{if } \hat{r}_i^2 \le \eta, \\ 1 & \text{if } \hat{r}_i^2 > \eta, \end{cases} \tag{18}$$

where $p_i(k) \in \Re$, $i = 1, 2, \ldots, m$, and $\eta \in \Re$ is a predefined threshold. $p(k)$ can be considered the current performance index, with $p(k) = 0$ indicating good tracking performance and $p(k) = 1$ indicating poor tracking performance.

The strategic utility function $Q'(k) \in \Re^m$ can be defined using the binary utility function as

$$Q'(k) = \alpha^N p(k+1) + \alpha^{N-1} p(k+2) + \cdots + \alpha^{k+1} p(N) + \cdots, \tag{19}$$

where $\alpha \in \Re$, $0 < \alpha < 1$, and $N$ is the horizon. The above equation may be rewritten as

$$Q'(k) = \min_{u(k)} \{\alpha Q'(k-1) - \alpha^{N+1} p(k)\}. \tag{20}$$
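The binary utility (18) and the recursion (20) can be sketched for a scalar filtered error; the threshold, discount, and horizon values below are illustrative assumptions.

```python
# Binary utility (18) and the recursive strategic-utility form (20),
# sketched for scalar r_hat with hypothetical eta, alpha, N.
eta, alpha, N = 0.01, 0.9, 10

def p(r_hat):
    """Binary utility (18): 0 for good tracking, 1 for poor tracking."""
    return 0.0 if r_hat**2 <= eta else 1.0

def q_update(q_prev, r_hat):
    # Q'(k) = alpha * Q'(k-1) - alpha^(N+1) * p(k), per (20), with the
    # minimizing action assumed already applied.
    return alpha * q_prev - alpha ** (N + 1) * p(r_hat)

q = 1.0
for r_hat in [0.5, 0.3, 0.05, 0.05]:   # tracking error shrinking over time
    q = q_update(q, r_hat)
```

Once the tracking error drops below the threshold, $p(k) = 0$ and the strategic utility simply decays geometrically, which is why minimizing it encourages sustained good tracking.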

6.2. Critic WNN

The long-term system performance can be approximated by the critic WNN by defining the prediction error as

$$e_c(k) = \hat{Q}(k) - \alpha\big(\hat{Q}(k-1) - \alpha^N p(k)\big), \tag{21}$$

where $\hat{Q}(k) = \hat{w}_1^T(k)\phi(v_1^T x(k)) = \hat{w}_1^T(k)\phi_1(k)$, $e_c(k) \in \Re^m$, $\hat{Q}(k) \in \Re^m$ is the critic signal, $\hat{w}_1(k) \in \Re^{n_1 \times m}$ and $v_1 \in \Re^{mn \times n_1}$ represent the weight estimates, $\phi_1(k) \in \Re^{n_1}$ is the wavelet activation function, and $n_1$ is the number of nodes in the wavelet layer. The objective function to be minimized by the critic WNN is defined as

$$E_c(k) = \frac{1}{2} e_c^T(k) e_c(k). \tag{22}$$

The weight update rule for the critic WNN is derived from gradient-based adaptation:

$$\hat{w}_1(k+1) = \hat{w}_1(k) + \Delta\hat{w}_1(k), \tag{23}$$

where $\Delta\hat{w}_1(k) = \alpha_1 \left[ -\dfrac{\partial E_c(k)}{\partial \hat{w}_1(k)} \right]$, or

$$\hat{w}_1(k+1) = \hat{w}_1(k) - \alpha_1 \phi_1(k)\big(\hat{w}_1^T(k)\phi_1(k) + \alpha^{N+1} p(k) - \alpha \hat{w}_1^T(k-1)\phi_1(k-1)\big)^T, \tag{24}$$

where $\alpha_1 \in \Re$ is the WNN adaptation gain. The critic WNN weights are tuned by the reinforcement learning signal and the discounted past output values of the critic WNN.
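One step of the critic update (24) can be sketched with small matrices; the activation values, gains, and dimensions ($m = 1$, $n_1 = 3$) below are illustrative assumptions.

```python
import numpy as np

# One gradient step of the critic update (24) with hypothetical values.
alpha, N, a1 = 0.9, 10, 0.05
w1 = np.zeros((3, 1))                   # critic weights, n1 x m, start at zero
w1_prev = np.zeros((3, 1))
phi_now = np.array([[0.4], [0.1], [0.7]])
phi_prev = np.array([[0.5], [0.2], [0.6]])
p_k = 1.0                               # binary utility signal (poor tracking)

# Bracketed term of (24): w1(k)^T phi1(k) + alpha^(N+1) p(k)
#                         - alpha w1(k-1)^T phi1(k-1)
e_c = w1.T @ phi_now + alpha ** (N + 1) * p_k - alpha * (w1_prev.T @ phi_prev)
w1_next = w1 - a1 * phi_now @ e_c.T     # update (24)
```

With zero initial weights, the bracketed term reduces to the discounted utility $\alpha^{N+1} p(k)$, so the first update moves the weights against the direction of the current activation.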

6.3. Action WNN

The action WNN is implemented to approximate the unknown nonlinear function $f(x(k))$ and to provide an optimal control signal to the overall input $u(k)$ as

$$\hat{f}(k) = \hat{w}_2^T(k)\phi(v_2^T x(k)) = \hat{w}_2^T(k)\phi_2(k), \tag{25}$$

where $\hat{w}_2(k) \in \Re^{n_2 \times m}$ and $v_2 \in \Re^{mn \times n_2}$ represent the weight estimates, $\phi_2(k) \in \Re^{n_2}$ is the activation function, and $n_2$ is the number of nodes in the hidden layer. Suppose that the unknown target output-layer weight for the action WNN is $w_2$; then we have

$$f(k) = w_2^T \phi(v_2^T x(k)) + \varepsilon(x(k)) = w_2^T \phi_2(k) + \varepsilon(x(k)), \tag{26}$$

where $\varepsilon(x(k)) \in \Re^m$ is the WNN approximation error. From (25) and (26), we get

$$\tilde{f}(k) = \hat{f}(k) - f(k) = \big(\hat{w}_2(k) - w_2\big)^T \phi_2(k) - \varepsilon(x(k)), \tag{27}$$

where $\tilde{f}(k) \in \Re^m$ is the functional estimation error.

The action WNN weights are tuned using the functional estimation error $\tilde{f}(k)$ and the error between the desired strategic utility function $Q_d(k) \in \Re^m$ and the critic signal $\hat{Q}(k)$. Define

$$e_a(k) = \tilde{f}(k) + \big(\hat{Q}(k) - Q_d(k)\big). \tag{28}$$

The objective is to make the desired utility function $Q_d(k)$ zero at every step. Thus (28) becomes

$$e_a(k) = \tilde{f}(k) + \hat{Q}(k). \tag{29}$$

The objective function to be minimized by the action WNN is given by

$$E_a(k) = \frac{1}{2} e_a^T(k) e_a(k). \tag{30}$$

The weight update rule for the action WNN is also a gradient-based adaptation, defined as

$$\hat{w}_2(k+1) = \hat{w}_2(k) + \Delta\hat{w}_2(k), \tag{31}$$

where $\Delta\hat{w}_2(k) = \alpha_2 \left[ -\dfrac{\partial E_a(k)}{\partial \hat{w}_2(k)} \right]$, or

$$\hat{w}_2(k+1) = \hat{w}_2(k) - \alpha_2 \phi_2(k)\big(\hat{Q}(k) + \tilde{f}(k)\big)^T, \tag{32}$$

where $\alpha_2 \in \Re$ is the WNN adaptation gain.

The WNN weight update rule (32) cannot be implemented in practice since the nonlinear function $f(x(k))$ is unknown. However, using (16), the functional estimation error is given by

$$\tilde{f}(k) = \hat{r} - r + \delta(k). \tag{33}$$

Substituting (33) into (32), we get

$$\hat{w}_2(k+1) = \hat{w}_2(k) - \alpha_2 \phi_2(k)\big(\hat{Q}(k) + \hat{r} - r + \delta(k)\big)^T.$$

To implement the weight update rule, the unknown but bounded disturbance $\delta(k)$ is taken to be zero. Then, (32) is rewritten as

$$\hat{w}_2(k+1) = \hat{w}_2(k) - \alpha_2 \phi_2(k)\big(\hat{Q}(k) + \hat{r} - r\big)^T. \tag{34}$$
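One step of the implementable action-WNN update (34) can be sketched as follows; the dimensions ($m = 1$, $n_2 = 2$), activations, critic signal, and filtered-error values are illustrative assumptions.

```python
import numpy as np

# One step of the implementable action-WNN update (34), hypothetical values.
a2 = 0.1
w2 = np.array([[0.3], [-0.2]])          # action weights, n2 x m
phi2 = np.array([[0.6], [0.4]])         # wavelet activations
Q_hat = np.array([[0.2]])               # critic signal
r_hat, r = 1.1, 1.0                     # filtered tracking error and its target

err = Q_hat + (r_hat - r)               # (Q_hat + r_hat - r) from (34)
w2_next = w2 - a2 * phi2 @ err.T        # gradient step on the action weights
```

The update pushes the action weights so as to shrink both the critic signal and the filtered-error mismatch, replacing the unmeasurable $\tilde{f}(k)$ with the measurable quantity $\hat{r} - r$.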


Thus, after replacing the functional approximation error, the weight update for the action WNN is tuned by the critic WNN output, the current filtered tracking error, and a conventional outer-loop signal. The block diagram of the proposed control strategy is shown in Fig. 1.

7. STABILITY ANALYSIS

Consider a Lyapunov functional of the form

$$V = \frac{1}{2}\tilde{x}_u^T P \tilde{x}_u + \frac{1}{2}\hat{r}^2. \tag{35}$$

Differentiating it along the trajectories of the system,

$$\dot{V} = -\frac{1}{2}\tilde{x}_u^T Q \tilde{x}_u + \tilde{x}_u^T P B_{21}\big(f(x) - f(\hat{x})\big) + \hat{r}\big(K_m \dot{\hat{e}}_m + K_u \dot{\hat{e}}_u\big),$$

where $\dot{\hat{e}}_u$ is expanded using the observer dynamics. Substituting the control law $u$ into the above equation,

$$\dot{V} \le -\frac{1}{2}\lambda_{\min}(Q)\|\tilde{x}_u\|^2 + \tilde{x}_u^T P B_{21}\big(f(x) - f(\hat{x})\big) + \hat{r}\big(\tilde{f}(\hat{x}) + K m C \tilde{x}_u\big) - k_r \hat{r}^2$$
$$\le -\frac{1}{2}\lambda_{\min}(Q)\|\tilde{x}_u\|^2 + M_1\|\tilde{x}_u\| + M_2\|\tilde{x}_u\|^2 + M_3\hat{r} + M_4\hat{r}\|\tilde{x}_u\| - k_r \hat{r}^2,$$

where $M_1 = \|PB_{21}\|\gamma_3$, $M_2 = \lambda_{\max}(P)\gamma_1$, $M_3 = \max\hat{r} + \gamma_4$, $M_4 = \max\|KmC\|$, and $\gamma_1, \gamma_2, \gamma_3, \gamma_4 \ge 0$. The system is stable as long as

$$k_r \hat{r}^2 + \frac{1}{2}\lambda_{\min}(Q)\|\tilde{x}_u\|^2 \ge M_1\|\tilde{x}_u\| + M_2\|\tilde{x}_u\|^2 + M_3\hat{r} + M_4\hat{r}\|\tilde{x}_u\|. \tag{36}$$

By proper selection of $k_r$, $P$, and $Q$, the above condition can be satisfied.

8. SIMULATION RESULTS

A simulation is performed to verify the effectiveness of the proposed WNN reduced-order observer based control strategy. Consider a system of the form

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \dot{x}_3 = -5x_1 - 6x_2 - 9x_3 + 0.01\sin(x_1 x_2) + u, \quad y = x_1 + 2x_2 + x_3. \tag{37}$$

Here $x_1$ and $x_2$ are assumed to be known states, and $x_3$ is estimated using the proposed reduced-order observer. The system belongs to the class of uncertain nonlinear systems defined by (5) with $n = 3$. It is assumed that only the output is available for measurement. The proposed observer-controller strategy is applied to this system with the objective of solving its tracking problem. The desired trajectory is taken as $y_d = 0.5\sin t + 0.1\cos t + 0.4$. Initial conditions are taken as $x(0) = [0.6, 0.2, 0.5]^T$. The attenuation level for the robust controller is taken as 0.01, and the controller gain vector as $k = [10, 5, 1]$. Wavelet networks with the discrete Shannon wavelet as the mother wavelet are used to approximate the unknown system dynamics. The wavelet parameters of these networks are tuned online using the proposed adaptation laws, with all wavelet parameters initialized to zero. Simulation results are shown in the figures. As observed from the figures, the system response tracks the desired trajectory rapidly.
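The plant (37) can be integrated directly to sanity-check the model before closing the loop. The sketch below runs an open-loop Euler integration with $u = 0$ from the paper's initial condition; it exercises only the plant dynamics, not the full observer/controller scheme.

```python
import numpy as np

# Open-loop Euler integration of the example plant (37).
def plant(x, u):
    x1, x2, x3 = x
    return np.array([x2, x3,
                     -5*x1 - 6*x2 - 9*x3 + 0.01*np.sin(x1*x2) + u])

dt = 0.001
x = np.array([0.6, 0.2, 0.5])           # paper's initial condition
for _ in range(5000):                   # 5 s with u = 0
    x = x + dt * plant(x, 0.0)
y = x[0] + 2*x[1] + x[2]                # output y = x1 + 2 x2 + x3
```

The linear part has characteristic polynomial $s^3 + 9s^2 + 6s + 5$, which is Hurwitz, so the unforced response stays bounded and decays toward the origin.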

Fig. 1. Block diagram of the closed-loop system.


Fig. 2. States and the desired trajectories.

Fig. 3. Control effort and tracking error.

Fig. 4. Actual output, estimated output, and output error.

9. CONCLUSION

A WNN reduced-order observer based adaptive tracking control strategy is proposed for a class of systems with unknown system dynamics. Adaptive wavelet networks are used to approximate the unknown system dynamics, and adaptation laws are developed for online tuning of the wavelet parameters. The stability of the overall system is guaranteed using a Lyapunov functional, and the theoretical analysis is validated by simulation results. As observed from Fig. 2, all the states are bounded and track their desired trajectories. Fig. 3 shows the control effort and the tracking error between the system output and the desired output, which converges rapidly. The plant output and the estimated output are shown in Fig. 4, together with the error between them, which is of the order of 10^{-4} and reveals the efficiency of the observer design. A convergence pattern in the observer error is also evident in Fig. 4.

REFERENCES

[1] H. Lens and J. Adamy, "Observer based controller design for linear systems with input constraints," Proc. of the 17th World Congress, The International Federation of Automatic Control, Seoul, Korea, pp. 9916-9921, July 2008.
[2] F. Abdollahi, H. A. Talebi, and R. V. Patel, "A stable neural network-based observer with application to flexible-joint manipulators," IEEE Trans. on Neural Networks, vol. 17, no. 1, pp. 118-129, January 2006.
[3] M. Sharma, A. Kulkarni, and A. Verma, "Wavelet adaptive observer based control for a class of uncertain time delay nonlinear systems with input constraints," Proc. of IEEE International Conference on Advances in Recent Technologies in Communication and Computing, ARTCOM, pp. 863-86, 2009.
[4] V. Sundarapandian, "Reduced order observer design for nonlinear systems," Applied Mathematics Letters, vol. 19, pp. 936-941, 2006.
[5] Z. F. Lai and D. X. Hao, "The design of reduced-order observer for systems with monotone nonlinearities," ACTA Automatica Sinica, vol. 33, no. 2, pp. 1290-1293, 2007.
[6] Y. G. Liu and J. F. Zhang, "Reduced-order observer-based control design for nonlinear stochastic systems," Systems & Control Letters, vol. 52, pp. 123-135, 2004.
[7] G. Bartolini, E. Punta, and T. Zolezzi, "Reduced-order observer for sliding mode control of nonlinear non-affine systems," Proc. of the 47th IEEE Conference on Decision and Control, Mexico, 2008.
[8] P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Trans. on Systems, Man, and Cybernetics—Part B: Cybernetics, vol. 37, no. 2, pp. 425-436, April 2007.
[9] W. S. Lin, L. H. Chang, and P. C. Yang, "Adaptive critic anti-slip control of wheeled autonomous robot," IET Control Theory & Applications, vol. 1, no. 1, January 2007.
[10] L. G. Crespo, "Optimal performance, robustness and reliability based designs of systems with structured uncertainty," Proc. of American Control Conference, Colorado, USA, pp. 4219-4224, 2003.
[11] J. Peters and S. Schaal, "Policy gradient methods for robotics," Proc. of the IEEE International Conference on Intelligent Robotics Systems, pp. 2219-2225, 2006.
[12] J. J. Murray, C. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. on Systems, Man, and Cybernetics, vol. 32, no. 2, pp. 140-153, May 2002.
[13] D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. on Neural Networks, vol. 8, no. 5, pp. 997-1007, September 1997.
[14] H. V. Hasselt and M. Wiering, "Reinforcement learning in continuous action spaces," Proc. of IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 272-279, 2007.
[15] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Trans. on Neural Networks, vol. 3, no. 6, pp. 889-898, November 1992.
[16] J. Zhang, G. G. Walter, Y. Miao, and W. Lee, "Wavelet neural networks for function learning," IEEE Trans. on Signal Processing, vol. 43, no. 6, pp. 1485-1497, June 1995.
[17] B. Delyon, A. Juditsky, and A. Benveniste, "Accuracy analysis for wavelet approximations," IEEE Trans. on Neural Networks, vol. 6, no. 2, pp. 332-348, March 1995.
[18] W. Sun, Y. Wang, and J. Mao, "Wavelet network for identifying the model of robot manipulator," Proc. of the 4th World Congress on Intelligent Control and Automation, China, pp. 1634-1638, June 2002.

Manish Sharma is pursuing a Ph.D. degree in Electronics and Telecommunication Engineering from Devi Ahilya University. His research interests include nonlinear adaptive control, wavelet neural networks, observer based control, and system identification.

Ajay Verma received his Ph.D. degree in Electronics and Telecommunication Engineering from Devi Ahilya University. His research interests include nonlinear dynamics and system theory, neural networks, and nonlinear control systems.