Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2013, Article ID 387817, 16 pages
http://dx.doi.org/10.1155/2013/387817
Research Article
Decentralized Reinforcement Learning Robust Optimal Tracking Control for Time Varying Constrained Reconfigurable Modular Robot Based on ACI and $Q$-Function
Bo Dong1 and Yuanchun Li2
1 Department of Communication Engineering, Jilin University, Changchun 130022, China
2 Department of Control Engineering, Changchun University of Technology, Changchun 130012, China
Correspondence should be addressed to Yuanchun Li; yuanchun@jlu.edu.cn
Received 20 August 2013; Revised 13 November 2013; Accepted 13 November 2013
Academic Editor: M. Onder Efe
Copyright © 2013 B. Dong and Y. Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A novel decentralized reinforcement learning robust optimal tracking control theory for time varying constrained reconfigurable modular robots, based on the action-critic-identifier (ACI) and the state-action value function ($Q$-function), is presented to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. The dynamics of the time varying constrained reconfigurable modular robot are described as a synthesis of interconnected subsystems, and a continuous time state equation and $Q$-function are designed. Combining the ACI with an RBF network, the global uncertainty of the subsystem and the HJB (Hamilton-Jacobi-Bellman) equation are estimated: the critic-NN and action-NN approximate the optimal $Q$-function and the optimal control policy, the identifier identifies the global uncertainty, and the RBF-NN is used to update the weights of the ACI-NN. On this basis, a novel decentralized robust optimal tracking controller of the subsystem is proposed so that the subsystem can track the desired trajectory and the tracking error converges to zero in finite time. The stability of the ACI and of the robust optimal tracking controller is confirmed by Lyapunov theory. Finally, comparative simulation examples are presented to illustrate the effectiveness of the proposed ACI and decentralized control theory.
1. Introduction
A reconfigurable modular robot can transform its configuration according to different external situations and task requirements. Following the concept of modular design and the decentralized control theory of the subsystem, a reconfigurable modular robot can complete a task by changing its structure efficiently in different situations without redesigning the control law. At the same time, the reconfigurable modular robot possesses good accuracy and flexibility.
Many scholars have studied the dynamics and the control of reconfigurable modular robots. A novel VGSTA-ESO based decentralized ADRC control method for reconfigurable modular robots was proposed in [1]: by designing a high-precision VGSTA-ESO to estimate the nonlinear terms of the dynamic model and the interconnection terms of the subsystem, joint trajectory tracking control is implemented. Based on the calculated torque, a robust fuzzy neural network controller was proposed in [2] to address model uncertainty in the process of model generation. A decentralized adaptive fuzzy sliding mode control method for reconfigurable modular robots is given in [3], in which a fuzzy logic system approximates the unknown dynamics of the subsystem and a sliding mode controller with an adaptive scheme is designed to handle both the interconnection term and the fuzzy approximation error. A decentralized adaptive neural network control algorithm for reconfigurable manipulators was proposed in [4], where neural networks with an adaptive algorithm approximate the unknown dynamic functions and interconnections in the subsystem. A new distributed control method was proposed in [5], which uses a decomposition algorithm to decompose the robot dynamic system into a number of dynamical systems; an adaptive sliding mode controller is then designed to offset the impact of model uncertainty. An observer based decentralized adaptive fuzzy controller for reconfigurable manipulators was proposed in [6]: by designing a state observer, the adaptive fuzzy systems that model the unknown dynamics of the subsystem and the interconnection term can be constructed from the state estimates. Nevertheless, the requirement that the dynamics of the reconfigurable modular robot system be fully or even partially known is hard to satisfy. Moreover, because of the strong coupling, the model uncertainties, and the interconnection terms of the subsystems in the reconfigurable modular robot system, the processing load on the controller increases and larger time delays and calculation errors are easily produced, so that designing the controllers with the methods and algorithms above becomes too complicated.
In recent years, as one of the most effective methods for solving control problems of continuous time, strongly coupled nonlinear systems, the reinforcement learning algorithm has received extensive attention from scholars. Reinforcement learning [7, 8] is a learning method that maps situations to actions so as to maximize a numerical reward signal. Compared with supervised learning, reinforcement learning does not need to predict the mentor signal in various states but learns in the process of interaction with the situation. Because of its adaptive optimization capability for nonlinear models under uncertainty, reinforcement learning has a unique advantage in solving optimization strategy and control problems for complex models [9-11]. Zhang and his team presented an infinite time optimal tracking control scheme for discrete-time nonlinear systems via the greedy HDP iteration algorithm [12-14]. Through a system transformation, the optimal tracking problem is transformed into an optimal regulation problem, and the greedy HDP iteration algorithm is introduced to deal with the regulation problem with a rigorous convergence analysis. Then a data-driven robust approximate optimal tracking control was proposed using adaptive dynamic programming together with a data-driven model, established by a recurrent neural network, that reconstructs the unknown system dynamics from available input-output data [15]. After this, they designed a fuzzy critic estimator to estimate the value function for nonlinear continuous-time systems [16]. On this basis, a synchronization problem for an array of neural networks with hybrid coupling and interval time varying delay was treated with an augmented Lyapunov-Krasovskii functional method [17]. An FRL scheme using only the immediate reward is adopted, with sufficient conditions used to analyze the convergence of the optimal task performance. Bhasin presents a neural network control of a robot interacting with an uncertain viscoelastic environment [18]. In [18] a continuous controller is developed for a robot that moves in free space and then regulates the new coupled dynamic system to a desired setpoint. Khan et al. present an implementation of a model-free Q-learning based discrete model reference compliance controller for a humanoid robot arm [19], where the reinforcement learning scheme uses a recently developed Q-learning scheme to develop an optimal policy online. Patchaikani et al. propose an adaptive critic-based real-time redundancy resolution scheme for kinematic control of a redundant manipulator [20]. The kinematic control of the redundant manipulator is formulated as a discrete-time input affine system, and then an optimal real-time redundancy resolution scheme is proposed. Although research on reinforcement learning algorithms has developed rapidly in recent years, some deficiencies remain. For example, when there are multiple subsystems in the global system, the methods above cannot handle the impacts of the interconnection terms between the subsystems. Moreover, the methods above are mostly used to solve the learning and optimization problems of the system itself; when external constraints act on the system, these methods are no longer applicable. Therefore, designing a robust reinforcement learning optimal control method for systems with external constraints and multiple coupled subsystems is an urgent problem.
In this paper, we present a novel continuous time decentralized reinforcement learning robust optimal tracking control theory for the time varying constrained reconfigurable modular robot. Combining the ACI with the RBF-NN, the critic-NN is used to estimate the optimal $Q$-function, the action-NN approximates the optimal control policy, and the identifier identifies the global uncertainty, so that the HJB equation can be estimated with a bounded and convergent estimation error. Firstly, since the decentralized control method is adopted, each joint subsystem owns a separate controller, and the processing loads of the controllers are greatly reduced. Secondly, because the time varying constraints can be compensated in the subsystems, the proposed method is suitable for a reconfigurable modular robot in a time varying constrained external environment. Thirdly, the proposed control method compensates for the impacts of the model uncertainties and the interconnection terms on the system, so that the subsystems track the desired trajectories and the tracking error converges to zero in finite time.
2. Problem Formulation

Assume that the time varying external constraint on the end of the reconfigurable modular robot is

$$\Psi(q, t) = 0. \tag{1}$$

Here $q \in R^n$ is the vector of joint displacements, $\Psi: R^n \to R^m$, and $m$ is the dimension of the external limiting conditions. With the time varying constraints, the dynamics of a reconfigurable modular robot can be presented as follows:
$$M(q)\ddot q + C(q,\dot q)\dot q + G(q) + F(q,\dot q) = u + J_\Psi^T(q,t)\, f, \tag{2}$$

where $M(q) \in R^{n \times n}$ is the inertia matrix, $C(q,\dot q)\dot q \in R^n$ is the Coriolis and centripetal force, $G(q) \in R^n$ is the gravity term, $F(q,\dot q)$ is the unmodeled dynamics including friction terms and external disturbances, $u \in R^n$ is the applied joint torque, and $J_\Psi^T(q,t) f$ is the contact force generated by the contact between the end of the reconfigurable modular robot and the external constraints.
After $m$ constraints are imposed on a robot that otherwise works in free space, the system loses $m$ degrees of freedom because of the restriction (1). The degrees of freedom of the robot therefore change from $n$ to $(n - m)$, so that only $(n - m)$ independent joint displacements are needed to fully describe the constrained motion of the system.
Define

$$q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}, \quad q_1 \in R^{n-m}, \quad q_2 \in R^{m}. \tag{3}$$

Substituting (3) into (1), we get

$$\Psi(q_1, \Theta(q_1, t), t) = 0, \tag{4}$$

where

$$q_2 = \Theta(q_1, t). \tag{5}$$

Therefore, (3) can be described fully by the joint displacement $q_1$ as follows:

$$q = \begin{bmatrix} q_1 \\ \Theta(q_1, t) \end{bmatrix}. \tag{6}$$
The derivative of (6) is

$$\dot q = \begin{bmatrix} \dot q_1 \\ \dfrac{\partial \Theta(q_1,t)}{\partial q_1}\dot q_1 + \dfrac{\partial \Theta(q_1,t)}{\partial t} \end{bmatrix} = \begin{bmatrix} I_{n-m} & 0 \\ \dfrac{\partial \Theta(q_1,t)}{\partial q_1} & I_m \end{bmatrix} \begin{bmatrix} \dot q_1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \dfrac{\partial \Theta(q_1,t)}{\partial t} \end{bmatrix} = T\dot\theta + H. \tag{7}$$
In (7),

$$T = \begin{bmatrix} I_{n-m} & 0 \\ \dfrac{\partial \Theta(q_1,t)}{\partial q_1} & I_m \end{bmatrix} \in R^{n \times n}, \quad \dot\theta = \begin{bmatrix} \dot q_1 \\ 0 \end{bmatrix} \in R^{n}, \quad H = \begin{bmatrix} 0 \\ \dfrac{\partial \Theta(q_1,t)}{\partial t} \end{bmatrix} \in R^{n}. \tag{8}$$
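The decomposition $\dot q = T\dot\theta + H$ in (7) and (8) can be checked numerically on a small case. The sketch below assumes $n = 2$, $m = 1$, and a hypothetical constraint solution $\Theta(q_1,t) = 2q_1 + \sin t$; neither the function nor the helper names come from the paper.

```python
import math

def theta_fn(q1, t):
    # Hypothetical constraint solution q2 = Theta(q1, t); chosen only for illustration.
    return 2.0 * q1 + math.sin(t)

def dtheta_dq1(q1, t):
    return 2.0

def dtheta_dt(q1, t):
    return math.cos(t)

def q_dot(q1, q1_dot, t):
    """Return [q1_dot, q2_dot] computed as T * theta_dot + H, following (7)-(8)."""
    T = [[1.0, 0.0],
         [dtheta_dq1(q1, t), 1.0]]
    theta_dot = [q1_dot, 0.0]
    H = [0.0, dtheta_dt(q1, t)]
    return [sum(T[r][c] * theta_dot[c] for c in range(2)) + H[r] for r in range(2)]

def q2_dot_numeric(q1_of_t, t, step=1e-6):
    # Central difference of q2(t) = Theta(q1(t), t), used only as a cross-check.
    hi = theta_fn(q1_of_t(t + step), t + step)
    lo = theta_fn(q1_of_t(t - step), t - step)
    return (hi - lo) / (2.0 * step)
```

For a test trajectory such as $q_1(t) = t^2$, the second component returned by `q_dot` should agree with the finite-difference derivative of $q_2(t)$.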
Therefore, the second derivative of $q$ is obtained as

$$\ddot q = T\ddot\theta + \dot T\dot\theta + \dot H. \tag{9}$$

Substituting (7) and (9) into (2), we get

$$u + J_\Psi^T(q,t)\, f = M(q)\left(T\ddot\theta + \dot T\dot\theta + \dot H\right) + C(q,\dot q)\left(T\dot\theta + H\right) + G(q) + F(q,\dot q). \tag{10}$$
Define

$$E = \begin{bmatrix} I_{(n-m)\times(n-m)} \\ 0_{m\times(n-m)} \end{bmatrix} \in R^{n\times(n-m)}. \tag{11}$$

Therefore

$$\dot\theta = \begin{bmatrix} \dot q_1 \\ 0 \end{bmatrix} = E\dot q_1. \tag{12}$$
So (2) can be decomposed into the following form:

$$\sum_{j=1}^{n} M_{ij}(q)\left[(TE\ddot q_1)_j + (\dot T E\dot q_1)_j + \dot H_j\right] + \sum_{j=1}^{n} C_{ij}(q,\dot q)\left[(TE\dot q_1)_j + H_j\right] + G_i(q) + F_i(q_i,\dot q_i) - f_i = u_i. \tag{13}$$
In the equation above, $(TE\ddot q_1)_j$, $(\dot T E\dot q_1)_j$, $(TE\dot q_1)_j$, and $H_j$ are the $j$th elements of $TE\ddot q_1$, $\dot T E\dot q_1$, $TE\dot q_1$, and $H$, respectively; $G_i(q)$, $F_i(q_i,\dot q_i)$, and $u_i$ are the $i$th elements of $G(q)$, $F(q,\dot q)$, and $u$; $f_i$ is the constraint force experienced by the $i$th joint; and $M_{ij}(q)$ and $C_{ij}(q,\dot q)$ are the $ij$th elements of $M(q)$ and $C(q,\dot q)$, respectively. So, as shown in Figure 1, each subsystem dynamical model can be formulated in joint space as follows:

$$M_i(q_i)\ddot q_i + C_i(q_i,\dot q_i)\dot q_i + G_i(q_i) + F_i(q_i,\dot q_i) + Z_i(q,\dot q,\ddot q) = u_i, \tag{14}$$
where

$$\begin{aligned} Z_i(q,\dot q,\ddot q) ={}& \sum_{j=1,\, j\neq i}^{n} M_{ij}(q)\left[(TE\ddot q_1)_j + (\dot T E\dot q_1)_j + \dot H_j\right] + M_{ii}(q)\left[(TE\ddot q_1)_i + (\dot T E\dot q_1)_i + \dot H_i\right] - M_i(q_i)\ddot q_i \\ &+ \sum_{j=1,\, j\neq i}^{n} C_{ij}(q,\dot q)\left[(TE\dot q_1)_j + H_j\right] + C_{ii}(q,\dot q)\left[(TE\dot q_1)_i + H_i\right] - C_i(q_i,\dot q_i)\dot q_i + \left[G_i(q) - G_i(q_i)\right]. \end{aligned} \tag{15}$$
Let $x_i = [x_{i1}, x_{i2}]^T = [q_i, \dot q_i]^T$ for $i = 1, \ldots, n$; then (10) can be presented by the following state equation:

$$S_i: \begin{cases} \dot x_{i1} = x_{i2}, \\ \dot x_{i2} = -f(x_i, u_i) - h_i(q,\dot q,\ddot q) - f_i, \\ y_i = x_{i1}, \end{cases} \tag{16}$$
Figure 1: The architecture of the time varying constrained reconfigurable modular robot system. [Block diagram: the full dynamics $\ddot q = M^{-1}(u - C\dot q - G - F + J_\Psi^T(q,t) f)$ are decomposed into Subsystem 1, ..., Subsystem $i$, ..., Subsystem $n$, each of the form $M_i\ddot q_i + C_i\dot q_i + G_i + F_i + Z_i - f_i = u_i$ with interconnection input $Z_i(q,\dot q,\ddot q)$.]
where $x_i$ is the state vector of subsystem $S_i$, $y_i$ is the output of subsystem $S_i$, and $h_i(q,\dot q,\ddot q)$ is the interconnection term of the subsystem. $f(x_i,u_i)$ and $h_i(q,\dot q,\ddot q)$ are defined as

$$f(x_i,u_i) = M_i^{-1}(q_i)\left[C_i(q_i,\dot q_i)\dot q_i + G_i(q_i) + F_i(q_i,\dot q_i) - u_i\right], \qquad h_i(q,\dot q,\ddot q) = -M_i^{-1}(q_i)\, Z_i(q,\dot q,\ddot q). \tag{17}$$
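As a rough illustration of how (16) and (17) turn one joint into a state-space model, the sketch below evaluates the state derivative of a single subsystem. The numerical values of $M_i$, $G_i$, and $F_i$ (a constant inertia, a pendulum-like gravity term, and viscous friction) are hypothetical and are not taken from the paper; the signs of $f$ and $h_i$ follow (16) and (17) as written.

```python
import math

def subsystem_derivative(x, u, z=0.0, f_c=0.0):
    """State derivative [x1_dot, x2_dot] of one joint subsystem, after (16)-(17).

    x   : [q_i, q_i_dot]
    u   : joint torque u_i
    z   : interconnection term Z_i (zero for an isolated joint)
    f_c : constraint force term f_i
    """
    q, q_dot = x
    m_i = 1.5                       # hypothetical constant inertia M_i
    c_i = 0.0                       # hypothetical: no Coriolis term for a single joint
    g_i = 9.81 * math.sin(q)        # hypothetical gravity term G_i
    fr_i = 0.2 * q_dot              # hypothetical viscous friction F_i
    f = (c_i * q_dot + g_i + fr_i - u) / m_i   # f(x_i, u_i) in (17)
    h = -z / m_i                               # h_i in (17), sign as printed
    return [q_dot, -f - h - f_c]               # state equation (16)
```

With no interconnection and no constraint force, a torque equal to the gravity term holds the joint at rest, which is a quick sanity check on the signs.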
For the time varying constrained reconfigurable modular robot system, we need to design a decentralized robust optimal tracking control policy that makes each subsystem track its desired trajectory while the tracking error remains bounded and converges.
3. Decentralized Reinforcement Learning Robust Optimal Tracking Control Based on ACI and $Q$-Function
Assumption 1. The desired trajectory $y_{id}$, $\dot y_{id}$, $\ddot y_{id}$ and the input gain matrix $b_i(x_i)$ are bounded.

Then (16) can be transformed into the following form:

$$S_i: \begin{cases} \dot x_{i1} = x_{i2}, \\ \dot x_{i2} = -\left[F(x_i,u_i) + h_i(q,\dot q,\ddot q) + f_i\right] + b_i(x_i)\, u_i, \\ y_i = x_{i1}, \end{cases} \tag{18}$$

where $F(x_i,u_i) = f(x_i,u_i) + b_i(x_i)\, u_i$.
Assumption 2. The interconnection terms are bounded and satisfy

$$\left|h_i(q,\dot q,\ddot q)\right| \le \delta_{i0} + \sum_{j=1}^{n} \delta_{ij}\left(\left|s_{ij}\right|\right), \tag{19}$$

where $\delta_{i0} > 0$ is an unknown constant and $\delta_{ij}(|s_{ij}|) \ge 0$ is an unknown smooth Lipschitz function.

The trajectory tracking error of the joint subsystem can be defined as

$$e_i(t) = x_i - y_{id}. \tag{20}$$
For the continuous time state equation of the subsystem in (18), with its nonlinear function and interconnection terms, the value function can generally be defined as

$$V_i^{u_i(e_i(t))}(e_i(t)) = \int_0^{\infty} r_i\big(e_i(t), u_i(e_i(t))\big)\, dt. \tag{21}$$
To simplify the notation, we write $e_i$, $u_i$ instead of $e_i(t)$, $u_i(e_i(t))$. Since the trajectory $y_{id}$ relies upon the control $u_i$ of the subsystem for updating, in order to avoid an infinite result in (21), we transform the value function into the following form:

$$V_i^{u_i}(e_i) = \int_0^{\infty} r_i\big(e_i(\tau), u_i(e_i(\tau))\big)\, d\tau, \quad t \le \tau < \infty. \tag{22}$$

Thus the optimal value function of the subsystem can be defined as follows:

$$V_i^{*}(e_i) = \min_{u_i,\; t \le \tau < \infty} \int_0^{\infty} r_i\big(e_i(\tau), u_i(e_i(\tau))\big)\, d\tau. \tag{23}$$
Here $r_i(e_i, u_i)$ represents the reward function for the current state, given by

$$r_i(e_i, u_i) = e_i^T Q_e e_i + u_i^T R u_i, \tag{24}$$

where $Q_e$ and $R$ are positive definite matrices.
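To make the value integral and the quadratic reward (24) concrete, the following sketch accumulates the reward along an assumed scalar error trajectory. The exponentially decaying error $e(\tau) = e_0 e^{-\tau}$, the proportional control $u = -k e$, and the scalar weights standing in for $Q_e$ and $R$ are illustrative assumptions, not quantities from the paper.

```python
import math

def reward(e, u, q_e=1.0, r_u=0.5):
    """Scalar version of (24): r = e*Q_e*e + u*R*u."""
    return q_e * e * e + r_u * u * u

def value(e0, gain, horizon=40.0, steps=40000, q_e=1.0, r_u=0.5):
    """Trapezoidal approximation of the integral of the reward over [0, horizon].

    Assumes the hypothetical closed-loop error e(tau) = e0 * exp(-tau)
    and the control u = -gain * e.
    """
    dt = horizon / steps
    total = 0.0
    prev = reward(e0, -gain * e0, q_e, r_u)
    for k in range(1, steps + 1):
        e = e0 * math.exp(-k * dt)
        cur = reward(e, -gain * e, q_e, r_u)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total
```

For this assumed trajectory the integral has the closed form $(Q_e + R k^2)\, e_0^2 / 2$, which gives a simple way to check the numerical value.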
Typically, recording the value of state-action pairs is more useful than recording the value of states only, since the state-action pairs are the predictions of the reward. Even if the reward value of a state is low, the value of a state-action pair is not necessarily low: if the state of the subsystem produces a higher reward over a period of time, it can still obtain a higher state-action value. Therefore, from a long term perspective, defining a suitable state-action value function ($Q$-function) can make actions produce more rewards [21, 22].
According to (23) and (24), the continuous-time optimal $Q$-function can be defined as

$$Q_i^{*}(e_i, a_i, u_i) = r_i(e_i, a_i, u_i) + V_i^{*}(e_i, u_i) = r_i(e_i, a_i, u_i) + \min_{u_i,\; t \le \tau < \infty} \int_0^{\infty} r_i\big(e_i(\tau), u_i(e_i(\tau))\big)\, d\tau. \tag{25}$$
Assumption 3. The partial derivatives of $Q_i^{*}$ and $r_i(e_i, a_i, u_i)$ exist and are continuous in the domain.

According to (18) and (24), under the control policy $u_i$, the optimal $Q$-function satisfies the following Hamilton-Jacobi-Bellman equation [23]:

$$\begin{aligned} \mathrm{HJB}_i(e_i, u_i, \nabla Q_i^{*}) &= \min_{u_i(e_i)} \left[ r_i(e_i, a_i, u_i) + \nabla Q_i^{*}\big({-F(e_i,u_i)} - h_i(e,\dot e,\ddot e) - f_i + b_i(e_i)\, u_i\big) \right] \\ &= \min_{u_i(e_i)} \left[ r_i(e_i, a_i, u_i) + \nabla Q_i^{*}\, \Phi_i(e_i, u_i) \right], \end{aligned} \tag{26}$$

where $\Phi_i(e_i,u_i) = -F(e_i,u_i) - h_i(e,\dot e,\ddot e) - f_i + b_i(e_i)\, u_i$ denotes the global uncertainty, including the unknown dynamics of the subsystem and the interconnection term, and $\nabla Q_i^{*} = \partial Q_i^{*} / \partial e_i$ denotes the gradient of the optimal $Q$-function.
Lemma 4 (see [24]). Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). To ensure that the minimum of the HJB equation (26) has a stationary point with respect to $u_i$, the optimal $Q$-function and the optimal control policy must satisfy the following conditions:

(1) $\partial \mathrm{HJB}_i(e_i, u_i, \nabla Q_i) / \partial u_i = 0$;
(2) $\partial^2 \mathrm{HJB}_i(e_i, u_i, \nabla Q_i) / (\partial u_i\, \partial u_i^T) \ge 0$.

The necessary conditions above lead to the following results:

(a) The bounded control policy can guarantee a local minimum of the HJB equation (26) and satisfy the constraints imposed on the control inputs.
(b) The Hessian matrix is positive-definite, and the control policy $u_i$ can render the global minimum of the HJB equation.
(c) If an optimal algorithm exists, it is unique.
According to Lemma 4, if the reward function is smooth and the optimal control $u_i^{*}$ is adopted, then the HJB equation satisfies

$$\mathrm{HJB}_i^{*}(e_i, u_i^{*}, \nabla Q_i^{*}) = \min_{u_i^{*}(e_i)} \left[ r_i(e_i, a_i, u_i^{*}) + \nabla Q_i^{*}\, \Phi_i(e_i, u_i^{*}) \right] = 0. \tag{27}$$

The optimal control can then be expressed as follows:

$$u_i^{*}(e_i) = \arg\min_{u_i^{*}} \left[ \mathrm{HJB}_i^{*}(e_i, u_i^{*}, \nabla Q_i^{*}) \right] = -\frac{1}{2} R^{-1} b_i^T(e_i)\, \frac{\partial Q_i^{*}(e_i, a_i, u_i)^T}{\partial e_i}. \tag{28}$$
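The stationarity argument behind the optimal control can be verified on a toy problem. The sketch below replaces the unknown $\Phi_i$ with an assumed scalar linear error model $\dot e = a e + b u$ and takes $Q^{*}(e) = p e^2$, where $p$ solves the scalar Riccati equation $q + 2ap - b^2 p^2 / R = 0$. The model and all parameter names are hypothetical; the control is computed as $u^{*} = -\tfrac{1}{2} R^{-1} b\, \partial Q^{*}/\partial e$, which is the stationary point of the bracket in (26) for the quadratic reward (24).

```python
import math

def riccati_p(a, b, q, r):
    """Positive root of q + 2*a*p - (b*b/r)*p*p = 0 (scalar Riccati equation)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

def optimal_control(e, a, b, q, r):
    """u* = -(1/2) * R^{-1} * b * dQ*/de with the assumed Q*(e) = p * e^2."""
    grad_q = 2.0 * riccati_p(a, b, q, r) * e
    return -0.5 * (1.0 / r) * b * grad_q

def hjb(e, u, a, b, q, r):
    """Bracket of (26) for the assumed model: r(e, u) + dQ*/de * (a*e + b*u)."""
    grad_q = 2.0 * riccati_p(a, b, q, r) * e
    return q * e * e + r * u * u + grad_q * (a * e + b * u)
```

At `u = optimal_control(...)` the bracket evaluates to zero, in line with (27), and any perturbed control gives a strictly larger value because the bracket is convex in $u$ when $R > 0$.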
If the optimal $Q$-function $Q_i^{*}$ is continuously differentiable and known with initial value $Q_i^{*}(0) = 0$, and the optimal control policy $u_i^{*}(e_i)$ and the global uncertainty of the subsystem $\Phi_i(e_i, u_i^{*})$ are known, then the HJB equation in (27) holds and is solvable. However, in practice $Q_i^{*}$ is not differentiable everywhere, and $u_i^{*}(e_i)$ and $\Phi_i(e_i, u_i^{*})$ are unknown. Therefore, it is not feasible to solve the HJB equation by ordinary methods. In this paper, we combine the action-critic-identifier (ACI) with the RBF neural network to estimate the optimal control policy, the optimal $Q$-function, and the global uncertainty of the subsystem. The action-NN is used to estimate $u_i^{*}(e_i)$, and the estimate is denoted as $\hat u_i(e_i)$; $Q_i^{*}$ is estimated by the critic-NN, and the estimate is expressed as $\hat Q_i$; the robust neural network identifier is used to identify $\Phi_i(e_i, u_i^{*})$, and the result is denoted as $\hat\Phi_i(e_i, u_i^{*})$. The block diagram of the ACI architecture is shown in Figure 2.

The estimated HJB equation can be expressed as follows:

$$\mathrm{HJB}_i^{*}(e_i, \hat u_i, \nabla \hat Q_i) = \min_{\hat u_i(e_i)} \left[ r_i(e_i, a_i, \hat u_i) + \nabla \hat Q_i\, \hat\Phi_i(e_i, \hat u_i) \right]. \tag{29}$$

The identification error of the HJB equation above can be expressed as

$$\delta_{hi} = \mathrm{HJB}_i^{*}(e_i, \hat u_i, \nabla \hat Q_i) - \mathrm{HJB}_i^{*}(e_i, u_i^{*}, \nabla Q_i^{*}). \tag{30}$$

A classic radial basis function neural network is proposed in [25], given by

$$N(x) = W^{*T} S(x) + \varepsilon(x), \tag{31}$$
Figure 2: The architecture of action-critic-identifier. [Block diagram: the subsystem, the action-NN, the critic-NN, the identifier, and the reward function $r_i(e_i, a_i, u_i)$ are connected to form the HJB error $\delta_{hi}$.]
where $W^{*}$ denotes the ideal neural network weights and $\varepsilon(x)$ represents the estimation error. With a sufficient number of nodes whose centers and widths are chosen appropriately, any continuous function can be approximated by an RBF-NN. Therefore, the optimal $Q$-function and the optimal control policy can be expressed as follows:

$$Q_i^{*} = W_i^T S_i(e_i) + \varepsilon_{ic}(e_i), \qquad u_i^{*}(e_i) = -\frac{1}{2} R^{-1} b_i^T(e_i) \left[ S_i(e_i)^T W_i + \varepsilon_{ia}(e_i) \right], \tag{32}$$

where $S_i(e_i) = [s_{i1}(e_i) \cdots s_{in}(e_i)]^T$ is the smooth basis function vector of the neural network, $W_i$ is the ideal unknown neural network weight, and $\varepsilon_{ic}(e_i)$ and $\varepsilon_{ia}(e_i)$ are the estimation errors. Using $\hat Q_i$ and $\hat u_i(e_i)$ to estimate $Q_i^{*}$ and $u_i^{*}(e_i)$, we get the following equations:

$$\hat Q_i = \hat W_{ic}^T S_{ic}(e_i), \tag{33}$$

$$\hat u_i(e_i) = -\frac{1}{2} R^{-1} b_i^T(e_i)\, S_{ia}(e_i)^T \hat W_{ia}. \tag{34}$$

In the equations above, $\hat W_{ic}(t)$ and $\hat W_{ia}(t)$ are the weights of the critic-NN and the action-NN. The estimation errors of the weights are

$$\tilde W_{ic}(t) = W_i - \hat W_{ic}(t), \tag{35}$$

$$\tilde W_{ia}(t) = W_i - \hat W_{ia}(t). \tag{36}$$
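The weighted-basis form $W^T S(e)$ used in (31) to (34) can be illustrated with a small Gaussian RBF network. In the sketch below, the Gaussian basis, the centers, the width, the target function $e^2$, and the plain LMS fitting rule are all illustrative assumptions; in particular, LMS is used only to obtain some weights and is not the update law of the paper.

```python
import math

def rbf_features(e, centers, width):
    """Gaussian basis vector S(e); the Gaussian shape is an assumed choice."""
    return [math.exp(-((e - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def rbf_output(e, weights, centers, width):
    """Network output W^T S(e), i.e. the form of (31) without the error term."""
    return sum(w * s for w, s in zip(weights, rbf_features(e, centers, width)))

def fit_lms(target, centers, width, samples, rate=0.2, epochs=200):
    """Fit the weights with a plain least-mean-squares rule (not the paper's law)."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for e in samples:
            s = rbf_features(e, centers, width)
            err = target(e) - sum(w * si for w, si in zip(weights, s))
            weights = [w + rate * err * si for w, si in zip(weights, s)]
    return weights

def mean_squared_error(target, weights, centers, width, samples):
    errors = [(target(e) - rbf_output(e, weights, centers, width)) ** 2 for e in samples]
    return sum(errors) / len(errors)

centers = [-2.0 + 0.5 * k for k in range(9)]     # hypothetical node centers
samples = [-2.0 + 0.1 * k for k in range(41)]    # hypothetical training grid
target = lambda e: e * e                         # stand-in for an unknown function
weights = fit_lms(target, centers, 0.5, samples)
```

Comparing `mean_squared_error` before and after fitting shows how the residual, which plays the role of $\varepsilon(x)$ in (31), shrinks once the weights are adapted.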
The update law of the weights of the critic-NN is a gradient descent algorithm, given by

$$\dot{\hat W}_{ic}(t) = -n_1 L_i \left( L_i^T \hat W_{ic} + e_i^T Q_e e_i + u_i^T R u_i \right). \tag{37}$$

In the equation above, $n_1 > 0$ is the adaptive gain of the neural network, and $L_i$ and $l_i$ are defined as

$$L_i = \frac{l_i}{l_i^T l_i + 1}, \qquad l_i = \nabla S_{ic}(e_i)\, \dot e_i. \tag{38}$$

Therefore, according to the definition above, the following inequalities can be obtained:

$$L_{im} \le L_i \le L_{iM}, \qquad S_{icm} \le S_{ic}(e_i) \le S_{icM}, \qquad S_{iam} \le S_{ia}(e_i) \le S_{iaM}. \tag{39}$$
Combining (35) with (38), we get

$$\dot{\tilde W}_{ic}(t) = -n_1 L_i \left( L_i^T \tilde W_{ic} + \delta_{hi} \right). \tag{40}$$
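A forward-Euler discretization gives a feel for how the critic update law (37), with the normalized regressor of (38), drives its residual toward zero. In this sketch the regressor $l_i$ and the reward value are held fixed, and the step size, gain, and initial weights are hypothetical; it is a numerical illustration under those assumptions, not the paper's simulation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalized_regressor(l):
    """L = l / (l^T l + 1), as in (38)."""
    scale = dot(l, l) + 1.0
    return [x / scale for x in l]

def critic_step(w_hat, l, reward, n1, dt):
    """One Euler step of (37); returns the new weights and the residual before the step."""
    big_l = normalized_regressor(l)
    residual = dot(big_l, w_hat) + reward        # L^T W_hat + e^T Q_e e + u^T R u
    new_w = [w - dt * n1 * li * residual for w, li in zip(w_hat, big_l)]
    return new_w, residual

def run_critic(w0, l, reward, n1=5.0, dt=0.01, steps=2000):
    """Iterate critic_step with a frozen regressor and collect the residual history."""
    w_hat = list(w0)
    history = []
    for _ in range(steps):
        w_hat, residual = critic_step(w_hat, l, reward, n1, dt)
        history.append(residual)
    return w_hat, history
```

With a frozen regressor the residual decays geometrically at a rate set by $n_1 L_i^T L_i$, which is why the gain $n_1$ and the normalization in (38) together control how fast the critic adapts.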
The update law of the weights of the action-NN is developed from a gradient descent algorithm, expressed as follows:

$$\dot{\hat W}_{ia}(t) = -n_2 S_{ia}(e_i) \left( \hat W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla S_{ic}(e_i)^T \hat W_{ic} \right)^T. \tag{41}$$

According to the estimation error of the action-NN in (36), and since the optimal control $u_i^{*}(e_i)$ minimizes the optimal $Q$-function, we get the following equation:

$$W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla S_{ic}(e_i)^T W_{ic} + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla \varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0. \tag{42}$$

Substituting (41) into (42), we get

$$\dot{\tilde W}_{ia} = -n_2 S_{ia}(e_i) \left( \tilde W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla S_{ic}(e_i)^T \tilde W_{ic} - \frac{1}{2} R^{-1} b_i(e_i)\, \nabla \varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i) \right). \tag{43}$$
After using the critic-NN and the action-NN to obtain $\hat Q_i$ and $\hat u_i(e_i)$, we need to design a robust RBF-NN identifier to identify the nonlinear uncertainties of the subsystem. Here $\Phi_i(e_i, \hat u_i)$ can be expressed as follows:

$$\Phi_i(e_i, \hat u_i) = \dot e_{iF} = W_{iF}^T \kappa\big(\Lambda_{iF}^T e_{iF}\big) + \varepsilon_{iF}(e_{iF}) + b_i(e_i)\, \hat u_i, \tag{44}$$

where $\kappa(\cdot)$ is the basis function of the neural network and $W_{iF}$, $\Lambda_{iF}$ are the unknown ideal neural network weights. Equation (44) can be identified by the robust RBF-NN identifier, which gives

$$\hat\Phi_i(e_i, \hat u_i) = \dot{\hat e}_{iF} = \hat W_{iF}^T \hat\kappa_{iF} + b_i(e_i)\, \hat u_i + \mu_i. \tag{45}$$

Here $\hat\kappa_{iF}$ is the estimated value of the basis function of the neural network, $\hat W_{iF}$ and $\hat\Lambda_{iF}$ are the estimated neural network weights, and $\mu_i \in R$ is the feedback error term, given by [26]

$$\begin{aligned} \mu_i &= k\big(e_{iF}(t) - \hat e_{iF}(t)\big) - k\big(e_{iF}(0) - \hat e_{iF}(0)\big) + \vartheta = k\big(\tilde e_{iF}(t) - \tilde e_{iF}(0)\big) + \vartheta, \\ \dot\vartheta &= (k\alpha + \gamma)\, \tilde e_{iF} + \beta_1\, \mathrm{sat}(\tilde e_{iF}), \end{aligned} \tag{46}$$

where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\mathrm{sat}(\cdot)$ is a saturation function. Therefore, the dynamics of the state estimation error of the identifier-NN can be expressed as follows:

$$\dot{\tilde e}_{iF} = \dot e_{iF} - \dot{\hat e}_{iF} = W_{iF}^T \kappa_{iF} - \hat W_{iF}^T \hat\kappa_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i. \tag{47}$$

A filtered identification error is defined as follows:

$$E_i = \dot{\tilde e}_{iF} + \alpha\, \tilde e_{iF}. \tag{48}$$
The derivative of the equation above is

$$\begin{aligned} \dot E_i ={}& W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} - \hat W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \alpha\, \dot{\tilde e}_{iF} - k E_i - \gamma\, \tilde e_{iF} \\ &- \dot{\hat W}_{iF}^T \hat\kappa_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1\, \mathrm{sat}(\tilde e_{iF}) - \hat W_{iF}^T \hat\kappa'_{iF} \dot{\hat\Lambda}_{iF}^T \hat e_{iF}. \end{aligned} \tag{49}$$

Here the weights $\hat W_{iF}$, $\hat\Lambda_{iF}$ of the identification-NN are updated by

$$\dot{\hat W}_{iF} = \mathrm{proj}\big(\Gamma_{iWF}\, \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF}\, \tilde e_{iF}^T\big), \qquad \dot{\hat\Lambda}_{iF} = \mathrm{proj}\big(\Gamma_{i\Lambda F}\, \dot{\hat e}_{iF}\, \tilde e_{iF}^T \hat W_{iF}^T \hat\kappa'_{iF}\big), \tag{50}$$
where $\Gamma_{iWF}$ and $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. In order to analyze the convergence of the filtered identification error, $\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF}$ can be divided into the following form:
$$\begin{aligned}
\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF}
={}& \tfrac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[(\Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T})(W_{iF}^{T} - \hat{W}_{iF}^{T}) + (W_{iF}^{T} - \hat{W}_{iF}^{T})(\Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T})\right] \\
={}& \tfrac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[\hat{W}_{iF}^{T}(\Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T}) + (W_{iF}^{T} - \hat{W}_{iF}^{T})\hat{\Lambda}_{iF}^{T} - W_{iF}^{T}(\Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T}) - (W_{iF}^{T} - \hat{W}_{iF}^{T})\Lambda_{iF}^{T}\right] \\
={}& \tfrac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[\hat{W}_{iF}^{T}(\Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T}) + (W_{iF}^{T} - \hat{W}_{iF}^{T})\hat{\Lambda}_{iF}^{T}\right] \\
& - \tfrac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[W_{iF}^{T}(\Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T}) + (W_{iF}^{T} - \hat{W}_{iF}^{T})\Lambda_{iF}^{T}\right] \\
={}& \tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\tilde{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{\tilde{e}}_{iF} - \tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{e}_{iF} - \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} \\
& + \tfrac{1}{2}\tilde{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\tilde{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF}
\end{aligned} \tag{51}$$
where $\tilde{W}_{iF}^{T} = W_{iF}^{T} - \hat{W}_{iF}^{T}$ and $\tilde{\Lambda}_{iF}^{T} = \Lambda_{iF}^{T} - \hat{\Lambda}_{iF}^{T}$. Substituting (51) into (49), (49) reduces to the following form:
$$\dot{E}_i = P_{F1} + P_{F2} + P_{F3} - kE_i - \gamma\,\tilde{e}_{iF} - \beta_1\,\mathrm{sat}(\tilde{e}_{iF}) \tag{52}$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ are given, respectively, as follows:
$$P_{F1} = \tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\tilde{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{\tilde{e}}_{iF} - \hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\dot{\hat{\Lambda}}_{iF}^{T}\hat{e}_{iF} + \alpha\,\dot{\tilde{e}}_{iF} - \dot{\hat{W}}_{iF}^{T}\hat{\kappa}_{iF} \tag{53}$$
$$P_{F2} = -\tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{e}_{iF} - \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} + W_{iF}^{T}\kappa'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) \tag{54}$$
$$P_{F3} = \tfrac{1}{2}\tilde{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\tilde{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF} \tag{55}$$
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are
$$\begin{aligned}
\|P_{F1}\| &\le J_1\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|, \\
\|P_{F2}\| &\le \varsigma_1, \\
\|P_{F3}\| &\le \varsigma_2.
\end{aligned} \tag{56}$$
Combining (53) and (54) with (55), we get
$$\|\dot{P}_{F2} + \dot{P}_{F3}\| \le \varsigma_3 + \varsigma_4\,J_2\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\| \tag{57}$$
where $\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T}) = [\tilde{e}_{iF}^{T}\ E_i^{T}]^{T}$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\varsigma_i$ ($i = 1, 2, 3, 4$) are computable positive constants.
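The weight update laws (50) can be sketched for the scalar-state case as one explicit Euler step. This is only an illustration under stated assumptions: the basis function is taken to be `tanh`, the operator proj(·) is replaced by a simple box clipping of the updated weights, and the gains, bound, and function names are hypothetical.

```python
import math

def project(value, bound):
    """Box projection standing in for proj(.): keep value in [-bound, bound]."""
    return max(-bound, min(bound, value))

def update_identifier_weights(W_hat, Lam_hat, e_hat, e_hat_dot, e_tilde,
                              gain_W=1.0, gain_Lam=1.0, bound=5.0, dt=1e-3):
    """One Euler step of the update laws (50) for a scalar state.

    W_hat, Lam_hat: lists of output and input weights, one per hidden neuron.
    The basis is kappa_j = tanh(Lam_hat[j] * e_hat), an assumed choice.
    Returns the updated (W_hat, Lam_hat) as new lists.
    """
    new_W, new_Lam = [], []
    for w, lam in zip(W_hat, Lam_hat):
        kappa_prime = 1.0 - math.tanh(lam * e_hat) ** 2   # derivative of tanh
        w_dot = gain_W * kappa_prime * lam * e_hat_dot * e_tilde
        lam_dot = gain_Lam * e_hat_dot * e_tilde * w * kappa_prime
        new_W.append(project(w + dt * w_dot, bound))
        new_Lam.append(project(lam + dt * lam_dot, bound))
    return new_W, new_Lam
```

Both laws are proportional to the identification error, so the weights stop adapting once the error vanishes, while the clipping keeps them inside a known bound, which is the property the boundedness argument relies on.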
Theorem 5. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14) and the state equation (18). If the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and remains bounded.
Proof. Define the Lyapunov function as follows:
$$V_{iL}(\tilde{e}_{iF}, E_i) = \tfrac{1}{2}E_i^{T}E_i + \tfrac{1}{2}\gamma\,\tilde{e}_{iF}^{T}\tilde{e}_{iF} + \varpi_i(t) + \phi_i(t) \tag{58}$$
In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ are given as follows:
$$\begin{aligned}
\dot{\varpi}_i(t) &= -\left[E_i^{T}\left(P_{F2} - \beta_1\,\mathrm{sat}(\tilde{e}_{iF})\right) + \dot{\tilde{e}}_{iF}^{T}P_{F3} - \beta_2 J_2\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\,\|\tilde{e}_{iF}\|\right], \\
\varpi_i(0) &= \beta_1\left|\tilde{e}_{iF}(0)\right| - \tilde{e}_{iF}^{T}(0)\left(P_{F2}(0) + P_{F3}(0)\right),
\end{aligned} \tag{59}$$
$$\phi_i(t) = \tfrac{1}{4}\alpha\left[\mathrm{tr}\left(\tilde{W}_{iF}^{T}\Gamma_{iWF}^{-1}\tilde{W}_{iF}\right) + \mathrm{tr}\left(\tilde{\Lambda}_{iF}^{T}\Gamma_{i\Lambda F}^{-1}\tilde{\Lambda}_{iF}\right)\right] \tag{60}$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Define $d = [E_i^{T}\ \tilde{e}_{iF}^{T}\ \varpi_i^{1/2}\ \phi_i^{1/2}]^{T}$; $\beta_1, \beta_2 \in \mathbb{R}$ are positive adaptation gains chosen to ensure $\varpi_i(t) \ge 0$, so we get
$$U_1(d) \le V_{iL}(\tilde{e}_{iF}, E_i) \le U_2(d) \tag{61}$$
where
$$U_1(d) = \tfrac{1}{2}\min(1, \gamma)\,\|d\|^{2}, \qquad U_2(d) = \max(1, \gamma)\,\|d\|^{2}. \tag{62}$$
The time derivative of (58) is as follows:
$$\dot{V}_{iL}(\tilde{e}_{iF}, E_i) = \nabla V_{iL}^{T}\,K\left[\dot{E}_i^{T}\ \ \dot{\tilde{e}}_{iF}^{T}\ \ \tfrac{1}{2}\varpi_i^{-1/2}\dot{\varpi}_i\ \ \tfrac{1}{2}\phi_i^{-1/2}\dot{\phi}_i\right]^{T} \tag{63}$$
where $K[\cdot]$ denotes a Filippov set [27]. So $\dot{V}_{iL}(\tilde{e}_{iF}, E_i)$ can be written in the following form:
$$\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
={}& \left[E_i^{T}\ \ \gamma\,\tilde{e}_{iF}^{T}\ \ 2\varpi_i^{1/2}\ \ 2\phi_i^{1/2}\right]K\left[\dot{E}_i^{T}\ \ \dot{\tilde{e}}_{iF}^{T}\ \ \tfrac{1}{2}\varpi_i^{-1/2}\dot{\varpi}_i\ \ \tfrac{1}{2}\phi_i^{-1/2}\dot{\phi}_i\right]^{T} \\
\le{}& E_i^{T}\Big(\tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\tilde{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{\tilde{e}}_{iF} - \hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\dot{\hat{\Lambda}}_{iF}^{T}\hat{e}_{iF} + \alpha\,\dot{\tilde{e}}_{iF} \\
& \quad - \tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{e}_{iF} - \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} - \gamma\,\tilde{e}_{iF} - \dot{\hat{W}}_{iF}^{T}\hat{\kappa}_{iF} + W_{iF}^{T}\kappa'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) \\
& \quad + \tfrac{1}{2}\tilde{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\tilde{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF} - kE_i - \beta_1 K[\mathrm{sat}(\tilde{e}_{iF})]\Big) \\
& - E_i^{T}\Big(-\tfrac{1}{2}W_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{e}_{iF} - \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} + W_{iF}^{T}\kappa'_{iF}\Lambda_{iF}^{T}\dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1 K[\mathrm{sat}(\tilde{e}_{iF})]\Big) \\
& - \dot{\tilde{e}}_{iF}^{T}\Big(\tfrac{1}{2}\tilde{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF} + \tfrac{1}{2}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\tilde{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF}\Big) + \gamma\,\tilde{e}_{iF}^{T}\left(E_i - \alpha\,\tilde{e}_{iF}\right) \\
& + \beta_2 J_2\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\,\|\tilde{e}_{iF}\| \\
& - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde{W}_{iF}^{T}\Gamma_{iWF}^{-1}\dot{\hat{W}}_{iF}\right) - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde{\Lambda}_{iF}^{T}\Gamma_{i\Lambda F}^{-1}\dot{\hat{\Lambda}}_{iF}\right)
\end{aligned} \tag{64}$$
Substituting (53), (54), and (55) into (64), we get
$$\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
={}& E_i^{T}\left(P_{F1} + P_{F2} + P_{F3} - \beta_1 K[\mathrm{sat}(\tilde{e}_{iF})] - kE_i - \gamma\,\tilde{e}_{iF}\right) + \gamma\,\tilde{e}_{iF}^{T}\left(E_i - \alpha\,\tilde{e}_{iF}\right) \\
& - E_i^{T}\left(P_{F2} - \beta_1 K[\mathrm{sat}(\tilde{e}_{iF})]\right) - \dot{\tilde{e}}_{iF}^{T}P_{F3} + \beta_2 J_2\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\,\|\tilde{e}_{iF}\| \\
& - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde{W}_{iF}^{T}\Gamma_{iWF}^{-1}\dot{\hat{W}}_{iF}\right) - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde{\Lambda}_{iF}^{T}\Gamma_{i\Lambda F}^{-1}\dot{\hat{\Lambda}}_{iF}\right) \\
={}& -\alpha\gamma\,\tilde{e}_{iF}^{T}\tilde{e}_{iF} - kE_i^{T}E_i + E_i^{T}P_{F1} + \left(E_i^{T} - \dot{\tilde{e}}_{iF}^{T}\right)P_{F3} + \beta_2 J_2\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\,\|\tilde{e}_{iF}\| \\
& - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde{W}_{iF}^{T}\hat{\kappa}'_{iF}\hat{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF}\,\tilde{e}_{iF}^{T}\right) - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde{\Lambda}_{iF}^{T}\dot{\hat{e}}_{iF}\,\tilde{e}_{iF}^{T}\hat{W}_{iF}^{T}\hat{\kappa}'_{iF}\right) \\
\le{}& -k_1\|\tilde{e}_{iF}\|^{2} - k_2\|E_i\|^{2} + \frac{J_1\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)^{2}}{4k_3}\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|^{2} + \frac{\beta_2^{2}J_2\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)^{2}}{4\alpha k_4}\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|^{2}
\end{aligned} \tag{65}$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4/\beta_2^{2}\}$, and $J(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|)^{2} = J_1(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|)^{2} + J_2(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|)^{2}$, so the following conclusion can be obtained:
$$\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i) &\le -k_{\min}\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|^{2} + \frac{J\left(\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|\right)^{2}\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|^{2}}{4\xi} \\
&\le -c\,\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|^{2}
\end{aligned} \tag{66}$$
Therefore, for a positive constant $c$, $-c\,\|\varphi_i(\tilde{e}_{iF}^{T}, E_i^{T})\|^{2}$ is a negative semidefinite function defined on the adjustable domain $D$, expressed as follows:
$$D = \left\{ d(t) \;\middle|\; \|d\| \le J^{-1}\left(2\sqrt{k_{\min}\,\xi}\right) \right\} \tag{67}$$
Lyapunov stability theory therefore shows that the system is stable.
In order to make the subsystem of the time varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, using a robust term to compensate the neural network approximation errors. The robust control term is designed as
$$u_{irb} = \frac{N_{rb}\,e_i}{e_i^{T}e_i + \zeta} \tag{68}$$
In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ satisfies
$$\begin{aligned}
N_{rb} \ge{}& \left[\frac{\delta_{hi}^{2}}{2n_1} + \frac{n_1\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right)^{2}}{2n_2} + n_1 n_2\|b_i(e_i)\|^{2}\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right)^{2}\right] \cdot \frac{e_i^{T}e_i + \zeta}{2n_1 n_2\|b_i(e_i)\|\,e_i^{T}e_i} \\
\ge{}& \left[n_1^{2}\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right)^{2} + n_2\delta_{hi}^{2} + 2n_1^{2}n_2^{2}\|b_i(e_i)\|^{2}\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right)^{2}\right] \cdot \frac{e_i^{T}e_i + \zeta}{4n_1^{2}n_2^{2}\|b_i(e_i)\|\,e_i^{T}e_i}
\end{aligned} \tag{69}$$
Therefore, the global control law can be designed as follows:
$$u_{\mathrm{mix}} = u_i + u_{irb} = -\tfrac{1}{2}R^{-1}b_i^{T}(e_i)\,S_{ia}(e_i)^{T}\hat{W}_{ia} + \frac{N_{rb}\,e_i}{e_i^{T}e_i + \zeta} \tag{70}$$
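The structure of the control law, a nominal input from the action-NN plus the robust term (68), can be sketched as follows. The nominal input is passed in as a plain list because its computation depends on the trained networks; the function names and the list-based vector representation are assumptions of this sketch.

```python
def robust_term(e, n_rb, zeta):
    """Robust control term (68): u_rb = N_rb * e / (e^T e + zeta)."""
    denom = sum(x * x for x in e) + zeta   # e^T e + zeta, strictly positive
    return [n_rb * x / denom for x in e]

def mixed_control(u_nominal, e, n_rb, zeta):
    """Global control law: nominal (action-NN) input plus the robust term."""
    return [u + r for u, r in zip(u_nominal, robust_term(e, n_rb, zeta))]
```

Because $\zeta > 0$ keeps the denominator away from zero, the robust term vanishes smoothly as the tracking error goes to zero, and for a scalar error its magnitude never exceeds $N_{rb}/(2\sqrt{\zeta})$.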
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the actual output tracks the desired trajectory asymptotically.
Proof. Design the Lyapunov function as follows:
$$V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2n_1}\mathrm{tr}\left\{\tilde{W}_{ic}^{T}\tilde{W}_{ic}\right\} + \frac{n_1}{2n_2}\mathrm{tr}\left\{\tilde{W}_{ia}^{T}\tilde{W}_{ia}\right\} + n_1 n_2\left[e_{iF}^{T}e_{iF} + \Xi\int_{0}^{\infty} r_i(e_i, u_{\mathrm{mix}})\,d\tau\right] \tag{71}$$
where $\Xi > 0$ is the undetermined parameter and $t \le \tau < \infty$. The time derivative of (71) is as follows:
$$\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
={}& \frac{1}{2n_1}\mathrm{tr}\left\{\tilde{W}_{ic}^{T}\dot{\tilde{W}}_{ic}\right\} + \frac{n_1}{2n_2}\mathrm{tr}\left\{\tilde{W}_{ia}^{T}\dot{\tilde{W}}_{ia}\right\} + n_1 n_2\left[e_{iF}^{T}\dot{e}_{iF} + \Xi\,r_i(e_i, u_{\mathrm{mix}})\right] \\
={}& \frac{1}{2n_1}\mathrm{tr}\left\{\tilde{W}_{ic}^{T}\left(-n_1 L_i\left(L_i^{T}\tilde{W}_{ic} + \delta_{hi}\right)\right)\right\} \\
& + \frac{n_1}{2n_2}\mathrm{tr}\left\{\tilde{W}_{ia}^{T}\left[-n_2 S_{ia}(e_i)\left(\tilde{W}_{ia}^{T}S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \tfrac{1}{2}R^{-1}b_i(e_i)\,\nabla S_{ic}(e_i)^{T}\tilde{W}_{ic} - \tfrac{1}{2}R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)\right)\right]\right\} \\
& + n_1 n_2\,e_{iF}^{T}\left(W_{iF}^{T}\kappa\left(\Lambda_{iF}^{T}e_i\right) + \varepsilon_{iF}(e_i) + b_i(e_i)\,u_{\mathrm{mix}}\right) + \Xi\left(e_i^{T}Q_e e_i + u_{\mathrm{mix}}^{T}R\,u_{\mathrm{mix}}\right) \\
\le{}& -\left(L_{im}^{2} - \frac{n_1}{2}L_{iM}^{2}\right)\|\tilde{W}_{ic}\|^{2} + \frac{1}{2n_1}\delta_{hi}^{2} - \left(n_1 S_{iam}^{2} - \frac{3}{4}n_1 n_2 S_{iaM}^{2}\right)\|\tilde{W}_{ia}\|^{2} \\
& + \frac{n_1}{4n_2}\|R^{-1}\|^{2}\|b_i(e_i)\|^{2}\,\|\nabla S_{icM}^{2}\|\,\|\tilde{W}_{ic}\|^{2} \\
& + \frac{n_1}{2n_2}\left(\varepsilon_{ia}(e_i) + R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right)^{T}\left(\varepsilon_{ia}(e_i) + R^{-1}b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right) \\
& + n_1 n_2\|b_i(e_i)\|^{2}\varepsilon_{ia}(e_i)^{T}\varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^{2}\|b_i(e_i)\|^{2}\|\tilde{W}_{ia}\|^{2} \\
& + n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\|e_{iF}\|^{2} + n_1 n_2\left(\|b_i(e_i)\|^{2} - \Xi\,\lambda_{\min}(R)\right)\|u_{\mathrm{mix}}\|^{2}
\end{aligned} \tag{72}$$
If the following inequalities are satisfied:
$$\begin{aligned}
\lambda_{\min}(Q_e)\|e_{iF}\|_2^{2} &\le e_{iF}^{T}Q_e e_{iF} \le \lambda_{\max}(Q_e)\|e_{iF}\|_2^{2}, \\
\lambda_{\min}(R)\|u_{\mathrm{mix}}\|_2^{2} &\le u_{\mathrm{mix}}^{T}R\,u_{\mathrm{mix}} \le \lambda_{\max}(R)\|u_{\mathrm{mix}}\|_2^{2}, \\
\Xi &> \frac{\|b_i(e_i)\|^{2}}{\lambda_{\min}(R)},
\end{aligned} \tag{73}$$
then $\dot{V}_{iu}(e_i, u_{\mathrm{mix}})$ can be further transformed as
$$\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
\le{}& -\left(L_{im}^{2} - \frac{n_1}{2}L_{iM}^{2} - \frac{n_1}{4n_2}\|R^{-1}\|^{2}\|b_i(e_i)\|^{2}\,\nabla S_{icM}^{2}\right)\|\tilde{W}_{ic}\|^{2} \\
& - \left(n_1 S_{iam}^{2} - \frac{3}{4}n_1 n_2 S_{iaM}^{2} - n_1 n_2 S_{iaM}^{2}\|b_i(e_i)\|^{2}\right)\|\tilde{W}_{ia}\|^{2} \\
& - n_1 n_2\left(\|b_i(e_i)\|^{2} + \Xi\,\lambda_{\min}(R)\right)\|u_{\mathrm{mix}}\|^{2} - n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\|e_{iF}\|^{2} \\
\le{}& -n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\|e_{iF}\|^{2}
\end{aligned} \tag{74}$$
Therefore, we can conclude that $\dot{V}_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
4 Simulations
To verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing simulation results, two different configurations of the time varying externally constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.
To facilitate the analysis of the configurations above, they are transformed into analytic charts, shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.
The time varying constraint can be defined as a column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:
$$\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right].
\end{aligned} \tag{75}$$
In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis is defined as follows:
$$\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2} \tag{76}$$
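The constraint functions can be evaluated numerically with a short sketch. It reads (76) as $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$ and uses hypothetical link lengths, since no numerical values for $L_1$ to $L_4$ are stated here; the helper `q2_on_constraint_B` is likewise an illustrative addition that solves $\Psi_B = 0$ for joint 2.

```python
import math

# Hypothetical link lengths in metres; the source gives no numerical values.
L1, L2, L3, L4 = 0.3, 0.3, 0.2, 0.1

def alpha(t):
    """Constraint angle, reading (76) as 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)

def cot(x):
    """Cotangent, defined away from multiples of pi."""
    return math.cos(x) / math.sin(x)

def psi_A(q1, q2, t):
    """Constraint function of configuration A in (75)."""
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 * cot(alpha(t)))

def psi_B(q2, t):
    """Constraint function of configuration B in (75); joint 1 does not enter."""
    return L1 + L2 * math.cos(q2) - (L3 + L4 * cot(alpha(t)))

def q2_on_constraint_B(t):
    """Joint-2 angle in [0, pi] that satisfies psi_B(q2, t) = 0."""
    c = (L3 + L4 * cot(alpha(t)) - L1) / L2
    return math.acos(c)
```

With these assumed lengths the argument of `acos` stays inside $[-1, 1]$ for all $t$, so a constrained joint angle exists at every instant; other length choices may need a feasibility check first.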
The initial positions of the joints are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial velocities of the joints are zero. The dynamic models of configurations A and B are as follows:
$$\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^{2}(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot{q}) &= \begin{bmatrix} -0.36\sin(q_2)\,\dot{q}_2 & -0.18\sin(q_2)\,\dot{q}_2 \\ 0.18\sin(q_2)\,(\dot{q}_1 - \dot{q}_2) & 0.18\sin(q_2)\,\dot{q}_1 \end{bmatrix}, \\
C_B(q, \dot{q}) &= \begin{bmatrix} 0.1166\sin(2q_2)\,\dot{q}_2 & 0.06\sin(q_2)\,\dot{q}_2 \\ 0.06\sin(q_2)\,\dot{q}_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix}, \\
G_B(q) &= \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix}, \\
F_A(q, \dot{q}) &= \begin{bmatrix} \dot{q}_1 + 10\sin(3q_1) + 2\,\mathrm{sgn}(\dot{q}_1) \\ 1.2\,\dot{q}_2 + 5\sin(2q_2) + \mathrm{sgn}(\dot{q}_2) \end{bmatrix}, \\
F_B(q, \dot{q}) &= \begin{bmatrix} 0 \\ 1.5\,\dot{q}_2 + \sin(q_2) + 1.2\,\mathrm{sgn}(\dot{q}_2) \end{bmatrix}.
\end{aligned} \tag{77}$$
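As a sanity check on the configuration A model, the inertia matrix and gravity vector can be coded and tested for basic physical properties. The sketch below assumes that the decimal points of the coefficients read 0.36, 0.6066, 0.18, 0.1233, 5.88, and 17.64; this reading, and all function names, are assumptions rather than values confirmed by the text.

```python
import math

def inertia_A(q2):
    """Inertia matrix M_A(q) of configuration A (coefficients as assumed above)."""
    c2 = math.cos(q2)
    off_diag = 0.18 * c2 + 0.1233
    return [[0.36 * c2 + 0.6066, off_diag],
            [off_diag, 0.1233]]

def gravity_A(q1, q2):
    """Gravity vector G_A(q) of configuration A (coefficients as assumed above)."""
    g12 = -5.88 * math.sin(q1 + q2)
    return [g12 - 17.64 * math.sin(q1), g12]

def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > 0.0 and det > 0.0
```

Under this reading the determinant of $M_A$ equals $0.05959 - 0.0324\cos^{2}(q_2)$, which is positive for every $q_2$, so the matrix is symmetric positive definite as an inertia matrix should be.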
The desired trajectories of configurations A and B are as follows.
Configuration A:
$$\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin\left(\alpha(t) - y_{1d}\right) - L_3\sin\left(\alpha(t)\right)}{L_2}\right] + \alpha(t).
\end{aligned} \tag{78}$$
Figure 3: Configuration A for simulation.
Figure 4: Configuration B for simulation.
Configuration B:
$$\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin\left(\alpha(t)\right) - L_3\sin\left(\alpha(t)\right)}{L_2}\right] + \alpha(t).
\end{aligned} \tag{79}$$
Owing to the dynamic external constraint, the variation of joint 1 in configuration B is zero.
To confirm that the adopted method can be applied to different configurations, and to verify the tracking performance for the desired subsystem trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this section with the classic RBF neural network control method and with the ACI based decentralized reinforcement learning robust optimal tracking control method, respectively.
Figures 7–14 show the joint tracking and error curves obtained with the classic RBF neural network [4], which is used to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training process and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness, and the tracking errors are larger than before.
Figure 5: The analytic chart of configuration A (labels: joint angles q1 and q2, lengths L1 to L4, constraint angle α, axes X and Y).
Figure 6: The analytic chart of configuration B (labels: joint angles q1 and q2, lengths L1 to L4, constraint angle α, axes X and Y).
Figures 15–22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds under the proposed robust reinforcement learning optimal control method. This is due to the identifying ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. Moreover, the joint subsystems exhibit robustness when the proposed decentralized control method is adopted.
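Convergence times such as those quoted above can be read off an error record with a small helper. The sketch below defines the settling time as the earliest sample after which the absolute error stays within a tolerance for the rest of the record; this definition, the function name, and the tolerance are assumptions, and the usage example relies on synthetic data rather than the simulation results of this paper.

```python
import math

def settling_time(times, errors, tol):
    """Earliest time after which |error| stays within tol until the record ends.

    Returns None if the error is outside the tolerance at the last sample.
    """
    settled_at = None
    for t, err in zip(times, errors):
        if abs(err) > tol:
            settled_at = None          # left the band: reset
        elif settled_at is None:
            settled_at = t             # entered the band: remember when
    return settled_at

# Synthetic example only: an exponentially decaying error sampled every 0.1 s.
times = [i / 10 for i in range(101)]
errors = [0.05 * math.exp(-8.0 * t) for t in times]
t_settle = settling_time(times, errors, tol=1e-3)
```

Applying the same helper to the error curves of both controllers with one common tolerance would give a like-for-like comparison of their convergence times.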
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method satisfies the accessibility requirement of the reconfigurable modular robot. Moreover, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired trajectory and actual trajectory).
Figure 8: Trajectory tracking curve of configuration A joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired trajectory and actual trajectory).
Table 1: Parameter list of action-critic-identifier.
Parameter:  k     α     υ       η_a1   η_a2   η_c   β_1   β_2   γ
Value:      800   300   0.005   10     50     20    0.2   2     0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. The joint variables track the desired trajectory within a very short time in different configurations, and the fluctuation of the error convergence range is minimal.
Figure 9: Trajectory tracking curve of configuration B joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired trajectory and actual trajectory).
Figure 10: Trajectory tracking curve of configuration B joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired trajectory and actual trajectory).
Figure 11: Tracking error curve of configuration A joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 12: Tracking error curve of configuration A joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 13: Tracking error curve of configuration B joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 14: Tracking error curve of configuration B joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 15: Trajectory tracking curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired trajectory and actual trajectory).
Figure 16: Trajectory tracking curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired trajectory and actual trajectory).
5 Conclusions and Future Work
In this paper, by combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. This theory is used to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function is adopted in place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and of the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.
Figure 17: Trajectory tracking curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired trajectory and actual trajectory).
Figure 18: Trajectory tracking curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired trajectory and actual trajectory).
In the future, more complex configurations of the time varying externally constrained reconfigurable modular robot can be included, and a decentralized controller with higher control and error precision will be considered.
Figure 19: Tracking error curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Figure 20: Tracking error curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s)).
Figure 21: Tracking error curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Mathematical Problems in Engineering 15
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
001
002
003
004
005
Join
t 2 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 22 Tracking error curve of configuration B joint 2 with ACIbased reinforcement learning
005
1
02 03 04 05 06 07
0
01
02
03
minus1
minus05minus02
minus01
xla
bel
y labelz label
Desired trajectoryActual trajectory
Figure 23 3D-tip trajectory curve of configuration A with ACI
005
1
035 036 037 038 039 04
006008
01012014016018
minus1
minus05
xla
bel
y labelz label
Desired trajectoryActual trajectory
Figure 24 3D-tip trajectory curve of configuration B with ACI
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_{\infty}$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
uncertainty. An observer-based decentralized adaptive fuzzy controller for reconfigurable manipulators is proposed in [6]: by designing a state observer, the adaptive fuzzy systems used to model the unknown dynamics of the subsystem and the interconnection term can be constructed from the state estimates. Nevertheless, the requirement that the dynamics of the reconfigurable modular robot system be fully or even partially known is hard to satisfy. Moreover, because of the strong coupling among the model uncertainties and the interconnection terms of the subsystems in the reconfigurable modular robot system, the processing load on the controller increases, and larger time delays and calculation errors are easily produced, so that it is too complicated to design the controllers by using the methods and algorithms above.
In recent years, as one of the most effective methods for solving control problems of strongly coupled continuous-time nonlinear systems, the reinforcement learning algorithm has received extensive attention from scholars. Reinforcement learning [7, 8] is a learning method that maps situations to actions so as to maximize a numerical reward signal. Compared with supervised learning, reinforcement learning does not need a mentor signal for each state but learns in the process of interaction with the environment. Because of its adaptive optimization capability for nonlinear models under uncertainty, reinforcement learning has a unique advantage in solving optimization-strategy and control problems for complex models [9–11]. Zhang and his team presented an infinite-time optimal tracking control scheme for discrete-time nonlinear systems via the greedy HDP iteration algorithm [12–14]. By a system transformation, the optimal tracking problem is transformed into an optimal regulation problem, and the greedy HDP iteration algorithm is introduced to deal with the regulation problem with a rigorous convergence analysis. Then a data-driven robust approximate optimal tracking control is proposed by using adaptive dynamic programming, together with a data-driven model established by a recurrent neural network to reconstruct the unknown system dynamics from available input-output data [15]. After this, they design a fuzzy critic estimator, which is used to estimate the value function for nonlinear continuous-time systems [16]. On this basis, a synchronization problem for an array of neural networks with hybrid coupling and interval time varying delay is addressed with an augmented Lyapunov-Krasovskii functional method [17]. The FRL scheme uses only the immediate reward, and sufficient conditions are adopted to analyze the convergence of the optimal task performance. Bhasin et al. present a neural network control of a robot interacting with an uncertain viscoelastic environment [18]. In [18], a continuous controller is developed for a robot that moves in free space and then regulates the new coupled dynamic system to a desired setpoint. Khan et al. present an implementation of a model-free Q-learning based discrete model reference compliance controller for a humanoid robot arm [19], where the reinforcement learning scheme uses a recently developed Q-learning scheme to develop an optimal policy online. Patchaikani et al. propose an adaptive critic-based real-time redundancy resolution scheme for kinematic control of the redundant manipulator [20]. The kinematic control of the redundant manipulator is formulated as a discrete-time input affine system, and then an optimal real-time redundancy resolution scheme is proposed. Although research on reinforcement learning algorithms has developed rapidly in recent years, there are still some deficiencies. For example, when there are multiple subsystems in the global system, the methods above cannot handle the impact of the interconnection terms between the subsystems. Meanwhile, the methods above are mostly used to solve the learning and optimization problems of the system itself; when external constraints exist in the system, these methods are no longer applicable. Therefore, designing a robust reinforcement learning optimal control method for systems with external constraints and multiple coupled subsystems is an urgent problem to be solved.
In this paper, we present a novel continuous-time decentralized reinforcement learning robust optimal tracking control theory for the time varying constrained reconfigurable modular robot. Combining ACI with RBF-NN, the critic-NN is used to estimate the optimal $Q$-function, the action-NN is used to approximate the optimal control policy, and the identifier is adopted to identify the global uncertainty, so that the HJB equation can be estimated and the estimation error is bounded and convergent. Firstly, since the decentralized control method is adopted in this paper, each joint subsystem owns a separate controller, and thus the processing loads of the controllers are reduced greatly. Secondly, because the time varying constraints can be compensated in the subsystems, the proposed method is suitable for the reconfigurable modular robot in a time varying constrained external environment. Thirdly, the proposed control method can compensate for the impact of the model uncertainties and the interconnection terms on the system, so that the subsystems track the desired trajectories and the tracking error converges to zero in finite time.
2 Problem Formulation
Assume that the time varying external constraints on the end of the reconfigurable modular robot are given by

$$\Psi(q, t) = 0. \tag{1}$$
Here $q \in R^{n}$ is the vector of joint displacements, $\Psi: R^{n} \to R^{m}$ is the constraint function, and $m$ is the dimension of the external limiting conditions. With the time varying constraints, the dynamics of a reconfigurable modular robot can be presented as follows:

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) + F(q,\dot{q}) = u + J_{\Psi}^{T}(q,t) f, \tag{2}$$

where $M(q) \in R^{n \times n}$ is the inertia matrix, $C(q,\dot{q})\dot{q} \in R^{n}$ is the Coriolis and centripetal force, $G(q) \in R^{n}$ is the gravity term, $F(q,\dot{q})$ is the unmodeled dynamics including friction terms and external disturbances, $u \in R^{n}$ is the applied joint torque, and $J_{\Psi}^{T}(q,t) f$ is the contact force generated by the contact between the end of the reconfigurable modular robot and the external constraints.
After introducing $m$ constraints for a robot that works in free space, the system loses $m$ degrees of freedom because of the limitation (1). Therefore, the degrees of freedom of the robot change from $n$ to $(n - m)$, so that only $(n - m)$ independent joint displacements are needed to describe the constrained motion of the system fully.
Define

$$q = \begin{bmatrix} q^{1} \\ q^{2} \end{bmatrix}, \quad q^{1} \in R^{n-m}, \quad q^{2} \in R^{m}. \tag{3}$$

Putting the equation above into (1), we get

$$\Psi(q^{1}, \Theta(q^{1}, t), t) = 0, \tag{4}$$

where

$$q^{2} = \Theta(q^{1}, t). \tag{5}$$

Therefore, (3) can be described fully by the joint displacement $q^{1}$ as follows:

$$q = \begin{bmatrix} q^{1} \\ \Theta(q^{1}, t) \end{bmatrix}. \tag{6}$$
The time derivative of (6) is

$$\dot{q} = \begin{bmatrix} \dot{q}^{1} \\ \dfrac{\partial \Theta(q^{1},t)}{\partial q^{1}} \dot{q}^{1} + \dfrac{\partial \Theta(q^{1},t)}{\partial t} \end{bmatrix} = \begin{bmatrix} I_{n-m} & 0 \\ \dfrac{\partial \Theta(q^{1},t)}{\partial q^{1}} & I_{m} \end{bmatrix} \begin{bmatrix} \dot{q}^{1} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \dfrac{\partial \Theta(q^{1},t)}{\partial t} \end{bmatrix} = T\dot{\theta} + H. \tag{7}$$
In (7),

$$T = \begin{bmatrix} I_{n-m} & 0 \\ \dfrac{\partial \Theta(q^{1},t)}{\partial q^{1}} & I_{m} \end{bmatrix} \in R^{n \times n}, \quad \dot{\theta} = \begin{bmatrix} \dot{q}^{1} \\ 0 \end{bmatrix} \in R^{n}, \quad H = \begin{bmatrix} 0 \\ \dfrac{\partial \Theta(q^{1},t)}{\partial t} \end{bmatrix} \in R^{n}. \tag{8}$$
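As a rough numerical illustration of (6)-(8), the sketch below takes a hypothetical two-joint case ($n = 2$, $m = 1$) with an assumed constraint map $\Theta(q^{1}, t) = \sin q^{1} + 0.5t$. This map, the function names, and the numerical values are illustrative assumptions and do not come from the paper. The code builds $T$ and $H$ and checks that $T\dot{\theta} + H$ agrees with a finite-difference derivative of $q$.

```python
import math

def theta_map(q1, t):
    """Hypothetical constraint map q2 = Theta(q1, t); chosen only for illustration."""
    return math.sin(q1) + 0.5 * t

def q_of(q1, t):
    """Full joint vector q = [q1, Theta(q1, t)], as in (6)."""
    return [q1, theta_map(q1, t)]

def T_and_H(q1, t):
    """T and H from (8) for n = 2, m = 1, using the analytic partials of theta_map."""
    dtheta_dq1 = math.cos(q1)  # partial of Theta with respect to q1
    dtheta_dt = 0.5            # partial of Theta with respect to t
    T = [[1.0, 0.0],
         [dtheta_dq1, 1.0]]
    H = [0.0, dtheta_dt]
    return T, H

def q_dot(q1, q1_dot, t):
    """q_dot = T * theta_dot + H with theta_dot = [q1_dot, 0], as in (7)."""
    T, H = T_and_H(q1, t)
    theta_dot = [q1_dot, 0.0]
    return [sum(T[r][c] * theta_dot[c] for c in range(2)) + H[r] for r in range(2)]

def q_dot_fd(q1, q1_dot, t, dt=1e-6):
    """Central finite-difference derivative of q along the motion, for comparison."""
    qp = q_of(q1 + q1_dot * dt, t + dt)
    qm = q_of(q1 - q1_dot * dt, t - dt)
    return [(a - b) / (2 * dt) for a, b in zip(qp, qm)]
```

Calling `q_dot(0.3, 1.2, 2.0)` and `q_dot_fd(0.3, 1.2, 2.0)` should give matching vectors up to finite-difference error, which shows that only the independent coordinate and its rate are needed to recover the full joint velocity.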
Therefore, the second derivative of $q$ is obtained as

$$\ddot{q} = T\ddot{\theta} + \dot{T}\dot{\theta} + \dot{H}. \tag{9}$$

Putting (7) and (9) into (2), we get

$$u + J_{\Psi}^{T}(q,t) f = M(q)\left(T\ddot{\theta} + \dot{T}\dot{\theta} + \dot{H}\right) + C(q,\dot{q})\left(T\dot{\theta} + H\right) + G(q) + F(q,\dot{q}). \tag{10}$$

Define

$$E = \begin{bmatrix} I_{(n-m)\times(n-m)} \\ 0_{m\times(n-m)} \end{bmatrix} \in R^{n\times(n-m)}. \tag{11}$$

Therefore,

$$\dot{\theta} = \begin{bmatrix} \dot{q}^{1} \\ 0 \end{bmatrix} = E\dot{q}^{1}. \tag{12}$$
So (2) can be decomposed into the following form:

$$\sum_{j=1}^{n} M_{ij}(q)\left[(TE\ddot{q}^{1})_{j} + (\dot{T}E\dot{q}^{1})_{j} + \dot{H}_{j}\right] + \sum_{j=1}^{n} C_{ij}(q,\dot{q})\left[(TE\dot{q}^{1})_{j} + H_{j}\right] + G_{i}(q) + F_{i}(q_{i},\dot{q}_{i}) - f_{i} = u_{i}. \tag{13}$$
In the equation above, $(TE\ddot{q}^{1})_{j}$, $(\dot{T}E\dot{q}^{1})_{j}$, $(TE\dot{q}^{1})_{j}$, and $H_{j}$ are the $j$th elements of $(TE\ddot{q}^{1})$, $(\dot{T}E\dot{q}^{1})$, $(TE\dot{q}^{1})$, and $H$, respectively; $G_{i}(q)$, $F_{i}(q_{i},\dot{q}_{i})$, and $u_{i}$ are the $i$th elements of $G(q)$, $F(q,\dot{q})$, and $u$; $f_{i}$ is the constraint force acting on the $i$th joint; $M_{ij}(q)$ and $C_{ij}(q,\dot{q})$ are the $ij$th elements of $M(q)$ and $C(q,\dot{q})$, respectively. So, as shown in Figure 1, each subsystem dynamical model can be formulated in joint space as follows:

$$M_{i}(q_{i})\ddot{q}_{i} + C_{i}(q_{i},\dot{q}_{i})\dot{q}_{i} + G_{i}(q_{i}) + F_{i}(q_{i},\dot{q}_{i}) + Z_{i}(q,\dot{q},\ddot{q}) - f_{i} = u_{i}, \tag{14}$$
$$\begin{aligned} Z_{i}(q,\dot{q},\ddot{q}) ={}& \sum_{j=1,\, j \neq i}^{n} M_{ij}(q)\left[(TE\ddot{q}^{1})_{j} + (\dot{T}E\dot{q}^{1})_{j} + \dot{H}_{j}\right] + M_{ii}(q)\left[(TE\ddot{q}^{1})_{i} + (\dot{T}E\dot{q}^{1})_{i} + \dot{H}_{i}\right] - M_{i}(q_{i})\ddot{q}_{i} \\ &+ \sum_{j=1,\, j \neq i}^{n} C_{ij}(q,\dot{q})\left[(TE\dot{q}^{1})_{j} + H_{j}\right] + C_{ii}(q,\dot{q})\left[(TE\dot{q}^{1})_{i} + H_{i}\right] - C_{i}(q_{i},\dot{q}_{i})\dot{q}_{i} + \left[G_{i}(q) - G_{i}(q_{i})\right]. \end{aligned} \tag{15}$$
Let $x_{i} = [x_{i1}, x_{i2}]^{T} = [q_{i}, \dot{q}_{i}]^{T}$ for $i = 1, \ldots, n$; then (14) can be presented by the following state equation:

$$S_{i}: \begin{cases} \dot{x}_{i1} = x_{i2}, \\ \dot{x}_{i2} = -f(x_{i},u_{i}) - h_{i}(q,\dot{q},\ddot{q}) - f_{i}, \\ y_{i} = x_{i1}, \end{cases} \tag{16}$$
Figure 1: The architecture of the time varying constrained reconfigurable modular robot system. Each subsystem $i$ ($i = 1, \ldots, n$) obeys $M_{i}\ddot{q}_{i} + C_{i}\dot{q}_{i} + G_{i} + F_{i} + Z_{i} - f_{i} = u_{i}$ and is coupled to the others through the interconnection term $Z_{i}(q,\dot{q},\ddot{q})$; the global dynamics are $\ddot{q} = M^{-1}(u - C\dot{q} - G - F + J_{\Psi}^{T}(q,t) f)$.
where $x_{i}$ is the state vector of subsystem $S_{i}$, $y_{i}$ is the output of subsystem $S_{i}$, and $h_{i}(q,\dot{q},\ddot{q})$ is the interconnection term of the subsystem. $f(x_{i},u_{i})$ and $h_{i}(q,\dot{q},\ddot{q})$ can be defined as

$$\begin{aligned} f(x_{i},u_{i}) &= M_{i}^{-1}(q_{i})\left[C_{i}(q_{i},\dot{q}_{i})\dot{q}_{i} + G_{i}(q_{i}) + F_{i}(q_{i},\dot{q}_{i}) - u_{i}\right], \\ h_{i}(q,\dot{q},\ddot{q}) &= -M_{i}^{-1}(q_{i}) Z_{i}(q,\dot{q},\ddot{q}). \end{aligned} \tag{17}$$
For the time varying constrained reconfigurable modular robot system, we need to design a decentralized robust optimal tracking control policy such that each subsystem tracks its desired trajectory and the tracking error is bounded and convergent.
3 Decentralized Reinforcement Learning Robust Optimal Tracking Control Based on ACI and $Q$-Function
Assumption 1. The desired trajectory $y_{id}$, its derivatives $\dot{y}_{id}$ and $\ddot{y}_{id}$, and the input gain matrix $b_{i}(x_{i})$ are bounded.

Then (16) can be transformed into the following form:

$$S_{i}: \begin{cases} \dot{x}_{i1} = x_{i2}, \\ \dot{x}_{i2} = -\left[F(x_{i},u_{i}) + h_{i}(q,\dot{q},\ddot{q}) + f_{i}\right] + b_{i}(x_{i}) u_{i}, \\ y_{i} = x_{i1}, \end{cases} \tag{18}$$

where $F(x_{i},u_{i}) = f(x_{i},u_{i}) + b_{i}(x_{i}) u_{i}$.
Assumption 2. The interconnection terms are bounded and satisfy the following inequality:

$$\left|h_{i}(q,\dot{q},\ddot{q})\right| \le \delta_{i0} + \sum_{j=1}^{n} \delta_{ij}\left(\left|s_{ij}\right|\right), \tag{19}$$

where $\delta_{i0} > 0$ is an unknown constant and $\delta_{ij}(|s_{ij}|) \ge 0$ is an unknown smooth Lipschitz function.

The trajectory tracking error of the $i$th joint subsystem can be defined as

$$e_{i}(t) = x_{i} - y_{id}. \tag{20}$$
For the continuous-time state equation of the subsystem in (18), which contains the nonlinear function and the interconnection terms, the value function can generally be defined as

$$V_{i}^{u_{i}(e_{i}(t))}(e_{i}(t)) = \int_{0}^{\infty} r_{i}(e_{i}(t), u_{i}(e_{i}(t)))\,dt. \tag{21}$$

To simplify the notation, we write $e_{i}$, $u_{i}$ instead of $e_{i}(t)$, $u_{i}(e_{i}(t))$. Since the trajectory $y_{id}$ relies upon the control $u_{i}$ of the subsystem for updating, in order to avoid infinite results when using (21), we need to transform the value function into the following form:

$$V_{i}^{u_{i}}(e_{i}) = \int_{0}^{\infty} r_{i}(e_{i}(\tau), u_{i}(e_{i}(\tau)))\,d\tau, \quad t \le \tau < \infty. \tag{22}$$
Thus the optimal value function of the subsystem can be defined as follows:

$$V_{i}^{*}(e_{i}) = \min_{u_{i},\; t \le \tau < \infty} \int_{0}^{\infty} r_{i}(e_{i}(\tau), u_{i}(e_{i}(\tau)))\,d\tau. \tag{23}$$

Here $r_{i}(e_{i},u_{i})$ represents the reward function for the current state, given by

$$r_{i}(e_{i},u_{i}) = e_{i}^{T} Q_{e} e_{i} + u_{i}^{T} R u_{i}, \tag{24}$$

where $Q_{e}$ and $R$ are positive definite matrices.
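To make the quadratic reward and the integral cost concrete, the following minimal sketch evaluates the reward and numerically integrates it along an assumed scalar closed-loop response $e(t) = e_{0}\exp(-at)$ with $u(t) = -k\,e(t)$. The exponential response, the gains, and the function names are illustrative assumptions, not the paper's controller; for this case the integral has the closed form $(Q_{e} + Rk^{2})e_{0}^{2}/(2a)$, which gives a simple check.

```python
import math

def reward(e, u, Qe, R):
    """Quadratic reward r = e^T Qe e + u^T R u; Qe and R are square lists of lists."""
    def quad(v, A):
        n = len(v)
        return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))
    return quad(e, Qe) + quad(u, R)

def value_estimate(e0, k, a, Qe, R, horizon=20.0, dt=1e-3):
    """Trapezoidal estimate of the integral of r over [0, horizon] for the assumed
    scalar closed loop e(t) = e0 * exp(-a t), u(t) = -k * e(t)."""
    steps = int(round(horizon / dt))
    total = 0.0
    prev = reward([e0], [-k * e0], Qe, R)
    for s in range(1, steps + 1):
        e = e0 * math.exp(-a * s * dt)
        cur = reward([e], [-k * e], Qe, R)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total
```

With `e0 = 1`, `k = 2`, `a = 1.5`, `Qe = [[1.0]]`, and `R = [[0.5]]`, the closed form evaluates to 1, and `value_estimate` should agree to within the quadrature error.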
Typically, recording the value of state-action pairs is more useful than recording the value of the state only, since the state-action pairs are predictions of the reward. Even if the reward value of a state is low, this does not mean that the value of the state-action pair is low too. If the state of the subsystem produces a higher reward over a period of time, then it can still obtain a higher state-action value. Therefore, from a long-term perspective, defining a suitable state-action value function ($Q$-function) can make actions produce more rewards [21, 22].
According to (23) and (24), the continuous-time optimal $Q$-function can be defined as

$$\begin{aligned} Q_{i}^{*}(e_{i},a_{i},u_{i}) &= r_{i}(e_{i},a_{i},u_{i}) + V_{i}^{*}(e_{i},u_{i}) \\ &= r_{i}(e_{i},a_{i},u_{i}) + \min_{u_{i},\; t \le \tau < \infty} \int_{0}^{\infty} r_{i}(e_{i}(\tau), u_{i}(e_{i}(\tau)))\,d\tau. \end{aligned} \tag{25}$$
Assumption 3. The partial derivatives of $Q_{i}^{*}$ and $r_{i}(e_{i},a_{i},u_{i})$ exist and are continuous in the domain.

According to (18) and (24), under the control policy $u_{i}$, the optimal $Q$-function satisfies the following Hamilton-Jacobi-Bellman equation [23]:

$$\begin{aligned} \mathrm{HJB}_{i}(e_{i},u_{i},\nabla Q_{i}^{*}) &= \min_{u_{i}(e_{i})} \left[r_{i}(e_{i},a_{i},u_{i}) + \nabla Q_{i}^{*}\left(-F(e_{i},u_{i}) - h_{i}(e,\dot{e},\ddot{e}) - f_{i} + b_{i}(e_{i}) u_{i}\right)\right] \\ &= \min_{u_{i}(e_{i})} \left[r_{i}(e_{i},a_{i},u_{i}) + \nabla Q_{i}^{*}\Phi_{i}(e_{i},u_{i})\right], \end{aligned} \tag{26}$$

where $\Phi_{i}(e_{i},u_{i}) = -F(e_{i},u_{i}) - h_{i}(e,\dot{e},\ddot{e}) - f_{i} + b_{i}(e_{i}) u_{i}$ denotes the global uncertainty, including the unknown dynamics of the subsystem and the interconnection term, and $\nabla Q_{i}^{*} = \partial Q_{i}^{*}/\partial e_{i}$ denotes the gradient of the optimal $Q$-function.
Lemma 4 (see [24]). Considering the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14), in order to ensure that the minimum of the HJB equation (26) possesses a stationary point with respect to $u_{i}$, the optimal $Q$-function and the optimal control policy must satisfy the following conditions:

(1) $\partial \mathrm{HJB}_{i}(e_{i},u_{i},\nabla Q_{i})/\partial u_{i} = 0$;
(2) $\partial^{2} \mathrm{HJB}_{i}(e_{i},u_{i},\nabla Q_{i})/(\partial u_{i} \times \partial u_{i}^{T}) \ge 0$.

The necessary conditions above lead us to the following results.

(a) The bounded control policy can guarantee a local minimum of the HJB equation (26) and satisfy the constraints imposed on the control inputs.
(b) The Hessian matrix is positive-definite, and the control policy $u_{i}$ can render the global minimum of the HJB equation.
(c) If an optimal algorithm exists, it is unique.
According to Lemma 4, if the reward function is smooth and the optimal control $u_{i}^{*}$ is adopted, then the HJB equation satisfies the following equation:

$$\mathrm{HJB}_{i}^{*}(e_{i},u_{i}^{*},\nabla Q_{i}^{*}) = \min_{u_{i}^{*}(e_{i})} \left[r_{i}(e_{i},a_{i},u_{i}^{*}) + \nabla Q_{i}^{*}\Phi_{i}(e_{i},u_{i}^{*})\right] = 0. \tag{27}$$
And the optimal control can be expressed as follows:

$$u_{i}^{*}(e_{i}) = \arg\min_{u_{i}^{*}} \left[\mathrm{HJB}_{i}^{*}(e_{i},u_{i}^{*},\nabla Q_{i}^{*})\right] = -\frac{1}{2} R^{-1} b_{i}^{T}(e_{i}) \frac{\partial Q_{i}^{*}(e_{i},a_{i},u_{i})^{T}}{\partial e_{i}}. \tag{28}$$
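One way to sanity-check the stationarity condition behind the optimal control is to minimise a scalar version of the bracket in (26) numerically. The sketch below assumes scalar $e_{i}$ and $u_{i}$, a drift term that does not depend on the control, so that the control enters only through $b_{i} u_{i}$, and made-up numerical values; the function names are illustrative and not from the paper. Setting the derivative $2Ru + b\,\nabla Q$ to zero gives $u = -\tfrac{1}{2} R^{-1} b\,\nabla Q$, which a brute-force grid search should reproduce.

```python
def hjb_bracket(u, e, grad_q, drift, b, Qe, R):
    """Scalar version of the bracket in (26): r(e, u) + grad_q * (drift + b * u)."""
    return Qe * e * e + R * u * u + grad_q * (drift + b * u)

def u_closed_form(grad_q, b, R):
    """Stationary point of the bracket in u, from 2 * R * u + b * grad_q = 0."""
    return -0.5 * b * grad_q / R

def u_grid_search(e, grad_q, drift, b, Qe, R, lo=-10.0, hi=10.0, n=200001):
    """Brute-force minimiser over a uniform grid, used only as a cross-check."""
    best_u = lo
    best_val = hjb_bracket(lo, e, grad_q, drift, b, Qe, R)
    step = (hi - lo) / (n - 1)
    for k in range(1, n):
        u = lo + k * step
        val = hjb_bracket(u, e, grad_q, drift, b, Qe, R)
        if val < best_val:
            best_u, best_val = u, val
    return best_u
```

For example, with `grad_q = 3.0`, `b = 2.0`, and `R = 1.5`, the closed form gives $-2$, and the grid search should land on the same value to within the grid spacing, whatever the values of `e`, `drift`, and `Qe`.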
If the optimal $Q$-function $Q_{i}^{*}$ is continuously differentiable and known with the initial value $Q_{i}^{*}(0) = 0$, and the optimal control policy $u_{i}^{*}(e_{i})$ and the global uncertainty of the subsystem $\Phi_{i}(e_{i},u_{i}^{*})$ are known, then the HJB equation in (27) holds and is solvable. However, in the actual situation $Q_{i}^{*}$ is not differentiable everywhere, and $u_{i}^{*}(e_{i})$ and $\Phi_{i}(e_{i},u_{i}^{*})$ are unknown. Therefore, it is not feasible to solve the HJB equation by conventional methods. In this paper, we combine the action-critic-identifier (ACI) with the RBF neural network to estimate the optimal control policy, the optimal $Q$-function, and the global uncertainty of the subsystem. The action-NN is used to estimate $u_{i}^{*}(e_{i})$, and the estimate is denoted as $\hat{u}_{i}(e_{i})$; $Q_{i}^{*}$ is estimated by the critic-NN and expressed as $\hat{Q}_{i}$; then we use the robust neural network identifier to identify $\Phi_{i}(e_{i},u_{i}^{*})$, denoted as $\hat{\Phi}_{i}(e_{i},u_{i}^{*})$. The block diagram of the ACI architecture is shown in Figure 2.

The estimated HJB equation can be expressed as follows:

$$\mathrm{HJB}_{i}^{*}(e_{i},\hat{u}_{i},\nabla \hat{Q}_{i}) = \min_{u_{i}(e_{i})} \left[r_{i}(e_{i},a_{i},\hat{u}_{i}) + \nabla \hat{Q}_{i}\hat{\Phi}_{i}(e_{i},\hat{u}_{i})\right]. \tag{29}$$
The identification error of the HJB equation above can be expressed as

$$\delta_{hi} = \mathrm{HJB}_{i}^{*}(e_{i},\hat{u}_{i},\nabla \hat{Q}_{i}) - \mathrm{HJB}_{i}^{*}(e_{i},u_{i}^{*},\nabla Q_{i}^{*}). \tag{30}$$

A classic radial basis function neural network is proposed in [25], shown as (31):

$$N(x) = W^{*T} S(x) + \varepsilon(x), \tag{31}$$
Figure 2: The architecture of action-critic-identifier. The blocks are the subsystem, the action-NN (output $\hat{u}_{i}$), the critic-NN (output $\hat{Q}_{i}(e_{i},a_{i},u_{i})$), the identifier (output $\hat{\Phi}_{i}(e_{i},u_{i})$), the reward function $r_{i}(e_{i},a_{i},u_{i})$, and the HJB error $\delta_{hi}$.
where $W^{*}$ denotes the ideal neural network weights and $\varepsilon(x)$ represents the estimation error. With a sufficient number of nodes, if the centers and widths of the nodes are chosen appropriately, then any continuous function can be approximated by an RBF-NN. Therefore, the optimal $Q$-function and the optimal control policy can be expressed as follows:

$$\begin{aligned} Q_{i}^{*} &= W_{i}^{T} S_{i}(e_{i}) + \varepsilon_{ic}(e_{i}), \\ u_{i}^{*}(e_{i}) &= -\frac{1}{2} R^{-1} b_{i}^{T}(e_{i}) \left[S_{i}(e_{i})^{T} W_{i} + \varepsilon_{ia}(e_{i})\right], \end{aligned} \tag{32}$$
where $S_{i}(e_{i}) = [s_{i1}(e_{i}) \cdots s_{in}(e_{i})]^{T}$ is the smooth basis function of the neural network, $W_{i}$ is the ideal unknown neural network weight, and $\varepsilon_{ic}(e_{i})$ and $\varepsilon_{ia}(e_{i})$ are the estimation errors. By using $\hat{Q}_{i}$ and $\hat{u}_{i}(e_{i})$ to estimate $Q_{i}^{*}$ and $u_{i}^{*}(e_{i})$, we get the following equations:

$$\hat{Q}_{i} = \hat{W}_{ic}^{T} S_{ic}(e_{i}), \tag{33}$$

$$\hat{u}_{i}(e_{i}) = -\frac{1}{2} R^{-1} b_{i}^{T}(e_{i}) S_{ia}(e_{i})^{T} \hat{W}_{ia}. \tag{34}$$
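The critic and action outputs in (33) and (34) can be sketched for a single scalar joint error as follows. Gaussian basis functions, the particular centers and width, and all function names are assumptions made here for illustration; the paper only states that an RBF network is used.

```python
import math

def gaussian_basis(e, centers, width):
    """Gaussian radial basis vector S(e); centers and width are illustrative choices."""
    return [math.exp(-((e - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def critic_output(e, w_c, centers, width):
    """Q_hat = W_c^T S_c(e), the form of (33)."""
    s = gaussian_basis(e, centers, width)
    return sum(w * si for w, si in zip(w_c, s))

def action_output(e, w_a, centers, width, b, R):
    """u_hat = -(1/2) R^{-1} b S_a(e)^T W_a, the scalar form of (34)."""
    s = gaussian_basis(e, centers, width)
    return -0.5 * b / R * sum(w * si for w, si in zip(w_a, s))
```

With centers `[-1.0, 0.0, 1.0]` and weights `[0.0, 2.0, 0.0]`, only the middle node contributes at `e = 0`, so the critic output equals the middle weight and the action output is that weight scaled by $-\tfrac{1}{2} R^{-1} b$.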
In the equations above, $\hat{W}_{ic}(t)$ and $\hat{W}_{ia}(t)$ denote the weights of the critic-NN and action-NN. The estimation errors of the weights are as follows:

$$\tilde{W}_{ic}(t) = W_{i} - \hat{W}_{ic}(t), \tag{35}$$

$$\tilde{W}_{ia}(t) = W_{i} - \hat{W}_{ia}(t). \tag{36}$$

The update law of the weight for the critic-NN is a gradient descent algorithm, which is shown as follows:

$$\dot{\hat{W}}_{ic}(t) = -n_{1} L_{i}\left(L_{i}^{T} \hat{W}_{ic} + e_{i}^{T} Q_{e} e_{i} + u_{i}^{T} R u_{i}\right). \tag{37}$$
In the equation above, $n_{1} > 0$ is the adaptive gain of the neural network; $L_{i}$ and $l_{i}$ are defined as

$$L_{i} = \frac{l_{i}}{l_{i}^{T} l_{i} + 1}, \quad l_{i} = \nabla S_{ic}(e_{i})\,\dot{e}_{i}. \tag{38}$$

Therefore, according to the definition above, the following inequalities can be obtained:

$$L_{im} \le L_{i} \le L_{iM}, \quad S_{icm} \le S_{ic}(e_{i}) \le S_{icM}, \quad S_{iam} \le S_{ia}(e_{i}) \le S_{iaM}. \tag{39}$$
Combining (35) with (37) and (38), we obtain

$$\dot{\tilde{W}}_{ic}(t) = -n_1 L_i \left( L_i^T \tilde{W}_{ic} + \delta_{hi} \right). \tag{40}$$
The weight update law of the action-NN is also developed by a gradient descent algorithm:

$$\dot{\hat{W}}_{ia}(t) = -n_2 S_{ia}(e_i) \left( \hat{W}_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla S_{ic}(e_i)^T \hat{W}_{ic} \right)^T. \tag{41}$$
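A minimal sketch of one step of the action-NN update (41) is given below for the special case of a scalar control input, so that $R$ and $b_i(e_i)$ are scalars. The explicit-Euler step and the names `actor_step` and `mismatch` are assumptions made for illustration only.

```python
def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


def actor_step(w_a, s_a, w_c, grad_s_c, b, r, n2, dt):
    """One explicit-Euler step of (41) for a scalar control input.

    w_a, s_a      -- action-NN weights and basis vector S_ia(e_i)
    w_c, grad_s_c -- critic-NN weights and gradient of S_ic(e_i)
    b, r          -- scalar input gain b_i(e_i) and scalar weight R
    Returns the updated weights and the mismatch that drives the update.
    """
    mismatch = dot(w_a, s_a) + 0.5 * (b / r) * dot(grad_s_c, w_c)
    w_new = [wa - dt * n2 * sa * mismatch for wa, sa in zip(w_a, s_a)]
    return w_new, mismatch


# Illustrative values only.
w1, m1 = actor_step([0.0, 0.0], [1.0, 2.0], [1.0], [2.0], b=1.0, r=1.0, n2=1.0, dt=0.01)
w2, m2 = actor_step(w1, [1.0, 2.0], [1.0], [2.0], b=1.0, r=1.0, n2=1.0, dt=0.01)
```

In this sketch the update moves the action-NN output toward the control implied by the critic, so the mismatch shrinks from one step to the next when the step size is small enough.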
According to the estimation error of the action-NN in (36), the optimal control $u_i^*(e_i)$ minimizes the optimal $Q$-function, which gives

$$W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla S_{ic}(e_i)^T W_{ic} + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla \varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0. \tag{42}$$
Substituting (41) into (42), we obtain

$$\dot{\tilde{W}}_{ia} = -n_2 S_{ia}(e_i) \left( \tilde{W}_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\, \nabla S_{ic}(e_i)^T \tilde{W}_{ic} - \frac{1}{2} R^{-1} b_i(e_i)\, \nabla \varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i) \right). \tag{43}$$
After the critic-NN and the action-NN have been used to estimate $\hat{Q}_i$ and $\hat{u}_i(e_i)$, a robust RBF-NN identifier is designed to identify the nonlinear uncertainties of the subsystem. Here $\Phi_i(e_i, \hat{u}_i)$ can be expressed as

$$\Phi_i(e_i, \hat{u}_i) = \dot{e}_{iF} = W_{iF}^T \kappa(\Lambda_{iF}^T e_{iF}) + \varepsilon_{iF}(e_{iF}) + b_i(e_i)\, \hat{u}_i, \tag{44}$$
where $\kappa(\cdot)$ is the basis function of the neural network and $W_{iF}$, $\Lambda_{iF}$ are the unknown ideal neural network weights. Equation (44) can be identified by the robust RBF-NN identifier, which gives

$$\hat{\Phi}_i(e_i, \hat{u}_i) = \dot{\hat{e}}_{iF} = \hat{W}_{iF}^T \hat{\kappa}_{iF} + b_i(e_i)\, \hat{u}_i + \mu_i. \tag{45}$$
Here $\hat{\kappa}_{iF}$ is the estimated value of the basis function of the neural network, $\hat{W}_{iF}$ and $\hat{\Lambda}_{iF}$ are the estimated neural network weights, and $\mu_i \in \mathbb{R}$ is the feedback error term, defined as follows [26]:

$$\begin{aligned}
\mu_i &= k \left( e_{iF}(t) - \hat{e}_{iF}(t) \right) - k \left( e_{iF}(0) - \hat{e}_{iF}(0) \right) + \vartheta \\
&= k \left( \tilde{e}_{iF}(t) - \tilde{e}_{iF}(0) \right) + \vartheta, \\
\dot{\vartheta} &= (k\alpha + \gamma)\, \tilde{e}_{iF} + \beta_1\, \mathrm{sat}(\tilde{e}_{iF}),
\end{aligned} \tag{46}$$
where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\mathrm{sat}(\cdot)$ is a saturation function. Therefore the state estimation error dynamics of the identifier-NN can be expressed as

$$\dot{\tilde{e}}_{iF} = \dot{e}_{iF} - \dot{\hat{e}}_{iF} = W_{iF}^T \kappa_{iF} - \hat{W}_{iF}^T \hat{\kappa}_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i. \tag{47}$$
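The feedback term of (46) combines a proportional part with an integrated part $\vartheta$. A minimal sketch for a scalar identification error is given below. The clipping form of `sat`, the zero initial value of $\vartheta$, the explicit-Euler integration, and the class name `FeedbackTerm` are all assumptions, since the paper does not specify them here.

```python
def sat(x, width=1.0):
    """Saturation function (assumed form): linear on [-width, width], clipped to +/-1."""
    return max(-1.0, min(1.0, x / width))


class FeedbackTerm:
    """Scalar version of mu_i in (46); theta(0) = 0 is an assumed initial condition."""

    def __init__(self, k, alpha, gamma, beta1, e_tilde0):
        self.k, self.alpha, self.gamma, self.beta1 = k, alpha, gamma, beta1
        self.e_tilde0 = e_tilde0  # identification error at t = 0
        self.theta = 0.0

    def mu(self, e_tilde):
        """mu = k * (e_tilde(t) - e_tilde(0)) + theta."""
        return self.k * (e_tilde - self.e_tilde0) + self.theta

    def advance(self, e_tilde, dt):
        """Explicit-Euler step of d(theta)/dt = (k*alpha + gamma)*e_tilde + beta1*sat(e_tilde)."""
        rate = (self.k * self.alpha + self.gamma) * e_tilde + self.beta1 * sat(e_tilde)
        self.theta += dt * rate


# Illustrative gains only.
fb = FeedbackTerm(k=2.0, alpha=3.0, gamma=0.5, beta1=0.2, e_tilde0=0.1)
mu_start = fb.mu(0.1)      # zero at t = 0 by construction
fb.advance(0.1, dt=0.01)
mu_later = fb.mu(0.05)
```

By construction $\mu_i(0) = 0$, so the identifier starts without a feedback jump even when the initial identification error is nonzero.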
A filtered identification error is defined as

$$E_i = \dot{\tilde{e}}_{iF} + \alpha\, \tilde{e}_{iF}. \tag{48}$$
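As a short worked remark, reading (48) as $E_i = \dot{\tilde{e}}_{iF} + \alpha \tilde{e}_{iF}$, the identification error is the output of a stable first-order filter driven by $E_i$. The closed-form solution below is a standard fact about linear first-order systems and is not stated in the paper:

```latex
\dot{\tilde{e}}_{iF} = E_i - \alpha\, \tilde{e}_{iF}
\quad\Longrightarrow\quad
\tilde{e}_{iF}(t) = e^{-\alpha t}\, \tilde{e}_{iF}(0)
  + \int_0^t e^{-\alpha (t - s)}\, E_i(s)\, ds .
```

Hence, for $\alpha > 0$, a bounded (or vanishing) $E_i$ implies a bounded (or vanishing) $\tilde{e}_{iF}$. The same identity $\dot{\tilde{e}}_{iF} = E_i - \alpha \tilde{e}_{iF}$ is what turns $\gamma \tilde{e}_{iF}^T \dot{\tilde{e}}_{iF}$ into $\gamma \tilde{e}_{iF}^T (E_i - \alpha \tilde{e}_{iF})$ in the Lyapunov analysis.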
Differentiating the equation above gives

$$\begin{aligned}
\dot{E}_i ={}& W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \alpha\, \dot{\tilde{e}}_{iF} - k E_i - \gamma\, \tilde{e}_{iF} \\
& - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1\, \mathrm{sat}(\tilde{e}_{iF}) - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF},
\end{aligned} \tag{49}$$

where $\kappa'_{iF}$ and $\hat{\kappa}'_{iF}$ denote the derivatives of the basis function with respect to its argument, evaluated at the ideal and the estimated weights, respectively.
Here the weights $\hat{W}_{iF}$ and $\hat{\Lambda}_{iF}$ of the identifier-NN are updated by

$$\begin{aligned}
\dot{\hat{W}}_{iF} &= \mathrm{proj}\left( \Gamma_{iWF}\, \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \right), \\
\dot{\hat{\Lambda}}_{iF} &= \mathrm{proj}\left( \Gamma_{i\Lambda F}\, \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \hat{W}_{iF}^T \hat{\kappa}'_{iF} \right),
\end{aligned} \tag{50}$$
where $\Gamma_{iWF}$ and $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. In order to analyze the convergence of the filtered identification error, the term $-\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}$ in (49) is split into the following form:

$$\begin{aligned}
-\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}
={}& \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} \\
& - \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} \\
& + \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF},
\end{aligned} \tag{51}$$
where $\tilde{W}_{iF}^T = W_{iF}^T - \hat{W}_{iF}^T$ and $\tilde{\Lambda}_{iF}^T = \Lambda_{iF}^T - \hat{\Lambda}_{iF}^T$. Substituting (51) into (49), (49) reduces to the following form:

$$\dot{E}_i = P_{F1} + P_{F2} + P_{F3} - k E_i - \gamma\, \tilde{e}_{iF} - \beta_1\, \mathrm{sat}(\tilde{e}_{iF}). \tag{52}$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ are given, respectively, by

$$\begin{aligned}
P_{F1} ={}& \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} \\
& + \alpha\, \dot{\tilde{e}}_{iF} - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF},
\end{aligned} \tag{53}$$

$$\begin{aligned}
P_{F2} ={}& -\frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} \\
& + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}),
\end{aligned} \tag{54}$$

$$P_{F3} = \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}. \tag{55}$$
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are

$$\begin{aligned}
\left\| P_{F1} \right\| &\le J_1\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right) \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|, \\
\left\| P_{F2} \right\| &\le \varsigma_1, \\
\left\| P_{F3} \right\| &\le \varsigma_2.
\end{aligned} \tag{56}$$

Combining (53) and (54) with (55), we obtain

$$\left\| \dot{P}_{F2} + \dot{P}_{F3} \right\| \le \varsigma_3 + \varsigma_4\, J_2\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right) \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|, \tag{57}$$

where $\varphi_i(\tilde{e}_{iF}^T, E_i^T) = [\tilde{e}_{iF}^T \;\; E_i^T]^T$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\varsigma_i$ ($i = 1, 2, 3, 4$) are computable positive constants.
Theorem 5. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14) and the state equation (18). If the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and is bounded.
Proof. Define the Lyapunov function as follows:

$$V_{iL}(\tilde{e}_{iF}, E_i) = \frac{1}{2} E_i^T E_i + \frac{1}{2} \gamma\, \tilde{e}_{iF}^T \tilde{e}_{iF} + \varpi_i(t) + \phi_i(t). \tag{58}$$

In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ are given by

$$\begin{aligned}
\dot{\varpi}_i(t) &= -\Big[ E_i^T \left( P_{F2} - \beta_1\, \mathrm{sat}(\tilde{e}_{iF}) \right) + \dot{\tilde{e}}_{iF}^T P_{F3} \\
&\qquad\; - \beta_2\, J_2\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right) \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \left\| \tilde{e}_{iF} \right\| \Big], \\
\varpi_i(0) &= \beta_1 \left| \tilde{e}_{iF}(0) \right| - \tilde{e}_{iF}^T(0) \left( P_{F2}(0) + P_{F3}(0) \right),
\end{aligned} \tag{59}$$

$$\phi_i(t) = \frac{1}{4} \alpha \left[ \mathrm{tr}\left( \tilde{W}_{iF}^T \Gamma_{iWF}^{-1} \tilde{W}_{iF} \right) + \mathrm{tr}\left( \tilde{\Lambda}_{iF}^T \Gamma_{i\Lambda F}^{-1} \tilde{\Lambda}_{iF} \right) \right], \tag{60}$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Define $d = [E_i^T \;\; \tilde{e}_{iF}^T \;\; \varpi_i^{1/2} \;\; \phi_i^{1/2}]$. The positive adaptation gains $\beta_1, \beta_2 \in \mathbb{R}$ are chosen to ensure $\varpi_i(t) \ge 0$, so we obtain

$$U_1(d) \le V_{iL}(\tilde{e}_{iF}, E_i) \le U_2(d), \tag{61}$$

where

$$U_1(d) = \frac{1}{2} \min(1, \gamma) \left\| d \right\|^2, \qquad U_2(d) = \max(1, \gamma) \left\| d \right\|^2. \tag{62}$$
The time derivative of (58) is

$$\dot{V}_{iL}(\tilde{e}_{iF}, E_i) = \nabla V_{iL}^T\, K\left[ \dot{E}_i^T \;\; \dot{\tilde{e}}_{iF}^T \;\; \tfrac{1}{2} \varpi_i^{-1/2} \dot{\varpi}_i \;\; \tfrac{1}{2} \phi_i^{-1/2} \dot{\phi}_i \right]^T, \tag{63}$$
where $K[\cdot]$ denotes a Filippov set [27]. Hence $\dot{V}_{iL}(\tilde{e}_{iF}, E_i)$ can be rewritten in the following form:

$$\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
={}& \left[ E_i^T \;\; \gamma \tilde{e}_{iF}^T \;\; 2\varpi_i^{1/2} \;\; 2\phi_i^{1/2} \right] K\left[ \dot{E}_i^T \;\; \dot{\tilde{e}}_{iF}^T \;\; \tfrac{1}{2} \varpi_i^{-1/2} \dot{\varpi}_i \;\; \tfrac{1}{2} \phi_i^{-1/2} \dot{\phi}_i \right]^T \\
\le{}& E_i^T \Big( \tfrac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \tfrac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} \\
& \quad + \alpha\, \dot{\tilde{e}}_{iF} - \tfrac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \tfrac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} - \gamma\, \tilde{e}_{iF} \\
& \quad - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) + \tfrac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} \\
& \quad + \tfrac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} - k E_i - \beta_1 K\left[ \mathrm{sat}(\tilde{e}_{iF}) \right] \Big) \\
& - E_i^T \Big( -\tfrac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \tfrac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} \\
& \quad + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1 K\left[ \mathrm{sat}(\tilde{e}_{iF}) \right] \Big) \\
& - \dot{\tilde{e}}_{iF}^T \Big( \tfrac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \tfrac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} \Big) \\
& + \gamma\, \tilde{e}_{iF}^T \left( E_i - \alpha\, \tilde{e}_{iF} \right) \\
& + \beta_2\, J_2\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right) \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \left\| \tilde{e}_{iF} \right\| \\
& - \tfrac{1}{2} \alpha\, \mathrm{tr}\left( \tilde{W}_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat{W}}_{iF} \right) - \tfrac{1}{2} \alpha\, \mathrm{tr}\left( \tilde{\Lambda}_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat{\Lambda}}_{iF} \right).
\end{aligned} \tag{64}$$
Substituting (53), (54), and (55) into (64), we obtain

$$\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
={}& E_i^T \left( P_{F1} + P_{F2} + P_{F3} - \beta_1 K\left[ \mathrm{sat}(\tilde{e}_{iF}) \right] - k E_i - \gamma\, \tilde{e}_{iF} \right) \\
& + \gamma\, \tilde{e}_{iF}^T \left( E_i - \alpha\, \tilde{e}_{iF} \right) - E_i^T \left( P_{F2} - \beta_1 K\left[ \mathrm{sat}(\tilde{e}_{iF}) \right] \right) - \dot{\tilde{e}}_{iF}^T P_{F3} \\
& + \beta_2\, J_2\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right) \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \left\| \tilde{e}_{iF} \right\| \\
& - \tfrac{1}{2} \alpha\, \mathrm{tr}\left( \tilde{W}_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat{W}}_{iF} \right) - \tfrac{1}{2} \alpha\, \mathrm{tr}\left( \tilde{\Lambda}_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat{\Lambda}}_{iF} \right) \\
={}& -\alpha \gamma\, \tilde{e}_{iF}^T \tilde{e}_{iF} - k E_i^T E_i + E_i^T P_{F1} + \left( E_i^T - \dot{\tilde{e}}_{iF}^T \right) P_{F3} \\
& + \beta_2\, J_2\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right) \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \left\| \tilde{e}_{iF} \right\| \\
& - \tfrac{1}{2} \alpha\, \mathrm{tr}\left( \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \right) - \tfrac{1}{2} \alpha\, \mathrm{tr}\left( \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \hat{W}_{iF}^T \hat{\kappa}'_{iF} \right) \\
\le{}& -k_1 \left\| \tilde{e}_{iF} \right\|^2 - k_2 \left\| E_i \right\|^2 + \frac{J_1\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right)^2}{4 k_3} \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|^2 \\
& + \frac{\beta_2^2\, J_2\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right)^2}{4 \alpha k_4} \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|^2,
\end{aligned} \tag{65}$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4 / \beta_2^2\}$, and $J(\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|)^2 = J_1(\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|)^2 + J_2(\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|)^2$, so the following conclusion can be obtained:

$$\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
&\le -k_{\min} \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|^2 + \frac{J\left( \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\| \right)^2 \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|^2}{4 \xi} \\
&\le -c \left\| \varphi_i(\tilde{e}_{iF}^T, E_i^T) \right\|^2.
\end{aligned} \tag{66}$$
Therefore, for a positive constant $c$, $-c \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2$ is a negative semidefinite function defined on the adjustable domain $D$ given by

$$D = \left\{ d(t) \;\middle|\; \left\| d \right\| \le J^{-1}\left( 2 \sqrt{k_{\min}\, \xi} \right) \right\}, \tag{67}$$
so that, by Lyapunov stability theory, the system is stable.

In order to make the subsystem of the time varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, in which a robust term compensates the neural network approximation errors. The robust control term is designed as

$$u_{irb} = \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}. \tag{68}$$
In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ satisfies

$$\begin{aligned}
N_{rb} \ge{}& \left[ \frac{\delta_{hi}^2}{2 n_1} + \frac{n_1 \left( -\varepsilon_{ia}(e_i) - \dfrac{R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)}{2} \right)^2}{2 n_2} + n_1 n_2 \left\| b_i(e_i) \right\|^2 \left( -\varepsilon_{ia}(e_i) - \frac{R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)}{2} \right)^2 \right] \\
& \cdot \frac{e_i^T e_i + \zeta}{2 n_1 n_2 \left\| b_i(e_i) \right\| e_i^T e_i} \\
\ge{}& \left[ n_1^2 \left( -\varepsilon_{ia}(e_i) - \frac{R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)}{2} \right)^2 + n_2 \delta_{hi}^2 + 2 n_1^2 n_2^2 \left\| b_i(e_i) \right\|^2 \left( -\varepsilon_{ia}(e_i) - \frac{R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)}{2} \right)^2 \right] \\
& \cdot \frac{e_i^T e_i + \zeta}{4 n_1^2 n_2^2 \left\| b_i(e_i) \right\| e_i^T e_i}.
\end{aligned} \tag{69}$$
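The two bounds in (69) can be checked against each other by bringing the first bracket to a common denominator. In the short derivation below, the shorthand $X_i$ is introduced only for brevity and is not the paper's notation; (69) is read with the factor $1/2$ applying to $R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)$.

```latex
X_i := -\varepsilon_{ia}(e_i) - \tfrac{1}{2} R^{-1} b_i(e_i)\, \nabla \varepsilon_{ic}(e_i),
\qquad
\frac{\delta_{hi}^2}{2 n_1} + \frac{n_1 X_i^2}{2 n_2} + n_1 n_2 \| b_i(e_i) \|^2 X_i^2
= \frac{n_2 \delta_{hi}^2 + n_1^2 X_i^2 + 2 n_1^2 n_2^2 \| b_i(e_i) \|^2 X_i^2}{2 n_1 n_2}.
```

Multiplying by $(e_i^T e_i + \zeta) / (2 n_1 n_2 \| b_i(e_i) \| e_i^T e_i)$ gives the denominator $4 n_1^2 n_2^2 \| b_i(e_i) \| e_i^T e_i$ of the second expression, so under this reading the two right-hand sides coincide.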
Therefore the global control law can be designed as follows:

$$u_{\mathrm{mix}} = \hat{u}_i + u_{irb} = -\frac{1}{2} R^{-1} b_i^T(e_i)\, S_{ia}(e_i)^T \hat{W}_{ia} + \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}. \tag{70}$$
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the actual output tracks the desired trajectory asymptotically.
Proof. Choose the Lyapunov function as follows:

$$V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2 n_1} \mathrm{tr}\left\{ \tilde{W}_{ic}^T \tilde{W}_{ic} \right\} + \frac{n_1}{2 n_2} \mathrm{tr}\left\{ \tilde{W}_{ia}^T \tilde{W}_{ia} \right\} + n_1 n_2 \left[ e_{iF}^T e_{iF} + \Xi \int_0^{\infty} r_i(e_i, u_{\mathrm{mix}})\, d\tau \right], \tag{71}$$
where $\Xi > 0$ is a parameter to be determined and $t \le \tau < \infty$. The time derivative of (71) is

$$\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
={}& \frac{1}{2 n_1} \mathrm{tr}\left\{ \tilde{W}_{ic}^T \dot{\tilde{W}}_{ic} \right\} + \frac{n_1}{2 n_2} \mathrm{tr}\left\{ \tilde{W}_{ia}^T \dot{\tilde{W}}_{ia} \right\} + n_1 n_2 \left[ e_{iF}^T \dot{e}_{iF} + \Xi\, r_i(e_i, u_{\mathrm{mix}}) \right] \\
={}& \frac{1}{2 n_1} \mathrm{tr}\left\{ \tilde{W}_{ic}^T \left( -n_1 L_i \left( L_i^T \tilde{W}_{ic} + \delta_{hi} \right) \right) \right\} \\
& + \frac{n_1}{2 n_2} \mathrm{tr}\left\{ \tilde{W}_{ia}^T \left[ -n_2 S_{ia}(e_i) \left( \tilde{W}_{ia}^T S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i) \nabla S_{ic}(e_i)^T \tilde{W}_{ic} - \frac{1}{2} R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i) \right) \right] \right\} \\
& + n_1 n_2 \left[ e_{iF}^T \left( W_{iF}^T \kappa(\Lambda_{iF}^T e_i) + \varepsilon_{iF}(e_i) + b_i(e_i)\, u_{\mathrm{mix}} \right) + \Xi \left( e_i^T Q_e e_i + u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \right) \right] \\
\le{}& -\left( L_{im}^2 - \frac{n_1}{2} L_{iM}^2 \right) \left\| \tilde{W}_{ic} \right\|^2 + \frac{1}{2 n_1} \delta_{hi}^2 - \left( n_1 S_{iam}^2 - \frac{3}{4} n_1 n_2 S_{iaM}^2 \right) \left\| \tilde{W}_{ia} \right\|^2 \\
& + \frac{n_1}{4 n_2} \left\| R^{-1} \right\|^2 \left\| b_i(e_i) \right\|^2 \left\| \nabla S_{icM}^2 \right\| \left\| \tilde{W}_{ic} \right\|^2 \\
& + \frac{n_1}{2 n_2} \left( \varepsilon_{ia}(e_i) + \frac{R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)}{2} \right)^T \left( \varepsilon_{ia}(e_i) + \frac{R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i)}{2} \right) \\
& + n_1 n_2 \left\| b_i(e_i) \right\|^2 \varepsilon_{ia}(e_i)^T \varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^2 \left\| b_i(e_i) \right\|^2 \left\| \tilde{W}_{ia} \right\|^2 \\
& + n_1 n_2\, \Xi\, \lambda_{\min}(Q_e) \left\| e_{iF} \right\|^2 + n_1 n_2 \left( \left\| b_i(e_i) \right\|^2 - \Xi\, \lambda_{\min}(R) \right) \left\| u_{\mathrm{mix}} \right\|^2.
\end{aligned} \tag{72}$$
If the following inequalities are satisfied:

$$\begin{aligned}
\lambda_{\min}(Q_e) \left\| e_{iF} \right\|_2^2 &\le e_{iF}^T Q_e e_{iF} \le \lambda_{\max}(Q_e) \left\| e_{iF} \right\|_2^2, \\
\lambda_{\min}(R) \left\| u_{\mathrm{mix}} \right\|_2^2 &\le u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \le \lambda_{\max}(R) \left\| u_{\mathrm{mix}} \right\|_2^2, \\
\Xi &> \frac{\left\| b_i(e_i) \right\|^2}{\lambda_{\min}(R)},
\end{aligned} \tag{73}$$
then $\dot{V}_{iu}(e_i, u_{\mathrm{mix}})$ can be further bounded as

$$\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
\le{}& -\left( L_{im}^2 - \frac{n_1}{2} L_{iM}^2 - \frac{n_1}{4 n_2} \left\| R^{-1} \right\|^2 \left\| b_i(e_i) \right\|^2 \nabla S_{icM}^2 \right) \left\| \tilde{W}_{ic} \right\|^2 \\
& - \left( n_1 S_{iam}^2 - \frac{3}{4} n_1 n_2 S_{iaM}^2 - n_1 n_2 S_{iaM}^2 \left\| b_i(e_i) \right\|^2 \right) \left\| \tilde{W}_{ia} \right\|^2 \\
& - n_1 n_2 \left( \left\| b_i(e_i) \right\|^2 + \Xi\, \lambda_{\min}(R) \right) \left\| u_{\mathrm{mix}} \right\|^2 - n_1 n_2\, \Xi\, \lambda_{\min}(Q_e) \left\| e_{iF} \right\|^2 \\
\le{}& -n_1 n_2\, \Xi\, \lambda_{\min}(Q_e) \left\| e_{iF} \right\|^2.
\end{aligned} \tag{74}$$
Therefore we conclude that $\dot{V}_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
4 Simulations
To verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing simulation results, two different configurations of the time varying externally constrained reconfigurable modular robot are used, as shown in Figures 3 and 4.
To facilitate the analysis, the two configurations are redrawn as the analytic charts shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.
The time varying constraint can be regarded as a column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are

$$\begin{aligned}
\Psi_A(q, t) &= L_1 \cos q_1 + L_2 \cos q_2 - \left[ L_3 + L_4 \cot \alpha(t) \right], \\
\Psi_B(q, t) &= L_1 + L_2 \cos q_2 - \left[ L_3 + L_4 \cot \alpha(t) \right].
\end{aligned} \tag{75}$$
In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis is defined as

$$\alpha(t) = 0.75\pi + 0.2 \sin \frac{t}{2}. \tag{76}$$
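The constraint functions in (75) can be evaluated directly once the constraint angle is known. The sketch below assumes that (76) reads $\alpha(t) = 0.75\pi + 0.2 \sin(t/2)$; the link lengths passed in the example are hypothetical, since the paper does not list numerical values for $L_1, \ldots, L_4$ in this section.

```python
import math


def alpha(t):
    """Constraint angle, assuming (76) reads 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def cot(x):
    """Cotangent; sin(alpha(t)) never vanishes because alpha stays within 0.75*pi +/- 0.2."""
    return math.cos(x) / math.sin(x)


def psi_a(q1, q2, t, L1, L2, L3, L4):
    """Constraint function of configuration A in (75)."""
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 * cot(alpha(t)))


def psi_b(q2, t, L1, L2, L3, L4):
    """Constraint function of configuration B in (75)."""
    return L1 + L2 * math.cos(q2) - (L3 + L4 * cot(alpha(t)))


# Hypothetical unit link lengths, for illustration only.
value_a = psi_a(0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
value_b = psi_b(0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
```

At $t = 0$ the angle is $0.75\pi$, so $\cot \alpha(0) = -1$, and with unit link lengths both functions evaluate to approximately 2.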
The initial joint positions are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial joint velocities are zero. The dynamic models of configurations A and B are as follows:

$$\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36 \cos(q_2) + 0.6066 & 0.18 \cos(q_2) + 0.1233 \\ 0.18 \cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166 \cos^2(q_2) & -0.06 \cos(q_2) \\ -0.06 \cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot{q}) &= \begin{bmatrix} -0.36 \sin(q_2)\, \dot{q}_2 & -0.18 \sin(q_2)\, \dot{q}_2 \\ 0.18 \sin(q_2) (\dot{q}_1 - \dot{q}_2) & 0.18 \sin(q_2)\, \dot{q}_1 \end{bmatrix}, \\
C_B(q, \dot{q}) &= \begin{bmatrix} 0.1166 \sin(2 q_2)\, \dot{q}_2 & 0.06 \sin(q_2)\, \dot{q}_2 \\ 0.06 \sin(q_2)\, \dot{q}_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88 \sin(q_1 + q_2) - 17.64 \sin(q_1) \\ -5.88 \sin(q_1 + q_2) \end{bmatrix}, \\
G_B(q) &= \begin{bmatrix} 0 \\ -5.88 \cos(q_2) \end{bmatrix}, \\
F_A(q, \dot{q}) &= \begin{bmatrix} \dot{q}_1 + 10 \sin(3 q_1) + 2\, \mathrm{sgn}(\dot{q}_1) \\ 1.2\, \dot{q}_2 + 5 \sin(2 q_2) + \mathrm{sgn}(\dot{q}_2) \end{bmatrix}, \\
F_B(q, \dot{q}) &= \begin{bmatrix} 0 \\ 1.5\, \dot{q}_2 + \sin(q_2) + 1.2\, \mathrm{sgn}(\dot{q}_2) \end{bmatrix}.
\end{aligned} \tag{77}$$
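As a quick sanity check of the model in (77), the inertia matrix of configuration A should be symmetric and positive definite for every joint angle. The sketch below assumes that the coefficients read 0.36, 0.6066, 0.18, 0.1233, 5.88, and 17.64 (the decimal points are an interpretation of the printed values); the function names are hypothetical.

```python
import math


def mass_matrix_a(q2):
    """Inertia matrix M_A(q) of configuration A, with coefficients as read from (77)."""
    c2 = math.cos(q2)
    return [[0.36 * c2 + 0.6066, 0.18 * c2 + 0.1233],
            [0.18 * c2 + 0.1233, 0.1233]]


def gravity_a(q1, q2):
    """Gravity vector G_A(q) of configuration A, with coefficients as read from (77)."""
    return [-5.88 * math.sin(q1 + q2) - 17.64 * math.sin(q1),
            -5.88 * math.sin(q1 + q2)]


def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > 0.0 and det > 0.0


# Sweep the second joint over one full revolution.
angles = [2.0 * math.pi * k / 360.0 for k in range(360)]
all_pd = all(is_positive_definite_2x2(mass_matrix_a(a)) for a in angles)
```

With these coefficients the determinant simplifies to approximately $0.0596 - 0.0324 \cos^2(q_2)$, which stays positive, so $M_A(q)$ is invertible along any trajectory.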
The desired trajectories of configurations A and B are as follows.

Configuration A:

$$\begin{aligned}
y_{1d} &= 0.5 \cos(t) + 0.2 \sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[ \frac{L_1 \sin(\alpha(t) - y_{1d}) - L_3 \sin(\alpha(t))}{L_2} \right] + \alpha(t).
\end{aligned} \tag{78}$$
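The desired trajectory of the second joint follows from that of the first joint through the constraint map $\Theta$ in (78). A minimal sketch is given below, assuming $\alpha(t) = 0.75\pi + 0.2 \sin(t/2)$ and reading the coefficients of $y_{1d}$ as 0.5 and 0.2. The link lengths are hypothetical and were picked only so that the argument of the arcsine stays inside $[-1, 1]$.

```python
import math


def alpha(t):
    """Constraint angle, assuming it reads 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def y1d(t):
    """Desired trajectory of joint 1 in configuration A."""
    return 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)


def theta(y1, t, L1, L2, L3):
    """Constraint map Theta(y1, t) giving the desired trajectory of joint 2."""
    a = alpha(t)
    arg = (L1 * math.sin(a - y1) - L3 * math.sin(a)) / L2
    return math.asin(arg) + a


# Hypothetical link lengths, for illustration only.
L1, L2, L3 = 0.3, 0.5, 0.2
samples = [theta(y1d(0.01 * k), 0.01 * k, L1, L2, L3) for k in range(1001)]
# With y1 = 0 the same map covers the case of a fixed first joint.
fixed_joint_value = theta(0.0, 0.0, 0.3, 0.5, 0.3)
```

When $y_{1d} = 0$ and $L_1 = L_3$, the arcsine term vanishes and the desired second joint angle coincides with $\alpha(t)$, which gives a simple check of the implementation.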
Figure 3: Configuration A for simulation.

Figure 4: Configuration B for simulation.
Configuration B:

$$\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[ \frac{L_1 \sin(\alpha(t)) - L_3 \sin(\alpha(t))}{L_2} \right] + \alpha(t).
\end{aligned} \tag{79}$$
Owing to the dynamic external constraints, the displacement of joint 1 in configuration B is zero.

To confirm that the adopted method can be applied to different configurations, and to verify the tracking performance of the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method for the desired subsystem trajectories, comparative simulations are carried out with the classic RBF neural network control method and with the ACI based decentralized reinforcement learning robust optimal tracking control method.
Figures 7 to 14 show the joint tracking and error curves obtained when the classic RBF neural network [4] is used to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories, because the classical neural network method requires a longer training and parameter adaptation process. The error curves in Figures 11 to 14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness and the tracking errors increase.

Figure 5: The analytic chart of configuration A (links $L_1$, $L_2$, $L_4$, distance $L_3$, joint angles $q_1$, $q_2$, and constraint angle $\alpha$ in the $X$-$Y$ plane).

Figure 6: The analytic chart of configuration B (links $L_1$, $L_2$, $L_4$, distance $L_3$, joint angles $q_1$, $q_2$, and constraint angle $\alpha$ in the $X$-$Y$ plane).
Figures 15 to 22 show the joint tracking and error curves obtained when ACI is adopted to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds under the proposed robust reinforcement learning optimal control method. This is due to the identifying ability of ACI, which identifies the uncertainties contained in the subsystems in a short time. The error curves in Figures 19 to 22 show that the tracking error is very small and converges to zero in a short time. Moreover, the robustness of the joint subsystems is evident when the proposed decentralized control method is adopted.

Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method fully satisfies the accessibility requirement of the reconfigurable modular robot. In addition, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are listed in Table 1.
Figure 7: Trajectory tracking curve of configuration A, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 8: Trajectory tracking curve of configuration A, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

| $k$ | $\alpha$ | $\upsilon$ | $\eta_{a1}$ | $\eta_{a2}$ | $\eta_c$ | $\beta_1$ | $\beta_2$ | $\gamma$ |
|---|---|---|---|---|---|---|---|---|
| 800 | 300 | 0.005 | 10 | 50 | 20 | 0.2 | 2 | 0.5 |
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. The joint variables track the desired trajectories within a very short time in both configurations, and the fluctuation of the error within its convergence range is minimal.
Figure 9: Trajectory tracking curve of configuration B, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).
Figure 10: Trajectory tracking curve of configuration B, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).
Figure 11: Tracking error curve of configuration A, joint 1, with RBF neural network (joint 1 error in rad versus time in s).
Figure 12: Tracking error curve of configuration A, joint 2, with RBF neural network (joint 2 error in rad versus time in s).
Figure 13: Tracking error curve of configuration B, joint 1, with RBF neural network (joint 1 error in rad versus time in s).
Figure 14: Tracking error curve of configuration B, joint 2, with RBF neural network (joint 2 error in rad versus time in s).
Figure 15: Trajectory tracking curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).
Figure 16: Trajectory tracking curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
5. Conclusions and Future Work
In this paper, combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. Moreover, this theory is used to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with the time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function is adopted to take the place of the traditional optimal value function; the optimal $Q$-function
Figure 17: Trajectory tracking curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).
Figure 18: Trajectory tracking curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
and the optimal control policy are approximated by critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two kinds of Lyapunov functions are designed to confirm the stability of ACI and the subsystem. Finally, in order to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.
In the future, more complex configurations of the time varying externally constrained reconfigurable modular robot can be included. Therefore, a decentralized controller with higher control and error precision will be considered.
Figure 19: Tracking error curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).
Figure 20: Tracking error curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).
Figure 21: Tracking error curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).
Figure 22: Tracking error curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).
Figure 23: 3D tip trajectory curve of configuration A with ACI (desired and actual trajectories).
Figure 24: 3D tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453-2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169-5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170-176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760-1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622-2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429-434, 2009.
[7] S. Richard and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097-1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32-50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259-292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937-942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851-1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313-1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226-2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1-6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58-70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947-955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214-219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241-3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490-2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1-10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180-2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73-82, 1987.
and $J_{\Psi}^{T}(q,t)f$ is the contact force generated by the contact between the end of the reconfigurable modular robot and the external constraints.
After introducing $m$ constraints for a robot that works in free space, because of the limitation of (1), the system loses $m$ degrees of freedom. Therefore the degrees of freedom of the robot change from $n$ to $(n-m)$, so that only $(n-m)$ independent joint displacements are needed to fully describe the constrained motion of the system.
Define
$$q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}, \quad q_1 \in R^{n-m}, \quad q_2 \in R^{m}. \tag{3}$$
Putting the equation above into (1), we get
$$\Psi\left(q_1, \Theta(q_1,t), t\right) = 0, \tag{4}$$
where
$$q_2 = \Theta(q_1,t). \tag{5}$$
Therefore (3) can be fully described by the joint displacement $q_1$ as follows:
$$q = \begin{bmatrix} q_1 \\ \Theta(q_1,t) \end{bmatrix}. \tag{6}$$
The derivative of (6) is
$$\dot q = \begin{bmatrix} \dot q_1 \\ \dfrac{\partial\Theta(q_1,t)}{\partial q_1}\dot q_1 + \dfrac{\partial\Theta(q_1,t)}{\partial t} \end{bmatrix} = \begin{bmatrix} I_{n-m} & 0 \\ \dfrac{\partial\Theta(q_1,t)}{\partial q_1} & I_m \end{bmatrix} \begin{bmatrix} \dot q_1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \dfrac{\partial\Theta(q_1,t)}{\partial t} \end{bmatrix} = T\dot\theta + H. \tag{7}$$
In (7),
$$T = \begin{bmatrix} I_{n-m} & 0 \\ \dfrac{\partial\Theta(q_1,t)}{\partial q_1} & I_m \end{bmatrix} \in R^{n\times n}, \quad \dot\theta = \begin{bmatrix} \dot q_1 \\ 0 \end{bmatrix} \in R^{n}, \quad H = \begin{bmatrix} 0 \\ \dfrac{\partial\Theta(q_1,t)}{\partial t} \end{bmatrix} \in R^{n}. \tag{8}$$
Therefore the second derivative of $q$ can be obtained by differentiating (7):
$$\ddot q = T\ddot\theta + \dot T\dot\theta + \dot H. \tag{9}$$
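As a sanity check of (7) and (8), the following minimal Python sketch evaluates $\dot q = T\dot\theta + H$ for a toy case with $n = 2$ and $m = 1$ and compares it with a finite-difference derivative of $q(t)$. The constraint map `theta_fn`, the trajectory `q_of_t`, and all numerical values are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import math

def theta_fn(q1, t):
    """Hypothetical constraint map q2 = Theta(q1, t), chosen only for illustration."""
    return math.sin(q1) + 0.5 * t

def constrained_velocity(q1, q1_dot, t):
    """Return q_dot = T * theta_dot + H for n = 2, m = 1, following (7) and (8)."""
    dtheta_dq1 = math.cos(q1)  # analytic partial derivative of Theta w.r.t. q1
    dtheta_dt = 0.5            # analytic partial derivative of Theta w.r.t. t
    T = [[1.0, 0.0], [dtheta_dq1, 1.0]]
    theta_dot = [q1_dot, 0.0]
    H = [0.0, dtheta_dt]
    return [sum(T[r][c] * theta_dot[c] for c in range(2)) + H[r] for r in range(2)]

def q_of_t(t):
    """Hypothetical motion of the independent coordinate, q1(t) = t^2."""
    q1 = t * t
    return [q1, theta_fn(q1, t)]

def finite_difference_velocity(t, h=1e-6):
    """Central-difference estimate of q_dot along q_of_t."""
    q_plus, q_minus = q_of_t(t + h), q_of_t(t - h)
    return [(a - b) / (2.0 * h) for a, b in zip(q_plus, q_minus)]

if __name__ == "__main__":
    t = 0.7
    print(constrained_velocity(t * t, 2.0 * t, t))
    print(finite_difference_velocity(t))
```

Both printed vectors should agree to within the finite-difference error, which illustrates that the dependent velocity $\dot q_2$ is fully determined by $\dot q_1$ and the explicit time dependence of the constraint.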
Putting (7) and (9) into (2), we get
$$u + J_{\Psi}^{T}(q,t)f = M(q)\left(T\ddot\theta + \dot T\dot\theta + \dot H\right) + C(q,\dot q)\left(T\dot\theta + H\right) + G(q) + F(q,\dot q). \tag{10}$$
Define
$$E = \begin{bmatrix} I_{(n-m)\times(n-m)} \\ 0_{m\times(n-m)} \end{bmatrix} \in R^{n\times(n-m)}. \tag{11}$$
Therefore
$$\dot\theta = \begin{bmatrix} \dot q_1 \\ 0 \end{bmatrix} = E\dot q_1. \tag{12}$$
So (2) can be decomposed into the following form:
$$\sum_{j=1}^{n} M_{ij}(q)\left[(TE\ddot q_1)_j + (\dot TE\dot q_1)_j + \dot H_j\right] + \sum_{j=1}^{n} C_{ij}(q,\dot q)\left[(TE\dot q_1)_j + H_j\right] + G_i(q) + F_i(q_i,\dot q_i) - f_i = u_i. \tag{13}$$
In the equation above, $(TE\ddot q_1)_j$, $(\dot TE\dot q_1)_j$, $(TE\dot q_1)_j$, and $H_j$ are the $j$th elements of $(TE\ddot q_1)$, $(\dot TE\dot q_1)$, $(TE\dot q_1)$, and $H$, respectively; $G_i(q)$, $F_i(q_i,\dot q_i)$, and $u_i$ are the $i$th elements of $G(q)$, $F(q,\dot q)$, and $u$; $f_i$ is the constraint force acting on the $i$th joint; and $M_{ij}(q)$ and $C_{ij}(q,\dot q)$ are the $ij$th elements of $M(q)$ and $C(q,\dot q)$, respectively. So, as shown in Figure 1, each subsystem dynamical model can be formulated in joint space as follows:
$$M_i(q_i)\ddot q_i + C_i(q_i,\dot q_i)\dot q_i + G_i(q_i) + F_i(q_i,\dot q_i) + Z_i(q,\dot q,\ddot q) = u_i, \tag{14}$$
$$\begin{aligned} Z_i(q,\dot q,\ddot q) ={}& \sum_{j=1,\, j\neq i}^{n} M_{ij}(q)\left[(TE\ddot q_1)_j + (\dot TE\dot q_1)_j + \dot H_j\right] + M_{ii}(q)\left[(TE\ddot q_1)_i + (\dot TE\dot q_1)_i + \dot H_i\right] - M_i(q_i)\ddot q_i \\ &+ \sum_{j=1,\, j\neq i}^{n} C_{ij}(q,\dot q)\left[(TE\dot q_1)_j + H_j\right] + C_{ii}(q,\dot q)\left[(TE\dot q_1)_i + H_i\right] - C_i(q_i,\dot q_i)\dot q_i + \left[G_i(q) - G_i(q_i)\right]. \end{aligned} \tag{15}$$
Let $x_i = [x_{i1}, x_{i2}]^T = [q_i, \dot q_i]^T$ for $i = 1, \ldots, n$; then (10) can be presented by the following state equation:
$$S_i:\ \begin{cases} \dot x_{i1} = x_{i2}, \\ \dot x_{i2} = -f(x_i,u_i) - h_i(q,\dot q,\ddot q) - f_i, \\ y_i = x_{i1}, \end{cases} \tag{16}$$
Figure 1: The architecture of the time varying constrained reconfigurable modular robot system. Each subsystem $i$ ($i = 1, \ldots, n$) obeys $M_i\ddot q_i + C_i\dot q_i + G_i + F_i + Z_i - f_i = u_i$ with interconnection term $Z_i(q,\dot q,\ddot q)$, and the overall robot dynamics are $\ddot q = M^{-1}\left(u - C\dot q - G - F + J_{\Psi}^{T}(q,t)f\right)$.
where $x_i$ is the state vector of subsystem $S_i$, $y_i$ is the output of subsystem $S_i$, and $h_i(q,\dot q,\ddot q)$ is the interconnection term of the subsystem. $f(x_i,u_i)$ and $h_i(q,\dot q,\ddot q)$ are defined as
$$\begin{aligned} f(x_i,u_i) &= M_i^{-1}(q_i)\left[C_i(q_i,\dot q_i)\dot q_i + G_i(q_i) + F_i(q_i,\dot q_i) - u_i\right], \\ h_i(q,\dot q,\ddot q) &= -M_i^{-1}(q_i)\,Z_i(q,\dot q,\ddot q). \end{aligned} \tag{17}$$
In response to the time varying constrained reconfigurable modular robot system, we need to design a decentralized robust optimal tracking control policy that makes each subsystem track the desired trajectory while the tracking error converges and remains bounded.
3. Decentralized Reinforcement Learning Robust Optimal Tracking Control Based on ACI and $Q$-Function
Assumption 1. The desired trajectory $y_{id}$, $\dot y_{id}$, $\ddot y_{id}$ and the input gain matrix $b_i(x_i)$ are bounded.
Then (16) can be transformed into the following form:
$$S_i:\ \begin{cases} \dot x_{i1} = x_{i2}, \\ \dot x_{i2} = -\left[F(x_i,u_i) + h_i(q,\dot q,\ddot q) + f_i\right] + b_i(x_i)u_i, \\ y_i = x_{i1}, \end{cases} \tag{18}$$
where $F(x_i,u_i) = f(x_i,u_i) + b_i(x_i)u_i$.
Assumption 2. The interconnection terms are bounded and satisfy the following inequality:
$$\left|h_i(q,\dot q,\ddot q)\right| \le \delta_{i0} + \sum_{j=1}^{n}\delta_{ij}\left(\left|s_{ij}\right|\right), \tag{19}$$
where $\delta_{i0} > 0$ is an unknown constant and $\delta_{ij}(|s_{ij}|) \ge 0$ is an unknown smooth Lipschitz function.
The trajectory tracking error of the $i$th joint subsystem can be defined as
$$e_i(t) = x_i - y_{id}. \tag{20}$$
With regard to the continuous time state equation of the subsystem in (18), with the nonlinear function and interconnection terms, the value function can generally be defined as
$$V_i^{u_i(e_i(t))}\left(e_i(t)\right) = \int_0^{\infty} r_i\left(e_i(t), u_i(e_i(t))\right)dt. \tag{21}$$
To simplify the notation, we write $e_i$, $u_i$ instead of $e_i(t)$, $u_i(e_i(t))$. Since the trajectory $y_{id}$ relies upon the control $u_i$ of the subsystem for updating, in order to avoid the infinite results obtained by using (21), we need to transform the value function into the following form:
$$V_i^{u_i}(e_i) = \int_0^{\infty} r_i\left(e_i(\tau), u_i(e_i(\tau))\right)d\tau, \quad t \le \tau < \infty. \tag{22}$$
Thus the optimal value function of the subsystem can be defined as follows:
$$V_i^{*}(e_i) = \min_{\substack{u_i \\ t \le \tau < \infty}} \int_0^{\infty} r_i\left(e_i(\tau), u_i(e_i(\tau))\right)d\tau. \tag{23}$$
Here $r_i(e_i,u_i)$ represents the reward function for the current state, given by
$$r_i(e_i,u_i) = e_i^{T}Q_e e_i + u_i^{T}Ru_i, \tag{24}$$
where $Q_e$ and $R$ are positive definite matrices.
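The quadratic reward (24) and a finite-horizon approximation of the accumulated cost in (22) can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `quad_form`, `reward`, and `accumulated_cost`, the trapezoidal rule, the truncated horizon, and the decaying test signal are assumptions introduced here, not part of the paper.

```python
import math

def quad_form(v, M):
    """Return v^T M v for a list vector v and a nested-list square matrix M."""
    n = len(v)
    return sum(v[r] * M[r][c] * v[c] for r in range(n) for c in range(n))

def reward(e, u, Qe, R):
    """Quadratic reward r = e^T Qe e + u^T R u, as in (24)."""
    return quad_form(e, Qe) + quad_form(u, R)

def accumulated_cost(e_traj, u_traj, Qe, R, dt):
    """Trapezoidal approximation of the reward integral over a finite horizon."""
    rewards = [reward(e, u, Qe, R) for e, u in zip(e_traj, u_traj)]
    return dt * (sum(rewards) - 0.5 * (rewards[0] + rewards[-1]))

if __name__ == "__main__":
    # Hypothetical scalar error e(tau) = exp(-tau) with zero control effort.
    dt, steps = 1e-3, 20001
    e_traj = [[math.exp(-k * dt)] for k in range(steps)]
    u_traj = [[0.0] for _ in range(steps)]
    print(accumulated_cost(e_traj, u_traj, [[1.0]], [[1.0]], dt))
```

For the decaying error used in the usage example, the exact infinite-horizon integral of $e^2$ is $1/2$, so the printed value should be close to 0.5; a larger weight in $R$ penalizes control effort more heavily relative to tracking error.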
Typically, recording the value of state-action pairs is more useful than recording the value of the state only, since the state-action pairs are predictions of the reward. Even if the reward value of a state is low, it does not mean that the value of the state-action pair is low too. If the state of the subsystem produces a higher reward over a period of time, then it can still get a higher state-action value. Therefore, from a long term perspective, defining a suitable state-action value function ($Q$-function) can make actions produce more rewards [21, 22].
According to (23) and (24), the continuous-time optimal $Q$-function can be defined as
$$\begin{aligned} Q_i^{*}(e_i,a_i,u_i) &= r_i(e_i,a_i,u_i) + V_i^{*}(e_i,u_i) \\ &= r_i(e_i,a_i,u_i) + \min_{\substack{u_i \\ t \le \tau < \infty}} \int_0^{\infty} r_i\left(e_i(\tau), u_i(e_i(\tau))\right)d\tau. \end{aligned} \tag{25}$$
Assumption 3. The partial derivatives of $Q_i^{*}$ and $r_i(e_i,a_i,u_i)$ exist and are continuous in the domain.
According to (18) and (24), by using the control policy $u_i$, the optimal $Q$-function satisfies the following Hamilton-Jacobi-Bellman equation [23]:
$$\begin{aligned} \mathrm{HJB}_i\left(e_i,u_i,\nabla Q_i^{*}\right) &= \min_{u_i(e_i)}\left[r_i(e_i,a_i,u_i) + \nabla Q_i^{*}\left(-F(e_i,u_i) - h_i(e,\dot e,\ddot e) - f_i + b_i(e_i)u_i\right)\right] \\ &= \min_{u_i(e_i)}\left[r_i(e_i,a_i,u_i) + \nabla Q_i^{*}\,\Phi_i(e_i,u_i)\right], \end{aligned} \tag{26}$$
where $\Phi_i(e_i,u_i) = -F(e_i,u_i) - h_i(e,\dot e,\ddot e) - f_i + b_i(e_i)u_i$ is the global uncertainty, including the unknown dynamics of the subsystem and the interconnection term, and $\nabla Q_i^{*} = \partial Q_i^{*}/\partial e_i$ is the gradient of the optimal $Q$-function.
Lemma 4 (see [24]). Considering the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14), in order to ensure that the minimum of the HJB equation (26) possesses a stationary point with respect to $u_i$, the optimal $Q$-function and the optimal control policy must satisfy the following conditions:
(1) $\partial\,\mathrm{HJB}_i(e_i,u_i,\nabla Q_i)/\partial u_i = 0$;
(2) $\partial^{2}\,\mathrm{HJB}_i(e_i,u_i,\nabla Q_i)/(\partial u_i\,\partial u_i^{T}) \ge 0$.
The necessary conditions above lead us to the following results.
(a) The bounded control policy can guarantee a local minimum of the HJB equation (26) and satisfy the constraints imposed on the control inputs.
(b) The Hessian matrix is positive-definite and the control policy $u_i$ can render the global minimum of the HJB equation.
(c) If an optimal algorithm exists, it is unique.
According to Lemma 4, if the reward function is smooth and the optimal control $u_i^{*}$ is adopted, then the HJB equation satisfies
$$\mathrm{HJB}_i^{*}\left(e_i,u_i^{*},\nabla Q_i^{*}\right) = \min_{u_i^{*}(e_i)}\left[r_i(e_i,a_i,u_i^{*}) + \nabla Q_i^{*}\,\Phi_i(e_i,u_i^{*})\right] = 0, \tag{27}$$
and the optimal control, obtained from the stationarity condition (1) of Lemma 4 with the reward (24), can be expressed as follows:
$$u_i^{*}(e_i) = \arg\min_{u_i^{*}}\left[\mathrm{HJB}_i^{*}\left(e_i,u_i^{*},\nabla Q_i^{*}\right)\right] = -\frac{1}{2}R^{-1}b_i^{T}(e_i)\frac{\partial Q_i^{*}(e_i,a_i,u_i)^{T}}{\partial e_i}. \tag{28}$$
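The stationarity condition of Lemma 4 can be illustrated on a scalar version of the bracket in (26). In the sketch below, the function names, the lumped term `drift` (standing in for $-F - h_i - f_i$), and all numbers are hypothetical; the only point is that, for a quadratic control penalty with $R > 0$, setting the derivative with respect to $u$ to zero gives $u^{*} = -\tfrac{1}{2}R^{-1}b\,\nabla Q$, and that this value is indeed a minimizer.

```python
def hamiltonian(u, e, grad_q, drift, b, q_e, r_weight):
    """Scalar bracket of (26): reward plus grad_q times (drift + b*u)."""
    return q_e * e * e + r_weight * u * u + grad_q * (drift + b * u)

def optimal_control(grad_q, b, r_weight):
    """Stationary point in u of the scalar bracket: u* = -(1/2) * b * grad_q / R."""
    return -0.5 * b * grad_q / r_weight

if __name__ == "__main__":
    # Hypothetical scalar values, chosen only for illustration.
    e, grad_q, drift, b, q_e, r_weight = 0.3, 3.0, -1.0, 2.0, 1.0, 1.5
    u_star = optimal_control(grad_q, b, r_weight)
    for u in (u_star - 0.1, u_star, u_star + 0.1):
        print(u, hamiltonian(u, e, grad_q, drift, b, q_e, r_weight))
```

Because the bracket is a convex quadratic in $u$ whenever $R > 0$, the middle line printed by the usage example has the smallest value, in line with condition (2) of Lemma 4.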
If the optimal $Q$-function $Q_i^{*}$ is continuously differentiable and known, the initial value is $Q_i^{*}(0) = 0$, and the optimal control policy $u_i^{*}(e_i)$ and the global uncertainty of the subsystem $\Phi_i(e_i,u_i^{*})$ are known, then the HJB equation in (27) holds and is solvable. However, in the actual situation $Q_i^{*}$ is not differentiable everywhere, and $u_i^{*}(e_i)$ and $\Phi_i(e_i,u_i^{*})$ are unknown. Therefore it is not feasible to solve the HJB equation by ordinary methods. In this paper we combine the action-critic-identifier (ACI) with an RBF neural network to estimate the optimal control policy, the optimal $Q$-function, and the global uncertainty of the subsystem. Action-NN is used to estimate $u_i^{*}(e_i)$, and the estimate is denoted as $\hat u_i(e_i)$; $Q_i^{*}$ is estimated by critic-NN and expressed as $\hat Q_i$; then we use the robust neural network identifier to identify $\Phi_i(e_i,u_i^{*})$, denoted as $\hat\Phi_i(e_i,u_i^{*})$. The block diagram of the ACI architecture is shown in Figure 2.
The estimated HJB equation can be expressed as follows:
$$\mathrm{HJB}_i^{*}\left(e_i,\hat u_i,\nabla\hat Q_i\right) = \min_{\hat u_i(e_i)}\left[r_i(e_i,a_i,\hat u_i) + \nabla\hat Q_i\,\hat\Phi_i(e_i,\hat u_i)\right]. \tag{29}$$
The identification error of the HJB equation above can be expressed as
$$\delta_{hi} = \mathrm{HJB}_i^{*}\left(e_i,\hat u_i,\nabla\hat Q_i\right) - \mathrm{HJB}_i^{*}\left(e_i,u_i^{*},\nabla Q_i^{*}\right). \tag{30}$$
A classic radial basis function neural network is proposed in [25], shown as (31):
$$N(x) = W^{*T}S(x) + \varepsilon(x), \tag{31}$$
Figure 2: The architecture of action-critic-identifier (blocks: action, critic, identifier, reward function, subsystem, and HJB error $\delta_{hi}$).
where $W^{*}$ denotes the ideal neural network weights and $\varepsilon(x)$ represents the estimation error. With a sufficient number of nodes, if the centers and widths of the nodes are chosen appropriately, then any continuous function can be approximated by the RBF-NN. Therefore the optimal $Q$-function and the optimal control policy can be expressed as follows:
$$\begin{aligned} Q_i^{*} &= W_i^{T}S_i(e_i) + \varepsilon_{ic}(e_i), \\ u_i^{*}(e_i) &= -\frac{1}{2}R^{-1}b_i^{T}(e_i)\left[S_i(e_i)^{T}W_i + \varepsilon_{ia}(e_i)\right], \end{aligned} \tag{32}$$
where $S_i(e_i) = [s_{i1}(e_i) \cdots s_{in}(e_i)]^{T}$ is the smooth basis function of the neural network, $W_i$ is the ideal unknown neural network weight, and $\varepsilon_{ic}(e_i)$ and $\varepsilon_{ia}(e_i)$ are the estimation errors. By using $\hat Q_i$ and $\hat u_i(e_i)$ to estimate $Q_i^{*}$ and $u_i^{*}(e_i)$, we get the following equations:
$$\hat Q_i = \hat W_{ic}^{T}S_{ic}(e_i), \tag{33}$$
$$\hat u_i(e_i) = -\frac{1}{2}R^{-1}b_i^{T}(e_i)S_{ia}(e_i)^{T}\hat W_{ia}. \tag{34}$$
In the equations above, $\hat W_{ic}(t)$ and $\hat W_{ia}(t)$ are the weights of critic-NN and action-NN. The estimation errors of the weights are
$$\tilde W_{ic}(t) = W_i - \hat W_{ic}(t), \tag{35}$$
$$\tilde W_{ia}(t) = W_i - \hat W_{ia}(t). \tag{36}$$
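To make the approximators in (31) and (33) concrete, the following sketch evaluates a weighted sum of radial basis functions for a scalar input. The Gaussian kernel, the centers, the width, and the weights are assumptions made here for illustration; the paper only specifies a generic smooth basis vector $S(\cdot)$.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian basis vector S(x) with one entry per center (Gaussian form is an assumption)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def rbf_output(weights, features):
    """Network output W^T S(x), the structure used in (31) and (33)."""
    return sum(w * s for w, s in zip(weights, features))

if __name__ == "__main__":
    centers = [-1.0, 0.0, 1.0]      # hypothetical node centers
    weights = [0.5, 2.0, -0.3]      # hypothetical critic weights
    for e in (-1.0, 0.0, 0.4):
        s = rbf_features(e, centers, width=0.5)
        print(e, rbf_output(weights, s))
```

Each basis function equals 1 at its own center and decays with distance, so the weight attached to a node mostly shapes the output near that node; this locality is why the centers and widths must cover the region visited by the tracking error.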
The update law of the weight for the critic-NN is a gradient descent algorithm, shown as follows:
$$\dot{\hat W}_{ic}(t) = -n_1 L_i\left(L_i^{T}\hat W_{ic} + e_i^{T}Q_e e_i + u_i^{T}Ru_i\right). \tag{37}$$
In the equation above, $n_1 > 0$ is the adaptive gain of the neural network, and $L_i$ and $l_i$ are defined as
$$L_i = \frac{l_i}{l_i^{T}l_i + 1}, \quad l_i = \nabla S_{ic}(e_i)\,e_i. \tag{38}$$
Therefore, according to the definition above, the following inequalities can be obtained:
$$L_{im} \le L_i \le L_{iM}, \quad S_{icm} \le S_{ic}(e_i) \le S_{icM}, \quad S_{iam} \le S_{ia}(e_i) \le S_{iaM}. \tag{39}$$
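The critic update (37) with the normalized regressor (38) can be sketched as a discrete-time iteration. The explicit Euler discretization, the step size `dt`, and the numbers in the usage example are assumptions introduced here; the paper states the law in continuous time only.

```python
def normalized_regressor(l):
    """L = l / (l^T l + 1), the normalized regressor of (38)."""
    denom = sum(v * v for v in l) + 1.0
    return [v / denom for v in l]

def critic_step(w_hat, l, reward_value, gain, dt):
    """One explicit Euler step of (37): w <- w - dt * gain * L * (L^T w + r)."""
    L = normalized_regressor(l)
    residual = sum(Lk * wk for Lk, wk in zip(L, w_hat)) + reward_value
    return [wk - dt * gain * Lk * residual for wk, Lk in zip(w_hat, L)]

if __name__ == "__main__":
    # Hypothetical regressor, reward, gain, and step size.
    w = [0.0, 0.0]
    for _ in range(3):
        w = critic_step(w, l=[1.0, 0.0], reward_value=2.0, gain=1.0, dt=0.1)
        print(w)
```

The normalization keeps the regressor bounded regardless of the size of $l_i$ (its Euclidean norm never exceeds $1/2$), which is consistent with the bounds on $L_i$ listed in (39).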
Combining (35) with (38), we get
$$\dot{\tilde W}_{ic}(t) = -n_1 L_i\left(L_i^{T}\tilde W_{ic} + \delta_{hi}\right). \tag{40}$$
The update law of the weight for the action-NN is developed by a gradient descent algorithm, expressed as follows:
$$\dot{\hat W}_{ia}(t) = -n_2 S_{ia}(e_i)\left(\hat W_{ia}^{T}S_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^{T}\hat W_{ic}\right)^{T}. \tag{41}$$
According to the estimation error of action-NN in (36), the optimal control $u_i^{*}(e_i)$ minimizes the optimal $Q$-function, and we get the following equation:
$$W_{ia}^{T}S_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^{T}W_{ic} + \frac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0. \tag{42}$$
Putting (41) into (42), we get
$$\dot{\tilde W}_{ia} = -n_2 S_{ia}(e_i)\left(\tilde W_{ia}^{T}S_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^{T}\tilde W_{ic} - \frac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i)\right). \tag{43}$$
After using critic-NN and action-NN to estimate 119876119894and
119894(119890
119894) we need to design a kind of robust RBF-NN identifier
to identify the nonlinear uncertainties of the subsystem HereΦ
119894(119890
119894
119894) can be expressed as follows
Φ119894(119890
119894
119894) = 119890
119894119865= 119882
119879
119894119865120581 (Λ
119879
119894119865119890119894119865) + 120576
119894119865(119890
119894119865) + 119887
119894(119890
119894)
119894
(44)
where 120581(sdot) means the basic function of neural network and119882
119894119865Λ
119894119865indicate the unknown ideal neural network weights
Equation (44) can be identified by using robust RBF-NNidentifier so we can get
Φ119894(119890
119894
119894) = 119890
119894119865= 119882
119879
119894119865120581119894119865+ 119887
119894(119890
119894)
119894+ 120583
119894 (45)
Here $\hat{\kappa}_{iF}$ denotes the estimated value of the basis function of the neural network, $\hat{W}_{iF}$ and $\hat{\Lambda}_{iF}$ are the estimated values of the neural network weights, and $\mu_i \in \mathbb{R}$ is the feedback error term, given as follows [26]:
$$
\begin{aligned}
\mu_i &= k \left( e_{iF}(t) - \hat{e}_{iF}(t) \right) - k \left( e_{iF}(0) - \hat{e}_{iF}(0) \right) + \vartheta \\
&= k \left( \tilde{e}_{iF}(t) - \tilde{e}_{iF}(0) \right) + \vartheta, \\
\dot{\vartheta} &= (k\alpha + \gamma)\, \tilde{e}_{iF} + \beta_1 \operatorname{sat}(\tilde{e}_{iF}),
\end{aligned} \tag{46}
$$
where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\operatorname{sat}(\cdot)$ is a saturation function. Therefore, with $\tilde{e}_{iF} = e_{iF} - \hat{e}_{iF}$, the state estimation error of the identifier-NN can be expressed as follows:
$$
\dot{\tilde{e}}_{iF} = \dot{e}_{iF} - \dot{\hat{e}}_{iF} = W_{iF}^T \kappa_{iF} - \hat{W}_{iF}^T \hat{\kappa}_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i. \tag{47}
$$
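The feedback error term in (46) can be illustrated with a small scalar sketch. The paper does not specify the shape of $\operatorname{sat}(\cdot)$, so the unit-slope clipping with a boundary layer `width` used here is an assumption, as are the function names; the default gains are the values listed in Table 1.

```python
def sat(x, width=1.0):
    """Saturation (assumed form): linear inside [-width, width], clipped to +/-1 outside."""
    if width <= 0:
        raise ValueError("width must be positive")
    return max(-1.0, min(1.0, x / width))


def theta_dot(e_tilde, k=800.0, alpha=300.0, gamma=0.5, beta1=0.2):
    """Right-hand side of the auxiliary dynamics in (46) for a scalar error e_tilde."""
    return (k * alpha + gamma) * e_tilde + beta1 * sat(e_tilde)


def mu(e_tilde_t, e_tilde_0, theta, k=800.0):
    """Feedback error term mu_i = k * (e_tilde(t) - e_tilde(0)) + theta from (46)."""
    return k * (e_tilde_t - e_tilde_0) + theta
```

In a simulation, `theta_dot` would be integrated numerically alongside the identifier state, and `mu` would be evaluated from the current and initial estimation errors.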
A filtered identification error is defined as follows:
$$
E_i = \dot{\tilde{e}}_{iF} + \alpha \tilde{e}_{iF}. \tag{48}
$$
The time derivative of the equation above is given by
$$
\begin{aligned}
\dot{E}_i ={}& W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} + \alpha \dot{\tilde{e}}_{iF} - k E_i - \gamma \tilde{e}_{iF} \\
& - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1 \operatorname{sat}(\tilde{e}_{iF}) - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF},
\end{aligned} \tag{49}
$$
where $\kappa'_{iF}$ and $\hat{\kappa}'_{iF}$ denote the derivatives of the basis function evaluated at the ideal and estimated arguments, respectively.
Here, the weights $\hat{W}_{iF}$ and $\hat{\Lambda}_{iF}$ of the identifier-NN can be updated by
$$
\begin{aligned}
\dot{\hat{W}}_{iF} &= \operatorname{proj}\left( \Gamma_{iWF}\, \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} \tilde{e}_{iF}^T \right), \\
\dot{\hat{\Lambda}}_{iF} &= \operatorname{proj}\left( \Gamma_{i\Lambda F}\, \dot{\hat{e}}_{iF} \tilde{e}_{iF}^T \hat{W}_{iF}^T \hat{\kappa}'_{iF} \right),
\end{aligned} \tag{50}
$$
where $\Gamma_{iWF}$ and $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. In order to analyze the convergence of the filtered identification error, the term $-\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}$ can be divided into the following form:
$$
\begin{aligned}
-\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}
={}& -\frac{1}{2} \hat{\kappa}'_{iF} \dot{\hat{e}}_{iF} \left[ (\Lambda_{iF}^T - \tilde{\Lambda}_{iF}^T)(W_{iF}^T - \tilde{W}_{iF}^T) + (W_{iF}^T - \tilde{W}_{iF}^T)(\Lambda_{iF}^T - \tilde{\Lambda}_{iF}^T) \right] \\
={}& \frac{1}{2} \hat{\kappa}'_{iF} \dot{\hat{e}}_{iF} \left[ \tilde{W}_{iF}^T (\Lambda_{iF}^T - \tilde{\Lambda}_{iF}^T) + (W_{iF}^T - \tilde{W}_{iF}^T) \tilde{\Lambda}_{iF}^T - W_{iF}^T (\Lambda_{iF}^T - \tilde{\Lambda}_{iF}^T) - (W_{iF}^T - \tilde{W}_{iF}^T) \Lambda_{iF}^T \right] \\
={}& \frac{1}{2} \hat{\kappa}'_{iF} \dot{\hat{e}}_{iF} \left[ \tilde{W}_{iF}^T (\Lambda_{iF}^T - \tilde{\Lambda}_{iF}^T) + (W_{iF}^T - \tilde{W}_{iF}^T) \tilde{\Lambda}_{iF}^T \right] \\
& - \frac{1}{2} \hat{\kappa}'_{iF} \dot{\hat{e}}_{iF} \left[ W_{iF}^T (\Lambda_{iF}^T - \tilde{\Lambda}_{iF}^T) + (W_{iF}^T - \tilde{W}_{iF}^T) \Lambda_{iF}^T \right] \\
={}& \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} \\
& - \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} \\
& + \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF},
\end{aligned} \tag{51}
$$
where $\tilde{W}_{iF}^T = W_{iF}^T - \hat{W}_{iF}^T$ and $\tilde{\Lambda}_{iF}^T = \Lambda_{iF}^T - \hat{\Lambda}_{iF}^T$. Substituting (51) into (49), (49) can be reduced to the following form:
$$
\dot{E}_i = P_{F1} + P_{F2} + P_{F3} - k E_i - \gamma \tilde{e}_{iF} - \beta_1 \operatorname{sat}(\tilde{e}_{iF}). \tag{52}
$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ can be expressed, respectively, as follows:
$$
P_{F1} = \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} + \alpha \dot{\tilde{e}}_{iF} - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF}, \tag{53}
$$
$$
P_{F2} = -\frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}), \tag{54}
$$
$$
P_{F3} = \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}. \tag{55}
$$
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are given by
$$
\begin{aligned}
\|P_{F1}\| &\le J_1\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right) \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|, \\
\|P_{F2}\| &\le \varsigma_1, \\
\|P_{F3}\| &\le \varsigma_2.
\end{aligned} \tag{56}
$$
Combining (53) and (54) with (55), we get
$$
\|\dot{P}_{F2} + \dot{P}_{F3}\| \le \varsigma_3 + \varsigma_4 J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right) \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|, \tag{57}
$$
where $\varphi_i(\tilde{e}_{iF}^T, E_i^T) = [\tilde{e}_{iF}^T \;\; E_i^T]^T$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\varsigma_i$ $(i = 1, 2, 3, 4)$ are computable positive constants.
Theorem 5. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14) and the state equation (18). If the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and is bounded.
Proof. Define the Lyapunov function as follows:
$$
V_{iL}(\tilde{e}_{iF}, E_i) = \frac{1}{2} E_i^T E_i + \frac{1}{2} \gamma\, \tilde{e}_{iF}^T \tilde{e}_{iF} + \varpi_i(t) + \phi_i(t). \tag{58}
$$
In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ can be expressed as follows:
$$
\begin{aligned}
\dot{\varpi}_i(t) &= -\left[ E_i^T \left( P_{F2} - \beta_1 \operatorname{sat}(\tilde{e}_{iF}) \right) + \dot{\tilde{e}}_{iF}^T P_{F3} - \beta_2 J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right) \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \, \|\tilde{e}_{iF}\| \right], \\
\varpi_i(0) &= \beta_1 |\tilde{e}_{iF}(0)| - \tilde{e}_{iF}^T(0) \left( P_{F2}(0) + P_{F3}(0) \right),
\end{aligned} \tag{59}
$$
$$
\phi_i(t) = \frac{1}{4} \alpha \left[ \operatorname{tr}\left( \tilde{W}_{iF}^T \Gamma_{iWF}^{-1} \tilde{W}_{iF} \right) + \operatorname{tr}\left( \tilde{\Lambda}_{iF}^T \Gamma_{i\Lambda F}^{-1} \tilde{\Lambda}_{iF} \right) \right], \tag{60}
$$
where $\operatorname{tr}(\cdot)$ represents the trace of a matrix. Defining $d = [E_i^T \;\; \tilde{e}_{iF}^T \;\; \varpi_i^{1/2} \;\; \phi_i^{1/2}]$, the positive adaptation gains $\beta_1, \beta_2 \in \mathbb{R}$ are chosen to ensure $\varpi_i(t) \ge 0$, so we get
$$
U_1(d) \le V_{iL}(\tilde{e}_{iF}, E_i) \le U_2(d), \tag{61}
$$
where
$$
U_1(d) = \frac{1}{2} \min(1, \gamma) \|d\|^2, \qquad U_2(d) = \max(1, \gamma) \|d\|^2. \tag{62}
$$
The time derivative of (58) is given as follows:
$$
\dot{V}_{iL}(\tilde{e}_{iF}, E_i) = \nabla V_{iL}^T K \left[ \dot{E}_i \;\; \dot{\tilde{e}}_{iF}^T \;\; \frac{1}{2 \varpi_i^{1/2}} \dot{\varpi}_i \;\; \frac{1}{2 \phi_i^{1/2}} \dot{\phi}_i \right]^T, \tag{63}
$$
where $K[\cdot]$ denotes a Filippov set [27]. Thus, $\dot{V}_{iL}(\tilde{e}_{iF}, E_i)$ can be rewritten in the following form:
$$
\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
={}& \left[ E_i^T \;\; \gamma \tilde{e}_{iF}^T \;\; 2\varpi_i^{1/2} \;\; 2\phi_i^{1/2} \right] K \left[ \dot{E}_i \;\; \dot{\tilde{e}}_{iF}^T \;\; \frac{1}{2 \varpi_i^{1/2}} \dot{\varpi}_i \;\; \frac{1}{2 \phi_i^{1/2}} \dot{\phi}_i \right]^T \\
\le{}& E_i^T \Big( \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} + \alpha \dot{\tilde{e}}_{iF} \\
& \quad - \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} - \gamma \tilde{e}_{iF} - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) \\
& \quad + \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} - k E_i - \beta_1 K[\operatorname{sat}(\tilde{e}_{iF})] \Big) \\
& - E_i^T \Big( -\frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1 K[\operatorname{sat}(\tilde{e}_{iF})] \Big) \\
& - \dot{\tilde{e}}_{iF}^T \Big( \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} \Big) + \gamma \tilde{e}_{iF}^T (E_i - \alpha \tilde{e}_{iF}) \\
& + \beta_2 J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right) \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \, \|\tilde{e}_{iF}\| \\
& - \frac{1}{2} \alpha \operatorname{tr}\left( \tilde{W}_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat{W}}_{iF} \right) - \frac{1}{2} \alpha \operatorname{tr}\left( \tilde{\Lambda}_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat{\Lambda}}_{iF} \right).
\end{aligned} \tag{64}
$$
Substituting (53), (54), and (55) into (64), we get
$$
\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
={}& E_i^T \left( P_{F1} + P_{F2} + P_{F3} - \beta_1 K[\operatorname{sat}(\tilde{e}_{iF})] - k E_i - \gamma \tilde{e}_{iF} \right) + \gamma \tilde{e}_{iF}^T (E_i - \alpha \tilde{e}_{iF}) \\
& - E_i^T \left( P_{F2} - \beta_1 K[\operatorname{sat}(\tilde{e}_{iF})] \right) - \dot{\tilde{e}}_{iF}^T P_{F3} \\
& + \beta_2 J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right) \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \, \|\tilde{e}_{iF}\| \\
& - \frac{1}{2} \alpha \operatorname{tr}\left( \tilde{W}_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat{W}}_{iF} \right) - \frac{1}{2} \alpha \operatorname{tr}\left( \tilde{\Lambda}_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat{\Lambda}}_{iF} \right) \\
={}& -\alpha \gamma \tilde{e}_{iF}^T \tilde{e}_{iF} - k E_i^T E_i + E_i^T P_{F1} + (E_i^T - \dot{\tilde{e}}_{iF}^T) P_{F3} \\
& + \beta_2 J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right) \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \, \|\tilde{e}_{iF}\| \\
& - \frac{1}{2} \alpha \operatorname{tr}\left( \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} \tilde{e}_{iF}^T \right) - \frac{1}{2} \alpha \operatorname{tr}\left( \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} \tilde{e}_{iF}^T \hat{W}_{iF}^T \hat{\kappa}'_{iF} \right) \\
\le{}& -k_1 \|\tilde{e}_{iF}\|^2 - k_2 \|E_i\|^2 + \frac{J_1\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right)^2}{4 k_3} \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2 \\
& + \frac{\beta_2^2 J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right)^2}{4 \alpha k_4} \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2,
\end{aligned} \tag{65}
$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4 / \beta_2^2\}$, and $J\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right)^2 = J_1\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right)^2 + J_2\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right)^2$, so the following conclusion can be obtained:
$$
\begin{aligned}
\dot{V}_{iL}(\tilde{e}_{iF}, E_i)
&\le -k_{\min} \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2 + \frac{J\left( \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\| \right)^2 \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2}{4 \xi} \\
&\le -c \, \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2.
\end{aligned} \tag{66}
$$
Therefore, for some positive constant $c$, $-c \, \|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\|^2$ is a negative semidefinite function which is defined in the adjustable domain $D$ expressed as follows:
$$
D = \left\{ d(t) \;\middle|\; \|d\| \le J^{-1}\left( 2\sqrt{k_{\min} \xi} \right) \right\}, \tag{67}
$$
so that Lyapunov stability theory shows that the system is stable. In order to make the subsystem of the time varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, using a robust term to compensate for the neural network approximation errors. The robust control term is designed as
$$
u_{irb} = \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}. \tag{68}
$$
In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ can be expressed as
$$
\begin{aligned}
N_{rb} \ge{}& \left[ \frac{\delta_{hi}^2}{2 n_1} + \frac{n_1 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \dfrac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2}{2 n_2} + n_1 n_2 \|b_i(e_i)\|^2 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2 \right] \\
& \cdot \frac{e_i^T e_i + \zeta}{2 n_1 n_2 \|b_i(e_i)\| \, e_i^T e_i} \\
\ge{}& \left[ n_1^2 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2 + n_2 \delta_{hi}^2 + 2 n_1^2 n_2^2 \|b_i(e_i)\|^2 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2 \right] \\
& \cdot \frac{e_i^T e_i + \zeta}{4 n_1^2 n_2^2 \|b_i(e_i)\| \, e_i^T e_i}.
\end{aligned} \tag{69}
$$
Therefore, the global control law can be designed as follows:
$$
u_{\mathrm{mix}} = u_i + u_{irb} = -\frac{1}{2} R^{-1} b_i^T(e_i) S_{ia}(e_i)^T \hat{W}_{ia} + \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}. \tag{70}
$$
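As an illustration of the robust term in (68), which is the second summand of (70), the following minimal sketch evaluates $u_{irb}$ for an error vector given as a plain list. The function name and the list representation are assumptions made for the example; choosing a value of `n_rb` that satisfies (69) is left to the caller.

```python
def robust_term(e, n_rb, zeta):
    """Robust control term u_rb = N_rb * e / (e^T e + zeta) from (68).

    e     -- tracking error as a list of floats
    n_rb  -- robust gain N_rb (assumed to be chosen according to (69))
    zeta  -- positive constant that keeps the denominator away from zero
    """
    if zeta <= 0:
        raise ValueError("zeta must be positive")
    denom = sum(x * x for x in e) + zeta
    return [n_rb * x / denom for x in e]
```

Because $\zeta > 0$, the term is well defined at zero error, and since $e_i^T e_i + \zeta \ge 2\sqrt{\zeta}\,\|e_i\|$ its norm never exceeds $N_{rb} / (2\sqrt{\zeta})$.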
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the desired trajectory can be tracked asymptotically by the actual output.
Proof. Design the Lyapunov function as follows:
$$
V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2 n_1} \operatorname{tr}\left\{ \tilde{W}_{ic}^T \tilde{W}_{ic} \right\} + \frac{n_1}{2 n_2} \operatorname{tr}\left\{ \tilde{W}_{ia}^T \tilde{W}_{ia} \right\} + n_1 n_2 \left[ e_{iF}^T e_{iF} + \Xi \int_0^{\infty} r_i(e_i, u_{\mathrm{mix}}) \, d\tau \right], \tag{71}
$$
where $\Xi > 0$ is the undetermined parameter and $t \le \tau < \infty$. The time derivative of (71) is given as follows:
$$
\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
={}& \frac{1}{2 n_1} \operatorname{tr}\left\{ \tilde{W}_{ic}^T \dot{\tilde{W}}_{ic} \right\} + \frac{n_1}{2 n_2} \operatorname{tr}\left\{ \tilde{W}_{ia}^T \dot{\tilde{W}}_{ia} \right\} + n_1 n_2 \left[ e_{iF}^T \dot{e}_{iF} + \Xi\, r_i(e_i, u_{\mathrm{mix}}) \right] \\
={}& \frac{1}{2 n_1} \operatorname{tr}\left\{ \tilde{W}_{ic}^T \left( -n_1 L_i \left( L_i^T \tilde{W}_{ic} + \delta_{hi} \right) \right) \right\} \\
& + \frac{n_1}{2 n_2} \operatorname{tr}\left\{ \tilde{W}_{ia}^T \left[ -n_2 S_{ia}(e_i) \left( \tilde{W}_{ia}^T S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i) \nabla S_{ic}(e_i)^T \tilde{W}_{ic} - \frac{1}{2} R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i) \right) \right] \right\} \\
& + n_1 n_2 \left[ e_{iF}^T \left( W_{iF}^T \kappa(\Lambda_{iF}^T e_i) + \varepsilon_{iF}(e_i) + b_i(e_i) \hat{u}_{\mathrm{mix}} \right) + \Xi \left( e_i^T Q_e e_i + u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \right) \right] \\
\le{}& -\left( L_{im}^2 - \frac{n_1}{2} L_{iM}^2 \right) \|\tilde{W}_{ic}\|^2 + \frac{1}{2 n_1} \delta_{hi}^2 - \left( n_1 S_{iam}^2 - \frac{3}{4} n_1 n_2 S_{iaM}^2 \right) \|\tilde{W}_{ia}\|^2 \\
& + \frac{n_1}{4 n_2} \|R^{-1}\|^2 \|b_i(e_i)\|^2 \|\nabla S_{icM}^2\| \, \|\tilde{W}_{ic}\|^2 \\
& + \frac{n_1}{2 n_2} \left( \varepsilon_{ia}(e_i) + R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^T \cdot \left( \varepsilon_{ia}(e_i) + R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right) \\
& + n_1 n_2 \|b_i(e_i)\|^2 \varepsilon_{ia}(e_i)^T \varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^2 \|b_i(e_i)\|^2 \|\tilde{W}_{ia}\|^2 \\
& + n_1 n_2 \Xi \lambda_{\min}(Q_e) \|e_{iF}\|^2 + n_1 n_2 \left( \|b_i(e_i)\|^2 - \Xi \lambda_{\min}(R) \right) \|u_{\mathrm{mix}}\|^2.
\end{aligned} \tag{72}
$$
If the following inequalities are satisfied:
$$
\begin{aligned}
\lambda_{\min}(Q_e) \|e_{iF}\|_2^2 &\le e_{iF}^T Q_e e_{iF} \le \lambda_{\max}(Q_e) \|e_{iF}\|_2^2, \\
\lambda_{\min}(R) \|u_{\mathrm{mix}}\|_2^2 &\le u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \le \lambda_{\max}(R) \|u_{\mathrm{mix}}\|_2^2, \\
\Xi &> \frac{\|b_i(e_i)\|^2}{\lambda_{\min}(R)},
\end{aligned} \tag{73}
$$
then $\dot{V}_{iu}(e_i, u_{\mathrm{mix}})$ can be further transformed as
$$
\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
\le{}& -\left( L_{im}^2 - \frac{n_1}{2} L_{iM}^2 - \frac{n_1}{4 n_2} \|R^{-1}\|^2 \|b_i(e_i)\|^2 \|\nabla S_{icM}^2\| \right) \|\tilde{W}_{ic}\|^2 \\
& - \left( n_1 S_{iam}^2 - \frac{3}{4} n_1 n_2 S_{iaM}^2 - n_1 n_2 S_{iaM}^2 \|b_i(e_i)\|^2 \right) \|\tilde{W}_{ia}\|^2 \\
& - n_1 n_2 \left( \|b_i(e_i)\|^2 + \Xi \lambda_{\min}(R) \right) \|u_{\mathrm{mix}}\|^2 - n_1 n_2 \Xi \lambda_{\min}(Q_e) \|e_{iF}\|^2 \\
\le{}& -n_1 n_2 \Xi \lambda_{\min}(Q_e) \|e_{iF}\|^2.
\end{aligned} \tag{74}
$$
Therefore, we can conclude that $\dot{V}_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
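The gain condition on $\Xi$ in (73) is easy to check numerically once $\lambda_{\min}(R)$ is known. The sketch below assumes, purely for illustration, that $R$ is a symmetric $2 \times 2$ matrix so that its smallest eigenvalue has a closed form; the helper names are hypothetical and not taken from the paper.

```python
import math


def min_eig_sym2(a, b, d):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius


def xi_lower_bound(b_norm, lam_min_r):
    """Lower bound ||b_i(e_i)||^2 / lambda_min(R) that Xi must exceed in (73)."""
    if lam_min_r <= 0:
        raise ValueError("R must be positive definite (lambda_min > 0)")
    return b_norm ** 2 / lam_min_r


def xi_is_admissible(xi, b_norm, lam_min_r):
    """True when Xi strictly exceeds the bound in (73)."""
    return xi > xi_lower_bound(b_norm, lam_min_r)
```

For example, with $R = \operatorname{diag}(2, 3)$ and $\|b_i(e_i)\| = 2$, the bound is $4 / 2 = 2$, so any $\Xi > 2$ satisfies the third inequality in (73).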
4. Simulations
In order to verify the validity of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing simulation results, two different configurations of the time varying externally constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.
To facilitate the analysis of the configurations above, we transform them into analytic charts, which are shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.
The time varying constraint can be defined as a kind of column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:
$$
\begin{aligned}
\Psi_A(q, t) &= L_1 \cos q_1 + L_2 \cos q_2 - \left[ L_3 + L_4 \cot \alpha(t) \right], \\
\Psi_B(q, t) &= L_1 + L_2 \cos q_2 - \left[ L_3 + L_4 \cot \alpha(t) \right].
\end{aligned} \tag{75}
$$
In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis is defined as follows:
$$
\alpha(t) = 0.75\pi + 0.2 \sin \frac{t}{2}. \tag{76}
$$
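The constraint functions in (75) and the angle in (76) can be evaluated with the following minimal sketch. The link lengths $L_1$ to $L_4$ are not listed in this section, so they are passed in as arguments (`l1` to `l4`, hypothetical names); the angle is read as $0.75\pi + 0.2\sin(t/2)$.

```python
import math


def alpha(t):
    """Constraint angle alpha(t) from (76), in radians."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def cot(x):
    """Cotangent; alpha(t) stays inside (0, pi), so sin(x) is nonzero there."""
    return math.cos(x) / math.sin(x)


def psi_a(q1, q2, t, l1, l2, l3, l4):
    """Constraint function of configuration A from (75)."""
    return l1 * math.cos(q1) + l2 * math.cos(q2) - (l3 + l4 * cot(alpha(t)))


def psi_b(q2, t, l1, l2, l3, l4):
    """Constraint function of configuration B from (75)."""
    return l1 + l2 * math.cos(q2) - (l3 + l4 * cot(alpha(t)))
```

A joint configuration satisfies the constraint at time $t$ when the corresponding function evaluates to zero. Note that $\Psi_A$ reduces to $\Psi_B$ when $q_1 = 0$.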
The initial joint positions are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial joint velocities are zero. The dynamic models of configurations A and B are as follows:
$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36 \cos(q_2) + 0.6066 & 0.18 \cos(q_2) + 0.1233 \\ 0.18 \cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166 \cos^2(q_2) & -0.06 \cos(q_2) \\ -0.06 \cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot{q}) &= \begin{bmatrix} -0.36 \sin(q_2)\, \dot{q}_2 & -0.18 \sin(q_2)\, \dot{q}_2 \\ 0.18 \sin(q_2) (\dot{q}_1 - \dot{q}_2) & 0.18 \sin(q_2)\, \dot{q}_1 \end{bmatrix}, \\
C_B(q, \dot{q}) &= \begin{bmatrix} 0.1166 \sin(2 q_2)\, \dot{q}_2 & 0.06 \sin(q_2)\, \dot{q}_2 \\ 0.06 \sin(q_2)\, \dot{q}_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88 \sin(q_1 + q_2) - 17.64 \sin(q_1) \\ -5.88 \sin(q_1 + q_2) \end{bmatrix}, \\
G_B(q) &= \begin{bmatrix} 0 \\ -5.88 \cos(q_2) \end{bmatrix}, \\
F_A(q, \dot{q}) &= \begin{bmatrix} \dot{q}_1 + 10 \sin(3 q_1) + 2 \operatorname{sgn}(\dot{q}_1) \\ 1.2\, \dot{q}_2 + 5 \sin(2 q_2) + \operatorname{sgn}(\dot{q}_2) \end{bmatrix}, \\
F_B(q, \dot{q}) &= \begin{bmatrix} 0 \\ 1.5\, \dot{q}_2 + \sin(q_2) + 1.2 \operatorname{sgn}(\dot{q}_2) \end{bmatrix}.
\end{aligned} \tag{77}
$$
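As a quick sanity check on the inertia matrices in (77), the sketch below evaluates $M_A$ and $M_B$ and tests that each is symmetric and positive definite, which a physically meaningful inertia matrix should be. The coefficients are those in (77) as read with decimal points restored; the function names are illustrative only.

```python
import math


def m_a(q2):
    """Inertia matrix M_A(q) of configuration A from (77); it depends only on q2."""
    c = math.cos(q2)
    return [[0.36 * c + 0.6066, 0.18 * c + 0.1233],
            [0.18 * c + 0.1233, 0.1233]]


def m_b(q2):
    """Inertia matrix M_B(q) of configuration B from (77); it depends only on q2."""
    c = math.cos(q2)
    return [[0.17 - 0.1166 * c * c, -0.06 * c],
            [-0.06 * c, 0.1233]]


def is_symmetric_positive_definite(m):
    """Sylvester's criterion for a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return b == c and a > 0 and a * d - b * c > 0
```

With these coefficients, $\det M_A = 0.0596 - 0.0324 \cos^2 q_2$ (approximately), so both leading minors stay positive for every $q_2$.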
Figure 3: Configuration A for simulation.
Figure 4: Configuration B for simulation.

The desired trajectories of configurations A and B are as follows.
Configuration A:
$$
\begin{aligned}
y_{1d} &= 0.5 \cos(t) + 0.2 \sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[ \frac{L_1 \sin(\alpha(t) - y_{1d}) - L_3 \sin(\alpha(t))}{L_2} \right] + \alpha(t).
\end{aligned} \tag{78}
$$
Configuration B:
$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[ \frac{L_1 \sin(\alpha(t)) - L_3 \sin(\alpha(t))}{L_2} \right] + \alpha(t).
\end{aligned} \tag{79}
$$
Due to the dynamic external constraint, the variation of joint 1 in configuration B is zero.
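The desired trajectories in (78) and (79) can be generated as in the sketch below. It assumes $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$ from (76) and takes the link lengths as arguments, since their numerical values are not given in this section; the function names and the explicit domain check on the arcsine argument are additions for the example.

```python
import math


def alpha(t):
    """Constraint angle alpha(t) from (76), in radians."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def y1d_a(t):
    """Desired trajectory of joint 1 in configuration A from (78)."""
    return 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)


def y2d(y1d, t, l1, l2, l3):
    """Desired trajectory of joint 2, Theta(y1d, t), shared by (78) and (79)."""
    a = alpha(t)
    arg = (l1 * math.sin(a - y1d) - l3 * math.sin(a)) / l2
    if not -1.0 <= arg <= 1.0:
        raise ValueError("link lengths make the constraint unreachable at this time")
    return math.asin(arg) + a


def desired_a(t, l1, l2, l3):
    """(y1d, y2d) for configuration A."""
    y1 = y1d_a(t)
    return y1, y2d(y1, t, l1, l2, l3)


def desired_b(t, l1, l2, l3):
    """(y1d, y2d) for configuration B, where joint 1 is held at zero."""
    return 0.0, y2d(0.0, t, l1, l2, l3)
```

Setting $y_{1d} = 0$ in the configuration A expression recovers (79), which is why both configurations share the same `y2d` helper.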
In order to confirm that the adopted method can be applied to different configurations, and to verify the tracking performance of the subsystems with respect to the desired trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part using the classic RBF neural network control method and the ACI based decentralized reinforcement learning robust optimal tracking control method, respectively.
Figures 7 to 14 show the joint tracking and error curves obtained by using the classic RBF neural network [4] to compensate for the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training process and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when the time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system cannot exhibit good robustness and the tracking errors become larger than before.

Figure 5: The analytic chart of configuration A.
Figure 6: The analytic chart of configuration B.
Figures 15 to 22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems can track the desired trajectories within 0.5 seconds by using the proposed robust reinforcement learning optimal control method. This is due to the excellent identifying ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. In addition, when the proposed decentralized control method is adopted for the joint subsystems, robustness is exhibited.
Figures 23 and 24 show the 3D tip trajectory curves obtained by using the ACI algorithm. These two figures show that the proposed decentralized control method can fully satisfy the accessibility requirement of the reconfigurable modular robot. In addition, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 8: Trajectory tracking curve of configuration A joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

Parameter   $k$    $\alpha$   $\upsilon$   $\eta_{a1}$   $\eta_{a2}$   $\eta_c$   $\beta_1$   $\beta_2$   $\gamma$
Value       800    300        0.005        10            50            20         0.2         2           0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. Moreover, the joint variables track the desired trajectory within a very short time in different configurations, and the fluctuation of the error convergence range is minimal.
Figure 9: Trajectory tracking curve of configuration B joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 10: Trajectory tracking curve of configuration B joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Figure 11: Tracking error curve of configuration A joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 12: Tracking error curve of configuration A joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 13: Tracking error curve of configuration B joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 14: Tracking error curve of configuration B joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 15: Trajectory tracking curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 16: Trajectory tracking curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
5. Conclusions and Future Work
In this paper, by combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. This theory is used to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with the time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function is adopted in place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and the subsystem. Finally, in order to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.

Figure 17: Trajectory tracking curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 18: Trajectory tracking curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
In the future, more complex configurations of time varying externally constrained reconfigurable modular robots can be included, and a decentralized controller with higher control and error precision will be considered.
Figure 19: Tracking error curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 error in rad versus time in s).
Figure 20: Tracking error curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 error in rad versus time in s).
Figure 21: Tracking error curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 error in rad versus time in s).
Figure 22: Tracking error curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 error in rad versus time in s).
Figure 23: 3D-tip trajectory curve of configuration A with ACI (axes x, y, z; desired and actual trajectories).
Figure 24: 3D-tip trajectory curve of configuration B with ACI (axes x, y, z; desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Figure 1: The architecture of the time varying constrained reconfigurable modular robot system (block diagram: subsystems $1, \ldots, i, \ldots, n$, each with dynamics $M_i\ddot{q}_i + C_i\dot{q}_i + G_i + F_i + Z_i - f_i = u_i$ and coupled through the interconnection terms $Z_i(q,\dot{q},\ddot{q})$).
where $x_i$ is the state vector of subsystem $S_i$, $y_i$ is the output of subsystem $S_i$, and $h_i(q,\dot{q},\ddot{q})$ is the interconnection term of the subsystem. $f(x_i,u_i)$ and $h_i(q,\dot{q},\ddot{q})$ can be defined as
$$\begin{aligned} f(x_i,u_i) &= M_i^{-1}(q_i)\left[C_i(q_i,\dot{q}_i)\dot{q}_i + G_i(q_i) + F_i(q_i,\dot{q}_i) - u_i\right], \\ h_i(q,\dot{q},\ddot{q}) &= -M_i^{-1}(q_i)\,Z_i(q,\dot{q},\ddot{q}). \end{aligned} \tag{17}$$
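To make (17) concrete, the following minimal sketch evaluates the two terms for a single joint whose inertia, Coriolis, gravity, friction, and interconnection quantities are scalars. The function names `f_term` and `h_term` and all numeric values are hypothetical and are not taken from the paper.

```python
def f_term(M_i, C_i, G_i, F_i, dq_i, u_i):
    """f(x_i, u_i) = M_i^{-1} [C_i * dq_i + G_i + F_i - u_i] for a scalar joint."""
    return (C_i * dq_i + G_i + F_i - u_i) / M_i


def h_term(M_i, Z_i):
    """h_i = -M_i^{-1} Z_i: the interconnection term acting on joint i."""
    return -Z_i / M_i


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the two formulas.
    print(f_term(M_i=2.0, C_i=0.5, G_i=1.0, F_i=0.2, dq_i=0.4, u_i=1.0))
    print(h_term(M_i=2.0, Z_i=0.3))
```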
For the time varying constrained reconfigurable modular robot system, we need to design a decentralized robust optimal tracking control policy such that each subsystem tracks its desired trajectory and the tracking error is convergent and bounded.
3. Decentralized Reinforcement Learning Robust Optimal Tracking Control Based on ACI and $Q$-Function
Assumption 1. The desired trajectory $y_{id}$, $\dot{y}_{id}$, $\ddot{y}_{id}$ and the input gain matrix $b_i(x_i)$ are bounded.
Then (16) can be transformed into the following form:
$$S_i:\ \begin{cases} \dot{x}_{i1} = x_{i2}, \\ \dot{x}_{i2} = -\left[F(x_i,u_i) + h_i(q,\dot{q},\ddot{q}) + f_i\right] + b_i(x_i)\,u_i, \\ y_i = x_{i1}, \end{cases} \tag{18}$$
where $F(x_i,u_i) = f(x_i,u_i) + b_i(x_i)\,u_i$.
Assumption 2. The interconnection terms are bounded and satisfy the following inequality:
$$\left|h_i(q,\dot{q},\ddot{q})\right| \le \delta_{i0} + \sum_{j=1}^{n} \delta_{ij}\left(\left|s_{ij}\right|\right), \tag{19}$$
where $\delta_{i0} > 0$ is an unknown constant and $\delta_{ij}(|s_{ij}|) \ge 0$ is an unknown smooth Lipschitz function.
The trajectory tracking error of the joint subsystem $S_i$ can be defined as
$$e_i(t) = x_i - y_{id}. \tag{20}$$
For the continuous-time state equation of the subsystem in (18), with the nonlinear function and interconnection terms, the value function can generally be defined as
$$V_i^{u_i(e_i(t))}\left(e_i(t)\right) = \int_0^{\infty} r_i\left(e_i(t), u_i(e_i(t))\right) dt. \tag{21}$$
To simplify the notation, we write $e_i$, $u_i$ instead of $e_i(t)$, $u_i(e_i(t))$. Since the trajectory $y_{id}$ relies upon the subsystem control $u_i$ for updating, (21) may yield an infinite result; to avoid this, we transform the value function into the following form:
$$V_i^{u_i}(e_i) = \int_0^{\infty} r_i\left(e_i(\tau), u_i(e_i(\tau))\right) d\tau, \quad t \le \tau < \infty. \tag{22}$$
Thus the optimal value function of the subsystem can be defined as follows:
$$V_i^{*}(e_i) = \min_{\substack{u_i \\ t \le \tau < \infty}} \int_0^{\infty} r_i\left(e_i(\tau), u_i(e_i(\tau))\right) d\tau. \tag{23}$$
Here $r_i(e_i,u_i)$ represents the reward function for the current state, given by
$$r_i(e_i,u_i) = e_i^T Q_e e_i + u_i^T R\,u_i, \tag{24}$$
where $Q_e$ and $R$ are positive definite matrices.
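The quadratic reward in (24) can be evaluated directly. The sketch below uses plain Python lists for vectors and nested lists for matrices; the helper names `quad_form` and `reward` and the example weights are illustrative assumptions, not values from the paper.

```python
def quad_form(x, A, y):
    """Return x^T A y for list-based vectors x, y and a nested-list matrix A."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def reward(e, u, Q_e, R):
    """r_i(e_i, u_i) = e_i^T Q_e e_i + u_i^T R u_i, as in (24)."""
    return quad_form(e, Q_e, e) + quad_form(u, R, u)


if __name__ == "__main__":
    # Hypothetical tracking error (position, velocity) and scalar control.
    e = [0.1, -0.2]
    u = [0.5]
    Q_e = [[10.0, 0.0], [0.0, 1.0]]  # assumed positive definite weight
    R = [[0.1]]                      # assumed positive control weight
    print(reward(e, u, Q_e, R))
```

A larger $Q_e$ penalizes tracking error more heavily, while a larger $R$ penalizes control effort.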
Typically, recording the value of state-action pairs is more useful than recording the value of the state only, since the state-action pairs are predictions of the reward. Even if the reward value of a state is low, this does not mean that the value of the state-action pair is low too. If the state of the subsystem produces a higher reward over a period of time, then it can still obtain a higher state-action value. Therefore, from a long-term perspective, defining a suitable state-action value function ($Q$-function) can make actions produce more rewards [21, 22].
According to (23) and (24), the continuous-time optimal $Q$-function can be defined as
$$\begin{aligned} Q_i^{*}(e_i,a_i,u_i) &= r_i(e_i,a_i,u_i) + V_i^{*}(e_i,u_i) \\ &= r_i(e_i,a_i,u_i) + \min_{\substack{u_i \\ t \le \tau < \infty}} \int_0^{\infty} r_i\left(e_i(\tau), u_i(e_i(\tau))\right) d\tau. \end{aligned} \tag{25}$$
Assumption 3. The partial derivatives of $Q_i^{*}$ and $r_i(e_i,a_i,u_i)$ exist and are continuous in the domain.
According to (18) and (24), by using the control policy $u_i$, the optimal $Q$-function satisfies the following Hamilton-Jacobi-Bellman equation [23]:
$$\begin{aligned} \mathrm{HJB}_i(e_i,u_i,\nabla Q_i^{*}) &= \min_{u_i(e_i)} \left[ r_i(e_i,a_i,u_i) + \nabla Q_i^{*}\left(-F(e_i,u_i) - h_i(e,\dot{e},\ddot{e}) - f_i + b_i(e_i)\,u_i\right) \right] \\ &= \min_{u_i(e_i)} \left[ r_i(e_i,a_i,u_i) + \nabla Q_i^{*}\,\Phi_i(e_i,u_i) \right], \end{aligned} \tag{26}$$
where $\Phi_i(e_i,u_i) = -F(e_i,u_i) - h_i(e,\dot{e},\ddot{e}) - f_i + b_i(e_i)\,u_i$ denotes the global uncertainty, including the unknown dynamics of the subsystem and the interconnection term, and $\nabla Q_i^{*} = \partial Q_i^{*}/\partial e_i$ denotes the gradient of the optimal $Q$-function.
Lemma 4 (see [24]). Considering the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14), in order to ensure that the minimum of the HJB equation (26) possesses a stationary point with respect to $u_i$, the optimal $Q$-function and the optimal control policy must satisfy the following conditions:
(1) $\partial\,\mathrm{HJB}_i(e_i,u_i,\nabla Q_i)/\partial u_i = 0$;
(2) $\partial^2\,\mathrm{HJB}_i(e_i,u_i,\nabla Q_i)/(\partial u_i \times \partial u_i^T) \ge 0$.
The necessary conditions above lead to the following results.
(a) The bounded control policy can guarantee a local minimum of the HJB equation (26) and satisfy the constraints imposed on the control inputs.
(b) The Hessian matrix is positive definite and the control policy $u_i$ can render the global minimum of the HJB equation.
(c) If an optimal algorithm exists, it is unique.
According to Lemma 4, if the reward function is smooth and the optimal control $u_i^{*}$ is adopted, then the HJB equation satisfies
$$\mathrm{HJB}_i^{*}(e_i,u_i^{*},\nabla Q_i^{*}) = \min_{u_i^{*}(e_i)} \left[ r_i(e_i,a_i,u_i^{*}) + \nabla Q_i^{*}\,\Phi_i(e_i,u_i^{*}) \right] = 0. \tag{27}$$
The optimal control can be expressed as follows:
$$u_i^{*}(e_i) = \arg\min_{u_i^{*}} \left[\mathrm{HJB}_i^{*}(e_i,u_i^{*},\nabla Q_i^{*})\right] = -\frac{1}{2} R^{-1} b_i^T(e_i) \frac{\partial Q_i^{*}(e_i,a_i,u_i)^T}{\partial e_i}. \tag{28}$$
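As a sanity check on the stationarity condition of Lemma 4, the sketch below computes the minimizing control for a scalar input and compares the bracketed HJB expression at nearby controls. It assumes the sign convention $u_i^{*} = -\frac{1}{2}R^{-1}b_i^T \nabla Q_i^{*T}$ used in (32) and (34), a scalar weight $R > 0$, and a control-independent drift vector; the names `optimal_control`, `hamiltonian`, and `drift` are hypothetical.

```python
def optimal_control(grad_q, b, R):
    """u* = -(1/2) R^{-1} b^T grad_q for a scalar input with weight R > 0."""
    return -0.5 * sum(bk * gk for bk, gk in zip(b, grad_q)) / R


def hamiltonian(u, e_cost, grad_q, drift, b, R):
    """Bracket of the HJB equation: r + grad_q . (drift + b u).

    e_cost is the state part e^T Q_e e of the reward; drift is an assumed
    control-independent part of the dynamics.
    """
    phi = [d + bk * u for d, bk in zip(drift, b)]
    return e_cost + R * u * u + sum(g * p for g, p in zip(grad_q, phi))


if __name__ == "__main__":
    grad_q, b, R = [1.0, 2.0], [0.0, 1.0], 0.5  # hypothetical values
    u_star = optimal_control(grad_q, b, R)
    for u in (u_star - 0.1, u_star, u_star + 0.1):
        print(u, hamiltonian(u, 0.0, grad_q, [0.0, 0.0], b, R))
```

Because the bracket is quadratic in $u_i$ with positive curvature $R$, the stationary point is the unique minimizer, which is what conditions (1) and (2) of Lemma 4 express.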
If the optimal $Q$-function $Q_i^{*}$ is continuously differentiable and known, with initial value $Q_i^{*}(0) = 0$, and if the optimal control policy $u_i^{*}(e_i)$ and the global uncertainty of the subsystem $\Phi_i(e_i,u_i^{*})$ are known, then the HJB equation in (27) holds and is solvable. However, in practice $Q_i^{*}$ is not differentiable everywhere, and $u_i^{*}(e_i)$ and $\Phi_i(e_i,u_i^{*})$ are unknown. Therefore it is not feasible to solve the HJB equation by the average method. In this paper we combine the action-critic-identifier (ACI) with an RBF neural network to estimate the optimal control policy, the optimal $Q$-function, and the global uncertainty of the subsystem. The action-NN is used to estimate $u_i^{*}(e_i)$, and its output is denoted as $\hat{u}_i(e_i)$; $Q_i^{*}$ is estimated by the critic-NN and expressed as $\hat{Q}_i$; then we use the robust neural network identifier to identify $\Phi_i(e_i,u_i^{*})$, denoted as $\hat{\Phi}_i(e_i,u_i^{*})$. The block diagram of the ACI architecture is shown in Figure 2.
The estimated HJB equation can be expressed as follows:
$$\widehat{\mathrm{HJB}}_i^{*}(e_i,\hat{u}_i,\nabla\hat{Q}_i) = \min_{\hat{u}_i(e_i)} \left[ r_i(e_i,a_i,\hat{u}_i) + \nabla\hat{Q}_i\,\hat{\Phi}_i(e_i,\hat{u}_i) \right]. \tag{29}$$
The identification error of the HJB equation above can be expressed as
$$\delta_{hi} = \widehat{\mathrm{HJB}}_i^{*}(e_i,\hat{u}_i,\nabla\hat{Q}_i) - \mathrm{HJB}_i^{*}(e_i,u_i^{*},\nabla Q_i^{*}). \tag{30}$$
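Since (27) states that the optimal HJB term equals zero, the error in (30) reduces to the estimated bracket alone. The sketch below computes that residual under the assumption that the bracket in (29) is evaluated at the current action-NN output rather than minimized explicitly; the function name `hjb_residual` and the numbers are hypothetical.

```python
def hjb_residual(r, grad_q_hat, phi_hat):
    """delta_hi = r + grad_q_hat . phi_hat, using HJB* = 0 from (27).

    r          : reward evaluated at (e_i, u_hat_i)
    grad_q_hat : gradient of the critic estimate with respect to e_i
    phi_hat    : identifier estimate of the global uncertainty
    """
    return r + sum(g * p for g, p in zip(grad_q_hat, phi_hat))


if __name__ == "__main__":
    # Hypothetical values for one time instant.
    print(hjb_residual(0.165, [1.0, 2.0], [-0.2, 0.05]))
```

A residual of zero would mean the critic, action, and identifier estimates jointly satisfy the HJB equation at that state.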
A classic radial basis function neural network is proposed in [25] and is given by
$$N(x) = W^{*T} S(x) + \varepsilon(x), \tag{31}$$
Figure 2: The architecture of action-critic-identifier (blocks: action, critic, identifier, reward function, subsystem, and HJB error $\delta_{hi}$).
where $W^{*}$ denotes the ideal neural network weights and $\varepsilon(x)$ represents the estimation error. With a sufficient number of nodes, and with the centers and widths of the nodes chosen appropriately, any continuous function can be approximated by the RBF-NN. Therefore the optimal $Q$-function and the optimal control policy can be expressed as follows:
$$\begin{aligned} Q_i^{*} &= W_i^T S_i(e_i) + \varepsilon_{ic}(e_i), \\ u_i^{*}(e_i) &= -\frac{1}{2} R^{-1} b_i^T(e_i) \left[ S_i(e_i)^T W_i + \varepsilon_{ia}(e_i) \right], \end{aligned} \tag{32}$$
where $S_i(e_i) = [s_{i1}(e_i) \cdots s_{in}(e_i)]^T$ denotes the smooth basis function of the neural network, $W_i$ denotes the ideal unknown neural network weight, and $\varepsilon_{ic}(e_i)$ and $\varepsilon_{ia}(e_i)$ are the estimation errors. By using $\hat{Q}_i$ and $\hat{u}_i(e_i)$ to estimate $Q_i^{*}$ and $u_i^{*}(e_i)$, we obtain the following equations:
$$\hat{Q}_i = \hat{W}_{ic}^T S_{ic}(e_i), \tag{33}$$
$$\hat{u}_i(e_i) = -\frac{1}{2} R^{-1} b_i^T(e_i)\, S_{ia}(e_i)^T \hat{W}_{ia}. \tag{34}$$
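The estimates (33) and (34) can be sketched for a single joint as follows. The Gaussian form of the basis functions, the scalar input gain `b`, the scalar weight `R`, and all function names are assumptions made for illustration; the paper only states that the node centers and widths must be chosen appropriately.

```python
import math


def rbf_features(e, centers, width):
    """Gaussian basis vector S(e): one entry per center (assumed Gaussian form)."""
    return [math.exp(-sum((ei - ci) ** 2 for ei, ci in zip(e, c))
                     / (2.0 * width ** 2))
            for c in centers]


def critic_estimate(w_c, s_c):
    """Q_hat = W_hat_ic^T S_ic(e_i), as in (33)."""
    return sum(w * s for w, s in zip(w_c, s_c))


def action_estimate(w_a, s_a, b, R):
    """u_hat = -(1/2) R^{-1} b S_ia^T W_hat_ia for scalar b and R, as in (34)."""
    return -0.5 * b * sum(w * s for w, s in zip(w_a, s_a)) / R


if __name__ == "__main__":
    centers = [[0.0, 0.0], [1.0, 0.0]]  # hypothetical node centers
    s = rbf_features([0.0, 0.0], centers, width=1.0)
    print(critic_estimate([2.0, 1.0], s))
    print(action_estimate([1.0, 0.0], s, b=2.0, R=0.5))
```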
In the equations above, $\hat{W}_{ic}(t)$ and $\hat{W}_{ia}(t)$ denote the weights of the critic-NN and the action-NN. The estimation errors of the weights are
$$\tilde{W}_{ic}(t) = W_i - \hat{W}_{ic}(t), \tag{35}$$
$$\tilde{W}_{ia}(t) = W_i - \hat{W}_{ia}(t). \tag{36}$$
The update law of the weight for the critic-NN is a gradient descent algorithm, given by
$$\dot{\hat{W}}_{ic}(t) = -n_1 L_i \left( L_i^T \hat{W}_{ic} + e_i^T Q_e e_i + u_i^T R\,u_i \right). \tag{37}$$
In the equation above, $n_1 > 0$ is the adaptive gain of the neural network, and $L_i$ and $l_i$ are defined as
$$L_i = \frac{l_i}{l_i^T l_i + 1}, \qquad l_i = \nabla S_{ic}(e_i)\,\dot{e}_i. \tag{38}$$
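A discrete-time sketch of the critic update (37) with the normalized regressor (38) is given below. It assumes a forward-Euler step of size `dt` and takes $l_i$ as the basis gradient multiplied by the time derivative of the tracking error; both are assumptions of this sketch, and the function names are hypothetical.

```python
def normalized_regressor(grad_s, e_dot):
    """L = l / (l^T l + 1) with l = grad_s @ e_dot, following (38).

    grad_s is the N x n Jacobian of the critic basis (nested lists);
    e_dot is the n-vector error derivative (assumed available).
    """
    l = [sum(row[k] * e_dot[k] for k in range(len(e_dot))) for row in grad_s]
    denom = sum(v * v for v in l) + 1.0
    return [v / denom for v in l]


def critic_step(w_c, L, r, n1, dt):
    """One Euler step of (37): w_c <- w_c - dt * n1 * L * (L^T w_c + r)."""
    err = sum(Lk * wk for Lk, wk in zip(L, w_c)) + r
    return [wk - dt * n1 * Lk * err for wk, Lk in zip(w_c, L)]


if __name__ == "__main__":
    # Hypothetical Jacobian, error derivative, reward, gain, and step size.
    L = normalized_regressor([[1.0, 0.0], [0.0, 2.0]], [1.0, 0.5])
    print(critic_step([0.0, 0.0], L, r=0.3, n1=1.0, dt=0.1))
```

The normalization by $l_i^T l_i + 1$ keeps the regressor bounded regardless of the size of $l_i$, which is what makes bounds of the form (39) available.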
Therefore, according to the definitions above, the following inequalities can be obtained:
$$\begin{aligned} L_{im} &\le L_i \le L_{iM}, \\ S_{icm} &\le S_{ic}(e_i) \le S_{icM}, \\ S_{iam} &\le S_{ia}(e_i) \le S_{iaM}. \end{aligned} \tag{39}$$
Combining (35) with (38), we obtain
$$\dot{\tilde{W}}_{ic}(t) = -n_1 L_i \left( L_i^T \tilde{W}_{ic} + \delta_{hi} \right). \tag{40}$$
The update law of the weight for the action-NN is also a gradient descent algorithm, expressed as follows:
$$\dot{\hat{W}}_{ia}(t) = -n_2 S_{ia}(e_i) \left( \hat{W}_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\,\nabla S_{ic}(e_i)^T \hat{W}_{ic} \right)^T. \tag{41}$$
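The action-NN update (41) can be sketched in the same way. The version below assumes a scalar control input, so that the bracketed term is a scalar, treats `b` as a row vector of input gains, and uses a forward-Euler step; the names `dot` and `actor_step` are hypothetical.

```python
def dot(x, y):
    """Inner product of two equally long lists."""
    return sum(a * c for a, c in zip(x, y))


def actor_step(w_a, s_a, w_c, grad_s_c, b, R, n2, dt):
    """One Euler step of (41) for a scalar control input.

    grad_s_c is the N x n Jacobian of the critic basis; grad_s_c^T w_c is the
    critic's gradient estimate with respect to the error state.
    """
    grad_q = [sum(grad_s_c[k][j] * w_c[k] for k in range(len(w_c)))
              for j in range(len(b))]
    err = dot(w_a, s_a) + 0.5 * dot(b, grad_q) / R
    return [wa - dt * n2 * sk * err for wa, sk in zip(w_a, s_a)]


if __name__ == "__main__":
    # Hypothetical weights, basis values, gains, and step size.
    print(actor_step([0.0, 0.0], [1.0, 0.5], [1.0, 1.0],
                     [[1.0, 0.0], [0.0, 2.0]], [0.0, 1.0],
                     R=0.5, n2=1.0, dt=0.1))
```

The update drives the action-NN output toward the control implied by the current critic gradient, so the two networks are coupled through `w_c`.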
According to the estimation error of the action-NN in (36), the optimal control $u_i^{*}(e_i)$ minimizes the optimal $Q$-function, and we obtain the following equation:
$$W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\,\nabla S_{ic}(e_i)^T W_{ic} + \frac{1}{2} R^{-1} b_i(e_i)\,\nabla\varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0. \tag{42}$$
Substituting (41) into (42), we obtain
$$\dot{\tilde{W}}_{ia} = -n_2 S_{ia}(e_i) \left( \tilde{W}_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\,\nabla S_{ic}(e_i)^T \tilde{W}_{ic} - \frac{1}{2} R^{-1} b_i(e_i)\,\nabla\varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i) \right). \tag{43}$$
After using the critic-NN and the action-NN to obtain the estimates $\hat{Q}_i$ and $\hat{u}_i(e_i)$, we need to design a robust RBF-NN identifier to identify the nonlinear uncertainties of the subsystem. Here $\Phi_i(e_i,\hat{u}_i)$ can be expressed as follows:
$$\Phi_i(e_i,\hat{u}_i) = \dot{e}_{iF} = W_{iF}^T \kappa\left(\Lambda_{iF}^T e_{iF}\right) + \varepsilon_{iF}(e_{iF}) + b_i(e_i)\,\hat{u}_i, \tag{44}$$
where $\kappa(\cdot)$ denotes the basis function of the neural network and $W_{iF}$, $\Lambda_{iF}$ denote the unknown ideal neural network weights. Equation (44) can be identified by the robust RBF-NN identifier, which gives
$$\hat{\Phi}_i(e_i,\hat{u}_i) = \dot{\hat{e}}_{iF} = \hat{W}_{iF}^T \hat{\kappa}_{iF} + b_i(e_i)\,\hat{u}_i + \mu_i. \tag{45}$$
Here $\hat{\kappa}_{iF}$ denotes the estimated value of the basis function of the neural network, $\hat{W}_{iF}$ and $\hat{\Lambda}_{iF}$ are the estimated neural network weights, and $\mu_i \in \mathbb{R}$ denotes the feedback error term, given by [26]
$$\begin{aligned} \mu_i &= k\left(e_{iF}(t) - \hat{e}_{iF}(t)\right) - k\left(e_{iF}(0) - \hat{e}_{iF}(0)\right) + \vartheta \\ &= k\left(\tilde{e}_{iF}(t) - \tilde{e}_{iF}(0)\right) + \vartheta, \\ \dot{\vartheta} &= (k\alpha + \gamma)\,\tilde{e}_{iF} + \beta_1\,\mathrm{sat}\left(\tilde{e}_{iF}\right), \end{aligned} \tag{46}$$
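The feedback term in (46) can be sketched for a scalar identification error as below. This assumes the RISE-type structure of [26], in which $\vartheta$ is obtained by integrating $(k\alpha + \gamma)\tilde{e}_{iF} + \beta_1\,\mathrm{sat}(\tilde{e}_{iF})$; the unit saturation limit, the forward-Euler integration, and the names `sat` and `feedback_step` are assumptions of this sketch.

```python
def sat(x, limit=1.0):
    """Saturation function: clip x to [-limit, limit] (unit limit assumed)."""
    return max(-limit, min(limit, x))


def feedback_step(e_tilde, e_tilde0, theta, k, alpha, gamma, beta1, dt):
    """Advance theta by one Euler step and return (mu, theta_next).

    e_tilde  : current identification error e_iF - e_hat_iF
    e_tilde0 : identification error at t = 0
    """
    theta_next = theta + dt * ((k * alpha + gamma) * e_tilde
                               + beta1 * sat(e_tilde))
    mu = k * (e_tilde - e_tilde0) + theta_next
    return mu, theta_next


if __name__ == "__main__":
    # Hypothetical gains and errors.
    print(feedback_step(0.2, 0.5, 0.0, k=2.0, alpha=1.0,
                        gamma=1.0, beta1=0.5, dt=0.1))
```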
where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\mathrm{sat}(\cdot)$ is a saturation function. Therefore the state estimation error dynamics of the identifier-NN can be expressed as follows:
$$\dot{\tilde{e}}_{iF} = \dot{e}_{iF} - \dot{\hat{e}}_{iF} = W_{iF}^T \kappa_{iF} - \hat{W}_{iF}^T \hat{\kappa}_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i. \tag{47}$$
A filtered identification error is defined as follows:
$$E_i = \dot{\tilde{e}}_{iF} + \alpha\,\tilde{e}_{iF}. \tag{48}$$
The time derivative of the equation above is
$$\begin{aligned} \dot{E}_i ={}& W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} + \alpha\,\dot{\tilde{e}}_{iF} - k E_i - \gamma\,\tilde{e}_{iF} \\ & - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1\,\mathrm{sat}\left(\tilde{e}_{iF}\right) - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}. \end{aligned} \tag{49}$$
Here the weights $\hat{W}_{iF}$, $\hat{\Lambda}_{iF}$ of the identification-NN can be updated by
$$\begin{aligned} \dot{\hat{W}}_{iF} &= \mathrm{proj}\left( \Gamma_{iWF}\, \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \right), \\ \dot{\hat{\Lambda}}_{iF} &= \mathrm{proj}\left( \Gamma_{i\Lambda F}\, \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \hat{W}_{iF}^T \hat{\kappa}'_{iF} \right), \end{aligned} \tag{50}$$
where $\Gamma_{iWF}$, $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. In order to analyze the convergence of the filtered identification error, the term $-\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}$ in (49) can be divided into the following form:
$$\begin{aligned} -\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} ={}& \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} \\ & - \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} \\ & + \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}, \end{aligned} \tag{51}$$
where $\tilde{W}_{iF}^T = W_{iF}^T - \hat{W}_{iF}^T$ and $\tilde{\Lambda}_{iF}^T = \Lambda_{iF}^T - \hat{\Lambda}_{iF}^T$. Substituting (51) into (49), (49) reduces to the following form:
$$\dot{E}_i = P_{F1} + P_{F2} + P_{F3} - k E_i - \gamma\,\tilde{e}_{iF} - \beta_1\,\mathrm{sat}\left(\tilde{e}_{iF}\right). \tag{52}$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ can be expressed, respectively, as follows:
$$P_{F1} = \frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF} + \alpha\,\dot{\tilde{e}}_{iF} - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF}, \tag{53}$$
$$P_{F2} = -\frac{1}{2} W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}), \tag{54}$$
$$P_{F3} = \frac{1}{2} \tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2} \hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}. \tag{55}$$
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, $P_{F3}$ are
$$\begin{aligned} \left\|P_{F1}\right\| &\le J_1\left(\left\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\right\|\right) \left\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\right\|, \\ \left\|P_{F2}\right\| &\le \zeta_1, \\ \left\|P_{F3}\right\| &\le \zeta_2. \end{aligned} \tag{56}$$
Combining (53) and (54) with (55), we obtain
$$\left\|\dot{P}_{F2} + \dot{P}_{F3}\right\| \le \zeta_3 + \zeta_4 J_2\left(\left\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\right\|\right) \left\|\varphi_i(\tilde{e}_{iF}^T, E_i^T)\right\|, \tag{57}$$
where $\varphi_i(\tilde{e}_{iF}^T, E_i^T) = [\tilde{e}_{iF}^T\ E_i^T]^T$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\zeta_i$ ($i = 1, 2, 3, 4$) are computable positive constants.
Theorem 5. Considering the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14) and the state equation (18), if the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error is convergent and bounded.
Proof. Define the Lyapunov function as follows:

$$
V_{iL}(e_{iF}, E_i) = \frac{1}{2} E_i^T E_i + \frac{1}{2}\gamma e_{iF}^T e_{iF} + \varpi_i(t) + \phi_i(t)
\tag{58}
$$

In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ are given by

$$
\begin{aligned}
\dot\varpi_i(t) &= -\Big[E_i^T\big(P_{F2} - \beta_1 \operatorname{sat}(e_{iF})\big) + \dot e_{iF}^T P_{F3} - \beta_2 J_2\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)\,\|\varphi_i(e_{iF}^T, E_i^T)\|\,\|e_{iF}\|\Big], \\
\varpi_i(0) &= \beta_1 |e_{iF}(0)| - e_{iF}^T(0)\big(P_{F2}(0) + P_{F3}(0)\big)
\end{aligned}
\tag{59}
$$

$$
\phi_i(t) = \frac{1}{4}\alpha\Big[\operatorname{tr}\big(\tilde W_{iF}^T \Gamma_{iWF}^{-1} \tilde W_{iF}\big) + \operatorname{tr}\big(\tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \tilde\Lambda_{iF}\big)\Big]
\tag{60}
$$
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. Define $d = [E_i^T\ \ e_{iF}^T\ \ \varpi_i^{1/2}\ \ \phi_i^{1/2}]$; $\beta_1, \beta_2 \in \mathbb{R}$ are positive adaptation gains chosen to ensure $\varpi_i(t) \ge 0$, so that

$$
U_1(d) \le V_{iL}(e_{iF}, E_i) \le U_2(d)
\tag{61}
$$

where

$$
U_1(d) = \frac{1}{2}\min(1, \gamma)\,\|d\|^2, \qquad U_2(d) = \max(1, \gamma)\,\|d\|^2
\tag{62}
$$
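As a quick numerical illustration of the sandwich bound in (61) and (62), the following minimal Python sketch evaluates the Lyapunov function of (58) together with its two bounds. It assumes that $\|d\|^2$ is the sum of the squared entries of $E_i$ and $e_{iF}$ plus the two nonnegative scalars; the function names and all numerical values are hypothetical and are not taken from the paper.

```python
def lyapunov_value(E, e, gamma, varpi, phi):
    """Evaluate V_iL from (58) for list-valued E_i and e_iF."""
    return (0.5 * sum(x * x for x in E)
            + 0.5 * gamma * sum(x * x for x in e)
            + varpi + phi)


def lyapunov_bounds(E, e, gamma, varpi, phi):
    """Return (U1, U2) from (62), with ||d||^2 = ||E||^2 + ||e||^2 + varpi + phi."""
    d_sq = sum(x * x for x in E) + sum(x * x for x in e) + varpi + phi
    return 0.5 * min(1.0, gamma) * d_sq, max(1.0, gamma) * d_sq


# Illustrative values only (assumptions, not data from the paper).
E_demo, e_demo = [0.3, -0.1], [0.05, 0.02]
gamma_demo, varpi_demo, phi_demo = 0.5, 0.01, 0.02

V_demo = lyapunov_value(E_demo, e_demo, gamma_demo, varpi_demo, phi_demo)
U1_demo, U2_demo = lyapunov_bounds(E_demo, e_demo, gamma_demo, varpi_demo, phi_demo)
print(U1_demo <= V_demo <= U2_demo)
```

The lower bound holds because $\min(1,\gamma) \le 1$ and the upper bound because $\max(1,\gamma) \ge 1$, provided $\varpi_i$ and $\phi_i$ are nonnegative.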
The time derivative of (58) is

$$
\dot V_{iL}(e_{iF}, E_i) = \nabla V_{iL}^T\, K\Big[\dot E_i^T\ \ \dot e_{iF}^T\ \ \tfrac{1}{2}\varpi_i^{-1/2}\dot\varpi_i\ \ \tfrac{1}{2}\phi_i^{-1/2}\dot\phi_i\Big]^T
\tag{63}
$$

where $K[\cdot]$ denotes the Filippov set-valued map [27]. $\dot V_{iL}(e_{iF}, E_i)$ can therefore be rewritten in the following form:
119894119871(119890
119894119865 119864
119894)
= [119864119879
119894120574119890
119879
1198941198652120603
12
1198942120601
12
119894]119870[
119894
119890119879
119894119865
1
212060312
119894119894
1
212060112
119894
120601119894]119879
le 120574119879
(
1
2
119882119879
119894119865
119894119865
Λ119879
119894119865
119894119865+
1
2
119879
119894119865
119894119865Λ119879
119894119865
119894119865minus
119879
119894119865
119894119865
Λ
119879
119894119865119890119894119865
+120572119894119865minus
1
2
119882119879
119894119865
119894119865
Λ119879
119894119865119890119894119865minus
1
2
119879
119894119865
119894119865Λ119879
119894119865119890119894119865minus 120574119890
119894119865
minus119879
119894119865120581119894119865+119882
119879
119894119865119894119865Λ119879
119894119865119890119894119865+ 120576
119894119865(119890
119894119865) +
1
2
119879
119894119865
119894119865
Λ119879
119894119865
119894119865
+
1
2
119879
119894119865
119894119865
Λ119879
119894119865
119894119865minus 119896119864
119894minus 120573
1119870[sat (119890
119894119865)]
)
minus119864119879
119894(
minus1
2119882
119879
119894119865
120581119894119865Λ119879
119894119865119890119894119865minus1
2
119879
119894119865
120581119894119865Λ119879
119894119865119890119894119865
+119882119879
119894119865119894119865Λ119879
119894119865119890119894119865+ 120576
119894119865(119890
119894) minus 120573
1119870[sat (119890
119894119865)])
minus 119890119879
119894119865
1
2(
119879
119894119865
120581119894119865Λ119879
119894119865
119890119894119865+1
2
119879
119894119865
120581119894119865Λ119879
119894119865
119890119894119865)
+ 120574119890119879
119894119865(119864
119894minus 120572119890
119894119865)
+ 12057321198692(10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817)10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)100381710038171003817100381710038171003817100381710038171003817119890119894119865
1003817100381710038171003817
minus1
2120572 tr (119879
119894119865Γminus1
119894119882119865
119882
119894119865) minus
1
2120572 tr (Λ119879
119894119865Γminus1
119894Λ119865
Λ119894119865)
(64)
Substituting (53), (54), and (55) into (64) gives

$$
\begin{aligned}
\dot V_{iL}(e_{iF}, E_i)
&= E_i^T\big(P_{F1} + P_{F2} + P_{F3} - \beta_1 K[\operatorname{sat}(e_{iF})] - kE_i - \gamma e_{iF}\big) + \gamma e_{iF}^T(E_i - \alpha e_{iF}) \\
&\quad - E_i^T\big(P_{F2} - \beta_1 K[\operatorname{sat}(e_{iF})]\big) - \dot e_{iF}^T P_{F3} \\
&\quad + \beta_2 J_2\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)\,\|\varphi_i(e_{iF}^T, E_i^T)\|\,\|e_{iF}\| \\
&\quad - \frac{1}{2}\alpha \operatorname{tr}\big(\tilde W_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat W}_{iF}\big) - \frac{1}{2}\alpha \operatorname{tr}\big(\tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat\Lambda}_{iF}\big) \\
&= -\alpha\gamma e_{iF}^T e_{iF} - k E_i^T E_i + E_i^T P_{F1} + (E_i^T - \dot e_{iF}^T) P_{F3} \\
&\quad + \beta_2 J_2\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)\,\|\varphi_i(e_{iF}^T, E_i^T)\|\,\|e_{iF}\| \\
&\quad - \frac{1}{2}\alpha \operatorname{tr}\big(\tilde W_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat W}_{iF}\big) - \frac{1}{2}\alpha \operatorname{tr}\big(\tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat\Lambda}_{iF}\big) \\
&\le -k_1\|e_{iF}\|^2 - k_2\|E_i\|^2 + \frac{J_1\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)^2}{4k_3}\,\|\varphi_i(e_{iF}^T, E_i^T)\|^2 \\
&\quad + \frac{\beta_2^2 J_2\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)^2}{4\alpha k_4}\,\|\varphi_i(e_{iF}^T, E_i^T)\|^2
\end{aligned}
\tag{65}
$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4/\beta_2^2\}$, and $J\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)^2 = J_1\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)^2 + J_2\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)^2$, so the following conclusion can be obtained:

$$
\begin{aligned}
\dot V_{iL}(e_{iF}, E_i)
&\le -k_{\min}\|\varphi_i(e_{iF}^T, E_i^T)\|^2 + \frac{J\big(\|\varphi_i(e_{iF}^T, E_i^T)\|\big)^2\,\|\varphi_i(e_{iF}^T, E_i^T)\|^2}{4\xi} \\
&\le -c\,\|\varphi_i(e_{iF}^T, E_i^T)\|^2
\end{aligned}
\tag{66}
$$
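The way the gains enter (66) can be checked numerically. The sketch below is only an illustration under stated assumptions: the bounding function `J_demo(s) = 1 + s` and every gain value are hypothetical choices, since the paper only requires $J$ to be globally invertible and nondecreasing. It evaluates the right-hand side of the first inequality in (66) and shows that it is negative exactly when $J(\|\varphi_i\|) < 2\sqrt{k_{\min}\xi}$.

```python
import math


def combined_gains(k1, k2, k3, k4, alpha, beta2):
    """k_min = min{k1, k2} and xi = min{k3, alpha*k4/beta2^2}, as defined after (65)."""
    return min(k1, k2), min(k3, alpha * k4 / beta2 ** 2)


def vdot_upper_bound(s, k_min, xi, J):
    """Right-hand side of the first inequality in (66) for s = ||phi_i||."""
    return -k_min * s ** 2 + (J(s) ** 2) * s ** 2 / (4.0 * xi)


def J_demo(s):
    """Hypothetical bounding function (an assumption, not from the paper)."""
    return 1.0 + s


# Illustrative gains only (assumptions).
k_min_demo, xi_demo = combined_gains(4.0, 9.0, 1.0, 2.0, alpha=2.0, beta2=2.0)
threshold_demo = 2.0 * math.sqrt(k_min_demo * xi_demo)  # value J must stay below

for s in (1.0, 3.0, 4.0):
    print(s, J_demo(s) < threshold_demo, vdot_upper_bound(s, k_min_demo, xi_demo, J_demo))
```

With these numbers the bound changes sign at $\|\varphi_i\| = 3$, where `J_demo` reaches the threshold, which is why the result is local rather than global.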
Therefore, for some positive constant $c$, $-c\,\|\varphi_i(e_{iF}^T, E_i^T)\|^2$ is a negative semidefinite function defined on the adjustable domain $D$ expressed as follows:

$$
D = \Big\{ d(t) \ \Big|\ \|d\| \le J^{-1}\big(2\sqrt{k_{\min}\xi}\big) \Big\}
\tag{67}
$$

so that, by Lyapunov stability theory, the system is stable.

To make the subsystem of the time varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, in which a robust term compensates for the neural network approximation errors. The robust control term is designed as
$$
u_{irb} = \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}
\tag{68}
$$

In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ satisfies

$$
\begin{aligned}
N_{rb} &\ge \left[ \frac{\delta_{hi}^2}{2n_1} + \frac{n_1\Big(-\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i)\dfrac{\nabla\varepsilon_{ic}(e_i)}{2}\Big)^2}{2n_2} + n_1 n_2 \|b_i(e_i)\|^2 \Big(-\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i)\dfrac{\nabla\varepsilon_{ic}(e_i)}{2}\Big)^2 \right] \cdot \frac{e_i^T e_i + \zeta}{2 n_1 n_2 \|b_i(e_i)\|\, e_i^T e_i} \\
&\ge \left[ n_1^2 \Big(-\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i)\dfrac{\nabla\varepsilon_{ic}(e_i)}{2}\Big)^2 + n_2 \delta_{hi}^2 + 2 n_1^2 n_2^2 \|b_i(e_i)\|^2 \Big(-\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i)\dfrac{\nabla\varepsilon_{ic}(e_i)}{2}\Big)^2 \right] \cdot \frac{e_i^T e_i + \zeta}{4 n_1^2 n_2^2 \|b_i(e_i)\|\, e_i^T e_i}
\end{aligned}
\tag{69}
$$
Therefore, the global control law can be designed as follows:

$$
u_{\mathrm{mix}} = u_i + u_{irb} = -\frac{1}{2} R^{-1} b_i^T(e_i)\, S_{ia}(e_i)\, \hat W_{ia}^T + \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}
\tag{70}
$$
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the actual output tracks the desired trajectory asymptotically.
Proof. Design the Lyapunov function as follows:

$$
V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2n_1}\operatorname{tr}\{\tilde W_{ic}^T \tilde W_{ic}\} + \frac{n_1}{2n_2}\operatorname{tr}\{\tilde W_{ia}^T \tilde W_{ia}\} + n_1 n_2\Big[e_{iF}^T e_{iF} + \Xi\int_0^{\infty} r_i(e_i, u_{\mathrm{mix}})\, d\tau\Big]
\tag{71}
$$

where $\Xi > 0$ is an undetermined parameter and $t \le \tau < \infty$. The time derivative of (71) is as follows:
$$
\begin{aligned}
\dot V_{iu}(e_i, u_{\mathrm{mix}})
&= \frac{1}{2n_1}\operatorname{tr}\{\tilde W_{ic}^T \dot{\tilde W}_{ic}\} + \frac{n_1}{2n_2}\operatorname{tr}\{\tilde W_{ia}^T \dot{\tilde W}_{ia}\} + n_1 n_2\big[e_{iF}^T \dot e_{iF} + \Xi\, r_i(e_i, u_{\mathrm{mix}})\big] \\
&= \frac{1}{2n_1}\operatorname{tr}\big\{\tilde W_{ic}^T\big(-n_1 L_i (L_i^T \tilde W_{ic} + \delta_{hi})\big)\big\} \\
&\quad + \frac{n_1}{2n_2}\operatorname{tr}\Big\{\tilde W_{ia}^T\Big[-n_2 S_{ia}(e_i)\Big(\tilde W_{ia}^T S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \frac{1}{2}R^{-1} b_i(e_i)\nabla S_{ic}(e_i)\tilde W_{ic}^T - \frac{1}{2}R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)\Big)\Big]\Big\} \\
&\quad + n_1 n_2\, e_{iF}^T\big(W_{iF}^T \kappa(\Lambda_{iF}^T e_i) + \varepsilon_{iF}(e_i) + b_i(e_i)\, u_{\mathrm{mix}}\big) + \Xi\big(e_i^T Q_e e_i + u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}}\big) \\
&\le -\Big(L_{im}^2 - \frac{n_1}{2}L_{iM}^2\Big)\|\tilde W_{ic}\|^2 + \frac{1}{2n_1}\delta_{hi}^2 - \Big(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2\Big)\|\tilde W_{ia}\|^2 \\
&\quad + \frac{n_1}{4n_2}\|R^{-1}\|^2\,\|b_i(e_i)\|^2\,\|\nabla S_{icM}^2\|\,\|\tilde W_{ic}\|^2 \\
&\quad + \frac{n_1}{2n_2}\Big(\varepsilon_{ia}(e_i) + R^{-1} b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\Big)^T \cdot \Big(\varepsilon_{ia}(e_i) + R^{-1} b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\Big) \\
&\quad + n_1 n_2 \|b_i(e_i)\|^2\, \varepsilon_{ia}(e_i)^T \varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^2 \|b_i(e_i)\|^2\, \|\tilde W_{ia}\|^2 \\
&\quad + n_1 n_2 \Xi \lambda_{\min}(Q_e)\|e_{iF}\|^2 + n_1 n_2\big(\|b_i(e_i)\|^2 - \Xi\lambda_{\min}(R)\big)\|u_{\mathrm{mix}}\|^2
\end{aligned}
\tag{72}
$$
If the following inequalities are satisfied:

$$
\begin{aligned}
\lambda_{\min}(Q_e)\,\|e_{iF}\|_2^2 &\le e_{iF}^T Q_e e_{iF} \le \lambda_{\max}(Q_e)\,\|e_{iF}\|_2^2, \\
\lambda_{\min}(R)\,\|u_{\mathrm{mix}}\|_2^2 &\le u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \le \lambda_{\max}(R)\,\|u_{\mathrm{mix}}\|_2^2, \\
\Xi &> \frac{\|b_i(e_i)\|^2}{\lambda_{\min}(R)}
\end{aligned}
\tag{73}
$$
then $\dot V_{iu}(e_i, u_{\mathrm{mix}})$ can be further bounded as

$$
\begin{aligned}
\dot V_{iu}(e_i, u_{\mathrm{mix}})
&\le -\Big(L_{im}^2 - \frac{n_1}{2}L_{iM}^2 - \frac{n_1}{4n_2}\|R^{-1}\|^2\,\|b_i(e_i)\|^2\,\nabla S_{icM}^2\Big)\|\tilde W_{ic}\|^2 \\
&\quad - \Big(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2 - n_1 n_2 S_{iaM}^2\|b_i(e_i)\|^2\Big)\|\tilde W_{ia}\|^2 \\
&\quad - n_1 n_2\big(\|b_i(e_i)\|^2 + \Xi\lambda_{\min}(R)\big)\|u_{\mathrm{mix}}\|^2 - n_1 n_2 \Xi\lambda_{\min}(Q_e)\|e_{iF}\|^2 \\
&\le -n_1 n_2 \Xi\lambda_{\min}(Q_e)\|e_{iF}\|^2
\end{aligned}
\tag{74}
$$

Therefore, we conclude that $\dot V_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
4 Simulations
To verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing simulation results, two different configurations of the time varying externally constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.
To simplify the analysis, the configurations above are redrawn as analytic charts in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.
The time varying constraint can be defined as a kind of column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:

$$
\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - [L_3 + L_4\cot\alpha(t)], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - [L_3 + L_4\cot\alpha(t)]
\end{aligned}
\tag{75}
$$

In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis is defined as follows:

$$
\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2}
\tag{76}
$$
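The constraint functions in (75) and the constraint angle in (76) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it reads (76) as $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$, the function names are hypothetical, and the link lengths used in the demo are placeholder assumptions because their values are not stated in this section.

```python
import math


def alpha(t):
    """Constraint angle from (76), read as 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def psi_a(q1, q2, t, L1, L2, L3, L4):
    """Constraint function of configuration A from (75)."""
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 / math.tan(alpha(t)))


def psi_b(q2, t, L1, L2, L3, L4):
    """Constraint function of configuration B from (75)."""
    return L1 + L2 * math.cos(q2) - (L3 + L4 / math.tan(alpha(t)))


# Placeholder link lengths (assumptions, not values from the paper).
L_demo = dict(L1=1.0, L2=1.0, L3=0.5, L4=0.5)
print(psi_a(0.0, 0.0, 0.0, **L_demo), psi_b(math.pi / 2, 0.0, **L_demo))
```

Since $\alpha(t)$ stays within $0.75\pi \pm 0.2$ rad, $\tan\alpha(t)$ never vanishes and the cotangent is always well defined.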
The initial positions of the joints are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial velocities of the joints are zero. The dynamic models of configurations A and B are as follows:

$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix} \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^2(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix} \\
C_A(q, \dot q) &= \begin{bmatrix} -0.36\sin(q_2)\,\dot q_2 & -0.18\sin(q_2)\,\dot q_2 \\ 0.18\sin(q_2)(\dot q_1 - \dot q_2) & 0.18\sin(q_2)\,\dot q_1 \end{bmatrix} \\
C_B(q, \dot q) &= \begin{bmatrix} 0.1166\sin(2q_2)\,\dot q_2 & 0.06\sin(q_2)\,\dot q_2 \\ 0.06\sin(q_2)\,\dot q_2 & 0 \end{bmatrix} \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix} \\
G_B(q) &= \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix} \\
F_A(q, \dot q) &= \begin{bmatrix} \dot q_1 + 10\sin(3q_1) + 2\operatorname{sgn}(\dot q_1) \\ 1.2\,\dot q_2 + 5\sin(2q_2) + \operatorname{sgn}(\dot q_2) \end{bmatrix} \\
F_B(q, \dot q) &= \begin{bmatrix} 0 \\ 1.5\,\dot q_2 + \sin(q_2) + 1.2\operatorname{sgn}(\dot q_2) \end{bmatrix}
\end{aligned}
\tag{77}
$$
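For readers who want to experiment with the model in (77), the configuration A terms can be coded directly. The sketch below is a minimal illustration under stated assumptions: the coefficients are read with their decimal points as 0.36, 0.6066, 0.18, 0.1233, 5.88, 17.64, and 1.2, the velocity arguments `dq1` and `dq2` stand for $\dot q_1$ and $\dot q_2$, and all function names are hypothetical.

```python
import math


def mass_matrix_a(q2):
    """Inertia matrix M_A(q) of configuration A, as read from (77)."""
    m12 = 0.18 * math.cos(q2) + 0.1233
    return [[0.36 * math.cos(q2) + 0.6066, m12],
            [m12, 0.1233]]


def coriolis_a(q2, dq1, dq2):
    """Coriolis/centripetal matrix C_A(q, dq), as read from (77)."""
    s = math.sin(q2)
    return [[-0.36 * s * dq2, -0.18 * s * dq2],
            [0.18 * s * (dq1 - dq2), 0.18 * s * dq1]]


def gravity_a(q1, q2):
    """Gravity vector G_A(q), as read from (77)."""
    g12 = -5.88 * math.sin(q1 + q2)
    return [g12 - 17.64 * math.sin(q1), g12]


def friction_a(q1, q2, dq1, dq2):
    """Friction vector F_A(q, dq), as read from (77)."""
    def sgn(x):
        return (x > 0) - (x < 0)
    return [dq1 + 10.0 * math.sin(3.0 * q1) + 2.0 * sgn(dq1),
            1.2 * dq2 + 5.0 * math.sin(2.0 * q2) + sgn(dq2)]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Evaluate the model at the stated initial condition q1(0) = q2(0) = 2, zero velocity.
M0 = mass_matrix_a(2.0)
print(M0, det2(M0), gravity_a(2.0, 2.0), friction_a(2.0, 2.0, 0.0, 0.0))
```

A useful sanity check is that $M_A(q)$ is symmetric with determinant $0.0596 - 0.0324\cos^2(q_2)$ (to four decimals), which is positive for every $q_2$, so the inertia matrix is always invertible.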
Figure 3: Configuration A for simulation.

Figure 4: Configuration B for simulation.

The desired trajectories of configurations A and B are as follows.

Configuration A:

$$
\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t) - y_{1d}) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t)
\end{aligned}
\tag{78}
$$

Configuration B:

$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t)) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t)
\end{aligned}
\tag{79}
$$
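The desired trajectories in (78) and (79) share the same map $\Theta$, with configuration B being the special case $y_{1d} = 0$. The following Python sketch makes this explicit. It is an illustration under assumptions: $\alpha(t)$ is read from (76) as $0.75\pi + 0.2\sin(t/2)$, the coefficients in (78) are read as 0.5 and 0.2, the function names are hypothetical, and the link lengths are placeholders because their values are not given in this section.

```python
import math


def alpha(t):
    """Constraint angle, read from (76) as 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def theta(y1d, t, L1, L2, L3):
    """Desired joint 2 trajectory Theta(y1d, t) from (78); (79) is the case y1d = 0."""
    a = alpha(t)
    arg = (L1 * math.sin(a - y1d) - L3 * math.sin(a)) / L2
    if not -1.0 <= arg <= 1.0:
        raise ValueError("arcsin argument outside [-1, 1]; check the link lengths")
    return math.asin(arg) + a


def y1d_config_a(t):
    """Desired joint 1 trajectory of configuration A from (78)."""
    return 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)


# Placeholder link lengths (assumptions, not values from the paper).
L1_demo, L2_demo, L3_demo = 1.0, 1.0, 0.5

for t in (0.0, 2.5, 5.0):
    y2d_a = theta(y1d_config_a(t), t, L1_demo, L2_demo, L3_demo)
    y2d_b = theta(0.0, t, L1_demo, L2_demo, L3_demo)
    print(t, y2d_a, y2d_b)
```

The explicit range check on the arcsine argument is a design choice of this sketch: for link lengths that violate the geometry, it fails loudly instead of producing a math domain error deep inside a simulation loop.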
Owing to the dynamic external constraint, the variation of joint 1 in configuration B is zero.

To confirm that the adopted method can be applied to different configurations, and to verify the tracking performance for the desired subsystem trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part with the classic RBF neural network control method and with the ACI based decentralized reinforcement learning robust optimal tracking control method, respectively.
Figures 7–14 show the joint tracking and error curves obtained when the classic RBF neural network [4] is used to compensate for the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories, because the classical neural network method requires a longer training process and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness, and the tracking errors become larger than before.

Figure 5: The analytic chart of configuration A (links $L_1$, $L_2$, $L_4$, distance $L_3$, joint angles $q_1$ and $q_2$, and constraint angle $\alpha$ in the $X$-$Y$ plane).

Figure 6: The analytic chart of configuration B (links $L_1$, $L_2$, $L_4$, distance $L_3$, joint angles $q_1$ and $q_2$, and constraint angle $\alpha$ in the $X$-$Y$ plane).
Figures 15–22 show the joint tracking and error curves obtained when ACI is adopted to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds under the proposed robust reinforcement learning optimal control method. This is due to the identifying ability of ACI, which identifies the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. Moreover, the joint subsystems exhibit robustness when the proposed decentralized control method is adopted.
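Statements such as "the output tracks the desired trajectory within a given time" can be made reproducible by computing a settling time from a sampled error curve. The sketch below is one possible definition, offered as an assumption rather than the authors' procedure: the settling time is the first sample instant after which the absolute error never exceeds a tolerance again. The error samples in the demo are synthetic (an exponential decay chosen for illustration) and are not data from the paper.

```python
import math


def settling_time(times, errors, tol):
    """First sample time after which |error| stays within tol; None if it never settles."""
    last_violation = -1
    for k, err in enumerate(errors):
        if abs(err) > tol:
            last_violation = k
    if last_violation == len(errors) - 1:
        return None
    return times[last_violation + 1]


# Synthetic error curve (assumption for illustration only): 0.05 * exp(-5 t) on [0, 10] s.
times_demo = [k / 10 for k in range(101)]
errors_demo = [0.05 * math.exp(-5.0 * t) for t in times_demo]

print(settling_time(times_demo, errors_demo, tol=0.001))
```

Applying the same function with the same tolerance to the error curves of both controllers gives a like-for-like comparison of convergence speed.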
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method satisfies the accessibility requirement of the reconfigurable modular robot and that singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are listed in Table 1.
Figure 7: Trajectory tracking curve of configuration A joint 1 with RBF neural network (joint 1 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).

Figure 8: Trajectory tracking curve of configuration A joint 2 with RBF neural network (joint 2 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

| $k$ | $\alpha$ | $\upsilon$ | $\eta_{a1}$ | $\eta_{a2}$ | $\eta_c$ | $\beta_1$ | $\beta_2$ | $\gamma$ |
| 800 | 300 | 0.005 | 10 | 50 | 20 | 0.2 | 2 | 0.5 |
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. In both configurations, the joint variables track the desired trajectory within a very short time, and the fluctuation of the error after convergence is minimal.
Figure 9: Trajectory tracking curve of configuration B joint 1 with RBF neural network (joint 1 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).

Figure 10: Trajectory tracking curve of configuration B joint 2 with RBF neural network (joint 2 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).

Figure 11: Tracking error curve of configuration A joint 1 with RBF neural network (joint 1 error (rad) versus time (s), 0 to 10 s).

Figure 12: Tracking error curve of configuration A joint 2 with RBF neural network (joint 2 error (rad) versus time (s), 0 to 10 s).

Figure 13: Tracking error curve of configuration B joint 1 with RBF neural network (joint 1 error (rad) versus time (s), 0 to 10 s).

Figure 14: Tracking error curve of configuration B joint 2 with RBF neural network (joint 2 error (rad) versus time (s), 0 to 10 s).

Figure 15: Trajectory tracking curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).

Figure 16: Trajectory tracking curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).
5 Conclusions and Future Work
In this paper, by combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. This theory is used to solve the continuous time nonlinear optimal control policy problem for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function takes the place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.

In the future, more complex configurations of the time varying externally constrained reconfigurable modular robot can be included, and a decentralized controller with higher control and error precision will be considered.

Figure 17: Trajectory tracking curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).

Figure 18: Trajectory tracking curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s), 0 to 10 s; desired and actual trajectories).
Figure 19: Tracking error curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s), 0 to 10 s).

Figure 20: Tracking error curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s), 0 to 10 s).

Figure 21: Tracking error curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s), 0 to 10 s).

Figure 22: Tracking error curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s), 0 to 10 s).

Figure 23: 3D-tip trajectory curve of configuration A with ACI (x, y, and z axes; desired and actual trajectories).

Figure 24: 3D-tip trajectory curve of configuration B with ACI (x, y, and z axes; desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Here $r_i(e_i, u_i)$ represents the reward function for the current state, given by

$$r_i(e_i, u_i) = e_i^T Q_e e_i + u_i^T R u_i, \tag{24}$$

where $Q_e$ and $R$ are positive definite matrices.
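As a minimal sketch of the quadratic reward in (24), the following Python snippet evaluates $e_i^T Q_e e_i + u_i^T R u_i$ with plain lists. The helper names (`quadratic_form`, `reward`) and all numerical values are illustrative assumptions and are not taken from the paper.

```python
def quadratic_form(x, M):
    """Return x^T M x for a vector x (list) and a square matrix M (list of rows)."""
    n = len(x)
    return sum(x[r] * M[r][c] * x[c] for r in range(n) for c in range(n))


def reward(e, u, Q_e, R):
    """Reward of the form (24): e^T Q_e e + u^T R u."""
    return quadratic_form(e, Q_e) + quadratic_form(u, R)


# Hypothetical values: two error states and one joint input.
e = [0.5, -0.2]
u = [1.0]
Q_e = [[2.0, 0.0], [0.0, 1.0]]  # assumed positive definite weighting
R = [[0.1]]                     # assumed positive definite weighting
r = reward(e, u, Q_e, R)
```

Because $Q_e$ and $R$ are positive definite, the reward is zero only when both the tracking error and the control input are zero.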
Typically, recording the value of state-action pairs is more useful than recording the value of the state only, since the state-action pairs are predictions of the reward. Even if the reward value of a state is low, it does not follow that the value of the state-action pair is low too: if the state of the subsystem produces a higher reward over a period of time, it can still obtain a higher state-action value. Therefore, from a long-term perspective, defining a suitable state-action value function ($Q$-function) can make actions produce more reward [21, 22].
According to (23) and (24), the continuous-time optimal $Q$-function can be defined as

$$\begin{aligned}
Q_i^*(e_i, a_i, u_i) &= r_i(e_i, a_i, u_i) + V_i^*(e_i, u_i) \\
&= r_i(e_i, a_i, u_i) + \min_{\substack{u_i \\ t \le \tau < \infty}} \int_0^\infty r_i(e_i(\tau), u_i(e_i(\tau)))\, d\tau.
\end{aligned} \tag{25}$$
Assumption 3. The partial derivatives of $Q_i^*$ and $r_i(e_i, a_i, u_i)$ exist and are continuous in the domain.

According to (18) and (24), under the control policy $u_i$, the optimal $Q$-function satisfies the following Hamilton-Jacobi-Bellman (HJB) equation [23]:

$$\begin{aligned}
\mathrm{HJB}_i(e_i, u_i, \nabla Q_i^*) &= \min_{u_i(e_i)} \left[ r_i(e_i, a_i, u_i) + \nabla Q_i^* \left( -F(e_i, u_i) - h_i(e, \dot e, \ddot e) - f_i + b_i(e_i) u_i \right) \right] \\
&= \min_{u_i(e_i)} \left[ r_i(e_i, a_i, u_i) + \nabla Q_i^* \Phi_i(e_i, u_i) \right],
\end{aligned} \tag{26}$$

where $\Phi_i(e_i, u_i) = -F(e_i, u_i) - h_i(e, \dot e, \ddot e) - f_i + b_i(e_i) u_i$ denotes the global uncertainty, including the unknown dynamics of the subsystem and the interconnection term, and $\nabla Q_i^* = \partial Q_i^* / \partial e_i$ is the gradient of the optimal $Q$-function.
Lemma 4 (see [24]). Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). For the minimum of the HJB equation (26) to possess a stationary point with respect to $u_i$, the optimal $Q$-function and the optimal control policy must satisfy the following conditions:

(1) $\partial \mathrm{HJB}_i(e_i, u_i, \nabla Q_i) / \partial u_i = 0$;
(2) $\partial^2 \mathrm{HJB}_i(e_i, u_i, \nabla Q_i) / (\partial u_i \times \partial u_i^T) \ge 0$.

The necessary conditions above lead to the following results.

(a) The bounded control policy can guarantee a local minimum of the HJB equation (26) and satisfy the constraints imposed on the control inputs.
(b) The Hessian matrix is positive definite, and the control policy $u_i$ can render the global minimum of the HJB equation.
(c) If an optimal algorithm exists, it is unique.
According to Lemma 4, if the reward function is smooth and the optimal control $u_i^*$ is adopted, then the HJB equation satisfies

$$\mathrm{HJB}_i^*(e_i, u_i^*, \nabla Q_i^*) = \min_{u_i^*(e_i)} \left[ r_i(e_i, a_i, u_i^*) + \nabla Q_i^* \Phi_i(e_i, u_i^*) \right] = 0. \tag{27}$$
The optimal control can then be expressed as follows:

$$u_i^*(e_i) = \arg\min_{u_i^*} \left[ \mathrm{HJB}_i^*(e_i, u_i^*, \nabla Q_i^*) \right] = -\frac{1}{2} R^{-1} b_i^T(e_i) \frac{\partial Q_i^*(e_i, a_i, u_i)^T}{\partial e_i}. \tag{28}$$
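The stationarity condition behind (28) can be checked numerically. The sketch below assumes a scalar control input with scalar weight $R > 0$ and a quadratic control cost $R u^2$; setting the derivative of the bracket in (26) with respect to $u$ to zero then gives $u^* = -\tfrac{1}{2} R^{-1} b^T \nabla Q^T$. The function names and the numbers are hypothetical and only illustrate the algebra.

```python
def dot(a, b):
    """Inner product of two equally sized lists."""
    return sum(x * y for x, y in zip(a, b))


def optimal_control(R, b, grad_q):
    """u* = -1/2 * R^{-1} * b^T * grad_q^T for a scalar input with R > 0."""
    return -0.5 * dot(b, grad_q) / R


def hamiltonian(u, R, b, grad_q, drift, state_cost):
    """state_cost + R*u^2 + grad_q . (drift + b*u): the bracket minimized in (26)."""
    phi = [d + bi * u for d, bi in zip(drift, b)]
    return state_cost + R * u * u + dot(grad_q, phi)


# Hypothetical numbers for one subsystem with two error states.
R, b = 0.5, [0.0, 2.0]
grad_q, drift, state_cost = [1.0, 3.0], [0.4, -1.0], 0.7
u_star = optimal_control(R, b, grad_q)
h_star = hamiltonian(u_star, R, b, grad_q, drift, state_cost)
```

Since the bracket is strictly convex in $u$ when $R > 0$, any perturbation of `u_star` increases the value returned by `hamiltonian`, which agrees with condition (2) of Lemma 4.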
If the optimal $Q$-function $Q_i^*$ is continuously differentiable and known with initial value $Q_i^*(0) = 0$, and the optimal control policy $u_i^*(e_i)$ and the global uncertainty of the subsystem $\Phi_i(e_i, u_i^*)$ are also known, then the HJB equation (27) holds and is solvable. However, in practice $Q_i^*$ is not differentiable everywhere, and $u_i^*(e_i)$ and $\Phi_i(e_i, u_i^*)$ are unknown, so it is not feasible to solve the HJB equation by ordinary methods. In this paper, we combine the action-critic-identifier (ACI) with an RBF neural network to estimate the optimal control policy, the optimal $Q$-function, and the global uncertainty of the subsystem. The action-NN estimates $u_i^*(e_i)$, and its estimate is denoted as $\hat u_i(e_i)$; the critic-NN estimates $Q_i^*$, and its estimate is denoted as $\hat Q_i$; a robust neural network identifier identifies $\Phi_i(e_i, u_i^*)$, and its estimate is denoted as $\hat\Phi_i(e_i, u_i^*)$. The block diagram of the ACI architecture is shown in Figure 2.

The estimated HJB equation can be expressed as follows:
$$\mathrm{HJB}_i^*(e_i, \hat u_i, \nabla \hat Q_i) = \min_{u_i(e_i)} \left[ r_i(e_i, a_i, \hat u_i) + \nabla \hat Q_i \hat\Phi_i(e_i, \hat u_i) \right]. \tag{29}$$

The identification error of the HJB equation above can be expressed as

$$\delta_{hi} = \mathrm{HJB}_i^*(e_i, \hat u_i, \nabla \hat Q_i) - \mathrm{HJB}_i^*(e_i, u_i^*, \nabla Q_i^*). \tag{30}$$

A classic radial basis function neural network is proposed in [25], shown as (31):

$$N(x) = W^{*T} S(x) + \varepsilon(x), \tag{31}$$
Figure 2: The architecture of action-critic-identifier (blocks: action, critic, identifier, reward function, HJB error, and subsystem).
where $W^*$ denotes the ideal neural network weights and $\varepsilon(x)$ represents the estimation error. With a sufficient number of nodes, if the centers and widths of the nodes are chosen appropriately, then any continuous function can be approximated by the RBF-NN. Therefore, the optimal $Q$-function and the optimal control policy can be expressed as follows:

$$\begin{aligned}
Q_i^* &= W_i^T S_i(e_i) + \varepsilon_{ic}(e_i), \\
u_i^*(e_i) &= -\frac{1}{2} R^{-1} b_i^T(e_i) \left[ S_i(e_i)^T W_i + \varepsilon_{ia}(e_i) \right],
\end{aligned} \tag{32}$$

where $S_i(e_i) = [s_{i1}(e_i) \cdots s_{in}(e_i)]^T$ is the smooth basis function of the neural network, $W_i$ is the ideal unknown neural network weight, and $\varepsilon_{ic}(e_i)$ and $\varepsilon_{ia}(e_i)$ are the estimation errors. Using $\hat Q_i$ and $\hat u_i(e_i)$ to estimate $Q_i^*$ and $u_i^*(e_i)$, we get the following equations:

$$\hat Q_i = \hat W_{ic}^T S_{ic}(e_i), \tag{33}$$

$$\hat u_i(e_i) = -\frac{1}{2} R^{-1} b_i^T(e_i) S_{ia}(e_i)^T \hat W_{ia}. \tag{34}$$
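The critic and action estimates in (33) and (34) can be sketched as follows. The Gaussian form of the basis functions, the shared centers and width, and the scalar $R$ and $b_i$ are assumptions made for this illustration; the paper only states that an RBF network with suitably chosen centers and widths is used.

```python
import math


def rbf_basis(e, centers, width):
    """Gaussian basis vector S(e) = [s_1(e), ..., s_n(e)]^T (assumed basis shape)."""
    return [
        math.exp(-sum((x - c) ** 2 for x, c in zip(e, ctr)) / (2.0 * width ** 2))
        for ctr in centers
    ]


def critic_output(w_c, e, centers, width):
    """Q_hat = W_c_hat^T S_c(e), the form of (33)."""
    return sum(w * s for w, s in zip(w_c, rbf_basis(e, centers, width)))


def action_output(w_a, e, centers, width, R, b):
    """u_hat = -1/2 R^{-1} b S_a(e)^T W_a_hat for scalar R and b, the form of (34)."""
    s = rbf_basis(e, centers, width)
    return -0.5 * b / R * sum(si * wi for si, wi in zip(s, w_a))


# Hypothetical network with two nodes over a two-dimensional error state.
centers = [[0.0, 0.0], [1.0, 0.0]]
q_hat = critic_output([2.0, 0.0], [0.0, 0.0], centers, 1.0)
u_hat = action_output([2.0, 0.0], [0.0, 0.0], centers, 1.0, R=0.5, b=1.0)
```

Each basis value lies in $(0, 1]$, which is the kind of boundedness that the later stability analysis relies on.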
In the equations above, $\hat W_{ic}(t)$ and $\hat W_{ia}(t)$ are the weights of the critic-NN and the action-NN. The estimation errors of the weights are

$$\tilde W_{ic}(t) = W_i - \hat W_{ic}(t), \tag{35}$$

$$\tilde W_{ia}(t) = W_i - \hat W_{ia}(t). \tag{36}$$

The update law of the weight for the critic-NN is a gradient descent algorithm:

$$\dot{\hat W}_{ic}(t) = -n_1 L_i \left( L_i^T \hat W_{ic} + e_i^T Q_e e_i + u_i^T R u_i \right). \tag{37}$$

In the equation above, $n_1 > 0$ is the adaptive gain of the neural network, and $L_i$ and $l_i$ are defined as

$$L_i = \frac{l_i}{l_i^T l_i + 1}, \qquad l_i = \nabla S_{ic}(e_i)\, e_i. \tag{38}$$

Therefore, according to the definition above, the following inequalities can be obtained:

$$L_{im} \le L_i \le L_{iM}, \qquad S_{icm} \le S_{ic}(e_i) \le S_{icM}, \qquad S_{iam} \le S_{ia}(e_i) \le S_{iaM}. \tag{39}$$
Combining (35) with (38), we get

$$\dot{\tilde W}_{ic}(t) = -n_1 L_i \left( L_i^T \tilde W_{ic} + \delta_{hi} \right). \tag{40}$$
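A discrete-time sketch of the critic update law (37) with the normalized regressor of (38) is given below. The explicit Euler discretization, the step size `dt`, and the function names are assumptions for illustration; the paper states the law in continuous time, and the regressor $l_i$ is passed in directly here rather than computed from the basis gradient.

```python
def normalized_regressor(l):
    """L_i = l_i / (l_i^T l_i + 1), as in (38)."""
    denom = sum(x * x for x in l) + 1.0
    return [x / denom for x in l]


def critic_step(w_c, l, reward, n1, dt):
    """One explicit Euler step of (37): dW/dt = -n1 * L * (L^T W + reward)."""
    L = normalized_regressor(l)
    residual = sum(Li * wi for Li, wi in zip(L, w_c)) + reward
    return [wi - dt * n1 * Li * residual for wi, Li in zip(w_c, L)]


# Hypothetical weights, regressor, reward value, gain, and step size.
w_next = critic_step([2.0, 1.0], [1.0, 0.0], reward=1.0, n1=1.0, dt=0.1)
```

The normalization keeps the regressor bounded regardless of the size of $l_i$, since $\|l_i\| / (\|l_i\|^2 + 1) \le 1/2$, which is consistent with the bounds on $L_i$ stated in (39).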
The update law of the weight for the action-NN is also developed by a gradient descent algorithm, expressed as follows:

$$\dot{\hat W}_{ia}(t) = -n_2 S_{ia}(e_i) \left( \hat W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i) \nabla S_{ic}(e_i)^T \hat W_{ic} \right)^T. \tag{41}$$

According to the estimation error of the action-NN in (36), the optimal control $u_i^*(e_i)$ minimizes the optimal $Q$-function, and we get the following equation:

$$W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i) \nabla S_{ic}(e_i)^T W_{ic} + \frac{1}{2} R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0. \tag{42}$$

Putting (41) into (42), we get

$$\dot{\tilde W}_{ia} = -n_2 S_{ia}(e_i) \left( \tilde W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i) \nabla S_{ic}(e_i)^T \tilde W_{ic} - \frac{1}{2} R^{-1} b_i(e_i) \nabla \varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i) \right). \tag{43}$$
After using the critic-NN and the action-NN to obtain $\hat Q_i$ and $\hat u_i(e_i)$, we need to design a robust RBF-NN identifier to identify the nonlinear uncertainties of the subsystem. Here $\Phi_i(e_i, \hat u_i)$ can be expressed as follows:

$$\Phi_i(e_i, \hat u_i) = \dot e_{iF} = W_{iF}^T \kappa(\Lambda_{iF}^T e_{iF}) + \varepsilon_{iF}(e_{iF}) + b_i(e_i) \hat u_i, \tag{44}$$

where $\kappa(\cdot)$ is the basis function of the neural network and $W_{iF}$, $\Lambda_{iF}$ are the unknown ideal neural network weights. Equation (44) can be identified by the robust RBF-NN identifier, which gives

$$\hat\Phi_i(e_i, \hat u_i) = \dot{\hat e}_{iF} = \hat W_{iF}^T \hat\kappa_{iF} + b_i(e_i) \hat u_i + \mu_i. \tag{45}$$

Here $\hat\kappa_{iF}$ is the estimated value of the basis function of the neural network, $\hat W_{iF}$ and $\hat\Lambda_{iF}$ are the estimated neural network weights, and $\mu_i \in \mathbb{R}$ is the feedback error term, given as follows [26]:

$$\begin{aligned}
\mu_i &= k \left( e_{iF}(t) - \hat e_{iF}(t) \right) - k \left( e_{iF}(0) - \hat e_{iF}(0) \right) + \vartheta \\
&= k \left( \tilde e_{iF}(t) - \tilde e_{iF}(0) \right) + \vartheta, \\
\dot\vartheta &= (k\alpha + \gamma)\, \tilde e_{iF} + \beta_1 \operatorname{sat}(\tilde e_{iF}),
\end{aligned} \tag{46}$$

where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\operatorname{sat}(\cdot)$ is a saturation function. Therefore, the state estimation error of the identifier-NN can be expressed as follows:

$$\dot{\tilde e}_{iF} = \dot e_{iF} - \dot{\hat e}_{iF} = W_{iF}^T \kappa_{iF} - \hat W_{iF}^T \hat\kappa_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i. \tag{47}$$
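The feedback error term $\mu_i$ in (46) combines a proportional part with an integrated part $\vartheta$. The following scalar sketch integrates $\vartheta$ with explicit Euler steps; the class name, the unit saturation limit, the initial condition $\vartheta(0) = 0$, and the numerical gains are all assumptions made for illustration.

```python
def sat(x, limit=1.0):
    """Saturation: clip x to [-limit, limit]. The unit limit is an assumption."""
    return max(-limit, min(limit, x))


class FeedbackTerm:
    """mu = k*(e_tilde(t) - e_tilde(0)) + theta, with theta integrated by Euler steps."""

    def __init__(self, k, alpha, gamma, beta1, e_tilde0):
        self.k, self.alpha, self.gamma, self.beta1 = k, alpha, gamma, beta1
        self.e_tilde0 = e_tilde0
        self.theta = 0.0  # assumed initial condition theta(0) = 0

    def update(self, e_tilde, dt):
        """Return mu for the current error, then advance theta by one step."""
        mu = self.k * (e_tilde - self.e_tilde0) + self.theta
        theta_dot = (self.k * self.alpha + self.gamma) * e_tilde
        theta_dot += self.beta1 * sat(e_tilde)
        self.theta += dt * theta_dot
        return mu


# Hypothetical gains and identification errors.
term = FeedbackTerm(k=2.0, alpha=1.0, gamma=0.5, beta1=0.1, e_tilde0=0.3)
mu_0 = term.update(0.3, dt=0.01)
mu_1 = term.update(0.2, dt=0.01)
```

By construction $\mu_i(0) = 0$, so the identifier starts without a feedback correction and the correction accumulates only as the identification error persists.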
A filtered identification error is defined as follows:

$$E_i = \dot{\tilde e}_{iF} + \alpha \tilde e_{iF}. \tag{48}$$

The derivative of the equation above is

$$\begin{aligned}
\dot E_i ={}& W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} - \hat W_{iF}^T \hat\kappa'_{iF} \dot{\hat\Lambda}_{iF}^T \hat e_{iF} + \alpha \dot{\tilde e}_{iF} - k E_i - \gamma \tilde e_{iF} \\
& - \dot{\hat W}_{iF}^T \hat\kappa_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1 \operatorname{sat}(\tilde e_{iF}) - \hat W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF}.
\end{aligned} \tag{49}$$

Here the weights $\hat W_{iF}$, $\hat\Lambda_{iF}$ of the identifier-NN are updated by

$$\begin{aligned}
\dot{\hat W}_{iF} &= \operatorname{proj}\left( \Gamma_{iWF}\, \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} \tilde e_{iF}^T \right), \\
\dot{\hat\Lambda}_{iF} &= \operatorname{proj}\left( \Gamma_{i\Lambda F}\, \dot{\hat e}_{iF} \tilde e_{iF}^T \hat W_{iF}^T \hat\kappa'_{iF} \right),
\end{aligned} \tag{50}$$

where $\Gamma_{iWF}$, $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. In order to analyze the convergence of the filtered identification error, $\hat W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF}$ can be divided into the following form:

$$\begin{aligned}
\hat W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF}
={}& \frac{1}{2} \hat\kappa'_{iF} \dot{\hat e}_{iF} \left[ (\Lambda_{iF}^T - \hat\Lambda_{iF}^T)(W_{iF}^T - \hat W_{iF}^T) + (W_{iF}^T - \hat W_{iF}^T)(\Lambda_{iF}^T - \hat\Lambda_{iF}^T) \right] \\
={}& \frac{1}{2} \hat\kappa'_{iF} \dot{\hat e}_{iF} \left[ \hat W_{iF}^T (\Lambda_{iF}^T - \hat\Lambda_{iF}^T) + (W_{iF}^T - \hat W_{iF}^T) \hat\Lambda_{iF}^T - W_{iF}^T (\Lambda_{iF}^T - \hat\Lambda_{iF}^T) - (W_{iF}^T - \hat W_{iF}^T) \Lambda_{iF}^T \right] \\
={}& \frac{1}{2} \hat\kappa'_{iF} \dot{\hat e}_{iF} \left[ \hat W_{iF}^T (\Lambda_{iF}^T - \hat\Lambda_{iF}^T) + (W_{iF}^T - \hat W_{iF}^T) \hat\Lambda_{iF}^T \right] \\
& - \frac{1}{2} \hat\kappa'_{iF} \dot{\hat e}_{iF} \left[ W_{iF}^T (\Lambda_{iF}^T - \hat\Lambda_{iF}^T) + (W_{iF}^T - \hat W_{iF}^T) \Lambda_{iF}^T \right] \\
={}& \frac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\tilde e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot{\tilde e}_{iF} \\
& - \frac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \frac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} \\
& + \frac{1}{2} \tilde W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF},
\end{aligned} \tag{51}$$

where $\tilde W_{iF}^T = W_{iF}^T - \hat W_{iF}^T$ and $\tilde\Lambda_{iF}^T = \Lambda_{iF}^T - \hat\Lambda_{iF}^T$. Putting (51) into (49), (49) can be reduced to the following form:

$$\dot E_i = P_{F1} + P_{F2} + P_{F3} - k E_i - \gamma \tilde e_{iF} - \beta_1 \operatorname{sat}(\tilde e_{iF}). \tag{52}$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ can be expressed, respectively, as follows:

$$P_{F1} = \frac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\tilde e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot{\tilde e}_{iF} - \hat W_{iF}^T \hat\kappa'_{iF} \dot{\hat\Lambda}_{iF}^T \hat e_{iF} + \alpha \dot{\tilde e}_{iF} - \dot{\hat W}_{iF}^T \hat\kappa_{iF}, \tag{53}$$

$$P_{F2} = -\frac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \frac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}), \tag{54}$$

$$P_{F3} = \frac{1}{2} \tilde W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF}. \tag{55}$$

According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are

$$\begin{aligned}
\| P_{F1} \| &\le J_1\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right) \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|, \\
\| P_{F2} \| &\le \varsigma_1, \\
\| P_{F3} \| &\le \varsigma_2.
\end{aligned} \tag{56}$$

Combining (53) and (54) with (55), we get

$$\| \dot P_{F2} + \dot P_{F3} \| \le \varsigma_3 + \varsigma_4 J_2\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right) \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|, \tag{57}$$

where $\varphi_i(\tilde e_{iF}^T, E_i^T) = [\tilde e_{iF}^T \ E_i^T]^T$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\varsigma_i$ ($i = 1, 2, 3, 4$) are computable positive constants.
Theorem 5. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14) and the state equation (18). If the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and is bounded.
Proof. Define the Lyapunov function as follows:

$$V_{iL}(\tilde e_{iF}, E_i) = \frac{1}{2} E_i^T E_i + \frac{1}{2} \gamma\, \tilde e_{iF}^T \tilde e_{iF} + \varpi_i(t) + \phi_i(t). \tag{58}$$

In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ can be expressed as follows:

$$\begin{aligned}
\dot\varpi_i(t) &= -\left[ E_i^T \left( P_{F2} - \beta_1 \operatorname{sat}(\tilde e_{iF}) \right) + \dot{\tilde e}_{iF}^T P_{F3} - \beta_2 J_2\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right) \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \, \| \tilde e_{iF} \| \right], \\
\varpi_i(0) &= \beta_1 | \tilde e_{iF}(0) | - \tilde e_{iF}^T(0) \left( P_{F2}(0) + P_{F3}(0) \right),
\end{aligned} \tag{59}$$

$$\phi_i(t) = \frac{1}{4} \alpha \left[ \operatorname{tr}\left( \tilde W_{iF}^T \Gamma_{iWF}^{-1} \tilde W_{iF} \right) + \operatorname{tr}\left( \tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \tilde\Lambda_{iF} \right) \right], \tag{60}$$

where $\operatorname{tr}(\cdot)$ is the trace of a matrix. Define $d = [E_i^T \ \tilde e_{iF}^T \ \varpi_i^{1/2} \ \phi_i^{1/2}]$. The positive adaptation gains $\beta_1, \beta_2 \in \mathbb{R}$ are chosen to ensure $\varpi_i(t) \ge 0$, so we get

$$U_1(d) \le V_{iL}(\tilde e_{iF}, E_i) \le U_2(d), \tag{61}$$

where

$$U_1(d) = \frac{1}{2} \min(1, \gamma) \| d \|^2, \qquad U_2(d) = \max(1, \gamma) \| d \|^2. \tag{62}$$
The derivative of (58) is

$$\dot V_{iL}(\tilde e_{iF}, E_i) = \nabla V_{iL}^T K\left[ \dot E_i^T \ \ \dot{\tilde e}_{iF}^T \ \ \tfrac{1}{2} \varpi_i^{-1/2} \dot\varpi_i \ \ \tfrac{1}{2} \phi_i^{-1/2} \dot\phi_i \right]^T, \tag{63}$$

where $K[\cdot]$ denotes a Filippov set [27]. So $\dot V_{iL}(\tilde e_{iF}, E_i)$ can be written in the following form:

$$\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
={}& \left[ E_i^T \ \ \gamma \tilde e_{iF}^T \ \ 2\varpi_i^{1/2} \ \ 2\phi_i^{1/2} \right] K\left[ \dot E_i^T \ \ \dot{\tilde e}_{iF}^T \ \ \tfrac{1}{2} \varpi_i^{-1/2} \dot\varpi_i \ \ \tfrac{1}{2} \phi_i^{-1/2} \dot\phi_i \right]^T \\
\le{}& E_i^T \Big( \tfrac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\tilde e}_{iF} + \tfrac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot{\tilde e}_{iF} - \hat W_{iF}^T \hat\kappa'_{iF} \dot{\hat\Lambda}_{iF}^T \hat e_{iF} + \alpha \dot{\tilde e}_{iF} \\
& \quad - \tfrac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \tfrac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} - \gamma \tilde e_{iF} - \dot{\hat W}_{iF}^T \hat\kappa_{iF} \\
& \quad + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) + \tfrac{1}{2} \tilde W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \tfrac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF} \\
& \quad - k E_i - \beta_1 K[\operatorname{sat}(\tilde e_{iF})] \Big) \\
& - E_i^T \Big( -\tfrac{1}{2} W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \tfrac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} + W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1 K[\operatorname{sat}(\tilde e_{iF})] \Big) \\
& - \dot{\tilde e}_{iF}^T \Big( \tfrac{1}{2} \tilde W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \tfrac{1}{2} \hat W_{iF}^T \hat\kappa'_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF} \Big) + \gamma \tilde e_{iF}^T (E_i - \alpha \tilde e_{iF}) \\
& + \beta_2 J_2\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right) \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \, \| \tilde e_{iF} \| \\
& - \tfrac{1}{2} \alpha \operatorname{tr}\left( \tilde W_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat W}_{iF} \right) - \tfrac{1}{2} \alpha \operatorname{tr}\left( \tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat\Lambda}_{iF} \right).
\end{aligned} \tag{64}$$

Putting (53), (54), and (55) into (64), we get

$$\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
={}& E_i^T \left( P_{F1} + P_{F2} + P_{F3} - \beta_1 K[\operatorname{sat}(\tilde e_{iF})] - k E_i - \gamma \tilde e_{iF} \right) + \gamma \tilde e_{iF}^T (E_i - \alpha \tilde e_{iF}) \\
& - E_i^T \left( P_{F2} - \beta_1 K[\operatorname{sat}(\tilde e_{iF})] \right) - \dot{\tilde e}_{iF}^T P_{F3} \\
& + \beta_2 J_2\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right) \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \, \| \tilde e_{iF} \| \\
& - \tfrac{1}{2} \alpha \operatorname{tr}\left( \tilde W_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat W}_{iF} \right) - \tfrac{1}{2} \alpha \operatorname{tr}\left( \tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat\Lambda}_{iF} \right) \\
={}& -\alpha \gamma\, \tilde e_{iF}^T \tilde e_{iF} + (E_i^T - \dot{\tilde e}_{iF}^T) P_{F3} + \frac{J_1\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right)^2}{4 k_3} \\
& - \tfrac{1}{2} \alpha \operatorname{tr}\left( \tilde W_{iF}^T \hat\kappa'_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} \tilde e_{iF}^T \right) - \tfrac{1}{2} \alpha \operatorname{tr}\left( \tilde\Lambda_{iF}^T \dot{\hat e}_{iF} \tilde e_{iF}^T \hat W_{iF}^T \hat\kappa'_{iF} \right) \\
\le{}& -k_1 \| \tilde e_{iF} \|^2 - k_2 \| E_i \|^2 + \frac{J_1\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right)^2}{4 k_3} \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|^2 \\
& + \frac{\beta_2^2 J_2\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right)^2}{4 \alpha k_4} \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|^2,
\end{aligned} \tag{65}$$

where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4 / \beta_2^2\}$, and $J(\| \varphi_i(\tilde e_{iF}^T, E_i^T) \|)^2 = J_1(\| \varphi_i(\tilde e_{iF}^T, E_i^T) \|)^2 + J_2(\| \varphi_i(\tilde e_{iF}^T, E_i^T) \|)^2$, so the following conclusion can be obtained:

$$\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
&\le -k_{\min} \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|^2 + \frac{J\left( \| \varphi_i(\tilde e_{iF}^T, E_i^T) \| \right)^2 \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|^2}{4 \xi} \\
&\le -c\, \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|^2.
\end{aligned} \tag{66}$$

Therefore, for a positive constant $c$, $-c \| \varphi_i(\tilde e_{iF}^T, E_i^T) \|^2$ is a negative semidefinite function defined in the adjustable interval $D$, expressed as follows:

$$D = \left\{ d(t) \ \middle|\ \| d \| \le J^{-1}\left( 2 \sqrt{k_{\min} \xi} \right) \right\}, \tag{67}$$
so that, by Lyapunov stability theory, the system is stable.

In order to make the subsystem of the time varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, using a robust term to compensate for the neural network approximation errors. The robust control term is designed as

$$u_{irb} = \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}. \tag{68}$$

In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ satisfies

$$\begin{aligned}
N_{rb} \ge{}& \left[ \frac{\delta_{hi}^2}{2 n_1} + \frac{n_1 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \left( \nabla \varepsilon_{ic}(e_i)/2 \right) \right)^2}{2 n_2} + n_1 n_2 \| b_i(e_i) \|^2 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2 \right] \cdot \frac{e_i^T e_i + \zeta}{2 n_1 n_2 \| b_i(e_i) \| \, e_i^T e_i} \\
\ge{}& \left[ n_1^2 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2 + n_2 \delta_{hi}^2 + 2 n_1^2 n_2^2 \| b_i(e_i) \|^2 \left( -\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i) \frac{\nabla \varepsilon_{ic}(e_i)}{2} \right)^2 \right] \cdot \frac{e_i^T e_i + \zeta}{4 n_1^2 n_2^2 \| b_i(e_i) \| \, e_i^T e_i}.
\end{aligned} \tag{69}$$

Therefore, the global control law can be designed as follows:

$$u_{\mathrm{mix}} = \hat u_i + u_{irb} = -\frac{1}{2} R^{-1} b_i^T(e_i) S_{ia}(e_i)^T \hat W_{ia} + \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}. \tag{70}$$
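The robust term (68) and its combination with the action-NN output in (70) can be sketched as below. The function names and numerical values are hypothetical, and the action-NN output is passed in as a given vector of the same length as the error rather than computed from (34).

```python
import math


def robust_term(e, n_rb, zeta):
    """u_rb = N_rb * e / (e^T e + zeta), the form of (68), with zeta > 0."""
    denom = sum(x * x for x in e) + zeta
    return [n_rb * x / denom for x in e]


def mixed_control(u_hat, e, n_rb, zeta):
    """u_mix = u_hat + u_rb, the form of (70); u_hat is taken as given here."""
    return [uh + ur for uh, ur in zip(u_hat, robust_term(e, n_rb, zeta))]


def norm(v):
    """Euclidean norm of a list."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical gain and regularization constant.
n_rb, zeta = 2.0, 0.01
u_rb = robust_term([0.1, 0.0], n_rb, zeta)
u_mix = mixed_control([-1.0, 0.5], [0.1, 0.0], n_rb, zeta)
```

The constant $\zeta$ keeps the term well defined at $e_i = 0$ and bounds its magnitude: since $\|e_i\| / (\|e_i\|^2 + \zeta) \le 1 / (2\sqrt{\zeta})$, the robust term never exceeds $N_{rb} / (2\sqrt{\zeta})$ in norm, with the maximum reached at $\|e_i\| = \sqrt{\zeta}$.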
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the conditions on the system parameters and the assumptions hold, the critic-NN, the action-NN, and the identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the desired trajectory can be tracked asymptotically by the actual output.
Proof. Design the Lyapunov function as follows:

$$
V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2n_1}\operatorname{tr}\!\left(\tilde{W}_{ic}^T \tilde{W}_{ic}\right) + \frac{n_1}{2n_2}\operatorname{tr}\!\left(\tilde{W}_{ia}^T \tilde{W}_{ia}\right) + n_1 n_2\left[e_{iF}^T e_{iF} + \Xi\int_{0}^{\infty} r_i(e_i, u_{\mathrm{mix}})\, d\tau\right] \tag{71}
$$
where $\Xi > 0$ is a parameter to be determined and $t \le \tau < \infty$. The time derivative of (71) is as follows:

$$
\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
&= \frac{1}{2n_1}\operatorname{tr}\!\left(\tilde{W}_{ic}^T \dot{\tilde{W}}_{ic}\right) + \frac{n_1}{2n_2}\operatorname{tr}\!\left(\tilde{W}_{ia}^T \dot{\tilde{W}}_{ia}\right) + n_1 n_2\left[e_{iF}^T \dot{e}_{iF} + \Xi\, r_i(e_i, u_{\mathrm{mix}})\right] \\
&= \frac{1}{2n_1}\operatorname{tr}\,\tilde{W}_{ic}^T\left(-n_1 L_i\left(L_i^T \tilde{W}_{ic} + \delta_{hi}\right)\right) \\
&\quad + \frac{n_1}{2n_2}\operatorname{tr}\,\tilde{W}_{ia}^T\left[-n_2 S_{ia}(e_i)\left(\tilde{W}_{ia}^T S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \frac{1}{2}R^{-1} b_i(e_i)\nabla S_{ic}(e_i)^T \tilde{W}_{ic} - \frac{1}{2}R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)\right)\right] \\
&\quad + n_1 n_2\, e_{iF}^T\left(W_{iF}^T \kappa\!\left(\Lambda_{iF}^T e_i\right) + \varepsilon_{iF}(e_i) + b_i(e_i)\, u_{\mathrm{mix}}\right) + \Xi\left(e_i^T Q_e e_i + u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}}\right) \\
&\le -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2\right)\left\|\tilde{W}_{ic}\right\|^2 + \frac{1}{2n_1}\delta_{hi}^2 - \left(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2\right)\left\|\tilde{W}_{ia}\right\|^2 \\
&\quad + \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2 \left\|b_i(e_i)\right\|^2 \left\|\nabla S_{icM}^2\right\| \left\|\tilde{W}_{ic}\right\|^2 \\
&\quad + \frac{n_1}{2n_2}\left(\varepsilon_{ia}(e_i) + \frac{R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right)^{T}\left(\varepsilon_{ia}(e_i) + \frac{R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right) \\
&\quad + n_1 n_2 \left\|b_i(e_i)\right\|^2 \varepsilon_{ia}(e_i)^T \varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^2 \left\|b_i(e_i)\right\|^2 \left\|\tilde{W}_{ia}\right\|^2 \\
&\quad + n_1 n_2\, \Xi\, \lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 + n_1 n_2\left(\left\|b_i(e_i)\right\|^2 - \Xi\,\lambda_{\min}(R)\right)\left\|u_{\mathrm{mix}}\right\|^2
\end{aligned}
\tag{72}
$$
If the following inequalities are satisfied:

$$
\begin{aligned}
\lambda_{\min}(Q_e)\left\|e_{iF}\right\|_2^2 &\le e_{iF}^T Q_e e_{iF} \le \lambda_{\max}(Q_e)\left\|e_{iF}\right\|_2^2, \\
\lambda_{\min}(R)\left\|u_{\mathrm{mix}}\right\|_2^2 &\le u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \le \lambda_{\max}(R)\left\|u_{\mathrm{mix}}\right\|_2^2, \\
\Xi &> \frac{\left\|b_i(e_i)\right\|^2}{\lambda_{\min}(R)},
\end{aligned}
\tag{73}
$$
then $\dot{V}_{iu}(e_i, u_{\mathrm{mix}})$ can be further bounded as

$$
\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
&\le -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2 - \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2 \nabla S_{icM}^2\right)\left\|\tilde{W}_{ic}\right\|^2 \\
&\quad - \left(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2 - n_1 n_2 S_{iaM}^2\left\|b_i(e_i)\right\|^2\right)\left\|\tilde{W}_{ia}\right\|^2 \\
&\quad - n_1 n_2\left(\left\|b_i(e_i)\right\|^2 + \Xi\,\lambda_{\min}(R)\right)\left\|u_{\mathrm{mix}}\right\|^2 - n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 \\
&\le -n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2.
\end{aligned}
\tag{74}
$$

Therefore, we can conclude that $\dot{V}_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
4 Simulations
To verify the validity and convergence of the proposed ACI-based decentralized reinforcement learning robust optimal tracking control method, and to study the convergence of the error by comparing simulation results, two different configurations of the time-varying externally constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.
To facilitate the analysis, the configurations above are transformed into analytic charts, shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time-varying constraint joint and the base module.
The time-varying constraint can be defined as a column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:

$$
\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right].
\end{aligned}
\tag{75}
$$
In the equation above, the angle $\alpha(t)$ between the time-varying constraint and the $X$-axis is defined as follows:

$$
\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2} \tag{76}
$$
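The constraint functions in (75) and the constraint angle in (76) can be evaluated directly, as in the sketch below. It takes $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$ as read from (76). The link lengths are not listed in this section, so they are passed in as arguments, and the values used in the usage line are hypothetical.

```python
import math


def alpha(t):
    """Angle between the time-varying constraint and the X-axis, as read from (76)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def cot(x):
    """Cotangent; alpha(t) stays well away from multiples of pi, so sin(x) != 0 here."""
    return math.cos(x) / math.sin(x)


def psi_a(q1, q2, t, L1, L2, L3, L4):
    """Constraint function of configuration A, first line of (75)."""
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 * cot(alpha(t)))


def psi_b(q2, t, L1, L2, L3, L4):
    """Constraint function of configuration B, second line of (75)."""
    return L1 + L2 * math.cos(q2) - (L3 + L4 * cot(alpha(t)))


# Hypothetical link lengths, for illustration only.
residual = psi_b(math.pi / 2, 0.0, L1=1.0, L2=1.0, L3=0.5, L4=0.2)
```

A joint configuration satisfies the constraint when the corresponding function evaluates to zero, so the returned value can be read as a constraint residual.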
The initial joint positions are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial joint velocities are zero. The dynamic models of configurations A and B are as follows:

$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, &
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^2(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot q) &= \begin{bmatrix} -0.36\sin(q_2)\,\dot q_2 & -0.18\sin(q_2)\,\dot q_2 \\ 0.18\sin(q_2)(\dot q_1 - \dot q_2) & 0.18\sin(q_2)\,\dot q_1 \end{bmatrix}, &
C_B(q, \dot q) &= \begin{bmatrix} 0.1166\sin(2q_2)\,\dot q_2 & 0.06\sin(q_2)\,\dot q_2 \\ 0.06\sin(q_2)\,\dot q_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix}, &
G_B(q) &= \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix}, \\
F_A(q, \dot q) &= \begin{bmatrix} \dot q_1 + 10\sin(3q_1) + 2\operatorname{sgn}(\dot q_1) \\ 1.2\,\dot q_2 + 5\sin(2q_2) + \operatorname{sgn}(\dot q_2) \end{bmatrix}, &
F_B(q, \dot q) &= \begin{bmatrix} 0 \\ 1.5\,\dot q_2 + \sin(q_2) + 1.2\operatorname{sgn}(\dot q_2) \end{bmatrix}.
\end{aligned}
\tag{77}
$$
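As an illustration, the configuration A terms of (77) can be coded as plain functions. The sketch below assumes the usual rigid-body form $M(q)\ddot q + C(q,\dot q)\dot q + G(q) + F(q,\dot q) = \tau$ for combining them and ignores the constraint force; the function names are hypothetical.

```python
import math


def sgn(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)


def mass_a(q):
    c2 = math.cos(q[1])
    return [[0.36 * c2 + 0.6066, 0.18 * c2 + 0.1233],
            [0.18 * c2 + 0.1233, 0.1233]]


def coriolis_a(q, dq):
    s2 = math.sin(q[1])
    return [[-0.36 * s2 * dq[1], -0.18 * s2 * dq[1]],
            [0.18 * s2 * (dq[0] - dq[1]), 0.18 * s2 * dq[0]]]


def gravity_a(q):
    return [-5.88 * math.sin(q[0] + q[1]) - 17.64 * math.sin(q[0]),
            -5.88 * math.sin(q[0] + q[1])]


def friction_a(q, dq):
    return [dq[0] + 10.0 * math.sin(3.0 * q[0]) + 2.0 * sgn(dq[0]),
            1.2 * dq[1] + 5.0 * math.sin(2.0 * q[1]) + sgn(dq[1])]


def inverse_dynamics_a(q, dq, ddq):
    """Joint torques for a given motion, assuming M*ddq + C*dq + G + F = tau."""
    M, C = mass_a(q), coriolis_a(q, dq)
    G, F = gravity_a(q), friction_a(q, dq)
    return [sum(M[i][j] * ddq[j] + C[i][j] * dq[j] for j in range(2)) + G[i] + F[i]
            for i in range(2)]
```

Configuration B follows the same pattern with the second column of matrices in (77).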
The desired trajectories of configurations A and B are as follows.

Configuration A:

$$
\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\!\left[\frac{L_1\sin\!\left(\alpha(t) - y_{1d}\right) - L_3\sin\!\left(\alpha(t)\right)}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{78}
$$
Figure 3 Configuration A for simulation
Figure 4 Configuration B for simulation
Configuration B:

$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\!\left[\frac{L_1\sin\!\left(\alpha(t)\right) - L_3\sin\!\left(\alpha(t)\right)}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{79}
$$
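The desired trajectories in (78) and (79) share the same map $\Theta$, which the following sketch makes explicit. It takes $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$ as read from (76). The link lengths are arguments because their values are not listed in this section, and the numbers in the usage line are hypothetical.

```python
import math


def alpha(t):
    """Constraint angle, as read from (76)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def theta_map(y1d, t, L1, L2, L3):
    """Theta(y1d, t) from (78); math.asin raises ValueError if the ratio leaves [-1, 1]."""
    a = alpha(t)
    ratio = (L1 * math.sin(a - y1d) - L3 * math.sin(a)) / L2
    return math.asin(ratio) + a


def y1d_a(t):
    """Desired joint 1 trajectory of configuration A."""
    return 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)


def y2d_a(t, L1, L2, L3):
    """Desired joint 2 trajectory of configuration A."""
    return theta_map(y1d_a(t), t, L1, L2, L3)


def y2d_b(t, L1, L2, L3):
    """Desired joint 2 trajectory of configuration B, where y1d = 0 as in (79)."""
    return theta_map(0.0, t, L1, L2, L3)


# Hypothetical link lengths, for illustration only.
sample = y2d_a(1.0, L1=0.3, L2=1.0, L3=0.2)
```

Setting `y1d = 0` in `theta_map` reproduces the bracket of (79), which is why configuration B needs no separate formula.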
Because of the dynamic external constraint, the variation of joint 1 in configuration B is zero.
To confirm that the adopted method can be applied to different configurations, and to verify the tracking performance for the subsystem desired trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part with the classic RBF neural network control method and with the ACI-based decentralized reinforcement learning robust optimal tracking control method.
Figures 7–14 show the joint tracking and error curves obtained with the classic RBF neural network [4], which is used to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training process and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time-varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness and the tracking errors grow.

Figure 5: The analytic chart of configuration A.

Figure 6: The analytic chart of configuration B.
Figures 15–22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds under the proposed robust reinforcement learning optimal control method. This is due to the identifying ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. Moreover, the robustness of the joint subsystems is evident when the proposed decentralized control method is adopted.
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method can fully satisfy the accessibility requirement of the reconfigurable modular robot. Moreover, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 8: Trajectory tracking curve of configuration A, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

Parameter | $k$ | $\alpha$ | $\upsilon$ | $\eta_{a1}$ | $\eta_{a2}$ | $\eta_c$ | $\beta_1$ | $\beta_2$ | $\gamma$
Value | 800 | 300 | 0.005 | 10 | 50 | 20 | 0.2 | 2 | 0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time-varying constrained reconfigurable modular robot. The joint variables track the desired trajectory within a very short time in both configurations, and the fluctuation of the error convergence range is minimal.
Figure 9: Trajectory tracking curve of configuration B, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 10: Trajectory tracking curve of configuration B, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).

Figure 11: Tracking error curve of configuration A, joint 1, with RBF neural network (joint 1 error in rad versus time in s).

Figure 12: Tracking error curve of configuration A, joint 2, with RBF neural network (joint 2 error in rad versus time in s).

Figure 13: Tracking error curve of configuration B, joint 1, with RBF neural network (joint 1 error in rad versus time in s).

Figure 14: Tracking error curve of configuration B, joint 2, with RBF neural network (joint 2 error in rad versus time in s).

Figure 15: Trajectory tracking curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 16: Trajectory tracking curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
5 Conclusions and Future Work
In this paper, by combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time-varying constrained reconfigurable modular robots. This theory is used to solve the continuous-time nonlinear optimal control policy problem for strongly coupled uncertain robotic systems. First, we built the subsystem model with time-varying external constraints and described the global robot system as a synthesis of interconnected subsystems. Second, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function takes the place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Third, we designed a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and of the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.

Figure 17: Trajectory tracking curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 18: Trajectory tracking curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
In the future, more complex configurations of the time-varying externally constrained reconfigurable modular robot can be included. A decentralized controller with higher control and error precision will therefore be considered.
Figure 19: Tracking error curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).

Figure 20: Tracking error curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).

Figure 21: Tracking error curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).

Figure 22: Tracking error curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).

Figure 23: 3D-tip trajectory curve of configuration A with ACI (desired and actual trajectories).

Figure 24: 3D-tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] S. Richard and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Figure 2: The architecture of action-critic-identifier (blocks: action, critic, identifier, subsystem, reward function, and HJB error).
where $W^{*}$ denotes the ideal neural network weights and $\varepsilon(x)$ represents the estimation error. With a sufficient number of nodes, and if the centers and widths of the nodes are chosen appropriately, any continuous function can be approximated by an RBF-NN. Therefore, the optimal $Q$-function and the optimal control policy can be expressed as follows:

$$
\begin{aligned}
Q_i^{*} &= W_i^T S_i(e_i) + \varepsilon_{ic}(e_i), \\
u_i^{*}(e_i) &= -\frac{1}{2} R^{-1} b_i^T(e_i)\left[S_i(e_i)^T W_i + \varepsilon_{ia}(e_i)\right],
\end{aligned}
\tag{32}
$$
where $S_i(e_i) = [s_{i1}(e_i) \cdots s_{in}(e_i)]^T$ is the smooth basis function of the neural network, $W_i$ is the ideal unknown neural network weight, and $\varepsilon_{ic}(e_i)$ and $\varepsilon_{ia}(e_i)$ are the estimation errors. Using $\hat{Q}_i$ and $\hat{u}_i(e_i)$ to estimate $Q_i^{*}$ and $u_i^{*}(e_i)$, we obtain the following equations:

$$
\hat{Q}_i = \hat{W}_{ic}^T S_{ic}(e_i), \tag{33}
$$

$$
\hat{u}_i(e_i) = -\frac{1}{2} R^{-1} b_i^T(e_i)\, S_{ia}(e_i)^T \hat{W}_{ia}. \tag{34}
$$
In the equations above, $\hat{W}_{ic}(t)$ and $\hat{W}_{ia}(t)$ are the weights of the critic-NN and action-NN. The estimation errors of the weights are as follows:

$$
\tilde{W}_{ic}(t) = W_i - \hat{W}_{ic}(t), \tag{35}
$$

$$
\tilde{W}_{ia}(t) = W_i - \hat{W}_{ia}(t). \tag{36}
$$
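The critic estimate (33) and the weight errors (35) and (36) can be sketched as follows. The paper does not specify the basis functions in this section, so Gaussian radial basis functions with hypothetical centers and a common width are assumed; all names are illustrative.

```python
import math


def gaussian_basis(e, centers, width):
    """Gaussian RBF vector S(e); centers and width are assumed design choices."""
    return [math.exp(-sum((ei - ci) ** 2 for ei, ci in zip(e, c)) / (2.0 * width ** 2))
            for c in centers]


def q_hat(w_ic_hat, basis):
    """Critic estimate Q_hat = W_ic_hat^T S_ic(e), as in (33)."""
    return sum(w * s for w, s in zip(w_ic_hat, basis))


def weight_error(w_ideal, w_hat):
    """Weight estimation error W_tilde = W - W_hat, as in (35) and (36)."""
    return [wi - wh for wi, wh in zip(w_ideal, w_hat)]


# Two hypothetical nodes on a two-dimensional error vector.
basis = gaussian_basis([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], width=1.0)
estimate = q_hat([2.0, 0.0], basis)
```

The ideal weights are unknown in practice, so `weight_error` is an analysis quantity rather than something the controller can compute online.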
The update law of the weight for the critic-NN is a gradient descent algorithm:

$$
\dot{\hat{W}}_{ic}(t) = -n_1 L_i\left(L_i^T \hat{W}_{ic} + e_i^T Q_e e_i + u_i^T R\, u_i\right). \tag{37}
$$

In the equation above, $n_1 > 0$ is the adaptive gain of the neural network, and $L_i$ and $l_i$ are defined as

$$
L_i = \frac{l_i}{l_i^T l_i + 1}, \qquad l_i = \nabla S_{ic}(e_i)\, \dot{e}_i. \tag{38}
$$
Therefore, according to the definition above, the following inequalities can be obtained:

$$
L_{im} \le L_i \le L_{iM}, \qquad S_{icm} \le S_{ic}(e_i) \le S_{icM}, \qquad S_{iam} \le S_{ia}(e_i) \le S_{iaM}. \tag{39}
$$
Combining (35) with (38), we obtain

$$
\dot{\tilde{W}}_{ic}(t) = -n_1 L_i\left(L_i^T \tilde{W}_{ic} + \delta_{hi}\right). \tag{40}
$$
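One explicit Euler step of the critic update (37), with the normalized regressor of (38), can be sketched as below. The step size `dt` and the use of explicit Euler integration are assumptions made for illustration, and the stage cost $e_i^T Q_e e_i + u_i^T R u_i$ is passed in as a precomputed number.

```python
def normalized_regressor(grad_s_ic, e_dot):
    """L = l / (l^T l + 1) with l = grad S_ic(e) * e_dot, as in (38).

    grad_s_ic is the Jacobian of the critic basis (one row per basis function).
    """
    l = [sum(g * d for g, d in zip(row, e_dot)) for row in grad_s_ic]
    denom = sum(x * x for x in l) + 1.0
    return [x / denom for x in l]


def critic_step(w_ic_hat, grad_s_ic, e_dot, stage_cost, n1, dt):
    """One Euler step of (37): W_dot = -n1 * L * (L^T W + stage_cost)."""
    L = normalized_regressor(grad_s_ic, e_dot)
    residual = sum(Lj * wj for Lj, wj in zip(L, w_ic_hat)) + stage_cost
    return [w - dt * n1 * Lj * residual for w, Lj in zip(w_ic_hat, L)]
```

The normalization by $l_i^T l_i + 1$ keeps the regressor bounded, which is what makes the bounds on $L_i$ in (39) available.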
The update law of the weight for the action-NN is also developed as a gradient descent algorithm:

$$
\dot{\hat{W}}_{ia}(t) = -n_2 S_{ia}(e_i)\left(\hat{W}_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\nabla S_{ic}(e_i)^T \hat{W}_{ic}\right)^{T}. \tag{41}
$$
According to the estimation error of the action-NN in (36), and since the optimal control $u_i^{*}(e_i)$ minimizes the optimal $Q$-function, we obtain the following equation:

$$
W_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\nabla S_{ic}(e_i)^T W_{ic} + \frac{1}{2} R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0. \tag{42}
$$
Substituting (41) into (42), we obtain

$$
\dot{\tilde{W}}_{ia} = -n_2 S_{ia}(e_i)\left(\tilde{W}_{ia}^T S_{ia}(e_i) + \frac{1}{2} R^{-1} b_i(e_i)\nabla S_{ic}(e_i)^T \tilde{W}_{ic} - \frac{1}{2} R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i)\right). \tag{43}
$$
After using the critic-NN and action-NN to estimate $\hat{Q}_i$ and $\hat{u}_i(e_i)$, we need to design a robust RBF-NN identifier to identify the nonlinear uncertainties of the subsystem. Here $\Phi_i(e_i, \hat{u}_i)$ can be expressed as follows:

$$
\Phi_i(e_i, \hat{u}_i) = \dot{e}_{iF} = W_{iF}^T \kappa\!\left(\Lambda_{iF}^T e_{iF}\right) + \varepsilon_{iF}(e_{iF}) + b_i(e_i)\,\hat{u}_i, \tag{44}
$$
where $\kappa(\cdot)$ is the basis function of the neural network and $W_{iF}$, $\Lambda_{iF}$ are the unknown ideal neural network weights. Equation (44) can be identified by the robust RBF-NN identifier, so we obtain

$$
\hat{\Phi}_i(e_i, \hat{u}_i) = \dot{\hat{e}}_{iF} = \hat{W}_{iF}^T \hat{\kappa}_{iF} + b_i(e_i)\,\hat{u}_i + \mu_i. \tag{45}
$$
Here $\hat{\kappa}_{iF}$ is the estimated value of the basis function of the neural network, $\hat{W}_{iF}$ and $\hat{\Lambda}_{iF}$ are the estimated neural network weights, and $\mu_i \in \mathbb{R}$ is the feedback error term, given as follows [26]:

$$
\begin{aligned}
\mu_i &= k\left(e_{iF}(t) - \hat{e}_{iF}(t)\right) - k\left(e_{iF}(0) - \hat{e}_{iF}(0)\right) + \vartheta \\
&= k\left(\tilde{e}_{iF}(t) - \tilde{e}_{iF}(0)\right) + \vartheta, \\
\dot{\vartheta} &= (k\alpha + \gamma)\,\tilde{e}_{iF} + \beta_1 \operatorname{sat}\!\left(\tilde{e}_{iF}\right),
\end{aligned}
\tag{46}
$$
where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\operatorname{sat}(\cdot)$ is a saturation function. Therefore, the dynamics of the state estimation error of the identifier-NN can be expressed as follows:

$$
\dot{\tilde{e}}_{iF} = \dot{e}_{iF} - \dot{\hat{e}}_{iF} = W_{iF}^T \kappa_{iF} - \hat{W}_{iF}^T \hat{\kappa}_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i. \tag{47}
$$
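The feedback term of (46) can be sketched as a small discrete-time routine. Explicit Euler integration of $\vartheta$, the step size `dt`, and a unit saturation limit are assumptions made for illustration; the names are hypothetical.

```python
def sat(x, limit=1.0):
    """Saturation function; the unit limit is an assumed choice."""
    return max(-limit, min(limit, x))


def feedback_step(e_tilde, e_tilde0, theta, k, alpha, gamma, beta1, dt):
    """Return (mu, next_theta) following (46).

    e_tilde  : current state estimation error e_iF - e_iF_hat
    e_tilde0 : state estimation error at t = 0
    theta    : current value of the auxiliary state (vartheta)
    """
    mu = k * (e_tilde - e_tilde0) + theta
    theta_dot = (k * alpha + gamma) * e_tilde + beta1 * sat(e_tilde)
    return mu, theta + dt * theta_dot


# Illustrative call with small placeholder gains.
mu, theta_next = feedback_step(0.2, 0.0, 0.0, k=2.0, alpha=3.0, gamma=1.0, beta1=0.5, dt=0.1)
```

Because $\vartheta$ integrates the estimation error, the feedback term keeps growing while the identifier is biased, which is the integral action that pushes the error in (47) toward zero.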
A filtered identification error is defined as follows:

$$
E_i = \dot{\tilde{e}}_{iF} + \alpha\,\tilde{e}_{iF}. \tag{48}
$$
The time derivative of the equation above is

$$
\begin{aligned}
\dot{E}_i &= W_{iF}^T \kappa'_{iF} \Lambda_{iF}^T \dot{e}_{iF} - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \alpha\,\dot{\tilde{e}}_{iF} - kE_i - \gamma\,\tilde{e}_{iF} \\
&\quad - \dot{\hat{W}}_{iF}^T \hat{\kappa}_{iF} + \dot{\varepsilon}_{iF}(e_{iF}) - \beta_1\operatorname{sat}\!\left(\tilde{e}_{iF}\right) - \hat{W}_{iF}^T \hat{\kappa}'_{iF} \dot{\hat{\Lambda}}_{iF}^T \hat{e}_{iF}.
\end{aligned}
\tag{49}
$$
Here the weights $\hat{W}_{iF}$, $\hat{\Lambda}_{iF}$ of the identification-NN can be updated by

$$
\begin{aligned}
\dot{\hat{W}}_{iF} &= \operatorname{proj}\!\left(\Gamma_{iWF}\, \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T\right), \\
\dot{\hat{\Lambda}}_{iF} &= \operatorname{proj}\!\left(\Gamma_{i\Lambda F}\, \dot{\hat{e}}_{iF}\, \tilde{e}_{iF}^T \hat{W}_{iF}^T \hat{\kappa}'_{iF}\right),
\end{aligned}
\tag{50}
$$
where $\Gamma_{iWF}$, $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. To analyze the convergence of the filtered identification error, $\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}$ can be divided into the following form:

$$
\begin{aligned}
\hat{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF}
&= \frac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[\left(\Lambda_{iF}^T - \hat{\Lambda}_{iF}^T\right)\left(W_{iF}^T - \hat{W}_{iF}^T\right) + \left(W_{iF}^T - \hat{W}_{iF}^T\right)\left(\Lambda_{iF}^T - \hat{\Lambda}_{iF}^T\right)\right] \\
&= \frac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[\hat{W}_{iF}^T\left(\Lambda_{iF}^T - \hat{\Lambda}_{iF}^T\right) + \left(W_{iF}^T - \hat{W}_{iF}^T\right)\hat{\Lambda}_{iF}^T - W_{iF}^T\left(\Lambda_{iF}^T - \hat{\Lambda}_{iF}^T\right) - \left(W_{iF}^T - \hat{W}_{iF}^T\right)\Lambda_{iF}^T\right] \\
&= \frac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[\hat{W}_{iF}^T\left(\Lambda_{iF}^T - \hat{\Lambda}_{iF}^T\right) + \left(W_{iF}^T - \hat{W}_{iF}^T\right)\hat{\Lambda}_{iF}^T\right] \\
&\quad - \frac{1}{2}\hat{\kappa}'_{iF}\dot{\hat{e}}_{iF}\left[W_{iF}^T\left(\Lambda_{iF}^T - \hat{\Lambda}_{iF}^T\right) + \left(W_{iF}^T - \hat{W}_{iF}^T\right)\Lambda_{iF}^T\right] \\
&= \frac{1}{2}W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\tilde{e}}_{iF} + \frac{1}{2}\hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{\tilde{e}}_{iF} - \frac{1}{2}W_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{e}_{iF} - \frac{1}{2}\hat{W}_{iF}^T \hat{\kappa}'_{iF} \Lambda_{iF}^T \dot{e}_{iF} \\
&\quad + \frac{1}{2}\tilde{W}_{iF}^T \hat{\kappa}'_{iF} \hat{\Lambda}_{iF}^T \dot{\hat{e}}_{iF} + \frac{1}{2}\hat{W}_{iF}^T \hat{\kappa}'_{iF} \tilde{\Lambda}_{iF}^T \dot{\hat{e}}_{iF},
\end{aligned}
\tag{51}
$$
where $\tilde W_{iF}^T = W_{iF}^T - \hat W_{iF}^T$ and $\tilde\Lambda_{iF}^T = \Lambda_{iF}^T - \hat\Lambda_{iF}^T$. Substituting (51) into (49), (49) can be reduced to the following form:
$$
\dot E_i = P_{F1} + P_{F2} + P_{F3} - k E_i - \gamma\,\tilde e_{iF} - \beta_1\,\mathrm{sat}(\tilde e_{iF}).
\tag{52}
$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ can be expressed, respectively, as follows:
$$
P_{F1} = \tfrac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\tilde e}_{iF} + \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot{\tilde e}_{iF} - \hat W_{iF}^T\hat\kappa'_{iF}\dot{\hat\Lambda}_{iF}^T\hat e_{iF} + \alpha\,\dot{\tilde e}_{iF} - \dot{\hat W}_{iF}^T\hat\kappa_{iF},
\tag{53}
$$
$$
P_{F2} = -\tfrac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}),
\tag{54}
$$
$$
P_{F3} = \tfrac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF}.
\tag{55}
$$
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are
$$
\begin{aligned}
\left\|P_{F1}\right\| &\le J_1\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|, \\
\left\|P_{F2}\right\| &\le \varsigma_1, \\
\left\|P_{F3}\right\| &\le \varsigma_2.
\end{aligned}
\tag{56}
$$
Combining (53) and (54) with (55), we get
$$
\left\|\dot P_{F2} + \dot P_{F3}\right\| \le \varsigma_3 + \varsigma_4\,J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|,
\tag{57}
$$
where $\varphi_i(\tilde e_{iF}^T, E_i^T) = [\tilde e_{iF}^T\ E_i^T]^T$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\varsigma_i$ ($i = 1, 2, 3, 4$) are computable positive constants.
Theorem 5. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14) and the state equation (18). If the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and is bounded.
Proof. Define the Lyapunov function as follows:
$$
V_{iL}(\tilde e_{iF}, E_i) = \tfrac{1}{2}E_i^T E_i + \tfrac{1}{2}\gamma\,\tilde e_{iF}^T\tilde e_{iF} + \varpi_i(t) + \phi_i(t).
\tag{58}
$$
In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ can be expressed as follows:
$$
\begin{aligned}
\dot\varpi_i(t) &= -\left[E_i^T\left(P_{F2} - \beta_1\,\mathrm{sat}(\tilde e_{iF})\right) + \dot{\tilde e}_{iF}^T P_{F3} - \beta_2\,J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\|\right], \\
\varpi_i(0) &= \beta_1\left|\tilde e_{iF}(0)\right| - \tilde e_{iF}^T(0)\left(P_{F2}(0) + P_{F3}(0)\right),
\end{aligned}
\tag{59}
$$
$$
\phi_i(t) = \tfrac{1}{4}\alpha\left[\mathrm{tr}\left(\tilde W_{iF}^T\Gamma_{iWF}^{-1}\tilde W_{iF}\right) + \mathrm{tr}\left(\tilde\Lambda_{iF}^T\Gamma_{i\Lambda F}^{-1}\tilde\Lambda_{iF}\right)\right],
\tag{60}
$$
where $\mathrm{tr}(\cdot)$ represents the trace of a matrix. Define $d = [E_i^T\ \tilde e_{iF}^T\ \varpi_i^{1/2}\ \phi_i^{1/2}]$; $\beta_1, \beta_2 \in \mathbb{R}$ are positive adaptation gains, chosen to ensure $\varpi_i(t) \ge 0$, so we get
$$
U_1(d) \le V_{iL}(\tilde e_{iF}, E_i) \le U_2(d),
\tag{61}
$$
where
$$
U_1(d) = \tfrac{1}{2}\min(1, \gamma)\left\|d\right\|^2, \qquad U_2(d) = \max(1, \gamma)\left\|d\right\|^2.
\tag{62}
$$
The time derivative of (58) is as follows:
$$
\dot V_{iL}(\tilde e_{iF}, E_i) = \nabla V_{iL}^T\,K\left[\dot E_i^T\ \ \dot{\tilde e}_{iF}^T\ \ \frac{1}{2\varpi_i^{1/2}}\dot\varpi_i\ \ \frac{1}{2\phi_i^{1/2}}\dot\phi_i\right]^T,
\tag{63}
$$
where $K[\cdot]$ denotes a Filippov set [27]. So $\dot V_{iL}(\tilde e_{iF}, E_i)$ can be rewritten in the following form:
$$
\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
={}& \left[E_i^T\ \ \gamma\tilde e_{iF}^T\ \ 2\varpi_i^{1/2}\ \ 2\phi_i^{1/2}\right] K\left[\dot E_i^T\ \ \dot{\tilde e}_{iF}^T\ \ \frac{1}{2\varpi_i^{1/2}}\dot\varpi_i\ \ \frac{1}{2\phi_i^{1/2}}\dot\phi_i\right]^T \\
\le{}& E_i^T\Big(\tfrac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\tilde e}_{iF} + \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot{\tilde e}_{iF} - \hat W_{iF}^T\hat\kappa'_{iF}\dot{\hat\Lambda}_{iF}^T\hat e_{iF} + \alpha\,\dot{\tilde e}_{iF} \\
& \quad - \tfrac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} - \gamma\,\tilde e_{iF} - \dot{\hat W}_{iF}^T\hat\kappa_{iF} \\
& \quad + W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) + \tfrac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF} \\
& \quad - k E_i - \beta_1 K\left[\mathrm{sat}(\tilde e_{iF})\right]\Big) \\
& - E_i^T\Big(-\tfrac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1 K\left[\mathrm{sat}(\tilde e_{iF})\right]\Big) \\
& - \dot{\tilde e}_{iF}^T\Big(\tfrac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \tfrac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF}\Big) + \gamma\,\tilde e_{iF}^T\left(E_i - \alpha\,\tilde e_{iF}\right) \\
& + \beta_2\,J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\| \\
& - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde W_{iF}^T\Gamma_{iWF}^{-1}\dot{\hat W}_{iF}\right) - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde\Lambda_{iF}^T\Gamma_{i\Lambda F}^{-1}\dot{\hat\Lambda}_{iF}\right).
\end{aligned}
\tag{64}
$$
Substituting (53), (54), and (55) into (64), we get
$$
\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
={}& E_i^T\left(P_{F1} + P_{F2} + P_{F3} - \beta_1 K\left[\mathrm{sat}(\tilde e_{iF})\right] - k E_i - \gamma\,\tilde e_{iF}\right) + \gamma\,\tilde e_{iF}^T\left(E_i - \alpha\,\tilde e_{iF}\right) \\
& - E_i^T\left(P_{F2} - \beta_1 K\left[\mathrm{sat}(\tilde e_{iF})\right]\right) - \dot{\tilde e}_{iF}^T P_{F3} \\
& + \beta_2\,J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\| \\
& - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde W_{iF}^T\Gamma_{iWF}^{-1}\dot{\hat W}_{iF}\right) - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde\Lambda_{iF}^T\Gamma_{i\Lambda F}^{-1}\dot{\hat\Lambda}_{iF}\right) \\
={}& -\alpha\gamma\,\tilde e_{iF}^T\tilde e_{iF} + \left(E_i^T - \dot{\tilde e}_{iF}^T\right)P_{F3} + \frac{J_1\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2}{4k_3} \\
& - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF}\tilde e_{iF}^T\right) - \tfrac{1}{2}\alpha\,\mathrm{tr}\left(\tilde\Lambda_{iF}^T\dot{\hat e}_{iF}\tilde e_{iF}^T\hat W_{iF}^T\hat\kappa'_{iF}\right) \\
\le{}& -k_1\left\|\tilde e_{iF}\right\|^2 - k_2\left\|E_i\right\|^2 + \frac{J_1\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2}{4k_3}\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2 \\
& + \frac{\beta_2^2\,J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2}{4\alpha k_4}\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2,
\end{aligned}
\tag{65}
$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4/\beta_2^2\}$, and $J\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2 = J_1\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2 + J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2$, so the following conclusion can be obtained:
$$
\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i) &\le -k_{\min}\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2 + \frac{J\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2}{4\xi} \\
&\le -c\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2.
\end{aligned}
\tag{66}
$$
Therefore, for a positive constant $c$, $-c\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2$ is a negative semidefinite function defined on the adjustable domain $D$, expressed as follows:
$$
D = \left\{d(t)\ \middle|\ \left\|d\right\| \le J^{-1}\left(2\sqrt{k_{\min}\,\xi}\right)\right\},
\tag{67}
$$
so that Lyapunov stability theory shows that the system is stable. In order to make the subsystem of the time varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, using a robust term to compensate the neural network approximation errors. The robust control term is designed as
$$
u_{irb} = \frac{N_{rb}\,e_i}{e_i^T e_i + \zeta}.
\tag{68}
$$
In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ can be expressed as
$$
\begin{aligned}
N_{rb} \ge{}& \left[\frac{\delta_{hi}^2}{2n_1} + \frac{n_1\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\left(\nabla\varepsilon_{ic}(e_i)/2\right)\right)^2}{2n_2} + n_1 n_2\left\|b_i(e_i)\right\|^2\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2\right] \\
& \cdot \frac{e_i^T e_i + \zeta}{2n_1 n_2\left\|b_i(e_i)\right\| e_i^T e_i} \\
\ge{}& \left[n_1^2\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2 + n_2\,\delta_{hi}^2 + 2n_1^2 n_2^2\left\|b_i(e_i)\right\|^2\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2\right] \\
& \cdot \frac{e_i^T e_i + \zeta}{4n_1^2 n_2^2\left\|b_i(e_i)\right\| e_i^T e_i}.
\end{aligned}
\tag{69}
$$
Therefore, the global control law can be designed as follows:
$$
u_{mix} = u_i + u_{irb} = -\tfrac{1}{2}R^{-1}b_i^T(e_i)\,S_{ia}(e_i)^T\hat W_{ia} + \frac{N_{rb}\,e_i}{e_i^T e_i + \zeta}.
\tag{70}
$$
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the desired trajectory can be tracked asymptotically by the actual output.
Proof. Design the Lyapunov function as follows:
$$
V_{iu}(e_i, u_{mix}) = \frac{1}{2n_1}\mathrm{tr}\left\{\tilde W_{ic}^T\tilde W_{ic}\right\} + \frac{n_1}{2n_2}\mathrm{tr}\left\{\tilde W_{ia}^T\tilde W_{ia}\right\} + n_1 n_2\left[e_{iF}^T e_{iF} + \Xi\int_0^\infty r_i(e_i, u_{mix})\,d\tau\right],
\tag{71}
$$
where $\Xi > 0$ is the undetermined parameter and $t \le \tau < \infty$. The time derivative of (71) is as follows:
$$
\begin{aligned}
\dot V_{iu}(e_i, u_{mix})
={}& \frac{1}{2n_1}\mathrm{tr}\left\{\tilde W_{ic}^T\dot{\tilde W}_{ic}\right\} + \frac{n_1}{2n_2}\mathrm{tr}\left\{\tilde W_{ia}^T\dot{\tilde W}_{ia}\right\} + n_1 n_2\left[e_{iF}^T\dot e_{iF} + \Xi\,r_i(e_i, u_{mix})\right] \\
={}& \frac{1}{2n_1}\mathrm{tr}\left\{\tilde W_{ic}^T\left(-n_1 L_i\left(L_i^T\tilde W_{ic} + \delta_{hi}\right)\right)\right\} \\
& + \frac{n_1}{2n_2}\mathrm{tr}\left\{\tilde W_{ia}^T\left[-n_2 S_{ia}(e_i)\left(\tilde W_{ia}^T S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \tfrac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T\tilde W_{ic} - \tfrac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)\right)\right]\right\} \\
& + n_1 n_2\,e_{iF}^T\left(W_{iF}^T\kappa\left(\Lambda_{iF}^T e_i\right) + \varepsilon_{iF}(e_i) + b_i(e_i)\,u_{mix}\right) + \Xi\left(e_i^T Q_e e_i + u_{mix}^T R\,u_{mix}\right) \\
\le{}& -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2\right)\left\|\tilde W_{ic}\right\|^2 + \frac{1}{2n_1}\delta_{hi}^2 - \left(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2\right)\left\|\tilde W_{ia}\right\|^2 \\
& + \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2\left\|\nabla S_{icM}^2\right\|\left\|\tilde W_{ic}\right\|^2 \\
& + \frac{n_1}{2n_2}\left(\varepsilon_{ia}(e_i) + R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^T\cdot\left(\varepsilon_{ia}(e_i) + R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right) \\
& + n_1 n_2\left\|b_i(e_i)\right\|^2\varepsilon_{ia}(e_i)^T\varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^2\left\|b_i(e_i)\right\|^2\left\|\tilde W_{ia}\right\|^2 \\
& + n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 + n_1 n_2\left(\left\|b_i(e_i)\right\|^2 - \Xi\,\lambda_{\min}(R)\right)\left\|u_{mix}\right\|^2.
\end{aligned}
\tag{72}
$$
If the following inequalities are satisfied,
$$
\begin{aligned}
& \lambda_{\min}(Q_e)\left\|e_{iF}\right\|_2^2 \le e_{iF}^T Q_e e_{iF} \le \lambda_{\max}(Q_e)\left\|e_{iF}\right\|_2^2, \\
& \lambda_{\min}(R)\left\|u_{mix}\right\|_2^2 \le u_{mix}^T R\,u_{mix} \le \lambda_{\max}(R)\left\|u_{mix}\right\|_2^2, \\
& \Xi > \frac{\left\|b_i(e_i)\right\|^2}{\lambda_{\min}(R)},
\end{aligned}
\tag{73}
$$
then $\dot V_{iu}(e_i, u_{mix})$ can be further transformed as
$$
\begin{aligned}
\dot V_{iu}(e_i, u_{mix})
\le{}& -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2 - \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2\nabla S_{icM}^2\right)\left\|\tilde W_{ic}\right\|^2 \\
& - \left(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2 - n_1 n_2 S_{iaM}^2\left\|b_i(e_i)\right\|^2\right)\left\|\tilde W_{ia}\right\|^2 \\
& - n_1 n_2\left(\left\|b_i(e_i)\right\|^2 + \Xi\,\lambda_{\min}(R)\right)\left\|u_{mix}\right\|^2 - n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 \\
\le{}& -n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2.
\end{aligned}
\tag{74}
$$
Therefore, we can conclude that $\dot V_{iu}(e_i, u_{mix}) < 0$.
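The first two conditions in (73) are the Rayleigh quotient bounds, which hold for any symmetric weighting matrix, while the third condition fixes a lower limit on $\Xi$. The sketch below checks both numerically for given matrices. It is an illustration only: the function names and the sample matrices used with it are assumptions, not values from the paper.

```python
import numpy as np

def rayleigh_bounds_hold(Q, x, tol=1e-12):
    """Check lambda_min(Q)*||x||^2 <= x^T Q x <= lambda_max(Q)*||x||^2 for symmetric Q."""
    Q = np.asarray(Q, dtype=float)
    x = np.asarray(x, dtype=float)
    lam = np.linalg.eigvalsh(Q)          # eigenvalues in ascending order
    quad = x @ Q @ x
    n2 = x @ x
    return lam[0] * n2 - tol <= quad <= lam[-1] * n2 + tol

def min_Xi(b_norm, R):
    """Lower limit on Xi from the third condition of (73): ||b_i||^2 / lambda_min(R)."""
    lam_min = np.linalg.eigvalsh(np.asarray(R, dtype=float))[0]
    return b_norm ** 2 / lam_min
```

For example, with $Q_e = \mathrm{diag}(1, 3)$ and $x = [1, 1]^T$ the quadratic form equals 4, which lies between $1 \cdot 2$ and $3 \cdot 2$; with $\|b_i\| = 2$ and $R = 4$ any $\Xi > 1$ satisfies the third condition.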
4 Simulations
In order to verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing the simulation results, two different configurations of the time varying external constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.
To facilitate the analysis of the configurations above, we transform them into analytic charts, which are shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.
The time varying constraint can be defined as a kind of column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:
$$
\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right].
\end{aligned}
\tag{75}
$$
In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis is defined as follows:
$$
\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2}.
\tag{76}
$$
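The constraint functions in (75) and the constraint angle in (76) translate directly into code. The sketch below is illustrative only: the link lengths `L1` to `L4` are hypothetical placeholders because this section does not list them, and the argument `t / 2` of the sine follows the reading of the extracted equation (76).

```python
import math

# Hypothetical link lengths in meters; the section does not give numeric values.
L1, L2, L3, L4 = 0.2, 0.4, 0.1, 0.3

def alpha(t):
    """Constraint angle of (76), in radians."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)

def psi_A(q1, q2, t):
    """Constraint function of configuration A in (75); zero when the constraint holds."""
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 / math.tan(alpha(t)))

def psi_B(q2, t):
    """Constraint function of configuration B in (75); joint 1 does not appear."""
    return L1 + L2 * math.cos(q2) - (L3 + L4 / math.tan(alpha(t)))
```

Since $\alpha(t)$ stays within $0.75\pi \pm 0.2$ rad, $\tan\alpha(t)$ never crosses zero, so the cotangent in both functions is well defined for all $t$.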
The initial positions of the joints are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial velocities of the joints are zero. The dynamic models of configurations A and B are given as follows:
$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^2(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot q) &= \begin{bmatrix} -0.36\sin(q_2)\,\dot q_2 & -0.18\sin(q_2)\,\dot q_2 \\ 0.18\sin(q_2)(\dot q_1 - \dot q_2) & 0.18\sin(q_2)\,\dot q_1 \end{bmatrix}, \\
C_B(q, \dot q) &= \begin{bmatrix} 0.1166\sin(2q_2)\,\dot q_2 & 0.06\sin(q_2)\,\dot q_2 \\ 0.06\sin(q_2)\,\dot q_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix}, \qquad
G_B(q) = \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix}, \\
F_A(q, \dot q) &= \begin{bmatrix} \dot q_1 + 10\sin(3q_1) + 2\,\mathrm{sgn}(\dot q_1) \\ 1.2\,\dot q_2 + 5\sin(2q_2) + \mathrm{sgn}(\dot q_2) \end{bmatrix}, \qquad
F_B(q, \dot q) = \begin{bmatrix} 0 \\ 1.5\,\dot q_2 + \sin(q_2) + 1.2\,\mathrm{sgn}(\dot q_2) \end{bmatrix}.
\end{aligned}
\tag{77}
$$
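The configuration A model in (77) can be coded as plain NumPy functions, which is convenient for checking properties such as the symmetry and positive definiteness of the inertia matrix. This is a sketch under stated assumptions: the decimal points of the coefficients are inferred from the extracted text (for instance, 1.2 for the viscous friction coefficient of joint 2), and the equation of motion $M\ddot q + C\dot q + G + F = \tau$ without constraint forces is an assumed simplification, since the constrained dynamics (14) are not reproduced in this section. All function names are hypothetical.

```python
import numpy as np

def M_A(q):
    """Inertia matrix of configuration A from (77)."""
    c2 = np.cos(q[1])
    return np.array([[0.36 * c2 + 0.6066, 0.18 * c2 + 0.1233],
                     [0.18 * c2 + 0.1233, 0.1233]])

def C_A(q, dq):
    """Coriolis and centrifugal matrix of configuration A from (77)."""
    s2 = np.sin(q[1])
    return np.array([[-0.36 * s2 * dq[1], -0.18 * s2 * dq[1]],
                     [0.18 * s2 * (dq[0] - dq[1]), 0.18 * s2 * dq[0]]])

def G_A(q):
    """Gravity vector of configuration A from (77)."""
    return np.array([-5.88 * np.sin(q[0] + q[1]) - 17.64 * np.sin(q[0]),
                     -5.88 * np.sin(q[0] + q[1])])

def F_A(q, dq):
    """Friction vector of configuration A from (77); decimals are inferred."""
    return np.array([dq[0] + 10.0 * np.sin(3.0 * q[0]) + 2.0 * np.sign(dq[0]),
                     1.2 * dq[1] + 5.0 * np.sin(2.0 * q[1]) + np.sign(dq[1])])

def forward_dynamics_A(q, dq, tau):
    """Joint accelerations from the assumed unconstrained form M ddq + C dq + G + F = tau."""
    rhs = tau - C_A(q, dq) @ dq - G_A(q) - F_A(q, dq)
    return np.linalg.solve(M_A(q), rhs)
```

The determinant of `M_A` works out to $0.0596 - 0.0324\cos^2(q_2)$, which is positive for every $q_2$, so the linear solve in `forward_dynamics_A` is always well posed. Configuration B can be coded in the same way from the remaining matrices in (77).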
The desired trajectories of configurations A and B are as follows.
Configuration A:
$$
\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin\left(\alpha(t) - y_{1d}\right) - L_3\sin\left(\alpha(t)\right)}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{78}
$$
Figure 3: Configuration A for simulation.
Figure 4: Configuration B for simulation.
Configuration B:
$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin\left(\alpha(t)\right) - L_3\sin\left(\alpha(t)\right)}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{79}
$$
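The desired trajectories in (78) and (79) can be generated as follows. This is a minimal sketch with hypothetical link lengths, chosen here so that $L_1 + L_3 \le L_2$ and the arcsine argument therefore stays inside $[-1, 1]$; the paper's actual lengths are not given in this section, and the constraint angle uses the reading $\sin(t/2)$ of (76).

```python
import math

# Hypothetical link lengths in meters, chosen so that (L1 + L3) / L2 <= 1.
L1, L2, L3 = 0.2, 0.4, 0.1

def alpha(t):
    """Constraint angle of (76), in radians."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)

def y1d_A(t):
    """Desired trajectory of joint 1 in configuration A, from (78)."""
    return 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)

def y2d_A(t):
    """Desired trajectory of joint 2 in configuration A, from (78)."""
    a = alpha(t)
    return math.asin((L1 * math.sin(a - y1d_A(t)) - L3 * math.sin(a)) / L2) + a

def y2d_B(t):
    """Desired trajectory of joint 2 in configuration B, from (79) with y1d = 0."""
    a = alpha(t)
    return math.asin((L1 - L3) * math.sin(a) / L2) + a
```

In both configurations the second joint's reference is not free: it is computed from the first joint's reference and the constraint angle, which is how the time varying constraint enters the tracking problem.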
Due to the limit of the dynamic external constraints, the variation of joint 1 in configuration B is zero.
In order to confirm that the adopted method can be applied to different configurations, and to verify the tracking performance for the subsystem desired trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part with the classic RBF neural network control method and with the ACI based decentralized reinforcement learning robust optimal tracking control method, respectively.
Figures 7 to 14 show the joint tracking and error curves obtained by using the classic RBF neural network [4] to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training process and parameter adaptation process. The error curves in Figures 11 to 14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness, and the tracking errors become larger.
Figure 5: The analytic chart of configuration A (joint angles $q_1$, $q_2$; lengths $L_1$ to $L_4$; constraint angle $\alpha$; axes $X$, $Y$).
Figure 6: The analytic chart of configuration B (joint angles $q_1$, $q_2$; lengths $L_1$ to $L_4$; constraint angle $\alpha$; axes $X$, $Y$).
Figures 15 to 22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems can track the desired trajectories within 0.5 seconds when the proposed robust reinforcement learning optimal control method is used. This is due to the excellent identifying ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19 to 22 show that the tracking error is very small and converges to zero in a short time. Besides, when the proposed decentralized control method is adopted for the joint subsystems, robustness is exhibited.
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method can fully satisfy the accessibility requirement of the reconfigurable modular robot. Besides, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 8: Trajectory tracking curve of configuration A joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.
Parameter   $k$    $\alpha$   $\upsilon$   $\eta_{a1}$   $\eta_{a2}$   $\eta_c$   $\beta_1$   $\beta_2$   $\gamma$
Value       800    300        0.005        10            50            20         0.2         2           0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. The joint variables can track the desired trajectory within a very short time in different configurations, and the fluctuation of the error convergence range is minimal.
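Statements such as "tracks the desired trajectory within 0.5 seconds" can be made reproducible by computing a settling time from a logged error trace. The helper below is a hypothetical sketch and is not taken from the paper: it returns the first sample time after which the absolute tracking error stays inside a chosen band, and the band is a user choice.

```python
import numpy as np

def settling_time(t, err, band):
    """Earliest sample time after which |err| stays within `band` until the end of the trace.

    Returns None if the error is still outside the band at the last sample.
    """
    t = np.asarray(t, dtype=float)
    err = np.asarray(err, dtype=float)
    outside = np.flatnonzero(np.abs(err) > band)
    if outside.size == 0:
        return t[0]                      # inside the band from the first sample
    if outside[-1] == len(t) - 1:
        return None                      # never settles within the recorded trace
    return t[outside[-1] + 1]
```

Applied with the same band to the error traces of both controllers, this gives a like-for-like convergence figure in place of a visual estimate from the plots.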
Figure 9: Trajectory tracking curve of configuration B joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 10: Trajectory tracking curve of configuration B joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Figure 11: Tracking error curve of configuration A joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 12: Tracking error curve of configuration A joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 13: Tracking error curve of configuration B joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 14: Tracking error curve of configuration B joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 15: Trajectory tracking curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 16: Trajectory tracking curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
5 Conclusions and Future Work
In this paper, combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. This theory is used to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function is adopted in place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two kinds of Lyapunov functions are designed to confirm the stability of ACI and of the subsystem. Finally, in order to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.
Figure 17: Trajectory tracking curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 18: Trajectory tracking curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
In the future, more complex configurations of the time varying external constrained reconfigurable modular robot can be included. Therefore, a decentralized controller with higher control and error precision will be considered.
Figure 19: Tracking error curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Figure 20: Tracking error curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s)).
Figure 21: Tracking error curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
001
002
003
004
005
Join
t 2 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 22 Tracking error curve of configuration B joint 2 with ACIbased reinforcement learning
[3D plot: axes x label, y label, z label; curves: desired trajectory, actual trajectory]
Figure 23: 3D tip trajectory curve of configuration A with ACI.
[3D plot: axes x label, y label, z label; curves: desired trajectory, actual trajectory]
Figure 24: 3D tip trajectory curve of configuration B with ACI.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Combining (35) with (38), we can get that
$$\dot{\tilde W}_{ic}(t) = -n_1 L_i\left(L_i^T\tilde W_{ic} + \delta_{hi}\right) \tag{40}$$
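As a minimal numerical sketch of (40), assume a two-dimensional weight error, a constant regressor $L_i$, and a constant residual $\delta_{hi}$. These values and the function names are illustrative assumptions and do not come from the paper. Forward-Euler integration then shows the quantity $L_i^T\tilde W_{ic} + \delta_{hi}$ decaying toward zero:

```python
def critic_error_step(w_tilde, L, delta_h, n1, dt):
    """One forward-Euler step of (40): d(w_tilde)/dt = -n1 * L * (L^T w_tilde + delta_h)."""
    residual = sum(l * w for l, w in zip(L, w_tilde)) + delta_h
    return [w - dt * n1 * l * residual for w, l in zip(w_tilde, L)]


def simulate_critic_error(w0, L, delta_h, n1=2.0, dt=1e-3, steps=5000):
    """Integrate (40) with a constant regressor L and a constant residual delta_h."""
    w = list(w0)
    for _ in range(steps):
        w = critic_error_step(w, L, delta_h, n1, dt)
    return w


# Illustrative values only (assumptions, not taken from the paper).
L_vec = [1.0, 0.5]
delta_h = 0.05
w_start = [0.8, -0.3]
initial_residual = sum(l * w for l, w in zip(L_vec, w_start)) + delta_h
w_final = simulate_critic_error(w_start, L_vec, delta_h)
final_residual = sum(l * w for l, w in zip(L_vec, w_final)) + delta_h
print(abs(initial_residual), abs(final_residual))
```

With a constant regressor, only the component of the weight error along $L_i$ is driven, so the weight error itself settles at a value offset by the residual $\delta_{hi}$ rather than at zero.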
The update law of the weight for the action-NN is developed by a gradient descent algorithm, expressed as follows:
$$\dot{\hat W}_{ia}(t) = -n_2 S_{ia}(e_i)\left(\hat W_{ia}^T S_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T\hat W_{ic}\right)^T \tag{41}$$
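The gradient-descent structure of (41) can be sketched for a scalar control input as follows. The sketch assumes that the basis vector $S_{ia}(e_i)$ and the critic-dependent term $\frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T\hat W_{ic}$ are held fixed. The names `action_update_step` and `grad_term` and all numerical values are hypothetical.

```python
def bracket_value(w_a, s_a, grad_term):
    """Bracketed term of (41): W_ia_hat^T S_ia + (1/2) R^{-1} b_i grad(S_ic)^T W_ic_hat."""
    return sum(w * s for w, s in zip(w_a, s_a)) + grad_term


def action_update_step(w_a, s_a, grad_term, n2, dt):
    """One forward-Euler step of (41) for a scalar control input."""
    bracket = bracket_value(w_a, s_a, grad_term)
    return [w - dt * n2 * s * bracket for w, s in zip(w_a, s_a)]


# Illustrative values only (assumptions, not taken from the paper).
s_a = [0.6, -0.2, 0.9]   # fixed action-NN basis vector
grad_term = 0.3          # fixed critic-dependent term
w_a = [0.0, 0.0, 0.0]
bracket_start = bracket_value(w_a, s_a, grad_term)
for _ in range(4000):
    w_a = action_update_step(w_a, s_a, grad_term, n2=5.0, dt=1e-3)
bracket_end = bracket_value(w_a, s_a, grad_term)
print(bracket_start, bracket_end)
```

Under these assumptions the update drives the bracketed term toward zero, which is the condition that (42) states for the ideal weights up to the approximation errors.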
According to the estimation error of the action-NN in (36), the optimal control $u_i^*(e_i)$ minimizes the optimal $Q$-function, and we can get the following equation:
$$W_{ia}^T S_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T W_{ic} + \frac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i) + \varepsilon_{ia}(e_i) = 0 \tag{42}$$
Combining (41) with (42), we can get that
$$\dot{\tilde W}_{ia} = -n_2 S_{ia}(e_i)\left(\tilde W_{ia}^T S_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T\tilde W_{ic} - \frac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i) - \varepsilon_{ia}(e_i)\right) \tag{43}$$
After using the critic-NN and the action-NN to estimate $Q_i$ and $\hat u_i(e_i)$, we need to design a robust RBF-NN identifier to identify the nonlinear uncertainties of the subsystem. Here $\Phi_i(e_i,\hat u_i)$ can be expressed as follows:
$$\Phi_i(e_i,\hat u_i) = \dot e_{iF} = W_{iF}^T\kappa\left(\Lambda_{iF}^T e_{iF}\right) + \varepsilon_{iF}(e_{iF}) + b_i(e_i)\hat u_i \tag{44}$$
where $\kappa(\cdot)$ is the basis function of the neural network and $W_{iF}$, $\Lambda_{iF}$ are the unknown ideal neural network weights. Equation (44) can be identified by using the robust RBF-NN identifier, so we can get
$$\hat\Phi_i(e_i,\hat u_i) = \dot{\hat e}_{iF} = \hat W_{iF}^T\hat\kappa_{iF} + b_i(e_i)\hat u_i + \mu_i \tag{45}$$
Here $\hat\kappa_{iF}$ is the estimated value of the basis function of the neural network, $\hat W_{iF}$ and $\hat\Lambda_{iF}$ are the estimated values of the neural network weights, and $\mu_i\in\mathbb{R}$ is the feedback error term, shown as follows [26]:
$$\begin{aligned}
\mu_i &= k\left(e_{iF}(t) - \hat e_{iF}(t)\right) - k\left(e_{iF}(0) - \hat e_{iF}(0)\right) + \vartheta\\
&= k\left(\tilde e_{iF}(t) - \tilde e_{iF}(0)\right) + \vartheta,\\
\dot\vartheta &= (k\alpha+\gamma)\tilde e_{iF} + \beta_1\operatorname{sat}(\tilde e_{iF})
\end{aligned}\tag{46}$$
where $k$, $\alpha$, $\beta_1$, and $\gamma$ are positive control gain constants and $\operatorname{sat}(\cdot)$ is a saturation function. Therefore, the state estimation error of the identifier-NN can be expressed as follows:
$$\begin{aligned}
\dot{\tilde e}_{iF} &= \dot e_{iF} - \dot{\hat e}_{iF}\\
&= W_{iF}^T\kappa_{iF} - \hat W_{iF}^T\hat\kappa_{iF} + \varepsilon_{iF}(e_{iF}) - \mu_i
\end{aligned}\tag{47}$$
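The feedback error term in (46) can be sketched as a small stateful object. The sketch assumes a scalar identification error, a unit saturation level for $\operatorname{sat}(\cdot)$, the initial condition $\vartheta(0)=0$, and forward-Euler integration. The class name `FeedbackTerm` and the gain values are hypothetical and not specified by the paper.

```python
def sat(x, limit=1.0):
    """Saturation function; the unit limit is an assumption."""
    return max(-limit, min(limit, x))


class FeedbackTerm:
    """Scalar sketch of (46): mu = k*(e_tilde(t) - e_tilde(0)) + theta."""

    def __init__(self, k, alpha, gamma, beta1, e_tilde0):
        self.k, self.alpha, self.gamma, self.beta1 = k, alpha, gamma, beta1
        self.e_tilde0 = e_tilde0
        self.theta = 0.0  # assumed initial condition theta(0) = 0

    def mu(self, e_tilde):
        """Evaluate mu for the current value of the auxiliary state theta."""
        return self.k * (e_tilde - self.e_tilde0) + self.theta

    def step(self, e_tilde, dt):
        """Euler step of d(theta)/dt = (k*alpha + gamma)*e_tilde + beta1*sat(e_tilde)."""
        rate = (self.k * self.alpha + self.gamma) * e_tilde + self.beta1 * sat(e_tilde)
        self.theta += dt * rate
        return self.mu(e_tilde)


# Illustrative run with a constant identification error of 0.5 over one second.
term = FeedbackTerm(k=2.0, alpha=1.0, gamma=1.0, beta1=0.5, e_tilde0=0.5)
mu_start = term.mu(0.5)
mu_end = mu_start
for _ in range(1000):
    mu_end = term.step(0.5, dt=1e-3)
print(mu_start, mu_end)
```

Because the proportional part is referenced to the initial error, the term starts at zero, and a persistent error is then accumulated through the integral state $\vartheta$.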
A filtered identification error is defined as follows:
$$E_i = \dot{\tilde e}_{iF} + \alpha\tilde e_{iF} \tag{48}$$
The time derivative of the equation above is
$$\begin{aligned}
\dot E_i ={}& W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} - \hat W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \alpha\dot{\tilde e}_{iF} - kE_i - \gamma\tilde e_{iF}\\
&- \dot{\hat W}_{iF}^T\hat\kappa_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1\operatorname{sat}(\tilde e_{iF}) - \hat W_{iF}^T\hat\kappa'_{iF}\dot{\hat\Lambda}_{iF}^T\hat e_{iF}
\end{aligned}\tag{49}$$
Here the weights $\hat W_{iF}$ and $\hat\Lambda_{iF}$ of the identification-NN can be updated by
$$\begin{aligned}
\dot{\hat W}_{iF} &= \operatorname{proj}\left(\Gamma_{iWF}\,\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF}\tilde e_{iF}^T\right)\\
\dot{\hat\Lambda}_{iF} &= \operatorname{proj}\left(\Gamma_{i\Lambda F}\,\dot{\hat e}_{iF}\tilde e_{iF}^T\hat W_{iF}^T\hat\kappa'_{iF}\right)
\end{aligned}\tag{50}$$
where $\Gamma_{iWF}$ and $\Gamma_{i\Lambda F}$ are positive constant adaptation gain matrices. In order to analyze the convergence of the filtered identification error, $\hat W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF}$ can be divided into the following form:
$$\begin{aligned}
\hat W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF}
={}& \frac{1}{2}\hat\kappa'_{iF}\dot{\hat e}_{iF}\left[\left(\Lambda_{iF}^T-\hat\Lambda_{iF}^T\right)\left(W_{iF}^T-\hat W_{iF}^T\right) + \left(W_{iF}^T-\hat W_{iF}^T\right)\left(\Lambda_{iF}^T-\hat\Lambda_{iF}^T\right)\right]\\
={}& \frac{1}{2}\hat\kappa'_{iF}\dot{\hat e}_{iF}\left[\hat W_{iF}^T\left(\Lambda_{iF}^T-\hat\Lambda_{iF}^T\right) + \left(W_{iF}^T-\hat W_{iF}^T\right)\hat\Lambda_{iF}^T\right.\\
&\left.\qquad\qquad - W_{iF}^T\left(\Lambda_{iF}^T-\hat\Lambda_{iF}^T\right) - \left(W_{iF}^T-\hat W_{iF}^T\right)\Lambda_{iF}^T\right]\\
={}& \frac{1}{2}\hat\kappa'_{iF}\dot{\hat e}_{iF}\left[\hat W_{iF}^T\left(\Lambda_{iF}^T-\hat\Lambda_{iF}^T\right) + \left(W_{iF}^T-\hat W_{iF}^T\right)\hat\Lambda_{iF}^T\right]\\
&- \frac{1}{2}\hat\kappa'_{iF}\dot{\hat e}_{iF}\left[W_{iF}^T\left(\Lambda_{iF}^T-\hat\Lambda_{iF}^T\right) + \left(W_{iF}^T-\hat W_{iF}^T\right)\Lambda_{iF}^T\right]\\
={}& \frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\tilde e}_{iF} + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot{\tilde e}_{iF}\\
&- \frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF}\\
&+ \frac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF}
\end{aligned}\tag{51}$$
where $\tilde W_{iF}^T = W_{iF}^T - \hat W_{iF}^T$ and $\tilde\Lambda_{iF}^T = \Lambda_{iF}^T - \hat\Lambda_{iF}^T$. Putting (51) into (49), (49) can be reduced to the following form:
$$\dot E_i = P_{F1} + P_{F2} + P_{F3} - kE_i - \gamma\tilde e_{iF} - \beta_1\operatorname{sat}(\tilde e_{iF}) \tag{52}$$
In the equation above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ are expressed respectively as follows:
$$P_{F1} = \frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\tilde e}_{iF} + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot{\tilde e}_{iF} - \hat W_{iF}^T\hat\kappa'_{iF}\dot{\hat\Lambda}_{iF}^T\hat e_{iF} + \alpha\dot{\tilde e}_{iF} - \dot{\hat W}_{iF}^T\hat\kappa_{iF} \tag{53}$$
$$P_{F2} = -\frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) \tag{54}$$
$$P_{F3} = \frac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF} \tag{55}$$
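The grouping in (52) can be checked numerically in a simplified setting. The sketch below treats every quantity as a scalar and draws random illustrative values. It compares $P_{F1}+P_{F2}+P_{F3}$ from (53) to (55) with the corresponding terms of (49), that is, all terms of (49) except $-kE_i - \gamma\tilde e_{iF} - \beta_1\operatorname{sat}(\tilde e_{iF})$. The function and variable names are hypothetical, and the scalar simplification is an assumption made only for this check.

```python
import random


def grouped_terms(W, W_hat, Lam, Lam_hat, kp, kp_hat, k_hat,
                  de, de_hat, e_hat, dW_hat, dLam_hat, alpha, deps):
    """Scalar versions of P_F1, P_F2, P_F3 in (53)-(55)."""
    de_tilde = de - de_hat        # derivative of the identification error
    W_til = W - W_hat             # weight estimation errors
    Lam_til = Lam - Lam_hat
    p1 = (0.5 * W * kp_hat * Lam_hat * de_tilde
          + 0.5 * W_hat * kp_hat * Lam * de_tilde
          - W_hat * kp_hat * dLam_hat * e_hat
          + alpha * de_tilde - dW_hat * k_hat)
    p2 = (-0.5 * W * kp_hat * Lam_hat * de
          - 0.5 * W_hat * kp_hat * Lam * de
          + W * kp * Lam * de + deps)
    p3 = (0.5 * W_til * kp_hat * Lam_hat * de_hat
          + 0.5 * W_hat * kp_hat * Lam_til * de_hat)
    return p1, p2, p3


def ungrouped_terms(W, W_hat, Lam, Lam_hat, kp, kp_hat, k_hat,
                    de, de_hat, e_hat, dW_hat, dLam_hat, alpha, deps):
    """Scalar version of the terms of (49) that (52) collects into the P_F terms."""
    return (W * kp * Lam * de - W_hat * kp_hat * Lam_hat * de_hat
            + alpha * (de - de_hat) - dW_hat * k_hat + deps
            - W_hat * kp_hat * dLam_hat * e_hat)


rng = random.Random(0)
max_gap = 0.0
for _ in range(100):
    args = [rng.uniform(-2.0, 2.0) for _ in range(14)]
    gap = abs(sum(grouped_terms(*args)) - ungrouped_terms(*args))
    max_gap = max(max_gap, gap)
print(max_gap)
```

In this scalar setting the two expressions agree up to floating-point rounding, which is consistent with (52) being a regrouping of (49) rather than a new approximation.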
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are shown as
$$\begin{aligned}
\left\|P_{F1}\right\| &\le J_1\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\\
\left\|P_{F2}\right\| &\le \varsigma_1\\
\left\|P_{F3}\right\| &\le \varsigma_2
\end{aligned}\tag{56}$$
Combining (53) and (54) with (55), we can get that
$$\left\|\dot P_{F2} + \dot P_{F3}\right\| \le \varsigma_3 + \varsigma_4 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\| \tag{57}$$
where $\varphi_i(\tilde e_{iF}^T,E_i^T) = [\tilde e_{iF}^T\ \ E_i^T]^T$, $J_1(\cdot)$ and $J_2(\cdot)$ are globally invertible nondecreasing functions, and $\varsigma_j$ ($j = 1, 2, 3, 4$) are computable positive constants.
Theorem 5. Consider the dynamics of the subsystem of the time-varying constrained reconfigurable modular robot in (14) and the state equation (18). If the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and remains bounded.
Proof. Define the Lyapunov function as follows:
$$V_{iL}(\tilde e_{iF},E_i) = \frac{1}{2}E_i^TE_i + \frac{1}{2}\gamma\tilde e_{iF}^T\tilde e_{iF} + \varpi_i(t) + \phi_i(t) \tag{58}$$
In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ can be expressed as follows:
$$\begin{aligned}
\dot\varpi_i(t) &= -\left[E_i^T\left(P_{F2} - \beta_1\operatorname{sat}(\tilde e_{iF})\right) + \dot{\tilde e}_{iF}^TP_{F3} - \beta_2J_2\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\left\|\tilde e_{iF}\right\|\right]\\
\varpi_i(0) &= \beta_1\left|\tilde e_{iF}(0)\right| - \tilde e_{iF}^T(0)\left(P_{F2}(0) + P_{F3}(0)\right)
\end{aligned}\tag{59}$$
$$\phi_i(t) = \frac{1}{4}\alpha\left[\operatorname{tr}\left(\tilde W_{iF}^T\Gamma_{iWF}^{-1}\tilde W_{iF}\right) + \operatorname{tr}\left(\tilde\Lambda_{iF}^T\Gamma_{i\Lambda F}^{-1}\tilde\Lambda_{iF}\right)\right] \tag{60}$$
where $\operatorname{tr}(\cdot)$ represents the trace of a matrix. Define $d = [E_i^T\ \ \tilde e_{iF}^T\ \ \varpi_i^{1/2}\ \ \phi_i^{1/2}]^T$; $\beta_1, \beta_2\in\mathbb{R}$ are positive adaptation gains, which are chosen to ensure $\varpi_i(t)\ge 0$, so we can get
$$U_1(d) \le V_{iL}(\tilde e_{iF},E_i) \le U_2(d) \tag{61}$$
where
$$U_1(d) = \frac{1}{2}\min(1,\gamma)\|d\|^2,\qquad U_2(d) = \max(1,\gamma)\|d\|^2 \tag{62}$$
The time derivative of (58) is shown as follows:
$$\dot V_{iL}(\tilde e_{iF},E_i) = \nabla V_{iL}^T K\left[\dot E_i^T\ \ \dot{\tilde e}_{iF}^T\ \ \frac{1}{2}\varpi_i^{-1/2}\dot\varpi_i\ \ \frac{1}{2}\phi_i^{-1/2}\dot\phi_i\right]^T \tag{63}$$
where $K[\cdot]$ denotes a Filippov set [27]. So $\dot V_{iL}(\tilde e_{iF},E_i)$ can be rewritten in the following form:
$$\begin{aligned}
\dot V_{iL}(\tilde e_{iF},E_i)
={}& \left[E_i^T\ \ \gamma\tilde e_{iF}^T\ \ 2\varpi_i^{1/2}\ \ 2\phi_i^{1/2}\right]K\left[\dot E_i^T\ \ \dot{\tilde e}_{iF}^T\ \ \frac{1}{2}\varpi_i^{-1/2}\dot\varpi_i\ \ \frac{1}{2}\phi_i^{-1/2}\dot\phi_i\right]^T\\
\le{}& E_i^T\Big(\frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\tilde e}_{iF} + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot{\tilde e}_{iF} - \hat W_{iF}^T\hat\kappa'_{iF}\dot{\hat\Lambda}_{iF}^T\hat e_{iF}\\
&\quad + \alpha\dot{\tilde e}_{iF} - \frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} - \gamma\tilde e_{iF}\\
&\quad - \dot{\hat W}_{iF}^T\hat\kappa_{iF} + W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) + \frac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF}\\
&\quad + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF} - kE_i - \beta_1K\left[\operatorname{sat}(\tilde e_{iF})\right]\Big)\\
&- E_i^T\Big(-\frac{1}{2}W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot e_{iF} - \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF}\\
&\quad + W_{iF}^T\kappa'_{iF}\Lambda_{iF}^T\dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1K\left[\operatorname{sat}(\tilde e_{iF})\right]\Big)\\
&- \dot{\tilde e}_{iF}^T\Big(\frac{1}{2}\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF} + \frac{1}{2}\hat W_{iF}^T\hat\kappa'_{iF}\tilde\Lambda_{iF}^T\dot{\hat e}_{iF}\Big)\\
&+ \gamma\tilde e_{iF}^T\left(E_i - \alpha\tilde e_{iF}\right)\\
&+ \beta_2J_2\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\left\|\tilde e_{iF}\right\|\\
&- \frac{1}{2}\alpha\operatorname{tr}\left(\tilde W_{iF}^T\Gamma_{iWF}^{-1}\dot{\hat W}_{iF}\right) - \frac{1}{2}\alpha\operatorname{tr}\left(\tilde\Lambda_{iF}^T\Gamma_{i\Lambda F}^{-1}\dot{\hat\Lambda}_{iF}\right)
\end{aligned}\tag{64}$$
Putting (53), (54), and (55) into (64), we can get
$$\begin{aligned}
\dot V_{iL}(\tilde e_{iF},E_i)
={}& E_i^T\left(P_{F1} + P_{F2} + P_{F3} - \beta_1K\left[\operatorname{sat}(\tilde e_{iF})\right] - kE_i - \gamma\tilde e_{iF}\right)\\
&+ \gamma\tilde e_{iF}^T\left(E_i - \alpha\tilde e_{iF}\right)\\
&- E_i^T\left(P_{F2} - \beta_1K\left[\operatorname{sat}(\tilde e_{iF})\right]\right)\\
&- \dot{\tilde e}_{iF}^TP_{F3} + \beta_2J_2\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\left\|\tilde e_{iF}\right\|\\
&- \frac{1}{2}\alpha\operatorname{tr}\left(\tilde W_{iF}^T\Gamma_{iWF}^{-1}\dot{\hat W}_{iF}\right) - \frac{1}{2}\alpha\operatorname{tr}\left(\tilde\Lambda_{iF}^T\Gamma_{i\Lambda F}^{-1}\dot{\hat\Lambda}_{iF}\right)\\
={}& -\alpha\gamma\tilde e_{iF}^T\tilde e_{iF} + \left(E_i^T - \dot{\tilde e}_{iF}^T\right)P_{F3} + \frac{J_1\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2}{4k_3}\\
&- \frac{1}{2}\alpha\operatorname{tr}\left(\tilde W_{iF}^T\hat\kappa'_{iF}\hat\Lambda_{iF}^T\dot{\hat e}_{iF}\tilde e_{iF}^T\right) - \frac{1}{2}\alpha\operatorname{tr}\left(\tilde\Lambda_{iF}^T\dot{\hat e}_{iF}\tilde e_{iF}^T\hat W_{iF}^T\hat\kappa'_{iF}\right)\\
\le{}& -k_1\left\|\tilde e_{iF}\right\|^2 - k_2\left\|E_i\right\|^2 + \frac{J_1\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2}{4k_3}\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|^2\\
&+ \frac{\beta_2^2J_2\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2}{4\alpha k_4}\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|^2
\end{aligned}\tag{65}$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4/\beta_2^2\}$, and $J\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2 = J_1\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2 + J_2\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2$, so the following conclusion can be obtained:
$$\begin{aligned}
\dot V_{iL}(\tilde e_{iF},E_i)
&\le -k_{\min}\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|^2 + \frac{J\left(\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|\right)^2\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|^2}{4\xi}\\
&\le -c\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|^2
\end{aligned}\tag{66}$$
Therefore, for some positive constant $c$, $-c\left\|\varphi_i(\tilde e_{iF}^T,E_i^T)\right\|^2$ is a negative semidefinite function defined on the domain $D$ expressed as follows:
$$D = \left\{d(t) \,\middle|\, \|d\| \le J^{-1}\left(2\sqrt{k_{\min}\xi}\right)\right\} \tag{67}$$
so that, by Lyapunov stability theory, the system is stable.

In order to make the subsystem of the time-varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, using a robust term to compensate for the neural network approximation errors. The robust control term is designed as
$$u_{irb} = \frac{N_{rb}e_i}{e_i^Te_i + \zeta} \tag{68}$$
In the equation above, $\zeta > 0$ is a constant, and $N_{rb}$ can be expressed as
$$\begin{aligned}
N_{rb} \ge{}& \left[\frac{\delta_{hi}^2}{2n_1} + \frac{n_1\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\left(\nabla\varepsilon_{ic}(e_i)/2\right)\right)^2}{2n_2}\right.\\
&\left.\quad + n_1n_2\left\|b_i(e_i)\right\|^2\left(-\varepsilon_{ia}(e_i) - \frac{R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right)^2\right]\cdot\frac{e_i^Te_i + \zeta}{2n_1n_2\left\|b_i(e_i)\right\|e_i^Te_i}\\
\ge{}& \left[n_1^2\left(-\varepsilon_{ia}(e_i) - \frac{R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right)^2 + n_2\delta_{hi}^2\right.\\
&\left.\quad + 2n_1^2n_2^2\left\|b_i(e_i)\right\|^2\left(-\varepsilon_{ia}(e_i) - \frac{R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right)^2\right]\cdot\frac{e_i^Te_i + \zeta}{4n_1^2n_2^2\left\|b_i(e_i)\right\|e_i^Te_i}
\end{aligned}\tag{69}$$
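The two right-hand sides in (69) can be compared numerically. The sketch below assumes scalar quantities and writes `rho` for the common factor $-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)/2$. The function names, the symbol `rho`, and the numerical values are hypothetical.

```python
import math


def gain_bound_first_form(delta_h, n1, n2, b_norm, rho, ete, zeta):
    """First right-hand side of (69), for scalar arguments."""
    bracket = (delta_h ** 2 / (2.0 * n1)
               + n1 * rho ** 2 / (2.0 * n2)
               + n1 * n2 * b_norm ** 2 * rho ** 2)
    return bracket * (ete + zeta) / (2.0 * n1 * n2 * b_norm * ete)


def gain_bound_second_form(delta_h, n1, n2, b_norm, rho, ete, zeta):
    """Second right-hand side of (69), with the common denominator 2*n1*n2 pulled out."""
    bracket = (n1 ** 2 * rho ** 2
               + n2 * delta_h ** 2
               + 2.0 * n1 ** 2 * n2 ** 2 * b_norm ** 2 * rho ** 2)
    return bracket * (ete + zeta) / (4.0 * n1 ** 2 * n2 ** 2 * b_norm * ete)


# Illustrative values only (assumptions, not taken from the paper).
params = dict(delta_h=0.1, n1=2.0, n2=3.0, b_norm=1.5, rho=-0.2, ete=0.04, zeta=0.01)
first = gain_bound_first_form(**params)
second = gain_bound_second_form(**params)
print(first, second, math.isclose(first, second, rel_tol=1e-12))
```

For scalar arguments the two expressions coincide, so the second form is the first one rewritten over a common denominator. Both grow without bound as $e_i^Te_i \to 0$, which suggests that $N_{rb}$ has to be chosen for a given lower bound on the error norm.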
Therefore, the global control law can be designed as follows:
$$u_{\mathrm{mix}} = u_i + u_{irb} = -\frac{1}{2}R^{-1}b_i^T(e_i)S_{ia}(e_i)^T\hat W_{ia} + \frac{N_{rb}e_i}{e_i^Te_i + \zeta} \tag{70}$$
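The robust term (68) that enters the control law (70) can be sketched as follows. The sketch assumes a two-dimensional error vector and illustrative values of $N_{rb}$ and $\zeta$. The function name `robust_term` is hypothetical. The bound $\|u_{irb}\| \le N_{rb}/(2\sqrt{\zeta})$ used below is an elementary consequence of $s/(s^2+\zeta) \le 1/(2\sqrt{\zeta})$ for $s \ge 0$ and is not stated in the paper.

```python
import math


def robust_term(e, n_rb, zeta):
    """Robust control term (68): u_rb = N_rb * e / (e^T e + zeta)."""
    ete = sum(x * x for x in e)
    return [n_rb * x / (ete + zeta) for x in e]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


# Illustrative values only (assumptions, not taken from the paper).
n_rb, zeta = 4.0, 0.01
bound = n_rb / (2.0 * math.sqrt(zeta))
samples = [[0.0, 0.0], [1e-3, -2e-3], [0.1, 0.0], [0.05, 0.05], [1.0, -2.0]]
norms = [norm(robust_term(e, n_rb, zeta)) for e in samples]
print(norms, bound)
```

The constant $\zeta$ keeps the term well defined at $e_i = 0$, where it vanishes, and limits its magnitude for small errors.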
Theorem 6. Consider the dynamics of the subsystem of the time-varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the desired trajectory is tracked asymptotically by the actual output.
Proof. Design the Lyapunov function as follows:
$$V_{iu}(e_i,u_{\mathrm{mix}}) = \frac{1}{2n_1}\operatorname{tr}\left\{\tilde W_{ic}^T\tilde W_{ic}\right\} + \frac{n_1}{2n_2}\operatorname{tr}\left\{\tilde W_{ia}^T\tilde W_{ia}\right\} + n_1n_2\left[e_{iF}^Te_{iF} + \Xi\int_0^\infty r_i(e_i,u_{\mathrm{mix}})\,d\tau\right] \tag{71}$$
where $\Xi > 0$ is a parameter to be determined and $t \le \tau < \infty$. The time derivative of (71) is shown as follows:
$$\begin{aligned}
\dot V_{iu}(e_i,u_{\mathrm{mix}})
={}& \frac{1}{2n_1}\operatorname{tr}\left\{\tilde W_{ic}^T\dot{\tilde W}_{ic}\right\} + \frac{n_1}{2n_2}\operatorname{tr}\left\{\tilde W_{ia}^T\dot{\tilde W}_{ia}\right\} + n_1n_2\left[e_{iF}^T\dot e_{iF} + \Xi r_i(e_i,u_{\mathrm{mix}})\right]\\
={}& \frac{1}{2n_1}\operatorname{tr}\left\{\tilde W_{ic}^T\left(-n_1L_i\left(L_i^T\tilde W_{ic} + \delta_{hi}\right)\right)\right\}\\
&+ \frac{n_1}{2n_2}\operatorname{tr}\left\{\tilde W_{ia}^T\left[-n_2S_{ia}(e_i)\left(\tilde W_{ia}^TS_{ia}(e_i) - \varepsilon_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T\tilde W_{ic} - \frac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)\right)\right]\right\}\\
&+ n_1n_2e_{iF}^T\left(W_{iF}^T\kappa\left(\Lambda_{iF}^Te_i\right) + \varepsilon_{iF}(e_i) + b_i(e_i)\hat u_{\mathrm{mix}}\right) + \Xi\left(e_i^TQ_ee_i + u_{\mathrm{mix}}^TRu_{\mathrm{mix}}\right)\\
\le{}& -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2\right)\left\|\tilde W_{ic}\right\|^2 + \frac{1}{2n_1}\delta_{hi}^2 - \left(n_1S_{iam}^2 - \frac{3}{4}n_1n_2S_{iaM}^2\right)\left\|\tilde W_{ia}\right\|^2\\
&+ \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2\left\|\nabla S_{icM}^2\right\|\left\|\tilde W_{ic}\right\|^2\\
&+ \frac{n_1}{2n_2}\left(\varepsilon_{ia}(e_i) + \frac{R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right)^T\cdot\left(\varepsilon_{ia}(e_i) + \frac{R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\right)\\
&+ n_1n_2\left\|b_i(e_i)\right\|^2\varepsilon_{ia}(e_i)^T\varepsilon_{ia}(e_i) + n_1n_2S_{iaM}^2\left\|b_i(e_i)\right\|^2\left\|\tilde W_{ia}\right\|^2\\
&+ n_1n_2\Xi\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 + n_1n_2\left(\left\|b_i(e_i)\right\|^2 - \Xi\lambda_{\min}(R)\right)\left\|u_{\mathrm{mix}}\right\|^2
\end{aligned}\tag{72}$$
If the following inequalities are satisfied
$$\begin{aligned}
\lambda_{\min}(Q_e)\left\|e_{iF}\right\|_2^2 &\le e_{iF}^TQ_ee_{iF} \le \lambda_{\max}(Q_e)\left\|e_{iF}\right\|_2^2\\
\lambda_{\min}(R)\left\|u_{\mathrm{mix}}\right\|_2^2 &\le u_{\mathrm{mix}}^TRu_{\mathrm{mix}} \le \lambda_{\max}(R)\left\|u_{\mathrm{mix}}\right\|_2^2\\
\Xi &> \frac{\left\|b_i(e_i)\right\|^2}{\lambda_{\min}(R)}
\end{aligned}\tag{73}$$
then $\dot V_{iu}(e_i,u_{\mathrm{mix}})$ can be further transformed as
$$\begin{aligned}
\dot V_{iu}(e_i,u_{\mathrm{mix}})
\le{}& -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2 - \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2\nabla S_{icM}^2\right)\left\|\tilde W_{ic}\right\|^2\\
&- \left(n_1S_{iam}^2 - \frac{3}{4}n_1n_2S_{iaM}^2 - n_1n_2S_{iaM}^2\left\|b_i(e_i)\right\|^2\right)\left\|\tilde W_{ia}\right\|^2\\
&- n_1n_2\left(\left\|b_i(e_i)\right\|^2 + \Xi\lambda_{\min}(R)\right)\left\|u_{\mathrm{mix}}\right\|^2\\
&- n_1n_2\Xi\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2\\
\le{}& -n_1n_2\Xi\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2
\end{aligned}\tag{74}$$
Therefore we can get the conclusion that 119894119906(119890
119894 119906mix) lt 0
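The first two conditions in (73) are the standard Rayleigh-quotient bounds for a symmetric matrix, and the third is a lower threshold on $\Xi$. The following is a minimal numerical sketch of how these conditions could be checked, assuming NumPy is available; the function names and the sample matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def quadratic_form_bounds(Q, x):
    """Return (lower, value, upper) with lower <= x^T Q x <= upper.

    lower and upper are lambda_min(Q)*||x||^2 and lambda_max(Q)*||x||^2,
    which bound the quadratic form for any real symmetric Q.
    """
    Q = np.asarray(Q, dtype=float)
    x = np.asarray(x, dtype=float)
    eigvals = np.linalg.eigvalsh(Q)  # ascending eigenvalues of a symmetric matrix
    norm_sq = float(x @ x)
    return eigvals[0] * norm_sq, float(x @ Q @ x), eigvals[-1] * norm_sq


def xi_threshold(b, R):
    """Right-hand side of the third condition in (73): ||b||^2 / lambda_min(R)."""
    b = np.asarray(b, dtype=float)
    lam_min = np.linalg.eigvalsh(np.asarray(R, dtype=float))[0]
    return float(b @ b) / lam_min


# Hypothetical values, chosen only to exercise the functions.
Q_e = np.array([[2.0, 0.5], [0.5, 1.0]])
e_iF = np.array([0.3, -0.1])
lower, value, upper = quadratic_form_bounds(Q_e, e_iF)
threshold = xi_threshold([1.0, 2.0], 2.0 * np.eye(2))
```

Any $\Xi$ strictly larger than `threshold` satisfies the third condition for these sample values.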
4 Simulations
In order to verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing the simulation results, two different configurations of the time-varying externally constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.
To facilitate the analysis, the configurations above are transformed into the analytic charts shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time-varying constraint joint and the base module.
The time-varying constraint is defined as a column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:
$$
\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right].
\end{aligned}
\tag{75}
$$
In the equations above, the angle $\alpha(t)$ between the time-varying constraint and the $X$-axis is defined as follows:
$$
\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2}.
\tag{76}
$$
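As an illustration, the constraint functions in (75) and the angle in (76) can be evaluated numerically. The sketch below reads (76) as $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$; the link lengths are not stated in this section, so they are passed in as parameters, and the function names are hypothetical.

```python
import math


def alpha(t):
    """Constraint angle, read from (76) as 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def psi_a(q1, q2, t, L1, L2, L3, L4):
    """Constraint function of configuration A from (75)."""
    cot_alpha = 1.0 / math.tan(alpha(t))
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 * cot_alpha)


def psi_b(q2, t, L1, L2, L3, L4):
    """Constraint function of configuration B from (75)."""
    cot_alpha = 1.0 / math.tan(alpha(t))
    return L1 + L2 * math.cos(q2) - (L3 + L4 * cot_alpha)
```

Under this reading, $\alpha(t)$ stays within $0.75\pi \pm 0.2$ rad, so $\cot\alpha(t)$ is well defined for all $t$. The two constraint functions differ only by $L_1(\cos q_1 - 1)$.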
The initial joint positions are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial joint velocities are zero. The dynamic models of configurations A and B are given as follows:
$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^2(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot q) &= \begin{bmatrix} -0.36\sin(q_2)\,\dot q_2 & -0.18\sin(q_2)\,\dot q_2 \\ 0.18\sin(q_2)\,(\dot q_1 - \dot q_2) & 0.18\sin(q_2)\,\dot q_1 \end{bmatrix}, \\
C_B(q, \dot q) &= \begin{bmatrix} 0.1166\sin(2q_2)\,\dot q_2 & 0.06\sin(q_2)\,\dot q_2 \\ 0.06\sin(q_2)\,\dot q_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix}, \\
G_B(q) &= \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix}, \\
F_A(q, \dot q) &= \begin{bmatrix} \dot q_1 + 10\sin(3q_1) + 2\operatorname{sgn}(\dot q_1) \\ 1.2\,\dot q_2 + 5\sin(2q_2) + \operatorname{sgn}(\dot q_2) \end{bmatrix}, \\
F_B(q, \dot q) &= \begin{bmatrix} 0 \\ 1.5\,\dot q_2 + \sin(q_2) + 1.2\operatorname{sgn}(\dot q_2) \end{bmatrix}.
\end{aligned}
\tag{77}
$$
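The configuration A terms in (77) can be coded directly. The sketch below uses the coefficients as read from (77) with decimal points restored, and it assumes the usual manipulator form $M\ddot q + C\dot q + G + F = \tau$ with the constraint force left out; that equation of motion, the function names, and the test inputs are assumptions made for illustration and are not the paper's simulation code.

```python
import numpy as np


def inertia_a(q):
    """Inertia matrix M_A(q) of configuration A."""
    c2 = np.cos(q[1])
    return np.array([[0.36 * c2 + 0.6066, 0.18 * c2 + 0.1233],
                     [0.18 * c2 + 0.1233, 0.1233]])


def coriolis_a(q, dq):
    """Coriolis and centrifugal matrix C_A(q, dq)."""
    s2 = np.sin(q[1])
    return np.array([[-0.36 * s2 * dq[1], -0.18 * s2 * dq[1]],
                     [0.18 * s2 * (dq[0] - dq[1]), 0.18 * s2 * dq[0]]])


def gravity_a(q):
    """Gravity vector G_A(q)."""
    s12 = np.sin(q[0] + q[1])
    return np.array([-5.88 * s12 - 17.64 * np.sin(q[0]), -5.88 * s12])


def friction_a(q, dq):
    """Friction and disturbance vector F_A(q, dq)."""
    return np.array([dq[0] + 10.0 * np.sin(3.0 * q[0]) + 2.0 * np.sign(dq[0]),
                     1.2 * dq[1] + 5.0 * np.sin(2.0 * q[1]) + np.sign(dq[1])])


def acceleration_a(q, dq, tau):
    """Joint acceleration for the assumed unconstrained form
    M*ddq + C*dq + G + F = tau (constraint force omitted)."""
    rhs = tau - coriolis_a(q, dq) @ dq - gravity_a(q) - friction_a(q, dq)
    return np.linalg.solve(inertia_a(q), rhs)
```

With these coefficients the inertia matrix is symmetric and positive definite for every $q_2$, so the linear solve in `acceleration_a` is well posed.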
The desired trajectories of configurations A and B are as follows.

Configuration A:
$$
\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t) - y_{1d}) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{78}
$$
Figure 3 Configuration A for simulation
Figure 4 Configuration B for simulation
Configuration B:
$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t)) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{79}
$$
Because of the dynamic external constraint, joint 1 in configuration B does not vary, so its desired trajectory is zero.
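The desired trajectories in (78) and (79) can be generated as below. This is a sketch under stated assumptions: $\alpha(t)$ is read from (76) as $0.75\pi + 0.2\sin(t/2)$, the link lengths are supplied by the caller because they are not given in this section, and the helper names are hypothetical. A domain check is added because the $\arcsin$ argument is only valid in $[-1, 1]$.

```python
import math


def alpha(t):
    """Constraint angle, read from (76) as 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def _theta(numerator, L2, a):
    """Shared arcsin step of (78) and (79) with an explicit domain check."""
    arg = numerator / L2
    if abs(arg) > 1.0:
        raise ValueError("arcsin argument outside [-1, 1]; check link lengths")
    return math.asin(arg) + a


def desired_a(t, L1, L2, L3):
    """Desired joint trajectories (y1d, y2d) of configuration A, from (78)."""
    y1d = 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)
    a = alpha(t)
    return y1d, _theta(L1 * math.sin(a - y1d) - L3 * math.sin(a), L2, a)


def desired_b(t, L1, L2, L3):
    """Desired joint trajectories (y1d, y2d) of configuration B, from (79)."""
    a = alpha(t)
    return 0.0, _theta((L1 - L3) * math.sin(a), L2, a)
```

For example, when $L_1 = L_3$ the $\arcsin$ term in (79) vanishes and the desired joint 2 trajectory of configuration B reduces to $\alpha(t)$.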
To confirm that the adopted method can be applied to different configurations, and to verify how well the subsystems track their desired trajectories under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part. They cover the classic RBF neural network control method and the ACI based decentralized reinforcement learning robust optimal tracking control method.
Figures 7 to 14 show the joint tracking and error curves obtained with the classic RBF neural network [4], which is used to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time-varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness, and the tracking errors become larger.

Figure 5: The analytic chart of configuration A.

Figure 6: The analytic chart of configuration B.
Figures 15 to 22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds when the proposed robust reinforcement learning optimal control method is used. This is due to the identifying ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. In addition, the joint subsystems exhibit robustness when the proposed decentralized control method is adopted.
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method satisfies the accessibility requirement of the reconfigurable modular robot. In addition, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 8: Trajectory tracking curve of configuration A, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

k      α      υ       η_a1    η_a2    η_c     β_1     β_2     γ
800    300    0.005   10      50      20      0.2     2       0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time-varying constrained reconfigurable modular robot. The joint variables track the desired trajectory within a very short time in the different configurations, and the fluctuation of the error convergence range is minimal.
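One way to make the comparison of tracking speed quantitative is a settling-time measure on the error curves. The sketch below is an illustration only: the function `settling_time`, the tolerance, and the two exponential error traces are hypothetical and synthetic, and they are not the simulation data behind Figures 11 to 22.

```python
import numpy as np


def settling_time(t, err, tol):
    """Return the first sample time after which |err| stays within tol.

    Returns None if the error is still outside the band at the last sample.
    """
    t = np.asarray(t, dtype=float)
    err = np.asarray(err, dtype=float)
    outside = np.nonzero(np.abs(err) > tol)[0]
    if outside.size == 0:
        return float(t[0])      # already inside the band from the start
    last = int(outside[-1])
    if last == len(t) - 1:
        return None             # never settles within the record
    return float(t[last + 1])


# Synthetic error traces (NOT the paper's data): a slow and a fast decay.
t = np.linspace(0.0, 10.0, 1001)
err_slow = np.exp(-2.5 * t)
err_fast = np.exp(-10.0 * t)
ts_slow = settling_time(t, err_slow, tol=0.01)
ts_fast = settling_time(t, err_fast, tol=0.01)
```

Applied to logged tracking errors, the same function would give one settling time per joint and per controller, which can then be compared directly.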
Figure 9: Trajectory tracking curve of configuration B, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 10: Trajectory tracking curve of configuration B, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).

Figure 11: Tracking error curve of configuration A, joint 1, with RBF neural network (joint 1 error in rad versus time in s).

Figure 12: Tracking error curve of configuration A, joint 2, with RBF neural network (joint 2 error in rad versus time in s).

Figure 13: Tracking error curve of configuration B, joint 1, with RBF neural network (joint 1 error in rad versus time in s).

Figure 14: Tracking error curve of configuration B, joint 2, with RBF neural network (joint 2 error in rad versus time in s).

Figure 15: Trajectory tracking curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 16: Trajectory tracking curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
5 Conclusions and Future Work
In this paper, by combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time-varying constrained reconfigurable modular robots. This theory is used to solve the continuous-time nonlinear optimal control policy problem for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with the time-varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function takes the place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and the action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and of the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.

Figure 17: Trajectory tracking curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 18: Trajectory tracking curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
In the future, more complex configurations of time-varying externally constrained reconfigurable modular robots can be included. Decentralized controllers with higher control and error precision will therefore be considered.
Figure 19: Tracking error curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).

Figure 20: Tracking error curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).

Figure 21: Tracking error curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).

Figure 22: Tracking error curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).

Figure 23: 3D-tip trajectory curve of configuration A with ACI (desired and actual trajectories).

Figure 24: 3D-tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Among the equations above, $P_{F1}$, $P_{F2}$, and $P_{F3}$ can be expressed, respectively, as follows:
$$
P_{F1} = \frac{1}{2} W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot{\tilde e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \Lambda_{iF}^T \dot{\tilde e}_{iF} - \hat W_{iF}^T \hat\kappa_{iF} \dot{\hat\Lambda}_{iF}^T \hat e_{iF} + \alpha \dot{\tilde e}_{iF} - \dot{\hat W}_{iF}^T \hat\kappa_{iF},
\tag{53}
$$
$$
P_{F2} = -\frac{1}{2} W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \Lambda_{iF}^T \dot e_{iF} + W_{iF}^T \kappa_{iF} \Lambda_{iF}^T \dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}),
\tag{54}
$$
$$
P_{F3} = \frac{1}{2} \tilde W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF}.
\tag{55}
$$
According to Assumption 1, (48), and (50), the upper bounds of $P_{F1}$, $P_{F2}$, and $P_{F3}$ are
$$
\begin{aligned}
\|P_{F1}\| &\le J_1\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|, \\
\|P_{F2}\| &\le \varsigma_1, \\
\|P_{F3}\| &\le \varsigma_2.
\end{aligned}
\tag{56}
$$
Combining (53) and (54) with (55), we get
$$
\left\|\dot P_{F2} + \dot P_{F3}\right\| \le \varsigma_3 + \varsigma_4 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|,
\tag{57}
$$
where $\varphi_i(\tilde e_{iF}^T, E_i^T) = [\tilde e_{iF}^T\ E_i^T]^T$, $J_i(\cdot)$ is a globally invertible nondecreasing function, and $\varsigma_i$ $(i = 1, 2, 3, 4)$ are computable positive constants.
Theorem 5. Considering the dynamics of the subsystem of the time-varying constrained reconfigurable modular robot in (14) and the state equation (18), if the designed identifier and the corresponding weight update laws are adopted, then the global uncertainty of the subsystem, which depends explicitly on the error term, can be identified, and the identification error converges and is bounded.
Proof. Define the Lyapunov function as follows:
$$
V_{iL}(\tilde e_{iF}, E_i) = \frac{1}{2} E_i^T E_i + \frac{1}{2}\gamma\, \tilde e_{iF}^T \tilde e_{iF} + \varpi_i(t) + \phi_i(t).
\tag{58}
$$
In the equation above, $\varpi_i(t)$ and $\phi_i(t)$ can be expressed as follows:
$$
\begin{aligned}
\dot\varpi_i(t) &= -\Big[E_i^T\left(P_{F2} - \beta_1 \operatorname{sat}(\tilde e_{iF})\right) + \dot{\tilde e}_{iF}^T P_{F3} - \beta_2 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\|\Big], \\
\varpi_i(0) &= \beta_1 \left|\tilde e_{iF}(0)\right| - \tilde e_{iF}^T(0)\left(P_{F2}(0) + P_{F3}(0)\right),
\end{aligned}
\tag{59}
$$
$$
\phi_i(t) = \frac{1}{4}\alpha\left[\operatorname{tr}\left(\tilde W_{iF}^T \Gamma_{iWF}^{-1} \tilde W_{iF}\right) + \operatorname{tr}\left(\tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \tilde\Lambda_{iF}\right)\right],
\tag{60}
$$
where $\operatorname{tr}(\cdot)$ represents the trace of a matrix. Define $d = [E_i^T\ \ \tilde e_{iF}^T\ \ \varpi_i^{1/2}\ \ \phi_i^{1/2}]$; $\beta_1, \beta_2 \in \mathbb{R}$ are positive adaptation gains, which are chosen to ensure $\varpi_i(t) \ge 0$, so we get
$$
U_1(d) \le V_{iL}(\tilde e_{iF}, E_i) \le U_2(d),
\tag{61}
$$
where
$$
\begin{aligned}
U_1(d) &= \frac{1}{2}\min(1, \gamma)\,\|d\|^2, \\
U_2(d) &= \max(1, \gamma)\,\|d\|^2.
\end{aligned}
\tag{62}
$$
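The sandwich bound in (61) and (62) can be spot-checked numerically. The sketch below assumes the Lyapunov function has the form of (58) with nonnegative $\varpi_i$ and $\phi_i$, and it treats them as plain scalars; the function names and the random sampling ranges are hypothetical and are used only for illustration.

```python
import numpy as np


def lyapunov_candidate(E, e, varpi, phi, gamma):
    """V = 1/2 E^T E + 1/2 gamma e^T e + varpi + phi, as in (58)."""
    return 0.5 * float(E @ E) + 0.5 * gamma * float(e @ e) + varpi + phi


def sandwich_bounds(E, e, varpi, phi, gamma):
    """U1 and U2 from (62), with d = [E, e, sqrt(varpi), sqrt(phi)]."""
    d_norm_sq = float(E @ E) + float(e @ e) + varpi + phi
    return 0.5 * min(1.0, gamma) * d_norm_sq, max(1.0, gamma) * d_norm_sq


def check_sandwich(num_samples=1000, seed=0):
    """Return True if U1 <= V <= U2 holds on every random sample."""
    rng = np.random.default_rng(seed)
    for _ in range(num_samples):
        E = rng.normal(size=2)
        e = rng.normal(size=2)
        varpi, phi = rng.uniform(0.0, 5.0, size=2)  # both must be nonnegative
        gamma = rng.uniform(0.1, 10.0)
        V = lyapunov_candidate(E, e, varpi, phi, gamma)
        lower, upper = sandwich_bounds(E, e, varpi, phi, gamma)
        if not (lower <= V <= upper):
            return False
    return True
```

The check passes because each term of $V$ is at least $\frac{1}{2}\min(1,\gamma)$ times, and at most $\max(1,\gamma)$ times, the matching term of $\|d\|^2$.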
The time derivative of (58) is
$$
\dot V_{iL}(\tilde e_{iF}, E_i) = \nabla V_{iL}^T K\left[\dot E_i^T\ \ \dot{\tilde e}_{iF}^T\ \ \frac{1}{2}\varpi_i^{-1/2}\dot\varpi_i\ \ \frac{1}{2}\phi_i^{-1/2}\dot\phi_i\right]^T,
\tag{63}
$$
where $K[\cdot]$ denotes a Filippov set [27]. Thus $\dot V_{iL}(\tilde e_{iF}, E_i)$ can be written in the following form:
$$
\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
&= \left[E_i^T\ \ \gamma \tilde e_{iF}^T\ \ 2\varpi_i^{1/2}\ \ 2\phi_i^{1/2}\right] K\left[\dot E_i^T\ \ \dot{\tilde e}_{iF}^T\ \ \frac{1}{2}\varpi_i^{-1/2}\dot\varpi_i\ \ \frac{1}{2}\phi_i^{-1/2}\dot\phi_i\right]^T \\
&\le E_i^T\Big(\frac{1}{2} W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot{\tilde e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \Lambda_{iF}^T \dot{\tilde e}_{iF} - \hat W_{iF}^T \hat\kappa_{iF} \dot{\hat\Lambda}_{iF}^T \hat e_{iF} + \alpha \dot{\tilde e}_{iF} \\
&\qquad\quad - \frac{1}{2} W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \Lambda_{iF}^T \dot e_{iF} - \gamma \tilde e_{iF} - \dot{\hat W}_{iF}^T \hat\kappa_{iF} \\
&\qquad\quad + W_{iF}^T \kappa_{iF} \Lambda_{iF}^T \dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) + \frac{1}{2} \tilde W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF} \\
&\qquad\quad - k E_i - \beta_1 K\left[\operatorname{sat}(\tilde e_{iF})\right]\Big) \\
&\quad - E_i^T\Big(-\frac{1}{2} W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot e_{iF} - \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \Lambda_{iF}^T \dot e_{iF} + W_{iF}^T \kappa_{iF} \Lambda_{iF}^T \dot e_{iF} + \dot\varepsilon_{iF}(e_{iF}) - \beta_1 K\left[\operatorname{sat}(\tilde e_{iF})\right]\Big) \\
&\quad - \dot{\tilde e}_{iF}^T\Big(\frac{1}{2} \tilde W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} + \frac{1}{2} \hat W_{iF}^T \hat\kappa_{iF} \tilde\Lambda_{iF}^T \dot{\hat e}_{iF}\Big) \\
&\quad + \gamma \tilde e_{iF}^T\left(E_i - \alpha \tilde e_{iF}\right) + \beta_2 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\| \\
&\quad - \frac{1}{2}\alpha \operatorname{tr}\left(\tilde W_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat W}_{iF}\right) - \frac{1}{2}\alpha \operatorname{tr}\left(\tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat\Lambda}_{iF}\right).
\end{aligned}
\tag{64}
$$
Substituting (53), (54), and (55) into (64), we get
$$
\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
&= E_i^T\left(P_{F1} + P_{F2} + P_{F3} - \beta_1 K\left[\operatorname{sat}(\tilde e_{iF})\right] - k E_i - \gamma \tilde e_{iF}\right) + \gamma \tilde e_{iF}^T\left(E_i - \alpha \tilde e_{iF}\right) \\
&\quad - E_i^T\left(P_{F2} - \beta_1 K\left[\operatorname{sat}(\tilde e_{iF})\right]\right) - \dot{\tilde e}_{iF}^T P_{F3} \\
&\quad + \beta_2 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\| \\
&\quad - \frac{1}{2}\alpha \operatorname{tr}\left(\tilde W_{iF}^T \Gamma_{iWF}^{-1} \dot{\hat W}_{iF}\right) - \frac{1}{2}\alpha \operatorname{tr}\left(\tilde\Lambda_{iF}^T \Gamma_{i\Lambda F}^{-1} \dot{\hat\Lambda}_{iF}\right) \\
&= -\alpha\gamma\, \tilde e_{iF}^T \tilde e_{iF} - k E_i^T E_i + E_i^T P_{F1} + \left(E_i^T - \dot{\tilde e}_{iF}^T\right) P_{F3} \\
&\quad + \beta_2 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\left\|\tilde e_{iF}\right\| \\
&\quad - \frac{1}{2}\alpha \operatorname{tr}\left(\tilde W_{iF}^T \hat\kappa_{iF} \hat\Lambda_{iF}^T \dot{\hat e}_{iF} \tilde e_{iF}^T\right) - \frac{1}{2}\alpha \operatorname{tr}\left(\tilde\Lambda_{iF}^T \dot{\hat e}_{iF} \tilde e_{iF}^T \hat W_{iF}^T \hat\kappa_{iF}\right) \\
&\le -k_1 \left\|\tilde e_{iF}\right\|^2 - k_2 \left\|E_i\right\|^2 + \frac{J_1\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2}{4k_3}\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2 \\
&\quad + \frac{\beta_2^2 J_2\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2}{4\alpha k_4}\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2,
\end{aligned}
\tag{65}
$$
where $k_{\min} = \min\{k_1, k_2\}$, $\xi = \min\{k_3, \alpha k_4 / \beta_2^2\}$, and $J(\|\varphi_i(\tilde e_{iF}^T, E_i^T)\|)^2 = J_1(\|\varphi_i(\tilde e_{iF}^T, E_i^T)\|)^2 + J_2(\|\varphi_i(\tilde e_{iF}^T, E_i^T)\|)^2$, so the following conclusion can be obtained:
$$
\begin{aligned}
\dot V_{iL}(\tilde e_{iF}, E_i)
&\le -k_{\min}\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2 + \frac{J\left(\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|\right)^2 \left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2}{4\xi} \\
&\le -c\left\|\varphi_i(\tilde e_{iF}^T, E_i^T)\right\|^2.
\end{aligned}
\tag{66}
$$
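The second inequality in (66) holds only on a bounded region. A short sketch of why, writing $\varphi$ for $\varphi_i(\cdot)$ and assuming $J(\cdot) \ge 0$ is invertible and nondecreasing, $\varphi \ne 0$, and $0 < c < k_{\min}$ (the last two are assumptions added here for the sketch):

$$
\begin{aligned}
-k_{\min}\|\varphi\|^2 + \frac{J(\|\varphi\|)^2}{4\xi}\|\varphi\|^2 \le -c\,\|\varphi\|^2
&\iff J(\|\varphi\|)^2 \le 4\xi\,(k_{\min} - c) \\
&\iff \|\varphi\| \le J^{-1}\!\left(2\sqrt{\xi\,(k_{\min} - c)}\right).
\end{aligned}
$$

As $c \to 0$ the admissible radius tends to $J^{-1}(2\sqrt{k_{\min}\xi})$, which is the same expression that bounds the region of validity of the estimate.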
Therefore, for some positive constant $c$, $-c\|\varphi_i(\tilde e_{iF}^T, E_i^T)\|^2$ is a negative semidefinite function defined on the adjustable domain $D$ expressed as follows:
$$
D = \left\{ d(t) \;\middle|\; \|d\| \le J^{-1}\left(2\sqrt{k_{\min}\xi}\right) \right\},
\tag{67}
$$
so that, by Lyapunov stability theory, the system is stable. In order to make the subsystem of the time-varying constrained reconfigurable modular robot track the desired trajectory asymptotically, a novel decentralized reinforcement learning robust optimal tracking controller is designed in this paper, using a robust term to compensate the neural network approximation errors. The robust control term is designed as
$$
u_{irb} = \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}.
\tag{68}
$$
In the equation above 120577 gt 0 is a constant And 119873119903119887can
be expressed as
$$
\begin{aligned}
N_{rb} \ge{} & \left[ \frac{\delta_{hi}^2}{2n_1} + \frac{n_1\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\dfrac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2}{2n_2} + n_1 n_2\left\|b_i(e_i)\right\|^2\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2 \right] \\
& \cdot \frac{e_i^T e_i + \zeta}{2n_1 n_2\left\|b_i(e_i)\right\| e_i^T e_i} \\
\ge{} & \left[ n_1^2\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2 + n_2\delta_{hi}^2 + 2n_1^2 n_2^2\left\|b_i(e_i)\right\|^2\left(-\varepsilon_{ia}(e_i) - R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^2 \right] \\
& \cdot \frac{e_i^T e_i + \zeta}{4n_1^2 n_2^2\left\|b_i(e_i)\right\| e_i^T e_i}
\end{aligned}
\tag{69}
$$
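The relation between the two bounds in (69) can be checked by distributing the factor $1/(2n_1 n_2)$ over the first bracket. The following is a sketch of that algebra; the shorthand $A_i$ is introduced here only for readability and is not part of the original notation.

```latex
% Shorthand (not in the original):
%   A_i = \left(-\varepsilon_{ia}(e_i) - R^{-1} b_i(e_i)\,\nabla\varepsilon_{ic}(e_i)/2\right)^2
\frac{1}{2 n_1 n_2}\left[\frac{\delta_{hi}^2}{2 n_1} + \frac{n_1 A_i}{2 n_2}
      + n_1 n_2 \|b_i(e_i)\|^2 A_i\right]
 = \frac{\delta_{hi}^2}{4 n_1^2 n_2} + \frac{A_i}{4 n_2^2} + \frac{\|b_i(e_i)\|^2 A_i}{2}
 = \frac{n_2\,\delta_{hi}^2 + n_1^2 A_i + 2 n_1^2 n_2^2 \|b_i(e_i)\|^2 A_i}{4 n_1^2 n_2^2}.
```

Term by term, the second bracket of (69) divided by $4n_1^2 n_2^2$ matches the first bracket divided by $2n_1 n_2$, so the two lower bounds on $N_{rb}$ coincide and the second form is simply the first written over a common denominator.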
Therefore, the global control law can be designed as follows:
$$
u_{\mathrm{mix}} = u_i + u_{irb} = -\frac{1}{2}R^{-1}b_i^T(e_i)\,S_{ia}(e_i)^T\hat{W}_{ia} + \frac{N_{rb}\, e_i}{e_i^T e_i + \zeta}
\tag{70}
$$
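The structure of (68) and (70) can be illustrated with a minimal sketch. It assumes the nominal control $u_i$ (the action-NN output) is already available as a vector of the same dimension as the error $e_i$; the function names `robust_term` and `mixed_control` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence


def robust_term(e: Sequence[float], n_rb: float, zeta: float) -> List[float]:
    """Robust control term of (68): u_irb = N_rb * e / (e^T e + zeta)."""
    if zeta <= 0.0:
        raise ValueError("zeta must be a positive constant")
    denom = sum(x * x for x in e) + zeta  # e^T e + zeta, never zero since zeta > 0
    return [n_rb * x / denom for x in e]


def mixed_control(u_nominal: Sequence[float], e: Sequence[float],
                  n_rb: float, zeta: float) -> List[float]:
    """Global control law of (70): nominal term plus robust term.

    u_nominal stands in for the action-NN term of (70); computing it
    is outside the scope of this sketch.
    """
    u_rb = robust_term(e, n_rb, zeta)
    return [a + b for a, b in zip(u_nominal, u_rb)]
```

Because $\zeta > 0$ keeps the denominator away from zero, the robust term stays well defined as the error approaches zero, and its norm is bounded by $N_{rb}/(2\sqrt{\zeta})$ for any error.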
Theorem 6. Consider the dynamics of the subsystem of the time varying constrained reconfigurable modular robot in (14). If the system parameter conditions and the assumptions hold, the critic-NN, action-NN, and identifier are given by (33), (34), and (45), respectively, and the decentralized robust optimal tracking controller of the subsystem in (70) is adopted, then the closed-loop system is stable and the desired trajectory can be tracked asymptotically by the actual output.
Proof. Design the Lyapunov function as follows:
$$
V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2n_1}\operatorname{tr}\left(\tilde{W}_{ic}^T\tilde{W}_{ic}\right) + \frac{n_1}{2n_2}\operatorname{tr}\left(\tilde{W}_{ia}^T\tilde{W}_{ia}\right) + n_1 n_2\left[e_{iF}^T e_{iF} + \Xi\int_0^{\infty} r_i(e_i, u_{\mathrm{mix}})\,d\tau\right]
\tag{71}
$$
where $\Xi > 0$ is the undetermined parameter and $t \le \tau < \infty$. The derivative of (71) is as follows:
$$
\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
={} & \frac{1}{2n_1}\operatorname{tr}\left(\tilde{W}_{ic}^T\dot{\tilde{W}}_{ic}\right) + \frac{n_1}{2n_2}\operatorname{tr}\left(\tilde{W}_{ia}^T\dot{\tilde{W}}_{ia}\right) + n_1 n_2\left[e_{iF}^T\dot{e}_{iF} + \Xi\, r_i(e_i, u_{\mathrm{mix}})\right] \\
={} & \frac{1}{2n_1}\operatorname{tr}\left(\tilde{W}_{ic}^T\left(-n_1 L_i\left(L_i^T\tilde{W}_{ic} + \delta_{hi}\right)\right)\right) \\
& + \frac{n_1}{2n_2}\operatorname{tr}\left(\tilde{W}_{ia}^T\left[-n_2 S_{ia}(e_i)\left(\tilde{W}_{ia}^T S_{ia}(e_i) - \varepsilon_{ia}(e_i) + \frac{1}{2}R^{-1}b_i(e_i)\nabla S_{ic}(e_i)^T\tilde{W}_{ic} - \frac{1}{2}R^{-1}b_i(e_i)\nabla\varepsilon_{ic}(e_i)\right)\right]\right) \\
& + n_1 n_2\, e_{iF}^T\left(W_{iF}^T\kappa\left(\Lambda_{iF}^T e_i\right) + \varepsilon_{iF}(e_i) + b_i(e_i)u_{\mathrm{mix}}\right) + \Xi\left(e_i^T Q_e e_i + u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}}\right) \\
\le{} & -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2\right)\left\|\tilde{W}_{ic}\right\|^2 + \frac{1}{2n_1}\delta_{hi}^2 - \left(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2\right)\left\|\tilde{W}_{ia}\right\|^2 \\
& + \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2\left\|\nabla S_{icM}^2\right\|\left\|\tilde{W}_{ic}\right\|^2 \\
& + \frac{n_1}{2n_2}\left(\varepsilon_{ia}(e_i) + R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right)^T \cdot \left(\varepsilon_{ia}(e_i) + R^{-1}b_i(e_i)\frac{\nabla\varepsilon_{ic}(e_i)}{2}\right) \\
& + n_1 n_2\left\|b_i(e_i)\right\|^2\varepsilon_{ia}(e_i)^T\varepsilon_{ia}(e_i) + n_1 n_2 S_{iaM}^2\left\|b_i(e_i)\right\|^2\left\|\tilde{W}_{ia}\right\|^2 \\
& + n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 + n_1 n_2\left(\left\|b_i(e_i)\right\|^2 - \Xi\,\lambda_{\min}(R)\right)\left\|u_{\mathrm{mix}}\right\|^2
\end{aligned}
\tag{72}
$$
If the following inequalities are satisfied:
$$
\begin{aligned}
\lambda_{\min}(Q_e)\left\|e_{iF}\right\|_2^2 &\le e_{iF}^T Q_e e_{iF} \le \lambda_{\max}(Q_e)\left\|e_{iF}\right\|_2^2, \\
\lambda_{\min}(R)\left\|u_{\mathrm{mix}}\right\|_2^2 &\le u_{\mathrm{mix}}^T R\, u_{\mathrm{mix}} \le \lambda_{\max}(R)\left\|u_{\mathrm{mix}}\right\|_2^2, \\
\Xi &> \frac{\left\|b_i(e_i)\right\|^2}{\lambda_{\min}(R)},
\end{aligned}
\tag{73}
$$
then $\dot{V}_{iu}(e_i, u_{\mathrm{mix}})$ can be further transformed as
$$
\begin{aligned}
\dot{V}_{iu}(e_i, u_{\mathrm{mix}})
\le{} & -\left(L_{im}^2 - \frac{n_1}{2}L_{iM}^2 - \frac{n_1}{4n_2}\left\|R^{-1}\right\|^2\left\|b_i(e_i)\right\|^2\nabla S_{icM}^2\right)\left\|\tilde{W}_{ic}\right\|^2 \\
& - \left(n_1 S_{iam}^2 - \frac{3}{4}n_1 n_2 S_{iaM}^2 - n_1 n_2 S_{iaM}^2\left\|b_i(e_i)\right\|^2\right)\left\|\tilde{W}_{ia}\right\|^2 \\
& - n_1 n_2\left(\left\|b_i(e_i)\right\|^2 + \Xi\,\lambda_{\min}(R)\right)\left\|u_{\mathrm{mix}}\right\|^2 - n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2 \\
\le{} & -n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\left\|e_{iF}\right\|^2
\end{aligned}
\tag{74}
$$
Therefore, we can conclude that $\dot{V}_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
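The first two conditions in (73) are the standard Rayleigh quotient bounds for a symmetric matrix, and the third is a scalar lower bound on $\Xi$. The sketch below checks them numerically for a 2x2 symmetric weighting matrix; the helper names and the sample matrices are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def sym2_eigs(a: float, b: float, d: float) -> Tuple[float, float]:
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def quad_form(a: float, b: float, d: float, x: Sequence[float]) -> float:
    """x^T [[a, b], [b, d]] x for a 2-vector x."""
    return a * x[0] * x[0] + 2.0 * b * x[0] * x[1] + d * x[1] * x[1]


def rayleigh_bounds_hold(a: float, b: float, d: float,
                         x: Sequence[float], tol: float = 1e-12) -> bool:
    """Check lambda_min * ||x||^2 <= x^T Q x <= lambda_max * ||x||^2."""
    lam_min, lam_max = sym2_eigs(a, b, d)
    norm_sq = x[0] * x[0] + x[1] * x[1]
    q = quad_form(a, b, d, x)
    return lam_min * norm_sq - tol <= q <= lam_max * norm_sq + tol


def xi_lower_bound(b_norm: float, r_lambda_min: float) -> float:
    """Right-hand side of the third condition in (73): ||b_i||^2 / lambda_min(R)."""
    return b_norm ** 2 / r_lambda_min
```

Any $\Xi$ chosen strictly above `xi_lower_bound(...)` satisfies the third condition; the first two hold automatically for symmetric $Q_e$ and $R$.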
4 Simulations
In order to verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing simulation results, two different configurations of the time varying external constrained reconfigurable modular robot are used in this paper, shown in Figures 3 and 4.
To simplify the analysis, the configurations above are transformed into analytic charts, shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.
The time varying constraint can be defined as a column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:
$$
\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - \left[L_3 + L_4\cot\alpha(t)\right]
\end{aligned}
\tag{75}
$$
In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis can be defined as follows:
$$
\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2}
\tag{76}
$$
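The constraint functions in (75) and the angle in (76) can be evaluated directly. The sketch below assumes the decimal constants 0.75 and 0.2 in (76) and takes the link lengths as arguments, since their numerical values are not given in this section; the function names are hypothetical.

```python
import math


def alpha(t: float) -> float:
    """Constraint angle of (76): alpha(t) = 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def psi_a(q1: float, q2: float, t: float,
          l1: float, l2: float, l3: float, l4: float) -> float:
    """Constraint residual of configuration A in (75); zero on the constraint."""
    cot_alpha = 1.0 / math.tan(alpha(t))  # alpha stays near 0.75*pi, so tan != 0
    return l1 * math.cos(q1) + l2 * math.cos(q2) - (l3 + l4 * cot_alpha)


def psi_b(q2: float, t: float,
          l1: float, l2: float, l3: float, l4: float) -> float:
    """Constraint residual of configuration B in (75); joint 1 does not appear."""
    cot_alpha = 1.0 / math.tan(alpha(t))
    return l1 + l2 * math.cos(q2) - (l3 + l4 * cot_alpha)
```

Since $\alpha(t)$ stays within $0.75\pi \pm 0.2$ rad, $\cot\alpha(t)$ is always finite and negative, so the term $L_4\cot\alpha(t)$ reduces the bracketed offset rather than increasing it.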
The initial positions of the joints are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial velocities of the joints are zero. The dynamic models of configurations A and B are designed as follows:
$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^2(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot{q}) &= \begin{bmatrix} -0.36\sin(q_2)\dot{q}_2 & -0.18\sin(q_2)\dot{q}_2 \\ 0.18\sin(q_2)(\dot{q}_1 - \dot{q}_2) & 0.18\sin(q_2)\dot{q}_1 \end{bmatrix}, \\
C_B(q, \dot{q}) &= \begin{bmatrix} 0.1166\sin(2q_2)\dot{q}_2 & 0.06\sin(q_2)\dot{q}_2 \\ 0.06\sin(q_2)\dot{q}_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix}, \\
G_B(q) &= \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix}, \\
F_A(q, \dot{q}) &= \begin{bmatrix} \dot{q}_1 + 10\sin(3q_1) + 2\operatorname{sgn}(\dot{q}_1) \\ 1.2\dot{q}_2 + 5\sin(2q_2) + \operatorname{sgn}(\dot{q}_2) \end{bmatrix}, \\
F_B(q, \dot{q}) &= \begin{bmatrix} 0 \\ 1.5\dot{q}_2 + \sin(q_2) + 1.2\operatorname{sgn}(\dot{q}_2) \end{bmatrix}
\end{aligned}
\tag{77}
$$
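As an illustration of how the model terms in (77) can be evaluated, the sketch below codes the inertia, Coriolis, and gravity terms of configuration A. The decimal coefficients (0.36, 0.6066, 0.18, 0.1233, 5.88, 17.64) are assumed readings of the printed constants, and the function names are hypothetical.

```python
import math
from typing import List


def mass_a(q2: float) -> List[List[float]]:
    """Inertia matrix M_A(q) of configuration A (depends on q2 only)."""
    c = math.cos(q2)
    off_diag = 0.18 * c + 0.1233
    return [[0.36 * c + 0.6066, off_diag],
            [off_diag, 0.1233]]


def mass_det_a(q2: float) -> float:
    """Determinant of M_A(q); positive when M_A is positive definite."""
    m = mass_a(q2)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def coriolis_a(q2: float, dq1: float, dq2: float) -> List[List[float]]:
    """Coriolis/centrifugal matrix C_A(q, dq) of configuration A."""
    s = math.sin(q2)
    return [[-0.36 * s * dq2, -0.18 * s * dq2],
            [0.18 * s * (dq1 - dq2), 0.18 * s * dq1]]


def gravity_a(q1: float, q2: float) -> List[float]:
    """Gravity vector G_A(q) of configuration A."""
    g12 = -5.88 * math.sin(q1 + q2)
    return [g12 - 17.64 * math.sin(q1), g12]
```

With these coefficients the determinant of $M_A$ works out to approximately $0.0596 - 0.0324\cos^2(q_2)$, which stays positive for every $q_2$, so the inertia matrix is symmetric positive definite as a physically meaningful model requires.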
The desired trajectories of configurations A and B are as follows.
Configuration A:
$$
\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t) - y_{1d}) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t)
\end{aligned}
\tag{78}
$$
Figure 3 Configuration A for simulation
Figure 4 Configuration B for simulation
Configuration B:
$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t)) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t)
\end{aligned}
\tag{79}
$$
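The desired trajectories (78) and (79) share the same map $\Theta$, with (79) being the special case $y_{1d} = 0$. The sketch below makes that explicit. The link lengths are passed as arguments because their values are not stated in this section, the decimal constants are assumed readings of the printed values, and the function names are hypothetical.

```python
import math
from typing import Tuple


def alpha(t: float) -> float:
    """Constraint angle of (76), assuming the constants 0.75*pi and 0.2."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2.0)


def theta(y1d: float, t: float, l1: float, l2: float, l3: float) -> float:
    """Constraint-consistent joint 2 reference, the map Theta in (78) and (79)."""
    a = alpha(t)
    arg = (l1 * math.sin(a - y1d) - l3 * math.sin(a)) / l2
    if not -1.0 <= arg <= 1.0:
        # The chosen link lengths make the constraint unreachable at this instant.
        raise ValueError("arcsin argument outside [-1, 1]")
    return math.asin(arg) + a


def desired_a(t: float, l1: float, l2: float, l3: float) -> Tuple[float, float]:
    """Desired (y1d, y2d) of configuration A, eq. (78)."""
    y1d = 0.5 * math.cos(t) + 0.2 * math.sin(3.0 * t)
    return y1d, theta(y1d, t, l1, l2, l3)


def desired_b(t: float, l1: float, l2: float, l3: float) -> Tuple[float, float]:
    """Desired (y1d, y2d) of configuration B, eq. (79): joint 1 is held at zero."""
    return 0.0, theta(0.0, t, l1, l2, l3)
```

For example, `desired_a(0.0, 0.3, 0.3, 0.2)` uses purely illustrative link lengths; the domain check matters because an arbitrary choice of $L_1$, $L_2$, $L_3$ can push the arcsin argument outside $[-1, 1]$.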
Owing to the dynamic external constraint, the variation of joint 1 in configuration B is zero.
In order to confirm that the adopted method can be applied to different configurations, and to verify the tracking performance for the subsystem desired trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part with the classic RBF neural network control method and with the ACI based decentralized reinforcement learning robust optimal tracking control method, respectively.
Figures 7 to 14 show the joint tracking and error curves obtained by using the classic RBF neural network [4] to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when the time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness and the tracking errors are larger than before.
Figure 5: The analytic chart of configuration A.
Figure 6: The analytic chart of configuration B.
Figures 15 to 22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds under the proposed robust reinforcement learning optimal control method. This is due to the identifying ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. Moreover, when the proposed decentralized control method is adopted for the joint subsystems, the robustness is manifested.
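Statements such as "about 2 seconds" and "0.5 seconds" can be made reproducible by computing a settling time from a sampled error trace. The sketch below is one possible definition, not the procedure used by the authors: it returns the first sample time after which the absolute error stays inside a chosen band. The function name and the band value are assumptions.

```python
from typing import Optional, Sequence


def settling_time(times: Sequence[float], errors: Sequence[float],
                  band: float) -> Optional[float]:
    """First sample time after which |error| <= band for the rest of the record.

    Returns None if the error is still outside the band at the last sample.
    """
    if len(times) != len(errors) or not times:
        raise ValueError("times and errors must be non-empty and of equal length")
    last_violation = -1
    for k, err in enumerate(errors):
        if abs(err) > band:
            last_violation = k  # remember the latest sample outside the band
    if last_violation == len(errors) - 1:
        return None
    return times[last_violation + 1]
```

Applied to the joint error traces of two controllers with the same band, this gives a single number per joint that can be compared directly, instead of reading the convergence time off the plots.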
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method can fully satisfy the accessibility requirement of the reconfigurable modular robot. In addition, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 8: Trajectory tracking curve of configuration A joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.
$k$    $\alpha$    $\upsilon$    $\eta_{a1}$    $\eta_{a2}$    $\eta_c$    $\beta_1$    $\beta_2$    $\gamma$
800    300         0.005         10             50             20          0.2          2            0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. The joint variables track the desired trajectory within a very short time in different configurations, and the fluctuation of the error convergence range is minimal.
Figure 9: Trajectory tracking curve of configuration B joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 10: Trajectory tracking curve of configuration B joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Figure 11: Tracking error curve of configuration A joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 12: Tracking error curve of configuration A joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 13: Tracking error curve of configuration B joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 14: Tracking error curve of configuration B joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 15: Trajectory tracking curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 16: Trajectory tracking curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
5 Conclusions and Future Work
In this paper, combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. This theory is used to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function takes the place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.
Figure 17: Trajectory tracking curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 18: Trajectory tracking curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
In the future, more complex configurations of the time varying external constrained reconfigurable modular robot can be included, and a decentralized controller with higher control and error precision will be considered.
Figure 19: Tracking error curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Figure 20: Tracking error curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s)).
Figure 21: Tracking error curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Figure 22: Tracking error curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s)).
Figure 23: 3D tip trajectory curve of configuration A with ACI (desired and actual trajectories).
Figure 24: 3D tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Mathematical Problems in Engineering 9
= minus120572120574119890119879
119894119865119890119894119865+ (119864
119879
119894minus 119890
119879
119894119865)119875
1198653
1198691(10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817)2
41198963
minus1
2120572 tr (119879
119894119865
120581119894119865Λ119879
119894119865
119890119894119865119890119879
119894119865) minus
1
2120572 tr (Λ119879
119894119865
119890119894119865119890119879
119894119865
119879
119894119865
120581119894119865)
le minus1198961
100381710038171003817100381711989011989411986510038171003817100381710038172
minus 1198962
1003817100381710038171003817119864119894
10038171003817100381710038172
+1198691(10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817)2
41198963
10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817
2
+1205732
21198692(10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817)2
41205721198964
10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817
2
(65)
where 119896min = min1198961 119896
2 120585 = min119896
3 120572119896
4120573
2
2 and
119869(120593119894(119890
119879
119894119865 119864
119879
119894))
2
= 1198691(120593
119894(119890
119879
119894119865 119864
119879
119894))
2 + 1198692(120593
119894(119890
119879
119894119865 119864
119879
119894))
2 sothe following conclusion can be obtained
119894119871(119890
119894119865 119864
119894)
le minus119896min10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817
2
+119869(10038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817)210038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817
2
4120585
le minus11988810038171003817100381710038171003817120593119894(119890
119879
119894119865 119864
119879
119894)10038171003817100381710038171003817
2
(66)
Therefore for an arbitrary constant 119888 minus119888120593119894(119890
119879
119894119865 119864
119879
119894)
2
is a negative semidefinite function which is defined in theadjustable interval119863 expressed as follows
119863 = 119889 (119905) | 119889 le 119869minus1
(2radic119896min120585) (67)
so that Lyapunov stability theory shows that the system isstable In order to make the subsystem of time varying con-strained reconfigurable modular robot tracking the desiredtrajectory progressively in this paper a novel decentralizedreinforcement learning robust optimal tracking controllerhas been designed by using the robust term to compensatethe neural network approximation errors Design the robustcontrol term as
119906119894119903119887
=119873
119903119887119890119894
119890119879119894119890119894+ 120577
(68)
In the equation above 120577 gt 0 is a constant And 119873119903119887can
be expressed as
119873119903119887ge [
[
1205752
ℎ119894
21198991
+1198991(minus120576
119894119886(119890
119894) minus 119877
minus1
119887119894(119890
119894) (nabla120576
119894119888(119890
119894)2))
2
21198992
+11989911198992
1003817100381710038171003817119887119894(119890119894)10038171003817100381710038172
(minus120576119894119886(119890
119894) minus 119877
minus1
119887119894(119890
119894)nabla120576
119894119888(119890
119894)
2)
2
]
]
sdot(119890
119879
119894119890119894+ 120577)
211989911198992
1003817100381710038171003817119887119894 (119890119894)1003817100381710038171003817 119890
119879
119894119890119894
ge [1198992
1(minus120576
119894119886(119890
119894) minus 119877
minus1
119887119894(119890
119894)nabla120576
119894119888(119890
119894)
2)
2
+ 11989921205752
ℎ119894+ 2119899
2
11198992
2
1003817100381710038171003817119887119894(119890119894)10038171003817100381710038172
sdot (minus120576119894119886(119890
119894) minus 119877
minus1
119887119894(119890
119894)nabla120576
119894119888(119890
119894)
2)
2
]
sdot(119890
119879
119894119890119894+ 120577)
41198992111989922
1003817100381710038171003817119887119894 (119890119894)1003817100381710038171003817 119890
119879
119894119890119894
(69)
Therefore the global control law can be designed asfollows
119906mix = 119906119894+ 119906
119894119903119887
= minus1
2119877minus1
119887119879
119894(119890
119894) 119878
119894119886(119890
119894)119879
119894119886+
119873119903119887119890119894
119890119879119894119890119894+ 120577
(70)
Theorem 6 Considering dynamics of the subsystem of timevarying constrained reconfigurable modular robot in (14) if thesystem parameters conditions and the assumptions are held thecritic-NN action-NN and identifier are given by (33) (34)and (45) respectively and the decentralized robust optimaltracking controller of the subsystem in (70) is adopted thenthe system is closed-loop stability and the desired trajectory canbe tracked asymptotically by the actual output
Proof. Design the Lyapunov function as follows:

$$
V_{iu}(e_i, u_{\mathrm{mix}}) = \frac{1}{2n_1}\operatorname{tr}\bigl(\tilde W_{ic}^{T}\tilde W_{ic}\bigr)
 + \frac{n_1}{2n_2}\operatorname{tr}\bigl(\tilde W_{ia}^{T}\tilde W_{ia}\bigr)
 + n_1 n_2\Bigl[e_{iF}^{T} e_{iF} + \Xi\int_{0}^{\infty} r_i(e_i, u_{\mathrm{mix}})\, d\tau\Bigr]
\tag{71}
$$
where $\Xi > 0$ is an undetermined parameter and $t \le \tau < \infty$. The time derivative of (71) is as follows:
$$
\begin{aligned}
\dot V_{iu}(e_i, u_{\mathrm{mix}})
&= \frac{1}{2n_1}\operatorname{tr}\bigl(\tilde W_{ic}^{T}\dot{\tilde W}_{ic}\bigr)
 + \frac{n_1}{2n_2}\operatorname{tr}\bigl(\tilde W_{ia}^{T}\dot{\tilde W}_{ia}\bigr)
 + n_1 n_2\bigl[e_{iF}^{T}\dot e_{iF} + \Xi\, r_i(e_i, u_{\mathrm{mix}})\bigr] \\
&= \frac{1}{2n_1}\operatorname{tr}\Bigl(\tilde W_{ic}^{T}\bigl(-n_1 L_i (L_i^{T}\tilde W_{ic} + \delta_{hi})\bigr)\Bigr) \\
&\quad + \frac{n_1}{2n_2}\operatorname{tr}\Bigl(\tilde W_{ia}^{T}\Bigl[-n_2 S_{ia}(e_i)\Bigl(\tilde W_{ia}^{T} S_{ia}(e_i) - \varepsilon_{ia}(e_i)
 + \frac{1}{2} R^{-1} b_i(e_i)\nabla S_{ic}(e_i)\tilde W_{ic}^{T}
 - \frac{1}{2} R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)\Bigr)\Bigr]\Bigr) \\
&\quad + n_1 n_2\, e_{iF}^{T}\bigl(W_{iF}^{T}\kappa(\Lambda_{iF}^{T} e_i) + \varepsilon_{iF}(e_i) + b_i(e_i)\, u_{\mathrm{mix}}\bigr)
 + \Xi\bigl(e_i^{T} Q_e e_i + u_{\mathrm{mix}}^{T} R\, u_{\mathrm{mix}}\bigr) \\
&\le -\Bigl(L_{im}^{2} - \frac{n_1}{2} L_{iM}^{2}\Bigr)\bigl\|\tilde W_{ic}\bigr\|^{2} + \frac{1}{2n_1}\delta_{hi}^{2}
 - \Bigl(n_1 S_{iam}^{2} - \frac{3}{4} n_1 n_2 S_{iaM}^{2}\Bigr)\bigl\|\tilde W_{ia}\bigr\|^{2} \\
&\quad + \frac{n_1}{4n_2}\bigl\|R^{-1}\bigr\|^{2}\|b_i(e_i)\|^{2}\bigl\|\nabla S_{icM}^{2}\bigr\|\bigl\|\tilde W_{ic}\bigr\|^{2} \\
&\quad + \frac{n_1}{2n_2}\Bigl(\varepsilon_{ia}(e_i) + \frac{R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\Bigr)^{T}
 \cdot\Bigl(\varepsilon_{ia}(e_i) + \frac{R^{-1} b_i(e_i)\nabla\varepsilon_{ic}(e_i)}{2}\Bigr) \\
&\quad + n_1 n_2\|b_i(e_i)\|^{2}\varepsilon_{ia}(e_i)^{T}\varepsilon_{ia}(e_i)
 + n_1 n_2 S_{iaM}^{2}\|b_i(e_i)\|^{2}\bigl\|\tilde W_{ia}\bigr\|^{2} \\
&\quad + n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\|e_{iF}\|^{2}
 + n_1 n_2\bigl(\|b_i(e_i)\|^{2} - \Xi\,\lambda_{\min}(R)\bigr)\|u_{\mathrm{mix}}\|^{2}
\end{aligned}
\tag{72}
$$
If the following inequalities are satisfied:

$$
\begin{aligned}
\lambda_{\min}(Q_e)\|e_{iF}\|_{2}^{2} &\le e_{iF}^{T} Q_e e_{iF} \le \lambda_{\max}(Q_e)\|e_{iF}\|_{2}^{2}, \\
\lambda_{\min}(R)\|u_{\mathrm{mix}}\|_{2}^{2} &\le u_{\mathrm{mix}}^{T} R\, u_{\mathrm{mix}} \le \lambda_{\max}(R)\|u_{\mathrm{mix}}\|_{2}^{2}, \\
\Xi &> \frac{\|b_i(e_i)\|^{2}}{\lambda_{\min}(R)},
\end{aligned}
\tag{73}
$$
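The first two conditions in (73) are the standard Rayleigh-quotient bounds for a symmetric matrix, and the third gives a lower bound for $\Xi$. A minimal numerical check, assuming symmetric $Q_e$ and $R$ and using the hypothetical helper names `rayleigh_bounds_hold` and `xi_lower_bound`, might look as follows.

```python
import numpy as np

def rayleigh_bounds_hold(Q, x, tol=1e-12):
    """Check lambda_min(Q)*||x||^2 <= x^T Q x <= lambda_max(Q)*||x||^2 for symmetric Q."""
    Q = np.asarray(Q, dtype=float)
    x = np.asarray(x, dtype=float)
    eig = np.linalg.eigvalsh(Q)      # eigenvalues in ascending order
    quad = x @ Q @ x
    norm_sq = x @ x
    return bool(eig[0] * norm_sq - tol <= quad <= eig[-1] * norm_sq + tol)

def xi_lower_bound(b, R):
    """Right-hand side of the third condition: ||b||^2 / lambda_min(R)."""
    b_norm_sq = np.linalg.norm(np.atleast_1d(np.asarray(b, dtype=float)), 2) ** 2
    return b_norm_sq / np.linalg.eigvalsh(np.asarray(R, dtype=float))[0]

# Illustrative values only; they are not taken from the simulations.
ok = rayleigh_bounds_hold(np.diag([1.0, 3.0]), [1.0, 1.0])
xi_min = xi_lower_bound([3.0, 4.0], np.diag([2.0, 5.0]))
```

Any $\Xi$ chosen strictly above `xi_min` satisfies the third condition for the given input gain and weighting matrix.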
then $\dot V_{iu}(e_i, u_{\mathrm{mix}})$ can be further transformed as

$$
\begin{aligned}
\dot V_{iu}(e_i, u_{\mathrm{mix}})
&\le -\Bigl(L_{im}^{2} - \frac{n_1}{2} L_{iM}^{2} - \frac{n_1}{4n_2}\bigl\|R^{-1}\bigr\|^{2}\|b_i(e_i)\|^{2}\,\nabla S_{icM}^{2}\Bigr)\bigl\|\tilde W_{ic}\bigr\|^{2} \\
&\quad - \Bigl(n_1 S_{iam}^{2} - \frac{3}{4} n_1 n_2 S_{iaM}^{2} - n_1 n_2 S_{iaM}^{2}\|b_i(e_i)\|^{2}\Bigr)\bigl\|\tilde W_{ia}\bigr\|^{2} \\
&\quad - n_1 n_2\bigl(\|b_i(e_i)\|^{2} + \Xi\,\lambda_{\min}(R)\bigr)\|u_{\mathrm{mix}}\|^{2}
 - n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\|e_{iF}\|^{2} \\
&\le -n_1 n_2\,\Xi\,\lambda_{\min}(Q_e)\|e_{iF}\|^{2}.
\end{aligned}
\tag{74}
$$

Therefore, we can conclude that $\dot V_{iu}(e_i, u_{\mathrm{mix}}) < 0$.
4 Simulations
To verify the validity and convergence of the proposed decentralized reinforcement learning robust optimal tracking control method based on ACI, and to study the convergence of the error by comparing simulation results, two different configurations of the time varying external constrained reconfigurable modular robot are used in this paper, as shown in Figures 3 and 4.

To facilitate the analysis of the configurations above, we transform them into analytic charts, shown in Figures 5 and 6, where $L_1$, $L_2$, and $L_4$ are the lengths of the links and $L_3$ is the distance between the time varying constraint joint and the base module.

The time varying constraint can be defined as a kind of column that rotates with a certain degree of freedom. The constraint equations of configuration A and configuration B are as follows:
$$
\begin{aligned}
\Psi_A(q, t) &= L_1\cos q_1 + L_2\cos q_2 - \bigl[L_3 + L_4\cot\alpha(t)\bigr], \\
\Psi_B(q, t) &= L_1 + L_2\cos q_2 - \bigl[L_3 + L_4\cot\alpha(t)\bigr].
\end{aligned}
\tag{75}
$$
In the equation above, the angle $\alpha(t)$ between the time varying constraint and the $X$-axis can be defined as follows:

$$
\alpha(t) = 0.75\pi + 0.2\sin\frac{t}{2}
\tag{76}
$$
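The constraint functions in (75) and the angle in (76) can be evaluated directly. The following is a minimal sketch that reads (76) as $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$; the function names are hypothetical, and the link lengths in the example are placeholder values, since the paper does not list numerical values for $L_1$ to $L_4$ in this section.

```python
import math

def alpha(t):
    """Constraint angle from (76), read as 0.75*pi + 0.2*sin(t/2)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2)

def psi_a(q1, q2, t, L1, L2, L3, L4):
    """Constraint function of configuration A from (75)."""
    return L1 * math.cos(q1) + L2 * math.cos(q2) - (L3 + L4 / math.tan(alpha(t)))

def psi_b(q2, t, L1, L2, L3, L4):
    """Constraint function of configuration B from (75)."""
    return L1 + L2 * math.cos(q2) - (L3 + L4 / math.tan(alpha(t)))

# Placeholder link lengths (assumed for illustration only).
residual = psi_b(math.pi / 2, 0.0, L1=0.3, L2=0.4, L3=0.2, L4=0.1)
```

With this reading, $\alpha(t)$ stays within $0.75\pi \pm 0.2$ rad, so $\cot\alpha(t)$ remains finite for all $t$.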
The initial joint positions are $q_1(0) = 2$, $q_2(0) = 2$ in configuration A and $q_1(0) = 2$, $q_2(0) = 2$ in configuration B. The initial joint velocities are zero. The dynamic models of configurations A and B are as follows:
$$
\begin{aligned}
M_A(q) &= \begin{bmatrix} 0.36\cos(q_2) + 0.6066 & 0.18\cos(q_2) + 0.1233 \\ 0.18\cos(q_2) + 0.1233 & 0.1233 \end{bmatrix}, \\
M_B(q) &= \begin{bmatrix} 0.17 - 0.1166\cos^{2}(q_2) & -0.06\cos(q_2) \\ -0.06\cos(q_2) & 0.1233 \end{bmatrix}, \\
C_A(q, \dot q) &= \begin{bmatrix} -0.36\sin(q_2)\,\dot q_2 & -0.18\sin(q_2)\,\dot q_2 \\ 0.18\sin(q_2)(\dot q_1 - \dot q_2) & 0.18\sin(q_2)\,\dot q_1 \end{bmatrix}, \\
C_B(q, \dot q) &= \begin{bmatrix} 0.1166\sin(2q_2)\,\dot q_2 & 0.06\sin(q_2)\,\dot q_2 \\ 0.06\sin(q_2)\,\dot q_2 & 0 \end{bmatrix}, \\
G_A(q) &= \begin{bmatrix} -5.88\sin(q_1 + q_2) - 17.64\sin(q_1) \\ -5.88\sin(q_1 + q_2) \end{bmatrix}, \qquad
G_B(q) = \begin{bmatrix} 0 \\ -5.88\cos(q_2) \end{bmatrix}, \\
F_A(q, \dot q) &= \begin{bmatrix} \dot q_1 + 10\sin(3q_1) + 2\operatorname{sgn}(\dot q_1) \\ 12\,\dot q_2 + 5\sin(2q_2) + \operatorname{sgn}(\dot q_2) \end{bmatrix}, \qquad
F_B(q, \dot q) = \begin{bmatrix} 0 \\ 15\,\dot q_2 + \sin(q_2) + 12\operatorname{sgn}(\dot q_2) \end{bmatrix}.
\end{aligned}
\tag{77}
$$
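As a sanity check on the model in (77), the inertia matrix of configuration A should be symmetric and positive definite for every joint angle. The sketch below evaluates $M_A$ and $G_A$ with NumPy. The function names are hypothetical, and the decimal placement of the coefficients (for example 0.36, 0.6066, 5.88, 17.64) is an assumed reading of the printed values.

```python
import numpy as np

def mass_matrix_a(q):
    """Inertia matrix M_A(q) of configuration A, coefficients read from (77)."""
    c2 = np.cos(q[1])
    return np.array([[0.36 * c2 + 0.6066, 0.18 * c2 + 0.1233],
                     [0.18 * c2 + 0.1233, 0.1233]])

def gravity_a(q):
    """Gravity vector G_A(q) of configuration A, coefficients read from (77)."""
    s12 = np.sin(q[0] + q[1])
    return np.array([-5.88 * s12 - 17.64 * np.sin(q[0]), -5.88 * s12])

# Scan q_2 over a full revolution and record the smallest eigenvalue of M_A.
min_eig = min(np.linalg.eigvalsh(mass_matrix_a([0.0, q2]))[0]
              for q2 in np.linspace(-np.pi, np.pi, 61))
```

Under the assumed coefficients, `min_eig` stays positive over the whole scan, which is consistent with a physically valid inertia matrix.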
The desired trajectories of configurations A and B are as follows.

Configuration A:

$$
\begin{aligned}
y_{1d} &= 0.5\cos(t) + 0.2\sin(3t), \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t) - y_{1d}) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{78}
$$
Figure 3 Configuration A for simulation
Figure 4 Configuration B for simulation
Configuration B:

$$
\begin{aligned}
y_{1d} &= 0, \\
y_{2d} &= \Theta(y_{1d}, t) = \arcsin\left[\frac{L_1\sin(\alpha(t)) - L_3\sin(\alpha(t))}{L_2}\right] + \alpha(t).
\end{aligned}
\tag{79}
$$
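The desired trajectories in (78) and (79) share the same map $\Theta$, with $y_{1d} = 0$ in configuration B. A minimal sketch follows, assuming $\alpha(t) = 0.75\pi + 0.2\sin(t/2)$ as the reading of (76). The function names are hypothetical and the link lengths in the example are placeholders, not values from the paper.

```python
import math

def alpha(t):
    """Constraint angle, assumed reading of (76)."""
    return 0.75 * math.pi + 0.2 * math.sin(t / 2)

def theta(y1d, t, L1, L2, L3):
    """Constraint-consistent joint 2 reference, as in (78) and (79)."""
    a = alpha(t)
    arg = (L1 * math.sin(a - y1d) - L3 * math.sin(a)) / L2
    return math.asin(arg) + a   # math.asin raises ValueError if |arg| > 1

def desired_a(t, L1, L2, L3):
    """Desired trajectory (y_1d, y_2d) of configuration A."""
    y1d = 0.5 * math.cos(t) + 0.2 * math.sin(3 * t)
    return y1d, theta(y1d, t, L1, L2, L3)

def desired_b(t, L1, L2, L3):
    """Desired trajectory (y_1d, y_2d) of configuration B, where y_1d = 0."""
    return 0.0, theta(0.0, t, L1, L2, L3)

# Placeholder link lengths (assumed for illustration only).
y1d, y2d = desired_a(0.0, L1=0.3, L2=0.4, L3=0.2)
```

Because $y_{2d}$ is computed from $y_{1d}$ and $\alpha(t)$, only joint 1 has a freely chosen reference; joint 2 follows from the constraint geometry.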
Owing to the dynamic external constraint, the variation of joint 1 in configuration B is zero.

To confirm that the adopted method can be applied to different configurations, and to verify the tracking performance for the desired subsystem trajectory under the ACI and RBF-NN based decentralized reinforcement learning robust optimal tracking control method, comparative simulations are carried out in this part using the classic RBF neural network control method and the ACI based decentralized reinforcement learning robust optimal tracking control method, respectively.
Figures 7 to 14 show the joint tracking and error curves obtained by using the classic RBF neural network [4] to compensate for the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7 to 10 show that the actual output trajectories take about 2 seconds to track the desired trajectories. This is because the classical neural network method requires a longer training process and parameter adaptation process. The error curves in Figures 11 to 14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness and the tracking errors increase.

Figure 5: The analytic chart of configuration A (labels: joint angles $q_1$, $q_2$; lengths $L_1$ to $L_4$; constraint angle $\alpha$; axes $X$, $Y$).

Figure 6: The analytic chart of configuration B (labels: joint angles $q_1$, $q_2$; lengths $L_1$ to $L_4$; constraint angle $\alpha$; axes $X$, $Y$).
Figures 15 to 22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal $Q$-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15 to 18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds under the proposed robust reinforcement learning optimal control method. This is due to the identification ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19 to 22 show that the tracking error is very small and converges to zero in a short time. Moreover, robustness is exhibited when the proposed decentralized control method is adopted for the joint subsystems.
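A tracking time such as the ones quoted for Figures 7 to 10 and Figures 15 to 18 can be read off a sampled error curve as the first instant after which the error stays inside a tolerance band. The helper below is a hypothetical post-processing sketch and not part of the paper's method; the name `settling_time`, the band width, and the exponentially decaying test signal are all assumptions.

```python
import numpy as np

def settling_time(t, err, band):
    """Return the first sample time after which |err| stays within band.

    Returns None if the error is still outside the band at the last sample.
    """
    t = np.asarray(t, dtype=float)
    err = np.abs(np.asarray(err, dtype=float))
    outside = np.nonzero(err > band)[0]   # indices where the error exceeds the band
    if outside.size == 0:
        return float(t[0])                # already inside the band from the start
    last = outside[-1]
    if last == len(t) - 1:
        return None                       # never settles within the recorded window
    return float(t[last + 1])

# Synthetic error curve (assumed for illustration): exp(-5 t) sampled every 0.01 s.
t = np.linspace(0.0, 10.0, 1001)
ts = settling_time(t, np.exp(-5.0 * t), band=0.01)
```

Applying the same band to the error curves of both controllers would give a like-for-like comparison of their tracking times.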
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method can fully satisfy the accessibility requirement of the reconfigurable modular robot. Moreover, singular displacements of the joint subsystems and the end-effector do not appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 8: Trajectory tracking curve of configuration A, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

Parameter   $k$    $\alpha$   $\upsilon$   $\eta_{a1}$   $\eta_{a2}$   $\eta_c$   $\beta_1$   $\beta_2$   $\gamma$
Value       800    300        0.005        10            50            20         0.2         2           0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time varying constrained reconfigurable modular robot. The joint variables track the desired trajectory within a very short time in both configurations, and the fluctuation of the error after convergence is minimal.
Figure 9: Trajectory tracking curve of configuration B, joint 1, with RBF neural network (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 10: Trajectory tracking curve of configuration B, joint 2, with RBF neural network (joint 2 position in rad versus time in s; desired and actual trajectories).

Figure 11: Tracking error curve of configuration A, joint 1, with RBF neural network (joint 1 error in rad versus time in s).

Figure 12: Tracking error curve of configuration A, joint 2, with RBF neural network (joint 2 error in rad versus time in s).

Figure 13: Tracking error curve of configuration B, joint 1, with RBF neural network (joint 1 error in rad versus time in s).

Figure 14: Tracking error curve of configuration B, joint 2, with RBF neural network (joint 2 error in rad versus time in s).

Figure 15: Trajectory tracking curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 16: Trajectory tracking curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
5 Conclusions and Future Work
In this paper, combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time varying constrained reconfigurable modular robots. This theory is used to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with the time varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal $Q$-function is adopted in place of the traditional optimal value function; the optimal $Q$-function and the optimal control policy are approximated by the critic-NN and action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two kinds of Lyapunov functions are designed to confirm the stability of ACI and the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.

In the future, more complex configurations of the time varying external constrained reconfigurable modular robot can be included, and a decentralized controller with higher control and error precision will be considered.

Figure 17: Trajectory tracking curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 position in rad versus time in s; desired and actual trajectories).

Figure 18: Trajectory tracking curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 position in rad versus time in s; desired and actual trajectories).
Figure 19: Tracking error curve of configuration A, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).

Figure 20: Tracking error curve of configuration A, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).

Figure 21: Tracking error curve of configuration B, joint 1, with ACI based reinforcement learning (joint 1 error in rad versus time in s).

Figure 22: Tracking error curve of configuration B, joint 2, with ACI based reinforcement learning (joint 2 error in rad versus time in s).

Figure 23: 3D-tip trajectory curve of configuration A with ACI (desired and actual trajectories).

Figure 24: 3D-tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
10 Mathematical Problems in Engineering
le minus(1198712
119894119898minus1198991
21198712
119894119872)10038171003817100381710038171003817
119894119888
10038171003817100381710038171003817
2
+1
21198991
1205752
ℎ119894
minus (11989911198782
119894119886119898minus3
4119899111989921198782
119894119886119872)10038171003817100381710038171003817
119894119886
10038171003817100381710038171003817
2
+1198991
41198992
10038171003817100381710038171003817119877minus110038171003817100381710038171003817
21003817100381710038171003817119887119894(119890119894)10038171003817100381710038172 10038171003817100381710038171003817nabla119878
2
119894119888119872
10038171003817100381710038171003817
10038171003817100381710038171003817
119894119888
10038171003817100381710038171003817
2
+1198991
21198992
(120576119894119886(119890
119894) + 119877
minus1
119887119894(119890
119894)nabla120576
119894119888(119890
119894)
2)
119879
sdot (120576119894119886(119890
119894) + 119877
minus1
119887119894(119890
119894) nabla120576
119894119888(119890
119894)
2)
+ 11989911198992
1003817100381710038171003817119887119894 (119890119894)10038171003817100381710038172
120576119894119886(119890
119894)119879
120576119894119886(119890
119894)
+ 119899111989921198782
119894119886119872
1003817100381710038171003817119887119894 (119890119894)1003817100381710038171003817210038171003817100381710038171003817
119894119886
10038171003817100381710038171003817
2
+ 11989911198992Ξ120582min (119876119890
)1003817100381710038171003817119890119894119865
10038171003817100381710038172
+ 11989911198992(1003817100381710038171003817119887119894 (119890119894)
10038171003817100381710038172
minus Ξ120582min (119877))1003817100381710038171003817119906mix
10038171003817100381710038172
(72)
If the following inequalities can satisfy
120582min 1198761198901003817100381710038171003817119890119894119865
10038171003817100381710038172
2le 119890
119879
119894119865119876119890119890119894119865le 120582max 119876119890
1003817100381710038171003817119890119894119865
10038171003817100381710038172
2
120582min 1198771003817100381710038171003817119906mix
10038171003817100381710038172
2le 119906
119879
mix119877119906mix le 120582max 1198771003817100381710038171003817119906mix
10038171003817100381710038172
2
Ξ gt
1003817100381710038171003817119887119894(119890119894)10038171003817100381710038172
120582min 119877
(73)
then 119894119906(119890
119894 119906mix) can be further transformed as
119894119906(119890
119894 119906mix)
le minus(1198712
119894119898minus1198991
21198712
119894119872minus
1198991
41198992
10038171003817100381710038171003817119877minus110038171003817100381710038171003817
21003817100381710038171003817119887119894(119890119894)10038171003817100381710038172
nabla1198782
119894119888119872)10038171003817100381710038171003817
119894119888
10038171003817100381710038171003817
2
minus (11989911198782
119894119886119898minus3
4119899111989921198782
119894119886119872minus 119899
111989921198782
119894119886119872
1003817100381710038171003817119887119894(119890119894)10038171003817100381710038172
)10038171003817100381710038171003817
119894119886
10038171003817100381710038171003817
2
minus 11989911198992(1003817100381710038171003817119887119894(119890119894)
10038171003817100381710038172
+ Ξ120582min (119877))1003817100381710038171003817119906mix
10038171003817100381710038172
minus 11989911198992Ξ120582min (119876119890
)1003817100381710038171003817119890119894119865
10038171003817100381710038172
le minus11989911198992Ξ120582min (119876119890
)1003817100381710038171003817119890119894119865
10038171003817100381710038172
(74)
Therefore we can get the conclusion that 119894119906(119890
119894 119906mix) lt 0
4 Simulations
In order to verify the validity and convergence of the pro-posed decentralized reinforcement learning robust optimaltracking control method based on ACI and to study theconvergence of the error by comparing the simulation resultin this paper two different configurations of the time varying
external constrained reconfigurablemodular robot have beenapplied shown in Figures 3 and 4
For the sake of the facilitation of the analysis of theconfigurations above we can transform them into a formof analytic charts which are shown in Figures 5 and 6where 119871
1 119871
2 and 119871
4are the length of the links 119871
3is the
distance between the time varying constraint joint and thebase modular
The time varying constraint can be defined as a kindof column which rotated about with a certain degree offreedom The constraint equations of configuration A andconfiguration B are shown as follows
Ψ119860(119902 119905) = 119871
1cos 119902
1+ 119871
2cos 119902
2minus [119871
3+ 119871
4cot120572 (119905)]
Ψ119861(119902 119905) = 119871
1+ 119871
2cos 119902
2minus [119871
3+ 119871
4cot120572 (119905)]
(75)
In the equation above the angle 120572(119905) between the timevarying constraint and the119883-axis can be defined as follows
120572 (119905) = 075120587 + 02 sin 119905
2 (76)
The initial positions of joint models are 1199021(0) = 2 119902
2(0) =
2 in configurationA and 1199021(0) = 2 119902
2(0) = 2 in configuration
BThe initial velocities of joints are zerosThe dynamicmodelof configurations A and B is designed as follows
119872119860(119902) = [
036 cos (1199022) + 06066 018 cos (119902
2) + 01233
018 cos (1199022) + 01233 01233
]
119872119861(119902) = [
017 minus 01166cos2 (1199022) minus006 cos (119902
2)
minus006 cos (1199022) 01233
]
119862119860(119902 119902) = [
minus036 sin (1199022) 119902
2minus018 sin (119902
2) 119902
2
018 sin (1199022) ( 119902
1minus 119902
2) 018 sin (119902
2) 119902
1
]
119862119861(119902 119902) = [
01166 sin (21199022) 119902
2006 sin (119902
2) 119902
2
006 sin (1199022) 119902
20
]
119866119860(119902) = [
minus588 sin (1199021+ 119902
2) minus 1764 sin (119902
1)
minus588 sin (1199021+ 119902
2)
]
119866119861(119902) = [
0
minus588 cos (1199022)]
119865119860(119902 119902) = [
1199021+ 10 sin (3119902
1) + 2 sgn ( 119902
1)
12 1199022+ 5 sin (2119902
2) + sgn ( 119902
2)]
119865119861(119902 119902) = [
0
15 1199022+ sin (119902
2) + 12 sgn ( 119902
2)]
(77)
The desired trajectory of configurations A and B is shown asConfiguration A
1199101119889
= 05 cos (119905) + 02 sin (3119905)
1199102119889
= Θ (1199101119889 119905)
= arcsin[1198711sin (120572 (119905) minus 119910
1119889) minus 119871
3sin (120572 (119905))
1198712
] + 120572 (119905)
(78)
Mathematical Problems in Engineering 11
Figure 3 Configuration A for simulation
Figure 4 Configuration B for simulation
Configuration B
1199101119889
= 0
1199102119889
= Θ (1199101119889 119905)
= arcsin [1198711sin (120572 (119905)) minus 119871
3sin (120572 (119905))
1198712
] + 120572 (119905)
(79)
Due to the limit of dynamic external constraints thevariety of joint 1 in configuration B is zero
In order to confirm that the adopted method can beapplied in different configurations and to verify the trackingperformance for the subsystem desired trajectory by usingthe ACI and RBF-NN based decentralized reinforcementlearning robust optimal tracking control method in this partthe comparative simulations which include the classic RBFneural network control method and ACI based decentral-ized reinforcement learning robust optimal tracking controlmethod have been adopted respectively
Figures 7–14 show the joint tracking and error curves obtained with the classic RBF neural network [4], which is used to compensate the effect of the dynamic nonlinear term and the interconnection term in the subsystem. Figures 7–10 show that the actual output trajectories take about 2 seconds to track the desired trajectories, because the classical neural network method requires a longer training and parameter adaptation process. The error curves in Figures 11–14 show that the joint subsystem constraints cannot be well compensated by the classical neural control method for the reconfigurable modular robot when time-varying constraints exist. When the joint output variables become larger, the reconfigurable modular system does not exhibit good robustness and the tracking errors increase.
Figure 5: The analytic chart of configuration A (labels: links L1 to L4, joint angles q1 and q2, angle α, axes X and Y).
Figure 6: The analytic chart of configuration B (labels: links L1 to L4, joint angles q1 and q2, angle α, axes X and Y).
Figures 15–22 show the joint tracking and error curves obtained by adopting ACI to identify the optimal Q-function, the optimal control policy, and the global dynamic nonlinear terms of the HJB equation in the subsystem. Figures 15–18 show that the actual output trajectories of the subsystems track the desired trajectories within 0.5 seconds when the proposed robust reinforcement learning optimal control method is used. This is due to the identification ability of ACI, which can identify the uncertainties contained in the subsystems in a short time. The error curves in Figures 19–22 show that the tracking error is very small and converges to zero in a short time. In addition, the robustness of the joint subsystems is evident when the proposed decentralized control method is adopted.
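The convergence times quoted in this section (about 2 seconds for the RBF network and about 0.5 seconds for ACI) can be read off an error curve with a simple settling-time rule. The sketch below is one possible way to do this, not the procedure used by the authors: the function name `settling_time`, the tolerance, and the exponentially decaying error signals are all illustrative assumptions rather than simulation data from the article.

```python
import math


def settling_time(times, errors, tol):
    """Return the earliest sample time after which |error| stays within tol.

    Returns None if the error is still outside the tolerance at the last sample.
    """
    settle = None
    for t, e in zip(times, errors):
        if abs(e) > tol:
            settle = None      # left the band: discard any earlier candidate
        elif settle is None:
            settle = t         # entered the band: record the entry time
    return settle


if __name__ == "__main__":
    # Synthetic error signals for illustration only (not the article's data).
    times = [0.01 * i for i in range(1001)]
    slow = [0.08 * math.exp(-t / 0.5) for t in times]
    fast = [0.04 * math.exp(-t / 0.1) for t in times]
    print("slow signal settles at", settling_time(times, slow, tol=0.002))
    print("fast signal settles at", settling_time(times, fast, tol=0.002))
```

Resetting the candidate whenever the error leaves the band means a brief early crossing of zero is not mistaken for convergence, which matters for oscillating tracking errors.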
Figures 23 and 24 show the 3D tip trajectory curves obtained with the ACI algorithm. These two figures show that the proposed decentralized control method satisfies the accessibility requirement of the reconfigurable modular robot. In addition, no singular displacements of the joint subsystems or the end-effector appear. The parameters defined in ACI are shown in Table 1.
Figure 7: Trajectory tracking curve of configuration A joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 8: Trajectory tracking curve of configuration A joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.

| $k$ | $\alpha$ | $\upsilon$ | $\eta_{a1}$ | $\eta_{a2}$ | $\eta_c$ | $\beta_1$ | $\beta_2$ | $\gamma$ |
|---|---|---|---|---|---|---|---|---|
| 800 | 300 | 0.005 | 10 | 50 | 20 | 0.2 | 2 | 0.5 |
The simulation results show that, in contrast to the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time-varying constrained reconfigurable modular robot. The joint variables track the desired trajectory within a very short time in both configurations, and the fluctuation of the error within its convergence range is minimal.
Figure 9: Trajectory tracking curve of configuration B joint 1 with RBF neural network (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 10: Trajectory tracking curve of configuration B joint 2 with RBF neural network (joint 2 position (rad) versus time (s); desired and actual trajectories).
Figure 11: Tracking error curve of configuration A joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 12: Tracking error curve of configuration A joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 13: Tracking error curve of configuration B joint 1 with RBF neural network (joint 1 error (rad) versus time (s)).
Figure 14: Tracking error curve of configuration B joint 2 with RBF neural network (joint 2 error (rad) versus time (s)).
Figure 15: Trajectory tracking curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 16: Trajectory tracking curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
5 Conclusions and Future Work
In this paper, by combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time-varying constrained reconfigurable modular robots. The theory is used to solve the continuous-time nonlinear optimal control policy problem for strongly coupled uncertain robotic systems. First, we build the subsystem model with time-varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Second, ACI is used to estimate the HJB equation and the global uncertainty: a continuous-time optimal Q-function replaces the traditional optimal value function, the optimal Q-function and the optimal control policy are approximated by the critic-NN and the action-NN, and the global uncertainty is identified by the identifier. Third, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two Lyapunov functions are designed to confirm the stability of ACI and of the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.
In the future, more complex configurations of time-varying externally constrained reconfigurable modular robots can be included, and decentralized controllers with higher control and error precision will be considered.
Figure 17: Trajectory tracking curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 position (rad) versus time (s); desired and actual trajectories).
Figure 18: Trajectory tracking curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 position (rad) versus time (s); desired and actual trajectories).
Figure 19: Tracking error curve of configuration A joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Figure 20: Tracking error curve of configuration A joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s)).
Figure 21: Tracking error curve of configuration B joint 1 with ACI based reinforcement learning (joint 1 error (rad) versus time (s)).
Figure 22: Tracking error curve of configuration B joint 2 with ACI based reinforcement learning (joint 2 error (rad) versus time (s)).
Figure 23: 3D-tip trajectory curve of configuration A with ACI (desired and actual trajectories).
Figure 24: 3D-tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent $H_\infty$ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 11
Figure 3 Configuration A for simulation
Figure 4 Configuration B for simulation
Configuration B
1199101119889
= 0
1199102119889
= Θ (1199101119889 119905)
= arcsin [1198711sin (120572 (119905)) minus 119871
3sin (120572 (119905))
1198712
] + 120572 (119905)
(79)
Due to the limit of dynamic external constraints thevariety of joint 1 in configuration B is zero
In order to confirm that the adopted method can beapplied in different configurations and to verify the trackingperformance for the subsystem desired trajectory by usingthe ACI and RBF-NN based decentralized reinforcementlearning robust optimal tracking control method in this partthe comparative simulations which include the classic RBFneural network control method and ACI based decentral-ized reinforcement learning robust optimal tracking controlmethod have been adopted respectively
From Figures 7 8 9 10 11 12 13 and 14 the joint trackingand error curves are shown by using classic RBF neural net-work [4] to compensate the effect of the dynamic nonlinearterm and the interconnection term in the subsystem Figures7 to 10 show that the actual output trajectories take about2 seconds to track the desired trajectories This is due tothe fact that the classical neural network method requires alonger training process and parameter adaptive process Theerror curves in Figures 11ndash14 show that the joint subsystem
q1L2
L3
L4
L1
Y
X
120572
q2
Figure 5 The analytic chart of configuration A
q2
L4
L2
L1
L3
Y
120572
X
q1
Figure 6 The analytic chart of configuration B
constraints cannot be well compensated by adopting the clas-sical neural control method for the reconfigurable modularrobot when the time varying constrains exist When thejoint output variables turn larger the reconfigurable modularsystem cannot exhibit a good robustness and the trackingerrors are larger than before
Figures 15 16 17 18 19 20 21 and 22 showed thejoint tracking and error curves by adopting ACI to identifythe optimal 119876-function optimal control policy and globaldynamic nonlinear terms of the HJB equation in the sub-system Figures 15 to 18 show that the actual output trajec-tories of the subsystems can track the desired trajectoriesin 05 seconds by using the proposed robust reinforcementlearning optimal control method This is due to the excellentidentifying ability of ACIwhich can identify the uncertaintiescontained in the subsystems in a short timeThe error curvesin Figures 19ndash22 show that the tracking error is very smalland it can converge to zero in a short time Besides whenthe proposed decentralized control method is adopted for thejoint subsystems the robustness is manifested
Figures 23 and 24 showed the 3D-tip trajectory curvesby using ACI algorithm These two figures show that theproposed decentralized control method can fully satisfythe accessibility requirement of the reconfigurable modularrobot Besides the singular displacements of the joint subsys-tems and the end-effector would not appear The parametersdefined in ACI are shown in Table 1
12 Mathematical Problems in Engineering
0 1 2 3 4 5 6 7 8 9 10
0
05
1
15
2
25
Time (s)
Join
t 1 p
ositi
on (r
ad)
Desired trajectoryActual trajectory
minus1
minus05
Figure 7 Trajectory tracking curve of configuration A joint 1 withRBF neural network
0 1 2 3 4 5 6 7 8 9 1008
1
12
14
16
18
2
22
24
26
Time (s)
Join
t 2 p
ositi
on (r
ad)
Desired trajectoryActual trajectory
Figure 8 Trajectory tracking curve of configuration A joint 2 withRBF neural network
Table 1 Parameter list of action-critic-identifier
119896 120572 120592 1205781198861
1205781198862
120578119888
1205731
1205732
120574
800 300 0005 10 50 20 02 2 05
The simulation results show that compared with thesituation of using classic RBF neural network controllerthe decentralized robust optimal tracking control methodbased on ACI and reinforcement learning can be appliedinto different configurations of the time varying constrainedreconfigurable modular robot And the joint variables cantrack the desired trajectory within a very short time indifferent configurations and the fluctuation of the errorconvergence range is minimal
0 1 2 3 4 5 6 7 8 9 10
0
05
1
15
2
Time (s)
Join
t 1 p
ositi
on (r
ad)
minus2
minus15
minus05
minus1
Desired trajectoryActual trajectory
Figure 9 Trajectory tracking curve of configuration B joint 1 withRBF neural network
0 1 2 3 4 5 6 7 8 9 1017
18
19
2
21
22
23
24
Time (s)
Join
t 2 p
ositi
on (r
ad)
Desired trajectoryActual trajectory
Figure 10 Trajectory tracking curve of configuration B joint 2 withRBF neural network
0 1 2 3 4 5 6 7 8 9 10
0
002
004
006
008
01
Time (s)
Join
t 1 er
ror (
rad)
minus01
minus008
minus006
minus004
minus002
Figure 11 Tracking error curve of configuration A joint 1 with RBFneural network
Mathematical Problems in Engineering 13
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
002
004
006
008
01
Join
t 2 er
ror (
rad)
minus01
minus008
minus006
minus004
minus002
Figure 12 Tracking error curve of configuration A joint 2 with RBFneural network
0 1 2 3 4 5 6 7 8 9 10
0
001
002
003
004
005
Time (s)
Join
t 1 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 13 Tracking error curve of configuration B joint 1 with RBFneural network
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
001
002
003
004
005
Join
t 2 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 14 Tracking error curve of configuration B joint 2 with RBFneural network
0 1 2 3 4 5 6 7 8 9 10
0
05
1
15
2
Time (s)
Join
t 1 p
ositi
on (r
ad)
minus1
minus05
Desired trajectoryActual trajectory
Figure 15 Trajectory tracking curve of configuration A joint 1 withACI based reinforcement learning
0 1 2 3 4 5 6 7 8 9 1008
1
12
14
16
18
2
22
24
26
Time (s)
Join
t 2 p
ositi
on (r
ad)
Desired trajectoryActual trajectory
Figure 16 Trajectory tracking curve of configuration A joint 2 withACI based reinforcement learning
5 Conclusions and Future Work
In this paper combining ACI with RBF neural network anovel decentralized reinforcement learning robust optimaltracking control theory has been proposed for time varyingconstrained reconfigurable modular robots Moreover thistheory is used to solve the problem of the continuoustime nonlinear optimal control policy for strongly coupleduncertainty robotic system Firstly we build the model ofsubsystem with the time varying external constraints anddescribed the global robot system as a synthesis of inter-connected subsystem Secondly ACI is used to estimate theHJB equation and the global uncertainty where a continuous-time optimal 119876-function is adopted to take the place oftraditional optimal value function the optimal 119876-function
14 Mathematical Problems in Engineering
0 1 2 3 4 5 6 7 8 9 10
0
05
1
15
2
Time (s)
Join
t 1 p
ositi
on (r
ad)
minus15
minus05
minus2
minus1
Desired trajectoryActual trajectory
Figure 17 Trajectory tracking curve of configuration B joint 1 withACI based reinforcement learning
0 1 2 3 4 5 6 7 8 9 1017
18
19
2
21
22
23
24
Time (s)
Join
t 2 p
ositi
on (r
ad)
Desired trajectoryActual trajectory
Figure 18 Trajectory tracking curve of configuration B joint 2 withACI based reinforcement learning
and the optimal control police are approximated by critic-NNand action-NN and the global uncertainty is identified bythe identifier Thirdly we design a novel decentralized robustoptimal tracking controller so the desired trajectory can betracked and the tracking error could converge to zero infinite timeOn this basis two kinds of Lyapunov functions aredesigned to confirm the stability of ACI and the subsystemFinally in order to confirm the superiority of the proposedcontrol theory the comparative simulation examples havebeen presented combining two different configurations of therobot
In the future more complex configuration for timevarying external constrained reconfigurable modular robotcan be included Therefore the decentralized controller withhigher control and error precision will be considered
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
001
002
003
004
005
Join
t 1 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 19 Tracking error curve of configuration A joint 1 with ACIbased reinforcement learning
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
001
002
003
004
005Jo
int 2
erro
r (ra
d)
minus005
minus004
minus003
minus002
minus001
Figure 20 Tracking error curve of configuration A joint 2 with ACIbased reinforcement learning
0
0
1 2 3 4 5 6 7 8 9 10Time (s)
001
002
003
004
005
Join
t 1 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 21 Tracking error curve of configuration B joint 1 with ACIbased reinforcement learning
Mathematical Problems in Engineering 15
0 1 2 3 4 5 6 7 8 9 10Time (s)
0
001
002
003
004
005
Join
t 2 er
ror (
rad)
minus005
minus004
minus003
minus002
minus001
Figure 22 Tracking error curve of configuration B joint 2 with ACIbased reinforcement learning
005
1
02 03 04 05 06 07
0
01
02
03
minus1
minus05minus02
minus01
xla
bel
y labelz label
Desired trajectoryActual trajectory
Figure 23 3D-tip trajectory curve of configuration A with ACI
005
1
035 036 037 038 039 04
006008
01012014016018
minus1
minus05
xla
bel
y labelz label
Desired trajectoryActual trajectory
Figure 24 3D-tip trajectory curve of configuration B with ACI
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work is financially supported by the National NaturalScience Foundation of China (Grant nos 61374051 and60974010) the Scientific Technological Development PlanProject in Jilin Province of China (Grant no 20110705) Thefirst author is funded by China Scholarship Council
References
[1] Y Li and B Dong ldquoDecentralized ADRC control for reconfig-urable manipulators based on VGSTA-ESO of sliding moderdquoINFORMATION vol 15 no 6 pp 2453ndash2466 2012
[2] Y LiM-C Zhu andY-C Li ldquoSimulation on robust neurofuzzycompensator for reconfigurable manipulator motion controlrdquoJournal of System Simulation vol 19 no 22 pp 5169ndash5174 2007
[3] M-C Zhu and Y-C Li ldquoDecentralized adaptive sliding modecontrol for reconfigurable manipulators using fuzzy logicrdquoJournal of Jilin University (Engineering and Technology Edition)vol 39 no 1 pp 170ndash176 2009
[4] L Zhu and Y Li ldquoDecentralized adaptive neural networkcontrol for reconfigurable manipulatorsrdquo in Proceedings of theChinese Control and Decision Conference (CCDC rsquo10) pp 1760ndash1765 Xuzhou China May 2010
[5] M Zhu Y Li and Y Li ldquoA new distributed control schemeof modular and reconfigurable robotsrdquo in Proceedings of theIEEE International Conference onMechatronics and Automation(ICMA rsquo07) pp 2622ndash2627 Harbin China August 2007
[6] M C Zhu Y Li and Y C Li ldquoObserver-based decentralizedadaptive fuzzy control for reconfigurable manipulatorrdquo Controland Decision vol 24 no 3 pp 429ndash434 2009
[7] S Richard and A G Barto Reinforcement Learning An Intro-duction The MIT Press London UK 1998
[8] F L Lewis and D Liu ReinForcement Learning and Approxi-mate Dynamic Programming For Feedback Control Wiley-IEEEPress New York NY USA 2012
[9] Y-K Xu and X-R Cao ldquoLebesgue-sampling-based optimalcontrol problems with time aggregationrdquo IEEE Transactions onAutomatic Control vol 56 no 5 pp 1097ndash1109 2011
[10] F L Lewis andD Vrabie ldquoReinforcement learning and adaptivedynamic programming for feedback controlrdquo IEEE Circuits andSystems Magazine vol 9 no 3 pp 32ndash50 2009
[11] X Xu H-G He and D Hu ldquoEfficient reinforcement learningusing recursive least-squares methodsrdquo Journal of ArtificialIntelligence Research vol 16 pp 259ndash292 2002
[12] H Zhang Q Wei and Y Luo ldquoA novel infinite-time optimaltracking control scheme for a class of discrete-time nonlinearsystems via the greedy HDP iteration algorithmrdquo IEEE Trans-actions on Systems Man and Cybernetics B vol 38 no 4 pp937ndash942 2008
[13] H Zhang R Song Q Wei and T Zhang ldquoOptimal trackingcontrol for a class of nonlinear discrete-time systems withtime delays based on heuristic dynamic programmingrdquo IEEETransactions on Neural Networks vol 22 no 12 pp 1851ndash18622011
16 Mathematical Problems in Engineering
[14] H Zhang X Xie and S Tong ldquoHomogenous polynomiallyparameter-dependent 119867
infinfilter designs of discrete-time fuzzy
systemsrdquo IEEE Transactions on Systems Man and CyberneticsB vol 41 no 5 pp 1313ndash1322 2011
[15] H Zhang L Cui X Zhang and Y Luo ldquoData-drivenrobust approximate optimal tracking control for unknown gen-eral nonlinear systems using adaptive dynamic programmingmethodrdquo IEEE Transactions on Neural Networks vol 22 no 12pp 2226ndash2236 2011
[16] J Zhang H Zhang Y Luo and H Liang ldquoOptimal controldesign for nonlinear systems adaptive dynamic programmingbased on fuzzy critic estimatorrdquo in The International JointConference on Neural Network pp 1ndash6 Brisbane Australia2012
[17] H Zhang D Gong B Chen and Z Liu ldquoSynchronization forcoupled neural networkswith interval delay a novel augmentedlyapunov-krasovskii functional methodrdquo IEEE Transactions onNeural Networks and Learning Systems vol 24 no 1 pp 58ndash702013
[18] S Bhasin K Dupree P M Patre and W E Dixon ldquoNeuralnetwork control of a robot interacting with an uncertain vis-coelastic environmentrdquo IEEE Transactions on Control SystemsTechnology vol 19 no 4 pp 947ndash955 2011
[19] S G Khan G Herrmann F Lewis T Pipe and C MelhuishldquoA Q-learning based Cartesian model reference compliancecontroller implementation for a humanoid robot armrdquo inProceedings of the IEEE 5th International Conference onRoboticsAutomation andMechatronics (RAM rsquo11) pp 214ndash219 QingdaoChina September 2011
[20] P K Patchaikani L Behera and G Prasad ldquoA single networkadaptive critic-based redundancy resolution scheme for robotmanipulatorsrdquo IEEE Transactions on Industrial Electronics vol59 no 8 pp 3241ndash3253 2012
[21] C J C H Watkins and P Dayan ldquoQ-learningrdquo MachineLearning vol 8 no 3-4 pp 279ndash292 1992
[22] C J CHWatkinsLearning fromdelayed rewards [PhD thesis]University of Cambridge Cambridge UK 1989
[23] F L Lewis and V L Syrmos Optimal Control John Wiley ampSons New York NY USA 1995
[24] M Sassano and A Astolfi ldquoDynamic approximate solutionsof the HJ inequality and of the HJB equation for input-affinenonlinear systemsrdquo IEEE Transactions on Automatic Controlvol 57 no 10 pp 2490ndash2503 2012
[25] Y X Wu and C Wang ldquoDeterministic learning based adaptivenetwork control of robot in task spacerdquoActa Automatica Sinicavol 39 no 1 pp 1ndash10 2013
[26] P M Patre W MacKunis K Kaiser and W E DixonldquoAsymptotic tracking for uncertain dynamic systems via amultilayer neural network feedforward and RISE feedbackcontrol structurerdquo IEEE Transactions on Automatic Control vol53 no 9 pp 2180ndash2185 2008
[27] B E Paden and S S Sastry ldquoA calculus for computing Filippovrsquosdifferential inclusion with application to the variable structurecontrol of robot manipulatorsrdquo IEEE Transactions on Circuitsand Systems vol 34 no 1 pp 73ndash82 1987
Figure 7: Trajectory tracking curve of configuration A, joint 1, with RBF neural network (joint 1 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
Figure 8: Trajectory tracking curve of configuration A, joint 2, with RBF neural network (joint 2 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
Table 1: Parameter list of action-critic-identifier.
Parameter:  k     α     υ      η_a1  η_a2  η_c  β_1  β_2  γ
Value:      800   300   0.005  10    50    20   0.2  2    0.5
The simulation results show that, compared with the classic RBF neural network controller, the decentralized robust optimal tracking control method based on ACI and reinforcement learning can be applied to different configurations of the time-varying constrained reconfigurable modular robot. In each configuration the joint variables track the desired trajectory within a very short time, and the fluctuation of the converged tracking error is small.
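The comparison between the two controllers is stated qualitatively. As a minimal sketch of how it could be quantified, the following Python snippet computes the peak and RMS tracking error after an initial transient. The function name `tracking_metrics`, the `settle_after` threshold, and the synthetic trajectories are illustrative assumptions and are not taken from the simulations reported here.

```python
import numpy as np


def tracking_metrics(t, q_desired, q_actual, settle_after=1.0):
    """Peak and RMS tracking error for samples with t >= settle_after."""
    error = q_desired - q_actual
    steady = error[t >= settle_after]
    return {
        "peak_abs_error": float(np.max(np.abs(steady))),
        "rms_error": float(np.sqrt(np.mean(steady ** 2))),
    }


# Hypothetical data: a sinusoidal reference and a response whose
# error decays exponentially (not the trajectories of the paper).
t = np.linspace(0.0, 10.0, 1001)
q_d = np.sin(t)
q_a = q_d - 0.5 * np.exp(-5.0 * t)

m = tracking_metrics(t, q_d, q_a)
print(m)
```

Applying the same function to the logged joint trajectories of both controllers would give directly comparable numbers for each joint and configuration.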
Figure 9: Trajectory tracking curve of configuration B, joint 1, with RBF neural network (joint 1 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
Figure 10: Trajectory tracking curve of configuration B, joint 2, with RBF neural network (joint 2 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
Figure 11: Tracking error curve of configuration A, joint 1, with RBF neural network (joint 1 error in rad versus time in s; vertical axis from −0.1 to 0.1 rad).
Figure 12: Tracking error curve of configuration A, joint 2, with RBF neural network (joint 2 error in rad versus time in s; vertical axis from −0.1 to 0.1 rad).
Figure 13: Tracking error curve of configuration B, joint 1, with RBF neural network (joint 1 error in rad versus time in s; vertical axis from −0.05 to 0.05 rad).
Figure 14: Tracking error curve of configuration B, joint 2, with RBF neural network (joint 2 error in rad versus time in s; vertical axis from −0.05 to 0.05 rad).
Figure 15: Trajectory tracking curve of configuration A, joint 1, with ACI-based reinforcement learning (joint 1 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
Figure 16: Trajectory tracking curve of configuration A, joint 2, with ACI-based reinforcement learning (joint 2 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
5 Conclusions and Future Work
In this paper, combining ACI with an RBF neural network, a novel decentralized reinforcement learning robust optimal tracking control theory has been proposed for time-varying constrained reconfigurable modular robots. Moreover, this theory is used to solve the continuous-time nonlinear optimal control problem for strongly coupled uncertain robotic systems. Firstly, we build the model of the subsystem with time-varying external constraints and describe the global robot system as a synthesis of interconnected subsystems. Secondly, ACI is used to estimate the HJB equation and the global uncertainty, where a continuous-time optimal Q-function is adopted in place of the traditional optimal value function; the optimal Q-function and the optimal control policy are approximated by the critic-NN and the action-NN, and the global uncertainty is identified by the identifier. Thirdly, we design a novel decentralized robust optimal tracking controller so that the desired trajectory can be tracked and the tracking error converges to zero in finite time. On this basis, two kinds of Lyapunov functions are designed to confirm the stability of the ACI and of the subsystem. Finally, to confirm the superiority of the proposed control theory, comparative simulation examples have been presented for two different configurations of the robot.
Figure 17: Trajectory tracking curve of configuration B, joint 1, with ACI-based reinforcement learning (joint 1 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
Figure 18: Trajectory tracking curve of configuration B, joint 2, with ACI-based reinforcement learning (joint 2 position in rad versus time in s, 0 to 10 s; desired and actual trajectories).
In the future, more complex configurations of time-varying externally constrained reconfigurable modular robots can be included, and decentralized controllers with higher control and error precision will be considered.
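The critic approximation summarized in these conclusions rests on a network that is linear in its weights. As a minimal sketch, assuming Gaussian radial basis functions and a normalized-gradient weight update (a generic stand-in, not the update laws derived in this paper), the snippet below fits such a network to a hypothetical quadratic cost over a tracking error e and a control input u. The names `rbf_features` and `fit_rbf_critic`, the grid of centers, the width, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_features(z, centers, width):
    """Gaussian RBF activations of the input z, one per center."""
    d2 = np.sum((centers - z) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_rbf_critic(samples, targets, centers, width, lr=0.5, epochs=50):
    """Fit weights w so that w @ phi(z) approximates the targets.

    Uses a normalized-gradient (NLMS-style) update per sample.
    """
    w = np.zeros(len(centers))
    for _ in range(epochs):
        for z, y in zip(samples, targets):
            phi = rbf_features(z, centers, width)
            err = y - w @ phi
            w += lr * err * phi / (1e-8 + phi @ phi)
    return w


# Hypothetical setup: inputs z = (e, u) in [-1, 1]^2 and a quadratic
# target e^2 + 0.5 u^2 standing in for a Q-like cost.
grid = np.linspace(-1.0, 1.0, 9)
centers = np.array([[e, u] for e in grid for u in grid])
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(400, 2))
targets = samples[:, 0] ** 2 + 0.5 * samples[:, 1] ** 2

w = fit_rbf_critic(samples, targets, centers, width=0.3)
pred = np.array([w @ rbf_features(z, centers, 0.3) for z in samples])
rmse = float(np.sqrt(np.mean((pred - targets) ** 2)))
print("training RMSE:", rmse)
```

The weights are fitted here to a known target only to show the structure of the approximator; in an actor-critic scheme the target is not available in closed form and the residual driving the update comes from the learning rule itself.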
Figure 19: Tracking error curve of configuration A, joint 1, with ACI-based reinforcement learning (joint 1 error in rad versus time in s; vertical axis from −0.05 to 0.05 rad).
Figure 20: Tracking error curve of configuration A, joint 2, with ACI-based reinforcement learning (joint 2 error in rad versus time in s; vertical axis from −0.05 to 0.05 rad).
Figure 21: Tracking error curve of configuration B, joint 1, with ACI-based reinforcement learning (joint 1 error in rad versus time in s; vertical axis from −0.05 to 0.05 rad).
Figure 22: Tracking error curve of configuration B, joint 2, with ACI-based reinforcement learning (joint 2 error in rad versus time in s; vertical axis from −0.05 to 0.05 rad).
Figure 23: 3D tip trajectory curve of configuration A with ACI (desired and actual trajectories).
Figure 24: 3D tip trajectory curve of configuration B with ACI (desired and actual trajectories).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y. Li and B. Dong, "Decentralized ADRC control for reconfigurable manipulators based on VGSTA-ESO of sliding mode," INFORMATION, vol. 15, no. 6, pp. 2453–2466, 2012.
[2] Y. Li, M.-C. Zhu, and Y.-C. Li, "Simulation on robust neurofuzzy compensator for reconfigurable manipulator motion control," Journal of System Simulation, vol. 19, no. 22, pp. 5169–5174, 2007.
[3] M.-C. Zhu and Y.-C. Li, "Decentralized adaptive sliding mode control for reconfigurable manipulators using fuzzy logic," Journal of Jilin University (Engineering and Technology Edition), vol. 39, no. 1, pp. 170–176, 2009.
[4] L. Zhu and Y. Li, "Decentralized adaptive neural network control for reconfigurable manipulators," in Proceedings of the Chinese Control and Decision Conference (CCDC '10), pp. 1760–1765, Xuzhou, China, May 2010.
[5] M. Zhu, Y. Li, and Y. Li, "A new distributed control scheme of modular and reconfigurable robots," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '07), pp. 2622–2627, Harbin, China, August 2007.
[6] M. C. Zhu, Y. Li, and Y. C. Li, "Observer-based decentralized adaptive fuzzy control for reconfigurable manipulator," Control and Decision, vol. 24, no. 3, pp. 429–434, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, London, UK, 1998.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, New York, NY, USA, 2012.
[9] Y.-K. Xu and X.-R. Cao, "Lebesgue-sampling-based optimal control problems with time aggregation," IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 1097–1109, 2011.
[10] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[11] X. Xu, H.-G. He, and D. Hu, "Efficient reinforcement learning using recursive least-squares methods," Journal of Artificial Intelligence Research, vol. 16, pp. 259–292, 2002.
[12] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 4, pp. 937–942, 2008.
[13] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1851–1862, 2011.
[14] H. Zhang, X. Xie, and S. Tong, "Homogenous polynomially parameter-dependent H∞ filter designs of discrete-time fuzzy systems," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 41, no. 5, pp. 1313–1322, 2011.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[16] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Optimal control design for nonlinear systems: adaptive dynamic programming based on fuzzy critic estimator," in The International Joint Conference on Neural Network, pp. 1–6, Brisbane, Australia, 2012.
[17] H. Zhang, D. Gong, B. Chen, and Z. Liu, "Synchronization for coupled neural networks with interval delay: a novel augmented Lyapunov-Krasovskii functional method," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 58–70, 2013.
[18] S. Bhasin, K. Dupree, P. M. Patre, and W. E. Dixon, "Neural network control of a robot interacting with an uncertain viscoelastic environment," IEEE Transactions on Control Systems Technology, vol. 19, no. 4, pp. 947–955, 2011.
[19] S. G. Khan, G. Herrmann, F. Lewis, T. Pipe, and C. Melhuish, "A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm," in Proceedings of the IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM '11), pp. 214–219, Qingdao, China, September 2011.
[20] P. K. Patchaikani, L. Behera, and G. Prasad, "A single network adaptive critic-based redundancy resolution scheme for robot manipulators," IEEE Transactions on Industrial Electronics, vol. 59, no. 8, pp. 3241–3253, 2012.
[21] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[22] C. J. C. H. Watkins, Learning from delayed rewards [Ph.D. thesis], University of Cambridge, Cambridge, UK, 1989.
[23] F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley & Sons, New York, NY, USA, 1995.
[24] M. Sassano and A. Astolfi, "Dynamic approximate solutions of the HJ inequality and of the HJB equation for input-affine nonlinear systems," IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2490–2503, 2012.
[25] Y. X. Wu and C. Wang, "Deterministic learning based adaptive network control of robot in task space," Acta Automatica Sinica, vol. 39, no. 1, pp. 1–10, 2013.
[26] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[27] B. E. Paden and S. S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 73–82, 1987.
[21] C J C H Watkins and P Dayan ldquoQ-learningrdquo MachineLearning vol 8 no 3-4 pp 279ndash292 1992
[22] C J CHWatkinsLearning fromdelayed rewards [PhD thesis]University of Cambridge Cambridge UK 1989
[23] F L Lewis and V L Syrmos Optimal Control John Wiley ampSons New York NY USA 1995
[24] M Sassano and A Astolfi ldquoDynamic approximate solutionsof the HJ inequality and of the HJB equation for input-affinenonlinear systemsrdquo IEEE Transactions on Automatic Controlvol 57 no 10 pp 2490ndash2503 2012
[25] Y X Wu and C Wang ldquoDeterministic learning based adaptivenetwork control of robot in task spacerdquoActa Automatica Sinicavol 39 no 1 pp 1ndash10 2013
[26] P M Patre W MacKunis K Kaiser and W E DixonldquoAsymptotic tracking for uncertain dynamic systems via amultilayer neural network feedforward and RISE feedbackcontrol structurerdquo IEEE Transactions on Automatic Control vol53 no 9 pp 2180ndash2185 2008
[27] B E Paden and S S Sastry ldquoA calculus for computing Filippovrsquosdifferential inclusion with application to the variable structurecontrol of robot manipulatorsrdquo IEEE Transactions on Circuitsand Systems vol 34 no 1 pp 73ndash82 1987
Figure 22: Tracking error curve of configuration B joint 2 with ACI-based reinforcement learning (horizontal axis: time in s, 0 to 10; vertical axis: joint 2 error in rad, -0.05 to 0.05).
Figure 23: 3D tip trajectory curve of configuration A with ACI (desired trajectory and actual trajectory).
Figure 24: 3D tip trajectory curve of configuration B with ACI (desired trajectory and actual trajectory).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (Grant nos. 61374051 and 60974010) and the Scientific Technological Development Plan Project in Jilin Province of China (Grant no. 20110705). The first author is funded by the China Scholarship Council.
References
[1] Y Li and B Dong ldquoDecentralized ADRC control for reconfig-urable manipulators based on VGSTA-ESO of sliding moderdquoINFORMATION vol 15 no 6 pp 2453ndash2466 2012
[2] Y LiM-C Zhu andY-C Li ldquoSimulation on robust neurofuzzycompensator for reconfigurable manipulator motion controlrdquoJournal of System Simulation vol 19 no 22 pp 5169ndash5174 2007
[3] M-C Zhu and Y-C Li ldquoDecentralized adaptive sliding modecontrol for reconfigurable manipulators using fuzzy logicrdquoJournal of Jilin University (Engineering and Technology Edition)vol 39 no 1 pp 170ndash176 2009
[4] L Zhu and Y Li ldquoDecentralized adaptive neural networkcontrol for reconfigurable manipulatorsrdquo in Proceedings of theChinese Control and Decision Conference (CCDC rsquo10) pp 1760ndash1765 Xuzhou China May 2010
[5] M Zhu Y Li and Y Li ldquoA new distributed control schemeof modular and reconfigurable robotsrdquo in Proceedings of theIEEE International Conference onMechatronics and Automation(ICMA rsquo07) pp 2622ndash2627 Harbin China August 2007
[6] M C Zhu Y Li and Y C Li ldquoObserver-based decentralizedadaptive fuzzy control for reconfigurable manipulatorrdquo Controland Decision vol 24 no 3 pp 429ndash434 2009
[7] S Richard and A G Barto Reinforcement Learning An Intro-duction The MIT Press London UK 1998
[8] F L Lewis and D Liu ReinForcement Learning and Approxi-mate Dynamic Programming For Feedback Control Wiley-IEEEPress New York NY USA 2012
[9] Y-K Xu and X-R Cao ldquoLebesgue-sampling-based optimalcontrol problems with time aggregationrdquo IEEE Transactions onAutomatic Control vol 56 no 5 pp 1097ndash1109 2011
[10] F L Lewis andD Vrabie ldquoReinforcement learning and adaptivedynamic programming for feedback controlrdquo IEEE Circuits andSystems Magazine vol 9 no 3 pp 32ndash50 2009
[11] X Xu H-G He and D Hu ldquoEfficient reinforcement learningusing recursive least-squares methodsrdquo Journal of ArtificialIntelligence Research vol 16 pp 259ndash292 2002
[12] H Zhang Q Wei and Y Luo ldquoA novel infinite-time optimaltracking control scheme for a class of discrete-time nonlinearsystems via the greedy HDP iteration algorithmrdquo IEEE Trans-actions on Systems Man and Cybernetics B vol 38 no 4 pp937ndash942 2008
[13] H Zhang R Song Q Wei and T Zhang ldquoOptimal trackingcontrol for a class of nonlinear discrete-time systems withtime delays based on heuristic dynamic programmingrdquo IEEETransactions on Neural Networks vol 22 no 12 pp 1851ndash18622011
16 Mathematical Problems in Engineering
[14] H Zhang X Xie and S Tong ldquoHomogenous polynomiallyparameter-dependent 119867
infinfilter designs of discrete-time fuzzy
systemsrdquo IEEE Transactions on Systems Man and CyberneticsB vol 41 no 5 pp 1313ndash1322 2011
[15] H Zhang L Cui X Zhang and Y Luo ldquoData-drivenrobust approximate optimal tracking control for unknown gen-eral nonlinear systems using adaptive dynamic programmingmethodrdquo IEEE Transactions on Neural Networks vol 22 no 12pp 2226ndash2236 2011
[16] J Zhang H Zhang Y Luo and H Liang ldquoOptimal controldesign for nonlinear systems adaptive dynamic programmingbased on fuzzy critic estimatorrdquo in The International JointConference on Neural Network pp 1ndash6 Brisbane Australia2012
[17] H Zhang D Gong B Chen and Z Liu ldquoSynchronization forcoupled neural networkswith interval delay a novel augmentedlyapunov-krasovskii functional methodrdquo IEEE Transactions onNeural Networks and Learning Systems vol 24 no 1 pp 58ndash702013
[18] S Bhasin K Dupree P M Patre and W E Dixon ldquoNeuralnetwork control of a robot interacting with an uncertain vis-coelastic environmentrdquo IEEE Transactions on Control SystemsTechnology vol 19 no 4 pp 947ndash955 2011
[19] S G Khan G Herrmann F Lewis T Pipe and C MelhuishldquoA Q-learning based Cartesian model reference compliancecontroller implementation for a humanoid robot armrdquo inProceedings of the IEEE 5th International Conference onRoboticsAutomation andMechatronics (RAM rsquo11) pp 214ndash219 QingdaoChina September 2011
[20] P K Patchaikani L Behera and G Prasad ldquoA single networkadaptive critic-based redundancy resolution scheme for robotmanipulatorsrdquo IEEE Transactions on Industrial Electronics vol59 no 8 pp 3241ndash3253 2012
[21] C J C H Watkins and P Dayan ldquoQ-learningrdquo MachineLearning vol 8 no 3-4 pp 279ndash292 1992
[22] C J CHWatkinsLearning fromdelayed rewards [PhD thesis]University of Cambridge Cambridge UK 1989
[23] F L Lewis and V L Syrmos Optimal Control John Wiley ampSons New York NY USA 1995
[24] M Sassano and A Astolfi ldquoDynamic approximate solutionsof the HJ inequality and of the HJB equation for input-affinenonlinear systemsrdquo IEEE Transactions on Automatic Controlvol 57 no 10 pp 2490ndash2503 2012
[25] Y X Wu and C Wang ldquoDeterministic learning based adaptivenetwork control of robot in task spacerdquoActa Automatica Sinicavol 39 no 1 pp 1ndash10 2013
[26] P M Patre W MacKunis K Kaiser and W E DixonldquoAsymptotic tracking for uncertain dynamic systems via amultilayer neural network feedforward and RISE feedbackcontrol structurerdquo IEEE Transactions on Automatic Control vol53 no 9 pp 2180ndash2185 2008
[27] B E Paden and S S Sastry ldquoA calculus for computing Filippovrsquosdifferential inclusion with application to the variable structurecontrol of robot manipulatorsrdquo IEEE Transactions on Circuitsand Systems vol 34 no 1 pp 73ndash82 1987