

NAFIPS 2005 - 2005 Annual Meeting of the North American Fuzzy Information Processing Society

Neuro-fuzzy Learning of Strategies for Optimal Control Problems*

Kaivan Kamali^1, Lijun Jiang^2, John Yen^1, K.W. Wang^2

^1 Laboratory for Intelligent Agents, School of Information Sciences & Technology
^2 Structural Dynamics & Control Laboratory, Dept. of Mechanical & Nuclear Engineering

The Pennsylvania State University, University Park, PA 16802
Email: {kxk302,lxjl48,juy1,kxw2}@psu.edu

Abstract

Various techniques have been proposed to automate the weight selection process in optimal control problems; yet these techniques do not provide symbolic rules that can be reused. We propose a layered approach for the weight selection process in which Q-learning is used for selecting weighting matrices and a hybrid genetic algorithm is used for selecting optimal design variables. Our approach can solve problems that a genetic algorithm alone cannot solve. More importantly, the Q-learning's optimal policy enables the training of neuro-fuzzy systems, which yields reusable knowledge in the form of fuzzy if-then rules. Experimental results show that the proposed method can automate the weight selection process and that the fuzzy if-then rules acquired by training a neuro-fuzzy system can solve similar weight selection problems.

1. Introduction

In traditional optimal control and design problems [6], the control gains and design parameters are derived to minimize a cost function reflecting the system performance and control effort. One major challenge of such approaches is the selection of the weighting matrices in the cost function. Traditionally, selecting the weighting matrices has mostly been accomplished based on human intuition and through trial and error, which can be time consuming and ineffective. Hence, various techniques have been proposed to automate the weight selection process [1, 10, 9, 16, 2]. However, these techniques are not accompanied by a learning process that can be used to solve a similar problem.

* This research has been supported by NSF CMS grant No. 02-18597 and by AFOSR MURI grant No. F49620-00-1-0326.

We propose to model the problem of finding the optimal weighting matrices as a deterministic Markov decision process (MDP) [7]. A deterministic MDP is an MDP in which the state transitions are deterministic. Reinforcement learning (RL) techniques [11, 13, 4] such as Q-learning [15] can be used to find the optimal policy of the MDP. Modeling the problem as a deterministic MDP has the following benefits: (1) there are abundant RL techniques to solve the MDP and automate the weight selection problem, and (2) the optimal policy computed by RL techniques can be used to generate fuzzy rules, which can be used in other weight selection problems. Neuro-fuzzy systems, such as the adaptive network-based fuzzy inference system (ANFIS) [3], can be used to learn fuzzy if-then rules for the weight selection problem using the training data obtained from RL's optimal policy.

To evaluate our method, we performed several numerical experiments on a sample active-passive hybrid vibration control problem, namely adaptive structures with active-passive hybrid piezoelectric networks (APPN) [14, 12, 5, 8]. These experiments show (1) our method can automate the weight selection problem, (2) fuzzy if-then rules are learned by training ANFIS using the training data acquired from RL's optimal policy, and (3) the learned fuzzy if-then rules can be used to solve other weight selection problems.

The rest of this paper is organized as follows: Section 2 discusses the proposed methodology. Section 3 explains a sample problem, namely, active-passive hybrid piezoelectric networks (APPN) for structural vibration control. Section 4 discusses the experimental results and Section 5 concludes the paper.

2. Proposed Methodology

In this section we first describe Markov decision processes and Q-learning. We then use an active-passive hybrid control design problem to illustrate and evaluate our method. Finally, we provide some background information on ANFIS.

2.1. Markov Decision Processes and Q-Learning

The problem of finding a strategy for adjusting weighting matrices can be viewed as a deterministic Markov Decision Process (MDP). A Markov Decision Process consists of a set of states, S, a set of actions, A, a reward function, R: S × A → ℝ, and a state transition function, T: S × A → Π(S), where Π(S) is a probability distribution over S. A deterministic MDP is an MDP in which the state transitions are deterministic. The action at each state is selected according to a policy function, π: S → A, which maps states to actions. The goal is to find the optimal policy, π*, which maximizes a performance metric, e.g. the expected discounted sum of rewards received.

The optimal policy of an MDP can be learned using RL techniques. Q-learning is an RL technique which learns the optimal policy of an MDP by learning a Q-function (Q: S × A → ℝ) that maps state-action pairs to values. The optimal Q-value for a state-action pair, Q*(s, a), is the expected discounted reward for taking action a in state s and continuing with the optimal policy thereafter. The optimal policy is determined from the optimal Q-values:

$$ \pi^*(s) = \arg\max_a Q^*(s, a) \qquad (1) $$

Q-values are updated based on the reward received. If each state-action pair is visited frequently enough, then the estimated Q-values are guaranteed to converge to the optimal Q-values.
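For concreteness, the standard tabular Q-learning update can be sketched as follows (a minimal Python sketch; the learning rate, discount factor, exploration probability, and dictionary-based Q-table are illustrative choices, not values taken from the paper):

```python
import random
from collections import defaultdict

class QLearner:
    """Minimal tabular Q-learning for a deterministic MDP (illustrative sketch)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # Q-values indexed by (state, action)
        self.actions = actions        # finite action set
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration probability

    def policy(self, state):
        """pi(s): epsilon-greedy around argmax_a Q(s, a)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```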

2.2. Description of New Methodology

Consider the following linear system with passive control parameters p_i (i = 1, ..., q) and active control input u:

$$ \dot{x}(t) = A(p_1, \ldots, p_q)\,x(t) + B_1(p_1, \ldots, p_q)\,u(t) + B_2 f(t) \qquad (2) $$

$$ y(t) = C x(t) \qquad (3) $$

where x ∈ ℝⁿ is the system state vector, u ∈ ℝᵐ is the control input vector, y ∈ ℝᵖ is the output performance vector, and f is the white Gaussian noise excitation. The matrices A, B1, B2, and C are the system matrix, control input matrix, disturbance input matrix, and output matrix, respectively.

For a given state weighting matrix Q and control weighting matrix R, the optimal values of the passive control parameters p_i can be found by minimizing the cost function, which is selected to be the minimized cost function of the stochastic regulator problem [6]:

$$ J = \min \int \left[ x^T(t)\,Q\,x(t) + u^T(t)\,R\,u(t) \right] dt \qquad (4) $$

As reported in [14] and [12], the system state weighting matrix Q and control weighting matrix R determine the solution of the regulator Riccati equation and hence the active control gains and the optimal values of the passive control parameters. Therefore, the system performance is highly dependent on the weighting matrices Q and R. Our problem is now formulated as the following: find the weighting matrices Q and R such that the generated active-passive hybrid control with optimal passive control parameters p* and optimal active control gain K_c yields a closed-loop system which satisfies some specific constraints from the system control designers.
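To illustrate how the weighting matrices drive the regulator solution, the sketch below computes a standard continuous-time LQR gain from Q and R with SciPy's Riccati solver. It uses a toy two-state system, not the APPN model of [14, 12]; the point is only that changing the diagonal weights changes the resulting feedback gain.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Solve the continuous-time algebraic Riccati equation and return K = R^{-1} B^T P."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Toy lightly damped oscillator (illustrative only, not the APPN beam model).
A = np.array([[0.0, 1.0], [-4.0, -0.2]])
B = np.array([[0.0], [1.0]])

# Heavier state weighting (larger Q relative to R) yields a more aggressive gain.
K_soft  = lqr_gain(A, B, Q=np.diag([1.0, 1.0]),   R=np.array([[1.0]]))
K_stiff = lqr_gain(A, B, Q=np.diag([100.0, 1.0]), R=np.array([[1.0]]))
print(K_soft, K_stiff)
```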

Following the assumption made in [1, 10, 9, 16, 2], the weighting matrices Q and R are restricted to be diagonal such that

$$ Q = \mathrm{diag}[\,w^m_1\ \ w^m_2\ \cdots\ w^m_n\,] \qquad (5) $$
$$ R = \mathrm{diag}[\,w_{c1}\ \ w_{c2}\ \cdots\ w_{cm}\,] \qquad (6) $$

where w^m_i (i = 1, ..., n) is the state weighting factor on the i-th structural mode and w_{cj} (j = 1, ..., m) is the control weighting factor on the j-th control input. Therefore, the goal of the optimization problem is to find the optimal value of a weighting factor vector,

$$ W = (w^m_1, \ldots, w^m_n, w_{c1}, \ldots, w_{cm}) = (w_1, \ldots, w_{n+m}), $$

given the optimal values of the active feedback gainvector and the passive design variables.

The weight selection problem is cast as an RL problem, where the state space is (n + m)-dimensional. In our framework, we use Q-learning to solve the RL problem. For each weight vector, the optimal values of the passive design variables are computed using a hybrid GA. This layered approach, i.e. Q-learning for selecting the weight vector and hybrid GA for selecting the optimal passive design variables, allows for solving optimization problems that cannot be solved using GA alone. Furthermore, the Q-learning's optimal policy enables the training of neuro-fuzzy systems, e.g. ANFIS, and yields reusable knowledge in the form of fuzzy if-then rules.
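A minimal sketch of this layered structure is shown below, reusing the QLearner sketch from Section 2.1. The callables `apply_action`, `hybrid_ga`, and `evaluate_design` stand in for the action encoding, the hybrid GA optimizer, and the design-objective evaluation; they are placeholders for illustration, not implementations from the paper.

```python
def layered_weight_search(q_learner, initial_state, apply_action,
                          hybrid_ga, evaluate_design, max_weight_changes=100):
    """One Q-learning episode over the discretized weight space.

    Outer level: Q-learning proposes changes to the weight vector (states/actions).
    Inner level: for each weight vector, the hybrid GA finds the optimal passive
    design variables and the resulting design is scored against the objectives.
    """
    state = initial_state                    # current (discretized) weight vector
    p_opt = hybrid_ga(state)                 # inner optimization for this weight vector
    d_old = evaluate_design(state, p_opt)    # distance D to the design objectives
    for _ in range(max_weight_changes):
        action = q_learner.policy(state)     # increase/decrease one weight factor
        next_state = apply_action(state, action)
        p_opt = hybrid_ga(next_state)        # re-optimize the passive variables
        d_new = evaluate_design(next_state, p_opt)
        q_learner.update(state, action, d_old - d_new, next_state)  # r = D_old - D_new
        state, d_old = next_state, d_new
    return state
```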

The proposed approach is given in Fig. 1. The first step is to initialize the state space, which includes the following: (1) determine the range of each weight variable to be optimized, and (2) discretize the range of each weight variable. Two nested loops are then executed. The inner loop represents a single search through the state space, repeated until a maximum number of iterations is reached. The outer loop is repeated until the policy for the RL agent converges.

Figure 1. Overall view of the proposed layered methodology

The first step in the outer loop is to initialize the location of the RL agent in the state space, which is determined by the initial values of the elements of the weight vector W. This basically defines the starting point of the RL agent's search in the state space. The RL agent first finds the optimal value of the passive design variables using hybrid GA and then evaluates the design based on the design objectives. Next, it selects an action and updates its state according to the selected action.

The set of actions the RL agent can perform is denoted as a = (a_{11}, a_{21}, ..., a_{1,n+m}, a_{2,n+m}), where a_{1i} is an increase in w_i and a_{2i} is a decrease in w_i. The amount of increase or decrease Δw_i depends on how the state space is discretized. Actions are selected according to a stochastic strategy. The strategy is designed to allow for exploration of the search space by probabilistically selecting between the action corresponding to the maximum Q-value and other actions. The RL agent then finds the optimal value of the passive design variables for the new state using hybrid GA and evaluates the design using the design objectives.

In order to learn the optimal policy for weight adjustments, the algorithm must compute the reward, which is calculated based on the degree to which the new design improves (or deteriorates) over the previous design. The reward is the difference between the evaluation of the current state and the previous state. Each state is evaluated by computing D, the distance between its performance (e.g. power consumption, vibration magnitude) and the design objectives. Hence, the reward is r = D_old − D_new. A reward is positive if the performance of the new design is closer to the design objectives.
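The paper does not give an explicit formula for D, so the sketch below is only one plausible choice: each performance measure contributes the amount by which it violates its design objective, so D = 0 exactly at a goal state.

```python
def design_distance(performance, objectives):
    """Distance D between achieved performance and the design objectives.

    Both arguments are dicts keyed by the same measure names (e.g. FRF magnitudes,
    control-voltage variances).  Only violated objectives contribute.  (Assumed form.)
    """
    return sum(max(0.0, performance[k] - objectives[k]) for k in objectives)

def reward(d_old, d_new):
    """Reward used by the RL agent: r = D_old - D_new (positive if the design improved)."""
    return d_old - d_new
```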

2.3. Neuro-fuzzy Systems: ANFIS

Neuro-fuzzy systems are a combination of two popular soft computing techniques: neural networks and fuzzy systems. Neural networks have the capability to learn from examples, yet the learned knowledge cannot be represented explicitly. On the other hand, knowledge in fuzzy systems is represented via explicit fuzzy if-then rules, yet fuzzy systems have no learning capability. A neuro-fuzzy system is a hybrid approach in which a fuzzy system is trained using techniques similar to those applied to neural networks. One of the first neuro-fuzzy systems was the Adaptive Network-based Fuzzy Inference System (ANFIS) [3]. ANFIS represents a Sugeno-type fuzzy system as a multilayer feedforward network which can be trained via backpropagation or a combination of backpropagation and least-squares estimation.
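The forward pass of such a zeroth-order Sugeno system can be sketched as below (Gaussian membership functions, product firing strengths, normalized weighted sum of constant consequents); the training step via backpropagation and least squares is omitted, and all parameter shapes are illustrative assumptions.

```python
import numpy as np
from itertools import product

def gauss_mf(x, center, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_forward(x, mf_params, rule_consequents):
    """Forward pass of a zeroth-order Sugeno fuzzy system (ANFIS-style layers 1-5).

    x                : input vector, length n_inputs
    mf_params        : for each input, a list of (center, sigma) pairs
    rule_consequents : one constant per rule; rules are all MF combinations
    """
    # Layer 1: membership degree of every input in every membership function.
    degrees = [[gauss_mf(xi, c, s) for (c, s) in mfs] for xi, mfs in zip(x, mf_params)]
    # Layer 2: rule firing strengths via the product T-norm over one MF per input.
    combos = product(*[range(len(mfs)) for mfs in mf_params])
    strengths = np.array([np.prod([degrees[i][j] for i, j in enumerate(combo)])
                          for combo in combos])
    # Layer 3: normalize firing strengths; layers 4-5: weighted sum of constants.
    weights = strengths / strengths.sum()
    return float(weights @ np.asarray(rule_consequents))
```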

3. An Example Problem

To evaluate the proposed method, an active-passive hybrid vibration control problem is formulated as an example for testing. This test problem is concerned with utilizing the active and passive characteristics of piezoelectric materials and circuitry for structural vibration suppression. The concept of the active-passive hybrid piezoelectric network (APPN) has been investigated by many researchers in the past [14, 12, 5, 8]. In the following, the proposed methodology is used in the design process of the active-passive hybrid piezoelectric network for vibration control of a cantilever beam. A schematic of the system is shown in Fig. 2. A cantilever beam is partially covered with two PZT patches used as actuators. Each PZT actuator is connected to an external voltage source (V1 and V2) in series with a resistor-inductor shunt circuit. The beam is excited by Gaussian white noise at x = 0.95 L_b, and the output displacement is measured at the free end of the beam. The two surface-mounted PZT patches are located at x1 = 0.02 m and x2 = 0.08 m, respectively. Other parameters related to the properties of the beam and PZT patches are the same as specified in [14].

Figure 2. Schematic of the system with APPN

A state space form of the system equations can be expressed as follows (a detailed derivation of the system equations can be found in [14]):

$$ \dot{x}(t) = A(L_1, R_1, L_2, R_2)\,x(t) + B_1(L_1, R_1, L_2, R_2)\,u(t) + B_2 f(t) \qquad (7) $$

$$ y(t) = C x(t) \qquad (8) $$
$$ x = [\,q^T\ \ \dot{q}^T\ \ Q_1\ \ \dot{Q}_1\ \ Q_2\ \ \dot{Q}_2\,]^T, \qquad u(t) = [\,V_{c1}(t)\ \ V_{c2}(t)\,]^T \qquad (9) $$

where q and q̇ are the vectors of generalized displacement and velocity of the beam structure; Q1, Q̇1, Q2, and Q̇2 are the electric charge and current of the first and second PZT actuators in the first and second piezoelectric shunting circuits, respectively; u is the control input vector composed of the control voltages V_{c1} and V_{c2}; and f(t) is the external excitation vector of Gaussian white noise. Here the system matrix A and control input matrix B1 are functions of the circuit parameters (inductances L1, L2 and resistances R1, R2).

For a given set of weighting matrices Q and R, the optimal values of the passive design parameters (L1, L2, R1, and R2) can be found by minimizing the cost function, Eq. 4, where the system state weighting matrix Q and control weighting matrix R can be expressed as

$$ Q = \mathrm{diag}[\,W_Q K_b\ \ W_Q M_b\ \ 0\ \ 0\ \ 0\ \ 0\,] \qquad (10) $$
$$ W_Q = \mathrm{diag}[\,w^m_1\ \ w^m_2\ \cdots\ w^m_n\,] \qquad (11) $$
$$ R = \mathrm{diag}[\,w_{c1}\ \ w_{c2}\,] \qquad (12) $$

where K_b and M_b are the stiffness and mass matrices of the beam structure, respectively; w_{c1} and w_{c2} are scalar weighting factors on the 1st and 2nd control voltage sources, respectively; and w^m_1, w^m_2, ..., w^m_n are scalar weighting factors on the n structural modes used in the system equation derivation. Now our design problem can be reformulated as finding the scalar weighting factors on the system states (w^m_1, w^m_2, ..., w^m_n) and the scalar weighting factors on the control voltage sources (w_{c1}, w_{c2}) such that the optimal active-passive hybrid control will satisfy all constraints defined by the user.

There are different ways to define the constraints in the control system design. Eigenvalue assignment, feedback gain limit constraints, and input/output variance constraints have been used in previous investigations. In our example, constraints on the output performance and control input are defined as follows: (1) upper bounds are given to restrict the magnitudes of the frequency response function at specified spatial locations and structural resonant frequencies:

$$ \|G(s)\|_{s = i\omega_p} = \left\| C\,(sI - A + B_1 K_c)^{-1} B_2 \right\|_{s = i\omega_p} < U_p, \qquad p = 1, \ldots, n \qquad (13) $$

(2) the variances of the control voltage sources are bounded:

$$ \lim_{t \to \infty} E\!\left[ V_{cj}^2(t) \right] < \sigma_j^2, \qquad j = 1, 2 \qquad (14) $$

4. Experimental Results

We performed several experiments to evaluate our method. In the first experiment we used our layered approach to solve an example weight selection problem. In the second experiment we used the Q-learning's optimal policy to train ANFIS modules. We then used the trained ANFIS modules to solve two weight selection problems: the problem that ANFIS was trained with and a different weight selection problem.

4.1. Experiment 1

In this experiment we performed optimization on an example APPN problem. The design requirements for the APPN system are as follows:

1. Magnitude of the frequency response function (FRF) near the 1st natural frequency of the system is < -60 dB

2. Magnitude of the FRF near the 2nd natural frequency of the system is < -85 dB

3. Variance of the 1st voltage source is < 250 V
4. Variance of the 2nd voltage source is < 250 V

In our simulation, only the first five modes (n = 5) are considered. For simplicity, we also assume that (1) the two weighting factors on the control voltage sources are the same (i.e. w_{c1} = w_{c2} = w_c), and (2) the weighting factors on the structural modes other than those related to the design constraints take the value of 1 (i.e. w^m_3 = w^m_4 = w^m_5 = 1). Now the weighting matrices Q and R are totally determined by three design variables: w_c, w^m_1, and w^m_2. Thus the problem of finding the optimal weighting matrices Q and R can be reformulated as finding the optimal values of the weighting factor on the control voltage sources (w_c) and those on the structural modes (w^m_1 and w^m_2). The searching state space of the problem is three dimensional, and the specification of the state space is given in Tab. 1. The values of w^m_1 and w^m_2 increase in the state space by their corresponding step sizes, while the value of w_c is decreased by a factor that is the reciprocal of its step size.

Table 1. Specification of the state space for experiment 1

  Factor       log10(w_c)   w^m_1   w^m_2
  Initial      -3           1       1
  Step size    -1           0.5     0.5
  Dimension    6            7       7

In this experiment, for each iteration of the Q-learning module, a maximum of 100 weight changes were performed. Each iteration ended when the maximum number of weight changes was reached. The program iterated until the convergence of the optimal policy. By optimal policy we mean the optimal sequence of weight changes leading to a state satisfying all the user constraints on system output and control input. The results of numerical experiment 1 are given in Tab. 2. The weights returned by the program yield an optimal system which satisfies all the constraints on the output performance and control effort. Our proposed method allows for automation of the optimization process, frees the expert from the burden of finding the correct sequence of weight changes, and produces reasonable results.

Table 2. Results for experiment 1

  Weighting factor on control (w_c):          1.0E-7.0
  Weighting factor on the 1st mode (w^m_1):   1.0
  Weighting factor on the 2nd mode (w^m_2):   1.0
  Number of iterations for convergence:       34
  Number of weight changes per iteration:     100
  Magnitude of FRF near 1st natural freq.:    -61.5 dB
  Magnitude of FRF near 2nd natural freq.:    -87.8 dB
  Variance of the 1st control voltage:        189.9 V
  Variance of the 2nd control voltage:        178.1 V

4.2. Experiment 2

In experiment 2, we used the Q-learning's optimal policy, learned in experiment 1, to train three ANFIS modules. The ANFIS modules are used to make weight changing decisions on w_c, w^m_1, and w^m_2. The overall methodology is given in Fig. 3. Next we describe how to get the training data for the ANFIS modules.

Figure 3. Overall view of the weight changing methodology using ANFIS

The optimal policy is a sequence of actions (i.e. weight changes) which is derived from the Q-values. Basically, in each state the action corresponding to the maximum Q-value is the optimal action. We derived the optimal actions for each state in the state space of the experiment 1 problem. The actions corresponding to weight changes for w_c (either increase or decrease) were used as the training data for the ANFIS module responsible for weight changing decisions on w_c. Training data for the other two ANFIS modules were derived similarly.
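A sketch of this extraction step is given below, assuming the tabular Q-values from the earlier QLearner sketch and an action encoding of the form (weight_name, direction); both the encoding and the helper names are assumptions for illustration.

```python
def optimal_policy(q_learner, states, actions):
    """For each state, the action with the maximum Q-value (the learned optimal policy)."""
    return {s: max(actions, key=lambda a: q_learner.q[(s, a)]) for s in states}

def anfis_training_data(policy, state_features, weight_name):
    """Build (input, target) pairs for the ANFIS module that adjusts `weight_name`.

    `state_features` maps a state to the four ANFIS inputs (FRF magnitudes and
    control-voltage variances); the target encodes increase (+1) or decrease (-1).
    """
    data = []
    for state, (name, direction) in policy.items():
        if name == weight_name:
            target = 1.0 if direction == "increase" else -1.0
            data.append((state_features(state), target))
    return data
```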

Each ANFIS module has 4 inputs: (1) magnitude of FRF near the 1st natural frequency, (2) magnitude of FRF near the 2nd natural frequency, (3) variance of the 1st control voltage, and (4) variance of the 2nd control voltage. Each ANFIS input has 3 membership functions associated with the linguistic values bad, average, and good. The membership functions are Gaussian. Since there are 4 inputs and each input has 3 linguistic values, each ANFIS has 3^4 = 81 fuzzy if-then rules. ANFIS represents a Sugeno-type fuzzy inference system, and the consequent of each rule is a 0th degree function of the input variables.

We trained the three ANFIS modules using a hybrid method, a combination of backpropagation and least-squares estimation. After training, the three ANFIS modules were used to make decisions on the weight selection problem for which they were trained. Starting from the initial weight settings, hybrid GA was used to find the optimal design solution. The design solution was input to the ANFIS modules. The action corresponding to the ANFIS with the strongest output was selected. This process continued until a goal state was reached. Starting from the initial weight settings and following the procedure just discussed, the goal state was reached after 24 weight changes. The layered approach used in experiment 1 requires 3400 weight changes to solve the same problem (34 iterations for Q-learning to converge, 100 weight changes per iteration, Tab. 2).
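The decision loop just described might look like the following sketch, where `anfis_modules` maps each adjustable weight to a trained ANFIS whose signed output proposes a change; the module with the strongest output wins. All names and the sign convention are illustrative assumptions, not the paper's implementation.

```python
def anfis_weight_search(weights, anfis_modules, hybrid_ga, evaluate_design,
                        is_goal, apply_change, max_changes=100):
    """Adjust the weighting factors using trained ANFIS modules until a goal design is reached."""
    for _ in range(max_changes):
        p_opt = hybrid_ga(weights)                   # optimal passive design variables
        features = evaluate_design(weights, p_opt)   # FRF magnitudes, voltage variances
        if is_goal(features):
            return weights                           # all design constraints satisfied
        # Each ANFIS proposes a signed adjustment for its weight; pick the strongest.
        outputs = {name: module(features) for name, module in anfis_modules.items()}
        chosen = max(outputs, key=lambda n: abs(outputs[n]))
        # Positive output -> increase the chosen weight, negative -> decrease it.
        weights = apply_change(weights, chosen, outputs[chosen])
    return weights
```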

Next, we applied the trained ANFIS modules to solving a different weight selection problem. The requirements for this test problem are given in Tab. 3. Starting from the initial weight settings and following the procedure just discussed, the goal state was reached after 7 weight changes. The results are given in Tab. 4. The layered approach requires 300 weight changes to solve the same problem (3 iterations for Q-learning to converge, 100 weight changes per iteration).

Table 3. Test problem requirements

  Magnitude of FRF near 1st natural freq.:   -60 dB
  Magnitude of FRF near 2nd natural freq.:   -75 dB
  Variance of the 1st control voltage:       200 V
  Variance of the 2nd control voltage:       200 V

Table 4. Test problem results

  Weighting factor on control (w_c):          1.0E-6.0
  Weighting factor on the 1st mode (w^m_1):   3.0
  Weighting factor on the 2nd mode (w^m_2):   1.0
  Magnitude of FRF near 1st natural freq.:    -61.4 dB
  Magnitude of FRF near 2nd natural freq.:    -78.0 dB
  Variance of the 1st control voltage:        174.8 V
  Variance of the 2nd control voltage:        102.3 V

5. Summary

We propose a layered approach for solving optimal control and design problems. Such a layered approach, i.e. Q-learning for selecting weighting matrices and hybrid GA for selecting optimal design variables, allows for solving optimization problems that cannot be solved using GA alone. Furthermore, the Q-learning's optimal policy enables the training of neuro-fuzzy systems, e.g. ANFIS, and yields reusable knowledge in the form of fuzzy rules.

To evaluate our methodology, we performed several experiments. The experiments showed that our method can successfully automate the weight selection problem. Furthermore, the Q-learning's optimal policy provides training data for ANFIS modules. ANFIS modules provide reusable fuzzy rules for the weight selection problem, which can also be applied to other weight selection problems. Moreover, the fuzzy rules provide heuristics about adjusting weights such that an acceptable design is reached much faster than using Q-learning.

References

[1] A. Arar, M. Sawan, and R. Rob. Design of optimal control systems with eigenvalue placement in a specified region. Optimal Control Applications and Methods, 16(2):149-154, 1995.

[2] E. Collins and M. Selekwa. Fuzzy quadratic weights for variance constrained LQG design. In Proc. of the IEEE Conference on Decision and Control, volume 4, pages 4044-4049, 1999.

[3] J. Jang. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. on Systems, Man, and Cybernetics, 23(5):665-685, 1993.

[4] L. Kaelbling, M. Littman, and A. Moore. Reinforcement learning: A survey. J. of AI Research, 4(1):237-285, 1996.

[5] S. Kahn and K. Wang. Structural vibration control via piezoelectric materials with active-passive hybrid networks. In Proc. ASME IMECE DE, volume 75, pages 187-94, 1999.

[6] H. Kwakernaak and R. Sivan. Linear Optimal Control Systems. Wiley, New York, NY, 1972.

[7] M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, NY, 1994.

[8] A. Singh and D. Pines. Active/passive reduction of vibration of periodic one-dimensional structures using piezoelectric actuators. Smart Materials and Structures, 13(4):698-711, 2004.

[9] B. Stuckman and P. Stuckman. Optimal selection of weighting matrices in integrated design of structures/controls. Computers and Electrical Engineering, 19(1):9-18, 1993.

[10] M. Sunar and S. Rao. Optimal selection of weighting matrices in integrated design of structures/controls. AIAA J., 31(4):714-720, 1993.

[11] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

[12] J. Tang and K. Wang. Active-passive hybrid piezoelectric networks for vibration control: Comparisons and improvement. Smart Materials and Structures, 10(4):794-806, 2001.

[13] G. Tesauro. Temporal difference learning and TD-Gammon. Comm. of the ACM, 38(3):58-68, 1995.

[14] M. Tsai and K. Wang. On the structural damping characteristics of active piezoelectric actuators with passive shunt. J. of Sound and Vibration, 221(1):1-20, 1999.

[15] C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England, 1989.

[16] L. Zhang and J. Mao. An approach for selecting the weighting matrices of LQ optimal controller design based on genetic algorithms. In Proc. of IEEE Conf. on Decision and Control, volume 3, pages 1331-1334, 2002.
