109

Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization
Page 2: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Resilient Control inCooperative and Adversarial Multi-agent Networks

Page 3: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Moncrief-O’Donnell Chair, UTA Research Institute (UTARI)The University of Texas at Arlington, USA

F.L. Lewis National Academy of Inventors

Talk available online at http://www.UTA.edu/UTARI/acs

Supported by :ONR – Tony Seman, Lynn Petersen, Marc SteinbergUS TARDEC – Dariusz MikulskiAFOSR Europe – Kevin BollinoNSF

Reinforcement Learning for Resilient Control inCooperative and Adversarial Multi-agent Networks:

CPS Applications in Microgrid and Human-Robot Interactions

AndQian Ren Consulting Professor, State Key Laboratory of Synthetical Automation

for Process IndustriesNortheastern University, Shenyang, China

Supported by :China NNSFChina Project 111

Page 4: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Invited by Chaoli Wang

Page 5: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

J.J. Finnigan, Complex science for a complex world

The Internet

ecosystem ProfessionalCollaboration network

Barcelona rail network

Structure of Natural andManmade Systems

Peer-to-Peer Relationships in networked systems

Distributed Decision & ControlLocal nature of Physical Laws

Clusters of galaxies

Page 6: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Distributed Adaptive Control for Multi‐Agent Systems

Page 7: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

FlockingReynolds, Computer Graphics 1987

Reynolds’ Rules:Alignment : align headings

Cohesion : steer towards average position of neighbors- towards c.g.Separation : steer to maintain separation from neighbors

( )i

i ij j ij N

a

Nature and Biological Systems are Resilient !

Page 8: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Resilient Control in Cooperative Multi-agent NetworksModeling and Control of Adversarial Networked TeamsCPS Applications-

Distributed Control of Renewable Energy MicrogridsShared Learning in Human-Robot Interactions

Page 9: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Multi-agent Networks on Communication GraphsRobustness of Optimal Design Reinforcement LearningCooperative Agents - Games on Communication GraphsCPS #1 – Resilient Distributed Control of Renewable Energy Microgrids

Page 10: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

F.L. Lewis, H. Zhang, A. Das, K. Hengster-Movric, Cooperative Control of Multi-Agent Systems: Optimal Design and Adaptive Control, Springer-Verlag, 2014

Key Point

Lyapunov Functions and Performance IndicesMust depend on graph topology

Hongwei Zhang, F.L. Lewis, and Abhijit Das“Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,”IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.

H. Zhang, F.L. Lewis, and Z. Qu, "Lyapunov, Adaptive, and Optimal Design Techniques for Cooperative Systems on Directed Communication Graphs," IEEE Trans. Industrial Electronics, vol. 59, no. 7, pp. 3026‐3041, July 2012.

Page 11: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Synchronized Motion of Biological Groups

Fishschool

Birdsflock

Locustsswarm

Firefliessynchronize

Nature and Biological Systems are Resilient

Page 12: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

The Power of Synchronization Coupled OscillatorsDiurnal Rhythm

Page 13: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1

2

3

4

56

Diameter= length of longest path between two nodes

Volume = sum of in-degrees1

N

ii

Vol d

Spanning treeRoot node

Strongly connected if for all nodes i and j there is a path from i to j.

Tree- every node has in-degree=1Leader or root node

Followers

Communication Graph

Page 14: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Communication Graph 1

2

3

4

56

N nodes

[ ]ijA a

0 ( , )ij j i

i

a if v v E

if j N

oN1

Noi ji

jd a

Out-neighbors of node iCol sum= out-degree

42a

Adjacency matrix

0 0 1 0 0 01 0 0 0 0 11 1 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 0 1 1 0

A

iN1

N

i ijj

d a

In-neighbors of node iRow sum= in-degreei

(V,E)

iJohn Baras - social standing

Algebraic Graph Theory

Page 15: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

L=D-A = graph Laplacian matrix

1

N

dD

d

[ ]ijA a

Graph Laplacian MatrixAdjacency matrix

In-Degree Matrix

Page 16: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1

2 3

4 5 6

12

3

4 5

6

Graph Eigenvalues for Different Communication Topologies

Directed Tree-Chain of command

Directed Ring-Gossip networkOSCILLATIONS

Page 17: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Graph Eigenvalues for Different Communication Topologies

Directed graph-

Undirected graph-

65

34

2

1

4

5

6

2

3

1

Page 18: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Synchronization on Good Graphs

Chris Elliott fast video

65

34

2

1

1

2 3

4 5 6

Mesh graph4 neighbors

Page 19: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Synchronization on Gossip Rings

Chris Elliott weird video

12

3

4 5

6

Ring graphor cycle

Page 20: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Dynamic Graph- the Distributed Structure of Control

Each node has an associated state i ix u

Standard local voting protocol ( )i

i ij j ij N

u a x x

1

1i i

i i ij ij j i i i iNj N j N

N

xu x a a x d x a a

x

( )u Dx Ax D A x Lx L=D-A = graph Laplacian matrix

x Lx

If x is an n-vector then ( )nx L I x

x

1

N

uu

u

1

N

dD

d

Closed-loop dynamics

i

j

[ ]ijA a

Page 21: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Communication Graph

N nodes or agents

State at node i is ( )ix t

Synchronization problem ( ) ( ) 0i jx t x t

Formation ControlSynchronization of Rigid-body Spacecraft DynamicsSynchronization of Trust Opinions in Multi-agent NetworksDiurnal Rhythm Synchronization in Nature

Page 22: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Controlled Consensus: Cooperative Tracker

Node state i ix uDistributed Local voting protocol with control node v

( ) ( )i

i ij j i i ij N

u a x x b v x

( ) 1x L B x B v

0ib If control v is in the neighborhood of node i

{ }iB diag b

Theorem. Let graph have a spanning tree and for at least one root node. Then L+B is nonsingular with all e-vals positiveand -(L+B) is asymptotically stableAnd all nodes synchronize to the control node

0ib

control node v

Local Neighborhood Tracking Error

1

N

xx

x

Global state

Page 23: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Synchronization on Good Graphs

Chris Elliott fast video

65

34

2

1

1

2 3

4 5 6

Mesh graph4 neighbors

Page 24: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Synchronization on Gossip Rings

Chris Elliott weird video

12

3

4 5

6

Ring graphor cycle

Page 25: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

2 1 1,

2 1 0A B

0.5 0.5K

Agent Dynamics and Local Feedback Design

i i ix Ax Bu

i iu Kx

1

2

3

4

56

Couple 6 agents with communication graph

Nodes synchronize to consensus heading

-350 -300 -250 -200 -150 -100 -50 0 50-300

-250

-200

-150

-100

-50

0

50

x

y

0( ) ( )i

i ij j i i ij N

e x x g x x

Local neighborhood tracking error

i iu K

0xc.g. leader

Example- Non Resilience in Communication Networks

Page 26: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

2 1 1,

2 1 0A B

0.5 0.5K

Agent Dynamics and local Feedback design

i i ix Ax Bu

i iu Kx

1

2

3

4

56

ADD another comm. Link- more information flow

0( ) ( )i

i ij j i i ij N

e x x g x x

Local neighborhood tracking error

i iu K

Causes Unstable Formation!WHY?

-30 -25 -20 -15 -10 -5 0 5 10 15 20-25

-20

-15

-10

-5

0

5

10

15

20

25

x

y

Example- Non Resilience in Communication Networks

Page 27: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Lewis and Syrmos1995

DECOUPLES CONTROL DESIGN FROM COMMUNICATION GRAPH STRUCTURE

OPTIMALDesign at Each node

Lewis and ZhangIEEE TAC 2011

LOCAL OPTIMAL DESIGN Guarantees Global Synchronization

12

0

( )T Ti i i i iJ Q u Ru dt

minimizes

0( ) ( )i

i i ij j i i ij N

u cK cK e x x g x x

Page 28: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Multi-agent Networks on Communication GraphsRobustness of Optimal Design Reinforcement LearningCooperative Agents - Games on Communication GraphsCPS #1 – Resilient Distributed Control of Renewable Energy Microgrids

Page 29: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Cell HomeostasisThe individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostatis, and has only limited energy to do so.

Cellular Metabolism

Permeability control of the cell membrane

http://www.accessexcellence.org/RC/VL/GG/index.html

Optimality in Biological SystemsWhy are Biological Systems Resilient?

Page 30: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Reinforcement Learning

Every living organism improves its control actions based on rewards received from the environment

The resources available to living organisms are usually meager.Nature uses optimal control.

1. Apply a control.  Evaluate the benefit of that control.2. Improve the control policy.

RL finds optimal policies by evaluating the effects of suboptimal policies

Optimality in Biological Systems

Optimality Provides an Organizational Principle for Behavior

Charles Darwin showed that Optimal Control over long timescalesIs responsible for Natural Selection of Species

Why are Biological Systems Resilient?

Page 31: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

D. Vrabie, K. Vamvoudakis, and F.L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, IET Press, 2012.

BooksF.L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, third edition, John Wiley and Sons, New York, 2012.

New Chapters on:Reinforcement LearningDifferential Games

Page 32: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

F.L. Lewis and D. Vrabie,“Reinforcement learning andadaptive dynamic programmingfor feedback control,”IEEE Circuits & SystemsMagazine, Invited Feature Article,pp. 32-50, Third Quarter 2009.

IEEE Control Systems Magazine,F. Lewis, D. Vrabie, and K.Vamvoudakis,“Reinforcement learning andfeedback Control,” Dec. 2012

Page 33: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

xux A x B u

SystemControlK

PBPBRQPAPA TT 10

1 TK R B P

On-line real-timeControl Loop

FORMAL DESIGN PROCEDURE

Off-line Design LoopUsing ARE

Optimal Control- The Linear Quadratic Regulator (LQR)

An Offline Design Procedurethat requires Knowledge of system dynamics model (A,B)

System modeling is expensive, time consuming, and inaccurate

( , )Q R

User prescribed optimization criterion ( ( )) ( )T T

t

V x t x Qx u Ru d

OPTIMAL CONTROL IS RESILIENTROBUSTHAS GUARANTEED STABILITY MARGINS

Page 34: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Aircraft Autopilots -Linear Quadratic RegulatorLinear Quadratic Gaussian LTR

Page 35: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

A. Optimal control B. Zero-sum games C. Non zero-sum games

1. System dynamics

2. Value/cost function

3. Bellman equation

4. HJ solution equation (Riccati eq.)

5. Policy iteration – gives the structure we need

Adaptive Control Structures for Resilience:

We want to find optimal control solutions Online in real-time Using adaptive control techniquesWithout knowing the full dynamics

For nonlinear systems and general performance indices

Page 36: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Adaptive Control is online and works for unknown systems.Generally not Optimal

Optimal Control is off-line, and needs to know the system dynamics to solve design eqs.

We want ONLINE DIRECT ADAPTIVE OPTIMAL ControlFor any performance cost of our own choosing

Reinforcement Learning turns out to be the key to this!

RL brings together Optimal Control and Adaptive Control

Page 37: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Different methods of learning

SystemAdaptiveLearning system

ControlInputs

outputs

environmentTuneactor

Reinforcementsignal

Actor

Critic

Desiredperformance

Reinforcement learningIvan Pavlov 1890s

Actor-Critic Learning

We want OPTIMAL performanceADP- Approximate Dynamic Programming

Sutton & Barto book

Page 38: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

t

T

t

dtRuuxQdtuxrtxV ))((),())((

( , , ) ( , ) ( , ) ( ) ( ) ( , ) 0T TV V VH x u V r x u x r x u f x g x u r x u

x x x

112( ) ( )T Vu h x R g x

x

dxdVggR

dxdVxQf

dxdV T

TT *1

*

41

*

)(0

Nonlinear System dynamics 

Cost/value 

Bellman Equation, in terms of the Hamiltonian function

Stationary Control Policy

Hamilton‐Jacobi‐Bellmen Equation

Derivation of Nonlinear Optimal Regulator

Off‐line solutionHJB hard to solve.   May not have smooth solution.Dynamics must be known

Stationarity condition 0Hu

( , ) ( ) ( )x f x u f x g x u

Leibniz gives Differential equivalent

To find online methods for optimal control Iterate on these two equations

Page 39: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

),,(),(),(0 uxVxHuxruxf

xV T

0 ( , ( )) ( , ( ))T

jj j

Vf x h x r x h x

x

(0) 0jV

1121( ) ( ) jT

j

Vh x R g x

x

CT Policy Iteration – a Reinforcement Learning Technique

• Convergence proved by Leake and Liu 1967, Saridis 1979 if Lyapunov eq. solved exactly

• Beard & Saridis used Galerkin Integrals to solve Lyapunov eq.• Abu Khalaf & Lewis used NN to approx. V for nonlinear systems and proved convergence

RuuxQuxr T )(),(Utility 

The cost is given by solving the CT Bellman equation

Policy Iteration Solution

Pick stabilizing initial control policy

Policy Evaluation ‐ Find cost, Bellman eq.

Policy improvement ‐ Update control Full system dynamics must be knownOff‐line solution

dxdVggR

dxdVxQf

dxdV T

TT *1

*

41

*

)(0

Scalar equation

M. Abu-Khalaf, F.L. Lewis, and J. Huang, “Policyiterations on the Hamilton-Jacobi-Isaacs equation for H-infinity state feedback control with input saturation,”IEEE Trans. Automatic Control, vol. 51, no. 12, pp.1989-1995, Dec. 2006.

( ) ( )u x h xGiven any admissible policy

0 ( )h x

Converges to solution of HJB

Page 40: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Lemma 1 – Draguna Vrabie

Solves Bellman equation without knowing f(x,u)

( ( )) ( , ) ( ( )), (0) 0t T

t

V x t r x u d V x t T V

0 ( , ) ( , ) ( , , ), (0) 0TV Vf x u r x u H x u V

x x

Allows definition of temporal difference error for CT systems

( ) ( ( )) ( , ) ( ( ))t T

t

e t V x t r x u d V x t T

Integral reinf. form  (IRL) for the CT Bellman eq.Is equivalent to

( ( )) ( , ) ( , ) ( , )t T

t t t T

V x t r x u d r x u d r x u d

value

Key Idea

Bad Bellman Equation

Good Bellman Equation

Page 41: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

( ( )) ( , ) ( ( ))t T

k k kt

V x t r x u dt V x t T

IRL Policy iteration

Initial stabilizing control is needed

Cost update

Control gain update

f(x) and g(x) do not appear

g(x) needed for control update

Policy evaluation‐ IRL Bellman Equation

Policy improvement11

21 1( ) ( )T kk k

Vu h x R g xx

),,(),(),(0 uxVxHuxruxf

xV T

Equivalent to

Solves Bellman eq. (nonlinear Lyapunov eq.) without knowing system dynamics

CT Bellman eq.

Integral Reinforcement Learning (IRL)- Draguna Vrabie

D. Vrabie proved convergence to the optimal value and controlAutomatica 2009, Neural Networks 2009

Converges to solution to HJB eq.dx

dVggRdx

dVxQfdx

dV TTT *

1*

41

*

)(0

Page 42: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

( ( )) ( ) ( ( ))t T

Tk k k k

t

V x t Q x u Ru dt V x t T

Approximate value by Weierstrass Approximator Network ( )TV W x

( ( )) ( ) ( ( ))t T

T T Tk k k k

t

W x t Q x u Ru dt W x t T

( ( )) ( ( )) ( )t T

T Tk k k

t

W x t x t T Q x u Ru dt

regression vector Reinforcement on time interval [t, t+T]

kWNow use RLS along the trajectory to get new weights

Then find updated FB

1 11 12 21 1

( ( ))( ) ( ) ( )( )

TT Tk

k k kV x tu h x R g x R g x Wx x t

Real-time ImplementationApproximate Dynamic Programming

Direct Optimal Adaptive Control for Partially Unknown CT Systems

Value Function Approximation (VFA) to Solve Bellman Equation

Scalar equation with vector unknowns

Same form as standard System ID problems

Page 43: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

( ( )) ( ( )) ( )t T

T Tk k k

t

W x t x t T Q x u Ru dt

2

( ( )) ( ( 2 )) ( )t T

T Tk k k

t T

W x t T x t T Q x u Ru dt

3

2

( ( 2 )) ( ( 3 )) ( )t T

T Tk k k

t T

W x t T x t T Q x u Ru dt

11 12

12 22

p pp p

11 12 22TW p p p

Solving the IRL Bellman Equation

Need data from 3 time intervals to get 3 equations to solve for 3 unknowns

Now solve by Batch least-squares

Solve for value function parameters

Page 44: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

t t+T

observe x(t)

apply uk=Lkx

observe cost integral

update P

Do RLS until convergence to Pk

update control gain

Integral Reinforcement Learning (IRL)

Data set at time [t,t+T)

( ), ( , ), ( )x t t t T x t T

t+2T

observe x(t+T)

apply uk=Lkx

observe cost integral

update P

t+3T

observe x(t+2T)

apply uk=Lkx

observe cost integral

update P

11

Tk kK R B P

( , )t t T ( , 2 )t T t T ( 2 , 3 )t T t T

Or use batch least-squares

(x( )) ( ( )) ( ) ( ) ( ) ( , )t T

T T Tk k k

tW t x t T x Q K RK x d t t T

Solve Bellman Equation - Solves Lyapunov eq. without knowing dynamics

This is a data-based approach that uses measurements of x(t), u(t)Instead of the plant dynamical model.

A is not needed anywhere

Page 45: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

( ) ( )k ku t K x t

Continuous‐time control with discrete gain updates

t

Kk

k0 1 2 3 4 5

Reinforcement Intervals T need not be the sameThey can be selected on‐line in real time

Gain update (Policy)

Control

T

Page 46: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Persistence of Excitation

Regression vector must be PE

Relates to choice of reinforcement interval T

( ( )) ( ( )) ( )t T

T Tk k k

t

W x t x t T Q x u Ru dt

Page 47: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Implementation Policy evaluationNeed to solve online

Add a new state= Integral Reinforcement

T Tx Q x u R u

This is the controller dynamics or memory

(x( )) ( ( )) ( ) ( ) ( ) ( , )t T

T T Tk k k

tW t x t T x Q K RK x d t t T

Page 48: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Direct Optimal Adaptive Controller

A hybrid continuous/discrete dynamic controller whose internal state is the observed cost over the interval

Draguna Vrabie

DynamicControlSystemw/ MEMORY

xu

V

ZOH T

x A x B u System

T Tx Q x u R u

Critic

ActorK

T T

Run RLS or use batch L.S.To identify value of current control

Update FB gain afterCritic has converged

Reinforcement interval T can be selected on line on the fly – can change

Solves Riccati Equation Online without knowing A matrixOPTIMAL CONTROL IS

RESILIENTROBUSTHAS GUARANTEED STABILITY MARGINS

Page 49: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Kung Tz 500 BCConfucius

ArcheryChariot driving

MusicRites and Rituals

PoetryMathematics

孔子Man’s relations to

FamilyFriendsSocietyNationEmperorAncestors

Tian xia da tongHarmony under heaven

124 BC - Han Imperial University in Chang-an

Page 50: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Actor / Critic structure for CT Systems

Theta waves 4-8 Hz

Reinforcement learning

Motor control 200 Hz

Optimal Adaptive IRL for CT systems

A new structure of adaptive controllers

( ( )) ( , ) ( ( ))t T

k k kt

V x t r x u dt V x t T

1121 1( ) ( )T k

k kVu h x R g xx

D. Vrabie, 2009

Page 51: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Motor control 200 Hz

Oscillation is a fundamental property of neural tissue

Brain has multiple adaptive clocks with different timescales

theta rhythm, Hippocampus, Thalamus, 4-10 Hzsensory processing, memory and voluntary control of movement.

gamma rhythms 30-100 Hz, hippocampus and neocortex high cognitive activity.

• consolidation of memory• spatial mapping of the environment – place cells

The high frequency processing is due to the large amounts of sensorial data to be processed

Spinal cord

D. Vrabie and F.L. Lewis, “Neural network approach to continuous-time direct adaptive optimal control forpartially-unknown nonlinear systems,” Neural Networks, vol. 22, no. 3, pp. 237-246, Apr. 2009.

D. Vrabie, F.L. Lewis, D. Levine, “Neural Network-Based Adaptive Optimal Controller- A Continuous-Time Formulation -,” Proc. Int. Conf. Intelligent Control, Shanghai, Sept. 2008.

Page 52: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Doya, Kimura, Kawato 2001

Limbic system

Motor control 200 Hz

theta rhythms 4-10 HzDeliberativeevaluation

control

OPTIMALITY AND RESILIENCEOF BIOLOGICAL SYSTEMS

Page 53: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Simulation 1‐ F‐16 aircraft pitch rate controller

1.01887 0.90506 0.00215 00.82225 1.07741 0.17555 0

0 0 1 1x x u

Stevens and Lewis 2003

*1 11 12 13 22 23 33[p 2p 2p p 2p p ]

[1.4245 1.1682 -0.1352 1.4349 -0.1501 0.4329]

T

T

W

PBPBRQPAPA TT 10 ARE

Exact solution

,Q I R I

][ eqx

Select quadratic NN basis set for VFA

Page 54: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

0 0.5 1 1.5 2-0.3

-0.2

-0.1

0Control signal

Time (s)

0 0.5 1 1.5 2-0.4

-0.2

0Controller parameters

Time (s)

0 0.5 1 1.5 20

0.5

1

1.5

2

2.5

3

3.5

4System states

Time (s)

0 1 2 3 4 5 60

0.05

0.1

0.15

0.2

Critic parameters

Time (s)

P(1,1)P(1,2)P(2,2)P(1,1) - optimalP(1,2) - optimalP(2,2) - optimal

Simulations on:  F‐16 autopilot

A matrix not needed 

Converge to SS Riccati equation soln

Solves ARE online without knowing A

PBPBRQPAPA TT 10

Page 55: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1/ / 0 0 00 1/ 1/ 0 0

,1/ 0 1/ 1/ 1/

0 0 0 0

p p p

T T

G G G G

E

T K TT T

A BRT T T T

K

x Ax Bu

( ) [ ( ) ( ) ( ) ( )]Tg gx t f t P t X t E t

FrequencyGenerator outputGovernor positionIntegral control

Simulation 2: Load Frequency Control of Electric Power system

PBPBRQPAPA TT 10

0.4750 0.4766 0.0601 0.4751

0.4766 0.7831 0.1237 0.3829

0.0601 0.1237 0.0513 0.0298

0.4751 0.3829 0.0298 2.3370

.AREP

ARE

ARE solution using full dynamics model (A,B)

A. Al-Tamimi, D. Vrabie, Youyi Wang

Page 56: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

0.4750 0.4766 0.0601 0.4751

0.4766 0.7831 0.1237 0.3829

0.0601 0.1237 0.0513 0.0298

0.4751 0.3829 0.0298 2.3370

.AREP

PBPBRQPAPA TT 10

0 10 20 30 40 50 60-0.5

0

0.5

1

1.5

2

2.5P matrix parameters P(1,1),P(1,3),P(2,4),P(4,4)

Time (s)

P(1,1)P(1,3)P(2,4)P(4,4)

( ), ( ), ( : )x t x t T t t T Fifteen data points

Hence, the value estimate was updated every 1.5s.

IRL period of T= 0.1s.

Solves ARE online without knowing A

0.4802 0.4768 0.0603 0.4754

0.4768 0.7887 0.1239 0.3834

0.0603 0.1239 0.0567 0.0300

0.4754 0.3843 0.0300 2.3433

.critic NNP

0 1 2 3 4 5 6-0.1

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08

0.1System states

Time (s)

Page 57: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization
Page 58: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Adaptive Control

Plantcontrol output

Identify the Controller-Direct Adaptive

Identify the system model-Indirect Adaptive

Identify the performance value-Optimal Adaptive

)()( xWxV TOPTIMALITY YIELDSRESILIENCEROBUSTNESSPERFORMANCE

GUARANTEES

Page 59: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization
Page 60: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Multi-agent Networks on Communication GraphsRobustness of Optimal Design Reinforcement LearningCooperative Agents - Games on Communication GraphsCPS #1 – Resilient Distributed Control of Renewable Energy Microgrids

Page 61: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Sun Tz bin fa孙子兵法

Games on Communication Graphs

500 BC

Page 62: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Fast Decision UsingNew Concept of Graphical Games to limit each agent’s decision horizon

Faster decisions based on local informationUnder certain conditions, the local decisions are globally optimalNash Equilibrium on Graphs is defined

Speed up Multi-agent Decision-making by Imposing Sparse EfficientCommunication Structures

Bounded Rationality -

Page 63: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Manufacturing as the Interactions of Multiple AgentsEach machine has it own dynamics and cost functionNeighboring machines influence each other most stronglyThere are local optimization requirements as well as global necessities

( )i

i i i i i i ij j jj N

A d g B u e B u

12

0

( (0), , ) ( )i

T T Ti i i i i ii i i ii i j ij j

j N

J u u Q u R u u R u dt

Each process has its own dynamics

And cost function

Each process helps other processes achieve optimality and efficiency

Page 64: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

F.L. Lewis, H. Zhang, A. Das, K. Hengster-Movric, Cooperative Control of Multi-Agent Systems: Optimal Design and Adaptive Control, Springer-Verlag, 2014

Key Point

Lyapunov Functions and Performance IndicesMust depend on graph topology

Hongwei Zhang, F.L. Lewis, and Abhijit Das“Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback,”IEEE Trans. Automatic Control, vol. 56, no. 8, pp. 1948-1952, August 2011.

H. Zhang, F.L. Lewis, and Z. Qu, "Lyapunov, Adaptive, and Optimal Design Techniques for Cooperative Systems on Directed Communication Graphs," IEEE Trans. Industrial Electronics, vol. 59, no. 7, pp. 3026‐3041, July 2012.

Page 65: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

,i i i ix Ax B u

0 0x Ax

0( ) ( ),ix t x t i

0( ) ( ),i

i ij i j i ij N

e x x g x x

( ) ,nix t ( ) im

iu t

Graphical GamesSynchronization‐ Cooperative Tracker Problem

Node dynamics

Target generator dynamics

Synchronization problem

Local neighborhood tracking error (Lihua Xie)

x0(t)

( )i

i i i i i i ij j jj N

A d g B u e B u

12

0

( (0), , ) ( )i

T T Ti i i i i ii i i ii i j ij j

j N

J u u Q u R u u R u dt

12

0

( ( ), ( ), ( ))i i i iL t u t u t dt

Local nbhd. tracking error dynamics

Define Local nbhd. performance index

Local agent dynamics driven by neighbors’ controls

Values driven by neighbors’ controls

K.G. Vamvoudakis, F.L. Lewis, and G.R. Hudas, “Multi-Agent Differential Graphical Games: online adaptive learning solution for synchronization with optimality,” Automatica, vol. 48, no. 8, pp. 1598-1611, Aug. 2012.

Impose Optimality to get Resilience and Robustness

Page 66: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1u

2u

iu Control action of player i

Value function of player i

New Differential Graphical Game

( )i

i i i i i i ij j jj N

A d g B u e B u

State dynamics of agent i

Local DynamicsLocal Value Function

Only depends on graph neighbors

12

0

( (0), , ) ( )i

T T Ti i i i i ii i i ii i j ij j

j N

J u u Q u R u u R u dt

OPTIMAL CONTROL IS RESILIENTROBUSTHAS GUARANTEED STABILITY MARGINS

Page 67: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1

N

i ii

z Az B u

12

10

( (0), , ) ( )N

T Ti i i j ij j

j

J z u u z Qz u R u dt

1u

2u

iu Control action of player i

Central Dynamics

Value function of player i

Standard Multi-Agent Differential Game

Central DynamicsLocal Value Functiondepends on ALL

other control actions

Page 68: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1 1 11 1 2 3 1 2 1 3 13 3 3( ) ( ) ( ) coi

teamJ J J J J J J J J J

1 1 12 1 2 3 2 1 2 3 23 3 3( ) ( ) ( ) coi

teamJ J J J J J J J J J

1 1 13 1 2 3 3 1 3 2 33 3 3( ) ( ) ( ) coi

teamJ J J J J J J J J J

The objective functions of each player can be written as a team average term plus a conflict of interest term:

1 1

1 1

( ) , 1,N N

coii j i j team iN N

j j

J J J J J J i N

For N-player zero-sum games, the first term is zero, i.e. the players have no goals in common.

For N-players

Team Interest vs. Self InterestCooperation vs. Collaboration

Page 69: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

* * *( , ) ( , ), ,i j G j i j G jJ u u J u u i j N

New Definition of Nash Equilibrium for Graphical Games

* * *1 2, ,...,u u u

* * * *( , ) ( , ),i i i G i i i G iJ J u u J u u i N

Def:  Interactive Nash equilibrium

are in Interactive Nash equilibrium if

2.  There exists a policy         such that ju

1.

That is, every player can find a policy that changes the value of every other player. 

i.e. they are in Nash equilibrium

Theorem 3.  Let (A,Bi) be reachable for all i.  Let agent i be in local best response

Then                            are in global Interactive Nash iff the graph is strongly connected.

*( , ) ( , ),i i i i i iJ u u J u u i

* * *1 2, ,...,u u u

To restore the Symmetry of Nash equilibrium

Page 70: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1 1 12 2 2( , , , ) ( ) 0

i i

TT T Ti i

i i i i i i i i i ij j j i ii i i ii i j ij ji i j N j N

V VH u u A d g B u e B u Q u R u u R u

10 ( ) Ti ii i i ii i

i i

H Vu d g R B

u

2 1 2 1 11 1 12 2 2( ) ( ) 0,

i

TT Tj jc T T Ti i i

i i ii i i i i ii i j j j jj ij jj ji i i j jj N

V VV V VA Q d g B R B d g B R R R B i N

2 1 1( ) ( ) ,i

jc T Tii i i i i ii i ij j j j jj j

i jj N

VVA A d g B R B e d g B R B i N

* *( , , , ) 0ii i i i

i

VH u u

* 2 11 1 12 2 20 ( , , , ) ( )

i

T Tc T T Ti i i i

i i i i i i ii i i i i ii i j ij ji i i i j N

V V V VH u u A Q d g B R B u R u

2 1( )

i

c T ii i i i i ii i ij j j

i j N

VA A d g B R B e B u

12( ( )) ( )

i

T T Ti i i ii i i ii i j ij j

j Nt

V t Q u R u u R u dt

Value function

Differential equivalent (Leibniz formula) is Bellman’s Equation

Stationarity Condition

1.  Coupled HJ equations

2.  Best Response HJ Equations – other players have fixed policies

where

where

Graphical Game Solution Using Integral Reinforcement Learning

ju

To find online methods for GG

Iterate on these two equations

Page 71: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Online Solution of Graphical Games

Use Reinforcement LearningConvergence Results

POLICY ITERATION

Kyriakos Vamvoudakis

MULTI-AGENT LEARNING

Page 72: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Multi-agent Networks on Communication GraphsRobustness of Optimal Design Reinforcement LearningCooperative Agents - Games on Communication GraphsCPS #1 – Resilient Distributed Control of Renewable Energy Microgrids

Page 73: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Distributed Autonomy in Active Energy Distribution Systems

Ali Davoudi, F.L. Lewis, Chris Edrington

Supported by ONR Tony Seman & Lynn Petersen

Page 74: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

What is a micro‐grid?• Micro-grid is a small-scale power system that provides the power for

a group of consumers.• Micro-grid enables

local power support for local and critical loads.

• Micro-grid has the ability to work in both grid-connected and islanded modes.

• Micro-grid facilitates the integration of Distributed EnergyResources (DER).

Photo from: http://www.horizonenergygroup.com

Page 75: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

• The main building block of smart-grids

• Rural plants

• Business buildings, hospitals,and factories

Smart‐grid photo from: http://www.sustainable‐sphere.com

An introduction to micro‐grids: Micro‐grid applications

Page 76: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Distributed Generators (DG)Distributed Energy Resources (DER)

• Non-renewables Internal combustion engine Micro-turbines Fuel cells

• Renewables Photovoltaic Wind Hydroelectric Biomass

104

Page 77: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Micro‐grid Advantages

• Micro-grid provides high quality and reliable power to the criticalconsumers

• During main grid disturbances, micro-grid can quickly disconnectform the main grid and provide reliable power for its local loads

• DGs can be simply installed close to the loads which significantlyreduces the power transmission line losses

• By using renewable energy resources, a micro-grid reduces CO2emissions

105

Page 78: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Micro‐grid Hierarchical Control Structure

Tertiary ControlmodesOptimal operation in both operating

modes

Secondary Control

Primary Control

MicrogridTie

Power flow control in grid-tied mode

Voltage deviation mitigationFrequency deviation alleviation

Voltage stability provision

preservingFrequency stability

preservingPlug and play capability for DGs

Main grid

Do coop. ctrl. here toSynchronize frequencyand voltage 

Bidram, A., & Davoudi, A. (2012). Hierarchical structure of microgrids control system. IEEE Transactions on Smart Grid, vol. 3, pp. 1963‐1976, Dec 2012.

Maintains Stability

Page 79: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Cooperative Game-theoretic Control of Active Loads in DC Microgrids

Ling-ling Fan, Vahidreza Nasirian, Hamidreza Modares,

Frank L. Lewis, Yong-duan Song, and Ali Davoudi,

3t

2t

1t

3t

2t

1t

e

outp

inp

e

outp

3t

2t

1t

inp

Power buffer operation during a step change in power demand.

18r

48r

58r

59r

47r

27r

67r

69r

39r

iv i

p

s1vs1

r

iu

ie

Power buffers in Microgrid Network

Page 80: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

2

,i

i ii

i i

ve p

rr u

ìïïï = -ïíïï =ïïî

Active Load Power Buffer

Stored energyInput impedanceBus voltage Control input Output power = a disturbance

ieir

iviu

ip

Vahid Nasirian

Page 81: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

2

0

d , 1, , ,i

i j ij j i ij N

J u t i M M Nr¥

Î

æ ö÷ç ÷ç= + = + +÷ç ÷ç ÷çè øåò x Q xT

Define coupled performance indices

( )

2q q

1( )q

0 00 2 1

1 00 0 0

0 10 0 0

2 0 ,

0

i i i ii

i ii ii i

i i i i

i i

M N

ij jj M i

i

e ei i

r r u w

p p

r

i

g

g+

= + ¹

é ùé ù é ù é ù é ù- -ê úê ú ê ú ê ú ê úê úê ú ê ú ê ú ê úê ú= + + +ê ú ê ú ê ú ê úê úê ú ê ú ê ú ê úê úê ú ê ú ê ú ê úê úë û ë û ë û ë ûë û

é ùê úê úê úê ú+ ê úê úê úê úë û

å

x x B DA

1, , ,i M M N= + +

Solve for bus voltage to get coupled agent dynamics

Define Communication GraphSparse efficient topologyOptimal design provides Resilience

and disturbance rejection

Vahid NasirianReza Modares

Dr. Ali Davoudi

Page 82: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

111

Microgrid

DG 1DG 2 DG 3

DG 4

DG 5

DG 6DG 7

DG 8

DG 1DG 2 DG 3

DG 4

DG 5

DG 6

DG 8

DG 7

Communication link

Cybercommunication

framework

Micro‐grid secondary control:Distributed CPS structure

Physical LayerThe interconnect structure of the power grid

Cyber layerA sparse, efficient communication network to allow

cooperative control for synchronization ofvoltage and frequency

Work of Ali BidramWith Dr. A. Davoudi

Cyber Physical System (CPS)

Resilience Because of-Sparse, Efficient communication networkOptimal Design- Graph GamesDistributed computation – no SPOF

Page 83: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Case Study

Physical Microgrid Layer

Cyber Communication Layer

A 48 V DC distribution network with three active loads and three sources

Page 84: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

1 2 3 4 5 6 7 8 9

47

48

49

Time (s)

MG

Vol

tage

(V)

1 2 3 4 5 6 7 8 960

80

100

120

Time (s)

Buf

fer V

olta

ge (V

)

1 2 3 4 5 6 7 8 9

47

48

49

Time (s)

Load

Vol

tage

(V)

1 2 3 4 5 6 7 8 91.5

22.5

33.5

Time (s)

Sou

rce

Cur

rent

(A)

1 2 3 4 5 6 7 8 910

15

20

25

Time (s)

Sto

red

Ene

rgy

(J)

1 2 3 4 5 6 7 8 90

10

20

30

Time (s)

Res

ista

nce

( )

1 2 3 4 5 6 7 8 950

100

150

Time (s)O

utpu

t Pow

er (W

)

10 20 3010

15

20

25

Sto

red

Ene

rgy

(J)

10 20 30Input Resistance ( )

10 20 30

Ω

Buffer 4 Buffer 5 Buffer 6

Ω

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

s1i

s2i

s3i

b4v

b5v

b6v

4v

5v

6v

o4v

o5v

o6v

4e

5e

6e

4r

5r

6r

4p5p

6p

A B DC

Load change at terminal 5

Page 85: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Optimal Leader-Follower TrackersCPS #2 – Resilient Human-robot Interaction Systems

Page 86: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

On‐policy RL

116

Target and behaviorpolicy

SystemRef.

Target policy: The policy that we are learning about.

Behavior policy: The policy that generates actions and behavior

Target policy and behavior policy are the same

Page 87: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Off‐policy RL

117

Behavior Policy System

Target policy

Ref.

Target policy and behavior policy are different

Humans can learn optimal policies while actually applying suboptimal policies

Page 88: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Off-policy IRLHumans can learn optimal policies while actually applying suboptimal policies

( ) ( )x f x g x u

( ) ( ), ( )t

J x r x u d

[ [ [] [ ]] ]( ( )) ( ( )) ( )i i

t T t

t i i

T

t TJ x t J x t T Q x d Ru du

On-policy IRL

[ 1] 1 [ ]12

i T ixu R g J

[ ] [ ]( )i ix f gu g u u Off-policy IRL

[ ] [ ] [ ] [ ] [ 1] [ ]( ( )) ( ( )) ( ) 2 ( )t t tii i

t T t T t T

T i i T iJ x t J x t T Q x d R d du u u R u u

This is a linear equation for andThey can be found simultaneously online using measured data using Kronecker product and VFA

[ ]iJ [ 1]iu

1. Completely unknown system dynamics2. Can use applied u(t) for –

disturbance rejection – Z.P. Jiang - R. Song and Lewis, 2015robust control – Y. Jiang & Z.P. Jiang, IEEE TCS 2012exploring probing noise – without bias ! - J.Y. Lee, J.B. Park, Y.H. Choi 2012

DDO

Yu Jiang & Zhong-Ping Jiang, Automatica 2012

system

value

Must know g(x)

Page 89: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Augmented system state: ( ) ( ) ( )T

T T

dX t x t y té ù= ê úë û

Augmented system: 1

A BX X u T X B u

F

é ù é ùê ú ê ú= + º +ê ú ê úê ú ê úë û ë û

00 0

Value function:

( ) ( )1 1

( ) ( ) ( )2 2

t T T T

Tt

V X t e X Q X u R u d X t P X tg t t¥

- - é ù= + =ê úë ûò

1 1 1, [ ]TTQ C QC C C I= = -

LQT Bellman equation:

1 10 ( ) ( )T T T T T

TTX B u P X X P TX B u X P X X Q X u Rug= + + + - + +

121

Assume the reference trajectory is generated by d dy Fy

x A x B u

y C x

= +=

System Dynamics

LQ Tracker Problem for Continuous-time Systems

Page 90: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Online solution to the CT LQT ARE: Off‐policy IRL

1 1( )i

iX TX B u T X B K X u= + = + + ,

1

i

iT T B K= -

( )

( ) ( )1

1

( ) ( ) ( ) ( ) ( )

( ) 2 ( )

t TT T i T i t T i

t

t T t Tt T iT i t i T T i

Tt t

iRK

de X t T P X t T X t P X t e X P X d

d

e X Q K RK X d e u K X B P X d

g g t

g t g t

tt

t t

+- - -

+ +- - - -

+

+ + - = -

= - + + +

ò

ò ò

Off-policy IRL Bellman equation

Algorithm. Online Off-policy IRL algorithm for LQT

Online step: Apply a fixed control input and collect some data

Offline step: Policy evaluation and improvement using LS on collected data

( )( ) ( ) ( ) ( )t T

T T i T i t Tit

e X t T P X t T X t P X t e X Q X dg g t t+

- - -+ + - =- +ò

( ) 12 ( )t T

t i T i

te u K X RK X dg t t

+- - ++ò No knowledge of the dynamics is required

125

Humans can learn optimal policies while actually applying suboptimal policies

Page 91: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Convergence

Lemma: The Bellman equation, the IRL Bellman equation and the off-policy IRL

Bellman equation have the same solution for the value function and the control

update law.

Theorem. The IRL Algorithm and Off-policy IRL Algorithm for the LQT problem

converge to the optimal solution.

127

Page 92: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Optimal Leader-Follower TrackersCPS #2 – Resilient Human-robot Interaction Systems

Page 93: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

PR2 meets Isura

Page 94: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

H. Modares, I. Ranatunga, F.L. Lewis, and D.O. Popa, “Optimized Assistive Human-robot Interaction using Reinforcement Learning,” IEEE Transactions on Cybernetics, to appear 2015.

RL for Human-Robot Interactions

RobotManipulator

TorqueDesign

PrescribedImpedance Model

xhf

mx

e

th

KHuman

FeedforwardControl

‐d

x+

-

Performance

Robot-specific inner control loop

Task-specific outer-loop control designUsing IRL Optimal Tracking Implemented on PR2 robot

2-Loop HRI Learning Controller Based on Human Factors Performance StudiesHumans learn 2 components in HRI-

1. must learn to compensate for nonlinear robot dynamics2. must learn to perform the task

Our result - The Robot adapts in order to minimize human effort

Page 95: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Robot control inner loopusing Neural Adaptive Learning

Task control outer loopusing Online Reinforcement Learning

2-Loop HRI Learning Controller Based on Human Factors Performance StudiesThe Robot adapts in order to minimize human effort

Human-Robot Interaction Using Reinforcement Learning

Page 96: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization
Page 97: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Modeling and Control of Adversarial Networked TeamsBipartite Synchronization on Antagonistic Communication GraphsZero-Sum GamesMulti-agent Pursuit-Evasion Games

Page 98: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Modeling and Control of Adversarial Networked TeamsBipartite Synchronization on Antagonistic Communication GraphsZero-Sum GamesMulti-agent Pursuit-Evasion Games

Page 99: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

2

0

2

0

2

0

2

0

2

)(

)(

)(

)(

dttd

dtuhh

dttd

dttz T

System( ) ( ) ( )( )

TT T

x f x g x u k x dy h x

z y u

( )u l x

d

u

z

x control

Performance output disturbance

Find control u(t) so that 

For all L2 disturbancesAnd a prescribed gain 

L2 Gain Problem

H‐Infinity Control Using Neural Networks

Zero‐Sum differential gameNature as the opposing player

Disturbance Rejection

TwoAntagonistic Inputs

Page 100: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

2* 2

0

( (0)) min max ( (0), , ) min max ( ) ( )T T

u ud dV x V x u d h x h x u Ru d dt

Define 2‐player zero‐sum game as

min max ( (0), , ) max min ( (0), , )u ud d

V x u d V x u d

The game has a unique value (saddle‐point solution) iff the Nash condition holds

Optimal Game Design YieldsResilienceRobustnessGuaranteed PerformanceDisturbance Rejection

TwoAntagonistic Inputs

Page 101: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

control node v

Cooperative and Adversarial Multi-agent NetworkAdversarial disturbance at each agent

Agents want to cooperate to synchronize to the control leader

Zero-sum Graphical Games

Page 102: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

i i i i

i i

x Ax Bu Ddy Cx

( ) ( ) ( )dy (I )

I A I B u I DC

Cooperative Synchronization0 0

0 0

x Axy Cx

System

leader

0x Ix

2( ) T Tz t Q u Ru

H

Performance output

2

0 0

0 0

2

2

( ) ( )

( ) ( ) ( )

T T

TT

z t dt Q u Ru dt

d t dt d t Td t dt

Bounded L2 synchronization error  

2

0

( , , ) ( )T T TJ u d Q u Ru d Td dt

H‐inf performance index

Resilience to Network Disturbances

Page 103: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Condition on graph topology

Condition for existenceof local OPFB

1 1( ) PR L G

2 2 22TR K C B P M

Kristian Hengster-MovricCooperative SynchronizationH

J. Gadewadikar, Frank L. Lewis, L. Xie, V. Kucera and M. Abu-Khalaf, “Parameterization of all stabilizing H∞static state-feedback gains: Application to output-feedback design,” Automatica, vol. 43, no. 9, pp. 1597-1604, September 2007

Page 104: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Condition on graph topology

Condition for existenceof local OPFB

1 1( ) PR L G

2 2 22TR K C B P M

ye L G C

Fine Local OPFB structure Coarse global OPFB structure

Symmetry between OPFB and Graph Topology Information Restrictions

Same Same

Communication Network Limits Available Information Exchange

Communication Network Tries to Destroy OptimalityProper Controls Design Restores Optimality

0(y ) (y )i

i i yi ij j i i ij N

u cKC cKe cK e y g y

i iy Cx

Network info flow restrictions

Local system measurement restrictions

Page 105: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

0 0

0 0

,,

x Axy Cx

2-player Zero-sum Nash Equilibrium

,,

i i i i

i i

x Ax Bu Ddy Cx

Command generator

Agent Dynamics

Nash Equilibrium

2

0

( , , ) ( )T T TJ u d Q u Ru d Td dt

H‐inf performance index

Resilience to Network Disturbances

Page 106: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

control node v

Cooperative and Adversarial Multi-agent NetworkAdversarial disturbance at each agent

Agents want to cooperate to synchronize to the control leader

Zero-sum Graphical Games

Page 107: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

Theorem 3. Under the hypotheses of Theorem 2, suppose that in addition matrix satisfies2M

* * * *2 2 2 2 2 2 2 2 2 2 2( ) ( ) ( ) ( ) 0T T T T TC K K R K K C M K K C C K K M

for any *2 2K K

* *2( )u c L G K C

* 2 1 Td T D P

Then a Nash equilibrium is provided by

2

0

( , , ) ( )T T TJ u d Q u Ru d Td dt

with Q,R,T given in Theorem 2 and

1 2 12 2 2 2 2 2 2 2 2 22 0T T T TA P P A Q P BR B P P DD P M R M

12 2 2

*2 ( )TK C R B P M

1 1 ( )P cR L G

1 2P P P

Kristian Hengster-Movric

Optimal H-inf control is distributed

Worst case disturbance is NOT distributed

2-player Zero-sum Nash Equilibrium,

,i i i i

i i

x Ax Bu Ddy Cx

Agent Dynamics

Page 108: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization

The cloud-capped towers, the gorgeous palaces,The solemn temples, the great globe itself,Yea, all which it inherit, shall dissolve,And, like this insubstantial pageant faded,Leave not a rack behind.

We are such stuff as dreams are made on, and our little life is rounded with a sleep.

Our revels now are ended. These our actors, As I foretold you, were all spirits, and Are melted into air, into thin air.

Prospero, in The Tempest, act 4, sc. 1, l. 152-6, Shakespeare

Page 109: Resilient Control in - UTA lectures/2015 ISRCS Resilience … · Algebraic Graph Theory. ... Synchronization of Trust Opinions in Multi-agent Networks Diurnal Rhythm Synchronization