Download pdf - Synchronization of coupled oscillators is a game

CSLCOORDINATED SCIENCE LABORATORY

Synchronization of coupled oscillators is a game

Prashant G. Mehta1

1Coordinated Science LaboratoryDepartment of Mechanical Science and Engineering

University of Illinois at Urbana-Champaign

University of Maryland, March 4, 2010

Acknowledgment: AFOSR, NSF

Huibing Yin Sean P. Meyn Uday V. Shanbhag

H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010

P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 2 / 69

Millennium bridge

Video of London Millennium bridge from youtube

[11] S. H. Strogatz et al., Nature, 2005


http://www.youtube.com/watch?v=gQK21572oSU

Classical Kuramoto model

dθi(t) =

(ωi +

κ

N

N

∑j=1

sin(θj(t)−θi(t))

)dt +σ dξi(t), i = 1, . . . ,N

ωi taken from distribution g(ω) over [1− γ,1+ γ]γ — measures the heterogeneity of the population

κ — measures the strength of coupling

[6] Y. Kuramoto (1975)



dθi(t) =

(ωi +

κ

N

N

∑j=1


)dt +σ dξi(t), i = 1, . . . ,N


κ — measures the strength of coupling 1- 1+1




dθi(t) =

(ωi +

κ

N

N

∑j=1


)dt +σ dξi(t), i = 1, . . . ,N






dθi(t) =

(ωi +

κ

N

N

∑j=1


)dt +σ dξi(t), i = 1, . . . ,N



0 0.1 0.20.1

0.15

0.2

0.25

0.3 Locking

Incoherence

κ

κ < κc(γ)

R

γ

Synchrony

Incoherence



Movies of incoherence and synchrony solution

−1

0.8

0.6

0.4

0.2

0

0.2

0.4

0.6

0.8

1

−1

0.8

0.6

0.4

0.2

0

0.2

0.4

0.6

0.8

1

Incoherence Synchrony


Problem statement

Dynamics of ith oscillator

dθi = (ωi +ui(t))dt +σ dξi, i = 1, . . . ,N, t ≥ 0

ui(t) — control 1- 1+1

ith oscillator seeks to minimize

ηi(ui;u−i) = limT→∞

1T

∫ T

0E[ c(θi;θ−i)︸︷︷︸

cost of anarchy

+ 12 Ru2

i︸︷︷︸cost of control

]ds

θ−i = (θj)j6=iR — control penalty

c(·) — cost function

c(θi;θ−i) =1N ∑

j 6=ic•(θi,θj), c• ≥ 0


Problem statement






1T

∫ T

0E[ c(θi;θ−i)︸︷︷︸

cost of anarchy

+ 12 Ru2


]ds




j 6=ic•(θi,θj), c• ≥ 0


Problem statement






1T

∫ T

0E[ c(θi;θ−i)︸︷︷︸

cost of anarchy

+ 12 Ru2


]ds




j 6=ic•(θi,θj), c• ≥ 0


1 MotivationWhy a game?Why Oscillators?

2 Problems and resultsProblem statementMain results

3 Derivation of modelOverviewDerivation stepsPDE model

4 Analysis of phase transitionIncoherence solutionBifurcation analysisNumerics

5 LearningQ-function approximationSteepest descent algorithm

Motivation Why a game?

Quiz

In the video you just watched, why were theindividuals walking strangely?

A. To show respect to the Queen.B. Anarchists in the crowd were trying to destabilize the bridge.C. They were stepping to the beat of the soundtrack "Walk Like an

Egyptian."D. The individuals were trying to maintain their balance.



Quiz






Quiz






Quiz






Quiz






“Rational irrationality”

“—behavior that, on the individual level, is perfectly reasonable butthat, when aggregated in the marketplace, produces calamity.”

ExamplesMillennium bridgeFinancial market

John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009


http://www.newyorker.com/reporting/2009/10/05/091005fa_fact_cassidy


“Rational irrationality”

“—behavior that, on the individual level, is perfectly reasonable butthat, when aggregated in the marketplace, produces calamity.”

ExamplesMillennium bridgeFinancial market

John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009


http://www.newyorker.com/reporting/2009/10/05/091005fa_fact_cassidy






Motivation Why Oscillators?

Hodgkin-Huxley type Neuron model

CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004




CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train





CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

−100

−50

0

50

100

0

0.2

0.4

0.6

0.8

10

0.1

0.2

0.3

0.4

Vh

r

Limit cyle

r

h v





CdVdt

=−gT ·m2∞(V) ·h · (V−ET)

−gh · r · (V−Eh)− . . . . . .

dhdt

=h∞(V)−h

τh(V)drdt

=r∞(V)− r

τr(V)2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000

−150

−100

−50

0

50

100

Voltage

time

Neural spike train

−100

−50

0

50

100

0

0.2

0.4

0.6

0.8

10

0.1

0.2

0.3

0.4

Vh

r

Limit cyle

r

h v

Normal form reduction−−−−−−−−−−−−−→

θi = ωi +ui ·Φ(θi)








Problems and results Problem statement

Finite oscillator model






1T

∫ T

0E[ c(θi;θ−i)︸︷︷︸

cost of anarchy

+ 12 Ru2


]ds

θ−i = (θj)j6=iR — control penaltyc(·) — cost function


j 6=ic•(θi,θj), c• ≥ 0







Problems and results Main results

1. Synchronization is a solution of game

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

Incoherence

dθi = (ωi +ui)dt +σ dξi


1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

1- 1+1

Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992




Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

Incoherence



1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds

0 0.1 0.20.1

0.15

0.2

0.25

0.3 Locking

Incoherence

κ

κ < κc(γ)

R

γ

Synchrony

Incoherence

dθi =

(ωi +

κ

N

N

∑j=1

sin(θj−θi)

)dt +σ dξi

Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992



2. Kuramoto control is approximately optimal

−0.2

0

0.2

0.4

0.6

ω = 1

Kuramoto

PopulationDensity

Control laws

0 π 2π θ

ui =−A∗iR

1N ∑

j 6=isin(θ −θj(t))

0 50 100 150 200 250 3002

2.5

3

3.5

4

4.5

5

5.5

6

t

k = 0.01; R = 1000

Ai

A*

Learning algorithm:dAi

dt=−ε . . .

Yin et.al. CDC 2010







Derivation of model Overview

Overview of model derivation

dθi = (ωi +ui(t))dt +σ dξi


1T

∫ T

0E[c(θi, t)+ 1

2 Ru2i ]ds

Influence

Influence

Mass

1 Mean-field approximationAssumption:

c(θi;θ−i(t)) =1N ∑

j6=ic•(θi,θj)

N→∞−−−−−−→ c(θ , t)

2 Optimal control of single oscillatorDecentralized control structure

[5] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM]







Derivation of model Derivation steps

Single oscillator with given cost

Dynamics of the oscillator

dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0

The cost function is assumed known

ηi(ui; c) = limT→∞

1T

∫ T

0E[ c(θi;θ−i) + 1

2 Ru2i (s)]ds

⇑c(θi(s),s)

HJB equation:

∂thi +ωi∂θ hi =1

2R(∂θ hi)2− c(θ , t)+η

∗i −

σ2

2∂

2θθ hi

Optimal control law: u∗i (t) = ϕi(θ , t) =− 1R

∂θ hi(θ , t)

[1] D. P. Bertsekas (1995); [9] S. P. Meyn, IEEE TAC, 1997



Single oscillator with optimal control

Dynamics of the oscillator

dθi(t) =(

ωi−1R

∂θ hi(θi, t))

dt +σ dξi(t)

Fokker-Planck equation for pdf p(θ , t,ωi)

FPK: ∂tp+ωi∂θ p =1R

∂θ [p(∂θ h)]+σ2

2∂

2θθ p

[7] A. Lasota and M. C. Mackey, “Chaos, Fractals and Noise,” Springer 1994



Mean-field Approximation

HJB equation for population

∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t)+η(ω)− σ2

2∂

2θθ h h(θ , t,ω)

Population density

∂tp+ω∂θ p =1R

∂θ [p(∂θ h)]+σ2

2∂

2θθ p p(θ , t,ω)

Enforce cost consistency

c(θ , t) =∫

Ω

∫ 2π

0c•(θ ,ϑ)p(ϑ , t,ω)g(ω)dϑ dω

≈ 1N ∑

j 6=ic•(θ ,ϑ)







Derivation of model PDE model

Summary

HJB: ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t) +η

∗− σ2

2∂

2θθ h ⇒ h(θ , t,ω)

FPK: ∂tp+ω∂θ p =1R

∂θ [p( ∂θ h )]+σ2

2∂

2θθ p ⇒ p(θ , t,ω)

Mean-field approx.: c(ϑ , t) =∫

Ω

∫ 2π

0c•(ϑ ,θ) p(θ , t,ω) g(ω)dθ dω

1 Bellman’s optimality principle (H,J,B)2 Propagation of chaos (F,P,K, Mckean, Vlasov,. . . )3 Mean-field approximation (Boltzmann, Kac,. . . )4 Connection to Nash game (Weintraub, HCM, Altman,. . . )



Summary


2R(∂θ h)2− c(θ , t) +η

∗− σ2

2∂

2θθ h ⇒ h(θ , t,ω)


∂θ [p( ∂θ h )]+σ2

2∂

2θθ p ⇒ p(θ , t,ω)


Ω

∫ 2π





Summary


2R(∂θ h)2− c(θ , t) +η

∗− σ2

2∂

2θθ h ⇒ h(θ , t,ω)


∂θ [p( ∂θ h )]+σ2

2∂

2θθ p ⇒ p(θ , t,ω)


Ω

∫ 2π





Summary


2R(∂θ h)2− c(θ , t) +η

∗− σ2

2∂

2θθ h ⇒ h(θ , t,ω)


∂θ [p( ∂θ h )]+σ2

2∂

2θθ p ⇒ p(θ , t,ω)


Ω

∫ 2π





Summary


2R(∂θ h)2− c(θ , t) +η

∗− σ2

2∂

2θθ h ⇒ h(θ , t,ω)


∂θ [p( ∂θ h )]+σ2

2∂

2θθ p ⇒ p(θ , t,ω)


Ω

∫ 2π





1. Solution of PDE gives ε-Nash equilibrium

Optimal control law

uoi =− 1

R∂θ h(θ(t), t,ω)

∣∣ω=ωi

ε-Nash property (as N→ ∞)

ηi(uoi ;uo−i)≤ ηi(ui;uo

−i)+O(1√N

), i = 1, . . . ,N.

So, we look for solutions of PDEs.




Optimal control law

uoi =− 1


∣∣ω=ωi



−i)+O(1√N

), i = 1, . . . ,N.





Optimal control law

uoi =− 1


∣∣ω=ωi



−i)+O(1√N

), i = 1, . . . ,N.




2. Incoherence solution (PDE)

Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

2π

incoherence

h(θ , t,ω) = 0 ⇒ ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t)+η

∗− σ2

2∂

2θθ h

∂tp+ω∂θ p =1R

∂θ [p(∂θ h)]+σ2

2∂

2θθ p

c(θ , t) =∫

Ω

∫ 2π





Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

2π

incoherence

h(θ , t,ω) = 0⇒ ∂th+ω∂θ h =1

2R(∂θ h)2− c(θ , t)+η

∗− σ2

2∂

2θθ h

p(θ , t,ω) = 12π⇒ ∂tp+ω∂θ p =

1R

∂θ [p(∂θ h)]+σ2

2∂

2θθ p

c(θ , t) =∫

Ω

∫ 2π





Assume c•(ϑ ,θ) = c•(ϑ −θ) = 12 sin2

(ϑ −θ

2

)Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

2π

Optimal control u =− 1R

∂θ h = 0

Average cost

c(θ , t) =∫

Ω

∫ 2π

0

12 sin2

(θ −ϑ

2

)1

2πg(ω)dϑ dω

η∗(ω) = c(θ , t) =

14

=: η0 for all ω ∈Ω

incoherence soln.

No cost of control



2. Incoherence solution (Finite population)

Closed-loop dynamics dθi = (ωi + ui︸︷︷︸=0

)dt +σ dξi(t)

Average cost

ηi = limT→∞

1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i︸︷︷︸

=0

]dt

= limT→∞

1N ∑

j 6=i

1T

∫ T

0E[ 1

2 sin2(

θi(t)−θj(t)2

)]dt

=1N ∑

j6=i

∫ 2π

0E[ 1

2 sin2(

θi(t)−ϑ

2

)]

12π

dϑ =N−1

Nη0

−1

0.8

0.6

0.4

0.2

0

0.2

0.4

0.6

0.8

1

incoherence

ε-Nash property


−i)+O(1√N

), i = 1, . . . ,N.



2. Incoherence solution (Finite population)

Closed-loop dynamics dθi = (ωi + ui︸︷︷︸=0

)dt +σ dξi(t)

Average cost

ηi = limT→∞

1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i︸︷︷︸

=0

]dt

= limT→∞

1N ∑

j 6=i

1T

∫ T

0E[ 1

2 sin2(

θi(t)−θj(t)2

)]dt

=1N ∑

j6=i

∫ 2π

0E[ 1

2 sin2(

θi(t)−ϑ

2

)]

12π

dϑ =N−1

Nη0

−1

0.8

0.6

0.4

0.2

0

0.2

0.4

0.6

0.8

1

incoherence

ε-Nash property


−i)+O(1√N

), i = 1, . . . ,N.




Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

IncoherenceR−1/ 2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.350. 1

0.15

0. 2

0.25

ω= 0.95

ω= 1

ω= 1.05

R > Rc

η(ω) = η0

R < R

c

η(ω) < η0

c



1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds η(ω) = min

uiηi(ui;uo

−i)

0 1 2 3 4 5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1t = 38.24

Synchrony solution of

Yin et al., “Synchronization of oscillators is a game,” ACC2010P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 29 / 69



Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

IncoherenceR−1/ 2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.350. 1

0.15

0. 2

0.25

ω= 0.95

ω= 1

ω= 1.05

R > Rc

η(ω) = η0

R < R

c

η(ω) < η0

c

incoherence soln.



1T

∫ T

0E[c(θi;θ−i)+ 1


uiηi(ui;uo

−i)

0 1 2 3 4 5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1t = 38.24





Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

IncoherenceR−1/ 2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.350. 1

0.15

0. 2

0.25

ω= 0.95

ω= 1

ω= 1.05

R > Rc

η(ω) = η0

R < R

c

η(ω) < η0

c

synchrony soln.



1T

∫ T

0E[c(θi;θ−i)+ 1


uiηi(ui;uo

−i)

0 1 2 3 4 5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1t = 38.24








Analysis of phase transition Incoherence solution

Overview of the steps


2R(∂θ h)2− c(θ , t) +η

∗− σ2

2∂

2θθ h ⇒ h(θ , t,ω)


∂θ [p( ∂θ h )]+σ2

2∂

2θθ p ⇒ p(θ , t,ω)

c(ϑ , t) =∫

Ω

∫ 2π


Assume c•(ϑ ,θ) = c•(ϑ −θ) = 12 sin2

(ϑ −θ

2

)Incoherence solution

h(θ , t,ω) = h0(θ) := 0 p(θ , t,ω) = p0(θ) :=1

2π







Analysis of phase transition Bifurcation analysis

Linearization and spectra

Linearized PDE (about incoherence solution)

∂

∂ tz(θ , t,ω) =

(−ω∂θ h− c− σ2

2 ∂ 2θθ

h−ω∂θ p+ 1

2πR ∂ 2θθ

h+ σ2

2 ∂ 2θθ

p

)=: LRz(θ , t,ω)

Spectrum of the linear operator1 Continuous spectrum S(k)+∞

k=−∞

S(k) :=λ ∈ C

∣∣λ =±σ2

2k2− kωi for all ω ∈Ω

2 Discrete spectrum

Characteristic eqn:1

8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.





∂

∂ tz(θ , t,ω) =

(−ω∂θ h− c− σ2

2 ∂ 2θθ

h−ω∂θ p+ 1

2πR ∂ 2θθ

h+ σ2

2 ∂ 2θθ

p

)=: LRz(θ , t,ω)


k=−∞

S(k) :=λ ∈ C

∣∣λ =±σ2


2 Discrete spectrum


8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.





∂

∂ tz(θ , t,ω) =

(−ω∂θ h− c− σ2

2 ∂ 2θθ

h−ω∂θ p+ 1

2πR ∂ 2θθ

h+ σ2

2 ∂ 2θθ

p

)=: LRz(θ , t,ω)


k=−∞

S(k) :=λ ∈ C

∣∣λ =±σ2


−0.2 −0.1 0 0.1 0.2 0.3

−3

−2

−1

0

1

2

3

real

ima

g

γ = 0.1

R decreases

k=2 k=2

k=1 k=1

2 Discrete spectrum


8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.





∂

∂ tz(θ , t,ω) =

(−ω∂θ h− c− σ2

2 ∂ 2θθ

h−ω∂θ p+ 1

2πR ∂ 2θθ

h+ σ2

2 ∂ 2θθ

p

)=: LRz(θ , t,ω)


k=−∞

S(k) :=λ ∈ C

∣∣λ =±σ2


−0.2 −0.1 0 0.1 0.2 0.3

−3

−2

−1

0

1

2

3

real

ima

g

γ = 0.1

R decreases

k=2 k=2

k=1 k=1

2 Discrete spectrum


8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.



Bifurcation diagram (Hamiltonian Hopf)


8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

Stability proof

[3] Dellnitz et al., Int. Series Num. Math., 1992





8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

Stability proof

−0.2 −0.1 0 0.1 0.2

-0.6

-0.8

-1

-1.2

-1.4

real

imag

(a)

Cont. spectrum; ind. of RDisc. spectrum; fn. of R

Bifurcation point






8R

∫Ω

g(ω)

(λ − σ2

2 +ωi)(λ + σ2

2 +ωi)dω +1 = 0.

Stability proof

−0.2 −0.1 0 0.1 0.2

-0.6

-0.8

-1

-1.2

-1.4

real

imag

(a)

Cont. spectrum; ind. of RDisc. spectrum; fn. of R

Bifurcation point

0 0.05 0.1 0.15 0.215

20

25

30

35

40

45

50

Incoherence

R > RR

c(γ

γ

) (c))

Synchrony

0.05








Analysis of phase transition Numerics

Numerical solution of PDEs

Incoherence; R = 60incoherence

incoherence



Numerical solution of PDEs

Incoherence; R = 60incoherence

incoherence

Synchrony; R = 10

synchrony

synchrony



Bifurcation diagram

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

Incoherence R−1/2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.35

0. 1

0.15

0. 2

0.25

ω = 0.95

ω = 1

ω = 1.05

R > Rc

η(ω) = η0

R < Rc

η(ω) < η0



1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds



Bifurcation diagram

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

Incoherence R−1/2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.35

0. 1

0.15

0. 2

0.25

ω = 0.95

ω = 1

ω = 1.05

R > Rc

η(ω) = η0

R < Rc

η(ω) < η0

incoherence soln.



1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds



Bifurcation diagram

Locking

0 0.1 0.2

0.15

0.2

0.25

R−1/ 2

γγIncoherence

R > Rc(γ)

Synchrony

Incoherence R−1/2

η(ω)

0. 1 0.15 0. 2 0.25 0. 3 0.35

0. 1

0.15

0. 2

0.25

ω = 0.95

ω = 1

ω = 1.05

R > Rc

η(ω) = η0

R < Rc

η(ω) < η0

synchrony soln.



1T

∫ T

0E[c(θi;θ−i)+ 1

2 Ru2i ]ds







Learning Q-function approximation

Comparison to Kuramoto law

Control law u = ϕ(θ , t,ω)

−0.2

0

0.2

0.4

0.6

ω = 0.95

ω = 1

ω = 1.05

PopulationDensity

Control laws

0 π 2π θ

Equivalent control law in Kuramoto oscillator

u(Kur)i =

κ

N

N

∑j=1

sin(θj(t)−θi)N→∞

≈ κ0 sin(ϑ0 + t−θi)



Comparison to Kuramoto law

Control law u = ϕ(θ , t,ω)

−0.2

0

0.2

0.4

0.6

ω = 0.95

ω = 1

ω = 1.05

Kuramoto

Population

Density

Control laws

0 π 2π θ

Equivalent control law in Kuramoto oscillator

u(Kur)i =

κ

N

N

∑j=1

sin(θj(t)−θi)N→∞

≈ κ0 sin(ϑ0 + t−θi)



Optimality equation minuic(θ ;θ−i(t))+ 1

2 Ru2i +Duihi(θ , t)︸︷︷︸

=: Hi(θ ,ui;θ−i(t))

= η∗i

Optimal control law Kuramoto law

u∗i =− 1R

∂θ hi(θ , t) u(Kur)i =−κ

N ∑j 6=i

sin(θi−θj(t))

Parameterization:

H(Ai,φi)i (θ ,ui;θ−i(t))= c(θ ;θ−i(t))+ 1

2 Ru2i +(ωi−1+ui)AiS(φi)+

σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑

j 6=isin(θ −θj−φ), C(φ)(θ ,θ−i) =

1N ∑

j 6=icos(θ −θj−φ)

Approx. optimal control:

u(Ai,φi)i = argmin

ui

H(Ai,φi)i (θ ,ui;θ−i(t))=−Ai

RS(φi)(θ ,θ−i)






= η∗i


u∗i =− 1R


N ∑j 6=i

sin(θi−θj(t))

Parameterization:



σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑


1N ∑



u(Ai,φi)i = argmin

ui


RS(φi)(θ ,θ−i)






= η∗i


u∗i =− 1R


N ∑j 6=i

sin(θi−θj(t))

Parameterization:



σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑


1N ∑



u(Ai,φi)i = argmin

ui


RS(φi)(θ ,θ−i)






= η∗i


u∗i =− 1R


N ∑j 6=i

sin(θi−θj(t))

Parameterization:



σ2

2AiC(φi)

where

S(φ)(θ ,θ−i) =1N ∑


1N ∑



u(Ai,φi)i = argmin

ui


RS(φi)(θ ,θ−i)







Learning Steepest descent algorithm

Bellman error:

Pointwise: L (Ai,φi)(θ , t) = minuiH(Ai,φi)

i −η(A∗i ,φ

∗i )

i

Simple gradient descent algorithm

e(Ai,φi) =2

∑k=1|〈L (Ai,φi), ϕk(θ)〉|2

dAi

dt=−ε

de(Ai,φi)dAi

,dφi

dt=−ε

de(Ai,φi)dφi

(∗)

Theorem (Convergence)

Assume population is in synchrony. The ith oscillator updatesaccording to (∗). Then

Ai(t)→ A∗ =1

2σ2

The pointwise Bellman error L (Ai,0)(θ , t) = ε(R)cos2(θ − t)

where ε(R) =1

16Rσ4


Learning Steepest descent algorithm

Phase transition

Suppose all oscillators use approx. optimal control law:

ui =−A∗

R1N ∑

j 6=isin(θi−θj(t))

then the phase transition boundary is

Rc(γ) =

1

2σ4 if γ = 01

4σ2γtan−1

(2γ

σ2

)if γ > 0

0 50 100 150 200 250 3002

2.5

3

3.5

4

4.5

5

5.5

6

t

k = 0.01; R = 1000

Ai

A*

0 0.05 0.1 0.15 0.215

20

25

30

35

40

45

50

γ

R

PDE

Learning

Incoherence

Synchrony


Thank you!

Website: http://www.mechse.illinois.edu/research/mehtapg

Huibing Yin Sean P. Meyn Uday V. Shanbhag

H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010

http://www.mechse.illinois.edu/research/mehtapg

Bibliography

Dimitri P. Bertsekas.Dynamic Programming and Optimal Control, volume 1.Athena Scientific, Belmont, Massachusetts, 1995.

Eric Brown, Jeff Moehlis, and Philip Holmes.On the phase reduction and response dynamics of neuraloscillator populations.Neural Computation, 16(4):673–715, 2004.

M. Dellnitz, J.E. Marsden, I. Melbourne, and J. Scheurle.Generic bifurcations of pendula.Int. Series Num. Math., 104:111–122, 1992.

J. Guckenheimer.Isochrons and phaseless sets.J. Math. Biol., 1:259–273, 1975.

Minyi Huang, Peter E. Caines, and Roland P. Malhame.


Bibliography

Large-population cost-coupled LQG problems with nonuniformagents: Individual-mass behavior and decentralized ε-nashequilibria.IEEE transactions on automatic control, 52(9):1560–1571, 2007.

Y. Kuramoto.International Symposium on Mathematical Problems in TheoreticalPhysics, volume 39 of Lecture Notes in Physics.Springer-Verlag, 1975.

Andrzej Lasota and Michael C. Mackey.Chaos, Fractals and Noise.Springer, 1994.

P. Mehta and S. Meyn.Q-learning and Pontryagin’s Minimum Principle.To appear, 48th IEEE Conference on Decision and Control,December 16-18 2009.

Sean P. Meyn.


Bibliography

The policy iteration algorithm for average reward markov decisionprocesses with general state space.IEEE Transactions on Automatic Control, 42(12):1663–1680,December 1997.

S. H. Strogatz and R. E. Mirollo.Stability of incoherence in a population of coupled oscillators.Journal of Statistical Physics, 63:613–635, May 1991.

Steven H. Strogatz, Daniel M. Abrams, Bruno Eckhardt, andEdward Ott.Theoretical mechanics: Crowd synchrony on the millenniumbridge.Nature, 438:43–44, 2005.