
Page 1: GAMES, DYNAMICS & OPTIMIZATION

Overview · From flows to algorithms · From algorithms to flows · Flows in games · Monotone games · Spurious limits · References

GAMES, DYNAMICS & OPTIMIZATION

Panayotis Mertikopoulos

French National Center for Scientific Research (CNRS)
Laboratoire d’Informatique de Grenoble (LIG)
Criteo AI Lab

CoreLab, NTUA — October 5, 2020

P. Mertikopoulos CNRS & Criteo AI Lab

Page 2: GAMES, DYNAMICS & OPTIMIZATION

Outline

Overview

From flows to algorithms

From algorithms to flows

Flows in games

Monotone games

Spurious limits

Page 3: GAMES, DYNAMICS & OPTIMIZATION

Overview

What is the long-run behavior of first-order methods in optimization / games?

In optimization:
▸ Do first-order (= gradient-based) algorithms converge to critical points?
▸ Are local minimizers selected?

In games:
▸ Do gradient methods converge to Nash equilibrium?
▸ Are all Nash equilibria created equal?

Dynamics: from discrete to continuous and back again


Page 7: GAMES, DYNAMICS & OPTIMIZATION

About

N. Hallak · A. Kavis · Y.-P. Hsieh · V. Cevher · G. Piliouras · C. Papadimitriou · Z. Zhou

▸ M, Papadimitriou & Piliouras, Cycles in adversarial regularized learning, SODA 2018
▸ M & Zhou, Learning in games with continuous action sets and unknown payoff functions, Mathematical Programming, vol. 173, pp. 465–507, Jan. 2019
▸ M, Hallak, Kavis & Cevher, On the almost sure convergence of stochastic gradient descent in non-convex problems, NeurIPS 2020
▸ Hsieh, M & Cevher, The limits of min-max optimization algorithms: convergence to spurious non-critical sets, https://arxiv.org/abs/2006.09065


Page 9: GAMES, DYNAMICS & OPTIMIZATION

Basic problem

minimize_{x∈R^d} f(x)

▸ f non-convex [technical assumptions later]
▸ f unknown / difficult to manipulate in closed form [low-precision methods]
▸ Single-player game: calculate best responses [more in second part]

Page 10: GAMES, DYNAMICS & OPTIMIZATION

Gradient flows

Gradient flow of a function f: R^d → R:

ẋ(t) = −∇f(x(t))   (GF)

Main property: f is a (strict) Lyapunov function for (GF):

d f(x(t))/dt = −∥∇f(x(t))∥² ≤ 0, w/ equality iff ∇f(x) = 0
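As a numerical sanity check on the Lyapunov property above, here is a minimal sketch (not from the slides; the quadratic objective is an assumed toy example): a forward-Euler integration of (GF) for f(x) = ½∥x∥², verifying that f decreases along the flow and that x(t) approaches crit(f) = {0}.

```python
# Minimal sketch: integrate x'(t) = -grad f(x(t)) for f(x) = 0.5*||x||^2
# and check the Lyapunov property df/dt = -||grad f||^2 <= 0 numerically.
import numpy as np

def grad_f(x):
    # gradient of f(x) = 0.5*||x||^2
    return x

def integrate_gf(x0, dt=1e-3, steps=5000):
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x - dt * grad_f(x)   # one Euler step of (GF)
        traj.append(x.copy())
    return np.array(traj)

traj = integrate_gf([3.0, -4.0])
f_vals = 0.5 * np.sum(traj**2, axis=1)
assert np.all(np.diff(f_vals) <= 1e-12)     # f weakly decreasing along the flow
assert np.linalg.norm(traj[-1]) < 0.1       # x(t) -> crit(f) = {0}
```
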

Page 11: GAMES, DYNAMICS & OPTIMIZATION

Convergence of gradient flows

Blanket assumptions
▸ Lipschitz smoothness:

∥∇f(x′) − ∇f(x)∥ ≤ L∥x′ − x∥ for all x, x′ ∈ R^d   (LS)

▸ Bounded sublevels:

L_c ≡ {x ∈ R^d : f(x) ≤ c} is bounded for all c < sup f   (sub)

Theorem
▸ Assume: (LS), (sub)
▸ Then: x(t) converges to crit(f) ≡ {x∗ ∈ R^d : ∇f(x∗) = 0}

[NB: setwise, not pointwise convergence, cf. Palis Jr. and de Melo, 1982]

Page 12: GAMES, DYNAMICS & OPTIMIZATION

From flows to algorithms: gradient descent

Forward Euler (explicit) ⟹ gradient descent (GD) [Cauchy, 1847]

X_{n+1} = X_n − γ_n ∇f(X_n)   (GD)

[Figure: one step of (GD), from x to x⁺ along −γ∇f(x)]

Page 13: GAMES, DYNAMICS & OPTIMIZATION

From flows to algorithms: proximal gradient

Backward Euler (implicit) ⟹ proximal gradient (PG) [Martinet, 1970]

X_{n+1} = X_n − γ_n ∇f(X_{n+1})   (PG)

[Figure: one step of (PG), from x to x⁺ along −γ∇f(x⁺)]

Page 14: GAMES, DYNAMICS & OPTIMIZATION

From flows to algorithms: extra-gradient

Midpoint Runge-Kutta (explicit) ⟹ extra-gradient (EG) [Korpelevich, 1976]

X_{n+1/2} = X_n − γ_n ∇f(X_n),  X_{n+1} = X_n − γ_n ∇f(X_{n+1/2})   (EG)

[Figure: one step of (EG): a half-step from x to x_lead along −γ∇f(x), then the update from x to x⁺ along −γ∇f(x_lead)]
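The (EG) template above can be sketched in a few lines (a minimal sketch on an assumed quadratic example, not from the slides): a half-step to a leading point, then the actual update from X_n using the gradient evaluated at that leading point.

```python
# Minimal sketch of one extra-gradient step per the (EG) template.
import numpy as np

def eg_step(x, grad, gamma):
    x_lead = x - gamma * grad(x)        # X_{n+1/2}: exploratory half-step
    return x - gamma * grad(x_lead)     # X_{n+1}: update with grad f(X_{n+1/2})

grad = lambda x: 2.0 * x                # f(x) = ||x||^2
x = np.array([1.0, -2.0])
for _ in range(100):
    x = eg_step(x, grad, gamma=0.1)
assert np.linalg.norm(x) < 1e-6         # converges to the minimizer 0
```
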


Page 17: GAMES, DYNAMICS & OPTIMIZATION

Stochastic gradient feedback

In many applications, perfect gradient information is unavailable / too costly:

▸ Machine learning: f(x) = ∑_{i=1}^N f_i(x) and only a batch of ∇f_i(x) is computable per iteration
▸ Control / Engineering: f(x) = E[F(x; ω)] and only ∇F(x; ω) can be observed for a random ω
▸ Game Theory / Bandit Learning: only f(x) is observable

Stochastic first-order oracle (SFO) feedback:

X_n ↦ V_n = ∇f(X_n) + Z_n + b_n   (SFO)
[V_n: feedback · ∇f(X_n): gradient · Z_n: noise · b_n: bias]

where Z_n is “zero-mean” and b_n is “small” (more later)


Page 19: GAMES, DYNAMICS & OPTIMIZATION

Stochastic gradient descent

Noisy Euler (explicit) ⟹ stochastic gradient descent (SGD)

X_{n+1} = X_n − γ_n [∇f(X_n) + W_n]   (SGD)
[W_n: noise]

[Figure: one step of (SGD), from x to x⁺ along −γ[∇f(x) + w]]

Page 20: GAMES, DYNAMICS & OPTIMIZATION

Example: zeroth-order feedback

Given f: R → R, estimate f′(x) at a target point x ∈ R:

f′(x) ≈ [f(x + δ) − f(x − δ)] / (2δ)

Pick u = ±1 with probability 1/2. Then:

E[f(x + δu) u] = ½ f(x + δ) − ½ f(x − δ)

⟹ estimate f′(x) with a single query of f at x̂ = x + δu

Algorithm: Simultaneous perturbation stochastic approximation [Spall, 1992]
1: Draw u uniformly from S^{d−1}
2: Query x̂ = x + δu
3: Get f̂ = f(x̂)
4: Set V = (d/δ) f̂ u
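The SPSA estimator above can be checked empirically; a minimal sketch follows (the quadratic test function is an assumed example). Averaging many single-query estimates V = (d/δ) f(x + δu) u recovers ∇f(x) up to the smoothing bias, which vanishes here because f is quadratic.

```python
# Minimal sketch of the one-point SPSA gradient estimator from the slide:
# draw u uniformly on the unit sphere, query f once, return (d/delta)*f(x+delta*u)*u.
import numpy as np

rng = np.random.default_rng(0)

def spsa_estimate(f, x, delta):
    u = rng.normal(size=x.size)
    u /= np.linalg.norm(u)                       # u ~ uniform on S^{d-1}
    return (x.size / delta) * f(x + delta * u) * u

f = lambda x: float(x @ x)                       # f(x) = ||x||^2, grad f(x) = 2x
x = np.array([1.0, 0.5, -1.0])
V = np.mean([spsa_estimate(f, x, delta=0.5) for _ in range(50000)], axis=0)
assert np.allclose(V, 2 * x, atol=0.25)          # sample mean approximates grad f(x)
```

The single-sample variance scales like 1/δ², which is the usual bias-variance trade-off of one-point zeroth-order estimators.
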


Page 22: GAMES, DYNAMICS & OPTIMIZATION

The Robbins-Monro template

Generalized Robbins-Monro:

X_{n+1} = X_n − γ_n [∇f(X_n) + Z_n + b_n]   (RM)

with ∑_n γ_n = ∞, γ_n → 0, and E[Z_n | X_n, …, X_1] = 0

Examples
▸ Gradient descent (det.): Z_n = 0, b_n = 0
▸ Proximal gradient (det.): Z_n = 0, b_n = ∇f(X_{n+1}) − ∇f(X_n)
▸ Extra-gradient (det.): Z_n = 0, b_n = ∇f(X_{n+1/2}) − ∇f(X_n)
▸ Stochastic gradient descent (stoch.): Z_n zero-mean, b_n = 0
▸ SPSA (stoch.): Z_n = (d/δ) f̂(X_n) U_n − ∇f_δ(X_n), b_n = ∇f_δ(X_n) − ∇f(X_n), where

f_δ(x) = (1 / vol(B_δ)) ∫_{B_δ} f(x + u) du

▸ ⋯
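To make the template concrete, here is a minimal sketch (toy quadratic, assumed for illustration) of (RM) in which plain GD is the instance Z_n = 0, b_n = 0, and SGD is the instance with zero-mean Z_n.

```python
# Minimal sketch of the generalized Robbins-Monro template
# X_{n+1} = X_n - gamma_n * (grad f(X_n) + Z_n + b_n).
import numpy as np

rng = np.random.default_rng(1)
grad = lambda x: 2.0 * x                          # f(x) = ||x||^2

def robbins_monro(x0, noise, bias, n_iter=2000):
    x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        gamma = 1.0 / n                           # sum gamma_n = inf, gamma_n -> 0
        x = x - gamma * (grad(x) + noise(x) + bias(x))
    return x

zero = lambda x: np.zeros_like(x)
gd  = robbins_monro([2.0, -1.0], zero, zero)                          # GD instance
sgd = robbins_monro([2.0, -1.0], lambda x: rng.normal(size=2), zero)  # SGD instance
assert np.linalg.norm(gd) < 1e-3                  # both drift to crit(f) = {0}
assert np.linalg.norm(sgd) < 0.2
```
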


Page 24: GAMES, DYNAMICS & OPTIMIZATION

From algorithms to flows

Basic idea: if γ_n is “small”, the noise washes out and “lim_{t→∞} (RM) = lim_{t→∞} (GF)”

⟹ ODE method of stochastic approximation [Ljung, 1977; Benveniste et al., 1990; Kushner and Yin, 1997; Benaïm, 1999]

▸ Virtual time: τ_n = ∑_{k=1}^n γ_k
▸ Virtual trajectory: X(t) = X_n + [(t − τ_n)/(τ_{n+1} − τ_n)] (X_{n+1} − X_n) for t ∈ [τ_n, τ_{n+1}]
▸ Asymptotic pseudotrajectory (APT):

lim_{t→∞} sup_{0≤h≤T} ∥X(t + h) − Φ_h(X(t))∥ = 0 for all T > 0

where Φ_s(x) denotes the position at time s of an orbit of (GF) starting at x
▸ Long run: X(t) tracks (GF) with arbitrary accuracy over windows of arbitrary length

[Benaïm and Hirsch, 1995, 1996; Benaïm, 1999; Benaïm et al., 2005, 2006]
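The virtual-trajectory construction above can be sketched numerically (a minimal sketch on an assumed noiseless linear example): interpolate the iterates at the virtual times τ_n and compare X(t + h) against the flow Φ_h(X(t)) late in virtual time.

```python
# Minimal sketch: virtual time, affine virtual trajectory, and the APT property
# for noiseless (RM) with grad f(x) = x, whose flow is Phi_h(x) = x * exp(-h).
import numpy as np

gammas = 1.0 / np.arange(2, 502)                      # gamma_n -> 0, sum = inf
taus = np.concatenate([[0.0], np.cumsum(gammas)])     # tau_0 = 0, tau_n = sum of gamma_k

xs = [1.0]                                            # iterates X_{n+1} = X_n (1 - gamma_n)
for g in gammas:
    xs.append(xs[-1] * (1 - g))
xs = np.array(xs)

def X(t):
    """Piecewise-affine virtual trajectory through the points (tau_n, X_n)."""
    return np.interp(t, taus, xs)

phi = lambda x, h: x * np.exp(-h)                     # flow of x' = -x started at x

# Late in virtual time, X(t+h) tracks Phi_h(X(t)) over a window of length T = 1:
t = taus[-1] - 1.0
errs = [abs(X(t + h) - phi(X(t), h)) for h in np.linspace(0.0, 1.0, 50)]
assert max(errs) < 1e-2
```
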


Page 26: GAMES, DYNAMICS & OPTIMIZATION

Stochastic approximation criteria

When is a sequence generated by (RM) an APT?

(A) ▸ X_n is bounded
    ▸ f is Lipschitz continuous and smooth:

∣f(x′) − f(x)∣ ≤ G∥x′ − x∥   (LC)
∥∇f(x′) − ∇f(x)∥ ≤ L∥x′ − x∥   (LS)

(B) ▸ E[∑_n γ_n² ∥Z_n∥²] < ∞
    ▸ sup_n E[∥Z_n∥^q] < ∞ and ∑_n γ_n^{1+q/2} < ∞
    ▸ Z_n sub-Gaussian and γ_n = o(1/log n)

(C) ▸ ∑_n γ_n ∥b_n∥ < ∞ with probability 1

Proposition (Benaïm, 1999; Hsieh, M & Cevher, 2020)
▸ Assume: any of (A); any of (B); (C)
▸ Then: X_n is an APT of (GF) with probability 1

Page 27: GAMES, DYNAMICS & OPTIMIZATION

Convergence of APTs

Theorem (Benaïm and Hirsch, 1995, 1996)
▸ Assume: X_n is a bounded APT of (GF)
▸ Then: X_n converges to crit(f) with probability 1

Theorem (Ljung, 1977; Benaïm, 1999)
▸ Assume: (LC), (LS), (sub); sup_n ∥X_n∥ < ∞
▸ Then: X_n converges (a.s.) to a component of crit(f) on which f is constant

Boundedness: an implicit, algorithm-dependent assumption; non-verifiable!


Page 29: GAMES, DYNAMICS & OPTIMIZATION

Can boundedness be dropped?

Key obstacle: infinite plains of vanishing gradients [think f(x) = −exp(−x²)]

Countered if gradient sublevel sets do not extend to infinity:

M_ε ≡ {x ∈ R^d : ∥∇f(x)∥ ≤ ε} is bounded for some ε > 0   (Gsub)

[standard under regularization]

Proposition (M, Hallak, Kavis & Cevher, 2020)
▸ Assume: (LC), (LS), (sub), (Gsub)
▸ Then: for all ε > 0, there exists some τ = τ(ε) such that, for all t ≥ τ:

(a) f(x(t)) ≤ f(x(0)) − ε; or
(b) x(t) is within ε-distance of crit(f)

In words: (GF) either descends f by a uniform amount, or it is already near-critical


Page 32: GAMES, DYNAMICS & OPTIMIZATION

Can boundedness be dropped?

Proposition
▸ Assume: (LC), (LS), (sub), (Gsub); any of (B); (C)
▸ Then: with probability 1, a subsequence of X_n converges to crit(f)

Theorem (M, Hallak, Kavis & Cevher, 2020)
▸ Assume: (LC), (LS), (sub), (Gsub); any of (B); (C)
▸ Then: with probability 1, X_n converges to a (possibly random) component of crit(f) over which f is constant


Page 34: GAMES, DYNAMICS & OPTIMIZATION

Are all critical points desirable?

Figure: A hyperbolic ridge manifold, typical of ResNet loss landscapes [Li et al., 2018]

Page 35: GAMES, DYNAMICS & OPTIMIZATION

Are traps avoided?

Hyperbolic saddle (isolated non-minimizing critical point):

λ_min(Hess f(x∗)) < 0,  det(Hess f(x∗)) ≠ 0

⟹ (GF) is linearly unstable near x∗
⟹ convergence to x∗ unlikely

Theorem (Pemantle, 1990)
▸ Assume:
    ▸ x∗ is a hyperbolic saddle point
    ▸ Z_n is finite (a.s.) and uniformly exciting:

E[⟨Z, u⟩⁺] ≥ c for all unit vectors u ∈ S^{d−1}, x ∈ R^d

    ▸ γ_n ∝ 1/n
▸ Then: P(lim_{n→∞} X_n = x∗) = 0
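The instability mechanism can be seen on a toy example (a minimal sketch, not from the slides; the saddle f(x, y) = (x² − y²)/2 is an assumed illustration): GD started exactly on the stable manifold {y = 0} converges to the saddle, while the noise in SGD pushes the iterates off it and they escape along the unstable direction.

```python
# Minimal sketch: noise-driven escape from the strict saddle of
# f(x, y) = (x^2 - y^2)/2, whose gradient is (x, -y).
import numpy as np

rng = np.random.default_rng(2)
grad = lambda z: np.array([z[0], -z[1]])

def run(noise_std, n_iter=400, gamma=0.05):
    z = np.array([1.0, 0.0])                  # starts exactly on {y = 0}
    for _ in range(n_iter):
        z = z - gamma * (grad(z) + noise_std * rng.normal(size=2))
    return z

z_det = run(noise_std=0.0)
z_sto = run(noise_std=0.1)
assert np.linalg.norm(z_det) < 1e-6           # deterministic GD stuck at the saddle
assert abs(z_sto[1]) > 1.0                    # SGD escapes along the unstable y-axis
```
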


Page 37: GAMES, DYNAMICS & OPTIMIZATION

Are non-hyperbolic traps avoided?

Strict saddle: λ_min(Hess f(x∗)) < 0

Theorem (Ge et al., 2015)
▸ Given: confidence level ζ > 0
▸ Assume:
    ▸ f is bounded and satisfies (LS)
    ▸ Hess f(x) is Lipschitz continuous
    ▸ for all x ∈ R^d: (a) ∥∇f(x)∥ ≥ ε; or (b) λ_min(Hess f(x)) ≤ −β; or (c) x is δ-close to a local minimum x∗ of f around which f is α-strongly convex
    ▸ Z_n is finite (a.s.) and contains a component uniformly sampled from the unit sphere; also, b_n = 0
    ▸ γ_n ≡ γ with γ = O(1/log(1/ζ))
▸ Then: with probability at least 1 − ζ, SGD produces after O(γ⁻² log(1/(γζ))) iterations a point which is O(√(γ log(1/(γζ))))-close to x∗ (and hence away from any strict saddle)


Page 39: GAMES, DYNAMICS & OPTIMIZATION

Are non-hyperbolic traps avoided always?

Theorem (M, Hallak, Kavis & Cevher, 2020)
▸ Assume:
    ▸ f satisfies (LC) and (LS)
    ▸ Z_n is finite (a.s.) and uniformly exciting:

E[⟨Z, u⟩⁺] ≥ c for all unit vectors u ∈ S^{d−1}, x ∈ R^d

    ▸ γ_n ∝ 1/n^p for some p ∈ (0, 1]
▸ Then: P(X_n converges to a set of strict saddle points) = 0

Proof. Use Pemantle (1990) + differential geometric arguments of Benaïm and Hirsch (1995).


Page 41: GAMES, DYNAMICS & OPTIMIZATION

Single- vs. multi-agent setting

In single-agent optimization, first-order iterative schemes
▸ converge to the problem’s set of critical points
▸ avoid spurious, non-minimizing critical manifolds

Does this intuition carry over to games?

Do multi-agent learning algorithms
▸ converge to unilaterally stable/stationary points?
▸ avoid spurious, non-equilibrium points?


Page 43: GAMES, DYNAMICS & OPTIMIZATION

Online decision processes

Agents called to take repeated decisions with minimal information:

for n ≥ 1 do
    Choose action X_n   [focal agent choice]
    Incur loss ℓ_n(X_n)   [depends on all agents]
end for

Driving question: how to choose “good” actions?
▸ Unknown world: no beliefs, knowledge of the game, etc.
▸ Minimal information: feedback often limited to incurred losses
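The repeated-decision loop above can be sketched as follows (a minimal sketch; the loss mechanism and action ranges are hypothetical, chosen only to illustrate that the agent sees nothing but the incurred scalar loss).

```python
# Minimal sketch of the online decision loop: at each round the focal agent
# picks an action and observes only the resulting loss, which secretly
# depends on the actions of all agents.
import random

random.seed(0)

def play_round(action, other_actions):
    # Unknown to the agent: the loss couples its action with everyone else's.
    return (action - sum(other_actions)) ** 2

losses = []
for _ in range(10):
    X_n = random.uniform(-1, 1)               # focal agent's choice
    others = [random.uniform(-1, 1) for _ in range(2)]
    losses.append(play_round(X_n, others))    # only this scalar is revealed

assert len(losses) == 10 and all(l >= 0 for l in losses)
```
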

Page 44: GAMES, DYNAMICS & OPTIMIZATION

N-player games

The game
▸ Finite set of players i ∈ N = {1, …, N}
▸ Each player selects an action from a closed convex set X_i ⊆ R^{d_i}
▸ Loss of player i given by a loss function ℓ_i: X ≡ ∏_i X_i → R

Examples
▸ Finite games (mixed extensions)
▸ Divisible good auctions (Kelly)
▸ Traffic routing
▸ Power control/allocation problems
▸ Cournot oligopolies
▸ ⋯

Page 45: GAMES, DYNAMICS & OPTIMIZATION

Nash equilibrium

Nash equilibrium: an action profile x∗ = (x∗_1, …, x∗_N) ∈ X that is unilaterally stable:

ℓ_i(x∗_i; x∗_{−i}) ≤ ℓ_i(x_i; x∗_{−i}) for every player i ∈ N and every deviation x_i ∈ X_i

▸ Local Nash equilibrium: local version [stable under local deviations]
▸ Critical point: unilateral stationarity [x∗_i is stationary for ℓ_i(⋅; x∗_{−i})]

Individual loss gradients: V_i(x) = ∇_{x_i} ℓ_i(x_i; x_{−i}) ⟹ individually steepest variation

Variational characterization: if x∗ is a (local) Nash equilibrium, then

⟨V_i(x∗), x_i − x∗_i⟩ ≥ 0 for all i ∈ N, x_i ∈ X_i

Intuition: ℓ_i(x_i; x∗_{−i}) is weakly increasing along all rays emanating from x∗_i
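A concrete check of these definitions (a minimal sketch; the two-player losses below are a hypothetical example, chosen so the equilibrium is at the origin): verify the unilateral-stability inequality and the variational characterization numerically.

```python
# Minimal sketch: a two-player game with losses l1 = x1^2 + x1*x2 and
# l2 = x2^2 - x1*x2, individual gradients V = (2*x1 + x2, 2*x2 - x1),
# and Nash equilibrium x* = (0, 0).
import numpy as np

l1 = lambda x1, x2: x1**2 + x1 * x2
l2 = lambda x1, x2: x2**2 - x1 * x2
V = lambda x1, x2: np.array([2 * x1 + x2, 2 * x2 - x1])
x_star = (0.0, 0.0)

deviations = np.linspace(-2.0, 2.0, 101)
# Unilateral stability: no player gains by deviating alone.
assert all(l1(*x_star) <= l1(d, x_star[1]) for d in deviations)
assert all(l2(*x_star) <= l2(x_star[0], d) for d in deviations)
# Variational characterization: <V_i(x*), x_i - x_i*> >= 0 for all deviations.
assert all(V(*x_star)[i] * (d - x_star[i]) >= 0 for d in deviations for i in (0, 1))
```
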


Page 48: GAMES, DYNAMICS & OPTIMIZATION

Geometric interpretation

[Figure: the feasible region X with the tangent cone TC(x∗) and normal cone NC(x∗) at x∗, and the individual descent direction −V(x∗)]

At Nash equilibrium, individual descent directions are outward-pointing

Page 49: GAMES, DYNAMICS & OPTIMIZATION

First-order algorithms in games

Individual gradient field: V(x) = (V_1(x), …, V_N(x)), x = (x_1, …, x_N)

▸ Individual gradient descent:

X_{n+1} = X_n − γ_n V(X_n)

▸ Extra-gradient:

X_{n+1/2} = X_n − γ_n V(X_n),  X_{n+1} = X_n − γ_n V(X_{n+1/2})

▸ ⋯

Mean dynamics:

ẋ(t) = −V(x(t))   (MD)

⟹ no longer a gradient system


Page 51: GAMES, DYNAMICS & OPTIMIZATION

The dynamics of min-max games

Bilinear min-max games (saddle-point problems):

min_{x_1∈X_1} max_{x_2∈X_2} L(x_1, x_2) = (x_1 − b_1)⊺ A (x_2 − b_2)   (SP)

[no constraints: X_1 = R^{d_1}, X_2 = R^{d_2}]

Mean dynamics:

ẋ_1 = −A(x_2 − b_2),  ẋ_2 = A⊺(x_1 − b_1)

Energy function:

E(x) = ½∥x_1 − b_1∥² + ½∥x_2 − b_2∥²

Lyapunov property: dE/dt ≤ 0, w/ equality if A = A⊺

⟹ distance to solutions (weakly) decreasing along (MD)
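The scalar case L(x_1, x_2) = x_1 x_2 (i.e., A = 1, b = 0) makes the conservation phenomenon easy to check numerically; a minimal sketch follows: the mean dynamics are a Hamiltonian rotation, and a fine Runge-Kutta integration shows the energy stays constant, so trajectories cycle instead of converging.

```python
# Minimal sketch: RK4 integration of the mean dynamics x1' = -x2, x2' = x1
# for L(x1, x2) = x1*x2, checking that E = (x1^2 + x2^2)/2 is conserved.
import numpy as np

def rk4_step(z, dt):
    f = lambda z: np.array([-z[1], z[0]])    # (MD) for the bilinear game
    k1 = f(z); k2 = f(z + dt / 2 * k1); k3 = f(z + dt / 2 * k2); k4 = f(z + dt * k3)
    return z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

z = np.array([1.0, 0.0])
E0 = 0.5 * z @ z
for _ in range(10000):                       # integrate up to t = 100
    z = rk4_step(z, dt=0.01)
assert abs(0.5 * z @ z - E0) < 1e-6          # energy conserved: the flow cycles
```
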


Page 53: GAMES, DYNAMICS & OPTIMIZATION

Cycles

Roadblock: the energy might be a constant of motion [Hofbauer et al., 2009]

Figure: Hamiltonian flow of L(x_1, x_2) = x_1 x_2

Page 54: LetterSpace=2GAMES, DYNAMICS & OPTIMIZATION

36/51

Overview From flows to algorithms From algorithms to flows Flows in games Monotone games Spurious limits References

Poincaré recurrence

Definition (Poincaré, 1890’s)A dynamical system is Poincaré recurrent if almost all solution trajectories returninfinitely close to their starting point infinitely many times

Theorem (M, Papadimitriou, Piliouras, 2018; unconstrained version)(MD) is Poincaré recurrent in all bilinear min-max games that admit an equilibrium


Learning in min-max games: gradient descent

Individual gradient descent:

X_{n+1} = X_n − γ_n V(X_n)

Energy no longer a constant:

∥X_{n+1} − x*∥² = ∥X_n − x*∥² − 2γ_n ⟨V(X_n), X_n − x*⟩ + γ_n² ∥V(X_n)∥²

where the inner-product term vanishes (= 0 from (MD)) and the last term is the discretization error

…even worse
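On the prototypical bilinear game L(x1, x2) = x1x2, with V(x1, x2) = (x2, −x1), the effect is exact: a direct computation gives ∥X_{n+1}∥² = (1 + γ²)∥X_n∥², so the iterates spiral outward. A minimal sketch (an illustration, not from the slides):

```python
# Individual gradient descent on L(x1, x2) = x1*x2 with constant step gamma.
# The squared distance to the equilibrium (0, 0) grows by (1 + gamma^2)
# at every step: the discretization error pushes the iterates outward.
gamma = 0.1
x1, x2 = 1.0, 0.0
steps = 100
for _ in range(steps):
    v1, v2 = x2, -x1                           # V(X_n)
    x1, x2 = x1 - gamma * v1, x2 - gamma * v2  # X_{n+1} = X_n - gamma V(X_n)
growth = x1 * x1 + x2 * x2  # = (1 + gamma**2) ** steps, starting from norm 1
```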


Learning in min-max games: extra-gradient

Extra-gradient:

X_{n+1/2} = X_n − γ_n V(X_n)    X_{n+1} = X_n − γ_n V(X_{n+1/2})
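On the same bilinear game L(x1, x2) = x1x2, the look-ahead step evaluates V at an extrapolated point and the squared norm contracts by (1 − γ² + γ⁴) per iteration: an inward spiral instead of an outward one. A minimal sketch (an illustration, not from the slides):

```python
# Extra-gradient on L(x1, x2) = x1*x2: leading step, then update with the
# vector field evaluated at the extrapolated point X_{n+1/2}.
gamma = 0.1

def V(x1, x2):
    return (x2, -x1)

x1, x2 = 1.0, 0.0
steps = 1000
for _ in range(steps):
    v1, v2 = V(x1, x2)
    y1, y2 = x1 - gamma * v1, x2 - gamma * v2   # leading step X_{n+1/2}
    w1, w2 = V(y1, y2)
    x1, x2 = x1 - gamma * w1, x2 - gamma * w2   # update X_{n+1}
shrink = x1 * x1 + x2 * x2  # = (1 - gamma**2 + gamma**4) ** steps, -> 0
```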


Learning in min-max games

Long-run behavior of min-max learning algorithms:

▸ Mean dynamics: Poincaré recurrent (periodic orbits)
✗ Individual gradient descent: divergence (outward spirals)
✓ Extra-gradient: convergence (inward spirals)

Different outcomes despite the same underlying dynamics!


Monotonicity and strict monotonicity

Bilinear games are special cases of monotone games:

⟨V(x′) − V(x), x′ − x⟩ ≥ 0 for all x, x′ ∈ X   (MC)

[⟹ strictly monotone if (MC) is strict whenever x ≠ x′]

Equivalently: H(x) ≽ 0, where H is the game's Hessian matrix:

H_ij(x) = (1/2) ∇_{x_j}∇_{x_i} ℓ_i(x) + (1/2) (∇_{x_i}∇_{x_j} ℓ_j(x))ᵀ

Examples: bilinear games (not strict), Kelly auctions, Cournot markets, routing, …

Nomenclature:
▸ Diagonal strict convexity [Rosen, 1965]
▸ Stable games [Hofbauer and Sandholm, 2009]
▸ Contractive games [Sandholm, 2015]
▸ Dissipative games [Sorin and Wan, 2016]
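The two regimes of (MC) can be checked numerically. In the sketch below (an illustration, not from the slides), the monotonicity gap of a bilinear vector field vanishes identically (monotone but not strict), while adding a small quadratic regularizer, a hypothetical modification chosen here for contrast, makes the gap strictly positive.

```python
import random

# Monotonicity gap <V(x') - V(x), x' - x> for two 2D vector fields:
# a bilinear one (gap identically 0) and a regularized one (gap > 0).
a, alpha = 2.0, 0.5  # arbitrary coefficients for this sketch

def V_bilinear(z):
    x1, x2 = z
    return (a * x2, -a * x1)             # V for L = a*x1*x2

def V_regularized(z):
    x1, x2 = z                           # hypothetical strictly monotone field
    return (a * x2 + alpha * x1, -a * x1 + alpha * x2)

def gap(V, z, w):
    dv = tuple(p - q for p, q in zip(V(w), V(z)))
    dz = tuple(p - q for p, q in zip(w, z))
    return sum(p * q for p, q in zip(dv, dz))

random.seed(0)
pairs = [(tuple(random.uniform(-3, 3) for _ in range(2)),
          tuple(random.uniform(-3, 3) for _ in range(2))) for _ in range(100)]
max_bilinear_gap = max(abs(gap(V_bilinear, z, w)) for z, w in pairs)
min_strict_gap = min(gap(V_regularized, z, w) for z, w in pairs)
# max_bilinear_gap ~ 0; min_strict_gap = alpha * min ||w - z||^2 > 0
```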


Convergence to equilibrium

Different behavior under strict monotonicity:

∥X_{n+1} − x*∥² = ∥X_n − x*∥² − 2γ_n ⟨V(X_n), X_n − x*⟩ + γ_n² ∥V(X_n)∥²

where ⟨V(X_n), X_n − x*⟩ > 0 if X_n is not Nash, and the last term is the discretization error

Can the drift overcome the discretization error?

Theorem (M & Zhou, 2019)
▸ Assume: strict monotonicity; any of (A); any of (B); (C)
▸ Then: any generalized Robbins-Monro learning algorithm converges to the game's (unique) Nash equilibrium with probability 1

In strictly monotone games, gradient methods ↝ Nash equilibrium
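A minimal sketch of this statement on a concrete strictly monotone game (a hypothetical example chosen for illustration, not from the slides): L(x1, x2) = x1x2 + (1/2)x1² − (1/2)x2², whose unique Nash equilibrium is the origin, run with noisy gradients and Robbins-Monro step sizes γ_n = 1/(n+1).

```python
import math
import random

# Noisy individual gradient steps with vanishing step sizes on a strictly
# monotone game; the drift dominates the discretization error and the
# iterates converge to the unique Nash equilibrium at (0, 0).
random.seed(0)
sigma = 0.01                  # noise level of the gradient oracle

x1, x2 = 1.0, 1.0
for n in range(2000):
    gamma = 1.0 / (n + 1)     # Robbins-Monro step size
    v1 = x2 + x1 + sigma * random.gauss(0, 1)    # noisy V_1 = dL/dx1
    v2 = -x1 + x2 + sigma * random.gauss(0, 1)   # noisy V_2 = -dL/dx2
    x1, x2 = x1 - gamma * v1, x2 - gamma * v2
final_dist = math.hypot(x1, x2)  # close to the Nash equilibrium (0, 0)
```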


Outline

Overview

From flows to algorithms

From algorithms to flows

Flows in games

Monotone games

Spurious limits


Almost bilinear games

Consider the "almost bilinear" game

min_{x1∈X1} max_{x2∈X2} L(x1, x2) = x1x2 + εϕ(x2)

where ε > 0 and ϕ(x) = (1/2)x² − (1/4)x⁴

Properties:
▸ Unique critical point at the origin
▸ Not Nash; unstable under (MD)
▸ (MD) attracted to a unique, stable limit cycle from almost all initial conditions

[Hsieh, M & Cevher, 2020]
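These properties can be reproduced numerically. The sketch below assumes the perturbation enters through the maximizer's variable, so that the origin is unstable as stated, i.e. L(x1, x2) = x1x2 + εϕ(x2) with ϕ(x) = x²/2 − x⁴/4; the value ε = 0.5 is an arbitrary choice.

```python
import math

# (MD) for the almost bilinear game: dx1 = -x2, dx2 = x1 + eps*(x2 - x2**3).
# Trajectories leave the (unstable) origin and settle on a stable limit cycle.
eps = 0.5  # arbitrary perturbation strength for this sketch

def field(z):
    x1, x2 = z
    return (-x2, x1 + eps * (x2 - x2 ** 3))

def rk4_step(z, h):
    shift = lambda u, v, s: tuple(a + s * b for a, b in zip(u, v))
    k1 = field(z)
    k2 = field(shift(z, k1, h / 2))
    k3 = field(shift(z, k2, h / 2))
    k4 = field(shift(z, k3, h))
    return tuple(z[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(2))

def run(z, steps=10000, h=0.01):
    for _ in range(steps):
        z = rk4_step(z, h)
    return math.hypot(*z)

r_inside = run((0.01, 0.0))   # escapes the unstable origin
r_outside = run((3.0, 0.0))   # pulled inward toward the cycle
# both trajectories end up on the same bounded cycle, away from the origin
```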


Spurious limits in almost bilinear games

Trajectories of (RM) converge to a spurious cycle that contains no critical points

Figure: Left: (MD); center: SGD; right: stochastic extra-gradient (SEG)


Forsaken solutions

Another almost bilinear game

min_{x1∈X1} max_{x2∈X2} L(x1, x2) = x1x2 + ε[ϕ(x1) − ϕ(x2)]

where ε > 0 and ϕ(x) = (1/4)x² − (1/2)x⁴ + (1/6)x⁶

Properties:
▸ Unique critical point at the origin
▸ Local Nash equilibrium; stable under (MD)
▸ Two isolated periodic orbits:
  ▸ One unstable, shielding the equilibrium, but small
  ▸ One stable, attracting all trajectories of (MD) outside a small basin

[Hsieh, M & Cevher, 2020]
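A numerical sketch of this phase portrait, assuming ϕ(x) = x²/4 − x⁴/2 + x⁶/6 (the coefficients as best recoverable from the slide) and an arbitrary ε = 0.1: trajectories started inside the small shield converge to the equilibrium, while trajectories started outside settle on the outer stable periodic orbit, forsaking it.

```python
import math

# (MD) for the "forsaken" game: dx1 = -x2 - eps*phi'(x1),
# dx2 = x1 - eps*phi'(x2), with phi'(x) = x/2 - 2*x**3 + x**5.
eps = 0.1  # arbitrary small perturbation strength for this sketch

def dphi(x):
    return x / 2 - 2 * x ** 3 + x ** 5

def field(z):
    x1, x2 = z
    return (-x2 - eps * dphi(x1), x1 - eps * dphi(x2))

def rk4_step(z, h):
    shift = lambda u, v, s: tuple(a + s * b for a, b in zip(u, v))
    k1 = field(z)
    k2 = field(shift(z, k1, h / 2))
    k3 = field(shift(z, k2, h / 2))
    k4 = field(shift(z, k3, h))
    return tuple(z[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(2))

def run(z, steps=20000, h=0.01):
    for _ in range(steps):
        z = rk4_step(z, h)
    return math.hypot(*z)

r_near = run((0.1, 0.1))  # inside the shield: converges to the equilibrium
r_far = run((2.0, 0.0))   # outside: settles on the stable periodic orbit
```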


Forsaken solutions in almost bilinear games

With high probability, (RM) forsakes the game's unique (local) equilibrium

Figure: Left: (MD); center: SGD; right: SEG


The limits of gradient-based learning in games

Limit cycles ⟹ internally chain transitive (ICT) sets = invariant, no proper attractors

Examples of ICT sets:
▸ V = ∇ℓ ⟹ components of critical points
▸ L(x1, x2) = x1x2 ⟹ any annular region centered on (0, 0)
▸ Almost bilinear ⟹ isolated periodic orbits + unique stationary point

Theorem (Hsieh, M & Cevher, 2020)
▸ Assume: any of (A); any of (B); (C)
▸ Then:
  ▸ Xn converges to an ICT set of (MD) with probability 1
  ▸ (RM) converges to attractors of (MD) with arbitrarily high probability


Conclusions

In contrast to single-agent problems (optimization), game-theoretic learning:
▸ May have limit points that are neither stable nor stationary
▸ Cannot avoid spurious, non-equilibrium points with positive probability
▸ Calls for a different approach (mixed-strategy learning, multiple timescales, …)

What about finite games?
▸ Limit cycles may still appear
▸ Which Nash equilibria are stable under no-regret learning?

[stay tuned to CoreLab FM]


References I

M. Benaïm. Dynamics of stochastic approximation algorithms. In J. Azéma, M. Émery, M. Ledoux, and M. Yor, editors, Séminaire de Probabilités XXXIII, volume 1709 of Lecture Notes in Mathematics, pages 1–68. Springer Berlin Heidelberg, 1999.

M. Benaïm and M. W. Hirsch. Dynamics of Morse-Smale urn processes. Ergodic Theory and Dynamical Systems, 15(6):1005–1030, December 1995.

M. Benaïm and M. W. Hirsch. Asymptotic pseudotrajectories and chain recurrent flows, with applications. Journal of Dynamics and Differential Equations, 8(1):141–176, 1996.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–348, 2005.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673–695, 2006.

A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer, 1990.

R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points — Online stochastic gradient for tensor decomposition. In COLT '15: Proceedings of the 28th Annual Conference on Learning Theory, 2015.

J. Hofbauer and W. H. Sandholm. Stable games and their dynamics. Journal of Economic Theory, 144(4):1665–1693, July 2009.

J. Hofbauer, S. Sorin, and Y. Viossat. Time average replicator and best reply dynamics. Mathematics of Operations Research, 34(2):263–269, May 2009.


References II

Y.-P. Hsieh, P. Mertikopoulos, and V. Cevher. The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. https://arxiv.org/abs/2006.09065, 2020.

G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Èkonom. i Mat. Metody, 12:747–756, 1976.

H. J. Kushner and G. G. Yin. Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York, NY, 1997.

H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein. Visualizing the loss landscape of neural nets. In NeurIPS '18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.

L. Ljung. Analysis of recursive stochastic algorithms. IEEE Trans. Autom. Control, 22(4):551–575, August 1977.

B. Martinet. Régularisation d'inéquations variationnelles par approximations successives. ESAIM: Mathematical Modelling and Numerical Analysis, 4(R3):154–158, 1970.

P. Mertikopoulos and Z. Zhou. Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming, 173(1-2):465–507, January 2019.

P. Mertikopoulos, C. H. Papadimitriou, and G. Piliouras. Cycles in adversarial regularized learning. In SODA '18: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, 2018.


References III

P. Mertikopoulos, N. Hallak, A. Kavis, and V. Cevher. On the almost sure convergence of stochastic gradient descent in non-convex problems. In NeurIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020.

J. Palis Jr. and W. de Melo. Geometric Theory of Dynamical Systems. Springer-Verlag, 1982.

R. Pemantle. Nonconvergence to unstable points in urn models and stochastic approximations. Annals of Probability, 18(2):698–712, April 1990.

R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.

J. B. Rosen. Existence and uniqueness of equilibrium points for concave N-person games. Econometrica, 33(3):520–534, 1965.

W. H. Sandholm. Population games and deterministic evolutionary dynamics. In H. P. Young and S. Zamir, editors, Handbook of Game Theory IV, pages 703–778. Elsevier, 2015.

S. Sorin and C. Wan. Finite composite games: Equilibria and dynamics. Journal of Dynamics and Games, 3(1):101–120, January 2016.

J. C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control, 37(3):332–341, March 1992.
