Overview From flows to algorithms From algorithms to flows Flows in games Monotone games Spurious limits References
GAMES, DYNAMICS & OPTIMIZATION
Panayotis Mertikopoulos
French National Center for Scientific Research (CNRS)
Laboratoire d’Informatique de Grenoble (LIG)
Criteo AI Lab
CoreLab, NTUA — October 5, 2020
P. Mertikopoulos CNRS & Criteo AI Lab
Outline
Overview
From flows to algorithms
From algorithms to flows
Flows in games
Monotone games
Spurious limits
Overview
What is the long-run behavior of first-order methods in optimization / games?
In optimization:
▸ Do first-order (= gradient-based) algorithms converge to critical points?
▸ Are local minimizers selected?
In games:
▸ Do gradient methods converge to Nash equilibrium?
▸ Are all Nash equilibria created equal?
Dynamics: from discrete to continuous and back again
About
Joint work with: N. Hallak, A. Kavis, Y.-P. Hsieh, V. Cevher, G. Piliouras, C. Papadimitriou, Z. Zhou

▸ M, Papadimitriou & Piliouras, Cycles in adversarial regularized learning, SODA 2018
▸ M & Zhou, Learning in games with continuous action sets and unknown payoff functions, Mathematical Programming, vol. 173, pp. 465–507, Jan. 2019
▸ M, Hallak, Kavis & Cevher, On the almost sure convergence of stochastic gradient descent in non-convex problems, NeurIPS 2020
▸ Hsieh, M & Cevher, The limits of min-max optimization algorithms: convergence to spuriousnon-critical sets, https://arxiv.org/abs/2006.09065
Basic problem
minimize_{x ∈ R^d} f(x)
▸ f non-convex [technical assumptions later]
▸ f unknown/difficult to manipulate in closed form [low precision methods]
▸ Single-player game: calculate best responses [more in second part]
Gradient flows
Gradient flow of a function f : R^d → R:
ẋ(t) = −∇f(x(t)) (GF)
Main property: f is a (strict) Lyapunov function for (GF)
df/dt = −∥∇f(x(t))∥² ≤ 0, w/ equality iff ∇f(x) = 0
Convergence of gradient flows
Blanket assumptions
▸ Lipschitz smoothness:
∥∇f(x′) − ∇f(x)∥ ≤ L∥x′ − x∥ for all x, x′ ∈ R^d (LS)
▸ Bounded sublevels:
L_c ≡ {x ∈ R^d : f(x) ≤ c} is bounded for all c < sup f (sub)
Theorem
▸ Assume: (LS), (sub)
▸ Then: x(t) converges to crit(f) ≡ {x* ∈ R^d : ∇f(x*) = 0}
[NB: setwise, not pointwise convergence, cf. Palis Jr. and de Melo, 1982]
From flows to algorithms: gradient descent
Forward Euler (explicit) Ô⇒ gradient descent (GD) [Cauchy, 1847]
X_{n+1} = X_n − γ_n ∇f(X_n) (GD)
[Figure: a single step from x to x⁺ = x − γ∇f(x)]
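As a minimal numerical sketch of (GD); the toy objective and step size are illustrative choices, not from the slides:

```python
import numpy as np

def gradient_descent(grad, x0, gamma=0.1, n_iter=200):
    """Forward-Euler discretization of (GF): X_{n+1} = X_n - gamma * grad(X_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - gamma * grad(x)
    return x

# Toy objective f(x) = ||x||^2 / 2, so grad f(x) = x and the unique minimizer is 0.
x_final = gradient_descent(lambda x: x, x0=[3.0, -2.0])
```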
From flows to algorithms: proximal gradient
Backward Euler (implicit) Ô⇒ proximal gradient (PG) [Martinet, 1970]
X_{n+1} = X_n − γ_n ∇f(X_{n+1}) (PG)
[Figure: a single implicit step from x to x⁺ = x − γ∇f(x⁺)]
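The implicit step has no closed form in general; one way to realize it numerically (a sketch, assuming γL < 1 so the inner fixed-point iteration contracts) is:

```python
import numpy as np

def proximal_gradient_step(grad, x, gamma, n_inner=60):
    """Backward-Euler step: solve x_plus = x - gamma * grad(x_plus)
    by fixed-point iteration (contracts when gamma * L < 1)."""
    x_plus = np.array(x, dtype=float)
    for _ in range(n_inner):
        x_plus = x - gamma * grad(x_plus)
    return x_plus

# For f(x) = ||x||^2 / 2 the implicit step has the closed form x / (1 + gamma).
x = np.array([2.0, -1.0])
x_plus = proximal_gradient_step(lambda y: y, x, gamma=0.5)
```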
From flows to algorithms: extra-gradient
Midpoint Runge-Kutta (explicit) Ô⇒ extra-gradient (EG) [Korpelevich, 1976]
X_{n+1/2} = X_n − γ_n ∇f(X_n)   X_{n+1} = X_n − γ_n ∇f(X_{n+1/2}) (EG)
[Figure: a leading step to x_lead = x − γ∇f(x), then an update x⁺ = x − γ∇f(x_lead)]
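In code, (EG) costs one extra gradient evaluation per iteration (the test problem and step size are again illustrative):

```python
import numpy as np

def extra_gradient(grad, x0, gamma=0.1, n_iter=200):
    """Midpoint Runge-Kutta discretization of (GF):
    a leading half-step, then an update using the leading gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_lead = x - gamma * grad(x)       # X_{n+1/2}
        x = x - gamma * grad(x_lead)       # X_{n+1}
    return x

x_final = extra_gradient(lambda x: x, x0=[1.0, -1.0])
```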
Stochastic gradient feedback
In many applications, perfect gradient information is unavailable / too costly:
▸ Machine learning: f(x) = ∑_{i=1}^N f_i(x) and only a batch of ∇f_i(x) is computable per iteration
▸ Control / Engineering: f(x) = E[F(x; ω)] and only ∇F(x; ω) can be observed for a random ω
▸ Game Theory / Bandit Learning: only f(x) is observable

Stochastic first-order oracle (SFO) feedback:
X_n ↦ V_n = ∇f(X_n) + Z_n + b_n (SFO)
[V_n: feedback; ∇f(X_n): gradient; Z_n: noise; b_n: bias]
where Z_n is "zero-mean" and b_n is "small" (more later)
Stochastic gradient descent
Noisy Euler (explicit) Ô⇒ stochastic gradient descent (SGD)
X_{n+1} = X_n − γ_n [∇f(X_n) + W_n] (SGD)
[W_n: noise]
[Figure: a single noisy step from x to x⁺ = x − γ(∇f(x) + w)]
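A minimal sketch of (SGD) with a vanishing Robbins-Monro step γ_n = 1/n and synthetic Gaussian noise (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad, x0, n_iter=5000, noise_scale=1.0):
    """Noisy Euler: X_{n+1} = X_n - gamma_n * (grad(X_n) + W_n),
    with steps gamma_n = 1/n and zero-mean noise W_n."""
    x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        w = noise_scale * rng.standard_normal(x.shape)
        x = x - (1.0 / n) * (grad(x) + w)
    return x

# On f(x) = ||x||^2 / 2 the iterates approach the minimizer 0 despite the noise.
x_final = sgd(lambda x: x, x0=[5.0, -5.0])
```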
Example: zeroth-order feedback
Given f : R → R, estimate f′(x) at a target point x ∈ R:
f′(x) ≈ [f(x + δ) − f(x − δ)] / (2δ)
Pick u = ±1 with probability 1/2. Then:
E[f(x + δu) u] = [f(x + δ) − f(x − δ)] / 2
⟹ Estimate f′(x) with a single query of f at x̂ = x + δu

Algorithm: Simultaneous perturbation stochastic approximation [Spall, 1992]
1: Draw u uniformly from S^d
2: Query x̂ = x + δu
3: Get f̂ = f(x̂)
4: Set V = (d/δ) f̂ u
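A sketch of the one-point estimator above; the quadratic test function and the fairly large δ (chosen here only to tame the estimator's variance in a short run) are my own choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def spsa_gradient(f, x, delta):
    """One query of f: draw a uniform direction u on the unit sphere,
    query f at x + delta*u, and return V = (d/delta) * f(x + delta*u) * u."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u

# Each estimate is very noisy, but the average recovers grad f (up to the smoothing bias).
x = np.array([1.0, 2.0, 3.0])
f = lambda y: 0.5 * np.dot(y, y)           # grad f(x) = x
est = np.mean([spsa_gradient(f, x, delta=0.5) for _ in range(50000)], axis=0)
```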
The Robbins-Monro template
Generalized Robbins-Monro:
X_{n+1} = X_n − γ_n [∇f(X_n) + Z_n + b_n] (RM)
with ∑_n γ_n = ∞, γ_n → 0, and E[Z_n | X_n, . . . , X_1] = 0

Examples
▸ Gradient descent (det.): Z_n = 0, b_n = 0
▸ Proximal gradient (det.): Z_n = 0, b_n = ∇f(X_{n+1}) − ∇f(X_n)
▸ Extra-gradient (det.): Z_n = 0, b_n = ∇f(X_{n+1/2}) − ∇f(X_n)
▸ Stochastic gradient descent (stoch.): Z_n zero-mean, b_n = 0
▸ SPSA (stoch.): Z_n = (d/δ) f(x̂_n) U_n − ∇f_δ(X_n), b_n = ∇f_δ(X_n) − ∇f(X_n), where
f_δ(x) = (1 / vol(B_δ)) ∫_{B_δ} f(x + δu) du
▸ ⋯
From algorithms to flows
Basic idea: if γn is “small”, the noise washes out and “ limt→∞ (RM) = limt→∞ (GF) ”
Ô⇒ ODE method of stochastic approximation[Ljung, 1977; Benveniste et al., 1990; Kushner and Yin, 1997; Benaïm, 1999]
▸ Virtual time: τ_n = ∑_{k=1}^n γ_k
▸ Virtual trajectory: X(t) = X_n + [(t − τ_n) / (τ_{n+1} − τ_n)] (X_{n+1} − X_n) for t ∈ [τ_n, τ_{n+1}]
▸ Asymptotic pseudotrajectory (APT):
lim_{t→∞} sup_{0≤h≤T} ∥X(t + h) − Φ_h(X(t))∥ = 0 for all T > 0
where Φ_s(x) denotes the position at time s of an orbit of (GF) starting at x
▸ Long run: X(t) tracks (GF) with arbitrary accuracy over windows of arbitrary length
[Benaïm and Hirsch, 1995, 1996; Benaïm, 1999; Benaïm et al., 2005, 2006]
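The virtual-time interpolation is straightforward to implement; a small sketch (the three-point example is made up for illustration):

```python
import numpy as np

def virtual_trajectory(iterates, gammas):
    """Piecewise-affine interpolation of the iterates X_n against
    the virtual time tau_n = gamma_1 + ... + gamma_n (with tau_0 = 0)."""
    tau = np.concatenate(([0.0], np.cumsum(gammas)))

    def X(t):
        # locate the segment [tau_n, tau_{n+1}] containing t and interpolate
        n = min(np.searchsorted(tau, t, side="right") - 1, len(iterates) - 2)
        frac = (t - tau[n]) / (tau[n + 1] - tau[n])
        return iterates[n] + frac * (iterates[n + 1] - iterates[n])

    return tau, X

tau, X = virtual_trajectory(np.array([0.0, 1.0, 1.5]), gammas=[1.0, 0.5])
```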
Stochastic approximation criteria
When is a sequence generated by (RM) an APT?
(A) ▸ X_n is bounded
▸ f is Lipschitz continuous and smooth:
|f(x′) − f(x)| ≤ G∥x′ − x∥ (LC)
∥∇f(x′) − ∇f(x)∥ ≤ L∥x′ − x∥ (LS)

(B) ▸ E[∑_n γ_n² ∥Z_n∥²] < ∞
▸ sup_n E[∥Z_n∥^q] < ∞ and ∑_n γ_n^{1+q/2} < ∞
▸ Z_n sub-Gaussian and γ_n = o(1/log n)

(C) ▸ ∑_n γ_n b_n < ∞ with probability 1

Proposition (Benaïm, 1999; Hsieh, M & Cevher, 2020)
▸ Assume: any of (A); any of (B); (C)
▸ Then: X_n is an APT of (GF) with probability 1
Convergence of APTs
Theorem (Benaïm and Hirsch, 1995, 1996)
▸ Assume: X_n is a bounded APT of (GF)
▸ Then: X_n converges to crit(f) with probability 1

Theorem (Ljung, 1977; Benaïm, 1999)
▸ Assume: (LC), (LS), (sub); sup_n ∥X_n∥ < ∞
▸ Then: X_n converges (a.s.) to a component of crit(f) where f is constant
Boundedness: implicit, algorithm-dependent assumption; non-verifiable!
Can boundedness be dropped?
Key obstacle: infinite plains of vanishing gradients [think f(x) = 1 − exp(−x²)]
Countered if gradient sublevel sets do not extend to infinity:
M_ε ≡ {x ∈ R^d : ∥∇f(x)∥ ≤ ε} is bounded for some ε > 0 (Gsub)
[standard under regularization]
Proposition (M, Hallak, Kavis & Cevher, 2020)
▸ Assume: (LC), (LS), (sub), (Gsub)
▸ Then: for all ε > 0, there exists some τ = τ(ε) such that, for all t ≥ τ:
(a) f(x(t)) ≤ f(x(0)) − ε; or
(b) x(t) is within ε-distance of crit(f)
In words: (GF) either descends f by a uniform amount, or it is already near-critical
Can boundedness be dropped?
Proposition
▸ Assume: (LC), (LS), (sub), (Gsub); any of (B); (C)
▸ Then: with probability 1, a subsequence of X_n converges to crit(f)

Theorem (M, Hallak, Kavis & Cevher, 2020)
▸ Assume: (LC), (LS), (sub), (Gsub); any of (B); (C)
▸ Then: with probability 1, X_n converges to a (possibly random) component of crit(f) over which f is constant
Are all critical points desirable?
Figure: A hyperbolic ridge manifold, typical of ResNet loss landscapes [Li et al., 2018]
Are traps avoided?
Hyperbolic saddle (isolated non-minimizing critical point)
λ_min(Hess f(x*)) < 0, det(Hess f(x*)) ≠ 0
⟹ (GF) is linearly unstable near x*
⟹ convergence to x* unlikely

Theorem (Pemantle, 1990)
▸ Assume:
▸ x* is a hyperbolic saddle point
▸ Z_n is finite (a.s.) and uniformly exciting:
E[⟨Z_n, u⟩+] ≥ c for all unit vectors u ∈ S^{d−1}, x ∈ R^d
▸ γ_n ∝ 1/n
▸ Then: P(lim_{n→∞} X_n = x*) = 0
Are non-hyperbolic traps avoided?
Strict saddle: λ_min(Hess f(x*)) < 0

Theorem (Ge et al., 2015)
▸ Given: confidence level ζ > 0
▸ Assume:
▸ f is bounded and satisfies (LS)
▸ Hess f(x) is Lipschitz continuous
▸ for all x ∈ R^d: (a) ∥∇f(x)∥ ≥ ε; or (b) λ_min(Hess f(x)) ≤ −β; or (c) x is δ-close to a local minimum x* of f around which f is α-strongly convex
▸ Z_n is finite (a.s.) and contains a component uniformly sampled from the unit sphere; also, b_n = 0
▸ γ_n ≡ γ with γ = O(1/log(1/ζ))
▸ Then: with probability at least 1 − ζ, SGD produces after O(γ⁻² log(1/(γζ))) iterations a point which is O(√(γ log(1/(γζ))))-close to x* (and hence away from any strict saddle)
Are non-hyperbolic traps avoided always?
Theorem (M, Hallak, Kavis & Cevher, 2020)
▸ Assume:
▸ f satisfies (LC) and (LS)
▸ Z_n is finite (a.s.) and uniformly exciting:
E[⟨Z_n, u⟩+] ≥ c for all unit vectors u ∈ S^{d−1}, x ∈ R^d
▸ γ_n ∝ 1/n^p for some p ∈ (0, 1]
▸ Then: P(X_n converges to a set of strict saddle points) = 0

Proof. Use Pemantle (1990) + differential geometric arguments of Benaïm and Hirsch (1995).
Single- vs. multi-agent setting
In single-agent optimization, first-order iterative schemes
▸ Converge to the problem's set of critical points
▸ Avoid spurious, non-minimizing critical manifolds

Does this intuition carry over to games?

Do multi-agent learning algorithms
▸ Converge to unilaterally stable/stationary points?
▸ Avoid spurious, non-equilibrium points?
Online decision processes
Agents called to take repeated decisions with minimal information:

for n = 1, 2, . . . do
Choose action X_n [focal agent choice]
Incur loss ℓ_n(X_n) [depends on all agents]
end for

Driving question: how to choose "good" actions?
▸ Unknown world: no beliefs, knowledge of the game, etc.
▸ Minimal information: feedback often limited to incurred losses
N-player games
The game
▸ Finite set of players i ∈ N = {1, . . . , N}
▸ Each player selects an action from a closed convex set X_i ⊆ R^{d_i}
▸ Loss of player i given by a loss function ℓ_i : X ≡ ∏_i X_i → R

Examples
▸ Finite games (mixed extensions)
▸ Divisible good auctions (Kelly)
▸ Traffic routing
▸ Power control/allocation problems
▸ Cournot oligopolies
▸ ⋯
Nash equilibrium
Nash equilibrium
Action profile x* = (x*_1, . . . , x*_N) ∈ X that is unilaterally stable:
ℓ_i(x*_i; x*_{−i}) ≤ ℓ_i(x_i; x*_{−i}) for every player i ∈ N and every deviation x_i ∈ X_i
▸ Local Nash equilibrium: local version [stable under local deviations]
▸ Critical point: unilateral stationarity [x*_i is stationary for ℓ_i(⋅; x*_{−i})]

Individual loss gradients
V_i(x) = ∇_{x_i} ℓ_i(x_i; x_{−i})
⟹ individually steepest variation

Variational characterization
If x* is a (local) Nash equilibrium, then
⟨V_i(x*), x_i − x*_i⟩ ≥ 0 for all i ∈ N, x_i ∈ X_i
Intuition: ℓ_i(x_i; x*_{−i}) is weakly increasing along all rays emanating from x*_i
Geometric interpretation
[Figure: at x* on the boundary of X, the tangent cone TC(x*), the normal cone NC(x*), and the individual descent direction −V(x*) pointing outward]
At Nash equilibrium, individual descent directions are outward-pointing
First-order algorithms in games
Individual gradient field: V(x) = (V_1(x), . . . , V_N(x)), x = (x_1, . . . , x_N)

▸ Individual gradient descent:
X_{n+1} = X_n − γ_n V(X_n)
▸ Extra-gradient:
X_{n+1/2} = X_n − γ_n V(X_n)   X_{n+1} = X_n − γ_n V(X_{n+1/2})
▸ ⋯

Mean dynamics:
ẋ(t) = −V(x(t)) (MD)
⟹ no longer a gradient system
The dynamics of min-max games
Bilinear min-max games (saddle-point problems)
min_{x_1 ∈ X_1} max_{x_2 ∈ X_2} L(x_1, x_2) = (x_1 − b_1)⊺ A (x_2 − b_2) (SP)
[no constraints: X_1 = R^{d_1}, X_2 = R^{d_2}]

Mean dynamics:
ẋ_1 = −A(x_2 − b_2)   ẋ_2 = A⊺(x_1 − b_1)

Energy function:
E(x) = ½ ∥x_1 − b_1∥² + ½ ∥x_2 − b_2∥²

Lyapunov property:
dE/dt ≤ 0, w/ equality if A = A⊺

⟹ distance to solutions (weakly) decreasing along (MD)
Cycles
Roadblock: the energy might be a constant of motion [Hofbauer et al., 2009]
Figure: Hamiltonian flow of L(x_1, x_2) = x_1 x_2 (closed orbits around the equilibrium)
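For this example the flow can be written in closed form, which makes the recurrence easy to check numerically (a sketch; the unit initial condition is arbitrary):

```python
import numpy as np

# Mean dynamics of L(x1, x2) = x1*x2: dx1/dt = -x2, dx2/dt = x1.
# This is a rotation, so E(x) = (x1^2 + x2^2)/2 is a constant of motion
# and every orbit returns to its starting point with period 2*pi.
def flow(x1_0, x2_0, t):
    c, s = np.cos(t), np.sin(t)
    return c * x1_0 - s * x2_0, s * x1_0 + c * x2_0

x1, x2 = flow(1.0, 0.0, 2 * np.pi)         # one full period: back to the start
E = 0.5 * (x1**2 + x2**2)                  # energy is still 1/2
```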
Poincaré recurrence
Definition (Poincaré, 1890's)
A dynamical system is Poincaré recurrent if almost all solution trajectories return infinitely close to their starting point infinitely many times
Theorem (M, Papadimitriou, Piliouras, 2018; unconstrained version)(MD) is Poincaré recurrent in all bilinear min-max games that admit an equilibrium
Learning in min-max games: gradient descent
Individual gradient descent:
X_{n+1} = X_n − γ_n V(X_n)
Energy no longer a constant:
∥X_{n+1} − x*∥² = ∥X_n − x*∥² − 2γ_n ⟨V(X_n), X_n − x*⟩ [= 0, from (MD)] + γ_n² ∥V(X_n)∥² [discretization error]

…even worse
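The energy pumped in by the γ² term is easy to see numerically on L(x_1, x_2) = x_1 x_2 (the constant step size here is an illustrative choice):

```python
import numpy as np

# Individual gradient field of L(x1, x2) = x1*x2:
# player 1 minimizes (V1 = x2), player 2 maximizes (V2 = -x1).
def V(x):
    return np.array([x[1], -x[0]])

x = np.array([1.0, 0.0])
gamma = 0.1
for _ in range(100):
    x = x - gamma * V(x)   # each step multiplies ||x||^2 by 1 + gamma^2

# ||x|| has grown from 1 to (1 + gamma^2)^(n/2): an outward spiral.
```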
[Figure: individual gradient descent on L(x_1, x_2) = x_1 x_2: the iterates spiral outward]
Learning in min-max games: extra-gradient
Extra-gradient:
X_{n+1/2} = X_n − γ_n V(X_n)   X_{n+1} = X_n − γ_n V(X_{n+1/2})
[Figure: extra-gradient on L(x_1, x_2) = x_1 x_2: the iterates spiral inward to the equilibrium]
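On the same bilinear example the leading step reverses the picture; a minimal sketch:

```python
import numpy as np

def V(x):                                   # individual gradient field of L = x1*x2
    return np.array([x[1], -x[0]])

x = np.array([1.0, 0.0])
gamma = 0.1
for _ in range(1000):
    x_lead = x - gamma * V(x)               # X_{n+1/2}
    x = x - gamma * V(x_lead)               # X_{n+1}

# Each step multiplies ||x||^2 by (1 - gamma^2)^2 + gamma^2 < 1: an inward spiral.
```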
Learning in min-max games
Long-run behavior of min-max learning algorithms:
▸ Mean dynamics: Poincaré recurrent (periodic orbits)
✗ Individual gradient descent: divergence (outward spirals)
✓ Extra-gradient: convergence (inward spirals)
Different outcomes despite same underlying dynamics!
Monotonicity and strict monotonicity
Bilinear games are special cases of monotone games:
⟨V(x′) − V(x), x′ − x⟩ ≥ 0 for all x, x′ ∈ X (MC)
[⟹ strictly monotone if (MC) is strict for x′ ≠ x]

Equivalently: H(x) ≽ 0, where H is the game's Hessian matrix:
H_ij(x) = ½ ∇_{x_j} ∇_{x_i} ℓ_i(x) + ½ (∇_{x_i} ∇_{x_j} ℓ_j(x))⊺

Examples: bilinear games (not strict), Kelly auctions, Cournot markets, routing, …

Nomenclature:
▸ Diagonal strict convexity [Rosen, 1965]
▸ Stable games [Hofbauer and Sandholm, 2009]
▸ Contractive games [Sandholm, 2015]
▸ Dissipative games [Sorin and Wan, 2016]
41/51
Convergence to equilibrium
Different behavior under strict monotonicity:

∥Xn+1 − x∗∥² = ∥Xn − x∗∥² − 2γn ⟨V(Xn), Xn − x∗⟩ + γn² ∥V(Xn)∥²

where the drift term ⟨V(Xn), Xn − x∗⟩ is > 0 if Xn is not Nash, and γn² ∥V(Xn)∥² is the discretization error.
Can the drift overcome the discretization error?
Theorem (M & Zhou, 2019)
▸ Assume: strict monotonicity; any of (A); any of (B); (C)
▸ Then: any generalized Robbins-Monro learning algorithm converges to the game’s (unique) Nash equilibrium with probability 1
In strictly monotone games, gradient methods↝ Nash equilibrium
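As a hedged illustration of the theorem’s message (not its proof), the sketch below runs a plain Robbins-Monro scheme Xn+1 = Xn − γn(V(Xn) + noise) with γn = 1/n on an assumed strongly monotone toy game; the field, noise level, and horizon are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 0.5  # strong monotonicity modulus (illustrative)

def V(x):
    # Assumed toy field: bilinear coupling plus a mu * identity part,
    # strictly monotone with unique Nash equilibrium at the origin
    return np.array([x[1] + mu * x[0], -x[0] + mu * x[1]])

x = np.array([2.0, -1.0])
for n in range(1, 20001):
    gamma = 1.0 / n                       # Robbins-Monro: sum = inf, sum of squares < inf
    noise = 0.1 * rng.standard_normal(2)  # i.i.d. gradient noise
    x = x - gamma * (V(x) + noise)        # noisy gradient step

# The iterates home in on the equilibrium despite persistent noise
print(np.linalg.norm(x))
```

Here the drift −2γn μ∥Xn∥² eventually dominates the γn²-order error terms, which is exactly the balance the energy inequality above describes.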
42/51
Outline
Overview
From flows to algorithms
From algorithms to flows
Flows in games
Monotone games
Spurious limits
43/51
Almost bilinear games
Consider the “almost bilinear” game
min_{x1∈X1} max_{x2∈X2} L(x1, x2) = x1x2 + ε ϕ(x2)

where ε > 0 and ϕ(x) = (1/2)x² − (1/4)x⁴
Properties:
▸ Unique critical point at the origin
▸ Not Nash; unstable under (MD)
▸ (MD) is attracted to a unique, stable limit cycle from almost all initial conditions
[Hsieh,M & Cevher, 2020]
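A numerical sketch of this limit cycle, assuming the game reads L(x1, x2) = x1x2 + εϕ(x2) with ϕ(x) = (1/2)x² − (1/4)x⁴ (the subscripts are a reconstruction consistent with the origin being unstable); the integrator, ε, and initial points are illustrative choices.

```python
import numpy as np

eps = 0.1

def field(x):
    # Mean dynamics (MD): x1 descends L, x2 ascends L,
    # for L(x1, x2) = x1*x2 + eps*(x2**2/2 - x2**4/4)
    x1, x2 = x
    return np.array([-x2, x1 + eps * (x2 - x2**3)])

def rk4_run(x0, T=300.0, h=0.05):
    # Classical Runge-Kutta integration of the flow
    x = np.array(x0, dtype=float)
    for _ in range(int(T / h)):
        k1 = field(x)
        k2 = field(x + h / 2 * k1)
        k3 = field(x + h / 2 * k2)
        k4 = field(x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

inner = rk4_run([0.1, 0.0])  # starts near the (unstable) critical point
outer = rk4_run([3.0, 0.0])  # starts far outside
# Both end on the same stable limit cycle, away from the origin
print(np.linalg.norm(inner), np.linalg.norm(outer))
```

For this ε, an averaging argument puts the cycle radius near 2/√3 ≈ 1.15; both trajectories settle there rather than at the unique critical point.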
44/51
Spurious limits in almost bilinear games
Trajectories of (RM) converge to a spurious cycle that contains no critical points
Figure: Left: (MD); center: SGD; right: stochastic extra-gradient (SEG)
45/51
Forsaken solutions
Another almost bilinear game
min_{x1∈X1} max_{x2∈X2} L(x1, x2) = x1x2 + ε[ϕ(x1) − ϕ(x2)]

where ε > 0 and ϕ(x) = (1/4)x² − (1/2)x⁴ + (1/6)x⁶
Properties:
▸ Unique critical point at the origin
▸ Local Nash equilibrium; stable under (MD)
▸ Two isolated periodic orbits:
  ▸ One unstable, shielding the equilibrium, but small
  ▸ One stable, attracting all trajectories of (MD) outside a small basin
[Hsieh,M & Cevher, 2020]
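The same kind of sketch for the forsaken example, assuming L(x1, x2) = x1x2 + ε[ϕ(x1) − ϕ(x2)] with ϕ(x) = (1/4)x² − (1/2)x⁴ + (1/6)x⁶ (subscripts and coefficients are a reconstruction consistent with the slide’s stated properties); starting points and parameters are illustrative.

```python
import numpy as np

eps = 0.1

def dphi(x):
    # phi(x) = x**2/4 - x**4/2 + x**6/6, so phi'(x) = x/2 - 2*x**3 + x**5
    return x / 2 - 2 * x**3 + x**5

def field(x):
    # (MD) for L(x1, x2) = x1*x2 + eps*(phi(x1) - phi(x2)):
    # x1 descends L, x2 ascends L
    x1, x2 = x
    return np.array([-(x2 + eps * dphi(x1)), x1 - eps * dphi(x2)])

def rk4_run(x0, T=400.0, h=0.02):
    # Classical Runge-Kutta integration of the flow
    x = np.array(x0, dtype=float)
    for _ in range(int(T / h)):
        k1 = field(x)
        k2 = field(x + h / 2 * k1)
        k3 = field(x + h / 2 * k2)
        k4 = field(x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

inside = rk4_run([0.3, 0.0])   # inside the small shielding orbit: -> equilibrium
outside = rk4_run([2.0, 0.0])  # outside it: -> the stable outer cycle
print(np.linalg.norm(inside), np.linalg.norm(outside))
```

Here the origin is locally attracting, but only within the small region shielded by the unstable orbit; the second trajectory is captured by the stable outer cycle, near radius √2 for this ε.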
46/51
Forsaken solutions in almost bilinear games
With high probability, (RM) forsakes the game’s unique (local) equilibrium
Figure: Left: (MD); center: SGD; right: SEG
47/51
The limits of gradient-based learning in games
Limit cycles ⟹ internally chain transitive (ICT) sets = invariant, no proper attractors

Examples of ICT sets
▸ V = ∇ℓ ⟹ components of critical points
▸ L(x1, x2) = x1x2 ⟹ any annular region centered at (0, 0)
▸ Almost bilinear ⟹ isolated periodic orbits + unique stationary point

Theorem (Hsieh, M & Cevher, 2020)
▸ Assume: any of (A); any of (B); (C)
▸ Then:
  ▸ Xn converges to an ICT set of (MD) with probability 1
  ▸ (RM) converges to attractors of (MD) with arbitrarily high probability
48/51
Conclusions
In contrast to single-agent problems (optimization), game-theoretic learning
▸ May have limit points that are neither stable nor stationary
▸ Cannot avoid spurious, non-equilibrium points with positive probability
▸ Different approach needed (mixed-strategy learning, multiple timescales, …)

What about finite games?
▸ Limit cycles may still appear
▸ Which Nash equilibria are stable under no-regret learning?
[stay tuned to CoreLab FM,]
49/51
References I
M. Benaïm. Dynamics of stochastic approximation algorithms. In J. Azéma, M. Émery, M. Ledoux, and M. Yor, editors, Séminaire de Probabilités XXXIII, volume 1709 of Lecture Notes in Mathematics, pages 1–68. Springer Berlin Heidelberg, 1999.

M. Benaïm and M. W. Hirsch. Dynamics of Morse-Smale urn processes. Ergodic Theory and Dynamical Systems, 15(6):1005–1030, December 1995.

M. Benaïm and M. W. Hirsch. Asymptotic pseudotrajectories and chain recurrent flows, with applications. Journal of Dynamics and Differential Equations, 8(1):141–176, 1996.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–348, 2005.

M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673–695, 2006.

A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer, 1990.

R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points — Online stochastic gradient for tensor decomposition. In COLT ’15: Proceedings of the 28th Annual Conference on Learning Theory, 2015.

J. Hofbauer and W. H. Sandholm. Stable games and their dynamics. Journal of Economic Theory, 144(4):1665–1693, July 2009.

J. Hofbauer, S. Sorin, and Y. Viossat. Time average replicator and best reply dynamics. Mathematics of Operations Research, 34(2):263–269, May 2009.
50/51
References II
Y.-P. Hsieh, P. Mertikopoulos, and V. Cevher. The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. https://arxiv.org/abs/2006.09065, 2020.

G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Èkonom. i Mat. Metody, 12:747–756, 1976.

H. J. Kushner and G. G. Yin. Stochastic approximation algorithms and applications. Springer-Verlag, New York, NY, 1997.

H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein. Visualizing the loss landscape of neural nets. In NeurIPS ’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018.

L. Ljung. Analysis of recursive stochastic algorithms. IEEE Trans. Autom. Control, 22(4):551–575, August 1977.

B. Martinet. Régularisation d’inéquations variationnelles par approximations successives. ESAIM: Mathematical Modelling and Numerical Analysis, 4(R3):154–158, 1970.

P. Mertikopoulos and Z. Zhou. Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming, 173(1-2):465–507, January 2019.

P. Mertikopoulos, C. H. Papadimitriou, and G. Piliouras. Cycles in adversarial regularized learning. In SODA ’18: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, 2018.
51/51
References III
P. Mertikopoulos, N. Hallak, A. Kavis, and V. Cevher. On the almost sure convergence of stochastic gradient descent in non-convex problems. In NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020.

J. Palis Jr. and W. de Melo. Geometric Theory of Dynamical Systems. Springer-Verlag, 1982.

R. Pemantle. Nonconvergence to unstable points in urn models and stochastic approximations. Annals of Probability, 18(2):698–712, April 1990.

R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.

J. B. Rosen. Existence and uniqueness of equilibrium points for concave N-person games. Econometrica, 33(3):520–534, 1965.

W. H. Sandholm. Population games and deterministic evolutionary dynamics. In H. P. Young and S. Zamir, editors, Handbook of Game Theory IV, pages 703–778. Elsevier, 2015.

S. Sorin and C. Wan. Finite composite games: Equilibria and dynamics. Journal of Dynamics and Games, 3(1):101–120, January 2016.

J. C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control, 37(3):332–341, March 1992.