
Distributed Solvers for Online Data-Driven Network Optimization


Jorge Cortés

Mechanical and Aerospace Engineering University of California, San Diego http://carmenere.ucsd.edu/jorge

NREL Workshop: Innovative Optimization and Control Methods for Highly Distributed Autonomous Systems

Table Mountain Inn, Golden, CO April 11-12, 2019


Acknowledgments

Ashish Cherukuri, Priyank Srivastava, Simon Niederlander, Erfan Nozari, David Mateos, Dean Richert, Michael McCreesh

Pavan Tallapragada

Bahman Gharesifard

Solmaz Kia, Chin-Yao Chang, Miguel Vaquero

Sonia Martinez, Enrique Mallada, Steven Low, Marcello Colombino, Emiliano Dall'Anese

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 2 / 38



Network Optimization is Pervasive

Optimizing agent operation with limited network resources

Grid of the future Intelligent transportation Disaster response

Uncertainty pervasive too: interconnected world with

unstructured environments complex, changing network dynamics humans in the loop contested scenarios, adversaries

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 3 / 38


Basic Taxonomy for Network Optimization

Simple, yet remarkably ubiquitous formulation in multi-agent applications

minimize ‘aggregate objective function’(x)

subject to ‘ineq constraints’(x)

‘eq constraints’(x)

Large-scale systems

coupling might come from objective, constraints, or both

individual agents may seek to find the global solution or only their own component

coupling topology versus network topology: varying degrees of sparsity, all the way to not sparse at all

How do we solve optimization in a distributed way?

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 4 / 38


Distributed Solvers for Network Optimization

Network optimization w/ distributed structure (widespread in multi-agent scenarios)

minimize   f(x) = \sum_{i=1}^{n} f_i(x_i)   (separable objective)

subject to g(x) \le 0

           Ax = b   (locally expressible)

Optimizers of convex problems ⇔ saddle points of convex-concave Lagrangian

L(x, y, z) = f(x) + y^\top g(x) + z^\top (Ax - b)

Dynamical systems approach to algorithms

dynamical systems that solve problems in linear algebra, systems, optimization

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 5 / 38


Distributed Algorithm Design via Primal-Dual Dynamics

Saddle points of L can be found via saddle-point dynamics

\dot{x} = -\nabla_x L(x, y, z)

\dot{y} = [\nabla_y L(x, y, z)]^+_y

\dot{z} = \nabla_z L(x, y, z)


Explicitly,

\dot{x} = -\nabla f(x) - y^\top \nabla g(x) - A^\top z

\dot{y} = [g(x)]^+_y

\dot{z} = Ax - b


Component-wise, for each agent i and constraints \alpha, \beta,

\dot{x}_i = -\frac{\partial f_i}{\partial x_i}(x_i) - \sum_{\alpha} y_\alpha \frac{\partial g_\alpha}{\partial x_i}(x_i, x_{N_i}) - \sum_{\beta} z_\beta A_{\beta i}

\dot{y}_\alpha = [g_\alpha(x)]^+_{y_\alpha}

\dot{z}_\beta = \sum_{j} A_{\beta j} x_j - b_\beta

From agent viewpoint, problem structure gives rise to distributed dynamics

Rich dynamical behavior

characterization of stability and convergence properties

appealing for large-scale systems: easily implementable by individual agents

higher-order methods difficult to “distribute”, errors in comm&sensing lead to errors in higher-order terms

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 6 / 38
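To make the saddle-point dynamics concrete, here is a minimal numerical sketch (added here, not from the talk): a forward-Euler discretization of the primal-dual flow for a toy instance, minimize ‖x‖² subject to x₁ + x₂ − 1 ≤ 0 and x₁ − x₂ = 0, with a max-projection standing in for the [·]⁺_y operator. The problem data, step size, and function names are illustrative choices only.

```python
import numpy as np

# Toy problem: minimize ||x||^2  s.t.  g(x) = x1 + x2 - 1 <= 0  and  A x = b with A = [1, -1], b = 0.
A = np.array([[1.0, -1.0]])
b = np.array([0.0])

def grad_f(x):            # gradient of f(x) = ||x||^2
    return 2.0 * x

def g(x):                 # inequality constraint, g(x) <= 0
    return np.array([x[0] + x[1] - 1.0])

def grad_g(x):            # Jacobian of g (1 x 2)
    return np.array([[1.0, 1.0]])

def primal_dual_step(x, y, z, dt):
    """One forward-Euler step of the saddle-point dynamics of L(x, y, z)."""
    dx = -(grad_f(x) + grad_g(x).T @ y + A.T @ z)
    dy = g(x)
    dz = A @ x - b
    x_new = x + dt * dx
    y_new = np.maximum(y + dt * dy, 0.0)   # projection stands in for the [.]^+_y operator
    z_new = z + dt * dz
    return x_new, y_new, z_new

x, y, z = np.array([2.0, -1.0]), np.zeros(1), np.zeros(1)
for _ in range(20000):
    x, y, z = primal_dual_step(x, y, z, dt=1e-3)
print(x)   # approaches the optimizer x* = (0, 0) of this toy instance
```

In this sketch each variable only needs its own gradient entry and the constraint rows it participates in, which is what makes the dynamics amenable to distributed implementation.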

Stability of Primal-Dual Dynamics

Saddle points not necessarily asymptotically stable

A. Cherukuri, B. Gharesifard, and J. Cortes. Saddle-point dynamics: conditions for asymptotic stability of saddle points. SIAM Journal on Control and Optimization, 55(1):486–511, 2017

N. K. Dhingra, S. Z. Khong, and M. R. Jovanovic. The proximal augmented Lagrangian method for nonsmooth composite optimization. IEEE Transactions on Automatic Control, 2018. To appear

Primal-dual dynamics for convex-concave F(x, z) = xz [Samuelson, 58]

\dot{x} = -\nabla_x F(x, z) = -z

\dot{z} = \nabla_z F(x, z) = x

Saddle point (0, 0) is stable, not asymptotically stable

[Figure: phase portrait in the (x, z) plane, with trajectories orbiting the origin]
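A quick numerical check of this example (a sketch added here, not from the slides): integrating ẋ = −z, ż = x shows that ‖(x, z)‖ is preserved along trajectories, so they orbit the saddle point (0, 0) without ever converging to it.

```python
import numpy as np

# Saddle-point dynamics for F(x, z) = x*z:  xdot = -z, zdot = x.
# The exact flow is a rotation, so ||(x, z)|| stays constant along trajectories.
def simulate(x0, z0, dt=1e-3, steps=20000):
    x, z = x0, z0
    for _ in range(steps):
        # explicit midpoint (RK2) step to avoid the spurious radius growth of forward Euler
        xm, zm = x - 0.5 * dt * z, z + 0.5 * dt * x
        x, z = x - dt * zm, z + dt * xm
    return x, z

x, z = simulate(1.0, 0.0)
print(np.hypot(x, z))   # stays ~1.0: stable, but not asymptotically stable
```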

Stream of results to understand asymptotic convergence & properties

K. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Non-Linear Programming. Stanford University Press, Stanford, California, 1958

D. Feijer and F. Paganini. Stability of primal-dual gradient dynamics and applications to network optimization. Automatica, 46:1974–1981, 2010

J. Wang and N. Elia. Control approach to distributed optimization. In Allerton Conf. on Communications, Control and Computing, pages 557–561, Monticello, IL, October 2010

J. Wang and N. Elia. A control perspective for centralized and distributed convex optimization. In IEEE Conf. on Decision and Control, pages 3800–3805, Orlando, Florida, 2011

J. Chen and V. K. N. Lau. Convergence analysis of saddle point problems in time varying wireless systems - control theoretical approach. IEEE Transactions on Signal Processing, 60(1):443–452, 2012

E. Mallada, C. Zhao, and S. H. Low. Optimal load-side control for frequency regulation in smart grids. IEEE Transactions on Automatic Control, 62(12):6294–6309, 2017

T. Holding and I. Lestas. Stability and instability in saddle point dynamics - Part I. 2017. https://arxiv.org/abs/1707.07349

T. Holding and I. Lestas. Stability and instability in saddle point dynamics - Part II: The subgradient method. 2017. https://arxiv.org/abs/1707.07351

A. Cherukuri, E. Mallada, and J. Cortes. Asymptotic convergence of constrained primal-dual dynamics. Systems & Control Letters, 87:10–15, 2016

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 7 / 38

LaSalle Functions for Asymptotic Convergence

Positive-definite functions with negative semi-definite derivative

distance to saddle point (x_*, y_*, z_*):

V_d(x, y, z) = \tfrac{1}{2}\left( \|x - x_*\|^2 + \|y - y_*\|^2 + \|z - z_*\|^2 \right)

magnitude of vector field:

V_m(x, y, z) = \tfrac{1}{2}\Big( \|\nabla_x F(x, y, z)\|^2 + \|\nabla_z F(x, y, z)\|^2 + \sum_{j\ \text{active}} \big( (\nabla_y F(x, y, z))_j \big)^2 \Big)


Global convergence under

convexity-concavity plus local strong convexity-concavity

convexity-linearity plus property of sets of saddle-points

quasiconvexity-quasiconcavity plus property of sets of saddle-points

local second-order information about saddle function


LaSalle arguments show convergence, but not enough for

characterization of convergence rate

dealing with errors in computation/comm/sensing

characterization of robustness against disturbances

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 8 / 38


Beyond Asymptotic Stability

Identification of Lyapunov functions of primal-dual dynamics, leading to systematic characterization of convergence properties

Lyapunov Function for Constrained Optimization

For f : \mathbb{R}^n \to \mathbb{R} strongly convex, twice continuously differentiable, and g : \mathbb{R}^n \to \mathbb{R}^p convex, twice continuously differentiable,

V(x, y, z) = \tfrac{1}{2}\, \|(x, y, z)\|^2_{\mathrm{Saddle}(F)} + V_m(x, y, z)

1. V positive definite with respect to Saddle(F)

2. \dot{V} negative definite: t \mapsto V(x(t), y(t), z(t)) is right-continuous and a.e. differentiable, \tfrac{d}{dt} V(x(t), y(t), z(t)) < 0 at every t where the derivative exists and (x(t), y(t), z(t)) \notin \mathrm{Saddle}(F), and V(x(t_0), y(t_0), z(t_0)) \le \lim_{t \uparrow t_0} V(x(t), y(t), z(t)) for all t_0 \ge 0

A. Cherukuri, E. Mallada, S. H. Low, and J. Cortes. The role of convexity in saddle-point dynamics: Lyapunov function and robustness. IEEE Transactions on Automatic Control, 63(8):2449–2464, 2018

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 9 / 38



Implications

Algorithm robustness against disturbances

true dynamics + disturbances

characterization of input-to-state stability properties of primal-dual dynamics

graceful performance degradation as a function of size of disturbance

Real-time implementation of the dynamics

adjust stepsize opportunistically based on system state

aperiodic discrete-time implementation w/ same convergence guarantees

Tracking in time-varying optimization problems, data-driven implementations

approx. dynamics = true dynamics + error

estimates/data in lieu of exact elements to close the loop, avoid expensive centralized computations, or circumvent complex dynamics

online formulations with tracking guarantees & handling of streaming data

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 10 / 38

Primal-Dual Dynamics as Versatile Tool

Characterization of rates, speed, and acceleration
N. K. Dhingra, S. Z. Khong, and M. R. Jovanovic. The proximal augmented Lagrangian method for nonsmooth composite optimization. IEEE Transactions on Automatic Control, 2018. To appear
J. Cortes and S. K. Niederlander. Distributed coordination for nonsmooth convex optimization via saddle-point dynamics. Journal of Nonlinear Science, 2019. To appear
G. Qu and N. Li. On the exponential stability of primal-dual gradient dynamics. 2018. Available online at https://arxiv.org/abs/1803.01825
W. Shi, Q. Ling, G. Wu, and W. Yin. EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015
M. McCreesh, J. Cortes, and B. Gharesifard. Accelerated convergence of saddle-point dynamics for convex-concave quadratic functions. In IEEE Conf. on Decision and Control, Nice, France, December 2019. Submitted

Graceful degradation as a function of size of disturbance
A. Cherukuri, E. Mallada, S. H. Low, and J. Cortes. The role of convexity in saddle-point dynamics: Lyapunov function and robustness. IEEE Transactions on Automatic Control, 63(8):2449–2464, 2018
H. D. Nguyen, T. L. Vu, K. Turitsyn, and J. Slotine. Contraction and robustness of continuous time primal-dual dynamics. IEEE Control Systems Letters, 2(4):755–760, 2018

Feedback-based, data-driven, online formulation, tracking guarantees
A. Bernstein and E. Dall'Anese. Real-time feedback-based optimization of distribution grids: a unified approach. 2017. https://arxiv.org/abs/1711.01627
M. Colombino, E. Dall'Anese, and A. Bernstein. Online optimization as a feedback controller: Stability and tracking. IEEE Transactions on Control of Network Systems, 2018. Submitted
E. Dall'Anese, S. Guggilam, A. Simonetto, Y. C. Chen, and S. V. Dhople. Optimal regulation of virtual power plants. IEEE Transactions on Power Systems, 33(2):1868–1881, 2018

Continuous-time vs discrete-time dynamics, machine learning
W. Su, S. Boyd, and E. J. Candes. A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. Journal of Machine Learning Research, 17:1–43, 2016
A. C. Wilson, B. Recht, and M. I. Jordan. A Lyapunov analysis of momentum methods in optimization. 2018. Available at https://arxiv.org/pdf/1611.02635.pdf

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 11 / 38

Beyond Convergence #1: ISS with Respect to Saddle(F)

Robustness to errors in the gradient computation, noise in state measurements, errors in the controller implementation?

Theorem (Equality-Constrained Optimization)

For f strongly convex and C^2, g = 0, with mI \preceq \nabla^2 f(x) \preceq MI for all x \in \mathbb{R}^n,

\begin{bmatrix} \dot{x} \\ \dot{z} \end{bmatrix} = \begin{bmatrix} -\nabla_x F(x, z) \\ \nabla_z F(x, z) \end{bmatrix} + \begin{bmatrix} u_x \\ u_z \end{bmatrix} = \begin{bmatrix} -\nabla f(x) - A^\top z \\ Ax - b \end{bmatrix} + \begin{bmatrix} u_x \\ u_z \end{bmatrix}

is ISS with respect to Saddle(F)

Proof: V_\beta(x, z) = \beta_1 \tfrac{1}{2}\|(x, z)\|^2_{\mathrm{Saddle}(F)} + \beta_2 V_m(x, z) is an ISS-Lyapunov function

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 12 / 38

Beyond Convergence #1: ISS with Respect to Saddle(F)

Robustness to errors in the gradient computation, noise in state measurements, errors in the controller implementation?

Conjecture (Constrained Optimization)

For f strongly convex and C^2, g convex and C^2, with mI \preceq \nabla^2 f(x) \preceq MI for all x \in \mathbb{R}^n,

\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix} = \begin{bmatrix} -\nabla_x F(x, y, z) \\ [\nabla_y F(x, y, z)]^+_y \\ \nabla_z F(x, y, z) \end{bmatrix} + \begin{bmatrix} u_x \\ u_y \\ u_z \end{bmatrix} = \begin{bmatrix} -\nabla f(x) - y^\top \nabla g(x) - A^\top z \\ [g(x)]^+_y \\ Ax - b \end{bmatrix} + \begin{bmatrix} u_x \\ u_y \\ u_z \end{bmatrix}

is ISS with respect to Saddle(F)

Proof: ISS-Lyapunov function theory for switched systems?

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 12 / 38

Example: Input-to-State Stability

f(x) = \|x\|^2, \quad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}

Saddle(F) = \{ (x, z) \in \mathbb{R}^2 \times \mathbb{R}^3 \mid x = (1, 1),\ z = -(1, 1, 1) + \lambda (1, -1, -1),\ \lambda \in \mathbb{R} \}

f is C^2, strongly convex, \nabla^2 f(x) = 2I

[Figures: saddle-point dynamics trajectories (x_1, x_2, z_1, z_2, z_3 vs. time) and distance to saddle points vs. time, under a vanishing disturbance]

[Figures: saddle-point dynamics trajectories and distance to saddle points vs. time, under a "constant + sinusoid" disturbance]

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 13 / 38
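The following sketch (added here, not from the talk) reproduces the example's data and integrates the perturbed primal-dual dynamics under an arbitrary bounded "constant + sinusoid" disturbance; the particular disturbance signal behind the slide's figures is not specified here, so the one below is only illustrative. The distance is measured to the saddle set, which is a line in the dual variables.

```python
import numpy as np

# Setup from the example: f(x) = ||x||^2, equality constraints A x = b.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 1.0, 1.0])

def dist_to_saddle(x, z):
    """Distance to Saddle(F) = {x = (1,1), z = -(1,1,1) + lam*(1,-1,-1)}."""
    v = np.array([1.0, -1.0, -1.0]) / np.sqrt(3.0)
    z0 = np.array([-1.0, -1.0, -1.0])
    z_proj = z0 + v * (v @ (z - z0))          # closest point on the line of dual optima
    return np.sqrt(np.sum((x - np.ones(2))**2) + np.sum((z - z_proj)**2))

def perturbed_step(x, z, t, dt, disturbance):
    ux, uz = disturbance(t)
    dx = -(2.0 * x + A.T @ z) + ux            # -grad f(x) - A^T z + u_x
    dz = (A @ x - b) + uz                     #  A x - b         + u_z
    return x + dt * dx, z + dt * dz

# Illustrative bounded "constant + sinusoid" disturbance (an assumption, not the slide's signal).
dist_fun = lambda t: (0.2 + 0.2 * np.sin(t) * np.ones(2), 0.1 * np.ones(3))

x, z, dt = np.zeros(2), np.zeros(3), 1e-3
for k in range(50000):
    x, z = perturbed_step(x, z, k * dt, dt, dist_fun)
print(dist_to_saddle(x, z))   # stays bounded, commensurate with the disturbance size
```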


Beyond Convergence #2: Real-Time Implementation

Opportunistic state-triggered implementation

avoid continuous evaluation of the vector field

adjust stepsize opportunistically based on the state of the system

Given a sequence of triggering time instants \{t_k\}_{k=0}^{\infty},

\dot{x}(t) = -\nabla_x F(x(t_k), z(t_k))

\dot{z}(t) = \nabla_z F(x(t_k), z(t_k))

for t \in [t_k, t_{k+1}) and k \in \mathbb{Z}_{\ge 0}

Objective: Design a criterion to opportunistically select \{t_k\}_{k=0}^{\infty} such that

feasible executions: inter-trigger times lower bounded by a positive quantity

global asymptotic convergence is retained

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 14 / 38



Resource-Aware Control and Coordination

Continuous or periodic implementation paradigm

costly-to-implement synchronization for information sharing, processing, decision making

‘passive’ asynchronism, fixed agent time schedules

inefficient implementations for processor usage, communication bandwidth, energy

Opportunistic state-triggered paradigm

trade-offs: comp, comm, sensing, control

identify criteria to autonomously trigger actions based on task – ‘active’ asynchronism

efficient implementations, incorporates uncertainty

[Diagrams: agent timelines with the associated certificates, contrasting continuous/periodic execution with opportunistically triggered actions]

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 15 / 38


A Picture (A Movie) is Worth a Thousand Words

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 16 / 38

How to Decide When to Update?

Simplified setup: system \dot{x} = f(x, u) on \mathbb{R}^n with stabilization via

controller: k : \mathbb{R}^n \to \mathbb{R}^m

certificate: Lyapunov function V : \mathbb{R}^n \to \mathbb{R}

Synthesis for \dot{x} = f(x, k(\hat{x})), with \hat{x} a sampled version of x?

\dot{V} = \nabla V(x) \cdot f(x, k(\hat{x})) = \nabla V(x) \cdot f(x, k(x)) + \nabla V(x) \cdot \big( f(x, k(\hat{x})) - f(x, k(x)) \big)

\dot{V} \le \nabla V(x) \cdot f(x, k(x)) + h(x)\, \|\hat{x} - x\|

Trigger criterion: \|\hat{x} - x\| \le \dfrac{-L_{f(x, k(x))} V(x)}{h(x)}

Insights

feasibility: guaranteed monotonic decrease of the function, but aperiodic executions might not be feasible (accumulation of triggering times, Zeno)

certificate: a LaSalle function is clearly not good enough

trigger: specific challenges for network systems, both in design (local triggers) and analysis (asynchronism, Zeno)

trigger: stabilization versus optimization

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 17 / 38
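As a toy illustration of the trigger logic (a sketch under simplifying assumptions, not the talk's design): for ẋ = u with k(x) = −x and V(x) = x²/2, one has V̇ = −x·x̂ ≤ −x² + |x|·|x̂ − x|, so holding the sampled control until |x̂ − x| exceeds σ|x|, with σ ∈ (0, 1), keeps V̇ negative between updates.

```python
import numpy as np

# Toy event-triggered stabilization of xdot = u with k(x) = -x and V(x) = x^2 / 2.
# Between updates the input is held at u = -xhat. Along trajectories,
#   Vdot = x * (-xhat) <= -x^2 + |x| * |xhat - x|,
# so keeping |xhat - x| <= sigma * |x| (0 < sigma < 1) preserves Vdot < 0.
def simulate(x0=2.0, sigma=0.5, dt=1e-3, steps=8000):
    x, xhat = x0, x0
    updates = 0
    for _ in range(steps):
        if abs(xhat - x) > sigma * abs(x):   # trigger: sampled state too stale
            xhat = x
            updates += 1
        x += dt * (-xhat)                    # sample-and-hold closed loop
    return x, updates

x_final, updates = simulate()
print(x_final, updates)   # x -> 0 with far fewer controller updates than time steps
```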


State-Triggered Implementation for Primal-Dual Dynamics

Approach: Use V_\beta to design a criterion to opportunistically select \{t_k\}_{k=0}^{\infty}

Given a sequence of triggering time instants \{t_k\}_{k=0}^{\infty},

\dot{x}(t) = -\nabla_x F(x(t_k), z(t_k))

\dot{z}(t) = \nabla_z F(x(t_k), z(t_k))

for t \in [t_k, t_{k+1}) and k \in \mathbb{Z}_{\ge 0}

State-triggered criterion

t_{k+1} = t_k - \dfrac{L_{X_{sp}} V_\beta(x(t_k), z(t_k))}{\xi(x(t_k), z(t_k))\, \|X_{sp}(x(t_k), z(t_k))\|^2}

Next triggering time computable with information available at current one


Theorem: For f strongly convex and C^2, with mI \preceq \nabla^2 f(x) \preceq MI, x \mapsto \nabla^2 f(x) Lipschitz, and A full row rank,

trajectories of self-triggered dynamics converge to (x∗, z∗)

inter-event times are lower bounded by positive quantity

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 18 / 38

Example: Self-Triggered Implementation

F(x, z) = \|x\|^2 + z\,(x_1 + x_2 + x_3 - 1)

f(x) = \|x\|^2 satisfies the hypotheses, A = [1, 1, 1] is full row rank

Saddle(F) = \{ ((\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}), -\tfrac{2}{3}) \}

[Figures: primal-dual dynamics trajectories (x_1, x_2, x_3, z vs. time) and the evolution of V_\beta]

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 19 / 38
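A rough sample-and-hold sketch of this example (added here, not the talk's algorithm): the vector field is held between triggers and re-sampled once it has drifted by a fixed fraction of the held value. This deviation-based rule is a generic stand-in for the V_β-based criterion above; ρ, the step size, and the horizon are arbitrary choices.

```python
import numpy as np

# Example from the slide: F(x, z) = ||x||^2 + z * (x1 + x2 + x3 - 1).
# Sample-and-hold primal-dual flow with a simple deviation-based trigger
# (a generic rule for illustration; the talk uses a V_beta-based criterion).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

def field(x, z):
    return -(2.0 * x + A.T @ z), A @ x - b     # (-grad_x F, grad_z F)

def self_triggered_run(x, z, dt=1e-3, steps=30000, rho=0.2):
    held = field(x, z)
    events = 0
    for _ in range(steps):
        fx, fz = field(x, z)
        drift = np.linalg.norm(np.concatenate((fx - held[0], fz - held[1])))
        scale = np.linalg.norm(np.concatenate(held))
        if drift > rho * scale + 1e-9:         # vector field held too long: re-sample
            held = (fx, fz)
            events += 1
        x, z = x + dt * held[0], z + dt * held[1]
    return x, z, events

x, z, events = self_triggered_run(np.zeros(3), np.zeros(1))
print(x, z, events)   # x -> (1/3, 1/3, 1/3), z -> -2/3, with far fewer re-samplings than steps
```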

Beyond Convergence #3: Tracking in Time-Varying Optimization

Time-varying optimization problems

minimize   f_t(x)

subject to g_t(x) \le 0

           h_t(x) = 0

ISS characterization leads to tracking guarantees

\limsup_{\tau \to \infty} \|(x(\tau), y(\tau), z(\tau)) - (x^*(t), y^*(t), z^*(t))\| \le \gamma(c\,\delta)

where \gamma is a class \mathcal{K} function,

\delta is an upper bound on the rate of change of the saddle points, \| \tfrac{d}{dt}(x^*(t), y^*(t), z^*(t)) \| \le \delta, and

c is the time scale of the primal-dual dynamics, \tfrac{d}{d\tau} = c\, \tfrac{d}{dt}

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 20 / 38
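A scalar sketch of the tracking idea (illustrative only, not from the talk): for f_t(x) = (x − r(t))² with a slowly varying r(t), running the gradient flow at time scale c yields an asymptotic tracking error set by how fast the optimizer moves relative to the speed of the flow (here roughly ṙ/(2c)).

```python
import numpy as np

# Tracking a time-varying optimizer: f_t(x) = (x - r(t))^2 with r(t) = sin(0.2 t),
# integrated as  dx/dt = -c * grad f_t(x)  (c = time scale of the flow).
def tracking_error(c, dt=1e-3, T=100.0):
    x, worst, t = 0.0, 0.0, 0.0
    while t < T:
        r = np.sin(0.2 * t)
        x += dt * (-c * 2.0 * (x - r))
        t += dt
        if t > T / 2:                      # ignore the initial transient
            worst = max(worst, abs(x - r))
    return worst

for c in (1.0, 10.0, 100.0):
    print(c, tracking_error(c))            # in this toy case the error shrinks roughly like 0.1 / c
```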



Dynamic Traffic Assignment

Dynamic traffic assignment problem tunes traffic flows to optimize total travel distance, total travel time given time-varying inflows to network

Cell transmission model

x(t) – traffic volume in cells at time t

f(t) – flows between cells at time t

λ(t) – inflow to the network

µ(t) – outflow from the network

supply & demand functions of infrastructure

Distributed primal-dual dynamics to solve MPC formulation

Two groups of 5 cars enter network at time 1 (red) and 3 (blue)

Capacity in cell 4 drops to zero at time 5 due to an accident

[Figure: distance to optimum vs. primal-dual iterations, for MPC iterations 1 through 6]

M. Vaquero and J. Cortes. Distributed augmentation-regularization for robust online convex optimization. In IFAC Workshop on Distributed Estimation and Control in Networked Systems, pages 230–235, Groningen, The Netherlands, 2018

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 21 / 38

Beyond Convergence #4: Coupled w/ Network Dynamics

Select optimized setpoints x (power injection of distributed energy resources, traffic flows among contiguous arterial roads)

minimize   f(x, y(x))

subject to g(x, y(x)) \le 0

           h(x, y(x)) = 0

that drive the physical variables y(x) (bus frequencies/voltages, traffic density) governed by the dynamics

\dot{\xi} = \Phi(\xi, x)

y = \Psi(\xi, d)

Implementation of primal-dual dynamics requires evaluation of y(x) and \tfrac{\partial y}{\partial x}(x)

substitute by data / high-fidelity simulation / estimates ⇒ approximate primal-dual dynamics

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 22 / 38
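A minimal sketch of the "data in lieu of exact elements" idea (added here, not the talk's algorithm): the physical map y(x) is never evaluated analytically; the loop instead uses measured outputs and an estimated sensitivity matrix in the chain rule. The linear "physics", the objective, and all names below are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "physics": the network maps setpoints x to measured outputs y(x) = C x + d.
# The optimizer never uses C or d directly: it closes the loop with measurements
# and an estimated sensitivity matrix C_hat (e.g., identified from data).
C_true = np.array([[1.0, 0.5], [0.2, 1.0]])
d_true = np.array([0.3, -0.1])
C_hat = C_true + 0.05 * rng.standard_normal((2, 2))   # imperfect sensitivity estimate

def measure(x):                     # stands in for sensing the physical network
    return C_true @ x + d_true + 0.01 * rng.standard_normal(2)

y_ref = np.array([1.0, 0.5])

# Objective: f(x, y) = ||x||^2 + 10 * ||y - y_ref||^2, minimized over setpoints x.
x = np.zeros(2)
for _ in range(5000):
    y = measure(x)                                    # data in lieu of evaluating y(x)
    grad = 2.0 * x + C_hat.T @ (20.0 * (y - y_ref))   # chain rule with estimated dy/dx
    x -= 1e-3 * grad
print(x, measure(x))   # settles near a minimizer despite model error and measurement noise
```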

Traffic Intersection Control with the Simulation of Urban MObility (SUMO) simulator

Stages

Controlled traffic light (green-red stage times)

Total cycle time = 160 sec, s_i = time of stage i

Flows = 10 cars/cycle, but [the flows highlighted in the figure] = 100 cars/cycle

F(s) is the number of cars out, given s and the flows

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 23 / 38


Data-driven Optimization of Stage Times

SUMO is a complex agent-based simulator that incorporates

network characteristics: road topology, traffic lights, sidewalks

number of vehicles entering network at any time and any lane

destination of vehicles: route, turning ratios, origin/destination

type of vehicle: car, truck, bus, van, bicycle, pedestrian (w/ length, width, acceleration, max speed)

driver’s behavior: intersection model, lane changing model, car following model

minimize   -x

subject to x = F_t(s)

           \sum_{i=1}^{8} s_i = 160

           s_i \ge 5

[Videos: data-driven optimized strategy vs. constant, equal stage times; and vs. adaptive signal control (Webster)]

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 24 / 38
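A sketch of the simulator-in-the-loop idea (added here, not the actual SUMO setup): F_t(s) is treated as a black box, its gradient is estimated by finite differences, and each ascent step is followed by a projection that keeps Σ s_i = 160 and s_i ≥ 5. The throughput() stand-in, the projection routine, and the step sizes are all illustrative assumptions.

```python
import numpy as np

# Sketch of simulator-in-the-loop tuning of stage times s (sum = 160, each >= 5).
# throughput(s) stands in for the SUMO evaluation F_t(s); it is NOT the real model.
def throughput(s):
    demand = np.array([10, 10, 100, 10, 10, 100, 10, 10], dtype=float)
    return float(np.sum(demand * (1.0 - np.exp(-s / 40.0))))   # cars served per cycle

def project(s, total=160.0, smin=5.0):
    """Approximate projection onto {sum(s) = total, s >= smin} by iterative clipping."""
    for _ in range(50):
        s = np.maximum(s, smin)
        free = s > smin
        s[free] += (total - s.sum()) / max(free.sum(), 1)
    return s

s = np.full(8, 160.0 / 8)                                      # start from equal stage times
for _ in range(200):
    grad = np.array([(throughput(s + 2.0 * e) - throughput(s - 2.0 * e)) / 4.0
                     for e in np.eye(8)])                      # finite-difference gradient estimate
    s = project(s + 5.0 * grad)                                # ascent step on cars served
print(np.round(s, 1), throughput(s))   # long stages go to the two high-demand movements of this stand-in
```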



Taking the Basic Taxonomy Further

minimize ‘aggregate objective function’(x)

subject to ‘ineq constraints’(x)

‘eq constraints’(x)

Hedging against uncertainty via data-driven distributionally-robust optimization
A. Cherukuri and J. Cortes. Cooperative data-driven distributionally robust optimization. IEEE Transactions on Automatic Control, 2018. Submitted

Protecting privacy of individual information via differential privacy
E. Nozari, P. Tallapragada, and J. Cortes. Differentially private distributed convex optimization via functional perturbation. IEEE Transactions on Control of Network Systems, 5(1):395–408, 2018

Multi-layer optimization: competition + coordination in power systems
A. Cherukuri and J. Cortes. Iterative bidding in electricity markets: rationality and robustness. IEEE Transactions on Network Science and Engineering, 2018. Submitted
P. Srivastava, C.-Y. Chang, and J. Cortes. Participation of microgrids in frequency regulation markets. In American Control Conference, pages 3834–3839, Milwaukee, WI, May 2018

Dealing with non-sparse constraints through data
C.-Y. Chang, M. Colombino, J. Cortes, and E. Dall'Anese. Saddle-flow dynamics for distributed feedback-based optimization. IEEE Control Systems Letters, 2019. Submitted

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 25 / 38

Hedging Against Uncertainty

Stochastic optimization works well when distribution is known and datasets are large

Risky with "uncertainty about uncertainty"

[Image: adversarial example from Goodfellow, Shlens, and Szegedy, ICLR 2015 – "panda" (57.7% confidence) plus an imperceptible sign-gradient perturbation is classified as "gibbon" (99.3% confidence)]

[Goodfellow, Shlens, Szegedy '15]

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 26 / 38


Many scenarios where decisions need to be made before large datasets can be collected

deadlines imposed by performance or safety considerations

timescale of system’s evolution faster than speed at which data can be collected

acquiring samples is expensive: adversary purposely hides in environment

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 26 / 38

Page 55: Distributed Solvers for Online Data-Driven Network Optimization

Hedging Against Uncertainty

Stochastic optimization works well when distribution is known and datasets are large

Risky with “uncertainty about uncertainty”

[Slide graphic: same adversarial-example figure (Goodfellow, Shlens & Szegedy, ICLR 2015) as on the previous slide]

[Goodfellow, Shlens, Szegedy ’15]

Traditional robust optimization might be overly conservative

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 26 / 38

Page 56: Distributed Solvers for Online Data-Driven Network Optimization

Hedging Against Uncertainty

Stochastic optimization works well when distribution is known and datasets are large

Risky with “uncertainty about uncertainty”

[Slide graphic: same adversarial-example figure (Goodfellow, Shlens & Szegedy, ICLR 2015) as on the previous slide]

[Goodfellow, Shlens, Szegedy ’15]

The best of both worlds: distributionally robust optimization

empirical samples + ‘what-if’ scenarios

Attractive b/c of formal guarantees valid for small datasets

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 26 / 38


Page 61: Distributed Solvers for Online Data-Driven Network Optimization

Network Optimization under Uncertainty

Stochastic optimization: inf EP[f (x , ξ)]x∈X ⊂Rd

objective f : R^d × R^m → R encodes network goal —total travel time, total distance covered, negative network outflow

decision variable x at central/aggregate/local level —max velocity, inflows at control nodes, routing at intersections

random variable ξ distributed w/ unknown probability P —inflows/outflows at non-control nodes, road densities, vehicle locations

Given N i.i.d. samples {ξ̂^k}_{k=1}^N, the discrete empirical probability distribution is

P_N = (1/N) ∑_{k=1}^N δ_{ξ̂^k},        E_{P_N}[f(x, ξ)] = (1/N) ∑_{k=1}^N f(x, ξ̂^k)

Sample-average approximation has

almost sure convergence guarantee as N →∞

poor out-of-sample performance for small N
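To make the sample-average approximation concrete, here is a minimal Python sketch (not from the talk): the loss f and the sampling distribution are illustrative placeholders, and E_{P_N}[f(x, ξ)] is simply the average of f over the drawn samples.

```python
import numpy as np

def f(x, xi):
    # illustrative quadratic loss; stand-in for the network objective f(x, xi)
    return (x @ xi - 1.0) ** 2

rng = np.random.default_rng(0)
d, m, N = 3, 3, 50

x = np.ones(d) / d                    # a fixed decision
samples = rng.normal(size=(N, m))     # xi_hat^1, ..., xi_hat^N (P assumed standard normal here)

# the empirical distribution P_N places mass 1/N on each sample,
# so E_{P_N}[f(x, xi)] is the sample average of the loss
saa_value = np.mean([f(x, xi_k) for xi_k in samples])
print("sample-average approximation of E_P[f(x, xi)]:", saa_value)
```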

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 27 / 38


Page 63: Distributed Solvers for Online Data-Driven Network Optimization

Distributionally Robust Optimization

Account for ignorance of true data-generating distribution P

Distributionally robust (DRO) formulation to hedge against uncertainty

inf_{x∈X} sup_{Q∈P̂_N} E_Q[f(x, ξ)]

P̂_N is an ambiguity set of probability distributions
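A toy sketch of the inf–sup structure above, using a finite set of sample reweightings as a stand-in ambiguity set (the Wasserstein ball introduced later is richer); all numbers and the loss are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
samples = rng.normal(loc=1.0, scale=0.5, size=N)    # scalar samples xi_hat^k

def f(x, xi):
    return (x - xi) ** 2                             # illustrative loss

# toy ambiguity set: a few candidate reweightings Q of the empirical samples
uniform = np.full(N, 1.0 / N)
tilted_up = np.exp(0.5 * samples); tilted_up /= tilted_up.sum()
tilted_down = np.exp(-0.5 * samples); tilted_down /= tilted_down.sum()
ambiguity_set = [uniform, tilted_up, tilted_down]

def worst_case(x):
    # sup over Q in the (finite) ambiguity set of E_Q[f(x, xi)]
    return max(weights @ f(x, samples) for weights in ambiguity_set)

xs = np.linspace(-1.0, 3.0, 201)
x_dro = xs[np.argmin([worst_case(x) for x in xs])]
print("distributionally robust decision (toy ambiguity set):", x_dro)
```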

Desirable properties in ambiguity set

rich enough to contain true data-generating distribution with high confidence

small enough to exclude pathological distributions (avoid conservativeness)

easy to parameterize from data

facilitate tractable solution of optimization problem

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 28 / 38

Page 64: Distributed Solvers for Online Data-Driven Network Optimization

Ambiguity Sets Via Wasserstein Metric

Wasserstein metric: cost of optimal transportation plan of probability mass

d_{W_2}(Q_1, Q_2) = ( inf { ∫_{Ξ²} ‖ξ_1 − ξ_2‖² Π(dξ_1, dξ_2)  |  Π ∈ H(Q_1, Q_2) } )^{1/2}

—H(Q1, Q2) is set of distributions w/ marginals Q1 and Q2

[Slide graphic: optimal transport of probability mass from Q_1 to Q_2]

Π is transportation plan for moving mass distribution

norm ‖·‖ encodes transportation cost
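For discrete distributions, the infimum over transport plans Π is a finite linear program. A minimal sketch with scipy.optimize.linprog, assuming made-up support points and weights for Q_1 and Q_2:

```python
import numpy as np
from scipy.optimize import linprog

# supports and weights of two discrete distributions Q1, Q2 (illustrative)
xi1 = np.array([[0.0], [1.0], [2.0]]);  p = np.array([0.5, 0.3, 0.2])
xi2 = np.array([[0.5], [1.5]]);         q = np.array([0.4, 0.6])
n1, n2 = len(p), len(q)

# transportation cost: squared Euclidean distance between support points
C = np.array([[np.sum((a - b) ** 2) for b in xi2] for a in xi1])

# decision variable: transport plan Pi (n1 x n2), flattened row-wise
A_eq, b_eq = [], []
for i in range(n1):                       # row sums equal marginals of Q1
    row = np.zeros(n1 * n2); row[i * n2:(i + 1) * n2] = 1.0
    A_eq.append(row); b_eq.append(p[i])
for j in range(n2):                       # column sums equal marginals of Q2
    col = np.zeros(n1 * n2); col[j::n2] = 1.0
    A_eq.append(col); b_eq.append(q[j])

res = linprog(c=C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * (n1 * n2), method="highs")
print("W2 distance:", np.sqrt(res.fun))
```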

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 29 / 38


Page 67: Distributed Solvers for Online Data-Driven Network Optimization

Data-Driven DRO

Ambiguity set via Wasserstein metric

P̂_N = B_{ε_N(β)}(P_N)

Then, P^N(P ∈ P̂_N) ≥ 1 − β

Out-of-sample guarantee & tractability [Esfahani & Kuhn ’16]

Let Ĵ_N, x̂_N be the optimal value and an optimizer of the DRO problem. Then

P^N( E_P[f(x̂_N, ξ)] ≤ Ĵ_N ) ≥ 1 − β

Additionally, if x ↦ f(x, ξ) is convex ∀ξ ∈ Ξ, then Ĵ_N equals the value of the convex optimization

inf_{λ≥0, x∈X}  { λ ε_N(β)² + (1/N) ∑_{k=1}^N max_{ξ∈R^m} [ f(x, ξ) − λ‖ξ − ξ̂^k‖² ] }
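As a sketch of how this reformulation can be evaluated, take the quadratic loss f(x, ξ) = (x^⊤(w; 1) − y)² with ξ = (w, y) used in the illustration later in the talk. A short calculation (not on the slides) shows that, for λ > ‖c‖² with c = (x_{1:4}; −1), the inner maximization has the closed form max_ξ [ f(x, ξ) − λ‖ξ − ξ̂^k‖² ] = λ r_k²/(λ − ‖c‖²), where r_k = x^⊤(ŵ^k; 1) − ŷ^k is the residual at sample k. The code below uses this to evaluate the objective on a grid of λ with synthetic data; it is an illustration, not the talk's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40
W = rng.normal(size=(N, 4))
y = W @ np.array([1.0, 4.0, 3.0, 2.0]) + rng.uniform(-1, 1, size=N)

def dro_objective(x, lam, eps):
    # closed-form inner maximization for f(x, xi) = (x^T (w; 1) - y)^2,
    # valid only when lam > ||c||^2 with c = (x[:-1], -1)
    c_sq = np.dot(x[:-1], x[:-1]) + 1.0
    if lam <= c_sq:
        return np.inf                       # inner max is unbounded
    residuals = W @ x[:-1] + x[-1] - y
    inner = lam * residuals ** 2 / (lam - c_sq)
    return lam * eps ** 2 + np.mean(inner)

x = np.array([1.0, 4.0, 3.0, 2.0, 0.0])     # candidate predictor (true weights, zero offset)
eps = 0.1
lams = np.linspace(1.0, 200.0, 400)
values = [dro_objective(x, lam, eps) for lam in lams]
best = int(np.argmin(values))
print("best lambda on grid:", lams[best], "objective:", values[best])
```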

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 30 / 38

Page 68: Distributed Solvers for Online Data-Driven Network Optimization

Network Optimization Problem

Cooperative network of n agents collecting data, communicating w/ neighbors, no central coordinator

communication modeled by graph (V, E)

each agent knows objective function f, constraint set X

each agent gathers i.i.d. samples Ξ̂_i (Ξ̂_i ∩ Ξ̂_j = ∅ for i ≠ j)

data available across network: ∪_{i=1}^n Ξ̂_i = {ξ̂^k}_{k=1}^N

[Slide graphic: network of agents, each holding two local samples ({ξ̂^1, ξ̂^2}, …, {ξ̂^9, ξ̂^10}), cooperating on the common loss f(x, ξ)]

Leverage power-of-many to collectively improve out-of-sample guarantee

individual agents can solve DRO with own data, but

can benefit from others’ contributions to obtain higher-quality solution

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 31 / 38


Page 71: Distributed Solvers for Online Data-Driven Network Optimization

Distributed Reformulation of Data-Driven DRO

Each agent i with own estimate x^i (and λ^i) of optimal solution

min_{x_v, λ_v ≥ 0_n}   ε_N(β)² (1_n^⊤ λ_v)/n + (1/N) ∑_{k=1}^N max_{ξ∈R^m} [ f(x_{v_k}, ξ) − λ_{v_k} ‖ξ − ξ̂^k‖² ]

(?)   subject to   L λ_v = 0_n  (agreement on λ^i's)    and    (L ⊗ I_d) x_v = 0_{nd}  (agreement on x^i's)

(Here x_v = (x^1; . . . ; x^n), λ_v = (λ^1; . . . ; λ^n))

Problems are equivalent (w/ f convex in x)

Given x̂_N, there exists λ* ≥ 0 s.t. (1_n ⊗ x̂_N, λ*1_n) is an optimizer of (?)

If (x*_v, λ*_v) is an optimizer of (?), then x*_v = 1_n ⊗ x̂_N

Same optimal value Ĵ_N

Optimization (?) has separable objective & locally computable constraints!
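The agreement constraints only involve the graph Laplacian, so each agent can evaluate its own block from neighbor information. A minimal numpy sketch, assuming a ring graph with unit weights (connected, so the constraints hold iff all agents hold identical copies):

```python
import numpy as np

n, d = 5, 3                                  # 5 agents, decision dimension 3

# Laplacian of a ring graph with unit edge weights
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

x_hat = np.array([1.0, -2.0, 0.5])           # some common decision
x_agree = np.tile(x_hat, n)                  # x_v = 1_n (kron) x_hat
x_disagree = x_agree.copy(); x_disagree[:d] += 0.1

lam_agree = 2.0 * np.ones(n)

# (L kron I_d) x_v = 0 and L lambda_v = 0 hold exactly when all agents agree
print(np.allclose(np.kron(L, np.eye(d)) @ x_agree, 0))      # True
print(np.allclose(np.kron(L, np.eye(d)) @ x_disagree, 0))   # False
print(np.allclose(L @ lam_agree, 0))                         # True
```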

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 32 / 38


Page 74: Distributed Solvers for Online Data-Driven Network Optimization

Modified Lagrangian

Getting rid of inner maximization in Lagrangian

Lagrangian:   L(x_v, λ_v, ν, η) := ε_N(β)² (1_n^⊤ λ_v)/n + (1/N) ∑_{k=1}^N max_{ξ∈R^m} [ f(x_{v_k}, ξ) − λ_{v_k} ‖ξ − ξ̂^k‖² ] + ν^⊤ L λ_v + η^⊤ (L ⊗ I_d) x_v

Augmented Lagrangian: (for better convergence properties)

L_aug(x_v, λ_v, ν, η) := L(x_v, λ_v, ν, η) + (1/2) x_v^⊤ (L ⊗ I_d) x_v + (1/2) λ_v^⊤ L λ_v

                       = max_{{ξ^(k)}} L̃aug(x_v, λ_v, ν, η, {ξ^(k)})

L̃aug(x_v, λ_v, ν, η, {ξ^(k)}) := ε_N(β)² (1_n^⊤ λ_v)/n + (1/N) ∑_{k=1}^N [ f(x_{v_k}, ξ^(k)) − λ_{v_k} ‖ξ^(k) − ξ̂^k‖² ] + ν^⊤ L λ_v + η^⊤ (L ⊗ I_d) x_v + (1/2) x_v^⊤ (L ⊗ I_d) x_v + (1/2) λ_v^⊤ L λ_v

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 33 / 38

Page 75: Distributed Solvers for Online Data-Driven Network Optimization

Modified Lagrangian

Saddle points of L_aug exist, implying

min_{x_v, λ_v ≥ 0_n} max_{ν, η} L_aug(x_v, λ_v, ν, η) = max_{ν, η} min_{x_v, λ_v ≥ 0_n} L_aug(x_v, λ_v, ν, η)

Correspondence between optima and saddle points

1 If (x*_v, λ*_v, ν*, η*) is a saddle point of L, then ∃ {(ξ*)^(k)} such that (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n

2 If (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n, then (x*_v, λ*_v, ν*, η*) is a saddle point of L

L̃aug is convex-concave in ((x_v, λ_v), (ν, η, {ξ^(k)})) over the domain λ_v ≥ 0_n

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 34 / 38


Page 77: Distributed Solvers for Online Data-Driven Network Optimization

Modified Lagrangian

Saddle points of L_aug exist, implying

min_{x_v, λ_v ≥ 0_n} max_{ν, η} max_{{ξ^(k)}} L̃aug(·) = max_{ν, η} min_{x_v, λ_v ≥ 0_n} max_{{ξ^(k)}} L̃aug(·)

Correspondence between optima and saddle points

1 If (x*_v, λ*_v, ν*, η*) is a saddle point of L, then ∃ {(ξ*)^(k)} such that (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n

2 If (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n, then (x*_v, λ*_v, ν*, η*) is a saddle point of L

L̃aug is convex-concave in ((x_v, λ_v), (ν, η, {ξ^(k)})) over the domain λ_v ≥ 0_n

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 34 / 38


Page 79: Distributed Solvers for Online Data-Driven Network Optimization

Modified Lagrangian

Saddle points of L_aug exist, implying

min_{x_v, λ_v ≥ 0_n} max_{ν, η, {ξ^(k)}} L̃aug(·) = max_{ν, η} min_{x_v, λ_v ≥ 0_n} max_{{ξ^(k)}} L̃aug(·)

assuming the min-max operator on the right can be interchanged – requires formal proof

Correspondence between optima and saddle points

1 If (x*_v, λ*_v, ν*, η*) is a saddle point of L, then ∃ {(ξ*)^(k)} such that (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n

2 If (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n, then (x*_v, λ*_v, ν*, η*) is a saddle point of L

L̃aug is convex-concave in ((x_v, λ_v), (ν, η, {ξ^(k)})) over the domain λ_v ≥ 0_n

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 34 / 38


Page 81: Distributed Solvers for Online Data-Driven Network Optimization

Modified Lagrangian

Saddle points of L_aug exist, implying

min_{x_v, λ_v ≥ 0_n} max_{ν, η, {ξ^(k)}} L̃aug(·) = max_{ν, η, {ξ^(k)}} min_{x_v, λ_v ≥ 0_n} L̃aug(·)

Correspondence between optima and saddle points

1 If (x*_v, λ*_v, ν*, η*) is a saddle point of L, then ∃ {(ξ*)^(k)} such that (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n

2 If (x*_v, λ*_v, ν*, η*, {(ξ*)^(k)}) is a saddle point of L̃aug over λ_v ≥ 0_n, then (x*_v, λ*_v, ν*, η*) is a saddle point of L

L̃aug is convex-concave in ((x_v, λ_v), (ν, η, {ξ^(k)})) over the domain λ_v ≥ 0_n

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 34 / 38

Page 82: Distributed Solvers for Online Data-Driven Network Optimization

When Can Max-Min Operator Be Interchanged?

Theorem. Assume f satisfies a technical condition on its directions of recession. The max-min operator can be interchanged under either of the following:

1 convex-concave objective function f

2 convex-convex objective function f that is either

quadratic in ξ:  f(x, ξ) = ξ^⊤ Q ξ + x^⊤ R ξ + ℓ(x)

or a least-squares problem (w/ d = m):  f(x, ξ) = a (ξ_m − (ξ_{1:m−1}; 1)^⊤ x)²

In either case, L̃aug is convex-concave in the variables ((x_v, λ_v), {ξ^(k)})
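For the quadratic-in-ξ case, the inner maximization of f(x, ξ) − λ‖ξ − ξ̂‖² over ξ is bounded whenever λI − Q ≻ 0, and a stationarity calculation (not shown on the slide, assuming Q symmetric) gives the maximizer ξ* = (λI − Q)^{-1}(λ ξ̂ + R^⊤x/2). A small numerical check with made-up problem data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 3, 2
Q = np.array([[0.5, 0.1, 0.0], [0.1, 0.3, 0.0], [0.0, 0.0, 0.2]])   # symmetric
R = rng.normal(size=(d, m))
x = rng.normal(size=d)
xi_hat = rng.normal(size=m)
lam = 2.0                                    # lam * I - Q must be positive definite

def h(xi):
    # f(x, xi) - lam * ||xi - xi_hat||^2 with f quadratic in xi (ell(x) omitted: constant in xi)
    return xi @ Q @ xi + x @ R @ xi - lam * np.sum((xi - xi_hat) ** 2)

assert np.all(np.linalg.eigvalsh(lam * np.eye(m) - Q) > 0)
xi_star = np.linalg.solve(lam * np.eye(m) - Q, lam * xi_hat + 0.5 * R.T @ x)

# the closed-form stationary point should beat nearby perturbations
for _ in range(5):
    assert h(xi_star) >= h(xi_star + 0.01 * rng.normal(size=m))
print("worst-case xi:", xi_star, "value:", h(xi_star))
```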

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 35 / 38

Page 83: Distributed Solvers for Online Data-Driven Network Optimization

Distributed Algorithm for Network Optimization

Primal-dual dynamics for L̃aug is distributed

dx_v/dt = − Pr_X( ∇_{x_v} L̃aug(x_v, λ_v, ν, η, {ξ^(k)}) )

dλ_v/dt = [ −∇_{λ_v} L̃aug(x_v, λ_v, ν, η, {ξ^(k)}) ]⁺_{λ_v}

dν/dt = ∇_ν L̃aug(x_v, λ_v, ν, η, {ξ^(k)})

dη/dt = ∇_η L̃aug(x_v, λ_v, ν, η, {ξ^(k)})

dξ^(k)/dt = ∇_{ξ^(k)} L̃aug(x_v, λ_v, ν, η, {ξ^(k)}),   ∀k ∈ {1, . . . , N}

L̃aug not necessarily strictly convex in (x_v, λ_v), not linear in {ξ^(k)}
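The pattern of these flows — gradient descent in the minimizing variables, gradient ascent in the maximizing ones — can be exercised on a toy equality-constrained problem. A forward-Euler sketch for min_x ½‖x‖² s.t. Ax = b with Lagrangian L(x, ν) = ½‖x‖² + ν^⊤(Ax − b); the problem data and step size are assumptions for illustration only:

```python
import numpy as np

A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = np.zeros(2)
nu = np.zeros(1)
step = 0.05

for _ in range(2000):
    grad_x = x + A.T @ nu          # descent direction in the primal variable
    grad_nu = A @ x - b            # ascent direction in the dual variable
    x, nu = x - step * grad_x, nu + step * grad_nu

print("x ->", x, "(saddle point has x = (0.5, 0.5))")
```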

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 36 / 38

Page 84: Distributed Solvers for Online Data-Driven Network Optimization

Distributed Algorithm for Network Optimization

Primal-dual dynamics for L̃aug is distributed (K_i: indices of agent i's samples, N_i: neighbors of agent i):

dx^i/dt = −( (1/N) ∑_{k∈K_i} ∇_{x^i} g_k(x^i, λ^i, ξ^(k)) + ∑_{j∈N_i} (η^i − η^j) + ∑_{j∈N_i} (x^i − x^j) )

dλ^i/dt = [ −( ε_N(β)²/n + (1/N) ∑_{k∈K_i} ∇_{λ^i} g_k(x^i, λ^i, ξ^(k)) + ∑_{j∈N_i} (ν^i − ν^j) + ∑_{j∈N_i} (λ^i − λ^j) ) ]⁺_{λ^i}

dν^i/dt = ∑_{j∈N_i} a_ij (λ^i − λ^j)

dη^i/dt = ∑_{j∈N_i} a_ij (x^i − x^j)

dξ^(k)/dt = (1/N) ∇_ξ g_k(x^i, λ^i, ξ^(k)),   ∀k ∈ K_i        [ g_k(x, λ, ξ) := f(x, ξ) − λ‖ξ − ξ̂^k‖² ]
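A compact sketch of evaluating these per-agent flows for the quadratic loss of the next slide, on a 4-agent ring with two samples per agent. The graph, initialization and constants are assumptions made for illustration; the code only evaluates the vector field once (it is not the talk's implementation and makes no convergence claim).

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 4, 5                                   # 4 agents, predictor dimension 5
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # ring graph, unit weights

# two samples per agent: xi_hat^k = (w, y) in R^4 x R, as in the illustration slide
samples = {i: [np.append(w, w @ np.array([1, 4, 3, 2]) + rng.uniform(-1, 1))
               for w in rng.normal(size=(2, 4))] for i in range(n)}
N = sum(len(s) for s in samples.values())
eps2 = 0.01                                    # stand-in for epsilon_N(beta)^2

def grads_g(x, lam, xi, xi_hat):
    # g_k(x, lam, xi) = (x^T (w;1) - y)^2 - lam * ||xi - xi_hat||^2, with xi = (w, y)
    w, y = xi[:4], xi[4]
    r = x[:4] @ w + x[4] - y
    gx = 2 * r * np.append(w, 1.0)
    glam = -np.sum((xi - xi_hat) ** 2)
    gxi = 2 * r * np.append(x[:4], -1.0) - 2 * lam * (xi - xi_hat)
    return gx, glam, gxi

def vector_field(x, lam, nu, eta, xi):
    dx, dlam, dnu, deta, dxi = {}, {}, {}, {}, {i: [] for i in range(n)}
    for i in range(n):
        gx_sum, glam_sum = np.zeros(d), 0.0
        for k, xi_hat in enumerate(samples[i]):
            gx, glam, gxi = grads_g(x[i], lam[i], xi[i][k], xi_hat)
            gx_sum += gx; glam_sum += glam
            dxi[i].append(gxi / N)                             # ascent in xi^(k)
        cons_x = sum(x[i] - x[j] for j in neighbors[i])
        cons_eta = sum(eta[i] - eta[j] for j in neighbors[i])
        cons_lam = sum(lam[i] - lam[j] for j in neighbors[i])
        cons_nu = sum(nu[i] - nu[j] for j in neighbors[i])
        dx[i] = -(gx_sum / N + cons_eta + cons_x)              # descent in x^i
        raw = -(eps2 / n + glam_sum / N + cons_nu + cons_lam)  # descent in lam^i ...
        dlam[i] = raw if lam[i] > 0 else max(raw, 0.0)         # ... projected to keep lam^i >= 0
        dnu[i] = cons_lam                                      # ascent in nu^i
        deta[i] = cons_x                                       # ascent in eta^i
    return dx, dlam, dnu, deta, dxi

x = {i: np.zeros(d) for i in range(n)}
lam = {i: 1.0 for i in range(n)}
nu = {i: 0.0 for i in range(n)}
eta = {i: np.zeros(d) for i in range(n)}
xi = {i: [s.copy() for s in samples[i]] for i in range(n)}

dx, dlam, dnu, deta, dxi = vector_field(x, lam, nu, eta, xi)
print("agent 0 moves x^0 along", dx[0])
```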

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 36 / 38

Page 85: Distributed Solvers for Online Data-Driven Network Optimization

Illustration

data ξ̂^k = (ŵ^k, ŷ^k) ∈ R^4 × R: input-output pairs

goal: find predictor x ∈ R^5 such that x^⊤(w; 1) ∼ y

quadratic loss f(x, ξ) = (x^⊤(w; 1) − y)²

dataset: w ∼ N(0, I_4), y = (1, 4, 3, 2)·w + v, with v uniformly distributed over [−1, 1]

each agent gathers 30 i.i.d. samples (300 samples across the network)
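A sketch of generating the dataset described above; the random seed and the even per-agent split are the only choices added here.

```python
import numpy as np

rng = np.random.default_rng(5)
n_agents, per_agent = 10, 30
true_weights = np.array([1.0, 4.0, 3.0, 2.0])

W = rng.normal(size=(n_agents * per_agent, 4))              # w ~ N(0, I_4)
y = W @ true_weights + rng.uniform(-1.0, 1.0, size=len(W))  # y = (1,4,3,2) . w + v

# split the 300 network samples evenly among the 10 agents
agent_data = [(W[i * per_agent:(i + 1) * per_agent], y[i * per_agent:(i + 1) * per_agent])
              for i in range(n_agents)]
print("agent 0 holds", agent_data[0][0].shape[0], "samples")
```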

[Slide graphics: 10-agent communication network (agents 1–10); left panel shows the evolution of the agents' optimizer estimates, right panel the relative benefit of cooperation across agents]

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 37 / 38

Page 86: Distributed Solvers for Online Data-Driven Network Optimization

Summary

Conclusions

network optimization via primal-dual dynamics

Lyapunov function: distance to saddle-point set + magnitude of vector field

robustness against disturbances, real-time state-triggered implementation, time-varying, data-driven formulations

Current & Future work

distributed regularization for strongly convex-concave formulations and impact on saddle points

robust stability via ISS for general convex optimization

nonconvex scenarios via sequential convex approx.

dynamic ambiguity sets and online data-driven distributionally robust optimization

trigger design for accelerated convergence

J. Cortes (UC San Diego) Distributed Solvers for Network Optimization April 11, 2019 38 / 38