Distributed online optimization over jointly connected digraphs

David Mateos-Núñez and Jorge Cortés
University of California, San Diego
{dmateosn,cortes}@ucsd.edu

Mathematical Theory of Networks and Systems
University of Groningen, July 7, 2014




Page 2

Overview of this talk

  - Distributed optimization
  - Online optimization
  - Case study: medical diagnosis

Page 3

Motivation: a data-driven optimization problem

Medical findings, symptoms: age factor, amnesia before impact, deterioration in GCS score, open skull fracture, loss of consciousness, vomiting

Any acute brain finding revealed on Computerized Tomography?
(-1 = not present, 1 = present)

"The Canadian CT Head Rule for patients with minor head injury"

Page 4

Binary classification

feature vector of patient s: w_s = ((w_s)_1, ..., (w_s)_{d-1})
true diagnosis: y_s ∈ {-1, 1}
wanted weights: x = (x_1, ..., x_d)
predictor: h(x, w_s) = x⊤(w_s, 1)
margin: m_s(x) = y_s h(x, w_s)

Given the data set {w_s}, s = 1, ..., P, estimate x ∈ R^d by solving

  min_{x ∈ R^d} ∑_{s=1}^P l(m_s(x))

where the loss function l : R → R is decreasing and convex
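The margin-loss objective above can be minimized by plain gradient descent; here is a minimal sketch in Python (the toy data and the function names `fit` and `margins` are illustrative; the logistic loss l(m) = log(1 + e^{-2m}) is the one used later in the simulations):

```python
import numpy as np

def margins(x, W, y):
    """m_s(x) = y_s * x^T (w_s, 1): append a bias feature of 1 to each w_s."""
    Wb = np.hstack([W, np.ones((W.shape[0], 1))])
    return y * (Wb @ x)

def logistic_loss_grad(x, W, y):
    """Gradient of sum_s l(m_s(x)) with l(m) = log(1 + exp(-2m))."""
    Wb = np.hstack([W, np.ones((W.shape[0], 1))])
    m = y * (Wb @ x)
    coeff = -2.0 / (1.0 + np.exp(2 * m))   # l'(m) = -2 / (exp(2m) + 1)
    return (coeff * y) @ Wb

def fit(W, y, steps=500, eta=0.1):
    """Averaged gradient descent on the empirical loss."""
    x = np.zeros(W.shape[1] + 1)
    for _ in range(steps):
        x -= eta * logistic_loss_grad(x, W, y) / len(y)
    return x

# toy data: the first feature essentially decides the label
rng = np.random.default_rng(0)
W = rng.normal(size=(200, 3))
y = np.sign(W[:, 0] + 0.1 * rng.normal(size=200))
x = fit(W, y)
print(np.mean(np.sign(margins(x, W, y)) > 0))  # training accuracy
```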


Page 6

Why distributed online optimization?

Why distributed?
  - information is distributed across a group of agents
  - agents need to interact to optimize performance

Why online?
  - information becomes incrementally available
  - an adaptive solution is needed

Page 7

Review of distributed convex optimization

Page 8

A distributed scenario

In the diagnosis example, health center i ∈ {1, ..., N} manages a set of patients P^i:

  f(x) = ∑_{i=1}^N ∑_{s ∈ P^i} l(y_s h(x, w_s)) = ∑_{i=1}^N f^i(x)

Goal: best predicting model w ↦ h(x*, w),

  min_{x ∈ R^d} ∑_{i=1}^N f^i(x)

using "local information"


Page 10

What do we mean by "using local information"?

Agent i maintains an estimate x^i_t of

  x* ∈ argmin_{x ∈ R^d} ∑_{i=1}^N f^i(x)

Agent i has access to f^i

Agent i can share its estimate x^i_t with "neighboring" agents

(Figure: a digraph over agents with local objectives f^1, ..., f^5, and its adjacency matrix A with nonzero entries a_13, a_14, a_21, a_25, a_31, a_32, a_41, a_52)

Distributed algorithms: Tsitsiklis 84, Bertsekas and Tsitsiklis 95
Consensus: Jadbabaie et al. 03, Olfati-Saber, Murray 04, Boyd et al. 05


Page 12

How do agents agree on the optimizer?

local minimization vs local consensus

A. Nedic and A. Ozdaglar, TAC, 09:

  x^i_{k+1} = -η_k g^i_k + ∑_{j=1}^N a_{ij,k} x^j_k,

where g^i_k ∈ ∂f^i(x^i_k) and A_k = (a_{ij,k}) is doubly stochastic

J. C. Duchi, A. Agarwal, and M. J. Wainwright, TAC, 12
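The interplay of local consensus and local minimization in this update can be illustrated numerically; a hedged sketch on a toy quadratic problem (the ring mixing matrix, step sizes, and objectives are illustrative choices, not from the slides):

```python
import numpy as np

def distributed_subgradient(grads, A, x0, T, eta=lambda k: 1.0 / np.sqrt(k + 1)):
    """One consensus + one subgradient step per round:
    x^i_{k+1} = sum_j a_{ij,k} x^j_k - eta_k g^i_k, with A doubly stochastic."""
    N, d = x0.shape
    x = x0.copy()
    for k in range(T):
        mixed = A @ x                                   # local consensus
        g = np.stack([grads[i](x[i]) for i in range(N)])
        x = mixed - eta(k) * g                          # local minimization
    return x

# toy problem: f^i(x) = 0.5*||x - c_i||^2, so the optimizer of the sum
# is the mean of the c_i
N, d = 4, 2
c = np.arange(N * d, dtype=float).reshape(N, d)
grads = [lambda x, ci=c[i]: x - ci for i in range(N)]
# doubly stochastic mixing matrix over an undirected ring
A = 0.5 * np.eye(N) + 0.25 * (np.roll(np.eye(N), 1, 0) + np.roll(np.eye(N), -1, 0))
x = distributed_subgradient(grads, A, np.zeros((N, d)), T=2000)
print(x.mean(axis=0), c.mean(axis=0))  # estimates approach the global optimizer
```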


Page 14

Saddle-point dynamics

The minimization problem can be regarded as

  min_{x ∈ R^d} ∑_{i=1}^N f^i(x)
    = min_{x^1,...,x^N ∈ R^d, x^1 = ... = x^N} ∑_{i=1}^N f^i(x^i)
    = min_{x ∈ (R^d)^N, Lx = 0} f(x),

where

  (Lx)^i = ∑_{j=1}^N a_ij (x^i - x^j),   f(x) = ∑_{i=1}^N f^i(x^i)

Convex-concave augmented Lagrangian when L is symmetric:

  F(x, z) := f(x) + (γ/2) x⊤ L x + z⊤ L x

Saddle-point dynamics:

  ẋ = -∂F(x, z)/∂x = -∇f(x) - γ L x - L z
  ż =  ∂F(x, z)/∂z = L x
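A forward-Euler integration of these dynamics illustrates convergence to consensus on the optimizer; a minimal sketch under illustrative assumptions (scalar agent states, quadratic local costs, a path-graph Laplacian):

```python
import numpy as np

def saddle_point_flow(grad_f, L, x0, z0, gamma=1.0, step=0.01, T=20000):
    """Forward-Euler integration of the saddle-point dynamics
    xdot = -grad_f(x) - gamma*L@x - L@z,  zdot = L@x  (L symmetric)."""
    x, z = x0.copy(), z0.copy()
    for _ in range(T):
        dx = -grad_f(x) - gamma * (L @ x) - L @ z
        dz = L @ x
        x, z = x + step * dx, z + step * dz
    return x, z

# toy: N agents with scalar f^i(x) = 0.5*(x - c_i)^2; Laplacian of a path graph
c = np.array([1.0, 2.0, 3.0, 6.0])
grad_f = lambda x: x - c
L = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
x, z = saddle_point_flow(grad_f, L, np.zeros(4), np.zeros(4))
print(x)  # all entries approach the optimizer of the sum, mean(c) = 3.0
```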


Page 16

In the case of directed graphs

  Weight-balanced ⇔ 1⊤ L = 0 ⇔ L + L⊤ ⪰ 0

  F(x, z) := f(x) + (γ/2) x⊤ L x + z⊤ L x

is still convex-concave, but the gradient dynamics

  ẋ = -∂F(x, z)/∂x = -∇f(x) - γ (1/2)(L + L⊤) x - L⊤ z   (Non distributed!)

is changed to

  ẋ = -∇f(x) - γ L x - L z
  ż =  ∂F(x, z)/∂z = L x

Page 17

In the case of directed graphs (references)

J. Wang and N. Elia (with L⊤ = L), Allerton, 10
B. Gharesifard and J. Cortes, CDC, 12

Page 19

Review of online convex optimization

Page 20

A different kind of optimization: sequential decision making

In the diagnosis example, each round t ∈ {1, ..., T}:
  - question (features, medical findings): w_t
  - decision (about using CT): h(x_t, w_t)
  - outcome (by CT findings / follow-up of patient): y_t
  - loss: l(y_t h(x_t, w_t))

Choose x_t & incur loss f_t(x_t) := l(y_t h(x_t, w_t))

Goal: sublinear regret

  R(u, T) := ∑_{t=1}^T f_t(x_t) - ∑_{t=1}^T f_t(u) ∈ o(T)

using "historical observations"


Page 22

Why regret?

If the regret is sublinear,

  ∑_{t=1}^T f_t(x_t) ≤ ∑_{t=1}^T f_t(u) + o(T),

then

  lim_{T→∞} (1/T) ∑_{t=1}^T f_t(x_t) ≤ lim_{T→∞} (1/T) ∑_{t=1}^T f_t(u)

In temporal average, the online decisions {x_t}, t = 1, ..., T, perform as well as the best fixed decision in hindsight

"No regrets, my friend"

Page 23

Why regret?

What about generalization error? Sublinear regret does not imply x_{t+1} will do well with f_{t+1}

No assumptions are made about the sequence {f_t}; it can
  - follow an unknown stochastic or deterministic model, or
  - be chosen adversarially

In our example, f_t := l(y_t h(x_t, w_t)):
  - if some model w ↦ h(x*, w) explains the data reasonably well in hindsight,
  - then the online models w ↦ h(x_t, w) perform just as well on average


Page 25

Some classical results

Projected gradient descent:

  x_{t+1} = Π_S(x_t - η_t ∇f_t(x_t)),   (1)

where Π_S is a projection onto a compact set S ⊆ R^d, and ‖∇f_t‖_2 ≤ H

Follow-the-Regularized-Leader:

  x_{t+1} = argmin_{y ∈ S} ( ∑_{s=1}^t f_s(y) + ψ(y) )

Martin Zinkevich, 03
  - (1) achieves O(√T) regret under convexity with η_t = 1/√t

Elad Hazan, Amit Agarwal, and Satyen Kale, 07
  - (1) achieves O(log T) regret under p-strong convexity with η_t = 1/(p t)
  - Others: Online Newton Step, Follow the Regularized Leader, etc.
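Zinkevich's projected step (1) is easy to simulate; a hedged sketch on a toy adversarial stream of absolute-value losses (the alternating targets, the set S = [-1, 1], and the comparator u = 0.3 are illustrative assumptions):

```python
import numpy as np

def online_pgd(loss_grads, project, x0, eta=lambda t: 1.0 / np.sqrt(t)):
    """Online projected gradient descent:
    x_{t+1} = Pi_S(x_t - eta_t * grad f_t(x_t))."""
    x = x0.copy()
    iterates = [x.copy()]
    for t, grad in enumerate(loss_grads, start=1):
        x = project(x - eta(t) * grad(x))
        iterates.append(x.copy())
    return iterates

# toy stream: f_t(x) = |x - c_t| with targets alternating around 0.3
T = 400
targets = [0.3 + 0.2 * (-1) ** t for t in range(T)]
grads = [lambda x, c=c: np.sign(x - c) for c in targets]
project = lambda x: np.clip(x, -1.0, 1.0)          # S = [-1, 1]
xs = online_pgd(grads, project, np.array(0.0))

# regret against the fixed comparator u = 0.3, which is optimal in hindsight
regret = sum(abs(x - c) for x, c in zip(xs[:-1], targets)) \
       - sum(abs(0.3 - c) for c in targets)
print(regret / T)  # the average regret shrinks like O(1/sqrt(T))
```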


Page 27

Done with the review. Now... we combine the two frameworks:

  - previous work and limitations
  - our coordination algorithm in the online case
  - theorems of O(√T) & O(log T) agent regrets
  - simulations
  - outline of the proof

Page 28

Combining both frameworks

Page 29

Resuming the diagnosis example

Health center i ∈ {1, ..., N} takes care of a set of patients P^i_t at time t:

  f^i(x) = ∑_{t=1}^T ∑_{s ∈ P^i_t} l(y_s h(x, w_s)) = ∑_{t=1}^T f^i_t(x)

Goal: sublinear agent regret

  R^j(u, T) := ∑_{t=1}^T ∑_{i=1}^N f^i_t(x^j_t) - ∑_{t=1}^T ∑_{i=1}^N f^i_t(u) ∈ o(T)

using "local information" & "historical observations"


Page 31

Challenge: coordinate the hospitals

(Figure: a time-varying digraph among the local objectives f^1_t, ..., f^5_t)

Need to design distributed online algorithms

Page 32

Previous work on consensus-based online algorithms

F. Yan, S. Sundaram, S. V. N. Vishwanathan and Y. Qi, TAC — Projected Subgradient Descent
  - log(T) regret (local strong convexity & bounded subgradients)
  - √T regret (convexity & bounded subgradients)
  - both analyses require a projection onto a compact set

S. Hosseini, A. Chapman and M. Mesbahi, CDC, 13 — Dual Averaging
  - √T regret (convexity & bounded subgradients)
  - general regularized projection onto a closed convex set

K. I. Tsianos and M. G. Rabbat, arXiv, 12 — Projected Subgradient Descent
  - empirical risk as opposed to regret analysis

Limitation: in all cases the communication digraph is fixed and strongly connected

Page 33

Our contributions (informally)

  - time-varying communication digraphs under B-joint connectivity & weight-balanced interactions
  - unconstrained optimization (no projection step onto a bounded set)
  - log T regret (local strong convexity & bounded subgradients)
  - √T regret (convexity & β-centrality with β ∈ (0, 1] & bounded subgradients)

f^i_t is β-central in Z ⊆ R^d \ X*, where X* = {x* : 0 ∈ ∂f^i_t(x*)}, if for all x ∈ Z there exists x* ∈ X* such that

  -ξ_x⊤(x* - x) / (‖ξ_x‖_2 ‖x* - x‖_2) ≥ β,   for each ξ_x ∈ ∂f^i_t(x)
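For intuition, the β-centrality ratio is the cosine of the angle between -ξ_x and the direction toward a minimizer, and it can be checked numerically; a hedged sketch on two quadratics (the test functions are illustrative, not from the slides):

```python
import numpy as np

def centrality_ratio(grad, x, x_star):
    """Cosine of the angle between -grad f(x) and the direction x* - x;
    f is beta-central on a set if this ratio is >= beta there."""
    g, d = grad(x), x_star - x
    return float(-g @ d / (np.linalg.norm(g) * np.linalg.norm(d)))

rng = np.random.default_rng(1)

# f(x) = ||x||^2: the gradient 2x points exactly away from x* = 0, so the
# ratio is 1 everywhere (beta = 1)
ratios = [centrality_ratio(lambda x: 2 * x, rng.normal(size=3), np.zeros(3))
          for _ in range(100)]
print(min(ratios))

# an anisotropic quadratic f(x) = x1^2 + 10*x2^2 is still beta-central,
# but only for some beta strictly below 1
ratios2 = [centrality_ratio(lambda x: 2 * np.array([1.0, 10.0]) * x,
                            rng.normal(size=2), np.zeros(2))
           for _ in range(100)]
print(min(ratios2))
```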


Page 35

Coordination algorithm

  x^i_{t+1} = x^i_t - η_t g_{x^i_t} + σ ( γ ∑_{j≠i} a_{ij,t} (x^j_t - x^i_t) + ∑_{j≠i} a_{ij,t} (z^j_t - z^i_t) )

  z^i_{t+1} = z^i_t - σ ∑_{j≠i} a_{ij,t} (x^j_t - x^i_t)

Subgradient descent on previous local objectives, g_{x^i_t} ∈ ∂f^i_t(x^i_t)

Page 36

Coordination algorithm (cont.)

In the same update, the term σ γ ∑_{j≠i} a_{ij,t} (x^j_t - x^i_t) is proportional (linear) feedback on disagreement with neighbors

Page 37

Coordination algorithm (cont.)

The term σ ∑_{j≠i} a_{ij,t} (z^j_t - z^i_t), with the auxiliary states z^i accumulating disagreement in x, is integral (linear) feedback on disagreement with neighbors

Page 38

Coordination algorithm (cont.)

The union of the communication graphs over intervals of length B is strongly connected.

(Figure: the digraph among f^1_t, ..., f^5_t changes over times t, t+1, t+2; e.g., the edges a_{14,t+1} and a_{41,t+1} are present only at time t+1)

Page 39

Coordination algorithm: compact representation & generalization

  [x_{t+1}; z_{t+1}] = [x_t; z_t] - σ [γ L_t, L_t; -L_t, 0] [x_t; z_t] - η_t [g_{x_t}; 0]

or, more generally,

  v_{t+1} = (I - σ E ⊗ L_t) v_t - η_t g_t
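The compact update can be implemented directly; a hedged sketch with scalar agent states and static quadratic objectives (the weight-balanced directed ring, σ = 0.2, and γ = 2.5 are illustrative assumptions; γ = 2.5 makes E = [[γ, 1], [-1, 0]] have positive real eigenvalues, 2 and 0.5, consistent with the theorem's requirement on E):

```python
import numpy as np

def coordination_step(x, z, L, g, eta, sigma, gamma):
    """One round with scalar states per agent:
    x+ = x - eta*g - sigma*(gamma*L@x + L@z),  z+ = z + sigma*L@x,
    i.e. v+ = (I - sigma*(E kron L)) v - eta*[g; 0], E = [[gamma, 1], [-1, 0]]."""
    x_new = x - eta * g - sigma * (gamma * (L @ x) + L @ z)
    z_new = z + sigma * (L @ x)
    return x_new, z_new

# toy setting: static local objectives f^i(x) = 0.5*(x - c_i)^2 on a
# weight-balanced directed ring (row and column sums of L are zero)
N = 4
c = np.array([1.0, 2.0, 3.0, 6.0])
L = np.eye(N) - np.roll(np.eye(N), 1, axis=1)   # directed ring Laplacian
x, z = np.zeros(N), np.zeros(N)
for t in range(1, 5001):
    g = x - c                                   # local (sub)gradients
    x, z = coordination_step(x, z, L, g, eta=1.0 / t, sigma=0.2, gamma=2.5)
print(x)  # all agents approach the minimizer of sum_i f^i, mean(c) = 3.0
```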


Page 41

Theorem (DMN-JC)

Assume that

  - {f^1_t, ..., f^N_t}, t = 1, ..., T, are convex functions on R^d
      with H-bounded subgradient sets,
      nonempty and uniformly bounded sets of minimizers, and
      p-strongly convex in a sufficiently large neighborhood of their minimizers
  - the sequence of weight-balanced communication digraphs is nondegenerate and B-jointly connected
  - E ∈ R^{K×K} is diagonalizable with positive real eigenvalues

Then, taking σ ∈ [ δ̃/λ_min(E), (1 - δ)/(λ_max(E) d_max) ] and η_t = 1/(p t),

  R^j(u, T) ≤ C (‖u‖²_2 + 1 + log T),

for any j ∈ {1, ..., N} and u ∈ R^d

Page 42

Theorem (DMN-JC), continued

Under the same assumptions on the objectives, the communication digraphs, and E, substituting strong convexity by β-centrality, for β ∈ (0, 1],

  R^j(u, T) ≤ C ‖u‖²_2 √T,

for any j ∈ {1, ..., N} and u ∈ R^d

Page 43

Simulations: acute brain finding revealed on Computerized Tomography

Page 44

Simulations

(Left figure: agents' estimates {x^i_{t,7}}, i = 1, ..., N, over time t ∈ [0, 200], compared with the centralized solution. Right figure: average regret max_j (1/T) R^j(x*_T, {∑_i f^i_t}, t = 1, ..., T) versus time horizon T, on a log scale from 10^{-3} to 10^1, for proportional disagreement feedback, proportional-integral disagreement feedback, and the centralized method.)

  f^i_t(x) = ∑_{s ∈ P^i_t} l(y_s h(x, w_s)) + (1/10) ‖x‖²_2,   where l(m) = log(1 + e^{-2m})

Page 45

Outline of the proof

Analysis of network regret

  R_N(u, T) := ∑_{t=1}^T ∑_{i=1}^N f^i_t(x^i_t) - ∑_{t=1}^T ∑_{i=1}^N f^i_t(u)

ISS with respect to agreement:

  ‖L_K v_t‖_2 ≤ C_I ‖v_1‖_2 (1 - δ/(4N²))^⌈(t-1)/B⌉ + C_U max_{1≤s≤t-1} ‖u_s‖_2

Agent regret bound in terms of {η_t}, t ≥ 1, and initial conditions

Boundedness of online estimates under β-centrality (β ∈ (0, 1])
  - doubling trick to obtain O(√T) agent regret

Local strong convexity ⇒ β-centrality
  - η_t = 1/(p t) to obtain O(log T) agent regret

Page 46

Conclusions and future work

Conclusions
  - "Distributed online convex optimization over jointly connected digraphs," submitted to IEEE Transactions on Network Science and Engineering (available on the webpage of Jorge Cortes)
  - relevant for regression & classification, which play a crucial role in machine learning, computer vision, etc.

Future work
  - refine the guarantees under a model for the evolution of the objective functions

Page 47

Future directions in data-driven optimization problems

Distributed strategies for...
  - feature selection
  - semi-supervised learning

Page 48

Thank you for listening!

Contact: David Mateos-Núñez and Jorge Cortés, at UC San Diego