ITERATIVE METHODS AND REGULARIZATION
IN THE DESIGN OF FAST ALGORITHMS
Lorenzo Orecchia, MIT Math
A unified framework for optimization and online learning
beyond Multiplicative Weight Updates
Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING
• Online Linear Optimization
• Online Linear Optimization over the Simplex and Multiplicative Weight Updates (MWUs)
• A Regularization Framework generalizing MWUs: Follow the Regularized Leader
MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE
(Optimization: Regularized Updates ↔ Online Learning: Multiplicative Weight Updates)

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW
• Non-smooth vs Smooth Convex Optimization
• Non-smooth Convex Optimization reduces to Online Linear Optimization
• Application: Understanding Undirected Maxflow algorithms based on MWUs
MESSAGE: FASTEST ALGORITHMS REQUIRE A PRIMAL-DUAL APPROACH
TOC Applications of MWUs
Fast algorithms for solving specific LPs and SDPs: Maximum Flow problems [PST], [GK], [F], [CKMST]; Covering-packing problems [PST]; Oblivious routing [R], [M]
Fast approximation algorithms based on LP and SDP relaxations: Maxcut [AK]; Graph Partitioning problems [AK], [S], [OSV]
Proof technique: Hardcore Lemma [BHK]; QIP = PSPACE [W]; Derandomization [Y]
… and more
Machine Learning meets Optimization meets TCS
These techniques have been rediscovered multiple times in different fields:
Machine Learning, Convex Optimization, TCS
Three surveys emphasizing the different viewpoints and literatures:
1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi
2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski
3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale
REGULARIZATION 101
What is Regularization?
Regularization is a fundamental technique in optimization:

  OPTIMIZATION PROBLEM → (add regularizer F, parameter λ > 0) → WELL-BEHAVED OPTIMIZATION PROBLEM

A well-behaved problem has:
• a stable optimum
• a unique optimal solution
• smoothness conditions
…
Benefits of regularization in learning and statistics:
• Prevents overfitting
• Increases stability
• Decreases sensitivity to random noise
Example: Regularization Helps Stability
Consider a convex set S ⊆ R^n and a linear optimization problem:

  f(c) = argmin_{x ∈ S} c^T x

The optimal solution f(c) may be very unstable under perturbation of c:

  ‖c′ − c‖ ≤ δ  and yet  ‖f(c′) − f(c)‖ ≫ δ

Now consider instead the regularized linear optimization problem

  f(c) = argmin_{x ∈ S} c^T x + F(x),

where F is σ-strongly convex. Then:

  ‖c′ − c‖ ≤ δ  implies  ‖f(c′) − f(c)‖ ≤ δ/σ
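The contrast above can be checked numerically on the simplex, where both problems have closed forms: the unregularized optimum is a vertex indicator, while adding the negative-entropy regularizer (1-strongly convex, as used later in the talk) gives a softmax. A minimal sketch, not from the talk; the cost vectors are illustrative:

```python
import math

def linear_argmin_simplex(c):
    # argmin over the simplex of c^T x: an indicator of the smallest cost coordinate.
    x = [0.0] * len(c)
    x[min(range(len(c)), key=lambda i: c[i])] = 1.0
    return x

def regularized_argmin_simplex(c, eta):
    # argmin over the simplex of c^T x + eta * sum_i x_i log x_i: a softmax.
    m = min(c)
    w = [math.exp(-(ci - m) / eta) for ci in c]  # shift by min(c) for numerical stability
    s = sum(w)
    return [wi / s for wi in w]

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

c  = [1.0, 1.001, 2.0]
c2 = [1.001, 1.0, 2.0]   # perturbation of size delta ~ 1e-3 flips the argmin

jump_unreg = l1(linear_argmin_simplex(c2), linear_argmin_simplex(c))  # jumps to another vertex
jump_reg   = l1(regularized_argmin_simplex(c2, 1.0),
                regularized_argmin_simplex(c, 1.0))                   # moves by O(delta)
```

The unregularized solution jumps between vertices (ℓ1 distance 2) under a 10⁻³ perturbation, while the regularized solution moves by about 10⁻³.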
ONLINE LINEAR OPTIMIZATION
AND
MULTIPLICATIVE WEIGHT UPDATES
SETUP: Convex set X ⊆ R^n, generic norm ‖·‖, repeated game over T rounds.
At round t:
  ALGORITHM: plays the current solution x^(t) ∈ X
  ADVERSARY: reveals the current linear objective, a loss vector ℓ^(t) ∈ R^n with ‖ℓ^(t)‖* ≤ ρ
  Algorithm's loss at round t: ℓ^(t)T x^(t)
  ALGORITHM: computes the updated solution x^(t+1) ∈ X
  ADVERSARY: reveals the new loss vector ℓ^(t+1) ∈ R^n with ‖ℓ^(t+1)‖* ≤ ρ
  …
GOAL: update x^(t) to minimize the regret, the average algorithm's loss L̂ minus the a posteriori optimum L*:

  (1/T)·∑_{t=1}^T ℓ^(t)T x^(t)  −  min_{x∈X} (1/T)·∑_{t=1}^T ℓ^(t)T x
Simplex Case: Learning with Experts
SETUP: Simplex X ⊆ R^n under the ℓ1 norm. At round t:
  ALGORITHM: plays p^(t), a distribution over the n dimensions, i.e. over experts
  ADVERSARY: reveals the experts' losses ℓ^(t), with ‖ℓ^(t)‖∞ ≤ ρ
  Algorithm's (expected) loss: E_{i∼p^(t)}[ℓ_i^(t)] = p^(t)T ℓ^(t)
  ALGORITHM: updates the distribution to p^(t+1)
Simplex Case: Multiplicative Weight Updates

  Weights:      w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t),   w^(1) = 1⃗
  Distribution: p_i^(t+1) = w_i^(t+1) / ∑_{j=1}^n w_j^(t+1)

THE MULTIPLICATIVE WEIGHT UPDATE
The parameter ε ∈ (0, 1) governs how aggressive the update is:
ε near 0: CONSERVATIVE — ε near 1: AGGRESSIVE
MWUs: Unraveling the Update
Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t)
Unraveling the recursion, each weight encodes the expert's cumulative loss:

  w_i^(t+1) = (1 − ε)^{∑_{s=1}^t ℓ_i^(s)}
MWUs: Regret Bound
Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t)
For ‖ℓ^(t)‖∞ ≤ ρ and ε < 1/2:

  L̂ − L* ≤ ρ·log n / (εT) + ρ·ε

Algorithm's regret ≤ start-up penalty + penalty for being greedy
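The update and the bound above can be run directly as a sanity check. A minimal sketch, not from the talk: losses are drawn uniformly in [0, 1] (so ρ = 1), and the sizes, seed, and ε are arbitrary illustrative choices:

```python
import math
import random

def mwu(losses, eps):
    # Multiplicative Weight Updates: play p ∝ w, then w_i <- (1 - eps)^{loss_i} * w_i.
    n = len(losses[0])
    w = [1.0] * n
    total_alg = 0.0
    cum = [0.0] * n                      # cumulative loss of each expert
    for loss in losses:
        s = sum(w)
        p = [wi / s for wi in w]         # current distribution over experts
        total_alg += sum(pi * li for pi, li in zip(p, loss))
        for i, li in enumerate(loss):
            w[i] *= (1.0 - eps) ** li
            cum[i] += li
    return total_alg, min(cum)           # algorithm's loss and best expert's loss

random.seed(0)
T, n, eps = 500, 5, 0.1
losses = [[random.random() for _ in range(n)] for _ in range(T)]
alg_loss, best_loss = mwu(losses, eps)

avg_regret = (alg_loss - best_loss) / T
bound = math.log(n) / (eps * T) + eps    # rho * log n / (eps * T) + rho * eps, with rho = 1
```

On this instance the observed average regret sits well inside the stated bound.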
ONLINE LINEAR OPTIMIZATION BEYOND MWUs
A REGULARIZATION FRAMEWORK
MWUs: Proof Sketch of Regret Bound
Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{∑_{s=1}^t ℓ_i^(s)}
Potential: Φ^(t+1) = log_{1−ε} ∑_{i=1}^n w_i^(t+1)

• The proof is a potential function argument.
• The potential function bounds the loss of the best expert:

  Φ^(t+1) ≤ log_{1−ε} min_{i=1..n} w_i^(t+1) = min_{i=1..n} ∑_{s=1}^t ℓ_i^(s)

• The potential function is related to the algorithm's performance:

  Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T p^(t) − ε

DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?
MWUs AND APPLICATIONS
Designing a Regularized Update
GOAL: Design an update and its potential function analysis.
QUESTION: Choice of potential function?
DESIDERATA: 1) lower bounds the best expert's loss
            2) tracks the algorithm's performance

Attempt 1 – FOLLOW THE LEADER: cumulative loss L^(t) = ∑_{s=1}^t ℓ^(s)

  x^(t+1) = argmin_{x∈X} x^T L^(t)   (pick the best current solution)
  Φ^(t+1) = min_{x∈X} x^T L^(t)      (potential is the current best loss)

This fails if the best expert moves drastically between rounds.
How can we make the update more stable?
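The instability of Follow-the-Leader is easy to exhibit with two experts and alternating losses: FTL chases yesterday's winner and pays nearly every round, while MWU stays close to uniform and nearly matches the best expert. A minimal simulation, not from the talk; the loss sequence is the standard adversarial example and ε is an illustrative choice:

```python
def ftl_play(cum):
    # Follow the Leader: put all mass on the expert with smallest cumulative loss.
    return 0 if cum[0] <= cum[1] else 1

T = 1000
# Round 1: (0.5, 0); then losses alternate (0, 1), (1, 0), always hitting the current leader.
losses = [(0.5, 0.0)] + [(0.0, 1.0) if t % 2 == 1 else (1.0, 0.0) for t in range(1, T)]

cum = [0.0, 0.0]
ftl_loss = 0.0
for loss in losses:
    ftl_loss += loss[ftl_play(cum)]
    cum[0] += loss[0]
    cum[1] += loss[1]
best_loss = min(cum)
ftl_avg_regret = (ftl_loss - best_loss) / T   # FTL pays ~1 per round: linear regret

# MWU on the same sequence stays near uniform and suffers only O(sqrt(T)) regret.
eps = 0.05
w = [1.0, 1.0]
mwu_loss = 0.0
for loss in losses:
    s = w[0] + w[1]
    mwu_loss += (w[0] * loss[0] + w[1] * loss[1]) / s
    w = [w[0] * (1 - eps) ** loss[0], w[1] * (1 - eps) ** loss[1]]
mwu_avg_regret = (mwu_loss - best_loss) / T
```

FTL's average regret stays near 1/2 however large T is, while MWU's vanishes.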
Regularized Update: Definition
Attempt 2 – FOLLOW THE REGULARIZED LEADER:

  x^(t+1) = argmin_{x∈X} x^T L^(t) + η·F(x)
  Φ^(t+1) = min_{x∈X} x^T L^(t) + η·F(x)

Properties of the regularizer F(x):
1. Convex, differentiable
2. σ-strongly convex w.r.t. the norm ‖·‖
Parameter η ≥ 0, to be determined.
These properties are actually sufficient to get a regret bound.

Regularized Update: Analysis
The potential still lower bounds the best cumulative loss, up to a regularization error:

  Φ^(t+1) ≤ min_{x∈X} L^(t)T x + η·max_{x∈X} F(x)
Tracking the Algorithm: Proof by Picture
Define: f^(t+1)(x) = L^(t)T x + η·F(x), so that Φ^(t+1) = min_{x∈X} f^(t+1)(x) = f^(t+1)(x^(t+1)).
Notice: f^(t+1)(x) − f^(t)(x) = ℓ^(t)T x   (the latest loss vector)
Compare ℓ^(t)T x^(t) and Φ^(t+1) − Φ^(t).
Want: f^(t+1)(x^(t)) ≈ f^(t+1)(x^(t+1)). Indeed:

  Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t)
Regularization in Action
REGULARIZATION: f^(t+1) is (η·σ)-strongly convex, so it admits a quadratic lower bound around its minimizer x^(t+1).
STABILITY: since ∇f^(t+1) − ∇f^(t) = ℓ^(t), the minimizers cannot move much:

  ‖x^(t+1) − x^(t)‖ ≤ ‖ℓ^(t)‖* / (η·σ)
Analysis: Progress in One Iteration
We have ∇f^(t+1)(x^(t)) = ℓ^(t), and by stability ‖x^(t+1) − x^(t)‖ ≤ ‖ℓ^(t)‖*/(η·σ).
Since f^(t+1) is (η·σ)-strongly convex:

  f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) ≥ ℓ^(t)T (x^(t+1) − x^(t)) + (η·σ/2)·‖x^(t+1) − x^(t)‖²
                                    ≥ −‖ℓ^(t)‖*·‖x^(t+1) − x^(t)‖ + (η·σ/2)·‖x^(t+1) − x^(t)‖²
                                    ≥ −‖ℓ^(t)‖*² / (2η·σ)

Combined with Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t), this bounds the per-round progress.
Completing the Analysis
Progress in one iteration (the regret at iteration t):

  Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T x^(t) − ‖ℓ^(t)‖*² / (2σbecause)

Telescoping sum:

  Φ^(T+1) ≥ ∑_{t=1}^T ℓ^(t)T x^(t) + Φ^(1) − T · max_t ‖ℓ^(t)‖*² / (2η·σ)

Final regret bound, with regularizer F and ‖ℓ^(t)‖* ≤ ρ:

  (1/T)·( ∑_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} ∑_{t=1}^T ℓ^(t)T x )
      ≤ (η/T)·(max_{x∈X} F(x) − min_{x∈X} F(x)) + ρ² / (2σ·η)

Start-up penalty plus penalty for being greedy:
SAME TYPE OF BOUND AS FOR MWUs
Reinterpreting MWUs
Potential function: Φ^(t+1) = min_{p ≥ 0, ∑ p_i = 1} p^T L^(t) + η·∑_{i=1}^n p_i log p_i
Regularizer: F(p) = ∑_{i=1}^n p_i log p_i is negative entropy.
F(p) is 1-strongly convex w.r.t. ‖·‖₁.
Update:

  p^(t+1) = argmin_{p ≥ 0, ∑ p_i = 1} p^T L^(t) + η·∑_{i=1}^n p_i log p_i

which has the closed form

  p_i^(t+1) = e^{−L_i^(t)/η} / ∑_{j=1}^n e^{−L_j^(t)/η} = (1 − ε)^{L_i^(t)} / ∑_{j=1}^n (1 − ε)^{L_j^(t)}

SOFT-MAX
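The two closed forms above agree under the parameter correspondence η = 1/ln(1/(1 − ε)), since (1 − ε)^L = e^{−L/η}. A quick numerical check of this identity (ε and the cumulative losses are arbitrary illustrative values):

```python
import math

eps = 0.3
eta = 1.0 / math.log(1.0 / (1.0 - eps))   # correspondence between FTRL and MWU parameters
L = [2.0, 0.5, 3.7, 1.0]                  # arbitrary cumulative losses

# FTRL with the negative-entropy regularizer: soft-max of -L/eta.
w_ftrl = [math.exp(-Li / eta) for Li in L]
p_ftrl = [wi / sum(w_ftrl) for wi in w_ftrl]

# MWU weights: (1 - eps)^{L_i}, normalized.
w_mwu = [(1.0 - eps) ** Li for Li in L]
p_mwu = [wi / sum(w_mwu) for wi in w_mwu]
```

Both constructions produce the same distribution up to floating-point error.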
Beyond MWUs: Which Regularizer?
Regret bound, optimizing over η:

  (1/T)·( ∑_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} ∑_{t=1}^T ℓ^(t)T x )
      ≤ ρ·√(2·(max_{x∈X} F(x) − min_{x∈X} F(x))) / √(σ·T)

The best choice of regularizer and norm minimizes

  max_t ‖ℓ^(t)‖*² · (max_{x∈X} F(x) − min_{x∈X} F(x)) / σ

Negative entropy with the ℓ1 norm is approximately optimal for the simplex.
QUESTION: are other regularizers ever useful?
Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful?
YES! Applications:
Graph Partitioning and Random Walks
Spectral algorithms for balanced separator running in time Õ(m).
Uses a random-walk framework and SDP MWUs.
Different walks correspond to different regularizers for the eigenvector problem:
• F(X) = Tr(X log X)  (SDP MWU)  —  Heat Kernel Random Walk
• F(X) = Tr(X^p), 1 ≤ p ≤ ∞  (p-norm)  —  Lazy Random Walk
• F(X) = Tr(X^{1/2})  (NEW REGULARIZER)  —  Personalized PageRank
[Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]
Sparsification
ε-spectral-sparsifiers with O(n log n / ε²) edges [Spielman, Srivastava 2008].
Uses a matrix concentration bound equivalent to SDP MWUs.
ε-spectral-sparsifiers with O(n / ε²) edges [Batson, Spielman, Srivastava 2009].
Can be interpreted as a different regularizer: F(X) = Tr(X^{1/2}).
Many more in Online Learning
Bandit Online Learning [AHR], …
NON-SMOOTH CONVEX OPTIMIZATION
REDUCES TO
ONLINE LINEAR OPTIMIZATION
Convex Optimization Setup

  min_{x∈X} f(x)

f convex, differentiable; X ⊆ R^n closed, convex set.

NON-SMOOTH: ∀x ∈ X, ‖∇f(x)‖* ≤ ρ   (f is ρ-Lipschitz continuous)
SMOOTH: ∀x, y ∈ X, ‖∇f(y) − ∇f(x)‖* ≤ L·‖y − x‖   (L-Lipschitz continuous gradient)

In the smooth case, a gradient step is guaranteed to decrease the function value:

  f(x^(t+1)) ≤ f(x^(t)) − ‖∇f(x^(t))‖*² / (2L)

In the non-smooth case there is NO GRADIENT STEP GUARANTEE — ONLY A DUAL GUARANTEE.
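The missing guarantee is easy to see on the one-dimensional f(x) = |x|: a fixed-step subgradient step ends up oscillating around 0 without decreasing f, while on the smooth f(x) = x² the same step size makes monotone progress. A minimal sketch, not from the talk; the step sizes and starting point are illustrative:

```python
def grad_step_abs(x, step):
    # Subgradient step on the non-smooth f(x) = |x|; subgradient is sign(x).
    g = 1.0 if x > 0 else -1.0
    return x - step * g

def grad_step_square(x, step):
    # Gradient step on the smooth f(x) = x^2 (gradient 2x, smoothness constant L = 2).
    return x - step * 2.0 * x

x = 0.3
for _ in range(50):
    x = grad_step_abs(x, 0.25)
abs_final = abs(x)       # oscillates, stuck at distance ~step from the optimum

y = 0.3
for _ in range(50):
    y = grad_step_square(y, 0.25)   # step = 1/(2L): guaranteed decrease each step
sq_final = abs(y)        # converges geometrically to 0
```

The non-smooth iterates bounce between two points at distance comparable to the step size, which is exactly why non-smooth methods fall back on dual (lower-bound) guarantees.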
Non-Smooth Setup: Dual Approach
min_{x∈X} f(x), with f convex, differentiable, X ⊆ R^n closed and convex, and ∀x ∈ X, ‖∇f(x)‖* ≤ ρ (ρ-Lipschitz continuous).
APPROACH: each iterate provides an upper bound and a lower bound on the optimum:

  UPPER BOUND: f(x^(t)) ≥ f(x*)
  LOWER BOUND: f(x*) ≥ f(x^(t)) + ∇f(x^(t))^T (x* − x^(t))

WE CAN WEAKEN THE DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE.
Take convex combinations of the upper bounds and of the lower bounds with weights γ_t:

  UPPER: (1/∑_{t=1}^T γ_t) · ∑_{t=1}^T γ_t f(x^(t)) ≥ f(x*)
  LOWER: f(x*) ≥ (1/∑_{t=1}^T γ_t) · [ ∑_{t=1}^T γ_t·( f(x^(t)) + ∇f(x^(t))^T (x* − x^(t)) ) ]

HOW TO UPDATE THE ITERATES? HOW TO CHOOSE THE WEIGHTS?
Reduction to Online Linear Minimization
Fix the weights γ_t to be uniform for simplicity. Subtracting the lower bound from the upper bound:

  DUALITY GAP:  [ (1/T)·∑_{t=1}^T f(x^(t)) ] − f(x*) ≤ (1/T)·∑_{t=1}^T ∇f(x^(t))^T (x^(t) − x*)

The right-hand side is a sum of LINEAR FUNCTIONS of the iterates, so we are in the online setup:

  ALGORITHM: plays x^(t) ∈ X
  ADVERSARY: the loss vector is the gradient, ℓ^(t) = ∇f(x^(t))

Recall that by assumption ‖ℓ^(t)‖* = ‖∇f(x^(t))‖* ≤ ρ. Moreover,

  (1/T)·∑_{t=1}^T ∇f(x^(t))^T (x^(t) − x*) ≤ REGRET
Final Bound
RESULTING ALGORITHM: MIRROR DESCENT
Error bound with a σ-strongly-convex regularizer F:

  ε_MD ≤ ρ·√(2·(max_{x∈X} F(x) − min_{x∈X} F(x))) / √(σ·T)

ASYMPTOTICALLY OPTIMAL BY INFORMATION COMPLEXITY LOWER BOUND
Non-Smooth Optimization over Simplex
Regularizer F is negative entropy, with ‖∇f(x^(t))‖∞ ≤ ρ:

  ε_MD ≤ ρ·√(2·log n) / √T

RESULTING ALGORITHM: MIRROR DESCENT OVER THE SIMPLEX = MWU
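A compact instance of the reduction: minimize the non-smooth f(p) = max_i p_i over the simplex (optimum 1/n at the uniform distribution), feeding subgradients as losses to the entropic mirror descent / MWU update and averaging the iterates. A minimal sketch, not from the talk; n, T, and the step size (taken from the tuned bound with ρ = 1) are illustrative:

```python
import math

def mirror_descent_simplex(subgrad, n, T, eta):
    # Entropic mirror descent: multiplicative update p_i <- p_i * exp(-eta * g_i), renormalize.
    p = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(T):
        g = subgrad(p)                        # loss vector = subgradient at current iterate
        avg = [a + pi / T for a, pi in zip(avg, p)]
        w = [pi * math.exp(-eta * gi) for pi, gi in zip(p, g)]
        s = sum(w)
        p = [wi / s for wi in w]
    return avg                                # averaged iterate

def subgrad_max(p):
    # Subgradient of f(p) = max_i p_i: indicator of an argmax coordinate.
    j = max(range(len(p)), key=lambda i: p[i])
    return [1.0 if i == j else 0.0 for i in range(len(p))]

n, T = 4, 500
eta = math.sqrt(2.0 * math.log(n) / T)        # tuned step size, rho = 1
p_avg = mirror_descent_simplex(subgrad_max, n, T, eta)
f_avg = max(p_avg)                            # approaches min f = 1/n
```

By convexity f(p̄) is at most the average of f(p^(t)), so the duality-gap bound ε_MD ≈ ρ√(2 log n / T) applies to the averaged iterate.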
APPLICATIONS IN ALGORITHM DESIGN
Warm-up Example: Linear Programming
LP feasibility problem: given A ∈ R^{m×n}, is there x ∈ X with b − Ax ≥ 0?
X captures the easy constraints, which we maintain feasible; b − Ax ≥ 0 are the hard constraints, which require fixing.
Convert into a non-smooth optimization problem over the simplex:

  min_{p ∈ Δ_m} max_{x∈X} p^T (b − Ax)

Non-differentiable objective:

  f(p) = max_{x∈X} p^T (b − Ax)    (the maximizer is the best response to the dual solution p)

It admits subgradients: for all p, the best response x_p satisfies

  p^T (b − A·x_p) ≥ 0   and   (b − A·x_p) ∈ ∂f(p)

i.e. the subgradient is the slack in the constraints.
If we can pick x_p such that ‖b − A·x_p‖∞ ≤ ρ, then

  ε_MD ≤ ρ·√(2·log n) / √T,   so   T ≤ 2·ρ²·log n / ε²   iterations suffice.
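The scheme can be sketched end-to-end on a toy feasibility problem, not from the talk: find x ∈ [0, 1] with 0.8 − x ≥ 0 and x − 0.2 ≥ 0. The oracle maximizes the p-weighted slack over the easy set X = [0, 1]; averaging its responses yields a point whose worst constraint slack is at least −ε_MD. All constants are illustrative:

```python
import math

# Hard constraints b - A x >= 0 with A = [1, -1], b = [0.8, -0.2], i.e. 0.2 <= x <= 0.8.
def slack(x):
    return [0.8 - x, x - 0.2]

def oracle(p):
    # Best response: maximize p^T slack(x) over X = [0, 1].
    # The objective is linear in x, so an endpoint is always optimal.
    return 0.0 if p[0] > p[1] else 1.0

T = 2000
eta = math.sqrt(2.0 * math.log(2) / T)
p = [0.5, 0.5]
x_sum = 0.0
for _ in range(T):
    x = oracle(p)
    x_sum += x
    s = slack(x)                 # subgradient of f(p) = max_x p^T slack(x)
    w = [pi * math.exp(-eta * si) for pi, si in zip(p, s)]  # upweight violated constraints
    z = sum(w)
    p = [wi / z for wi in w]

x_bar = x_sum / T                # average of the oracle's responses
worst_slack = min(slack(x_bar))  # >= -eps_MD when the instance is feasible
```

The oracle alternates between the endpoints 0 and 1, but the dual weights force it to balance the two constraints, so the averaged response lands well inside the feasible interval.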
MWU and s-t Maxflow
Maximum flow feasibility for value F over an undirected graph G with incidence matrix B:

  ∀e ∈ E:  F·|f_e| / c_e ≤ 1       (capacity constraints)
  B^T f = e_s − e_t                 (we will enforce this)

Turn it into a non-smooth minimization problem over the simplex:

  f(p) = min_{B^T f = e_s − e_t} ∑_{e∈E} p_e · (F·|f_e|/c_e − 1)

The best response f_p is a shortest s-t path with lengths p_e / c_e.
For any p, if f_p has length > 1, the problem is infeasible.
Otherwise, the following is a subgradient:

  ∂f(p)_e = F·|(f_p)_e| / c_e − 1

Unfortunately, the width can be large:

  ‖∂f(p)‖∞ ≤ F / c_min      [PST 91]:  T = O( F·log n / (ε²·c_min) )
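In flow terms the MWU loop reads: keep weights over edges, repeatedly route the target value F along the shortest path under lengths p_e/c_e, penalize the edges just used, and average the routed flows. A toy sketch on a two-path unit-capacity graph, not from the talk; the paths are enumerated explicitly instead of running a shortest-path algorithm, and T, ε are illustrative:

```python
# Graph: s -> a -> t and s -> b -> t; four edges, unit capacities; maxflow = 2.
cap = [1.0, 1.0, 1.0, 1.0]
paths = [[0, 1], [2, 3]]          # edge indices of the two s-t paths
F = 2.0                           # target flow value to certify feasible
T, eps = 100, 0.1

w = [1.0] * 4                     # MWU weights over edges (dual variables)
f_avg = [0.0] * 4                 # average of the routed flows
for _ in range(T):
    # Best response: shortest s-t path under lengths w_e / c_e.
    best = min(paths, key=lambda P: sum(w[e] / cap[e] for e in P))
    for e in best:
        f_avg[e] += F / T                  # route all of F on this path
        w[e] *= (1.0 + eps * F / cap[e])   # penalize the congested edges

congestion = max(f_avg[e] / cap[e] for e in range(4))
```

Here the weights force the two paths to alternate, so the averaged flow routes F = 2 with congestion 1. With F/c_min large, the multiplicative penalty per round blows up: exactly the width issue the next slide addresses.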
Width Reduction: Make the Primal Nicer
PROBLEM: this width bound is optimal for this specific formulation — we need a primal argument.
SOLUTION: regularize the primal:

  f(p) = min_{B^T f = e_s − e_t} F·∑_{e∈E} (f_e / c_e)·(p_e + ε/m) − 1

REGULARIZATION ERROR: ε·F
NEW WIDTH: ‖∂f(p)‖∞ ≤ m / ε
ITERATION BOUND: [GK 98]:  T = O( m·log n / ε² )
Electrical Flow Approach [CKMST]
A different formulation yields the basis for the CKMST algorithm:

  ∀e ∈ E:  F·f_e² / c_e² ≤ 1,    B^T f = e_s − e_t   (we will enforce this)

Non-smooth optimization problem:

  f(p) = min_{B^T f = e_s − e_t} ∑_{e∈E} p_e · (F·f_e²/c_e² − 1)

The best response is an electrical flow f_p.
Original width: ‖∂f(p)‖∞ ≤ m.
Regularize the primal:

  f(p) = min_{B^T f = e_s − e_t} F·∑_{e∈E} (f_e² / c_e²)·(p_e + ε/m) − 1

New width: ‖∂f(p)‖∞ ≤ √(m/ε)
Conclusion: Take-away Messages
• Regularization is a powerful tool for the design of fast algorithms.
• Most iterative algorithms can be understood as regularized updates: MWUs, width reduction, interior point, gradient descent, …
• They perform well in practice. Regularization also helps eliminate noise.
• ULTIMATE GOAL: development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.
THE END – THANK YOU