ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN OF FAST ALGORITHMS

Lorenzo Orecchia, MIT Math

A unified framework for optimization and online learning beyond Multiplicative Weight Updates

Source: cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdf



Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING

• Online Linear Optimization

• Online Linear Optimization over Simplex and Multiplicative Weight Updates (MWUs)

• A Regularization Framework to generalize MWUs: Follow the Regularized Leader

MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE

Optimization: Regularized Updates

Online Learning: Multiplicative Weight Updates (MWUs)

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW

• Non-smooth vs Smooth Convex Optimization

• Non-smooth Convex Optimization reduces to Online Linear Optimization

• Application: Understanding Undirected Maxflow algorithms based on MWUs

MESSAGE: FASTEST ALGORITHMS REQUIRE PRIMAL-DUAL APPROACH


TOC Applications of MWUs

• Fast algorithms for solving specific LPs and SDPs: Maximum Flow problems [PST], [GK], [F], [CKMST]; Covering-packing problems [PST]; Oblivious routing [R], [M]

• Fast approximation algorithms based on LP and SDP relaxations: Maxcut [AK]; Graph Partitioning problems [AK], [S], [OSV]

• Proof technique: Hardcore Lemma [BHK]; QIP = PSPACE [W]; Derandomization [Y]

… and more


Machine Learning meets Optimization meets TCS

These techniques have been rediscovered multiple times in different fields: Machine Learning, Convex Optimization, and TCS.

Three surveys emphasizing the different viewpoints and literatures:

1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi

2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski

3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale


REGULARIZATION 101

What is Regularization?

Regularization is a fundamental technique in optimization: it turns an OPTIMIZATION PROBLEM into a WELL-BEHAVED OPTIMIZATION PROBLEM by adding a regularizer F, weighted by a parameter λ > 0, to the objective (for example, $\min_x f(x)$ becomes $\min_x f(x) + \lambda \cdot F(x)$).

A well-behaved problem has:

• Stable optimum

• Unique optimal solution

• Smoothness conditions

Benefits of Regularization in Learning and Statistics:

• Prevents overfitting

• Increases stability

• Decreases sensitivity to random noise

Example: Regularization Helps Stability

Consider a convex set $S \subset \mathbb{R}^n$ and a linear optimization problem:

$$f(c) = \arg\min_{x \in S} c^T x$$

The optimal solution f(c) may be very unstable under perturbation of c: we can have

$$\|c' - c\| \le \delta \quad \text{and} \quad \|f(c') - f(c)\| \gg \delta.$$

(Figure: convex set S, cost vectors c and c′, and their minimizers f(c) and f(c′).)

Now consider a regularized linear optimization problem

$$f(c) = \arg\min_{x \in S} c^T x + F(x),$$

where F is σ-strongly convex. Then:

$$\|c' - c\|_* \le \delta \quad \text{implies} \quad \|f(c') - f(c)\| \le \frac{\delta}{\sigma}.$$

(Figure: the regularized objectives $c^T x + F(x)$ and $c'^T x + F(x)$ differ by a linear function whose slope has norm at most δ, so their minimizers f(c) and f(c′) stay close.)
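The instability and its cure can be seen numerically on the probability simplex. The sketch below is an illustration under our own assumptions: S is the simplex (so the unregularized minimizer is a vertex), and the regularizer is F = η·(negative entropy), which is η-strongly convex w.r.t. ‖·‖₁ on the simplex (Pinsker's inequality), so the stability bound reads ‖f(c′) − f(c)‖₁ ≤ ‖c′ − c‖∞/η. The function names are hypothetical.

```python
import numpy as np

def linear_argmin_simplex(c):
    """Unregularized: argmin_{x in simplex} c^T x is the vertex e_i with i = argmin_i c_i."""
    x = np.zeros(len(c))
    x[int(np.argmin(c))] = 1.0
    return x

def entropy_regularized_argmin(c, eta):
    """Regularized: argmin_{p in simplex} c^T p + eta * sum_i p_i log p_i.
    Its closed-form minimizer is the softmax of -c / eta."""
    z = -np.asarray(c, dtype=float) / eta
    w = np.exp(z - z.max())  # shifting by the max avoids overflow and cancels on normalization
    return w / w.sum()

delta, eta = 1e-3, 0.5
c = np.array([1.0, 1.0 + delta])
c_prime = np.array([1.0 + delta, 1.0])  # ||c' - c||_inf = delta

# Unregularized: the minimizer jumps between vertices (l1 distance 2, however small delta is).
jump = np.abs(linear_argmin_simplex(c_prime) - linear_argmin_simplex(c)).sum()

# Regularized: eta * negative entropy is eta-strongly convex w.r.t. ||.||_1,
# so the minimizer moves by at most ||c' - c||_inf / eta in l1 distance.
moved = np.abs(entropy_regularized_argmin(c_prime, eta)
               - entropy_regularized_argmin(c, eta)).sum()
print(jump, moved, delta / eta)
```

Shrinking η toward 0 recovers the unstable vertex solution, while larger η buys stability at the price of moving away from the exact linear optimum.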


ONLINE LINEAR OPTIMIZATION AND MULTIPLICATIVE WEIGHT UPDATES

Online Linear Minimization

SETUP: Convex set $X \subseteq \mathbb{R}^n$, generic norm $\|\cdot\|$, repeated game over T rounds. At round t:

• ALGORITHM plays the current solution $x^{(t)} \in X$.

• ADVERSARY reveals the current linear objective, the loss vector $\ell^{(t)} \in \mathbb{R}^n$ with $\|\ell^{(t)}\|_* \le \rho$.

• The algorithm's loss is $\ell^{(t)T} x^{(t)}$.

• ALGORITHM computes the updated solution $x^{(t+1)} \in X$; ADVERSARY then reveals a new loss vector $\ell^{(t+1)}$ with $\|\ell^{(t+1)}\|_* \le \rho$.

GOAL: update $x^{(t)}$ to minimize regret, the difference between the algorithm's average loss and the a posteriori optimum:

$$\frac{1}{T} \sum_{t=1}^T \ell^{(t)T} x^{(t)} - \min_{x \in X} \frac{1}{T} \sum_{t=1}^T \ell^{(t)T} x$$
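The regret in the GOAL can be computed directly once a run is recorded. A minimal sketch, assuming X is a polytope given by its list of vertices (the slides allow any convex set); `average_regret` is a hypothetical helper. The example uses the simplex, whose vertices are the standard basis vectors.

```python
import numpy as np

def average_regret(losses, plays, vertices):
    """Average regret of an online linear minimization run over a polytope X.

    losses:   (T, n) array; row t is the loss vector l^(t)
    plays:    (T, n) array; row t is the algorithm's point x^(t) in X
    vertices: (k, n) array listing the vertices of X. A linear function attains
              its minimum over a polytope at a vertex, so checking vertices suffices.
    """
    T = losses.shape[0]
    algorithm_loss = np.sum(losses * plays)      # sum_t l^(t)T x^(t)
    cumulative = losses.sum(axis=0)              # L^(T) = sum_t l^(t)
    best_fixed = np.min(vertices @ cumulative)   # min_{x in X} L^(T)T x
    return (algorithm_loss - best_fixed) / T

# Example: X is the simplex in R^2 (vertices e_1, e_2); the algorithm always plays uniform.
losses = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
plays = np.full((3, 2), 0.5)
print(average_regret(losses, plays, np.eye(2)))  # (1.5 - 1.0) / 3
```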

Simplex Case: Learning with Experts

SETUP: Simplex $X \subseteq \mathbb{R}^n$ under the $\ell_1$ norm. At round t:

• ALGORITHM plays $p^{(t)}$, a distribution over dimensions, i.e. experts.

• ADVERSARY reveals the experts' losses $\ell^{(t)}$ with $\|\ell^{(t)}\|_\infty \le \rho$.

• Algorithm's loss: $\mathbb{E}_{i \sim p^{(t)}}\left[\ell^{(t)}_i\right] = p^{(t)T} \ell^{(t)}$.

• ALGORITHM updates the distribution to $p^{(t+1)}$.

Simplex Case: Multiplicative Weight Updates

ALGORITHM plays $p^{(t)}$; ADVERSARY reveals $\ell^{(t)}$.

Weights: $w^{(t+1)}_i = (1 - \epsilon)^{\ell^{(t)}_i} \, w^{(t)}_i$, with $w^{(1)} = \vec{1}$

Distribution: $p^{(t+1)}_i = \frac{w^{(t+1)}_i}{\sum_{j=1}^n w^{(t+1)}_j}$

This is the MULTIPLICATIVE WEIGHT UPDATE. The parameter $\epsilon \in (0, 1)$ ranges from CONSERVATIVE (ε close to 0) to AGGRESSIVE (ε close to 1).

MWUs: Unraveling the Update

Update: $p^{(t+1)}_i \propto w^{(t+1)}_i = (1 - \epsilon)^{\ell^{(t)}_i} \cdot w^{(t)}_i$

Unraveling the recursion, each WEIGHT is an exponential of the expert's CUMULATIVE LOSS:

$$w^{(t+1)}_i = (1 - \epsilon)^{\sum_{s=1}^t \ell^{(s)}_i}$$
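The unraveled form can be confirmed against the round-by-round update. A minimal sketch, assuming losses stored as a (T, n) NumPy array; `mwu_distributions` is a hypothetical name, and $p^{(1)}$ is the uniform distribution since $w^{(1)} = \vec{1}$.

```python
import numpy as np

def mwu_distributions(losses, eps):
    """Multiplicative Weight Updates starting from w^(1) = 1.

    Applies w_i <- (1 - eps)^{l_i} * w_i round by round and returns the
    distributions p^(1), ..., p^(T+1), where p^(t) is w^(t) normalized.
    """
    w = np.ones(losses.shape[1])
    dists = [w / w.sum()]
    for loss in losses:
        w = w * (1.0 - eps) ** loss   # multiplicative update
        dists.append(w / w.sum())     # normalize to a distribution
    return np.array(dists)

rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=(50, 4))
eps = 0.1
iterative = mwu_distributions(losses, eps)

# Unraveled form: w^(T+1)_i = (1 - eps)^{sum_s l^(s)_i}, then normalize.
w_closed = (1.0 - eps) ** losses.sum(axis=0)
closed = w_closed / w_closed.sum()
print(np.allclose(iterative[-1], closed))
```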

MWUs: Regret Bound

Update: $p^{(t+1)}_i \propto w^{(t+1)}_i = (1 - \epsilon)^{\ell^{(t)}_i} \cdot w^{(t)}_i$

For $\|\ell^{(t)}\|_\infty \le \rho$ and $\epsilon < \frac{1}{2}$, the algorithm's regret satisfies

$$\hat{L} - L^\star \le \frac{\rho \log n}{\epsilon T} + \rho \epsilon,$$

where $\hat{L}$ is the algorithm's average loss and $L^\star$ is the average loss of the best expert in hindsight. The first term is a start-up penalty; the second is the penalty for being greedy.
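The bound can be checked empirically. A minimal sketch, assuming losses in [0, 1] (so ρ = 1 and the exponent needs no rescaling), natural logarithms, and ε = √(log n / T), which balances the two penalty terms; under these assumptions the standard analysis guarantees the bound for every loss sequence, so the random losses are only an illustration. `mwu_average_regret` is a hypothetical helper.

```python
import numpy as np

def mwu_average_regret(losses, eps):
    """Run MWU (w_i <- (1 - eps)^{l_i} w_i, w^(1) = 1) and return its average regret."""
    T = losses.shape[0]
    w = np.ones(losses.shape[1])
    algorithm_loss = 0.0
    for loss in losses:
        p = w / w.sum()                 # p^(t) is proportional to w^(t)
        algorithm_loss += p @ loss      # expected loss p^(t)T l^(t)
        w = w * (1.0 - eps) ** loss
    best_expert_loss = losses.sum(axis=0).min()
    return (algorithm_loss - best_expert_loss) / T

rng = np.random.default_rng(1)
T, n = 1000, 10
losses = rng.uniform(0.0, 1.0, size=(T, n))   # ||l^(t)||_inf <= rho = 1
eps = np.sqrt(np.log(n) / T)                  # balances the two penalty terms
regret = mwu_average_regret(losses, eps)
bound = np.log(n) / (eps * T) + eps           # rho log(n)/(eps T) + rho eps with rho = 1
print(regret, bound)
```

With this choice of ε the bound equals 2√(log n / T), so the average regret vanishes as T grows.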


ONLINE LINEAR OPTIMIZATION BEYOND MWUs: A REGULARIZATION FRAMEWORK

MWUs: Proof Sketch of Regret Bound

Update: $p^{(t+1)}_i \propto w^{(t+1)}_i = (1 - \epsilon)^{\sum_{s=1}^t \ell^{(s)}_i}$

Potential: $\Phi^{(t+1)} = \log_{1-\epsilon} \sum_{i=1}^n w^{(t+1)}_i$

• The proof is a potential function argument.

• The potential function bounds the loss of the best expert (since $\log_{1-\epsilon}$ is decreasing and the sum is at least its largest term):

$$\Phi^{(t+1)} \le \log_{1-\epsilon} \max_{i} w^{(t+1)}_i = \min_{i} \sum_{s=1}^t \ell^{(s)}_i$$

• The potential function is related to the algorithm's performance:

$$\Phi^{(t+1)} - \Phi^{(t)} \ge \ell^{(t)T} p^{(t)} - \epsilon$$

DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?

MWUs AND APPLICATIONS

Designing a Regularized Update

GOAL: Design an update and its potential function analysis.

QUESTION: Choice of potential function?

DESIDERATA: 1) lower-bounds the best expert's loss
2) tracks the algorithm's performance

Attempt 1 – FOLLOW THE LEADER: with cumulative loss $L^{(t)} = \sum_{s=1}^t \ell^{(s)}$,

$$x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)} \qquad \Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)}$$

Pick the best current solution; the potential is the current best loss.

Fails if the best expert changes drastically.
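A classic two-expert loss sequence (our choice of illustration, not taken from the slides) shows how Follow the Leader fails: after a first loss vector (1/2, 0), the losses alternate between (0, 1) and (1, 0), so the leader switches every round and is always the expert about to suffer loss 1. The sketch plays vertices of the simplex and assumes ties are broken toward the lowest index; `follow_the_leader` is a hypothetical name.

```python
import numpy as np

def follow_the_leader(losses):
    """FTL over the simplex: play the vertex e_i minimizing the cumulative loss so far.

    Returns (total loss of FTL, total loss of the best expert in hindsight).
    """
    cumulative = np.zeros(losses.shape[1])
    total = 0.0
    for loss in losses:
        i = int(np.argmin(cumulative))  # current leader (ties -> lowest index)
        total += loss[i]
        cumulative += loss
    return total, cumulative.min()

T = 100
losses = np.array([[0.5, 0.0]] + [[0.0, 1.0] if t % 2 == 1 else [1.0, 0.0]
                                  for t in range(1, T)])
total, best = follow_the_leader(losses)
print(total, best, (total - best) / T)
```

Over T = 100 rounds FTL suffers 99.5 while the best expert suffers 49.5, so the average regret stays at 1/2 instead of vanishing as T grows.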

How to make the update more stable?

Regularized Update: Definition

Attempt 2 – FOLLOW THE REGULARIZED LEADER:

$$x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)} + \eta \cdot F(x) \qquad \Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$$

Properties of the regularizer F(x):

1. Convex, differentiable

2. σ-strongly convex w.r.t. the norm

Parameter η ≥ 0, to be determined.

These properties are in fact sufficient to obtain a regret bound.

Regularized Update: Analysis

Desideratum 1 holds up to a regularization error (the second term):

$$\Phi^{(t+1)} \le \min_{x \in X} L^{(t)T} x + \eta \cdot \max_{x \in X} F(x)$$

Desideratum 2 is still open: how does $\Phi^{(t+1)}$ track the algorithm's performance? The analysis studies the regularized objective $f^{(t+1)}(x) = L^{(t)T} x + \eta \cdot F(x)$.
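To see a regularized update beyond the simplex, take (as an illustrative assumption) X to be the unit Euclidean ball and F(x) = ½‖x‖₂², which is 1-strongly convex w.r.t. ‖·‖₂. The update then has a closed form, sketched below for η > 0 (η = 0 would recover Follow the Leader); `ftrl_ball` is a hypothetical name, and the random sample is only a sanity check of the closed form.

```python
import numpy as np

def ftrl_ball(L, eta):
    """FTRL step x = argmin_{||x||_2 <= 1} L^T x + eta * F(x), with F(x) = 0.5 * ||x||_2^2.

    Completing the square, L^T x + (eta/2)||x||^2 = (eta/2)||x + L/eta||^2 - ||L||^2/(2 eta),
    so the minimizer is the Euclidean projection of -L/eta onto the unit ball (needs eta > 0).
    """
    y = -L / eta
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

def ftrl_objective(x, L, eta):
    """The regularized objective L^T x + eta * 0.5 * ||x||_2^2."""
    return L @ x + 0.5 * eta * (x @ x)

rng = np.random.default_rng(2)
L, eta = rng.normal(size=3), 0.7
x_star = ftrl_ball(L, eta)

# Sanity check: no random point of the unit ball has a smaller objective.
samples = rng.normal(size=(2000, 3))
samples /= np.maximum(1.0, np.linalg.norm(samples, axis=1, keepdims=True))
print(all(ftrl_objective(x_star, L, eta) <= ftrl_objective(s, L, eta) for s in samples))
```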

Tracking the Algorithm: Proof by Picture

Define $f^{(t+1)}(x) = L^{(t)T} x + \eta \cdot F(x)$, so that $x^{(t+1)}$ minimizes $f^{(t+1)}$ over X and $\Phi^{(t+1)} = f^{(t+1)}(x^{(t+1)})$.

Notice: consecutive objectives differ by the latest loss vector,

$$f^{(t+1)}(x) - f^{(t)}(x) = \ell^{(t)T} x,$$

so in particular $f^{(t+1)}(x^{(t)}) = \Phi^{(t)} + \ell^{(t)T} x^{(t)}$.

Compare $\ell^{(t)T} x^{(t)}$ and $\Phi^{(t+1)} - \Phi^{(t)}$:

$$\Phi^{(t+1)} - \Phi^{(t)} = f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) + \ell^{(t)T} x^{(t)}$$

Want: $f^{(t+1)}(x^{(t)}) \approx f^{(t+1)}(x^{(t+1)})$.

(Figure: graphs of $f^{(t)}$ and $f^{(t+1)}$ with minimizers $x^{(t)}$ and $x^{(t+1)}$ and minimum values $\Phi^{(t)}$ and $\Phi^{(t+1)}$.)

Regularization in Action

$f^{(t+1)}(x) = L^{(t)T} x + \eta \cdot F(x)$ is $(\eta \cdot \sigma)$-strongly convex, and so is $f^{(t)}$ (REGULARIZATION).

The two objectives differ by the linear function $\ell^{(t)T} x$, whose gradient has dual norm $\|\ell^{(t)}\|_*$. Hence their minimizers are close (STABILITY):

$$\|x^{(t+1)} - x^{(t)}\| \le \frac{\|\ell^{(t)}\|_*}{\eta \cdot \sigma}$$

(Figure: strong convexity gives a quadratic lower bound to $f^{(t+1)}$.)

Analysis: Progress in One Iteration

Since $x^{(t)}$ minimizes $f^{(t)}$, we have $\nabla f^{(t+1)}(x^{(t)}) = \ell^{(t)}$ (when $x^{(t)}$ is not an interior point of X, first-order optimality, $\nabla f^{(t)}(x^{(t)})^T (y - x^{(t)}) \ge 0$ for all $y \in X$, yields the same first inequality below). Moreover, $f^{(t+1)}$ is $(\eta \cdot \sigma)$-strongly convex and, by stability, $\|x^{(t+1)} - x^{(t)}\| \le \frac{\|\ell^{(t)}\|_*}{\eta \cdot \sigma}$.

Starting from

$$\Phi^{(t+1)} - \Phi^{(t)} = f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) + \ell^{(t)T} x^{(t)},$$

strong convexity gives

$$f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) \ge \ell^{(t)T} (x^{(t+1)} - x^{(t)}) + \frac{\eta \sigma}{2} \|x^{(t+1)} - x^{(t)}\|^2 \ge -\|\ell^{(t)}\|_* \|x^{(t+1)} - x^{(t)}\| + \frac{\eta \sigma}{2} \|x^{(t+1)} - x^{(t)}\|^2 \ge -\frac{\|\ell^{(t)}\|_*^2}{2 \eta \sigma},$$

where the last step minimizes the quadratic in $\|x^{(t+1)} - x^{(t)}\|$.
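Both facts used in the one-iteration analysis, the stability bound $\|x^{(t+1)} - x^{(t)}\| \le \|\ell^{(t)}\|_*/(\eta\sigma)$ and the progress bound $\Phi^{(t+1)} - \Phi^{(t)} \ge \ell^{(t)T} x^{(t)} - \|\ell^{(t)}\|_*^2/(2\eta\sigma)$, can be checked numerically. The sketch reuses an illustrative Euclidean setting (our assumption: X the unit ball, F(x) = ½‖x‖₂², σ = 1, and ‖·‖_* = ‖·‖₂); all names are hypothetical.

```python
import numpy as np

def ftrl_ball(L, eta):
    """argmin_{||x||_2 <= 1} L^T x + eta * 0.5 * ||x||_2^2: project -L/eta onto the unit ball."""
    y = -L / eta
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

def potential(L, eta):
    """Phi = min_{||x||_2 <= 1} L^T x + eta * 0.5 * ||x||_2^2, evaluated at its minimizer."""
    x = ftrl_ball(L, eta)
    return L @ x + 0.5 * eta * (x @ x)

rng = np.random.default_rng(3)
eta, sigma = 2.0, 1.0          # F = 0.5 ||x||_2^2 is 1-strongly convex; dual norm is ||.||_2
worst_progress_gap = np.inf    # min over trials of (Phi^(t+1) - Phi^(t)) - lower bound
worst_stability_gap = np.inf   # min over trials of stability bound - ||x^(t+1) - x^(t)||
for _ in range(1000):
    L_prev = rng.normal(size=5)              # cumulative loss L^(t-1)
    ell = rng.normal(size=5)                 # latest loss vector l^(t)
    x_t = ftrl_ball(L_prev, eta)             # x^(t)
    x_next = ftrl_ball(L_prev + ell, eta)    # x^(t+1)
    progress = potential(L_prev + ell, eta) - potential(L_prev, eta)
    lower = ell @ x_t - (ell @ ell) / (2 * eta * sigma)
    step = np.linalg.norm(x_next - x_t)
    stability = np.linalg.norm(ell) / (eta * sigma)
    worst_progress_gap = min(worst_progress_gap, progress - lower)
    worst_stability_gap = min(worst_stability_gap, stability - step)
print(worst_progress_gap, worst_stability_gap)
```

Both gaps are nonnegative up to floating-point rounding; they can be exactly zero, since when $x^{(t)}$ and $x^{(t+1)}$ are both interior the update is $x^{(t+1)} - x^{(t)} = -\ell^{(t)}/\eta$ and both bounds are tight.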

Completing the Analysis

Progress in one iteration:

$$\Phi^{(t+1)} - \Phi^{(t)} \ge \ell^{(t)T} x^{(t)} - \frac{\|\ell^{(t)}\|_*^2}{2 \sigma \eta}$$

The second term bounds the regret at iteration t.

Telescopic sum, using $\|\ell^{(t)}\|_* \le \rho$:

$$\Phi^{(T+1)} \ge \sum_{t=1}^T \ell^{(t)T} x^{(t)} + \Phi^{(1)} - T \cdot \frac{\rho^2}{2 \eta \sigma}$$

Combining with $\Phi^{(T+1)} \le \min_{x \in X} L^{(T)T} x + \eta \cdot \max_{x \in X} F(x)$ and $\Phi^{(1)} = \eta \cdot \min_{x \in X} F(x)$ gives the final regret bound, for regularizer F and $\|\ell^{(t)}\|_* \le \rho$:

$$\frac{1}{T} \left( \sum_{t=1}^T \ell^{(t)T} x^{(t)} - \min_{x \in X} \sum_{t=1}^T \ell^{(t)T} x \right) \le \frac{\eta}{T} \cdot \left( \max_{x \in X} F(x) - \min_{x \in X} F(x) \right) + \frac{\rho^2}{2 \sigma \eta}$$

The first term is the start-up penalty and the second the penalty for being greedy: the SAME TYPE OF BOUND AS FOR MWUs.
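The final bound can be tested end to end in an illustrative Euclidean setting (our assumption): X the unit ball and F(x) = ½‖x‖₂², so max F − min F = ½ and σ = 1. The sketch rescales random losses to ‖ℓ^{(t)}‖₂ ≤ ρ = 1 and picks η to minimize the right-hand side; since the analysis holds for every loss sequence, the random losses are only for illustration. `ftrl_ball` is a hypothetical name.

```python
import numpy as np

def ftrl_ball(L, eta):
    """argmin_{||x||_2 <= 1} L^T x + eta * 0.5 * ||x||_2^2: project -L/eta onto the unit ball."""
    y = -L / eta
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

rng = np.random.default_rng(4)
T, n, rho, sigma = 500, 3, 1.0, 1.0
losses = rng.normal(size=(T, n))
losses /= np.maximum(1.0, np.linalg.norm(losses, axis=1, keepdims=True) / rho)  # ||l^(t)||_2 <= rho

F_range = 0.5                                      # max F - min F on the unit ball
eta = rho * np.sqrt(T / (2 * sigma * F_range))     # minimizes the right-hand side below

L = np.zeros(n)
algorithm_loss = 0.0
for ell in losses:
    x = ftrl_ball(L, eta)       # x^(t) depends only on past losses L^(t-1)
    algorithm_loss += ell @ x
    L += ell
best_fixed = -np.linalg.norm(L)                    # min_{||x||_2 <= 1} L^T x
regret = (algorithm_loss - best_fixed) / T
bound = eta / T * F_range + rho**2 / (2 * sigma * eta)
print(regret, bound)
```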

Reinterpreting MWUs

Regularizer: negative entropy $F(p) = \sum_{i=1}^n p_i \log p_i$, which is 1-strongly convex w.r.t. $\|\cdot\|_1$ on the simplex.

Potential function:

$$\Phi^{(t+1)} = \min_{p \ge 0, \sum_i p_i = 1} p^T L^{(t)} + \eta \cdot \sum_{i=1}^n p_i \log p_i$$

Update:

$$p^{(t+1)} = \arg\min_{p \ge 0, \sum_i p_i = 1} p^T L^{(t)} + \eta \cdot \sum_{i=1}^n p_i \log p_i$$

$$p^{(t+1)}_i = \frac{e^{-\frac{1}{\eta} L^{(t)}_i}}{\sum_{j=1}^n e^{-\frac{1}{\eta} L^{(t)}_j}} = \frac{(1-\epsilon)^{L^{(t)}_i}}{\sum_{j=1}^n (1-\epsilon)^{L^{(t)}_j}}, \quad \text{with } 1 - \epsilon = e^{-1/\eta}.$$

This is the SOFT-MAX of $-L^{(t)}/\eta$: exactly the MWU distribution.
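The identity between entropy-regularized FTRL and MWUs is easy to confirm numerically: with η = −1/log(1 − ε), the soft-max of −L/η equals the normalized weights (1 − ε)^{L_i}. A minimal sketch; the function names and the cumulative-loss vector are illustrative assumptions.

```python
import numpy as np

def entropy_ftrl(L, eta):
    """argmin over the simplex of p^T L + eta * sum_i p_i log p_i, i.e. softmax(-L / eta)."""
    z = -L / eta
    w = np.exp(z - z.max())   # shift for numerical stability; cancels after normalizing
    return w / w.sum()

def mwu_from_cumulative(L, eps):
    """MWU distribution: weights (1 - eps)^{L_i}, normalized."""
    w = (1.0 - eps) ** L
    return w / w.sum()

eps = 0.2
eta = -1.0 / np.log(1.0 - eps)        # chosen so that exp(-1 / eta) = 1 - eps
L = np.array([3.0, 1.5, 4.0, 0.0])    # an illustrative cumulative loss vector L^(t)
print(np.allclose(entropy_ftrl(L, eta), mwu_from_cumulative(L, eps)))
```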


Beyond MWUs: which regularizer?

Regret bound, optimizing over η (the optimum is at $\eta = \rho \sqrt{T / (2 \sigma (\max_{x \in X} F(x) - \min_{x \in X} F(x)))}$):

$$\frac{1}{T} \left( \sum_{t=1}^T \ell^{(t)T} x^{(t)} - \min_{x \in X} \sum_{t=1}^T \ell^{(t)T} x \right) \le \rho \sqrt{\frac{2 \cdot \left( \max_{x \in X} F(x) - \min_{x \in X} F(x) \right)}{\sigma T}}$$

The best choice of regularizer and norm minimizes

$$\frac{\max_t \|\ell^{(t)}\|_*^2 \cdot \left( \max_{x \in X} F(x) - \min_{x \in X} F(x) \right)}{\sigma}$$

Negative entropy with the $\ell_1$-norm is approximately optimal for the simplex.

QUESTION: are other regularizers ever useful?
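For the simplex with negative entropy, max F − min F = log n (F is 0 at a vertex and −log n at the uniform distribution) and σ = 1 w.r.t. ‖·‖₁, so the optimized bound becomes ρ√(2 log n / T). The sketch runs entropy-regularized FTRL with the optimizing η on random losses in [−ρ, ρ] (an illustrative assumption; the bound holds for any sequence with ‖ℓ^{(t)}‖∞ ≤ ρ) and compares its average regret to that value. `entropy_ftrl` is a hypothetical name.

```python
import numpy as np

def entropy_ftrl(L, eta):
    """Entropy-regularized FTRL over the simplex: softmax(-L / eta)."""
    z = -L / eta
    w = np.exp(z - z.max())
    return w / w.sum()

rng = np.random.default_rng(6)
T, n, rho = 2000, 8, 1.0
losses = rng.uniform(-rho, rho, size=(T, n))    # ||l^(t)||_inf <= rho
F_range = np.log(n)      # max F - min F for negative entropy on the simplex
sigma = 1.0              # strong convexity w.r.t. ||.||_1 (Pinsker's inequality)
eta = rho * np.sqrt(T / (2 * sigma * F_range))  # optimizing choice of eta

L = np.zeros(n)
algorithm_loss = 0.0
for ell in losses:
    p = entropy_ftrl(L, eta)     # p^(t) uses the cumulative loss of previous rounds
    algorithm_loss += p @ ell
    L += ell
regret = (algorithm_loss - L.min()) / T           # best expert = best vertex of the simplex
bound = rho * np.sqrt(2 * F_range / (sigma * T))  # = rho * sqrt(2 log n / T)
print(regret, bound)
```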

Different Regularizers in Algorithm Design

QUESTION 1: Are other regularizers, besides entropy, ever useful?

YES! Applications:

• Graph Partitioning and Random Walks
  Spectral algorithms for balanced separator running in time $\tilde{O}(m)$.
  Uses the random-walk framework and SDP MWUs.
  Different walks correspond to different regularizers for the eigenvector problem [Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012].
  (Figure: regularizers $F(X) = \mathrm{Tr}(X \log X)$ (SDP MWU), $F(X) = \mathrm{Tr}(X^p)$ (p-norm, $1 \le p \le \infty$) and $F(X) = \mathrm{Tr}(X^{1/2})$ (NEW REGULARIZER), shown alongside the Heat Kernel Random Walk, the Lazy Random Walk and Personalized PageRank.)

• Sparsification
  $\epsilon$-spectral-sparsifiers with $O(n \log n / \epsilon^2)$ edges use a matrix concentration bound equivalent to SDP MWUs [Spielman, Srivastava 2008].
  $\epsilon$-spectral-sparsifiers with $O(n / \epsilon^2)$ edges can be interpreted as using a different regularizer, $F(X) = \mathrm{Tr}(X^{1/2})$ [Batson, Spielman, Srivastava 2009].

• Many more in Online Learning
  Bandit Online Learning [AHR], …


NON-SMOOTH CONVEX OPTIMIZATION REDUCES TO ONLINE LINEAR OPTIMIZATION


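Convex Optimization Setup

$\min_{x \in X} f(x)$, where $f$ is convex and differentiable and $X \subseteq \mathbb{R}^n$ is a closed, convex set.

NON-SMOOTH: $f$ is $\rho$-Lipschitz continuous, i.e. $\forall x \in X,\ \|\nabla f(x)\|_* \le \rho$
• NO GRADIENT STEP GUARANTEE: moving from $x^{(t)}$ to $x^{(t+1)}$ along the negative gradient need not decrease $f$
• ONLY DUAL GUARANTEE

SMOOTH: $f$ has an $L$-Lipschitz continuous gradient, i.e. $\forall x, y \in X,\ \|\nabla f(y) - \nabla f(x)\|_* \le L \|y - x\|$
• Gradient step is guaranteed to decrease the function value: $f(x^{(t+1)}) \le f(x^{(t)}) - \frac{\|\nabla f(x^{(t)})\|_*^2}{2L}$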
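The contrast between the two regimes can be checked numerically. A minimal sketch, assuming the Euclidean norm (which is its own dual) and the unconstrained case $X = \mathbb{R}^n$: on the smooth quadratic $f(x) = \frac{1}{2}\sum_i a_i x_i^2$ with $L = \max_i a_i$, every gradient step of length $1/L$ meets the guaranteed decrease, while a subgradient step on the non-smooth $f(x) = |x|$ can overshoot the kink and increase $f$. The function names and test values are illustrative.

```python
def smooth_gradient_steps(a, x0, steps):
    """Gradient descent with step size 1/L on f(x) = 0.5 * sum_i a_i * x_i**2, a_i >= 0.

    Returns (f(x_t), f(x_{t+1}), f(x_t) - ||grad f(x_t)||^2 / (2L)) for each step.
    """
    L = max(a)  # the gradient x -> (a_i * x_i) is L-Lipschitz in the Euclidean norm

    def f(x):
        return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

    x, history = list(x0), []
    for _ in range(steps):
        g = [ai * xi for ai, xi in zip(a, x)]
        x_next = [xi - gi / L for xi, gi in zip(x, g)]
        guaranteed = f(x) - sum(gi * gi for gi in g) / (2 * L)
        history.append((f(x), f(x_next), guaranteed))
        x = x_next
    return history

def abs_subgradient_step(x, eta):
    """One subgradient step on the non-smooth f(x) = |x| with step size eta."""
    g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return x - eta * g

for f_now, f_next, bound in smooth_gradient_steps([1.0, 4.0, 10.0], [1.0, -2.0, 0.5], 5):
    print(f"{f_now:.4f} -> {f_next:.4f} (guaranteed <= {bound:.4f})")

# Non-smooth: from x = 0.1, a step of size 0.3 jumps past the kink to roughly x = -0.2,
# so f increases from 0.1 to about 0.2.
print(abs(abs_subgradient_step(0.1, 0.3)))
```

The smooth guarantee uses only $L$; no step size can give a per-step guarantee for $|x|$, since near the kink any fixed step overshoots.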


Non-Smooth Setup: Dual Approach

$\min_{x \in X} f(x)$, where $f$ is convex and differentiable, $X \subseteq \mathbb{R}^n$ is a closed, convex set, and $f$ is $\rho$-Lipschitz continuous: $\forall x \in X,\ \|\nabla f(x)\|_* \le \rho$

APPROACH: Each iterate $x^{(t)}$ provides an upper bound and a lower bound on $f(x^*)$:
• Upper bound: $f(x^{(t)}) \ge f(x^*)$
• Lower bound (by convexity): $f(x^*) \ge f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)})$

CAN WEAKEN DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE


Take a convex combination of the upper bounds and of the lower bounds, with weights $\gamma_t \ge 0$:

UPPER: $\frac{1}{\sum_{t=1}^T \gamma_t} \left( \sum_{t=1}^T \gamma_t f(x^{(t)}) \right) \ge f(x^*)$

LOWER: $f(x^*) \ge \frac{1}{\sum_{t=1}^T \gamma_t} \left[ \sum_{t=1}^T \gamma_t \left( f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)}) \right) \right]$

HOW TO UPDATE ITERATES?

HOW TO CHOOSE WEIGHTS?
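The UPPER and LOWER bounds can be computed from any sequence of iterates. A minimal sketch, assuming $X$ is the probability simplex (so the averaged linear lower bound attains its minimum at a vertex) and using a smooth test function whose minimum over the simplex is $0$; `dual_bounds` is an illustrative name, and `grad` may equally return a subgradient.

```python
def dual_bounds(f, grad, iterates, weights):
    """Weighted UPPER and LOWER bounds on the minimum of a convex f over the simplex."""
    W = sum(weights)
    n = len(iterates[0])
    # UPPER: each f(x_t) >= f(x*), so their weighted average is too.
    upper = sum(w * f(x) for w, x in zip(weights, iterates)) / W
    # LOWER: weighted average of the linearizations f(x_t) + g_t^T (x - x_t),
    # minimized over the simplex; a linear function attains its minimum at a vertex.
    const, g_sum = 0.0, [0.0] * n
    for w, x in zip(weights, iterates):
        g = grad(x)
        const += w * (f(x) - sum(gi * xi for gi, xi in zip(g, x)))
        g_sum = [s + w * gi for s, gi in zip(g_sum, g)]
    lower = (const + min(g_sum)) / W
    return lower, upper

c = [0.2, 0.3, 0.5]  # target point inside the simplex, so min f = 0

def f(x):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def grad(x):
    return [2 * (xi - ci) for xi, ci in zip(x, c)]

iterates = [[1 / 3, 1 / 3, 1 / 3], [0.1, 0.4, 0.5], [0.25, 0.25, 0.5]]
lower, upper = dual_bounds(f, grad, iterates, [1.0, 1.0, 1.0])
print(lower, upper)  # the optimum 0 lies between the two bounds
```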


Reduction to Online Linear Minimization

Fix the weights $\gamma_t$ to be uniform for simplicity. Since $f(x^*) \ge$ LOWER, the duality gap is at most UPPER minus LOWER:

DUALITY GAP: $\left[ \sum_{t=1}^T \frac{1}{T} f(x^{(t)}) \right] - f(x^*) \le \frac{1}{T} \cdot \sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)})$

Each term on the right-hand side is a LINEAR FUNCTION of $x^*$, with coefficient vector $-\nabla f(x^{(t)})$.

ONLINE SETUP:
• ALGORITHM plays $x^{(t)} \in X$
• ADVERSARY reveals the loss vector $\ell^{(t)} = -\nabla f(x^{(t)})$

Recall that by assumption the loss vector is given by the gradient, so $\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$.

$\frac{1}{T} \cdot \sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)}) = \frac{\mathrm{REGRET}}{T}$


Final Bound

ONLINE SETUP: ALGORITHM plays $x^{(t)} \in X$; ADVERSARY reveals $\ell^{(t)} = -\nabla f(x^{(t)})$, with $\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$.

$\sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)}) = \mathrm{REGRET}$

RESULTING ALGORITHM: MIRROR DESCENT

Error bound with a $\sigma$-strongly-convex regularizer $F$:

$\varepsilon_{MD} \le \rho \sqrt{\frac{2 \cdot \left( \max_{x \in X} F(x) - \min_{x \in X} F(x) \right)}{\sigma T}}$

ASYMPTOTICALLY OPTIMAL BY INFORMATION COMPLEXITY LOWER BOUND
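One standard way to arrive at this bound, sketched under the assumption that mirror descent starts at $x^{(1)} = \arg\min_{x \in X} F(x)$ and uses a fixed step size $\eta$: write $D = \max_{x \in X} F(x) - \min_{x \in X} F(x)$. The usual mirror-descent regret bound for a $\sigma$-strongly-convex regularizer, combined with $\|\ell^{(t)}\|_* \le \rho$, gives

```latex
\mathrm{REGRET} \le \frac{D}{\eta} + \frac{\eta}{2\sigma}\sum_{t=1}^{T}\|\ell^{(t)}\|_*^2
  \le \frac{D}{\eta} + \frac{\eta \rho^2 T}{2\sigma},
\qquad
\eta = \frac{1}{\rho}\sqrt{\frac{2\sigma D}{T}}
\;\Longrightarrow\;
\varepsilon_{MD} = \frac{\mathrm{REGRET}}{T} \le \rho\sqrt{\frac{2D}{\sigma T}} .
```

Inverting, $T \ge 2\rho^2 D / (\sigma \varepsilon^2)$ iterations suffice for error $\varepsilon$. For the simplex $\Delta_n$ with negative entropy ($\sigma = 1$ with respect to $\|\cdot\|_1$, $D = \log n$), this gives $T \ge 2\rho^2 \log n / \varepsilon^2$.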


Non-Smooth Optimization over Simplex

Regularizer $F$ is the negative entropy, with $\|\nabla f(x^{(t)})\|_\infty \le \rho$:

$\varepsilon_{MD} \le \rho \sqrt{\frac{2 \cdot \log n}{T}}$

RESULTING ALGORITHM: MIRROR DESCENT OVER SIMPLEX = MWU
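With the negative-entropy regularizer, the mirror step becomes a multiplicative update followed by renormalization. A minimal sketch, assuming a subgradient oracle with $\|g\|_\infty \le \rho$ and the fixed step size $\eta = \sqrt{2\log n / T}/\rho$ that balances the regret bound; the test objective $f(x) = \max_i |x_i - c_i|$ and the helper names are illustrative choices, not from the talk.

```python
import math

def mwu_minimize(subgrad, n, rho, T):
    """Entropic mirror descent (= MWU) over the simplex; returns (average iterate, bound)."""
    eta = math.sqrt(2 * math.log(n) / T) / rho  # step size balancing the regret terms
    x = [1.0 / n] * n  # uniform start minimizes the negative entropy
    x_sum = [0.0] * n
    for _ in range(T):
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        g = subgrad(x)  # loss vector; must satisfy max |g_i| <= rho
        w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        x = [wi / z for wi in w]  # multiplicative update, then renormalize
    return [s / T for s in x_sum], rho * math.sqrt(2 * math.log(n) / T)

# Illustrative non-smooth objective on the simplex: f(x) = max_i |x_i - c_i|, minimized at c.
c = [0.1, 0.2, 0.3, 0.4]

def f(x):
    return max(abs(xi - ci) for xi, ci in zip(x, c))

def subgrad(x):
    k = max(range(len(x)), key=lambda i: abs(x[i] - c[i]))
    g = [0.0] * len(x)
    g[k] = 1.0 if x[k] > c[k] else -1.0  # +e_k or -e_k is a subgradient of f at x
    return g

x_bar, eps = mwu_minimize(subgrad, len(c), rho=1.0, T=2000)
print(f(x_bar), eps)
```

Because $f$ is convex, $f(\bar{x}) \le \frac{1}{T}\sum_t f(x^{(t)})$, so the averaged iterate inherits the $\varepsilon_{MD}$ guarantee.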


APPLICATIONS IN ALGORITHM DESIGN


Warm-up Example: Linear Programming

LP Feasibility problem: given $A \in \mathbb{R}^{m \times n}$, does there exist $x \in X$ with $b - Ax \ge 0$?
• $x \in X$: easy constraints, maintained feasible
• $b - Ax \ge 0$: hard constraints, require fixing

Convert into a non-smooth optimization problem over the simplex:

$\min_{p \in \Delta_m} \max_{x \in X} p^T (b - Ax)$

Non-differentiable objective: $f(p) = \max_{x \in X} p^T (b - Ax)$, where the inner maximizer is the best response to the dual solution $p$.

Admits subgradients: for all $p$, the best response $x_p$ satisfies $p^T (b - A x_p) \ge 0$ (when the LP is feasible) and $(b - A x_p) \in \partial f(p)$. The subgradient is the slack in the constraints.

If we can pick $x_p$ such that $\|b - A x_p\|_\infty \le \rho$, then

$\varepsilon_{MD} \le \rho \sqrt{\frac{2 \cdot \log m}{T}}, \qquad \text{so } T \le \frac{2 \cdot \rho^2 \cdot \log m}{\varepsilon^2} \text{ iterations suffice.}$
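Putting the pieces together gives the classical MWU scheme for LP feasibility: the dual player runs mirror descent over $\Delta_m$ on the slack vectors, and the average of the best responses is approximately feasible. A minimal sketch, assuming (for illustration only) that the easy set is the box $X = [0,1]^n$, so the best response can be computed coordinate-wise; `mwu_lp_feasibility` and the example instance are hypothetical.

```python
import math

def mwu_lp_feasibility(A, b, T):
    """Approximately decide whether some x in [0,1]^n satisfies b - A x >= 0.

    Returns (x_bar, p): x_bar is the average best response, or None together
    with a dual vector p certifying infeasibility.
    """
    m, n = len(A), len(A[0])
    # Width: |b_i - (A x)_i| <= |b_i| + sum_j |A_ij| for every x in the box.
    rho = max(abs(b[i]) + sum(abs(a) for a in A[i]) for i in range(m))
    eta = math.sqrt(2 * math.log(m) / T) / rho
    p = [1.0 / m] * m
    x_sum = [0.0] * n
    for _ in range(T):
        # Best response: maximize p^T b - (A^T p)^T x over the box, coordinate-wise.
        col = [sum(p[i] * A[i][j] for i in range(m)) for j in range(n)]
        x = [1.0 if cj < 0 else 0.0 for cj in col]
        slack = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        if sum(pi * si for pi, si in zip(p, slack)) < 0:
            return None, p  # max_x p^T (b - A x) < 0: no feasible x exists
        x_sum = [s + xj for s, xj in zip(x_sum, x)]
        # Mirror descent step on the subgradient (b - A x_p): violated constraints
        # (negative slack) gain weight, satisfied ones lose weight.
        w = [pi * math.exp(-eta * si) for pi, si in zip(p, slack)]
        z = sum(w)
        p = [wi / z for wi in w]
    return [s / T for s in x_sum], p

# x1 + x2 <= 1.5, x1 >= 0.5, x2 >= 0.5, written as b - A x >= 0
A = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.5, -0.5, -0.5]
x_bar, _ = mwu_lp_feasibility(A, b, T=5000)
print(x_bar)
```

Since every best response has $p_t^T(b - Ax_t) \ge 0$, the regret bound gives $b - A\bar{x} \ge -\varepsilon \mathbf{1}$ with $\varepsilon = \rho\sqrt{2\log m / T}$; a negative best-response value instead certifies infeasibility.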


MWU and s-t Maxflow

Maximum flow feasibility for value $F$ over an undirected graph $G$ with incidence matrix $B$:

$\forall e \in E,\ \frac{F \cdot |f_e|}{c_e} \le 1, \qquad B^T f = e_s - e_t$ (will enforce this)

Turn into a non-smooth minimization problem over the simplex:

$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot |f_e|}{c_e} - 1 \right)$

The best response $f_p$ is the shortest s-t path with lengths $p_e / c_e$.

For any $p$, if $f_p$ has length $> 1/F$ (so that $f(p) > 0$), there is no subgradient, i.e. the problem is infeasible.

Otherwise, the following is a subgradient:

$\partial f(p)_e = \frac{F \cdot |(f_p)_e|}{c_e} - 1$

Unfortunately, the width can be large: $\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$

[PST 91] $T = O\left( \frac{F \log n}{\varepsilon^2 c_{\min}} \right)$
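A minimal sketch of this MWU scheme, assuming a connected undirected graph given as `(u, v, capacity)` triples and a unit s-t flow per round (so $F$ only scales the congestion). Because $f(p)$ is defined as a minimum over flows, the sketch applies multiplicative gains to congested edges (ascent on $f$, equivalently descent on $-f$), and Dijkstra supplies the shortest-path best response. The names and the example graph are illustrative; this is a plain MWU sketch, not the exact [PST 91] or [GK 98] implementation.

```python
import heapq
import math

def shortest_path(n, edges, length, s, t):
    """Dijkstra on an undirected graph; returns (distance, [(edge index, +1/-1)])."""
    adj = [[] for _ in range(n)]
    for i, (u, v, _) in enumerate(edges):
        adj[u].append((v, i, +1))  # traversing u -> v follows the edge orientation
        adj[v].append((u, i, -1))
    dist, prev = [math.inf] * n, [None] * n
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, i, sign in adj[u]:
            if d + length[i] < dist[v]:
                dist[v], prev[v] = d + length[i], (u, i, sign)
                heapq.heappush(heap, (dist[v], v))
    path, v = [], t
    while v != s:
        u, i, sign = prev[v]
        path.append((i, sign))
        v = u
    return dist[t], path

def mwu_maxflow(n, edges, s, t, F, T):
    """Average of T shortest-path best responses, or None if value F is infeasible."""
    m = len(edges)
    c_min = min(c for _, _, c in edges)
    rho = max(1.0, F / c_min)  # width: |F |f_e| / c_e - 1| <= rho for a path flow
    eta = math.sqrt(2 * math.log(m) / T) / rho
    p = [1.0 / m] * m
    flow_sum = [0.0] * m
    for _ in range(T):
        lengths = [p[i] / edges[i][2] for i in range(m)]
        dist, path = shortest_path(n, edges, lengths, s, t)
        if F * dist > 1:
            return None  # f(p) = F * dist - 1 > 0 certifies infeasibility
        f = [0.0] * m
        for i, sign in path:
            f[i] = float(sign)  # unit s-t flow along the shortest path
        flow_sum = [a + fi for a, fi in zip(flow_sum, f)]
        # Multiplicative gain on the subgradient entries F |f_e| / c_e - 1
        w = [p[i] * math.exp(eta * (F * abs(f[i]) / edges[i][2] - 1)) for i in range(m)]
        z = sum(w)
        p = [wi / z for wi in w]
    return [fs / T for fs in flow_sum]

# Two parallel s-t paths with capacity 1 on every edge (max flow 2), s = 0, t = 3
edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
flow = mwu_maxflow(4, edges, 0, 3, F=1.8, T=4000)
print([round(1.8 * abs(fe) / c, 3) for fe, (_, _, c) in zip(flow, edges)])
```

Each non-terminating round has $\sum_e p_e (F|f_e|/c_e - 1) = F \cdot \mathrm{dist}_p(s,t) - 1 \le 0$, so the regret bound keeps the congestion $F|\bar{f}_e|/c_e$ of the average flow at most $1 + \rho\sqrt{2\log m/T}$.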


Width Reduction: make primal nicer

PROBLEM: The width bound $\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$ is optimal for this specific formulation.

SOLUTION: Regularize the primal:

$f(p) = \min_{B^T f = e_s - e_t} F \cdot \sum_{e \in E} \frac{|f_e|}{c_e} \left( p_e + \frac{\varepsilon}{m} \right) - 1$

NEED PRIMAL ARGUMENT

REGULARIZATION ERROR: $\varepsilon F$

NEW WIDTH: $\|\partial f(p)\|_\infty \le \frac{m}{\varepsilon}$

ITERATION BOUND: [GK 98] $T = O\left( \frac{m \log n}{\varepsilon^2} \right)$
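The primal argument behind the new width can be sketched as follows, assuming a flow $f^\star$ with $B^T f^\star = e_s - e_t$ and $F|f^\star_e|/c_e \le 1$ for all $e$ exists, and using $\sum_e p_e = 1$. The best response $f_p$ to the regularized problem is no worse than $f^\star$, and every term of its objective is nonnegative:

```latex
F\sum_{e\in E}\frac{|(f_p)_e|}{c_e}\left(p_e+\frac{\varepsilon}{m}\right)
\;\le\; F\sum_{e\in E}\frac{|f^\star_e|}{c_e}\left(p_e+\frac{\varepsilon}{m}\right)
\;\le\; \sum_{e\in E}\left(p_e+\frac{\varepsilon}{m}\right) \;=\; 1+\varepsilon
\quad\Longrightarrow\quad
\frac{F\,|(f_p)_e|}{c_e} \;\le\; \frac{m\,(1+\varepsilon)}{\varepsilon} \;=\; O\!\left(\frac{m}{\varepsilon}\right)\ \text{for every } e .
```

Keeping only the term for a single edge $e$ on the left gives the implication, and the resulting bound no longer depends on $c_{\min}$.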


Electrical Flow Approach [CKMST]

A different formulation yields the basis for the CKMST algorithm:

$\forall e \in E,\ \frac{F \cdot f_e^2}{c_e^2} \le 1, \qquad B^T f = e_s - e_t$ (will enforce this)

Non-smooth optimization problem:

$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot f_e^2}{c_e^2} - 1 \right)$

The best response is the electrical flow $f_p$.

Original width: $\|\partial f(p)\|_\infty \le m$

Regularize the primal:

$f(p) = \min_{B^T f = e_s - e_t} F \cdot \sum_{e \in E} \frac{f_e^2}{c_e^2} \left( p_e + \frac{\varepsilon}{m} \right) - 1$

New width: $\|\partial f(p)\|_\infty \le \sqrt{\frac{m}{\varepsilon}}$
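The electrical-flow best response amounts to one Laplacian solve: minimizing $\sum_e p_e f_e^2 / c_e^2$ subject to $B^T f = e_s - e_t$ gives the electrical flow with resistances $r_e = p_e / c_e^2$ (the factor $F$ does not change the minimizer; for the regularized problem, use $r_e = (p_e + \varepsilon/m)/c_e^2$). A minimal pure-Python sketch, assuming a small connected graph and dense Gauss-Jordan elimination in place of the fast Laplacian solvers that CKMST rely on; `electrical_flow` and the example are illustrative.

```python
def electrical_flow(n, edges, resistance, s, t):
    """Unit s-t electrical flow; edges are (u, v, capacity), resistance[e] > 0."""
    # Weighted Laplacian L = B^T R^{-1} B (conductance 1 / r_e on each edge).
    L = [[0.0] * n for _ in range(n)]
    for (u, v, _), r in zip(edges, resistance):
        g = 1.0 / r
        L[u][u] += g
        L[v][v] += g
        L[u][v] -= g
        L[v][u] -= g
    # Ground t (potential 0) and solve the reduced system L' phi = e_s.
    idx = [v for v in range(n) if v != t]
    M = [[L[i][j] for j in idx] + [1.0 if i == s else 0.0] for i in idx]
    k = len(idx)
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(k):
            if i != col and M[i][col] != 0.0:
                factor = M[i][col] / M[col][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    phi = [0.0] * n
    for row, v in enumerate(idx):
        phi[v] = M[row][k] / M[row][row]
    # Ohm's law: current on edge (u, v) is (phi_u - phi_v) / r_e.
    return [(phi[u] - phi[v]) / r for (u, v, _), r in zip(edges, resistance)]

# Two parallel s-t paths, s = 0, t = 3, unit capacities; p is a point of the simplex
edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
p = [0.2, 0.2, 0.2, 0.4]
res = [pe / c ** 2 for pe, (_, _, c) in zip(p, edges)]
print(electrical_flow(4, edges, res, 0, 3))
```

The path with total resistance $0.4$ carries $0.6$ units and the path with resistance $0.6$ carries $0.4$, i.e. current splits in inverse proportion to path resistance.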


Conclusion: Take-away messages

• Regularization is a powerful tool for the design of fast algorithms.

• Most iterative algorithms can be understood as regularized updates: MWUs, Width Reduction, Interior Point, Gradient descent, …

• These methods perform well in practice. Regularization also helps eliminate noise.

• ULTIMATE GOAL: Development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.


THE END – THANK YOU