ITERATIVE METHODS AND REGULARIZATION
IN THE DESIGN OF FAST ALGORITHMS
Lorenzo Orecchia, MIT Math
A unified framework for optimization and online learning
beyond Multiplicative Weight Updates
Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING
• Online Linear Optimization
• Online Linear Optimization over the Simplex and Multiplicative Weight Updates (MWUs)
• A Regularization Framework generalizing MWUs: Follow the Regularized Leader
MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE
(Optimization: Regularized Updates ↔ Online Learning: Multiplicative Weight Updates)

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW
• Non-smooth vs Smooth Convex Optimization
• Non-smooth Convex Optimization reduces to Online Linear Optimization
• Application: Understanding Undirected Maxflow algorithms based on MWUs
MESSAGE: FASTEST ALGORITHMS REQUIRE A PRIMAL-DUAL APPROACH
TOC Applications of MWUs
Fast algorithms for solving specific LPs and SDPs: Maximum Flow problems [PST], [GK], [F], [CKMST]; Covering-packing problems [PST]; Oblivious routing [R], [M]
Fast approximation algorithms based on LP and SDP relaxations: Maxcut [AK]; Graph Partitioning problems [AK], [S], [OSV]
Proof technique: Hardcore Lemma [BHK]; QIP = PSPACE [W]; Derandomization [Y]
… and more
Machine Learning meets Optimization meets TCS
These techniques have been rediscovered multiple times in different fields:
Machine Learning, Convex Optimization, TCS
Three surveys emphasizing the different viewpoints and literatures:
1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi
2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski
3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale
REGULARIZATION 101
What is Regularization?
Regularization is a fundamental technique in optimization:

  OPTIMIZATION PROBLEM → (add regularizer F, parameter λ > 0) → WELL-BEHAVED OPTIMIZATION PROBLEM

A well-behaved problem has:
• a stable optimum
• a unique optimal solution
• smoothness conditions
…
Benefits of regularization in learning and statistics:
• Prevents overfitting
• Increases stability
• Decreases sensitivity to random noise
Example: Regularization Helps Stability
Consider a convex set S ⊆ R^n and a linear optimization problem:

  f(c) = argmin_{x ∈ S} c^T x

The optimal solution f(c) may be very unstable under perturbation of c:

  ‖c′ − c‖ ≤ δ  and yet  ‖f(c′) − f(c)‖ ≫ δ

Now consider instead the regularized linear optimization problem

  f(c) = argmin_{x ∈ S} c^T x + F(x),

where F is σ-strongly convex. Then:

  ‖c′ − c‖ ≤ δ  implies  ‖f(c′) − f(c)‖ ≤ δ/σ
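The contrast above can be checked numerically on the simplex, where both problems have closed forms: the unregularized optimum is a vertex indicator, while adding the negative-entropy regularizer (1-strongly convex, as used later in the talk) gives a softmax. A minimal sketch, not from the talk; the cost vectors are illustrative:

```python
import math

def linear_argmin_simplex(c):
    # argmin over the simplex of c^T x: an indicator of the smallest cost coordinate.
    x = [0.0] * len(c)
    x[min(range(len(c)), key=lambda i: c[i])] = 1.0
    return x

def regularized_argmin_simplex(c, eta):
    # argmin over the simplex of c^T x + eta * sum_i x_i log x_i: a softmax.
    m = min(c)
    w = [math.exp(-(ci - m) / eta) for ci in c]  # shift by min(c) for numerical stability
    s = sum(w)
    return [wi / s for wi in w]

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

c  = [1.0, 1.001, 2.0]
c2 = [1.001, 1.0, 2.0]   # perturbation of size delta ~ 1e-3 flips the argmin

jump_unreg = l1(linear_argmin_simplex(c2), linear_argmin_simplex(c))  # jumps to another vertex
jump_reg   = l1(regularized_argmin_simplex(c2, 1.0),
                regularized_argmin_simplex(c, 1.0))                   # moves by O(delta)
```

The unregularized solution jumps between vertices (ℓ1 distance 2) under a 10⁻³ perturbation, while the regularized solution moves by about 10⁻³.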
ONLINE LINEAR OPTIMIZATION
AND
MULTIPLICATIVE WEIGHT UPDATES
SETUP: Convex set X ⊆ R^n, generic norm ‖·‖, repeated game over T rounds.
At round t:
  ALGORITHM: plays the current solution x^(t) ∈ X
  ADVERSARY: reveals the current linear objective, a loss vector ℓ^(t) ∈ R^n with ‖ℓ^(t)‖* ≤ ρ
  Algorithm's loss at round t: ℓ^(t)T x^(t)
  ALGORITHM: computes the updated solution x^(t+1) ∈ X
  ADVERSARY: reveals the new loss vector ℓ^(t+1) ∈ R^n with ‖ℓ^(t+1)‖* ≤ ρ
  …
GOAL: update x^(t) to minimize the regret, the average algorithm's loss L̂ minus the a posteriori optimum L*:

  (1/T)·∑_{t=1}^T ℓ^(t)T x^(t)  −  min_{x∈X} (1/T)·∑_{t=1}^T ℓ^(t)T x
Simplex Case: Learning with Experts
SETUP: Simplex X ⊆ R^n under the ℓ1 norm. At round t:
  ALGORITHM: plays p^(t), a distribution over the n dimensions, i.e. over experts
  ADVERSARY: reveals the experts' losses ℓ^(t), with ‖ℓ^(t)‖∞ ≤ ρ
  Algorithm's (expected) loss: E_{i∼p^(t)}[ℓ_i^(t)] = p^(t)T ℓ^(t)
  ALGORITHM: updates the distribution to p^(t+1)
Simplex Case: Multiplicative Weight Updates

  Weights:      w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t),   w^(1) = 1⃗
  Distribution: p_i^(t+1) = w_i^(t+1) / ∑_{j=1}^n w_j^(t+1)

THE MULTIPLICATIVE WEIGHT UPDATE
The parameter ε ∈ (0, 1) governs how aggressive the update is:
ε near 0: CONSERVATIVE — ε near 1: AGGRESSIVE
MWUs: Unraveling the Update
Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t)
Unraveling the recursion, each weight encodes the expert's cumulative loss:

  w_i^(t+1) = (1 − ε)^{∑_{s=1}^t ℓ_i^(s)}
MWUs: Regret Bound
Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t)
For ‖ℓ^(t)‖∞ ≤ ρ and ε < 1/2:

  L̂ − L* ≤ ρ·log n / (εT) + ρ·ε

Algorithm's regret ≤ start-up penalty + penalty for being greedy
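The update and the bound above can be run directly as a sanity check. A minimal sketch, not from the talk: losses are drawn uniformly in [0, 1] (so ρ = 1), and the sizes, seed, and ε are arbitrary illustrative choices:

```python
import math
import random

def mwu(losses, eps):
    # Multiplicative Weight Updates: play p ∝ w, then w_i <- (1 - eps)^{loss_i} * w_i.
    n = len(losses[0])
    w = [1.0] * n
    total_alg = 0.0
    cum = [0.0] * n                      # cumulative loss of each expert
    for loss in losses:
        s = sum(w)
        p = [wi / s for wi in w]         # current distribution over experts
        total_alg += sum(pi * li for pi, li in zip(p, loss))
        for i, li in enumerate(loss):
            w[i] *= (1.0 - eps) ** li
            cum[i] += li
    return total_alg, min(cum)           # algorithm's loss and best expert's loss

random.seed(0)
T, n, eps = 500, 5, 0.1
losses = [[random.random() for _ in range(n)] for _ in range(T)]
alg_loss, best_loss = mwu(losses, eps)

avg_regret = (alg_loss - best_loss) / T
bound = math.log(n) / (eps * T) + eps    # rho * log n / (eps * T) + rho * eps, with rho = 1
```

On this instance the observed average regret sits well inside the stated bound.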
ONLINE LINEAR OPTIMIZATION BEYOND MWUs
A REGULARIZATION FRAMEWORK
MWUs: Proof Sketch of Regret Bound
Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{∑_{s=1}^t ℓ_i^(s)}
Potential: Φ^(t+1) = log_{1−ε} ∑_{i=1}^n w_i^(t+1)

• The proof is a potential function argument.
• The potential function bounds the loss of the best expert:

  Φ^(t+1) ≤ log_{1−ε} min_{i=1..n} w_i^(t+1) = min_{i=1..n} ∑_{s=1}^t ℓ_i^(s)

• The potential function is related to the algorithm's performance:

  Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T p^(t) − ε

DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?
MWUs AND APPLICATIONS
Designing a Regularized Update
GOAL: Design an update and its potential function analysis.
QUESTION: Choice of potential function?
DESIDERATA: 1) lower bounds the best expert's loss
            2) tracks the algorithm's performance

Attempt 1 – FOLLOW THE LEADER: cumulative loss L^(t) = ∑_{s=1}^t ℓ^(s)

  x^(t+1) = argmin_{x∈X} x^T L^(t)   (pick the best current solution)
  Φ^(t+1) = min_{x∈X} x^T L^(t)      (potential is the current best loss)

This fails if the best expert moves drastically between rounds.
How can we make the update more stable?
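The instability of Follow-the-Leader is easy to exhibit with two experts and alternating losses: FTL chases yesterday's winner and pays nearly every round, while MWU stays close to uniform and nearly matches the best expert. A minimal simulation, not from the talk; the loss sequence is the standard adversarial example and ε is an illustrative choice:

```python
def ftl_play(cum):
    # Follow the Leader: put all mass on the expert with smallest cumulative loss.
    return 0 if cum[0] <= cum[1] else 1

T = 1000
# Round 1: (0.5, 0); then losses alternate (0, 1), (1, 0), always hitting the current leader.
losses = [(0.5, 0.0)] + [(0.0, 1.0) if t % 2 == 1 else (1.0, 0.0) for t in range(1, T)]

cum = [0.0, 0.0]
ftl_loss = 0.0
for loss in losses:
    ftl_loss += loss[ftl_play(cum)]
    cum[0] += loss[0]
    cum[1] += loss[1]
best_loss = min(cum)
ftl_avg_regret = (ftl_loss - best_loss) / T   # FTL pays ~1 per round: linear regret

# MWU on the same sequence stays near uniform and suffers only O(sqrt(T)) regret.
eps = 0.05
w = [1.0, 1.0]
mwu_loss = 0.0
for loss in losses:
    s = w[0] + w[1]
    mwu_loss += (w[0] * loss[0] + w[1] * loss[1]) / s
    w = [w[0] * (1 - eps) ** loss[0], w[1] * (1 - eps) ** loss[1]]
mwu_avg_regret = (mwu_loss - best_loss) / T
```

FTL's average regret stays near 1/2 however large T is, while MWU's vanishes.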
Regularized Update: Definition
Attempt 2 – FOLLOW THE REGULARIZED LEADER:

  x^(t+1) = argmin_{x∈X} x^T L^(t) + η·F(x)
  Φ^(t+1) = min_{x∈X} x^T L^(t) + η·F(x)

Properties of the regularizer F(x):
1. Convex, differentiable
2. σ-strongly convex w.r.t. the norm ‖·‖
Parameter η ≥ 0, to be determined.
These properties are actually sufficient to get a regret bound.

Regularized Update: Analysis
The potential still lower bounds the best cumulative loss, up to a regularization error:

  Φ^(t+1) ≤ min_{x∈X} L^(t)T x + η·max_{x∈X} F(x)
Tracking the Algorithm: Proof by Picture
Define: f^(t+1)(x) = L^(t)T x + η·F(x), so that Φ^(t+1) = min_{x∈X} f^(t+1)(x) = f^(t+1)(x^(t+1)).
Notice: f^(t+1)(x) − f^(t)(x) = ℓ^(t)T x   (the latest loss vector)
Compare ℓ^(t)T x^(t) and Φ^(t+1) − Φ^(t).
Want: f^(t+1)(x^(t)) ≈ f^(t+1)(x^(t+1)). Indeed:

  Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t)
Regularization in Action
REGULARIZATION: f^(t+1) is (η·σ)-strongly convex, so it admits a quadratic lower bound around its minimizer x^(t+1).
STABILITY: since ∇f^(t+1) − ∇f^(t) = ℓ^(t), the minimizers cannot move much:

  ‖x^(t+1) − x^(t)‖ ≤ ‖ℓ^(t)‖* / (η·σ)
Analysis: Progress in One Iteration
We have ∇f^(t+1)(x^(t)) = ℓ^(t), and by stability ‖x^(t+1) − x^(t)‖ ≤ ‖ℓ^(t)‖*/(η·σ).
Since f^(t+1) is (η·σ)-strongly convex:

  f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) ≥ ℓ^(t)T (x^(t+1) − x^(t)) + (η·σ/2)·‖x^(t+1) − x^(t)‖²
                                    ≥ −‖ℓ^(t)‖*·‖x^(t+1) − x^(t)‖ + (η·σ/2)·‖x^(t+1) − x^(t)‖²
                                    ≥ −‖ℓ^(t)‖*² / (2η·σ)

Combined with Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t), this bounds the per-round progress.
Completing the Analysis
Progress in one iteration (the regret at iteration t):

  Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T x^(t) − ‖ℓ^(t)‖*² / (2σbecause)

Telescoping sum:

  Φ^(T+1) ≥ ∑_{t=1}^T ℓ^(t)T x^(t) + Φ^(1) − T · max_t ‖ℓ^(t)‖*² / (2η·σ)

Final regret bound, with regularizer F and ‖ℓ^(t)‖* ≤ ρ:

  (1/T)·( ∑_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} ∑_{t=1}^T ℓ^(t)T x )
      ≤ (η/T)·(max_{x∈X} F(x) − min_{x∈X} F(x)) + ρ² / (2σ·η)

Start-up penalty plus penalty for being greedy:
SAME TYPE OF BOUND AS FOR MWUs
Reinterpreting MWUs
Potential function: Φ^(t+1) = min_{p ≥ 0, ∑ p_i = 1} p^T L^(t) + η·∑_{i=1}^n p_i log p_i
Regularizer: F(p) = ∑_{i=1}^n p_i log p_i is negative entropy.
F(p) is 1-strongly convex w.r.t. ‖·‖₁.
Update:

  p^(t+1) = argmin_{p ≥ 0, ∑ p_i = 1} p^T L^(t) + η·∑_{i=1}^n p_i log p_i

which has the closed form

  p_i^(t+1) = e^{−L_i^(t)/η} / ∑_{j=1}^n e^{−L_j^(t)/η} = (1 − ε)^{L_i^(t)} / ∑_{j=1}^n (1 − ε)^{L_j^(t)}

SOFT-MAX
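The two closed forms above agree under the parameter correspondence η = 1/ln(1/(1 − ε)), since (1 − ε)^L = e^{−L/η}. A quick numerical check of this identity (ε and the cumulative losses are arbitrary illustrative values):

```python
import math

eps = 0.3
eta = 1.0 / math.log(1.0 / (1.0 - eps))   # correspondence between FTRL and MWU parameters
L = [2.0, 0.5, 3.7, 1.0]                  # arbitrary cumulative losses

# FTRL with the negative-entropy regularizer: soft-max of -L/eta.
w_ftrl = [math.exp(-Li / eta) for Li in L]
p_ftrl = [wi / sum(w_ftrl) for wi in w_ftrl]

# MWU weights: (1 - eps)^{L_i}, normalized.
w_mwu = [(1.0 - eps) ** Li for Li in L]
p_mwu = [wi / sum(w_mwu) for wi in w_mwu]
```

Both constructions produce the same distribution up to floating-point error.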
Beyond MWUs: Which Regularizer?
Regret bound, optimizing over η:

  (1/T)·( ∑_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} ∑_{t=1}^T ℓ^(t)T x )
      ≤ ρ·√(2·(max_{x∈X} F(x) − min_{x∈X} F(x))) / √(σ·T)

The best choice of regularizer and norm minimizes

  max_t ‖ℓ^(t)‖*² · (max_{x∈X} F(x) − min_{x∈X} F(x)) / σ

Negative entropy with the ℓ1 norm is approximately optimal for the simplex.
QUESTION: are other regularizers ever useful?
Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful?
YES! Applications:
Graph Partitioning and Random Walks
Spectral algorithms for balanced separator running in time Õ(m).
Uses a random-walk framework and SDP MWUs.
Different walks correspond to different regularizers for the eigenvector problem:
• F(X) = Tr(X log X)  (SDP MWU)  —  Heat Kernel Random Walk
• F(X) = Tr(X^p), 1 ≤ p ≤ ∞  (p-norm)  —  Lazy Random Walk
• F(X) = Tr(X^{1/2})  (NEW REGULARIZER)  —  Personalized PageRank
[Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]
Sparsification
ε-spectral-sparsifiers with O(n log n / ε²) edges [Spielman, Srivastava 2008].
Uses a matrix concentration bound equivalent to SDP MWUs.
ε-spectral-sparsifiers with O(n / ε²) edges [Batson, Spielman, Srivastava 2009].
Can be interpreted as a different regularizer: F(X) = Tr(X^{1/2}).
Many more in Online Learning
Bandit Online Learning [AHR], …
NON-SMOOTH CONVEX OPTIMIZATION
REDUCES TO
ONLINE LINEAR OPTIMIZATION
Convex Optimization Setup

  min_{x∈X} f(x)

f convex, differentiable; X ⊆ R^n closed, convex set.

NON-SMOOTH: ∀x ∈ X, ‖∇f(x)‖* ≤ ρ   (f is ρ-Lipschitz continuous)
SMOOTH: ∀x, y ∈ X, ‖∇f(y) − ∇f(x)‖* ≤ L·‖y − x‖   (L-Lipschitz continuous gradient)

In the smooth case, a gradient step is guaranteed to decrease the function value:

  f(x^(t+1)) ≤ f(x^(t)) − ‖∇f(x^(t))‖*² / (2L)

In the non-smooth case there is NO GRADIENT STEP GUARANTEE — ONLY A DUAL GUARANTEE.
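The missing guarantee is easy to see on the one-dimensional f(x) = |x|: a fixed-step subgradient step ends up oscillating around 0 without decreasing f, while on the smooth f(x) = x² the same step size makes monotone progress. A minimal sketch, not from the talk; the step sizes and starting point are illustrative:

```python
def grad_step_abs(x, step):
    # Subgradient step on the non-smooth f(x) = |x|; subgradient is sign(x).
    g = 1.0 if x > 0 else -1.0
    return x - step * g

def grad_step_square(x, step):
    # Gradient step on the smooth f(x) = x^2 (gradient 2x, smoothness constant L = 2).
    return x - step * 2.0 * x

x = 0.3
for _ in range(50):
    x = grad_step_abs(x, 0.25)
abs_final = abs(x)       # oscillates, stuck at distance ~step from the optimum

y = 0.3
for _ in range(50):
    y = grad_step_square(y, 0.25)   # step = 1/(2L): guaranteed decrease each step
sq_final = abs(y)        # converges geometrically to 0
```

The non-smooth iterates bounce between two points at distance comparable to the step size, which is exactly why non-smooth methods fall back on dual (lower-bound) guarantees.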
Non-Smooth Setup: Dual Approach
min_{x∈X} f(x), with f convex, differentiable, X ⊆ R^n closed and convex, and ∀x ∈ X, ‖∇f(x)‖* ≤ ρ (ρ-Lipschitz continuous).
APPROACH: each iterate provides an upper bound and a lower bound on the optimum:

  UPPER BOUND: f(x^(t)) ≥ f(x*)
  LOWER BOUND: f(x*) ≥ f(x^(t)) + ∇f(x^(t))^T (x* − x^(t))

WE CAN WEAKEN THE DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE.
Take convex combinations of the upper bounds and of the lower bounds with weights γ_t:

  UPPER: (1/∑_{t=1}^T γ_t) · ∑_{t=1}^T γ_t f(x^(t)) ≥ f(x*)
  LOWER: f(x*) ≥ (1/∑_{t=1}^T γ_t) · [ ∑_{t=1}^T γ_t·( f(x^(t)) + ∇f(x^(t))^T (x* − x^(t)) ) ]

HOW TO UPDATE THE ITERATES? HOW TO CHOOSE THE WEIGHTS?
Reduction to Online Linear Minimization
Fix the weights γ_t to be uniform for simplicity. Subtracting the lower bound from the upper bound:

  DUALITY GAP:  [ (1/T)·∑_{t=1}^T f(x^(t)) ] − f(x*) ≤ (1/T)·∑_{t=1}^T ∇f(x^(t))^T (x^(t) − x*)

The right-hand side is a sum of LINEAR FUNCTIONS of the iterates, so we are in the online setup:

  ALGORITHM: plays x^(t) ∈ X
  ADVERSARY: the loss vector is the gradient, ℓ^(t) = ∇f(x^(t))

Recall that by assumption ‖ℓ^(t)‖* = ‖∇f(x^(t))‖* ≤ ρ. Moreover,

  (1/T)·∑_{t=1}^T ∇f(x^(t))^T (x^(t) − x*) ≤ REGRET
Final Bound
RESULTING ALGORITHM: MIRROR DESCENT
Error bound with a σ-strongly-convex regularizer F:

  ε_MD ≤ ρ·√(2·(max_{x∈X} F(x) − min_{x∈X} F(x))) / √(σ·T)

ASYMPTOTICALLY OPTIMAL BY INFORMATION COMPLEXITY LOWER BOUND
Non-Smooth Optimization over Simplex
Regularizer F is negative entropy, with ‖∇f(x^(t))‖∞ ≤ ρ:

  ε_MD ≤ ρ·√(2·log n) / √T

RESULTING ALGORITHM: MIRROR DESCENT OVER THE SIMPLEX = MWU
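A compact instance of the reduction: minimize the non-smooth f(p) = max_i p_i over the simplex (optimum 1/n at the uniform distribution), feeding subgradients as losses to the entropic mirror descent / MWU update and averaging the iterates. A minimal sketch, not from the talk; n, T, and the step size (taken from the tuned bound with ρ = 1) are illustrative:

```python
import math

def mirror_descent_simplex(subgrad, n, T, eta):
    # Entropic mirror descent: multiplicative update p_i <- p_i * exp(-eta * g_i), renormalize.
    p = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(T):
        g = subgrad(p)                        # loss vector = subgradient at current iterate
        avg = [a + pi / T for a, pi in zip(avg, p)]
        w = [pi * math.exp(-eta * gi) for pi, gi in zip(p, g)]
        s = sum(w)
        p = [wi / s for wi in w]
    return avg                                # averaged iterate

def subgrad_max(p):
    # Subgradient of f(p) = max_i p_i: indicator of an argmax coordinate.
    j = max(range(len(p)), key=lambda i: p[i])
    return [1.0 if i == j else 0.0 for i in range(len(p))]

n, T = 4, 500
eta = math.sqrt(2.0 * math.log(n) / T)        # tuned step size, rho = 1
p_avg = mirror_descent_simplex(subgrad_max, n, T, eta)
f_avg = max(p_avg)                            # approaches min f = 1/n
```

By convexity f(p̄) is at most the average of f(p^(t)), so the duality-gap bound ε_MD ≈ ρ√(2 log n / T) applies to the averaged iterate.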
APPLICATIONS IN ALGORITHM DESIGN
Warm-up Example: Linear Programming
LP feasibility problem: given A ∈ R^{m×n}, is there x ∈ X with b − Ax ≥ 0?
X captures the easy constraints, which we maintain feasible; b − Ax ≥ 0 are the hard constraints, which require fixing.
Convert into a non-smooth optimization problem over the simplex:

  min_{p ∈ Δ_m} max_{x∈X} p^T (b − Ax)

Non-differentiable objective:

  f(p) = max_{x∈X} p^T (b − Ax)    (the maximizer is the best response to the dual solution p)

It admits subgradients: for all p, the best response x_p satisfies

  p^T (b − A·x_p) ≥ 0   and   (b − A·x_p) ∈ ∂f(p)

i.e. the subgradient is the slack in the constraints.
If we can pick x_p such that ‖b − A·x_p‖∞ ≤ ρ, then

  ε_MD ≤ ρ·√(2·log n) / √T,   so   T ≤ 2·ρ²·log n / ε²   iterations suffice.
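The scheme can be sketched end-to-end on a toy feasibility problem, not from the talk: find x ∈ [0, 1] with 0.8 − x ≥ 0 and x − 0.2 ≥ 0. The oracle maximizes the p-weighted slack over the easy set X = [0, 1]; averaging its responses yields a point whose worst constraint slack is at least −ε_MD. All constants are illustrative:

```python
import math

# Hard constraints b - A x >= 0 with A = [1, -1], b = [0.8, -0.2], i.e. 0.2 <= x <= 0.8.
def slack(x):
    return [0.8 - x, x - 0.2]

def oracle(p):
    # Best response: maximize p^T slack(x) over X = [0, 1].
    # The objective is linear in x, so an endpoint is always optimal.
    return 0.0 if p[0] > p[1] else 1.0

T = 2000
eta = math.sqrt(2.0 * math.log(2) / T)
p = [0.5, 0.5]
x_sum = 0.0
for _ in range(T):
    x = oracle(p)
    x_sum += x
    s = slack(x)                 # subgradient of f(p) = max_x p^T slack(x)
    w = [pi * math.exp(-eta * si) for pi, si in zip(p, s)]  # upweight violated constraints
    z = sum(w)
    p = [wi / z for wi in w]

x_bar = x_sum / T                # average of the oracle's responses
worst_slack = min(slack(x_bar))  # >= -eps_MD when the instance is feasible
```

The oracle alternates between the endpoints 0 and 1, but the dual weights force it to balance the two constraints, so the averaged response lands well inside the feasible interval.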
MWU and s-t Maxflow
Maximum flow feasibility for value F over an undirected graph G with incidence matrix B:

  ∀e ∈ E:  F·|f_e| / c_e ≤ 1       (capacity constraints)
  B^T f = e_s − e_t                 (we will enforce this)

Turn it into a non-smooth minimization problem over the simplex:

  f(p) = min_{B^T f = e_s − e_t} ∑_{e∈E} p_e · (F·|f_e|/c_e − 1)

The best response f_p is a shortest s-t path with lengths p_e / c_e.
For any p, if f_p has length > 1, the problem is infeasible.
Otherwise, the following is a subgradient:

  ∂f(p)_e = F·|(f_p)_e| / c_e − 1

Unfortunately, the width can be large:

  ‖∂f(p)‖∞ ≤ F / c_min      [PST 91]:  T = O( F·log n / (ε²·c_min) )
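In flow terms the MWU loop reads: keep weights over edges, repeatedly route the target value F along the shortest path under lengths p_e/c_e, penalize the edges just used, and average the routed flows. A toy sketch on a two-path unit-capacity graph, not from the talk; the paths are enumerated explicitly instead of running a shortest-path algorithm, and T, ε are illustrative:

```python
# Graph: s -> a -> t and s -> b -> t; four edges, unit capacities; maxflow = 2.
cap = [1.0, 1.0, 1.0, 1.0]
paths = [[0, 1], [2, 3]]          # edge indices of the two s-t paths
F = 2.0                           # target flow value to certify feasible
T, eps = 100, 0.1

w = [1.0] * 4                     # MWU weights over edges (dual variables)
f_avg = [0.0] * 4                 # average of the routed flows
for _ in range(T):
    # Best response: shortest s-t path under lengths w_e / c_e.
    best = min(paths, key=lambda P: sum(w[e] / cap[e] for e in P))
    for e in best:
        f_avg[e] += F / T                  # route all of F on this path
        w[e] *= (1.0 + eps * F / cap[e])   # penalize the congested edges

congestion = max(f_avg[e] / cap[e] for e in range(4))
```

Here the weights force the two paths to alternate, so the averaged flow routes F = 2 with congestion 1. With F/c_min large, the multiplicative penalty per round blows up: exactly the width issue the next slide addresses.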
Width Reduction: Make the Primal Nicer
PROBLEM: this width bound is optimal for this specific formulation — we need a primal argument.
SOLUTION: regularize the primal:

  f(p) = min_{B^T f = e_s − e_t} F·∑_{e∈E} (f_e / c_e)·(p_e + ε/m) − 1

REGULARIZATION ERROR: ε·F
NEW WIDTH: ‖∂f(p)‖∞ ≤ m / ε
ITERATION BOUND: [GK 98]:  T = O( m·log n / ε² )
Electrical Flow Approach [CKMST]
A different formulation yields the basis for the CKMST algorithm:

  ∀e ∈ E:  F·f_e² / c_e² ≤ 1,    B^T f = e_s − e_t   (we will enforce this)

Non-smooth optimization problem:

  f(p) = min_{B^T f = e_s − e_t} ∑_{e∈E} p_e · (F·f_e²/c_e² − 1)

The best response is an electrical flow f_p.
Original width: ‖∂f(p)‖∞ ≤ m.
Regularize the primal:

  f(p) = min_{B^T f = e_s − e_t} F·∑_{e∈E} (f_e² / c_e²)·(p_e + ε/m) − 1

New width: ‖∂f(p)‖∞ ≤ √(m/ε)
Conclusion: Take-away Messages
• Regularization is a powerful tool for the design of fast algorithms.
• Most iterative algorithms can be understood as regularized updates: MWUs, width reduction, interior point, gradient descent, …
• They perform well in practice. Regularization also helps eliminate noise.
• ULTIMATE GOAL: development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.
THE END – THANK YOU