Prediction, Learning and Games - Chapter 4: Randomized prediction
Walid Krichene
November 14, 2013
Walid Krichene, Prediction, Learning and Games - Chapter 4: Randomized prediction, November 14, 2013
Experts framework

Recall. On iteration t:
- experts reveal their advice f_{i,t}
- the forecaster makes a decision p_t = \sum_{i=1}^N w_{i,t} f_{i,t}
- the losses \ell(f_{i,t}, y_t) and \ell(p_t, y_t) are revealed
- the forecaster updates the weights w_{i,t+1}

The loss \ell(\cdot, y_t) is convex in its first argument.

Definition (Regret).

R_{i,T} = L_T - L_{i,T} = \sum_{t=1}^T \big( \ell(p_t, y_t) - \ell(f_{i,t}, y_t) \big)

Goal: R_T / T = o(1), where R_T = \max_i R_{i,T}.
Multiplicative weight algorithms

Definition (Hedge algorithm).

w_{i,t+1} \propto w_{i,t} \exp(-\gamma \, \ell(f_{i,t}, y_t))

Average regret:

\frac{R_T}{T} \le \frac{\ln N}{\gamma T} + \frac{\gamma}{8}

Taking \gamma = O(\sqrt{\ln N / T}) yields R_T / T \le O(\sqrt{\ln N / T}).

Small losses:

\frac{R_T}{T} \le \frac{1}{T} \left( \frac{\gamma}{1 - e^{-\gamma}} - 1 \right) L_i^* + \frac{\ln N}{T (1 - e^{-\gamma})}
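A minimal sketch of Hedge in Python (assuming losses in [0, 1]; the tuning \gamma = \sqrt{8 \ln N / T} comes from optimizing the bound above):

```python
import numpy as np

def hedge(losses, gamma):
    """Run Hedge on a (T, N) array of expert losses.

    losses[t, i] is the loss of expert i at round t, assumed in [0, 1].
    Returns the forecaster's expected loss at each round.
    """
    T, N = losses.shape
    log_w = np.zeros(N)                 # log-weights; uniform start
    expected = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = w / w.sum()                 # current distribution p_t
        expected[t] = p @ losses[t]
        log_w -= gamma * losses[t]      # w_{i,t+1} proportional to w_{i,t} exp(-gamma * loss)
    return expected

rng = np.random.default_rng(0)
T, N = 1000, 10
losses = rng.uniform(size=(T, N))
gamma = np.sqrt(8 * np.log(N) / T)      # optimizes ln N/(gamma T) + gamma/8
avg_regret = (hedge(losses, gamma).sum() - losses.sum(axis=0).min()) / T
```

Working in log space avoids underflow when \gamma T is large; the theorem then guarantees avg_regret \le \sqrt{\ln N / (2T)} for any loss sequence in [0, 1].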
Motivation for randomization
What if the decision set is non-convex? E.g. {0, 1}.
Also: any deterministic algorithm can be forced to incur loss linear in the number of rounds.
Solution (to both): randomize.
Setting

- Actions {1, 2, ..., N} = [N]
- the forecaster maintains a probability distribution (p_{i,t})_{i \in [N]}
- it randomly picks an action I_t \sim p_t
- the losses \ell(i, Y_t) are revealed
- the sequence y_1, ..., y_T can be fixed a priori (oblivious opponent) or can depend on the player's decisions.
Regret

Definition. The expected loss (conditioned on past plays) is

\bar{\ell}(p_t, Y_t) = \sum_{i=1}^N \ell(i, Y_t) \, p_{i,t}

Definition (Expected regret).

\bar{R}_{i,T} = \sum_{t=1}^T \bar{\ell}(p_t, Y_t) - \ell(i, Y_t)
Regret

Note: p_{t+1} only depends on p_t and \ell(i, Y_t) (not on the forecaster's randomization).

Strategy: bound the expected regret,

\bar{R}_T / T \le B(T),

then with high probability (\ge 1 - \delta),

R_T / T \le B(T) + \sqrt{\frac{\ln(1/\delta)}{T}}
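One way to see the high-probability step (a sketch; the constant may differ slightly from the slide's bound): apply the Hoeffding-Azuma inequality to the martingale differences X_t = \bar{\ell}(p_t, Y_t) - \ell(I_t, Y_t).

```latex
% |X_t| \le 1 and E[X_t \mid X_1, \dots, X_{t-1}] = 0, so by Hoeffding--Azuma
\Pr\left[ R_T - \bar{R}_T \ge \varepsilon \right]
  = \Pr\left[ \sum_{t=1}^T X_t \ge \varepsilon \right]
  \le \exp\left( -\frac{\varepsilon^2}{2T} \right)
% Setting the right-hand side to \delta, i.e. \varepsilon = \sqrt{2T \ln(1/\delta)}:
\frac{R_T}{T} \le \frac{\bar{R}_T}{T} + \sqrt{\frac{2 \ln(1/\delta)}{T}}
\quad \text{with probability at least } 1 - \delta.
```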
Back to experts

Can think of this as follows:
- the strategy space is the simplex on actions, \Delta_N
- every expert has constant advice: f_{i,t} is the i-th vertex of the simplex
- the decision of the forecaster is a convex combination of vertices
- the loss function \ell(\cdot, Y_t) is linear: the (expected) loss incurred by the forecaster is

\sum_i p_{t,i} \, \ell(f_{i,t}, Y_t) = \ell\Big( \sum_i p_{t,i} f_{i,t}, Y_t \Big) = \ell(p_t, Y_t)

Write the expected loss as \langle \ell_t, p_t \rangle, where \ell_{t,i} = \ell(f_{i,t}, Y_t).
Hedge algorithm

Regularized greedy algorithm: Hedge is the solution to

p_{t+1} = \arg\min_{p \in \Delta_N} \langle \ell_t, p \rangle + \frac{1}{\gamma} D_{KL}(p \| p_t)

where D_{KL} is the Kullback-Leibler divergence, D_{KL}(p \| q) = \sum_i p_i \ln \frac{p_i}{q_i}.

Also,

p_{t+1} = \arg\min_{p \in \Delta_N} \langle L_t, p \rangle - \frac{1}{\gamma} H(p)

where H is the entropy, H(p) = -\sum_i p_i \ln p_i, and L_t is the cumulative loss vector.

Connection with stochastic optimization (last week).
The unregularized greedy strategy is called fictitious play.
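A quick numerical sanity check (a sketch, not from the slides) that the KL-regularized step has the multiplicative-weights closed form p_{t+1,i} \propto p_{t,i} e^{-\gamma \ell_{t,i}}:

```python
import numpy as np

rng = np.random.default_rng(1)
N, gamma = 5, 0.7
p_t = rng.dirichlet(np.ones(N))        # current distribution p_t
ell = rng.uniform(size=N)              # loss vector ell_t

def objective(p):
    # <ell_t, p> + (1/gamma) * D_KL(p || p_t)
    return ell @ p + (p @ np.log(p / p_t)) / gamma

# Closed-form minimizer: the Hedge / multiplicative-weights update
p_star = p_t * np.exp(-gamma * ell)
p_star /= p_star.sum()

# p_star should do at least as well as any other point of the simplex
trials = rng.dirichlet(np.ones(N), size=1000)
assert all(objective(p_star) <= objective(q) + 1e-12 for q in trials)
```

The closed form drops out of the Lagrangian: \ell_i + \frac{1}{\gamma}(\ln(p_i / p_{t,i}) + 1) + \lambda = 0 gives p_i \propto p_{t,i} e^{-\gamma \ell_i}.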
Follow the perturbed leader

The regularized problem is called follow the perturbed leader: play

I_t = \arg\min_i \; L_{i,t-1} + Z_{i,t}

where Z_t is a random perturbation. They prove the regret bound in a more general case.

Theorem (Theorem 4.2).

\frac{R_T}{T} \le \frac{1}{T} \left( E \max_i Z_{i,1} + E \max_i (-Z_{i,1}) + \sum_t \int F_t(z) \big( f_Z(z) - f_Z(z - \ell_t) \big) \, dz \right)

where F_t(z) = \ell(i_t(z), y_t) and f_Z is the density of Z.
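A minimal FPL sketch in Python (the Laplace perturbation and the scale 1/\eta are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def fpl(losses, eta, seed=0):
    """Follow the perturbed leader on a (T, N) array of losses.

    At round t, play I_t = argmin_i L_{i,t-1} + Z_{i,t}, with fresh
    Laplace perturbations of scale 1/eta at every round.
    Returns the forecaster's realized losses.
    """
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    cumulative = np.zeros(N)            # L_{i, t-1}
    realized = np.empty(T)
    for t in range(T):
        z = rng.laplace(scale=1.0 / eta, size=N)
        i_t = int(np.argmin(cumulative + z))
        realized[t] = losses[t, i_t]
        cumulative += losses[t]
    return realized

rng = np.random.default_rng(1)
T, N = 2000, 5
losses = rng.uniform(size=(T, N))
eta = np.sqrt(np.log(N) / T)            # illustrative tuning
avg_regret = (fpl(losses, eta).sum() - losses.sum(axis=0).min()) / T
```

Fresh perturbations at every round keep the algorithm's distribution over actions stable from one round to the next, which is what the f_Z(z) - f_Z(z - \ell_t) term in the theorem measures.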
Internal regret

- The regret analyzed so far is external regret: compare your cumulative loss to the cumulative loss of a single expert / action.
- Internal regret: swap all instances of action i with j.

Definition (Internal regret).

R_{i,j,T} = \sum_{t=1}^T p_{i,t} \big( \ell(i, Y_t) - \ell(j, Y_t) \big)

The internal regret is \max_{i,j} R_{i,j,T}.
Internal regret

In a sense, stronger than external regret: since \sum_i p_{t,i} = 1,

\sum_i R_{i,j,T} = \sum_{t=1}^T \sum_i p_{t,i} (\ell_{t,i} - \ell_{t,j})
                = \sum_{t=1}^T \langle p_t, \ell_t \rangle - \ell_{t,j}
                = R_{j,T}

so \max_j R_{j,T} \le N \max_{i,j} R_{i,j,T}: small internal regret implies small external regret.

- The weighted average forecaster can have large internal regret, \Omega(T).
- But it can be adapted to have small internal regret.
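The identity above can be checked numerically (a sketch with random losses and distributions; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 500, 4
ell = rng.uniform(size=(T, N))            # ell[t, i] = loss of action i at round t
p = rng.dirichlet(np.ones(N), size=T)     # forecaster's distributions p_t

# Internal regrets R_{i,j,T} and external regrets R_{j,T}
R_int = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        R_int[i, j] = (p[:, i] * (ell[:, i] - ell[:, j])).sum()
R_ext = (np.einsum('ti,ti->t', p, ell)[:, None] - ell).sum(axis=0)

# sum_i R_{i,j,T} = R_{j,T}, hence max_j R_{j,T} <= N * max_{i,j} R_{i,j,T}
assert np.allclose(R_int.sum(axis=0), R_ext)
assert R_ext.max() <= N * R_int.max() + 1e-9
```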
Minimizing internal regret

Define modified strategies p_t^{i \to j}:

p_{t,i}^{i \to j} = 0, \qquad p_{t,j}^{i \to j} = p_{t,i} + p_{t,j}

(all other components unchanged).

- Apply an algorithm that minimizes external regret to the set of new experts i \to j, i \ne j: O(N^2) experts for the new algorithm.
- An action is p_t^{i \to j}.
- This results in a sequence of meta-strategies \mu_t \in \Delta_{N^2}.
- The fixed point p_t = \sum_{(i,j)} p_t^{i \to j} \mu_{t,(i,j)} minimizes internal regret.
- It can be computed using Gaussian elimination (write p_t = A(\mu_t) p_t).
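A sketch of the fixed-point computation (using a least-squares solve in place of hand-rolled Gaussian elimination; the helper name is illustrative):

```python
import numpy as np

def fixed_point(mu):
    """Given meta-weights mu[i, j] (i != j) summing to 1, build
    A(mu) = sum_{i != j} mu[i, j] * M^{i->j}, where M^{i->j} is the
    linear map sending p to p^{i->j}, and solve p = A(mu) p on the simplex."""
    N = mu.shape[0]
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            M = np.eye(N)
            M[i, i] = 0.0        # p_i^{i->j} = 0
            M[j, i] = 1.0        # p_j^{i->j} = p_i + p_j
            A += mu[i, j] * M
    # A is column-stochastic, so a stochastic fixed point exists;
    # solve (A - I) p = 0 together with the normalization sum(p) = 1.
    B = np.vstack([A - np.eye(N), np.ones(N)])
    rhs = np.zeros(N + 1)
    rhs[-1] = 1.0
    p, *_ = np.linalg.lstsq(B, rhs, rcond=None)
    return p

rng = np.random.default_rng(3)
N = 4
mu = rng.uniform(size=(N, N))
np.fill_diagonal(mu, 0.0)
mu /= mu.sum()
p = fixed_point(mu)      # a distribution with p = A(mu) p
```

Each M^{i \to j} is column-stochastic, so A(\mu_t) is too, and the fixed point is its stochastic eigenvector for eigenvalue 1.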
Generalized regret

- Experts react to the prediction: the advice is f_{i,t}(I_t).
- Also, an activation function A_{i,t}(k).

Definition (Generalized regret).

r_{i,t} = \sum_{k=1}^N p_{t,k} \, A_{i,t}(k) \big( \ell(k, Y_t) - \ell(f_{i,t}(k), Y_t) \big)

External regret is a special case: set A_{i,t}(k) = 1 identically, and f_{i,t}(k) = i.
Internal regret is a special case: consider experts (i, j), i \ne j, and set

f_{(i,j),t}(k) = \begin{cases} k & \text{if } k \ne i \\ j & \text{otherwise} \end{cases}
Generalized regret

Given a potential \phi, there exists a forecaster that satisfies the Blackwell condition (Theorem 4.3).

Recall the Blackwell condition:

\langle r_t, \nabla \phi(R_{t-1}) \rangle \le 0

From the Blackwell condition one can derive a bound on the regret.
Next week
Chapter 5 (Efficient forecasting for special classes of experts)
or Chapter 6 (Limited information: multi-armed bandit versions)