Prediction, Learning and Games - Chapter 4: Randomized prediction
Walid Krichene
November 14, 2013
Walid Krichene, Prediction, Learning and Games - Chapter 4: Randomized prediction, November 14, 2013
Experts framework

Recall. On iteration t:
- experts reveal their advice f_{i,t}
- the forecaster makes a decision p_t = \sum_{i=1}^N w_{i,t} f_{i,t}
- the losses \ell(f_{i,t}, y_t) and \ell(p_t, y_t) are revealed
- the forecaster updates the weights w_{i,t+1}

The loss \ell(\cdot, y_t) is convex in its first argument.

Definition (Regret).

R_{i,T} = L_T - L_{i,T} = \sum_{t=1}^T \big( \ell(p_t, y_t) - \ell(f_{i,t}, y_t) \big)

Goal: R_T / T = o(1), where R_T = \max_i R_{i,T}.
Multiplicative weight algorithms

Definition (Hedge algorithm).

w_{i,t+1} \propto w_{i,t} \exp(-\gamma \, \ell(f_{i,t}, y_t))

Average regret:

\frac{R_T}{T} \le \frac{\ln N}{\gamma T} + \frac{\gamma}{8}

Taking \gamma = O(\sqrt{\ln N / T}) yields R_T / T \le O(\sqrt{\ln N / T}).

Small losses:

\frac{R_T}{T} \le \frac{1}{T} \left( \frac{\gamma}{1 - e^{-\gamma}} - 1 \right) L_i^* + \frac{\ln N}{T (1 - e^{-\gamma})}
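A minimal sketch of Hedge in Python (assuming losses in [0, 1]; the tuning \gamma = \sqrt{8 \ln N / T} comes from optimizing the bound above):

```python
import numpy as np

def hedge(losses, gamma):
    """Run Hedge on a (T, N) array of expert losses.

    losses[t, i] is the loss of expert i at round t, assumed in [0, 1].
    Returns the forecaster's expected loss at each round.
    """
    T, N = losses.shape
    log_w = np.zeros(N)                 # log-weights; uniform start
    expected = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = w / w.sum()                 # current distribution p_t
        expected[t] = p @ losses[t]
        log_w -= gamma * losses[t]      # w_{i,t+1} proportional to w_{i,t} exp(-gamma * loss)
    return expected

rng = np.random.default_rng(0)
T, N = 1000, 10
losses = rng.uniform(size=(T, N))
gamma = np.sqrt(8 * np.log(N) / T)      # optimizes ln N/(gamma T) + gamma/8
avg_regret = (hedge(losses, gamma).sum() - losses.sum(axis=0).min()) / T
```

Working in log space avoids underflow when \gamma T is large; the theorem then guarantees avg_regret \le \sqrt{\ln N / (2T)} for any loss sequence in [0, 1].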
Motivation for randomization
What if the decision set is non-convex? E.g. {0, 1}.
Also: any deterministic algorithm can be forced to incur loss linear in the number of rounds.
Solution (to both): randomize.
Setting

- Actions {1, 2, ..., N} = [N]
- the forecaster maintains a probability distribution (p_{i,t})_{i \in [N]}
- it randomly picks an action I_t \sim p_t
- the losses \ell(i, Y_t) are revealed
- the sequence y_1, ..., y_T can be fixed a priori (oblivious opponent) or can depend on the player's decisions.
Regret

Definition. The expected loss (conditioned on past plays) is

\bar{\ell}(p_t, Y_t) = \sum_{i=1}^N \ell(i, Y_t) \, p_{i,t}

Definition (Expected regret).

\bar{R}_{i,T} = \sum_{t=1}^T \bar{\ell}(p_t, Y_t) - \ell(i, Y_t)
Regret

Note: p_{t+1} only depends on p_t and \ell(i, Y_t) (not on the forecaster's randomization).

Strategy: bound the expected regret,

\bar{R}_T / T \le B(T),

then with high probability (\ge 1 - \delta),

R_T / T \le B(T) + \sqrt{\frac{\ln(1/\delta)}{T}}
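One way to see the high-probability step (a sketch; the constant may differ slightly from the slide's bound): apply the Hoeffding-Azuma inequality to the martingale differences X_t = \bar{\ell}(p_t, Y_t) - \ell(I_t, Y_t).

```latex
% |X_t| \le 1 and E[X_t \mid X_1, \dots, X_{t-1}] = 0, so by Hoeffding--Azuma
\Pr\left[ R_T - \bar{R}_T \ge \varepsilon \right]
  = \Pr\left[ \sum_{t=1}^T X_t \ge \varepsilon \right]
  \le \exp\left( -\frac{\varepsilon^2}{2T} \right)
% Setting the right-hand side to \delta, i.e. \varepsilon = \sqrt{2T \ln(1/\delta)}:
\frac{R_T}{T} \le \frac{\bar{R}_T}{T} + \sqrt{\frac{2 \ln(1/\delta)}{T}}
\quad \text{with probability at least } 1 - \delta.
```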
Back to experts

Can think of this as follows:
- the strategy space is the simplex on actions, \Delta_N
- every expert has constant advice: f_{i,t} is the i-th vertex of the simplex
- the decision of the forecaster is a convex combination of vertices
- the loss function \ell(\cdot, Y_t) is linear: the (expected) loss incurred by the forecaster is

\sum_i p_{t,i} \, \ell(f_{i,t}, Y_t) = \ell\Big( \sum_i p_{t,i} f_{i,t}, Y_t \Big) = \ell(p_t, Y_t)

Write the expected loss as \langle \ell_t, p_t \rangle, where \ell_{t,i} = \ell(f_{i,t}, Y_t).
Hedge algorithm

Regularized greedy algorithm: Hedge is the solution to

p_{t+1} = \arg\min_{p \in \Delta_N} \langle \ell_t, p \rangle + \frac{1}{\gamma} D_{KL}(p \| p_t)

where D_{KL} is the Kullback-Leibler divergence, D_{KL}(p \| q) = \sum_i p_i \ln \frac{p_i}{q_i}.

Also,

p_{t+1} = \arg\min_{p \in \Delta_N} \langle L_t, p \rangle - \frac{1}{\gamma} H(p)

where H is the entropy, H(p) = -\sum_i p_i \ln p_i, and L_t is the cumulative loss vector.

Connection with stochastic optimization (last week).
The unregularized greedy strategy is called fictitious play.
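A quick numerical sanity check (a sketch, not from the slides) that the KL-regularized step has the multiplicative-weights closed form p_{t+1,i} \propto p_{t,i} e^{-\gamma \ell_{t,i}}:

```python
import numpy as np

rng = np.random.default_rng(1)
N, gamma = 5, 0.7
p_t = rng.dirichlet(np.ones(N))        # current distribution p_t
ell = rng.uniform(size=N)              # loss vector ell_t

def objective(p):
    # <ell_t, p> + (1/gamma) * D_KL(p || p_t)
    return ell @ p + (p @ np.log(p / p_t)) / gamma

# Closed-form minimizer: the Hedge / multiplicative-weights update
p_star = p_t * np.exp(-gamma * ell)
p_star /= p_star.sum()

# p_star should do at least as well as any other point of the simplex
trials = rng.dirichlet(np.ones(N), size=1000)
assert all(objective(p_star) <= objective(q) + 1e-12 for q in trials)
```

The closed form drops out of the Lagrangian: \ell_i + \frac{1}{\gamma}(\ln(p_i / p_{t,i}) + 1) + \lambda = 0 gives p_i \propto p_{t,i} e^{-\gamma \ell_i}.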
Follow the perturbed leader

The regularized problem is called follow the perturbed leader: play

I_t = \arg\min_i \; L_{i,t-1} + Z_{i,t}

where Z_t is a random perturbation. They prove the regret bound in a more general case.

Theorem (Theorem 4.2).

\frac{R_T}{T} \le \frac{1}{T} \left( E \max_i Z_{i,1} + E \max_i (-Z_{i,1}) + \sum_t \int F_t(z) \big( f_Z(z) - f_Z(z - \ell_t) \big) \, dz \right)

where F_t(z) = \ell(i_t(z), y_t) and f_Z is the density of Z.
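A minimal FPL sketch in Python (the Laplace perturbation and the scale 1/\eta are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def fpl(losses, eta, seed=0):
    """Follow the perturbed leader on a (T, N) array of losses.

    At round t, play I_t = argmin_i L_{i,t-1} + Z_{i,t}, with fresh
    Laplace perturbations of scale 1/eta at every round.
    Returns the forecaster's realized losses.
    """
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    cumulative = np.zeros(N)            # L_{i, t-1}
    realized = np.empty(T)
    for t in range(T):
        z = rng.laplace(scale=1.0 / eta, size=N)
        i_t = int(np.argmin(cumulative + z))
        realized[t] = losses[t, i_t]
        cumulative += losses[t]
    return realized

rng = np.random.default_rng(1)
T, N = 2000, 5
losses = rng.uniform(size=(T, N))
eta = np.sqrt(np.log(N) / T)            # illustrative tuning
avg_regret = (fpl(losses, eta).sum() - losses.sum(axis=0).min()) / T
```

Fresh perturbations at every round keep the algorithm's distribution over actions stable from one round to the next, which is what the f_Z(z) - f_Z(z - \ell_t) term in the theorem measures.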
Internal regret

- The regret analyzed so far is external regret: compare your cumulative loss to the cumulative loss of a single expert / action.
- Internal regret: swap all instances of action i with j.

Definition (Internal regret).

R_{i,j,T} = \sum_{t=1}^T p_{i,t} \big( \ell(i, Y_t) - \ell(j, Y_t) \big)

The internal regret is \max_{i,j} R_{i,j,T}.
Internal regret

In a sense, stronger than external regret: since \sum_i p_{t,i} = 1,

\sum_i R_{i,j,T} = \sum_{t=1}^T \sum_i p_{t,i} (\ell_{t,i} - \ell_{t,j})
                = \sum_{t=1}^T \langle p_t, \ell_t \rangle - \ell_{t,j}
                = R_{j,T}

so \max_j R_{j,T} \le N \max_{i,j} R_{i,j,T}: small internal regret implies small external regret.

- The weighted average forecaster can have large internal regret, \Omega(T).
- But it can be adapted to have small internal regret.
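The identity above can be checked numerically (a sketch with random losses and distributions; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 500, 4
ell = rng.uniform(size=(T, N))            # ell[t, i] = loss of action i at round t
p = rng.dirichlet(np.ones(N), size=T)     # forecaster's distributions p_t

# Internal regrets R_{i,j,T} and external regrets R_{j,T}
R_int = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        R_int[i, j] = (p[:, i] * (ell[:, i] - ell[:, j])).sum()
R_ext = (np.einsum('ti,ti->t', p, ell)[:, None] - ell).sum(axis=0)

# sum_i R_{i,j,T} = R_{j,T}, hence max_j R_{j,T} <= N * max_{i,j} R_{i,j,T}
assert np.allclose(R_int.sum(axis=0), R_ext)
assert R_ext.max() <= N * R_int.max() + 1e-9
```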
Minimizing internal regret

Define modified strategies p_t^{i \to j}:

p_{t,i}^{i \to j} = 0, \qquad p_{t,j}^{i \to j} = p_{t,i} + p_{t,j}

(all other components unchanged).

- Apply an algorithm that minimizes external regret to the set of new experts i \to j, i \ne j: O(N^2) experts for the new algorithm.
- An action is p_t^{i \to j}.
- This results in a sequence of meta-strategies \mu_t \in \Delta_{N^2}.
- The fixed point p_t = \sum_{(i,j)} p_t^{i \to j} \mu_{t,(i,j)} minimizes internal regret.
- It can be computed using Gaussian elimination (write p_t = A(\mu_t) p_t).
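A sketch of the fixed-point computation (using a least-squares solve in place of hand-rolled Gaussian elimination; the helper name is illustrative):

```python
import numpy as np

def fixed_point(mu):
    """Given meta-weights mu[i, j] (i != j) summing to 1, build
    A(mu) = sum_{i != j} mu[i, j] * M^{i->j}, where M^{i->j} is the
    linear map sending p to p^{i->j}, and solve p = A(mu) p on the simplex."""
    N = mu.shape[0]
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            M = np.eye(N)
            M[i, i] = 0.0        # p_i^{i->j} = 0
            M[j, i] = 1.0        # p_j^{i->j} = p_i + p_j
            A += mu[i, j] * M
    # A is column-stochastic, so a stochastic fixed point exists;
    # solve (A - I) p = 0 together with the normalization sum(p) = 1.
    B = np.vstack([A - np.eye(N), np.ones(N)])
    rhs = np.zeros(N + 1)
    rhs[-1] = 1.0
    p, *_ = np.linalg.lstsq(B, rhs, rcond=None)
    return p

rng = np.random.default_rng(3)
N = 4
mu = rng.uniform(size=(N, N))
np.fill_diagonal(mu, 0.0)
mu /= mu.sum()
p = fixed_point(mu)      # a distribution with p = A(mu) p
```

Each M^{i \to j} is column-stochastic, so A(\mu_t) is too, and the fixed point is its stochastic eigenvector for eigenvalue 1.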
Generalized regret

- Experts react to the prediction: the advice is f_{i,t}(I_t).
- Also, an activation function A_{i,t}(k).

Definition (Generalized regret).

r_{i,t} = \sum_{k=1}^N p_{t,k} \, A_{i,t}(k) \big( \ell(k, Y_t) - \ell(f_{i,t}(k), Y_t) \big)

External regret is a special case: set A_{i,t}(k) = 1 identically, and f_{i,t}(k) = i.
Internal regret is a special case: consider experts (i, j), i \ne j, and set

f_{(i,j),t}(k) = \begin{cases} k & \text{if } k \ne i \\ j & \text{otherwise} \end{cases}
Generalized regret

Given a potential \phi, there exists a forecaster that satisfies the Blackwell condition (Theorem 4.3).

Recall the Blackwell condition:

\langle r_t, \nabla \phi(R_{t-1}) \rangle \le 0

From the Blackwell condition one can derive a bound on the regret.
Next week
Chapter 5 (Efficient forecasting for special classes of experts)
or Chapter 6 (Limited information: multi-armed bandit versions)