Upload
others
View
26
Download
0
Embed Size (px)
Citation preview
Nonparametric Sequential Prediction of Time Series.Extension to quantile prediction.
G. Biau – B. Patra
University Paris VI – University Paris VI - Lokad
ENGREF, JUNE 2009
B. Patra (University Paris VI - Lokad) 1 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 2 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 3 / 56
Introduction
Time series prediction has a long history [Yule, 1927].Genetics, medecine, climate, finance. . .Until 1970’s, parametric approach.Recently: nonparametric approaches.
B. Patra (University Paris VI - Lokad) 4 / 56
Nonparametric framework
A slightly different spiritConsider the sequential (= on-line) prediction of time series.
Including series that do not necessarily satisfy classical statisticalassumptions for bounded, mixing or Markovian processes.
GoalShow consistency results under a minimum of hypotheses.
B. Patra (University Paris VI - Lokad) 5 / 56
Nonparametric framework
A slightly different spiritConsider the sequential (= on-line) prediction of time series.
Including series that do not necessarily satisfy classical statisticalassumptions for bounded, mixing or Markovian processes.
GoalShow consistency results under a minimum of hypotheses.
B. Patra (University Paris VI - Lokad) 5 / 56
Context
Sequential predictionAt time n = 1,2, . . ., predict the next outcome yn ∈ R of sequencey1, y2, . . .
Side information x1, x2, . . ., where each xi ∈ Rd .
Notationyn−1
1 = (y1, . . . , yn−1).
xn1 = (x1, . . . , xn).
On-line learningThe elements y0, y1, y2, . . . and x1, x2, . . . are revealed one at a time, inorder, beginning with (x1, y0), (x2, y1), . . .
B. Patra (University Paris VI - Lokad) 6 / 56
Context
Sequential predictionAt time n = 1,2, . . ., predict the next outcome yn ∈ R of sequencey1, y2, . . .
Side information x1, x2, . . ., where each xi ∈ Rd .
Notationyn−1
1 = (y1, . . . , yn−1).
xn1 = (x1, . . . , xn).
On-line learningThe elements y0, y1, y2, . . . and x1, x2, . . . are revealed one at a time, inorder, beginning with (x1, y0), (x2, y1), . . .
B. Patra (University Paris VI - Lokad) 6 / 56
Context
Sequential predictionAt time n = 1,2, . . ., predict the next outcome yn ∈ R of sequencey1, y2, . . .
Side information x1, x2, . . ., where each xi ∈ Rd .
Notationyn−1
1 = (y1, . . . , yn−1).
xn1 = (x1, . . . , xn).
On-line learningThe elements y0, y1, y2, . . . and x1, x2, . . . are revealed one at a time, inorder, beginning with (x1, y0), (x2, y1), . . .
B. Patra (University Paris VI - Lokad) 6 / 56
Prediction
StrategySequence g = {gn}∞n=1 of forecasting functions
gn : (Rd )n × Rn−1 → R.
Prediction at time n is gn (xn1 , y
n−11 ).
Hypotheses(x1, y1), (x2, y2), . . . are realizations of random variables (X1,Y1),(X2,Y2), . . .
The process {(Xn,Yn)}∞−∞ is jointly stationary and ergodic.
B. Patra (University Paris VI - Lokad) 7 / 56
Prediction
StrategySequence g = {gn}∞n=1 of forecasting functions
gn : (Rd )n × Rn−1 → R.
Prediction at time n is gn (xn1 , y
n−11 ).
Hypotheses(x1, y1), (x2, y2), . . . are realizations of random variables (X1,Y1),(X2,Y2), . . .
The process {(Xn,Yn)}∞−∞ is jointly stationary and ergodic.
B. Patra (University Paris VI - Lokad) 7 / 56
Measuring the error
DefinitionAt time n , the (normalized ) cumulative prediction error on the stringsX n
1 and Y n1 is
Ln(g) =1n
n∑t=1
(gt (X t
1,Yt−11 )− Yt
)2.
Goal versus realityGoal: make Ln(g) small.
Reality: fundamental limit L? [Algoet, 1994].
B. Patra (University Paris VI - Lokad) 8 / 56
Measuring the error
DefinitionAt time n , the (normalized ) cumulative prediction error on the stringsX n
1 and Y n1 is
Ln(g) =1n
n∑t=1
(gt (X t
1,Yt−11 )− Yt
)2.
Goal versus realityGoal: make Ln(g) small.
Reality: fundamental limit L? [Algoet, 1994].
B. Patra (University Paris VI - Lokad) 8 / 56
How good can we get?
Fundamental limit [Algoet, 1994]For any prediction strategy g and jointly stationary ergodic process{(Xn,Yn)}∞−∞,
lim infn→∞
Ln(g) ≥ L? almost surely,
where
L? = E{(
Y0 − E{
Y0|X 0−∞,Y
−1−∞
})2}
is the minimal mean squared error of any prediction for the value of Y0based on the infinite past observation sequences
Y−1−∞ = (. . . ,Y−2,Y−1) and X 0
−∞ = (. . . ,X−1,X0).
B. Patra (University Paris VI - Lokad) 9 / 56
Consistency
DefinitionA prediction strategy g is universally consistent with respect to a classC of stationary and ergodic processes {(Xn,Yn)}∞−∞ if for eachprocess in the class,
limn→∞
Ln(g) = L? almost surely.
There exist universally consistent strategies for the class C of allbounded, stationary and ergodic processes [Algoet,1992 andMorvai et al.,1996].Very complex or slow-converging algorithms.
B. Patra (University Paris VI - Lokad) 10 / 56
Consistency
DefinitionA prediction strategy g is universally consistent with respect to a classC of stationary and ergodic processes {(Xn,Yn)}∞−∞ if for eachprocess in the class,
limn→∞
Ln(g) = L? almost surely.
There exist universally consistent strategies for the class C of allbounded, stationary and ergodic processes [Algoet,1992 andMorvai et al.,1996].Very complex or slow-converging algorithms.
B. Patra (University Paris VI - Lokad) 10 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 11 / 56
State of the art
A book on prediction of individual sequences.Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, andGames, Cambridge University Press, New York, 2006.
Quantization strategiesGyörfi and Lugosi (2001) and Nobel (2003): boundedprocesses.Györfi and Ottucsák (2007): fourth-finite momentprocesses.
Biau and al. contributionSeveral simple nonparametric strategies for non-necessarily boundedprocesses:
Kernel-based strategy.Nearest neighbor-based strategy.
B. Patra (University Paris VI - Lokad) 12 / 56
State of the art
A book on prediction of individual sequences.Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, andGames, Cambridge University Press, New York, 2006.
Quantization strategiesGyörfi and Lugosi (2001) and Nobel (2003): boundedprocesses.Györfi and Ottucsák (2007): fourth-finite momentprocesses.
Biau and al. contributionSeveral simple nonparametric strategies for non-necessarily boundedprocesses:
Kernel-based strategy.Nearest neighbor-based strategy.
B. Patra (University Paris VI - Lokad) 12 / 56
State of the art
A book on prediction of individual sequences.Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, andGames, Cambridge University Press, New York, 2006.
Quantization strategiesGyörfi and Lugosi (2001) and Nobel (2003): boundedprocesses.Györfi and Ottucsák (2007): fourth-finite momentprocesses.
Biau and al. contributionSeveral simple nonparametric strategies for non-necessarily boundedprocesses:
Kernel-based strategy.Nearest neighbor-based strategy.
B. Patra (University Paris VI - Lokad) 12 / 56
Nearest neighbor strategy
Define infinite array of experts h(k ,`) = {h(k ,`)n }+∞n=1. k , ` = 1,2, . . .
What are k and ¯̀?k is the length of the past observation vector we consider.¯̀ (simple function of `) is the number of nearest neighbors oflength k we consider.
More precisely, ¯̀ = bp`nc where p` ∈ (0,1) and lim`→∞ p` = 0.
Each expert has a job
At time n, expert h(k ,`)n searches for the ¯̀nearest neighbors of
length k .
B. Patra (University Paris VI - Lokad) 13 / 56
Nearest neighbor strategy
Define infinite array of experts h(k ,`) = {h(k ,`)n }+∞n=1. k , ` = 1,2, . . .
What are k and ¯̀?k is the length of the past observation vector we consider.¯̀ (simple function of `) is the number of nearest neighbors oflength k we consider.
More precisely, ¯̀ = bp`nc where p` ∈ (0,1) and lim`→∞ p` = 0.
Each expert has a job
At time n, expert h(k ,`)n searches for the ¯̀nearest neighbors of
length k .
B. Patra (University Paris VI - Lokad) 13 / 56
Nearest neighbor strategy
Define infinite array of experts h(k ,`) = {h(k ,`)n }+∞n=1. k , ` = 1,2, . . .
What are k and ¯̀?k is the length of the past observation vector we consider.¯̀ (simple function of `) is the number of nearest neighbors oflength k we consider.
More precisely, ¯̀ = bp`nc where p` ∈ (0,1) and lim`→∞ p` = 0.
Each expert has a job
At time n, expert h(k ,`)n searches for the ¯̀nearest neighbors of
length k .
B. Patra (University Paris VI - Lokad) 13 / 56
Nearest neighbor strategy
Define infinite array of experts h(k ,`) = {h(k ,`)n }+∞n=1. k , ` = 1,2, . . .
What are k and ¯̀?k is the length of the past observation vector we consider.¯̀ (simple function of `) is the number of nearest neighbors oflength k we consider.
More precisely, ¯̀ = bp`nc where p` ∈ (0,1) and lim`→∞ p` = 0.
Each expert has a job
At time n, expert h(k ,`)n searches for the ¯̀nearest neighbors of
length k .
B. Patra (University Paris VI - Lokad) 13 / 56
6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19 3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17
0,91
2,67
2,56 1,881,78 0,57
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89 0,90
Figure: Work of fundamental expert with k = 3 and ¯̀ = 4.
B. Patra (University Paris VI - Lokad) 14 / 56
1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
?
1,34 −1,25 0,19 3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01
0,90
2,56 1,88
0,91
1,78 0,57
2,67
2,99 2,21 1,73
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89
B. Patra (University Paris VI - Lokad) 15 / 56
1,89
2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19 3,18 4,13
2,99 2,21 1,73
2,21 1,73
2,56 1,88 0,57
2,99
1,78
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89 0,90 0,91
2,56 1,88
?
0,57
1,78
2,67
B. Patra (University Paris VI - Lokad) 16 / 56
3,92
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 2,99 2,21 1,73 ?
1,34 −1,25 0,19
2,67
?
1,78 1,88
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89 0,90 0,91
2.12 2,67
2,56 0,57
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
B. Patra (University Paris VI - Lokad) 17 / 56
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19
0,99
2,18
2,89 2,12 1,78
3,13
1,18
1,89 0,90 0,91
2.12
1,18 0,993,14
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
2,67
?
1,78 1,88
2,67
2,56 0,57
3,14 2,18
B. Patra (University Paris VI - Lokad) 18 / 56
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19
2,12 1,78 1,18
1,89 0,90
2.12
0,90
0,91
0,91
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
2,67
?
1,78 1,88
2,67
2,56 0,57
3,14 2,18 0,99
2,18 1,18 0,993,14
3,13
3,13 1,89
2,89
B. Patra (University Paris VI - Lokad) 19 / 56
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19
3,13 1,89 0,90
2,89 2,12 1,78 1,18
1,89 0,90
2.12
0,91
0,91
?=1,29
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
2,67
?
1,78 1,88
2,67
2,56 0,57
3,14 2,18 0,99
2,18 1,18 0,993,14
3,13
B. Patra (University Paris VI - Lokad) 20 / 56
Prediction and Aggregation
Prediction of one expert
h(k ,`)n (xn
1 , yn−11 ) = Tmin{nδ,`}
(∑{t∈J(k,`)
n } yt
|J(k ,`)n |
).
Aggregated prediction of all experts
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
Where do the pk ,`,n come from?Exponentially weight the experts based on their past performance.
B. Patra (University Paris VI - Lokad) 21 / 56
Prediction and Aggregation
Prediction of one expert
h(k ,`)n (xn
1 , yn−11 ) = Tmin{nδ,`}
(∑{t∈J(k,`)
n } yt
|J(k ,`)n |
).
Aggregated prediction of all experts
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
Where do the pk ,`,n come from?Exponentially weight the experts based on their past performance.
B. Patra (University Paris VI - Lokad) 21 / 56
Prediction and Aggregation
Prediction of one expert
h(k ,`)n (xn
1 , yn−11 ) = Tmin{nδ,`}
(∑{t∈J(k,`)
n } yt
|J(k ,`)n |
).
Aggregated prediction of all experts
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
Where do the pk ,`,n come from?Exponentially weight the experts based on their past performance.
B. Patra (University Paris VI - Lokad) 21 / 56
Aggregation continued. . ..
DefinitionsLet {qk ,`} be a probability distribution over all pairs (k , `) ofpositive integers such that qk ,` > 0 for all (k , `).For ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Ln−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
B. Patra (University Paris VI - Lokad) 22 / 56
Aggregation continued. . ..
DefinitionsLet {qk ,`} be a probability distribution over all pairs (k , `) ofpositive integers such that qk ,` > 0 for all (k , `).For ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Ln−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
B. Patra (University Paris VI - Lokad) 22 / 56
Aggregation continued. . ..
DefinitionsLet {qk ,`} be a probability distribution over all pairs (k , `) ofpositive integers such that qk ,` > 0 for all (k , `).For ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Ln−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
B. Patra (University Paris VI - Lokad) 22 / 56
Aggregation continued. . ..
DefinitionsLet {qk ,`} be a probability distribution over all pairs (k , `) ofpositive integers such that qk ,` > 0 for all (k , `).For ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Ln−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(xn1 , y
n−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (xn
1 , yn−11 ).
B. Patra (University Paris VI - Lokad) 22 / 56
Result
Theorem [Biau, Bleakley, Györfi, Ottucsák, 2009]Let C be the class of all jointly stationary and ergodic processes{(Xn,Yn)}∞−∞ such that E{Y 4
0 } <∞.
Choose parameter ηn as
ηn =1√n.
Then the nearest neighbor forecasting strategy is universallyconsistent with respect to the class C, that is
limn→∞
Ln(g) = L? almost surely.
B. Patra (University Paris VI - Lokad) 23 / 56
Result
Theorem [Biau, Bleakley, Györfi, Ottucsák, 2009]Let C be the class of all jointly stationary and ergodic processes{(Xn,Yn)}∞−∞ such that E{Y 4
0 } <∞.
Choose parameter ηn as
ηn =1√n.
Then the nearest neighbor forecasting strategy is universallyconsistent with respect to the class C, that is
limn→∞
Ln(g) = L? almost surely.
B. Patra (University Paris VI - Lokad) 23 / 56
Result
Theorem [Biau, Bleakley, Györfi, Ottucsák, 2009]Let C be the class of all jointly stationary and ergodic processes{(Xn,Yn)}∞−∞ such that E{Y 4
0 } <∞.
Choose parameter ηn as
ηn =1√n.
Then the nearest neighbor forecasting strategy is universallyconsistent with respect to the class C, that is
limn→∞
Ln(g) = L? almost surely.
B. Patra (University Paris VI - Lokad) 23 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 24 / 56
Experimental results
Call center data setDaily call volumes entering a call center.Long series between 382 and 826 time values. 21 series.
B. Patra (University Paris VI - Lokad) 25 / 56
Future outcome predictions results
Model Name Avg Abs Error Avg Sqr Error Mape (%)AR(7) 65.80 9738 31.6
DayOfTheWeekMean 53.95 7099 22.8HoltWinters 49.84 6025 21.5
MeanExpertMixture 52.37 6536 22.3MA 179 62448 52.0
Figure: Forecasting future outcomes
B. Patra (University Paris VI - Lokad) 26 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 27 / 56
Quantile forecasting
Quantile forecastingGiven a stochastic process Y1,Y2, . . ..
Previously, estimate the conditional mean of Yn givenY1, . . . ,Yn−1.Now: the conditional τ th quantile of Yn given Y1, . . . ,Yn−1.
B. Patra (University Paris VI - Lokad) 28 / 56
Quantile forecasting
Quantile forecastingGiven a stochastic process Y1,Y2, . . ..
Previously, estimate the conditional mean of Yn givenY1, . . . ,Yn−1.Now: the conditional τ th quantile of Yn given Y1, . . . ,Yn−1.
B. Patra (University Paris VI - Lokad) 28 / 56
Quantile forecasting
What for?Understand conditional distributions.τ = 0.5 robust forecasting.Build confidence interval.
Applications fieldsFinance: CVAR. Also biology, medecine, telecoms...Here: call volumes (optimize staff in a call center).
B. Patra (University Paris VI - Lokad) 29 / 56
Quantile forecasting
What for?Understand conditional distributions.τ = 0.5 robust forecasting.Build confidence interval.
Applications fieldsFinance: CVAR. Also biology, medecine, telecoms...Here: call volumes (optimize staff in a call center).
B. Patra (University Paris VI - Lokad) 29 / 56
Figure: Quantile forecast with τ = 0.1,0.9.
B. Patra (University Paris VI - Lokad) 30 / 56
Quantile Regression
Conditional quantilesX multivariate random variable, Y real valued random variable,
qτ (X ) , F←Y |X (τ) = inf{t ∈ R : FY |X (t) ≥ τ}.
FY |X conditional cumulative distribution function.
Proposition [Koenker, 2005]
qτ (X ) ∈ argminq(.)∈R
EPY |X [ρτ (Y − q(X ))] .
If FY |X is (strictly) increasing then qτ (X ) is the unique minimizer.
B. Patra (University Paris VI - Lokad) 31 / 56
Quantile Regression
Conditional quantilesX multivariate random variable, Y real valued random variable,
qτ (X ) , F←Y |X (τ) = inf{t ∈ R : FY |X (t) ≥ τ}.
FY |X conditional cumulative distribution function.
Proposition [Koenker, 2005]
qτ (X ) ∈ argminq(.)∈R
EPY |X [ρτ (Y − q(X ))] .
If FY |X is (strictly) increasing then qτ (X ) is the unique minimizer.
B. Patra (University Paris VI - Lokad) 31 / 56
Figure: Pinball function graph.
B. Patra (University Paris VI - Lokad) 32 / 56
Non parametric framework
FrameworkHere, we observe a string realization yn−1
1 of a stationary andergodic process {Yn}∞n=−∞ . . .
. . . and try to estimate qτ (yn−11 ) = F←
Yn|Y n−11 =yn−1
1(τ), the conditional
quantile at time n.
StrategySequence g = {gn}∞n=1 of τ th quantile forecasting functions
gn : Rn−1 −→ R.
Quantile estimation at time n is gn(yn−11 ).
B. Patra (University Paris VI - Lokad) 33 / 56
Non parametric framework
FrameworkHere, we observe a string realization yn−1
1 of a stationary andergodic process {Yn}∞n=−∞ . . .
. . . and try to estimate qτ (yn−11 ) = F←
Yn|Y n−11 =yn−1
1(τ), the conditional
quantile at time n.
StrategySequence g = {gn}∞n=1 of τ th quantile forecasting functions
gn : Rn−1 −→ R.
Quantile estimation at time n is gn(yn−11 ).
B. Patra (University Paris VI - Lokad) 33 / 56
Errors
Empirical measure criterion.At time n the cumulative pinball error of the strategy g is
Gn(g) =1n
n∑t=1
ρτ
(yt − gt (y t−1
1 )).
Adapted result of [Algoët, 1994]For any stationary and ergodic process {Yn}+∞n=−∞,
lim infn→∞
Gn(g) ≥ G? a.s.,
where
G? = E[minq(.)
EPY0|Y
−1−∞
[ρτ
(Y0 − q(Y−1
−∞))]]
.
B. Patra (University Paris VI - Lokad) 34 / 56
Errors
Empirical measure criterion.At time n the cumulative pinball error of the strategy g is
Gn(g) =1n
n∑t=1
ρτ
(yt − gt (y t−1
1 )).
Adapted result of [Algoët, 1994]For any stationary and ergodic process {Yn}+∞n=−∞,
lim infn→∞
Gn(g) ≥ G? a.s.,
where
G? = E[minq(.)
EPY0|Y
−1−∞
[ρτ
(Y0 − q(Y−1
−∞))]]
.
B. Patra (University Paris VI - Lokad) 34 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 35 / 56
Nearest neighbors strategy
Elementary predictors
Define infinite array of experts h(k ,`) = {h(k ,`)n }+∞n=1. k , ` = 1,2, . . .
Each expert has a job
At time n, expert h(k ,`)n finds the ¯̀nearest neighbors of length k .
B. Patra (University Paris VI - Lokad) 36 / 56
Nearest neighbors strategy
Elementary predictors
Define infinite array of experts h(k ,`) = {h(k ,`)n }+∞n=1. k , ` = 1,2, . . .
Each expert has a job
At time n, expert h(k ,`)n finds the ¯̀nearest neighbors of length k .
B. Patra (University Paris VI - Lokad) 36 / 56
6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19 3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17
0,91
2,67
2,56 1,881,78 0,57
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89 0,90
Figure: Work of fundamental expert with k = 3 and ¯̀ = 4.
B. Patra (University Paris VI - Lokad) 37 / 56
1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
?
1,34 −1,25 0,19 3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01
0,90
2,56 1,88
0,91
1,78 0,57
2,67
2,99 2,21 1,73
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89
B. Patra (University Paris VI - Lokad) 38 / 56
1,89
2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19 3,18 4,13
2,99 2,21 1,73
2,21 1,73
2,56 1,88 0,57
2,99
1,78
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89 0,90 0,91
2,56 1,88
?
0,57
1,78
2,67
B. Patra (University Paris VI - Lokad) 39 / 56
3,92
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 2,99 2,21 1,73 ?
1,34 −1,25 0,19
2,67
?
1,78 1,88
2,89 2,12 1,78 3,14
3,13
2,18 1,18 0,99
1,89 0,90 0,91
2.12 2,67
2,56 0,57
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
B. Patra (University Paris VI - Lokad) 40 / 56
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19
0,99
2,18
2,89 2,12 1,78
3,13
1,18
1,89 0,90 0,91
2.12
1,18 0,993,14
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
2,67
?
1,78 1,88
2,67
2,56 0,57
3,14 2,18
B. Patra (University Paris VI - Lokad) 41 / 56
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19
2,12 1,78 1,18
1,89 0,90
2.12
0,90
0,91
0,91
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
2,67
?
1,78 1,88
2,67
2,56 0,57
3,14 2,18 0,99
2,18 1,18 0,993,14
3,13
3,13 1,89
2,89
B. Patra (University Paris VI - Lokad) 42 / 56
3,18 4,13 2,22 1,34 0,26 −1,90 −2,29 0,88 1,28 3,31 4,12 5,15 3,31
2,27 −3,16 0,01 1,16 5,17 6,17 7,18 9,10 8,18 7,16 6,17 5,15
0,10 1,15 2,17 3,72 −1,71 6,39 5,16 0,11 −0,20 1,89 2,84 3,92 2,99 2,21 1,73 ?
1,34 −1,25 0,19
3,13 1,89 0,90
2,89 2,12 1,78 1,18
1,89 0,90
2.12
0,91
0,91
empirical quantile
2.89
1,78
1,78
2,99 2,21 1,73
2,99 2,21 1,73
2,56 1,88 0,57
2,67
?
1,78 1,88
2,67
2,56 0,57
3,14 2,18 0,99
2,18 1,18 0,993,14
3,13
? =
B. Patra (University Paris VI - Lokad) 43 / 56
Prediction and Aggregation
Prediction of one expert
h(k ,`)n (yn−1
1 ) = Tmin{nδ,`}
argminq∈R
∑{t∈J(k,`)
n }
ρτ (yt − q)
.
[Can be easily computed by sorting the sample.]
Aggregated prediction of all experts
gn(yn−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (yn−1
1 ).
B. Patra (University Paris VI - Lokad) 44 / 56
Prediction and Aggregation
Prediction of one expert
h(k ,`)n (yn−1
1 ) = Tmin{nδ,`}
argminq∈R
∑{t∈J(k,`)
n }
ρτ (yt − q)
.
[Can be easily computed by sorting the sample.]
Aggregated prediction of all experts
gn(yn−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (yn−1
1 ).
B. Patra (University Paris VI - Lokad) 44 / 56
Aggregation continued. . .
DefinitionsFor ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Gn−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(yn−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (yn−1
1 ).
B. Patra (University Paris VI - Lokad) 45 / 56
Aggregation continued. . .
DefinitionsFor ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Gn−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(yn−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (yn−1
1 ).
B. Patra (University Paris VI - Lokad) 45 / 56
Aggregation continued. . .
DefinitionsFor ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Gn−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(yn−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (yn−1
1 ).
B. Patra (University Paris VI - Lokad) 45 / 56
Aggregation continued. . .
DefinitionsFor ηn > 0, we define the weights
wk ,`,n = qk ,`e−ηn(n−1)Gn−1(h(k,`)).
We normalize these weights:
pk ,`,n =wk ,`,n∑∞
i,j=1 wi,j,n.
Global prediction
gn(yn−11 ) =
∞∑k ,`=1
pk ,`,nh(k ,`)n (yn−1
1 ).
B. Patra (University Paris VI - Lokad) 45 / 56
Theoretical Results
TheoremLet C be the class of all jointly stationary and ergodic processes{Yn}∞n=−∞ such that E{Y 2
0 } <∞ and FY0|Y−1−∞
is a.s. increasing.
Then the nearest neighbor quantile forecasting strategy isuniversally consistent with respect to the class C, that is, for allprocess Y ∈ C
limn→∞
Gn(g) = G? almost surely.
B. Patra (University Paris VI - Lokad) 46 / 56
Theoretical Results
TheoremLet C be the class of all jointly stationary and ergodic processes{Yn}∞n=−∞ such that E{Y 2
0 } <∞ and FY0|Y−1−∞
is a.s. increasing.
Then the nearest neighbor quantile forecasting strategy isuniversally consistent with respect to the class C, that is, for allprocess Y ∈ C
limn→∞
Gn(g) = G? almost surely.
B. Patra (University Paris VI - Lokad) 46 / 56
Mathematical demonstration
DifficultyWe can not apply ergodic theorem on fundamental experts.Ergodicity provides weak convergence of random probabilitymeasure (almost surely).
Example
Let J(k ,`)n the set of the indices of the neighbors.
We have, almost surely in term of weak convergence,
P(k ,`)n →
n→∞P(k ,`)∞ ,
where P(k ,`)n , 1
|J(k,`)n |
∑i∈J(k,`)
nδYi .
B. Patra (University Paris VI - Lokad) 47 / 56
Mathematical demonstration
DifficultyWe can not apply ergodic theorem on fundamental experts.Ergodicity provides weak convergence of random probabilitymeasure (almost surely).
Example
Let J(k ,`)n the set of the indices of the neighbors.
We have, almost surely in term of weak convergence,
P(k ,`)n →
n→∞P(k ,`)∞ ,
where P(k ,`)n , 1
|J(k,`)n |
∑i∈J(k,`)
nδYi .
B. Patra (University Paris VI - Lokad) 47 / 56
Mathematical demonstration
Lemma
Let {µn}∞n=1 be a uniformly integrable sequence of real probabilitymeasures, and let µ∞ be a probability measure with (strictly)increasing distribution function. Suppose that {µn}∞n=1 convergesweakly to µ∞. Then, for all τ ∈ (0,1),
qτ,n → qτ,∞ as n→∞,
where qτ,n ∈ argminq Eµn [ρτ (Y − q)] for all n ≥ 1 and{qτ,∞} = argminq Eµ∞ [ρτ (Y − q)].
B. Patra (University Paris VI - Lokad) 48 / 56
Outline
1 Nonparametric mean predictionContextA consistent strategyExperimental results
2 Non parametric quantile predictionContext and quantile regressionA similar consistent strategyExperimental results
3 Other contexts
4 Conclusion
B. Patra (University Paris VI - Lokad) 49 / 56
Future outcome predictions results
τ = 0.5 median base forecaster : robustness.
Model Name Avg Abs Error Avg Sqr Error Mape (%)AR(7) 65.80 9738 31.6
QAR(8)0.5 57.8 9594 24.9DayOfTheWeekMean 53.95 7099 22.8
HoltWinters 49.84 6025 21.5QuantileExpertMixture0.5 48.1 5731 21.6
MeanExpertMixture 52.37 6536 22.3MA 179 62448 52.0
Figure: Forecasting future outcomes.
B. Patra (University Paris VI - Lokad) 50 / 56
Quantile forecasting
Model Name PinBall Loss (0.1) Ramp LossQuantileExpertMixture0.1 13.71 0.80
QAR(7)0.1 13.22 0.88
Figure: τ = 0.1
Model Name PinBall Loss (0.9) Ramp LossQuantileExpertMixture0.9 12.27 0.07
QAR(7)0.9 19.31 0.07
Figure: τ = 0.9
B. Patra (University Paris VI - Lokad) 51 / 56
Quantile forecasting
Model Name PinBall Loss (0.1) Ramp LossQuantileExpertMixture0.1 13.71 0.80
QAR(7)0.1 13.22 0.88
Figure: τ = 0.1
Model Name PinBall Loss (0.9) Ramp LossQuantileExpertMixture0.9 12.27 0.07
QAR(7)0.9 19.31 0.07
Figure: τ = 0.9
B. Patra (University Paris VI - Lokad) 51 / 56
Extensions
Binary predictionPredict the next outcome yn ∈ {0,1} of a sequence of binarynumbers y1, y2, . . .
We know the past sequence yn−11 = (y1, . . . , yn−1).
The whole theory carries over, with
Hn(g) =1n
n∑t=1
1[gt (Y t−1
1 )6=Yt ]
and
g∗n(Y n−11 ) =
{1 if P{Yn = 1|Y n−1
1 } ≥ 1/20 otherwise.
B. Patra (University Paris VI - Lokad) 52 / 56
Extensions
Binary predictionPredict the next outcome yn ∈ {0,1} of a sequence of binarynumbers y1, y2, . . .
We know the past sequence yn−11 = (y1, . . . , yn−1).
The whole theory carries over, with
Hn(g) =1n
n∑t=1
1[gt (Y t−1
1 )6=Yt ]
and
g∗n(Y n−11 ) =
{1 if P{Yn = 1|Y n−1
1 } ≥ 1/20 otherwise.
B. Patra (University Paris VI - Lokad) 52 / 56
Extensions
Binary predictionPredict the next outcome yn ∈ {0,1} of a sequence of binarynumbers y1, y2, . . .
We know the past sequence yn−11 = (y1, . . . , yn−1).
The whole theory carries over, with
Hn(g) =1n
n∑t=1
1[gt (Y t−1
1 )6=Yt ]
and
g∗n(Y n−11 ) =
{1 if P{Yn = 1|Y n−1
1 } ≥ 1/20 otherwise.
B. Patra (University Paris VI - Lokad) 52 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Portfolio selection
Mathematical modelA market with d assets.y1,y2, . . . ∈ Rd
+ represent the evolution of the market in time.The j-th component of yn represents the amount obtained afterinvesting a unit capital in the j-th asset, on the n-th training period.The investor is allowed to diversify his capital according to aportfolio vector bn = (b(1)
n , . . . ,b(d)n ).
WealthS0 is the investor initial capital and b1 = (1/d , . . . ,1/d).At the end of the first training period,
S1 = S0
d∑j=1
b(j)1 y (j)
1 = S0〈b1,y1〉.
B. Patra (University Paris VI - Lokad) 53 / 56
Wealth.By induction,
Sn = Sn−1〈bn(yn−11 ),yn〉 = S0 exp
{n∑
t=1
log〈bt (yt−11 ),yt〉
}.
Goal: find the best investment strategy {bn}∞n=1 to maximize thewealth Sn.Equivalent to maximize the criterionWn(b) = 1
n∑n
t=1 log〈bt (yt−11 ),yt〉.
Strategies so far: quantization, kernel and nearest neighboron-line prediction with expert aggregation.
References: Györfi and Schäfer (2003), Györfi et al.(2006, 2008).
B. Patra (University Paris VI - Lokad) 54 / 56
Wealth.By induction,
Sn = Sn−1〈bn(yn−11 ),yn〉 = S0 exp
{n∑
t=1
log〈bt (yt−11 ),yt〉
}.
Goal: find the best investment strategy {bn}∞n=1 to maximize thewealth Sn.Equivalent to maximize the criterionWn(b) = 1
n∑n
t=1 log〈bt (yt−11 ),yt〉.
Strategies so far: quantization, kernel and nearest neighboron-line prediction with expert aggregation.
References: Györfi and Schäfer (2003), Györfi et al.(2006, 2008).
B. Patra (University Paris VI - Lokad) 54 / 56
Wealth.By induction,
Sn = Sn−1〈bn(yn−11 ),yn〉 = S0 exp
{n∑
t=1
log〈bt (yt−11 ),yt〉
}.
Goal: find the best investment strategy {bn}∞n=1 to maximize thewealth Sn.Equivalent to maximize the criterionWn(b) = 1
n∑n
t=1 log〈bt (yt−11 ),yt〉.
Strategies so far: quantization, kernel and nearest neighboron-line prediction with expert aggregation.
References: Györfi and Schäfer (2003), Györfi et al.(2006, 2008).
B. Patra (University Paris VI - Lokad) 54 / 56
Wealth.By induction,
Sn = Sn−1〈bn(yn−11 ),yn〉 = S0 exp
{n∑
t=1
log〈bt (yt−11 ),yt〉
}.
Goal: find the best investment strategy {bn}∞n=1 to maximize thewealth Sn.Equivalent to maximize the criterionWn(b) = 1
n∑n
t=1 log〈bt (yt−11 ),yt〉.
Strategies so far: quantization, kernel and nearest neighboron-line prediction with expert aggregation.
References: Györfi and Schäfer (2003), Györfi et al.(2006, 2008).
B. Patra (University Paris VI - Lokad) 54 / 56
Bibliography
Algoet, P. The strong law of large numbers for sequential decisionsunder uncertainty, IEEE Trans. Inform. Theory, 40, 609-633, 1994.
Biau, G., Bleakley, K., Györfi, L. and Ottucsák, G. Non-parametricsequential prediction of time series, J. Nonparametr. Stat., 2009, inpress.
Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games,Cambridge University Press, New York, 2006.
Györfi, L., Udina, F. and Walk, H. Nonparametric nearest neighborbased empirical portfolio selection strategies, Statist. Decis., 26,145-157, 2008.
Koenker, R. Quantile Regression, Cambridge University Press,Cambridge, 2005.
B. Patra (University Paris VI - Lokad) 55 / 56
Thank you for your attention.
B. Patra (University Paris VI - Lokad) 56 / 56