Introduction Estimations: local modelling Cross Validation Assignments
Lecture 9: Nonparametric Regression (1)
Applied Statistics 2015
1 / 22
An example: Pick-It Lottery
The New Jersey Pick-It Lottery is a daily numbers game run by the state of New Jersey. Buying a ticket entitles a player to pick a number between 0 and 999. Half of the money bet each day goes into the prize pool. (The state takes the other half.) The state picks a winning number at random, and the prize pool is shared equally among all winning tickets.
We analyze the first 254 drawings after the lottery started in 1975. Figure 1 shows a scatterplot of the winning numbers and their payoffs.
An example: Pick-It Lottery
[Figure 1: scatterplot of Number (0–1000) against Payoff (roughly 200–800).]
Although all numbers are equally likely to win, numbers chosen by fewer people have bigger payoffs if they win, because the prize is shared among fewer tickets.
Question: can we find some pattern in the data? Are there numbers with larger payoffs?
An example: Pick-It Lottery
The question can be answered by regression analysis.
Linear regression: assumes a linear relation between payoff and winning number. The blue dashed line is the least squares regression line, which shows a general trend of higher payoffs for larger winning numbers.
[Figure: scatterplot of Number against Payoff with the least squares regression line.]
Nonparametric regression
Nonparametric regression does not assume any parametric structure. It is also known as "learning a function" in the field of machine learning. There are n pairs of observations (x_1, Y_1), ..., (x_n, Y_n). The response variable Y is related to the covariate x by the equations
Y_i = r(x_i) + ε_i,  i = 1, ..., n,
where r is the regression function, E(ε_i) = 0 and Var(ε_i) = σ².
Here, we want to estimate r under weak assumptions, without assuming a parametric model for r.
We are treating the covariates x_i as fixed – a fixed design. For a random design, the data are (X_i, Y_i), i = 1, ..., n, and r(x) is the conditional expectation of Y given that X = x: r(x) = E(Y | X = x).
A general idea behind different estimations
Note that Y_i is the sum of r(x_i) and an error whose expected value is zero. This motivates estimating r(x) by the average of those Y_i whose x_i is "close" to x.
Different ways of averaging and different measures of closeness leadto different estimators.
An Example
The data are n = 60 pairs of observations from a certain regression model.
How can we construct r_n, an estimator of r?
[Figure: scatterplot of the n = 60 observations, x ∈ [0, 1], Y ∈ [−2, 2].]
Estimator: Regressogram
A regressogram is constructed in a similar manner as a histogram. Here we assume x_i ∈ [0, 1]. Divide the unit interval into m equally spaced bins, denoted B_1, B_2, ..., B_m. Define the regressogram

g_n(x) = (1/k_j) Σ_{i: x_i ∈ B_j} Y_i,  for x ∈ B_j,

where k_j is the number of points in B_j. Here we use the convention 0/0 = 0.
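The bin-averaging step above can be sketched in a few lines of NumPy (a minimal illustration, not from the slides; the simulated data follow the sin(8x) − x + x³ model used later in this lecture):

```python
import numpy as np

def regressogram(x, y, m):
    """Regressogram on [0, 1] with m equally spaced bins.

    Returns a function g(t) giving the average of the Y_i whose x_i
    falls in the same bin as t (0 for an empty bin, i.e. 0/0 = 0).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    edges = np.linspace(0.0, 1.0, m + 1)
    # bin index of each x_i (clip so that x = 1 lands in the last bin)
    idx = np.clip(np.digitize(x, edges) - 1, 0, m - 1)
    sums = np.bincount(idx, weights=y, minlength=m)
    counts = np.bincount(idx, minlength=m)
    means = np.divide(sums, counts, out=np.zeros(m), where=counts > 0)

    def g(t):
        j = np.clip(np.digitize(t, edges) - 1, 0, m - 1)
        return means[j]
    return g

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(8 * x) - x + x**3 + rng.normal(0, 0.5, 60)
g = regressogram(x, y, m=10)   # step function, constant on each bin
```

The fitted function is piecewise constant: every t in the same bin B_j gets the same value, the mean of the Y_i in that bin.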
[Figure: the n = 60 observations with the regressogram fit (m = 10).]
Estimator: Local average
Fix h > 0,

r_n(x) = Σ_{i=1}^n I(x − h < x_i ≤ x + h) Y_i / Σ_{i=1}^n I(x − h < x_i ≤ x + h).

This is also called the naive kernel estimator:

r_n(x) = Σ_{i=1}^n ½ 1_{[−1,1)}((x − x_i)/h) Y_i / Σ_{i=1}^n ½ 1_{[−1,1)}((x − x_i)/h).
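The indicator formula translates directly into code (a sketch, assuming NumPy; the bandwidth and simulated data are illustrative):

```python
import numpy as np

def local_average(x, y, h):
    """Naive kernel (local average) estimator with bandwidth h.

    r_n(t) = average of the Y_i with t - h < x_i <= t + h
    (0 if the window contains no observations).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)

    def r(t):
        w = (x > t - h) & (x <= t + h)   # box window around t
        return y[w].mean() if w.any() else 0.0
    return r

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(8 * x) - x + x**3 + rng.normal(0, 0.5, 60)
r = local_average(x, y, h=0.2)
```

Because the box window includes or excludes points abruptly, the fitted curve jumps each time a point enters or leaves the window.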
[Figure: the n = 60 observations with the local average fit (h = 0.2).]
Nadaraya-Watson Estimator
Replacing the box kernel by a general kernel in the local average estimator, we obtain the Nadaraya-Watson estimator of r:

r_n(x) = Σ_{i=1}^n K((x − x_i)/h) Y_i / Σ_{i=1}^n K((x − x_i)/h).
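A sketch of the estimator with a Gaussian kernel (not from the slides; the data are simulated for illustration, any nonnegative kernel K can be swapped in):

```python
import numpy as np

def nadaraya_watson(x, y, h, kernel=None):
    """Nadaraya-Watson estimator of r with bandwidth h.

    The fit at t is a weighted average of the Y_i, with weights
    K((t - x_i)/h); kernel defaults to the Gaussian density.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    if kernel is None:
        kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

    def r(t):
        w = kernel((t - x) / h)           # weights K((t - x_i)/h)
        return np.sum(w * y) / np.sum(w)  # weighted average of the Y_i
    return r

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(8 * x) - x + x**3 + rng.normal(0, 0.5, 60)
r = nadaraya_watson(x, y, h=0.2)
```

With a smooth kernel the weights decay gradually with distance, so the fitted curve is smooth, unlike the step functions produced by the regressogram and the local average.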
[Figure: the n = 60 observations with the Nadaraya-Watson fit (h = 0.2, Gaussian kernel).]
The black curve indicates r(x), the real regression function. The underlying model is: Y_i = sin(8x_i) − x_i + x_i³ + ε_i, with ε_i ∼ N(0, 0.5).
[Figure: the n = 60 observations with the true regression function r(x) (black curve).]
Some comments
The three estimators can be written in the same form:

r_n(x) = Σ_{i=1}^n l_i(x) Y_i.
Define the class of piecewise constant functions F_m = {m : m(t) = Σ_{i=1}^m c_i I(t ∈ B_i), c_i ∈ R}. Then the regressogram satisfies

g_n = argmin_{m ∈ F_m} Σ_{i=1}^n (Y_i − m(x_i))².
The Nadaraya-Watson estimator can be considered as locally fitting a constant to the data:

r_n(x) = argmin_{c ∈ R} Σ_{i=1}^n K((x − x_i)/h) (Y_i − c)².
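This local-constant characterization can be checked numerically: minimizing the kernel-weighted sum of squares over c recovers the closed-form Nadaraya-Watson value (a small sketch with simulated data; the point t, bandwidth, and grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(8 * x) - x + x**3 + rng.normal(0, 0.5, 40)

t, h = 0.5, 0.2
w = np.exp(-0.5 * ((t - x) / h) ** 2)   # Gaussian kernel weights K((t - x_i)/h)

# Closed-form Nadaraya-Watson value at t: weighted average of the Y_i
nw = np.sum(w * y) / np.sum(w)

# Brute-force minimization of sum_i K((t - x_i)/h) (Y_i - c)^2 over a fine grid
grid = np.linspace(y.min(), y.max(), 20001)
loss = ((y[None, :] - grid[:, None]) ** 2 * w[None, :]).sum(axis=1)
c_star = grid[np.argmin(loss)]
# nw and c_star agree up to the grid spacing
```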
Risk
For fixed x,

MSE(r_n(x)) = E[(r_n(x) − r(x))²] = (E[r_n(x)] − r(x))² + Var(r_n(x)).
As a global index, we consider

MISE(r_n) = E[∫ (r_n(x) − r(x))² dx] = ∫ (E[r_n(x)] − r(x))² dx + ∫ Var(r_n(x)) dx;
and the average mean squared error

AMSE(r_n) = (1/n) Σ_{i=1}^n E[(r_n(x_i) − r(x_i))²].
Cross Validation: choosing bandwidths
Take the AMSE as the criterion. We would like to choose h to minimize

AMSE(h) = (1/n) Σ_{i=1}^n E[(r_{n,h}(x_i) − r(x_i))²].
Since r is unknown, we need to estimate AMSE(h). As a first guess, one might think of the average residual sum of squares

(1/n) Σ_{i=1}^n (Y_i − r_{n,h}(x_i))².
This turns out to be a bad choice. It usually leads to undersmoothing (overfitting). The reason is that it favors estimates that are too well adapted to the data and are not reasonable for new observations.
Cross Validation: choosing bandwidths
We estimate the risk using the leave-one-out cross validation score, defined as

CV(h) = (1/n) Σ_{i=1}^n (Y_i − r_{n,h}^{(i)}(x_i))²,

where r_{n,h}^{(i)}(x_i) is the estimator based on {(x_j, Y_j), 1 ≤ j ≤ n, j ≠ i}, i.e. omitting the observation (x_i, Y_i).
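The definition can be implemented directly by refitting n times, once per left-out observation (a sketch using the Gaussian Nadaraya-Watson estimator; the data and bandwidth are illustrative):

```python
import numpy as np

def nw_fit(x, y, h, t):
    """Nadaraya-Watson fit at t with a Gaussian kernel."""
    w = np.exp(-0.5 * ((t - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def cv_score(x, y, h):
    """Leave-one-out CV score: refit without (x_i, Y_i), predict at x_i."""
    n = len(x)
    resid = [y[i] - nw_fit(np.delete(x, i), np.delete(y, i), h, x[i])
             for i in range(n)]
    return np.mean(np.square(resid))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(8 * x) - x + x**3 + rng.normal(0, 0.5, 60)
score = cv_score(x, y, h=0.1)
```

Unlike the in-sample residual sum of squares, each residual here is computed at a point the fit has never seen, so CV(h) does not automatically reward undersmoothing.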
Cross Validation: choosing bandwidths
In order to compute the CV score, there is no need to fit the curve n times. Let r_{n,h}(x) = Σ_{i=1}^n l_i(x) Y_i. Then CV(h) can be written as

CV(h) = (1/n) Σ_{i=1}^n ((Y_i − r_{n,h}(x_i)) / (1 − l_i(x_i)))².
Hence

h_cv = argmin_h CV(h) = argmin_h (1/n) Σ_{i=1}^n ((Y_i − r_{n,h}(x_i)) / (1 − l_i(x_i)))².
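For the Nadaraya-Watson estimator the weights l_i(x) are explicit, so the one-fit shortcut can be implemented and used to select h (a sketch; the Gaussian kernel, bandwidth grid, and simulated data are illustrative):

```python
import numpy as np

def smoother_weights(x, h):
    """Matrix L with L[i, j] = l_j(x_i) for the Gaussian NW smoother."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def cv_shortcut(x, y, h):
    """CV(h) from a single fit: mean(((Y_i - r(x_i)) / (1 - l_i(x_i)))^2)."""
    L = smoother_weights(x, h)
    fitted = L @ y                # r_{n,h}(x_i) for all i at once
    lii = np.diag(L)              # leverage l_i(x_i)
    return np.mean(((y - fitted) / (1.0 - lii)) ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(8 * x) - x + x**3 + rng.normal(0, 0.5, 60)

hs = np.linspace(0.01, 0.5, 50)
h_cv = hs[np.argmin([cv_shortcut(x, y, h) for h in hs])]
```

For this estimator the shortcut is exact: dropping observation i and renormalizing the remaining kernel weights gives r^{(i)}(x_i) = (r(x_i) − l_i(x_i)Y_i)/(1 − l_i(x_i)), which is precisely the identity behind the formula.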
An example: Pick-It Lottery
h = 1, 10, 20, 50.
[Four panels: scatterplots of Number against Payoff with fitted curves for h = 1, 10, 20, 50.]
An example: Pick-It Lottery
[Figure: scatterplot of Number against Payoff with the fitted regression curve.]
The curve suggests that there were larger payoffs for numbers in the interval [0, 100]. People tended to pick numbers starting with 2 and 3. This pattern disappeared after 1976. People noticed the pattern and changed their choices.
Lectures 10 – 12
Group Presentation (April 20)
Group 16
The data are the infant-mortality rates (infant deaths per 1000 live births) and GDP per capita (in U.S. dollars) for 193 countries in 2003.
Make a scatter plot of the data.
Estimate the regression function with different approaches. Give your comments.
Group Presentation (April 20)
Group 17
Download the dataset CMB from http://www.stat.cmu.edu/~larry/all-of-nonpar/data.html. Consider power as the response variable and Multipole as the covariate. Fit a model based on the first 400 observations:
Make a scatter plot of the data. Consider the Nadaraya-Watson estimator. Use the CV(h) score to choose h. Present your estimate.
Repeat the procedure above, but for the whole data set.
Group Presentation (April 20)
Group 18
Consider the following model:

Y_i = r(x_i) + ε_i,

where r(x) = x² − 2x, x ∈ [0, 2], and the ε_i are i.i.d. from N(0, 0.5²).

Simulate one sample {(x_i, Y_i), i = 1, ..., 100}. You can choose x_i = i/50. Fit a Nadaraya-Watson estimator to this data. Choose your kernel and h.
Estimate MSE(r_n(1)) by simulation. Hint: simulate many samples from the model and use the sample counterpart as the estimator.
Repeat step 2 for a different h. Compare the results and give your comments.