Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Nonparametric Density Estimation(one dimension)
Härdle, Müller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction
Nonparametric kernel density estimation
Tine Buch-Kromann
February 7, 2007
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Derivation
Idea of the histogram:
1
n · interval length#{obs that fall into a small interval containing x}
Idea of the kernel density estimator:
1
n · interval length#{obs that fall into a small interval around x}
Note: That the estimator does not depends on a origin of a bin grid.
When we want to estimate the density in x , we consider theinterval [x − h, x + h):
f̂h(x) =1
2hn#{Xi ∈ [x − h, x + h)}
Note: Interval length is 2h.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Derivation
f̂h(x) =1
2hn#{Xi ∈ [x − h, x + h)}
=1
2hn
n∑
i=1
I (|x − Xi | ≤ h)
=1
hn
n∑
i=1
1
2I
(∣∣∣∣x − Xi
h
∣∣∣∣ ≤ 1)
=1
hn
n∑
i=1
K
(x − Xi
h
)
where K (u) = 12 I (|u| ≤ 1) is the uniform kernel function, whichassigns weight 1/2 to each observation in the interval around x .Points outside the interval assigns the weight 0.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Derivation
Improvement:More weight to observations very close to x and less weight toobservations farther away x (The Epanechnikov kernel function)
K (u) =3
4(1− u2)I (|u| ≤ 1)
Kernel K (u)
Uniform 12 I (|u| ≤ 1)Triangle (1− |u|)I (|u| ≤ 1)Epanechnikov 34(1− u2)I (|u| ≤ 1)Quartic 1516(1− u2)2I (|u| ≤ 1)Triweight 3532(1− u2)3I (|u| ≤ 1)Gaussian 1√
2πexp(−12u2)
Cosine π4 cos(
π2 u
)I (|u| ≤ 1)
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Derivation
Kernel Density EstimatorGeneral form of the kernel density estimator of a probabilitydensity f , based on a sample X1, ...,Xn
f̂h(x) =1
n
n∑
i=1
Kh(x − Xi )
where Kh(·) = 1hK (·/h) and h is the bandwidth.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Bandwidth
Bandwidth, h:h controls the smoothness of the estimate (similar to thehistogram) and the choice of h is a crucial problem.
The problem of how to determine the value of h is a reasonableway is handled later.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
The Kernel Function
Properties:Kernel functions are usually probability density function:
∫K (u) du = 1
K (u) ≥ 0 ∀u in the domain of K
Consequences:The kernel density estimator is a pdf
∫f̂h(x) dx = 1
Moreover, f̂h(x) inherits all the continuity and differentiabilityproperties of K : If K is ν times continuously differentiable thenalso f̂h(x) is ν times cont. diff.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
The Kernel Function
The smoothness of f̂h(x) (for the same value of h) depends ofkernel function.
...even if both kernel functions are continuous
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Kernel Density Estimator as a Sum of Bumps
An alternative view of the kernel density estimation.Rescaled kernel function
1
nhK
(x − Xi
h
)=
1
nKh (x − Xi )
Note: The area under the rescaled kernel function is∫
1
nhK
(x − Xi
h
)dx =
1
nh
∫K (u)h du
=1
nhh
∫K (u) du
=1
n
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Rescaled kernel function
Rewrite the kernel density function as the sum of rescaledkernel functions
f̂h(x) =1
n
n∑
i=1
Kh(x − Xi ) =n∑
i=1
1
nhK
(x − Xi
h
)
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: Bias
Bias:
Bias(f̂h(x)) =h2
2f ′′(x)µ2(K ) + o(h2), h → 0
where µ2(K ) =∫
s2K (s) ds.
The bias is proportional to h2:Choose a small h to reduce the bias.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: Bias
Bias depends on f ′′(x) (curvature):In peaks of f : The bias < 0 since f ′′ < 0 around a local maximumof f .In ”valleys” of f : The bias > 0 since f ′′ > 0 around a local mini-mum of f .The magnitude of the bias depends of the absolute value of f ′′.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: Variance
Variance:
V(f̂h(x)) =1
nh||K ||22f (x) + o
(1
nh
), nh →∞
where ||K ||22 =∫
K 2(s) ds, the squared L2 norm of K .
The variance is proportional to (nh)−1:
I Choose large h to reduce the variance.I Increase n to reduce the variance.I The variance increases in ||K ||22: Flat kernels reduce the
variance.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: MSE
Trade-off between bias and variance:Increasing h will lower the variance but increases the bias and viceversa.
Minimizing the MSE is a compromise between over- andundersmoothing.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: MSE
Mean Squared Error:
MSE(f̂h(x)) =h4
4f ′′(x)2µ2(K )2 +
1
nh||K ||22f (x) + o(h4) + o
(1
nh
)
Note: MSE → 0 as h → 0 and nh →∞,ie. the kernel density estimator is a consistent estimator.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: MISE and AMISE
MISE: Mean Integrated Squared Error
MISE(f̂h) =1
nh||K ||22 +
h4
4µ2(K )
2||f ′′||22 + o(
1
nh
)+ o(h4),
as h → 0, nh →∞.AMISE: Approx. formula for MISE, ignoring higher order terms
AMISE(f̂h) =1
nh||K ||22 +
h4
4µ22(K )||f ′′||22
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: Optimal bandwidth
Bandwidth: Optimal wrt. AMISE
hopt =
( ||K ||22||f ′′||22 µ2(K )2 n
)1/5∼ n−1/5
Depends on ||f ′′||22 - unknown.Convergence of AMISE:
AMISE(f̂hopt ) =5
4
(||K ||22)4/5 (
µ2(K )||f ′′||2)2/5
n−4/5 ∼ n−4/5
Note: AMISE converges at the rate n−4/5.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Statistical Properties: Comparison with the histogram
AMISE: Histogram
AMISE(f̂h) =1
nh+
h2
12||f ′||22
Bandwidth: Optimal wrt. AMISE (histogram)
h0 =
(6
n||f ′||22
)1/3∼ n−1/3
AMISE converges at the rate n−2/3 in the histogram.
Slower rate of convergence in the histogram compared to thekernel density estimator.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Bandwidth selection
The two most frequently used method of banwidth selection:
I The plug-in method,I Cross-validation methods.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Bandwidth selection: Silverman’s Rule of Thumb
Plug-in methods:Replace unknown parameters with estimates.
AMISE optimal bandwidth
hopt =
( ||K ||22||f ′′||22 µ2(K )2 n
)1/5
Unknown parameter is ||f ′′||22.Assume f is a normal(µ, σ2)-distribution, then
||f ′′||22 = σ−53
8√
π≈ 0.212σ−5
Replace the σ with σ̂.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Bandwidth selection: Silverman’s Rule of Thumb
Choose kernel function: Gaussian kernel.
Then the ”Rule-of-Thumb” bandwidth
ĥrot =
(||ϕ||22
||f̂ ′′||22 µ22(ϕ) n
)1/5
=
(4σ̂5
3n
)1/5
≈ 1.06σ̂n−1/5
Applicable formula for bandwidth selection.If X is normally distributed, then ĥrot gives the optimal bandwidth.If X is not normally distributed, then ĥrot will give a bandwidth nottoo far from the optimal bandwidth, if the distribution of X is nottoo different from the normal distribution.
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Bandwidth selection: Silverman’s Rule of Thumb
Practical problem:The Rule-of-Thumb bandwidths is sensitive to outliers:A single outlier may cause a too large estimator of σ and hence atoo large bandwidth.
Robust estimator:Calculate
R = X[0.75n]︸ ︷︷ ︸75%−quantile
− X[0.25n]︸ ︷︷ ︸25%−quantile
We assume X ∼ N(µ, σ2) and Z ∼ N(0, 1), thereforeR = X[0.75n] − X[0.25n]
= (µ + σZ[0.75n])− (µ + σZ[0.25n])= σ(Z[0.75n] − Z[0.25n])≈ σ(0.67− (−0.67))= 1.34σ
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)
Bandwidth selection: Silverman’s Rule of Thumb
Therefore
σ̂ =R
1.34
Plug it into the ”Rule-of-Thumb” bandwidth
ĥrot ≈ = 1.06σ̂n−1/5
= 1.06R
1.34n−1/5
≈ 0.79R̂n−1/n
”Better-Rule-of-Thumb”:Combine the fist and the robust Rule-of-Thumb
ĥrot = 1.06 min
(σ̂,
R
1.34
)n−1/5
Nonparametric kernel density estimation Nonparametric Density Estimation (one dimension)