Introduction to Kernel Smoothing

[Figure: density plot; x-axis: Wilcoxon score (700 to 1300), y-axis: Density (0.000 to 0.006)]

M. P. Wand & M. C. Jones

Kernel Smoothing

Monographs on Statistics and Applied Probability

Chapman & Hall, 1995.

Stefanie Scheid - Introduction to Kernel Smoothing - January 5, 2004 1

Introduction

[Figure: Histogram of some p-values; x-axis: p-values (0.0 to 1.0), y-axis: Density (0 to 12)]

Introduction

– Estimation of functions such as regression functions or probability density functions.

– Kernel-based methods are the most popular non-parametric estimators.

– Can uncover structural features in the data which a parametric approach might not reveal.

Univariate kernel density estimator

Given a random sample X_1, ..., X_n from a continuous, univariate density f, the kernel density estimator is

$$\hat{f}(x, h) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h} \right)$$

with kernel K and bandwidth h. Under mild conditions (h must decrease with increasing n, but slowly enough that nh still grows without bound) the kernel estimate converges in probability to the true density.
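As an illustration, the estimator can be written out directly. This is a minimal Python sketch that assumes a Gaussian kernel; the names gaussian_kernel and kde and the sample values are hypothetical and do not come from the slides.

```python
import math


def gaussian_kernel(u):
    """Standard normal pdf, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate f_hat(x, h) = 1/(n h) * sum_i K((x - X_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)


# Illustrative (made-up) sample and bandwidth.
sample = [1.2, 1.9, 2.4, 3.1, 4.8]
print(kde(2.0, sample, h=0.5))
```

Because the kernel is a pdf, the resulting estimate is non-negative and integrates to one.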

The kernel K

– Can be a proper pdf. Usually chosen to be unimodal and symmetric about zero.

⇒ Center of kernel is placed right over each data point.

⇒ Influence of each data point is spread about its neighborhood.

⇒ Contribution from each point is summed to overall estimate.

[Figure: Gaussian kernel density estimate built from ten data points, shown as dots; x-axis 0 to 15, y-axis 0.0 to 1.5]

The bandwidth h

– Scaling factor.

– Controls how wide the probability mass is spread around a point.

– Controls the smoothness or roughness of a density estimate.

⇒ Bandwidth selection carries the risk of under- or oversmoothing.
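One way to see the effect of h is to count the local maxima of the estimate for several bandwidths. The sketch below assumes a Gaussian kernel and a simple grid search; the data, the grid limits and the helper count_modes are made up for illustration.

```python
import math


def kde(x, sample, h):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def count_modes(sample, h, lo=-5.0, hi=15.0, step=0.01):
    """Count strict local maxima of the estimate on an evenly spaced grid."""
    m = int(round((hi - lo) / step)) + 1
    values = [kde(lo + i * step, sample, h) for i in range(m)]
    return sum(
        1
        for i in range(1, m - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


sample = [1.0, 1.2, 1.9, 5.0, 5.3, 9.0]  # made-up data
for h in (0.05, 0.5, 10.0):
    # Small h: one bump per point (undersmoothing); large h: a single hump.
    print(h, count_modes(sample, h))
```

A very small h leaves one spike per observation, while a very large h merges everything into a single broad hump.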

From over- to undersmoothing

[Figure: KDE with b=0.1; x-axis: p-values (0.0 to 1.0), y-axis: Density (0 to 12)]

From over- to undersmoothing

[Figure: KDE with b=0.05; x-axis: p-values (0.0 to 1.0), y-axis: Density (0 to 12)]

From over- to undersmoothing

[Figure: KDE with b=0.02; x-axis: p-values (0.0 to 1.0), y-axis: Density (0 to 12)]

From over- to undersmoothing

[Figure: KDE with b=0.005; x-axis: p-values (0.0 to 1.0), y-axis: Density (0 to 12)]

Some kernels

[Figure: five panels showing the Uniform, Epanechnikov, Biweight, Gauss and Triangular kernels]

Some kernels

$$K(x, p) = \frac{(1 - x^2)^p}{2^{2p+1} B(p+1, p+1)} \, \mathbf{1}_{\{|x| < 1\}}$$

with B(a, b) = Γ(a)Γ(b)/Γ(a + b).

– p = 0: Uniform kernel.

– p = 1: Epanechnikov kernel.

– p = 2: Biweight kernel.
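The three special cases can be checked numerically. The following Python sketch evaluates the family as defined above; the function names beta_fn and kernel_family are hypothetical.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def kernel_family(x, p):
    """K(x, p) = (1 - x^2)^p / (2^(2p+1) B(p+1, p+1)) for |x| < 1, else 0."""
    if abs(x) >= 1.0:
        return 0.0
    return (1.0 - x * x) ** p / (2.0 ** (2 * p + 1) * beta_fn(p + 1, p + 1))


# Heights at zero: 1/2 (uniform), 3/4 (Epanechnikov), 15/16 (biweight).
for p in (0, 1, 2):
    print(p, kernel_family(0.0, p))
```

The normalizing constant 2^(2p+1) B(p+1, p+1) is what makes each member of the family integrate to one on (-1, 1).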

Kernel efficiency

– Performance of a kernel is measured by MISE (mean integrated squared error) or AMISE (asymptotic MISE).

– The Epanechnikov kernel minimizes AMISE and is therefore optimal.

– Kernel efficiency is measured in comparison to the Epanechnikov kernel.

Kernel         Efficiency
Epanechnikov   1.000
Biweight       0.994
Triangular     0.986
Normal         0.951
Uniform        0.930

⇒ Choice of kernel is not as important as choice of bandwidth.
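The efficiencies listed above can be reproduced numerically. The sketch below assumes the usual definition of efficiency relative to the Epanechnikov kernel, (R(K*) / R(K)) · sqrt(μ2(K*) / μ2(K)), with R(K) = ∫ K(u)² du and μ2(K) = ∫ u² K(u) du; the slides do not spell this definition out, and the helper names are hypothetical.

```python
import math


def uniform(u):
    return 0.5 if abs(u) < 1 else 0.0


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def biweight(u):
    return 15 / 16 * (1 - u * u) ** 2 if abs(u) < 1 else 0.0


def triangular(u):
    return 1 - abs(u) if abs(u) < 1 else 0.0


def normal(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def integrate(g, lo, hi, cells=20000):
    """Midpoint rule on [lo, hi]."""
    step = (hi - lo) / cells
    return step * sum(g(lo + (i + 0.5) * step) for i in range(cells))


def cost(kernel, lo, hi):
    """R(K) * sqrt(mu_2(K)); smaller is better."""
    roughness = integrate(lambda u: kernel(u) ** 2, lo, hi)
    second_moment = integrate(lambda u: u * u * kernel(u), lo, hi)
    return roughness * math.sqrt(second_moment)


def efficiency(kernel, support):
    """Efficiency relative to the Epanechnikov kernel (assumed definition)."""
    return cost(epanechnikov, -1.0, 1.0) / cost(kernel, *support)


# The normal kernel is integrated over (-10, 10), where its tails are negligible.
KERNELS = {
    "Epanechnikov": (epanechnikov, (-1.0, 1.0)),
    "Biweight": (biweight, (-1.0, 1.0)),
    "Triangular": (triangular, (-1.0, 1.0)),
    "Normal": (normal, (-10.0, 10.0)),
    "Uniform": (uniform, (-1.0, 1.0)),
}
for name, (kernel, support) in KERNELS.items():
    print(f"{name:<13}{efficiency(kernel, support):.3f}")
```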

Modified KDEs

– Local KDE: Bandwidth depends on x.

– Variable KDE: Smooth out the influence of points in sparse regions.

– Transformation KDE: If f is difficult to estimate (highly skewed, high kurtosis), transform data to gain a pdf that is easier to estimate.
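A transformation KDE might look as follows. This sketch assumes positive, right-skewed data and a log transformation, which is only one possible choice and is not prescribed by the slides; the function log_transform_kde and the data are hypothetical.

```python
import math


def kde(y, sample, h):
    """Gaussian-kernel density estimate at y with bandwidth h."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((y - yi) / h) ** 2) for yi in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def log_transform_kde(x, sample, h):
    """Estimate the density of positive data via a KDE on the log scale.

    If Y = log(X) has density g, then X has density g(log x) / x for x > 0.
    """
    if x <= 0:
        return 0.0
    logs = [math.log(xi) for xi in sample]
    return kde(math.log(x), logs, h) / x


sample = [0.3, 0.5, 0.8, 1.1, 1.7, 2.9, 6.4, 15.0]  # made-up, right-skewed
print(log_transform_kde(1.0, sample, h=0.6))
```

The division by x is the Jacobian of the back-transformation; without it the result would not integrate to one on the original scale.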

Bandwidth selection

– Simple versus high-tech selection rules.

– Objective function: MISE/AMISE.

– The R function density() offers several selection rules.
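For reference, the two objective functions can be written out. The block below uses the standard forms for a second-order kernel and a sufficiently smooth f; the notation R(g) = ∫ g(x)² dx and μ2(K) = ∫ x² K(x) dx is an assumption, since the slides do not define it.

```latex
\mathrm{MISE}\{\hat{f}(\cdot, h)\} = E \int \{\hat{f}(x, h) - f(x)\}^2 \, dx

\mathrm{AMISE}\{\hat{f}(\cdot, h)\} = \frac{R(K)}{nh} + \frac{1}{4} h^4 \mu_2(K)^2 R(f'')

h_{\mathrm{AMISE}} = \left[ \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right]^{1/5}
```

The first AMISE term is the integrated variance and the second the integrated squared bias. The AMISE-optimal bandwidth depends on the unknown R(f''), which a selection rule has to assume or estimate.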

bw.nrd0, bw.nrd

– Normal scale rule.

– Assumes f to be normal and calculates the AMISE-optimal bandwidth in this setting.

– A first guess, but oversmooths if f is multimodal or otherwise not normal.
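The two rules can be sketched in a few lines of Python. The formulas follow what the R documentation describes for bw.nrd0 (factor 0.9) and bw.nrd (factor 1.06), both for a Gaussian kernel; this is a re-implementation for illustration, not R's code, and it omits the fallback R applies when the spread estimate is zero. The helper names are hypothetical.

```python
import math


def quantile_type7(sorted_x, prob):
    """Linear-interpolation quantile (the default 'type 7' of R's quantile)."""
    pos = (len(sorted_x) - 1) * prob
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(sorted_x) - 1)
    return sorted_x[lower] + (pos - lower) * (sorted_x[upper] - sorted_x[lower])


def normal_scale_bandwidth(x, factor):
    """factor * min(sd, IQR / 1.34) * n^(-1/5); needs at least two values."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    xs = sorted(x)
    iqr = quantile_type7(xs, 0.75) - quantile_type7(xs, 0.25)
    return factor * min(sd, iqr / 1.34) * n ** (-0.2)


def bw_nrd0(x):
    return normal_scale_bandwidth(x, 0.9)  # Silverman's rule of thumb


def bw_nrd(x):
    return normal_scale_bandwidth(x, 1.06)  # Scott's variant


x = [1, 2, 3, 4, 5]  # made-up data
print(bw_nrd0(x), bw_nrd(x))
```

Using the smaller of the standard deviation and the scaled interquartile range makes the rule less sensitive to outliers and heavy tails.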

bw.ucv

– Unbiased (or least squares) cross-validation.

– Estimates part of MISE by leave-one-out KDE and minimizes this estimator with respect to h.

– Problems: Several local minima, high variability.
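The criterion can be sketched for a Gaussian kernel, where the integral of the squared estimate has a closed form. This is a direct O(n²) illustration with a plain grid search, not the algorithm used by R's bw.ucv; the function names, data and candidate grid are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def ucv_score(sample, h):
    """Least squares cross-validation criterion for a Gaussian kernel.

    UCV(h) = integral of f_hat^2 - (2/n) * sum_i f_hat_{-i}(X_i),
    where f_hat_{-i} is the estimate computed without X_i.
    """
    n = len(sample)
    conv_sum = 0.0  # all pairs: (K*K)(d/h), where K*K is the N(0, 2) density
    loo_sum = 0.0   # pairs with i != j: K(d/h)
    for i in range(n):
        for j in range(n):
            u = (sample[i] - sample[j]) / h
            conv_sum += math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))
            if i != j:
                loo_sum += math.exp(-0.5 * u * u) / SQRT_2PI
    integral_sq = conv_sum / (n * n * h)
    loo_term = 2.0 * loo_sum / (n * (n - 1) * h)
    return integral_sq - loo_term


def select_ucv(sample, candidates):
    """Return the candidate bandwidth with the smallest UCV score."""
    return min(candidates, key=lambda h: ucv_score(sample, h))


sample = [2.1, 2.9, 3.4, 3.8, 4.1, 4.6, 5.0, 5.7, 6.3, 7.8]  # made-up data
grid = [0.05 * k for k in range(1, 61)]
print(select_ucv(sample, grid))
```

A grid search returns the global minimum over the candidates, which sidesteps the local minima mentioned above but not the high variability of the criterion.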

bw.bcv

– Biased cross-validation.

– Estimation is based on optimizing AMISE instead of MISE (which bw.ucv targets).

– Lower variance than bw.ucv, but some bias in exchange.

bw.SJ(method=c("ste", "dpi"))

– The AMISE optimization involves the estimation of density functionals like integrated squared density derivatives.

– dpi: Direct plug-in rule. Estimates the needed functionals by KDE. Problem: Choice of pilot bandwidth.

– ste: Solve-the-equation rule. The pilot bandwidth depends on h.
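The plug-in idea can be sketched as follows. The code assumes a Gaussian kernel for both the final and the pilot estimate and takes the pilot bandwidth as a user-supplied value, so it shows only the plug-in step and is not the algorithm implemented in R's bw.SJ. The functional ψ4 = ∫ f''(x)² dx is estimated by a kernel estimator built from the fourth derivative of the normal pdf; all function names and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_pdf_4th_derivative(u):
    """Fourth derivative of the standard normal pdf."""
    return (u ** 4 - 6.0 * u ** 2 + 3.0) * normal_pdf(u)


def estimate_psi4(sample, pilot):
    """Kernel estimate of psi_4 = integral of f''(x)^2 dx with pilot bandwidth."""
    n = len(sample)
    total = sum(
        normal_pdf_4th_derivative((xi - xj) / pilot)
        for xi in sample
        for xj in sample
    )
    return total / (n * n * pilot ** 5)


def plug_in_bandwidth(sample, pilot):
    """AMISE-optimal h (Gaussian kernel) with psi_4 replaced by its estimate."""
    n = len(sample)
    roughness = 1.0 / (2.0 * math.sqrt(math.pi))  # R(K) for the Gaussian kernel
    second_moment = 1.0                           # mu_2(K) for the Gaussian kernel
    psi4 = estimate_psi4(sample, pilot)
    if psi4 <= 0:
        raise ValueError("psi_4 estimate is not positive; try another pilot")
    return (roughness / (second_moment ** 2 * psi4 * n)) ** 0.2


random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]  # simulated data
print(plug_in_bandwidth(sample, pilot=0.5))
```

The result changes with the pilot value, which is the problem noted for dpi; the ste rule instead ties the pilot bandwidth to h and solves for h.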

Comparison of bandwidth selectors

– Simulation results depend on selected true densities.

– Selectors with pilot bandwidths perform quite well but rely on asymptotics ⇒ less accurate for densities with “sharp features” (e.g. multiple modes).

– UCV has high variance but does not depend on asymptotics.

– BCV performs badly in several simulations.

– Authors’ recommendation: DPI or STE better than UCV or BCV.

KDE with Epanechnikov kernel and DPI rule

[Figure: kernel density estimate; x-axis: p-values (0.0 to 1.0), y-axis: Density (0 to 12)]
