Robust Statistics Osnat Goren-Peyser 25.3.08




Page 1

Robust Statistics

Osnat Goren-Peyser, 25.3.08

Page 2

Outline

1. Introduction
2. Motivation
3. Measuring robustness
4. M estimators
5. Order statistics approaches
6. Summary and conclusions

Page 3

1. Introduction

Page 4

Problem definition

Let $x_1, x_2, \ldots, x_n$ be a set of i.i.d. variables with distribution F, sorted in increasing order so that $x_1 \le x_2 \le \cdots \le x_n$, where $x_m$ is the value of the m'th order statistic.

An estimator $\hat\theta = \hat\theta(x_1, x_2, \ldots, x_n)$ is a function of the observations. We are looking for estimators $\hat\theta$ of the unknown parameter θ.

Page 5

Asymptotic value of an estimator

Define $\hat\theta_\infty(F)$ such that $\hat\theta_n \xrightarrow{p} \hat\theta_\infty(F)$, where $\hat\theta_\infty(F)$ is the asymptotic value of the estimator at F.

An estimator $\hat\theta_n$ is consistent for θ if $\hat\theta_n \xrightarrow{p} \theta$.

We say that $\hat\theta_n$ is asymptotically normal with parameters (θ, V(θ)) if $\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, V(\theta))$.

Page 6

Efficiency

An unbiased estimator $\hat\theta$ is efficient if $\operatorname{var}(\hat\theta)\, I(\theta) = 1$, where $I(\theta)$ is the Fisher information.

An unbiased estimator is asymptotically efficient if $\lim_{n\to\infty} n \operatorname{var}(\hat\theta_n)\, I(\theta) = 1$.

Page 7

Relative efficiency

For a fixed underlying distribution, consider two unbiased estimators $\hat\theta_1$ and $\hat\theta_2$ of θ. We say $\hat\theta_1$ is more efficient than $\hat\theta_2$ if $\operatorname{var}(\hat\theta_1) < \operatorname{var}(\hat\theta_2)$.

The relative efficiency (RE) of estimator $\hat\theta_2$ with respect to $\hat\theta_1$ is defined as the ratio of their variances:

$RE(\hat\theta_2; \hat\theta_1) = \operatorname{var}(\hat\theta_1) / \operatorname{var}(\hat\theta_2)$

Page 8

Asymptotic relative efficiency

The asymptotic relative efficiency (ARE) is the limit of the RE as the sample size n → ∞. For two estimators $\hat\theta_1, \hat\theta_2$ which are each consistent for θ and also asymptotically normal [1]:

$ARE(\hat\theta_2; \hat\theta_1) = \lim_{n\to\infty} \operatorname{var}(\hat\theta_1) / \operatorname{var}(\hat\theta_2)$

Page 9

The location model: xi = μ + ui, i = 1, 2, …, n

where μ is the unknown location parameter, ui are the errors, and xi are the observations.

The errors ui are i.i.d. random variables, each with the same distribution function F0. The observations xi are therefore i.i.d. random variables with common distribution function F(x) = F0(x − μ).

Page 10

Normality

Classical statistical methods rely on the assumption that F is exactly known, and the assumption that $F = N(\mu, \sigma^2)$ is commonly used. But normality often holds only approximately, which motivates robust methods.

Approximate normality: the majority of observations are normally distributed, while some observations follow a different pattern (not normal) or no pattern at all. Suggested model: a mixture model.

Page 11

A mixture model

Formalizes the idea of F being approximately normal. Assume that a proportion 1 − ε of the observations is generated by the normal model, while a proportion ε is generated by an unknown model.

The mixture model: F = (1 − ε)G + εH. F is a contamination "neighborhood" of G and is also called the gross error model. F is called a normal mixture model when both G and H are normal.

Page 12

2. Motivation

Page 13

Outliers

An outlier is an atypical observation that is well separated from the bulk of the data. Statistics derived from data sets that include outliers will often be misleading, and even a single outlier can have a large distorting influence on classical statistical methods.

[Figure: histograms of a sample without outliers and of the same sample with a cluster of outliers far from the bulk.]

Estimators not sensitive to outliers are said to be robust.

Page 14

Mean and standard deviation

The sample mean, a classical estimate of the location (center) of the data, is defined by

$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$

For $F = N(\mu, \sigma^2)$, the sample mean is unbiased, with $\bar x \sim N(\mu, \sigma^2/n)$.

The sample standard deviation (SD), a classical estimate of the dispersion of the data, is defined by

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2}$

How much influence can a single outlier have on these classical estimators?

Page 15

Example 1 – the flour example

Consider the following 24 determinations of the copper content in wholemeal flour (in parts per million), sorted in ascending order [6]:

2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03, 3.10, 3.37, 3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70, 3.70, 3.77, 5.28, 28.95

The value 28.95 is considered an outlier. Two cases:

Case A – taking into account the whole data
Case B – deleting the suspicious outlier

Page 16

Example 1 – PDFs

[Figure: empirical distributions of the observation values with the sample mean marked. Case A – using the whole data for estimation: $\bar x = 4.28$, $s = 5.30$; the outlier drags the mean away from the bulk. Case B – deleting the outlier: $\bar x = 3.21$, $s = 0.69$; the mean sits in the bulk of the data.]

Page 17

Example 1 – arising question

Question: how much influence can a single outlier have on the sample mean and sample SD? Assume the outlier value 28.95 is replaced by an arbitrary value varying from −∞ to +∞:

The value of the sample mean changes from −∞ to +∞.
The value of the sample SD grows without bound (to +∞).

Conclusion: a single outlier has an unbounded influence on these two classical estimators! This is related to the sensitivity curve and the influence function, as we will see later.
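The numbers quoted in this example can be reproduced directly; a minimal check in Python using the flour data listed above:

```python
import statistics

# 24 copper determinations (ppm), including the suspicious value 28.95
flour = [2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03,
         3.10, 3.37, 3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70,
         3.70, 3.77, 5.28, 28.95]

# Case A: the whole data; Case B: the outlier removed
case_a, case_b = flour, flour[:-1]

for name, x in [("A", case_a), ("B", case_b)]:
    print(name,
          round(statistics.mean(x), 2),    # sample mean
          round(statistics.stdev(x), 2),   # sample SD
          round(statistics.median(x), 3))  # sample median (used later)
```

The single value 28.95 moves the mean from 3.21 to 4.28 and inflates the SD from 0.69 to 5.30, while the median barely changes.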

Page 18

Handling outliers – approaches

Detect and remove outliers from the data set:
- Manual screening
- The normal Q-Q plot
- The "three-sigma edit" rule

Robust estimators!

Page 19

Manual screening

Why is screening the data and removing outliers not sufficient?
- Users do not always screen the data.
- Outliers are not always errors! They may be correct, and very important for seeing the whole picture, including extreme cases.
- It can be very difficult to spot outliers in multivariate or highly structured data.
- It is a subjective decision without any unified criterion: different users, different results.
- It is difficult to determine the statistical behavior of the complete procedure.

Page 20

The normal Q-Q plot

A manual screening tool for an underlying normal distribution: a quantile-quantile plot of the sample quantiles of X versus theoretical quantiles from a normal distribution. If the distribution of X is normal, the plot will be close to linear.

Page 21

The "three-sigma edit" rule

An outlier detection tool for an underlying normal distribution. Define the ratio between the distance of xi to the sample mean and the sample SD:

$t_i = \frac{x_i - \bar x}{s}$

The "three-sigma edit rule": observations with |ti| > 3 are deemed suspicious.

Example 1 – the largest observation in the flour data has ti = 4.65, and so is suspicious.

Disadvantages:
- In very small samples the rule is ineffective.
- Masking: when there are several outliers, their effects may interact in such a way that some or all of them remain unnoticed.
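The rule is easy to state in code; a small sketch (the function name is mine) applied to the flour data:

```python
import statistics

def three_sigma_outliers(xs, cutoff=3.0):
    """Flag observations whose |t_i| = |x_i - mean| / SD exceeds the cutoff."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) / s > cutoff]

flour = [2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03,
         3.10, 3.37, 3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70,
         3.70, 3.77, 5.28, 28.95]

# Only the largest observation (t ≈ 4.65) is flagged; 5.28 survives the rule.
print(three_sigma_outliers(flour))
```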

Page 22

Example 2 – velocity of light

Consider the following 20 determinations of the time (in microseconds) needed for light to travel a distance of 7442 m [6]:

28, 26, 33, 24, 34, -44, 27, 16, 40, -2, 29, 22, 24, 21, 25, 30, 23, 29, 31, 19

The actual times are the table values × 0.001 + 24.8. The values −2 and −44 are suspicious as outliers.

Page 23

Example 2 – QQ plot

[Figure: "QQ Plot of Sample Data versus Standard Normal" – quantiles of the input sample (about 24.75 to 24.84) against standard normal quantiles; the outliers −2 and −44 fall far off the line.]

Page 24

Example 2 – masking

Results, based on the three-sigma edit rule:

$t_i(x_i = -2) \approx -1.35, \qquad t_i(x_i = -44) \approx -3.73$

The value of |ti| for the observation −2 does not indicate that it is an outlier: the value −44 "masks" the value −2.
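The masking effect can be reproduced numerically, using the 20 velocity-of-light determinations from the previous slide:

```python
import statistics

light = [28, 26, 33, 24, 34, -44, 27, 16, 40, -2,
         29, 22, 24, 21, 25, 30, 23, 29, 31, 19]

m, s = statistics.mean(light), statistics.stdev(light)

# t_i for the two suspicious observations
t = {x: (x - m) / s for x in (-2, -44)}
print({k: round(v, 2) for k, v in t.items()})
# -44 inflates the SD so much that -2 stays inside the three-sigma band
```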

Page 25

Detect and remove outliers

There are many other methods for detecting outliers, but deleting an outlier poses a number of problems:
- It affects the distribution theory, underestimating the data variability.
- It depends on the user's subjective decisions.
- It is difficult to determine the statistical behavior of the complete procedure.

Robust estimators provide automatic ways of detecting and removing outliers.

Page 26

Example 1 – Comparing the sample median to the sample mean

[Figure: Case C – testing the median estimator. The data with the sample mean and sample median of Cases A and B marked.]

- Case A: med = 3.3850. Case B: med = 3.3700.
- The sample median fits the bulk of the data in both cases.
- The value of the sample median does not change from −∞ to +∞ as was the case for the sample mean.
- The sample median is a good robust alternative to the sample mean.

Page 27

Robust alternative to the mean

The sample median is a very old method for estimating the "middle" of the data. It is defined for some integer m by

$\operatorname{Med}(x) = \begin{cases} x_{(m)} & \text{if } n \text{ is odd}, \; n = 2m - 1 \\ \tfrac{1}{2}\left(x_{(m)} + x_{(m+1)}\right) & \text{if } n \text{ is even}, \; n = 2m \end{cases}$

For large n and $F = N(\mu, \sigma^2)$, the sample median is approximately $N\!\left(\mu, \frac{\pi\sigma^2}{2n}\right)$.

At the normal distribution: ARE(median; mean) = 2/π ≈ 64%.

Page 28

Effect of a single outlier

The sample mean can be upset completely by a single outlier, while the sample median is little affected by it. The median is resistant to gross errors whereas the mean is not: the median will tolerate up to 50% gross errors before it can be made arbitrarily large.

Breakdown point:
- Median: 50%
- Mean: 0%

Page 29

Mean & median – robustness vs. efficiency

For the mixture model $F = (1-\varepsilon)N(\mu, 1) + \varepsilon N(\mu, \sigma^2)$:

The sample mean variance is $\frac{1 - \varepsilon + \varepsilon\sigma^2}{n}$.

The sample median variance is approximately $\frac{\pi}{2n}\left[(1-\varepsilon) + \varepsilon/\sigma\right]^{-2}$.

The gain in robustness due to using the median is paid for by a loss in efficiency when F is very close to normal.
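This trade-off can be checked by simulation under the gross-error model F = (1−ε)N(0,1) + εN(0,σ²); a minimal Monte Carlo sketch (ε = 0.1 and σ = 10 are my choices of parameters):

```python
import random
import statistics

random.seed(0)

def sample_mixture(n, eps, sigma=10.0):
    """Draw n points from F = (1 - eps) N(0,1) + eps N(0, sigma^2)."""
    return [random.gauss(0.0, sigma if random.random() < eps else 1.0)
            for _ in range(n)]

def mc_variance(estimator, eps, n=100, reps=2000):
    """Monte Carlo variance of an estimator over repeated mixture samples."""
    return statistics.pvariance(
        [estimator(sample_mixture(n, eps)) for _ in range(reps)])

results = {}
for eps in (0.0, 0.1):
    results[eps] = (mc_variance(statistics.mean, eps),
                    mc_variance(statistics.median, eps))
    v_mean, v_med = results[eps]
    print(f"eps={eps}: n*var(mean)={100*v_mean:.2f}, n*var(median)={100*v_med:.2f}")
```

At ε = 0 the mean wins (n·var ≈ 1 against ≈ π/2 for the median); at ε = 0.1 the median wins by a wide margin, since n·var(mean) ≈ 1 − ε + εσ² ≈ 10.9.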

Page 30

So why not always use the sample median?

If the data do not contain outliers, the sample median has poorer statistical performance than the classical sample mean.

Robust estimation goal – "the best of both worlds": we shall develop estimators which combine the low variance of the mean at the normal with the robustness of the median under contamination.

Page 31

3. Measuring robustness

Page 32

Analysis tools

- Sensitivity curve (SC)
- Influence function (IF)
- Breakdown point (BP)

Page 33

Sensitivity curve

The SC measures the effect of different locations of an outlier on the sample. The sensitivity curve of an estimator $\hat\theta$ for the sample $x_1, x_2, \ldots, x_n$ is

$SC(x_0) = \hat\theta(x_1, x_2, \ldots, x_n, x_0) - \hat\theta(x_1, x_2, \ldots, x_n)$

where x0 is the location of a single additional outlier. Bounded SC(x0) → high robustness!

Page 34

SC of mean & median

[Figure: sensitivity curves for n = 200, F = N(0,1). The SC of the sample mean grows without bound in the outlier location, while the SC of the sample median is bounded.]

Page 35

Standardized sensitivity curve

What happens if we add one more observation to a very large sample? The standardized sensitivity curve is defined by

$SC_n(x_0) = \frac{\hat\theta(x_1, \ldots, x_n, x_0) - \hat\theta(x_1, \ldots, x_n)}{1/(n+1)}$
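The standardized sensitivity curve can be computed directly; a sketch comparing the mean and the median (the sample size n = 50 is my choice, smaller than the n = 200 used in the figures):

```python
import random
import statistics

random.seed(1)
n = 50
base = [random.gauss(0, 1) for _ in range(n)]  # fixed N(0,1) sample

def std_sc(estimator, x0):
    """(n+1) * (estimate with the extra point x0 - estimate without it)."""
    return (n + 1) * (estimator(base + [x0]) - estimator(base))

for x0 in (-1000.0, 0.0, 1000.0):
    print(x0,
          round(std_sc(statistics.mean, x0), 2),    # grows linearly in x0
          round(std_sc(statistics.median, x0), 2))  # flat once x0 leaves the bulk
```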

Page 36

Influence function

The influence function of an estimator (Hampel, 1974) is an asymptotic version of its sensitivity curve: an approximation to the behavior of $\hat\theta_\infty$ when the sample contains a small fraction ε of identical outliers. It is defined as

$IF(x_0; \hat\theta, F) = \lim_{\varepsilon \to 0^+} \frac{\hat\theta_\infty\big((1-\varepsilon)F + \varepsilon\delta_{x_0}\big) - \hat\theta_\infty(F)}{\varepsilon}$

where $\delta_{x_0}$ is the point-mass at x0 and $\varepsilon \to 0^+$ stands for the limit from the right.

Page 37

IF main uses

$\hat\theta_\infty\big((1-\varepsilon)F + \varepsilon\delta_{x_0}\big)$ is the asymptotic value of the estimate when the underlying distribution is F and a fraction ε of outliers is equal to x0.

The IF has two main uses:
- Assessing the relative influence of individual observations on the value of the estimate. Unbounded IF → less robustness.
- Allowing a simple heuristic assessment of the asymptotic variance of an estimate.

Page 38

IF as a limit version of SC

The SC is a finite-sample version of the IF. If ε is small,

$\hat\theta_\infty\big((1-\varepsilon)F + \varepsilon\delta_{x_0}\big) \approx \hat\theta_\infty(F) + \varepsilon\, IF(x_0; \hat\theta, F)$

so the bias $\hat\theta_\infty\big((1-\varepsilon)F + \varepsilon\delta_{x_0}\big) - \hat\theta_\infty(F) \approx \varepsilon\, IF(x_0; \hat\theta, F)$, and for large n

$SC_n(x_0) \approx IF(x_0; \hat\theta, F), \qquad \text{where } \varepsilon = 1/(n+1)$

Page 39

Breakdown point

The BP is the proportion of arbitrarily large observations an estimator can handle before giving an arbitrarily large result.

- Maximum possible BP = 50%.
- High BP → more robustness!
- As seen before – mean: 0%, median: 50%.

Page 40

Summary

- The SC measures the effect of different outliers on the estimate.
- The IF is the asymptotic behavior of the SC.
- The IF and the BP consider extreme situations in the study of contamination: the IF deals with "infinitesimal" values of ε, while the BP deals with the largest ε an estimate can tolerate.

Page 41

4. M estimators

Page 42

Maximum likelihood of μ

Consider the location model, and assume that $F_0$ has density $f_0$. The likelihood function is

$L(x_1, x_2, \ldots, x_n; \mu) = \prod_{i=1}^{n} f_0(x_i - \mu)$

The maximum likelihood estimate (MLE) of μ is

$\hat\mu = \arg\max_{\mu} L(x_1, x_2, \ldots, x_n; \mu)$

Page 43

M estimators of location (μ)

MLE-like estimators: generalizing ML estimators. If the density $f_0$ is everywhere positive, then with $\rho = -\log f_0$ the MLE solves

(*) $\hat\mu = \arg\min_{\mu} \sum_{i=1}^{n} \rho(x_i - \mu)$

Let $\psi = \rho'$; if this exists, then

(**) $\sum_{i=1}^{n} \psi(x_i - \hat\mu) = 0$

An M estimator can almost equivalently be described by ρ or ψ. If ρ is everywhere differentiable and ψ is monotonic, the forms (*) and (**) are equivalent [6]. If ψ is continuous and increasing, the solution is unique [6].

Page 44

Special cases

The sample mean: $\rho(x) = x^2$, $\psi(x) = 2x$, so (**) becomes

$\sum_{i=1}^{n} (x_i - \hat\mu) = 0 \;\Rightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$

The sample median: $\rho(x) = |x|$, $\psi(x) = \operatorname{sign}(x) = I(x > 0) - I(x < 0)$, so (**) becomes

$\sum_{i=1}^{n} \operatorname{sign}(x_i - \hat\mu) = 0 \;\Rightarrow\; \#\{x_i > \hat\mu\} = \#\{x_i < \hat\mu\} \;\Rightarrow\; \hat\mu = \operatorname{Med}(x)$

Page 45

Special cases: ρ and ψ

[Figure: ρ(x) and ψ(x) for squared errors (the sample mean: parabolic ρ, linear unbounded ψ) and for absolute errors (the sample median: ρ(x) = |x|, ψ(x) = sign(x) bounded in [−1, 1]).]

Page 46

Asymptotic behavior of location M estimators

For a given distribution F, assume ρ is differentiable and ψ is increasing, and define $\mu_0 = \mu_0(F)$ as the solution of

(***) $E_F\,\psi(x - \mu_0) = 0$

For large n, $\hat\mu \xrightarrow{p} \mu_0$ and the distribution of the estimator is approximately $N(\mu_0, v/n)$ with

$v = \frac{E_F\,\psi(x - \mu_0)^2}{\big(E_F\,\psi'(x - \mu_0)\big)^2}$

If $\mu_0$ is uniquely defined, then $\hat\mu$ is consistent at F [3].

Page 47

Desirable properties

- M estimators are robust to large proportions of outliers: when ψ is odd, bounded and monotonically increasing, the BP is 0.5.
- The IF is proportional to ψ, so the ψ function may be chosen to bound the influence of outliers and achieve high efficiency.
- M estimators are asymptotically normal, and can also be consistent for μ.
- M estimators can be chosen to completely reject outliers (these are called redescending M estimators) while maintaining a large BP and high efficiency.

Page 48

Disadvantages

- They are in general only implicitly defined and must be found by iterative search.
- They are in general not scale equivariant.

Page 49

Huber functions

A popular family of M estimators (Huber, 1964). The Huber estimator has an odd, nondecreasing ψ function which minimizes the asymptotic variance among all estimators with a bounded influence function, |IF(x)| ≤ c.

Advantages:
- Combines the sample mean for small errors with the sample median for gross errors.
- Boundedness of ψ.

Page 50

Huber ρ and ψ functions

$\rho_k(x) = \begin{cases} x^2 & \text{if } |x| \le k \\ 2k|x| - k^2 & \text{if } |x| > k \end{cases}$

with derivative $\rho_k'(x) = 2\psi_k(x)$, where

$\psi_k(x) = \begin{cases} x & \text{if } |x| \le k \\ k\,\operatorname{sign}(x) & \text{if } |x| > k \end{cases}$

and for k = 0, $\psi_0(x) = \operatorname{sign}(x)$.
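Since the estimating equation is only implicitly defined, it must be solved iteratively; a minimal sketch of solving Σψk(xi − μ̂) = 0 by iteratively reweighted averaging (the scheme and names are mine, and a fixed scale is assumed for simplicity; k = 1.37 matches the value used in the figures later in the deck):

```python
import statistics

def huber_psi(x, k):
    """Huber psi: identity inside [-k, k], clipped to ±k outside."""
    return max(-k, min(k, x))

def huber_location(xs, k=1.37, tol=1e-8, max_iter=100):
    """Solve sum(psi_k(x_i - mu)) = 0 by iteratively reweighted means."""
    mu = statistics.median(xs)  # robust starting point
    for _ in range(max_iter):
        # weight w_i = psi(r_i)/r_i caps the influence of large residuals
        w = [1.0 if abs(x - mu) <= k else k / abs(x - mu) for x in xs]
        mu_new = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

flour = [2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03,
         3.10, 3.37, 3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70,
         3.70, 3.77, 5.28, 28.95]
print(round(huber_location(flour), 2))  # stays in the bulk, unlike the mean 4.28
```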

Page 51

Huber ρ and ψ functions

[Figure: plots of the Huber ρ(x) and ψ(x). From Ricardo A. Maronna, R. Douglas Martin and Víctor J. Yohai, Robust Statistics: Theory and Methods, John Wiley & Sons, 2006.]

Page 52

Huber functions – robustness & efficiency tradeoff

Special cases: k = 0 → the sample median; k → ∞ → the sample mean.

[Figure: asymptotic variances v of the Huber estimator at the normal mixture model with G = N(0, 1) and H = N(0, 10), spanning the range between the mean and the median, annotated "increasing v" and "decreasing robustness". From Ricardo A. Maronna, R. Douglas Martin and Víctor J. Yohai, Robust Statistics: Theory and Methods, John Wiley & Sons, 2006.]

The larger the asymptotic variance, the less efficient the estimator, but the more robust: efficiency comes at the expense of robustness.

Page 53

Redescending M estimators

Redescending M estimators have ψ functions which are non-decreasing near the origin but then decrease toward the axis far from the origin. They usually satisfy ψ(x) = 0 for all |x| ≥ r, where r is the minimum rejection point.

Beyond their ability to reject outliers completely, they:
- Do not suffer from the masking effect.
- Have the potential for a high BP.
- Can have ψ functions chosen to redescend smoothly to zero, so that information in moderately large outliers is not ignored completely → improved efficiency!

A popular family of redescending M estimators (Tukey) is called the bisquare, or biweight.

Page 54

Bisquare ρ and ψ functions

$\rho(x) = \begin{cases} 1 - \left[1 - (x/k)^2\right]^3 & \text{if } |x| \le k \\ 1 & \text{if } |x| > k \end{cases}$

with derivative $\rho'(x) = \frac{6}{k^2}\,\psi(x)$, where

$\psi(x) = x\left[1 - (x/k)^2\right]^2 I(|x| \le k)$

Page 55

Bisquare ρ and ψ functions

[Figure: plots of the bisquare ρ(x) and ψ(x). From Ricardo A. Maronna, R. Douglas Martin and Víctor J. Yohai, Robust Statistics: Theory and Methods, John Wiley & Sons, 2006.]

Page 56

Bisquare function – efficiency

The ARE of the bisquare estimator relative to the MLE achieves values close to 1, depending on the choice of k:

ARE:  0.8    0.85   0.9    0.95
k:    3.14   3.44   3.88   4.68

Page 57

Choice of ψ and ρ

In practice, the choice of the ρ and ψ functions is not critical to gaining a good robust estimate (Huber, 1981). Redescending and bounded ψ functions are to be preferred, as are bounded ρ functions. The bisquare function is a popular choice.

Page 58

5. Order statistics approaches

Page 59

The β-trimmed mean

Let $\beta \in [0, 0.5)$ and $m = [n\beta]$. The β-trimmed mean is defined by

$\bar x_\beta = \frac{1}{n - 2m} \sum_{i=m+1}^{n-m} x_{(i)}$

where [.] stands for the integer part and $x_{(i)}$ denotes the i'th order statistic. $\bar x_\beta$ is the sample mean after the m largest and the m smallest observations have been discarded.

Page 60

The β-trimmed mean – cont.

Limit cases: β = 0 → the sample mean; β → 0.5 → the sample median.

Distribution of the trimmed mean: the exact distribution is intractable, but for large n the distribution under the location model is approximately normal.

BP of the β% trimmed mean = β%.

Page 61

Example 1 – trimmed mean

Estimator           All data   Delete outlier
Mean                4.28       3.2
Median              3.38       3.37
Trimmed mean 10%    3.2        3.11
Trimmed mean 25%    3.17       3.17

The median and the trimmed mean are less sensitive to the existence of outliers.

Page 62

The Winsorized mean

The Winsorized mean is defined by

$\bar x_w = \frac{1}{n}\left[(m+1)\,x_{(m+1)} + \sum_{i=m+2}^{n-m-1} x_{(i)} + (m+1)\,x_{(n-m)}\right]$

The m smallest observations are replaced by the (m+1)'st smallest observation, and the m largest observations are replaced by the (m+1)'st largest observation.
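A direct implementation of this definition (the function name is mine; m is assumed small enough that m ≤ n − m − 1):

```python
def winsorized_mean(xs, m):
    """Replace the m smallest values by x_(m+1) and the m largest by x_(n-m)."""
    xs = sorted(xs)
    n = len(xs)
    # (m+1) copies of x_(m+1), the middle, then (m+1) copies of x_(n-m)
    w = [xs[m]] * m + xs[m:n - m] + [xs[n - m - 1]] * m
    return sum(w) / n

flour = [2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03,
         3.10, 3.37, 3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70,
         3.70, 3.77, 5.28, 28.95]
print(round(winsorized_mean(flour, 2), 3))
```

Unlike trimming, Winsorizing keeps n terms in the average: extreme observations are pulled in rather than discarded.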

Page 63

Trimmed and Winsorized means – properties

They use more information from the sample than the sample median. However:
- Unless the underlying distribution is symmetric, they are unlikely to produce an unbiased estimate of either the mean or the median.
- They do not have a normal distribution.

Page 64

L estimators

Trimmed and Winsorized means are special cases of L estimators, defined as linear combinations of order statistics:

$\hat\theta = \sum_{i=1}^{n} \alpha_i\, x_{(i)}$

where the $\alpha_i$'s are given constants. For the β-trimmed mean:

$\alpha_i = \frac{1}{n - 2m}\, I(m < i \le n - m)$

Page 65

L vs. M estimators

M estimators are more flexible: they can be generalized straightforwardly to multi-parameter problems, and have a high BP.

L estimators are less efficient because they completely ignore part of the data.

Page 66

6. Summary & conclusions

Page 67

SC of location M estimators

[Figure: sensitivity curves for n = 20, xi ~ N(0,1); trimmed mean: α = 25%, Huber: k = 1.37, bisquare: k = 4.68.]

Page 68

The effect of increasing contamination on a sample

Replace m points by a fixed value x0 = 1000:

$\text{biased } SC(m) = \hat\theta(\underbrace{x_0, \ldots, x_0}_{m}, x_{m+1}, \ldots, x_n) - \hat\theta(x_1, x_2, \ldots, x_n)$

[Figure: biased SC as a function of m for n = 20, xi ~ N(0,1); trimmed mean: α = 8.5%, Huber: k = 1.37, bisquare: k = 4.68.]

Page 69

IF of location M estimators

The IF is proportional to ψ (Huber, 1981). In general,

$IF(x_0; \hat\mu, F) = \frac{\psi(x_0 - \hat\mu_\infty)}{E_F\,\psi'(x - \hat\mu_\infty)}$

Page 70

BP of location M estimators

In general, when ψ is odd, bounded and monotonically increasing, the BP is 50%.

Assume $k_1 = -\psi(-\infty)$ and $k_2 = \psi(\infty)$ are finite; then the BP is

$\varepsilon^* = \frac{\min(k_1, k_2)}{k_1 + k_2}$

Special cases: sample mean – 0%; sample median – 50%.

Page 71

Comparison between different location estimators

Estimator         BP    SC/IF/ψ                     Redescending ψ   Efficiency in mixture model
Mean              0%    unbounded                   No               low
Median            50%   bounded                     No               low
Huber             50%   bounded                     No               high
Bisquare          50%   bounded (redescends to 0)   Yes              high
x% trimmed mean   x%    bounded                     No

Page 72

Conclusions

Robust statistics provides an alternative approach to classical statistical methods: it seeks methods that emulate classical ones, but which are not unduly affected by outliers or other small departures from model assumptions. In order to quantify the robustness of a method, it is necessary to define measures of robustness.

Page 73

Efficiency vs. robustness

Efficiency can be achieved by taking ψ proportional to the derivative of the log-likelihood defined by the density f of F: ψ(x) = −c (f′/f)(x), where c ≠ 0 is a constant.

Robustness is achieved by choosing ψ smooth and bounded, to reduce the influence of a small proportion of observations.

Page 74

References

1. Robert G. Staudte and Simon J. Sheather, Robust Estimation and Testing, Wiley, 1990.
2. Elvezio Ronchetti, "The Historical Development of Robust Statistics", ICOTS-7, University of Geneva, Switzerland, 2006.
3. Huber, P. (1981). Robust Statistics. New York: Wiley.
4. Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.
5. Tukey.
6. Ricardo A. Maronna, R. Douglas Martin and Víctor J. Yohai, Robust Statistics: Theory and Methods, John Wiley & Sons, 2006.
7. B. D. Ripley, M.Sc. in Applied Statistics MT2004, Robust Statistics, 1992–2004.
8. "Robust statistics", Wikipedia.