Sub-Gaussian estimators under heavy tails Roberto Imbuzeiro Oliveira XIX Escola Brasileira de Probabilidade Maresias, August 6th 2015


Sub-Gaussian estimators under heavy tails

Roberto Imbuzeiro Oliveira

XIX Escola Brasileira de Probabilidade Maresias, August 6th 2015


Joint with

Luc Devroye (McGill)

Matthieu Lerasle (CNRS/Nice)

Gábor Lugosi (ICREA/UPF)


Our problem (and why it's interesting)


Our problem

We want to estimate the mean of a probability distribution over the real line from an i.i.d. sample.

This is (related to) many fundamental statistical tasks.


Our problem

We assume finite variances, but as little else as possible.

Interesting in theory, important in practice.


Our problem

Want nearly optimal tail bounds, uniformly over large classes of distributions.

High-confidence estimates are sometimes necessary.


Formal statement

Given: $\mathcal{P}$, a family of probability distributions over $\mathbb{R}$. For $P \in \mathcal{P}$, $\mu_P$ and $\sigma_P^2$ are the mean and variance of $P$.

Want: for each large enough $n \in \mathbb{N}$, an estimator $\widehat{E}_n : \mathbb{R}^n \to \mathbb{R}$ and a parameter $\delta_{\min} = \delta_{\min,n} \in [0, 1)$ such that, if $X_1^n = (X_1, \dots, X_n)$ is i.i.d. from $P \in \mathcal{P}$, then

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$

The family $\mathcal{P}$: should be very large (nonparametric).

The parameter $\delta_{\min}$: should be very small (exponentially in $n$?).

$L$: a constant (may depend on the family).
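To make the scale of the requirement concrete, here is a minimal Python sketch of the deviation radius $L\,\sigma_P\sqrt{(1+\ln(1/\delta))/n}$ from the formal statement. The function name `subgaussian_radius` and the numeric values of `sigma`, `n`, and `L` are illustrative assumptions, not part of the talk.

```python
import math


def subgaussian_radius(sigma: float, n: int, delta: float, L: float) -> float:
    """Radius L * sigma * sqrt((1 + ln(1/delta)) / n) from the formal statement."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0 or sigma < 0.0:
        raise ValueError("need n > 0 and sigma >= 0")
    return L * sigma * math.sqrt((1.0 + math.log(1.0 / delta)) / n)


if __name__ == "__main__":
    # Illustrative values only: sigma = 1, n = 1000, L = 2.
    for delta in (1e-1, 1e-3, 1e-6):
        radius = subgaussian_radius(sigma=1.0, n=1000, delta=delta, L=2.0)
        print(f"delta={delta:.0e}  radius={radius:.4f}")
```

The radius grows only like $\sqrt{\ln(1/\delta)}$, which is why very small $\delta$ remains affordable.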


Why sub-Gaussian?

What we ask for is basically that the estimator has Gaussian-like fluctuations around the mean:

$$\mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > \frac{\lambda\,\sigma_P}{\sqrt{n}} \right) \le C_1\, e^{-\lambda^2 / C_2}.$$

Catoni: Gaussian-like fluctuations are optimal for "reasonable" families of distributions (more on this below).
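One way to relate this Gaussian-like tail to the formal statement is the following short computation. It is a sketch: the identification $C_1 = e$, $C_2 = L^2$ is an illustration and is not stated on the slides. Set $\lambda = L\sqrt{1 + \ln(1/\delta)}$ in the formal statement and solve for $\delta$:

```latex
\lambda = L\sqrt{1 + \ln(1/\delta)}
\;\Longleftrightarrow\;
\ln(1/\delta) = \frac{\lambda^2}{L^2} - 1
\;\Longleftrightarrow\;
\delta = e \cdot e^{-\lambda^2 / L^2},
\qquad\text{so}\qquad
\mathbb{P}\!\left( |\widehat{E}_n(X_1^n) - \mu_P| > \frac{\lambda\,\sigma_P}{\sqrt{n}} \right)
\le e \cdot e^{-\lambda^2 / L^2}.
```

This holds for the values of $\lambda$ that correspond to $\delta \in [\delta_{\min}, 1)$.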


Why is this interesting?

Estimator must turn heavy tails into light tails! (Tail surgery?)



Why is this interesting?

Related (weaker) estimators have been applied to problems in Statistics and Machine Learning. Our notion could improve these results.

Audibert and Catoni + Hsu and Sabato (least squares), Bubeck et al. (bandits), Brownlees et al. (empirical risk minimization).


When is this possible?

This is the main subject of our paper.

We present our results before we move on.


Our results


First result

Assumption: variance known up to an interval.


Partially known variance

Example: $\mathcal{P}_2^{[\sigma_1^2, \sigma_2^2]}$ := all distributions with variance $\sigma_P^2 \in [\sigma_1^2, \sigma_2^2]$.

We let $R := \sigma_2/\sigma_1$ (may depend on $n$).

Theorem: If $R$ is bounded, then for all large enough $n$ there exist $\widehat{E}_n : \mathbb{R}^n \to \mathbb{R}$, $\delta_{\min} \approx e^{-c\,n}$ and a constant $L$ such that, when $P \in \mathcal{P}_2^{[\sigma_1^2, \sigma_2^2]}$ and $X_1^n =_d P^{\otimes n}$,

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$

If $R$ is unbounded, any sequence $\delta_{\min} \to 0$ fails.

Optimal up to the exact values of $c > 0$ and $L > 0$.

Bounded versus unbounded $R$: truly different behavior!


Second result

Assumption: (slightly) higher moments.


Higher moments

Example: $\mathcal{P}_{\alpha,\eta}$ := all distributions with

$$\mathbb{E}_P |X - \mu_P|^\alpha \le (\eta\,\sigma_P)^\alpha$$

(here $\alpha \in (2, 3)$ is fixed, $\eta \ge \eta_0$ may depend on $n$).

Theorem: for all large enough $n$, if $k_{\alpha,\eta} := (C\,\eta)^{2\alpha/(\alpha-2)}$, there exist $\widehat{E}_n : \mathbb{R}^n \to \mathbb{R}$, $\delta_{\min} \approx e^{-c\,n/k_{\alpha,\eta}}$ and a constant $L$ such that, when $P \in \mathcal{P}_{\alpha,\eta}$ and $X_1^n =_d P^{\otimes n}$,

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$

Optimal up to the value of $c > 0$.


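An extension

It suffices to assume that the family of distributions is $k$-regular:

$$\exists k \in \mathbb{N},\ \forall P \in \mathcal{P},\ \forall j \ge k: \text{ if } X_1^j =_d P^{\otimes j}, \quad \mathbb{P}\left( \pm \frac{1}{j} \sum_{i=1}^{j} (X_i - \mu_P) \ge 0 \right) \ge \frac{1}{3}.$$

For instance, symmetric distributions are 1-regular.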


Third result

Assumption: bounded kurtosis. Under it, one can get a nearly optimal constant, $L = \sqrt{2} + \varepsilon$.

This will be further discussed later.


Some background


History

Typical analyses of estimators for means are based on expectations, not deviations.

Exceptions do exist (e.g. Kolmogorov’s CLT for medians), but their assumptions and goals are different.


History

Catoni’s paper (AIHP Prob. Stat. 2012) seems to be the first to focus on deviations as a fundamental problem.

We’ll mention some more applied results later.


Gaussian lower bound

Recall the normal cumulative distribution function:

$$\Phi(r) := \int_{-\infty}^{r} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx.$$

$$\Phi^{-1}(1 - \delta) \sim \sqrt{2 \ln(1/\delta)} \quad \text{for } \delta \ll 1.$$
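The asymptotic equivalence can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `gaussian_quantile_ratio` and the grid of $\delta$ values are illustrative choices. It computes the ratio $\Phi^{-1}(1-\delta)/\sqrt{2\ln(1/\delta)}$, which should approach 1 (slowly) as $\delta \to 0$.

```python
import math
from statistics import NormalDist


def gaussian_quantile_ratio(delta: float) -> float:
    """Ratio Phi^{-1}(1 - delta) / sqrt(2 ln(1/delta)) for delta in (0, 1/2)."""
    # By symmetry Phi^{-1}(1 - delta) = -Phi^{-1}(delta); using delta directly
    # avoids rounding 1 - delta to 1.0 when delta is tiny.
    quantile = -NormalDist().inv_cdf(delta)
    return quantile / math.sqrt(2.0 * math.log(1.0 / delta))


# Illustrative grid of confidence parameters.
ratios = [gaussian_quantile_ratio(10.0 ** (-k)) for k in (2, 4, 8, 16, 32)]

if __name__ == "__main__":
    for k, ratio in zip((2, 4, 8, 16, 32), ratios):
        print(f"delta=1e-{k}: ratio={ratio:.4f}")
```

The convergence is logarithmically slow, so the ratio is still visibly below 1 at moderate confidence levels.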


Gaussian lower bound

Family: $\mathcal{P}_{\mathrm{Gauss}}^{\sigma^2}$, all Gaussian distributions over $\mathbb{R}$ with variance $\sigma^2 > 0$.

Thm (Catoni): for any $n$,

$$\inf_{\widehat{E}_n}\ \sup_{\substack{P \in \mathcal{P}_{\mathrm{Gauss}}^{\sigma^2} \\ X_1^n =_d P^{\otimes n}}} \mathbb{P}\left( \widehat{E}_n(X_1^n) - \mu_P \ge \Phi^{-1}(1 - \delta)\,\frac{\sigma_P}{\sqrt{n}} \right) = \delta.$$

Similar result for the lower tail.

The factor $\Phi^{-1}(1 - \delta)$ is asymptotic to $L\sqrt{\ln(1/\delta)}$ with $L = \sqrt{2}$.


Compare with definition

Recall the formal statement: we want an estimator $\widehat{E}_n$, a parameter $\delta_{\min}$ and a constant $L$ such that, for every $P$ in the family $\mathcal{P}$ and $X_1^n$ i.i.d. from $P$,

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$



The empirical mean

$$\widehat{E}_n(X_1^n) := \frac{1}{n} \sum_{i=1}^{n} X_i$$

It follows from Catoni’s result that the empirical mean has optimal deviations for all Gaussian distributions.

This is an exception, rather than the rule.


Empirical mean fails

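Example: $\mathcal{P}_2^{\sigma^2}$, all distributions with variance $\sigma_P^2 = \sigma^2$.

Thm (Catoni): Chebyshev is basically optimal.

$$\sup_{\substack{P \in \mathcal{P}_2^{\sigma^2} \\ X_1^n =_d P^{\otimes n}}} \mathbb{P}\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu_P \right| > \frac{c\,\sigma_P}{\sqrt{\delta\,n}} \right) \ge \delta.$$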
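To get a feel for the gap, the following sketch compares the Chebyshev-type scale $\sigma/\sqrt{\delta n}$ with the sub-Gaussian scale $\sigma\sqrt{(1+\ln(1/\delta))/n}$. Both constants ($c$ and $L$) are set to 1 and $n = 10{,}000$; these are assumptions made only for illustration.

```python
import math


def chebyshev_scale(sigma: float, n: int, delta: float) -> float:
    """sigma / sqrt(delta * n): the empirical-mean deviation scale (constant c taken as 1)."""
    return sigma / math.sqrt(delta * n)


def subgaussian_scale(sigma: float, n: int, delta: float) -> float:
    """sigma * sqrt((1 + ln(1/delta)) / n): the sub-Gaussian scale (constant L taken as 1)."""
    return sigma * math.sqrt((1.0 + math.log(1.0 / delta)) / n)


if __name__ == "__main__":
    n = 10_000  # illustrative sample size
    for delta in (1e-2, 1e-4, 1e-8):
        cheb = chebyshev_scale(1.0, n, delta)
        subg = subgaussian_scale(1.0, n, delta)
        print(f"delta={delta:.0e}  Chebyshev={cheb:.4f}  "
              f"sub-Gaussian={subg:.4f}  ratio={cheb / subg:.1f}")
```

The ratio of the two scales is $1/\sqrt{\delta\,(1+\ln(1/\delta))}$, independent of $n$, and it blows up polynomially in $1/\delta$.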


Empirical mean fails

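Example: $\mathcal{P}_{\mathrm{krt}}$, all distributions with kurtosis $\kappa_P := \mathbb{E}_P |X - \mu_P|^4 / \sigma_P^4 \le \kappa$.

Thm (Catoni): if $n$ is large and $\delta \le 1/n$,

$$\sup_{\substack{P \in \mathcal{P}_{\mathrm{krt}} \\ X_1^n =_d P^{\otimes n}}} \mathbb{P}\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu_P \right| > \frac{c\,\sigma_P}{(\delta\,n)^{1/4}} \right) \ge \delta.$$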


Positive results

Catoni obtained sharp sub-Gaussian estimators in some settings.

Unfortunately, they depend on the confidence level!


One example

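Example: $\mathcal{P}_2^{\sigma^2}$, all distributions with variance $\sigma_P^2 = \sigma^2$.

Thm (Catoni): Set $\delta_{\min} := e^{-o(n)}$. Then $\forall \delta \in [\delta_{\min}, 1)$, there exists a $\delta$-dependent $\widehat{E}_{n,\delta}$ with

$$\sup_{\substack{P \in \mathcal{P}_2^{\sigma^2} \\ X_1^n =_d P^{\otimes n}}} \mathbb{P}\left( |\widehat{E}_{n,\delta}(X_1^n) - \mu_P| > \sigma_P \sqrt{\frac{(2 + o(1)) \ln(2/\delta)}{n}} \right) \le \delta.$$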


Why is this bad?

Suppose you want high confidence. Only guarantee is that the probability of huge error is very low.

Nothing is known about the probability of average-to-large error in more typical events.


Why is this bad?

Statistical and machine learning applications (Bubeck et al., Brownlees et al., Hsu/Sabato) had to cope with this dependence on the confidence level.

In all cases, something was lost.


Our results are “better”

… or rather, genuinely different.

Our results imply that parameter-dependent estimators are easier to obtain.

We’ll see that right now.


Median of means


Median of means

Simple construction of a sub-Gaussian parameter-dependent estimator that only requires finite second moments.

Known for a long time, in many forms, in different communities (Nemirovski/Yudin, Alon/Matias/Szegedy, Levin, Jerrum/Sinclair, Hsu…). “Pre-history”.


Median of means

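Example: $\mathcal{P}_2^{\sigma^2}$, all distributions with variance $\sigma_P^2 = \sigma^2$.

Thm: Set $\delta_{\min} := e^{1 - n/2}$. Then $\forall \delta \in [\delta_{\min}, 1)$, there exists a $\delta$-dependent $\widehat{E}_{n,\delta}$ with

$$\sup_{\substack{P \in \mathcal{P}_2^{\sigma^2} \\ X_1^n =_d P^{\otimes n}}} \mathbb{P}\left( |\widehat{E}_{n,\delta}(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(2/\delta)}{n}} \right) \le \delta.$$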


Median of means

Sample: $X_1^n := (X_1, X_2, X_3, \dots, X_n)$ from distribution $P$.

Blocks: split $\{1, 2, \dots, n\} = B_1 \cup B_2 \cup \dots \cup B_b$, disjoint blocks of size $n/b$.

Means: for each block $B_\ell$, define

$$Y_\ell := \frac{b}{n} \sum_{i \in B_\ell} X_i.$$

Median of means:

$$\widehat{E}_{n,\delta}(X_1^n) := \text{median of } (Y_1, Y_2, \dots, Y_b).$$
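The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the rule $b = \lceil \ln(1/\delta) \rceil$ for the number of blocks, the choice to drop leftover points when $b$ does not divide $n$, and the Pareto test sample are all assumptions made for the example.

```python
import math
import random
import statistics


def median_of_means(sample, num_blocks):
    """Median of the means of `num_blocks` disjoint, equal-size, consecutive blocks."""
    n = len(sample)
    if not 1 <= num_blocks <= n:
        raise ValueError("need 1 <= num_blocks <= len(sample)")
    block_size = n // num_blocks  # leftover points are dropped in this sketch
    block_means = [
        statistics.fmean(sample[j * block_size:(j + 1) * block_size])
        for j in range(num_blocks)
    ]
    return statistics.median(block_means)


def blocks_for_confidence(delta, n):
    """Assumed rule b = ceil(ln(1/delta)), clipped to the range [1, n]."""
    return max(1, min(n, math.ceil(math.log(1.0 / delta))))


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy-tailed test data: Pareto with shape 3 (finite variance, true mean 1.5).
    sample = [rng.paretovariate(3.0) for _ in range(10_000)]
    b = blocks_for_confidence(1e-3, len(sample))
    print("blocks:", b)
    print("median of means:", median_of_means(sample, b))
    print("empirical mean: ", statistics.fmean(sample))
```

Note that the number of blocks depends on $\delta$, which is exactly the dependence on the confidence level discussed earlier.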


Analysis

Interval on $\mathbb{R}$, centered at $\mu_P$:

$$\left[ \mu_P - L\,\sigma_P \sqrt{\frac{b}{n}},\ \ \mu_P + L\,\sigma_P \sqrt{\frac{b}{n}} \right].$$

Want: the median of $Y_1, \dots, Y_b$ in the interval.

Sufficient: more than half of the $Y_\ell$’s are in there.

$Y_\ell = \frac{b}{n} \sum_{i \in B_\ell} X_i$, with the $X_i$ i.i.d. $P$, so

$$\mathbb{E}(Y_\ell) = \mu_P, \qquad \mathrm{Var}(Y_\ell) = b\,\sigma_P^2 / n.$$

By Chebyshev, $\mathbb{P}(Y_\ell \notin \text{interval}) \le L^{-2}$.

Disjoint blocks $\Rightarrow$ these events are independent.

The probability that $\ge b/2$ of the $Y_\ell$’s are not in the interval is bounded by a binomial tail probability.

If $L$ is large, $\mathbb{P}\left( \mathrm{Bin}(b, L^{-2}) \ge b/2 \right) \le e^{-b}$.

Take $b \approx \ln(1/\delta)$, so that $e^{-b} \approx \delta$, and we’re done.
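The binomial tail step can be checked numerically. The sketch below computes $\mathbb{P}(\mathrm{Bin}(b, L^{-2}) \ge b/2)$ exactly and compares it with $e^{-b}$. The value $L = 2\sqrt{2}\,e$ is an assumed choice of "large $L$" for illustration; with it, the union bound $\binom{b}{\lceil b/2 \rceil} p^{\lceil b/2 \rceil} \le (4p)^{b/2}$ already gives the claim, since $4L^{-2} = 1/(2e^2)$.

```python
import math


def binomial_upper_tail(b: int, p: float, threshold: float) -> float:
    """Exact P(Bin(b, p) >= threshold), summing the binomial pmf."""
    start = max(0, math.ceil(threshold))
    return sum(
        math.comb(b, k) * p**k * (1.0 - p) ** (b - k)
        for k in range(start, b + 1)
    )


# Assumed constant for illustration; p is the Chebyshev bound L^{-2}
# on the probability that one block mean falls outside the interval.
L = 2.0 * math.sqrt(2.0) * math.e
p = L ** -2

if __name__ == "__main__":
    for b in (1, 5, 10, 20, 40):
        tail = binomial_upper_tail(b, p, b / 2)
        print(f"b={b:2d}  tail={tail:.3e}  exp(-b)={math.exp(-b):.3e}")
```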


Our proof ideas


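Exponential is optimal

Family: $\mathcal{P}_{\mathrm{La}}$, all Laplace distributions $\mathrm{La}_\lambda$, with $\lambda \in \mathbb{R}$ and

$$\frac{d\mathrm{La}_\lambda}{dx}(x) = \frac{e^{-|x - \lambda|}}{2}.$$

Property:

$$e^{-|\lambda| n} \le \frac{d\mathrm{La}_\lambda^{\otimes n}}{d\mathrm{La}_0^{\otimes n}}(x) \le e^{|\lambda| n}.$$

Consequence: any estimator with constant $L$ will mistake a $\mathrm{La}_0$ sample for a $\mathrm{La}_{10L^2}$ sample with prob. $\approx e^{1 - 5L^2 n}$.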
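The likelihood-ratio property follows from the triangle inequality. A short derivation, written for a sample point $x = (x_1, \dots, x_n)$ (this notation is introduced here for the computation):

```latex
\frac{d\mathrm{La}_\lambda^{\otimes n}}{d\mathrm{La}_0^{\otimes n}}(x)
= \prod_{i=1}^{n} \frac{e^{-|x_i - \lambda|}/2}{e^{-|x_i|}/2}
= \exp\!\Big( \sum_{i=1}^{n} \big( |x_i| - |x_i - \lambda| \big) \Big),
\qquad
\big|\, |x_i| - |x_i - \lambda| \,\big| \le |\lambda|
\;\Longrightarrow\;
e^{-|\lambda| n} \le \frac{d\mathrm{La}_\lambda^{\otimes n}}{d\mathrm{La}_0^{\otimes n}}(x) \le e^{|\lambda| n}.
```

So any event has probabilities under $\mathrm{La}_0^{\otimes n}$ and $\mathrm{La}_\lambda^{\otimes n}$ that differ by at most a factor $e^{|\lambda| n}$, which is what prevents $\delta_{\min}$ from being smaller than exponential in $n$.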


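Partially known variance

Recall the theorem for $\mathcal{P}_2^{[\sigma_1^2, \sigma_2^2]}$ (all distributions with variance $\sigma_P^2 \in [\sigma_1^2, \sigma_2^2]$), with $R := \sigma_2/\sigma_1$ possibly depending on $n$.

If $R$ is bounded, then for all large enough $n$ there exist $\widehat{E}_n : \mathbb{R}^n \to \mathbb{R}$, $\delta_{\min} \approx e^{-c\,n}$ and a constant $L$ such that, when $P \in \mathcal{P}_2^{[\sigma_1^2, \sigma_2^2]}$ and $X_1^n =_d P^{\otimes n}$,

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$

If $R$ is unbounded, any sequence $\delta_{\min} \to 0$ fails.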


Why unbounded fails

Family: $\mathcal{P}_{\mathrm{Po}}^{[c/n,\, Rc/n]}$, Poisson random variables with very small means $c/n \le \mu_P \le Rc/n$.

Recall mean = variance for Poisson!

$X_1^n$ := sample with mean $c/n$, $S_X := X_1 + \dots + X_n$.

$Y_1^n$ := sample with mean $Rc/n$, $S_Y := Y_1 + \dots + Y_n$.

Assume a good estimator $\widehat{E}_n$ with constant $L$. Then

$$\mathbb{P}\left( n\,\widehat{E}_n(Y_1^n) \ge Rc/2 \right) \ge 1 - e^{1 - \frac{Rc}{4L^2}}.$$

In particular, $\mathbb{P}\left( n\,\widehat{E}_n(Y_1^n) \ge Rc/2 \mid S_Y = Rc \right) \approx 1$.

The same holds for $X$ as for $Y$! (The sample sum is a sufficient statistic.)

$$\mathbb{P}\left( n\,\widehat{E}_n(X_1^n) \ge Rc/2 \mid S_X = Rc \right) \approx 1.$$

So $\mathbb{P}\left( n\,\widehat{E}_n(X_1^n) \ge Rc/2 \right) \gtrsim \mathbb{P}(S_X = Rc) \approx e^{-c R \ln R}$.

On the other hand, the prob. should be $\approx e^{-R^2 c / L^2}$ by the sub-Gaussian estimation property.

Contradiction ($\Rightarrow\Leftarrow$) for $R$ large.
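The contradiction can be seen numerically by comparing exponents. In the sketch below, $S_X$ is Poisson with mean $c$ (a sum of $n$ independent Poisson variables with mean $c/n$), and we compare $\ln \mathbb{P}(S_X = Rc)$ with the sub-Gaussian exponent $-R^2 c/L^2$ from the slide. The values $c = 1$ and $L = 3$ and the function names are assumptions chosen for illustration.

```python
import math


def log_poisson_pmf(k: int, mean: float) -> float:
    """log P(Poisson(mean) = k), computed stably with lgamma."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def subgaussian_log_bound(R: float, c: float, L: float) -> float:
    """Exponent -R^2 c / L^2 that the sub-Gaussian property would require."""
    return -(R ** 2) * c / L ** 2


if __name__ == "__main__":
    c, L = 1.0, 3.0  # assumed illustrative values
    for R in (10, 30, 100, 1000):
        lower = log_poisson_pmf(int(R * c), c)   # log of the lower bound P(S_X = Rc)
        upper = subgaussian_log_bound(R, c, L)   # log of the required upper bound
        verdict = "contradiction" if lower > upper else "no contradiction yet"
        print(f"R={R:5d}  log P(S_X=Rc)={lower:10.1f}  -R^2 c/L^2={upper:12.1f}  {verdict}")
```

Since $R \ln R$ grows more slowly than $R^2$, the lower bound eventually exceeds the required upper bound, whatever the constant $L$.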


The positive result

For $\mathcal{P}_2^{[\sigma_1^2, \sigma_2^2]}$ (all distributions with variance $\sigma_P^2 \in [\sigma_1^2, \sigma_2^2]$) and $R := \sigma_2/\sigma_1$ bounded: for all large enough $n$ there exist $\widehat{E}_n : \mathbb{R}^n \to \mathbb{R}$, $\delta_{\min} \approx e^{-c\,n}$ and a constant $L$ such that, when $P \in \mathcal{P}_2^{[\sigma_1^2, \sigma_2^2]}$ and $X_1^n =_d P^{\otimes n}$,

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$


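Confidence intervals

Use median of means. Get a confidence interval:

$$\widehat{I}_{n,\delta}(X_1^n) := \left[ \widehat{E}_{n,\delta}(X_1^n) \pm L\,\sigma_2 \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right],$$

$$\mathbb{P}\left( \mu_P \in \widehat{I}_{n,\delta}(X_1^n) \ \text{and}\ |\widehat{I}_{n,\delta}(X_1^n)| \le 2LR\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \ge 1 - \delta.$$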


Confidence intervals

We'll combine sub-Gaussian confidence intervals to obtain a single sub-Gaussian estimator.

Similar in spirit to Lepskii’s adaptation method from nonparametric statistics.


Confidence intervals

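Lemma: Let $I_1, I_2, \dots, I_K$ be random nonempty closed intervals.

Assume $\mu \in \mathbb{R}$ and $\mathbb{P}(\mu \notin I_k) \le 2^{-k}$, $1 \le k \le K$.

Set $\widehat{K} := \min\{k \le K : \cap_{j=k}^{K} I_j \ne \emptyset\}$.

Let $\widehat{E}$ := midpoint of $\cap_{j=\widehat{K}}^{K} I_j$.

Then $\forall\, 1 \le k \le K$: $\mathbb{P}\left( |\widehat{E} - \mu| > |I_k| \right) \le 2^{1-k}$.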


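Proof sketch

$I_1, I_2, \dots, I_K$ random nonempty closed intervals. Set $\widehat{K} := \min\{k \le K : \cap_{j=k}^{K} I_j \ne \emptyset\}$. Let $\widehat{E}$ := midpoint of $\cap_{j=\widehat{K}}^{K} I_j$.

Assume $\forall j \ge k$, $\mu \in I_j$.

Obtain $\cap_{j=k}^{K} I_j \ne \emptyset$, so $\widehat{K} \le k$.

Hence $\widehat{E}, \mu \in I_k$ under the assumption.

$$\Rightarrow\ \mathbb{P}\left( |\widehat{E} - \mu| > |I_k| \right) \le \sum_{j \ge k} \mathbb{P}(\mu \notin I_j).$$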
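The estimator in the lemma can be sketched in Python as follows. This is a minimal illustration: the representation of intervals as `(lo, hi)` tuples ordered as $I_1, \dots, I_K$ and the name `combine_intervals` are assumptions. Because the intersections $\cap_{j=k}^{K} I_j$ shrink as $k$ decreases, it is enough to scan backwards from $I_K$ and stop just before the intersection becomes empty.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi] with lo <= hi


def combine_intervals(intervals: List[Interval]) -> float:
    """Midpoint of I_Khat ∩ ... ∩ I_K, with Khat the smallest k giving a nonempty intersection."""
    if not intervals:
        raise ValueError("need at least one interval")
    # Start from the last interval I_K, which is nonempty by assumption.
    lo, hi = intervals[-1]
    # Intersect with I_{K-1}, I_{K-2}, ... while the intersection stays nonempty.
    for a, b in reversed(intervals[:-1]):
        new_lo, new_hi = max(lo, a), min(hi, b)
        if new_lo > new_hi:
            break  # this interval would empty the intersection, so Khat is the next index
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # All three intervals intersect in [0.8, 1.0].
    print(combine_intervals([(0.0, 1.0), (0.5, 2.0), (0.8, 3.0)]))
    # The first interval is inconsistent with the others and is ignored.
    print(combine_intervals([(10.0, 11.0), (0.0, 1.0), (0.5, 2.0)]))
```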


Other uses

Recall the higher-moments example: $\mathcal{P}_{\alpha,\eta}$ := all distributions with $\mathbb{E}_P |X - \mu_P|^\alpha \le (\eta\,\sigma_P)^\alpha$ (here $\alpha \in (2, 3)$ is fixed, $\eta \ge \eta_0$ may depend on $n$).

Theorem: for all large enough $n$, if $k_{\alpha,\eta} := (C\,\eta)^{2\alpha/(\alpha-2)}$, there exist $\widehat{E}_n : \mathbb{R}^n \to \mathbb{R}$, $\delta_{\min} \approx e^{-c\,n/k_{\alpha,\eta}}$ and a constant $L$ such that, when $P \in \mathcal{P}_{\alpha,\eta}$ and $X_1^n =_d P^{\otimes n}$,

$$\forall \delta \in [\delta_{\min}, 1): \quad \mathbb{P}\left( |\widehat{E}_n(X_1^n) - \mu_P| > L\,\sigma_P \sqrt{\frac{1 + \ln(1/\delta)}{n}} \right) \le \delta.$$

Use quantiles of means (instead of medians of means) to build confidence intervals.

Berry-Esseen-type bounds prove that empirical means are nearly symmetric.


Different ideas - kurtosis

Under bounded kurtosis, can use the empirical mean of truncated random variables.

The truncation is data driven and uses preliminary estimates of mean and variance.

Use empirical processes to show this is similar to truncating at the exact mean and variance. Sharp bounds!


Open problems


Open problems

Sharp constants are essential for statisticians.

Are sub-Gaussian confidence intervals somehow equivalent to sub-Gaussian estimators?

Efficient extensions to vector-valued data and to risk minimization problems.

Optimal deviation bounds for Poissons, Bernoullis, etc.


Thank you! (references in the next slides)


Our preprint

Should be posted to the arXiv in some weeks. Available upon request from

roboliv AT gmail.com


Catoni’s work

Catoni’s estimation paper + companion paper on least squares (with Audibert).

J.-Y. Audibert & O. Catoni. "Robust linear least squares regression." Ann. Stat. 39 no. 5 (2011)

O. Catoni. "Challenging the empirical mean and empirical variance: A deviation study." Ann. Inst. H. Poincaré Probab. Statist. 48 no. 4 (2012)


Median of means

D. Hsu. http://www.inherentuncertainty.org/2010/12/robust-statistics.html (See also L. Levin. "Notes for Miscellaneous Lectures." arXiv:cs/0503039)

N. Alon, Y. Matias & M. Szegedy. "The Space Complexity of Approximating the Frequency Moments." J. Comput. Syst. Sci. 58 no. 1 (1999)

A. Nemirovski & D. Yudin. Problem complexity and method efficiency in optimization. Wiley (1983).


Some applications

C. Brownlees, E. Joly & G. Lugosi. "Empirical risk minimization for heavy-tailed losses." To appear in Ann. Stat.

S. Bubeck, N. Cesa-Bianchi & G. Lugosi. "Bandits with heavy tail." IEEE Transactions on Information Theory 59 no. 11 (2013)

D. Hsu & S. Sabato. "Loss minimization and parameter estimation with heavy tails." arXiv:1307.1827. Abstract in ICML proceedings (2014).