133
Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart

Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

RobustStatistics,Revisited

AnkurMoitra(MIT)

jointworkwithIlias Diakonikolas,JerryLi,Gautam Kamath,DanielKaneandAlistairStewart

Page 2: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

CLASSICPARAMETERESTIMATIONGivensamplesfromanunknowndistributioninsomeclass

e.g.a1-DGaussian

canweaccuratelyestimateitsparameters?

Page 3: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

CLASSICPARAMETERESTIMATIONGivensamplesfromanunknowndistributioninsomeclass

e.g.a1-DGaussian

canweaccuratelyestimateitsparameters? Yes!

Page 4: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

CLASSICPARAMETERESTIMATIONGivensamplesfromanunknowndistributioninsomeclass

e.g.a1-DGaussian

canweaccuratelyestimateitsparameters?

empiricalmean: empiricalvariance:

Yes!

Page 5: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Themaximumlikelihoodestimatorisasymptoticallyefficient(1910-1920)

R.A.Fisher

Page 6: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Themaximumlikelihoodestimatorisasymptoticallyefficient(1910-1920)

R.A.Fisher J.W.Tukey

Whatabouterrors inthemodelitself?(1960)

Page 7: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ROBUSTSTATISTICS

Whatestimatorsbehavewellinaneighborhood aroundthe model?

Page 8: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ROBUSTSTATISTICS

Whatestimatorsbehavewellinaneighborhood aroundthe model?

Let’sstudyasimpleone-dimensionalexample….

Page 9: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ROBUSTPARAMETERESTIMATIONGivencorrupted samplesfroma1-DGaussian:

canweaccuratelyestimateitsparameters?

=+idealmodel noise observedmodel

Page 10: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Howdoweconstrainthenoise?

Page 11: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Howdoweconstrainthenoise?

Equivalently:

L1-normofnoiseatmostO(ε)

Page 12: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Howdoweconstrainthenoise?

Equivalently:

L1-normofnoiseatmostO(ε) ArbitrarilycorruptO(ε)-fractionofsamples(inexpectation)

Page 13: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Howdoweconstrainthenoise?

Equivalently:

ThisgeneralizesHuber’sContaminationModel:Anadversarycanadd anε-fractionofsamples

L1-normofnoiseatmostO(ε) ArbitrarilycorruptO(ε)-fractionofsamples(inexpectation)

Page 14: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Howdoweconstrainthenoise?

Equivalently:

ThisgeneralizesHuber’sContaminationModel:Anadversarycanadd anε-fractionofsamples

L1-normofnoiseatmostO(ε) ArbitrarilycorruptO(ε)-fractionofsamples(inexpectation)

Outliers:Pointsadversaryhascorrupted,Inliers:Pointshehasn’t

Page 15: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Inwhatnormdowewanttheparameterstobeclose?

Page 16: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Inwhatnormdowewanttheparameterstobeclose?

Definition:Thetotalvariationdistancebetweentwodistributionswithpdfs f(x)andg(x)is

Page 17: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Inwhatnormdowewanttheparameterstobeclose?

FromtheboundontheL1-normofthenoise,wehave:

observedideal

Definition:Thetotalvariationdistancebetweentwodistributionswithpdfs f(x)andg(x)is

Page 18: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Inwhatnormdowewanttheparameterstobeclose?

Definition:Thetotalvariationdistancebetweentwodistributionswithpdfs f(x)andg(x)is

estimate ideal

Goal:Finda1-DGaussianthatsatisfies

Page 19: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Inwhatnormdowewanttheparameterstobeclose?

estimate observed

Definition:Thetotalvariationdistancebetweentwodistributionswithpdfs f(x)andg(x)is

Equivalently,finda1-DGaussianthatsatisfies

Page 20: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Dotheempiricalmeanandempiricalvariancework?

Page 21: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Dotheempiricalmeanandempiricalvariancework?

No!

Page 22: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Dotheempiricalmeanandempiricalvariancework?

No!

=+idealmodel noise observedmodel

Page 23: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Dotheempiricalmeanandempiricalvariancework?

No!

=+idealmodel noise observedmodel

Asinglecorruptedsamplecanarbitrarilycorrupttheestimates

Page 24: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Dotheempiricalmeanandempiricalvariancework?

No!

=+idealmodel noise observedmodel

Asinglecorruptedsamplecanarbitrarilycorrupttheestimates

Butthemedian andmedianabsolutedeviationdowork

Page 25: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Dotheempiricalmeanandempiricalvariancework?

No!

=+idealmodel noise observedmodel

Asinglecorruptedsamplecanarbitrarilycorrupttheestimates

Butthemedian andmedianabsolutedeviationdowork

Page 26: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Fact[Folklore]:Givensamplesfromadistributionthatareε-closeintotalvariationdistancetoa1-DGaussian

themedianandMADrecoverestimatesthatsatisfy

where

Page 27: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Fact[Folklore]:Givensamplesfromadistributionthatareε-closeintotalvariationdistancetoa1-DGaussian

themedianandMADrecoverestimatesthatsatisfy

where

Alsocalled(properly)agnosticallylearninga1-DGaussian

Page 28: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Fact[Folklore]:Givensamplesfromadistributionthatareε-closeintotalvariationdistancetoa1-DGaussian

themedianandMADrecoverestimatesthatsatisfy

where

Whataboutrobustestimationinhigh-dimensions?

Page 29: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Whataboutrobustestimationinhigh-dimensions?

e.g.microarrayswith10kgenes

Fact[Folklore]:Givensamplesfromadistributionthatareε-closeintotalvariationdistancetoa1-DGaussian

themedianandMADrecoverestimatesthatsatisfy

where

Page 30: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 31: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 32: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

MainProblem:Givensamplesfromadistributionthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

giveanefficientalgorithmtofindparametersthatsatisfy

Page 33: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

MainProblem:Givensamplesfromadistributionthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

giveanefficientalgorithmtofindparametersthatsatisfy

SpecialCases:

(1)Unknownmean

(2)Unknowncovariance

Page 34: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

UnknownMean

Page 35: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

Page 36: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε)

Page 37: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε) NP-Hard

Page 38: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε) NP-Hard

GeometricMedian

Page 39: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε) NP-Hard

GeometricMedian poly(d,N)

Page 40: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε) NP-Hard

GeometricMedian poly(d,N)O(ε√d)

Page 41: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε) NP-Hard

GeometricMedian poly(d,N)O(ε√d)

Tournament O(ε) NO(d)

Page 42: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian

UnknownMean

O(ε) NP-Hard

GeometricMedian poly(d,N)O(ε√d)

Tournament O(ε) NO(d)

O(ε√d)Pruning O(dN)

Page 43: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ACOMPENDIUMOFAPPROACHES

ErrorGuarantee

RunningTime

TukeyMedian O(ε) NP-Hard

GeometricMedian O(ε√d) poly(d,N)

Tournament O(ε) NO(d)

O(ε√d)Pruning O(dN)

UnknownMean

Page 44: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ThePriceofRobustness?

Allknownestimatorsarehardtocomputeorlosepolynomial factorsinthedimension

Page 45: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ThePriceofRobustness?

Allknownestimatorsarehardtocomputeorlosepolynomial factorsinthedimension

Equivalently:Computationallyefficientestimatorscanonlyhandle

fractionoferrorsandgetnon-trivial(TV<1)guarantees

Page 46: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ThePriceofRobustness?

Allknownestimatorsarehardtocomputeorlosepolynomial factorsinthedimension

Equivalently:Computationallyefficientestimatorscanonlyhandle

fractionoferrorsandgetnon-trivial(TV<1)guarantees

Page 47: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ThePriceofRobustness?

Allknownestimatorsarehardtocomputeorlosepolynomial factorsinthedimension

Equivalently:Computationallyefficientestimatorscanonlyhandle

fractionoferrorsandgetnon-trivial(TV<1)guarantees

Isrobustestimationalgorithmicallypossibleinhigh-dimensions?

Page 48: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 49: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 50: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURRESULTS

Theorem[Diakonikolas,Li,Kamath,Kane,Moitra,Stewart‘16]:Thereisanalgorithmwhengivensamplesfromadistributionthatisε-closeintotalvariationdistancetoad-dimensionalGaussianfindsparametersthatsatisfy

Robustestimationishigh-dimensionsisalgorithmicallypossible!

Moreoverthealgorithmrunsintimepoly(N,d)

Page 51: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURRESULTS

Theorem[Diakonikolas,Li,Kamath,Kane,Moitra,Stewart‘16]:Thereisanalgorithmwhengivensamplesfromadistributionthatisε-closeintotalvariationdistancetoad-dimensionalGaussianfindsparametersthatsatisfy

Robustestimationishigh-dimensionsisalgorithmicallypossible!

Moreoverthealgorithmrunsintimepoly(N,d)

Alternatively:CanapproximatetheTukeymedian,etc,ininterestingsemi-randommodels

Page 52: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Simultaneously[Lai,Rao,Vempala ‘16]gaveagnosticalgorithmsthatachieve:

andworkfornon-Gaussiandistributionstoo

Page 53: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Simultaneously[Lai,Rao,Vempala ‘16]gaveagnosticalgorithmsthatachieve:

andworkfornon-Gaussiandistributionstoo

Manyotherapplicationsacrossbothpapers:productdistributions,mixturesofsphericalGaussians,SVD,ICA

Page 54: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

AGENERALRECIPE

Robustestimationinhigh-dimensions:

� Step#1:Findanappropriateparameterdistance

� Step#2:Detectwhenthenaïveestimatorhasbeencompromised

� Step#3:Findgoodparameters,ormakeprogressFiltering:FastandpracticalConvexProgramming:Bettersamplecomplexity

Page 55: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

AGENERALRECIPE

Robustestimationinhigh-dimensions:

� Step#1:Findanappropriateparameterdistance

� Step#2:Detectwhenthenaïveestimatorhasbeencompromised

� Step#3:Findgoodparameters,ormakeprogressFiltering:FastandpracticalConvexProgramming:Bettersamplecomplexity

Let’sseehowthisworksforunknownmean…

Page 56: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 57: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 58: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

Page 59: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

ABasicFact:

(1)

Page 60: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

ABasicFact:

(1)

ThiscanbeprovenusingPinsker’s Inequality

andthewell-knownformulaforKL-divergencebetweenGaussians

Page 61: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

ABasicFact:

(1)

Page 62: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

ABasicFact:

(1)

Corollary:Ifourestimate(intheunknownmeancase)satisfies

then

Page 63: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

ABasicFact:

(1)

Corollary:Ifourestimate(intheunknownmeancase)satisfies

then

OurnewgoalistobecloseinEuclideandistance

Page 64: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 65: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 66: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

DETECTINGCORRUPTIONS

Step#2:Detectwhenthenaïveestimatorhasbeencompromised

Page 67: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

DETECTINGCORRUPTIONS

Step#2:Detectwhenthenaïveestimatorhasbeencompromised

=uncorrupted=corrupted

Page 68: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

DETECTINGCORRUPTIONS

Step#2:Detectwhenthenaïveestimatorhasbeencompromised

=uncorrupted=corrupted

Thereisadirectionoflarge(>1)variance

Page 69: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyLemma:IfX1,X2,…XN comefromadistributionthatisε-closetoandthenfor

(1) (2)

withprobabilityatleast1-δ

Page 70: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyLemma:IfX1,X2,…XN comefromadistributionthatisε-closetoandthenfor

(1) (2)

withprobabilityatleast1-δ

Take-away:Anadversaryneedstomessupthesecondmomentinordertocorruptthefirstmoment

Page 71: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 72: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 73: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

Page 74: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Page 75: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints:

v

wherevisthedirectionoflargestvariance

Page 76: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints:

v

wherevisthedirectionoflargestvariance,andThasaformula

Page 77: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints:

v

T

wherevisthedirectionoflargestvariance,andThasaformula

Page 78: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints

Page 79: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints

Ifwecontinuetoolong,we’dhavenocorruptedpointsleft!

Page 80: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints

Ifwecontinuetoolong,we’dhavenocorruptedpointsleft!

Eventuallywefind(certifiably)goodparameters

Page 81: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints

Ifwecontinuetoolong,we’dhavenocorruptedpointsleft!

Eventuallywefind(certifiably)goodparameters

RunningTime: SampleComplexity:

Page 82: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

OURALGORITHM(S)

Step#3:Eitherfindgoodparameters,orremovemanyoutliers

FilteringApproach:Supposethat:

Wecanthrowoutmorecorruptedthanuncorruptedpoints

Ifwecontinuetoolong,we’dhavenocorruptedpointsleft!

Eventuallywefind(certifiably)goodparameters

RunningTime: SampleComplexity:ConcentrationofLTFs

Page 83: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 84: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 85: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

AGENERALRECIPE

Robustestimationinhigh-dimensions:

� Step#1:Findanappropriateparameterdistance

� Step#2:Detectwhenthenaïveestimatorhasbeencompromised

� Step#3:Findgoodparameters,ormakeprogressFiltering:FastandpracticalConvexProgramming:Bettersamplecomplexity

Page 86: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

AGENERALRECIPE

Robustestimationinhigh-dimensions:

� Step#1:Findanappropriateparameterdistance

� Step#2:Detectwhenthenaïveestimatorhasbeencompromised

� Step#3:Findgoodparameters,ormakeprogressFiltering:FastandpracticalConvexProgramming:Bettersamplecomplexity

Howaboutforunknowncovariance?

Page 87: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

Page 88: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

AnotherBasicFact:

(2)

Page 89: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

AnotherBasicFact:

Again,provenusingPinsker’s Inequality

(2)

Page 90: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

AnotherBasicFact:

Again,provenusingPinsker’s Inequality

(2)

Ournewgoalistofindanestimatethatsatisfies:

Page 91: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PARAMETERDISTANCE

Step#1:FindanappropriateparameterdistanceforGaussians

AnotherBasicFact:

Again,provenusingPinsker’s Inequality

(2)

Ournewgoalistofindanestimatethatsatisfies:

Distanceseemsstrange,butit’stherightonetousetoboundTV

Page 92: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

UNKNOWNCOVARIANCE

Whatifwearegivensamplesfrom?

Page 93: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

UNKNOWNCOVARIANCE

Whatifwearegivensamplesfrom?

Howdowedetectifthenaïveestimatoriscompromised?

Page 94: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

UNKNOWNCOVARIANCE

Whatifwearegivensamplesfrom?

Howdowedetectifthenaïveestimatoriscompromised?

KeyFact:Let and

Thenrestrictedtoflattenings ofdxdsymmetricmatrices

Page 95: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

UNKNOWNCOVARIANCE

Whatifwearegivensamplesfrom?

Howdowedetectifthenaïveestimatoriscompromised?

KeyFact:Let and

Thenrestrictedtoflattenings ofdxdsymmetricmatrices

ProofusesIsserlis’s Theorem

Page 96: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

UNKNOWNCOVARIANCE

needtoprojectout

Whatifwearegivensamplesfrom?

Howdowedetectifthenaïveestimatoriscompromised?

KeyFact:Let and

Thenrestrictedtoflattenings ofdxdsymmetricmatrices

Page 97: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyIdea: Transformthedata,lookforrestrictedlargeeigenvalues

Page 98: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyIdea: Transformthedata,lookforrestrictedlargeeigenvalues

Page 99: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyIdea: Transformthedata,lookforrestrictedlargeeigenvalues

Ifwerethetruecovariance,wewouldhaveforinliers

Page 100: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyIdea: Transformthedata,lookforrestrictedlargeeigenvalues

Ifwerethetruecovariance,wewouldhaveforinliers,inwhichcase:

wouldhavesmallrestrictedeigenvalues

Page 101: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

KeyIdea: Transformthedata,lookforrestrictedlargeeigenvalues

Ifwerethetruecovariance,wewouldhaveforinliers,inwhichcase:

wouldhavesmallrestrictedeigenvalues

Take-away:Anadversaryneedstomessupthe(restricted)fourthmomentinordertocorruptthesecondmoment

Page 102: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ASSEMBLINGTHEALGORITHM

Givensamplesthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

Page 103: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ASSEMBLINGTHEALGORITHM

Givensamplesthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

Step#1:Doublingtrick

Page 104: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ASSEMBLINGTHEALGORITHM

Givensamplesthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

Step#1:Doublingtrick

Nowusealgorithmforunknowncovariance

Page 105: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ASSEMBLINGTHEALGORITHM

Givensamplesthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

Step#1:Doublingtrick

Nowusealgorithmforunknowncovariance

Step#2:(Agnostic)isotropicposition

Page 106: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ASSEMBLINGTHEALGORITHM

Givensamplesthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

Step#1:Doublingtrick

Nowusealgorithmforunknowncovariance

Step#2:(Agnostic)isotropicposition

rightdistance,ingeneralcase

Page 107: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

ASSEMBLINGTHEALGORITHM

Givensamplesthatareε-closeintotalvariationdistancetoad-dimensionalGaussian

Step#1:Doublingtrick

Nowusealgorithmforunknowncovariance

Step#2:(Agnostic)isotropicposition

Nowusealgorithmforunknownmeanrightdistance,ingeneralcase

Page 108: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 109: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

PartI:Introduction

� RobustEstimationinOne-dimension� Robustnessvs.HardnessinHigh-dimensions

� OurResults

PartII:AgnosticallyLearningaGaussian

� ParameterDistance� DetectingWhenanEstimatorisCompromised

� FilteringandConvexProgramming� UnknownCovariance

OUTLINE

PartIII:ExperimentsandExtensions

Page 110: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

FURTHERRESULTS

Userestrictedeigenvalueproblemstodetectoutliers

Page 111: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

FURTHERRESULTS

Userestrictedeigenvalueproblemstodetectoutliers

BinaryProductDistributions:

Page 112: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

FURTHERRESULTS

Userestrictedeigenvalueproblemstodetectoutliers

BinaryProductDistributions:

MixturesofTwoc-BalancedBinaryProductDistributions:

Page 113: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

FURTHERRESULTS

Userestrictedeigenvalueproblemstodetectoutliers

BinaryProductDistributions:

MixturesofTwoc-BalancedBinaryProductDistributions:

MixturesofkSphericalGaussians:

Page 114: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

SYNTHETICEXPERIMENTS

Errorratesonsyntheticdata(unknownmean):

+10%noise

Page 115: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

SYNTHETICEXPERIMENTS

Errorratesonsyntheticdata(unknownmean):

100 200 300 400

0

0.5

1

1.5

dimension

excess` 2

error

Filtering

LRVMean

Sample mean w/ noise

Pruning

RANSAC Geometric Median

100 200 300 400

0.04

0.06

0.08

0.1

0.12

0.14

dimension

excess` 2

error

Page 116: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

SYNTHETICEXPERIMENTS

Errorratesonsyntheticdata(unknowncovariance,isotropic):

+10%noise

closetoidentity

Page 117: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

SYNTHETICEXPERIMENTS

20 40 60 80 100

0

0.5

1

1.5

dimension

excess` 2

error

Filtering

LRVCov

Sample covariance w/ noise

Pruning

RANSAC

20 40 60 80 100

0

0.1

0.2

0.3

0.4

dimension

excess` 2

error

Errorratesonsyntheticdata(unknowncovariance,isotropic):

Page 118: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

SYNTHETICEXPERIMENTS

Errorratesonsyntheticdata(unknowncovariance,anisotropic):

+10%noise

farfromidentity

Page 119: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

SYNTHETICEXPERIMENTS

20 40 60 80 100

0

50

100

150

200

dimension

excess` 2

error

Filtering

LRVCov

Sample covariance w/ noise

Pruning

RANSAC

20 40 60 80 100

0

0.5

1

dimension

excess` 2

error

Errorratesonsyntheticdata(unknowncovariance,anisotropic):

Page 120: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Famousstudyof[Novembre etal.‘08]:TaketoptwosingularvectorsofpeoplexSNPmatrix(POPRES)

Page 121: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Famousstudyof[Novembre etal.‘08]:TaketoptwosingularvectorsofpeoplexSNPmatrix(POPRES)

-0.2

-0.1

0

0.1

0.2

0.3

-0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2

Original Data

Page 122: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Famousstudyof[Novembre etal.‘08]:TaketoptwosingularvectorsofpeoplexSNPmatrix(POPRES)

-0.2

-0.1

0

0.1

0.2

0.3

-0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2

Original Data

Page 123: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Famousstudyof[Novembre etal.‘08]:TaketoptwosingularvectorsofpeoplexSNPmatrix(POPRES)

-0.2

-0.1

0

0.1

0.2

0.3

-0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2

Original Data

“GenesMirrorGeographyinEurope”

Page 124: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Canwefindsuchpatternsinthepresenceofnoise?

Page 125: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Canwefindsuchpatternsinthepresenceofnoise?

-0.2 -0.1 0 0.1 0.2 0.3-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2Pruning Projection

10%noise

WhatPCAfinds

Page 126: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

REALDATAEXPERIMENTS

Canwefindsuchpatternsinthepresenceofnoise?

-0.2 -0.1 0 0.1 0.2 0.3-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2Pruning Projection

10%noise

WhatPCAfinds

Page 127: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

-0.2 -0.1 0 0.1 0.2 0.3

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2RANSAC Projection

REALDATAEXPERIMENTS

Canwefindsuchpatternsinthepresenceofnoise?

10%noise

WhatRANSACfinds

Page 128: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

-0.2

-0.1

0

0.1

0.2

0.3

-0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2

XCS Projection

REALDATAEXPERIMENTS

Canwefindsuchpatternsinthepresenceofnoise?

10%noise

WhatrobustPCA(viaSDPs)finds

Page 129: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

-0.2

-0.1

0

0.1

0.2

0.3

-0.15-0.1-0.0500.050.10.150.2

Filter Projection

REALDATAEXPERIMENTS

Canwefindsuchpatternsinthepresenceofnoise?

10%noise

Whatourmethodsfind

Page 130: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

-0.2

-0.1

0

0.1

0.2

0.3

-0.15-0.1-0.0500.050.10.150.2

Filter Projection

-0.2

-0.1

0

0.1

0.2

0.3

-0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2

Original Data

REALDATAEXPERIMENTS

10%noise

Whatourmethodsfind

nonoise

Thepowerofprovablyrobustestimation:

Page 131: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

LOOKINGFORWARD

CanalgorithmsforagnosticallylearningaGaussianhelpinexploratorydataanalysisinhigh-dimensions?

Page 132: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

LOOKINGFORWARD

CanalgorithmsforagnosticallylearningaGaussianhelpinexploratorydataanalysisinhigh-dimensions?

Isn’tthiswhatwewouldhavebeendoingwithrobuststatisticalestimators,ifwehadthemallalong?

Page 133: Robust Statistics, Revisited - People | MIT CSAILpeople.csail.mit.edu/moitra/docs/robust2.pdf · 2017. 3. 10. · Robust estimation is high-dimensions is algorithmically possible!

Thanks!AnyQuestions?

Summary:� Nearlyoptimalalgorithmforagnosticallylearningahigh-dimensionalGaussian

� Generalrecipeusingrestrictedeigenvalueproblems� Furtherapplicationstoothermixturemodels� Ispractical,robuststatisticswithinreach?