
Page 1: History and modern developments in Goodness of Fit tests - Caltech

The End of Chi-Squareds and a New Era in Goodness of Fit Tests

Examples: The End of Bayesianism

The End of Physics

Let us end chi-squareds too!!!

Ilya Narsky Caltech November 2003 1

Page 2: History and modern developments in Goodness of Fit tests - Caltech

History and modern developments in

Goodness of Fit tests


Page 3: History and modern developments in Goodness of Fit tests - Caltech

Statistics in HEP

• Signs of growing interest: 4 workshops in the last 3 yrs
  – 2001 CERN
  – 2001 Fermilab
  – 2002 Durham
  – 2003 SLAC (with astrophysicists)
• Bayesian vs Frequentist: Return of the Old Controversy
• No controversy => complementary
• Today's talk is entirely frequentist


Page 4: History and modern developments in Goodness of Fit tests - Caltech

• Plenty of statistical papers in physics journals and on the web
• Two magic words to look out for
  – "New"
    • We propose to use a certain method described in the statistics literature, perhaps with some straightforward modifications, for analysis of physics data. To our knowledge, no one has used this method in physics analysis before. Example: Feldman & Cousins
    • We cooked up something and never cared to read the statistics literature. The method is new because none of our collaborators has ever heard about it. Example: …I am trying to be nice today
  – "Obvious"
    • integration of the likelihood for extraction of upper limits => implicitly Bayesian with a uniform prior
    • using the likelihood value at the maximum likelihood estimator of the parameter to judge fit quality => basically nonsense; it usually says little about goodness of fit (yes, you can use it as a cross-check – just don't call it "goodness of fit")
• Two highly subjective and very extreme principles:
  – No physicist can invent a truly new statistical method
  – If a paper written by a physicist about statistical methods fails to quote the statistics literature, there is a substantial probability that this paper is garbage


Page 5: History and modern developments in Goodness of Fit tests - Caltech

Outline

• Hypothesis tests: basic definitions
• What is a goodness of fit (GOF) test?
• History of binned GOF tests
• Unbinned 1D GOF tests
• Unbinned multivariate GOF tests (PHYSTAT 2003 talks)
• What's next…


Page 6: History and modern developments in Goodness of Fit tests - Caltech

Notation

• f(x|θ): probability density function (PDF) under the null hypothesis
• f_n(x): empirical probability density function estimated from the data
• F(x|θ): cumulative distribution function (CDF) under the null hypothesis
• F_n(x): empirical CDF estimated from the data
• L(θ|x) ≡ f(x|θ): definition of the likelihood (nothing Bayesian about it)
• α_I: Type I error; α_II: Type II error

Page 7: History and modern developments in Goodness of Fit tests - Caltech

Hypothesis test

• Test T of H0 vs H1 on observable space X
• Accept H0 if x ∈ A; reject H0 if x ∈ R = X \ A
• Confidence Level = 1 − α_I = ∫_{x∈A} f(x|H0) dx
• Power = β = 1 − α_II = ∫_{x∈R} f(x|H1) dx
• Asymptotic properties and properties for finite samples
• Optimize power at fixed CL
  – Pitman relative efficiency of tests T1 and T2:
    ε_{21}(α_I, α_II) = n_1(α_I, α_II) / n_2(α_I, α_II), at fixed α_I in the limit α_II → α_II⁰ (0 ≤ α_II⁰ ≤ 1)
  – Bahadur relative efficiency of tests T1 and T2:
    ε_{21}(α_I, α_II) = n_1(α_I, α_II) / n_2(α_I, α_II), at fixed α_II in the limit α_I → α_I⁰ (0 ≤ α_I⁰ ≤ 1)

Page 8: History and modern developments in Goodness of Fit tests - Caltech

Ideal test: Uniformly Most Powerful Unbiased (UMPU)

• Most powerful: ∀T′ ≠ T: β_{T′}(H1) ≤ β_T(H1) at the same instance of H0
• Unbiased: ∀h0 ∈ H0 and ∀h1 ∈ H1: P(reject H0 | h0 is true) ≤ P(reject H0 | h1 is true)
• Neyman-Pearson Lemma
  – for the test H0: θ = θ0 vs H1: θ = θ1 with X ~ f(x|θ), the likelihood ratio test f(x|θ1)/f(x|θ0) > C is UMP
• True only for simple hypotheses
  – if testing H0: θ = θ0 vs H1: θ ≠ θ0, the LRT f(x|θ̂)/f(x|θ0) > C is not necessarily UMP
  – however, it has a nice asymptotic property:
    2 log L(θ̂|x) − 2 log L(θ0|x) ~ χ²_p for θ = (θ1, θ2, ..., θp)
• Usually, it is not clear how to find a UMP test

Page 9: History and modern developments in Goodness of Fit tests - Caltech

Locally powerful tests

• Often need a two-sided test: H0: θ = θ0 vs H1: θ ≠ θ0
• Taylor expansion:
  log L(θ|x) = log L(θ0|x) + (θ − θ0)ᵀ (d/dθ) log L(θ|x)|θ0 + (1/2)(θ − θ0)ᵀ (d²/dθ²) log L(θ|x)|θ0 (θ − θ0) + ...
• Score: U_i(θ) = ∂ log L(θ|x)/∂θ_i
• Information matrix: I_ij(θ) = −E[ ∂² log L(θ|x)/∂θ_i ∂θ_j ]
• Wald, 1943: W = (θ̂ − θ0)ᵀ I(θ̂) (θ̂ − θ0) ~ χ²_p
• Rao, 1948: R = Uᵀ(θ0) I⁻¹(θ0) U(θ0) ~ χ²_p

Page 10: History and modern developments in Goodness of Fit tests - Caltech

Quiz

X = { x_i = i/10 ; i = 0, 1, ..., 10 }, 0 ≤ x ≤ 1

Is this sample drawn from a uniform distribution on [0,1]?


Page 11: History and modern developments in Goodness of Fit tests - Caltech

What is Goodness of Fit?

• Suppose X = { x_i = i/10 ; i = 0, 1, ..., 10 }, 0 ≤ x ≤ 1
• Is this distribution uniform?
  – alternative = presence of peaks in the data => YES
  – alternative = highly structured (equidistant) data => NO
• example: measure elapsed time between two events in a Geiger counter and plot the intervals sequentially on a straight line
• What will GOF tests tell us?
  – binned chi-squared => YES
  – Kolmogorov-Smirnov => YES
  – distance to nearest neighbor => NO
    • rejects equidistant data
  – Anderson-Darling => NO
    • …but for an entirely different reason: sensitivity to tails

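The quiz can be checked numerically. Below is a minimal numpy sketch (mine, not from the talk): it computes the Kolmogorov-Smirnov distance of the equidistant sample to the uniform CDF, and the spread of nearest-neighbor spacings. KS sees nothing wrong, while the degenerate spacing distribution immediately flags the structure.

```python
import numpy as np

x = np.arange(11) / 10.0          # the quiz sample: 0, 0.1, ..., 1.0
n = len(x)
xs = np.sort(x)

# Kolmogorov-Smirnov distance to the uniform CDF F(x) = x,
# checking the empirical CDF just after and just before each jump
i = np.arange(1, n + 1)
ks = max((i / n - xs).max(), (xs - (i - 1) / n).max())

# Nearest-neighbor spacings: essentially identical for equidistant data,
# which is wildly improbable for a genuine uniform sample
spacings = np.diff(xs)
```

Here ks = 1/11 ≈ 0.09, far below any KS rejection threshold for n = 11, while the near-zero spread of the spacings is exactly the signature a nearest-neighbor test exploits.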

Page 12: History and modern developments in Goodness of Fit tests - Caltech

Formulation of the problem

• Test H0: the data are drawn from f(x|θ), vs an alternative H1

• Have to assume that we know more about H1 than “anything else”. Otherwise, there is no criterion for choosing one GOF test over another.

• Design a GOF test in any way you like…
• …but we need to know when it will work and when it won't => test the technique on several alternatives
• There is no ultimate GOF test


Page 13: History and modern developments in Goodness of Fit tests - Caltech

History of χ² tests

• Pearson, 1900:
  ∑_{i=1}^M (n_i − np_i)²/(np_i) ~ χ²_{M−1} as n → ∞, where p_i = ∫_{A_i} f(x|θ) dx
• Fisher, 1924
  – if θ = (θ1, θ2, ..., θp) is estimated from chi-squared minimization, then the statistic ~ χ²_{M−p−1}
• If θ is estimated by other means, the distribution may be different
• Chernoff & Lehmann, 1954
  – if θ is estimated by likelihood maximization:
    χ²_{M−p−1} ≤ χ²_{M−p−1} + ∑_{k=1}^p λ_k(θ) χ²_{1,k} ≤ χ²_{M−1}, with 0 ≤ λ_k(θ) < 1
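As a concrete illustration (not part of the original slides), here is a small numpy sketch of the Pearson statistic for binned data under a uniform null, with p_i and n_i as defined above; the sample size, binning, and seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 1000, 10
data = rng.uniform(size=n)

# bin the data and compare observed with expected counts under the uniform null
counts, _ = np.histogram(data, bins=M, range=(0.0, 1.0))
p = np.full(M, 1.0 / M)              # p_i = integral of f(x|theta) over bin A_i
chi2 = np.sum((counts - n * p) ** 2 / (n * p))
# under H0, chi2 is asymptotically chi-squared with M-1 degrees of freedom
# (M-p-1 if p parameters were first estimated by chi-squared minimization)
```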

Page 14: History and modern developments in Goodness of Fit tests - Caltech

Alternatives

• Log likelihood ratio: 2 ∑_{i=1}^M n_i log( n_i/(np_i) )
• Modified log likelihood ratio: 2 ∑_{i=1}^M np_i log( np_i/n_i )
• Neyman modified: ∑_{i=1}^M (n_i − np_i)²/n_i
• Freeman-Tukey: 4 ∑_{i=1}^M ( √n_i − √(np_i) )²

Cressie & Read, 1984: the power-divergence family

  2/(λ(λ+1)) ∑_{i=1}^M n_i [ (n_i/(np_i))^λ − 1 ]

  – λ = 0: log likelihood ratio
  – λ = −1: modified log likelihood ratio
  – λ = −2: Neyman modified
  – λ = −1/2: Freeman-Tukey
  – λ = 1: Pearson

all asymptotically ~ χ²_{M−1} (if params known in advance)
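The whole family can be computed from one formula. A hedged numpy sketch (the function name cressie_read is mine, and zero counts are assumed away for brevity); the λ = −2 and λ = −1/2 cases reduce to the Neyman modified and Freeman-Tukey forms because the expected counts sum to n.

```python
import numpy as np

def cressie_read(counts, p, lam):
    """Power-divergence statistic 2/(lam(lam+1)) * sum n_i [(n_i/(n p_i))^lam - 1];
    lam -> 0 and lam -> -1 are the log-likelihood-ratio limits.
    All counts are assumed nonzero in this sketch."""
    counts = np.asarray(counts, dtype=float)
    e = counts.sum() * np.asarray(p)           # expected counts n p_i
    if lam == 0:                               # log likelihood ratio
        return 2.0 * np.sum(counts * np.log(counts / e))
    if lam == -1:                              # modified log likelihood ratio
        return 2.0 * np.sum(e * np.log(e / counts))
    return 2.0 / (lam * (lam + 1.0)) * np.sum(counts * ((counts / e) ** lam - 1.0))

counts = [18, 22, 21, 19, 20]
p = np.full(5, 0.2)
pearson = cressie_read(counts, p, 1.0)     # identical to Pearson's chi-squared
neyman = cressie_read(counts, p, -2.0)     # identical to the Neyman modified form
ft = cressie_read(counts, p, -0.5)         # identical to Freeman-Tukey
```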

Page 15: History and modern developments in Goodness of Fit tests - Caltech

Cressie & Read's conclusions

• moments converge fastest to the asymptotic χ² moments for 0.3 ≤ λ ≤ 2.7
• when no knowledge of the alternative is available, use 0 ≤ λ ≤ 1.5
• when the alternative is peaked, use λ = 1
• when the alternative is dipped, use λ = 0
• use λ = 2/3 as an "excellent compromise" for np_i ≥ 1 and n ≥ 10

Among the popular chi-squared statistics, this conclusion favors the original Pearson test (λ = 1) in the sense of Pitman efficiency.

Page 16: History and modern developments in Goodness of Fit tests - Caltech

General quadratic forms

• GOF measure = Vᵀ Q V, with V_i = (n_i − np_i)/√(np_i), i = 1, ..., M, and p_i = ∫_{A_i} f(x|θ) dx
• For the Pearson test: Q = I_{M×M}
• Rao & Robson, 1974: for θ = (θ1, θ2, ..., θp),
  Q = I_{M×M} + B_{M×p} ( J_{p×p} − Bᵀ_{p×M} B_{M×p} )⁻¹ Bᵀ_{p×M}
  with J_ij = −E[ ∂² log L(θ|x)/∂θ_i ∂θ_j ] and B_ij = (1/√p_i) ∂p_i(θ)/∂θ_j;
  then Vᵀ(θ̂) Q(θ̂) V(θ̂) ~ χ²_{M−1} as n → ∞

Page 17: History and modern developments in Goodness of Fit tests - Caltech

Who cares about binned tests when you can use an unbinned one?

• Unbinned tests are believed to be more powerful, even for large n
• Empirical CDF: F_n(x) = #( x_i ≤ x )/n
• Univariate tests:
  – general
    • Kolmogorov-Smirnov: sup_X √n |F_n(x) − F(x)|
    • Cramer-von Mises family: n ∫_X [F_n(x) − F(x)]² ψ(x) f(x) dx
      – Cramer-von Mises test: ψ(x) = 1
      – Anderson-Darling: ψ(x) = 1/[ F(x)(1 − F(x)) ]
      – Watson: n ∫_X [ F_n(x) − F(x) − ∫_X (F_n − F) f dx ]² f(x) dx
  – specialized
    • uniformity
    • exponentiality
    • normality
• Well studied and documented, e.g., the book by D'Agostino & Stephens
• Multivariate tests?
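For a U(0,1) null, F(x) = x and the three main EDF statistics reduce to standard order-statistic computational forms. A numpy sketch (illustrative; sample size and seed are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=200)           # sample to be tested against U(0,1)
n = len(x)
u = np.sort(x)                      # u_i = F(x_(i)) since F(x) = x here
i = np.arange(1, n + 1)

# Kolmogorov-Smirnov: sup |F_n - F| (without the sqrt(n) factor)
ks = max((i / n - u).max(), (u - (i - 1) / n).max())
# Cramer-von Mises (psi = 1), standard computational form
cvm = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
# Anderson-Darling (psi = 1/[F(1-F)]), more weight on the tails
ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1.0 - u[::-1])))
```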

Page 18: History and modern developments in Goodness of Fit tests - Caltech

Power of univariate EDF tests

• “EDF statistics are usually much more powerful than the Pearson chi-square statistics”

• KS is the best known but often less powerful than CvM and AD

• Watson statistic is powerful for detection of clustering of F-values at one point

• AD is similar to CvM but is more sensitive to the tails

as per M. Stephens, co-author of “Goodness-of-Fit Techniques”


Page 19: History and modern developments in Goodness of Fit tests - Caltech

Neyman smooth tests

• The idea goes back to Neyman, 1937
  – define the alternative to f(x) as g(x|θ) = C(θ) exp[ ∑_{i=1}^K θ_i h_i(x) ] f(x)
  – now test H0: θ = 0 vs H1: θ ≠ 0
• A set of orthonormal functions h_i(x) appropriate for the null hypothesis:
  – uniform => Legendre polynomials
  – normal => Hermite-Chebyshev polynomials
  – exponential => Laguerre polynomials
  – Poisson => Poisson-Charlier polynomials
  – etc
• GOF statistic derived from the Rao score statistic:
  (1/n) ∑_{i=1}^K [ ∑_{j=1}^n h_i(x_j) ]²
  – more complicated with nuisance parameters

Rayner & Best, “Smooth Tests of Goodness of Fit”, 1989
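For the uniform null the h_i are normalized Legendre polynomials, h_i(x) = √(2i+1) P_i(2x−1). A sketch (the function name, seeds, and test samples are mine; the 1/n normalization follows the statistic as written above):

```python
import numpy as np
from numpy.polynomial import legendre

def neyman_smooth(x, K):
    """Psi_K^2 = (1/n) sum_{i=1..K} [ sum_j h_i(x_j) ]^2, with h_i the
    orthonormal Legendre polynomials h_i(x) = sqrt(2i+1) P_i(2x-1) on [0,1];
    ~ chi2_K under the uniform null (no nuisance parameters)."""
    t = 2.0 * np.asarray(x) - 1.0             # map [0,1] onto [-1,1]
    stat = 0.0
    for i in range(1, K + 1):
        c = np.zeros(i + 1)
        c[i] = 1.0                            # coefficient vector selecting P_i
        h = np.sqrt(2 * i + 1) * legendre.legval(t, c)
        stat += h.sum() ** 2
    return stat / len(x)

rng = np.random.default_rng(2)
flat = neyman_smooth(rng.uniform(size=500), K=3)              # ~ chi2_3 under H0
peaked = neyman_smooth(rng.normal(0.5, 0.05, size=500) % 1.0, K=3)
```

A strongly peaked sample drives the statistic to values far beyond anything a χ²_3 fluctuation could produce.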

Page 20: History and modern developments in Goodness of Fit tests - Caltech

• Most difficult question: how to choose K?
  – in practice, usually choose K = 2, 3, 4
  – Teresa Ledwina, 1994-1996: data-driven smooth tests
    • choose the max number of dimensions M large enough
    • define parameter subsets Ω_K = { θ : θ_i = 0, i = K+1, K+2, ..., M }
    • likelihood under the smooth alternative: log L(θ) = ∑_{j=1}^n log g(x_j|θ)
    • information number: L_K = sup_{θ∈Ω_K} log L(θ) − (1/2) K log n
    • Schwarz's Bayesian information criterion: choose the K that gives the maximal information number
  – This method generally improves the power of Neyman smooth tests for a broad range of alternatives, in some cases dramatically

Page 21: History and modern developments in Goodness of Fit tests - Caltech

Multivariate fits

• Typical CLEO/BaBar/Belle analysis:
  – estimate parameters from a maximum likelihood fit: L_0 = L(θ̂ | x_DATA)
  – generate toy MC assuming θ = θ̂
  – derive GOF from L_0 and the distribution of L(θ̂ | x_TOY)
• What is wrong with this procedure?
  – it often does not say anything useful about consistency of model and data
  – …understandably so, because the distribution of likelihood values is not parameter-independent
• Remember chi-squared tests? ∑_{i=1}^M (n_i − np_i)²/(np_i) ~ χ²_{M−p−1} is parameter-free
• Location family f(x − μ): the distribution of L(μ̂|x) does not depend on μ => parameter-free
• Scale family (1/σ) f(x/σ): −2 log L = 2n log σ − 2 ∑ log f(x_i/σ) => not parameter-free

Page 22: History and modern developments in Goodness of Fit tests - Caltech

• A simple example: f(t|τ) = (1/τ) exp(−t/τ)

  τ̂ = (1/n) ∑_{i=1}^n t_i  =>  −2 log L(τ̂) = 2n ( log τ̂ + 1 )

• Toy MC generated with τ = τ̂ will give a perfect GOF value
• Not surprising, since ρ( L(τ̂), τ̂ ) = 100%: it is not possible to extract new information from the distribution of the likelihood.

Correlation between the GOF statistic and the parameter estimator must be small!!!
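The collapse of −2 log L(τ̂) to a function of τ̂ alone is easy to verify numerically (a sketch of mine, not from the talk; sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau_true = 200, 2.5
t = rng.exponential(tau_true, size=n)

tau_hat = t.mean()                                     # ML estimator of tau
m2logL = 2.0 * np.sum(np.log(tau_hat) + t / tau_hat)   # -2 log L at tau = tau_hat
collapsed = 2.0 * n * (np.log(tau_hat) + 1.0)          # the closed form above
```

Since −2 log L(τ̂) is a deterministic function of τ̂, toy MC generated with τ = τ̂ reproduces its distribution trivially, and the comparison carries no goodness-of-fit information.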

Page 23: History and modern developments in Goodness of Fit tests - Caltech

What other multivariate methods are there?

• Kolmogorov-Smirnov
  – Saunders & Laud, 1980
  – Justel, Peña & Zamar, 1996
  – Powers???
• Cramer-von Mises
  – Powers???
• Multivariate empirical CDF:
  F_n(x) = #( x_i^(1) < x^(1), x_i^(2) < x^(2), ..., x_i^(m) < x^(m) )/n
  – sup √n |F_n(x) − F(x)| is not distribution-free
  – CvM analog: n ∫_X [F_n(x) − F(x)]² ψ(x) f(x) dx
• Problem: the multivariate empirical CDF is not distribution-free => enhanced sensitivity to some parts of the observable space, reduced sensitivity to others
• My gut feeling: don't use the empirical CDF for multivariate GOF
• Tests of multivariate normality (bivariate mostly)
  – Mardia's, Shapiro-Wilk's, bivariate Neyman, etc
  – relatively well studied

Page 24: History and modern developments in Goodness of Fit tests - Caltech

GOF talks at PHYSTAT 2003

• Zech
  – comparison of 2 multivariate samples using the "energy test"
• Yabsley
  – applied Zech's "energy test" to 2D and 1D fits in a Belle analysis
• Kinoshita
  – applied the von Mises test of uniformity on a circle to 1D data
• Raja
  – likelihood ratio test using the density estimated from data and the density specified by the model
• Bonvicini
  – "Generalized chi square from extended likelihood moments" (transparencies not available)
• Narsky
  – multivariate GOF using distances to nearest neighbors
• Friedman (discussion panel)
  – GOF based on decision trees


Page 25: History and modern developments in Goodness of Fit tests - Caltech

Two-sample comparison using the energy function

• Aslan & Zech
  – hep-ex/0203010, "A new class of binning-free, multivariate goodness-of-fit tests: the energy tests"
  – math.PR/0309164, "A new test for the multivariate two-sample problem based on the concept of minimum energy"
  – N events in sample X and M events in sample Y
  – φ = ∫∫ dx dy [f(x) − f_0(x)] [f(y) − f_0(y)] R(x, y)
  – φ_{N,M} = (1/N²) ∑_{i<j} R(x_i, x_j) − (1/(NM)) ∑_{i,j} R(x_i, y_j) + (1/M²) ∑_{i<j} R(y_i, y_j)
  – include or not include variability within the MC sample? (I vote "yes")
  – use the Euclidean distance for R: R(x, y) = R(|x − y|)
  – if MC generation is expensive, use bootstrap
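A minimal numpy sketch of the two-sample energy statistic with Aslan & Zech's favored R(x, y) = −log|x − y| (normalization conventions as written above; samples and seed are mine, and a permutation or bootstrap step would still be needed to turn φ into a p-value):

```python
import numpy as np

def pairwise(a, b):
    """Euclidean distances between all rows of a and all rows of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

def energy_statistic(x, y):
    """phi_{N,M} with R(x, y) = -log|x - y|."""
    R = lambda r: -np.log(r)
    N, M = len(x), len(y)
    dxx, dyy, dxy = pairwise(x, x), pairwise(y, y), pairwise(x, y)
    ix, iy = np.triu_indices(N, k=1), np.triu_indices(M, k=1)  # i < j pairs
    return (R(dxx[ix]).sum() / N**2
            + R(dyy[iy]).sum() / M**2
            - R(dxy).sum() / (N * M))

rng = np.random.default_rng(4)
x = rng.normal(size=(100, 2))
same = energy_statistic(x, rng.normal(size=(100, 2)))            # same parent
diff = energy_statistic(x, rng.normal(2.0, 1.0, size=(100, 2)))  # shifted parent
```

A larger φ signals worse agreement: the shifted pair gives a visibly larger statistic than the matched pair.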

Page 26: History and modern developments in Goodness of Fit tests - Caltech

• Several candidates for R(r):

• Of course, only 1/r comes from electrostatic energy; others have purely statistical motivation

• Aslan & Zech did quite a good job in comparison of test powers

• Their favorite choice is R(x, y) = −log|x − y|


Page 27: History and modern developments in Goodness of Fit tests - Caltech

• Bivariate normality
• 1D test of uniformity


Page 28: History and modern developments in Goodness of Fit tests - Caltech

• More tests in 2D and 4D (Zech’s talk at PHYSTAT 2003)

• Excellent performance of the “energy” test up to 4D!!!


Page 29: History and modern developments in Goodness of Fit tests - Caltech

No physicist can say anything truly new about statistics

• Research: 0.5 hr of my time plus the Google engine
  – Cuadras & Fortiana (and occasionally co-authors), 1995, 1997, 2003
  – squared distance between samples X and Y
  – variability of X with respect to a measure of dissimilarity d(x, x′)
  – Jensen difference and Rao's quadratic entropy (Rao, 1982)
• Remember Zech's formula?

  φ = ∫∫ dx dy [f(x) − f_0(x)] [f(y) − f_0(y)] R(x, y)

Page 30: History and modern developments in Goodness of Fit tests - Caltech

• Difference between physics and math:
  – physics: "A new class of binning-free, multivariate goodness-of-fit tests: the energy tests"
  – math: "Comparison of two multivariate samples using a logarithmic measure of point-to-point dissimilarity"
• Is Zech's method new? Of course, it is!!! Cuadras & Fortiana never attempted to use log|x − y| as a measure of dissimilarity
• Aslan & Zech's paper was certainly useful because Cuadras & Fortiana did not study the power properties of this method extensively
• Lesson: use Google for everything


Page 31: History and modern developments in Goodness of Fit tests - Caltech

Two-sample comparison using nearest neighbor counts

• Friedman & Steppel, 1974; Schilling, 1986; Cuzick & Edwards, 1990
  – mix the two samples together and count nearest neighbors that belong to the same sample:
    X = (x_1, x_2, ..., x_N), Y = (y_1, y_2, ..., y_M)
    Z_{N+M} = (z_i): z_i = x_i for 1 ≤ i ≤ N; z_{N+i} = y_i for 1 ≤ i ≤ M
  – GOF statistic based on
    T_K = 1/(K(N+M)) ∑_{i=1}^{N+M} ∑_{k=1}^K I_k(i), where
    I_k(i) = 1 if the k-th NN of point i belongs to the same sample, 0 otherwise
• Friedman & Rafsky, 1979
  – minimal spanning tree (straight lines, no loops, min length)
  – GOF statistic = number of connections between the samples
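The Schilling-type count statistic T_K is a few lines of numpy (a sketch; brute-force O(n²) distances, no tie handling, and the samples and seed are mine):

```python
import numpy as np

def nn_same_sample_fraction(x, y, K=3):
    """T_K = 1/(K(N+M)) sum_i sum_{k<=K} I(k-th NN of z_i is from the same sample)."""
    z = np.vstack([x, y])
    lab = np.r_[np.zeros(len(x)), np.ones(len(y))]   # sample labels
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :K]               # indices of the K nearest neighbors
    return (lab[knn] == lab[:, None]).mean()

rng = np.random.default_rng(8)
x = rng.normal(size=(150, 2))
t_same = nn_same_sample_fraction(x, rng.normal(size=(150, 2)))
t_diff = nn_same_sample_fraction(x, rng.normal(3.0, 1.0, size=(150, 2)))
```

For identically distributed samples of equal size, T_K sits near 1/2; well-separated samples push it toward 1.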

Page 32: History and modern developments in Goodness of Fit tests - Caltech

Raja, “The End of Bayesianism”

• Claims that
  – “With unbinned data, currently, the fitted parameters are obtained but no measure of goodness of fit is available.” – paper submitted to Elsevier
  – “Prior to my paper, the problem of goodness of fit in unbinned likelihoods was an unsolved one.” – private communications
• This work lacks a study of power functions, comparison with other unbinned GOF methods, and realistic examples from HEP analysis. Hopefully, it will be extended in the future.
• Motivated his method through a Bayesian approach while, in my opinion, there is nothing Bayesian about it.


Page 33: History and modern developments in Goodness of Fit tests - Caltech

• Basic idea:
  – probability density estimator: f_n(x) = (1/n) ∑_{i=1}^n K(x, x_i)
  – then compare f_n(x) and f(x)
  – use ∫_X [f_n(x) − f(x)]² dx
  – beware of bias: E_{H0}[ f_n(x) ] ≠ f(x), so use ∫_X [f_n(x) − E_{H0} f_n(x)]² dx instead
  – Gaussian kernel: K(x, x_i) = 1/(√(2π) hσ) exp[ −(x − x_i)²/(2h²σ²) ], h = h(f, n)
• Bickel & Rosenblatt, 1973; Bowman, 1989; Gourieroux & Tenreiro, 1996
• Bowman, 1989: "Density based tests for goodness of fit"
  – L_2 = ∫_X [f_n(x) − E_{H0} f_n(x)]² ψ(x) dx
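A sketch of the density-based idea for an N(0,1) null with a Gaussian kernel: E_{H0}[f_n] is then the kernel-smoothed null, i.e. the N(0, 1+h²) density, and the statistic is the integrated squared difference. Bandwidth, grid, and seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=300)                  # sample tested against N(0,1)
h = 0.4                                   # fixed bandwidth for this sketch

grid = np.linspace(-5.0, 5.0, 1001)
dx = grid[1] - grid[0]

# kernel density estimate f_n with a Gaussian kernel
fn = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1) \
     / (len(x) * h * np.sqrt(2.0 * np.pi))

# under the N(0,1) null, E_H0[f_n] is the N(0, 1 + h^2) density
s2 = 1.0 + h * h
e_fn = np.exp(-0.5 * grid ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)

# Bowman-style statistic: integrated squared difference (Riemann sum, psi = 1)
ise = ((fn - e_fn) ** 2).sum() * dx
```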

Page 34: History and modern developments in Goodness of Fit tests - Caltech

• Bowman compared the performance of the new statistic with
  – Vasicek, CvM, AD, and Shapiro-Wilk tests of normality
  – Watson test of the Cramer-von Mises distribution
• Conclusion: performs better for some alternatives, worse for others

So is Raja's method new? Of course, it is!!!
  – Raja: likelihood ratio test, −2 ∑_{i=1}^n log[ f(x_i)/f_n(x_i) ]
  – previous authors: integrated squared error, ∫_X [f_n(x) − E_{H0} f_n(x)]² dx


Page 35: History and modern developments in Goodness of Fit tests - Caltech

Distance to nearest neighbor

• Review by Dixon (Iowa State)

– www.stat.iastate.edu/preprint/articles/2001-19.pdf

• Clark & Evans, 1954, 1979 (cited 1000+ on Web of Science)
  – GOF statistic = average distance between nearest neighbors within a population
  – Used to test 2D populations of various plants for uniformity
  – Later generalized this approach to an arbitrary number of dimensions

• Diggle, 1979
  – Build an entire distribution of ordered distances to nearest neighbors and apply a KS or CvM test to this distribution
  – Not a true p.d.f., of course, because NN distances are obviously interdependent


Page 36: History and modern developments in Goodness of Fit tests - Caltech

• Ripley, 1976-1977 (cited 600+ on WoS)
  – K(t) = (1/λ) E[ number of points within distance t of an arbitrary point of the process ]
  – λ = average intensity of the process, estimated by λ̂ = N/V
  – for a uniform process on a plane, K(t) = πt²
  – plot expected and observed K vs t and estimate GOF from the maximal distance between the two K's
  – Seems to be a popular method in ecology
• Bickel & Breiman, 1983; Schilling, 1983
  – GOF based on the distribution of Z_i = exp[ −n f(x_i) V(x_i) ]
    = Poisson probability of observing one point in a sphere V centered at this point
  – form a full distribution of ordered Z_i's and compare with the expected one
  – studied performance for multivariate normal densities
• SLEUTH at D0: search for new physics
  – instead of spheres, use Voronoi regions (Voronoi region = region of space that is closer to this point than to any other point)
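Ripley's K̂ for a uniform planar process can be estimated directly (a sketch; edge corrections, which matter for larger t, are ignored here, and the point count and seed are mine):

```python
import numpy as np

rng = np.random.default_rng(6)
N, V = 500, 1.0                          # N points on the unit square, area V
pts = rng.uniform(size=(N, 2))

d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
np.fill_diagonal(d, np.inf)              # exclude self-pairs

t = 0.05                                 # small t keeps the ignored edge effects mild
# lambda_hat = N/V, so K_hat = (1/(lambda_hat N)) * total count of neighbors within t
K_hat = (V / N**2) * (d <= t).sum()
K_uniform = np.pi * t**2                 # expected K(t) for a uniform planar process
```

Comparing K̂(t) with πt² over a range of t values (e.g. via their maximal distance, as above) gives the GOF measure.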

Page 37: History and modern developments in Goodness of Fit tests - Caltech

Test of Uniformity of Distances between centers of Spanish towns at CL=97%

Test of Uniformity of Positions of Redwood seedlings at CL>95%


Page 38: History and modern developments in Goodness of Fit tests - Caltech

Narsky, physics/0306171

• Use a bivariate distribution of maximal vs minimal distance to K nearest neighbors
• Ideal for detection of well-localized irregularities, e.g., unexpected peaks in the data
• Powers for bivariate normals and a uniform pdf
  – compared only with KS
• Example: How consistent are the B → K(*) l+ l− data with the background pdf?

CL vs cluster size for the Kll analysis. Clusters that give the max deviation from the background pdf are shown with open circles.

Page 39: History and modern developments in Goodness of Fit tests - Caltech

How to choose the number of nearest neighbors?

• Schilling:


Page 40: History and modern developments in Goodness of Fit tests - Caltech

GOF based on a decision tree

• As sketched by Friedman at the Panel Discussion
  – my free interpretation (because transparencies are not available)
• Standard decision tree and CART algorithm
  – events with coords x_i and features d_i; t = a subset (node) of the decision tree T
  – node mean: d(t) = (1/n(t)) ∑_{x_i∈t} d_i
  – error per subset: e(t) = (1/n(t)) ∑_{x_i∈t} ( d_i − d(t) )²
  – error per tree: e(T) = ∑_{t∈T} e(t)
  – the optimal set of tree splits is the one that minimizes the overall error e(T)
  – hence, e(T) can be a measure of GOF

Page 41: History and modern developments in Goodness of Fit tests - Caltech

Transform or not transform?

• Generally, all multivariate methods described above are applicable to "raw" distributions
• But if you want a GOF test to be equally sensitive to all regions of the observable space, you need to perform the test in the flattened space (aka the unit hypercube)
• Transformation to uniformity, of course, is not unique
  – transformation to the m-dimensional unit cube using marginal and conditional CDFs:
    u_1 = F(x_1); u_i = F(x_i | x_1, x_2, ..., x_{i−1}), 2 ≤ i ≤ m
  – point-to-point distances are not invariant under relabeling of components, rotation, etc
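For a case where the conditional CDFs are known in closed form, e.g., a bivariate normal with correlation ρ, the flattening transformation is explicit (a sketch of mine; Φ is the standard normal CDF):

```python
import numpy as np
from math import erf

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(z) / np.sqrt(2.0)))

rng = np.random.default_rng(7)
rho = 0.8
x1 = rng.normal(size=1000)
x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=1000)

# transformation to the unit square via marginal and conditional CDFs:
u1 = Phi(x1)                                        # u1 = F(x1)
u2 = Phi((x2 - rho * x1) / np.sqrt(1.0 - rho**2))   # u2 = F(x2 | x1)

raw_corr = np.corrcoef(x1, x2)[0, 1]    # strong correlation in the raw space
flat_corr = np.corrcoef(u1, u2)[0, 1]   # ~ 0 after flattening
```

Under the correct model, (u1, u2) are independent U(0,1); note that a different ordering of the components (x2 first, then x1) gives a different map, which is the non-uniqueness mentioned above.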

Page 42: History and modern developments in Goodness of Fit tests - Caltech

Summary

• Physicists are smart: they can independently re-invent statistical methods invented decades ago…
• …and in doing so they attract the attention of the community to the application of statistical methods in HEP practice.
• Joy of rediscovery aside, we need to admit that we are not as statistically advanced as other communities (e.g., ecology, medical research, finance). A better idea might be to study the statistics literature instead of re-inventing it.

• It would be very useful to study powers of the mentioned GOF tests on realistic HEP problems.
