Page 1: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Statistical Flukes, the Higgs Discovery, and 5 Sigma

Deborah G. Mayo

Virginia Tech

(I) “5 sigma observed effect”.

One of the biggest science events of 2012-13 was the announcement on July 4, 2012 of evidence for the discovery of a Higgs particle based on a “5 sigma observed effect”. With the March 2013 data analysis, the 5 sigma difference grew to 7 sigmas.

Page 2: Statistical Flukes, the Higgs Discovery, and 5 Sigma


• Because the 5 sigma report refers to frequentist statistical tests, the discovery was immediately imbued with controversies from philosophy of statistics

• I’m an outsider to high energy physics (HEP), but (aside from finding it fascinating) any philosopher of statistics worth her salt should be able to illuminate some of the more public controversies, e.g., over P-values.

Not difficult to do, fortunately.

Page 3: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(II) Bad Science? (O’Hagan, prompted by Lindley)

To the ISBA: “Dear Bayesians: We’ve heard a lot about the Higgs boson. ...Specifically, the news referred to a confidence interval with 5-sigma limits.… Five standard deviations, assuming normality, means a p-value of around 0.0000005… Why such an extreme evidence requirement? We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. … Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”

Page 4: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Not bad science at all!

• HEP physicists are sophisticated with their statistical methodology: they’d seen too many bumps disappear.

• They want to ensure, before announcing the hypothesis H*: “a new particle has been discovered,” that H* has been given a severe run for its money.

Significance tests and cognate methods (confidence intervals) are the methods of choice here, for good reason.

Page 5: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(III) Simple statistical significance test: ingredients

(i) Null or test hypothesis: stated in terms of an unknown parameter μ in a statistical model, an idealized representation of the underlying data generation (here, a model of the detector).

μ is the “global signal strength” parameter.

H0: μ = 0, i.e., zero signal (background-only hypothesis)

H0: μ = 0 vs. H1: μ > 0

μ = 1: Standard Model (SM) Higgs boson signal in addition to the background

Page 6: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Empirical data are modeled as observed values of a sample X (a random variable); here, numbers of events of a given type.

(ii) Test statistic or distance statistic d(X): the larger its value, the more inconsistent the data are with H0, in the direction of alternatives or discrepancies of interest.

d(X): how many excess events of a given type are observed (from trillions of collisions), in comparison to what would be expected from background alone (in the form of bumps).

d(X) has a known probability distribution under H0 (and under various alternatives).

Page 7: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(iii) The P-value (or significance level) associated with d(x0) is the probability of a difference as large as or larger than d(x0), under the assumption that H0 is true:

P-value = Pr(d(X) > d(x0); H0)

If the P-value is sufficiently small (e.g., .05, .01, .001), d(x0) is said to be statistically significant (or significant at the level reached). d(X) can be given in terms of standard deviation units, or sigma units.

Page 8: Statistical Flukes, the Higgs Discovery, and 5 Sigma


The distribution of the statistic d(X) is its sampling distribution.

Pr(d(X) > 1; H0) = .16
Pr(d(X) > 2; H0) = .02
Pr(d(X) > 3; H0) = .001
Pr(d(X) > 4; H0) = .00003
Pr(d(X) > 5; H0) = .0000003

The probability of observing results as extreme as or more extreme than 5 sigma, under H0, is approximately 1 in 3,500,000.
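These tail areas are (rounded) one-sided standard-normal probabilities. A minimal sketch, assuming d(X) is standard normal under H0 as the table does, that reproduces them:

```python
from scipy.stats import norm

# One-sided tail probabilities Pr(d(X) > k; H0), assuming d(X) ~ N(0, 1) under H0.
for k in range(1, 6):
    print(f"Pr(d(X) > {k}; H0) = {norm.sf(k):.3g}")

# The 5 sigma tail area expressed as "1 in N":
p5 = norm.sf(5)
print(f"5 sigma: p = {p5:.1e}, i.e. roughly 1 in {1 / p5:,.0f}")
```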

Page 9: Statistical Flukes, the Higgs Discovery, and 5 Sigma


[Figure: Normal distribution]

Page 10: Statistical Flukes, the Higgs Discovery, and 5 Sigma


The actual computations are based on simulating what it would be like were H0: μ = 0 true (signal strength = 0), fortified with much cross-checking of results (a toy simulation is sketched below). So the significance test has:

1) Data x0 and hypotheses H0: μ = 0 vs. H1: μ > 0
2) A (distance) test statistic d(X)
3) The probability distribution of d(X) under the null and various alternatives
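As a hedged toy illustration of "simulating what it would be like were H0 true": the real analyses use profile-likelihood ratio statistics over many bins and channels, so this is only a sketch, and the background mean b and observed count n_obs below are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2012)

b = 100.0      # hypothetical expected background count (invented for illustration)
n_obs = 135    # hypothetical observed count in the signal region (invented)

# Simulate the experiment many times under H0: mu = 0 (background only).
n_sim = 1_000_000
background_only = rng.poisson(b, size=n_sim)

# P-value: fraction of background-only experiments giving an excess
# at least as large as the one actually observed.
p_value = np.mean(background_only >= n_obs)
print(f"simulated P-value = {p_value:.1e}")

# Express the result in sigma units, as the slides do.
print(f"about {norm.isf(p_value):.1f} sigma")
```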

Page 11: Statistical Flukes, the Higgs Discovery, and 5 Sigma


There’s generally a rule of interpretation:

• if d(X) > 5 sigma, infer discovery

• if d(X) > 2 sigma, get more data

We want methods with a high capability to detect discrepancies while avoiding mistaking spurious bumps for real effects.

 

Page 12: Statistical Flukes, the Higgs Discovery, and 5 Sigma


 

• First stage: test for a real effect (Cox’s taxonomy: searching for structure)

Not a point-against-point test! Cousins: H0 is the Standard Model (SM) missing a piece.

• Second stage: determine its properties, test SM vs “Beyond SM” (BSM)

(Cox: embedded)

Page 13: Statistical Flukes, the Higgs Discovery, and 5 Sigma


 (IV) The P-Value Police

When the July 2012 report came out, a number of people set out to grade the different interpretations of the P-value report:

Larry Wasserman (on his blog, Normal Deviate) called them the “P-Value Police.”

• Job: to examine if reports by journalists and scientists could by any stretch of the imagination be seen to have misinterpreted the sigma levels as posterior probability assignments to the various models and claims.

David Spiegelhalter: a well-known (Bayesian) statistician who works on risk communication.

Page 14: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Thumbs up or down?

Thumbs up to the ATLAS group report:

“A statistical combination of these channels and others puts the significance of the signal at 5 sigma, meaning that only one experiment in three million would see an apparent signal this strong in a universe without a Higgs.”

Thumbs down to reports such as:

“There is less than a one in 3.5 million chance that their results are a statistical fluke.”

Critics (Spiegelhalter) allege they are misinterpreting the P-value as a posterior probability on H0.

Page 15: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Not so. H0 does not say the observed results are due to background alone, or are flukes.

H0: μ = 0

Although if H0 were true, it follows that various results would occur with specified probabilities. (In particular, it entails that large bumps are improbable.)

Page 16: Statistical Flukes, the Higgs Discovery, and 5 Sigma


In fact it is an ordinary error probability. Since it’s not just a single result, but a dynamic test procedure, we can write it:

(1) Pr(Test T produces d(X) > 5; H0) ≤ .0000003

Note: (1) is not a conditional probability (which would involve a prior):

Pr(Test T produces d(X) > 5 and H0) / Pr(H0)

Page 17: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(V) Detaching inference(s) from the evidence

True, the inference actually detached goes beyond a P-value report. Infer:

(2) There is strong evidence for

(first) a genuine discrepancy from H0

(later) H*: a Higgs (or a Higgs-like) particle.

Gradations: indication, evidence, discovery (up to July 4, 2012)

Inferring (2) relies on an implicit principle of evidence.

Page 18: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Test Principle #1 (statistical significance): Data provide evidence for a genuine discrepancy from H0 (just) to the extent that H0 would (very probably) have survived, were H0 a reasonably adequate description of the process generating the data.

(1)' Pr(Test T produces d(X) < 5; H0) > .9999997

• With probability .9999997, the bumps would be smaller, would behave like flukes, disappear with more data, not be produced at both CMS and ATLAS, in a world given by H0.

• They didn’t disappear; they grew.

(2) So, H*: a Higgs (or a Higgs-like) particle.

Page 19: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Following the rule “interpret 5 sigma bumps as a real effect (a discrepancy from 0),” you’d erroneously interpret data with probability less than .0000003.

An error probability

The warrant isn’t low long-run error (in a case like this) but detaching an inference based on “strong argument from coincidence”. Qualifying claims by how well they have been probed (precision, accuracy).

Page 20: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Second Stage

Once the null is rejected, the job shifts to testing whether various parameters agree with the SM predictions. The null hypothesis at the second stage is the SM Higgs boson:

H[2]0: SM Higgs boson: μ = 1

and discrepancies from it are probed and estimated with confidence intervals (Cousins).
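A hedged sketch of this second-stage estimation, assuming a Gaussian measurement of the signal strength; mu_hat and se below are invented for illustration, whereas the experiments’ actual intervals come from profile-likelihood fits:

```python
from scipy.stats import norm

mu_hat = 1.3   # hypothetical measured signal strength (invented for illustration)
se = 0.2       # hypothetical standard error (invented)

z = norm.ppf(0.975)                       # two-sided 95% confidence level
lo, hi = mu_hat - z * se, mu_hat + z * se
print(f"95% confidence interval for mu: ({lo:.2f}, {hi:.2f})")

# Does the interval rule out the SM value mu = 1?
print("SM value mu = 1 ruled out at this level?", not (lo <= 1.0 <= hi))
```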

Page 21: Statistical Flukes, the Higgs Discovery, and 5 Sigma


This takes us to the most important role served by statistical significance tests (requiring a 5 sigma excess for discovery):

It affords a standard for:

• (a) denying sufficient evidence of a new particle, inferring “not a genuine effect”, and

• (b) ruling out values of various parameters, e.g., mass ranges.

Page 22: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(VI) Positive and negative test results of the analysis

Positive (very low P-value): infer genuine effects.

Negative (moderate P-value): deny real effects (infer flukes); deny that excesses indicate BSM.

• At both stages, they were engaged in exploration for BSM physics (beyond the Standard Model).

• It combined testing, estimating, and exploring.

Page 23: Statistical Flukes, the Higgs Discovery, and 5 Sigma


NYT: “Chasing the Higgs” [Dennis Overbye interviews spokespeople Gianotti (ATLAS) and Tonelli (CMS).]

• Once a month they got bumps that were random flukes. “So ‘we crosscheck everything’ and ‘try to kill’ any anomaly that might be merely random.” They were convinced they had found evidence of extra dimensions of space-time, “and then the signal faded like an old tired balloon.”

Page 24: Statistical Flukes, the Higgs Discovery, and 5 Sigma


• “We’ve made many discoveries,” Dr. Tonelli said, “most of them false.”

• “Ninety-nine percent of the time, that is just what happens.”

What’s the difference between HEP physics and social psychology (and other big data screening) where “most results in most fields are false”, or so we keep hearing? HEP physicists don’t publish on the basis of a single “nominal” (or “local”) P-value.

Page 25: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Look Elsewhere Effect (LEE)

A nominal (or local) P-value is the P-value at a particular, data-determined mass. But the probability of so impressive a difference arising anywhere in a mass range is greater than the local one. I take it that requiring a smaller P-value (i.e., a bigger difference), at least 5 sigma, is akin to adjusting for multiple trials, or the look elsewhere effect (LEE).
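A toy simulation of the look elsewhere effect; the number of independent mass bins is invented, and real analyses correct for the LEE with dedicated methods. It shows that the chance of a 3 sigma bump somewhere in a scanned range far exceeds the local P-value at one fixed mass:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

n_bins = 100            # hypothetical number of roughly independent mass bins
local_p = norm.sf(3)    # local P-value of a 3 sigma bump at one fixed mass

# Background-only experiments: each bin fluctuates like N(0, 1);
# record how often the largest fluctuation anywhere reaches 3 sigma.
n_sim = 100_000
max_bump = rng.standard_normal((n_sim, n_bins)).max(axis=1)
global_p = np.mean(max_bump >= 3)

print(f"local P-value at one mass: {local_p:.4f}")    # about 0.0013
print(f"global P-value anywhere:   {global_p:.3f}")   # about 0.13 for 100 bins
```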

Page 26: Statistical Flukes, the Higgs Discovery, and 5 Sigma


“Game of Bump-Hunting” (Overbye)

“One bump on physicists’ charts…was disappearing. But another was blooming like the shy girl at a dance. … nobody could remember exactly when she had come in. But she was the one who would marry the prince.”

“It continued to grow over the fall until it had reached the 3-sigma level — the chances of being a fluke [spurious significance] were less than 1 in 740, enough for physicists to admit it to the realm of ‘evidence’ of something, but not yet a discovery.”

Page 27: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Background knowledge of how flukes behave:

• “If they were flukes, more data would make them fade into the statistical background,

• If not, the bumps would grow in slow motion into a bona fide discovery.”

• They give the bump a hard time, look at multiple decay channels, and don’t tell the other team the details of where they found her.

• When two independent experiments find the same particle signal at the same mass, it overcomes the multiple testing and gives a strong argument.

 

Page 28: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(VII) Possible Anomalies for the SM

They also follow up bumps indicating discrepancies with

H[2]0: SM Higgs boson: μ = 1

Hints of anomalies with the “plain vanilla” particle of the Standard Model (viewed as tests or corresponding interval estimates). Even a year later they examined these anomalies with more data.

Page 29: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Curb your enthusiasm

Matt Strassler: “The excess (in favor of BSM properties) became a bit smaller each time…. That’s an unfortunate sign, if one is hoping the excess isn’t just a statistical fluke.”

Or they’d see the bump at ATLAS… and not at CMS.

“Taking all of the data, and not cherry picking…there’s nothing here that you can call ‘evidence’ for the much sought BSM.” (Strassler)

Considering the frequent flukes, and the hot competition between ATLAS and CMS to be first, a tool for when to “curb their enthusiasm” seems exactly what was wanted.

Page 30: Statistical Flukes, the Higgs Discovery, and 5 Sigma


So, this “negative” portion involves:

(a) denying BSM anomalies are real

(b) setting upper bounds for these discrepancies with the SM Higgs

Each with its own test statistic and observed result g(x0)

H[2]0: SM Higgs boson: μ = 1

Failing to reject the null isn’t evidence for it, but they could set upper bounds.

Page 31: Statistical Flukes, the Higgs Discovery, and 5 Sigma


Test Principle #2 (for non-significance): Data provide evidence to rule out a discrepancy δ* to the extent that a larger g(x0) would very probably have resulted if δ were as great as δ*.

Detach: δ < δ* (this could equivalently be viewed as inferring a confidence interval estimate, δ < g(x0) + ε).
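A hedged numerical reading of Principle #2, assuming a Gaussian measurement of the discrepancy with known standard error; g_x0, se, and delta_star below are invented for illustration:

```python
from scipy.stats import norm

g_x0 = 0.05        # hypothetical observed discrepancy from mu = 1 (invented)
se = 0.10          # hypothetical standard error of g (invented)
delta_star = 0.4   # candidate discrepancy to be ruled out (invented)

# If the true discrepancy were delta_star, a larger g(x0) would very
# probably have been observed; that warrants detaching delta < delta_star.
p_larger = norm.sf((g_x0 - delta_star) / se)
print(f"Pr(g(X) > {g_x0}; delta = {delta_star}) = {p_larger:.4f}")

# Equivalently, a one-sided 95% upper confidence bound: delta < g(x0) + 1.645*se
upper = g_x0 + norm.ppf(0.95) * se
print(f"95% upper bound on delta: {upper:.3f}")
```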

So these tools seem just the thing for this research

Page 32: Statistical Flukes, the Higgs Discovery, and 5 Sigma


(VIII) Conclusion

O’Hagan published a digest of responses a few days later.

• “They surely would be willing to announce SM Higgs discovery if they were 99.99% certain of the existence of the SM Higgs” (and avoid the ad hoc 5 sigma)

Pr(SM Higgs) = .9999

• It would require assigning a prior probability to the “SM Higgs” claim, and prior distributions on the numerous “nuisance” parameters of the background and the signal.

• Multivariate priors, correlations between parameters, joint priors, and the catchall: P(data|not H*)

Page 33: Statistical Flukes, the Higgs Discovery, and 5 Sigma


• Even if all that were done and agreed upon, it would not have given the kind of tools needed to find things out

Worse: spiked priors, Pr(No SM Higgs) = Pr(SM Higgs) = .5

(not uninformative)

• Physicists believed in the SM Higgs before building the big collider, given the perfect predictive success of the SM and its simplicity; this is very different from having evidence for a discovery.

• Others may believe (and fervently wish) that it will break down somewhere.

Page 34: Statistical Flukes, the Higgs Discovery, and 5 Sigma


P-value police: those who think we want a posterior probability on H* might be sliding from what may be inferred from this legitimate high probability:

Pr(Test T would not reach 5 sigma; H0) > .9999997

With probability .9999997, our methods would show that the bumps disappear, under the assumption that the data are due to background alone (H0). They don’t disappear but grow. Infer H*, qualified by the test’s properties.

Page 35: Statistical Flukes, the Higgs Discovery, and 5 Sigma


What’s passed with high severity?

H*: a Higgs boson consistent with the SM (at the levels of precision and accuracy of these experiments)

An adequate account should also always report alternatives that have not been well ruled out:

• Measurements are not precise enough to rule out discrepancies from an SM Higgs as large as 10%, 20%, 50%.

• There are rivals to the SM that would not have been distinguishable with the given data (which went through a lot of filtering and triggering rules).

They will get more data in 2015, and there’s talk of a more precise detector being built.

Page 36: Statistical Flukes, the Higgs Discovery, and 5 Sigma


REFERENCES (online links)

• ATLAS report: http://cds.cern.ch/record/1494183/files/ATLAS-CONF-2012-162.pdf

• ATLAS Higgs experiment, public results: https://twiki.cern.ch/twiki/bin/view/AtlasPublic/HiggsPublicResults

• CMS Higgs experiment, public results: https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsHIG

• Mayo, D. G. and Cox, D. R. (2010). “Frequentist Statistics as a Theory of Inductive Inference,” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 1-27. This paper appeared in The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics, pp. 247-275.

Page 37: Statistical Flukes, the Higgs Discovery, and 5 Sigma



• Cousins, R. (2014). “The Jeffreys-Lindley Paradox and Discovery Criteria in High Energy Physics,” http://arxiv.org/abs/1310.3791

• O’Hagan letter:

§ Original letter with responses: http://bayesian.org/forums/news/3648

§ 1st link in a group of discussions of the letter: http://errorstatistics.com/2012/07/11/is-particle-physics-bad-science/

• Overbye, D. (March 15, 2013) “Chasing the Higgs,” New York Times: http://www.nytimes.com/2013/03/05/science/chasing-the-higgs-boson-how-2-teams-of-rivals-at-CERN-searched-for-physics-most-elusive-particle.html?pagewanted=all&_r=0

Page 38: Statistical Flukes, the Higgs Discovery, and 5 Sigma


• Spiegelhalter, D. (August 7, 2012) blog, Understanding Uncertainty, “Explaining 5 sigma for the Higgs: how well did they do?” http://understandinguncertainty.org/explaining-5-sigma-higgs-how-well-did-they-do

• Strassler, M. (July 2, 2013) blog, Of Particular Significance, “A Second Higgs Particle”: http://profmattstrassler.com/2013/07/02/a-second-higgs-particle/

• Wasserman, L. (July 11, 2012) blog, Normal Deviate, “The Higgs Boson and the P-Value Police”: http://normaldeviate.wordpress.com/2012/07/11/the-higgs-boson-and-the-p-value-police/

