
Definition and Treatment of Systematic Uncertainties in High Energy Physics and Astrophysics

Pekka K. Sinervo
Department of Physics, University of Toronto, Toronto, ON M5S 1A7, CANADA

Systematic uncertainties in high energy physics and astrophysics are often significant contributions to the overall uncertainty in a measurement, in many cases being comparable to the statistical uncertainties. However, consistent definition and practice is elusive, as there are few formal definitions and there exists significant ambiguity in what is defined as a systematic and statistical uncertainty in a given analysis. I will describe current practice, and recommend a definition and classification of systematic uncertainties that allows one to treat these sources of uncertainty in a consistent and robust fashion. Classical and Bayesian approaches will be contrasted.

1. INTRODUCTION TO SYSTEMATIC UNCERTAINTIES

Most measurements of physical quantities in high energy physics and astrophysics involve both a statistical uncertainty and an additional "systematic" uncertainty. Systematic uncertainties play a key role in the measurement of physical quantities, as they are often of comparable scale to the statistical uncertainties. However, as I will illustrate, the distinction between these two sources of uncertainty is in practice not clearly drawn, which leads to confusion and in some cases incorrect inferences. A coherent approach to systematic uncertainties is, however, possible, and I will attempt to outline a framework to achieve this.

Statistical uncertainties are the result of stochastic fluctuations arising from the fact that a measurement is based on a finite set of observations. Repeated measurements of the same phenomenon will therefore result in a set of observations that will differ, and the statistical uncertainty is a measure of the range of this variation. By definition, statistical variations between two identical measurements of the same phenomenon are uncorrelated, and we have well-developed theories of statistics that allow us to predict and take account of such uncertainties in measurement theory, in inference and in hypothesis testing (see, for example, [1]). Examples of statistical uncertainties include the finite resolution of an instrument, the Poisson fluctuations associated with measurements involving finite sample sizes, and random variations in the system one is examining.

Systematic uncertainties, on the other hand, arise from uncertainties associated with the nature of the measurement apparatus, assumptions made by the experimenter, or the model used to make inferences based on the observed data. Such uncertainties are generally correlated from one measurement to the next, and we have a limited and incomplete theoretical framework in which we can interpret and accommodate these uncertainties in inference or hypothesis testing. Common examples of systematic uncertainty include uncertainties that arise from the calibration of the measurement device, the probability of detection of a given type of interaction (often called the "acceptance" of the detector), and parameters of the model used to make inferences that themselves are not precisely known. The definition of such uncertainties is often ad hoc in a given measurement, and there are few broadly-accepted techniques to incorporate them into the process of statistical inference.

All that being said, there has been significant thought given to the practical problem of how to incorporate systematic uncertainties into a measurement. Examples of this work include proposals for combining statistical and systematic uncertainties when setting confidence limits on measurements [2–6], techniques to estimate the magnitude of systematic uncertainties [7], and the use of standard statistical techniques to take into account systematic uncertainties [8, 9]. In addition, there have been numerous papers published on the systematic uncertainties associated with a given measurement [10].

In this review, I will first discuss a few case studies that illustrate how systematic uncertainties enter into some current measurements in high energy physics and astrophysics. I will then discuss a way in which one can consistently identify and characterize systematic uncertainties. Finally, I will outline the various techniques by which statistical and systematic uncertainties can be formally treated in measurements.

2. CASE STUDIES

2.1. W Boson Cross Section: Definitions are Relative

The production of the charged intermediate vector boson, the W, in proton-antiproton (pp̄) annihilations is predicted by the Standard Model, and the measurement of its rate is of interest in high energy physics. This measurement involves counting the number of candidate events in a sample of observed interactions,


N_c, estimating the number of "background" events in this sample from other processes, N_b, estimating the acceptance of the apparatus including all selection requirements used to define the sample of events, ε, and counting the number of pp̄ annihilations, L. The cross section for W boson production is then

\sigma_W = \frac{N_c - N_b}{\varepsilon L}, \qquad (1)

The CDF Collaboration at Fermilab has recently performed such a measurement [11], as illustrated in Fig. 1, where the transverse mass of a sample of candidate W → eν_e decays is shown. The measurement is quoted as

\sigma_W = 2.64 \pm 0.01\,({\rm stat}) \pm 0.18\,({\rm syst})\ {\rm nb}, \qquad (2)

where the first uncertainty reflects the statistical uncertainty arising from the size of the candidate sample (approximately 38,000 candidates) and the second arises from the uncertainties on the background estimate, acceptance, and integrated luminosity entering Eq. (1). We can estimate these uncertainties as

\sigma_{\rm stat} = \sigma_0/\sqrt{N_c}, \qquad (3)

\sigma_{\rm syst} = \sigma_0 \sqrt{ \left(\frac{\delta N_b}{N_b}\right)^2 + \left(\frac{\delta\varepsilon}{\varepsilon}\right)^2 + \left(\frac{\delta L}{L}\right)^2 }, \qquad (4)

where the three terms in σ_syst are the uncertainties arising from the background estimate δN_b, the acceptance δε and the integrated luminosity δL. The parameter σ_0 is the measured value.
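To make the bookkeeping concrete, here is a minimal Python sketch of Eqs. (1), (3) and (4). Only σ_0 = 2.64 nb and N_c ≈ 38,000 come from the text; the relative uncertainties rel_bkg, rel_acc and rel_lumi are hypothetical placeholders chosen for illustration.

```python
from math import sqrt

def w_cross_section(n_cand, n_bkg, acceptance, lumi):
    """Eq. (1): sigma_W = (N_c - N_b) / (eps * L)."""
    return (n_cand - n_bkg) / (acceptance * lumi)

def uncertainties(sigma0, n_cand, rel_bkg, rel_acc, rel_lumi):
    """Eq. (3): statistical term; Eq. (4): systematic terms in quadrature."""
    stat = sigma0 / sqrt(n_cand)
    syst = sigma0 * sqrt(rel_bkg**2 + rel_acc**2 + rel_lumi**2)
    return stat, syst

# Hypothetical relative uncertainties; only N_c and sigma0 are from the text.
sigma0 = 2.64  # nb, the measured value from Eq. (2)
stat, syst = uncertainties(sigma0, 38000, rel_bkg=0.02, rel_acc=0.015, rel_lumi=0.06)
print(f"sigma_W = {sigma0:.2f} +- {stat:.2f} (stat) +- {syst:.2f} (syst) nb")
```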

In the same sample, the experimenters also observe the production of the neutral intermediate vector boson, the Z. Because of this, the experimenters can measure the acceptance ε by taking a sample of Z bosons identified by the two charged electrons they decay into, and then measuring ε from this sample. The dominant uncertainty in this measurement arises from the finite statistics in the Z boson sample. Thus, one could equivalently consider δε to be a statistical uncertainty (and not a systematic one). This means that the uncertainties could have just as well been defined as

\sigma_{\rm stat} = \sigma_0 \sqrt{ 1/N_c + \left(\frac{\delta\varepsilon}{\varepsilon}\right)^2 }, \qquad (5)

\sigma_{\rm syst} = \sigma_0 \sqrt{ \left(\frac{\delta N_b}{N_b}\right)^2 + \left(\frac{\delta L}{L}\right)^2 }, \qquad (6)

resulting in a different assignment of statistical and systematic uncertainties.
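The regrouping of Eqs. (5) and (6) amounts to moving the acceptance term from σ_syst to σ_stat; a short sketch, reusing the same hypothetical inputs as above:

```python
from math import sqrt

# Regrouped per Eqs. (5)-(6): the acceptance term migrates into sigma_stat.
sigma0, n_cand = 2.64, 38000
rel_bkg, rel_acc, rel_lumi = 0.02, 0.015, 0.06  # hypothetical relative errors

stat = sigma0 * sqrt(1.0 / n_cand + rel_acc**2)  # Eq. (5)
syst = sigma0 * sqrt(rel_bkg**2 + rel_lumi**2)   # Eq. (6)
print(f"stat = {stat:.3f} nb, syst = {syst:.3f} nb")
```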

Why would this matter? If we return to our original discussion of what defines a statistical and a systematic uncertainty, we normally assume a systematic uncertainty is correlated with subsequent measurements and that it does not scale with the sample size.

[Figure 1: The transverse mass distribution for the W boson candidates as observed recently by CDF (CDF Run II Preliminary, 72 pb⁻¹; 38628 entries). The peak reflects the Jacobian distribution typical of W boson decays. The points are the measured distribution and the various histograms are the predicted distributions from W decays and the other background processes.]

In this case, the uncertainty on ε does not meet these requirements. The acceptance is a stochastic variable, which will become better known with increasing Z boson sample size. It is therefore more informative to identify it as a statistical uncertainty. I will call this a "class 1" systematic uncertainty. Note that it would be appropriate to include in this category those systematic uncertainties that are in fact constrained by the result of a separate measurement, so long as the resulting uncertainty is dominated by the stochastic fluctuations in the measurement. An example of this could be the calibration constants for a detector that are defined by separate measurements using a calibration procedure whose precision is limited by statistics.
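A minimal sketch of why such an uncertainty scales statistically, assuming a simple binomial estimate of the acceptance from a Z → ee control sample; the counts are hypothetical:

```python
from math import sqrt

def acceptance_from_z(n_pass, n_z):
    """Estimate the acceptance from a Z -> ee control sample.

    The uncertainty is binomial, so it shrinks like 1/sqrt(N_Z):
    the hallmark of a class 1 ("statistical") systematic.
    """
    eps = n_pass / n_z
    d_eps = sqrt(eps * (1.0 - eps) / n_z)
    return eps, d_eps

# Hypothetical control-sample sizes; note how delta_eps scales down.
for n_z in (1_000, 10_000, 100_000):
    eps, d_eps = acceptance_from_z(int(0.7 * n_z), n_z)
    print(f"N_Z = {n_z:6d}:  eps = {eps:.3f} +- {d_eps:.4f}")
```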

2.2. Background Uncertainty

The second case study also involves the measurement of σ_W introduced in the previous section. The estimate of the uncertainty on the background rate δN_b is performed by evaluating the magnitude of the different sources of candidate events that satisfy the criteria used to define the W boson candidate sample. In the CDF measurement, it turns out that the background is dominated by events that arise from the production of two high-energy quarks or gluons (so-called "QCD events"), one of which "fakes" an electron or muon.


[Figure 2: The distribution of the isolation fraction of the lepton candidate versus the missing transverse energy in the event (CDF Run II Preliminary, 72 pb⁻¹). Candidate leptons are required to have low isolation fractions (< 0.10), and QCD background events dominate the region with low missing transverse energy. The signal sample is defined by the requirement E̸_T > 25 GeV. The QCD background in the sample is estimated by the formula in the figure, N_QCD = BC/A, which assumes that the isolation properties of the QCD events are uncorrelated with the missing transverse energy.]

A reliable estimate of this background is difficult to make from first principles, as the rate of such QCD events is many orders of magnitude larger than the W boson cross section, and the rejection power of the selection criteria is difficult to measure directly.

The technique used to estimate N_b in the CDF analysis is to take advantage of a known correlation: candidate events from QCD background will have more particles produced in proximity to the electron or muon candidate. At the same time, most of the QCD events will also have small values of missing transverse energy (E̸_T) compared with the W boson events, where a high-energy neutrino escapes undetected. Thus, a measure of the isolation of the candidate lepton and the E̸_T can be an instrument to extract an estimate of the background in the observed sample. This is shown in Fig. 2, where one sees in the bottom-right region the signal region for this analysis. The region with low missing transverse energy is populated by QCD background events.
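The formula in Fig. 2 is the familiar two-variable sideband (often called "ABCD") estimate. A sketch follows; the assignment of the A, B and C control regions and the event counts are my assumptions for illustration, not values from the CDF analysis:

```python
def abcd_background(n_a, n_b, n_c):
    """Data-driven QCD estimate via the formula in Fig. 2: N_QCD = B*C/A.

    Assumes lepton isolation and missing E_T are uncorrelated for QCD
    events, so the isolated-to-non-isolated ratio measured at low
    missing E_T carries over to the high missing E_T signal region.
    """
    return n_b * n_c / n_a

# Hypothetical event counts in the three control regions.
n_a, n_b, n_c = 20_000, 1_500, 18_000
print(f"estimated QCD background in signal region: {abcd_background(n_a, n_b, n_c):.0f}")
```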

The dominant uncertainty in the background calculation arises from the assumption that the isolation properties of the electron candidate in QCD events are uncorrelated with the missing transverse energy in the event. Any such correlation is expected to be very small, and this is consistent with other observations. However, even a small correlation in these two variables results in a bias in the estimate of N_b. This potential bias is difficult to estimate with any precision.

[Figure 3: The variation of the background estimate as the isolation cut value is varied from 0.3 (the default value) up to 0.8, yielding N_QCD = 1344 ± 82 (stat) ± 672 (syst) for the 38628 W → eν candidates (CDF Run II Preliminary, 72 pb⁻¹). The isolation variable is a measure of the fraction of energy observed in a cone near the electron candidate normalized to the energy of the electron candidate.]

In this case, the experimenters varied the choice of the isolation criteria used to define the QCD background region and the signal region, and used the variation in the background estimate as a measure of the systematic uncertainty δN_b. This variation is shown in Fig. 3.
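A sketch of turning such a cut scan into a quoted uncertainty, assuming the half-spread (∆/2) convention discussed in Sec. 3; the scanned values are hypothetical:

```python
from statistics import mean

def cut_variation_syst(estimates):
    """Quote the mean of the scanned background estimates as the central
    value and half the full spread (Delta/2) as the systematic."""
    spread = max(estimates) - min(estimates)
    return mean(estimates), spread / 2.0

# Hypothetical N_b estimates as the isolation cut is varied (cf. Fig. 3).
n_b_scan = [1344, 1510, 1205, 1620, 980, 1790]
central, syst = cut_variation_syst(n_b_scan)
print(f"N_b = {central:.0f} +- {syst:.0f} (syst)")
```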

This is an illustration of a systematic uncertainty that arises from one's limited knowledge of some features of the data that cannot be constrained by observations. In these cases, one often is forced to make some assumptions or approximations in the measurement procedure itself that have not been verified precisely. The magnitude of the systematic uncertainty is also difficult to estimate, as it is not well-constrained by other measurements. In this sense, it differs from the class 1 systematic uncertainties introduced above. I will therefore call this a "class 2" systematic uncertainty. It is one of the most common categories of systematic uncertainty in measurements in astrophysics and high energy physics.

2.3. Boomerang CMB Analysis

My third case study involves the analysis of the data collected by the Boomerang cosmic microwave background (CMB) probe, which mapped the spatial anisotropy of the CMB radiation over a large portion of the southern sky [12]. The data itself is a fine-grained two-dimensional plot of the spatial variation of the temperature of part of the southern sky, as illustrated in Fig. 4. The analysis of this data involves the transformation of the observed spatial variation into a power series in spherical harmonics, with the spatial variations now summarized in the power spectrum as a function of the order of the spherical harmonic. The power spectrum includes all sources of uncertainty,


including instrumental effects and uncertainties in calibrations.¹

The Boomerang collaborators then test a large class of theoretical models of early universe development by determining the power spectrum predicted by each model and comparing the predicted and observed power as a function of spherical harmonic. These models are described by a set of cosmological parameters, each of them being constrained by other observations and theoretical assumptions. To determine those models that best describe the data, the experimenters take a Bayesian approach [13], creating a six-dimensional grid consisting of 6.4 million points, and calculating the likelihood function for the data at each point. They then define priors for each of the six parameters, and define a posterior probability that is now a function of these parameters. To make inferences on such key parameters as the age of the universe or its overall energy density, a marginalization is performed by numerically integrating the posterior probability over the other parameters. The experimenters can also consider the effect of varying the priors to explore the sensitivity of their conclusions to the priors themselves.

In this analysis, the lack of knowledge in the paradigm used to make inferences from the data is captured in the choice of priors for each of the parameters. A classical statistical approach could have equivalently defined these as sources of systematic uncertainty. Viewed from either perspective, the uncertainties that arise from the choice of paradigm are not statistical in nature, given that they would affect any analysis of similar data. Yet they differ from the two previous classes of systematic uncertainty I have identified, which arise directly from the measurement technique. I therefore define such theoretically-motivated uncertainties as "class 3" systematics. I also note that the Bayesian technique to incorporate these uncertainties has no well-defined frequentist analogue, in that one cannot readily identify an ensemble of experiments that would replicate the variation associated with these uncertainties.
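A toy two-parameter version of this grid procedure, sketching both the numerical marginalization and the prior-sensitivity check; the likelihood shape, parameter ranges and priors are all illustrative inventions, not Boomerang's:

```python
import numpy as np

# Parameter grids standing in for the six-dimensional, 6.4-million-point grid.
theta = np.linspace(10.0, 16.0, 200)   # stand-in: age of the universe (Gyr)
omega = np.linspace(0.5, 1.5, 200)     # stand-in: total energy density
T, O = np.meshgrid(theta, omega, indexing="ij")

# Invented, correlated likelihood surface peaking at (13.5, 1.0).
like = np.exp(-0.5 * (((T - 13.5) / 0.8) ** 2
                      + ((O - 1.0 - 0.2 * (T - 13.5)) / 0.1) ** 2))

def marginal_theta(prior_omega):
    """Multiply by the prior on omega (flat prior in theta), then
    numerically integrate omega out of the posterior."""
    post = like * prior_omega(O)
    marg = np.trapz(post, omega, axis=1)
    return marg / np.trapz(marg, theta)

flat = marginal_theta(lambda o: np.ones_like(o))
tight = marginal_theta(lambda o: np.exp(-0.5 * ((o - 1.05) / 0.05) ** 2))

# Comparing the two marginals probes the sensitivity to the prior choice.
for name, m in (("flat", flat), ("tight gaussian", tight)):
    print(f"{name:15s} prior on omega: <theta> = {np.trapz(theta * m, theta):.2f}")
```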

The distinction between class 2 and class 3 systematics comes in part from the fact that one is associated with the measurement technique while the other arises in the interpretation of the observations. I argue, however, that there is an additional difference: In the first case, there is a specific piece of information needed to complete the measurement, the background yield N_b, and the systematic uncertainty arises from the manner in which that is estimated. In the other case, the experiment measures the spatial variation in the CMB and

¹ The uncertainties associated with instrumental effects and calibrations are also systematic in nature, but we will not focus on these here.

[Figure 4: The temperature variation of the CMB as measured by the Boomerang experiment. The axes represent the declination and azimuth of the sky, and the contour is the region used in the analysis.]

summarizes these data in the multipole moments. The systematic uncertainties associated with the subsequent analysis of these data in terms of cosmological parameters are very model-dependent, and they arise from the attempt to extract information about a subset of the parameters in the theory (for example, the age of the universe or the energy density).

2.4. Summary of Taxonomy

In these case studies, I have motivated three classes of systematic uncertainties. Class 1 systematics are uncertainties that can be constrained by ancillary measurements and can therefore be treated as statistical uncertainties. Class 2 systematics arise from model assumptions in the measurement or from poorly understood features of the data or analysis technique that introduce a potential bias in the experimental outcome. Class 3 systematics arise from uncertainties in the underlying theoretical paradigm used to make inferences using the data.

The advantages of this taxonomy are several. Class 1 systematics are statistical in nature and will therefore naturally scale with the sample size. I recommend that they be properly considered a statistical uncertainty and quoted in that manner. They are not correlated with independent measurements and are therefore straightforward to handle when combining measurements or making inferences. Class 2 systematics are the more challenging category, as they genuinely reflect some lack of knowledge or uncertainty in the model used to analyze the data. Because of this, they also have correlations that should be understood in any attempt to combine the measurement with other observations.


They also do not scale with the sample size, and therefore may set fundamental limits on how well one can perform the measurement. Class 3 systematics do not depend on how well we understand the measurement per se, but are fundamentally tied to the theoretical model or hypothesis being tested. As such, there is significant variation in practice. As is illustrated in the third case study and as I will discuss below, a Bayesian approach allows for a range of possible models to be tested if one can parametrize the uncertainties in the relevant probability distribution function and then define reasonable priors. A purely frequentist approach to this problem founders on how one would define the relevant ensemble.

3. ESTIMATION OF SYSTEMATIC UNCERTAINTIES

There is little, if any, formal guidance in the literature for how to define systematic uncertainties or estimate their magnitudes, and much of current practice has been defined by informal convention and "oral tradition." A fundamental principle, however, is that the technique used to define and estimate a systematic uncertainty should be consistent with how the statistical uncertainties are defined in a given measurement, since the two sources of uncertainty are often combined in some way when the measurement is compared with theoretical predictions or independent measurements.

Perhaps the most challenging aspect of estimating systematic uncertainties is to define in a consistent manner all the relevant sources of systematic uncertainty. This requires a comprehensive understanding of the nature of the measurement, the assumptions implicit or explicit in the measurement process, and the uncertainties and assumptions used in any theoretical models used to interpret the data. In any robust design of an experiment, the experimenters will anticipate all sources of systematic uncertainty and should design the measurement to minimize or constrain them appropriately. Good practice suggests that the analysis of systematic uncertainties should be based on clear hypotheses or models with well-defined assumptions.

In the process of the measurement, it is typical to make various "cross-checks" and tests to determine that no unanticipated source of systematic uncertainty has crept into the measurement. A cross-check, however, should not be construed as a source of systematic uncertainty (see, for example, the discussion in [7]).

A common technique for estimating the magnitude of systematic uncertainties is to determine the maximum variation in the measurement, ∆, associated with the given source of systematic uncertainty. Arguments are then made to transform that into a measure that corresponds to the one standard deviation measure that one would associate with a Gaussian statistic, with typical conversions being ∆/2 and ∆/√12, the former being argued as a deliberate overestimate, and the latter being motivated by the assumption that the actual bias arising from the systematic uncertainty could be anywhere within the interval ∆ (∆/√12 being the standard deviation of a uniform distribution of width ∆). Since it is common in astrophysics and high energy physics to quote 68% confidence level intervals as statistical uncertainties, it therefore is appropriate to estimate systematic uncertainties in a comparable manner.
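The latter conversion follows directly from the variance of a bias b distributed uniformly over an interval of width ∆:

\sigma^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{+\Delta/2} b^2\,db = \frac{\Delta^2}{12} \quad\Longrightarrow\quad \sigma = \Delta/\sqrt{12} \approx 0.29\,\Delta.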

There are various practices that tend to overestimate the magnitude of systematic uncertainties, and these should be avoided if one is not to dilute the statistical power of the measurement. A common mistake is to estimate the magnitude of a systematic uncertainty by using a shift in the measured quantity when some assumption is varied in the analysis technique by what is considered the relevant one standard deviation interval. The problem with this approach is that often the variation that is observed is dominated by the statistical uncertainty in the measurement, and any potential systematic bias is therefore obscured. In such cases, I recommend that either a more accurate procedure be found to estimate the systematic uncertainty, or at the very least that one recognize that this estimate is unreliable and likely to be an overestimate. A second common mistake is to introduce a systematic uncertainty into the measurement without an underlying hypothesis to justify the concern. This is often the result of confusing a source of systematic uncertainty with a "cross check" of the measurement.

4. THE STATISTICS OF SYSTEMATIC UNCERTAINTIES

A reasonable goal in any treatment of systematic uncertainties is that consistent and well-established procedures be used that allow one to understand how best to use the information embedded in the systematic uncertainty when interpreting the measurement. Increasingly, the fields of astrophysics and high energy physics have developed more sophisticated approaches to interval estimation and hypothesis testing. Frequentist approaches have returned to the fundamentals of Neyman constructions and the resulting coverage properties. Bayesian approaches have explored the implications of both objective and subjective priors, the nature of inference, and the intrinsic power embedded in such approaches when combining information from multiple measurements.

I will outline how systematic uncertainties can be accommodated formally in both Bayesian and frequentist approaches.


4.1. Formal Statement

To formally state the problem, assume we have a set of observations x_i, i = 1, …, n, with an associated probability distribution function p(x_i|θ), where θ is an unknown random parameter. Typically, we wish to make inferences about θ. Let us now assume that there is some additional uncertainty in the probability distribution function that can be described with another unknown parameter λ. This allows us to define a likelihood function

{\cal L}(\theta, \lambda) = \prod_i p(x_i\,|\,\theta, \lambda). \qquad (7)

Formally, one can treat λ as a "nuisance parameter." In many cases (especially those associated with class 1 systematics), one can identify a set of additional observations of a random statistic y_j, j = 1, …, m, that provides information about λ. In that case, the likelihood would become

{\cal L}(\theta, \lambda) = \prod_{i,j} p(x_i, y_j\,|\,\theta, \lambda). \qquad (8)

With this formulation, one sees that one has to find a means of taking into account the uncertainties that arise from the presence of λ in order to make inferences on θ. I will discuss some possible approaches.

4.2. Bayesian Approach

A Bayesian approach would involve identification of a prior, π(λ), that characterizes our knowledge of λ. Typical practice has been to either assume a flat prior or, in cases where there are corollary measurements that give us information on λ, a Gaussian distribution. One can then define a Bayesian posterior probability distribution

{\cal L}(\theta, \lambda)\,\pi(\lambda)\,d\theta\,d\lambda, \qquad (9)

(implicitly taking a flat prior in θ), which we can then marginalize to set Bayesian credibility intervals on θ.
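A minimal numerical sketch of this marginalization, assuming a Gaussian measurement model and a unit-Gaussian prior π(λ); all numbers are illustrative:

```python
import numpy as np

theta = np.linspace(0.0, 4.0, 400)
lam = np.linspace(-3.0, 3.0, 400)
T, Lm = np.meshgrid(theta, lam, indexing="ij")

# Illustrative model: x_obs = theta + 0.3*lambda with Gaussian resolution.
x_obs, sigma_x = 2.0, 0.5
like = np.exp(-0.5 * ((x_obs - (T + 0.3 * Lm)) / sigma_x) ** 2)  # L(theta, lambda)
prior = np.exp(-0.5 * Lm ** 2)                                   # pi(lambda)

post = np.trapz(like * prior, lam, axis=1)   # marginalize lambda, per Eq. (9)
post /= np.trapz(post, theta)

cdf = np.cumsum(post) * (theta[1] - theta[0])
lo, hi = np.interp([0.16, 0.84], cdf, theta)  # central 68% credible interval
print(f"theta in [{lo:.2f}, {hi:.2f}] at 68% credibility")
```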

This is a straightforward statistical approach and results in interval estimates that can readily be interpreted in the Bayesian context. The usual issues regarding the choice of priors remain, as does the interpretation of a Bayesian credibility interval. These are beyond the scope of this discussion, but are covered in most reviews of this approach [13].

4.3. Frequentist Approach

The frequentist approach to the formal problem also starts with the joint probability distribution p(x_i, y_j|θ, λ). There are various techniques for how to deal with the presence of the nuisance parameter λ, and I will outline just a few of them. I will note that there isn't a single commonly adopted strategy in the literature, and even the simplest techniques tend to involve significant computational burden.

One technique involves identifying a transformation of the parameters to factorize the problem in such a manner that one can then integrate out one of the two parameters [14]. This approach is robust and theoretically sound, and in the trivial cases results in a one-dimensional likelihood function that now incorporates the uncertainties arising from the nuisance parameter. It has well-defined coverage properties and a clear frequentist interpretation. However, this approach is of limited value, given that it is necessary to find an appropriate transformation.

I note that this approach is only of value in cases where one is dealing with a class 1 systematic uncertainty, which is, as I have argued above, formally a source of statistical uncertainty. Class 2 and class 3 systematic uncertainties cannot be readily constrained by a set of observations represented by the y_j, j = 1, …, m.

A second approach to the incorporation of nuisance parameters is to define Neyman "volumes" in the multi-dimensional parameter space, equivalent to what is done in the case of an interval setting with one random parameter. In this case, one creates an infinite set of two-dimensional contours defined by requiring that the observed values lie within the contour the necessary fraction of the time (say 68%). Then one identifies the locus of points in this two-dimensional space defined by the centre of each contour, and this boundary becomes the multi-dimensional Neyman interval for both parameters, as illustrated in Fig. 5. To "eliminate" the nuisance parameter, one projects the two-dimensional contour onto the axis of the parameter of interest. This procedure results in a frequentist confidence interval that over-covers, and in some cases over-covers badly. It thus results in "conservative" intervals that may diminish the statistical power of the measurement.

A third technique to take into account systematic uncertainties involves what is commonly called the "profile method," where one eliminates the nuisance parameter by creating a profile likelihood, defined as the value of the likelihood maximized by varying λ for each value of the parameter θ [15]. This creates a likelihood function independent of λ, but one that has ill-defined coverage properties that depend on the correlation between λ and θ. However, it is a straightforward technique that is used frequently and that results in inferences that are at some level conservative.
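A sketch of the profile construction for the same toy Gaussian model used above, scanning λ on a grid rather than using a proper minimizer:

```python
import numpy as np

theta_grid = np.linspace(0.0, 4.0, 200)
lam_grid = np.linspace(-3.0, 3.0, 200)

x_obs, sigma_x = 2.0, 0.5  # illustrative measurement and resolution

def nll(theta, lam):
    """Negative log-likelihood with a unit-Gaussian constraint on lambda."""
    return 0.5 * ((x_obs - (theta + 0.3 * lam)) / sigma_x) ** 2 + 0.5 * lam ** 2

# Profile: for each theta, minimize over the nuisance parameter lambda.
profile = np.array([min(nll(t, l) for l in lam_grid) for t in theta_grid])
profile -= profile.min()

# Approximate 68% interval: points with delta(-log L) <= 0.5.
inside = theta_grid[profile <= 0.5]
print(f"theta in [{inside.min():.2f}, {inside.max():.2f}] (profile likelihood)")
```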

A variation on these techniques has been used in recent analyses of solar neutrino data, where the analysis uses 81 observables and characterizes the various systematic uncertainties by the introduction of 31 parameters [16]. They linearize the effects of the systematic parameters on the chi-squared function and then minimize the chi-squared with respect to each of these parameters.


[Figure 5: The Neyman construction for two parameters. The nuisance parameter is θ₂, and the shaded region is the interval defined by this construction. The four dashed contours are examples of the intervals that define the contour (from G. Zech).]

In this sense, this analysis is a concrete example of the profile method.
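One common linearized ("pull") form of such a chi-squared is sketched below; this is my reconstruction of the general technique, not necessarily the exact parametrization used in [16]. Each systematic parameter ξ_k shifts observable n by s_n^k standard deviations and is penalized by a unit-Gaussian constraint:

\chi^2(\theta, \vec{\xi}) = \sum_{n=1}^{81}\left(\frac{x_n - \mu_n(\theta) - \sum_{k=1}^{31}\xi_k\, s_n^k}{\sigma_n}\right)^2 + \sum_{k=1}^{31}\xi_k^2.

Minimizing over the ξ_k at each value of θ is precisely a profile over linearized nuisance parameters.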

Although most of these techniques are approximate in some sense, they have the virtue that they scale in an intuitively acceptable manner. In the limit where the statistical uncertainties are far larger than the systematic uncertainties, the former are the dominant effect in any inference or hypothesis test. Conversely, when the systematic uncertainties begin to compete with or dominate the statistical uncertainties, the results of any statistical inference reflect the systematic uncertainties, and these become the limiting factor in extracting information from the measurement. I would argue that this should be a minimal requirement of any procedure used to take into account systematic uncertainties.

4.4. Hybrid Techniques

There has been one technique in common use in high energy physics to incorporate sources of systematic uncertainty into an analysis, first described by R. Cousins and V. Highland [2]. Using the notation introduced earlier, the authors argue that one should create a modified probability distribution function

p_{\rm CH}(x\,|\,\theta) = \int p(x\,|\,\theta, \lambda)\,\pi(\lambda)\,d\lambda, \qquad (10)

which could be used to define a likelihood function and make inferences on θ. They argue that this can be understood as approximating the effects of having an ensemble of experiments, each of them with various choices of the parameter λ, and with the distribution π(λ) representing the frequency distribution of λ in this ensemble.
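Eq. (10) is straightforward to evaluate by Monte Carlo. A sketch, assuming a Poisson counting experiment whose efficiency plays the role of the nuisance parameter with a Gaussian π(λ); all numbers are illustrative:

```python
import numpy as np
from scipy.stats import poisson

def p_ch(n_obs, theta, eps0=0.8, d_eps=0.1, n_samples=100_000, seed=1):
    """Eq. (10) by Monte Carlo: average the Poisson probability of n_obs
    over a Gaussian prior for the efficiency (the nuisance parameter)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(eps0, d_eps, n_samples)
    eps = np.clip(eps, 1e-6, None)  # keep the Poisson mean positive
    return poisson.pmf(n_obs, theta * eps).mean()

# Hypothetical counting experiment: 8 observed events, signal strength theta.
for theta in (5.0, 10.0, 15.0):
    print(f"theta = {theta:4.1f}: p_CH(n=8 | theta) = {p_ch(8, theta):.4f}")
```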

Although intuitively appealing to a physicist, this approach does not correspond to either a truly frequentist or a Bayesian technique. On the one hand, the concept of an ensemble is a frequentist construct. On the other hand, the concept of integrating or "averaging" over the probability distribution function is a Bayesian approach. Because of this latter step, it is difficult to define the coverage of this process [5]. I therefore consider it a Bayesian technique that can be readily understood in that formulation if one treats the frequency distribution π(λ) as the prior for λ. I note that it also has the desired property of scaling correctly as one varies the relative sizes of the statistical and systematic uncertainties.

5. SUMMARY AND CONCLUSIONS

The identification and treatment of systematic uncertainties is becoming increasingly "systematic" in high energy physics and astrophysics. In both fields, there is a recognition of the importance of systematic uncertainties in a given measurement, and techniques have been adopted that result in systematic uncertainties that can be compared in some physically relevant sense with the statistical uncertainties.

I have proposed that systematic uncertainties can be classified into three broad categories, thereby creating more clarity and consistency in their treatment from one measurement to the next. Such classification, done a priori when the experiment is being defined, will assist in optimizing the experimental design and introducing into the data analysis the approaches necessary to control and minimize the effect of these systematic effects. In particular, one should not confuse systematic uncertainties with cross-checks of the results.

Bayesian statistics naturally allow us to incorporate systematic uncertainties into the statistical analysis by introducing priors for each of the parameters associated with the sources of systematic uncertainty. However, one must be careful regarding the choice of prior. I recommend that in all cases the sensitivity of any inference or hypothesis test to the choice of prior be investigated, in order to ensure that the conclusions are robust.

Frequentist approaches to systematic uncertainties are less well-understood. The fundamental problem is how one defines the concept of an ensemble of measurements, when in fact what is varying is not an outcome of a measurement but one's assumptions concerning the measurement process or the underlying theory. I am not aware of a robust method of incorporating systematic uncertainties in a frequentist paradigm except in cases where the systematic uncertainty is really a statistical uncertainty and the additional variable can be treated as a nuisance parameter.


However, the procedures commonly used to incorporate systematic uncertainties into frequentist statistical inference do have some of the desired "scaling" properties.

Acknowledgments

I gratefully acknowledge the support of the Natural Sciences and Engineering Research Council of Canada, and thank the conference organizers for their gracious hospitality.

References

[1] A. Stuart and J. K. Ord, Kendall's Advanced Theory of Statistics, Oxford University Press, New York (1991).

[2] R. D. Cousins and V. L. Highland, Nucl. Instrum. Meth. A320, 331 (1992).

[3] C. Giunti, Phys. Rev. D 59, 113009 (1999).

[4] M. Corradi, "Inclusion of Systematic Uncertainties in Upper Limits and Hypothesis Tests," CERN-OPEN-2000-213 (Aug 2000).

[5] J. Conrad et al., Phys. Rev. D 67, 012002 (2003).

[6] G. C. Hill, Phys. Rev. D 67, 118101 (2003).

[7] R. J. Barlow, "Systematic Errors: Facts and Fictions," hep-ex/0207026 (Jun 2002), published in the Proceedings of the Conference on Advanced Statistical Techniques in Particle Physics, Durham, England, Mar 16-22, 2002. See also R. J. Barlow, "Asymmetric Systematic Errors," hep-ph/0306138 (Jun 2003).

[8] L. Demortier, "Bayesian Treatment of Systematic Uncertainties," published in the Proceedings of the Conference on Advanced Statistical Techniques in Particle Physics, Durham, England, Mar 16-22, 2002.

[9] G. Zech, Eur. Phys. J. C4, 12 (2002).

[10] See, for example: A. G. Kim et al., "Effects of Systematic Uncertainties on the Determination of Cosmological Parameters," astro-ph/0304509 (Apr 2003).

[11] T. Dorigo et al. (The CDF Collaboration), FERMILAB-CONF-03/193-E, published in the Proceedings of the 38th Rencontres de Moriond on QCD and High-Energy Hadronic Interactions, Les Arcs, Savoie, France, March 22-29, 2003.

[12] B. Netterfield et al., Ap. J. 571, 604 (2002).

[13] E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press (1995).

[14] G. Punzi, "Including Systematic Uncertainties in Confidence Limits," CDF Note in preparation (Jul 2003).

[15] See, for example, N. Reid, in these proceedings.

[16] G. L. Fogli et al., Phys. Rev. D 66, 053010 (2002).
