Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
The Bayesian Analysis Toolkit
a C++ tool for Bayesian inference
Kevin Kröninger – University of Göttingen / University of Siegen
Bayes Forum, Munich, 13.04.2012
The BAT (wo)men: Frederik Beaujean, Allen Caldwell, Daniel Greenwald, Daniel Kollar, Kevin Kröninger, Shabnaz Pashapour, Arnulf Quadt
Kevin Kröninger – Universität Göttingen / University of Siegen 2
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Questions in data analysis:● What does the data tell us about our model? Parameter estimation● Which model is favored by the data? Model comparison● Is the model compatible with the data? Goodness-of-fit test
Need methods and tools to extract information
TheoryExperiment Data analysis
Motivation
Kevin Kröninger – Universität Göttingen / University of Siegen 3
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Outline
Outline:● Requirements / Implementation / Tools● Markov Chain Monte Carlo● MCMC implementation in BAT● A working example● Some propaganda● Summary
Kevin Kröninger – Universität Göttingen / University of Siegen 4
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Requirements and solutions
Requirements:● Allow to phrase arbitrary models
and data sets● Interface to (HEP) software● Estimate parameters (point
estimates)● Find probability densities
(interval estimates)● Propagate uncertainties● Compare models● Test validity of model against the
data
Solutions:● C++ library based on ROOT*.● Models are implemented as
(base) classes and need to be defined by the user, or
● A set of pre-defined models can be used.
● A set of algorithms can used to perform the actual analysis
*Framework for handling large data sets,graphical representation and analysistools
Kevin Kröninger – Universität Göttingen / University of Siegen 5
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Requirements:● Allow to phrase arbitrary models
and data sets● Interface to HEP software● Estimate parameters (point
estimates)● Find probability densities
(interval estimates)● Propagate uncertainties● Compare models● Test validity of model against the
data
Solutions:● Minimization can be done via a
Minuit interface or via Simulated Annealing.
● Marginalization and uncertainty estimation can be done via Markov Chain Monte Carlo (MCMC).
● Propagation of uncertainties (without Gaussian assumptions) can also be done via MCMC
Requirements and solutions
Kevin Kröninger – Universität Göttingen / University of Siegen 6
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Requirements:● Allow to phrase arbitrary models
and data sets● Interface to HEP software● Estimate parameters (point
estimates)● Find probability densities
(interval estimates)● Propagate uncertainties● Compare models● Test validity of model against the
data
Solutions:● Direct comparison of model
probabilities (Bayes factors)● Integration methods from Cuba*
library linked● Perform p-value tests
Requirements and solutions
*A collection of numerical integrationmethods e.g., VEGAS
Kevin Kröninger – Universität Göttingen / University of Siegen 7
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Implementation
Define MODEL● define parameters● define likelihood● define priors
p D ∣
p0
Read DATA● from text file, ROOT tree,
user-defined (anything)● interface to user-defined
software
USER DEFINED
COMMON METHODS
p ∣ D =p D ∣ p0
∫ p D ∣ p0 d
● create model● read-in data
● normalize● find mode / fit● test the fit● marginalize wrt. one
or two parameters● compare models● provide nice output
Kevin Kröninger – Universität Göttingen / University of Siegen 8
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Implementation
Tools:● Point estimates:
● Minuit● Simulated Annealing● MCMC● simple Monte Carlo
● Marginalization:● MCMC● simple Monte Carlo
● Integration:● sampled mean● importance sampling● CUBA (Vega, Suave, Divonne, Cuhre)
● Sampling:● simple Monte Carlo● MCMC
● Error propagation● MCMC
Kevin Kröninger – Universität Göttingen / University of Siegen 9
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Markov Chain Monte Carlo
How does MCMC work?● Output of Bayesian analyses are
posterior probability densities, i.e., functions of an arbitrary number of parameters (dimensions).
● Sampling large dimensional functions is difficult.
● Idea: use random walk heading towards region of larger values (probabilities)
● Metropolis algorithm
N. Metropolis et al., J. Chem. Phys. 21 (1953) 1087.
● Start at some randomly chosen xi
● Randomly generate y around xi
● If f(y) > f(xi) set x
i+1 = y
● If f(y) < f(xi) set x
i+1 = y with prob. p=f(y)/f(x
i)
● If y is not accepted set xi+1
= xi
● Start over
Kevin Kröninger – Universität Göttingen / University of Siegen 10
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example of MCMC
Does it work for difficult functions?● Test MCMC on a function:
● Compare MCMC distribution toanalytic function
● Several minima/maxima are noproblem.
● Different orders of magnitude areno problem.
f x=x4 sin x2
Kevin Kröninger – Universität Göttingen / University of Siegen 11
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example of MCMC
χ2 vs. x Pull distributionf(x) vs. x
For more examples, seeour test suite on the BATweb page.
Kevin Kröninger – Universität Göttingen / University of Siegen 12
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
MCMC and Bayesian inference
How does MCMC help in Bayesian inference?● Use MCMC to sample the posterior
probability, i.e.
● Marginalization of posterior:
● Fill a histogram with just onecoordinate while sampling
● Error propagation: calculate any function of the parameters whilesampling
● Point estimate: find mode whilesampling
marginalized distributions
Kevin Kröninger – Universität Göttingen / University of Siegen 13
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
MCMC in BAT
Metropolis is ~3 lines of code, fairly easy, but ...
Technical details:● How are the new points generated? Proposal function● How many points can we afford to throw away? Efficiency● How many iterations do we need? Convergence criterion● How correlated are the points? Auto-correlation/lag
Kevin Kröninger – Universität Göttingen / University of Siegen 14
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Choice of proposal function
How are the new points generated?● Proposal function: probability density of the step size
used in the random walk● Should be independent of the underlying distribution,
i.e., the same everywhere● Shape is important (default: Breit-Wigner)● Width defines efficiency = fraction of accepted points
● Small width = large efficiency
● Large width = small efficiency
● Trade off: efficiency ~25%
Kevin Kröninger – Universität Göttingen / University of Siegen 15
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Convergence criterion
How many iterations do we need?● MCMC distribution should
converge to underlying function.● In practice: need to stop the chain
at some point. Need criteria.● Two strategies:● Single chain convergence:
● Could monitor auto-correlation● Very CPU-time intensive● Could be done offline
● Multi-chain convergence:● Test convergence of multiple
chains wrt each other● Use Gelman&Rubin criterion
Gelman & Rubin, StatSci 7, 1992
Gelman & Rubin convergence:● Calculate average variance of all chains
● Estimate variance of target distribution
● Calculate ratio and compare with stopping criterion (relaxed version):
< 1.x (x = 0.1 default)
Kevin Kröninger – Universität Göttingen / University of Siegen 16
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Convergence criterion
Parameter 0 value vs. iteration Parameter 1 vs parameter 0
Convergence a la Gelman & Rubin
Burn-in phase
Kevin Kröninger – Universität Göttingen / University of Siegen 17
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Auto-correlation and lag
How correlated are the points?● Simple Monte Carlo sampling and “unbiased” random walk create sets
of points without (auto-correlation) while MCMC algorithm can cause auto-correlation, e.g., when rejecting a point (since the old one is taken again)
● Size of the correlation depends on the underlying posterior and the proposal function
● Can thin the MCMC sample by introducing a lag, i.e., take only every nth point to calculate the marginalized distributions
● Cost: need to run a factor of n longer to get the same stat. precision
Kevin Kröninger – Universität Göttingen / University of Siegen 18
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Auto-correlation and lag
χ2 vs. lagf(x) vs. x
Kevin Kröninger – Universität Göttingen / University of Siegen 19
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
A working example
Phrasing the problem:● Estimate signal strength of Gaussian signal on top of
flat background● Data generated with the
following settings:● Gaussian signal:
● position μ = 2039 keV● width σ = 5 keV● strength <S> = 100
● Flat background:● strength <B> = 3/keV
● Number of events per binfluctuate with Poissondistribution
Kevin Kröninger – Universität Göttingen / University of Siegen 20
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
A working example
Statistical modeling:● Statistical model:
● Gaussian signal on top of flat background● 4 fit parameters:
● Gaussian signal (3) ● Flat background (1)
● Prior knowledge:● Background: 300 ± 20 in 100 keV (e.g., from sideband analysis)● Signal strength: exponentially decreasing (e.g., theoretical intuition)● Signal position: flat (e.g., no idea about the mass of a resonance)● Signal width: 5 ± 1 keV (detector resolution)● Signal and background efficiency fixed to 1 (in this example)
Kevin Kröninger – Universität Göttingen / University of Siegen 21
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
A working example
Statistical modeling:● Likelihood:
● Binned data● Number of expected events per bin:
● Assume independent Poisson fluctuations in each bin● Likelihood:
p D∣S , , , B=∏i=1
N bins ini
ni!e−i
i=∫ xi
S
2e−x−2
22
d x0.01⋅B x i
Kevin Kröninger – Universität Göttingen / University of Siegen 22
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
A working example
Indicate median
Indicate central68% prob. region
Indicate globalmode and mean
Quote median and central 68% prob. regionMarginalized distributions:● Project posterior onto one
parameter axis● Global mode and mode of
marginalized distributionsdo not have to coincide
● Full (correlated) information in Markov Chain
● Default output:● Mean ± std. deviation● Median and central int.● Mode and smallest int.
● All 1-D and 2-D distributions are written out during main run
Kevin Kröninger – Universität Göttingen / University of Siegen 23
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: 2D distributions
Indicate smallest intervalcontaining 68% probability
Indicate global mode
Kevin Kröninger – Universität Göttingen / University of Siegen 24
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: the fit result
Error band
Uncertainty ofprediction
Kevin Kröninger – Universität Göttingen / University of Siegen 25
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: the summary
Results of the marginalization ============================== List of parameters and properties of the marginalized distributions: (0) Parameter "Background": Mean +- sqrt(V): 282.2 +- 13.04 Median +- central 68% interval: 282.1 + 13.11 - 12.92 (Marginalized) mode: 281 5% quantile: 260.8 10% quantile: 265.5 16% quantile: 269.2 84% quantile: 295.7 90% quantile: 299 95% quantile: 303.8 Smallest interval(s) containing 68% and local modes: (268, 298) (local mode at 281 with rel. height 1; rel. area 0.7169)
(2) Parameter "Signal": Mean +- sqrt(V): 83.59 +- 11.09 Median +- central 68% interval: 83.28 + 11.35 - 10.73 (Marginalized) mode: 83 5% quantile: 65.85 10% quantile: 69.55 16% quantile: 72.55 84% quantile: 95.13 90% quantile: 97.99 95% quantile: 102.4 Smallest interval(s) containing 68% and local modes: (72, 96) (local mode at 83 with rel. height 1; rel. area 0.6806)
(4) Parameter "Signal mass": Mean +- sqrt(V): 2038 +- 0.8009 Median +- central 68% interval: 2038 + 0.7945 - 0.7922 (Marginalized) mode: 2038 5% quantile: 2037 10% quantile: 2037 16% quantile: 2037 84% quantile: 2039 90% quantile: 2039 95% quantile: 2039 Smallest interval(s) containing 68% and local modes: (2037, 2039) (local mode at 2038 with rel. height 1; rel. area 0.6844)...
Results of the optimization =========================== Optimization algorithm used:Metropolis MCMC List of parameters and global mode: (0) Parameter "Background": 22.68% (2) Parameter "Signal": 19.76% (4) Parameter "Signal mass": 23.64% (5) Parameter "Signal width": 19.82%
Status of the MCMC ================== Convergence reached: yes Number of iterations until convergence: 24000 Number of chains: 10 Number of iterations per chain: 10000000 Average efficiencies: (0) Parameter "Background": 20.03% (2) Parameter "Signal": 17.35% (4) Parameter "Signal mass": 24.52% (5) Parameter "Signal width": 19.56%
Kevin Kröninger – Universität Göttingen / University of Siegen 26
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: the summary
Kevin Kröninger – Universität Göttingen / University of Siegen 27
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: correlation
Kevin Kröninger – Universität Göttingen / University of Siegen 28
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: knowledge update
Kevin Kröninger – Universität Göttingen / University of Siegen 29
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: knowledge update
Kevin Kröninger – Universität Göttingen / University of Siegen 30
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Published use cases
● Quentin Buat, Search for extra dimensions in the diphoton final state with ATLAS [arXiv:1201.4748]
● ATLAS collaboration, Search for excited leptons in proton-proton collisions at sqrt(s) = 7 TeV with the ATLAS detector [arXiv:1201.3293]
● I. Abt et al., Measurement of the temperature dependence of pulse lengths in an n-type germanium detector, Eur. Phys. J. Appl. Phys.56:10104,2011 [arXiv:1112.5033]
● ATLAS collaboration, Search for Extra Dimensions using diphoton events in 7 TeV proton-proton collisions with the ATLAS detector [arXiv:1112.2194]
● ATLAS collaboration, A measurement of the ratio of the W and Z cross sections with exactly one associated jet in pp collisions at sqrt(s) = 7 TeV with ATLAS, Phys.Lett.B708:221-240,2012 [arXiv:1108.4908]
●ZEUS collaboration, Search for single-top production in ep collisions at HERA, Phys.Lett.B708:27-36,2012 [arXiv:1111.3901]
● CMS collaboration, Search for a W’ boson decaying to a muon and a neutrino in pp collisions at sqrt(s) = 7 TeV, Phys.Lett.B701:160-179,2011 [arXiv:1103.0030]
● ZEUS collaboration, Measurement of the Longitudinal Proton Structure Function at HERA, Phys.Lett.B682:8-22,2009 [arXiv:0904.1092]
Kevin Kröninger – Universität Göttingen / University of Siegen 31
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Contact
Contact:● Web page: http://www.mppmu.mpg.de/bat/● Contact: [email protected]● Paper on BAT:
A. Caldwell, D. Kollar, K. Kröninger, BAT - The Bayesian Analysis ToolkitComp. Phys. Comm. 180 (2009) 2197-2209 [arXiv:0808.2552].
Release 0.9 next week
Kevin Kröninger – Universität Göttingen / University of Siegen 32
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Tutorials
Tutorials:● Set of tutorials on the web page for first steps, including solutions
Kevin Kröninger – Universität Göttingen / University of Siegen 33
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Warrant
Current projects:● BAT version v1.0● ROOT-less version > v1.0● Parallelization● Graphical representation of uncertainty bands
as in Eur. Phys. J. Plus 127 (2012) 24 [arXiv:1112.2593]
Warrant:● If you are interested joining the effort, please get in touch with us● Also have (Bsc./MSc.) thesis projects to offer
Kevin Kröninger – Universität Göttingen / University of Siegen 34
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
Summary
Summary:● Bayesian inference requires some computational effort (e.g., nuisance
parameters)● Markov Chain Monte Carlo is the key tool to solve these issues● BAT is a tool to combine Bayesian inference with MCMC● Toolbox with more algorithms (integration, optimization, etc.)● C++ library, modular, easy to use● Informative output with predefined plots, numbers, etc.● Did not talk about hypothesis testing and goodness-of-fit, p-values,
Bayes factors, information criteria● Upgrade of BAT ongoing, more to come● Participation and feedback are always welcome
Kevin Kröninger – Universität Göttingen / University of Siegen 35
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
MCMC in BAT
What exactly is being done in BAT?● Step 1: Starting points
● Random within parameter space (default)● Center of each dimension● User-defined
● Step 2: Burn-in● Use multiple chains (default: 5)● Run until convergence is reached and chains are efficient● Or run until the maximum number of iterations is reached● Chains are efficient if the efficiency is between 15% and 50%● Run in sequences to adjust the width of the proposal functions:
● If efficiency > 50%: increase the width● If efficiency < 15%: decrease the width
Kevin Kröninger – Universität Göttingen / University of Siegen 36
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
What exactly is being done in BAT?● Step 3: Main run
● Use width obtained from efficiency optimization and convergence(fixed)
● Run for a specified number of iterations● Perform analysis-specific calculations (next slide)● Store information of every nth iteration (consider lag)
MCMC in BAT
Kevin Kröninger – Universität Göttingen / University of Siegen 37
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
What is done in each step?● Marginalization:
● Fill 1-D and 2-D histograms● Large number: N·(N+1)/2, e.g., for N=50 there are 1275 histograms● Individual histograms can be switched on/off
● Optimization:● Search for maximum of posterior● Not precise, but helpful as starting point for other algorithms
● Error propagation:● Calculate arbitrary (user-defined) functions from parameters
● Misc:● Write points to ROOT tree for offline analysis● Perform any user-defined analysis, histogram filling, etc.
MCMC in BAT
Kevin Kröninger – Universität Göttingen / University of Siegen 38
Bayes Forum, Munich, 13.04.2012 The Bayesian Analysis toolkit
An example: error propagation
Sum of all possible fit functions weighted withposterior: calculate fit function at energy E forall parameter values
Use as error band
Posterior probability for the numberof expected events with energyE=2032 keV