Statistical Methods for Astronomy
ircamera.as.arizona.edu/Astr_518/Statistics-Lecture2.pdf

Statistical Methods for Astronomy

● Lecture 1
    ● Why do we need statistics?
    ● Definitions
    ● Statistical distributions
        ○ Binomial Distribution
        ○ Poisson Distribution
        ○ Gaussian Distribution
    ● Central Limit theorem
    ● Least Squares
        ○ chi-squared
        ○ significance

● Lecture 2
    ● Your Statistical Toolbox
    ● Bayes' theorem
    ● F-test
    ● KS-test
    ● Monte Carlo method
    ● transforming deviates

If your experiment needs statistics, you ought to have done a better experiment. - Ernest Rutherford

References
● “Data Reduction and Error Analysis”, Bevington and Robinson
● “Practical Statistics for Astronomers”, Wall and Jenkins
● “Numerical Recipes”, Press et al.
● “Understanding Data Better with Bayesian and Global Statistical Methods”, Press, 1996 (on astro-ph)

Another look at the problem
● Knowing the distribution allows us to predict what we will observe.

● We often know what we have observed and want to determine what that tells us about the distribution.

Bayesian Statistics
● “Frequentist” approaches are computationally easy, but often solve the inverse of the problem we want.
● Bayesian approaches use both the data and any “prior” information to develop a “posterior” distribution.
    ○ Allows calculation of parameter uncertainty more directly.
    ○ More easily incorporates outside information.

An Example
● I flip a coin 10 times and obtain 7 heads. What is the probability of flipping heads?
    ○ A frequentist statistician would say 0.7.
    ○ A Bayesian statistician might define a prior probability with mean = 0.5 and sigma = 0.2 (for example).

Who would you side with?
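To make the Bayesian side of the coin example concrete, here is a minimal sketch that evaluates the posterior on a grid. It assumes the prior is a Gaussian with mean 0.5 and sigma 0.2 restricted to the interval 0 to 1, and a binomial likelihood; the function name `coin_posterior` and the grid size are illustrative choices, not part of the lecture.

```python
import math

def coin_posterior(heads, flips, prior_mean=0.5, prior_sigma=0.2, n_grid=2001):
    """Posterior weights for the heads probability p on a grid over [0, 1].

    Assumptions: Gaussian prior (truncated to [0, 1]) and binomial likelihood.
    The binomial coefficient is dropped because it cancels on normalization.
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    unnormalized = []
    for p in grid:
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sigma) ** 2)
        likelihood = p ** heads * (1.0 - p) ** (flips - heads)
        unnormalized.append(prior * likelihood)
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]  # weights sum to 1
    return grid, posterior

grid, post = coin_posterior(heads=7, flips=10)
post_mean = sum(p * w for p, w in zip(grid, post))
post_mode = grid[max(range(len(post)), key=post.__getitem__)]
print(f"posterior mean = {post_mean:.3f}, posterior mode = {post_mode:.3f}")
```

Under these assumptions the posterior peaks between the prior mean of 0.5 and the frequentist estimate of 0.7, which is the compromise the slide is pointing at.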

Obtaining the Posterior Distribution
● Bayes' Theorem states:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

P(A | B) should be read as “probability of A given B”.

● A is typically the data, so P(A) = P(data); B is the statistic we want to know.
● P(B) is the “prior” information we may know about the experiment.

$$P(B \mid \mathrm{data}) \propto P(\mathrm{data} \mid B)\,P(B)$$

● P(data) is just a normalization constant.

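Using Bayes' theorem
● Assume we are looking for faint companions, and expect to find them around roughly 1% of the stars we observe.
● From putting in fake companions we know that we can detect planets 90% of the time.
● We also know that we see “false” planets in 3% of the observations.
● What is the probability that an object we see is actually a planet?

$$P(\mathrm{planet}) = 0.01, \qquad P(\mathrm{det.} \mid \mathrm{planet}) = 0.9, \qquad P(\mathrm{det.} \mid \mathrm{no\ planet}) = 0.03$$

$$P(\mathrm{planet} \mid \mathrm{det.}) = \frac{P(\mathrm{det.} \mid \mathrm{planet})\,P(\mathrm{planet})}{P(\mathrm{det.})}$$

$$P(\mathrm{planet} \mid \mathrm{det.}) = \frac{P(\mathrm{det.} \mid \mathrm{planet})\,P(\mathrm{planet})}{P(\mathrm{det.} \mid \mathrm{planet})\,P(\mathrm{planet}) + P(\mathrm{det.} \mid \mathrm{no\ planet})\,P(\mathrm{no\ planet})}$$

$$P(\mathrm{planet} \mid \mathrm{det.}) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.03 \times 0.99} \approx 0.23$$

● Even with a 90% detection rate, only about 23% of detections are real planets, because planets are rare and false detections are drawn from the much larger population of stars without one.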
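The arithmetic of the companion example can be checked with a few lines of code. This is a small sketch; the function name `posterior_planet` and its argument names are hypothetical, chosen only to mirror the three probabilities given on the slide.

```python
def posterior_planet(p_planet, p_det_given_planet, p_det_given_no_planet):
    """Bayes' theorem for P(planet | detection).

    The denominator P(det.) is expanded over the two cases
    'planet' and 'no planet'.
    """
    true_detections = p_det_given_planet * p_planet
    false_detections = p_det_given_no_planet * (1.0 - p_planet)
    return true_detections / (true_detections + false_detections)

# Values from the slide: 1% prior, 90% detection rate, 3% false-detection rate.
p = posterior_planet(0.01, 0.9, 0.03)
print(f"P(planet | det.) = {p:.3f}")
```

Changing the prior in the call shows how strongly the answer depends on how common companions are assumed to be.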

General Bayesian Guidance
● Focuses on probability rather than accept/reject.
● Bayesian approaches allow you to calculate the probability that parameters lie within a range of values in a more straightforward way.
● A common concern about Bayesian statistics is that it is subjective. This is not necessarily a problem.
● Bayesian techniques are generally more computationally intensive, but this is rarely a drawback for modern computers.

Hypothesis Testing
● Hypothesis testing uses some metric to determine whether two data sets, or a data set and a model, are distinct.
● Typically, the problem is set up so that the hypothesis is that the data sets are consistent (the null hypothesis).
● A probability is calculated that a value at least as extreme as the one found would be obtained with another sample if the hypothesis were true.
● Based on the required level of confidence, the hypothesis is rejected or accepted.

Are two data sets drawn from the same distribution?
● The “t” statistic quantifies the likelihood that the means are the same.
● The “F” statistic quantifies the likelihood that the variances of two data sets are the same.
● Consider two data sets, x and y, with n and m data points, respectively:

$$t = \frac{\bar{x} - \bar{y}}{s\,\sqrt{1/m + 1/n}}$$

$$s^2 = \frac{n S_x + m S_y}{n + m}$$

$$F = \frac{\sum_i (x_i - \bar{x})^2 / (n - 1)}{\sum_j (y_j - \bar{y})^2 / (m - 1)}$$

Student's t test
● Calculate the t statistic. A perfect agreement is t=0.
● Evaluate the probability for t>value.

$$t = \frac{\bar{x} - \bar{y}}{s\,\sqrt{1/m + 1/n}}$$

$$s^2 = \frac{n S_x + m S_y}{n + m}$$
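A rough sketch of the two steps above, using only the Python standard library. Two assumptions are made here that go beyond the slide: the pooled variance uses the common textbook normalization with n + m - 2 degrees of freedom, which may differ slightly from the slide's s², and the tail probability is estimated with a permutation (Monte Carlo) test rather than from the analytic t distribution. The function names are hypothetical.

```python
import math
import random

def t_statistic(x, y):
    """Two-sample t statistic with a pooled variance (n + m - 2 normalization).

    Assumes the pooled variance is non-zero.
    """
    n, m = len(x), len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / m
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    s2 = (ss_x + ss_y) / (n + m - 2)
    return (mean_x - mean_y) / math.sqrt(s2 * (1.0 / n + 1.0 / m))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Estimate P(|t| >= |t_observed|) by shuffling the group labels.

    This swaps in a permutation test for the analytic t distribution.
    """
    rng = random.Random(seed)
    t_obs = abs(t_statistic(x, y))
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_statistic(pooled[:n], pooled[n:])) >= t_obs - 1e-12:
            hits += 1
    return hits / n_perm

t_close = t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
p_far = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(f"t = {t_close:.3f}, permutation p for well-separated samples = {p_far:.4f}")
```

A small p value means that a difference in means this large rarely arises when the labels are assigned at random, which is the sense in which the null hypothesis would be rejected.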

F test
● Calculate the F statistic.
● Calculate the probability that F>value.

$$F = \frac{\sum_i (x_i - \bar{x})^2 / (n - 1)}{\sum_j (y_j - \bar{y})^2 / (m - 1)}$$
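The F statistic is simply the ratio of the two unbiased sample variances, which can be sketched as below. The function name `f_statistic` is hypothetical, and the tail probability is left out: it would come from the F distribution with (n - 1, m - 1) degrees of freedom, which the Python standard library does not provide.

```python
import statistics

def f_statistic(x, y):
    """Ratio of unbiased sample variances: the sum of squared deviations
    of x divided by (n - 1), over the same quantity for y with (m - 1)."""
    return statistics.variance(x) / statistics.variance(y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
f_value = f_statistic(x, y)
print(f"F = {f_value:.3f}")
```

Here y has twice the spread of x, so its variance is four times larger and F comes out well below 1.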

The Kolmogorov-Smirnov Test
● Calculate the cumulative distribution function for your model, C_model(x).
● Calculate the cumulative distribution function for your data, C_data(x).
● Find the maximum of |C_model(x) - C_data(x)|.
● The variable x must be continuous to use the K-S test.

K-S test example
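As an illustration of the K-S statistic, the sketch below compares a small data set against a model cumulative distribution. The uniform model on 0 to 1, the three data values, and the function names are assumptions made for the example. Because the empirical distribution is a step function, the difference is checked just before and just after each step.

```python
def ks_statistic(data, model_cdf):
    """Maximum absolute difference between the empirical CDF of `data`
    and the model CDF, evaluated on both sides of every step."""
    xs = sorted(data)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        c_model = model_cdf(x)
        # Empirical CDF is (i - 1) / n just below x and i / n at x.
        d_max = max(d_max, i / n - c_model, c_model - (i - 1) / n)
    return d_max

def uniform_cdf(x):
    """CDF of a uniform distribution on [0, 1] (assumed model)."""
    return min(max(x, 0.0), 1.0)

d = ks_statistic([0.25, 0.5, 0.75], uniform_cdf)
print(f"D = {d:.3f}")
```

Turning D into a probability requires the K-S distribution for the given sample size, which is not shown here.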

Monte Carlo Simulation
● Often we may find it easiest just to replicate an experiment or observation in the computer.
● In general these tools are referred to as “Monte Carlo” methods.
● General idea is to simulate randomness and reproduce observations for comparison with data.
● First we need a random number sequence.

Creating Random numbers
● Generating a proper random sequence of numbers is a whole topic in itself. Numerical Recipes discusses this in some detail.
● A simple example of a random number generator is the sequence:

$$I_{j+1} = a\,I_j \bmod m$$

where a and m are large numbers. The starting value of I_j is the seed: a given seed always gives exactly the same sequence of random numbers.
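A minimal sketch of such a generator is shown below. The constants a = 16807 and m = 2^31 - 1 are an assumption (the widely used "minimal standard" multiplicative generator); the slide does not specify values. Dividing each integer by m to obtain a number between 0 and 1 is also an assumption of this example.

```python
A = 16807        # multiplier, 7**5 (assumed "minimal standard" choice)
M = 2**31 - 1    # modulus, a Mersenne prime (assumed)

def lcg(seed, count):
    """Return `count` pseudo-random numbers in (0, 1) from I_{j+1} = A * I_j mod M.

    `seed` must be an integer between 1 and M - 1.
    """
    state = seed
    values = []
    for _ in range(count):
        state = (A * state) % M
        values.append(state / M)
    return values

first_run = lcg(seed=42, count=5)
second_run = lcg(seed=42, count=5)
print(first_run == second_run)  # same seed, same sequence
```

Reproducibility from a fixed seed is useful in practice: a simulation can be rerun exactly while debugging.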

Random Numbers
● The example gives a “uniform” distribution set of random numbers. That is,

$$P(x)\,dx = \begin{cases} dx & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

● We would like useful distributions, such as Poisson, etc. To do so, we need to transform the random numbers.

Transformation Method
● Starting from the law for transformation of probabilities:

$$|p(y)\,dy| = |p(x)\,dx|$$

$$\left|\frac{dx}{dy}\right| = \frac{p(y)}{p(x)}$$

● We can rewrite to solve for the probability we want. For a uniform deviate x, p(x) = 1, so

$$p(y) = \left|\frac{dx}{dy}\right|$$

1. Need to integrate the probability distribution.
2. Solve for the new variable (y) in terms of the uniform variable (x).

Example
● I want to simulate the time between arrivals of photons at the detector. This is given by an exponential probability distribution:

$$P(t)\,dt = e^{-t}\,dt$$

● Use the transformation of probabilities:

$$e^{-t} = \left|\frac{dx}{dt}\right|$$

● Need to integrate:

$$\int e^{-t}\,dt = \int dx$$

Choosing the sign so that x runs from 1 to 0 as t runs from 0 to infinity gives

$$e^{-t} = x$$

$$t = -\ln x$$

● A random number in the range 0 to 1 will be transformed to one between infinity and 0.
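The transformation t = -ln x can be tried directly with Python's standard `random` module. This is a small sketch: the function name is hypothetical, and `1.0 - rng.random()` is used so that the uniform deviate lies in (0, 1] and the logarithm never receives zero.

```python
import math
import random

def exponential_deviates(count, seed=0):
    """Draw `count` exponential deviates (unit mean) via t = -ln(x),
    where x is a uniform deviate in (0, 1]."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) for _ in range(count)]

times = exponential_deviates(100_000)
mean_time = sum(times) / len(times)
print(f"sample mean = {mean_time:.3f} (expected value is 1 for e^-t)")
```

The sample mean should approach 1, the mean of the distribution e^{-t}, as the number of deviates grows.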

Limitations
● Transformation methods are limited to analytical probability distributions.
● One also needs to be able to integrate the probability distribution and invert the equation to solve for the new variable.
● Often one of these criteria is not satisfied. You can still generate useful random numbers using the rejection method.

Rejection Method
● Generate two uniform random deviates, x and y.
● Adjust x to span the range of values expected for the random number (x' = f(x)).
● Compare the value of y, scaled to span 0 to the maximum of the probability distribution, to the value of the probability distribution at x' (y' = p(x')).
● If y < y', use the value of x' in your simulation; if y > y', reject this pair and start over.
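The steps above can be sketched as follows. The target distribution p(x) = 2x on the interval 0 to 1 is an assumption chosen for illustration, as are the names `target_pdf`, `P_MAX`, and `rejection_sample`. Points are kept when they fall under the curve, so accepted values of x' follow p.

```python
import random

def target_pdf(x):
    """Assumed example distribution: p(x) = 2x on [0, 1]."""
    return 2.0 * x

P_MAX = 2.0  # maximum of target_pdf on [0, 1]

def rejection_sample(count, seed=0):
    """Draw `count` deviates from target_pdf with the rejection method."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        x_prime = rng.random()       # x' already spans the range [0, 1)
        y = P_MAX * rng.random()     # y scaled to span [0, P_MAX)
        if y < target_pdf(x_prime):  # accept points under the curve
            samples.append(x_prime)
        # otherwise reject the pair and draw again
    return samples

draws = rejection_sample(20_000)
mean_draw = sum(draws) / len(draws)
print(f"sample mean = {mean_draw:.3f} (expected 2/3 for p(x) = 2x)")
```

About half of the pairs are rejected for this choice of p(x), which is the price paid for not needing to integrate or invert the distribution.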
