Statistical Methods for Astronomy
● Lecture 1
  ● Why do we need statistics?
  ● Definitions
  ● Statistical distributions
    ○ Binomial Distribution
    ○ Poisson Distribution
    ○ Gaussian Distribution
  ● Central Limit theorem
  ● Least Squares
    ○ chi-squared
    ○ significance
● Lecture 2
  ● Your Statistical Toolbox
  ● Bayes' theorem
  ● F-test
  ● KS-test
  ● Monte Carlo method
  ● transforming deviates
“If your experiment needs statistics, you ought to have done a better experiment.” - Ernest Rutherford
References
● “Data Reduction and Error Analysis”, Bevington and Robinson
● “Practical Statistics for Astronomers”, Wall and Jenkins
● “Numerical Recipes”, Press et al.
● “Understanding Data Better with Bayesian and Global Statistical Methods”, Press, 1996 (on astro-ph)
Another look at the problem
● Knowing the distribution allows us to predict what we will observe.
● We often know what we have observed and want to determine what that tells us about the distribution.
Bayesian Statistics
● “Frequentist” approaches are computationally easy, but often solve the inverse of the problem we want.
● Bayesian approaches use both the data and any “prior” information to develop a “posterior” distribution.
  ○ Allows calculation of parameter uncertainty more directly.
  ○ More easily incorporates outside information.
An Example
● I flip a coin 10 times and obtain 7 heads. What is the probability of flipping heads?
  ○ A frequentist statistician would say 0.7.
  ○ A Bayesian statistician might define a prior probability with mean = 0.5 and sigma = 0.2 (for example).
Who would you side with?
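To make the Bayesian side concrete, here is a minimal Python sketch that evaluates the posterior for the heads probability p on a grid. It assumes the prior is a Gaussian in p with mean 0.5 and sigma 0.2 (the example values), truncated to 0 ≤ p ≤ 1; the function name `coin_posterior` and the grid size are illustrative choices, not part of the lecture.

```python
import math


def coin_posterior(heads, flips, prior_mean=0.5, prior_sigma=0.2, n_grid=1001):
    """Grid approximation to the posterior for the heads probability p.

    Likelihood: binomial, proportional to p**heads * (1 - p)**(flips - heads).
    Prior: Gaussian in p (an assumed form), truncated to 0 <= p <= 1.
    Returns the grid of p values and the normalized posterior weights.
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    weights = []
    for p in grid:
        likelihood = p**heads * (1.0 - p) ** (flips - heads)
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sigma) ** 2)
        weights.append(likelihood * prior)  # unnormalized P(data | p) P(p)
    norm = sum(weights)  # plays the role of P(data)
    return grid, [w / norm for w in weights]


grid, post = coin_posterior(heads=7, flips=10)
post_mean = sum(p * w for p, w in zip(grid, post))
post_peak = grid[max(range(len(post)), key=post.__getitem__)]
# Probability that p lies between 0.5 and 0.7, read straight off the posterior
p_range = sum(w for p, w in zip(grid, post) if 0.5 <= p <= 0.7)
print(f"mean {post_mean:.3f}, peak {post_peak:.3f}, P(0.5 <= p <= 0.7) = {p_range:.2f}")
```

The posterior peak lands between the prior mean (0.5) and the frequentist answer (0.7): with only 10 flips, the prior still pulls the estimate noticeably.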
Obtaining the Posterior Distribution
● Bayes' theorem states:
  $P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A)}$
  P(A | B) should be read as “probability of A given B”.
● A is typically the data, so P(A) = P(data); B is the statistic we want to know.
● P(B) is the “prior” information we may know about the experiment.
  $P(B \mid \text{data}) \propto P(\text{data} \mid B)\,P(B)$
● P(data) is just a normalization constant.
Using Bayes' theorem
● Assume we are looking for faint companions and expect them around 1% of the stars we observe.
● From inserting fake companions, we know that we can detect planets 90% of the time.
● We also know that we see “false” planets in 3% of the observations.
● What is the probability that an object we see is actually a planet?
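$P(\text{planet}) = 0.01$
$P(\text{det.} \mid \text{planet}) = 0.9$, so $P(\text{no det.} \mid \text{planet}) = 0.1$
$P(\text{det.} \mid \text{no planet}) = 0.03$
$P(\text{planet} \mid \text{det.}) = \dfrac{P(\text{det.} \mid \text{planet})\,P(\text{planet})}{P(\text{det.})}$
$P(\text{planet} \mid \text{det.}) = \dfrac{P(\text{det.} \mid \text{planet})\,P(\text{planet})}{P(\text{det.} \mid \text{planet})\,P(\text{planet}) + P(\text{det.} \mid \text{no planet})\,P(\text{no planet})}$
$P(\text{planet} \mid \text{det.}) = \dfrac{0.9 \times 0.01}{0.9 \times 0.01 + 0.03 \times 0.99} = 0.23$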
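The same arithmetic as a small Python function, a sketch whose name and argument names are illustrative. It expands P(det.) with the law of total probability over the two cases, planet and no planet, exactly as in the denominator above.

```python
def prob_planet_given_detection(p_planet, p_det_given_planet, p_det_given_none):
    """Bayes' theorem for a binary hypothesis (planet / no planet)."""
    numerator = p_det_given_planet * p_planet
    # P(det.) by the law of total probability over planet / no planet
    p_det = numerator + p_det_given_none * (1.0 - p_planet)
    return numerator / p_det


p = prob_planet_given_detection(p_planet=0.01, p_det_given_planet=0.9, p_det_given_none=0.03)
print(f"{p:.2f}")  # prints 0.23
```

Even with a 90% detection rate, roughly three out of four detections are false, because real companions are rare (1%) compared with the 3% false-positive rate applied to the other 99% of stars.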
General Bayesian Guidance
● Focuses on probability rather than accept/reject.
● Bayesian approaches let you calculate the probability that a parameter lies within a given range of values more directly.
● A common concern about Bayesian statistics is that it is subjective, because a prior must be chosen. This is not necessarily a problem.
● Bayesian techniques are generally more computationally intensive, but this is rarely a drawback for modern computers.
Hypothesis Testing
● Hypothesis testing uses some metric to determine whether two data sets, or a data set and a model, are distinct.
● Typically, the problem is set up so that the hypothesis is that the data sets are consistent (the null hypothesis).
● A probability is calculated that a value at least as extreme as the one found would be obtained from another sample if the null hypothesis were true.
● Based on the required level of confidence, the hypothesis is rejected or accepted.
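Are two data sets drawn from the same distribution?
● The “t” statistic tests whether the means of two data sets are the same.
● The “F” statistic tests whether the variances of two data sets are the same.
● Consider two data sets, x and y, with n and m data points, respectively:
  $t = \dfrac{\bar{x} - \bar{y}}{s\sqrt{1/n + 1/m}}, \qquad s^2 = \dfrac{n S_x + m S_y}{n + m - 2}$
  where $S_x = \frac{1}{n}\sum_i (x_i - \bar{x})^2$ and $S_y = \frac{1}{m}\sum_j (y_j - \bar{y})^2$
  $F = \dfrac{\sum_i (x_i - \bar{x})^2 / (n - 1)}{\sum_j (y_j - \bar{y})^2 / (m - 1)}$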
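Student's t test
● Calculate the t statistic. Perfect agreement gives t = 0.
● Evaluate the probability of obtaining a value of |t| at least this large if the two means are equal.
  $t = \dfrac{\bar{x} - \bar{y}}{s\sqrt{1/n + 1/m}}, \qquad s^2 = \dfrac{n S_x + m S_y}{n + m - 2}$
  where S_x and S_y are the variances of x and y, normalized by n and m.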
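A minimal Python sketch of the t test, assuming the pooled variance with n + m - 2 degrees of freedom given above. To stay dependency-free it obtains the two-sided probability by integrating the Student t density with Simpson's rule; that numerical route is a substitute for a library routine, and the function names and sample data are made up for illustration.

```python
import math


def student_t(x, y):
    """t statistic for equal means, using the pooled variance s**2."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    ss = sum((xi - xbar) ** 2 for xi in x) + sum((yj - ybar) ** 2 for yj in y)
    dof = n + m - 2
    s = math.sqrt(ss / dof)  # ss = n S_x + m S_y
    t = (xbar - ybar) / (s * math.sqrt(1.0 / n + 1.0 / m))
    return t, dof


def t_pdf(t, dof):
    """Student t probability density with dof degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))


def two_sided_p(t, dof, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    return 1.0 - 2.0 * total * h / 3.0


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 4.0, 5.0, 6.0, 7.0]
t, dof = student_t(x, y)
p_value = two_sided_p(t, dof)
print(f"t = {t:.3f}, dof = {dof}, P(|T| >= |t|) = {p_value:.3f}")
```

For these made-up samples t = -2.0 with 8 degrees of freedom, giving a probability of roughly 0.08: the means differ, but not at the 95% confidence level.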
F test
● Calculate the F statistic.
● Calculate the probability that F > value.
  $F = \dfrac{\sum_i (x_i - \bar{x})^2 / (n - 1)}{\sum_j (y_j - \bar{y})^2 / (m - 1)}$
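A minimal Python sketch of the F test. The tail probability of the F distribution is expressed through the regularized incomplete beta function, P(F' > F) = I_{d2/(d2 + d1 F)}(d2/2, d1/2) with d1 = n - 1 and d2 = m - 1. The helpers below follow the continued-fraction approach described in Numerical Recipes, but they are a hand-written port under that assumption, not the book's code, and the names are illustrative.

```python
import math


def _nonzero(v, tiny=1e-300):
    # Guard against division by zero inside the continued fraction
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # Even step of the recurrence
        aa = k * (b - k) * x / ((qam + k2) * (a + k2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence
        aa = -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test(x, y):
    """F statistic and the one-sided probability that F would exceed it by chance."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    var_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yj - ybar) ** 2 for yj in y) / (m - 1)
    f = var_x / var_y
    d1, d2 = n - 1, m - 1
    # Tail of the F distribution: P(F' > f) = I_{d2/(d2 + d1 f)}(d2/2, d1/2)
    q = betai(0.5 * d2, 0.5 * d1, d2 / (d2 + d1 * f))
    return f, q


f, q = f_test([0.0, 2.0, 4.0], [0.0, 1.0, 2.0])
print(f, q)
```

For d1 = d2 = 2 the tail probability reduces to 1/(1 + F), which gives a quick check: the example has F = 4, so the probability should be about 0.2.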
The Kolmogorov-Smirnov Test
● Calculate the cumulative distribution function for your model, C_model(x).
● Calculate the cumulative distribution function for your data, C_data(x).
● Find the maximum of |C_model(x) - C_data(x)|; this is the K-S statistic, D.
● The variables, x, must be continuous to use the K-S test.
[Figure: K-S test example, with the maximum separation D between the cumulative distributions marked]
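A minimal Python sketch of the K-S statistic for one sample against a model CDF. The data cumulative distribution is a step function, so the largest gap can occur just before or just after each step; the sketch checks both sides. The uniform model and the three data points are illustrative assumptions, and turning D into a probability (which needs the K-S distribution) is not attempted here.

```python
def ks_statistic(data, model_cdf):
    """Maximum |C_model(x) - C_data(x)| for a sample and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = model_cdf(x)
        # C_data jumps from i/n to (i + 1)/n at x, so check both sides of the step
        d = max(d, (i + 1) / n - c, c - i / n)
    return d


def uniform_cdf(x):
    """Cumulative distribution of a uniform model on [0, 1] (illustrative)."""
    return min(max(x, 0.0), 1.0)


D = ks_statistic([0.1, 0.5, 0.9], uniform_cdf)
print(D)
```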
Monte Carlo Simulation
● Often we may find it easiest just to replicate an experiment or observation in the computer.
● In general these tools are referred to as “Monte Carlo” methods.
● The general idea is to simulate randomness and reproduce observations for comparison with data.
● First we need a random number sequence.
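As an illustration, here is a Python sketch that replays the earlier coin experiment many times to estimate how often a fair coin gives at least 7 heads in 10 flips, then compares the estimate with the exact binomial sum. It uses Python's built-in `random` module as the random number source; the function name, seed, and trial count are arbitrary choices.

```python
import math
import random


def mc_prob_at_least(k, flips, p_heads, trials=100_000, seed=42):
    """Fraction of simulated experiments with at least k heads in `flips` tosses."""
    rng = random.Random(seed)  # fixed seed: the same "random" run every time
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_heads for _ in range(flips))
        if heads >= k:
            hits += 1
    return hits / trials


estimate = mc_prob_at_least(k=7, flips=10, p_heads=0.5)
exact = sum(math.comb(10, j) for j in range(7, 11)) / 2**10  # 176/1024
print(f"Monte Carlo {estimate:.4f} vs exact {exact:.4f}")
```

The statistical error of the estimate shrinks as 1/sqrt(trials), so 100,000 trials should agree with the exact 0.1719 to about the third decimal place.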
Creating Random numbers
● A proper random sequence of numbers is a whole topic in itself. Numerical Recipes discusses this in some detail.
● A simple example of a random number generator is the sequence:
  $I_{j+1} = a I_j \bmod m$
  where a and m are large integers. Dividing by m gives a number between 0 and 1. The starting value, I_0, is the seed: the same seed always gives exactly the same sequence of random numbers.
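A minimal Python sketch of this multiplicative congruential generator. The constants a = 16807 and m = 2^31 - 1 are an assumption: they are the well-known Park and Miller "minimal standard" choice, one of several workable pairs. The class name is illustrative.

```python
M = 2**31 - 1  # modulus m (a Mersenne prime)
A = 16807      # multiplier a = 7**5


class MinimalStandardLCG:
    """Multiplicative congruential generator I_{j+1} = a I_j mod m."""

    def __init__(self, seed=1):
        if not 0 < seed < M:
            raise ValueError("seed must satisfy 0 < seed < m")
        self.state = seed

    def next_int(self):
        # Python integers are unbounded, so a * I_j cannot overflow here
        self.state = (A * self.state) % M
        return self.state

    def uniform(self):
        # Dividing by m maps the integer sequence onto (0, 1)
        return self.next_int() / M


gen = MinimalStandardLCG(seed=1)
values = [gen.next_int() for _ in range(10_000)]
print(values[0], values[-1])
```

With seed 1, the 10,000th value should be 1043618065, the published check value for these constants (the C++ standard uses the same check for `std::minstd_rand0`). For serious work, a well-tested generator such as the Mersenne Twister behind Python's `random` module is a safer choice.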
Random Numbers
● The example gives a “uniform” distribution of random numbers. That is,
  $p(x)\,dx = dx$ if $0 < x < 1$, and $p(x) = 0$ otherwise.
● We would like more useful distributions, such as Poisson, etc. To do so, we need to transform the random numbers.
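Transformation Method
● Starting from the law for transformation of probabilities:
  $|p(y)\,dy| = |p(x)\,dx|$, so $\left|\dfrac{dx}{dy}\right| = \dfrac{p(y)}{p(x)}$
● For a uniform deviate, p(x) = 1 on 0 < x < 1, so
  $p(y) = \dfrac{dx}{dy}$
● We can rewrite this to solve for the distribution we want:
  1. Integrate the probability distribution: $x = F(y) = \int_{-\infty}^{y} p(y')\,dy'$.
  2. Solve for the new variable (y) in terms of the uniform variable (x): $y = F^{-1}(x)$.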
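Example
● I want to simulate the time between arrivals of photons at the detector (with t in units of the mean arrival interval). This is given by an exponential probability distribution:
  $P(t)\,dt = e^{-t}\,dt$
● Use the transformation of probabilities: $e^{-t} = \dfrac{dx}{dt}$
● Integrate: $\int_0^{t} e^{-t'}\,dt' = \int_0^{x} dx'$, which gives $1 - e^{-t} = x$.
● Since 1 - x is also uniform between 0 and 1, we can equally use $e^{-t} = x$, so $t = -\ln x$.
● A random number in the range 0 to 1 will be transformed to one between ∞ (as x → 0) and 0 (at x = 1).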
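A minimal Python sketch of this transformation. The slide uses a unit rate; the `rate` parameter is an added generalization (t = -ln(x)/rate for mean interval 1/rate). Python's `random.random()` returns values in [0, 1), so the sketch uses 1 - random() to keep x in (0, 1] and avoid ln(0). The function name and seed are illustrative.

```python
import math
import random


def exponential_deviates(n, rate=1.0, seed=0):
    """Waiting times drawn from p(t) = rate * exp(-rate * t) by transformation."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        x = 1.0 - rng.random()             # uniform on (0, 1], so log(x) is finite
        times.append(-math.log(x) / rate)  # t = -ln(x) / rate
    return times


waits = exponential_deviates(200_000)
print(sum(waits) / len(waits))  # close to the mean interval, 1/rate
```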
Limitations
● Transformation methods are limited to analytical probability distributions.
● One also needs to be able to integrate the probability distribution and invert the equation to solve for the new variable.
● Often one of these criteria is not satisfied. You can still generate useful random numbers using the rejection method.
Rejection Method
● Generate two uniform random deviates, x and y.
● Adjust x to span the range of values expected for the random number (x' = f(x)), and scale y to span 0 to the maximum of the probability distribution.
● Compare the value of y to the value of the probability distribution at x' (y' = p(x')).
● If y < y', use the value of x' in your simulation; if y > y', reject this pair and start over.
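A minimal Python sketch of the rejection method. As an assumed target it samples a Gaussian shape on [-5, 5], a distribution whose cumulative integral has no closed-form inverse, so the transformation method does not apply directly. The density does not need to be normalized; it only needs a known upper bound `pmax` on the chosen range. All names are illustrative.

```python
import math
import random


def rejection_sample(pdf, lo, hi, pmax, n, seed=1):
    """Draw n deviates from pdf on [lo, hi) by the rejection method.

    pdf need not be normalized, but pmax must be >= pdf(x) everywhere on [lo, hi).
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x_prime = lo + (hi - lo) * rng.random()  # x' = f(x) spans the allowed range
        y = pmax * rng.random()                  # y spans 0 to the maximum of p
        if y < pdf(x_prime):                     # point lies under the curve: keep x'
            samples.append(x_prime)
        # otherwise reject this pair and draw a new one
    return samples


def gaussian_shape(x):
    """Unnormalized Gaussian, peak value 1 at x = 0 (illustrative target)."""
    return math.exp(-0.5 * x * x)


draws = rejection_sample(gaussian_shape, -5.0, 5.0, 1.0, 50_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean {mean:.3f}, variance {var:.3f}")  # expect roughly 0 and 1
```

The fraction of accepted pairs equals the area under the curve divided by the area of the bounding box, about sqrt(2π)/10 ≈ 0.25 here, so roughly four pairs are drawn per accepted deviate. A tighter bound or a narrower range improves efficiency.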