Practical Statistics - University of Arizona
ircamera.as.arizona.edu/Astr_518/lecture_nov16.pdf

Page 1:

Practical Statistics

• Lecture 1 (Nov. 9):
- Correlation
- Hypothesis Testing

• Lecture 2 (Nov. 16):
- Parameter Estimation
- Bayesian Analysis
- Rejecting Outliers

• Lecture 3 (Nov. 18):
- Monte Carlo Modeling
- Bootstrap + Jack-knife

• Lecture 4 (Nov. 30):
- Detection Effects
- Survival Analysis

• Lecture 5 (Dec. 2):
- Fourier Techniques
- Filtering
- Unevenly Sampled Data

Good Reference: Hogg et al. 2010, http://arxiv.org/pdf/1008.4686v1

Page 2:

Parameter Estimation

Most data analysis has the final goal of estimating the value (and uncertainty) of a parameter or parameters.

The modeling step seeks the parameters that have the maximum probability of generating the observed data.

This approach assumes that the model (and the error estimates) are valid; alternatives should be explored.


Page 3:

Maximum Likelihood

If we know (or can guess) the probability distribution, f(x|\alpha), of our data, then we can write for our set of data:

P = \prod_i f(x_i \mid \alpha)

The maximum of this function gives us our best estimate of the parameters, \alpha.

Page 4:

The Classic Example: Linear Fit

What is the probability a certain data point is drawn from a given model?

P \propto e^{-(y_i - y(x_i))^2 / 2\sigma^2}

For N points the overall probability for a given model is

P \propto \prod_{i=1}^{N} e^{-(y_i - y(x_i))^2 / 2\sigma^2}

To maximize the probability we should minimize the exponent:

\sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{2\sigma^2}

with the straight-line model y = b + mx.
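Making the step explicit (a standard manipulation, not spelled out on the slide): taking the log of the product turns it into a sum, so maximizing P over (b, m) is the same as minimizing the familiar least-squares sum:

\ln P = \mathrm{const} - \sum_{i=1}^{N} \frac{(y_i - b - m x_i)^2}{2\sigma^2}
\quad\Longrightarrow\quad
\max_{b,m} P \;\Leftrightarrow\; \min_{b,m} \sum_{i=1}^{N} (y_i - b - m x_i)^2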

Page 5:

Linear fit can be used for a range of problems through transformation:

Variation      | Change needed before using standard least-squares
y = b e^{mx}   | y_i = ln(y_i)
y = b x^m      | y_i = ln(y_i), x_i = ln(x_i)
y = b + m x^2  | x_i = x_i^2

Note that if the original errors were Gaussian (not guaranteed), the transformed ones will not be!
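As an illustration, a minimal Matlab/Octave sketch of the first row of the table, fitting y = b*e^(m*x) by fitting a straight line to ln(y). The data values here are invented for the example:

% Fit y = b*exp(m*x) by transforming to ln(y) = ln(b) + m*x
x = (1:20)';                                          % made-up abscissas
y = 2.5 * exp(0.3 * x) .* (1 + 0.05 * randn(20, 1));  % noisy synthetic data
p = polyfit(x, log(y), 1);    % straight-line fit in the transformed space
m = p(1);                     % slope of the line is m
b = exp(p(2));                % intercept is ln(b), so b = exp(intercept)

As the slide warns, even though the multiplicative noise here is roughly symmetric, the transformed errors are only approximately Gaussian.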

Page 6:

Linear Algebra Approach

You would like to solve the system of equations:

A \vec{X} = \vec{Y}

where:

\vec{X} = \begin{pmatrix} b \\ m \end{pmatrix}, \qquad
\vec{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}

To do so, you weight each equation using the covariance matrix:

C = \begin{pmatrix} \sigma_{y_1}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{y_2}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{y_N}^2 \end{pmatrix}

C^{-1} A \vec{X} = C^{-1} \vec{Y}

Page 7:

Left-multiply by A-transpose and isolate \vec{X} ... and the solution is:

\vec{X} = [A^T C^{-1} A]^{-1} [A^T C^{-1} \vec{Y}]

If the errors are valid, then the first term is important: it is the parameter covariance matrix:

[A^T C^{-1} A]^{-1} = \mathrm{Cov} = \begin{pmatrix} \sigma_b^2 & \sigma_{mb} \\ \sigma_{mb} & \sigma_m^2 \end{pmatrix}
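A minimal Matlab/Octave sketch of this solution, using the C\A construction from the routine list on the next page; the data values are invented for illustration:

% Weighted least-squares line fit via X = [A' C^-1 A]^-1 [A' C^-1 Y]
x = (1:10)';                                   % made-up abscissas
sigma_y = 0.5 * ones(10, 1);                   % per-point uncertainties
Y = 1.0 + 2.0 * x + sigma_y .* randn(10, 1);   % synthetic data with b = 1, m = 2
A = [ones(size(x)), x];                        % design matrix, one row [1 x_i] per point
C = diag(sigma_y.^2);                          % diagonal covariance matrix
X = (A' * (C \ A)) \ (A' * (C \ Y));           % X = [b; m]
Cov = inv(A' * (C \ A));                       % parameter covariance matrix
sigma_b = sqrt(Cov(1, 1));                     % uncertainty on b
sigma_m = sqrt(Cov(2, 2));                     % uncertainty on m

The backslash solves (C\A, and the final left-divide) are preferred to explicit inv() calls for numerical stability; inv() is kept only where the covariance matrix itself is wanted.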

Page 8:

Useful Matlab routines:
zeros, ones - creates a vector or matrix of zeros or ones
eye - creates an identity matrix
repmat - replicates a vector into a matrix
cat - concatenates two vectors into a matrix
plot - plots two quantities
errorbar - plots two quantities with error bars
A' - transpose of A
inv(A) - matrix inverse of A
C\A - matrix inverse of C times A
A(:,1) - a selection of all the rows and column 1 of a matrix
random - generation of random numbers from a particular distribution

Page 9:

Removing Outliers

Your data always has "oddities".

These "oddities" can dominate the parameter estimation.

How do we remove them in an objective way?

Fast and Easy way
Slow and Righteous way

Page 10:

[Figure]

Page 11:

Sigma-clipping

• Create best fit model.

• Define x sigma as “unlikely”.

• Get rid of points outside of x sigma.

• Refit data.

• Iterate, if necessary (a sketch of the recipe follows below).
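A minimal Matlab/Octave sketch of the recipe, assuming a straight-line model and a 3-sigma threshold (both choices, and the data, are illustrative):

% Iterative sigma clipping around a straight-line fit
x = (1:20)'; y = 1 + 2 * x + 0.5 * randn(20, 1);  % made-up data
y(7) = 40;                                        % inject one obvious outlier
keep = true(size(y));                             % start by keeping every point
for iter = 1:10
    p = polyfit(x(keep), y(keep), 1);             % best-fit model from kept points
    r = y - polyval(p, x);                        % residuals of all points
    s = std(r(keep));                             % scatter of the kept points
    newkeep = abs(r) < 3 * s;                     % define 3 sigma as "unlikely"
    if isequal(newkeep, keep), break; end         % iterate until the kept set is stable
    keep = newkeep;
end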


Page 12:

Can be good for obvious outliers.
Gets murky when outliers are only several sigma out.
Can throw out good data.

Page 13:

Which points should get clipped?


Page 14:

Bayesian Analysis

We can develop a model that assumes the data are contaminated by bad points:
- Assume each point has the same probability, P_b, of being bad.
- Assume these bad points have a mean Y_b and variance V_b, with a Gaussian distribution.

Suggested sampling of the new parameters:
- P_b is uniform from 0 to 1
- Y_b is uniform from 0 to 2*mean(Y)
- V_b is uniform from ~30^2

Page 15:

Some points:

There is nothing special about the choice of the distribution for the bad points, or of the priors.

We need to choose something in order to make the calculations; Gaussian distributions are easy to calculate with.

We will be "marginalizing" over these parameters in the end, so we only want them to be reasonable approximations of the experiment.

Page 16:

The Likelihood Distribution

For the model of mixed good and bad points from two separate distributions, we end up with a likelihood distribution:

ML = \prod_i \left[ \frac{1 - P_b}{\sqrt{2\pi\sigma_{y_i}^2}} \exp\!\left( \frac{-(y_i - m x_i - b)^2}{2\sigma_{y_i}^2} \right) + \frac{P_b}{\sqrt{2\pi(\sigma_{y_i}^2 + V_b)}} \exp\!\left( \frac{-(y_i - Y_b)^2}{2(\sigma_{y_i}^2 + V_b)} \right) \right]

We are interested in using this to define the "best" values of m and b, as well as their confidence intervals.

The best value for m and b is called the maximum a posteriori (MAP) value.

Page 17:

Marginalization

We want to marginalize over (i.e., integrate out) the parameters we defined to model the bad data.

The way to do this is to numerically integrate over P_b, V_b, and Y_b for each m and b value.

The maximum in the b vs. m plot now incorporates our prior information about the data.
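A brute-force Matlab/Octave sketch of this marginalization, evaluating the mixture likelihood from page 16 on coarse grids with the flat priors from page 14. All grid sizes and data values are invented for illustration; a real application would use finer grids or MCMC:

% Marginalized likelihood for (b, m): sum the mixture likelihood over
% grids in P_b, Y_b, V_b (flat priors, so a plain sum is enough)
x = (1:10)'; sy = ones(10, 1);
y = 1 + 2 * x + sy .* randn(10, 1);  y(3) = 30;     % one bad point
bg = linspace(-2, 4, 40); mg = linspace(1, 3, 40);  % (b, m) grid
Pbg = linspace(0.01, 0.99, 10);                     % P_b: uniform 0 to 1
Ybg = linspace(0, 2 * mean(y), 10);                 % Y_b: uniform 0 to 2*mean(Y)
Vbg = linspace(1, 30^2, 10);                        % V_b grid
L = zeros(numel(bg), numel(mg));
for ib = 1:numel(bg)
  for im = 1:numel(mg)
    good = exp(-(y - mg(im) * x - bg(ib)).^2 ./ (2 * sy.^2)) ./ sqrt(2 * pi * sy.^2);
    tot = 0;
    for iY = 1:numel(Ybg)
      for iV = 1:numel(Vbg)
        bad = exp(-(y - Ybg(iY)).^2 ./ (2 * (sy.^2 + Vbg(iV)))) ./ sqrt(2 * pi * (sy.^2 + Vbg(iV)));
        for iP = 1:numel(Pbg)
          tot = tot + prod((1 - Pbg(iP)) * good + Pbg(iP) * bad);
        end
      end
    end
    L(ib, im) = tot;
  end
end
contour(mg, bg, L); xlabel('m'); ylabel('b');  % peak of the map is the marginalized MAP (b, m)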

Page 18:

[Figure: marginalized MAP contour plot]

Page 19:

Bayesian Fit

This uses all the data, and is not skewed by the outliers.

Page 20:

Errors

If the individual errors are estimated correctly, the ML can tell us the confidence interval. If the errors are underestimated, we can have problems!

Page 21:

What about the errors?

Often the errors may not be known, or are not to be trusted.

There are techniques that use only the data values (not their uncertainties) to derive the parameter uncertainty.

Two common techniques:
- Bootstrap
- Jack-knife

Page 22:

Bootstrap Technique

Method: If there are N data points, randomly choose N data-point pairs for each new set; the same point can be drawn twice (doubles), and the full set need not be used. Make M such data sets, fit each one for m_j, and calculate the variance:

\sigma_m^2 = \frac{1}{M} \sum_{j=1}^{M} [m_j - m]^2

This is very flexible, being able to use any numerical metric for "goodness of fit".
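A minimal Matlab/Octave sketch, bootstrapping the slope of a straight-line fit; the data values are invented, and m in the variance formula is taken from the full-data fit, as on the slide:

% Bootstrap: M resampled data sets, drawn with replacement
x = (1:20)'; y = 1 + 2 * x + 0.5 * randn(20, 1);  % made-up data
N = numel(x);
M = 1000;                                         % number of bootstrap sets
p0 = polyfit(x, y, 1); m0 = p0(1);                % slope m from the full data set
mj = zeros(M, 1);
for j = 1:M
    idx = ceil(N * rand(N, 1));                   % N draws with replacement (doubles allowed)
    p = polyfit(x(idx), y(idx), 1);
    mj(j) = p(1);                                 % slope m_j from this resampled set
end
sigma_m2 = mean((mj - m0).^2);                    % sigma_m^2 = (1/M) * sum (m_j - m)^2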

Page 23:

Jack-knife Technique

Method: For a data set of N values, remove each point in turn. For each subset of data, calculate m_j. Then:

\bar{m} = \frac{1}{N} \sum_{j=1}^{N} m_j

\sigma_m^2 = \frac{N-1}{N} \sum_{j=1}^{N} [m_j - \bar{m}]^2
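The corresponding Matlab/Octave sketch for the jack-knife slope estimate (same invented-data style as the bootstrap sketch):

% Jack-knife: remove each point in turn, refit, then combine
x = (1:20)'; y = 1 + 2 * x + 0.5 * randn(20, 1);  % made-up data
N = numel(x);
mj = zeros(N, 1);
for j = 1:N
    idx = [1:j-1, j+1:N];                         % all points except the j-th
    p = polyfit(x(idx), y(idx), 1);
    mj(j) = p(1);                                 % slope m_j from the subset
end
mbar = mean(mj);                                  % m-bar = (1/N) * sum m_j
sigma_m2 = (N - 1) / N * sum((mj - mbar).^2);     % jack-knife variance of m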

Page 24:

Objective Data Modeling

Removing data is subjective and irreproducible.

Sigma clipping is a prescription, without a model of why you have the data you do.

The Bayesian analysis of outliers is an objective fit to the data, which incorporates your knowledge of the outliers.

Bootstrap and jack-knife allow robust estimation of errors.

Page 25:

The End (for today)


Page 26:

What if we see a correlation?

• It's common (but dangerous!) to just fit a line to the data:

"Anscombe's quartet" illustrates the potential pitfalls of line fitting.

Page 27:

Review: Process of Decision Making

[Flow chart:]
Ask a Question → Take Data → Reduce Data → Derive Statistics describing data → Does the statistic answer your question?
- Yes → Publish!
- No → Reflect on what is needed
Supporting elements: Probability Distribution, Error Analysis, Hypothesis Testing, Simulation