51
Introduction to Data Analysis Confidence Intervals

Introduction to Data Analysis Confidence Intervals

Embed Size (px)

Citation preview

Page 1: Introduction to Data Analysis Confidence Intervals

Introduction to Data Analysis

Confidence Intervals

Page 2: Introduction to Data Analysis Confidence Intervals

2

Standard errors continued…

Last week we managed to work out some info about the distribution of sample means. If we have lots of samples then:

Mean of all the sample means = population mean. Shape of the distribution of sample means = normal. Standard error tells us how dispersed the distribution of

sample means is.

Put all these together, and we can finally work out how likely our sample mean is to be near the population mean.

Page 3: Introduction to Data Analysis Confidence Intervals

3

Churchgoing example

Let’s take a less contrived example than my car managing to break any speed limits.

This week I’m interested in the average number of times a year people go to church (or synagogue, or whatever).

I take a sample of 100 people and record the number of times they went to church last year. Some went a lot but most went infrequently.

So, from that sample I get a sample mean and a sample standard deviation.

Page 4: Introduction to Data Analysis Confidence Intervals

4

Church-going sample

0

10

20

30

40

50

60

0 10 20 30 40 50 60

Number of times per year

Freq

uenc

y

Sample mean = 8.5

Sample s.d. = 20

Page 5: Introduction to Data Analysis Confidence Intervals

5

Standard error Since we know the s.d. and mean of the sample, we can

calculate the standard error.

Because we know the standard error and the fact that the sampling distribution is normal, we are able to calculate how ‘likely’ it is that a specific range around our estimate contains the population mean.

We can calculate what’s called a confidence interval.

Page 6: Introduction to Data Analysis Confidence Intervals

6

Confidence intervals (1)

A confidence interval for an estimate is a range of numbers within which the parameter is ‘likely’ to fall.

Remember a parameter is something about the population, like the population mean.

We can use the standard error (and the fact that the sampling distribution is normal) to produce such a range.

Page 7: Introduction to Data Analysis Confidence Intervals

7

Confidence intervals (2)

k is chosen to determine what is meant by ‘likely’ to contain the actual value of the population mean.

We want to pick a k that is meaningful to us. … and is not so large that the interval is useless. … but not so small that the interval is very unlikely to

contain the true population value.

)error standard(estimate k

Page 8: Introduction to Data Analysis Confidence Intervals

8

Confidence intervals (3)

To return to our example, we have: An estimate of the population mean (the

mean number of days of church attendance) which is the sample mean (8 ½ days).

A standard error (2 days) which allows us to put a range around our estimate.

)2(8.5 k

Page 9: Introduction to Data Analysis Confidence Intervals

9

How do we pick k?

Imagine sampling repeatedly, and therefore getting lots of sample means.

Given what we know about the sampling distribution (i.e. it’s normal if n is large enough), we know what proportion of these sample means will fall within a certain number of standard errors of the actual population mean.

Page 10: Introduction to Data Analysis Confidence Intervals

10

Confidence coefficient Call this proportion the confidence coefficient.

If we picked .95 then we know that if we repeatedly sampled the population, that interval around each of the sample means would include the true value 95% of the time.

The confidence coefficient is a number that is chosen by the researcher which is close to 1, like 0.95 or 0.99.

Since the sampling distribution is normal, we know the values of k that correspond to the probability of any proportion.

So we know that approximately 95% of confidence intervals that are 2 standard errors either side of the sample mean will include the population mean.

Page 11: Introduction to Data Analysis Confidence Intervals

11

Back to church For our particular sample we calculate a 95%

confidence interval (i.e. k=2).

Imagine the population mean for churchgoing in Britain is actually 6 days a year.

So our particular sample is off by 2 ½ days a year. The mean of all the possible sample means is equal to the

population mean so the centre of the sampling distribution is 6.

45.8

)22(8.5

k

SE

Page 12: Introduction to Data Analysis Confidence Intervals

12

Sampling distribution (1)

0 1 2 3 4 5 6 7 8 9 10 11 12

Church-going (days a year)

Population mean = 6

Sample mean=8.5

Page 13: Introduction to Data Analysis Confidence Intervals

13

More samples

But that’s just our one sample, let’s imagine we took many samples, and then calculated 95% confidence intervals for all the sample means.

Page 14: Introduction to Data Analysis Confidence Intervals

14

Sampling distribution (2)

0 1 2 3 4 5 6 7 8 9 10 11 12

Church-going (days a year)

Population mean = 6

Page 15: Introduction to Data Analysis Confidence Intervals

15

Sampling distribution (3)

Of my 7 samples, all the confidence intervals around the sample mean enclosed the actual true population mean apart from one.

If we repeated this lots of times, we would expect 95% of the confidence intervals to enclose the actual population mean.

95% because that’s the confidence coefficient that we picked. If we had picked a confidence coefficient of 0.99, then it would be

99% and the confidence intervals would be larger.

Page 16: Introduction to Data Analysis Confidence Intervals

16

Confidence coefficients

To make things easier I’ve been rounding the numbers off, the exact figures for k at each confidence coefficient are slightly different.

Confidence coefficient k

68% 1.00

95% 1.96

99% 2.58

99.9% 3.29

Page 17: Introduction to Data Analysis Confidence Intervals

17

Smaller confidence intervals?

We could just be less certain. If I was willing to pick a 90% confidence interval then

the range of the interval would be narrower as k would be lower.

We would be wrong more often though…

We could increase the sample size. The bigger the sample size, the lower the standard error

and therefore the smaller the confidence interval for a given probability.

This isn’t always practical of course…

Page 18: Introduction to Data Analysis Confidence Intervals

18

Why 95%? Generally speaking in social science we pick 95% as

an appropriate confidence coefficient. A 1 in 20 chance of being wrong is generally felt to

be okay. Path dependency—Fisher integrated the normal PDF

for the 95% level, which is REALLY hard to do without a computer.

This isn’t always the case… Sometimes we want to be really sure we’ve enclosed the mean. Other times we want a narrower range and are willing to accept

that we are wrong more often.

Page 19: Introduction to Data Analysis Confidence Intervals

19

CIs for proportions

Since calculating the standard error is similar for proportions, so are producing confidence intervals.

Remember we need binary data coded a 0 and 1 (yes/no, men/women etc.)

Remember the standard error for proportions depends on the n and the sample proportion. So the CI is as below:

Page 20: Introduction to Data Analysis Confidence Intervals

20

Proportions example

Let’s take a political opinion poll in the US, we’re interested in the population proportion voting Democrat (call this π). Sample is 1000 people. Sample proportion is

0.45 (or 45%).

Page 21: Introduction to Data Analysis Confidence Intervals

21

Opinion polls

A lot of opinion polls use a sample size of about 1000 people, and aim to estimate proportions that are between 30% and 50%.

This is why you often see ‘margins of error’ that are 3% either side of an estimate quoted in newspapers; these are 95% confidence intervals.

Note that all these confidence intervals ignore non-sampling error.

What we call sampling bias, this could be due to non-response, badly worded questions and so forth.

Page 22: Introduction to Data Analysis Confidence Intervals

22

Exercise

If you were taking a political opinion poll, roughly how big a sample would you need to achieve a ‘margin of error’ (i.e. a 95% confidence interval) of:

2% either side of the estimate? 1% either side of the estimate?

(Assume that the sample proportion of interest is around 40%).

Page 23: Introduction to Data Analysis Confidence Intervals

23

Exercise answer

Page 24: Introduction to Data Analysis Confidence Intervals

24

What’s a hypothesis? Hypotheses = testable statements about the world. Hypotheses = falsifiable.

We test hypotheses by attempting to see if they could be false, rather than ‘proving’ them to be true.

All swans are white…? Popper said you cannot prove that all swans are white by counting white swans, but you can prove that not all swans are white by counting one black swan.

We normally generate hypotheses from a combination of theory, past empirical work, common sense and anecdotal observations about the world;

e.g. young people are more leftwing than their elders. Observations abut my friends, work showing this relationship in other countries, theory suggesting that social ageing makes people more rightwing.

Page 25: Introduction to Data Analysis Confidence Intervals

25

Some hypotheses (or not)

My girlfriend spends too much on make-up. NO. It’s a normative claim about ‘too much’.

My girlfriend spends more than me on make-up. NO. Falsifiable, but not really social science.

Women spend more than men on make-up. HOORAY, a social scientific hypothesis. It’s falsifiable, and tells

us something about the world.

Make-up is used to marginalize the Other as a form of contemporary patriarchialist nihilism.

NO. This is too vague to be falsifiable.

Page 26: Introduction to Data Analysis Confidence Intervals

26

Hypothesis testing

Two different types of hypothesis. Descriptive inference (e.g. old people are more religious

than young people). Causal inference (e.g. old people are more religious

than young people because they think about death more).

We test statistical hypotheses using significance testing.

This is a way of statistically testing a hypothesis by comparing the data we have to values predicted by the hypothesis.

Page 27: Introduction to Data Analysis Confidence Intervals

27

Null and alternative hypotheses When we’re testing hypotheses, we want to choose

between two conflicting statements. Null hypothesis (H0) is directly tested.

This is a statement that the parameter we are interested in has a value similar to no effect.

e.g. regarding religiosity, old people are the same as young people.

Alternative hypothesis (Ha) contradicts the null hypothesis.

This is a statement that the parameter falls into a different set of values than those predicted by H0.

e.g. regarding religiosity, old people are different to young people.

Page 28: Introduction to Data Analysis Confidence Intervals

28

A (simple) hypothesis

We’re interested in whether new measures to curb MEPs fraudulently claiming for flights are effective.

Previously, the population of MEPs managed to claim 16 flights each on expenses (per month).

After the new measures are introduced, we sample 100 MEPs and find that they are now managing to only charge 13½ flights a month each to expenses. The standard deviation for this sample is 10 flights.

Are EU (well British and German anyway) taxpayers really getting better value for money though?

Page 29: Introduction to Data Analysis Confidence Intervals

29

Aviatophobia or less fraud?

So in this case the null hypothesis is that there has been no change.

H0 is that MEPs are claiming the same amount per month as they were before, and the difference is just because our sample happens to include a lot of aviatophobic MEPs (or something like that).

The alternative hypothesis is that MEPs spending has been curbed.

Ha is that MEPs are claiming less than they were previously.

Page 30: Introduction to Data Analysis Confidence Intervals

30

More formally

Slightly more formally: The population we are interested is MEPs after the

changes. We want to know whether the population mean ‘number

of flights claimed per month’ is different from 16. H0 is that this population mean is equal to 16.

Ha is that this population mean is less than 16. The info we have is from one sample mean.

Now we imagine that H0 is true…

Page 31: Introduction to Data Analysis Confidence Intervals

31

If H0 is true…(1)

The population mean will be equal to 16. Since n is large(ish) the sampling distribution will be

normal and centred around 16. We can calculate the standard error of the sampling

distribution.

Page 32: Introduction to Data Analysis Confidence Intervals

32

If H0 is true…? (2)

10 11 12 13 14 15 16 17 18 19 20 21 22

Flights claimed (per month)

H0 mean = 16 Sample mean = 13.5

2 ½ % of the distribution

Page 33: Introduction to Data Analysis Confidence Intervals

33

P-value (1)

So we know that H0 looks pretty unlikely (certainly less than a 2½ per cent chance), but we can actually give a more precise probability.

We work out how many standard errors the sample mean is away from H0 to produce a z-score, and then calculate a p-value from this with reference to the normal probability distribution.

Page 34: Introduction to Data Analysis Confidence Intervals

34

P-value (2) If H0 were correct it looks unlikely that we would get

a sample that had a mean of 13½. In fact there is just a 0.006 (or 0.6%) chance that our

sample could have come from a population postulated by the null hypothesis.

We could set an (arbitrary) ‘significance level’ that our test must meet. Maybe we need to be 99% confident that we can reject the null hypothesis (i.e. the p-value is less than 1%).

Best practice when we perform significance tests is simply to report the p-value, and to make the judgement that p-values of, say, 5% and below are probably good evidence the null hypothesis can be rejected.

Page 35: Introduction to Data Analysis Confidence Intervals

35

Steps for Hypothesis test

Check assumptions (i.e. normality, sample size, level of measurement)

State hypotheses—Null and Alternative Calculate appropriate test statistic (e.g z-score) Calculate associated p-value Interpret the result

Page 36: Introduction to Data Analysis Confidence Intervals

36

Interpreting hypothesis test

We NEVER accept the null hypothesis.

We either reject or fail to reject based on our

p-value. May fail to reject null hypothesis due to:

small sample size inappropriate research design biased sample, etc..

Page 37: Introduction to Data Analysis Confidence Intervals

37

Type I and type II errors A type I error occurs when we reject H0, even though

it is true. This is going to happen 5% of the time if we choose to reject H0

when the p-value is less than 0.05. A type II error occurs when we do not reject H0, even

though it is false. If our ‘significance level’ is 0.05, then sometimes there will be a

real difference and we won’t detect it. The more stringent the significance level the more

difficult to detect a real effect is, but the more confident we can be that when we find an effect it is real.

Page 38: Introduction to Data Analysis Confidence Intervals

38

Making errors…

There is a trade-off between the two types of error. Depending on what we’re doing we may be more willing to accept one sort or the other.

Think of this as analogous to a legal trial, we don’t want the guilty to go free (a type II error), but we’d be even unhappier if we execute an innocent person (a type I error).

In this case, we might want the significance level to be very low to minimize executing innocent people (but at the same time allowing lots of the guilty to go free)

Page 39: Introduction to Data Analysis Confidence Intervals

39

Differences between 2 sample means

More normally we’ve sampled two groups and wish to see if they differ.

Let’s go back to our religion example. We might be interested in whether women’s church-

going differs from men. We have 2 samples, 45 men and 55 women. The men have a mean attendance of 7 days a year, with

standard deviation of 15. The women have a mean attendance of 10 days a year

with standard deviation of 15.

Page 40: Introduction to Data Analysis Confidence Intervals

40

Are women more or less religious than men? Essentially, to answer this we estimate the

difference between the populations (the parameter) using the difference between the sample means (the statistic).

We can run a significance test on this statistic, and work out whether our samples are likely to represent real differences between the populations of men and women.

The null hypothesis is therefore that there is no difference between men’s mean churchgoing and women’s mean churchgoing.

Page 41: Introduction to Data Analysis Confidence Intervals

41

Z-score

So we work out the z-score as before.

Page 42: Introduction to Data Analysis Confidence Intervals

42

P-value The standard error of the estimator is ~3 and the Z-

score is ~1 then. We have no prior ideas of whether we think women

are more or less religious than men, so we just want to test the possibility that they are not the same.

i.e. we want to know how likely it would be to get an individual estimate of the difference between the sample means that is either 1 SE greater than the null hypothesis (i.e. zero) or 1 SE less than the null hypothesis.

Page 43: Introduction to Data Analysis Confidence Intervals

43

Two-sided tests (1)

-12 -9 -6 -3 0 3 6 9 12

Churchgoing for women - churchgoing for men (in days per year)

H0 mean = 0.SE = ~3.

Difference between sample means = 3i.e. SampleWomen is 3 greater than sampleMen

68% of the distribution

Page 44: Introduction to Data Analysis Confidence Intervals

44

Two-sided tests (2)

The probability of a difference between the two sample means being 3 or greater is 0.16.

The probability of a difference between the two sample means being 3 or less is 0.16.

So the p-value for a 2-sided test is 2*(0.16) or 0.32. This value is high (much higher than our 5% cut off

value), so we fail to reject H0 that men and women do not differ in their church attendance.

Page 45: Introduction to Data Analysis Confidence Intervals

45

Two-sided tests (3)

Regardless of our theoretical expectations, the convention is to use two-tailed tests. Why?

In essence making it even more difficult to find results just due to chance.

We normally don’t have very strong prior information about the difference.

One tailed tests are often the hallmark of someone trying to make something out of nothing.

What does it mean to use a one-tailed test?? Not necessarily bad—in fact arguably MUCH smarter.

Page 46: Introduction to Data Analysis Confidence Intervals

46

Significance tests and CIs (1)

Notice that our significance test has ended up looking rather similar to the CIs.

We could use a CI around the difference between the two sample means to test the hypothesis that they are the same.

A 95% CI would just be 1.96*SE. We’ve just worked out the SE (it’s approximately 3).

Page 47: Introduction to Data Analysis Confidence Intervals

47

Significance tests and CIs (2)

The 95% CI encloses zero (which was our null hypothesis, that women are the same as men).

CIs and significance tests are doing the same job, just presenting the information in a slightly different way.

Page 48: Introduction to Data Analysis Confidence Intervals

48

Proportions and significance tests

This means of course that all I said about proportions and CIs, applies to proportions and significance tests.

A hypothesis related to a previous example is that one of the candidates does have a lead.

Therefore the null hypothesis is that both candidates have a vote share of 50%.

Page 49: Introduction to Data Analysis Confidence Intervals

49

Proportions example – the return Sample is 1000, proportion voting Democrat is 0.45 (or 45%). Null hypothesis is that the population proportion is 0.5. Thus, according to my sample, it is very likely that the

Presidential race is not a dead-heat.Note: The SE is calculated with the null hypothesis proportion, not with the sample proportion.

Remember, we are testing the null hypothesis, so the mean is the null hypothesis mean .

Page 50: Introduction to Data Analysis Confidence Intervals

50

Summary

Because we know the shape of the sampling distribution, we can work out:

Ranges around an individual sample mean that will enclose the population mean X per cent of the time.

The probability that a hypothesis about the population mean is true, given a particular sample mean.

The probability that population means for different groups are different, given two sample means.

All of the above for proportions.

Although there are some small complications...

Page 51: Introduction to Data Analysis Confidence Intervals

51

Things I haven’t mentioned

The reality of statistical hypothesis testing is slightly more complicated than this in some ways particularly:

If your sample sizes are small. If the sample < 30 or < 40 then we need to use a different distribution called the t distribution. To use this we need to assume that the population distribution we are interested in is normal.

As sample size increases, the t-distribution looks more and more similar to the z-distribution. This is why significance tests (even for big sample sizes) are often called t-tests.

When looking at proportions we often use slightly different tests as well to accommodate the fact that a proportion cannot be greater than 1, or less than 0.