Computing in Archaeology

Session 10. Statistical tests of significance

© Richard Haddlesey www.medievalarchitecture.net

Aims

To understand what we mean by statistical significance and archaeological significance

To test significance (through the null hypothesis and Chi-squared testing)

Key text: Fletcher & Lock (2nd Ed) 2005. Digging Numbers. Oxford. 63-5, 128-38.

Is it significant?

What do we mean by significant?

Choosing a test

positively skewed distribution: non-parametric test

normal distribution: parametric test

Hypothesis testing

Before we can test significance we must formulate two hypotheses

So what do we mean by hypothesis testing?

Theories abound in archaeology, although many of them cannot be tested in any way, let alone in the formal way described throughout this lecture

Hypothesis testing

A test must be repeatable, not just by you, but by anyone who has access to the data set

A hypothesis, therefore, must represent a quantifiable relationship, and it is this relationship which is tested formally

We could say that all hypotheses are theories, whereas not all theories are hypotheses

Example hypothesis

In order to illustrate the logic of a hypothesis test, consider testing the hypothesis that at least 40% of all bronze spearheads come from burials

Fletcher & Lock, 63

Step 1 – formulate two hypotheses
• null hypothesis (H0)
• alternative hypothesis (H1)

Step One: H0 & H1

This should be done so that one and only one must be true

In this case we would have:
• H0: proportion of bronze spears from burials is ≥40%
• H1: proportion of bronze spears from burials is <40%

Step 1 – formulate two hypotheses
• null hypothesis (H0)
• alternative hypothesis (H1)

Step 2 – take measurements

Step Two

Take a suitable measurement or observation from which a test statistic and its associated probability (step 3) can be calculated

Here we have a sample of 20 bronze spearheads, 7 of which have been found in burials (this is the observed result)

So far so good!

Step 1 – formulate two hypotheses
• null hypothesis (H0)
• alternative hypothesis (H1)

Step 2 – take measurements

Step 3 – calculate test statistic

Step 3: the difficult bit

Here we calculate a test statistic which can then be tested for significance in step 4.

The test statistic allows for the calculation of the probability of the observed result, which is often called the p-value

Step 3: continued

If H0 is true and at least 40% of all bronze spearheads do come from burials, what is the probability of a sample of 20 containing 7 from burials?

$P(\text{burial}) = 0.40$ and so $P(\text{not burial}) = 0.60$

$P(\text{not burial for 1st and 2nd}) = (0.60)(0.60) = (0.60)^2$

hence $P(\text{not burial for } 13\ (20-7)) = (0.60)^{13} = 0.0013$

The p-value used in this example (the probability of the 13 non-burials in the observed result) is 0.0013 or 0.13%
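The arithmetic above can be checked with a short Python sketch. It reproduces the slide's figure, (0.60)^13, and, as an assumption that goes beyond the slide, also evaluates the exact binomial probabilities for a sample of 20 with a 40% burial rate. The helper name `binom_pmf` and the variable names are illustrative only.

```python
from math import comb

# Values taken from the worked example: sample of 20, 7 from burials, H0 proportion 0.40.
n, k, p0 = 20, 7, 0.40

# Slide calculation: probability that 13 given spearheads are all non-burials.
slide_value = (1 - p0) ** (n - k)

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials (binomial model)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exact binomial quantities (an assumption, not the method shown on the slide).
p_exactly_7 = binom_pmf(k, n, p0)                              # exactly 7 burials
p_at_most_7 = sum(binom_pmf(x, n, p0) for x in range(k + 1))   # 7 or fewer burials

print(f"(0.60)^13 = {slide_value:.4f}")
print(f"P(X = 7)  = {p_exactly_7:.4f}")
print(f"P(X <= 7) = {p_at_most_7:.4f}")
```

Under the binomial assumption the lower-tail probability P(X ≤ 7) comes out far larger than (0.60)^13, so the slide's figure is probably best read as an illustration of the multiplication rule rather than as an exact binomial p-value.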

Step 1 – formulate two hypotheses
• null hypothesis (H0)
• alternative hypothesis (H1)

Step 2 – take measurements

Step 3 – calculate test statistic

Step 4 – calculate significance

Step 4: testing the hypotheses

Remember that the null hypothesis is being tested. The significance of the test statistic will determine whether the null hypothesis is accepted or rejected

There are set conventions for significance testing and these will guide our discussion

Step 4: continued

Common significance levels used in the social sciences are:

• p<0.10 reject at the 10% level
• p<0.05 reject at the 5% level
• p<0.01 reject at the 1% level
• p<0.001 reject at the 0.1% level

The 5% level is often used within archaeology
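The conventions listed above can be sketched as a small lookup. This is only an illustration: the function name `strictest_level` and the sample p-values 0.03 and 0.2 are hypothetical, while 0.0013 is the figure used in the worked example.

```python
# Conventional significance levels from the slide, strictest first.
LEVELS = [0.001, 0.01, 0.05, 0.10]

def strictest_level(p_value):
    """Return the smallest conventional level at which H0 is rejected, or None."""
    for alpha in LEVELS:
        if p_value < alpha:
            return alpha
    return None

for p in (0.0013, 0.03, 0.2):
    level = strictest_level(p)
    if level is None:
        print(f"p = {p}: do not reject H0 at any conventional level")
    else:
        print(f"p = {p}: reject H0 at the {level:.1%} level")
```

A p-value below a strict level is also below every more lenient one, which is why the function returns on the first level that is met.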

Step 4: continued

If p<0.05 (5%), reject H0 at the 5% level and conclude that there is significant evidence to show that the percentage of bronze spearheads from burials is less than 40% (in other words, if H0 is rejected, H1 must be accepted)

What does this mean?

We can now conclude, at the 5% significance level (often loosely described as being 95% certain), that the percentage of bronze spearheads from burials is less than 40%

If, however, the p-value had been greater than 0.05, the conclusion would have been not to reject H0 at the 5% level: there would be no significant evidence that the percentage is less than 40%

Confidence level    Probability
90%                 p=0.10
95%                 p=0.05
99%                 p=0.01

p<0.05 reject at the 5% level

p<0.10 reject at the 10% level

p<0.01 reject at the 1% level

Chi-squared test

The Chi-squared test was developed by Karl Pearson in 1900 to test if a contingency table provides significant evidence of an association between two variables

It can be used for both nominal and ordinal levels, though it is better suited to nominal data

Chi-squared test

sample:
• 40 spearheads

variables:
• material – iron/bronze
• loop – yes/no

First we need to display the data in a contingency table

Chi-squared explained

It’s a method of comparing the observed frequencies (the data) with those expected under the null hypothesis of no association between two variables
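As a sketch of how this comparison is usually quantified (this is Pearson's standard statistic; the formula itself does not survive in the slide text), with O the observed and E the expected frequency in each cell of the table:

```latex
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}
```

The larger the value of χ², the further the observed frequencies lie from those expected under the null hypothesis.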

bivariate frequency table

          No loop   Loop   Total
Iron         20       0      20
Bronze        9      11      20
Total        29      11      40

• Is there any association between the two variables?

• How strong is the association between the two variables?

expected frequency (E) for each cell:

$$E = \frac{(\text{row total})(\text{column total})}{\text{overall total}}$$

For the iron/loop cell:

$$E = \frac{(20)(11)}{40} = 5.5$$

Observed frequencies, with expected frequencies in brackets:

          No loop      Loop        Total
Iron      20 (14.5)     0 (5.5)      20
Bronze     9 (14.5)    11 (5.5)      20
Total     29           11            40
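The expected frequencies in the table can be reproduced with a few lines of Python. This is a minimal sketch using the observed counts from the slides; the variable names are illustrative only.

```python
# Observed counts from the contingency table (rows: iron, bronze; columns: no loop, loop).
observed = [
    [20, 0],   # iron
    [9, 11],   # bronze
]

row_totals = [sum(row) for row in observed]          # totals for iron and bronze
col_totals = [sum(col) for col in zip(*observed)]    # totals for no loop and loop
total = sum(row_totals)                              # overall total

# E = (row total)(column total) / (overall total) for every cell.
expected = [[r * c / total for c in col_totals] for r in row_totals]

for label, row in zip(("Iron", "Bronze"), expected):
    print(label, row)
```

Because both rows have the same total (20), both rows share the same expected frequencies.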

degrees of freedom (d.f.) = (r-1)(c-1) = (2-1)(2-1) = (1)(1) = 1

where r is the number of rows and c the number of columns in the table

Critical values of the χ² distribution

d.f. = 1

χ² = 15.18
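A minimal Python sketch of how the χ² value can be obtained from the observed and expected frequencies, assuming Pearson's usual statistic (the sum of (O − E)²/E over all cells). The critical value 3.841 is the standard table entry for d.f. = 1 at the 5% level; the constant name is illustrative only.

```python
# Observed and expected frequencies from the spearhead table.
observed = [[20, 0], [9, 11]]
expected = [[14.5, 5.5], [14.5, 5.5]]

# Pearson's chi-squared statistic: sum of (O - E)^2 / E over every cell.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Standard table value for d.f. = 1 at the 5% significance level.
CRITICAL_5_PERCENT_DF1 = 3.841

print(f"chi-squared = {chi2:.2f}")
if chi2 > CRITICAL_5_PERCENT_DF1:
    print("reject H0 of no association at the 5% level")
else:
    print("do not reject H0 at the 5% level")
```

Computed without intermediate rounding the statistic is about 15.17; the slide's 15.18 presumably comes from rounding each cell's contribution to two decimal places before summing. Either way the value is well above the 5% critical value.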

Cramer’s V statistic

Cramer’s V statistic can be calculated to measure the strength of association

This gives us a value V between 0 and 1, with values close to 1 indicating a strong relationship

Cramer’s V statistic

$$V = \sqrt{\frac{\chi^2}{nm}}$$

Where:

n = total of all frequencies (40)

m = the smaller of (c-1) and (r-1)

$$V = \sqrt{\frac{15.18}{(40)(1)}} = \sqrt{0.3795} = 0.62$$
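The same calculation as a short Python sketch, using the values from the slides (χ² = 15.18, n = 40, a 2 × 2 table). The function name `cramers_v` is illustrative only.

```python
from math import sqrt

def cramers_v(chi2, n, rows, cols):
    """Cramer's V: sqrt(chi2 / (n * m)), where m is the smaller of (rows-1) and (cols-1)."""
    m = min(rows - 1, cols - 1)
    return sqrt(chi2 / (n * m))

# Values from the spearhead example: chi-squared 15.18, 40 spearheads, 2 x 2 table.
v = cramers_v(15.18, 40, 2, 2)
print(f"V = {v:.2f}")
```

For a 2 × 2 table m is always 1, so V reduces to the square root of χ²/n.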

Summary