Computing in Archaeology Session 10. Statistical tests of significance © Richard Haddlesey

© Richard Haddlesey www.medievalarchitecture.net

Aims

To understand what we mean by statistical significance and archaeological significance

To test significance (through the null hypothesis and chi-squared testing)

Key text: Fletcher & Lock (2nd Ed) 2005. Digging Numbers. Oxford. 63-5, 128-38

Is it significant?

What do we mean by significant?

Choosing a test

normal distribution: parametric test

positively skewed distribution: non-parametric test

Hypothesis testing

Before we can test significance we must formulate two hypotheses

So what do we mean by hypothesis testing?

Theories abound in archaeology, although many of them cannot be tested in any way, let alone in the formal way described throughout this lecture

Hypothesis testing (continued)

A test must be repeatable, not just by you, but by anyone who has access to the data set

A hypothesis, therefore, must represent a quantifiable relationship, and it is this relationship which is tested formally

We could say that all hypotheses are theories, whereas not all theories are hypotheses

Example hypothesis

In order to illustrate the logic of a hypothesis test, consider testing the hypothesis that at least 40% of all bronze spearheads come from burials

Fletcher & Lock, 63

Step 1 – formulate two hypotheses

• null hypothesis (H0)
• alternative hypothesis (H1)

Step One: H0 & H1

This should be done so that one and only one must be true

In this case we would have:
• H0: proportion of bronze spears from burials is ≥40%
• H1: proportion of bronze spears from burials is <40%



Step 2 – take measurements

Step Two

Take a suitable measurement or observation from which a test statistic and its associated probability (step 3) can be calculated

Here we have a sample of 20 bronze spearheads, 7 of which have been found in burials (this is the observed result)

So far so good!
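The observed result can be written down as a proportion and set beside the 40% named in the hypotheses. The short Python sketch below does only that; the variable names are illustrative assumptions, not part of the lecture.

```python
# Observed result from the slide: 7 of 20 bronze spearheads came from burials.
n_sample = 20
n_burial = 7

# Proportion claimed under the null hypothesis (at least 40%).
h0_proportion = 0.40

observed_proportion = n_burial / n_sample
below_h0 = observed_proportion < h0_proportion

print(f"observed proportion = {observed_proportion:.2f}")
print(f"below the 40% named in H0: {below_h0}")
```

The sample proportion on its own does not settle anything; steps 3 and 4 ask whether a shortfall of this size could plausibly arise by chance.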




Step 3 – calculate test statistic

Step 3: the difficult bit

Here we calculate a test statistic which can then be tested for significance in step 4.

The test statistic allows for the calculation of the probability of the observed result, which is often called the p-value

Step 3: continued

If H0 is true and at least 40% of all bronze spearheads do come from burials, what is the probability of a sample of 20 containing 7 from burials?

P(burial) = 0.40 and so P(not burial) = 0.60

P(not burial for 1st & 2nd) = (0.60)(0.60) = (0.60)^2

hence P(not burial for 13 (20-7)) = (0.60)^13 = 0.0013

The p-value (probability of the observed result) is 0.0013 or 0.13%
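The arithmetic above can be checked with a short Python sketch. The first function reproduces the slide's shortcut, (0.60)^13. The second is a comparison that is not part of the original slides: it sums the binomial probabilities of finding 7 or fewer burials among 20 spearheads when P(burial) = 0.40. Both function names are illustrative assumptions.

```python
from math import comb

P_BURIAL = 0.40   # proportion assumed under H0
N_SAMPLE = 20     # spearheads in the sample
N_BURIAL = 7      # spearheads found in burials (observed result)


def slide_shortcut(p_burial: float, n: int, k: int) -> float:
    """Probability that (n - k) finds are all 'not burial', as on the slide."""
    return (1 - p_burial) ** (n - k)


def binomial_lower_tail(p_burial: float, n: int, k: int) -> float:
    """P(X <= k) for X ~ Binomial(n, p_burial).

    Added for comparison only; this is not the slide's calculation.
    """
    return sum(
        comb(n, i) * p_burial**i * (1 - p_burial) ** (n - i)
        for i in range(k + 1)
    )


shortcut = slide_shortcut(P_BURIAL, N_SAMPLE, N_BURIAL)
tail = binomial_lower_tail(P_BURIAL, N_SAMPLE, N_BURIAL)
print(f"(0.60)^13 = {shortcut:.4f}")
print(f"P(7 or fewer burials in 20) = {tail:.4f}")
```

The second figure is included only for comparison. A standard exact binomial test sums the probabilities of every outcome at least as extreme as the observed one, which is a different quantity from the single product on the slide, so the two values need not agree.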




Step 4 – calculate significance

Step 4: testing the hypotheses

Remember that the null hypothesis is being tested. The significance of the test statistic will determine whether the null hypothesis is accepted or rejected

There are set conventions for significance testing and these will guide our discussion

Common significance levels used in the social sciences are:

• p<0.10 reject at the 10% level
• p<0.05 reject at the 5% level
• p<0.01 reject at the 1% level
• p<0.001 reject at the 0.1% level

The 5% level is often used within archaeology

If p<0.05 (5%), reject H0 at the 5% level and conclude that there is significant evidence to show that the percentage of bronze spearheads from burials is less than 40% (in other words, if H0 is rejected, H1 must be accepted)
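The decision rule can be sketched in a few lines of Python. The levels are the four listed on the slide; the function names and the wording of the returned messages are assumptions made for illustration.

```python
# Significance levels listed on the slide, from least to most strict.
LEVELS = [0.10, 0.05, 0.01, 0.001]


def rejected_levels(p_value: float) -> list:
    """Return every listed level at which H0 would be rejected (p < level)."""
    return [level for level in LEVELS if p_value < level]


def decision(p_value: float, level: float = 0.05) -> str:
    """State the conclusion at one chosen level (5% by default)."""
    label = f"{level * 100:g}%"
    if p_value < level:
        return f"reject H0 at the {label} level"
    return f"do not reject H0 at the {label} level"


# p-value from the spearhead example (step 3).
print(decision(0.0013))
print(rejected_levels(0.0013))
```

With the p-value from step 3, H0 is rejected at the 10%, 5% and 1% levels but not at the 0.1% level, because 0.0013 is not smaller than 0.001.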

What does this mean?

We can now conclude that we are 95% certain that the percentage of bronze spearheads from burials is less than 40%

If, however, the p-value had been greater than 0.05, the conclusion would have been not to reject H0 at the 5% level, because there would be no significant evidence in favour of H1

Confidence interval    Probability
90%                    p=0.10
95%                    p=0.05
99%                    p=0.01

p<0.05 reject at the 5% level






Chi-squared test

The chi-squared test was developed by Karl Pearson in 1900 to test if a contingency table provides significant evidence of an association between two variables

It can be used for both nominal and ordinal levels, though it is better suited to nominal data

sample:
• 40 spearheads

variables:
• material – iron/bronze
• loop – yes/no

First we need to display the data in a contingency table

Chi-squared explained

It's a method of comparing the observed frequencies (the data) with those expected under the null hypothesis of no association between two variables

bivariate frequency table

              No loop    Loop    Total
Iron          20         0       20
Bronze        9          11      20
Total         29         11      40

• Is there any association between the two variables?

• How strong is the association between the two variables?
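As an illustration of how such a table is tallied, the Python sketch below rebuilds a list of 40 records from the cell counts and counts them back into cells, row totals and column totals. The record list is a hypothetical reconstruction, since the slides give only the finished table.

```python
from collections import Counter

# Hypothetical record list rebuilt from the table's cell counts.
records = (
    [("Iron", "No loop")] * 20
    + [("Bronze", "No loop")] * 9
    + [("Bronze", "Loop")] * 11
)

cells = Counter(records)                                  # observed frequency per cell
row_totals = Counter(material for material, _ in records)
col_totals = Counter(loop for _, loop in records)

for material in ("Iron", "Bronze"):
    counts = [cells[(material, loop)] for loop in ("No loop", "Loop")]
    print(material, counts, row_totals[material])
print("Total", [col_totals["No loop"], col_totals["Loop"]], len(records))
```

A `Counter` returns 0 for a combination that never occurs, which is how the empty iron-with-loop cell appears without special handling.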

expected frequency (E) = (row total)(column total) / (overall total)

For iron spearheads with a loop: expected frequency (E) = (20)(11) / (40) = 5.5

Observed frequencies, with expected frequencies in brackets:

              No loop      Loop        Total
Iron          20 (14.5)    0 (5.5)     20
Bronze        9 (14.5)     11 (5.5)    20
Total         29           11          40

degrees of freedom (d.f.) = (r-1)(c-1) = (2-1)(2-1) = (1)(1) = 1
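The expected frequencies and degrees of freedom can be reproduced with a short Python sketch. It also sums (O - E)^2 / E over the four cells, which is the standard Pearson chi-squared statistic; the surviving slide text does not spell that formula out, so its use here is an assumption about how the test statistic was obtained. Variable names are illustrative.

```python
observed = [
    [20, 0],    # Iron:   no loop, loop
    [9, 11],    # Bronze: no loop, loop
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total) / (overall total) for every cell
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Pearson's chi-squared: sum of (O - E)^2 / E over all cells
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# d.f. = (r - 1)(c - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi_squared, 2), degrees_of_freedom)
```

Computed without intermediate rounding the statistic is about 15.17; the 15.18 quoted in the slides presumably reflects rounding each term to two decimal places before adding.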

Critical values of the χ2 distribution

d.f. = 1

χ2 = 15.18
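The table of critical values has not survived in this copy of the slides. The Python sketch below compares the test statistic with the commonly tabulated critical values for 1 degree of freedom and computes the matching p-value. The critical values and the function name are assumptions brought in for illustration, and the p-value shortcut applies only when d.f. = 1.

```python
from math import erfc, sqrt


def chi2_p_value_df1(chi_squared: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 d.f.

    With 1 d.f. the variable is a squared standard normal, so
    P(X > x) = erfc(sqrt(x / 2)). This shortcut is NOT valid for other d.f.
    """
    return erfc(sqrt(chi_squared / 2))


# Commonly tabulated critical values for d.f. = 1 (assumed, not from the slide).
CRITICAL_DF1 = {0.05: 3.841, 0.01: 6.635, 0.001: 10.828}

chi_squared = 15.18   # test statistic quoted on the slide
p_value = chi2_p_value_df1(chi_squared)
exceeded = [level for level, crit in CRITICAL_DF1.items() if chi_squared > crit]

print(f"p = {p_value:.6f}")
print("critical values exceeded at levels:", exceeded)
```

Because 15.18 is larger than all three critical values, the null hypothesis of no association between material and loop would be rejected even at the 0.1% level.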

Cramer’s V statistic

Cramer’s V statistic can be calculated to measure the strength of association

This gives us a value V between 0 and 1, with values close to 1 indicating a strong relationship

V = √(χ2 / (n × m))

Where:

n = total of all frequencies (40)

m = the smaller of (c-1) and (r-1)

V = √(15.18 / ((40)(1))) = √0.3795 = 0.62
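The same calculation as a minimal Python sketch, using the figures from the slide. The function name and parameter names are illustrative assumptions.

```python
from math import sqrt


def cramers_v(chi_squared: float, n: int, n_rows: int, n_cols: int) -> float:
    """V = sqrt(chi_squared / (n * m)), with m the smaller of (r-1) and (c-1)."""
    m = min(n_rows - 1, n_cols - 1)
    return sqrt(chi_squared / (n * m))


# Spearhead example: chi-squared = 15.18, 40 spearheads, 2 x 2 table.
v = cramers_v(15.18, n=40, n_rows=2, n_cols=2)
print(round(v, 2))  # 0.62
```

For a 2 x 2 table m is always 1, so V reduces to the square root of χ2 divided by the sample size.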

Summary
