spencer-davidson
Advanced Higher Statistics
Chi-squared test
Chi-squared test: finding whether there is a significant association between sets of data.
Lesson Objectives
1. Explain why it is used.
2. List the advantages and disadvantages.
3. Understand how to apply the statistical test.
4. Apply it to a relevant context.
Chi-squared test: looking for a difference
The situation
A group of students have visited the Lake District National Park to investigate the impact of tourism on the landscape. One of their data collection techniques is to record the number of traditional-looking and modern-looking houses in 8 villages inside the National Park boundary and 8 villages outside it...
What should they do?
• What data should they have collected to complete this investigation?
• How much data should they collect?
• How can they make sure that the data is reliable?
• What data presentation technique could they use to form an initial impression?
• What statistical test should they use to state with confidence whether or not there is a relationship?
- What did you observe? (What data did you actually collect?)
- What would you expect if there was no association?
O = the Observed frequency (what you actually counted)
E = the Expected frequency (what you would expect if there was no association)
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where Σ means "add up (O − E)²/E for every cell of the table".
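The formula translates directly into a short loop over paired observed and expected frequencies. This Python sketch is illustrative only: the function name `chi_squared_statistic` is hypothetical, and the two-cell counts in the usage line are made up so the arithmetic can be checked by hand ((10 − 15)²/15 + (20 − 15)²/15 ≈ 3.33).

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell; both lists must list the cells in the same order."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        total += (o - e) ** 2 / e
    return total


# Hypothetical counts: two cells, each expected to hold 15.
print(round(chi_squared_statistic([10, 20], [15, 15]), 2))  # 3.33
```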
We found 180 traditional homes inside the National Park and 23 outside.
We found 103 modern homes inside the National Park and 452 outside.
Null Hypothesis: There is no significant difference between building ages inside and outside the National Park.
Alternative Hypothesis: There is a significant difference between building ages inside and outside the National Park.
TESTING THE RELATIONSHIP
OBSERVED FREQUENCIES
                      Inside the National Park   Outside the National Park   Row total
Traditional houses    180                        23                          203
Modern houses         103                        452                         555
Column total          283                        475                         758
1st: construct a table with the data that you have observed
EXPECTED FREQUENCIES
                      Inside the National Park   Outside the National Park   Row total
Traditional houses    75.8                       127.2                       203
Modern houses         207.2                      347.8                       555
Column total          283                        475                         758
2nd: work out the expected frequency for each cell
$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
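Applying the expected-frequency formula to every cell at once fits naturally in a small function. This sketch assumes the observed table is a list of rows (traditional, modern) whose columns are (inside, outside the National Park); `expected_frequencies` is a hypothetical helper name, not part of any library.

```python
def expected_frequencies(observed):
    """Return E = row total x column total / grand total for every cell of a 2-D table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Observed housing counts: rows = traditional, modern; columns = inside, outside.
observed = [[180, 23], [103, 452]]
for row in expected_frequencies(observed):
    print([round(e, 1) for e in row])
# [75.8, 127.2]
# [207.2, 347.8]
```

Because every expected value is built from the same totals, each row of E still adds up to its observed row total and each column to its column total.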
Degrees of freedom   0.05 (95%)   0.01 (99%)
1                    3.84         6.64
2                    5.99         9.21
3                    7.82         11.34
4                    9.49         13.28
5                    11.07        15.09
6                    12.59        16.81
7                    14.07        18.48
8                    15.51        20.09
9                    16.92        21.67
10                   18.31        23.21
11                   19.68        24.72
12                   21.03        26.22
13                   22.36        27.69
14                   23.68        29.14
15                   25.00        30.58
16                   26.30        32.00
17                   27.59        33.41
18                   28.87        34.80
19                   30.14        36.19
20                   31.41        37.57
30                   43.77        50.89
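The critical values in the table can be reproduced from the chi-squared distribution itself. The sketch below avoids external libraries: it uses the standard closed-form series for the upper-tail probability at whole-number degrees of freedom, then bisects for the value whose tail probability equals the significance level. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2.ppf(0.95, df)` does the same job.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom >= x), closed-form series for integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: 2Q(sqrt x) + 2*phi(sqrt x) * sum over r = 1 .. (df-1)/2 of
    # sqrt(x)^(2r-1) / (1*3*...*(2r-1)), where Q is the normal upper tail and phi its density.
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))  # equals 2Q(sqrt x)
    term = 2 * math.exp(-half) / math.sqrt(2 * math.pi) * root
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def critical_value(alpha, df):
    """Find x with P(chi-squared >= x) = alpha by bisection (the tail shrinks as x grows)."""
    low, high = 0.0, 1000.0
    for _ in range(200):
        mid = (low + high) / 2
        if chi2_upper_tail(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2


for df in (1, 2, 5, 20, 30):
    print(df, round(critical_value(0.05, df), 2), round(critical_value(0.01, df), 2))
```

The computed values agree with the table to two decimal places, except that the 0.01 value for 1 degree of freedom prints as 6.63: the exact value is about 6.635, which many printed tables round to 6.64.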
FINAL STATEMENT
IF χ² IS HIGHER THAN OR EQUAL TO THE CRITICAL VALUE, REJECT THE NULL HYPOTHESIS AND ACCEPT THE ALTERNATIVE.
As χ² is (greater than / less than) the critical value, I can (accept / reject) the Null Hypothesis and (accept / reject) the Alternative Hypothesis.
Therefore I can state that there (is no / is a) significant association…
…at a significance level of 0.05 (95% confidence: a difference this large would occur by chance less than 5% of the time if there were no association).
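CALCULATE THE DEGREES OF FREEDOM: (Number of Rows − 1) × (Number of Columns − 1). For the 2 × 2 housing table this is (2 − 1) × (2 − 1) = 1.
A χ² value of ____ is higher than 3.84 and 6.64 (the critical values for 1 degree of freedom), so…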
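The whole procedure (expected frequencies, χ², degrees of freedom and the decision rule) can be checked on the housing survey counts. This is a sketch: `chi_squared_test` is a hypothetical name, and `CRITICAL_VALUES` copies only the first four rows of the critical value table, so it covers tables with 1 to 4 degrees of freedom.

```python
# Critical values copied from the first rows of the table: df -> (0.05 level, 0.01 level).
CRITICAL_VALUES = {1: (3.84, 6.64), 2: (5.99, 9.21), 3: (7.82, 11.34), 4: (9.49, 13.28)}


def chi_squared_test(observed):
    """Return chi-squared, degrees of freedom and the decision at the 0.05 and 0.01 levels."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    chi_squared = 0.0
    for row, row_total in zip(observed, row_totals):
        for o, column_total in zip(row, column_totals):
            e = row_total * column_total / grand_total  # expected frequency
            chi_squared += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    cv_05, cv_01 = CRITICAL_VALUES[df]
    # Rule from the slide: reject the null hypothesis if chi-squared >= critical value.
    return chi_squared, df, chi_squared >= cv_05, chi_squared >= cv_01


chi_squared, df, reject_05, reject_01 = chi_squared_test([[180, 23], [103, 452]])
print(f"chi-squared = {chi_squared:.2f}, df = {df}")
print("0.05 level:", "reject" if reject_05 else "accept", "the null hypothesis")
print("0.01 level:", "reject" if reject_01 else "accept", "the null hypothesis")
```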
Reasons to use it
• It allows you to identify whether there is a difference or a relationship between two characteristics.
• It is simple to carry out.
• It compares the data that you have observed with what you would expect to happen.
Disadvantages of using it
• The data must be in the form of frequencies.
• The frequencies must have precise numerical values and be able to be organised into categories or groups.
• The total number of observations must be more than 20.
• The expected frequency in any one cell of the table must be more than 5.
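The conditions listed above can be checked before running the test. This hypothetical `check_conditions` helper reads the limits literally as the list states them (more than 20 observations, expected frequencies more than 5); some textbooks use "at least 5" instead, so the thresholds are parameters rather than fixed rules. The small table in the last usage line is made up.

```python
def check_conditions(observed, min_total=20, min_expected=5):
    """List the chi-squared conditions that a 2-D table of counts fails (empty list = all met)."""
    problems = []
    if len(observed) < 2 or any(len(row) < 2 for row in observed):
        problems.append("need more than one category for each characteristic")
    if any(not isinstance(o, int) or o < 0 for row in observed for o in row):
        problems.append("data must be frequencies (whole-number counts)")
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    if grand_total <= min_total:
        problems.append(f"only {grand_total} observations; need more than {min_total}")
    if grand_total > 0:
        for r in row_totals:
            for c in column_totals:
                e = r * c / grand_total  # expected frequency for this cell
                if e <= min_expected:
                    problems.append(f"expected frequency {e:.1f} is not more than {min_expected}")
    return problems


print(check_conditions([[180, 23], [103, 452]]))  # []
print(check_conditions([[3, 4], [5, 6]]))  # hypothetical small survey: four problems
```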
There is a significant association between housing age inside and outside the Lake District National Park.
State the answer in terms of the alternative hypothesis.
• Sometimes buildings are built recently but designed to look old.
• The survey may have counted unused farm buildings as traditional houses, even though they are not necessarily used as homes.
• It is uncertain how the survey determined what was modern or traditional.
• The survey indicates that villages inside the Park are smaller.
• Perhaps there is a static village size and new buildings aren't being built.
Referring to a National Park that you have studied, comment on the results shown in this test.
Justify the suitability of using the chi-squared test.
You compare the observed data with the data that you would expect, looking for a difference between O and E.
If there is a significant difference, then there is an association!
Reason to use this test:
• You have categorical data (e.g. blue eyes).
• Means are not categories; colours, for example, are.
Must have:
• More than one category
• A minimum of 5 in each one