gwendoline-robertson
Statistics in IB Biology
Error bars, standard deviation, t-test and more
Error bars
Added to a graph
◦Show range of data - or -
◦Standard deviation of data
[Figure: grade on test (%) plotted against time spent studying for test (minutes). Labeled point: x = 120, y = 94, the mean test score at 120 min. The error bar runs from the lowest to the highest test score at 120 min.]
Error bars
Discuss how error bars affect the meaning of these graphs
[Figure: two graphs, each titled "Time studying v. grades". x-axis: Hours. y-axis: Percent grade.]
Mean and Standard Deviation
Calculate the means for these two sets of data:
◦Set 1: 11, 9, 10, 9, 8, 10, 11, 12 (mean = 10)
◦Set 2: 2, 17, 5, 14, 9, 6, 16, 11 (mean = 10)
The mean is the same, but the range is very different.
Standard deviation describes the spread of values around the mean
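The comparison of the two sets can be checked with a few lines of Python. This is only a sketch: the helper name `summarize` is made up for illustration, and "range" here means the highest value minus the lowest.

```python
from statistics import mean

set_1 = [11, 9, 10, 9, 8, 10, 11, 12]
set_2 = [2, 17, 5, 14, 9, 6, 16, 11]

def summarize(values):
    """Return (mean, range) for a list of numbers (hypothetical helper)."""
    return mean(values), max(values) - min(values)

mean_1, range_1 = summarize(set_1)
mean_2, range_2 = summarize(set_2)

print(f"Set 1: mean = {mean_1}, range = {range_1}")
print(f"Set 2: mean = {mean_2}, range = {range_2}")
```

Both means come out to 10, while the ranges are 4 and 15, which is why the mean alone does not describe the data.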
Standard Deviation
For your reference only:
1. Find the mean of the values
2. Subtract each value from the mean
3. Square each difference from the mean (Can you see why?)
4. Find the sum of the squared differences
5. Divide by the number of values (This is called the variance)
6. Take the square root
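The six steps above can be sketched in Python as follows. The function name `standard_deviation` is made up for illustration. It divides by the number of values, as step 5 says; many calculators also offer a second version that divides by one less than the number of values, so the two may differ slightly.

```python
from math import sqrt

def standard_deviation(values):
    """Standard deviation following the six steps (divides by n)."""
    n = len(values)
    mean = sum(values) / n                     # step 1: find the mean
    differences = [mean - v for v in values]   # step 2: subtract each value from the mean
    squared = [d ** 2 for d in differences]    # step 3: square each difference
    total = sum(squared)                       # step 4: sum the squared differences
    variance = total / n                       # step 5: divide by the number of values
    return sqrt(variance)                      # step 6: take the square root

set_1 = [11, 9, 10, 9, 8, 10, 11, 12]
set_2 = [2, 17, 5, 14, 9, 6, 16, 11]

sd_1 = standard_deviation(set_1)
sd_2 = standard_deviation(set_2)
print(f"Set 1 SD = {sd_1:.2f}, Set 2 SD = {sd_2:.2f}")
```

Set 2 has the same mean as Set 1 but a much larger standard deviation, matching its wider spread.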
Find the standard deviation function on your calculator
Standard deviation tutorial
Standard deviation
When the data are distributed normally (in a bell shape, not weirdly scattered), about 68% of values are within one standard deviation of the mean
Standard deviation
Small standard deviations indicate tightly clustered data
Large standard deviations indicate widely spread data
Which graph (top or bottom) would have a larger standard deviation? Explain!
Standard deviation questions
1. If US women have a mean height of 64 inches and an SD of 2.5 inches, 68% of US women would fall into what range of heights? 95% of women?
2. How does the SD of “pro beach volleyball women” compare to the SD of US women in general?
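For question 1, the arithmetic can be sketched as below. It assumes heights are normally distributed and uses the common rule of thumb that about 95% of values lie within two standard deviations of the mean; the helper name `range_within` is made up for illustration.

```python
def range_within(mean, sd, k):
    """Return the interval from k SDs below the mean to k SDs above it."""
    return mean - k * sd, mean + k * sd

mean_height = 64   # inches, from question 1
sd_height = 2.5    # inches, from question 1

low_68, high_68 = range_within(mean_height, sd_height, 1)  # about 68% of values
low_95, high_95 = range_within(mean_height, sd_height, 2)  # about 95% of values

print(f"About 68%: {low_68} to {high_68} inches")
print(f"About 95%: {low_95} to {high_95} inches")
```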
Analyzing Data

T-test
◦ Tests continuous data
◦ Compares the means of two groups
◦ Degrees of freedom = total # of samples - 2
◦ Requirements: normal distribution; should have 10+ samples (more = more reliable results)

Chi-square
◦ Tests categorical data
◦ Compares the frequencies of multiple (2+) groups
◦ Degrees of freedom = # of categories - 1, or (# of rows - 1)(# of columns - 1)
◦ Requirements: random sample; 20+ samples, 5+ per category
Chi-square: Goodness of fit test
One variable only
Example: rolling a die - is it a fair die?
◦H0 = null hypothesis, results match expectations
◦H1 = a real difference between results and expectations
Degrees of freedom = # of categories - 1
6 - 1 = 5
Results from 60 rolls of a single die
Number 1 2 3 4 5 6
Frequency 12 6 8 11 8 15
Using the chi-square formula
χ² = Σ (O - E)² / E, where O is the observed frequency and E is the expected frequency
There were 60 rolls and each number should have an equal chance of coming up, so 60 / 6 = 10 is the expected value for each number
Results from 60 rolls of a single die
Number               1     2     3     4     5     6
Observed frequency   12    6     8     11    8     15
Expected frequency   10    10    10    10    10    10
Using the chi-square formula
Why square the difference between O and E?
Why divide by the expected value?
Results from 60 rolls of a single die
Number               1     2     3     4     5     6
Observed frequency   12    6     8     11    8     15
Expected frequency   10    10    10    10    10    10
(O-E)                2     -4    -2    1     -2    5
(O-E)²               4     16    4     1     4     25
(O-E)²/E             0.4   1.6   0.4   0.1   0.4   2.5
Using the chi-square formula
Sum of the (O-E)²/E values = 0.4 + 1.6 + 0.4 + 0.1 + 0.4 + 2.5
= 5.4
χ² = 5.4 with 5 degrees of freedom (df)
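The die calculation can be reproduced with a short Python sketch. The function name `chi_square` is made up for illustration, and the expected frequencies assume a fair die, so every face gets an equal share of the rolls.

```python
observed = [12, 6, 8, 11, 8, 15]  # frequencies for faces 1 to 6

# Fair die assumption: each face is expected equally often (60 / 6 = 10).
expected = [sum(observed) / len(observed)] * len(observed)

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom = number of categories - 1

print(f"chi-square = {chi2:.1f} with {df} df")
```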
Results from 60 rolls of a single die
Number               1     2     3     4     5     6
Observed frequency   12    6     8     11    8     15
Expected frequency   10    10    10    10    10    10
(O-E)                2     -4    -2    1     -2    5
(O-E)²               4     16    4     1     4     25
(O-E)²/E             0.4   1.6   0.4   0.1   0.4   2.5
Using the chi-square formula
p = 0.05 is the standard significance threshold
What does a p-value of 0.05 mean?
χ² = 5.4 with 5 df
Is the die fair? Discuss.
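One way to read a chi-square result is to compare it with the critical value for p = 0.05 at the matching degrees of freedom. The sketch below hard-codes the first few critical values from a standard chi-square table; the names `CRITICAL_0_05` and `exceeds_critical` are made up for illustration.

```python
# Critical chi-square values at p = 0.05, keyed by degrees of freedom
# (taken from a standard chi-square table).
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def exceeds_critical(chi2, df):
    """True if chi2 is larger than the p = 0.05 critical value for df."""
    return chi2 > CRITICAL_0_05[df]

die_result = exceeds_critical(5.4, 5)
if die_result:
    print("Reject H0: results differ from expectations")
else:
    print("Cannot reject H0: results are consistent with expectations")
```

A value below the critical value means the differences between observed and expected counts are small enough to be explained by chance at the 0.05 level.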
Chi-square: independence test
Two or more variables
Example: blood pressure medicine
Blood pressure levels of patients on Medicine A
            High   Normal   Low   Total:
Medicine    110    350      40    500
Placebo     150    320      30    500
Total:      260    670      70    1000
Df = (columns - 1)(rows - 1) = (3-1)(2-1) = 2 df
* rows / columns of values, NOT totals or labels
Using chi-square
Expected value = (row total)(column total) / overall total
Example: on medicine with high blood pressure
Expected = (500)(260) / 1000 = 130
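The same expected-value rule can be applied to every cell of the table at once. This is a sketch only; the function name `expected_counts` and the dictionary layout are made up for illustration.

```python
# Observed counts (High, Normal, Low), without the totals row or column.
observed = {
    "Medicine": [110, 350, 40],
    "Placebo": [150, 320, 30],
}

def expected_counts(table):
    """Expected count per cell = (row total)(column total) / overall total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    overall = sum(row_totals)
    return {
        name: [rt * ct / overall for ct in col_totals]
        for name, rt in zip(table, row_totals)
    }

expected = expected_counts(observed)
for name, values in expected.items():
    print(name, values)
```

Because both groups have 500 patients, the expected counts are the same in each row: 130 high, 335 normal, 35 low.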
Using chi-square
The χ² value is 8.93
Discuss what this means.
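As a starting point for the discussion, the χ² value for the blood pressure table can be recomputed from the observed counts. The sketch below is illustrative; the function name `chi_square_independence` is made up, and 5.991 is the p = 0.05 critical value for 2 df from a standard chi-square table.

```python
# Observed counts: rows are Medicine and Placebo; columns are High, Normal, Low.
observed = [
    [110, 350, 40],
    [150, 320, 30],
]

def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / overall  # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
CRITICAL_DF2 = 5.991  # p = 0.05, 2 df
print(f"chi-square = {chi2:.2f} with {df} df")
print("Above the critical value" if chi2 > CRITICAL_DF2 else "Below the critical value")
```

The result comes out near 8.9 with 2 df, which is above the critical value, so the distribution of blood pressure levels differs between the two groups by more than chance alone would suggest at the 0.05 level.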
t-test
Tells whether the difference between two sets of data is significant (meaningful and not caused by chance)
Requirements for the t-test
◦Sample size of at least 10 (according to IB)
◦Normally distributed data
◦Continuous data
t-test
The t-test gives a p-value (0 - 1)
The p-value represents the probability of getting a difference between the means at least this large through random chance / sampling error alone
The critical value for scientific “significance” is often p = 0.05
◦p < 0.05 is significant
◦p > 0.05 is NOT significant
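To show what a t-test calculates, here is a minimal sketch of a two-sample t-test with pooled variance, which is the version whose degrees of freedom equal the total number of samples minus 2. The two data sets are made up purely for illustration, the function name `pooled_t` is hypothetical, and 2.101 is the two-tailed critical t value for 18 df at p = 0.05 from a standard t table. In practice a calculator or online tool reports the p-value directly.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # variance() is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df

# Made-up measurements, 10 samples per group.
group_a = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
group_b = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

t, df = pooled_t(group_a, group_b)
T_CRITICAL_DF18 = 2.101  # two-tailed, p = 0.05, 18 df
significant = abs(t) > T_CRITICAL_DF18

print(f"t = {t:.2f} with {df} df, significant: {significant}")
```

The further apart the means are relative to the spread within each group, the larger the absolute value of t and the smaller the p-value.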
t-test
Note that the mean is the same in all groups of one color
In which case is the difference between the green and blue groups most likely to be significant? How would a t-test show this?
Need to calculate a t-test?
Use graphpad.com
◦http://graphpad.com/quickcalcs/index.cfm
◦Has t-test, standard deviation, and many other calculators (You will also need to know Chi-square)
Correlation and Causation
Correlated variables show a shared trend
◦Inverse correlation
Causal relationships mean a change in one variable causes the change in another
Correlation and Causation
A U.S. study shows: Eating ice cream causes sunburn!
[Figure: Amount plotted against time of year (month).]
Explain what is wrong with this “study”.
Read about a current debate on lead and crime here!