Statistics in IB Biology: Error bars, standard deviation, t-test and more


Error bars

Added to a graph to show either:
◦ the range of the data, or
◦ the standard deviation of the data

[Graph: Grade on test (%) vs. time spent studying for test (minutes). At 120 min., the point marks the mean test score and the error bar runs from the lowest to the highest test score.]

Error bars

Discuss how error bars affect the meaning of these graphs.

[Graphs: Time studying v. grades (two versions); x-axis: hours, y-axis: percent grade]

Mean and Standard Deviation

Calculate the means for these two sets of data:
◦ Set 1: 11, 9, 10, 9, 8, 10, 11, 12
◦ Set 2: 2, 17, 5, 14, 9, 6, 16, 11

Mean of Set 1 = 10; mean of Set 2 = 10

The mean is the same, but the range is very different (Set 1: 8 to 12; Set 2: 2 to 17).

Standard deviation describes the spread of values around the mean.
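A quick check of both means and ranges, as a minimal Python sketch (the helper name `summarize` is hypothetical, not from the slides):

```python
import statistics

def summarize(data):
    """Return (mean, range) for a list of numbers."""
    mean = statistics.mean(data)
    data_range = max(data) - min(data)  # range = largest value minus smallest value
    return mean, data_range

set_1 = [11, 9, 10, 9, 8, 10, 11, 12]
set_2 = [2, 17, 5, 14, 9, 6, 16, 11]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    mean, data_range = summarize(data)
    print(f"{name}: mean = {mean}, range = {data_range}")
```

Both means come out as 10, while the ranges are 4 and 15.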

Standard Deviation

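For your reference only:
1. Find the mean of the values
2. Subtract the mean from each value
3. Square each difference from the mean (Can you see why?)
4. Find the sum of the squared differences
5. Divide by the number of values (This is called the variance)
6. Take the square root

(Dividing by the number of values gives the population standard deviation. For a sample, divide by n − 1 in step 5 instead; calculators usually show both, often labeled σx and sx.)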
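The six steps above can be followed one by one in a short Python sketch. The function name `standard_deviation` and its `sample` flag are illustrative assumptions; the result is cross-checked against Python's built-in `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics

def standard_deviation(values, sample=False):
    """Follow the six steps; sample=True divides by n - 1 instead of n."""
    mean = sum(values) / len(values)              # step 1: mean
    differences = [v - mean for v in values]      # step 2: value minus mean
    squared = [d ** 2 for d in differences]       # step 3: square each difference
    total = sum(squared)                          # step 4: sum of squares
    divisor = len(values) - 1 if sample else len(values)
    variance = total / divisor                    # step 5: variance
    return math.sqrt(variance)                    # step 6: square root

set_2 = [2, 17, 5, 14, 9, 6, 16, 11]
print(round(standard_deviation(set_2), 2))               # population SD
print(round(standard_deviation(set_2, sample=True), 2))  # sample SD
# Cross-check against the standard library
print(round(statistics.pstdev(set_2), 2), round(statistics.stdev(set_2), 2))
```

Squaring in step 3 makes every difference positive, so values below and above the mean cannot cancel each other out.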

Find the standard deviation function on your calculator

Standard deviation tutorial

http://mtsu32.mtsu.edu:11308/regression/level1/meanstdev/stdevbyhand.htm


Standard deviation

When the data are normally distributed (a symmetrical bell shape, not irregularly scattered), about 68% of values lie within one standard deviation of the mean and about 95% lie within two standard deviations.
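The 68% figure can be checked with Python's `statistics.NormalDist` (Python 3.8+). This is a small sketch; `fraction_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

The results are roughly 68%, 95% and 99.7%, which is why these percentages are used as rules of thumb.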

Standard deviation

◦ Small standard deviations indicate tightly clustered data
◦ Large standard deviations indicate widely spread data

Which graph (top or bottom) would have a larger standard deviation? Explain!

Standard deviation questions

1. If US women have a mean height of 64 inches and an SD of 2.5 inches, 68% of US women would fall into what range of heights? 95% of women?

2. How does the SD of “pro beach volleyball women” compare to the SD of US women in general?
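To check answers to question 1, assuming the rule of thumb that about 68% of values lie within 1 SD of the mean and about 95% within 2 SDs, the ranges are simply mean ± k × SD. A minimal sketch with a hypothetical helper `range_within`:

```python
def range_within(mean, sd, k):
    """Return (low, high) = (mean - k*SD, mean + k*SD)."""
    return (mean - k * sd, mean + k * sd)

# Values from question 1: mean height 64 inches, SD 2.5 inches
print(range_within(64, 2.5, 1))  # range holding about 68% of women
print(range_within(64, 2.5, 2))  # range holding about 95% of women
```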

Analyzing Data: t-test vs. chi-square

t-test
◦ Tests continuous data
◦ Compares the means of two groups
◦ Degrees of freedom = total # of samples − 2
◦ Requirements: normal distribution; should have 10+ samples (more = more reliable results)

Chi-square
◦ Tests categorical data
◦ Compares the frequencies of multiple (2+) groups
◦ Degrees of freedom = # of categories − 1, or (# of rows − 1)(# of columns − 1)
◦ Requirements: random sample; 20+ samples, 5+ expected per category

Chi-square: Goodness of fit test

One variable only. Example: rolling a die – is it a fair die?
◦ H0 = null hypothesis: results match expectations
◦ H1 = alternative hypothesis: a real difference between results and expectations

Degrees of freedom = # of categories − 1 = 6 − 1 = 5

Results from 60 rolls of a single die

Number       1    2    3    4    5    6
Frequency   12    6    8   11    8   15

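Using the chi-square formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed frequency and E = expected frequency

There were 60 rolls and each number should have an equal chance of coming up, so 60 / 6 = 10 is the expected value for each number.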




Why square the difference between O and E?

Why divide by the expected value?


Number              1     2     3     4     5     6
Observed (O)       12     6     8    11     8    15
Expected (E)       10    10    10    10    10    10
O − E               2    -4    -2     1    -2     5
(O − E)²            4    16     4     1     4    25
(O − E)² / E      0.4   1.6   0.4   0.1   0.4   2.5


Sum of the (O − E)²/E values = 0.4 + 1.6 + 0.4 + 0.1 + 0.4 + 2.5 = 5.4

$\chi^2 = 5.4$ with 5 degrees of freedom (df)




p = 0.05 is the standard significance level (critical p-value)

What does a p-value of 0.05 mean?

$\chi^2 = 5.4$ with 5 df. Is the die fair?

Discuss.
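The die calculation, end to end, as a minimal Python sketch. The critical value 11.07 (p = 0.05, df = 5) comes from a standard chi-square table and is not given on the slides; the variable names are illustrative.

```python
observed = [12, 6, 8, 11, 8, 15]   # results of 60 rolls (from the table)
expected = [sum(observed) / len(observed)] * len(observed)  # 60 / 6 = 10 each

# chi-square = sum of (O - E)^2 / E over all six numbers
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_0_05_DF5 = 11.07  # chi-square table value for p = 0.05, df = 5

print(f"chi-square = {chi_square:.2f}, df = {df}")
if chi_square > CRITICAL_0_05_DF5:
    print("Reject H0: the die does not appear to be fair")
else:
    print("Cannot reject H0: no evidence that the die is unfair")
```

If SciPy is installed, `scipy.stats.chisquare(observed)` assumes equal expected frequencies by default and also returns the p-value directly.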

Chi-square: independence test

Two or more variables. Example: blood pressure medicine

Blood pressure levels of patients on Medicine A

              High   Normal   Low   Total
Medicine       110      350    40     500
Placebo        150      320    30     500
Total          260      670    70    1000

df = (columns − 1)(rows − 1) = (3 − 1)(2 − 1) = 2 df
* Count only rows/columns of values, NOT totals or labels

Using chi-square

Expected value = $\frac{(\text{row total})(\text{column total})}{\text{overall total}}$

Example: on medicine with high blood pressure
Expected = (500)(260) / 1000 = 130
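The expected-value formula applied to every cell of the blood pressure table, as a short Python sketch (variable names are illustrative):

```python
# Observed counts (rows: Medicine, Placebo; columns: High, Normal, Low)
observed = [
    [110, 350, 40],
    [150, 320, 30],
]

row_totals = [sum(row) for row in observed]               # 500, 500
column_totals = [sum(col) for col in zip(*observed)]      # 260, 670, 70
overall_total = sum(row_totals)                           # 1000

# Expected = (row total)(column total) / overall total, for every cell
expected = [
    [r * c / overall_total for c in column_totals]
    for r in row_totals
]
print(expected)
```

Because both groups have 500 patients, the two expected rows are identical.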


Using chi-square

The $\chi^2$ value is about 8.93 (2 df). Discuss what this means.
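The full independence test for the blood pressure table, as a minimal Python sketch. The function name `chi_square_independence` is hypothetical, and the critical value 5.99 (p = 0.05, df = 2) is taken from a standard chi-square table rather than from the slides.

```python
def chi_square_independence(observed):
    """Return (chi-square, df) for a table of observed counts (no totals row/column)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / overall_total  # expected count
            chi_square += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(column_totals) - 1)
    return chi_square, df

blood_pressure = [[110, 350, 40], [150, 320, 30]]
chi_square, df = chi_square_independence(blood_pressure)

CRITICAL_0_05_DF2 = 5.99  # chi-square table value for p = 0.05, df = 2
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("significant" if chi_square > CRITICAL_0_05_DF2 else "not significant")
```

With SciPy, `scipy.stats.chi2_contingency(blood_pressure)` should report the same statistic here, since its continuity correction is only applied when df = 1.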

t-test

Tells whether the difference between two sets of data is significant (meaningful, and unlikely to be caused by chance alone)

Requirements for the t-test
◦ Sample size of at least 10 (according to IB)
◦ Normally distributed data
◦ Continuous data

t-test

A t-test gives a p-value (between 0 and 1). The p-value is the probability of getting a difference between the means at least this large through random chance / sampling error alone, if there were no real difference between the groups.

The critical value for scientific “significance” is often p = 0.05
◦ p < 0.05 is significant
◦ p > 0.05 is NOT significant
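A two-sample Student's t-test (pooled variance, df = total # of samples − 2) written out in Python as a sketch. The data are invented for illustration, `students_t` is a hypothetical name, and the critical t value 2.101 (two-tailed, p = 0.05, df = 18) comes from a standard t table. Comparing |t| with the critical value is equivalent to asking whether p < 0.05.

```python
import math
import statistics

def students_t(group_a, group_b):
    """Two-sample Student's t-test with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical data: 10 plant heights (cm) per group, invented for illustration
group_a = [12, 14, 11, 13, 15, 12, 14, 13, 12, 14]
group_b = [15, 17, 16, 14, 18, 16, 17, 15, 16, 17]

t, df = students_t(group_a, group_b)
T_CRITICAL_0_05_DF18 = 2.101  # two-tailed, p = 0.05, df = 18, from a t table
print(f"t = {t:.2f}, df = {df}")
print("p < 0.05: significant" if abs(t) > T_CRITICAL_0_05_DF18 else "p > 0.05: not significant")
```

With SciPy, `scipy.stats.ttest_ind(group_a, group_b)` runs the same pooled-variance test and returns the p-value as well.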

t-test

Note that the mean is the same in all groups of one color. In which case is the difference between the green and blue groups most likely to be significant? How would a t-test show this?
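One way to explore this question: keep the green and blue means fixed (10 and 12) and change only the spread. The data below are hypothetical, `t_equal_sizes` is an illustrative name, and 2.101 is the two-tailed critical t for p = 0.05 and df = 18 from a standard t table.

```python
import math
import statistics

def t_equal_sizes(group_a, group_b):
    """Student's t for two groups of the same size n (pooled variance)."""
    n = len(group_a)
    standard_error = math.sqrt((statistics.variance(group_a) + statistics.variance(group_b)) / n)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical data: green means = 10 and blue means = 12 in both cases
tight_green = [9, 10, 11, 10, 10, 9, 11, 10, 10, 10]
spread_green = [4, 16, 7, 13, 10, 2, 18, 10, 12, 8]
tight_blue = [x + 2 for x in tight_green]
spread_blue = [x + 2 for x in spread_green]

T_CRITICAL_0_05_DF18 = 2.101  # two-tailed, p = 0.05, df = 18, from a t table
for label, green, blue in (("tight", tight_green, tight_blue),
                           ("spread", spread_green, spread_blue)):
    t = t_equal_sizes(green, blue)
    verdict = "significant" if abs(t) > T_CRITICAL_0_05_DF18 else "not significant"
    print(f"{label}: t = {t:.2f} ({verdict})")
```

The same difference in means gives a large t when the data are tightly clustered and a small t when they are widely spread.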

Need to calculate a t-test? Use graphpad.com
◦ http://graphpad.com/quickcalcs/index.cfm
◦ Has t-test, standard deviation, and many other calculators (You will also need to know chi-square)


Correlation and Causation

Correlated variables show a shared trend
◦ Inverse correlation: as one variable increases, the other decreases

Causal relationships mean a change in one variable causes a change in another
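The strength and direction of a correlation can be summarized with Pearson's correlation coefficient r (+1 = perfect shared trend, −1 = perfect inverse trend). A minimal Python sketch with invented monthly data; `pearson_r` is a hypothetical helper name.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sum_of_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return sum_of_products / (spread_x * spread_y)

# Hypothetical monthly data, invented for illustration
temperature_c = [2, 4, 9, 14, 18, 23, 26, 25, 20, 14, 8, 3]
ice_cream_sales = [20, 25, 40, 60, 80, 110, 130, 125, 90, 60, 35, 22]
heating_bills = [150, 140, 110, 80, 55, 30, 20, 22, 45, 80, 120, 145]

print(f"temperature vs ice cream sales: r = {pearson_r(temperature_c, ice_cream_sales):.2f}")
print(f"temperature vs heating bills:   r = {pearson_r(temperature_c, heating_bills):.2f}")
```

A strong r only describes a shared trend; on its own it cannot show which variable, if either, causes the other. On Python 3.10+, `statistics.correlation(x, y)` computes the same coefficient.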

Correlation and Causation

A U.S. study shows: Eating ice cream causes sunburn!

[Graph: amount vs. time of year (month)]

Explain what is wrong with this “study”.

Read about a current debate on lead and crime here!
http://www.bbc.com/news/magazine-27067615
