
Biol 500: basic statistics

Goals:

1) understand basics of experimental design - controls - replication

2) understand how to report quantitative data

3) be able to interpret the reporting of statistical results in a journal article


Replication: allows you to determine whether the difference between treatments (or groups of samples) is greater than the variation within a treatment (or group)

Is there a difference in how effective the 3 drugs are in curing headaches?


[Figure: bar graphs of the three drugs' effectiveness, labelled "?", "no" and "yes"]


Generally, overlapping error bars suggest no significant difference between the mean values being graphed

Bars don’t overlap = probably different (confirm with a statistical test)
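A minimal sketch of this rule of thumb, assuming each error bar is the mean ± 1 standard error (SE); the three drugs and their relief scores are hypothetical, made up only to show one overlapping and one non-overlapping pair. Overlap is a visual guide, not a replacement for a formal test.

```python
import math
import statistics


def mean_se_bar(values):
    """Return the (lower, upper) ends of a mean ± 1 SE error bar."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - se, m + se


def bars_overlap(bar1, bar2):
    """True if two (lower, upper) error bars share any part of their range."""
    return bar1[0] <= bar2[1] and bar2[0] <= bar1[1]


# Hypothetical % headache relief for six patients on each drug
drug_a = [62, 70, 58, 66, 74, 61]
drug_b = [64, 68, 60, 71, 65, 63]
drug_c = [40, 45, 38, 47, 42, 44]

bar_a, bar_b, bar_c = (mean_se_bar(d) for d in (drug_a, drug_b, drug_c))
print("A vs B overlap:", bars_overlap(bar_a, bar_b))
print("A vs C overlap:", bars_overlap(bar_a, bar_c))
```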


Controls: From these data, could you tell if the least effective drug has any effect at all?


Key to interpreting your results: Include a control that is the same in all respects except the one variable you will manipulate experimentally


Controls: Procedural controls allow you to diagnose problems in your experiment, samples or technique

When we amplify DNA from unknown samples by PCR, we include a positive control (a DNA sample that always works) and a negative control (all the PCR reagents, but no DNA)

This allows us to interpret the results of our gels, and to troubleshoot any problems
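The logic of reading a gel against its two controls can be sketched as a small decision function. The function name and the returned messages are hypothetical illustrations, assuming each lane is scored simply as band / no band.

```python
def interpret_pcr(positive_band: bool, negative_band: bool, sample_band: bool) -> str:
    """Hypothetical helper: interpret a sample lane using the PCR controls."""
    # Negative control (reagents, no DNA) should stay blank; a band means contamination
    if negative_band:
        return "contamination: sample result cannot be trusted"
    # Positive control should always amplify; no band means the reaction itself failed
    if not positive_band:
        return "reaction failed: sample result cannot be trusted"
    # Both controls behaved as expected, so the sample lane can be interpreted
    return "sample amplified" if sample_band else "no amplification from sample"


print(interpret_pcr(positive_band=True, negative_band=False, sample_band=True))
```

Checking the negative control first means a contaminated run is never reported as a genuine result, even if the positive control also worked.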


Do squirrels bury acorns?

My experiment: I remove all the squirrels from 3 clumps of trees in one park, but leave the squirrels in 3 ‘control’ tree clumps in another park, on the other side of town

[Figure: tree clumps in Park A and Park B]


Pseudoreplication

In this example the unit of replication is the park, not the clump of trees – I have no actual replication


Any difference I measure could be due to differences between the two parks, and not due to my squirrel-removal treatment


Avoiding pseudoreplication

Correct design would be to have squirrel-removal and control areas in each of several replicate parks

This lets you assess differences between treatment and control areas, while simultaneously measuring variation among parks

[Figure: squirrel-removal and control areas in Park A, Park B and Park C]
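A minimal sketch of how the replicated design is analysed, assuming hypothetical counts of buried acorns in 3 tree clumps per area. The clumps within one area are subsamples, so they are averaged first; each park then contributes a single treatment-versus-control difference, and the number of replicates is the number of parks.

```python
import statistics

# Hypothetical buried-acorn counts per tree clump, 3 clumps per area
parks = {
    "Park A": {"removal": [4, 6, 5], "control": [12, 15, 11]},
    "Park B": {"removal": [8, 7, 9], "control": [14, 13, 16]},
    "Park C": {"removal": [3, 5, 4], "control": [9, 10, 8]},
}

differences = []
for name, areas in parks.items():
    # Average the clumps within each area so one park = one replicate
    removal_mean = statistics.mean(areas["removal"])
    control_mean = statistics.mean(areas["control"])
    differences.append(control_mean - removal_mean)

n_replicates = len(differences)  # 3 parks, not 9 clumps per treatment
print(n_replicates, [round(d, 2) for d in differences])
```

These per-park differences are what a paired comparison across parks would use; treating the 9 clumps as independent replicates would be pseudoreplication.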


n = 10

Did these two classes do differently on my 418 midterm?


n = 20


n = 44


n = 44 students per class

mean = 133.9 ± 29.7 SD; range: 59–183

mean = 126.3 ± 38.8 SD; range: 42–188

The statistical approach is to ask if the means of these two populations are significantly different


the standard deviation (SD) is what you should report if you are actually interested in the variation itself, e.g., for deciding where to draw the lines between grades
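A minimal sketch of computing the summary reported on these slides (mean ± SD and range) from raw data, using Python's standard `statistics` module; the eight scores are hypothetical. `statistics.stdev` is the sample SD (dividing by n − 1), which is the usual choice for describing a sample.

```python
import statistics

# Hypothetical midterm scores for one small class
scores = [142, 118, 165, 97, 151, 133, 176, 120]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD: divides by n - 1
low, high = min(scores), max(scores)
print(f"mean = {mean:.1f} ± {sd:.1f} SD; range: {low}-{high}")
```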


mean = 133.9 ± 29.7 SD or ± 4.3 SE; range: 59–183

mean = 126.3 ± 38.8 SD or ± 5.8 SE; range: 42–188

the standard error (SE, or SEM) is $\mathrm{SE} = \mathrm{SD}/\sqrt{n}$, where n is the sample size
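The SE formula as a one-line function; applying it to the second class above (SD = 38.8, n = 44) reproduces its reported SE of 5.8. The function name is just an illustrative choice.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


print(round(standard_error(38.8, 44), 1))
```

Because n sits under a square root, quadrupling the sample size halves the SE, while the SD (the spread of individual values) does not shrink with more data.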


the standard error is what you report when you want to compare the means of different treatments or samples


unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23

a t-test compares the means of 2 populations by calculating a test statistic called t and determining the probability (P, or p) of getting a value of t at least that large, with that sample size, by chance alone
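A minimal sketch of the unpaired (Student's) t statistic, assuming equal variances in the two groups, which gives df = n1 + n2 − 2 as on these slides. The class scores are hypothetical. Turning t into a P-value needs the t distribution, which the standard library lacks; a package such as SciPy (`scipy.stats.ttest_ind`) reports both t and P.

```python
import math
import statistics


def pooled_t(a, b):
    """Unpaired t statistic and df, assuming both groups share one variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its own df
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df


# Hypothetical % scores for two small classes
class_1 = [72, 65, 80, 58, 77, 69]
class_2 = [61, 70, 55, 66, 59, 63]
t, df = pooled_t(class_1, class_2)
print(f"t = {t:.2f}, df = {df}")
```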


paired would be, for example, comparing each student's % score on the midterm with the same student's % score on the final; most tests are unpaired
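A minimal sketch of the paired version, with hypothetical midterm and final % scores for five students. The test works on each student's own difference, so student-to-student variation drops out of the comparison; df = (number of pairs) − 1.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and df from per-subject differences (second - first)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical % scores for the same five students
midterm = [70, 65, 82, 58, 74]
final = [75, 66, 85, 64, 78]
t, df = paired_t(midterm, final)
print(f"t = {t:.2f}, df = {df}")
```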


one-tailed if you have some reason to think, in advance, that the 2009 scores can only be higher (or only lower) than the 2007 scores

- cuts your P-value in half, but you need a reason to do this!


the power of your test depends on your degrees of freedom, which are (sample size) – (number of groups)

- in this case: (44 + 44 students) – (2 groups) = 88 – 2 = 86


P values below 0.05 are conventionally accepted as significant, meaning there is less than a 5% chance of getting a test statistic at least this large if the groups are not really any different


3 or more samples can be compared using a one-way Analysis of Variance, or ANOVA

instead of calculating a t statistic, ANOVA calculates an F-ratio, which compares variation within groups (error bars) to the differences in mean values among groups

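2 degrees of freedom: 1st = (# of groups) – 1; 2nd = (total sample size) – (# of groups)

- in this case: 1st = 3 – 1 = 2; 2nd = (44 + 44 + 44) – 3 = 132 – 3 = 129

$F_{2,129} = 7.12$, $P < 0.001$

the two df are written as subscripts of F; P is the overall P for the comparison of all 3 means (n = 44 per group)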
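A minimal sketch of how a one-way ANOVA builds its F-ratio from the two kinds of variation described above, using only the standard library; the function name and example groups are hypothetical. Getting P from F needs the F distribution (e.g. SciPy's `scipy.stats.f_oneway`), which this sketch does not compute.

```python
import statistics


def one_way_anova(*groups):
    """Return the F-ratio and its two df for a one-way ANOVA."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: scatter of values around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = len(all_values) - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical example: 3 groups of 3 values each
f_ratio, df1, df2 = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df1},{df2}) = {f_ratio:.2f}")  # F(2,6) = 27.00
```

With three groups of 44, the same function reports df of 2 and 129, matching the subscripts above.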


If your overall P-value is significant, you can then do a post-hoc (“after the fact”) test to work out which specific means are different from each other

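Bonferroni - less conservative than Scheffé when only a few comparisons are made, but becomes very conservative as the number of comparisons grows

Scheffé - very conservative; if it detects a difference, it is very likely real

Dunnett - compares each mean only to a control; the most powerful option when those are the only comparisons you need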

$F_{2,129} = 7.12$, $P < 0.001$ (n = 44 per group)

pairwise post-hoc results: Scheffé P = 0.002; Scheffé P = 0.050; P = 0.474
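Of the three post-hoc methods, Bonferroni is the simplest to show: one common form multiplies each unadjusted P-value by the number of comparisons (capping at 1), which is equivalent to testing each at 0.05 divided by that number. The P-values below are hypothetical. Scheffé and Dunnett rely on their own reference distributions and are normally taken from a statistics package.

```python
def bonferroni(p_values):
    """Bonferroni-adjust P values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted P values from 3 pairwise comparisons
raw = [0.004, 0.020, 0.300]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

Note that the second comparison (raw P = 0.020) is no longer significant after adjustment; that loss of power is the price of guarding against false positives across many comparisons.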


2-way ANOVA tests for the effects of 2 factors and for the interaction between them

[Figure: bar graph (y-axis 0–100) for control, aspirin only, Tylenol only, and aspirin + Tylenol]

factors: aspirin (yes/no) and Tylenol (yes/no)


when the response to two treatments combined is not what you would expect from adding their individual effects, this is an interaction

interactions are usually the most biologically interesting result!
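The "what you would expect from adding their individual effects" idea can be written out as arithmetic. A minimal sketch with hypothetical group means: each drug's effect is measured relative to the control, the additive expectation for the combination is built from those effects, and the interaction is the gap between observed and expected.

```python
# Hypothetical mean % headache relief in each of the four treatment groups
control = 10
aspirin_only = 40
tylenol_only = 35
both = 90

# Effect of each drug on its own, relative to the control
aspirin_effect = aspirin_only - control
tylenol_effect = tylenol_only - control

# If the two effects simply added up, the combination would give this
expected_both = control + aspirin_effect + tylenol_effect
interaction = both - expected_both
print(expected_both, interaction)
```

This only gives the size of the interaction; the 2-way ANOVA is what tests whether it is larger than the within-group variation would produce by chance.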


NOT appropriate to do a 1-way ANOVA on these data, because that requires that each treatment be independent of the other treatments

- since 2 treatments involve aspirin, they are not independent

- also, you miss the interaction, which is the important result


Correlation analysis is appropriate when you think 2 variables are related, but not in a cause-and-effect way

- arm length and leg length are related, but longer arms do not cause you to have longer legs; both are due to your height

Regression analysis is appropriate when you believe a change in one predictor variable (what you manipulate) causes a change in the response variable (the thing you measure)

- e.g., adding more water makes plants grow taller


Output of a regression analysis includes:

1) ANOVA table

tells you if your model explains a significant amount of the variation in the response


2) equation of the best-fit line

summarizes the relationship between predictor and response


3) table testing the effect of each predictor

in multiple regression, you can test many possible predictors that might matter, and see which significantly affect the response variable


4) r²

r² is the proportion (often given as a %) of the variation in the response that is explained by the predictor(s) in the model


More scatter = lower r²

You can have a low r², but still have a significant slope


ANOVA and regression are both types of linear models, which test the same basic equation:

response variable = model + error

- response variable: the thing you measure

- model: the predictors, and coefficients that tell you how they affect the response

- error: the variance in the response that is not explained by the model

[Figure: a simple linear regression model]
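Written out for simple linear regression, the "response = model + error" equation takes the standard form below, where $y_i$ is the response for observation $i$, $x_i$ its predictor value, $\beta_0$ the intercept, $\beta_1$ the coefficient (slope) for the predictor, and $\varepsilon_i$ the residual error:

$$
\underbrace{y_i}_{\text{response}} = \underbrace{\beta_0 + \beta_1 x_i}_{\text{model}} + \underbrace{\varepsilon_i}_{\text{error}}
$$

Setting $\beta_1 = 0$ removes the predictor, leaving $y_i = \beta_0 + \varepsilon_i$, i.e. every observation predicted by the overall mean.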


Does predictor X affect response?

the test is to set the predictor's coefficient = 0, which drops the predictor out of the model, and see whether the model (now just the overall mean plus the residual error) fits any worse


Parametric versus non-parametric tests

All of the tests we have discussed are parametric tests

- they use the numerical values of your actual data

- however, they also have built-in assumptions that your data, and the residual errors, fit a normal distribution (bell curve)

[Figure: histogram of scores (x-axis 75–300; y-axis: count)]


If your data do not fit a normal distribution, you can transform the raw numbers to make them more “normal”, by putting the data through a mathematical function


[Figure: histograms of % scores before transformation (x-axis 0.3–1) and after transformation (x-axis 0.5–1.5); y-axis: count]

arcsine(square root(proportion)) is a standard transformation for percentages, which are bounded at 0 and 100% and are often not normally distributed
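A minimal sketch of the arcsine-square-root transformation, assuming inputs are percentages from 0 to 100 that are first converted to proportions; the result is an angle in radians, running from 0 (for 0%) to π/2 ≈ 1.57 (for 100%). The function name is illustrative.

```python
import math


def arcsine_sqrt(percent: float) -> float:
    """Arcsine-square-root transform of a percentage, returned in radians."""
    proportion = percent / 100  # the transform is applied to proportions (0 to 1)
    return math.asin(math.sqrt(proportion))


for pct in (30, 50, 100):
    print(pct, round(arcsine_sqrt(pct), 3))
```

The transformation stretches out values near the 0% and 100% limits, where percentages tend to bunch up, which is why it often makes such data closer to normal.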


Alternatively, there are non-parametric versions of most common statistical tests that use ranked values instead of the raw data

- are typically more conservative (less powerful): they can miss a real difference, but a difference they do detect is usually real

- make no assumption that the data fit a normal distribution

raw    ranked (high to low)
3      5
2      6
6      3
4      4
9      2
1      7
12     1
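The ranking step can be sketched in a few lines; applied to the raw values above, it reproduces the ranks shown (largest value = rank 1). This sketch assumes there are no tied values; real non-parametric tests give tied values the average of the ranks they span.

```python
def rank_high_to_low(values):
    """Rank values so the largest gets rank 1 (assumes no tied values)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


raw = [3, 2, 6, 4, 9, 1, 12]
ranks = rank_high_to_low(raw)
print(ranks)  # [5, 6, 3, 4, 2, 7, 1]
```

Because only the order survives, an extreme value such as 12 has no more influence than any other top-ranked value, which is why rank-based tests do not depend on the data being normal.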