Page 1: Statistics for Water Science: Hypothesis Testing: Fundamental concepts and a survey of methods

Unit 5: Module 17, Lecture 2

Page 2: Statistics

Developed by: Host Updated: Jan. 21, 2004 U5-m17b-s2

A branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data:

Descriptive Statistics (Lecture 1)
  Basic description of a variable
Hypothesis Testing (Lecture 2)
  Asks the question – is X different from Y?
Predictions (Lecture 3)
  What will happen if…

Page 3: Objectives

Introduce the basic concepts and assumptions of significance tests
  Distributions on parade
  Developing hypotheses
  What is “true”?
Survey statistical methods for testing for differences in populations of numbers
  Sample size issues
  Appropriate tests
What we won’t do:
  Elaborate on mathematical underpinnings of tests (take a good stats course for this!)

Page 4: From our last lecture

The mean: A measure of central tendency

The Standard Deviation: A measure of the ‘spread’ of the data
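As a quick refresher on these two quantities, here is a minimal Python sketch using only the standard library. The readings are hypothetical values invented for illustration; they are not data from the lecture.

```python
import statistics

# Hypothetical temperature readings (deg C); NOT data from the lecture.
temps = [19.4, 19.6, 19.7, 19.7, 19.8, 20.0]

mean_temp = statistics.mean(temps)   # measure of central tendency
sd_temp = statistics.stdev(temps)    # sample standard deviation (n - 1 denominator)

print(f"mean = {mean_temp:.2f}, SD = {sd_temp:.2f}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would treat the readings as the whole population.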

Page 5: Tales of the normal distribution

Many kinds of data follow this symmetrical, bell-shaped curve, often called a Normal Distribution.

Normal distributions have statistical properties that allow us to predict the probability of getting a certain observation by chance.

[Figure: normal distribution curve; x-axis from -2.0 to 2.0]

Page 6: Tales of the normal distribution

When sampling a variable, you are most likely to obtain values close to the mean
  68% within 1 SD
  95% within 2 SD

[Figure: normal distribution curve with the 1 SD and 2 SD intervals marked]
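The 68% and 95% figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normally distributed value lies within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(f"within 1 SD: {prob_within(1):.4f}")  # about 0.68
print(f"within 2 SD: {prob_within(2):.4f}")  # about 0.95
```

The exact 95% interval is closer to 1.96 SD; "2 SD" is the usual rounded rule of thumb.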

Page 7: Tales of the normal distribution

Note that a couple of values are outside the 95% (2 SD) interval
  These are improbable

[Figure: normal distribution curve with the 1 SD and 2 SD intervals marked]

Page 8: Tales of the normal distribution

The essence of hypothesis testing:
  If an observation falls in one of the tails of a distribution, it is improbable under that distribution, so it may not belong to that population.

[Figure: normal distribution curve with the 1 SD and 2 SD intervals marked]

Page 9: “Significant Differences”

A difference is considered significant if the probability of getting that difference by random chance is very small.

P value: The probability of getting a difference at least as large as the one observed by chance alone, if there is no real difference

Historically we use p < 0.05

Page 10: The probability of detecting a significant difference is influenced by:

The magnitude of the effect
  A big difference is more likely to be significant than a small one

Page 11: The probability of detecting a significant difference is influenced by:

The spread of the data
  If the Standard Deviation is low, it will be easier to detect a significant difference

Page 12: The probability of detecting a significant difference is influenced by:

The number of observations
  A large sample is more likely to detect a difference than a small sample
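The three factors (effect size, spread, number of observations) can be seen together in a power calculation. The sketch below is an illustration under stated assumptions, not a method taken from the lecture: it assumes a one-tailed Z-test on a sample mean with a known standard deviation, and the function name `power_one_tailed_z` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed_z(effect, sd, n, alpha=0.05):
    """Probability of detecting a true difference `effect` with a one-tailed Z-test.

    Assumes normally distributed data with known standard deviation `sd`
    and a sample of `n` observations.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)        # critical value, e.g. 1.645 for alpha = 0.05
    shift = effect * sqrt(n) / sd        # true difference in standard-error units
    return 1 - z.cdf(z_crit - shift)

# Hypothetical scenarios: vary one factor at a time.
print(f"baseline       : {power_one_tailed_z(0.5, 1.0, 10):.2f}")
print(f"bigger effect  : {power_one_tailed_z(1.0, 1.0, 10):.2f}")
print(f"smaller spread : {power_one_tailed_z(0.5, 0.5, 10):.2f}")
print(f"larger sample  : {power_one_tailed_z(0.5, 1.0, 40):.2f}")
```

Each of the last three lines should print a higher probability than the baseline, matching the three statements above.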

Page 13: Hypothesis testing

Hypothesis: A statement which can be proven false

Null hypothesis (H0): “There is no difference”

Alternative hypothesis (HA): “There is a difference…”

In statistical testing, we try to “reject the null hypothesis”
  If the null hypothesis is false, it is likely that our alternative hypothesis is true
  “False” – there is only a small probability that the results we observed could have occurred by chance

Page 14: Common probability levels

Alpha Level    Chance       Significance          Reject Null Hypothesis
P > 0.05                    Not significant       No
P < 0.05       1 in 20      Significant           Yes
P < 0.01       1 in 100     Significant           Yes
P < 0.001      1 in 1000    Highly Significant    Yes
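To connect these probability levels to the Z-test used later in the lecture, the sketch below prints the "1 in N" odds and the one-tailed critical Z value for each alpha. It relies on `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `critical_z` is made up for this example.

```python
from statistics import NormalDist

def critical_z(alpha):
    """One-tailed critical Z value: a Z-score above this has probability below alpha."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.05, 0.01, 0.001):
    odds = round(1 / alpha)
    print(f"alpha = {alpha}: 1 in {odds}, one-tailed critical Z = {critical_z(alpha):.3f}")
```

For alpha = 0.05 this gives the familiar one-tailed critical value of about 1.645.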

Page 15: Types of statistical errors (you could be right, you could be wrong)

               Accept H0               Reject H0
H0 is True     Correct Decision        Type I Error (Alpha)
H0 is False    Type II Error (Beta)    Correct Decision

Page 16: Examples of type I and type II errors

[Figure: normal distribution with the Type I Error and Type II Error regions labeled]
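One way to get a feel for the two error types is to simulate them. The sketch below is an illustration under stated assumptions, not part of the lecture: it repeatedly draws samples of 10 values from a normal population with SD = 1, runs a one-tailed Z-test of H0 "mean = 0", and counts rejections. The function name `rejection_rate` and the true mean of 0.5 used for the Type II case are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mean, n=10, alpha=0.05, trials=10000, seed=0):
    """Fraction of simulated samples in which a one-tailed Z-test rejects H0: mean = 0.

    Samples come from a normal population with the given true mean and SD = 1.
    """
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)
    standard_error = 1.0 / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        if sample_mean / standard_error > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_mean=0.0)       # H0 true: every rejection is a Type I error
type_ii_rate = 1 - rejection_rate(true_mean=0.5)  # H0 false: every non-rejection is a Type II error

print(f"Type I error rate  (should be near alpha = 0.05): {type_i_rate:.3f}")
print(f"Type II error rate (beta) for a true mean of 0.5: {type_ii_rate:.3f}")
```

The Type I rate is set by the chosen alpha, while the Type II rate depends on how large the real difference is, how variable the data are, and how many observations are taken.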

Page 17: Common statistical tests

Question                                                       Test
Does a single observation belong to a population of values?    Z-test
Are two (or more) populations of numbers different?            t-test, F-test (ANOVA)
Is there a relationship between x and y?                       Regression
Is there a trend in the data? (special case of above)          Regression

Page 18: Does a single observation belong to a population of values: The Z-test

On June 26, 2002, a temperature probe reading at 7 m depth in Medicine Lake was 20.30 °C. Is this unusually high for June?

[Figure: histogram, “Medicine Lake June 2002 Temp - 7 m”; x-axis: Temperature (18.00 to 21.00); y-axis: # observations (0 to 12); series: June Temp]

Note: this is a “one-tailed test”; we just want to know if it’s high
  We’re not asking if it is unusually low or high (2-tailed)

Page 19: The Z distribution

The Z-distribution (the standard normal distribution) is a Normal Distribution with special properties:
  Mean = 0
  Variance = 1

Z = (observed value – mean) / standard error
Standard error = standard deviation / sqrt(n)

Page 20: Medicine Lake example

Calculate the Z-score for the observed data
Compare the Z score with the critical value for a one-tailed test (1.645)

[Figure: histogram, “Medicine Lake June 2002 Temp - 7 m”; x-axis: Temperature (18.00 to 21.00); y-axis: # observations (0 to 12); series: June Temp]

Page 21: The Deep Math…

Z = (observed value – mean) / standard error

Standard error = standard deviation / sqrt(n)

Z = (20.3 – 19.7) / 0.08 = 6.89 (as reported on the slide; the rounded values shown give 7.5, so the slide presumably used unrounded inputs)

Since 6.89 > the critical Z value of 1.645
  Our deep temperature is significantly higher than the June average temperature.
  Further exploration shows that a storm the previous day caused the warmer surface waters to mix into the deeper waters.
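The Z calculation can be reproduced in a few lines of Python. This sketch plugs in the rounded values printed on the slide (observed 20.3, mean 19.7, standard error 0.08). With those inputs Z comes out at about 7.5 rather than the slide's 6.89, presumably because the slide worked from unrounded numbers; either way the value is far above 1.645. The function name `z_score` is made up for this example.

```python
from statistics import NormalDist

def z_score(observed, mean, standard_error):
    """Z = (observed value - mean) / standard error."""
    return (observed - mean) / standard_error

# Rounded values as printed on the slide.
z = z_score(observed=20.3, mean=19.7, standard_error=0.08)

critical_z_one_tailed = 1.645           # one-tailed critical value for p < 0.05
p_one_tailed = 1 - NormalDist().cdf(z)  # probability of a Z this large by chance

print(f"Z = {z:.2f}")
print(f"significantly high? {z > critical_z_one_tailed}")
print(f"one-tailed p = {p_one_tailed:.2e}")
```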

Page 22: Are two populations different: The t-test

Also called Student’s t-test. “Student” was the pseudonym of W. S. Gosset, a statistician who worked for the Guinness brewery
Useful for “small” samples (<30)
One of the most basic statistical tests, can be performed in Excel or any common statistical package

Page 23: Are two populations different: The t-test

Page 24: Are two populations different: The t-test

Same principle as Z-test – calculate a t value, and assess the probability of getting that value

Page 25: In Excel

Formula: @ttest(Pop1, Pop2, #Tails, TestType)

#Tails: 1 (one-tailed) or 2 (two-tailed)
TestType:
  1 - paired (if there is a logical pairing of XY data)
  2 - equal variance
  3 - unequal variance

Test returns exact probability value

Page 26: Example: 1-tailed temperature comparison

@ttest(Pop1, Pop2, 1, 3) = 1.5 × 10^-149
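For readers working outside Excel, the sketch below computes the t value behind an unequal-variance comparison (TestType 3, often called Welch's t-test) using only the Python standard library. The two samples are hypothetical numbers invented for illustration, not the lecture's Pop1 and Pop2, and the function name `welch_t` is made up. The sketch stops at the t value and degrees of freedom; turning these into an exact probability needs the t distribution, which a statistics package can supply (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`, assuming SciPy is installed).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(pop1, pop2):
    """Return (t, degrees of freedom) for a two-sample t-test with unequal variances."""
    n1, n2 = len(pop1), len(pop2)
    v1 = variance(pop1) / n1   # squared standard error of the first mean
    v2 = variance(pop2) / n2   # squared standard error of the second mean
    t = (mean(pop1) - mean(pop2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical temperatures (deg C); NOT data from the lecture.
surface = [22.1, 22.4, 22.3, 22.6, 22.2]
deep = [19.6, 19.8, 19.7, 19.9, 19.5]

t_value, dof = welch_t(surface, deep)
print(f"t = {t_value:.2f} with about {dof:.1f} degrees of freedom")
```

A large positive t value means the first sample's mean is well above the second's relative to the noise in both, which is what a very small one-tailed probability reflects.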

Page 27: ANOVA: Tests of multiple populations

ANOVA – analysis of variance
Compare 2 or more populations
  Surface temperatures for 3 lakes
Can handle single or multiple factors
  One-way ANOVA – comparing lakes
  Two-way ANOVA – compare two factors
    Temperature x Light effects on algal populations
  Repeated measures ANOVA – compare factors over time
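A one-way ANOVA boils down to an F ratio: the variance between group means divided by the variance within groups. The sketch below computes that ratio with the Python standard library. The three "lakes" are hypothetical numbers invented for illustration, and the function name `one_way_f` is made up; judging significance still requires comparing F against the F distribution for the given degrees of freedom, which a statistics package would do.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical surface temperatures (deg C) for 3 lakes; NOT data from the lecture.
lakes = [
    [21.8, 22.1, 22.4, 22.0],
    [23.0, 23.4, 22.9, 23.3],
    [21.9, 22.2, 22.0, 22.3],
]

f_value, df_between, df_within = one_way_f(lakes)
print(f"F = {f_value:.2f} on {df_between} and {df_within} degrees of freedom")
```

An F value near 1 suggests the groups differ no more than expected from within-group noise; a much larger value points to a real difference among the populations.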

Page 28: Next Time: Regression - Finding relationships among variables

[Figure: scatter plot, “Halsted Surface - August 1999”; x-axis: pH (7.5 to 9.5); y-axis: Dissolved Oxygen (ppm) (0 to 20); series: DO]