Application of Statistical Techniques to Interpretation of Water Monitoring Data Eric Smith, Golde Holtzman, and Carl Zipper

Outline

I. Water quality data: program design (CEZ, 15 min)

II. Characteristics of water-quality data (CEZ, 15 min)

III. Describing water quality (GIH, 30 min)

IV. Data analysis for making decisions

A. Compliance with numerical standards (EPS, 45 min)

Dinner Break

B. Locational / temporal comparisons (“cause and effect”) (EPS, 45 min)

C. Detection of water-quality trends (GIH, 60 min)

III. Describing water quality (GIH, 30 min)

• Rivers and streams are an essential component of the biosphere

• Rivers are alive
• Life is characterized by variation
• Statistics is the science of variation
• Statistical Thinking/Statistical Perspective
  – Thinking in terms of variation
  – Thinking in terms of distribution

The present problem is multivariate

• WATER QUALITY as a function of
• TIME, under the influence of covariates like
• FLOW, at multiple
• LOCATIONS

[Figure: WQ variable versus time. x-axis: Time in Years; y-axis: Water Variable]

[Figure: Bear Creek below Town of Wise STP. y-axis: PH, 6.5 to 9; x-axis: DATE, 1973/12/14 to 1993/12/14]

[Figure: Univariate WQ Variable. Axes: Time and Water Quality]

[Figure: Univariate Perspective, Real Data (pH below STP). Axes from 6.5 to 9]

The three most important pieces of information in a sample:

• Central Location
  – Mean, Median, Mode

• Dispersion
  – Range, Standard Deviation, Inter-Quartile Range

• Shape
  – Symmetry, skewness, kurtosis

Central Location: Sample Mean

• (Sum of all observations) / (sample size)
• Center of gravity of the distribution
• Depends on each observation
• Therefore sensitive to outliers
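As a minimal sketch of the definition above, the following Python computes a sample mean and checks the center-of-gravity property: the deviations from the mean sum to zero. The pH values and the name `sample_mean` are hypothetical, chosen only for illustration.

```python
def sample_mean(values):
    """(Sum of all observations) / (sample size)."""
    return sum(values) / len(values)


# Hypothetical pH readings, not taken from the monitoring data.
ph = [7.2, 7.4, 7.6, 7.8, 8.0]

mean_ph = sample_mean(ph)

# Center of gravity: deviations above and below the mean cancel out,
# up to floating-point rounding.
balance = sum(value - mean_ph for value in ph)

print(f"mean = {mean_ph:.2f}, sum of deviations = {balance:.1e}")
```

Because every observation enters the sum, changing any single value moves the mean.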


Central Location: Sample Median

• Center of the ordered array

• I.e., the (½)(n + 1)th observation in the ordered array.

If sample size n is odd, then the median is the middle value in the ordered array.

Example A: 1, 1, 0, 2, 3

Order: 0, 1, 1, 2, 3

n = 5, odd

(½)(n + 1) = 3

Median = 1

If sample size n is even, then the median is the average of the two middle values in the ordered array.

Example B: 1, 1, 0, 2, 3, 6

Order: 0, 1, 1, 2, 3, 6

n = 6, even

(½)(n + 1) = 3.5

Median = (1 + 2)/2 = 1.5
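The location rule in Examples A and B can be sketched in a few lines of Python. The function name `sample_median` is hypothetical; the two inputs are the samples from the examples.

```python
def sample_median(values):
    """Median via the (1/2)(n + 1) location in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) / 2          # 1-based position, may end in .5
    lower = ordered[int(location) - 1]
    if n % 2 == 1:
        # Odd n: the location is a whole number, so take that value.
        return lower
    # Even n: the location falls halfway between two ranks; average them.
    upper = ordered[int(location)]
    return (lower + upper) / 2


example_a = [1, 1, 0, 2, 3]
example_b = [1, 1, 0, 2, 3, 6]

print(sample_median(example_a))
print(sample_median(example_b))
```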

Central Location: Sample Median

• Center of the ordered array
• Depends on the magnitude of the central observations only
• Therefore NOT sensitive to outliers


Central Location: Mean vs. Median

• Mean is influenced by outliers
• Median is robust against (resistant to) outliers
• Mean “moves” toward outliers
• Median almost always represents the bulk of the observations

Comparison of mean and median tells us about outliers
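A small illustration of this comparison, using Python's standard `statistics` module. Both samples are hypothetical; the second replaces the largest value with an extreme reading to stand in for an outlier.

```python
from statistics import mean, median

# Hypothetical pH readings; the second list swaps the largest value
# for an extreme one to imitate an outlier.
clean = [7.2, 7.4, 7.6, 7.8, 8.0]
with_outlier = [7.2, 7.4, 7.6, 7.8, 12.0]

for label, sample in (("clean", clean), ("with outlier", with_outlier)):
    gap = mean(sample) - median(sample)
    print(f"{label:>12}: mean = {mean(sample):.2f}, "
          f"median = {median(sample):.2f}, mean - median = {gap:+.2f}")
```

The median is unchanged while the mean moves toward the extreme value, so a large gap between the two is a hint to look for outliers.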

Dispersion

• Range
• Standard Deviation
• Inter-Quartile Range


Dispersion: Range

• Maximum - Minimum

• Easy to calculate

• Easy to interpret

• Depends on sample size (biased)

• Therefore not good for statistical inference
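One way to see the dependence on sample size is to track the range as observations accumulate: it can only stay the same or grow. This is a sketch with hypothetical readings; `sample_range` is an illustrative name.

```python
def sample_range(values):
    """Maximum - Minimum."""
    return max(values) - min(values)


# Hypothetical pH readings in the order they were collected.
observations = [7.6, 7.4, 7.9, 7.5, 8.3, 7.1, 7.7, 6.8]

# Range of the first k observations, for k = 2 .. n.
running_range = [
    sample_range(observations[:k]) for k in range(2, len(observations) + 1)
]

for k, r in enumerate(running_range, start=2):
    print(f"n = {k}: range = {r:.1f}")
```

Since adding data never shrinks the range, two ranges computed from samples of different sizes are not directly comparable.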

Dispersion: Standard Deviation

$$ SD = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} $$

[Figure: number lines showing deviations of ±1 and ±2 about the center, each panel labeled with its SD]
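A minimal sketch of the sample standard deviation, assuming the usual n − 1 divisor, checked against `statistics.stdev` from the standard library. The two small samples are hypothetical and were picked so that the SDs come out to 1 and 2.

```python
import math
import statistics


def sample_sd(values):
    """Square root of the summed squared deviations divided by (n - 1)."""
    n = len(values)
    y_bar = sum(values) / n
    squared_deviations = sum((y - y_bar) ** 2 for y in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical samples with the same mean but different spread.
tight = [4, 5, 6]
wide = [3, 5, 7]

print(sample_sd(tight), statistics.stdev(tight))
print(sample_sd(wide), statistics.stdev(wide))
```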

Dispersion: Properties of SD

• SD ≥ 0 for all data

• SD = 0 if and only if all observations are the same (no variation)

• For a normal distribution,
  – 68% expected within 1 SD,
  – 95% expected within 2 SD,
  – 99.7% expected within 3 SD

• For any distribution, at least 8/9 (about 89%) of observations lie within 3 SD (Chebyshev's inequality)
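The normal-distribution percentages can be checked from the error function: the probability of falling within k SD of the mean is erf(k/√2), roughly 68%, 95%, and 99.7% for k = 1, 2, 3. For comparison, the sketch also prints the distribution-free lower bound 1 − 1/k² from Chebyshev's inequality. The function names are illustrative.

```python
import math


def normal_coverage(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k):
    """Minimum fraction within k SD of the mean, for any distribution."""
    return 1 - 1 / k**2


for k in (1, 2, 3):
    print(f"within {k} SD: normal = {normal_coverage(k):.4f}, "
          f"any distribution >= {chebyshev_lower_bound(k):.4f}")
```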

Interpretation of SD

[Figure: distribution plot, axis from 6.5 to 9]

n = 200

SD = 0.41

Median = 7.6

Mean = 7.6

Quantiles, Five Number Summary, Boxplot

Name      Quartile        Percentile          Quantile
Maximum   4th quartile    100th percentile    1.00 quantile
          3rd quartile    75th percentile     0.75 quantile
Median    2nd quartile    50th percentile     0.50 quantile
          1st quartile    25th percentile     0.25 quantile
Minimum   0th quartile    0th percentile      0.00 quantile

Quantile Location and Quantiles

Example: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Value   Rank
5.1     10
3.9     9
3.8     8
3.8     7
2.3     6
2.2     5
0       4
0       3
−0.4    2
−3.1    1

Quantile Rank   Quantile Location                  Quartile
0.75 = 3/4      $0.75(n + 1) = 8.25 \approx 8.3$   $Q_3 = \frac{3.8 + 3.9}{2} = 3.85$
0.50 = 2/4      $0.50(n + 1) = 5.5$                $Q_2 = \frac{2.2 + 2.3}{2} = 2.25$
0.25 = 1/4      $0.25(n + 1) = 2.75 \approx 2.8$   $Q_1 = \frac{-0.4 + 0}{2} = -0.2$

Minimum = −3.1

Maximum = 5.1
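The worked numbers suggest the following rule: compute the location q(n + 1), and when it falls between two ranks, average the two neighboring ordered values. That reading is inferred from the example rather than stated on the slide, and many software packages interpolate differently, so treat this as a sketch. The name `quantile_by_location` is hypothetical.

```python
import math


def quantile_by_location(values, q):
    """Quantile at location q * (n + 1), averaging the two neighboring ranks."""
    ordered = sorted(values)
    n = len(ordered)
    location = q * (n + 1)                       # 1-based, may be fractional
    # Clamp to valid ranks so extreme q values do not run off the array.
    lower_rank = min(max(math.floor(location), 1), n)
    upper_rank = min(max(math.ceil(location), 1), n)
    return (ordered[lower_rank - 1] + ordered[upper_rank - 1]) / 2


data = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]

q1 = quantile_by_location(data, 0.25)
q2 = quantile_by_location(data, 0.50)
q3 = quantile_by_location(data, 0.75)

print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")
```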

5-Number Summary and Boxplot

Min     Q1      Q2     Q3     Max
−3.10   −0.20   2.25   3.85   5.10

$\text{Median} = Q_2 = 2.25$

$\text{Range} = \text{Max} - \text{Min} = 5.10 - (-3.10) = 8.20$

$\text{IQR} = Q_3 - Q_1 = 3.85 - (-0.20) = 4.05$

Dispersion: IQR (Inter-Quartile Range)

• (3rd Quartile) − (1st Quartile)

• Robust against outliers
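To illustrate the robustness claim, the sketch below replaces the largest value of a ten-point sample with an extreme one and compares the IQR and the SD before and after. It uses `statistics.quantiles`, whose default interpolation rule can give slightly different quartiles than a hand calculation; the contaminated value 51.0 is hypothetical.

```python
from statistics import quantiles, stdev


def iqr(values):
    """3rd quartile minus 1st quartile (library default interpolation)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


data = [-3.1, -0.4, 0, 0, 2.2, 2.3, 3.8, 3.8, 3.9, 5.1]
# Same sample with the maximum replaced by a hypothetical extreme value.
contaminated = data[:-1] + [51.0]

print(f"IQR: {iqr(data):.3f} -> {iqr(contaminated):.3f}")
print(f"SD:  {stdev(data):.3f} -> {stdev(contaminated):.3f}")
```

The quartiles depend only on values near the 25% and 75% positions, so the IQR does not move, while the SD is inflated several times over.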

Interpretation of IQR

[Figure: distribution plot, axis from 6.5 to 9]

n = 200

SD = 0.41

Median = 7.6

Mean = 7.6

IQR = 0.54

For a Normal distribution, Median ± 2 IQR includes 99.3%
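The 99.3% figure can be checked with `statistics.NormalDist` from the Python standard library. For a standard normal distribution the IQR is about 1.349 SD, so an interval of two IQRs on either side of the median spans about ±2.70 SD. This is a sketch of that arithmetic, not part of the original slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# IQR of the standard normal, in SD units.
iqr_in_sd = z.inv_cdf(0.75) - z.inv_cdf(0.25)

# Probability of falling within median +/- 2 IQR.
half_width = 2 * iqr_in_sd
coverage = z.cdf(half_width) - z.cdf(-half_width)

print(f"IQR = {iqr_in_sd:.3f} SD, coverage = {coverage:.3%}")
```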

Shape: Symmetry and Skewness

• “Symmetry” means bilateral symmetry, skewness = 0
  – Mean = Median (approximately)

• Positive Skewness (asymmetric “tail” in positive direction)
  – Mean > Median

• Negative Skewness (asymmetric “tail” in negative direction)
  – Mean < Median

Comparison of mean and median tells us about shape
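A short sketch of the mean-versus-median check for shape, alongside a moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). The coefficient is one common definition, chosen here as an assumption; the data are hypothetical, with the left-skewed sample built as the mirror image of the right-skewed one.

```python
from statistics import mean, median


def moment_skewness(values):
    """Third central moment divided by the second central moment ** 1.5."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2**1.5


# Hypothetical sample with a long tail in the positive direction.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image: the tail now points in the negative direction.
left_skewed = [-v for v in right_skewed]

for label, sample in (("right", right_skewed), ("left", left_skewed)):
    print(f"{label:>5}: mean = {mean(sample):.2f}, "
          f"median = {median(sample):.2f}, "
          f"skewness = {moment_skewness(sample):+.2f}")
```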

[Figure: Bear Creek below Town of Wise STP. Axes from 6.5 to 9]

[Figure: Outlier Box Plot, axis from 6.5 to 9. Labeled parts: Outliers, Whiskers, Median, 75th %-tile = 3rd Quartile, 25th %-tile = 1st Quartile, IQR]
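The slide names the parts of the box plot but does not state the cutoff that separates whiskers from outliers. The sketch below assumes the common Tukey convention, in which points more than 1.5 × IQR beyond the quartiles are flagged as outliers; that multiplier, the function name `boxplot_parts`, and the pH values are all assumptions made for illustration.

```python
from statistics import quantiles


def boxplot_parts(values, k=1.5):
    """Quartiles, whisker ends, and outliers under a k * IQR fence rule."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        # Whiskers extend to the most extreme points inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical pH readings with one unusually high value.
ph = [7.2, 7.4, 7.5, 7.6, 7.6, 7.7, 7.8, 8.0, 9.0]
parts = boxplot_parts(ph)

for name, value in parts.items():
    print(f"{name}: {value}")
```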

[Figure: Wise, VA, below STP. Panels: pH (axis 6.5 to 9) and TKN in mg/l (axis 0 to 13)]

[Figure: Wise, VA below STP. Panels: DO in % saturation (axis 10 to 130) and BOD in mg/l (axis 0 to 25)]

[Figure: Wise, VA below STP. Panels: Tot Phosphorus in mg/l (axis 0 to 5) and Fecal Coliforms (axis 0 to 60000)]