Application of Statistical Techniques to Interpretation
of Water Monitoring Data
Eric Smith, Golde Holtzman,
and Carl Zipper
Outline
I. Water quality data: program design (CEZ, 15 min)
II. Characteristics of water-quality data (CEZ, 15 min)
III. Describing water quality (GIH, 30 min)
IV. Data analysis for making decisions
A. Compliance with numerical standards (EPS, 45 min)
Dinner Break
B. Locational / temporal comparisons (“cause and effect”) (EPS, 45 min)
C. Detection of water-quality trends (GIH, 60 min)
III. Describing water quality (GIH, 30 min)
• Rivers and streams are an essential component of the biosphere
• Rivers are alive
• Life is characterized by variation
• Statistics is the science of variation
• Statistical thinking / statistical perspective
 – Thinking in terms of variation
 – Thinking in terms of distribution
The present problem is multivariate
• WATER QUALITY as a function of TIME, under the influence of co-variates like FLOW, at multiple LOCATIONS
[Figure: WQ variable versus time (x: time in years; y: water variable)]
[Figure: pH, Bear Creek below Town of Wise STP, 1973/12/14 to 1993/12/14 (x: date; y: pH, 6.5 to 9)]
[Figure sequence: univariate WQ variable, water quality versus time, then water quality alone]
Univariate Perspective, Real Data (pH below STP)
[Figure: pH below STP on a single pH axis, 6.5 to 9]
The three most important pieces of information in a sample:
• Central Location – Mean, Median, Mode
• Dispersion – Range, Standard Deviation, Inter-quartile Range
• Shape – Symmetry, Skewness, Kurtosis
Central Location: Sample Mean
• (Sum of all observations) / (sample size)
• Center of gravity of the distribution
• Depends on each observation
• Therefore sensitive to outliers
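A minimal Python sketch of the mean as the "center of gravity": the deviations from the mean balance out, summing to zero apart from floating-point rounding. The function name `sample_mean` is illustrative, and the data are the small sample 1, 1, 0, 2, 3 used in Example A on the median slide.

```python
def sample_mean(values):
    """(Sum of all observations) / (sample size)."""
    return sum(values) / len(values)

data = [1, 1, 0, 2, 3]           # Example A from the median slide
m = sample_mean(data)            # 7 / 5 = 1.4

# Center of gravity: positive and negative deviations from the mean cancel,
# so their sum is zero up to floating-point rounding.
deviations = [y - m for y in data]
print(m, sum(deviations))
```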
Central Location: Sample Median
• Center of the ordered array
• I.e., the (½)(n + 1)th observation in the ordered array.
If sample size n is odd, then the
median is the middle value in the
ordered array.
Example A:
1, 1, 0, 2, 3
Order:
0, 1, 1, 2, 3
n = 5, odd
(½)(n + 1) = 3
Median = 1
If sample size n is even, then the
median is the average of the two
middle values in the ordered array.
Example B:
1, 1, 0, 2, 3, 6
Order:
0, 1, 1, 2, 3, 6
n = 6, even
(½)(n + 1) = 3.5
Median = (1 + 2)/2 = 1.5
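The (½)(n + 1) rule can be sketched in Python as follows; `sample_median` is a hypothetical helper name. An integer location picks the middle value directly, and a half-integer location averages the two neighbouring ordered values, reproducing Examples A and B.

```python
import math

def sample_median(values):
    """Median via the (1/2)(n + 1) location in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    loc = (n + 1) / 2                      # 1-based position in the ordered array
    lower = ordered[math.floor(loc) - 1]
    upper = ordered[math.ceil(loc) - 1]
    # n odd: integer location, lower == upper (the middle value).
    # n even: half-integer location, average of the two middle values.
    return (lower + upper) / 2

print(sample_median([1, 1, 0, 2, 3]))      # Example A
print(sample_median([1, 1, 0, 2, 3, 6]))   # Example B
```

Python's `statistics.median` follows the same convention and returns the same two values.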
Central Location: Sample Median
• Center of the ordered array
• Depends on the magnitude of the central observations only
• Therefore NOT sensitive to outliers
Central Location: Mean vs. Median
• Mean is influenced by outliers
• Median is robust against (resistant to) outliers
• Mean “moves” toward outliers
• Median almost always represents the bulk of the observations
Comparison of mean and median tells us about outliers
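A small Python illustration of the comparison, using hypothetical pH readings (not the Bear Creek data): replacing the largest reading with an outlier pulls the mean upward while the median stays put.

```python
from statistics import mean, median

# Hypothetical pH readings, for illustration only.
readings = [7.4, 7.5, 7.6, 7.6, 7.7, 7.8]
with_outlier = readings[:-1] + [11.0]   # replace the largest value with an outlier

for label, ys in [("clean", readings), ("with outlier", with_outlier)]:
    # The mean moves toward the outlier; the median reflects the bulk of the data.
    print(label, round(mean(ys), 2), round(median(ys), 2))
```

A large gap between mean and median is therefore a quick signal to look for outliers.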
Dispersion
• Range
• Standard Deviation
• Inter-quartile Range
Dispersion: Range
• Maximum - Minimum
• Easy to calculate
• Easy to interpret
• Depends on sample size: tends to grow as n increases (biased)
• Therefore not good for statistical inference
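A rough Python demonstration of why the range depends on sample size. It assumes a hypothetical normal population with mean 7.6 and SD 0.41 (numbers borrowed from the pH summary in this section purely for illustration) and uses nested samples, so each larger sample contains the smaller ones and its range can only stay the same or grow.

```python
import random

rng = random.Random(2024)   # fixed seed so the run is reproducible
# Hypothetical population: normal(7.6, 0.41), for illustration only.
draws = [rng.gauss(7.6, 0.41) for _ in range(1000)]

def sample_range(values):
    """Maximum - Minimum."""
    return max(values) - min(values)

# Nested samples: the first 5, first 50, and all 1000 draws.
ranges = {n: sample_range(draws[:n]) for n in (5, 50, 1000)}
for n, r in ranges.items():
    print(n, round(r, 2))
```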
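Dispersion: Standard Deviation
$SD = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$
[Figure: two example samples on a 0-to-5 axis, one with deviations of −1 and +1 from the mean, the other with deviations of −2 and +2; doubling the deviations doubles the SD]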
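A direct Python transcription of the SD formula, assuming the usual n − 1 denominator for a sample standard deviation (the same one used by `statistics.stdev`). The data are the ten values from the quartile example later in this section; `sample_sd` is an illustrative name.

```python
import math
import statistics

def sample_sd(values):
    """SD = sqrt( sum (Y_i - Ybar)^2 / (n - 1) )."""
    n = len(values)
    ybar = sum(values) / n
    ss = sum((y - ybar) ** 2 for y in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1))

data = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]  # quartile example data
print(round(sample_sd(data), 3))
print(round(statistics.stdev(data), 3))   # same n - 1 convention
```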
Dispersion: Properties of SD
• SD ≥ 0 for all data
• SD = 0 if and only if all observations are the same (no variation)
• For a normal distribution,
 – 68% expected within 1 SD,
 – 95% expected within 2 SD,
 – 99.7% expected within 3 SD
• For any distribution, at least 8/9 (about 89%) of observations lie within 3 SD (Chebyshev's inequality)
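The normal-distribution percentages and the distribution-free bound can both be checked with the Python standard library. For a normal distribution the fraction within k SD of the mean is erf(k / √2); Chebyshev's inequality guarantees at least 1 − 1/k² for any distribution.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within k SD of the mean: P(|Z| < k)."""
    return math.erf(k / math.sqrt(2))

def chebyshev_bound(k):
    """Lower bound for ANY distribution: at least 1 - 1/k^2 within k SD (trivial for k <= 1)."""
    return 1 - 1 / k**2

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4), round(chebyshev_bound(k), 4))
```

The gap between the two columns shows how much the normal percentages lean on the normality assumption.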
Interpretation of SD
[Figure: pH distribution, axis 6.5 to 9]
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6
Quantiles, Five Number Summary, Boxplot
Maximum = 4th quartile = 100th percentile = 1.00 quantile
3rd quartile = 75th percentile = 0.75 quantile
Median = 2nd quartile = 50th percentile = 0.50 quantile
1st quartile = 25th percentile = 0.25 quantile
Minimum = 0th quartile = 0th percentile = 0.00 quantile
Quantile Location and Quantiles
Quantile rank p | Quantile location p(n + 1) | Quartile
0.75 = 3/4 | 0.75(n + 1) | Q3
0.50 = 2/4 | 0.50(n + 1) | Q2 (median)
0.25 = 1/4 | 0.25(n + 1) | Q1
Example: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3; n = 10
Value Rank
5.1 10
3.9 9
3.8 8
3.8 7
2.3 6
2.2 5
0 4
0 3
−0.4 2
−3.1 1
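Q3: location 0.75(n + 1) = 8.25, so Q3 = (3.8 + 3.9)/2 = 3.85
Q2: location 0.50(n + 1) = 5.5, so Q2 = (2.2 + 2.3)/2 = 2.25
Q1: location 0.25(n + 1) = 2.75, so Q1 = (−0.4 + 0)/2 = −0.20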
Minimum = −3.1
Maximum = 5.1
5-Number Summary and Boxplot
Min Q1 Q2 Q3 Max
−3.10 −0.20 2.25 3.85 5.10
Median = Q2 = 2.25
Range = Max − Min = 5.10 − (−3.10) = 8.20
IQR = Q3 − Q1 = 3.85 − (−0.20) = 4.05
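The worked example can be reproduced with a short Python sketch. It follows the rule used above: locate the quantile at p(n + 1) in the ordered array and, when the location falls between two ranks, average the two neighbouring values. Helper names are hypothetical. Note that Python's `statistics.quantiles(data, n=4, method="exclusive")` uses the same p(n + 1) location but interpolates linearly, giving Q1 = −0.1 and Q3 = 3.825 for these data, so quartile conventions differ between sources.

```python
import math

def quantile_by_location(values, p):
    """Quantile at location p(n + 1); average the two neighbouring ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    loc = p * (n + 1)
    lo = min(max(math.floor(loc), 1), n)   # clamp to valid ranks 1..n
    hi = min(max(math.ceil(loc), 1), n)
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

def five_number_summary(values):
    """(Min, Q1, Q2, Q3, Max)."""
    return (min(values),
            quantile_by_location(values, 0.25),
            quantile_by_location(values, 0.50),
            quantile_by_location(values, 0.75),
            max(values))

data = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
mn, q1, q2, q3, mx = five_number_summary(data)
print([round(v, 2) for v in (mn, q1, q2, q3, mx)])
print("range", round(mx - mn, 2), "IQR", round(q3 - q1, 2))
```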
Dispersion: IQR (Inter-Quartile Range)
• (3rd Quartile) − (1st Quartile)
• Robust against outliers
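A quick Python comparison of the IQR and the SD when one reading becomes an outlier. The pH values are hypothetical, and the quartiles come from `statistics.quantiles` with the exclusive method (p(n + 1) location with linear interpolation), which differs slightly from the averaging rule in the worked example.

```python
import statistics

def iqr(values):
    """Q3 - Q1, using statistics.quantiles (exclusive method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

# Hypothetical pH readings, then the same readings with the maximum replaced by an outlier.
clean = [7.2, 7.4, 7.5, 7.6, 7.6, 7.7, 7.9, 8.0, 8.1]
dirty = clean[:-1] + [12.0]

print("SD ", round(statistics.stdev(clean), 2), round(statistics.stdev(dirty), 2))
print("IQR", round(iqr(clean), 2), round(iqr(dirty), 2))
```

The SD grows several-fold, while the IQR does not change at all, because the quartiles never touch the extreme value.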
Interpretation of IQR
[Figure: pH distribution, axis 6.5 to 9]
n = 200, SD = 0.41, Median = 7.6, Mean = 7.6, IQR = 0.54
For a Normal distribution, Median ± 2 IQR includes 99.3% of observations
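The 99.3% figure can be checked with `statistics.NormalDist`: a normal distribution's IQR is about 1.349 SD, so Median ± 2 IQR spans about ±2.698 SD. The second part simply applies the rule to the pH summary values above, assuming the pH data are close enough to normal for the rule to be meaningful.

```python
from statistics import NormalDist

z = NormalDist()                                  # standard normal: mean = median = 0
iqr_z = z.inv_cdf(0.75) - z.inv_cdf(0.25)         # IQR in SD units, about 1.349
coverage = z.cdf(2 * iqr_z) - z.cdf(-2 * iqr_z)   # P(|Z| < 2 IQR)
print(round(iqr_z, 3), round(coverage, 3))

# Applied to the pH summary (Median = 7.6, IQR = 0.54):
median_ph, iqr_ph = 7.6, 0.54
print(round(median_ph - 2 * iqr_ph, 2), round(median_ph + 2 * iqr_ph, 2))
```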
Shape: Symmetry and Skewness
• “Symmetry” means bilateral symmetry; skewness = 0
 – Mean = Median (approximately)
• Positive skewness (asymmetric “tail” in positive direction)
 – Mean > Median
• Negative skewness (asymmetric “tail” in negative direction)
 – Mean < Median
Comparison of mean and median tells us about shape
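A Python sketch relating the mean/median comparison to a numerical skewness measure. It assumes the moment coefficient g1 = m3 / m2^1.5, one common definition among several, and uses hypothetical samples.

```python
from statistics import mean, median

def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (one common definition)."""
    n = len(values)
    ybar = sum(values) / n
    m2 = sum((y - ybar) ** 2 for y in values) / n
    m3 = sum((y - ybar) ** 3 for y in values) / n
    return m3 / m2 ** 1.5

# Hypothetical samples: one with a long tail toward large values, one symmetric.
right_tail = [1, 1, 2, 2, 3, 10]
symmetric = [1, 2, 3, 4, 5]
for label, ys in [("right tail", right_tail), ("symmetric", symmetric)]:
    print(label, round(mean(ys), 2), median(ys), round(sample_skewness(ys), 2))
```

For the right-tailed sample, mean > median and g1 > 0; for the symmetric sample, mean = median and g1 = 0.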
[Figure: pH distribution, Bear Creek below Town of Wise STP; pH axis 6.5 to 9]
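Outlier Box Plot
[Diagram: box spanning the IQR from the 25th percentile (1st quartile) to the 75th percentile (3rd quartile), a line at the median, whiskers above and below the box, and outliers plotted individually]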
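The slide does not say how outliers are defined. A common convention, assumed here, is Tukey's: points beyond Q1 − 1.5·IQR or Q3 + 1.5·IQR are plotted individually, and the whiskers end at the most extreme points inside those fences. The sketch below uses hypothetical pH readings, hypothetical helper names, and `statistics.quantiles` (exclusive method) for the quartiles.

```python
import statistics

def boxplot_fences(values, k=1.5):
    """Fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Observations outside the fences, drawn individually on the plot."""
    low, high = boxplot_fences(values, k)
    return [y for y in values if y < low or y > high]

def whisker_ends(values, k=1.5):
    """Whiskers reach the most extreme observations still inside the fences."""
    low, high = boxplot_fences(values, k)
    inside = [y for y in values if low <= y <= high]
    return min(inside), max(inside)

# Hypothetical pH readings with one suspicious high value.
ph = [7.2, 7.4, 7.5, 7.6, 7.6, 7.7, 7.9, 8.0, 12.0]
print(boxplot_fences(ph), flag_outliers(ph), whisker_ends(ph))
```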
[Figure: pH and TKN (mg/l), Wise, VA, below STP]
[Figure: DO (% saturation) and BOD (mg/l), Wise, VA below STP]
[Figure: total phosphorus (mg/l) and fecal coliforms, Wise, VA below STP]