THE IMPORTANCE OF VALIDATING SOURCE IDENTIFICATION RESULTS AND EVALUATING ALTERNATIVE HYPOTHESES IN A FORENSIC EVALUATION
Michael Bock ([email protected]) and
Lauren Brown (Ramboll, Portland Maine, USA)
OVERVIEW
• The use of statistical methods for source identification has gained wide acceptance
• Because of the complexity of the methods, their validity and accuracy are often not questioned
• Model testing and validation are often skipped or incomplete
• Formal hypotheses are either unstated or not developed
• When these steps are not rigorously followed and documented, incorrect conclusions are likely
TYPICAL WORKFLOW
Historical Investigation
Data Compilation
Descriptive Statistics
Multivariate Methods (HCA/PCA)
Number of End Members
Receptor Modeling
Validation
Interpretation
Conclusions
• Inspired by a dioxin site
• Setting changed to a different harbor complex
• Chemicals of concern changed to PCBs
• Simulated PCB profiles based on Frame Aroclor data plus random variation
Any resemblance to a real site is purely coincidental
SIMULATED CASE STUDY
SIMULATED PCB EXAMPLE ADAPTED FROM DIOXIN RESULTS FROM A DIFFERENT SITE
TYPICAL SIMPLE PCB PCA ANALYSIS
[Figure: PCA score plot, PC1 versus PC2, with reference labels 1248, 1254, and 1260]
Asked to review previous assessment
• PCA analysis of normalized data
• Consistent with three sources: 1248, 1254, and 1260
• No formal assessment of end members
• No formal assessment of individual profiles
• Limited historical investigation; the background information provided by regulators was assumed to be correct
SAMPLES ASSIGNED TO GROUPS BASED ON HCA
[Figure: PCA score plot, PC1 versus PC2, showing samples SID-1 through SID-45 grouped into HCA Clusters 1 to 4, with reference labels 1248, 1254, and 1260]
• Three known PCB source sites: (1) 1248, (2) 1254, and (3) 1260
• Impacts from the 1260 source have migrated outside of the harbour
• Additional investigation outside of the harbour is recommended to delineate the 1260 source
• Validated the calculations of the other expert, but is this an accurate assessment of sources?
INITIAL CONCLUSIONS
CLUSTER VARIABILITY
[Figure: mean by variable (PCB congener), one panel per cluster for Clusters 1 to 4; y-axis range 0.000 to 0.100]
• Variability in Cluster 1 is higher than Clusters 2, 3, and 4
• Congeners 180, 187, 199, and 203 show more variability than most of the others
• Initial suggestion that the conceptual model and the statistical model may both be wrong
• Use PCA to reduce the dimensionality of the dataset
• Retain the first 2 to 4 principal components
• Recalculate the original values from the retained components
• Test goodness of fit (GOF)
• Confirm number of end members and confirm fit
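The reduce, recalculate, and test steps above can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the data are synthetic, the function names `pca_reconstruct` and `fit_by_analyte` are hypothetical, and the per-analyte coefficient of determination is only one possible goodness-of-fit measure (the slides do not say which one was used).

```python
import numpy as np

def pca_reconstruct(X, k):
    """Rebuild X (samples x analytes) from its first k principal components."""
    mean = X.mean(axis=0)
    # SVD of the column-centered data; rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :k] * s[:k]) @ Vt[:k]

def fit_by_analyte(X, X_hat):
    """Coefficient of determination for each analyte (column)."""
    ss_res = ((X - X_hat) ** 2).sum(axis=0)
    ss_tot = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): 40 samples mixed from
# 3 made-up end members over 8 analytes, plus small random noise.
rng = np.random.default_rng(0)
profiles = rng.dirichlet(np.ones(8), size=3)
weights = rng.dirichlet(np.ones(3), size=40)
X = weights @ profiles + rng.normal(0.0, 0.002, size=(40, 8))

for k in (1, 2, 3, 4):
    r2 = fit_by_analyte(X, pca_reconstruct(X, k))
    print(k, "components: worst-fit analyte R2 =", round(float(r2.min()), 3))
```

Analytes whose fit stays poor as components are added are the ones to inspect before settling on a number of end members.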
DETERMINATION OF NUMBER OF END MEMBERS
TWO END MEMBERS
[Figure: original versus reduced values, one panel per congener (141 through 208), for the 2 end member model]
THREE END MEMBERS
[Figure: original versus reduced values, one panel per congener (141 through 208), for the 3 end member model]
FOUR END MEMBERS
• Begin with PCA/HCA
• Variability within clusters
• Frequency of detection
• Number of end members for receptor modeling
• Model versus measured fit by analyte
• ‘Remix’ the sample and calculate goodness of fit
• Model versus measured by analyte
• Model versus measured by sample (RMSE and by sample plots)
• Try to find flaws in the statistical model
• Revise/update model
• Retest
UNMIXING MODELING AND VALIDATION
COMPARISON OF MODELED AND MEASURED PROFILES
“Remixing” the samples
• Use modeled source profiles and source contributions to recalculate profiles
• Compare measured and modeled profiles
• Calculated the mean squared error for the 3 end member model
• Poor fit for the samples collected outside of the harbour
• Laboratory and data validation reports obtained and reviewed
• New historical records search initiated
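The "remixing" check can be sketched as follows. The profiles, contributions, and measured values are made-up numbers for illustration only, the function names are hypothetical, and the screening rule (RMSE above three times the median) is an arbitrary assumption rather than a threshold taken from the presentation.

```python
import numpy as np

def remix(contributions, end_members):
    """Modeled profiles = contributions (samples x k) @ end members (k x analytes)."""
    return np.asarray(contributions) @ np.asarray(end_members)

def rmse_by_sample(measured, modeled):
    """Root mean squared error across analytes, one value per sample."""
    return np.sqrt(((np.asarray(measured) - modeled) ** 2).mean(axis=1))

def flag_poor_fit(rmse, factor=3.0):
    """Flag samples whose RMSE exceeds factor x the median RMSE (arbitrary rule)."""
    return rmse > factor * np.median(rmse)

# Made-up example: 2 end members, 3 analytes, 4 samples.
end_members = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]]
contributions = [[1.0, 0.0],
                 [0.5, 0.5],
                 [0.0, 1.0],
                 [0.2, 0.8]]
modeled = remix(contributions, end_members)

# Measured profiles: the first three sit close to the model,
# the fourth has a profile the two end members cannot reproduce.
measured = modeled + np.array([[0.01, -0.01, 0.00],
                               [0.00, 0.01, -0.01],
                               [-0.01, 0.00, 0.01],
                               [0.00, 0.00, 0.00]])
measured[3] = [0.1, 0.8, 0.1]

rmse = rmse_by_sample(measured, modeled)
flags = flag_poor_fit(rmse)
print(rmse.round(4), flags)
```

A flagged sample does not say what is wrong. As in the case study, it prompts a review of the laboratory data and the site history.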
GOODNESS OF FIT (3 END MEMBERS)
Samples with poor fit
• Compare the modeled end member profiles to each other
• Compare the modeled end member profiles to a source library
• Source profiles from the literature
• Source profiles based on upland data
• Interpretation of end members and source profiles
• Are the modeled end members represented in the source library?
• Can the unusual end members be simply explained?
• Revise and update the model if flaws are apparent
• Source attribution
• End members ≠ sources
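One way to compare a modeled end member against a source library is a similarity score for each library profile. The sketch below uses cosine similarity, which is an assumption (the slides do not name a metric). The library names and values are invented for illustration, and `rank_library_matches` is a hypothetical helper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1.0 = identical shape)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_library_matches(end_member, library):
    """Return (name, similarity) pairs, best match first."""
    scores = {name: cosine_similarity(end_member, profile)
              for name, profile in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented 4-analyte library (assumption, not real source data).
library = {
    "source_A": [0.60, 0.25, 0.10, 0.05],
    "source_B": [0.10, 0.40, 0.40, 0.10],
    "source_C": [0.05, 0.10, 0.25, 0.60],
}
end_member = [0.08, 0.12, 0.30, 0.50]

ranked = rank_library_matches(end_member, library)
for name, score in ranked:
    print(name, round(score, 3))
```

A high score shows only that the end member resembles that source type. It cannot separate two sites that released the same material, so the ranking is a starting point for attribution.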
SOURCE ATTRIBUTION VALIDATION
• Formal hypothesis
• H0: Site A is not a significant contributor to an end member
• HA: Site A cannot be ruled out as a significant contributor to an end member
• Receptor modeling cannot differentiate between similar source types, for example:
• The same PCB Aroclor could have been used at multiple sites
• Multiple waste incinerators could have released dioxins
• Nearby herbicide manufacturers could have used the same dioxin production process or ingredients
• The first site you find that matches the end member is often not the only potential source
• Comprehensive historical record search is required
• Chemical processes and quantities
• Waste handling and discharge pathways
• The lack of records regarding usage and discharges is a data gap, not exculpatory evidence
HYPOTHESIS TESTING
ADDITIONAL SOURCE INVESTIGATION
NEW PCB SOURCES LOCATED: 1248 AND 1262 WERE USED AT TWO NEW SITES
[Figure labels: known sources 1248, 1254, and 1260; previously unknown source: 1248; previously unknown source: 1262]
• Formal validation of the statistical model must be complete
• Developing the model should be an iterative process: refine -> test -> refine
• Hypotheses should be formally stated and tested
• A complete background investigation is needed
• Weight of evidence evaluation can be used to evaluate multiple lines of evidence
• Uncertainties and data gaps should be formally acknowledged
• Unforced errors can be avoided with careful analysis
CONCLUSIONS