THE IMPORTANCE OF VALIDATING SOURCE IDENTIFICATION RESULTS AND EVALUATING ALTERNATIVE HYPOTHESES IN A FORENSIC EVALUATION Michael Bock ([email protected]) and Lauren Brown (Ramboll, Portland Maine, USA)


OVERVIEW

• The use of statistical methods for source identification has gained wide acceptance

• Because of the complexity of the methods, their validity and accuracy are often not questioned

• Model testing and validation are often skipped or incomplete

• Formal hypotheses are either unstated or not developed

• When these steps are not rigorously followed and documented, incorrect conclusions are likely


TYPICAL WORKFLOW

Historical Investigation -> Data Compilation -> Descriptive Statistics -> Multivariate Methods (HCA/PCA) -> Number of End Members -> Receptor Modeling -> Validation -> Interpretation -> Conclusions

SIMULATED CASE STUDY

• Inspired by a dioxin site

• Setting changed to a different harbor complex

• Chemicals of concern changed to PCBs

• Simulated PCB profiles based on Frame Aroclor data plus random variation

Any resemblance to a real site is purely coincidental

SIMULATED PCB EXAMPLE ADAPTED FROM DIOXIN RESULTS FROM A DIFFERENT SITE

TYPICAL SIMPLE PCB PCA ANALYSIS

[Figure: PC1 versus PC2 scores plot; sample groupings labeled 1248, 1254, and 1260]

• Asked to review a previous assessment

• PCA of normalized data

• Consistent with three sources: 1248, 1254, and 1260

• No formal assessment of end members

• No formal assessment of individual profiles

• Limited historical investigation; the background information provided by regulators was assumed to be correct
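For readers unfamiliar with the step, here is a minimal Python sketch of a PCA of normalized PCB data. It assumes that "normalized" means each sample's congener concentrations are scaled to sum to 1 (the slide does not state the normalization; autoscaling columns is another common choice) and computes PCA with NumPy's SVD. The function names and the three-source, six-congener data are hypothetical and are not the case-study data.

```python
import numpy as np


def normalize_profiles(conc):
    """Scale each sample (row) so that its congener concentrations sum to 1."""
    conc = np.asarray(conc, dtype=float)
    return conc / conc.sum(axis=1, keepdims=True)


def pca_scores(profiles, n_components=2):
    """PCA of column-centered profiles via singular value decomposition.

    Returns sample scores, component loadings, and the fraction of total
    variance explained by each retained component.
    """
    centered = profiles - profiles.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates (PC1, PC2, ...)
    explained = s**2 / np.sum(s**2)                  # variance fraction per component
    return scores, vt[:n_components], explained[:n_components]


# Hypothetical example: 30 samples mixed from three made-up 6-congener sources.
rng = np.random.default_rng(0)
sources = np.array([[5.0, 4.0, 3.0, 1.0, 0.5, 0.2],
                    [1.0, 3.0, 5.0, 3.0, 1.0, 0.5],
                    [0.2, 0.5, 1.0, 3.0, 5.0, 4.0]])
weights = rng.dirichlet(np.ones(3), size=30)
conc = (weights @ sources) * rng.uniform(10, 100, size=(30, 1))  # varying total PCB
profiles = normalize_profiles(conc)
scores, loadings, explained = pca_scores(profiles, n_components=2)
print("variance explained by PC1, PC2:", explained.round(3))
```

Because every simulated sample is an exact mixture of three profiles, the normalized points lie on a plane and PC1 and PC2 carry essentially all of the variance. A tidy three-cornered scores plot like this is exactly what the deck cautions against over-reading: it shows structure, not the number or identity of sources.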

SAMPLES ASSIGNED TO GROUPS BASED ON HCA

[Figure: PC1 versus PC2 scores for samples SID-1 through SID-45, colored by HCA cluster (Clusters 1, 2, 3, and 4); groupings labeled 1248, 1254, and 1260]
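The grouping step can be sketched with a naive agglomerative (hierarchical) clustering. This is an illustration under assumptions: the slide does not say which linkage or distance was used, so average linkage on Euclidean distance is assumed here, and the function name and test points are hypothetical. For real work, SciPy's `scipy.cluster.hierarchy.linkage` (for example `method="average"` or `"ward"`) followed by `fcluster(..., criterion="maxclust")` does the same job efficiently.

```python
import numpy as np


def average_linkage_clusters(x, n_clusters):
    """Naive agglomerative clustering (average linkage, Euclidean distance).

    Repeatedly merges the two closest clusters until n_clusters remain and
    returns an integer label (1..n_clusters) for each row of x.
    """
    x = np.asarray(x, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(x), dtype=int)
    for label, members in enumerate(clusters, start=1):
        labels[members] = label
    return labels


# Hypothetical, well-separated 2-D points (e.g., PCA scores) in three groups of 5.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
pts = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])
labels = average_linkage_clusters(pts, n_clusters=3)
print(labels)
```

The number of clusters is an input, not an output: HCA will always return the number of groups requested, which is one reason the deck treats cluster assignments as a starting point for validation rather than a conclusion.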

INITIAL CONCLUSIONS

[Figure: source areas labeled 1248, 1254, and 1260]

• Three known PCB sources: (1) the 1248 site, (2) the 1254 site, and (3) the 1260 site

• Impacts from the 1260 source have migrated outside of the harbour

• Additional investigation outside of the harbour recommended to delineate the 1260 source

• The other expert's calculations were validated, but is this an accurate assessment of the sources?

CLUSTER VARIABILITY

[Figure: one panel per cluster (Clusters 1, 2, 3, and 4) showing the mean value of each PCB congener; x-axis: congener, y-axis: Mean (0.000 to 0.100)]

• Variability in Cluster 1 is higher than Clusters 2, 3, and 4

• Congeners 180, 187, 199, and 203 show more variability than most of the others

• An initial indication that both the conceptual model and the statistical model may be wrong
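One way to quantify the within-cluster variability discussed above is a per-congener coefficient of variation (CV = standard deviation / mean) within each cluster. This is a hedged sketch: the slide does not say which variability measure was plotted, so CV is an assumption, and the six-sample, three-congener data are hypothetical.

```python
import numpy as np


def cluster_variability(profiles, labels):
    """Per-cluster mean, standard deviation, and coefficient of variation (CV)
    of each congener (column) in a matrix of normalized profiles."""
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for label in np.unique(labels):
        members = profiles[labels == label]
        mean = members.mean(axis=0)
        sd = members.std(axis=0, ddof=1) if len(members) > 1 else np.zeros_like(mean)
        # CV is undefined where the mean is zero; report NaN there.
        with np.errstate(divide="ignore", invalid="ignore"):
            cv = np.where(mean > 0, sd / mean, np.nan)
        summary[label.item()] = {"mean": mean, "sd": sd, "cv": cv}
    return summary


# Hypothetical profiles: cluster 1 is loose, cluster 2 is tight.
profiles = np.array([
    [0.50, 0.30, 0.20],
    [0.40, 0.30, 0.30],
    [0.30, 0.30, 0.40],
    [0.20, 0.40, 0.40],
    [0.21, 0.39, 0.40],
    [0.19, 0.41, 0.40],
])
labels = np.array([1, 1, 1, 2, 2, 2])
summary = cluster_variability(profiles, labels)
for label, stats in summary.items():
    print(label, "mean CV:", np.nanmean(stats["cv"]).round(3),
          "most variable congener index:", int(np.nanargmax(stats["cv"])))
```

A cluster with a markedly higher CV, or a few congeners with much higher CV than the rest, suggests that the cluster may be a blend of more than one source type, which is the kind of signal reported for Cluster 1 above.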

DETERMINATION OF NUMBER OF END MEMBERS

• Use PCA to reduce the dimensionality of the dataset

• Retain the first 2 to 4 principal components

• Recalculate the original values from the retained components

• Test the goodness of fit (GOF) between the original and recalculated values

• Confirm the number of end members and confirm the fit
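The reduce, recalculate, and test steps above can be sketched as a truncated-SVD reconstruction scored with R² and RMSE. Two assumptions are made here, both labeled in the code: no mean-centering (so a mixture of k end members has rank k and k components map to k end members, as in cosine-theta approaches such as polytopic vector analysis), and R²/RMSE as the GOF statistics. The data are hypothetical.

```python
import numpy as np


def reconstruct(profiles, k):
    """Recalculate the data from the first k singular vectors.

    No mean-centering is applied (an assumption): an exact mixture of k
    end-member profiles then has rank k, so k retained components
    correspond to k candidate end members.
    """
    u, s, vt = np.linalg.svd(np.asarray(profiles, dtype=float), full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


def goodness_of_fit(original, reduced):
    """Return (R^2, RMSE) of the reduced data against the original data."""
    resid = original - reduced
    r2 = 1.0 - np.sum(resid**2) / np.sum((original - original.mean())**2)
    return r2, float(np.sqrt(np.mean(resid**2)))


# Hypothetical data: 40 samples mixed from 3 made-up end members plus noise.
rng = np.random.default_rng(0)
end_members = rng.dirichlet(np.ones(8), size=3)   # 3 profiles x 8 congeners
mix = rng.dirichlet(np.ones(3), size=40)          # mixing proportions per sample
profiles = mix @ end_members + rng.normal(scale=1e-3, size=(40, 8))

fits = {k: goodness_of_fit(profiles, reconstruct(profiles, k)) for k in (2, 3, 4)}
for k, (r2, rmse) in fits.items():
    print(f"{k} end members: R^2 = {r2:.4f}, RMSE = {rmse:.5f}")
```

In this simulated case the fit improves clearly from 2 to 3 components and then levels off, because the fourth component only fits noise. Real data rarely break this cleanly, so the overall GOF should be read together with the analyte-by-analyte comparison of original and reduced values.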

TWO END MEMBERS

[Figure: original versus reduced values by PCB congener (congeners 141 through 208 shown), 2 end members]

THREE END MEMBERS

[Figure: original versus reduced values by PCB congener (congeners 141 through 208 shown), 3 end members]

FOUR END MEMBERS

UNMIXING MODELING AND VALIDATION

• Begin with PCA/HCA

• Variability within clusters

• Frequency of detection

• Number of end members for receptor modeling

• Model versus measured fit by analyte

• ‘Remix’ the samples and calculate goodness of fit

• Model versus measured by analyte

• Model versus measured by sample (RMSE and by-sample plots)

• Try to find flaws in the statistical model

• Revise/update the model

• Retest
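The deck does not name its receptor model (positive matrix factorization and polytopic vector analysis are common choices in forensic work). As a stand-in only, the sketch below unmixes samples with non-negative matrix factorization (NMF) using Lee-Seung multiplicative updates, so that the data matrix is approximated by contributions times end-member profiles. The function name, iteration count, and data are hypothetical.

```python
import numpy as np


def unmix_nmf(x, k, n_iter=2000, seed=0, eps=1e-12):
    """Factor non-negative x (samples x congeners) as g @ f.

    g: contributions (samples x k); f: end-member profiles (k x congeners).
    Uses Lee-Seung multiplicative updates for the squared Frobenius error.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    g = rng.uniform(0.1, 1.0, size=(x.shape[0], k))
    f = rng.uniform(0.1, 1.0, size=(k, x.shape[1]))
    for _ in range(n_iter):
        f *= (g.T @ x) / (g.T @ g @ f + eps)  # update profiles, stay non-negative
        g *= (x @ f.T) / (g @ f @ f.T + eps)  # update contributions
    # Rescale so each end-member profile sums to 1; contributions absorb the scale.
    scale = f.sum(axis=1, keepdims=True)
    return g * scale.T, f / scale


# Hypothetical data: 25 samples built from 3 made-up end members (8 congeners).
rng = np.random.default_rng(42)
true_profiles = rng.dirichlet(np.ones(8), size=3)
true_contrib = rng.uniform(0, 50, size=(25, 3))
x = true_contrib @ true_profiles
g, f = unmix_nmf(x, k=3)
rel_err = np.linalg.norm(x - g @ f) / np.linalg.norm(x)
print("relative reconstruction error:", round(float(rel_err), 4))
```

A small reconstruction error does not mean the recovered profiles match the true sources: factorizations like this can have rotational ambiguity, which is why the validation steps listed above still apply after the model runs.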


COMPARISON OF MODELED AND MEASURED PROFILES

“Remixing” the samples

• Use modeled source profiles and source contributions to recalculate profiles

• Compare measured and modeled profiles
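"Remixing" is simply multiplying the modeled contributions by the modeled end-member profiles; the result can then be compared with the measured data analyte by analyte. A minimal sketch, assuming RMSE and Pearson correlation per analyte as the comparison statistics (the slide does not specify them); the two end members, five samples, and perturbation are hypothetical.

```python
import numpy as np


def remix(contributions, end_member_profiles):
    """Recalculate sample profiles from modeled contributions and end members."""
    return np.asarray(contributions, dtype=float) @ np.asarray(end_member_profiles, dtype=float)


def fit_by_analyte(measured, modeled):
    """Per-analyte (column) RMSE and Pearson correlation, modeled versus measured."""
    resid = measured - modeled
    rmse = np.sqrt(np.mean(resid**2, axis=0))
    r = np.array([np.corrcoef(measured[:, j], modeled[:, j])[0, 1]
                  for j in range(measured.shape[1])])
    return rmse, r


# Hypothetical modeled output: 2 end members, 4 analytes, 5 samples.
end_members = np.array([[0.4, 0.3, 0.2, 0.1],
                        [0.1, 0.2, 0.3, 0.4]])
contributions = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]])
modeled = remix(contributions, end_members)
measured = modeled.copy()
measured[:, 3] += np.array([0.05, -0.04, 0.06, -0.05, 0.04])  # analyte 4 fits poorly
rmse, r = fit_by_analyte(measured, modeled)
print("RMSE by analyte:", rmse.round(4))
print("r by analyte:", r.round(3))
```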

GOODNESS OF FIT (3 END MEMBERS)

• Calculated the mean squared error for the 3-end-member model

• Poor fit for the samples collected outside of the harbour

• Laboratory and data validation reports obtained and reviewed

• New historical records search initiated

[Figure: samples with poor fit]
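The by-sample check can be sketched as the RMSE (the square root of the mean squared error) of each remixed sample against its measured profile, followed by a screen for outliers. The cutoff used here, three times the median sample RMSE, is a hypothetical screening rule rather than anything stated in the slides, and the sample data are made up.

```python
import numpy as np


def rmse_by_sample(measured, modeled):
    """Root-mean-squared error of each sample (row) across all analytes."""
    resid = np.asarray(measured, dtype=float) - np.asarray(modeled, dtype=float)
    return np.sqrt(np.mean(resid**2, axis=1))


def flag_poor_fit(sample_ids, rmse, factor=3.0):
    """Flag samples whose RMSE exceeds `factor` times the median RMSE.

    Hypothetical screening rule, not a regulatory or published criterion.
    """
    cutoff = factor * np.median(rmse)
    return [sid for sid, err in zip(sample_ids, rmse) if err > cutoff]


# Hypothetical measured and modeled profiles for six samples.
ids = [f"SID-{i}" for i in range(1, 7)]
modeled = np.full((6, 4), 0.25)
measured = modeled.copy()
measured[:5, 0] += np.array([0.01, -0.01, 0.012, -0.008, 0.01])  # small misfits
measured[5] = [0.1, 0.2, 0.3, 0.4]  # hypothetical sample the model does not reproduce
errors = rmse_by_sample(measured, modeled)
print(dict(zip(ids, errors.round(4))))
print("poor fit:", flag_poor_fit(ids, errors))
```

Flagged samples are prompts for follow-up (data validation reports, sample location, historical records), as on this slide, rather than automatic exclusions.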

SOURCE ATTRIBUTION VALIDATION

• Compare the modeled end member profiles to each other

• Compare the modeled end member profiles to a source library

• Source profiles from the literature

• Source profiles based on upland data

• Interpretation of end members and source profiles

• Are the modeled end members represented in the source library?

• Can the unusual end members be simply explained?

• Revise and update the model if flaws are apparent

• Source attribution

• End members ≠ sources
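Comparing an end member with a source library is often done with a profile similarity measure. The sketch below assumes cosine similarity (cos θ) and ranks every library profile instead of stopping at the best match. The library names and values are hypothetical placeholders, not real Aroclor compositions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1.0 = identical shape)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_library_matches(end_member, library):
    """Return (name, cos theta) for every library profile, best match first."""
    scores = [(name, cosine_similarity(end_member, prof)) for name, prof in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical 4-congener library and modeled end member.
library = {
    "profile_A": [0.50, 0.30, 0.15, 0.05],
    "profile_B": [0.10, 0.40, 0.40, 0.10],
    "profile_C": [0.05, 0.15, 0.30, 0.50],
}
end_member = [0.45, 0.33, 0.17, 0.05]
ranked = rank_library_matches(end_member, library)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A high similarity only says that an end member resembles a source type; it does not identify which facility released it, which is the point of "End members ≠ sources".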

HYPOTHESIS TESTING

• Formal hypotheses

• H0: Site A is not a significant contributor to an end member

• HA: Site A cannot be ruled out as a significant contributor to an end member

• Receptor modeling cannot differentiate between similar source types, for example:

• The same PCB Aroclor could have been used at multiple sites

• Multiple waste incinerators could have released dioxins

• Nearby herbicide manufacturers could have used the same production process or ingredients, and so released similar dioxin profiles

• The first site you find that matches the end member is often not the only potential source

• A comprehensive historical records search is required

• Chemical processes and quantities

• Waste handling and discharge pathways

• The lack of records regarding usage and discharges is a data gap, not exculpatory evidence
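The warning that the first matching site is rarely the only candidate can be made operational by screening every site rather than stopping at the first hit. A minimal sketch, assuming a similarity score per library profile (for example from a cosine comparison) and a hypothetical record of which materials each site is known to have used; the 0.95 threshold, site names, and profile names are all illustrative. Sites with no records are kept, because missing records are a data gap rather than evidence for H0.

```python
def sites_not_ruled_out(match_scores, site_usage, threshold=0.95):
    """Return every site that cannot be ruled out as a contributor (HA).

    match_scores: {library profile name: similarity to the end member}
    site_usage:   {site name: set of profile names the site is known to have
                   used, or None when no usage/discharge records were found}
    """
    matching = {name for name, score in match_scores.items() if score >= threshold}
    retained = []
    for site, used in site_usage.items():
        if used is None:
            # Missing records are a data gap, not evidence that the site is clean.
            retained.append((site, "records missing"))
        elif used & matching:
            retained.append((site, "used matching material"))
    return retained


# Hypothetical inputs.
scores = {"profile_A": 0.99, "profile_B": 0.73}
usage = {
    "Site 1": {"profile_A"},
    "Site 2": {"profile_B"},
    "Site 3": {"profile_A", "profile_B"},
    "Site 4": None,
}
candidates = sites_not_ruled_out(scores, usage)
for site, reason in candidates:
    print(site, "-", reason)
```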

ADDITIONAL SOURCE INVESTIGATION

NEW PCB SOURCES LOCATED: 1248 AND 1262 WERE USED AT TWO NEW SITES

[Figure: source areas labeled 1248, 1254, and 1260, plus two previously unknown sources: 1248 and 1262]

CONCLUSIONS

• Formal validation of the statistical model must be completed

• Developing the model should be an iterative process: refine -> test -> refine

• Hypotheses should be formally stated and tested

• A complete background investigation is needed

• A weight-of-evidence approach can be used to integrate multiple lines of evidence

• Uncertainties and data gaps should be formally acknowledged

• Unforced errors can be avoided with careful analysis