Applied Problem Solving and Research Using Statistical Methods with NIST Examples

Adam L. Pintar

September, 2016

Introduction

About Me

Grew up in Kansas

Education

Pittsburg State University: Mathematics

: Statistics

Family

3/71

About NIST

NIST’s mission is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.

NIST is the national metrology institute (NMI) for the United States. As an NMI, NIST

Maintains primary measurement standards for the seven base units in the SI system of units and for derived units

Offers calibration services and measurement standards to support international trade

Develops new measurement technologies

The Institute for National Measurement Standards of the National Research Council is the Canadian counterpart

4/71

NIST Campus

5/71

About SED

Churchill Eisenhart was the first Chief

Originally Statistical Engineering Laboratory

1947

Lola Deming was a founding member too

6/71

More History

W. J. Youden (Right)

J. Cameron (Left)

Graeco-Latin Square

Enamel on metal

7/71

Outline

Experiment Design

X-ray CT for detecting additive manufacturing (3D printing) defects

Interactive discussion (choosing factors and levels)

Accelerated degradation of polymeric materials

Exploratory Data Analysis

Accelerated degradation of polymeric materials

Volumetric versus RECIST length tumor measurements

Standard reference material heterogeneity

Probability/Stochastic Modeling

Standard reference material heterogeneity

Distribution of peak pressure

Standard reference material values

Interactive discussion (choosing prior distributions)

Multidisciplinary projects

Localizing leaks of geologically sequestered CO2

8/71

Experiment Design

X-ray CT for detecting additive manufacturing (3D printing) defects

Goal

Best practices for measuring void size

11/71

Question 1

What factors influence the measurements?

12/71

List of Factors – Version 1

CT acquisition

1. Voltage
2. Current
3. Filter type
4. Filter thickness
5. Magnification
6. Frame rate
7. Number of images per projection
8. Detector type
9. Pixel size
10. Scintillator type
11. Scintillator thickness

Reconstruction

1. Algorithm
2. Center of rotation
3. Beam hardening correction
4. Scattering correction

Artifact

1. Material
2. Flaw size
3. Flaw shape

Image processing

1. Smoothing filter
2. Thresholding algorithm

20 factors!

13/71

Available Resources

Able to produce about 20 images

Need 2^20 ≈ 1 million runs for a 2-level full factorial experiment

The closest we can get to 20 runs is a 2^(20−15) fractional factorial

32 runs

Main effects confounded with 2-factor interactions

No estimate of pure error
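
The run counts on this slide are powers of two. A quick check in R, with illustrative variable names that are not from the slides:

```r
# Run counts for 2-level designs with 20 factors
n_factors <- 20   # factors in Version 1 of the list
p <- 15           # number of generators in the fraction

full_runs <- 2^n_factors        # 2-level full factorial
frac_runs <- 2^(n_factors - p)  # 2^(20-15) fractional factorial

c(full = full_runs, fraction = frac_runs)
```

With only about 20 images available, even the 32-run fraction is out of reach, which motivates trimming the factor list.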

14/71

List of Factors – Version 2

CT acquisition

1. Voltage (numeric)
2. Current (numeric)
3. Magnification (numeric)
4. Frame rate (numeric)
5. Number of images per projection (numeric)

Reconstruction

1. Algorithm (categorical)

15/71

Design

2^(6−2) fractional factorial with 4 “center” runs

Main effects not confounded with 2-factor interactions

2 df pure error estimate

Some information about 2-factor interactions

## volt curr mag fr n_img alg

## 1 0 0 0 0 0 -1

## 2 0 0 0 0 0 1

## 3 -1 -1 1 1 1 1

## 4 1 1 -1 1 -1 1

## 5 1 -1 1 -1 -1 1

## 6 -1 -1 1 -1 1 -1

## 7 1 1 1 1 1 1

## 8 -1 1 -1 -1 1 1

## 9 1 -1 1 1 -1 -1

## 10 -1 -1 -1 -1 -1 -1

## 11 0 0 0 0 0 -1

## 12 1 1 1 -1 1 -1

## 13 1 -1 -1 -1 1 1

## 14 -1 1 1 1 -1 -1

## 15 -1 1 -1 1 1 -1

## 16 1 1 -1 -1 -1 -1

## 17 1 -1 -1 1 1 -1

## 18 -1 1 1 -1 -1 1

## 19 -1 -1 -1 1 -1 1

## 20 0 0 0 0 0 1

16/71

Computation

R is a language and environment for statistical computing and graphics. . . .

17/71

2-level Fractional Factorial Designs in R

FrF2 function from the R package FrF2

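library(FrF2)  # provides FrF2()

factor_names <- c('volt', 'curr', 'mag', 'fr', 'n_img', 'alg')

# Smallest design of at least resolution IV for six factors, plus 4 center runs
my_design <- FrF2(factor.names = factor_names,
                  resolution = 4, ncenter = 4)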

18/71

## volt curr mag fr n_img alg

## 1 0 0 0 0 0 0

## 2 0 0 0 0 0 0

## 3 -1 -1 1 -1 1 -1

## 4 1 1 -1 1 -1 1

## 5 -1 -1 -1 1 -1 1

## 6 1 -1 -1 1 1 -1

## 7 1 -1 1 -1 -1 1

## 8 1 1 1 -1 1 -1

## 9 -1 -1 -1 -1 -1 -1

## 10 1 -1 1 1 -1 -1

## 11 0 0 0 0 0 0

## 12 -1 1 -1 -1 1 1

## 13 -1 1 1 -1 -1 1

## 14 1 1 1 1 1 1

## 15 1 -1 -1 -1 1 1

## 16 -1 1 -1 1 1 -1

## 17 -1 1 1 1 -1 -1

## 18 1 1 -1 -1 -1 -1

## 19 -1 -1 1 1 1 1

## 20 0 0 0 0 0 0

## class=design, type= FrF2.center

19/71

design.info(my_design)$alias

## $legend

## [1] "A=volt" "B=curr" "C=mag" "D=fr" "E=n_img" "F=alg"

##

## $main

## character(0)

##

## $fi2

## [1] "AB=CE=DF" "AC=BE" "AD=BF" "AE=BC" "AF=BD" "CD=EF"

## [7] "CF=DE"
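
The alias listing can be checked by hand in base R. The sketch below assumes the generators E = ABC and F = ABD. They are inferred from the listing (AB = CE = DF) and are not stated on the slides. It builds the 16 factorial runs without the center runs.

```r
# Base 2^4 full factorial in the first four factors (coded -1 / +1)
base_runs <- expand.grid(A = c(-1, 1), B = c(-1, 1),
                         C = c(-1, 1), D = c(-1, 1))

# Assumed generators, inferred from the alias listing: E = ABC, F = ABD
design <- transform(base_runs, E = A * B * C, F = A * B * D)
X <- as.matrix(design)

# All 15 two-factor interaction columns
pairs <- combn(colnames(X), 2)
two_fi <- apply(pairs, 2, function(p) X[, p[1]] * X[, p[2]])
colnames(two_fi) <- apply(pairs, 2, paste0, collapse = "")

# Main effects are orthogonal to every two-factor interaction
main_vs_2fi <- crossprod(X, two_fi)
all(main_vs_2fi == 0)

# Aliased two-factor interactions share identical columns
all(two_fi[, "AB"] == two_fi[, "CE"])
all(two_fi[, "AB"] == two_fi[, "DF"])
```

Identical columns mean the data cannot separate AB from CE or DF, which is why the design gives only some information about 2-factor interactions.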

20/71

Interactive Discussion: Choosing Factor Levels

Accelerated Degradation of Polymeric Materials

Goal

Lab measurements

Field measurements

23/71

Two Experiments

Laboratory

Precisely control/measure a few factors

Field

Environmental variation

Mimic building movement

Focus on Laboratory

24/71

Lab Factors

Light intensity

Temperature

Humidity

Mechanical strain

25/71

Design

Light intensity – Fixed at max

Temperature – 4 levels

Humidity – Fixed

Mechanical strain – 4 levels

4^2 full factorial
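
A full factorial in two four-level factors has 16 runs. A minimal sketch in R follows. The level values are taken from the panel titles of the raw-data figure later in the deck, and the column names are illustrative.

```r
# Full factorial: every temperature level crossed with every strain level
lab_design <- expand.grid(
  temp_c     = c(21, 31, 41, 51),  # temperature (C)
  strain_pct = c(0, 5, 11, 21)     # mechanical strain (%)
)

nrow(lab_design)  # number of treatment combinations
lab_design
```

Light intensity and humidity are held fixed, so they do not appear as columns.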

26/71

[Figure: design points, temperature (C) versus mechanical strain (%)]

27/71

Exploratory Data Analysis

Accelerated Degradation of Polymeric Materials

Raw Data

[Figure: modulus ratio versus exposure days (0 to 80), one panel for each combination of temperature (21, 31, 41, 51) and strain (0, 5, 11, 21); the legend distinguishes Chambers 1 to 4]

30/71

Volumetric versus RECIST Length Tumor Measurements

Experiment

32/71

Raw Data

[Figure: two panels, mass (g) versus volume (cm3) and mass (g) versus RECIST length (mm)]

Color differentiates diapers

Expect straight lines with positive slopes

33/71

Standard Reference Material Heterogeneity

SRMs

35/71

SRM Certificate

36/71

Coal Bottle-to-Bottle Differences (Heterogeneity)

[Figure: bromine (mg/kg) by bottle; legend: observations, grand mean]

37/71

Probabilistic/Stochastic Modelling

Standard Reference Material Heterogeneity

Bottle-to-Bottle Differences (Heterogeneity)

[Figure: bromine (mg/kg) by bottle; legend: observations, grand mean, grand mean CI, bottle mean CI]

ANOVA p−value: 0.074

40/71

p-value

H0: Bottles all the same

HA: At least one bottle different

anova(lm(raw_data$mg_kg ~ factor(raw_data$bottle)))

## Analysis of Variance Table

##

## Response: raw_data$mg_kg

## Df Sum Sq Mean Sq F value Pr(>F)

## factor(raw_data$bottle) 9 1.62004 0.180005 2.6253 0.07438 .

## Residuals 10 0.68565 0.068565

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
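
The F value and p-value in the table can be reproduced from the sums of squares and degrees of freedom. The following is a sketch using only the numbers printed above, with illustrative variable names.

```r
# Values copied from the ANOVA table above
ss_between <- 1.62004; df_between <- 9   # bottle-to-bottle
ss_within  <- 0.68565; df_within  <- 10  # residual (within bottle)

# Mean squares and their ratio
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution under H0
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

The results should agree with the table up to rounding of the printed sums of squares.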

41/71

Power

Probability of concluding at least one bottle is different when that is the true state of nature

power.anova.test(groups = 10, n = 2,
                 between.var = c(0.5, 1, 1, 2), within.var = 1,
                 sig.level = c(0.01, 0.05, 0.15, 0.1))

##

## Balanced one-way analysis of variance power calculation

##

## groups = 10

## n = 2

## between.var = 0.5, 1.0, 1.0, 2.0

## within.var = 1

## sig.level = 0.01, 0.05, 0.15, 0.10

## power = 0.08086632, 0.51463217, 0.77715565, 0.93728416

##

## NOTE: n is number in each group
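
The power values can also be computed directly from the noncentral F distribution. The noncentrality parameter used below, (groups − 1) · n · between.var / within.var, is my understanding of what power.anova.test uses. Treat it as an assumption and compare the result with the function's output.

```r
groups <- 10
n <- 2
between_var <- c(0.5, 1, 1, 2)
within_var  <- 1
sig_level   <- c(0.01, 0.05, 0.15, 0.10)

df1 <- groups - 1        # numerator degrees of freedom
df2 <- groups * (n - 1)  # denominator degrees of freedom

# Assumed noncentrality parameter (see lead-in)
lambda <- df1 * n * between_var / within_var

# Critical value under H0, then the probability of exceeding it under HA
f_crit <- qf(sig_level, df1, df2, lower.tail = FALSE)
power_by_hand <- pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)

power_by_hand
```

Power increases with both the between-bottle variance and the significance level, which matches the pattern in the printed output.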

42/71

Peak Pressure

Structural Design

44/71

Experimental Data

[Figure: pressure versus time (s)]

45/71

Data Processing

[Figure: processed pressure versus time (s)]

46/71

Simulated Data

[Figure: pressure versus time (s) in three panels: Original Data Set, Fake Data Set #1, Fake Data Set #2]

47/71

Distribution of the Peak

[Figure: Distribution of the Peak Value; density versus peak value, with the mean marked]

48/71

Uncertainty in the Distribution of the Peak

[Figure: Distribution of the Peak Value; density versus peak value, with the mean marked; legend: bootstrap replicates, 80% CI for the mean]
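
To illustrate the general idea of a bootstrap interval for a mean, here is a generic percentile-bootstrap sketch in base R. It is not the potMax procedure, and the data are simulated stand-ins (Gumbel draws with hypothetical location and scale), not the values behind the slide.

```r
set.seed(1)

# Hypothetical stand-in data: 50 simulated peak values (Gumbel via inverse CDF)
peaks <- 3.5 - 0.3 * log(-log(runif(50)))

# Resample with replacement and recompute the mean each time
n_boot <- 2000
boot_means <- replicate(n_boot, mean(sample(peaks, replace = TRUE)))

# Percentile 80% interval: the 10th and 90th percentiles of the replicates
ci_80 <- quantile(boot_means, probs = c(0.10, 0.90))
ci_80
```

The spread of the replicates conveys how much the estimate could change with a different data set of the same size.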

49/71

R Package – Reference

https://github.com/usnistgov/potMax

50/71

Standard Reference Material Values

Bayesian methods – high level

Two sources of information

Data

Subject matter expertise (Prior)

Bayes rule tells us how to combine them
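
As a minimal sketch of how Bayes rule combines the two sources, the code below computes a posterior on a grid for a single mean. Everything in it is hypothetical: the four data values, the known measurement standard deviation, and the normal prior are illustrative choices, not the model used for the SRM.

```r
y <- c(72, 78, 75, 80)  # hypothetical measurements
sigma <- 5              # measurement sd, assumed known for this sketch

# Grid of candidate values for the unknown mean
step <- 0.1
mu_grid <- seq(40, 120, by = step)

# Prior: hypothetical subject-matter belief about the mean
prior <- dnorm(mu_grid, mean = 70, sd = 10)

# Log-likelihood of the data at each grid value
loglik <- sapply(mu_grid, function(m) sum(dnorm(y, m, sigma, log = TRUE)))

# Bayes rule: posterior is proportional to prior times likelihood
post <- prior * exp(loglik - max(loglik))
post <- post / sum(post * step)  # normalize to integrate to 1

post_mean <- sum(mu_grid * post * step)
post_mean
```

The posterior mean falls between the prior mean and the sample mean, pulled toward whichever source is more precise.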

52/71

Hard Rock Mine Waste

Governor Basin, Colorado

53/71

Ag (Silver) Raw Data

[Figure: Ag measurements (mg/kg) by method and laboratory: ICPMS Lab 1, ICPOES, ICPMS Lab 2, INAA]

54/71

Posterior With Flat Prior

[Figure: posterior density for Ag (Silver), mg/kg]

95% interval [55, 104]

55/71

The Problem

[Figure: Ag measurements (mg/kg) by method and laboratory: ICPMS Lab 1, ICPOES, ICPMS Lab 2, INAA]

56/71

Prior Information

[Figure: prior densities in logarithmic units; legend: Informative, Flat]

Quality control data (Log scale)

57/71

Posterior With Informative Prior

[Figure: posterior density for Ag (Silver), mg/kg]

95% interval [66, 83]

58/71

Interactive Discussion: Choosing Prior Distributions

Multidisciplinary Projects

Goal

Detect and locate leaks of CO2 stored in geological formations

61/71

Test Site in Montana

The data that were analyzed come from a similar site in Ft. Wayne, Indiana using the same equipment

62/71

Likelihood at Ft. Wayne

Grid is clearly visible

Expected no signal, but found a strong one

63/71

Collaborators

Zachary H. Levine (NIST)

Jeremy T. Dobler (Harris Corp.)

Nathan Blume (Harris Corp.)

Michael Braun (Harris Corp.)

T. Scott Zaccheo (Atmospheric and Environmental Research)

Timothy G. Pernini (Atmospheric and Environmental Research)

64/71

Reference

Levine, Z. H., Pintar, A. L., Dobler, J. T., Blume, N., Braun, M., Zaccheo, T. S., Pernini, T. G., “The Detection of Carbon Dioxide Leaks Using Quasi-tomographic Laser Absorption Spectroscopy Measurements in Variable Wind,” *Atmospheric Measurement Techniques*, **9**, 1627–1636, 2016, http://www.atmos-meas-tech.net/9/1627/2016/amt-9-1627-2016.pdf.

65/71

Summary

Three main toolboxes

Experiment design

Exploratory data analysis

Stochastic/Probabilistic modelling

Toolboxes interact with each other

Multidisciplinary teams

66/71

Questions

67/71

Question One

Thank you Michelle T.

Synopsis

Purchase four physical standards for calibrating liquid chromatography instruments

Combine the standards into a single new “secondary” standard

How to verify that the combination procedure does not change the concentrations listed on the certificates

68/71

Simple Graphical Approach

Quality control checks when measuring candidate SRMs

[Figure: values in µg/kg; x-axis labels: Certificate, Measurements]

69/71

Formalization of the Simple Approach

H0: No difference

HA: Some difference

Review paper

Rukhin, A. L., “Assessing Compatibility of Two Laboratories: Formulations as a Statistical Hypothesis Testing Problem,” Metrologia, 50, 49–59 (2013).

Potential problem

Can find strong evidence of a difference but not strong evidence of equivalence

70/71

Second Formalization

H0: Difference is practically important

HA: Difference is not practically important

Reference

Anderson-Cook, C. M. and Borror, C. M., “The Difference Between “equivalent” and “not different,”” Quality Engineering, 28, 249–262 (2016).

Some modification necessary

[Figure: values in µg/kg; x-axis labels: Certificate, Measurements]
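
One common way to test hypotheses of this form is the two one-sided tests (TOST) procedure. The slide notes that some modification is necessary for this problem, so the code below is only a generic sketch. The data values and the equivalence margin delta are hypothetical.

```r
certificate  <- c(12.1, 11.8, 12.3, 12.0, 11.9)  # hypothetical values
measurements <- c(12.0, 12.2, 11.7, 12.1, 12.0)  # hypothetical values
delta <- 1  # hypothetical margin: differences beyond +/- delta matter

# H0a: difference <= -delta, rejected for large t
p_lower <- t.test(certificate, measurements,
                  mu = -delta, alternative = "greater")$p.value

# H0b: difference >= +delta, rejected for small t
p_upper <- t.test(certificate, measurements,
                  mu = delta, alternative = "less")$p.value

# Equivalence is concluded only if both one-sided tests reject
p_tost <- max(p_lower, p_upper)
p_tost
```

A small p_tost is evidence that the difference lies inside the margin, which is the strong evidence of equivalence that the first formalization cannot provide.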

71/71
