
Design of Experiments and Data Analysis

Let’s Work an Example

• Data obtained from MS Thesis

• Studied the “bioavailability” of metals in sediment cores

• We’ll analyze chromium data

Pt. Mugu Marsh

Analytical Techniques

• Sediment samples were taken with cores

• Cores were sliced into 1 cm slices

• Sediment in each slice was extracted using a strong acid

• Extracts were analyzed using an Inductively Coupled Plasma Mass Spectrometer (ICP-MS)

• Calibrations were also conducted

• Surface areas (SA) and organic carbon (OC) contents of sediment in each slice were also measured

Core processing

(Diagram: core cut into 1-cm slices; slices analyzed for Organic Carbon, Surface Areas, and Tessier Extractions)

Objectives

• To determine if there is a correlation between sediment surface area and organic carbon content

• To determine if there is a relationship between concentration of a specific metal and sediment SA and/or OC

• To determine if there is a relationship between or among metal concentrations

Example of Results

(Figure: depth profiles of Organic Carbon (%) and Surface Area (m2/g) versus Depth (cm) for cores CC01, CC02, CC03, LM01, and LM02)

Example of Results

(Figure: Organic Carbon Content (%) versus Surface Area (m2/g); Slope = 0.39, R2 = 0.7)

Data File

• Create a folder entitled “REU” in the C:\My Documents folder

• Create a folder entitled “2006” in this REU folder

• Create a folder entitled “Data Analysis Workshop” in this 2006 folder

• Download Excel file REU_dataanalysis_data.xls from instructional1.calstatela.edu/ckhachi into the Data Analysis Workshop folder

• Open the file

Data File Structure

• There should be 2 worksheets in the workbook:
– Data: raw SA, OC, and metals concentration data
– Calibration Curves: ICP-MS calibration data (relating raw ICP-MS responses to known calibration concentrations)

• Data for the cores are separated by yellow bands

Data File Structure

• Data columns include:
– ID: random sample ID
– Ave Depth: average depth of each slice
– Solid Mass: mass of sediment in each slice
– Raw ICP-MS data for each of five metals

• Calibration columns include:
– Conc: concentration of standards in parts per billion (ppb)
– ICP-MS responses for the 5 metals

Let’s Start with Calibration Curves

• Over reasonable ranges, most instruments have linear responses (i.e., calibration curves are straight lines)

• We need to “model” the data: use regression analysis to determine the best-fit line that relates ICP-MS response to concentration

• We will then use these calibration equations to calculate concentrations for our samples

• Note: because we know that calibrations are usually linear, we will choose a linear regression model. If you don’t know the relationship between 2 variables, it often helps to start with plots
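The regression step described above can be sketched in a few lines of plain Python. This is a minimal ordinary least-squares fit, assuming a straight-line model y = mx + b; the function name fit_line and the four standards are hypothetical illustration values, not the thesis data. Excel's linear trendline is also an ordinary least-squares fit, so it should report the same slope, intercept, and R2 for the same points.

```python
# Ordinary least-squares fit of y = m*x + b (a straight calibration line).

def fit_line(x, y):
    """Return slope m, intercept b, and R2 for a least-squares straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    # R2 = 1 - SSresidual/SStotal
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return m, b, 1 - ss_res / ss_tot


# Hypothetical standards (ppb) and responses, for illustration only
conc = [0, 10, 20, 30]
resp = [5.0, 25.0, 45.0, 65.0]
print(fit_line(conc, resp))  # (2.0, 5.0, 1.0)
```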

Calibration Curve for Cr

• Linear response

• We know slope and intercept

• R2 value provided

• Best-fit line drawn (looks good to me)

• Not enough statistical information provided to be able to conduct proper error analysis

y = 259.07x + 1787.3

R2 = 0.999

(Chart: Cr calibration data (Series1) with linear trendline; ICP-MS response, 0 to 16000, versus concentration, 0 to 60 ppb)

Regression Analysis for Cr

Rename worksheet “Cr Analysis”

Assumptions

• On average, errors are neither consistently positive nor consistently negative
– Linear model: $y_i = m x_i + b + e_i$, where $e_i$ is the error associated with each observation
– The line goes through the middle of the data

• The variance of the error terms is the same across all observations

• Data are independent of each other

• Error terms are normally distributed (not that important)

Residual Plot

(Figure: residuals for observations 1 to 7; gridlines at multiples of SEest = 161.9482)

Look at the data and the linear fit carefully: points lie above the line at smaller concentrations. If you delete the last point, you get a very different result.
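To see a residual pattern numerically, the sketch below (hypothetical function residuals_and_se, made-up curved data) fits a line, computes each residual as observed minus predicted, and expresses it in units of SEest, the spacing of the gridlines on the residual plot. Runs of same-sign residuals, like the points sitting above the line at low concentration on this slide, suggest that a straight line is not the whole story.

```python
import math


def residuals_and_se(x, y):
    """Least-squares line through (x, y); return residuals and SEest."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b = y_bar - m * x_bar
    # Residual = observed - predicted
    res = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    # Two parameters (m, b) were fitted, leaving n - 2 degrees of freedom
    se_est = math.sqrt(sum(e * e for e in res) / (n - 2))
    return res, se_est


# Hypothetical, slightly curved data (7 points), for illustration only
x = [0, 5, 10, 20, 30, 40, 50]
y = [2 * xi + 0.02 * xi ** 2 for xi in x]
res, se = residuals_and_se(x, y)
# Residuals in units of SEest, like the gridlines on the residual plot
print([round(e / se, 2) for e in res])
```

For these curved data the residuals are positive at both ends and negative in the middle; a perfectly straight data set would give residuals and SEest of essentially zero.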

Regression Statistics

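• Multiple R (or just r) is the correlation coefficient (Excel reports Multiple R as the positive square root of R2, so check the sign of the slope):
– +1: perfectly positively correlated (as x goes up, so does y)
– 0: not correlated
– −1: perfectly negatively correlated (as x goes up, y goes down)

$r = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i-\bar{x})(y_i-\bar{y})}{s_x\,s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of x and y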
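The formula for r translates directly into code. A minimal sketch, assuming sample standard deviations (n − 1 in the denominator) for s_x and s_y, which is what makes the 1/(n − 1) factor come out right; pearson_r is a hypothetical name and the data are illustrative.

```python
import statistics


def pearson_r(x, y):
    """r = sum((xi - xbar)(yi - ybar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    # Sample standard deviations (divide by n - 1), matching the 1/(n - 1) factor
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


print(pearson_r([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.2]))  # close to +1
print(pearson_r([1, 2, 3, 4], [8.0, 6.0, 4.0, 2.0]))  # perfectly negative: about -1
```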

Regression Statistics

• R Square (R2): coefficient of determination
– Between 0 and 1
• 0: no linear relationship
• 1: perfect linear relationship (+ or −)
– Square of the r value
– Caution: R2 never decreases when more fitted terms are added to a model, and with few data points it can look better than the fit deserves

• Adjusted R Square: corrects for the number of fitted terms relative to the number of data points; it is probably a better measure of how strong the linear relationship is (R2 is more common)

• Use 2 or 3 significant figures to report these numbers
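As a sketch of the correction, the usual adjusted R2 formula for n data points and k predictor variables is 1 − (1 − R2)(n − 1)/(n − k − 1). Applied to the Cr calibration (the rounded R2 = 0.999 shown on the chart, 7 observations, one predictor), it barely changes the value. The function name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2, n, k=1):
    """Adjusted R2 for n data points and k predictor variables (k = 1 for a straight line)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Cr calibration: rounded R2 = 0.999 from 7 standards, one predictor (concentration)
print(round(adjusted_r2(0.999, 7), 4))  # 0.9988
```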

Regression Statistics

• Standard Error: a measure of the amount of error in the prediction of y for an individual x (the standard deviation of the residuals, computed with n − 2 degrees of freedom)

• Observations: number of data points

ANOVA

• ANalysis Of VAriance (sometimes called an F test)

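• df: degrees of freedom

• SS: sum of squares

$R^2 = 1 - SS_{residual}/SS_{total}$

• MS: mean squares = SS/df

• F = MSregression/MSresidual; the larger F is, the stronger the evidence against the null hypothesis (no correlation)

• Not very useful for a single treatment: with only one x variable, F is simply the square of the slope's t stat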
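The ANOVA quantities can be computed from a straight-line fit as below. This sketch (hypothetical function regression_anova, illustrative data) returns SS, df, MS, F, and R2, plus the slope and Sxx so you can check that F equals the square of the slope's t stat when there is a single x variable.

```python
def regression_anova(x, y):
    """ANOVA table for a straight-line least-squares fit y = m*x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b = y_bar - m * x_bar
    fitted = [m * xi + b for xi in x]
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the line
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left over
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    df_reg, df_res = 1, n - 2
    ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res          # MS = SS/df
    return {
        "slope": m,
        "sxx": sxx,
        "SS": (ss_reg, ss_res, ss_tot),
        "df": (df_reg, df_res, n - 1),
        "MS": (ms_reg, ms_res),
        "F": ms_reg / ms_res,
        "R2": 1 - ss_res / ss_tot,
    }


# Hypothetical data, for illustration only
table = regression_anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(table["F"], table["R2"])
```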

Correlation results

• Linear calibration: y = mx + b
– Slope (m) = 259.0709
– Intercept (b) = 1787.2679

• Standard Error: used for hypothesis testing and confidence band formation

Correlation results

• Confidence intervals
– Intercept
• Lower: 1787.2679 − 70.2724(2.571) = 1606.598
• Upper: 1787.2679 + 70.2724(2.571) = 1967.938
• 2.571 comes from a standard two-tailed t table with df = 5 and probability = 0.05
– Slope
• Lower: 259.0709 − 3.6280(2.571) = 249.74
• Upper: 259.0709 + 3.6280(2.571) = 268.40

• t Stat = Coefficient/Standard Error
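The interval and t stat arithmetic above can be reproduced from the coefficients and standard errors on these slides. The critical value 2.571 is hard-coded, as read from a t table; if SciPy is available, scipy.stats.t.ppf(0.975, 5) returns about 2.5706. The helper name coefficient_summary is hypothetical.

```python
T_CRIT = 2.571  # two-tailed t value for df = 5, probability = 0.05 (from a t table)


def coefficient_summary(coef, se, t_crit=T_CRIT):
    """Return (lower, upper, t Stat) for a regression coefficient and its standard error."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width, coef / se


# Coefficients and standard errors from the Cr calibration regression
for name, coef, se in [("Intercept", 1787.2679, 70.2724), ("Slope", 259.0709, 3.6280)]:
    lower, upper, t_stat = coefficient_summary(coef, se)
    print(f"{name}: {lower:.2f} to {upper:.2f}, t Stat = {t_stat:.1f}")
```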

Correlation results

• P-value: assuming the null hypothesis (Ho) is true (in this case, no correlation), the probability of getting a t stat at least as far from zero as the one observed; in practice, the risk of wrongly rejecting Ho
– p > 0.10: null hypothesis maybe OK
– 0.05 < p < 0.10: slight evidence against the null hypothesis
– 0.01 < p < 0.05: moderate evidence against the null hypothesis
– p < 0.01: strong evidence against the null hypothesis

• Consult statistical tables again:
– For df = 5 and t stat = 25.4, p < 0.000005
– For df = 5 and t stat = 71.4, p < 0.0000001

• Very, very strong evidence that Ho is false: the ICP-MS response is linearly related to concentration!

• Linear model:

$\text{Response} = 259.07(\pm 3.63)\times\text{Concentration} + 1787.27(\pm 70.27)$

Using Calibration Equations

• Now we have an equation that relates the response of our equipment to concentrations

• Let’s use this equation to determine concentrations in our samples

Raw Data Excel Sheet

Measurement Errors

• Add 2 columns to the right of the Cr data

• Assume the instrument has a 3% error (in reality, you need to run each sample 3 times to get the proper error)

Propagation of Errors

• Let us assume that X is dependent upon the experimental variables p, q, and r, which fluctuate in a random and independent way.

• Addition or subtraction, X = p + q − r:

$s_X = \sqrt{s_p^2 + s_q^2 + s_r^2}$

• Where “s” is the standard deviation or error for each of the variables
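A one-line sketch of the addition/subtraction rule, assuming independent random errors. Note that subtraction adds the errors in quadrature just like addition does. The function name error_of_sum is hypothetical.

```python
import math


def error_of_sum(*errors):
    """s_X for X = p + q - r (any mix of + and -): square root of the sum of squared errors."""
    return math.sqrt(sum(s ** 2 for s in errors))


print(error_of_sum(3.0, 4.0))  # 5.0
```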

Propagation of Errors (cont’d)

• Multiplication or division, X = p · (q/r):

$\frac{s_X}{X} = \sqrt{\left(\frac{s_p}{p}\right)^2 + \left(\frac{s_q}{q}\right)^2 + \left(\frac{s_r}{r}\right)^2}$

• Other equations exist for logs, etc.

• Round results of + and − to the number of decimal places of the component number with the fewest decimal places

• Round results of × and ÷ to the number of significant digits of the component number with the fewest significant digits
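The multiplication/division rule works on relative errors (s divided by the value), which add in quadrature and are then scaled back by |X|. A minimal sketch, assuming independent random errors; error_of_product and the example numbers are hypothetical.

```python
import math


def error_of_product(x, *terms):
    """s_X for X = p * (q / r): terms are (value, s) pairs for p, q, r.

    The relative errors s/value add in quadrature; multiply back by |X|.
    """
    relative = math.sqrt(sum((s / value) ** 2 for value, s in terms))
    return abs(x) * relative


# Hypothetical numbers: p = 10 ± 1, q = 20 ± 2, r = 5 ± 0 (exact)
p, q, r = 10.0, 20.0, 5.0
x = p * (q / r)  # 40.0
print(error_of_product(x, (p, 1.0), (q, 2.0), (r, 0.0)))  # about 5.66
```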

Let’s use the Calibration Eqn

• Response → detector output

• Concentration → what we are looking for, in the column labeled “Cr Conc (ppb)”

$\text{Response} = 259.07(\pm 3.63)\times\text{Concentration} + 1787.27(\pm 70.27)$

Let’s use the Calibration Eqn

• Let’s look at the first line (244.69 is the assumed 3% instrument error on 8156.35):

$8156.35(\pm 244.69) = 259.07(\pm 3.63)\times\text{Conc} + 1787.27(\pm 70.27)$

• Rearrange to solve for Conc:

$\text{Conc} = \dfrac{8156.35(\pm 244.69) - 1787.27(\pm 70.27)}{259.07(\pm 3.63)}$

• Let’s look at the numerator:
– Num = 8156.35 − 1787.27 = 6369.08
– Error in Num: recall, for + and −, $s_X = \sqrt{s_p^2 + s_q^2 + s_r^2}$, so $s_{Num} = \sqrt{244.69^2 + 70.27^2} = 254.58$

• So now:

$\text{Conc} = \dfrac{6369.08(\pm 254.58)}{259.07(\pm 3.63)}$

Let’s use the Calibration Eqn

• Conc = 6369.08/259.07 = 24.58

• Recall, for × and ÷: $\frac{s_X}{X} = \sqrt{\left(\frac{s_p}{p}\right)^2 + \left(\frac{s_q}{q}\right)^2 + \left(\frac{s_r}{r}\right)^2}$, or $s_X = X\sqrt{\left(\frac{s_p}{p}\right)^2 + \left(\frac{s_q}{q}\right)^2 + \left(\frac{s_r}{r}\right)^2}$

• So, $\text{Err}_{Conc} = 24.58\sqrt{\left(\frac{254.58}{6369.08}\right)^2 + \left(\frac{3.63}{259.07}\right)^2} = 1.04$

• Final result: Conc = 24.58 ± 1.04 ppb
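The whole calculation can be packaged as one function that inverts the calibration line and applies both propagation rules in turn. This sketch uses the coefficients and standard errors from the Cr regression and the assumed 3% instrument error; the function and constant names are hypothetical. Like the slides, it treats the slope and intercept errors as independent, which ignores their covariance in a real regression.

```python
import math

# Cr calibration: Response = SLOPE * Conc + INTERCEPT (values from the regression above)
SLOPE, SLOPE_SE = 259.07, 3.63
INTERCEPT, INTERCEPT_SE = 1787.27, 70.27
INSTRUMENT_ERROR = 0.03  # assumed 3% instrument error (Measurement Errors slide)


def concentration_from_response(response):
    """Invert the calibration line; return (Conc in ppb, propagated error)."""
    response_se = INSTRUMENT_ERROR * response
    # Numerator: Response - Intercept, so errors add in quadrature (+/- rule)
    num = response - INTERCEPT
    num_se = math.sqrt(response_se ** 2 + INTERCEPT_SE ** 2)
    # Division by the slope: relative errors add in quadrature (x/÷ rule)
    conc = num / SLOPE
    conc_se = abs(conc) * math.sqrt((num_se / num) ** 2 + (SLOPE_SE / SLOPE) ** 2)
    return conc, conc_se


conc, err = concentration_from_response(8156.35)
print(f"Conc = {conc:.2f} ± {err:.2f} ppb")  # Conc = 24.58 ± 1.04 ppb
```

Applied down the “Cr Conc (ppb)” column, the same function gives the value and error bar for every slice.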

Final Results

• Use error bars in the plots

(Figure: Depth (cm), 0 to 16, versus Chromium Concentration (ppb), 0 to 30, with error bars)

Plotting Error Bars

• Error bars can be:
– 1 to 3 standard deviation(s)
– Standard error
– etc.

• Just be clear in your figure caption about what your error bars represent
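For replicate measurements (for example, running a sample 3 times, as suggested on the Measurement Errors slide), the two most common error-bar choices can be computed as below. A minimal sketch; error_bar_half_widths and the triplicate readings are hypothetical. The standard error is the standard deviation divided by √n, so for n > 1 it is always the smaller bar, which is why the caption must say which one is plotted.

```python
import math
import statistics


def error_bar_half_widths(replicates):
    """Return (1 SD, standard error of the mean) for replicate measurements."""
    sd = statistics.stdev(replicates)     # sample standard deviation (n - 1)
    se = sd / math.sqrt(len(replicates))  # standard error of the mean
    return sd, se


# Hypothetical triplicate ICP-MS readings of one extract, illustration only
readings = [8010.0, 8156.0, 8290.0]
mean = statistics.mean(readings)
sd, se = error_bar_half_widths(readings)
print(f"mean = {mean:.0f}; ±1 SD = {sd:.0f}; ±1 SE = {se:.0f}")
```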

Next Presentation

• A little about design of experiments

• A little more about errors, hypothesis testing, etc…
