Statistics for Analytical Chemistry - Lecture Notes - Data Processing (Xu Ly So Lieu)


Statistics for Analytical Chemistry

Lecture Notes

Dr. Ta Thi Thao

Syllabus (2 credits)

Introduction to the analytical process
Chapter 1: Errors in analytical chemistry
Chapter 2: Descriptive statistics
Chapter 3: Basic distributions
Chapter 4: Significance tests
Chapter 5: ANOVA
Chapter 6: Correlation and regression
Chapter 7: QA/QC

Software: EXCEL, ORIGIN, MINITAB, MATLAB, STATGRAPHICS, SPSS…

What is analytical chemistry?
• Almost all chemists routinely make qualitative or quantitative measurements.
• Analytical chemistry is not a separate branch of chemistry, but simply the application of chemical knowledge.

The craft of analytical chemistry is not in performing a routine analysis on a routine sample but in improving established methods, extending existing methods to new types of samples, and developing new methods for measuring chemical phenomena.

The analytical process
1. Define the problem
2. Choose a method
3. Sampling
4. Sample preparation
5. Chemical separation
6. Analysis
7. Data processing and reporting of results

The analytical process (cont.)
1. Define the problem
What needs to be found? Qualitative or quantitative? What will the information be used for? Who will use it? When will it be needed? How accurate and precise does it have to be? What is the budget? The analyst should consult with the client to plan a useful and efficient analysis, including how to obtain a useful sample.

The analytical process (cont.)
2. Choose a method
- Sample type; size of sample
- Sample preparation needed
- Concentration and range (sensitivity needed)
- Selectivity needed (interferences)
- Accuracy and precision needed
- Tools/instruments available
- Expertise/experience
- Cost; speed
- Does it need to be automated?
- Are methods available in the chemical literature?
- Are standard methods available?

The analytical process (cont.)
3. Sampling
- Sample type
- Representative/random sample
- Sample size
- Minimum number of samples
- Sampling statistics/error

The analytical process (cont.)
4. Sample preparation
- Is the sample solid, liquid, or gas?
- Dissolve? Ash or digest?
- Is chemical separation or masking of interferences needed?
- Does the analyte need to be concentrated?
- Does the analyte need to be converted into a different form for detection?
- Do the solution conditions need adjusting (pH, added reagents…)?

The analytical process (cont.)
5. Chemical separation, if necessary
- Distillation
- Precipitation
- Solvent extraction
- Solid-phase extraction
- Chromatography
- Electrophoresis
May be done as part of the measurement step.

The analytical process (cont.)
6. Analysis
- Calibration
- Validation/controls/blanks
- Replicates
7. Data processing and reporting of results
- Statistical analysis
- Report the results

Signal and corresponding instrumental methods:
- Emission of radiation: emission spectroscopy (X-ray, UV, visible, electron, Auger); fluorescence, phosphorescence, and luminescence (X-ray, UV, and visible)
- Absorption of radiation: spectrophotometry and photometry (X-ray, UV, visible, IR); photoacoustic spectroscopy; nuclear magnetic resonance and electron spin resonance spectroscopy
- Scattering of radiation: turbidimetry; nephelometry; Raman spectroscopy
- Refraction of radiation: refractometry; interferometry
- Diffraction of radiation: X-ray and electron diffraction methods
- Rotation of radiation: polarimetry; optical rotatory dispersion; circular dichroism
- Electrical potential: potentiometry; chronopotentiometry
- Electric charge: coulometry
- Electric current: polarography; amperometry
- Electrical resistance: conductometry
- Mass-to-charge ratio: mass spectrometry
- Rate of reaction: kinetic methods
- Thermal properties: thermal conductivity and enthalpy methods
- Radioactivity: activation and isotope dilution methods

Types of Instrumental Methods

Comparison of different analytical methods

Method                         Approx. range (mol/L)   Approx. precision (%)   Selectivity   Speed       Cost        Principal uses
Gravimetry                     10^-1 to 10^-2          0.1                     Poor-mod.     Slow        Low         Inorg.
Titrimetry                     10^-1 to 10^-4          0.1-1                   Poor-mod.     Mod.        Low         Inorg., org.
Potentiometry                  10^-1 to 10^-6          2                       Good          Fast        Low         Inorg.
Electrogravimetry, coulometry  10^-1 to 10^-4          0.01-2                  Moderate      Slow-mod.   Mod.        Inorg., org.
Voltammetry                    10^-3 to 10^-10         2-5                     Good          Moderate    Mod.        Inorg., org.
Spectrophotometry              10^-3 to 10^-6          2                       Good-mod.     Fast-mod.   Low-mod.    Inorg., org.
Fluorometry                    10^-6 to 10^-9          2-5                     Moderate      Moderate    Mod.        Org.
Atomic spectrometry            10^-3 to 10^-9          2-10                    Good          Fast        Mod.-high   Inorg., multielement
Chromatography                 10^-3 to 10^-9          2-5                     Good          Fast-mod.   Mod.-high   Org., multicomponent
Kinetic methods                10^-2 to 10^-10         2-10                    Good-mod.     Fast-mod.   Mod.        Inorg., org., enzymes

Validation of a method
- Precision must be checked by analyzing replicate samples.
- Accurate results must be verified by:
  + using proper calibration
  + analyzing spiked samples
  + comparing the sample's results with those obtained with another accepted method
  + analyzing a standard reference material of known composition
  + running a control sample at least daily
- To assure method validation, apply the guidelines of good laboratory practice (GLP).

The Laboratory Notebook
Used to record your work as an analytical chemist; it documents everything you do. Some good rules:
+ Use a hardcover notebook.
+ Number pages consecutively.
+ Record only in ink.
+ Never tear out pages.
+ Date each page, sign it, and have it signed by someone else.
+ Record the name of the project, why it is being done, and any literature references.
+ Record all data on the day you obtain it.

The Laboratory Notebook
An example laboratory notebook entry:
+ Date of the experiment
+ Name of the experiment
+ Principle
+ Reaction used for the determination
+ Standardization and preparation of chemicals, reagents…
+ How the result is calculated; raw experimental data; calculation of the average and standard deviation
+ The final result

Chapter 1:

Errors in Analytical Chemistry

1. Errors

2. Absolute and relative error

3. Systematic and random error

4. Outliers and accumulated error

5. Repeatability, reproducibility

6. Precision and accuracy

* Every measurement that is made is subject to a number of errors. "If you cannot measure it, you cannot know it."

A. Einstein

Absolute and Relative Error

Absolute error: E_A = x - x_t = measured value - true value

Relative error: E_r = E_A / x_t = (x - x_t) / x_t

Percent relative error: %E_r = E_r × 100 (%)

Random Error (indeterminate error)

- Cannot be determined (no control over it)
- A result of fluctuations (+ and -) in random variables
- Multiple trials help to minimize it

Random errors can be reduced by:
- Better experiments (equipment, methodology, training of the analyst)
- A large number of replicate samples

• Random errors show a Gaussian distribution for a large number of replicates
• They can be described using statistical parameters

Systematic Error (determinate error)

• Known cause:
- Operator
- Calibration of glassware, sensor, or instrument

- A result of a bias in one direction (+ or -)
- When determined, it can be corrected
- May be of a constant or proportional nature

To detect a systematic error:
• Use standard reference materials
• Run a blank sample
• Use different analytical methods
• Participate in "round robin" experiments (different labs and people running the same analysis)

Types of Error
Proportional error influences the slope.
Constant error influences the intercept.

If the nature of the error is not known (random or systematic?) then the following rules will apply:

Accumulated Error

Addition and subtraction

When adding or subtracting measurements, the absolute errors are added.

Example 1:

                                   x            Δx
mass of beaker plus sample     21.1184 g    ± 0.0003 g
mass of empty beaker           15.8465 g    ± 0.0003 g
mass of sample                  5.2719 g    ± 0.0006 g (errors added!)

(21.1184 ± 0.0003) g - (15.8465 ± 0.0003) g = (5.2719 ± 0.0006) g

Multiplication and division

When multiplying or dividing measurements, the relative errors are added. Consequently the absolute errors of the measurements must first be converted to relative errors.

Example 2:

A = (1.56 ± 0.04) cm,  ΔA = 0.04 cm,  ΔA/A = 0.04 cm / 1.56 cm = 0.0256

B = (15.8 ± 0.2) cm²,  ΔB = 0.2 cm²,  ΔB/B = 0.2 cm² / 15.8 cm² = 0.0127

Product of A and B: AB = (1.56 cm)(15.8 cm²) = 24.648 cm³ = 24.6 cm³ to 3 SF

Adding relative errors: Δ(AB)/AB = ΔA/A + ΔB/B = 0.0256 + 0.0127 = 0.0383 ≈ 0.04

The % relative error in the product AB is therefore ≈ 4 %
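As a quick check of the two worked examples above, here is a minimal Python sketch of these worst-case propagation rules (absolute errors added for sums and differences, relative errors added for products and quotients). The data are the numbers from the examples; the function names are only illustrative.

```python
# Worst-case error propagation when the nature of the errors is unknown:
# sums/differences add absolute errors, products/quotients add relative errors.

def add_sub_error(*abs_errors):
    """Absolute error of a sum or difference."""
    return sum(abs_errors)

def mul_div_rel_error(*pairs):
    """Relative error of a product or quotient from (value, abs_error) pairs."""
    return sum(err / abs(val) for val, err in pairs)

# Example 1: mass of sample = (21.1184 - 15.8465) g
mass = 21.1184 - 15.8465
mass_err = add_sub_error(0.0003, 0.0003)
print(f"mass = {mass:.4f} +/- {mass_err:.4f} g")      # 5.2719 +/- 0.0006 g

# Example 2: product AB
A, dA = 1.56, 0.04       # cm
B, dB = 15.8, 0.2        # cm^2
AB = A * B
rel = mul_div_rel_error((A, dA), (B, dB))
print(f"AB = {AB:.1f} cm^3, relative error = {rel:.4f} (about {100 * rel:.0f} %)")
```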

[Flow diagram] Sampling → Preparation → Analysis
- Sampling: representative sample; homogeneous vs. heterogeneous
- Preparation: loss; contamination (unwanted addition)
- Analysis: measurement of the analyte; calibration of the instrument or standard solutions

How about sampling a chocolate chip cookie?

1. Static error
2. Dynamic error
3. Insertion and loading errors
4. Instrument error
5. Human error
6. Theoretical error
7. Miscellaneous error

Repeatability, reproducibility

The closeness of agreement between independent results obtained with the same

method on identical test material,

• under the same conditions (same operator, same apparatus, same laboratory and after short intervals of time) (repeatability).

• under different conditions (different operators, different apparatus, different laboratories and/or after different intervals of time) (reproducibility).

Accuracy and Precision

True value – standard or reference of known value or a theoretical value

Accuracy – closeness to the true value

Precision – reproducibility or agreement with each other for multiple trials.

Accuracy vs. Precision

Accuracy:
• Only obtained if measured values agree with the true value
• Must reduce systematic and random error to improve accuracy
• Always requires the use of, or comparison to, a known standard

Precision:
• Describes the range of spread of the individual measurements from the average value for the series
• Describes the reproducibility of the measurement
• Improves with reduction in random error

Exercise 1

Fig. 1:

Exercise 2

Exercise 3

Exercise 4

Exercise 5

What kind of error?

Chapter 2:

Descriptive statistics

• How do you assess the total error?

- One way to assess total error is to treat a reference standard as a sample.

- The reference standard would be carried through the entire process to see how close the results are to the reference value.

Accuracy and Precision

Nature of accuracy and precision (the center of the target is the true value):

Both accurate and precise:
• Mathematical: small standard deviation or %CV; small % error
• Scientific: very small error in measurement; all results cluster around the true value (remember, a standard or true value is needed)

Precise only:
• Mathematical: small standard deviation or %CV; large % error
• Scientific: clustered multiple measurements but consistently off from the true value; the calibration of the probe or other measuring device is off, or there is an unknown systematic error

Neither accurate nor precise:
• Mathematical: large standard deviation or %CV; large % error
• Scientific: the shot-gun effect; get a new measurement system or operator

Expressing accuracy and precision

Accuracy:
• Mean (average)
• Percent error

Precision:
• Range
• Deviation
• Standard deviation
• Percent coefficient of variation

(See also Chapter 3)

Population vs. sample

• Population = the entire collection of items, e.g. all 100 mg vitamin C tablets produced
• Sample = a portion of the population, e.g. a bottle of vitamin C pills

Generally only data for samples are available, since it is generally impossible to obtain data for the whole population.

Standard Deviation of the…

• Population: the actual variation in the population
• Sample (part of the population): estimates the variation in the population; may not be a representative sample

Population standard deviation:

σ = sqrt[ Σ_{i=1..N} (x_i - µ)² / N ]

Sample standard deviation:

s = sqrt[ Σ_{i=1..N} (x_i - x̄)² / (N - 1) ]    where    x̄ = (1/N) Σ_{i=1..N} x_i
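A minimal Python sketch of the two formulas above, using hypothetical replicate results (the standard library's statistics.pstdev and statistics.stdev give the same values):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pop_std(xs):
    """Population standard deviation (divide by N)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sample_std(xs):
    """Sample standard deviation (divide by N - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Hypothetical replicate results
data = [10.12, 10.08, 10.15, 10.10, 10.09]
print(mean(data), pop_std(data), sample_std(data))
```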

Why divide by N-1 when calculating "s"?

• N-1 = degrees of freedom (DF) of the sample: the number of independent values on which a result is based, or the number of values in the final calculation of a statistic that are free to vary
- for a population, DF = N
- for a sample, DF = N-1
• One DF is lost when calculating the average of the sample.

More on DFs

To calculate the standard deviation of a random sample, we must first calculate the mean of that sample and then compute the sum of the squared deviations from that mean.

While there will be n such squared deviations, only (n - 1) of them are, in fact, free to assume any value whatsoever. This is because the final squared deviation from the mean must involve the one value of x such that the sum of all the x's divided by n equals the obtained mean of the sample. All of the other (n - 1) squared deviations from the mean can, theoretically, have any values whatsoever.

For these reasons, the standard deviation of a sample is said to have only (n - 1) degrees of freedom.

Population Data

For an infinite set of data, as N → ∞, x̄ → µ (the population mean) and s → σ (the population standard deviation).

The experiment that produces a small standard deviation is more precise .

Remember, greater precision does not imply greater accuracy.

Experimental results are commonly expressed in the form: mean ± standard deviation (x̄ ± s).

Standard deviation of the mean (standard error)

• When the standard deviation of several mean values is taken, the amount of deviation between the mean values will be reduced by a factor proportional to the square root of the number of data points (N) present in each set used to calculate each mean value

• s = standard deviation between individual values

• sm = standard deviation between mean values

s_m = s / √N

Other ways of expressing the precision of the data:

• Variance = s²

• Relative standard deviation: RSD = s / x̄

• Percent RSD (coefficient of variation): %RSD = (s / x̄) × 100
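A short Python sketch putting the quantities above together (standard deviation of the mean, RSD, %RSD) for a set of hypothetical replicate measurements:

```python
import statistics as st

# Hypothetical replicate measurements (e.g., mg/L)
data = [4.28, 4.21, 4.30, 4.36, 4.26]

n = len(data)
xbar = st.mean(data)
s = st.stdev(data)              # sample standard deviation (N - 1)

s_m = s / n ** 0.5              # standard deviation of the mean (standard error)
rsd = s / xbar                  # relative standard deviation
pct_rsd = 100 * rsd             # coefficient of variation, %

print(f"mean = {xbar:.3f}, s = {s:.3f}, s_m = {s_m:.3f}, %RSD = {pct_rsd:.2f}")
```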

Box-and-whisker plot (Minitab 14): the plot shows the median and range of the data; compare the large-variation and small-variation cases, and note any outliers.

The same rules apply to calculations involving standard deviations (assuming the standard deviation is due only to random errors).

If the errors are ALL KNOWN TO BE RANDOM ERRORS, then the following set of rules applies.
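The rules themselves are not reproduced in this excerpt; for purely random errors they are normally the quadrature rules (absolute uncertainties combine as the root sum of squares for addition and subtraction, relative uncertainties likewise for multiplication and division). A minimal Python sketch under that assumption, reusing the numbers from the earlier worst-case examples:

```python
import math

def quad_abs(*abs_errors):
    """Combine absolute uncertainties for addition/subtraction (random errors only)."""
    return math.sqrt(sum(e ** 2 for e in abs_errors))

def quad_rel(*pairs):
    """Combine relative uncertainties for multiplication/division from (value, abs_error) pairs."""
    return math.sqrt(sum((e / v) ** 2 for v, e in pairs))

# Same numbers as the worst-case examples, now assuming purely random errors:
mass_err = quad_abs(0.0003, 0.0003)             # ~0.00042 g instead of 0.0006 g
rel_err = quad_rel((1.56, 0.04), (15.8, 0.2))   # ~0.029 instead of 0.038
print(mass_err, rel_err)
```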

Significant Figures
• The number of digits reported in a measurement reflects the accuracy of the measurement and the precision of the measuring device.
• Results are reported to the fewest significant figures (for multiplication and division) or the fewest decimal places (for addition and subtraction).

• Which digits are significant:
1. Digits 1-9
2. Zeros between significant digits
3. Terminal zeros to the right of the decimal
4. Terminal zeros to the left of the decimal (two schools of thought)
5. Place-holding zeros (not significant)

Exception: logarithms, e.g. log x = 0.025.

Rounding off
• When the answer to a calculation contains too many significant figures, it must be rounded off. This approach to rounding off is summarized as follows:
- If the digit to be dropped is smaller than 5, drop it and leave the remaining number unchanged. Thus, 1.684 becomes 1.68.
- If the digit is larger than 5, drop it and add 1 to the preceding digit. Thus, 1.247 becomes 1.25.
- If the digit to be dropped is exactly 5, the number is rounded off to the nearest even digit.

Methods of Expressing Uncertainty in Results

A. Three methods:
1. Record the absolute uncertainty
2. Record the relative uncertainty in %
3. Use significant digits: record all accurately known digits plus one digit that is uncertain

B. The significant-digit method assumes that the last digit recorded is uncertain by ±1 unless stated differently.

Examples of presenting the data:

Measured weight: 9.82 ± 0.02385 g is reported as 9.82 ± 0.02 g

6051.78 ± 30 m/s is reported as 6050 ± 30 m/s

For stating uncertainty:
- Round the uncertainty to one significant figure, unless δx has 1 as its leading digit; e.g. if δx = 0.14, report δx = 0.14, not 0.1.
- For intermediate calculations, though, you should retain one more significant figure than is justified.

Chapter 3: Basic Distributions

What is a distribution? The pattern of variation of a variable is called its distribution, which can be described both mathematically and graphically. In essence, the distribution records all possible numerical values of a variable and how often each value occurs (its frequency). Distributions can be either discrete or continuous.

Which statistical test is appropriate will depend upon the distribution of your data.

From: http://stat.tamu.edu/stat30x/notes/node16.html

Types of Distributions
- Binomial distribution
- Normal distribution
- Poisson distribution
- Exponential distribution
- Logistic distribution
- t-distribution
- Chi-squared distribution
- F-distribution
- Gamma distribution
- Hypergeometric distribution
- Laplace distribution

Note that distributions can be either discrete or continuous.

Binomial Distribution Graphic

From http://mathworld.wolfram.com/BinomialDistribution.html

For a large number of experiment replicates, the results approach an ideal smooth curve called the GAUSSIAN or NORMAL DISTRIBUTION CURVE.

Characterised by:
- the mean value x̄, which gives the center of the distribution
- the standard deviation s, which measures the width of the distribution

The Gaussian curve equation:

y = [1 / (σ√(2π))] · exp[ -(x - µ)² / (2σ²) ]

The Gaussian curve whose area is unity is called a normal error curve; in that case µ = 0 and σ = 1.

The factor 1/(σ√(2π)) is the normalization factor: it guarantees that the area under the curve is unity.

The probability of measuring a value in a certain range equals the area under the curve over that range.

Gaussian Distribution of Random Errors (Population)

Another way to represent a Gaussian distribution is to relate it to a new variable, z, on the x-axis:

z = (x - µ) / σ    (estimated in practice by (x - x̄) / s)

where z is the deviation of a data point from the mean, stated in units of standard deviation.

Gaussian Distribution of Random Errors

Range     Percentage of measurements
µ ± 1σ    68.3
µ ± 2σ    95.5
µ ± 3σ    99.7
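These percentages follow from the area under the Gaussian curve; a minimal Python check using the error function (not part of the original notes, just a verification sketch):

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a Gaussian population within mu +/- k*sigma."""
    # P(|z| < k) = erf(k / sqrt(2)) for the standard normal distribution
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {100 * coverage(k):.1f} %")
# -> 68.3 %, 95.4 %, 99.7 %  (the 95.45 % value is often quoted as 95.5 %)
```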

The more times you measure, the more confident you are that your average value is approaching the "true" value. The uncertainty decreases in proportion to 1/√n.

The standard deviation measures the width of the Gaussian curve (the larger the value of σ, the broader the curve).

Normal distribution

Data Transformation

What do you do if your data are not normally distributed?
- Use a non-parametric test
- Transform your data:
  - Logarithmic transformation: x → log(x + 1)
  - Power transformation: e.g. x → √x
  - Angular transformation: e.g. x → arcsin(√x)

Poisson Distribution
Typically used to model the number of random occurrences of some phenomenon in a specified unit of space or time, e.g. the number of birds seen in a 10 min period. Can usually be approximated by a normal distribution.

Exponential Distribution
Describes a sample where y = x^a. Messy to work with, but can (sometimes) be transformed, or you can use a non-parametric test.

Logistic Distribution
Typically describes a sample that fits y = log(x). Again, messy to work with, but can (sometimes) be transformed, or you can use a non-parametric test.

t-Distributions

Compared with the normal distribution:
- As N (DF) increases, the t-distribution is less spread out.
- At large N, the t-distribution approaches the shape of the Gaussian distribution.

t-distribution (1-sided)

F-Distribution
A distribution that typically arises when testing whether two variables have the same variance. It is the ratio of two independent chi-squared statistics. ANOVAs are based on F-distributions.

Chi-squared Distribution
This is also based upon degrees of freedom. It can be used to approximate many different distributions; for example, it may be used to approximate the sampling distribution of the likelihood ratio statistic (may cover this later).

Chi-square distribution examples

Estimating Random Error

The random error (Δx) in a set of data can be estimated by multiplying s_m by a statistical factor from the Student t-distribution:

Δx = t_{p,ν} · s_m = t_{p,ν} · s / √N

Confidence intervals

x̄ ± Δx at a given confidence level (say 95%) implies that the true value will be found within ±Δx of the calculated mean:

µ = x̄ ± t_{p,ν} · s / √N
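A minimal Python sketch of the confidence-interval calculation above, assuming SciPy is available for the t critical value and using hypothetical replicate results:

```python
import statistics as st
from scipy import stats

# Hypothetical replicate results (ppm)
data = [3.15, 3.21, 3.18, 3.26, 3.12]

n = len(data)
xbar = st.mean(data)
s = st.stdev(data)
t95 = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value
half_width = t95 * s / n ** 0.5

print(f"{xbar:.3f} +/- {half_width:.3f} ppm (95% confidence interval)")
```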

Chapter 4: Significance Tests

• Hypothesis testing:

• The F-test compares levels of PRECISION

• The t-test compares levels of ACCURACY

How many samples/replicates to analyze?

Rearranging Student's t equation gives the required number of replicate analyses:

x̄ - µ = ± t·s / √n    →    n = t²·s² / e²

where
µ = true population mean
x̄ = measured mean
n = number of samples needed
s² = variance of the sampling operation
e = sought-for uncertainty (x̄ - µ)

Since the degrees of freedom are not known at this stage, the value t = 1.96 (for n → ∞) is used to estimate n. The process is then repeated a few times, updating t for the current estimate of n, until a constant value for n is found.
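A minimal Python sketch of that iteration (SciPy assumed available for the t critical value; the function name and the numbers in the example are only illustrative):

```python
import math
from scipy import stats

def replicates_needed(s, e, confidence=0.95, iterations=10):
    """Estimate n = t^2 s^2 / e^2, iterating because t itself depends on n."""
    t = 1.96                          # first pass: t for n -> infinity
    n = 2
    for _ in range(iterations):
        n_new = math.ceil(t ** 2 * s ** 2 / e ** 2)
        if n_new == n:                # converged
            break
        n = max(n_new, 2)
        t = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return n

# Example: sampling standard deviation s = 0.3 %, target uncertainty e = 0.2 %
print(replicates_needed(0.3, 0.2))    # typically settles around 11-12 replicates
```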

Comparing a mean value to the true value (one-sample t-test)

• Calculate a "t" value as shown below:

t_calc = |x̄ - µ| · √N / s

• Compare it to the value of t in a t-table at the appropriate confidence level and DF (N - 1).

• If t_calc > t_table, the two results are significantly different.

Equivalently, the result agrees with the true value if µ lies inside the interval x̄ ± t·s / √N.
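A short Python sketch of this one-sample test with hypothetical data (SciPy assumed available for the critical value; scipy.stats.ttest_1samp gives the same t statistic plus a p-value):

```python
import statistics as st
from scipy import stats

# Hypothetical check of a method against a certified value of 5.00 %
data = [5.08, 5.02, 5.11, 5.05, 4.99]
mu = 5.00

n = len(data)
t_calc = abs(st.mean(data) - mu) * n ** 0.5 / st.stdev(data)
t_table = stats.t.ppf(0.975, df=n - 1)      # 95% confidence, two-sided

print(f"t_calc = {t_calc:.2f}, t_table = {t_table:.2f}")
print("significant difference" if t_calc > t_table else "no significant difference")
```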

Comparing two sets of data
• Comparison of means (t-test):
- unpaired data: samples from the same population, e.g. comparing the results of the analysis of water samples performed by two different labs (water samples from the same population)
- paired data: samples from different populations, e.g. comparing cholesterol levels in different individuals using two analytical methods
• Comparison of variances (F-test): unpaired data only

Comparison of variances (F-test)

- Calculate F, placing the larger variance in the numerator so that F ≥ 1:

F_calc = s₁² / s₂²

- Compare F_calc with F_table.
- If F_calc > F_table (two-tailed test), then s₁ and s₂ are significantly different (P value < α-level = 0.05); otherwise the two precisions are statistically comparable.
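A minimal Python sketch of the F-test with hypothetical replicate data (SciPy assumed available for the critical value):

```python
import statistics as st
from scipy import stats

# Hypothetical replicate results from two methods
a = [4.10, 4.15, 4.08, 4.12, 4.11]
b = [4.05, 4.20, 3.98, 4.25, 4.09]

s1sq, s2sq = st.variance(a), st.variance(b)
if s1sq < s2sq:                          # larger variance goes on top
    s1sq, s2sq = s2sq, s1sq
    df1, df2 = len(b) - 1, len(a) - 1
else:
    df1, df2 = len(a) - 1, len(b) - 1

F_calc = s1sq / s2sq
F_table = stats.f.ppf(0.975, df1, df2)   # upper critical value, two-tailed alpha = 0.05

print(f"F_calc = {F_calc:.2f}, F_table = {F_table:.2f}")
print("variances differ" if F_calc > F_table else "variances comparable")
```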

Which type of t-test should be used?

Comparing two means (unpaired data)

• Textbook method:
1. Comparison of variances (F-test)
2. Comparison of means

• If s₁ and s₂ are not significantly different, pool the standard deviations as shown below.
• Once t_calc is determined, compare it to t_table, with t_table determined for f = n₁ + n₂ - 2.
• If t_calc > t_table, then the difference is significant (P value < α-level = 0.05).

s_pooled = sqrt[ ( (n₁ - 1)·s₁² + (n₂ - 1)·s₂² ) / (n₁ + n₂ - 2) ]

t_calc = ( |x̄₁ - x̄₂| / s_pooled ) · sqrt[ n₁·n₂ / (n₁ + n₂) ]
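A minimal Python sketch of the pooled (equal-variance) comparison above, with hypothetical data from two labs; scipy.stats.ttest_ind(lab1, lab2, equal_var=True) reproduces the same t statistic:

```python
import math
import statistics as st
from scipy import stats

# Hypothetical results from two labs analysing the same material
lab1 = [12.1, 12.4, 12.3, 12.2, 12.5]
lab2 = [12.6, 12.8, 12.5, 12.9]

n1, n2 = len(lab1), len(lab2)
s_pooled = math.sqrt(((n1 - 1) * st.variance(lab1) + (n2 - 1) * st.variance(lab2))
                     / (n1 + n2 - 2))
t_calc = abs(st.mean(lab1) - st.mean(lab2)) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
t_table = stats.t.ppf(0.975, df=n1 + n2 - 2)

print(f"t_calc = {t_calc:.2f}, t_table = {t_table:.2f}")
```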

Comparing two means (unpaired data) (cont.)

• Textbook method:
- If s₁ and s₂ are NOT statistically comparable (the F-test fails), i.e. s₁ and s₂ are significantly different, then t_calc and the DF for t_table need to be determined as follows.
- If t_calc > t_table, then the difference is significant (P value < α-level = 0.05).

t_calc = |x̄₁ - x̄₂| / sqrt( s₁²/n₁ + s₂²/n₂ )

DF = [ (s₁²/n₁ + s₂²/n₂)² / ( (s₁²/n₁)²/(n₁ + 1) + (s₂²/n₂)²/(n₂ + 1) ) ] - 2
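A Python sketch implementing the unequal-variance formulas reconstructed above (hypothetical data; note that some texts instead use the Welch-Satterthwaite DF with (n - 1) denominators and no "- 2" term, which is what scipy.stats.ttest_ind(..., equal_var=False) uses):

```python
import math
import statistics as st
from scipy import stats

def unequal_variance_t(x, y, confidence=0.95):
    """t_calc and approximate DF for two means when the variances differ."""
    n1, n2 = len(x), len(y)
    v1, v2 = st.variance(x) / n1, st.variance(y) / n2
    t_calc = abs(st.mean(x) - st.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 + 1) + v2 ** 2 / (n2 + 1)) - 2
    t_table = stats.t.ppf((1 + confidence) / 2, df=round(df))
    return t_calc, df, t_table

# Hypothetical data with clearly different spreads
t_calc, df, t_table = unequal_variance_t([7.2, 7.3, 7.25, 7.28],
                                          [7.6, 8.1, 6.9, 7.9, 8.3])
print(f"t_calc = {t_calc:.2f}, DF ~ {df:.1f}, t_table = {t_table:.2f}")
```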

Comparing two means (unpaired data) (cont.)

• Mosi method:
- Calculate the confidence interval for each mean.
- Compare the confidence intervals.
- The results are statistically comparable if the intervals overlap such that each interval overlaps with the mean value of the other interval, as shown in the diagram below.

Comparing two means (paired data)

• Calculate the differences between PAIRS of data: d_i = x_Ai - x_Bi
- values can be either + or -; do not take absolute values of the differences!
• Calculate the average (d̄) and standard deviation (s_d) of the differences.
• Calculate a t-value as shown below.
• If t_calc > t_table, the two results are significantly different (f = N - 1).

t_calc = |d̄| · √N / s_d

s_d = sqrt[ Σ (d_i - d̄)² / (n - 1) ]
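A minimal Python sketch of the paired comparison with hypothetical data; scipy.stats.ttest_rel gives the same t statistic:

```python
import statistics as st
from scipy import stats

# Hypothetical cholesterol results (mg/dL) for the same patients by two methods
method_A = [205, 188, 241, 196, 220, 178]
method_B = [201, 190, 235, 192, 217, 175]

d = [a - b for a, b in zip(method_A, method_B)]   # keep the signs
n = len(d)
t_calc = abs(st.mean(d)) * n ** 0.5 / st.stdev(d)
t_table = stats.t.ppf(0.975, df=n - 1)

print(f"t_calc = {t_calc:.2f}, t_table = {t_table:.2f}")
```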

Chapter 5: ANOVA (analysis of variance)

• The t-distribution is used to test the hypothesis of no difference between two population/sample means.
• If we wish to know about the relative effect of three or more different "treatments", can the t-distribution still be used?
• The t-test is inadequate in several ways:
- Any statistic that is based on only part of the evidence (as is the case when any two groups are compared) is less stable than one based on all of the evidence.
- There are so many comparisons that some will be significant by chance.
- It is tedious to compare all possible combinations of groups.

The logic of ANOVA
• Hypothesis testing in ANOVA is about whether the means of the samples differ more than you would expect if the null hypothesis were true.
• This question about means is answered by analyzing variances. Among other reasons, you focus on variances because when you want to know how several means differ, you are asking about the variances among those means.
• ANOVA is also used for the evaluation of main and interaction effects.

Some ANOVA notes
• Hypotheses: H0: µ1 = µ2 = µ3 … = µk
  Ha: at least 2 of the means differ (this does NOT mean that all population means differ)
• The term "variance" refers to the statistical method being used, not the hypothesis being tested (ANOVA does NOT test whether the variances of the groups are different).
• The P value in an ANOVA has many tails.
• Reported as: (one-way ANOVA, F(df between groups, df within groups) = …, p = …)

Assumptions of ANOVA

• Samples are randomly selected from larger populations.

• Sample groups are independent.

• Observations within each sample were obtained independently.

• The data from each population is normally distributed.

Two Sources of Variability
• In ANOVA, an estimate of the variability between groups is compared with the variability within groups.
- Between-group variation is the variation among the means of the different treatment conditions, due to chance (random sampling error) and treatment effects, if any exist.
- Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.

[ANOVA diagram] Total variation among scores = between-groups variation (variation due to chance and treatment effect, if any exists) + within-groups variation (variation due to chance).

Variability Between Groups

• There is a lot of variability from one mean to the next.
• Large differences between means probably are not due to chance.
• It is difficult to imagine that all six groups are random samples taken from the same population.
• The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.

One-way ANOVA formula
• The one-way ANOVA fits the data to this model:

Y_ij = grand mean + group effect + ε_ij

- Y_ij = the value for the i-th subject in the j-th group
- Group effect = the difference between the mean of that group (population) and the grand mean
- Each ε_ij is a random value from a normally distributed population with a mean of 0

The F Ratio

[Diagram] Total variation among scores is split into between-groups variation (mean squares between: variation due to chance and treatment effect, if any exists) and within-groups variation (mean squares within: variation due to chance).

F = between-group variability / within-group variability = MS_between / MS_within

The F Ratio

F = MS_between / MS_within

MS_between = SS_between / df_between        MS_within = SS_within / df_within

SS_total = SS_between + SS_within

The F Ratio: SS Between

SS_between = Σ (T² / n) - G² / N = Σ n_group · (x̄_group - x̄_grand)²

where T is each group total (square it and divide by the number of subjects in that group), G is the grand total (add all of the scores together, then square the total), and N is the total number of subjects.

The F Ratio: SS Within

SS_within = Σ x² - Σ (T² / n) = Σ (x - x̄_group)²

where Σ x² means square each individual score and then add up all of the squared scores, and each T²/n term is the squared group total divided by the number of subjects in that group.

The F Ratio: SS Total

SS_total = Σ x² - G² / N = Σ (x - x̄_grand)² = SS_between + SS_within

Degrees of Freedom:
Between: df_between = number of groups - 1
Within:  df_within = total number of subjects - total number of groups
         = (n₁ - 1) + (n₂ - 1) + (n₃ - 1) + …
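A minimal Python sketch of a one-way ANOVA built from the sums of squares above, with hypothetical data for three treatments; scipy.stats.f_oneway(*groups) gives the same F and p:

```python
import statistics as st
from scipy import stats

# Hypothetical results (e.g., % recovery) from three treatments
groups = [
    [98.1, 99.0, 98.5, 98.8],
    [97.2, 97.8, 97.5, 97.1],
    [99.2, 99.6, 99.1, 99.5],
]

all_x = [x for g in groups for x in g]
grand_mean = st.mean(all_x)
N, k = len(all_x), len(groups)

ss_between = sum(len(g) * (st.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - st.mean(g)) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
p = stats.f.sf(F, df_between, df_within)

print(f"F({df_between},{df_within}) = {F:.2f}, p = {p:.4f}")
```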

Two-Way ANOVA

• Two-way ANOVA uses the same error term as one-way ANOVA: the average of the within-cell variances (SS_WC / df_WC).
• The difference is that the between-cell SS is partitioned into each main effect (rows, columns) and the interaction: SS_R, SS_C, SS_RxC.


Latin square
• Latin squares have counterbalancing built in: the number of rows equals the number of columns.
• The letter representing each treatment appears in each column and row only once.
• Effects of treatment, order and sequence are isolated (systematic counterbalancing).

  Order:  1  2  3
  seq 1:  A  B  C
  seq 2:  B  C  A
  seq 3:  C  A  B


Chapter 6: Correlation and Linear Regression Analysis

6.1. Bivariate correlation:
- is used to measure the strength of the linear relationship between variables;
- measures how variables or rank orders are related;
- computes Pearson's correlation coefficient and Spearman's rank correlation.

Assumptions
- Subjects are representative of a larger population.
- Paired samples (there must be 2 variables) are independent observations.
- X and Y values must be measured independently.
- X values are measured but not controlled.
- Normal distribution (if not, use Spearman's rank correlation).
- All covariation must be linear.

Note that outliers have a large influence on correlation.

Scatter Diagram

- Designate one variable X and the other Y. Although it does not matter which is which, in cases where one variable is used to predict the other, X is the "predictor" variable (the variable you are predicting from).
- Draw axes of equal length for your graph. Determine the range of values for each variable. Place the high values of X to the right on the horizontal axis and the high values of Y toward the top of the vertical axis. Label convenient points along each axis.
- For each pair of scores, find the point of intersection for the X and Y values and indicate it with a dot.

Pearson correlation

- Computes the correlation coefficient (r), which indicates how strongly the variables are linearly related in a sample.
- The significance test for "r" reveals whether there is a linear relationship between the variables in the population.
- Pearson's r assumes an underlying linear relationship (a relationship that can best be represented by a straight line). Not all relationships are linear.

Correlation Analysis

With a simple two variable correlation, you need to know the strength and direction of the correlation

Scatterplots help illustrate the relationships between variables

Pearson's r

Definitional formula:

r = COV_XY / (s_x · s_y)    where    COV_XY = Σ (X - X̄)(Y - Ȳ) / (n - 1)

Computational formula:

r = [ n ΣXY - (ΣX)(ΣY) ] / sqrt{ [ n ΣX² - (ΣX)² ] · [ n ΣY² - (ΣY)² ] }

In words: r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
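A minimal Python sketch of the computational formula with hypothetical data; scipy.stats.pearsonr(X, Y) returns the same r together with a p-value:

```python
import math

# Hypothetical paired data
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2, sum_y2 = sum(x * x for x in X), sum(y * y for y in Y)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(f"r = {r:.4f}")
```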

Strength of Relationship

How can we describe the strength of the relationship in a scatter diagram? Pearson's r is a number between -1 and +1 that indicates the relationship between two variables. The sign (- or +) indicates the direction of the relationship; the number indicates the strength of the relationship.

-1 (perfect relationship) ------------ 0 (no relationship) ------------ +1 (perfect relationship)

Spearman Rank Correlation

The correlation coefficient r_s is the best-known and easiest technique. It is given by the equation:

r_s = 1 - 6 Σd² / [ N (N² - 1) ]

where d is the difference between rankings in the two ranking methods.

When N ≥ 10, r_s can be used to calculate a t-score, and the resulting t-score is used in a two-tailed test of significance.

Kendall Rank Correlation Coefficient (τ)

- More complicated than the Spearman rank correlation.
- Should be used when three or more sets of rankings are compared.
- Calculated as the proportion of concordant pairs minus the proportion of discordant pairs.
- For two bivariate observations (x_i, y_i) and (x_j, y_j): concordant pairs are those where (x_i - x_j)(y_i - y_j) is positive; discordant pairs are those where (x_i - x_j)(y_i - y_j) is negative.
- Scores range from -1 to 1.

Goodman and Kruskal's Lambda (λ)

- λ is used when nominal scales are used; the Spearman and Kendall rank correlations won't work because the ordering element is missing with nominal scales.
- λ can be calculated by statistical packages.

Partial Correlations (r_P)

- Indicate the degree to which two variables are linearly related in a sample, partialling out the effects of one or more control variables.
- To interpret the partial correlation between two variables, we must first know the bivariate correlation between them.
- To conduct a partial correlation, there must be at least three variables.

Partial correlation can be used in the following ways:
- Partial correlation between two variables
- Partial correlation among multiple variables within a set
- Partial correlation between sets of variables

Method of Least Squares: Assumptions

- The uncertainties in the y-values are greater than those in the x-values.
- The line representing the data should be drawn so that the deviations of the y-values are minimized.
- Thus the best-fit line (least-squares line) is the straight line that minimizes the vertical deviations (residuals) between the points and the line. Deviations can be positive or negative, so we minimize the sum of the squares of the deviations.

The linear relationship between the analyte content and the measured signal:

y = mx + b,    or    signal = m (conc.) + S_blank

That is, we draw the straight line that has the least value for the sum of the squares of the deviations.

Least-squares method

Linear regression: y = b + mx

m = Σ (x_i - x̄)(y_i - ȳ) / Σ (x_i - x̄)²  =  [ n Σ x_i y_i - Σ x_i · Σ y_i ] / [ n Σ x_i² - (Σ x_i)² ]

b = ȳ - m x̄  =  [ Σ x_i² · Σ y_i - Σ x_i · Σ x_i y_i ] / [ n Σ x_i² - (Σ x_i)² ]

Finding S_y, S_m, S_b

Linear regression with uncertainties: y = (b ± S_b) + (m ± S_m) x

S_y = sqrt[ Σ d_i² / (n - 2) ]        where d_i = y_i - (b + m x_i), the residuals

S_m² = n S_y² / [ n Σ x_i² - (Σ x_i)² ]

S_b² = S_y² Σ x_i² / [ n Σ x_i² - (Σ x_i)² ]
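A minimal Python sketch of the least-squares slope, intercept and their uncertainties using the formulas above, with hypothetical calibration data (numpy.polyfit or scipy.stats.linregress would give the same slope and intercept):

```python
import math

# Hypothetical calibration data: concentration (x) vs. signal (y)
x = [0.0, 2.0, 4.0, 6.0, 8.0]
y = [0.02, 0.41, 0.80, 1.22, 1.58]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
D = n * sxx - sx ** 2

m = (n * sxy - sx * sy) / D                      # slope
b = (sxx * sy - sx * sxy) / D                    # intercept

residuals = [yi - (b + m * xi) for xi, yi in zip(x, y)]
s_y = math.sqrt(sum(d * d for d in residuals) / (n - 2))
s_m = math.sqrt(n * s_y ** 2 / D)
s_b = math.sqrt(s_y ** 2 * sxx / D)

print(f"y = ({b:.3f} +/- {s_b:.3f}) + ({m:.4f} +/- {s_m:.4f}) x")
```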

Important Parameters in Instrumental Analysis

1) Sensitivity

2) Detection Limit

3) Dynamic Range

4) Selectivity

5) Signal-to-noise Ratio

Detection Limit (LOD)

LOD: the minimum analyte concentration that can be determined with statistical confidence. The analytical signal must be statistically greater than the random noise of the blank (i.e. analytical signal = 2 or 3 times the SD of the blank measurement, approximately equal to the peak-to-peak noise).

Calculation of the LOD:
The minimum detectable analytical signal (S_m) is given by: S_m = S̄_bl + k·SD_blank

To determine it experimentally:
- Perform 20-30 blank measurements over an extended period of time.
- Calculate S̄_bl (the mean blank signal) and SD_blank.
- The detection limit (C_m) is: C_m = (S_m - S̄_bl) / m = k·SD_blank / m   (m = calibration slope)
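A minimal Python sketch of this LOD calculation; the blank signals, slope and units are hypothetical:

```python
import statistics as st

# Hypothetical blank signals (instrument response) and calibration slope
blank_signals = [0.012, 0.015, 0.011, 0.014, 0.013, 0.016, 0.012, 0.015,
                 0.013, 0.014, 0.012, 0.015, 0.013, 0.011, 0.016, 0.014,
                 0.013, 0.012, 0.015, 0.014]
slope = 0.052            # signal per (ug/L), from the calibration curve

S_bl = st.mean(blank_signals)
SD_bl = st.stdev(blank_signals)
k = 3                    # common choice for the LOD

S_m = S_bl + k * SD_bl                 # minimum detectable signal
C_m = (S_m - S_bl) / slope             # = k * SD_bl / slope, detection limit in ug/L
print(f"LOD = {C_m:.3f} ug/L")
```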

LOQ, LOL, Dynamic Range

• LOQ (limit of quantitation): the lowest concentration at which quantitative measurements can reliably be made; commonly defined by a signal equal to the average blank signal plus 10 × the standard deviation of the blank.

• LOL (limit of linearity): the point where the signal is no longer proportional to concentration.

• Dynamic range: from the LOQ to the LOL.

(Figure: calibration plot with C_m, the detection limit, marked.)

Sensitivity

Indicates the response of the instrument to changes in analyte concentration; a measure of a method's ability to distinguish between small differences in concentration in different samples. In other words, the change in analytical signal per unit change in analyte concentration.

Sensitivity is affected by the slope of the calibration curve and by the precision:
• For two methods with equal precision, the one with the steeper calibration curve is more sensitive (calibration sensitivity).
• If two methods have calibration curves with equal slopes, the one with higher precision is more sensitive (analytical sensitivity).

Calibration Sensitivity

- The slope of the calibration curve: S = mc + S_bl (m = slope; c = concentration; S_bl = blank signal)
- Advantage: sensitivity is independent of analyte concentration.
- Disadvantage: does not account for the precision of individual measurements.

Analytical Sensitivity (defined by Mandel and Stiehler)

- Includes precision in the sensitivity definition: g = m / S_s (m = slope; S_s = standard deviation of the measurement)
- Advantage: insensitive to amplification factors, i.e. increasing the gain increases m, but S_s also increases by the same factor, hence g stays constant.
- Disadvantage: concentration dependent, as S_s usually varies with analyte concentration.

Selectivity
The degree to which a measurement is free from interference by other species contained in the matrix.
• The analytical signal detected is the sum of the analyte signal plus interference signals:
  S = m_a·C_a + m_b·C_b + m_c·C_c + S_blank
• Selectivity is a measure of how easy it is to distinguish between the analyte signal and the interference signal.
• The selectivity of an analytical method can be described using a figure of merit called the selectivity coefficient:
  k_b,a = m_b / m_a ;  k_c,a = m_c / m_a
  S = m_a (C_a + k_b,a·C_b + k_c,a·C_c) + S_blank
• Selectivity coefficients range from 0 to values much greater than 1, and can be negative if an interference reduces the observed signal.

Standard Addition Calibration

Most useful when analyzing complex samples for which significant matrix effects are possible. The most common form is adding one or more aliquots of standard to aliquots of the sample (the sample is spiked). If the sample is limited, the standard can be added to a single sample aliquot.

For a single addition, the measured signal is S = k (V_s·C_s + V_x·C_x) / V_t,

where k is a proportionality constant relating signal to concentration, V_s is the volume of standard added at a concentration C_s, V_x is the volume of unknown (aliquot) added at a concentration C_x, and V_t is the total (final) volume.

The Standard Addition Method (Spiking)

Technique to be used when:
- samples have substantial matrix effects;
- the assay requires instrumental conditions that are difficult to control.

Procedure:
• A measurement is made on a portion of the sample.
• Varying but known amounts (called spikes) of the assayed substance are added to several equal portions of the sample (standard addition).
• Each solution is diluted to the same volume and measured.
• The assay measurement is then plotted as a function of the concentration of the spike.
• The resulting plot is extrapolated to the concentration axis (i.e. the x-axis).
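A minimal Python sketch of that extrapolation: fit signal vs. added concentration by least squares and read the magnitude of the x-intercept (b/m) as the analyte concentration in the measured (diluted) solutions. The numbers are hypothetical:

```python
import statistics as st

# Hypothetical standard-addition data: added concentration (ug/mL) vs. signal,
# all solutions diluted to the same final volume.
added = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.215, 0.315, 0.410, 0.513, 0.610]

xbar, ybar = st.mean(added), st.mean(signal)
m = sum((x - xbar) * (y - ybar) for x, y in zip(added, signal)) / \
    sum((x - xbar) ** 2 for x in added)
b = ybar - m * xbar

# Extrapolate to the concentration axis: signal = 0 at x = -b/m,
# so the concentration in the measured solutions is b/m.
c_sample = b / m
print(f"sample concentration (in the measured solutions) = {c_sample:.2f} ug/mL")
```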

Internal Standard Method

An internal standard is a substance that is added in a constant amount to all samples, blanks and calibration standards in an analysis.

Procedure:
• A carefully measured quantity of the internal standard is introduced into each standard and sample.
• The solutions are diluted to the same volume and the analytical signal is measured.
• Calibration curve: plot the ratio of the analyte signal to the internal standard signal vs. the analyte concentration of the standards.
• The ratio for the samples is then used to obtain their analyte concentration from the calibration plot.

Internal Standard (IS)

Internal standards are essential if there is a time-varying instrumental response, and very useful if there are matrix effects.

Chapter 7: Quality Assurance / Quality Control

• QA: The planned measures that ensure a service or product meets minimum professional standards.

• QC: The day-to-day activities that monitor the quality of laboratory reagents, supplies and equipment.

• QA/QC: Proficiency Testing

Laboratory Accreditation

Validation

ISO 9000• An international set of standards for quality

management.

• Applicable to a range of organisations from manufacturing to service industries.

• ISO 9001 applicable to organisations which design, develop and maintain products.

• ISO 9001 is a generic model of the quality process that must be instantiated for each organisation using the standard.

ISO 9001

ISO 9000 certification

• Quality standards and procedures should be documented in an organisational quality manual.

• An external body may certify that an organisation’s quality manual conforms to ISO 9000 standards.

• Some customers require suppliers to be ISO 9000 certified although the need for flexibility here is increasingly recognised.

Documentation standards

• Particularly important - documents are the tangible manifestation of the software.

• Documentation process standards– Concerned with how documents should be

developed, validated and maintained.

• Document standards– Concerned with document contents, structure, and

appearance.

• Document interchange standards– Concerned with the compatibility of electronic

documents.

Document standards• Document identification standards

– How documents are uniquely identified.

• Document structure standards– Standard structure for project documents.

• Document presentation standards– Define fonts and styles, use of logos, etc.

• Document update standards– Define how changes from previous versions

are reflected in a document.

Quality in Environmental Analysis

• Value of Quality Control• General QC principles.

• Sources of error.

• Terminology and Definitions.

• Quality Control vs. Quality Assurance.

QC Terminology and Definitions

Principal Data Quality Indicators (DQIs):
- Precision
- Bias
- Accuracy
- Representativeness
- Comparability
- Completeness

Precision:
- The agreement between the numerical values of two or more measurements that have been made in an identical fashion.
- Calculated as a range or standard deviation.
- Intralaboratory and interlaboratory precision.

QC Terminology and Definitions

Bias:
- The systematic or persistent distortion of a measurement process that causes errors in one direction.

Accuracy:
- The measure of how close an individual or average measurement is to the true value.
- A combination of precision and bias.
- A reference material must be used in determining accuracy.

QC Terminology and Definitions

Representativeness:
- A measure of the degree to which data accurately and precisely represent a sampling point or process condition.
- A measure of how closely a sample represents a larger process.

Comparability:
- A qualitative term that expresses the confidence that two data sets can contribute to a common analysis.

Completeness:
- A measure of the amount of valid data obtained from a measurement system, expressed as a percentage of the valid measurements that should have been collected (i.e., measurements that were planned to be collected).

Quality Control vs. Quality Assurance

- QC is a component of QA.

- QC measures and estimates errors in a system.

- QA is the ability to prove that the data is as reported.

Sources of Error
- Sample errors
- Reagent errors
- Reference material errors
- Method errors
- Calibration errors
- Equipment errors
- Signal registration and recording errors
- Calculation errors
- Errors in reporting results

Sources of Error: Sample Errors
- Sample container contaminated.
- Incorrect sample location.
- Non-representative sample.
- Incorrect sample container.
- Sample mix-up.

Reagent Errors
- Impure reagents or solvents.
- Improper storage of reagents.
- Neglect of the reagent expiration date.
- Evaporated reagents.
- Consideration of different purities or grades.

Sources of Error: Reference Material Errors
- Impurity of reference materials.
- Errors from interfering substances.
- Changes due to improper storage.
- Errors in preparing the reference material.
- Using expired reference material.

General Method Errors
- Deviating from the analysis procedure.
- Disregard for the limit of detection.
- Disregard for a blank correction.
- Calculation errors (dilutions, mixtures, additions).
- Not using the correct analytical procedure.

Sources of Error: Calibration Errors
- Volumetric measuring errors.
- Weighing errors.
- Inaccurate equipment adjustments.

Equipment Errors
- Equipment not cleaned.
- Maintenance neglected.
- Temperature, electrical, and magnetic effects.
- Errors in using auto-pipettes (not calibrated, pipette tip not correctly attached, contamination).
- Errors in using glass pipettes (damaged, bad technique, contamination).

Sources of Error

Equipment Errors (continued)

• Cuvette errors (defects not considered, unsuitable cuvette glass, not filled to minimum, wet on the outside, air bubbles, contamination).

• Photometer errors (wrong wavelength, insufficient lamp intensity, dirty optics, drift effect ignored, incorrectly set zero, light entering the sample chamber).

Sources of Error: Signal Registration and Recording Errors
- Incorrect range setting.
- Reading errors.
- Recording errors.
- Switching of data.

Calculation Errors
- Arithmetic errors, decimal point errors, incorrect units.
- Rounding errors.
- Not taking into account the reagent blank values.
- Error in the dilution factor.

Errors in Reporting Results
- Omitting a sample error.
- No quality assurance implemented.

Validation demonstrates that a procedure is robust, reliable and reproducible

• A robust method is one which produces successful results a high percentage of the time.

• A reliable method is one that produces accurate results.

• A reproducible method produces similar results each time a sample is tested.

QA: does the method still work?

• Control charts; documenting and archiving.
• Proficiency testing: participating in collaborative interlaboratory studies. Calculate the z-score:

z = (X̄_i - X̂) / S

where X̄_i is the mean of the replicate measurements by laboratory i, X̂ is the accepted concentration, and S is the standard deviation of the accepted concentration.

Defining the Problem

1. What accuracy is required?

2. How much sample is available?

3. What is the concentration range of the analyte?

4. What components of the sample will cause interference?

5. What are the physical and chemical properties of the sample matrix?

6. How many samples are to be analyzed?

Selecting an Analytical Method

Numerical Criteria for Selecting Analytical Methods

Parameters for method validation

• Accuracy

• Precision

• LOD, LOQ, Sensitivity

• Selectivity

• Linearity

• Range

• Ruggedness or Robustness

Accuracy (determination)

• Compare the results of the method with the results of an established reference method.
• Positive controls (dilution must be done separately from the calibration points, with fresh reagents and a standard from a different supplier or batch than the calibration).
• Measurements of a CRM.
• Spiking the sample matrix with a known concentration of RM.

Standard Operating Procedure (SOP)

It should include:
• Validity (e.g. application in wastewater)
• Short description of the main principle
• Possible errors and problems
• Preparation of reagents, standards, instruments
• Sample preparation (sampling, enrichment, chromatography, detection)
• Quantification of the compounds
• QA/QC