Transcript

1

Module Seven : Quantifying uncertainties

In the previous modules, we discussed a variety of numerical and graphical methods for handling one- and two-sample problems in inter-laboratory studies. In this module, we will discuss some general rules for operating on uncertainties.

•How to quantify the uncertainty of a system of operations.

•How to combine uncertainties of the same characteristic measured by different individuals.

•When there is no information about the relationship among the variables

•When variables are independent

•When variables are correlated

•How to determine uncertainty based on the probability distributions.

•Uncertainty from linear least squares calibration.

•Measuring Type B uncertainty

2

A system of components involving the measurement of uncertainty

In many practical cases, we are interested in measuring the combined uncertainty of the entire system which consists of several uncertainty measurements.

[Diagram: Input, components X1, X2, …, Xk, Output]

The system consists of k components. For each component, we measure its uncertainty:

$$x_i \pm U_{x_i}$$

where $x_i$ is the best estimate for component i and $U_{x_i}$ is its measurement uncertainty.

The system is a function of $x_1, x_2, \ldots, x_k$, denoted by $f(x_1, x_2, \ldots, x_k)$.

3

Four questions involving quantifying the uncertainty of the system

Q1: When measuring one variable, x, it is often f(x) that is the quantity of interest. How do we extend the measurement uncertainty to f(x) if the measurement of x with uncertainty is $x \pm U_x$?

Q2: How do we quantify the uncertainty of $f(x_1, x_2, \ldots, x_k)$? Our goal is to quantify $f(x_1 \pm U_{x_1}, x_2 \pm U_{x_2}, \ldots, x_k \pm U_{x_k})$ in terms of $f(x_1, x_2, \ldots, x_k) \pm U_{f(x_1, x_2, \ldots, x_k)}$.

Q3: If n individuals measure the same component, each obtaining a measurement $x_j \pm U_{x_j}$, how do we combine these measurements and uncertainties?

4

For each measurand, the measurement of the measured property usually obeys a certain probability distribution. For example, the normal curve has been used for a variety of continuous variables. These distributional characteristics allow us to make a confidence interval estimate of the measurement with an intended level of confidence.

Q4: When a distributional characteristic is used in measuring the uncertainty, how do we quantify the uncertainty based on the distribution of the measured property?

5

Q1: When measuring one variable, x, it is often f(x) that is the quantity of interest. How do we extend the measurement uncertainty to f(x) if the measurement of x with uncertainty is $x \pm U_x$?

Some common functions of the variable x are

Case 1: $f(x) = ax^n$   Case 2: $f(x) = \exp(x)$

We consider the general form of f(x) and apply it to these special cases.

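[Figure: two sketches of f(x) against x. Each marks $x - U_x$, $x$ and $x + U_x$ on the horizontal axis, the corresponding values $f(x - U_x)$, $f(x)$ and $f(x + U_x)$ on the vertical axis, and the resulting uncertainty $U_f$. In one sketch f is increasing; in the other f is decreasing.]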

6

A fundamental calculus approximation asserts that, given f(x) is monotone increasing and $U_x$ is sufficiently small,

$$f(x + U_x) - f(x) \approx \frac{df}{dx}\, U_x$$

This gives the uncertainty of f(x) when f(x) is an increasing function of x:

$$U_f = \frac{df}{dx}\, U_x$$

Similarly, when f(x) is decreasing, the uncertainty of f(x) is

$$U_f = -\frac{df}{dx}\, U_x$$

Combining these two situations, the measurement of f(x) with uncertainty is given by:

$$f(x) \pm \left|\frac{df}{dx}\right| U_x, \qquad \text{that is,} \quad U_f = \left|\frac{df}{dx}\right| U_x$$

Case 1: $f(x) = x^a$: f(x) with uncertainty is $x^a \pm |a x^{a-1}|\, U_x = x^a \left(1 \pm |a| \frac{U_x}{|x|}\right)$

Case 2: $f(x) = \exp(x)$: f(x) with uncertainty is $\exp(x) \pm \exp(x)\, U_x = \exp(x)(1 \pm U_x)$

Case 3: $f(x) = \ln(x)$: f(x) with uncertainty is $\ln(x) \pm \frac{U_x}{x}$

Case 4: $f(x) = \cos(x)$: f(x) with uncertainty is $\cos(x) \pm |\sin x|\, U_x$

Case 5: $f(x) = ax$: f(x) with uncertainty is $ax \pm |a|\, U_x$

7

Examples:

1. One measured the thickness of 100 sheets of paper many times and obtained the thickness $2.50 \pm 0.12$ cm. What is the thickness of one sheet of paper?

Ans: f(x) = x/100. The thickness of one sheet is $(2.5/100) \pm (0.12/100) = 0.025 \pm 0.0012$ cm.

2. One measured the radius of a circle and obtained the measurement $4.5 \pm 0.1$ cm. What is the measurement of the area of the circle?

Ans: $f(r) = \pi r^2$. The area is $\pi r^2 \left(1 \pm 2\frac{U_r}{r}\right) = \pi (4.5)^2 [1 \pm 2(0.1)/4.5] = \pi (20.25 \pm 0.9)$ square cm, about $63.6 \pm 2.8$ square cm.

Class Hands-on activity

Using the data from the hands-on activity 'drawing a 2 cm line segment', compute the uncertainty of your draws from the 10 data points; that is, compute the sample s.d. Then estimate the uncertainty for drawing a one cm line segment.
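The rule $U_f = |df/dx|\, U_x$ can be checked numerically. The following is a minimal Python sketch, not part of the original module: the helper name `propagate_1d` and the central-difference step `h` are illustrative assumptions, and the inputs are the paper-thickness and circle-radius examples above (with the area taken as $\pi r^2$).

```python
import math

def propagate_1d(f, x, u_x, h=1e-6):
    """Return (f(x), U_f) with U_f = |df/dx| * U_x.

    The derivative is approximated by a central difference; h is an
    assumed step size chosen for illustration.
    """
    dfdx = (f(x + h) - f(x - h)) / (2 * h)
    return f(x), abs(dfdx) * u_x

# Example 1: thickness of one sheet, f(x) = x / 100
thickness, u_thickness = propagate_1d(lambda x: x / 100, 2.50, 0.12)

# Example 2: area of a circle, f(r) = pi * r^2
area, u_area = propagate_1d(lambda r: math.pi * r ** 2, 4.5, 0.1)

print(f"one sheet: {thickness:.4f} +/- {u_thickness:.4f} cm")
print(f"area: {area:.1f} +/- {u_area:.1f} square cm")
```

A numerical derivative is used here so that the same helper works for any smooth f; when the derivative is known in closed form, as in Cases 1 to 5, it can be used directly instead.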

8

Q2: How do we quantify the uncertainty of $f(x_1, x_2, \ldots, x_k)$?

Our goal is to quantify $f(x_1 \pm U_{x_1}, x_2 \pm U_{x_2}, \ldots, x_k \pm U_{x_k})$ in terms of $f(x_1, x_2, \ldots, x_k) \pm U_{f(x_1, x_2, \ldots, x_k)}$.

We begin with a simple system of two components, x and y.

The measurements and uncertainties are $x \pm U_x$ and $y \pm U_y$.

The basic functions f(x,y) for describing the system are

Case 1: x + y Case 2: x - y Case 3: xy Case 4: x/y

Any function that is a combination of these operations can be propagated.

9

Conditions of measuring uncertainty when combining variables:

(1) No prior information about the components themselves or their relationship: For this situation, the maximum combined uncertainty may be more appropriate.

(2) When variables are independent, that is, the process of measuring one component is not related to the process of measuring the other components: For this situation, the independence property can be applied to reduce the combined uncertainty.

(3) When the distributions of the variables are known, or can be approximated reasonably by some probability distributions, the measurement uncertainty of the component is estimated by the standard error of the best estimate, and a level of confidence can be attached to the measurement uncertainty. Type A and Type B uncertainty are defined accordingly.

(4) In practical situations, uncertainty often arises not only randomly but also systematically. One needs to be careful about the existence of systematic error and make an effort to estimate this component of error whenever possible. When reporting uncertainty involving both random error and systematic error, it is good practice to present each separately as well as the combined uncertainty.

10

We have introduced Youden Plots and some numerical measures for measuring systematic and random errors. The estimation of systematic error requires additional effort: conducting appropriate experiments that are specifically designed for estimating a suspected systematic error component. The analysis often depends on the design of the experiment. Some commonly used designs and their analysis will be discussed later. In the following, we will focus on measuring measurement uncertainty under Conditions (1) and (2).

Case 1: f(x,y) = x+y

Given the uncertainty for each variable: $x \pm U_x$ and $y \pm U_y$

We are looking for $(x + y) \pm U_{(x+y)}$

Under Condition 1: The maximum possible measurement for (x+y) is

(x + Ux) + (y + Uy) = (x+y) + (Ux+Uy)

The minimum possible measurement is (x+y) - (Ux+Uy)

Therefore, the measurement of the sum with uncertainty is:

$$(x + y) \pm (U_x + U_y)$$

11

Case 1: Under Condition 2 : x and y are independent

Independence of x and y has the following geometric property for the uncertainty:

[Figure: $U_x$ and $U_y$ drawn as perpendicular sides, with the combined uncertainty as the diagonal.]

The uncertainty of x+y, due to independence, is given by

$$U_{(x+y)} = \sqrt{U_x^2 + U_y^2}$$

Case 1: Condition 3: x and y follow a certain probability distribution. The measurement of property x is the best estimate from the sample information (Type A) or from the physical property or prior knowledge (Type B). As a first step, we determine the best estimate for the characteristic by observing n measurements. The uncertainty of the variable x is estimated by the sample standard deviation. In many cases, we are interested in estimating the unknown average of the characteristic. The best estimate is the sample mean, $\bar{x}$.

The uncertainty of $\bar{x}$ is the Standard Error of $\bar{x}$, which is

$$\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$$

and $\sigma_x$ is estimated by the sample standard deviation, $s_x$.

12

The estimate of the average characteristic for variable x and its uncertainty is

$$\bar{x} \pm \frac{s_x}{\sqrt{n}}$$

Similarly, the best estimate of the average for variable y and its uncertainty is

$$\bar{y} \pm \frac{s_y}{\sqrt{n}}$$

Therefore, the average of (x+y) with its uncertainty is

$$(\bar{x} + \bar{y}) \pm \sqrt{(s_x^2 / n) + (s_y^2 / n)}$$

assuming that the two samples are chosen at random and that x and y are independent.

13

Case 2: f(x,y) = x-y:

Under Condition 1: the measurement with uncertainty is $(x - y) \pm (U_x + U_y)$

Under Condition 2: the measurement with uncertainty is $(x - y) \pm \sqrt{U_x^2 + U_y^2}$

Under Condition 3: the best estimate and the uncertainty are $(\bar{x} - \bar{y}) \pm \sqrt{(s_x^2 / n) + (s_y^2 / n)}$
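The difference between Condition 1 and Condition 2 for a sum or a difference can be seen with a short numerical sketch. This is an illustration only: the function name and the input uncertainties 0.3 and 0.4 are hypothetical values, not data from the module.

```python
import math

def sum_or_difference_uncertainty(u_x, u_y, independent=False):
    """Uncertainty of x + y or x - y (the same expression applies to both)."""
    if independent:
        return math.hypot(u_x, u_y)   # Condition 2: sqrt(U_x^2 + U_y^2)
    return u_x + u_y                  # Condition 1: maximum combined uncertainty

# Hypothetical uncertainties chosen for illustration
u_max = sum_or_difference_uncertainty(0.3, 0.4)
u_ind = sum_or_difference_uncertainty(0.3, 0.4, independent=True)
print(f"Condition 1: {u_max:.2f}, Condition 2: {u_ind:.2f}")
```

The independent case is never larger than the maximum case, which is why independence, when it can be justified, reduces the combined uncertainty.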

14

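Case 3: f(x,y) = xy

Under Condition 1: x is measured by $x \pm U_x = x\left(1 \pm \frac{U_x}{|x|}\right)$, and similarly y is measured by $y\left(1 \pm \frac{U_y}{|y|}\right)$.

Therefore, the measurement of f(x,y) = xy is

$$x\left(1 \pm \frac{U_x}{|x|}\right) \cdot y\left(1 \pm \frac{U_y}{|y|}\right) = xy\left(1 \pm \frac{U_x}{|x|}\right)\left(1 \pm \frac{U_y}{|y|}\right)$$

$$= xy\left(1 \pm \frac{U_x}{|x|} \pm \frac{U_y}{|y|} \pm \frac{U_x}{|x|}\frac{U_y}{|y|}\right) \approx xy\left(1 \pm \left(\frac{U_x}{|x|} + \frac{U_y}{|y|}\right)\right) = xy\left(1 \pm \frac{U_{xy}}{|xy|}\right)$$

since the product of the two fractional uncertainties is very small. Therefore,

$$\frac{U_f}{|f|} = \frac{U_x}{|x|} + \frac{U_y}{|y|}$$

Case 3, Under Condition 2: x and y are independent:

The fractional uncertainty of f(x,y) = xy is given by $\sqrt{\left(\frac{U_x}{x}\right)^2 + \left(\frac{U_y}{y}\right)^2}$

The measurement of xy with uncertainty is

$$(xy)\left[1 \pm \sqrt{\left(\frac{U_x}{x}\right)^2 + \left(\frac{U_y}{y}\right)^2}\right]$$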

15

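Case 4: f(x,y) = x/y: Under Condition 1:

The largest measurement of f(x,y) = x/y is

$$\frac{x\,(1 + U_x/|x|)}{y\,(1 - U_y/|y|)}$$

The smallest measurement of f(x,y) = x/y is

$$\frac{x\,(1 - U_x/|x|)}{y\,(1 + U_y/|y|)}$$

Using the binomial expansion, $1/(1-a) = 1 + a + a^2 + a^3 + \cdots$. When a is small, we can approximate $1/(1-a) \approx 1 + a$. The largest measurement of x/y has the form

$$(1+b)/(1-a) \approx (1+b)(1+a) = 1 + b + a + ab \approx 1 + a + b \quad \text{(since ab is very small).}$$

The largest measurement of x/y is therefore approximately

$$\frac{x}{y}\left[1 + \left(\frac{U_x}{|x|} + \frac{U_y}{|y|}\right)\right]$$

Similarly, the smallest measurement is approximately

$$\frac{x}{y}\left[1 - \left(\frac{U_x}{|x|} + \frac{U_y}{|y|}\right)\right]$$

Therefore, the measurement of x/y with uncertainty is

$$\frac{x}{y}\left[1 \pm \left(\frac{U_x}{|x|} + \frac{U_y}{|y|}\right)\right]$$

Case 4: f(x,y) = x/y: Under Condition 2:

The measurement of x/y with uncertainty is

$$\frac{x}{y}\left[1 \pm \sqrt{\left(\frac{U_x}{x}\right)^2 + \left(\frac{U_y}{y}\right)^2}\right]$$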
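For products and quotients the fractional uncertainties are what combine. The sketch below is an illustration under stated assumptions: the function names are made up for this example, and the measurements $x = 10 \pm 0.5$ and $y = 4 \pm 0.2$ are hypothetical.

```python
import math

def fractional_uncertainty(x, u_x, y, u_y, independent=False):
    """Fractional uncertainty of xy or x/y (same expression for both)."""
    fx, fy = u_x / abs(x), u_y / abs(y)
    if independent:
        return math.hypot(fx, fy)   # Condition 2: add in quadrature
    return fx + fy                  # Condition 1: add directly

def product_with_uncertainty(x, u_x, y, u_y, independent=False):
    value = x * y
    return value, abs(value) * fractional_uncertainty(x, u_x, y, u_y, independent)

def quotient_with_uncertainty(x, u_x, y, u_y, independent=False):
    value = x / y
    return value, abs(value) * fractional_uncertainty(x, u_x, y, u_y, independent)

# Hypothetical measurements: x = 10 +/- 0.5, y = 4 +/- 0.2
p, p_u = product_with_uncertainty(10, 0.5, 4, 0.2)
q, q_u = quotient_with_uncertainty(10, 0.5, 4, 0.2)
p_ind, p_u_ind = product_with_uncertainty(10, 0.5, 4, 0.2, independent=True)
print(f"xy  = {p:.1f} +/- {p_u:.2f} (Condition 1), +/- {p_u_ind:.2f} (Condition 2)")
print(f"x/y = {q:.2f} +/- {q_u:.3f} (Condition 1)")
```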

16

A General form of combining k variables, x1,x2, … xk.

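Cases 1 to 4 combine two variables, x and y. Their measurement uncertainties were investigated under three different conditions for Cases 1 and 2, and under two conditions for Cases 3 and 4. In this section, we consider a general function, f(x1,x2, …,xk).

Under Condition 1: the measurement of f(x1,x2, …, xk) with uncertainty is given by:

$$f(x_1, x_2, \ldots, x_k) \pm \left( \left|\frac{\partial f}{\partial x_1}\right| U_{x_1} + \left|\frac{\partial f}{\partial x_2}\right| U_{x_2} + \cdots + \left|\frac{\partial f}{\partial x_k}\right| U_{x_k} \right)$$

Under Condition 2: the xi's are independent. The measurement of f(x1,x2, …, xk) with uncertainty is given by:

$$f(x_1, x_2, \ldots, x_k) \pm \sqrt{ \left(\frac{\partial f}{\partial x_1}\right)^2 U_{x_1}^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 U_{x_2}^2 + \cdots + \left(\frac{\partial f}{\partial x_k}\right)^2 U_{x_k}^2 }$$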

17

Examples:

1. To find the area of a triangle, the base is measured to be $5.0 \pm 0.3$ cm. The height is measured to be $3.0 \pm 0.2$ cm. What is the measurement of the area?

Ans: f(b,h) = bh/2. Under Condition 1: the fractional uncertainty of f(b,h) is

$$\frac{U_b}{b} + \frac{U_h}{h} = (0.3/5) + (0.2/3) = 0.1267$$

Hence, the area is $(5)(3)/2\,[1 \pm 0.1267] = 7.5 \pm 0.95$ square cm.

Under Condition 2: The measuring processes of base and height are independent.

The fractional uncertainty of f(b,h) is

$$\sqrt{\left(\frac{U_b}{b}\right)^2 + \left(\frac{U_h}{h}\right)^2} = \sqrt{(0.3/5)^2 + (0.2/3)^2} = 0.0897$$

Hence, the area is $7.5\,(1 \pm 0.0897) = 7.5 \pm 0.67$ square cm.
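The general formulas for k variables can be sketched in a few lines of Python. This is a minimal illustration, assuming the partial derivatives are estimated by central differences; the function name `propagate` and the step-size rule are assumptions, and the triangle example above is used as input.

```python
import math

def propagate(f, values, uncertainties, independent=False, rel_step=1e-6):
    """Return (f(values), U_f) from numerically estimated partial derivatives."""
    contributions = []
    for i, (v, u) in enumerate(zip(values, uncertainties)):
        step = rel_step * max(abs(v), 1.0)        # assumed step-size rule
        hi = list(values)
        lo = list(values)
        hi[i] = v + step
        lo[i] = v - step
        dfdx = (f(*hi) - f(*lo)) / (2 * step)     # central difference for df/dx_i
        contributions.append(abs(dfdx) * u)
    if independent:
        u_f = math.sqrt(sum(c * c for c in contributions))   # Condition 2
    else:
        u_f = sum(contributions)                              # Condition 1
    return f(*values), u_f

def triangle_area(b, h):
    return b * h / 2

area, u_max = propagate(triangle_area, [5.0, 3.0], [0.3, 0.2])
_, u_ind = propagate(triangle_area, [5.0, 3.0], [0.3, 0.2], independent=True)
print(f"area = {area:.2f} +/- {u_max:.2f} (Condition 1), +/- {u_ind:.2f} (Condition 2)")
```

For a product such as bh/2, the partial-derivative form and the fractional-uncertainty form give the same result, so this sketch can be used to check hand calculations.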

18

Hands-on Activity

Consider a system consisting of four components: x, y, z, w. The measurements are

$$x = 80 \pm 4, \quad y = 100 \pm 6, \quad z = 70 \pm 2, \quad w = 150 \pm 5$$

(a) Suppose the system is

Find the measurement of f under Conditions 1 and 2.

(b) Suppose the system is

Find the measurement of f under Conditions 1 and 2.

(c) Suppose the system is

Find the measurement of f under Conditions 1 and 2.

(d) Suppose the system is

Find the measurement of f under Conditions 1 and 2.

(e) Suppose the system is

Find the measurement of f under Conditions 1 and 2.

23( , , , )

exp( ) 2

x yzf x y z w

wz xy z

2( , , , ) 3f x y z w x yz

( , , , ) ln( )f x y z w wz y

( , , , ) /f x y z w xy wz

( , , , ) ( 2 )f x y z w x y z

19

A General form of combining k variables, x1, x2, …, xk - Continued

We discussed how to determine the uncertainty of a general function, f(x1,x2, …,xk), under Condition 1 (no information is known) and Condition 2 (the variables are independent). There are situations where some variables may not be independent. For example, when measuring the pressure of water in a container, the pressure is clearly related to the volume of the container. In some cases, the physical or chemical properties give us functional relations between variables, and we can take advantage of the functional relationship. In many cases, we do not have the functional relation. However, we can use data to estimate the relation and take it into account in the computation of the combined uncertainty. Recall the results under Conditions 1 and 2.

Under Condition 1: the measurement of f(x1,x2, …, xk) with uncertainty is given by:

$$f(x_1, x_2, \ldots, x_k) \pm \left( \left|\frac{\partial f}{\partial x_1}\right| U_{x_1} + \left|\frac{\partial f}{\partial x_2}\right| U_{x_2} + \cdots + \left|\frac{\partial f}{\partial x_k}\right| U_{x_k} \right)$$

Under Condition 2: the xi's are independent. The measurement of f(x1,x2, …, xk) with uncertainty is given by:

$$f(x_1, x_2, \ldots, x_k) \pm \sqrt{ \left(\frac{\partial f}{\partial x_1}\right)^2 U_{x_1}^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 U_{x_2}^2 + \cdots + \left(\frac{\partial f}{\partial x_k}\right)^2 U_{x_k}^2 }$$

When the xi's are correlated, we need to estimate the correlation structure and take it into account in the computation of the uncertainty.

20

We shall consider several simple cases before discussing the general form for cases when variables are correlated.

Case 1: f(x,y) = x + y. x and y are random variables that follow a certain probability distribution. Then a typical measure of uncertainty is the variance.

Result 1: V(x+y) = V(x) + V(y) + 2 Cov(x,y), where Cov(x,y) is called the covariance of x and y. When samples are used to estimate these components,

V(x) is estimated by $s_x^2 = \left[\sum x_i^2 - n\bar{x}^2\right] / (n-1)$

V(y) is estimated by $s_y^2 = \left[\sum y_i^2 - n\bar{y}^2\right] / (n-1)$

and Cov(x,y) is estimated by $s_{xy} = \left[\sum x_i y_i - n\bar{x}\bar{y}\right] / (n-1)$

The standardized form of Cov(x,y) gives the correlation coefficient, which is estimated by Pearson's correlation:

$$r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2\, s_y^2}}$$

21

For computational simplicity, we usually compute Sums of Squares and apply them to compute the variances, covariance and correlation coefficient.

$$SS_x = \sum x_i^2 - n\bar{x}^2, \quad \text{and} \quad s_x^2 = SS_x / (n-1)$$

$$SS_y = \sum y_i^2 - n\bar{y}^2, \quad \text{and} \quad s_y^2 = SS_y / (n-1)$$

$$SS_{xy} = \sum x_i y_i - n\bar{x}\bar{y}, \quad \text{and} \quad s_{xy} = SS_{xy} / (n-1)$$

$$r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2\, s_y^2}} = \frac{SS_{xy}}{\sqrt{SS_x\, SS_y}} \qquad \text{NOTE: } -1 \le r_{xy} \le 1$$

From the correlation, the estimated covariance can be computed from the correlation and the sample variances:

$$s_{xy} = r_{xy} \sqrt{s_x^2\, s_y^2} = r_{xy}\, s_x s_y = r_{xy} \sqrt{SS_x\, SS_y} / (n-1)$$

These formulae look complicated. However, they all come from three values: SSx, SSy, and SSxy. The following figures demonstrate some cases of correlation.

22

[Figure: Relationship between X and Y. Six scatter plots of Y (axis from 5 to 15) against X (axis from 0 to 10), with panel titles: "r > 0 and about .7"; "r < 0 and about -.7"; "r is approximately zero"; "r is approximately zero, and the relationship is nonlinear"; "r is positive, about .7, and the relationship is nonlinear"; "r is almost one".]
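The three sums of squares SSx, SSy and SSxy introduced before the figures are enough to obtain the variances, the covariance and the correlation. The sketch below illustrates this with a small made-up data set; the function name and the data values are hypothetical.

```python
import math

def correlation_summary(xs, ys):
    """Return SS_x, SS_y, SS_xy, the sample covariance and Pearson's r."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_x = sum(x * x for x in xs) - n * xbar ** 2
    ss_y = sum(y * y for y in ys) - n * ybar ** 2
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    s_xy = ss_xy / (n - 1)                   # sample covariance
    r_xy = ss_xy / math.sqrt(ss_x * ss_y)    # Pearson correlation
    return ss_x, ss_y, ss_xy, s_xy, r_xy

# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ss_x, ss_y, ss_xy, s_xy, r_xy = correlation_summary(xs, ys)
print(f"SSx={ss_x:.1f} SSy={ss_y:.1f} SSxy={ss_xy:.1f} s_xy={s_xy:.2f} r={r_xy:.3f}")
```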

23

Result 2: For $f(x_1, x_2, \ldots, x_k) = \sum_i a_i x_i$, with the xi's being random variables,

$$V\left(\sum_i a_i x_i\right) = \sum_i a_i^2\, V(x_i) + 2 \sum_{i<j} a_i a_j\, \mathrm{Cov}(x_i, x_j)$$

In terms of sample estimates:

$$s^2\left(\sum_i a_i x_i\right) = \sum_i a_i^2\, s_{x_i}^2 + 2 \sum_{i<j} a_i a_j\, [r_{ij}\,(s_{x_i})(s_{x_j})]$$

Result 3: The uncertainty is just the sample standard deviation of the corresponding estimate of the function when using sample information. Therefore, in terms of the uncertainty notation, Result 2 becomes, for $f(x_1, x_2, \ldots, x_k) = \sum_i a_i x_i$:

$$U^2\left(\sum_i a_i x_i\right) = \sum_i a_i^2\, U^2(x_i) + 2 \sum_{i<j} a_i a_j\, U(x_i, x_j) = \sum_i a_i^2\, U^2(x_i) + 2 \sum_{i<j} a_i a_j\, [r_{ij}\,(U_{x_i})(U_{x_j})]$$

24

Measurement uncertainty of a general function f(x1,x2, …,xk) with the xi's being correlated.

$$U^2(f(x_1, x_2, \ldots, x_k)) = \sum_i a_i^2\, U^2(x_i) + 2 \sum_{i<j} a_i a_j\, U(x_i, x_j) = \sum_i a_i^2\, U^2(x_i) + 2 \sum_{i<j} a_i a_j\, [r_{ij}\,(U_{x_i})(U_{x_j})]$$

where $a_i = \partial f / \partial x_i$, which is called the sensitivity coefficient. It measures the change of the function f along the coordinate $x_i$ when $x_i$ is increased by one unit.

NOTE:

When sample information is used to estimate the uncertainty,

$U(x) = s_x$, and $U(x,y) = \mathrm{cov}(x,y) = r_{xy}\, U(x)\, U(y) = r_{xy}\,(s_x)(s_y)$
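The correlated formula is a double sum over a variance-covariance matrix weighted by the sensitivity coefficients. The following sketch shows one way to evaluate it; the function name is an assumption, and the two-variable matrix (variances 16 and 25, covariance 8) is an illustrative input rather than a worked solution from the module.

```python
import math

def correlated_uncertainty(sensitivities, cov):
    """U(f) from sensitivity coefficients a_i and a variance-covariance matrix.

    Summing a_i * a_j * cov[i][j] over all i and j equals
    sum_i a_i^2 U^2(x_i) + 2 * sum_{i<j} a_i a_j U(x_i, x_j).
    """
    k = len(sensitivities)
    variance = sum(
        sensitivities[i] * sensitivities[j] * cov[i][j]
        for i in range(k)
        for j in range(k)
    )
    return math.sqrt(variance)

# Illustrative inputs: U_x = 4, U_y = 5, covariance 8 (r_xy = 0.4)
cov_xy = [[16.0, 8.0],
          [8.0, 25.0]]
u_sum = correlated_uncertainty([1.0, 1.0], cov_xy)     # f = x + y
u_diff = correlated_uncertainty([1.0, -1.0], cov_xy)   # f = x - y
print(f"U(x+y) = {u_sum:.2f}, U(x-y) = {u_diff:.2f}")
```

With a positive correlation the uncertainty of the sum is larger than in the independent case and the uncertainty of the difference is smaller, which is why the correlation structure cannot be ignored.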

25

Hands-on activity

A system consists of three components, x, y, z. These components are correlated. Based on the sample information, the sample variance-covariance matrix (lower triangle shown) is

$$\text{Var-Cov matrix of } (x,y,z) = \begin{pmatrix} s_x^2 & & \\ s_{xy} & s_y^2 & \\ s_{xz} & s_{yz} & s_z^2 \end{pmatrix} = \begin{pmatrix} 16 & & \\ 8 & 25 & \\ 4 & 10 & 36 \end{pmatrix}$$

The measurement for each component is:

$$x \pm U_x = 50 \pm 4, \quad y \pm U_y = 44 \pm 5, \quad z \pm U_z = 74 \pm 6$$

(a) Suppose the system is f(x,y,z) = 2x - y/z.

Determine the uncertainty for the system.

(b) Suppose the system is $f(x,y,z) = (x/2 - y^2)\,z$.

Determine the uncertainty for the system.

26

Q3: If n individuals measure the same component, each obtaining a measurement $x_j \pm U_{x_j}$, how do we combine these measurements and uncertainties?

Consider the situation: Three standardized labs tested the same material using the same procedure, and we assume the environmental conditions are uniform. The test results from the three labs are:

Lab1: $25 \pm 1.5$, Lab2: $24.5 \pm 0.8$, Lab3: $26.2 \pm 1.8$

The intent is to combine the test results as the reference for other labs. This type of problem is different from what we discussed before. In this case, one variable is measured by three participants, who obtain different uncertainties.

It is most likely that each result comes from several repetitions of the test. In this case, $x_j$ is the sample mean, and $U_j$ is the standard deviation of the mean, $s/\sqrt{n}$.

27

In combining these results to obtain the best estimate, we should make sure that the lab test results are consistent and that the lab systematic error is negligible, that is, that the uncertainty comes from random error only. Therefore, before combining the test results, we should make a quick check:

•Is there any unusually large discrepancy |xi - xj| between labs? If this discrepancy is much larger than Ui and Uj, we should suspect that at least one measurement has gone wrong, and a close examination of the testing process is needed.

•Is there any unusually large uncertainty from a lab? If Ui/Uj is over three, we should suspect that some systematic errors exist, and a close examination of the testing process is needed.

One way to combine the test results is by an average:

f(x1,x2,x3) = (x1+x2+x3)/3

However, the results have different precision. The more precise result (smaller uncertainty) should be given a larger weight. Instead of the unweighted average, a weighted average gives a better estimate.

28

Weighted average for combining measurements of the same variable

Different weighting schemes can be developed mathematically. In the following, we apply the one that is based on the 'Maximum Likelihood Principle', one of the most important principles in statistical estimation.

Consider k labs that measured the same variable and obtained

$$x_1 \pm U_1, \quad x_2 \pm U_2, \quad \ldots, \quad x_k \pm U_k$$

The weighted average is given by:

$$x_{comb} = \frac{(1/U_1^2)\,x_1 + (1/U_2^2)\,x_2 + \cdots + (1/U_k^2)\,x_k}{(1/U_1^2) + (1/U_2^2) + \cdots + (1/U_k^2)} = \frac{\sum_i (1/U_i^2)\, x_i}{\sum_i (1/U_i^2)} = \frac{\sum_i w_i x_i}{\sum_i w_i}, \quad \text{where } w_i = \frac{1}{U_i^2}$$

The measurement uncertainty of $x_{comb}$ is given by

$$U_{comb} = \frac{1}{\sqrt{\sum_i w_i}} = \frac{1}{\sqrt{\sum_i (1/U_i^2)}}$$

29

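Example: Consider now the case of three labs. The measurements are

Lab1: $25 \pm 1.5$, Lab2: $24.5 \pm 0.8$, Lab3: $26.2 \pm 1.8$

The best estimate of the measurement for the variable X is

$$x_{comb} = \left[\frac{25.0}{(1.5)^2} + \frac{24.5}{(0.8)^2} + \frac{26.2}{(1.8)^2}\right] \Big/ \left[\frac{1}{(1.5)^2} + \frac{1}{(0.8)^2} + \frac{1}{(1.8)^2}\right] = 57.48 / 2.3156 = 24.82$$

The combined uncertainty is

$$U_{comb} = 1 \Big/ \sqrt{\sum_i \frac{1}{U_i^2}} = 1 \Big/ \sqrt{\frac{1}{1.5^2} + \frac{1}{0.8^2} + \frac{1}{1.8^2}} = 1/1.5217 = 0.657$$

The best combined estimate for the variable x is

$$24.82 \pm 0.657$$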
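The weighted average can be reproduced in a few lines of Python. This is a minimal sketch using the three lab results from the example; the function name `weighted_combine` is an assumption made for illustration.

```python
import math

def weighted_combine(values, uncertainties):
    """Combine measurements of the same variable with weights w_i = 1 / U_i^2."""
    weights = [1 / u ** 2 for u in uncertainties]
    total_weight = sum(weights)
    x_comb = sum(w * x for w, x in zip(weights, values)) / total_weight
    u_comb = 1 / math.sqrt(total_weight)
    return x_comb, u_comb

# Lab1: 25 +/- 1.5, Lab2: 24.5 +/- 0.8, Lab3: 26.2 +/- 1.8
x_comb, u_comb = weighted_combine([25.0, 24.5, 26.2], [1.5, 0.8, 1.8])
print(f"combined: {x_comb:.2f} +/- {u_comb:.3f}")
```

The combined uncertainty is smaller than the smallest individual uncertainty, because each additional consistent measurement adds information.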

30

Hands-on Activity

1. Three individuals measured the same component and obtained the following results:

A: $80 \pm 5$   B: $78 \pm 4$   C: $83 \pm 5$

Obtain the weighted combined measurement and its uncertainty.

2. Four labs conducted the same testing procedure to test the same material in order to set up a reference uncertainty for the material. Each lab repeated the test eight times. The test results are:

Lab A   12.4   12.6   12.3   11.9   12.0   12.4   12.1   12.0

Lab B   11.9   11.9   12.2   12.1   12.4   12.0   11.8   12.2

Lab C   12.0   12.3   12.6   12.5   12.0   12.1   12.0   12.3

Lab D   11.8   12.3   12.5   11.9   12.2   12.3   12.3   12.2

(a) Use these data to estimate the lab average and within-lab uncertainty for each lab.

(b) Are there any unusual lab averages or within-lab uncertainties?

(c) Determine the best estimate of the lab average, with the uncertainty of the best estimate, for each lab.

(d) Obtain the best estimate of the combined lab average and its uncertainty using the weighted method.

31

Q4: When the distribution characteristic is applied to measuring the uncertainty, how to quantify the uncertainty based on the distribution characteristic of the measured property?

In the previous sections, we discussed how to quantify uncertainties without imposing a probability distribution on the variables of interest. The uncertainty has been presented as one unit of uncertainty.

An important question in statistical estimation and in quantifying uncertainty is 'how much confidence' we can claim that the actual unknown characteristic falls within

$$x_{BestEst} \pm U_x$$

32

In the following section, we will discuss how a confidence level is formed, and the role that probability distribution plays in making the level of confidence.

We will focus on

1. Estimating the population mean. Our purpose is to be able to make statements such as: we are 95% confident (sure) that the true lab average for testing Material A is between 32.3 and 35.4.

The key difference between 'being able to make a confidence level statement' and 'presenting uncertainty only' is the assumption that the variable of interest follows a certain probability distribution. Therefore, we can characterize the variable using an appropriate distribution.

For example, based on our common experience, adult weights usually follow a normal curve, while the distribution of salary is usually skewed to the right.

By imposing an adequate distribution, we can find the chance that each of these intervals will cover the true value of the measurement:

$$x_{BestEst} \pm U_x, \quad x_{BestEst} \pm 2U_x, \quad \text{or more generally,} \quad x_{BestEst} \pm kU_x$$

33

D.F. for the combined measurement uncertainty

When several Type A uncertainties are combined, the components may have been measured using different numbers of observations, so each uncertainty has different degrees of freedom (d.f.). As a consequence, a combined d.f. is needed for the expanded uncertainty when a distributional property is assumed. A simple approach to combining d.f.'s is a weighted average. (This is what is called the Welch-Satterthwaite method.)

Suppose the uncertainty for component i is obtained based on $v_i$ d.f. Then a weighted combined d.f. can be found by:

$$v_{comb} = \frac{U^4(f(x_1, x_2, \ldots, x_k))}{\sum_i \dfrac{(\partial f / \partial x_i)^4\, U^4(x_i)}{v_i}} \qquad \text{NOTE: } v_{comb} \le \sum_i v_i$$

An expanded uncertainty based on the t-distribution is now possible:

$$f \pm t(\alpha/2, df)\, U(f)$$
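The combined degrees of freedom can be computed directly from the individual contributions. The sketch below is an illustration under stated assumptions: the components are taken to be independent, so that $U^2(f)$ is the sum of the squared contributions $|\partial f/\partial x_i|\, U(x_i)$; the function name and the input values (contributions 0.3 and 0.4 with 9 and 4 d.f.) are hypothetical.

```python
def effective_dof(contributions, dofs):
    """Welch-Satterthwaite combined degrees of freedom.

    contributions: |df/dx_i| * U(x_i) for each component (assumed independent)
    dofs: degrees of freedom v_i of each component
    """
    u_comb_sq = sum(c ** 2 for c in contributions)             # U^2(f)
    denominator = sum(c ** 4 / v for c, v in zip(contributions, dofs))
    return u_comb_sq ** 2 / denominator                        # U^4(f) / sum

# Hypothetical contributions and degrees of freedom
v_comb = effective_dof([0.3, 0.4], [9, 4])
print(f"combined d.f. = {v_comb:.2f}")
```

In practice the result is usually rounded down to an integer before looking up the t-value, which keeps the expanded uncertainty on the conservative side.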

34

Determine the uncertainty of Sample Mean

Recall the activity of 'drawing a two-centimeter line segment ten times'.

•Our goal is to estimate how well we can draw a two cm line. The best single estimate would be the average. Since there is always uncertainty in our drawing, we usually report our estimate as an interval:

$$\text{Best Estimate} \pm k\,(\text{Uncertainty of the Best Estimate})$$

•The quantity k is determined by the level of confidence we intend to report.

•In the drawing activity, there are two possible averages: an individual's average of ten draws, for estimating that individual's measurement, and the average of all draws from everyone, for estimating the measurement of the target population of interest.

In the case of using the sample average to estimate the unknown population mean, an appropriate report is

$$\bar{X} \pm k\,[SD(\bar{X})]$$

35

•Depending on the purpose, we use the corresponding sample average to estimate the unknown true average. Accordingly, we need to estimate the uncertainty of using the sample mean, usually denoted $\bar{x}$, to estimate the unknown true population mean.

How much uncertainty is there when using the sample mean to estimate the population mean? How do we estimate this uncertainty?

•Using our common experience, we can conclude that if we increase the sample size, then the sample mean tends to be closer to the population mean.

•How close is it? Can we measure the degree of closeness? The answer can be understood from the following scenario:

Imagine that in a laboratory we test a characteristic of a material. Assume the testing procedure is standardized and the testing process is under statistical control. Let X represent the measurement of the characteristic. Each day, five sub-samples of the material are tested, for 400 days.

36

Recording the data in a spreadsheet:

Sample                     1        2        3        4        5        Sample mean, $\bar{x}$    Standard deviation, s

Day1                       X11      X12      X13      X14      X15      $\bar{x}_1$               s1

Day2                       X21      X22      X23      X24      X25      $\bar{x}_2$               s2

Day3                       X31      X32      X33      X34      X35      $\bar{x}_3$               s3

...

Day400                     X400,1   X400,2   X400,3   X400,4   X400,5   $\bar{x}_{400}$           s400

Average of all 400 days                                                 $\bar{\bar{x}}$           $\bar{s}$

We notice that

•The individual measurements are different, and we can observe the average, variability and the histogram of these 2000 individual measurements.

•The daily averages are also different, but they are closer to each other than individual measurements. We can also observe the average, variability and histogram of these 400 sample means.

37

Computer Simulation Activity to Demonstrate Sampling Distribution of Sample Mean

Simulate 400 days of laboratory tests.

Consider (a) n =5 sub-samples per day

(b) n = 20 sub-samples per day

(c) n = 40 sub-samples per day.

Compute the daily averages, and demonstrate the relationship among the distributions of the individual test measurements and of the sample means from n = 5, n = 20 and n = 40.

Summarize the pattern of these relationships.
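One possible way to run this simulation is sketched below in Python. It is not the module's own software: the normal population with mean 15 and s.d. 3, the fixed random seed and the function name are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_days(n_per_day, days=400, mu=15.0, sigma=3.0, seed=1):
    """Simulate daily tests; return (s.d. of individuals, s.d. of daily means).

    mu, sigma and seed are assumed values, not data from the module.
    """
    rng = random.Random(seed)
    daily = [[rng.gauss(mu, sigma) for _ in range(n_per_day)] for _ in range(days)]
    daily_means = [statistics.mean(day) for day in daily]
    individuals = [value for day in daily for value in day]
    return statistics.stdev(individuals), statistics.stdev(daily_means)

results = {}
for n in (5, 20, 40):
    sd_individuals, sd_means = simulate_days(n)
    results[n] = (sd_individuals, sd_means)
    print(f"n={n:2d}: s.d. of individuals = {sd_individuals:.2f}, "
          f"s.d. of daily means = {sd_means:.2f}, "
          f"sigma/sqrt(n) = {3.0 / math.sqrt(n):.2f}")
```

The s.d. of the individual measurements stays near the population s.d. for every n, while the s.d. of the daily means shrinks roughly like the population s.d. divided by the square root of n.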

38

Patterns from the computer simulation

39

•Consider n = 5. The 2000 measurements resemble the unknown population. Therefore, the distribution of these 2000 observations, and their average and s.d., should be very close to those of the population. We use the notation: the population mean is $\mu$, and the population s.d. is $\sigma$.

•The distribution of the 400 sample means reflects the uncertainty of sample means of size n = 5. The smaller the s.d. of the sample mean, the more precise the sample mean is for estimating the population mean. We use the notation: the distribution mean of the sample averages is $\mu_{\bar{x}}$, and the distribution s.d. of the sample averages is $\sigma_{\bar{x}}$.

•The relationships are: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

The distribution means are the same. This is the property of UNBIASEDNESS: the average of all possible sample means = the population mean.

The distribution s.d. of the sample mean is smaller when the sample size increases. This measures how close the sample mean is to the population mean for a sample of size n. Therefore, $s.d.(\bar{X}) = \sigma / \sqrt{n}$ measures the uncertainty of using the sample mean to estimate the population mean.

When we obtain a sample and compute the sample mean, $\bar{x}$, one unit of error between the population mean, $\mu$, and the observed $\bar{x}$ is $s.d.(\bar{X}) = \sigma / \sqrt{n}$.

When the sample size increases, the error of using the sample mean to estimate the population mean decreases. Therefore, we also call $s.d.(\bar{X}) = \sigma / \sqrt{n}$ the Standard Error of the Sample Mean, $SE(\bar{X})$.

40

Since the population s.d., $\sigma$, is usually unknown, we use the sample s.d., s, to estimate $\sigma$. Therefore, $SE(\bar{X}) \approx s / \sqrt{n}$.

The sample mean with its uncertainty is reported as:

$$\bar{x} \pm k\,SE(\bar{x}) \quad \text{or equivalently,} \quad \bar{x} \pm k\,(s / \sqrt{n})$$

The multiple, k, is determined by the level of confidence with which we make the claim. Recall that the Empirical Rule suggests that if k = 2 and the distribution of the sample mean is normal, then this provides about a 95% level of confidence.

How can we be sure that the distribution will follow a normal distribution, and when?

•Fortunately, if the sample size is large, the distribution of the sample mean is approximately normal (common experience suggests n > 30 is large enough for the normal shape). Putting the above discussion together, we have:

•The distribution of $\bar{X}$ is approximately normal with mean $\mu_{\bar{X}} = \mu$ and s.d. $\sigma_{\bar{X}} = \sigma / \sqrt{n}$ when the sample size is large enough (n > 30); replace $\sigma$ by s if $\sigma$ is not known.

41

Consider the above 400 days of testing data, with sample size n = 5 each day. Suppose these 2000 measurements give us the best estimate of the population characteristics: $\bar{x} = 15$ and s = 3. Since our sample size is n = 5, we can determine that the distribution of sample means has center 15 and

$$SE(\bar{X}) = s.d.(\bar{X}) \approx s / \sqrt{n} = 3 / \sqrt{5} = 1.342$$

By assuming the population characteristic follows a normal distribution, we can make a 95% confidence level estimate of the population mean:

There is a 95% chance that the average of the characteristic is between $15 \pm 2(1.342)$.

Equivalently, we are 95% confident (sure) that the average of the characteristic is between $15 \pm 2(1.342)$.

More precisely, instead of using k = 2, we can apply the normal distribution and use k = 1.96.

Extending this, we can obtain a general pattern for constructing a confidence interval for estimating the population mean at any level of confidence.

42

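A General Pattern for constructing a $100(1-\alpha)\%$ confidence interval for the unknown population mean when the sample size is considered large

$$\bar{X} \sim N(\mu, \sigma / \sqrt{n}). \quad \text{For this example, } \bar{X} \sim N(15, 3 / \sqrt{5})$$

[Figure: normal curve of $\bar{X}$ centered at 15, with the standardized scale $Z = (\bar{X} - 15)/1.342$ centered at 0. The central area is $.95 = 1 - \alpha$, each tail area is $.025 = \alpha/2$, and the cut-offs are at -Z = -1.96 and Z = 1.96.]

The $100(1-\alpha)\%$ confidence interval for the population mean is $\bar{x} \pm (z_{\alpha/2})\, SE(\bar{X})$

The 95% confidence interval is $\bar{x} \pm (1.96)\, SE(\bar{X})$

The 90% confidence interval is $\bar{x} \pm (1.645)\, SE(\bar{X})$

The 99% confidence interval is $\bar{x} \pm (2.576)\, SE(\bar{X})$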
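The multipliers 1.645, 1.96 and 2.576 are quantiles of the standard normal distribution and can be recovered with Python's standard library. The sketch below is an illustration; the function name `z_interval` is an assumption, and the inputs are the values $\bar{x} = 15$, s = 3, n = 5 used in the running example.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier z_{alpha/2} for a confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval: xbar +/- z * s / sqrt(n)."""
    half_width = z_multiplier(confidence) * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

multipliers = {c: z_multiplier(c) for c in (0.90, 0.95, 0.99)}
low, high = z_interval(15, 3, 5, confidence=0.95)
print({c: round(z, 3) for c, z in multipliers.items()})
print(f"95% interval: [{low:.2f}, {high:.2f}]")
```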

43

How can we determine a confidence interval of population mean using the information of a small sample?

When reporting the uncertainty of a sample mean, we report:

$$\bar{x} \pm k\,SE(\bar{x}) \quad \text{or equivalently,} \quad \bar{x} \pm k\,(s / \sqrt{n})$$

The multiple, k, is related to the level of confidence. For large-sample cases, we take advantage of the normality property of $\bar{X}$ and use its standardized distribution, the Z-distribution, to determine the multiple, k, for any given level of confidence.

When the sample is drawn from a normal population with unknown variability, we are not able to enjoy the same nice normality property for the distribution of $\bar{X}$.

A somewhat more complicated distribution, called the t-distribution, is used as the standardized distribution for $\bar{X}$:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

The t-distribution is similar to the Z-distribution: it has its center at 0 and is symmetric about the center, 0. However, it depends on what we call the degrees of freedom (d.f.). In one-sample confidence interval cases, d.f. = (n-1).

[Figure: t-distribution curve centered at 0, with central area $.95 = 1 - \alpha$, area $.025 = \alpha/2$ in each tail, and cut-offs at -t = -2.262 and t = 2.262.]

Example: n = 10, d.f. = n-1 = 9. t-values depend on the d.f. Some commonly used t-values can be found in most statistics textbooks.

44

A General Pattern for constructing a $100(1-\alpha)\%$ confidence interval for the unknown population mean when the sample size, n, is small

The $100(1-\alpha)\%$ confidence interval for the population mean is $\bar{x} \pm t(\alpha/2, df)\, \frac{s}{\sqrt{n}}$

The 95% confidence interval, when n = 10, is $\bar{x} \pm (2.262)\, \frac{s}{\sqrt{n}}$

The 90% confidence interval, when n = 10, is $\bar{x} \pm (1.833)\, \frac{s}{\sqrt{n}}$

The 99% confidence interval, when n = 10, is $\bar{x} \pm (3.25)\, \frac{s}{\sqrt{n}}$

An example: In a lab test, 16 samples are tested and the data are summarized: sample mean $\bar{x} = 32.5$ and s = 4.8.

A 95% confidence interval is

$$\bar{x} \pm t(\alpha/2, df)\, \frac{s}{\sqrt{n}} = 32.5 \pm t(.025, 16-1)\, \frac{4.8}{\sqrt{16}} = 32.5 \pm 2.131 \times 1.2 = [29.94, 35.06]$$
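The small-sample interval uses the same arithmetic as the large-sample one, with the t-value in place of the z-value. The sketch below reproduces the lab example; the function name is an assumption, and the t-value 2.131 is taken from the text (a t-table lookup) rather than computed, since the Python standard library has no t-distribution quantile function.

```python
import math

def t_interval(xbar, s, n, t_critical):
    """Confidence interval xbar +/- t * s / sqrt(n); t_critical comes from a t-table."""
    half_width = t_critical * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Lab example: n = 16, xbar = 32.5, s = 4.8, t(.025, 15) = 2.131
low, high = t_interval(32.5, 4.8, 16, 2.131)
print(f"95% interval: [{low:.2f}, {high:.2f}]")
```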

More exercises for applying t-distribution, if needed.

45

Some commonly used distributions

1. Normal distribution: the most common one for describing many continuous variables. It is symmetric about the mean, $\mu$, and bell-shaped. The spread is characterized by the scale parameter, $\sigma$.

2. Uniform distribution: This distribution describes random occurrence with equal chance in a given interval. It is applied in describing Type B uncertainty.

3. Triangular distribution: This distribution is applied in describing Type B uncertainty as well.

(NOTE: (2) and (3) are easy to operate. The uncertainty of the uniform distribution is usually large and therefore more conservative; it is applied when there is no prior knowledge for the Type B uncertainty. The triangular distribution is symmetric about its center and easy to operate. When the variables are not measured empirically, it is applied for a less conservative Type B uncertainty.)

4. Weibull distribution is common for characterizing physical properties, such as life time, strength, hardness, and so on. It is usually skewed to the right.

46

5. Gamma and Exponential distributions: These are other common distributions for characterizing physical properties.

6. Binomial distribution: X = # of successes in n identical and independent trials. This is for discrete random variables. It is most commonly used in describing the # of defectives in a sample. It is a very useful tool for quality control when the # of defectives is to be monitored in a process.

7. Poisson distribution: X = # of occurrences of an event in a short time period. This is common for describing the # of mutations in cells in a given short time period when treated with different dosages of radiation. It is also common in quality control, where the number of defects in a finished product is to be monitored.

47

8. Several important distributions for sampling statistics: t-distribution, F-distribution, and Chi-square distribution.

The t-distribution describes the standardized distribution of the sample mean when the sample size is small. It is commonly used for testing the difference of two population means.

The F-distribution describes the distribution of a function of the ratio of sample variances. It is commonly used in analysis of variance and in comparing two population variances.

The Chi-square distribution describes the sum of squared standardized differences between observed and expected frequencies. It is commonly used as a goodness-of-fit test for how well a distribution fits real-world data. It is also used for comparing a sample variance with a reference variance.

48

Each distribution describes a certain real-world phenomenon. It is important to learn the properties and assumptions of each distribution when applying it to meet your needs.

49

Uncertainty from Linear Least Squares Calibration

Calibration is an important tool for adjusting an instrument so that the systematic bias due to an inaccurate instrument readout is reduced. Calibration involves:

1. Determining the response variable y and the level of analyte x for the instrument to be calibrated.

2. Observing the responses y at different levels of the analyte x.

3. Modeling the relationship between x and y.

In most cases, such a relationship is linear: y = a+bx

4. Using an observed response y to determine the analyte level x through the regression line. This level is then compared with a standard or reference level that is supposed to result in the given y response.

5. Estimating the uncertainty of the predicted level of x from the response y using the regression line.

50

Consider the following simple situation:

A machine is used to measure the diameter of a bearing. A standard bearing with a known diameter is measured by the machine n times. It is often the case that the measurements, $x_i$, vary and, worse, that the average of the measurements, $\bar{x}$, is far from the 'known' true diameter.

That is, the bias B, the difference between $\bar{x}$ and the known diameter, is not zero. This is the system bias, which is often due to instrument wear over time.

Calibration is used to adjust the instrument back to its original condition. In statistical terms, it is to make the bias = 0.

51

Example: suppose we would like to calibrate a weighing device. An adjustment knob is the key control of the weighing system.

An object with a known 'correct' weight is used for calibrating the system. The knob can be set at any position on a continuous scale.

For the calibration, 8 different positions are tested. These positions are measured in degrees of angle from the zero position. Suppose the true correct weight of this object is 46.0 mg. How can we use the data and the correct weight to calibrate the knob?

X (knob position, degrees)   y (weight, mg)
30                           45.6
31                           45.8
32                           45.8
33                           46.0
34                           46.2
35                           46.2
36                           46.5
37                           46.7

52

Procedure of conducting a calibration

1. Fit a linear regression line y = a+bx by the least squares method:

$b = \frac{SS_{xy}}{SS_{xx}} = \frac{s_{xy}}{s_x^2} = r_{xy}\,\frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$

2. For a given $y_{obs}$, compute the predicted x-value: $x_{pred} = (y_{obs} - a)/b$

3. Compute the uncertainty of $x_{pred}$ by $s(x_{pred}) = s(y_{obs})/b$,

where $s(y_{obs}) = s\sqrt{1 + \frac{1}{n} + \frac{(x_{pred} - \bar{x})^2}{SS_{xx}}}$ and $s = \sqrt{MSE}$ is the residual standard deviation of the fitted line.

4. For the given y response, the predicted x measurement is presented as $x_{pred} \pm s(x_{pred})$

5. A 100(1-α)% confidence interval for the predicted x (the expanded uncertainty) is given by $x_{pred} \pm t_{(\alpha/2,\,n-2)}\, s(x_{pred})$
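The calibration procedure can be sketched in Python using the weighing device data from this module. The function names `fit_line` and `calibrate` are illustrative assumptions, and the sketch covers steps 1 to 4 only: the fit, the predicted x, and its standard uncertainty.

```python
import math

def fit_line(x, y):
    """Least squares fit y = a + b*x; returns a, b, s, x_bar, SS_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation, sqrt(MSE)
    return a, b, s, x_bar, ss_xx

def calibrate(x, y, y_obs):
    """Return (x_pred, s(x_pred)) for an observed response y_obs."""
    a, b, s, x_bar, ss_xx = fit_line(x, y)
    n = len(x)
    x_pred = (y_obs - a) / b
    s_y_obs = s * math.sqrt(1 + 1 / n + (x_pred - x_bar) ** 2 / ss_xx)
    return x_pred, s_y_obs / b

# Weighing device data: knob position (degrees) and observed weight (mg).
x = [30, 31, 32, 33, 34, 35, 36, 37]
y = [45.6, 45.8, 45.8, 46.0, 46.2, 46.2, 46.5, 46.7]

x_pred, s_x_pred = calibrate(x, y, y_obs=46.0)
print(f"x_pred = {x_pred:.2f}, s(x_pred) = {s_x_pred:.4f}")
```

For step 5, multiply `s_x_pred` by the tabled t-value with n-2 degrees of freedom to get the half-width of the confidence interval.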

53

Hands-on Activity

Consider the weighing device calibration example

X     y (mg)
30    45.6
31    45.8
32    45.8
33    46.0
34    46.2
35    46.2
36    46.5
37    46.7

Covariances: X, Y

      X       Y
X     6.00
Y     0.90    0.14

Variable   N   Mean   StDev   SE Mean
X          8   33.5   2.449   0.866
Y          8   46.1   0.374   0.132

[Figure: Regression Plot for Calibration Study. Y (45.5 to 46.5) versus X (30 to 37), showing the fitted regression line with 95% CI bands.]

Y = 41.075 + 0.15 X

S = 0.0763763   R-Sq = 96.4 %   R-Sq(adj) = 95.8 %

54

At y = 46.0, the 'correct' weight, $x_{pred}$ = (46-41.075)/.15 = 32.83

Several x-values are used to predict the y-responses, and the results are in the table.

Row   x-pred-y   y-pred    sdyfit
1     32.80      45.9950   0.0282351
2     32.81      45.9965   0.0282009
3     32.82      45.9980   0.0281672
4     32.83      45.9995   0.0281339
5     32.84      46.0010   0.0281010
6     32.85      46.0025   0.0280686

For y = 46, we obtain x = 32.83. The knob should be calibrated to 32.83. The uncertainty of $x_{pred}$ is given by

$s(x_{pred}) = s(y_{obs})/b = \sqrt{MSE + s^2(y_{fit})}\,/\,b = \sqrt{.0763763^2 + .0281339^2}\,/\,.15 = .5426$

NOTE: The Minitab output provides $s(y_{fit})$, not $s(y_{pred})$. The relationship between $s^2(y_{pred})$ and $s^2(y_{fit})$ is

$s^2(y_{pred}) = MSE + s^2(y_{fit})$
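As a check on the arithmetic, the Minitab values can be combined in a few lines of Python. The variable names are illustrative, and the critical value 2.447 for t(.025, 6) is an added assumption taken from a t-table (d.f. = n-2 = 6). The expanded interval is an extension of the worked example, not a result quoted in the module.

```python
import math

s = 0.0763763        # Minitab "S", the square root of MSE
s_y_fit = 0.0281339  # sdyfit at x = 32.83 (row 4 of the table)
b = 0.15             # slope of the fitted line
x_pred = 32.83       # knob position predicted for y = 46.0
t_crit = 2.447       # tabled t(.025, 6), assumed for a 95% interval

# s^2(y_pred) = MSE + s^2(y_fit), then divide by the slope.
s_y_pred = math.sqrt(s ** 2 + s_y_fit ** 2)
s_x_pred = s_y_pred / b

# Expanded uncertainty: x_pred +/- t * s(x_pred).
half_width = t_crit * s_x_pred
print(f"s(x_pred) = {s_x_pred:.4f}")
print(f"95% CI: [{x_pred - half_width:.2f}, {x_pred + half_width:.2f}]")
```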

55

Some important activities for calibration study

It is important to conduct a diagnosis of the residuals

1. to make sure that the linear fit is adequate, and

2. to make sure there are no unusual observations due to special causes.

3. If the fitted line is not adequate:

• It is most likely because the range of x-values is too wide. One can narrow the range of x-values, observe additional responses in the smaller range, and then re-conduct the calibration study.

• If the smaller range of x-values does not work, a more complicated model, such as a quadratic model, may be needed.

• It may be that the x-variable is not the key factor that affects the 'shift' of the y-responses. Further investigation, starting from brainstorming and setting up a cause-and-effect diagram, may be needed.
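A basic residual check for the weighing device example might look like the sketch below. It uses the fitted line Y = 41.075 + 0.15 X from the Minitab output. The 2s threshold for flagging unusual observations is a common rule of thumb assumed here, not a rule stated in this module.

```python
import math

# Weighing device data: knob position (degrees) and observed weight (mg).
x = [30, 31, 32, 33, 34, 35, 36, 37]
y = [45.6, 45.8, 45.8, 46.0, 46.2, 46.2, 46.5, 46.7]
a, b = 41.075, 0.15  # fitted intercept and slope from the regression output

# Residual = observed response minus fitted response.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Residual standard deviation with n - 2 degrees of freedom.
n = len(x)
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Flag observations whose residual exceeds 2s in absolute value (assumed rule).
flagged = [(xi, round(r, 3)) for xi, r in zip(x, residuals) if abs(r) > 2 * s]

for xi, r in zip(x, residuals):
    print(f"x = {xi}: residual = {r:+.3f}")
print("flagged observations:", flagged)
```

A plot of the residuals against x is the usual next step: a curved pattern suggests the linear fit is inadequate, while an isolated large residual points to a special cause.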

56

Type B Uncertainty

Type B uncertainty arises when the uncertainty is determined from physical properties, prior information, previous experience, or predetermined instrument specifications, or when the environment does not allow, or makes it very costly, to conduct the empirical data analysis needed to determine Type A uncertainty.

Because actual data cannot be observed to estimate the uncertainty as in Type A, we are usually more conservative in determining the Type B uncertainty when there is no information about the distribution over the possible range of measurement. In addition, in estimating Type B uncertainty, we would like to simplify the estimation process with a relatively conservative but 'good enough' estimate. For these reasons, simple distributions such as the rectangular or triangular distribution are typically applied.

57

Variable follows Uniform Distribution

When there is no information about the variable to be measured, the Type B uncertainty assumes a rectangular distribution with an x-range of 2a. From a probability point of view, this is a uniform distribution: all x-values in the range are equally likely to occur. The distribution shape:

[Figure: rectangular density of height .5/a over the interval from c-a to c+a on the X axis (width 2a).]

The uncertainty is based on the variance of the uniform distribution, $\sigma^2 = ((c+a)-(c-a))^2/12 = a^2/3$.

Therefore, $U(x) = \sigma = a/\sqrt{3}$

58

Variable follows a Triangular distribution

In some cases, we are able to assume the variable follows a symmetric distribution with smaller variability than Uniform, and at the same time keep the simplicity for computation.

A Triangular distribution is a good approximation for these situations.

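[Figure: triangular density on the X axis from c-a to c+a (width 2a), with peak height 1/a.]

$f(x) = \frac{1}{a} + \frac{x - c}{a^2}$ for $x \in [c-a,\, c]$

$f(x) = \frac{1}{a} - \frac{x - c}{a^2}$ for $x \in [c,\, c+a]$

The variance of X is given by $\sigma^2 = a^2/6$. Therefore, the uncertainty of the triangular distribution is

$U(x) = \sigma = a/\sqrt{6}$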

When we measure Type B uncertainty, we need to determine which distribution is more appropriate.
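To compare the two choices, the sketch below computes the Type B standard uncertainty for the same half-width a under each distribution. The function names and the half-width a = 0.5 are hypothetical values chosen for illustration only.

```python
import math

def type_b_uniform(a):
    """Standard uncertainty for a uniform distribution on [c-a, c+a]."""
    return a / math.sqrt(3)

def type_b_triangular(a):
    """Standard uncertainty for a symmetric triangular distribution on [c-a, c+a]."""
    return a / math.sqrt(6)

a = 0.5  # hypothetical half-width, e.g. a specification of +/- 0.5 units
u_uniform = type_b_uniform(a)
u_triangular = type_b_triangular(a)

print(f"uniform:    U(x) = {u_uniform:.4f}")
print(f"triangular: U(x) = {u_triangular:.4f}")
print(f"ratio uniform/triangular = {u_uniform / u_triangular:.4f}")
```

For any half-width the uniform estimate is larger by a factor of $\sqrt{2}$, which is why the uniform distribution is the more conservative choice.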

59

Type B uncertainty based on Normal Distribution

The normal distribution is commonly used in Type A. It can also be applied for Type B. Since Type B does not require observing data to estimate the standard deviation, we need a quick estimate of the uncertainty if the variable is assumed normal.

For the normal distribution,

$\pm 2\sigma$ from the mean gives about 95% coverage probability, and

$\pm 3\sigma$ from the mean gives about 99.7% coverage probability.

Therefore, if we have a good idea of the range of the variable, R (= largest - smallest), we can approximate the uncertainty, $\sigma$, by

$\sigma \approx R/6$ (since $R \approx 6\sigma$ covers almost 100% of the data),

or obtain a somewhat more conservative estimate by

$\sigma \approx R/4$ (since $R \approx 4\sigma$ covers about 95% of the data).
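The two range-based estimates can be compared with a short sketch. The function name `sigma_from_range` and the range R = 1.2 are hypothetical values chosen for illustration.

```python
def sigma_from_range(r, divisor):
    """Approximate sigma from a known range R, assuming a normal variable.

    divisor = 6 treats R as covering almost all of the data (+/- 3 sigma);
    divisor = 4 treats R as covering about 95% of the data (+/- 2 sigma).
    """
    return r / divisor

r = 1.2  # hypothetical range, largest minus smallest plausible value
sigma_6 = sigma_from_range(r, 6)
sigma_4 = sigma_from_range(r, 4)

print(f"R/6 estimate: {sigma_6:.3f}")
print(f"R/4 estimate: {sigma_4:.3f}  (more conservative)")
```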

60

Degrees of freedom for the Type B uncertainty are another quantity we need to determine.

When a system consists of uncertainty components of both Type A and Type B, there is a need to determine appropriate degrees of freedom for the Type B uncertainty, which can then be used in combining it with the Type A uncertainty. This may be needed for the expanded uncertainty, as discussed before.

In most cases, the degrees of freedom for the Type B uncertainty are assumed infinite, since the uncertainty is treated as a population parameter, not a sample estimate.

61

