
# Uji Statistik (Statistical Tests)


• Slide 1

Error Analysis - Statistics

Accuracy and Precision

Individual Measurement Uncertainty
- Distribution of Data
- Means, Variance and Standard Deviation
- Confidence Interval

Uncertainty of Quantity Calculated from Several Measurements
- Error Propagation

Least Squares Fitting of Data

• Slide 2

Accuracy and Precision

Accuracy: Closeness of the data (sample) to the true value.

Precision: Closeness of the grouping of the data (sample) around some central value.

• Slide 3

Accuracy and Precision

[Figure: two plots of relative frequency versus X value, each marking the true value. Panels: "Inaccurate & Imprecise" and "Precise but Inaccurate".]

• Slide 4

Accuracy and Precision

[Figure: two plots of relative frequency versus X value, each marking the true value. Panels: "Accurate but Imprecise" and "Precise and Accurate".]

• Slide 5

Accuracy and Precision

Q: How do we quantify the concept of accuracy and precision? -- How do we characterize the error that occurred in our measurement?

• Slide 6

Individual Measurement Statistics

Take N measurements: $X_1, \ldots, X_N$. Calculate the mean and standard deviation:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(X_i - \bar{x}\right)^2$$

What should we use as the best value and uncertainty so that we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \delta x$?

We need to know how the data is distributed.

• Slide 7

Population and Sample

Parent Population: The set of all possible measurements.

Sample: A subset of the population; the measurements actually made.

[Figure: the population is a bag of marbles; samples are handfuls of marbles taken from the bag.]

• Slide 8

Histogram (Sample Based)

Histogram: A plot of the number of times a given value occurred.

Relative Frequency: A plot of the relative number of times a given value occurred.

[Figure: "Histogram". Number of measurements (0 to 25) versus X value (bin), bins from 30 to 80 in steps of 5.]

[Figure: "Relative Frequency Plot". Relative frequency (0 to 0.3) versus X value (bin), bins from 30 to 80 in steps of 5.]
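The difference between the two plots is only a normalization by the total number of measurements. A minimal Python sketch, assuming equal-width bins labelled by their lower edge; the data values and the helper names `histogram` and `relative_frequency` are invented for illustration:

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall in each bin (bins are labelled by lower edge)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def relative_frequency(counts):
    """Divide each bin count by the total number of measurements."""
    total = sum(counts.values())
    return {edge: c / total for edge, c in counts.items()}

# Hypothetical measurements, not taken from the slides.
data = [42, 47, 51, 53, 55, 56, 58, 61, 63, 68]
counts = histogram(data, bin_width=5)
rel = relative_frequency(counts)
```

The relative frequencies always sum to 1, which is what lets them approach a probability density as the number of measurements grows.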

• Slide 9

Probability Distribution (Population Based)

Probability Density Function (pdf), $p(x)$: Describes the probability distribution of all possible measures of x. It is the limiting case of the relative frequency.

Probability Distribution Function, $P(x)$: The integral of the pdf, i.e. the probability that $X \le x$:

$$P(x) = P[X \le x] = \int_{-\infty}^{x} p(\xi)\,d\xi$$

Q: Plot the probability distribution function vs x.

Q: What is the maximum value of P(x)?

[Figure: "Probability Density Function". Probability per unit change in x (0 to 0.3) versus x value (bin), from 30 to 80.]

• Slide 10

Probability Density Function

The probability that a measurement X takes a value in $(-\infty, \infty)$ is 1:

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

Every pdf satisfies the above property.

Q: Given a pdf, how would one find the probability that a measurement is between A and B?

Ex: $p(x) = A e^{-B x^2}$ is a probability density function. Find the relationship between A and B.

Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$

• Slide 11

Common Statistical Distributions

Gaussian (Normal) Distribution

$$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{(x - \mu_x)^2}{2\sigma_x^2}}$$

where: $x$ = measured value, $\mu_x$ = true (mean) value, $\sigma_x$ = standard deviation, $\sigma_x^2$ = variance

Q: What are the two parameters that define a Gaussian distribution?

Q: How would one calculate the probability of a Gaussian distribution between $x_1$ and $x_2$? (See Chapter 4, Appendix A)

[Figure: plot of p(x) versus x value.]

• Slide 12

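Common Statistical Distributions

Uniform Distribution

$$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2 \\ 0, & \text{otherwise} \end{cases}$$

where: $x$ = measured value, $x_1$ = lower limit, $x_2$ = upper limit

Q: Why do $x_1$ and $x_2$ also define the magnitude of the uniform distribution PDF?

[Figure: plot of p(x) versus x value.]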

• Slide 13

Common Statistical Distributions

Ex: A voltage measurement has a Gaussian distribution with a mean of 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:

(a) [2.98, 3.82] [V]

(b) [2.4, 4.02] [V]

Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
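As a cross-check on the table lookup for the Gaussian exercise, the probability over an interval can be computed from the error function. This is a sketch rather than the course's method; the helper names `normal_cdf` and `prob_between` are assumptions introduced here:

```python
import math

def normal_cdf(x, mean, sigma):
    """Cumulative probability P[X <= x] for a Gaussian with the given mean and sigma."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mean, sigma):
    """Probability that a Gaussian measurement falls in [a, b]."""
    return normal_cdf(b, mean, sigma) - normal_cdf(a, mean, sigma)

# Values from the voltage example: mean 3.4 V, standard deviation 0.4 V.
p_a = prob_between(2.98, 3.82, mean=3.4, sigma=0.4)  # z from -1.05 to +1.05
p_b = prob_between(2.40, 4.02, mean=3.4, sigma=0.4)  # z from -2.50 to +1.55
```

Converting each limit to $z = (x - \mu_x)/\sigma_x$ first, as in the comments, is the same step one would take before reading a standard normal table.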

• Slide 14

Statistical Analysis

Standard Deviation ($\sigma_x$ and $S_x$)

Characterize the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve). Smaller $\sigma_x$ implies better ______________.

Population Based:

$$\sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2\, p(x)\,dx$$

Sample Based (N samples):

$$S_x^2 = \frac{1}{N}\sum_{i=1}^{N} \left(X_i - \mu_x\right)^2$$

Q: Often we do not know $\mu_x$; how should we calculate $S_x$?

• Slide 15

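Statistical Analysis

Standard Deviation ($\sigma_x$ and $S_x$) (cont.)

$$\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$$

| Common Name for "Error" Level | Error Level in Terms of $\sigma$ | % That the Deviation from the Mean is Smaller | Odds That the Deviation is Greater |
|---|---|---|---|
| Standard Deviation | $\pm\sigma$ | 68.3 | about 1 in 3 |
| "Two-Sigma Error" | $\pm 2\sigma$ | 95 | 1 in 20 |
| "Three-Sigma Error" | $\pm 3\sigma$ | 99.7 | 1 in 370 |
| "Four-Sigma Error" | $\pm 4\sigma$ | 99.994 | 1 in 16,000 |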
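The percentages in the sigma-level table follow from the Gaussian distribution: the probability that a measurement lies within $Z$ standard deviations of the mean is $\mathrm{erf}(Z/\sqrt{2})$. A short sketch, with the helper name `prob_within` chosen here for illustration:

```python
import math

def prob_within(z):
    """Probability that a Gaussian measurement deviates from the mean by less than z sigma."""
    return math.erf(z / math.sqrt(2.0))

# Probability of staying inside, and odds (1 in N) of falling outside, for 1 to 4 sigma.
levels = {z: prob_within(z) for z in (1, 2, 3, 4)}
odds_outside = {z: 1.0 / (1.0 - p) for z, p in levels.items()}
```

The exact two-sigma value is slightly above 95%, so the tabulated "95" and "1 in 20" appear to be rounded.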

• Slide 16

Statistical Analysis

Sampled Mean ($\bar{x}$) is the best estimate of $\mu_x$.

$$\mu_x = E[X] = \int_{-\infty}^{\infty} x\, p(x)\,dx$$

Best Estimate:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

Sampled Standard Deviation ($S_x$)

$$S_x^2 = \frac{1}{N}\sum_{i=1}^{N} \left(X_i - \mu_x\right)^2$$

When $\mu_x$ is not known, use $\bar{x}$ and reduce by one degree of freedom:

$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(X_i - \bar{x}\right)^2$$

Q: If the sampled mean is only an estimate of the true mean $\mu_x$, how do we characterize its error?

Q: If we take another set of samples, will we get a different sampled mean?

Q: If we take many more sample sets, what will be the statistics of the set of sampled means?

• Slide 17

Statistical Analysis

Ex: The inlet pressure of a steam generator was measured 100 times during a 12 hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

| Pressure P (MPa) | Number of Results (m) |
|---|---|
| 3.970 | 1 |
| 3.980 | 3 |
| 3.990 | 12 |
| 4.000 | 25 |
| 4.010 | 33 |
| 4.020 | 17 |
| 4.030 | 6 |
| 4.040 | 2 |
| 4.050 | 1 |

(1) Calculate the mean, variance and standard deviation.

(2) Given the data, what pressure range will contain 95% of the data?
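For tabulated data like this, the sums in the mean and variance formulas become sums weighted by the number of results m. The sketch below works through part (1) under the assumption that the sample form with N - 1 is intended, since the true mean is not known; the variable names are illustrative:

```python
import math

# Pressure (MPa) -> number of results, copied from the table in the example.
pressure_counts = {
    3.970: 1, 3.980: 3, 3.990: 12, 4.000: 25, 4.010: 33,
    4.020: 17, 4.030: 6, 4.040: 2, 4.050: 1,
}

n = sum(pressure_counts.values())                       # total measurements
mean = sum(p * m for p, m in pressure_counts.items()) / n
# Sample variance: divide by N - 1 because the mean was estimated from the data.
variance = sum(m * (p - mean) ** 2 for p, m in pressure_counts.items()) / (n - 1)
std_dev = math.sqrt(variance)
```

With these inputs the mean comes out near 4.008 MPa and the standard deviation near 0.014 MPa.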

• Slide 18

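Confidence Interval

Sampled Mean Statistics

If N is large, $\bar{x}$ will also have a Gaussian distribution. (Central Limit Theorem)

Mean of $\bar{x}$:

$$\mu_{\bar{x}} = E[\bar{x}] = \mu_x$$

$\bar{x}$ is an unbiased estimate.

Standard Deviation of $\bar{x}$:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

$\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.

Q: Since we don't know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?

[Figure: the densities $p(x)$ and $p(\bar{x})$ plotted against x.]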
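The relation $\sigma_{\bar{x}} = \sigma_x/\sqrt{N}$ can be checked numerically by drawing many sample sets and looking at the spread of their means. This is a simulation sketch only; the population parameters, the seed, and the function name `std_of_sample_means` are all assumptions made for illustration:

```python
import random
import statistics

def std_of_sample_means(n_per_set, n_sets, mu, sigma, seed=0):
    """Draw n_sets sample sets of size n_per_set; return the std. dev. of their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(n_sets):
        sample = [rng.gauss(mu, sigma) for _ in range(n_per_set)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Hypothetical population: mu = 10, sigma = 2, with N = 25 measurements per set.
observed = std_of_sample_means(n_per_set=25, n_sets=2000, mu=10.0, sigma=2.0)
predicted = 2.0 / 25 ** 0.5  # sigma_x / sqrt(N) = 0.4
```

The observed spread should land close to the predicted 0.4, and it tightens further as the number of sample sets increases.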

• Slide 19

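Confidence Interval

For Large Samples ( N > 60 ), Q% of all the sampled means will lie in the interval

$$\mu_x - z_Q\,\sigma_{\bar{x}} \le \bar{x} \le \mu_x + z_Q\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

Equivalently,

$$\bar{x} - z_Q\,\frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q\,\frac{\sigma_x}{\sqrt{N}}$$

is the Q% Confidence Interval.

When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.

[Figure: the density $p(\bar{x})$ with the limits $-z_Q\,\sigma_{\bar{x}}$ and $+z_Q\,\sigma_{\bar{x}}$ marked about the mean.]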

• Slide 20

Confidence Interval

Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².

(1) Find the 98% confidence interval for the true mean.

(2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
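Both parts of this example reduce to converting between a confidence level and $z_Q$. The sketch below uses Python's `statistics.NormalDist` in place of the appendix table and takes $S_x$ as an approximation of $\sigma_x$, as the large-sample rule allows; the function names are assumptions introduced here:

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, s, n, confidence):
    """Large-sample interval: mean +/- z_Q * s / sqrt(n)."""
    z_q = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided z value
    half_width = z_q * s / math.sqrt(n)
    return mean - half_width, mean + half_width

def confidence_for_half_width(half_width, s, n):
    """Confidence that the true mean lies within +/- half_width of the sampled mean."""
    z = half_width / (s / math.sqrt(n))
    return 2.0 * NormalDist().cdf(z) - 1.0

# Values from the acceleration example: N = 64, mean 3.15, S_x = 0.4 (m/s^2).
low, high = z_confidence_interval(3.15, 0.4, 64, confidence=0.98)
conf = confidence_for_half_width(0.30, 0.4, 64)  # range 2.85 to 3.45
```

Here $\sigma_{\bar{x}} = 0.4/\sqrt{64} = 0.05$, so the range in part (2) spans six standard deviations of the mean on each side.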

• Slide 21

Confidence Interval

For Small Samples ( N < 60 ), the Q% Confidence Interval can be calculated using the Student-T distribution, which is similar to the normal distribution but depends on N.

With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean:

$$\bar{x} - t_{\nu,Q}\,S_{\bar{x}} \le \mu_x \le \bar{x} + t_{\nu,Q}\,S_{\bar{x}} \qquad \text{(Q\% confidence interval)}$$

where

$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}, \qquad \nu = N - 1$$

$t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.

• Slide 22

Confidence Interval

Ex: A simple postal scale is supplied with , 1, 2, and 4 oz brass weights. For quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:

1.08 1.03 0.96 0.95 1.04
1.01 0.98 0.99 1.05 1.08
0.97 1.00 0.98 1.01

Based on this sample and that the parent population of the weight is normally distributed, what is the 95% confidence interval for the true weight of the 1 oz brass weights?
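A sketch of the small-sample calculation for this data set follows. The Python standard library has no Student-t quantile function, so the value $t_{13,\,95\%} = 2.160$ is hard-coded from a standard two-sided t table; that constant is an assumption brought in here, not a number quoted from the slides:

```python
import math
import statistics

# The 14 measured weights (oz) from the example.
weights = [1.08, 1.03, 0.96, 0.95, 1.04,
           1.01, 0.98, 0.99, 1.05, 1.08,
           0.97, 1.00, 0.98, 1.01]

T_13_95 = 2.160  # assumed table value: two-sided 95%, nu = N - 1 = 13

n = len(weights)
mean = statistics.mean(weights)
s_x = statistics.stdev(weights)           # sample std. dev. (N - 1 in the denominator)
half_width = T_13_95 * s_x / math.sqrt(n)
ci = (mean - half_width, mean + half_width)
```

Under these assumptions the interval is roughly 0.985 oz to 1.033 oz.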

• Slide 23

Propagation of Error

Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ( $V = \pi D^2 h / 4$ )?

Q: What is the uncertainty in calculating the kinetic energy ( $m v^2 / 2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

How do errors propagate through calculations?

• Slide 24

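Propagation of Error

A Simple Example

Suppose that y is related to two independent quantities $X_1$ and $X_2$ through

$$y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$$

To relate the changes in y to the uncertainties in $X_1$ and $X_2$, we need to find $dy = g(dX_1, dX_2)$:

$$dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2 = C_1\,dX_1 + C_2\,dX_2$$

The magnitude of dy is the expected change in y due to the uncertainties $\delta x_1$ and $\delta x_2$:

$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2} = \sqrt{\left(C_1\,\delta x_1\right)^2 + \left(C_2\,\delta x_2\right)^2}$$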

• Slide 25

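Propagation of Error

General Formula

Suppose that y is related to n independent measured variables $\{X_1, X_2, \ldots, X_n\}$ by a functional representation:

$$y = f(X_1, X_2, \ldots, X_n)$$

Given the uncertainties of the X's around some operating points:

$$X_1 = \bar{x}_1 \pm \delta x_1,\quad X_2 = \bar{x}_2 \pm \delta x_2,\quad \ldots,\quad X_n = \bar{x}_n \pm \delta x_n$$

The expected value $\bar{y}$ and its uncertainty $\delta y$ are:

$$\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\,\delta x_n\right)^2}$$

with each partial derivative evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.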
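The general formula can be applied numerically when the partial derivatives are tedious to write out. The sketch below estimates each derivative with a central difference and combines the terms in quadrature. The function name `propagate`, the step size, and the cylinder measurements are assumptions chosen for illustration, with the cylinder volume reused from the earlier question:

```python
import math

def propagate(f, values, uncertainties, rel_step=1e-6):
    """Return (y, delta_y) for y = f(*values) using root-sum-square propagation."""
    y = f(*values)
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        h = rel_step * max(abs(x), 1.0)       # finite-difference step
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        dfdx = (f(*up) - f(*down)) / (2.0 * h)  # central-difference partial derivative
        total += (dfdx * dx) ** 2
    return y, math.sqrt(total)

def cylinder_volume(d, h):
    return math.pi * d ** 2 * h / 4.0

# Hypothetical measurements: D = 0.10 +/- 0.001 m, h = 0.25 +/- 0.002 m.
volume, d_volume = propagate(cylinder_volume, [0.10, 0.25], [0.001, 0.002])
```

For this function the result can be checked by hand: the relative uncertainty is $\sqrt{(2\,\delta D/D)^2 + (\delta h/h)^2}$, about 2.2% for the values above.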

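• Slide 26

Propagation of Error

Proof: Assume that the variability in measurement y is caused by k independent zero-mean error sources: $e_1, e_2, \ldots, e_k$. Then,

$$(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$$

$$E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$$

The cross terms drop out because the error sources are independent and zero-mean, so $E[e_i e_j] = E[e_i]\,E[e_j] = 0$ for $i \ne j$. Therefore,

$$\sigma_y^2 = E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2] = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2$$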

• Slide 27

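Propagation of Error

Example (Standard Deviation of Sampled Mean)

Given

$$\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$

Use the general formula for error propagation:

$$\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\,\sigma_{x_1}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\,\sigma_{x_2}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\,\sigma_{x_3}\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\,\sigma_{x_N}\right)^2}$$

Since $\partial \bar{x}/\partial X_i = 1/N$ and $\sigma_{x_1} = \sigma_{x_2} = \cdots = \sigma_{x_N} = \sigma_x$,

$$\sigma_{\bar{x}} = \sqrt{N\left(\frac{\sigma_x}{N}\right)^2} = \frac{\sigma_x}{\sqrt{N}}$$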

• Slide 28

Propagation of Error

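Ex: What is the uncertainty in calculating the kinetic energy ( $KE = m v^2 / 2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

$$\delta KE = \sqrt{\left(\frac{\partial KE}{\partial m}\,\delta m\right)^2 + \left(\frac{\partial KE}{\partial v}\,\delta v\right)^2}$$

$$= \sqrt{\left(\frac{1}{2} m v^2\,\frac{\delta m}{m}\right)^2 + \left(\frac{1}{2} m v^2\,\frac{2\,\delta v}{v}\right)^2}$$

$$= \frac{1}{2} m v^2 \sqrt{\left(\frac{\delta m}{m}\right)^2 + \left(\frac{2\,\delta v}{v}\right)^2}$$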

• Slide 29

Least Squares Fitting of Data

Best Linear Fit: How do we characterize BEST?

Fit a linear model (relation)

$$\hat{y}_i = a_o + a_1 x_i$$

to N pairs of $[x_i, y_i]$ measurements.

Given $x_i$, the error between the estimated output $\hat{y}_i$ and the measured output $y_i$ is:

$$n_i = y_i - \hat{y}_i$$

The BEST fit is the model that minimizes the sum of the ___________ of the error (Least Square Error):

$$\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$

[Figure: output Y versus input X, showing the measured outputs $y_i$ and the best linear fit $y_{est}$.]

• Slide 30

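Least Squares Fitting of Data

Let

$$J = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N} \left(y_i - a_o - a_1 x_i\right)^2$$

The two independent variables are?

Q: What are we trying to solve?

Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$:

$$\frac{\partial J}{\partial a_o} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} \left(y_i - a_o - a_1 x_i\right) = 0$$

$$\frac{\partial J}{\partial a_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} x_i\left(y_i - a_o - a_1 x_i\right) = 0$$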

• Slide 31

Least Squares Fitting of Data

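Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$:

$$a_o N + a_1 \sum x_i = \sum y_i$$

$$a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$

or, in matrix form,

$$\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

Solving gives

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

where

$$\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$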

• Slide 32

Least Squares Fitting of Data

Summary: Given N pairs of input/output measurements $[x_i, y_i]$, the best linear Least Squares model from input $x_i$ to output $y_i$ is:

$$\hat{y}_i = a_o + a_1 x_i$$

where

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

and

$$\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$

The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications.

The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).

Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for $a_o$ and $a_2$?
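The closed-form least squares solution for a straight line translates directly into code. A minimal sketch follows; the function name `linear_least_squares` and the sample data are invented for illustration:

```python
def linear_least_squares(xs, ys):
    """Return (a_o, a_1) minimizing the sum of (y_i - a_o - a_1 * x_i)^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2   # zero when all x_i are identical
    if delta == 0:
        raise ValueError("x values must not all be equal")
    a_o = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a_1 = (n * sum_xy - sum_x * sum_y) / delta
    return a_o, a_1

# Hypothetical measurements scattered around y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a_o, a_1 = linear_least_squares(xs, ys)
```

Feeding in points that lie exactly on a line, for example x = 0, 1, 2, 3 with y = 1, 3, 5, 7, returns that line's intercept and slope, which makes a convenient sanity check.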

• Slide 33

Least Squares Fitting of Data

Variance of the fit:

$$\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N} \left(y_i - a_o - a_1 x_i\right)^2$$

Variance of the measurements in y: $\sigma_y^2$

Assume measurements in x are precise.

Correlation coefficient:

$$R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2}$$

$R^2$ is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
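A sketch of how the variance of the fit and $R^2$ might be evaluated for a fitted line. It assumes the fit variance is divided by N - 2 (two fitted parameters) and that $\sigma_y^2$ is estimated by the sample variance of the $y_i$ with N - 1. Both choices, along with the function name `fit_quality` and the sample data, are assumptions made for this illustration:

```python
def fit_quality(xs, ys, a_o, a_1):
    """Return (variance_of_fit, r_squared) for the line y = a_o + a_1 * x."""
    n = len(xs)
    residual_ss = sum((y - a_o - a_1 * x) ** 2 for x, y in zip(xs, ys))
    var_fit = residual_ss / (n - 2)        # assumed N - 2 degrees of freedom
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)  # sample variance of y
    r_squared = 1.0 - var_fit / var_y
    return var_fit, r_squared

# Hypothetical data lying exactly on y = 1 + 2x, so the fit is perfect.
var_fit, r_squared = fit_quality([0, 1, 2, 3], [1, 3, 5, 7], a_o=1.0, a_1=2.0)
```

For data on an exact line the residuals vanish, so the variance of the fit is 0 and $R^2$ is 1, matching the statement that $R^2 = 1$ means a perfect linear fit.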