Statistical Tests

Error Analysis - Statistics

Accuracy and Precision

Individual Measurement Uncertainty
  Distribution of Data
  Means, Variance and Standard Deviation
  Confidence Interval

Uncertainty of Quantity Calculated from Several Measurements
  Error Propagation

Least Squares Fitting of Data

Accuracy and Precision

Accuracy: Closeness of the data (sample) to the true value.

Precision: Closeness of the grouping of the data (sample) around some central value.


[Figure: four plots of Relative Frequency vs. X Value, each with the True Value marked: Inaccurate & Imprecise; Precise but Inaccurate; Accurate but Imprecise; Precise and Accurate.]


Q: How do we quantify the concept of accuracy and precision? -- How do we characterize the error that occurred in our measurement?

Individual Measurement Statistics

Take N measurements: $X_1, \ldots, X_N$. Calculate the mean and standard deviation:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2$$

What should we use as the best value and uncertainty so that we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \delta x$?

We need to know how the data is distributed.
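As a minimal sketch of the mean and standard deviation calculation for N measurements, the following assumes the N - 1 (sample) form of the standard deviation. The function name and the readings are hypothetical, chosen only for illustration.

```python
import math


def mean_and_sample_std(measurements):
    """Return (mean, sample standard deviation) of a list of measurements.

    Assumes the N - 1 denominator, i.e. the mean is estimated from the
    same data rather than known in advance.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(measurements) / n
    variance = sum((x - mean) ** 2 for x in measurements) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical readings, for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
x_bar, s_x = mean_and_sample_std(readings)
print(f"mean = {x_bar:.3f}, S_x = {s_x:.3f}")
```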


Population and Sample

Parent Population: The set of all possible measurements.

Sample: A subset of the population - the measurements actually made.

Example: the population is a bag of marbles; a sample is a handful of marbles from the bag.

Histogram (Sample Based)

Histogram: A plot of the number of times a given value occurred.

Relative Frequency: A plot of the relative number of times a given value occurred.

[Figure: Histogram - Number of Measurements vs. X Value (Bin), bins from 30 to 80.]

[Figure: Relative Frequency Plot - Relative Frequency vs. X Value (Bin), bins from 30 to 80.]
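A histogram and its relative frequency plot can be sketched as follows. This is an illustrative sketch only: the function names, the bin layout and the data are assumptions, not taken from the slides.

```python
def histogram(data, bin_start, bin_width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - bin_start) // bin_width)
        if 0 <= k < n_bins:  # values outside the binned range are ignored
            counts[k] += 1
    return counts


def relative_frequency(counts):
    """Divide each count by the total number of binned measurements."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical measurements, for illustration only.
data = [52, 55, 61, 48, 57, 53, 66, 59, 50, 54]
counts = histogram(data, bin_start=45, bin_width=5, n_bins=5)
rel_freq = relative_frequency(counts)
print(counts)
print(rel_freq)
```

The relative frequencies always sum to 1, which is the property that carries over to the probability density function in the limit of many measurements.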

Probability Distribution (Population Based)

Probability Density Function (pdf), p(x): Describes the probability distribution of all possible measures of x. It is the limiting case of the relative frequency.

[Figure: Probability Density Function - probability per unit change in x vs. x Value (Bin), bins from 30 to 80.]

Probability Distribution Function, P(x): The integral of the pdf, i.e.

$$P(x) = \int_{-\infty}^{x} p(X)\,dX$$

$$P(x) = P[X \le x] \quad \text{(the probability that } X \le x\text{)}$$

Q: Plot the probability distribution function vs x.

Q: What is the maximum value of P(x)?

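Probability Density Function

The probability that a measurement X takes a value in $(-\infty, \infty)$ is 1:

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

Every pdf satisfies the above property.

Q: Given a pdf, how would one find the probability that a measurement is between A and B?

Ex: [expression for p(x) not recoverable from the source; it is an exponential in x with constants A and B] is a probability density function. Find the relationship between A and B.

Hint:

$$\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$$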

Common Statistical Distributions

Gaussian (Normal) Distribution

$$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\exp\left[-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right]$$

where:
x = measured value
$\mu_x$ = true (mean) value
$\sigma_x$ = standard deviation
$\sigma_x^2$ = variance

[Figure: Gaussian pdf, p(x) vs. x Value.]

Q: What are the two parameters that define a Gaussian distribution?

Q: How would one calculate the probability of a Gaussian distribution between $x_1$ and $x_2$? (See Chapter 4, Appendix A)

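Uniform Distribution

$$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2 \\ 0, & \text{otherwise} \end{cases}$$

where:
x = measured value
$x_1$ = lower limit
$x_2$ = upper limit

[Figure: uniform pdf, p(x) vs. x Value.]

Q: Why do $x_1$ and $x_2$ also define the magnitude of the uniform distribution PDF?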


Ex: A voltage measurement has a Gaussian distribution with mean 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:

(a) [2.98, 3.82] [V]

(b) [2.4, 4.02] [V]
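The table lookup for the Gaussian voltage example can be cross-checked numerically. The sketch below evaluates the Gaussian cumulative distribution with the error function from the Python standard library instead of the appendix table; the function names are illustrative.

```python
import math


def normal_cdf(x, mean, std):
    """P[X <= x] for a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_between(lo, hi, mean, std):
    """P[lo <= X <= hi], as the difference of two cumulative probabilities."""
    return normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)


# Voltage example: mean 3.4 V, standard deviation 0.4 V.
p_a = prob_between(2.98, 3.82, mean=3.4, std=0.4)  # limits at -1.05 and +1.05 sigma
p_b = prob_between(2.40, 4.02, mean=3.4, std=0.4)  # limits at -2.50 and +1.55 sigma
print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
```

The results come out at roughly 0.71 for (a) and 0.93 for (b), which should agree with the table values to the table's rounding.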

Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
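One way to reason about the ADC example is to take the ratio of the accepted width to the full width of the uniform distribution. The sketch below assumes the quantization error is uniform on [-Q/2, +Q/2], centered on the estimated voltage; that centering is an assumption, and the function name is hypothetical.

```python
def uniform_prob_within(half_width, interval_width):
    """P(|error| <= half_width) for an error uniform on
    [-interval_width/2, +interval_width/2] (assumed centered on zero)."""
    covered = min(2.0 * half_width, interval_width)
    return covered / interval_width


Q = 1.0  # the result does not depend on the actual size of Q
p_within = uniform_prob_within(Q / 8, Q)
print(p_within)
```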

Statistical Analysis

Standard Deviation ($\sigma_x$ and $S_x$)

Characterizes the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve). Smaller $\sigma_x$ implies better ______________.

Population Based:

$$\sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2\,p(x)\,dx$$

Sample Based (N samples):

$$S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

Q: Often we do not know $\mu_x$; how should we calculate $S_x$?

Standard Deviation ($\sigma_x$ and $S_x$) (cont.)

Common Name for "Error" Level | Error Level in Terms of $\sigma_x$ | % That the Deviation from the Mean is Smaller | Odds That the Deviation is Greater
Standard Deviation | $\pm 1\sigma_x$ | 68.3 | about 1 in 3
"Two-Sigma Error" | $\pm 2\sigma_x$ | 95 | 1 in 20
"Three-Sigma Error" | $\pm 3\sigma_x$ | 99.7 | 1 in 370
"Four-Sigma Error" | $\pm 4\sigma_x$ | 99.994 | 1 in 16,000

$$\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$$
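The percentages quoted for each sigma level can be reproduced from the Gaussian distribution. The following sketch uses the standard-library error function; the function name is illustrative.

```python
import math


def prob_within_z_sigma(z):
    """Probability that a Gaussian measurement lies within z standard
    deviations of the mean."""
    return math.erf(z / math.sqrt(2.0))


for z in (1, 2, 3, 4):
    p = prob_within_z_sigma(z)
    print(f"{z}-sigma: {100 * p:.3f}% inside, about 1 in {1 / (1 - p):.0f} outside")
```

The exact two-sigma value is about 95.45% (roughly 1 in 22); the rounded figures of 95% and 1 in 20 correspond more closely to 1.96 standard deviations.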

Sampled Mean ($\bar{x}$) is the best estimate of $\mu_x$.

$$\mu_x = E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$$

Best Estimate:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

Sampled Standard Deviation ($S_x$): Use $\bar{x}$ when $\mu_x$ is not available; this reduces the degrees of freedom by one.

$$S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

When $\mu_x$ is not known (Degrees of Freedom: N - 1):

$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2$$

Q: If the sampled mean is only an estimate of the true mean $\mu_x$, how do we characterize its error?

Q: If we take another set of samples, will we get a different sampled mean?

Q: If we take many more sample sets, what will be the statistics of the set of sampled means?


Ex: The inlet pressure of a steam generator was measured 100 times during a 12-hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

Pressure (P) (MPa) | Number of Results (m)
3.970 | 1
3.980 | 3
3.990 | 12
4.000 | 25
4.010 | 33
4.020 | 17
4.030 | 6
4.040 | 2
4.050 | 1

(1) Calculate the mean, variance and standard deviation.

(2) Given the data, what pressure range will contain 95% of the data?
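The steam generator example works on grouped data, so each pressure value is weighted by its number of results. The sketch below is one possible way to set up the calculation; it assumes the N - 1 form of the variance and, for part (2), that the data are approximately Gaussian so that 95% of the data lie within about 1.96 standard deviations of the mean. Variable names are illustrative.

```python
import math

# Pressure (MPa) -> number of results, from the example table.
pressure_counts = {
    3.970: 1, 3.980: 3, 3.990: 12, 4.000: 25, 4.010: 33,
    4.020: 17, 4.030: 6, 4.040: 2, 4.050: 1,
}

n = sum(pressure_counts.values())
mean = sum(p * m for p, m in pressure_counts.items()) / n
# Sample variance with N - 1, since the mean is estimated from the data.
variance = sum(m * (p - mean) ** 2 for p, m in pressure_counts.items()) / (n - 1)
std = math.sqrt(variance)

# Assumption: Gaussian data, so 95% lies within about +/-1.96 standard deviations.
z95 = 1.96
low, high = mean - z95 * std, mean + z95 * std

print(f"N = {n}, mean = {mean:.4f} MPa")
print(f"variance = {variance:.3e} MPa^2, std = {std:.4f} MPa")
print(f"95% range: {low:.3f} to {high:.3f} MPa")
```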

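Confidence Interval

Sampled Mean Statistics

If N is large, $\bar{x}$ will also have a Gaussian distribution. (Central Limit Theorem)

Mean of $\bar{x}$:

$$\mu_{\bar{x}} = E[\bar{x}] = \mu_x$$

$\bar{x}$ is an unbiased estimate.

Standard Deviation of $\bar{x}$:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

$\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.

Q: Since we don't know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?

[Figure: the distribution of individual measurements, p(x), compared with the narrower distribution of the sampled mean, $p(\bar{x})$.]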

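Confidence Interval

For Large Samples (N > 60), Q% of all the sampled means will lie in the interval

$$\mu_x - z_Q\,\sigma_{\bar{x}} \le \bar{x} \le \mu_x + z_Q\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

Equivalently,

$$\bar{x} - z_Q\frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q\frac{\sigma_x}{\sqrt{N}}$$

is the Q% Confidence Interval.

When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.

[Figure: $p(\bar{x})$ vs. $\bar{x}$, with the interval from $\mu_x - z_Q\sigma_{\bar{x}}$ to $\mu_x + z_Q\sigma_{\bar{x}}$ marked.]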

Confidence Interval

Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².

(1) Find the 98% confidence interval for the true mean.

(2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s2 ?
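The acceleration example can be sketched numerically as below. This uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of a z-table and assumes the large-sample (Gaussian) interval with $S_x$ standing in for $\sigma_x$. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 64
x_bar = 3.15   # sampled mean, m/s^2
s_x = 0.4      # sampled standard deviation, m/s^2

std_of_mean = s_x / math.sqrt(n)  # standard deviation of the sampled mean

# (1) 98% confidence interval: z such that 98% of the area lies within +/- z.
z98 = NormalDist().inv_cdf(0.5 + 0.98 / 2)
ci_low = x_bar - z98 * std_of_mean
ci_high = x_bar + z98 * std_of_mean

# (2) Confidence that the true mean lies in [2.85, 3.45], i.e. x_bar +/- 0.30.
z_range = (3.45 - x_bar) / std_of_mean
confidence = 2 * NormalDist().cdf(z_range) - 1

print(f"z_98 = {z98:.3f}, 98% CI = [{ci_low:.3f}, {ci_high:.3f}] m/s^2")
print(f"range is +/-{z_range:.1f} standard deviations of the mean")
print(f"confidence = {confidence:.9f}")
```

Because the second range spans about six standard deviations of the mean on either side, the confidence is effectively 100%.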

Confidence Interval

For Small Samples (N < 60), the Q% Confidence Interval can be calculated using the Student's t distribution, which is similar to the normal distribution but depends on N.

With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean:

$$\bar{x} - t_{\nu,Q}\frac{S_x}{\sqrt{N}} \le \mu_x \le \bar{x} + t_{\nu,Q}\frac{S_x}{\sqrt{N}}$$

where

$$\nu = N - 1$$

That is, the Q% confidence interval is $\bar{x} \pm t_{\nu,Q}\,S_{\bar{x}}$, with $S_{\bar{x}} = S_x/\sqrt{N}$.

$t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.

Confidence Interval

Ex: A simple postal scale is supplied with , 1, 2, and 4 oz brass weights. For quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:

1.08 1.03 0.96 0.95 1.04
1.01 0.98 0.99 1.05 1.08
0.97 1.00 0.98 1.01

Based on this sample, and assuming that the parent population of the weights is normally distributed, what is the 95% confidence interval for the true weight of the 1 oz brass weights?
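A small-sample interval for the brass weight data can be sketched as follows. The t-value is hard-coded as 2.160, the commonly tabulated two-sided 95% value for 13 degrees of freedom; treat it as an assumption and check it against Appendix B. Variable names are illustrative.

```python
import math

# The 14 measured 1 oz weights, in oz.
weights = [
    1.08, 1.03, 0.96, 0.95, 1.04,
    1.01, 0.98, 0.99, 1.05, 1.08,
    0.97, 1.00, 0.98, 1.01,
]

n = len(weights)
mean = sum(weights) / n
# Sample standard deviation with N - 1 degrees of freedom.
s_x = math.sqrt(sum((w - mean) ** 2 for w in weights) / (n - 1))

# Assumed table value: two-sided 95%, nu = N - 1 = 13.
t_13_95 = 2.160
half_width = t_13_95 * s_x / math.sqrt(n)

print(f"N = {n}, mean = {mean:.4f} oz, S_x = {s_x:.4f} oz")
print(f"95% CI: {mean - half_width:.3f} to {mean + half_width:.3f} oz")
```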

Propagation of Error

Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ( $V = \pi D^2 h/4$ )?

Q: What is the uncertainty in calculating the kinetic energy ( $mv^2/2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

How do errors propagate through calculations?

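A Simple Example

Suppose that y is related to two independent quantities $X_1$ and $X_2$ through

$$y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$$

To relate the changes in y to the uncertainties in $X_1$ and $X_2$, we need to find $dy = g(dX_1, dX_2)$:

$$dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2$$

The magnitude of dy is the expected change in y due to the uncertainties in $x_1$ and $x_2$:

$$\delta y = \sqrt{\left(C_1\,\delta x_1\right)^2 + \left(C_2\,\delta x_2\right)^2} = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2}$$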

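General Formula

Suppose that y is related to n independent measured variables $\{X_1, X_2, \ldots, X_n\}$ by a functional representation:

$$y = f(X_1, X_2, \ldots, X_n)$$

Given the uncertainties of the X's around some operating points:

$$\bar{x}_1 \pm \delta x_1,\ \bar{x}_2 \pm \delta x_2,\ \ldots,\ \bar{x}_n \pm \delta x_n$$

The expected value of y, $\bar{y}$, and its uncertainty $\delta y$ are:

$$\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\,\delta x_n\right)^2}$$

with the partial derivatives evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.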
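The general error propagation formula can be sketched numerically by estimating each partial derivative with a central difference. This is an illustrative sketch, not a definitive implementation: the function names, the step size and the cylinder dimensions are all assumptions.

```python
import math


def propagate_uncertainty(f, values, uncertainties, rel_step=1e-6):
    """Return (f(values), delta_y) using the root-sum-square formula.

    Partial derivatives are approximated by central differences, and the
    inputs are assumed to be independent.
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        h = rel_step * (abs(x) if x != 0 else 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        dfdx = (f(*up) - f(*down)) / (2 * h)  # approximate partial derivative
        total += (dfdx * dx) ** 2
    return f(*values), math.sqrt(total)


def cylinder_volume(d, h):
    """V = pi * D^2 * h / 4."""
    return math.pi * d ** 2 * h / 4


# Hypothetical measurements: D = 0.10 +/- 0.001 m, h = 0.20 +/- 0.002 m.
volume, d_volume = propagate_uncertainty(
    cylinder_volume, [0.10, 0.20], [0.001, 0.002]
)
print(f"V = {volume:.6f} m^3, relative uncertainty = {d_volume / volume:.4f}")
```

For this volume formula the analytic result is $\delta V / V = \sqrt{(2\,\delta D/D)^2 + (\delta h/h)^2}$, which the numerical estimate should match closely.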


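Proof: Assume that the variability in measurement y is caused by k independent zero-mean error sources: $e_1, e_2, \ldots, e_k$. Then,

$$(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$$

$$E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$$

The cross terms vanish because the error sources are independent and zero-mean, so $E[e_i e_j] = 0$ for $i \ne j$. Therefore,

$$\sigma_y^2 = E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2] = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2$$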


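Example (Standard Deviation of Sampled Mean)

Given

$$\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$

Use the general formula for error propagation. Each partial derivative is $\partial\bar{x}/\partial X_i = 1/N$, and every measurement has the same standard deviation $\sigma_x$, so

$$\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\sigma_x\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\sigma_x\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\sigma_x\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\sigma_x\right)^2} = \sqrt{N\left(\frac{\sigma_x}{N}\right)^2} = \frac{\sigma_x}{\sqrt{N}}$$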

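Ex: What is the uncertainty in calculating the kinetic energy ( $KE = mv^2/2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

$$\delta KE = \sqrt{\left(\frac{\partial KE}{\partial m}\,\delta m\right)^2 + \left(\frac{\partial KE}{\partial v}\,\delta v\right)^2}$$

$$= \sqrt{\left(\frac{1}{2}mv^2\,\frac{\delta m}{m}\right)^2 + \left(mv^2\,\frac{\delta v}{v}\right)^2}$$

$$= \frac{1}{2}mv^2\sqrt{\left(\frac{\delta m}{m}\right)^2 + \left(2\,\frac{\delta v}{v}\right)^2}$$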
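The kinetic energy uncertainty can be evaluated directly from the partial derivatives $\partial KE/\partial m = v^2/2$ and $\partial KE/\partial v = mv$. The sketch below uses hypothetical values for the mass, the velocity and their uncertainties; the function name is illustrative.

```python
import math


def kinetic_energy_uncertainty(m, v, dm, dv):
    """Return (KE, delta_KE) for KE = m * v^2 / 2 with independent
    uncertainties dm and dv."""
    ke = 0.5 * m * v ** 2
    d_ke = math.sqrt((0.5 * v ** 2 * dm) ** 2 + (m * v * dv) ** 2)
    return ke, d_ke


# Hypothetical values: m = 2.0 +/- 0.02 kg (1%), v = 3.0 +/- 0.03 m/s (1%).
ke, d_ke = kinetic_energy_uncertainty(m=2.0, v=3.0, dm=0.02, dv=0.03)
print(f"KE = {ke:.3f} J, delta KE = {d_ke:.4f} J, relative = {d_ke / ke:.4f}")
```

With 1% uncertainty in both inputs, the relative uncertainty in KE is $\sqrt{0.01^2 + 0.02^2} \approx 2.2\%$: the velocity term counts double because KE depends on $v^2$.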

Best Linear Fit

How do we characterize BEST?

Fit a linear model (relation)

$$\hat{y}_i = a_o + a_1 x_i$$

to N pairs of $[x_i, y_i]$ measurements.

Given $x_i$, the error between the estimated output $\hat{y}_i$ and the measured output $y_i$ is:

$$n_i = y_i - \hat{y}_i$$

The BEST fit is the model that minimizes the sum of the ___________ of the error:

$$\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$

[Figure: Output Y vs. Input X, showing the measured outputs $y_i$ and the best linear fit $y_{est}$.]

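Least Square Error

Let

$$J = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$$

The two independent variables are?

Q: What are we trying to solve?

Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$.

$$\frac{\partial J}{\partial a_o} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right) = 0$$

$$\frac{\partial J}{\partial a_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} x_i\left(y_i - a_o - a_1 x_i\right) = 0$$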


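Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$:

$$a_o N + a_1 \sum x_i = \sum y_i$$

$$a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$

Solving for $a_o$ and $a_1$:

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

where

$$\Delta = N\sum x_i^2 - \left(\sum x_i\right)^2$$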

Summary: Given N pairs of input/output measurements $[x_i, y_i]$, the best linear Least Squares model from input $x_i$ to output $y_i$ is:

$$\hat{y}_i = a_o + a_1 x_i$$

where

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

and

$$\Delta = N\sum x_i^2 - \left(\sum x_i\right)^2$$

The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications.

The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).

Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for $a_o$ and $a_2$?
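The closed-form Least Squares solution for a straight line can be sketched in a few lines. The function name and the data points below are hypothetical, chosen only to illustrate the sums involved.

```python
def linear_least_squares(xs, ys):
    """Return (a_o, a_1) for the best-fit line y = a_o + a_1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a_o = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a_1 = (n * sum_xy - sum_x * sum_y) / delta
    return a_o, a_1


# Hypothetical input/output measurements.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a_o, a_1 = linear_least_squares(xs, ys)
print(f"a_o = {a_o:.3f}, a_1 = {a_1:.3f}")
```

For a model that is linear in its coefficients but not in x, one option is to fit against a transformed input (for example $z_i = x_i^2$) with the same function.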


Variance of the fit:

$$\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$$

Variance of the measurements in y: $\sigma_y^2$

Assume measurements in x are precise.

Correlation coefficient:

$$R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2} \approx 1 - \frac{S_n^2}{S_y^2}$$

$R^2$ is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
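The variance of the fit and the correlation coefficient can be computed as sketched below. This assumes the N - 2 denominator for the fit variance and takes the variance of y as the ordinary sample variance (N - 1 denominator), which is an assumption; the data and the line coefficients are hypothetical, with the coefficients being the least-squares line for this data.

```python
def fit_quality(xs, ys, a_o, a_1):
    """Return (variance of fit, variance of y, R^2) for the line y = a_o + a_1 * x."""
    n = len(xs)
    residual_ss = sum((y - a_o - a_1 * x) ** 2 for x, y in zip(xs, ys))
    var_fit = residual_ss / (n - 2)          # two fitted parameters
    y_mean = sum(ys) / n
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)  # assumed sample variance
    r_squared = 1 - var_fit / var_y
    return var_fit, var_y, r_squared


# Hypothetical data and its least-squares line y = 1.1 + 1.96 x.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
var_fit, var_y, r_squared = fit_quality(xs, ys, a_o=1.1, a_1=1.96)
print(f"variance of fit = {var_fit:.4f}")
print(f"variance of y   = {var_y:.3f}")
print(f"R^2             = {r_squared:.4f}")
```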
