Slide 1
Error Analysis - Statistics
Accuracy and Precision
Individual Measurement Uncertainty
  Distribution of Data
  Means, Variance and Standard Deviation
  Confidence Interval
Uncertainty of Quantity Calculated from Several Measurements
  Error Propagation
Least Squares Fitting of Data
Slide 2
Accuracy and Precision
Accuracy: Closeness of the data (sample) to the true value.
Precision: Closeness of the grouping of the data (sample) around some central value.
Slide 3
Accuracy and Precision
[Figure: two plots of Relative Frequency vs. X Value, each with the True Value marked. Panels: "Inaccurate & Imprecise" and "Precise but Inaccurate".]
Slide 4
Accuracy and Precision
[Figure: two plots of Relative Frequency vs. X Value, each with the True Value marked. Panels: "Accurate but Imprecise" and "Precise and Accurate".]
Slide 5
Accuracy and Precision
Q: How do we quantify the concept of accuracy and precision? -- How do we characterize the error that occurred in our measurement?
Slide 6
Individual Measurement Statistics
Take N measurements: $X_1, \ldots, X_N$.
Calculate the mean and standard deviation:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad S_x^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)^2$$
What should we use as the best value and uncertainty, so that we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \delta x$?
We need to know how the data is distributed.
Slide 7
Population and Sample
Parent Population: The set of all possible measurements.
Sample: A subset of the population; the measurements actually made.
[Figure: the population pictured as a bag of marbles, and samples as handfuls of marbles taken from the bag.]
Slide 8
Histogram (Sample Based)
Histogram: A plot of the number of times a given value occurred.
Relative Frequency: A plot of the relative number of times a given value occurred.
[Figure: Histogram. Number of Measurements (0 to 25) vs. X Value (Bin), bins from 30 to 80 in steps of 5.]
[Figure: Relative Frequency Plot. Relative Frequency (0 to 0.3) vs. X Value (Bin), bins from 30 to 80 in steps of 5.]
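The two plots differ only by a normalization: each bin count is divided by the total number of measurements. A minimal Python sketch of that step follows. The sample values are made up for illustration, the bin width of 5 is chosen to match the figure axis, and the function names `histogram` and `relative_frequency` are illustrative, not from the course notes.

```python
from collections import Counter

def histogram(samples, bin_width):
    """Count samples per bin; each bin is labelled by its lower edge."""
    counts = Counter((x // bin_width) * bin_width for x in samples)
    return dict(sorted(counts.items()))

def relative_frequency(counts):
    """Divide each bin count by the total number of samples."""
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()}

# Hypothetical measurements (not taken from the slides).
samples = [42, 47, 51, 53, 54, 56, 58, 58, 61, 63, 66, 72]

counts = histogram(samples, bin_width=5)
rel = relative_frequency(counts)

for edge in counts:
    print(f"bin {edge}-{edge + 5}: count={counts[edge]}, relative={rel[edge]:.3f}")
```

The relative frequencies always sum to 1, which is what lets them approach a probability density as the number of samples grows.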
Slide 9
Probability Distribution (Population Based)
Probability Density Function (pdf) (p(x)): Describes the probability distribution of all possible measures of x. Limiting case of the relative frequency.
[Figure: Probability Density Function. Probability per unit change in x (0 to 0.3) vs. x Value (Bin), from 30 to 80.]
Probability Distribution Function (P(x)): The integral of the pdf, i.e. the probability that $X \le x$:
$$P(x) = P[X \le x] = \int_{-\infty}^{x} p(x')\,dx'$$
Q: Plot the probability distribution function vs x.
Q: What is the maximum value of P(x)?
Slide 10
Probability Density Function
The probability that a measurement X takes a value in $(-\infty, \infty)$ is 1:
$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$
Every pdf satisfies the above property.
Q: Given a pdf, how would one find the probability that a measurement is between A and B?
Ex: [Equation: p(x), an exponential function of x with constants A and B] is a probability density function. Find the relationship between A and B.
Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$
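The question about the probability between two limits comes down to integrating the pdf over that interval. The sketch below does this numerically with the trapezoidal rule. It uses the standard normal density as a stand-in pdf and a finite range of -10 to 10 in place of the infinite limits; both are assumptions made for illustration, and `integrate` is a hypothetical helper name.

```python
import math

def integrate(pdf, a, b, steps=10_000):
    """Composite trapezoidal rule for the integral of pdf from a to b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        total += pdf(a + k * h)
    return total * h

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density, used here only as an example pdf."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# The area under the whole pdf should be 1 (range truncated at +/-10).
total_area = integrate(gaussian_pdf, -10.0, 10.0)

# Probability that a measurement lies between A = -1 and B = +1.
p_between = integrate(gaussian_pdf, -1.0, 1.0)

print(f"total area = {total_area:.6f}, P(-1 <= X <= 1) = {p_between:.4f}")
```

The same `integrate` call works for any pdf passed as a function, so checking the normalization condition for a candidate p(x) takes a single call.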
Slide 11
Common Statistical Distributions
Gaussian (Normal) Distribution
$$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\; e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}$$
where: $x$ = measured value, $\mu_x$ = true (mean) value, $\sigma_x$ = standard deviation, $\sigma_x^2$ = variance
[Figure: p(x) vs. x Value.]
Q: What are the two parameters that define a Gaussian distribution?
Q: How would one calculate the probability of a Gaussian distribution between $x_1$ and $x_2$? (See Chapter 4, Appendix A)
Slide 12
Common Statistical Distributions
Uniform Distribution
$$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2 \\ 0, & \text{otherwise} \end{cases}$$
where: $x$ = measured value, $x_1$ = lower limit, $x_2$ = upper limit
[Figure: p(x) vs. x Value.]
Q: Why do $x_1$ and $x_2$ also define the magnitude of the uniform distribution PDF?
Slide 13
Common Statistical Distributions
Ex: A voltage measurement has a Gaussian distribution with mean 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:
(a) [2.98, 3.82] [V]
(b) [2.4, 4.02] [V]
Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
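Both exercises can be cross-checked numerically instead of with the appendix table. The sketch below uses `statistics.NormalDist` from the Python standard library for the Gaussian case. For the ADC case it assumes the quantization error is uniform on an interval of width Q centred on the estimated voltage; that centring is an assumption, since the slide does not state it. The function names are illustrative.

```python
from statistics import NormalDist

def gaussian_interval_probability(mu, sigma, lo, hi):
    """P(lo <= X <= hi) for a Gaussian X with mean mu and standard deviation sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)

def uniform_probability_within(half_width, interval):
    """P(|error| <= half_width) for an error uniform on a zero-centred interval."""
    return min(1.0, 2 * half_width / interval)

# Voltage example: mean 3.4 V, standard deviation 0.4 V.
p_a = gaussian_interval_probability(3.4, 0.4, 2.98, 3.82)
p_b = gaussian_interval_probability(3.4, 0.4, 2.40, 4.02)

# ADC example: the value of Q cancels out, so any positive number works.
Q = 1.0
p_adc = uniform_probability_within(Q / 8, Q)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  ADC: {p_adc:.2f}")
```

Part (a) spans 1.05 standard deviations on each side of the mean, and part (b) spans from 2.5 standard deviations below to 1.55 above, which is how the same numbers would be looked up in a standard normal table.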
Slide 14
Statistical Analysis
Standard Deviation ($\sigma_x$ and $S_x$)
Characterize the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve). Smaller $\sigma_x$ implies better ______________.
Population Based:
$$\sigma_x = \left[\int_{-\infty}^{\infty} (x-\mu_x)^2\, p(x)\,dx\right]^{1/2}$$
Sample Based (N samples):
$$S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu_x)^2}$$
Q: Often we do not know $\mu_x$, how should we calculate $S_x$?
Slide 15
Statistical Analysis
Standard Deviation ($\sigma_x$ and $S_x$) (cont.)
$$\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$$
Common Name for "Error" Level | Error Level in Terms of $\sigma_x$ | % That the Deviation from the Mean is Smaller | Odds That the Deviation is Greater
Standard Deviation | $\pm\sigma_x$ | 68.3 | about 1 in 3
"Two-Sigma Error" | $\pm 2\sigma_x$ | 95 | 1 in 20
"Three-Sigma Error" | $\pm 3\sigma_x$ | 99.7 | 1 in 370
"Four-Sigma Error" | $\pm 4\sigma_x$ | 99.994 | 1 in 16,000
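The percentages in the table come from integrating the Gaussian pdf over the mean plus or minus Z standard deviations, which reduces to erf(Z/√2). The short sketch below recomputes them with `math.erf`; the function name `fraction_within` is illustrative. The table's two-sigma row appears to be rounded, since exactly two standard deviations gives about 95.4% (95% corresponds to roughly 1.96 standard deviations).

```python
import math

def fraction_within(z):
    """P(|X - mu| <= z * sigma) for a Gaussian X."""
    return math.erf(z / math.sqrt(2))

rows = []
for z in (1, 2, 3, 4):
    inside = fraction_within(z)
    odds = 1 / (1 - inside)  # "1 in odds" chance of a larger deviation
    rows.append((z, 100 * inside, odds))
    print(f"{z}-sigma: {100 * inside:.3f}% inside, about 1 in {odds:.0f} outside")
```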
Slide 16
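Statistical Analysis
Best Estimate
True mean:
$$\mu_x = E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$$
The Sampled Mean $\bar{x}$ is the best estimate of $\mu_x$:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
Degree of Freedom
Sampled Standard Deviation ($S_x$): Use when $\sigma_x$ is not available. When $\mu_x$ is not known, reduce by one degree of freedom:
$$S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu_x)^2} \quad\xrightarrow{\ \text{when } \mu_x \text{ not known}\ }\quad S_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{x})^2}$$
Q: If the sampled mean is only an estimate of the true mean $\mu_x$, how do we characterize its error?
Q: If we take another set of samples, will we get a different sampled mean?
Q: If we take many more sample sets, what will be the statistics of the set of sampled means?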
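The difference between the two forms of the sampled standard deviation is easy to see in code. The sketch below computes both, using a small made-up data set; the function names are illustrative. Dividing by N - 1 accounts for the one degree of freedom used up when the mean is estimated from the same data.

```python
import math

def sample_mean(xs):
    """Sampled mean: the best estimate of the true mean."""
    return sum(xs) / len(xs)

def std_known_mean(xs, mu):
    """S_x when the true mean mu is known: divide by N."""
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def std_unknown_mean(xs):
    """S_x when the mean is estimated from the data: divide by N - 1."""
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

# Hypothetical measurements (not from the slides).
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

xbar = sample_mean(xs)
s_n = std_known_mean(xs, mu=5.0)   # pretends the true mean is known to be 5.0
s_n1 = std_unknown_mean(xs)

print(f"mean = {xbar}, S_x (N) = {s_n:.4f}, S_x (N-1) = {s_n1:.4f}")
```

The N - 1 form is what `statistics.stdev` in the Python standard library computes.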
Slide 17
Statistical Analysis
Ex: The inlet pressure of a steam generator was measured 100 times during a 12 hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:
Pressure (P) (MPa) | Number of Results (m)
3.970 | 1
3.980 | 3
3.990 | 12
4.000 | 25
4.010 | 33
4.020 | 17
4.030 | 6
4.040 | 2
4.050 | 1
(1) Calculate the mean, variance and standard deviation.
(2) Given the data, what pressure range will contain 95% of the data?
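One way to work this example is to treat the table as weighted data, with each pressure counted m times. The sketch below does that. It uses the N - 1 form of the variance, and for part (2) it assumes the data are approximately Gaussian so that 95% of the values lie within 1.96 standard deviations of the mean; that assumption is not stated on the slide.

```python
import math

# (pressure in MPa, number of results) pairs copied from the table above.
table = [(3.970, 1), (3.980, 3), (3.990, 12), (4.000, 25), (4.010, 33),
         (4.020, 17), (4.030, 6), (4.040, 2), (4.050, 1)]

n = sum(m for _, m in table)                       # total number of measurements
mean = sum(p * m for p, m in table) / n            # weighted mean
variance = sum(m * (p - mean) ** 2 for p, m in table) / (n - 1)
std = math.sqrt(variance)

z95 = 1.96  # two-sided 95% point of the standard normal (Gaussian assumption)
low, high = mean - z95 * std, mean + z95 * std

print(f"N = {n}, mean = {mean:.4f} MPa, variance = {variance:.3e} MPa^2, std = {std:.4f} MPa")
print(f"95% of the data: {low:.3f} to {high:.3f} MPa")
```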
Slide 18
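Confidence Interval
Sampled Mean Statistics
If N is large, $\bar{x}$ will also have a Gaussian distribution. (Central Limit Theorem)
Mean of $\bar{x}$:
$$\mu_{\bar{x}} = E[\bar{x}] = \mu_x$$
$\bar{x}$ is an unbiased estimate.
Standard Deviation of $\bar{x}$:
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$
$\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.
Q: Since we don't know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?
[Figure: probability density curves p(x).]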
Slide 19
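Confidence Interval
For Large Samples (N > 60), Q% of all the sampled means will lie in the interval
$$\mu_x - z_Q\,\sigma_{\bar{x}} \le \bar{x} \le \mu_x + z_Q\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$
Equivalently,
$$\bar{x} - z_Q\frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q\frac{\sigma_x}{\sqrt{N}}$$
is the Q% Confidence Interval.
When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.
[Figure: $p(\bar{x})$ with the interval from $-z_Q\sigma_{\bar{x}}$ to $+z_Q\sigma_{\bar{x}}$ about the mean marked.]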
Slide 20
Confidence Interval
Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².
(1) Find the 98% confidence interval for the true mean.
(2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
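With N = 64 the large-sample (Gaussian) interval applies, so both parts can be checked with the standard normal distribution. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) in place of the appendix table and takes $S_x$ as the approximation for $\sigma_x$. The function names are illustrative.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, s, n, confidence):
    """Two-sided large-sample interval: xbar +/- z_Q * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half

def confidence_for_half_width(half_width, s, n):
    """Confidence level Q (as a fraction) for the interval xbar +/- half_width."""
    z = half_width / (s / math.sqrt(n))
    return 2 * NormalDist().cdf(z) - 1

# Part (1): 98% confidence interval for the true mean.
low, high = z_confidence_interval(3.15, 0.4, 64, 0.98)

# Part (2): the range 2.85 to 3.45 is 3.15 +/- 0.30.
conf = confidence_for_half_width(0.30, 0.4, 64)

print(f"98% CI: {low:.3f} to {high:.3f} m/s^2")
print(f"confidence for 2.85 to 3.45: {100 * conf:.7f}%")
```

In part (2) the half-width of 0.30 is six standard deviations of the mean (0.4/√64 = 0.05), so the confidence is indistinguishable from 100% for practical purposes.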
Slide 21
Confidence Interval
For Small Samples (N < 60), the Q% Confidence Interval can be calculated using the Student-T distribution, which is similar to the normal distribution but depends on N.
With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean:
$$\bar{x} - t_{\nu,Q}\,S_{\bar{x}} \le \mu_x \le \bar{x} + t_{\nu,Q}\,S_{\bar{x}} \qquad \text{(Q\% confidence interval)}$$
where
$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}, \qquad \nu = N - 1$$
$t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.
Slide 22
Confidence Interval
Ex: A simple postal scale is supplied with , 1, 2, and 4 oz brass weights. For quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:
1.08 1.03 0.96 0.95 1.04
1.01 0.98 0.99 1.05 1.08
0.97 1.00 0.98 1.01
Based on this sample, and assuming that the parent population of the weights is normally distributed, what is the 95% confidence interval for the true weight of the 1 oz brass weights?
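With only 14 measurements the Student-t interval applies, with 13 degrees of freedom. The sketch below works the example using the Python standard library. The standard library has no t-distribution, so the critical value 2.160 (two-sided 95%, 13 degrees of freedom) is hard-coded from a standard t table rather than taken from the course appendix; treat it as an assumption to verify against Appendix B.

```python
import math
from statistics import mean, stdev

# Measured weights in oz, copied from the example above.
weights = [1.08, 1.03, 0.96, 0.95, 1.04,
           1.01, 0.98, 0.99, 1.05, 1.08,
           0.97, 1.00, 0.98, 1.01]

n = len(weights)
xbar = mean(weights)
s = stdev(weights)        # sample standard deviation, N - 1 in the denominator

t_13_95 = 2.160           # assumed: standard t-table value for nu = 13, 95% two-sided
half = t_13_95 * s / math.sqrt(n)
low, high = xbar - half, xbar + half

print(f"N = {n}, mean = {xbar:.4f} oz, S_x = {s:.4f} oz")
print(f"95% CI for the true weight: {low:.3f} to {high:.3f} oz")
```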
Slide 23
Propagation of Error
Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ($V = \pi D^2 h/4$)?
Q: What is the uncertainty in calculating the kinetic energy ($mv^2/2$) given the uncertainties in the measurements of mass (m) and velocity (v)?
How do errors propagate through calculations?
Slide 24
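Propagation of Error
A Simple Example
Suppose that y is related to two independent quantities $X_1$ and $X_2$ through
$$y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$$
To relate the changes in y to the uncertainties in $X_1$ and $X_2$, we need to find $dy = g(dX_1, dX_2)$:
$$dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2 = C_1\,dX_1 + C_2\,dX_2$$
The magnitude of dy is the expected change in y due to the uncertainties in $x_1$ and $x_2$:
$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2} = \sqrt{C_1^2\,\delta x_1^2 + C_2^2\,\delta x_2^2}$$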
Slide 25
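Propagation of Error
General Formula
Suppose that y is related to n independent measured variables $\{X_1, X_2, \ldots, X_n\}$ by a functional representation:
$$y = f(X_1, X_2, \ldots, X_n)$$
Given the uncertainties of the X's around some operating points:
$$\bar{x}_1 \pm \delta x_1,\quad \bar{x}_2 \pm \delta x_2,\quad \ldots,\quad \bar{x}_n \pm \delta x_n$$
The expected value $\bar{y}$ and its uncertainty $\delta y$ are:
$$\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$
$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\,\delta x_n\right)^2}$$
with every partial derivative evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.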
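The general formula can be applied mechanically once the partial derivatives are available. The sketch below estimates them numerically with central differences, so it works for any smooth function. It is shown on the cylinder volume $V = \pi D^2 h/4$ with made-up measurement values and uncertainties. The helper name `propagate` and the step size are illustrative choices, not part of the course material.

```python
import math

def propagate(f, values, uncertainties, rel_step=1e-6):
    """Return (f(values), delta_y) by root-sum-square error propagation.

    Partial derivatives are approximated by central differences
    around the operating point given by `values`.
    """
    y = f(*values)
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        h = rel_step * (abs(x) if x != 0 else 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        dfdx = (f(*up) - f(*down)) / (2 * h)
        total += (dfdx * dx) ** 2
    return y, math.sqrt(total)

def cylinder_volume(d, h):
    return math.pi * d ** 2 * h / 4

# Hypothetical measurements: D = 0.10 +/- 0.001 m, h = 0.20 +/- 0.002 m.
volume, d_volume = propagate(cylinder_volume, [0.10, 0.20], [0.001, 0.002])

print(f"V = {volume:.6f} m^3 +/- {d_volume:.6f} m^3 ({100 * d_volume / volume:.2f}%)")
```

For this function the analytic result is $\delta V / V = \sqrt{(2\,\delta D/D)^2 + (\delta h/h)^2}$, which gives a quick check on the numerical answer.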
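Slide 26
Propagation of Error
Proof:
Assume that the variability in measurement y is caused by k independent zero-mean error sources: $e_1, e_2, \ldots, e_k$.
Then,
$$(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$$
$$E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$$
The cross terms vanish because the error sources are independent and zero-mean, so $E[e_i e_j] = E[e_i]\,E[e_j] = 0$ for $i \ne j$. Therefore
$$\sigma_y = \sqrt{E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2]} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2}$$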
Slide 27
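Propagation of Error
Example (Standard Deviation of Sampled Mean)
Given
$$\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$
Use the general formula for error propagation:
$$\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\,\sigma_{x_1}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\,\sigma_{x_2}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\,\sigma_{x_3}\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\,\sigma_{x_N}\right)^2}$$
Since $\partial \bar{x}/\partial X_i = 1/N$ and $\sigma_{x_1} = \sigma_{x_2} = \cdots = \sigma_{x_N} = \sigma_x$:
$$\sigma_{\bar{x}} = \sqrt{N\left(\frac{\sigma_x}{N}\right)^2} = \frac{\sigma_x}{\sqrt{N}}$$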
Slide 28
Propagation of Error
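Ex: What is the uncertainty in calculating the kinetic energy ($KE = mv^2/2$) given the uncertainties in the measurements of mass (m) and velocity (v)?
$$\delta KE = \sqrt{\left(\frac{\partial KE}{\partial m}\,\delta m\right)^2 + \left(\frac{\partial KE}{\partial v}\,\delta v\right)^2}$$
$$= \sqrt{\left(\frac{1}{2}mv^2\,\frac{\delta m}{m}\right)^2 + \left(\frac{1}{2}mv^2\,\frac{2\,\delta v}{v}\right)^2}$$
$$= \frac{1}{2}mv^2\sqrt{\left(\frac{\delta m}{m}\right)^2 + \left(\frac{2\,\delta v}{v}\right)^2}$$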
Slide 29
Least Squares Fitting of Data
Best Linear Fit
How do we characterize BEST?
Fit a linear model (relation)
$$\hat{y}_i = a_o + a_1 x_i$$
to N pairs of $[x_i, y_i]$ measurements.
Given $x_i$, the error between the estimated output $\hat{y}_i$ and the measured output $y_i$ is:
$$n_i = y_i - \hat{y}_i$$
The BEST fit is the model that minimizes the sum of the ___________ of the error
$$\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \qquad \text{(Least Square Error)}$$
[Figure: Output Y vs. Input X, showing the measured outputs $y_i$ and the best linear fit $y_{est}$.]
Slide 30
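Least Squares Fitting of Data
Let
$$J = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} (y_i - a_o - a_1 x_i)^2$$
The two independent variables are?
Q: What are we trying to solve?
Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$:
$$\frac{\partial J}{\partial a_o} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} (y_i - a_o - a_1 x_i) = 0$$
$$\frac{\partial J}{\partial a_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} x_i\,(y_i - a_o - a_1 x_i) = 0$$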
Slide 31
Least Squares Fitting of Data
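Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$:
$$a_o N + a_1 \sum x_i = \sum y_i$$
$$a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$
or, in matrix form,
$$\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$
Solving:
$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$
where
$$\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$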
Slide 32
Least Squares Fitting of Data
Summary: Given N pairs of input/output measurements $[x_i, y_i]$, the best linear Least Squares model from input $x_i$ to output $y_i$ is:
$$\hat{y}_i = a_o + a_1 x_i$$
where
$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta} \quad \text{and} \quad a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}, \qquad \Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$
The process of minimizing squared error can be used for fitting nonlinear models and many engineering applications.
The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).
Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for $a_o$ and $a_2$?
Slide 33
Least Squares Fitting of Data
Variance of the fit:
$$\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N} (y_i - a_o - a_1 x_i)^2$$
Variance of the measurements in y: $\sigma_y^2$
Assume measurements in x are precise.
Correlation coefficient:
$$R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2}$$
$R^2$ is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
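The closed-form estimates for $a_o$ and $a_1$, the variance of the fit, and $R^2$ can be put together in a few lines. The sketch below follows those formulas directly, on a small made-up data set. The variance of y is estimated with N - 1 in the denominator, which is an assumption since the slide does not say how $\sigma_y^2$ is computed. The function names are illustrative.

```python
def linear_least_squares(xs, ys):
    """Return (a0, a1) minimizing the sum of (y - a0 - a1*x)^2."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx ** 2
    a0 = (sxx * sy - sx * sxy) / delta
    a1 = (n * sxy - sx * sy) / delta
    return a0, a1

def r_squared(xs, ys, a0, a1):
    """R^2 = 1 - (variance of the fit) / (variance of y)."""
    n = len(xs)
    var_fit = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    ybar = sum(ys) / n
    var_y = sum((y - ybar) ** 2 for y in ys) / (n - 1)  # assumed estimator
    return 1 - var_fit / var_y

# Hypothetical data, roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

a0, a1 = linear_least_squares(xs, ys)
r2 = r_squared(xs, ys, a0, a1)

print(f"y = {a0:.3f} + {a1:.3f} x, R^2 = {r2:.4f}")
```

The same normal-equation approach extends to the model $y = a_o + a_2 x^2$ by replacing each $x_i$ with $x_i^2$ in the sums.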