
Uji Statistik (Statistical Tests)


  • Slide 1

    Error Analysis - Statistics

    Accuracy and Precision

    Individual Measurement Uncertainty: Distribution of Data; Means, Variance and Standard Deviation; Confidence Interval

    Uncertainty of Quantity Calculated from Several Measurements: Error Propagation

    Least Squares Fitting of Data

  • Slide 2

    Accuracy and Precision

    Accuracy: Closeness of the data (sample) to the true value.

    Precision: Closeness of the grouping of the data (sample) around some central value.

  • Slide 3

    Accuracy and Precision

    [Figure: two plots of Relative Frequency versus X Value, each with the True Value marked. Panels: Inaccurate & Imprecise; Precise but Inaccurate.]

  • Slide 4

    Accuracy and Precision

    [Figure: two plots of Relative Frequency versus X Value, each with the True Value marked. Panels: Accurate but Imprecise; Precise and Accurate.]

  • Slide 5

    Accuracy and Precision

    Q: How do we quantify the concepts of accuracy and precision? That is, how do we characterize the error that occurred in our measurement?

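  • Slide 6

    Individual Measurement Statistics

    Take N measurements: X1, . . . , XN. Calculate the mean and standard deviation:

    $$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

    where $\mu_x$ is the true (mean) value.

    What should we use as the best value and uncertainty so that we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \delta x$?

    We need to know how the data is distributed.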

  • Slide 7

    Population and Sample

    Parent Population: The set of all possible measurements.

    Sample: A subset of the population, i.e. the measurements actually made.

    [Figure: the population pictured as a bag of marbles; the samples pictured as a handful of marbles from the bag.]

  • Slide 8

    Histogram (Sample Based)

    Histogram: A plot of the number of times a given value occurred.

    Relative Frequency: A plot of the relative number of times a given value occurred.

    [Figure: Histogram. Number of Measurements (0 to 25) versus X Value (Bin), with bins from 30 to 80 in steps of 5.]

    [Figure: Relative Frequency Plot. Relative Frequency (0 to 0.3) versus X Value (Bin), with bins from 30 to 80 in steps of 5.]
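    The two plots differ only by a scale factor: each bin count is divided by the total number of measurements. A minimal sketch, assuming equal-width bins; the data values and the helper names `histogram` and `relative_frequency` are made up for illustration:

```python
def histogram(data, bin_start, bin_width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - bin_start) // bin_width)
        if 0 <= k < n_bins:          # values outside the binned range are ignored
            counts[k] += 1
    return counts

def relative_frequency(counts):
    """Scale bin counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical measurements, binned into [40,45), [45,50), ..., [60,65)
data = [52, 47, 55, 61, 49, 53, 58, 44, 50, 56]
counts = histogram(data, bin_start=40, bin_width=5, n_bins=5)
rel = relative_frequency(counts)
print(counts, rel)
```

    Because the relative frequencies sum to 1, they can be compared across samples of different size.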

  • Slide 9

    Probability Distribution (Population Based)

    Probability Density Function (pdf), p(x): Describes the probability distribution of all possible measures of x. It is the limiting case of the relative frequency.

    Probability Distribution Function, P(x): The integral of the pdf, i.e.

    $$P(x) = \int_{-\infty}^{x} p(x')\,dx' = P[X \le x]$$

    which is the probability that a measurement X is less than or equal to x.

    [Figure: Probability Density Function. Probability per unit change in x (0 to 0.3) versus x Value (Bin), with bins from 30 to 80 in steps of 5.]

    Q: Plot the probability distribution function vs x.

    Q: What is the maximum value of P(x)?

  • Slide 10

    Probability Density Function

    The probability that a measurement X takes a value between $-\infty$ and $\infty$ is 1:

    $$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

    Every pdf satisfies the above property.

    Q: Given a pdf, how would one find the probability that a measurement is between A and B?

    Ex: $p(x) = A e^{-B x^2}$ is a probability density function. Find the relationship between A and B.

    Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$
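    The probability that a measurement falls between two limits is the area under the pdf between them. A minimal numerical sketch, assuming a made-up triangular pdf on [0, 2] and a simple trapezoidal rule (`probability_between` and `triangular_pdf` are illustrative names, not from the course notes):

```python
def probability_between(pdf, a, b, steps=1000):
    """Approximate the integral of pdf from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        total += pdf(a + k * h)
    return total * h

def triangular_pdf(x):
    """Hypothetical pdf: rises linearly on [0, 1], falls linearly on [1, 2]."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

# The finite range [-1, 3] stands in for (-inf, inf) because the pdf is zero outside [0, 2].
total_area = probability_between(triangular_pdf, -1.0, 3.0)
p_mid = probability_between(triangular_pdf, 0.5, 1.5)
print(total_area, p_mid)
```

    The first result checks the normalization property; the second is the probability of a measurement between 0.5 and 1.5 for this pdf.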

  • Slide 11

    Common Statistical Distributions

    Gaussian (Normal) Distribution

    $$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}$$

    where: x = measured value, $\mu_x$ = true (mean) value, $\sigma_x$ = standard deviation, $\sigma_x^2$ = variance

    [Figure: plot of p(x) versus x Value.]

    Q: What are the two parameters that define a Gaussian distribution?

    Q: How would one calculate the probability of a Gaussian distribution between x1 and x2? (See Chapter 4, Appendix A)
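    One common route to the probability between x1 and x2 is to standardize both limits and look up the standard normal cumulative distribution. The sketch below assumes that Appendix A tabulates such a function, written here as $\Phi(z)$; the appendix itself is not reproduced in these slides, so its exact convention is an assumption:

```latex
z_1 = \frac{x_1 - \mu_x}{\sigma_x}, \qquad
z_2 = \frac{x_2 - \mu_x}{\sigma_x}, \qquad
P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} p(x)\,dx = \Phi(z_2) - \Phi(z_1),
\qquad
\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{z}{\sqrt{2}}\right)\right]
```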

  • Slide 12

    Common Statistical Distributions

    Uniform Distribution

    $$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1} & x_1 \le x \le x_2 \\ 0 & \text{otherwise} \end{cases}$$

    where: x = measured value, $x_1$ = lower limit, $x_2$ = upper limit

    [Figure: plot of p(x) versus x Value.]

    Q: Why do x1 and x2 also define the magnitude of the uniform distribution PDF?

  • Slide 13

    Common Statistical Distributions

    Ex: A voltage measurement has a Gaussian distribution with a mean of 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:

    (a) [2.98, 3.82] [V]

    (b) [2.4, 4.02] [V]

    Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
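    Both exercises can be checked numerically. This is a sketch rather than the table-lookup method the slide asks for: it uses `statistics.NormalDist` from the Python standard library in place of Appendix A, and it assumes the quantization error is uniform on [-Q/2, +Q/2] with Q set to 1 in arbitrary units. The function names are illustrative.

```python
from statistics import NormalDist

def gaussian_interval_probability(mean, sigma, low, high):
    """P(low <= X <= high) for a Gaussian with the given mean and standard deviation."""
    dist = NormalDist(mu=mean, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)

def uniform_interval_probability(x1, x2, low, high):
    """P(low <= X <= high) for a uniform distribution on [x1, x2]."""
    overlap = max(0.0, min(high, x2) - max(low, x1))
    return overlap / (x2 - x1)

p_a = gaussian_interval_probability(3.4, 0.4, 2.98, 3.82)   # part (a)
p_b = gaussian_interval_probability(3.4, 0.4, 2.4, 4.02)    # part (b)

Q = 1.0  # assumed quantization interval, arbitrary units
p_q = uniform_interval_probability(-Q / 2, Q / 2, -Q / 8, Q / 8)
print(p_a, p_b, p_q)
```

    For the uniform case the answer is just the ratio of interval widths, which is why the value of Q does not matter.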

  • Slide 14

    Statistical Analysis

    Standard Deviation ($\sigma_x$ and $S_x$): Characterizes the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve). Smaller $\sigma_x$ implies better ______________.

    Population Based

    $$\sigma_x^2 = \int_{-\infty}^{\infty} \left(x - \mu_x\right)^2 p(x)\,dx$$

    Sample Based (N samples)

    $$S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

    Q: Often we do not know $\mu_x$; how should we calculate $S_x$?

  • Slide 15

    Statistical Analysis

    Standard Deviation ($\sigma_x$ and $S_x$) (cont.)

    Error level Z: $\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$

    Common Name for "Error" Level | Error Level in Terms of $\sigma_x$ | % That the Deviation from the Mean Is Smaller | Odds That the Deviation Is Greater
    Standard Deviation | $\pm 1\sigma_x$ | 68.3 | about 1 in 3
    "Two-Sigma Error" | $\pm 2\sigma_x$ | 95 | 1 in 20
    "Three-Sigma Error" | $\pm 3\sigma_x$ | 99.7 | 1 in 370
    "Four-Sigma Error" | $\pm 4\sigma_x$ | 99.994 | 1 in 16,000
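    The percentages in this table follow from the Gaussian pdf: the fraction of the population within Z standard deviations of the mean is erf(Z/√2). A short sketch that recomputes them (`coverage_within` is an illustrative name). Note that exactly two sigma gives about 95.45%, so the rounded "95%, 1 in 20" entry corresponds more closely to Z ≈ 1.96.

```python
import math

def coverage_within(z):
    """Fraction of a Gaussian population within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2.0))

for z in (1, 2, 3, 4):
    inside = coverage_within(z)
    odds_outside = 1.0 / (1.0 - inside)
    print(f"{z} sigma: {100 * inside:.4f}% inside, about 1 in {odds_outside:.0f} outside")
```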

  • Slide 16

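    Statistical Analysis

    Best Estimate

    The sampled mean $\bar{x}$ is the best estimate of the true mean $\mu_x$:

    $$\mu_x = E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

    Sampled Standard Deviation ($S_x$): Use when $\sigma_x$ is not available.

    $$S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

    Degree of Freedom: When $\mu_x$ is not known, use $\bar{x}$ in its place and reduce by one degree of freedom:

    $$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2$$

    Q: If the sampled mean is only an estimate of the true mean $\mu_x$, how do we characterize its error?

    Q: If we take another set of samples, will we get a different sampled mean?

    Q: If we take many more sample sets, what will be the statistics of the set of sampled means?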
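    The difference between the two divisors is easy to see numerically. A minimal sketch with made-up readings; `sample_std` is an illustrative helper that divides by N when the true mean is supplied and by N - 1 when the sampled mean has to stand in for it:

```python
import math

def sample_mean(data):
    return sum(data) / len(data)

def sample_std(data, true_mean=None):
    """Standard deviation of a sample.

    If true_mean is given, deviations are taken about it and the divisor is N.
    Otherwise deviations are taken about the sampled mean and the divisor is
    N - 1 (one degree of freedom is used up by estimating the mean).
    """
    n = len(data)
    if true_mean is not None:
        return math.sqrt(sum((x - true_mean) ** 2 for x in data) / n)
    x_bar = sample_mean(data)
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

readings = [9.8, 10.1, 10.0, 9.9, 10.2]           # hypothetical measurements
x_bar = sample_mean(readings)
s_unknown_mean = sample_std(readings)              # N - 1 divisor
s_known_mean = sample_std(readings, true_mean=10.0)  # N divisor, assumed true mean
print(x_bar, s_unknown_mean, s_known_mean)
```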

  • Slide 17

    Statistical Analysis

    Ex: The inlet pressure of a steam generator was measured 100 times during a 12 hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

    Pressure (P) (MPa) | Number of Results (m)
    3.970 | 1
    3.980 | 3
    3.990 | 12
    4.000 | 25
    4.010 | 33
    4.020 | 17
    4.030 | 6
    4.040 | 2
    4.050 | 1

    (1) Calculate the mean, variance and standard deviation.

    (2) Given the data, what pressure range will contain 95% of the data?
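    A sketch of how the tabulated data could be processed. It weights each pressure by its count, uses the N - 1 divisor because the true mean is unknown, and for part (2) assumes the data are approximately Gaussian so that 95% lies within about 1.96 standard deviations of the mean. These modeling choices are assumptions, not the course's worked solution.

```python
import math
from statistics import NormalDist

# Pressure (MPa) -> number of results, copied from the table
pressure_counts = {
    3.970: 1, 3.980: 3, 3.990: 12, 4.000: 25, 4.010: 33,
    4.020: 17, 4.030: 6, 4.040: 2, 4.050: 1,
}

n = sum(pressure_counts.values())
mean_p = sum(p * m for p, m in pressure_counts.items()) / n
variance_p = sum(m * (p - mean_p) ** 2 for p, m in pressure_counts.items()) / (n - 1)
std_p = math.sqrt(variance_p)

# Two-sided 95% range under the Gaussian assumption
z95 = NormalDist().inv_cdf(0.975)
low, high = mean_p - z95 * std_p, mean_p + z95 * std_p
print(n, mean_p, variance_p, std_p, (low, high))
```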

  • Slide 18

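    Confidence Interval

    Sampled Mean Statistics: If N is large, $\bar{x}$ will also have a Gaussian distribution (Central Limit Theorem).

    Mean of $\bar{x}$:

    $$\mu_{\bar{x}} = E[\bar{x}] = \mu_x$$

    $\bar{x}$ is an unbiased estimate.

    Standard Deviation of $\bar{x}$:

    $$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

    $\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.

    Q: Since we don't know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?

    [Figure: plots of the probability density functions of x and of the sampled mean $\bar{x}$.]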
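    The $\sigma_x/\sqrt{N}$ result can be illustrated with a small simulation. This is only a sketch: the parent population is assumed to be uniform on [0, 1), which has standard deviation $1/\sqrt{12}$, and the sample size, number of sample sets, and random seed are arbitrary choices.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(0)        # fixed seed so the run is repeatable
N = 25                        # measurements per sample set
n_sets = 2000                 # number of sample sets

sigma_x = 1.0 / math.sqrt(12.0)   # standard deviation of uniform [0, 1)

# One sampled mean per sample set
sample_means = [mean([rng.random() for _ in range(N)]) for _ in range(n_sets)]

observed = pstdev(sample_means)       # spread of the sampled means
predicted = sigma_x / math.sqrt(N)    # sigma_x / sqrt(N)
print(mean(sample_means), observed, predicted)
```

    The sampled means cluster around the true mean of 0.5, and their spread is close to the predicted value even though the parent distribution is not Gaussian.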

  • Slide 19

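    Confidence Interval

    For Large Samples ( N > 60 ), Q% of all the sampled means $\bar{x}$ will lie in the interval

    $$\mu_x - z_Q\,\sigma_{\bar{x}} \le \bar{x} \le \mu_x + z_Q\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

    Equivalently,

    $$\bar{x} - z_Q\,\frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q\,\frac{\sigma_x}{\sqrt{N}}$$

    is the Q% Confidence Interval.

    When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.

    [Figure: plot of $p(\bar{x})$ versus $\bar{x}$, with the interval from $\mu_x - z_Q\sigma_{\bar{x}}$ to $\mu_x + z_Q\sigma_{\bar{x}}$ marked.]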

  • Slide 20

    Confidence Interval

    Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².

    (1) Find the 98% confidence interval for the true mean.

    (2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
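    A sketch of the large-sample calculation for this example. It takes $z_Q$ from `statistics.NormalDist` instead of a printed table and uses the sampled standard deviation in place of the unknown $\sigma_x$; the function names are illustrative.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, s_x, n, confidence):
    """Two-sided large-sample confidence interval for the true mean."""
    z_q = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z_q * s_x / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def confidence_for_half_width(half_width, s_x, n):
    """Confidence level corresponding to the interval x_bar +/- half_width."""
    z = half_width / (s_x / math.sqrt(n))
    return 2.0 * NormalDist().cdf(z) - 1.0

low, high = z_confidence_interval(3.15, 0.4, 64, 0.98)   # part (1)
conf = confidence_for_half_width(0.30, 0.4, 64)          # part (2): 3.15 +/- 0.30
print((low, high), conf)
```

    In part (2) the half-width of 0.30 m/s² is six standard deviations of the sampled mean, so the confidence is extremely close to 100%.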

  • Slide 21

    Confidence Interval

    For Small Samples ( N < 60 ), the Q% Confidence Interval can be calculated using the Student-T distribution, which is similar to the normal distribution but depends on N.

    With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean:

    $$\bar{x} - t_{\nu,Q}\,S_{\bar{x}} \le \mu_x \le \bar{x} + t_{\nu,Q}\,S_{\bar{x}} \qquad \text{(Q\% confidence interval)}$$

    where

    $$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}, \qquad \nu = N - 1$$

    $t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.

  • Slide 22

    Confidence Interval

    Ex: A simple postal scale is supplied with ½, 1, 2, and 4 oz brass weights. For a quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:

    1.08 1.03 0.96 0.95 1.04
    1.01 0.98 0.99 1.05 1.08
    0.97 1.00 0.98 1.01

    Based on this sample, and assuming that the parent population of the weights is normally distributed, what is the 95% confidence interval for the true weight of the 1 oz brass weights?
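    A sketch of the small-sample calculation for this data set. The Python standard library has no Student-t distribution, so the critical value is hard-coded: 2.160 is the two-sided 95% value for 13 degrees of freedom in standard t-tables. Whether Appendix B uses the same convention is an assumption.

```python
import math
from statistics import mean, stdev

weights_oz = [1.08, 1.03, 0.96, 0.95, 1.04,
              1.01, 0.98, 0.99, 1.05, 1.08,
              0.97, 1.00, 0.98, 1.01]

# Assumed table value: two-sided 95% Student-t critical value for nu = 14 - 1 = 13
T_13_95 = 2.160

n = len(weights_oz)
x_bar = mean(weights_oz)
s_x = stdev(weights_oz)                  # sampled standard deviation, N - 1 divisor
half_width = T_13_95 * s_x / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(x_bar, s_x, ci)
```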

  • Slide 23

    Propagation of Error

    Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ( $V = \pi D^2 h/4$ )?

    Q: What is the uncertainty in calculating the kinetic energy ( $mv^2/2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

    How do errors propagate through calculations?

  • Slide 24

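    Propagation of Error

    A Simple Example: Suppose that y is related to two independent quantities X1 and X2 through

    $$y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$$

    To relate the changes in y to the uncertainties in X1 and X2, we need to find dy = g(dX1, dX2):

    $$dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2 = C_1\,dX_1 + C_2\,dX_2$$

    The magnitude of dy is the expected change in y due to the uncertainties in x1 and x2:

    $$\sigma_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\sigma_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\sigma_{x_2}\right)^2} = \sqrt{C_1^2\,\sigma_{x_1}^2 + C_2^2\,\sigma_{x_2}^2}$$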

  • Slide 25

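    Propagation of Error

    General Formula: Suppose that y is related to n independent measured variables {X1, X2, ..., Xn} by a functional representation:

    $$y = f(X_1, X_2, \ldots, X_n)$$

    Given the uncertainties of the X's around some operating points:

    $$\bar{x}_1 \pm \sigma_{x_1},\ \bar{x}_2 \pm \sigma_{x_2},\ \ldots,\ \bar{x}_n \pm \sigma_{x_n}$$

    The expected value of y, $\bar{y}$, and its uncertainty $\sigma_y$ are:

    $$\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

    $$\sigma_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\sigma_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\sigma_{x_2}\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\,\sigma_{x_n}\right)^2}$$

    with the partial derivatives evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.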
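    The general formula can be applied numerically when the partial derivatives are inconvenient to write by hand. A minimal sketch, assuming central finite differences for the partials and made-up cylinder dimensions; `propagate_uncertainty` and `cylinder_volume` are illustrative names:

```python
import math

def propagate_uncertainty(f, values, uncertainties, rel_step=1e-6):
    """Return (y, sigma_y) for y = f(*values) with independent input uncertainties.

    Partial derivatives are approximated by central differences about the
    operating point given by `values`.
    """
    y = f(*values)
    sum_sq = 0.0
    for i, (x, sigma) in enumerate(zip(values, uncertainties)):
        h = rel_step * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        partial = (f(*up) - f(*down)) / (2.0 * h)
        sum_sq += (partial * sigma) ** 2
    return y, math.sqrt(sum_sq)

def cylinder_volume(d, h):
    return math.pi * d ** 2 * h / 4.0

# Hypothetical measurements: D = 0.100 +/- 0.001 m, h = 0.200 +/- 0.002 m
volume, sigma_volume = propagate_uncertainty(cylinder_volume, [0.100, 0.200], [0.001, 0.002])
print(volume, sigma_volume, sigma_volume / volume)
```

    With 1% uncertainty in both D and h, the diameter dominates because the volume depends on its square.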

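  • Slide 26

    Propagation of Error

    Proof: Assume that the variability in measurement y is caused by k independent zero-mean error sources: e1, e2, . . . , ek. Then,

    $$(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$$

    $$E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$$

    The cross terms drop out because the error sources are independent and zero-mean, so $E[e_i e_j] = E[e_i]\,E[e_j] = 0$ for $i \ne j$. Therefore,

    $$\sigma_y^2 = E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2] = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2$$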

  • Slide 27

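    Propagation of Error

    Example (Standard Deviation of Sampled Mean): Given

    $$\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$

    Use the general formula for error propagation. Every measurement has the same standard deviation $\sigma_x$, and $\partial \bar{x}/\partial X_i = 1/N$ for each i, so

    $$\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\,\sigma_x\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\,\sigma_x\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\,\sigma_x\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\,\sigma_x\right)^2} = \sqrt{N\left(\frac{\sigma_x}{N}\right)^2} = \frac{\sigma_x}{\sqrt{N}}$$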

  • Slide 28

    Propagation of Error

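    Ex: What is the uncertainty in calculating the kinetic energy ( $KE = mv^2/2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

    $$\sigma_{KE} = \sqrt{\left(\frac{\partial KE}{\partial m}\,\sigma_m\right)^2 + \left(\frac{\partial KE}{\partial v}\,\sigma_v\right)^2} = \sqrt{\left(\frac{1}{2}v^2\,\sigma_m\right)^2 + \left(mv\,\sigma_v\right)^2}$$

    $$\sigma_{KE} = \sqrt{\left(\frac{1}{2}mv^2\,\frac{\sigma_m}{m}\right)^2 + \left(mv^2\,\frac{\sigma_v}{v}\right)^2} = \frac{1}{2}mv^2\sqrt{\left(\frac{\sigma_m}{m}\right)^2 + \left(2\,\frac{\sigma_v}{v}\right)^2}$$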
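    A quick numerical check of the kinetic energy result, using made-up values for the mass, the velocity, and their uncertainties. The function below evaluates the relative-uncertainty form, and the comparison value comes from the partial-derivative form; the two should agree.

```python
import math

def kinetic_energy_uncertainty(m, v, sigma_m, sigma_v):
    """Return (KE, sigma_KE) for KE = m*v^2/2 with independent uncertainties in m and v."""
    ke = 0.5 * m * v ** 2
    relative = math.sqrt((sigma_m / m) ** 2 + (2.0 * sigma_v / v) ** 2)
    return ke, ke * relative

# Hypothetical measurements: m = 2.0 +/- 0.02 kg, v = 3.0 +/- 0.06 m/s
ke, sigma_ke = kinetic_energy_uncertainty(2.0, 3.0, 0.02, 0.06)

# Same quantity from the partial-derivative form: dKE/dm = v^2/2, dKE/dv = m*v
check = math.sqrt((0.5 * 3.0 ** 2 * 0.02) ** 2 + (2.0 * 3.0 * 0.06) ** 2)
print(ke, sigma_ke, check)
```

    The factor of 2 on the velocity term means a given relative error in v contributes twice as much as the same relative error in m.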

  • Slide 29

    Least Squares Fitting of Data

    Best Linear Fit: How do we characterize BEST?

    Fit a linear model (relation)

    $$\hat{y}_i = a_o + a_1 x_i$$

    to N pairs of [xi, yi] measurements.

    Given xi, the error between the estimated output $\hat{y}_i$ and the measured output yi is:

    $$n_i = y_i - \hat{y}_i$$

    The BEST fit is the model that minimizes the sum of the ___________ of the error

    $$\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \qquad \text{(Least Square Error)}$$

    [Figure: Output Y versus Input X, showing the measured outputs yi and the best linear fit yest.]

  • Slide 30

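    Least Squares Fitting of Data

    Let

    $$J = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N} \left(y_i - a_o - a_1 x_i\right)^2$$

    Q: What are we trying to solve?

    Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$.

    The two independent variables are?

    $$\frac{\partial J}{\partial a_o} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} \left(y_i - a_o - a_1 x_i\right) = 0$$

    $$\frac{\partial J}{\partial a_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} x_i\left(y_i - a_o - a_1 x_i\right) = 0$$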

  • Slide 31

    Least Squares Fitting of Data

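    Rewrite the last two equations as two simultaneous equations for ao and a1:

    $$a_o N + a_1 \sum x_i = \sum y_i$$

    $$a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$

    or, in matrix form,

    $$\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

    Solving for ao and a1:

    $$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

    where

    $$\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$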

  • Slide 32

    Summary: Given N pairs of input/output measurements [xi, yi], the best linear Least Squares model from input xi to output yi is:

    $$\hat{y}_i = a_o + a_1 x_i$$

    where

    $$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta} \quad\text{and}\quad a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}, \qquad \Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$$

    The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications.

    The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).

    Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for ao & a2?
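    The summary formulas translate directly into code. A minimal sketch with made-up data; `linear_least_squares` is an illustrative name. The second call shows one way to approach the question about $y = a_o + a_2 x^2$: the model is still linear in its coefficients, so the same formulas apply with $x^2$ used as the input.

```python
def linear_least_squares(xs, ys):
    """Return (a_o, a_1) minimizing sum((y_i - a_o - a_1*x_i)^2)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2      # zero only if all x_i are identical
    a_o = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a_1 = (n * sum_xy - sum_x * sum_y) / delta
    return a_o, a_1

# Hypothetical measurements scattered around y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a_o, a_1 = linear_least_squares(xs, ys)

# Quadratic model y = a_o + a_2*x^2, fitted by using u = x^2 as the input
ys_quadratic = [2.0, 2.5, 4.0, 6.5, 10.0]          # exactly 2 + 0.5*x^2
b_o, b_2 = linear_least_squares([x * x for x in xs], ys_quadratic)
print(a_o, a_1, b_o, b_2)
```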

  • Slide 33

    Least Squares Fitting of Data

    Variance of the fit:

    $$\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N} \left(y_i - a_o - a_1 x_i\right)^2$$

    Variance of the measurements in y: $\sigma_y^2$

    Assume measurements in x are precise.

    Correlation coefficient:

    $$R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2}$$

    $R^2$ is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
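    A sketch of how the fit quality could be evaluated for a fitted line. The slide does not spell out how the variance of the measurements in y is computed, so the N - 1 sample variance about the mean of y is an assumption here; the data and coefficients are made up (the coefficients are the least squares values for this data).

```python
def fit_quality(xs, ys, a_o, a_1):
    """Return (variance of the fit, variance of y, R^2) for the line y = a_o + a_1*x."""
    n = len(xs)
    residual_ss = sum((y - a_o - a_1 * x) ** 2 for x, y in zip(xs, ys))
    var_fit = residual_ss / (n - 2)          # two coefficients were estimated
    y_mean = sum(ys) / n
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)   # assumed definition
    return var_fit, var_y, 1.0 - var_fit / var_y

# Hypothetical data with its least squares coefficients a_o = 1.06, a_1 = 1.97
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
var_fit, var_y, r2 = fit_quality(xs, ys, 1.06, 1.97)

# Data lying exactly on y = 1 + 2x gives a perfect fit
_, _, r2_perfect = fit_quality(xs, [1.0, 3.0, 5.0, 7.0, 9.0], 1.0, 2.0)
print(var_fit, var_y, r2, r2_perfect)
```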