Ch4 Errors

Embed Size (px)

Citation preview

  • 7/28/2019 Ch4 Errors

    1/52

    Copyright 2004 David J. Lilja 1

    Errors in Experimental

    Measurements

    Sources of errors

    Accuracy, precision, resolution

    A mathematical model of errors Confidence intervals

    For means

    For proportions How many measurements are needed for

    desired error?

  • 7/28/2019 Ch4 Errors

    2/52

    Copyright 2004 David J. Lilja 2

    Why do we need statistics?

    1. Noise, noise, noise, noise, noise!

    OK not really this type of noise

  • 7/28/2019 Ch4 Errors

    3/52

    Copyright 2004 David J. Lilja 3

    Why do we need statistics?

    2. Aggregate data into

    meaningful

    information.

    445 446 397 226388 3445 188 1002

    47762 432 54 12

    98 345 2245 8839

    77492 472 565 999

    1 34 882 545 4022827 572 597 364

    ...x

  • 7/28/2019 Ch4 Errors

    4/52

    Copyright 2004 David J. Lilja 4

    What is a stat ist ic?

    A quantity that is computed from a sample[of data].

    Merriam-Webster

    A single number used to summarize a larger

    collection of values.

  • 7/28/2019 Ch4 Errors

    5/52

    Copyright 2004 David J. Lilja 5

    What are stat ist ics?

    A branch of mathematics dealing with thecollection, analysis, interpretation, and

    presentation of masses of numerical data.

    Merriam-Webster

    We are most interested in analysis and

    interpretation here.

    Lies, damn lies, and statistics!

  • 7/28/2019 Ch4 Errors

    6/52

    Copyright 2004 David J. Lilja 6

    Goals

    Provide intuitive conceptual background for

    some standard statistical tools.

    Draw meaningful conclusions in presence of

    noisy measurements.

    Allow you to correctly and intelligently apply

    techniques in new situations.

    Dont simply plug and crank from a formula.

  • 7/28/2019 Ch4 Errors

    7/52

    Copyright 2004 David J. Lilja 7

    Goals

    Present techniques for aggregating large

    quantities of data.

    Obtain a big-picture view of your results.

    Obtain new insights from complex

    measurement and simulation results.

    E.g. How does a new feature impact the

    overall system?

  • 7/28/2019 Ch4 Errors

    8/52

    Copyright 2004 David J. Lilja 8

    Sources of Experimental Errors

    Accuracy, precision, resolution

  • 7/28/2019 Ch4 Errors

    9/52

    Copyright 2004 David J. Lilja 9

    Experimental errors

    Errors noise in measured values Systematicerrors

    Result of an experimental mistake Typically produce constant or slowly varying bias

    Controlled through skill of experimenter

    Examples

    Temperature change causes clock drift Forget to clear cache before timing run

  • 7/28/2019 Ch4 Errors

    10/52

    Copyright 2004 David J. Lilja 10

    Experimental errors

    Random errors Unpredictable, non-deterministic

    Unbiased equal probability of increasing or decreasingmeasured value

    Result of Limitations of measuring tool

    Observer reading output of tool

    Random processes within system

    Typically cannot be controlled Use statistical tools to characterize and quantify

  • 7/28/2019 Ch4 Errors

    11/52

    Copyright 2004 David J. Lilja 11

    Example: Quantization

    Random error

  • 7/28/2019 Ch4 Errors

    12/52

    Copyright 2004 David J. Lilja 12

    Quantization error

    Timer resolution

    quantization error

    Repeated measurements

    X

    Completely unpredictable

  • 7/28/2019 Ch4 Errors

    13/52

    Copyright 2004 David J. Lilja 13

    A Model of Errors

    Error Measured

    value

    Probability

    -E x E

    +E x+ E

  • 7/28/2019 Ch4 Errors

    14/52

    Copyright 2004 David J. Lilja 14

    A Model of Errors

    Error 1 Error 2 Measured

    value

    Probability

    -E -E x 2E

    -E+E x

    +E -E x

    +E +E x+ 2E

  • 7/28/2019 Ch4 Errors

    15/52

    Copyright 2004 David J. Lilja 15

    A Model of Errors

    Probability

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    x-E x x+E

    Measured value

  • 7/28/2019 Ch4 Errors

    16/52

    Copyright 2004 David J. Lilja 16

    Probability of Obtaining a

    Specific Measured Value

  • 7/28/2019 Ch4 Errors

    17/52

    Copyright 2004 David J. Lilja 17

    A Model of Errors

    Pr(X=xi) = Pr(measurexi)

    = number of paths from real value toxi

    Pr(X=xi) ~ binomial distribution As number of error sources becomes large

    n ,

    Binomial Gaussian (Normal)

    Thus, the bell curve

  • 7/28/2019 Ch4 Errors

    18/52

    Copyright 2004 David J. Lilja 18

    Frequency of Measuring Specific

    Values

    Mean of measured values

    True value

    Resolution

    Precision

    Accuracy

  • 7/28/2019 Ch4 Errors

    19/52

    Copyright 2004 David J. Lilja 19

    Accuracy, Precision,

    Resolution

    Systematic errors accuracy How close mean of measured values is to true

    value Random errors precision Repeatability of measurements

    Characteristics of tools resolution Smallest increment between measured values

  • 7/28/2019 Ch4 Errors

    20/52

    Copyright 2004 David J. Lilja 20

    Quantifying Accuracy,

    Precision, Resolution

    Accuracy

    Hard to determine true accuracy

    Relative to a predefined standard

    E.g. definition of a second

    Resolution

    Dependent on tools

    Precision Quantify amount ofimprecision using statistical

    tools

  • 7/28/2019 Ch4 Errors

    21/52

    Copyright 2004 David J. Lilja 21

    Confidence Interval for the

    Mean

    c1 c2

    1-

    /2 /2

  • 7/28/2019 Ch4 Errors

    22/52

    Copyright 2004 David J. Lilja 22

    Normalize x

    1)(deviationstandard

    mean

    tsmeasuremenofnumber

    /

    n

    1i

    2

    1

    nxxs

    xx

    n

    ns

    xxz

    i

    n

    i

    i

  • 7/28/2019 Ch4 Errors

    23/52

    Copyright 2004 David J. Lilja 23

    Confidence Interval for the

    Mean

    Normalized zfollows a Students tdistribution (n-1) degrees of freedom

    Area left of c2 = 1/2

    Tabulated values fort

    c1 c2

    1-

    /2 /2

  • 7/28/2019 Ch4 Errors

    24/52

    Copyright 2004 David J. Lilja 24

    Confidence Interval for the

    Mean

    As n , normalized distribution becomesGaussian (normal)

    c1 c2

    1-

    /2 /2

  • 7/28/2019 Ch4 Errors

    25/52

    Copyright 2004 David J. Lilja 25

    Confidence Interval for the

    Mean

    1)Pr(

    Then,

    21

    1;2/12

    1;2/11

    cxc

    n

    stxc

    n

    stxc

    n

    n

  • 7/28/2019 Ch4 Errors

    26/52

    Copyright 2004 David J. Lilja 26

    An Example

    Experiment Measured value

    1 8.0 s

    2 7.0 s

    3 5.0 s

    4 9.0 s

    5 9.5 s

    6 11.3 s7 5.2 s

    8 8.5 s

  • 7/28/2019 Ch4 Errors

    27/52

    Copyright 2004 David J. Lilja 27

    An Example (cont.)

    14.2deviationstandardsample

    94.71

    sn

    xx

    n

    i i

  • 7/28/2019 Ch4 Errors

    28/52

    Copyright 2004 David J. Lilja 28

    An Example (cont.)

    90% CI 90% chance actual value in interval 90% CI = 0.10

    1 - /2 = 0.95

    n= 8 7 degrees of freedom

    c1 c2

    1-

    /2 /2

  • 7/28/2019 Ch4 Errors

    29/52

    Copyright 2004 David J. Lilja 29

    90% Confidence Interval

    a

    n 0.90 0.95 0.975

    5 1.476 2.015 2.571

    6 1.440 1.943 2.447

    7 1.415 1.895 2.365

    1.282 1.645 1.960

    4.98

    )14.2(895.194.7

    5.68

    )14.2(895.194.7

    895.1

    95.02/10.012/1

    2

    1

    7;95.01;

    c

    c

    tt

    a

    na

  • 7/28/2019 Ch4 Errors

    30/52

    Copyright 2004 David J. Lilja 30

    95% Confidence Interval

    a

    n 0.90 0.95 0.975

    5 1.476 2.015 2.571

    6 1.440 1.943 2.447

    7 1.415 1.895 2.365

    1.282 1.645 1.960

    7.98

    )14.2(365.294.7

    1.68

    )14.2(365.294.7

    365.2

    975.02/10.012/1

    2

    1

    7;975.01;

    c

    c

    tt

    a

    na

  • 7/28/2019 Ch4 Errors

    31/52

    Copyright 2004 David J. Lilja 31

    What does it mean?

    90% CI = [6.5, 9.4]

    90% chance real value is between 6.5, 9.4

    95% CI = [6.1, 9.7]

    95% chance real value is between 6.1, 9.7

    Why is interval wider when we are more

    confident?

  • 7/28/2019 Ch4 Errors

    32/52

    Copyright 2004 David J. Lilja 32

    Higher Confidence Wider

    Interval?

    6.5 9.4

    90%

    6.1 9.7

    95%

  • 7/28/2019 Ch4 Errors

    33/52

    Copyright 2004 David J. Lilja 33

    Key Assumption

    Measurement errors are

    Normally distributed.

    Is this true for most

    measurements on realcomputer systems?

    c1 c2

    1-

    /2 /2

  • 7/28/2019 Ch4 Errors

    34/52

    Copyright 2004 David J. Lilja 34

    Key Assumption

    Saved by the Central Limit Theorem

    Sum of a large number of values from anydistribution will be Normally (Gaussian)

    distributed. What is a large number? Typically assumed to be > 6 or 7.

  • 7/28/2019 Ch4 Errors

    35/52

    Copyright 2004 David J. Lilja 35

    How many measurements?

    Width of interval inversely proportional to n

    Want to minimize number of measurements

    Find confidence interval for mean, such that:

    Pr(actual mean in interval) = (1)

    xexecc )1(,)1(),( 21

  • 7/28/2019 Ch4 Errors

    36/52

    Copyright 2004 David J. Lilja 36

    How many measurements?

    2

    2/1

    2/1

    2/1

    21 )1(),(

    ex

    szn

    exn

    sz

    n

    szx

    xecc

  • 7/28/2019 Ch4 Errors

    37/52

    Copyright 2004 David J. Lilja 37

    How many measurements?

    But n depends on knowing mean and

    standard deviation!

    Estimate s with small number of

    measurements

    Use this s to find n needed for desired

    interval width

  • 7/28/2019 Ch4 Errors

    38/52

    Copyright 2004 David J. Lilja 38

    How many measurements?

    Mean = 7.94 s

    Standard deviation = 2.14 s

    Want 90% confidence mean is within 7% ofactual mean.

  • 7/28/2019 Ch4 Errors

    39/52

    Copyright 2004 David J. Lilja 39

    How many measurements?

    Mean = 7.94 s

    Standard deviation = 2.14 s

    Want 90% confidence mean is within 7% ofactual mean.

    = 0.90

    (1-/2) = 0.95

    Error = 3.5%

    e = 0.035

  • 7/28/2019 Ch4 Errors

    40/52

    Copyright 2004 David J. Lilja 40

    How many measurements?

    9.212

    )94.7(035.0

    )14.2(895.12

    2/1

    ex

    szn

    213 measurements

    90% chance true mean is within 3.5% interval

  • 7/28/2019 Ch4 Errors

    41/52

    Copyright 2004 David J. Lilja 41

    Proportions

    p = Pr(success) in n trials of binomial

    experiment

    Estimate proportion: p = m/n

    m = number of successes

    n = total number of trials

  • 7/28/2019 Ch4 Errors

    42/52

    Copyright 2004 David J. Lilja 42

    Proportions

    n

    pp

    zpc

    n

    pp

    zpc

    )1(

    )1(

    2/12

    2/11

  • 7/28/2019 Ch4 Errors

    43/52

    Copyright 2004 David J. Lilja 43

    Proportions

    How much time does processor spend in

    OS?

    Interrupt every 10 ms

    Increment counters

    n = number of interrupts

    m = number of interrupts when PC within OS

  • 7/28/2019 Ch4 Errors

    44/52

    Copyright 2004 David J. Lilja 44

    Proportions

    How much time does processor spend inOS?

    Interrupt every 10 ms

    Increment counters n = number of interrupts

    m = number of interrupts when PC within OS

    Run for 1 minute n = 6000

    m = 658

  • 7/28/2019 Ch4 Errors

    45/52

    Copyright 2004 David J. Lilja 45

    Proportions

    )1176.0,1018.0(6000

    )1097.01(1097.096.11097.0

    )1(),( 2/121

    n

    ppzpcc

    95% confidence interval for proportion

    So 95% certain processor spends 10.2-11.8% of itstime in OS

    N b f t f

  • 7/28/2019 Ch4 Errors

    46/52

    Copyright 2004 David J. Lilja 46

    Number of measurements for

    proportions

    2

    2

    2/1

    2/1

    2/1

    )(

    )1(

    )1(

    )1()1(

    pe

    ppzn

    n

    ppzpe

    n

    ppzppe

    N b f t f

  • 7/28/2019 Ch4 Errors

    47/52

    Copyright 2004 David J. Lilja 47

    Number of measurements for

    proportions

    How long to run OS experiment?

    Want 95% confidence

    0.5%

    N b f t f

  • 7/28/2019 Ch4 Errors

    48/52

    Copyright 2004 David J. Lilja 48

    Number of measurements for

    proportions

    How long to run OS experiment?

    Want 95% confidence

    0.5% e = 0.005

    p = 0.1097

    N b f t f

  • 7/28/2019 Ch4 Errors

    49/52

    Copyright 2004 David J. Lilja 49

    Number of measurements for

    proportions

    102,247,1

    )1097.0(005.0)1097.01)(1097.0()960.1(

    )(

    )1(

    2

    2

    2

    2

    2/1

    pe

    ppzn

    10 ms interrupts

    3.46 hours

  • 7/28/2019 Ch4 Errors

    50/52

    Copyright 2004 David J. Lilja 50

    Important Points

    Use statistics to

    Deal with noisy measurements

    Aggregate large amounts of data

    Errors in measurements are due to:

    Accuracy, precision, resolution of tools

    Other sources of noise

    Systematic, random errors

  • 7/28/2019 Ch4 Errors

    51/52

    Copyright 2004 David J. Lilja 51

    Important Points: Model errors

    with bell curve

    True value

    Precision

    Mean of measured values

    Resolution

    Accuracy

  • 7/28/2019 Ch4 Errors

    52/52

    Copyright 2004 David J. Lilja 52

    Important Points

    Use confidence intervals to quantify precision

    Confidence intervals for

    Mean ofn samples

    Proportions

    Confidence level

    Pr(actual mean within computed interval)

    Compute number of measurements needed

    for desired interval width