7/28/2019 Ch4 Errors
1/52
Copyright 2004 David J. Lilja 1
Errors in Experimental Measurements
Sources of errors
Accuracy, precision, resolution
A mathematical model of errors
Confidence intervals
  For means
  For proportions
How many measurements are needed for desired error?
Why do we need statistics?
1. Noise, noise, noise, noise, noise!
OK, not really this type of noise.
Why do we need statistics?
2. Aggregate data into meaningful information.
[Figure: a mass of raw measured values (445, 446, 397, 226, 388, ...) reduced to a single summary value x.]
What is a statistic?
A quantity that is computed from a sample [of data].
Merriam-Webster
A single number used to summarize a larger
collection of values.
What are statistics?
A branch of mathematics dealing with the collection, analysis, interpretation, and
presentation of masses of numerical data.
Merriam-Webster
We are most interested in analysis and
interpretation here.
Lies, damn lies, and statistics!
Goals
Provide intuitive conceptual background for
some standard statistical tools.
Draw meaningful conclusions in presence of
noisy measurements.
Allow you to correctly and intelligently apply
techniques in new situations.
Don't simply plug and crank from a formula.
Goals
Present techniques for aggregating large
quantities of data.
Obtain a big-picture view of your results.
Obtain new insights from complex
measurement and simulation results.
E.g. How does a new feature impact the
overall system?
Sources of Experimental Errors
Accuracy, precision, resolution
Experimental errors
Errors → noise in measured values
Systematic errors
  Result of an experimental mistake
  Typically produce constant or slowly varying bias
  Controlled through skill of experimenter
  Examples
    Temperature change causes clock drift
    Forget to clear cache before timing run
Experimental errors
Random errors
  Unpredictable, non-deterministic
  Unbiased → equal probability of increasing or decreasing measured value
  Result of
    Limitations of measuring tool
    Observer reading output of tool
    Random processes within system
  Typically cannot be controlled
  Use statistical tools to characterize and quantify
Example: Quantization → Random error
Quantization error
Timer resolution → quantization error
Repeated measurements
X
Completely unpredictable
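A minimal sketch of how timer resolution turns into random error, assuming a clock that advances in whole ticks and an event that starts at a random point within a tick. The function name `measure_with_timer`, the 2.3 s duration, and the 1.0 s resolution are illustrative assumptions, not values from the slides.

```python
import math
import random

def measure_with_timer(true_duration, resolution, rng):
    """Time one event with a clock that only advances in steps of `resolution`.

    The event starts at a random phase within a tick, so the number of tick
    boundaries it crosses differs from run to run.
    """
    start = rng.uniform(0.0, resolution)      # random offset into the current tick
    end = start + true_duration
    ticks = math.floor(end / resolution) - math.floor(start / resolution)
    return ticks * resolution

rng = random.Random(1)                        # fixed seed so the run is repeatable
true_duration = 2.3                           # hypothetical event length, seconds
resolution = 1.0                              # hypothetical timer resolution, seconds
samples = [measure_with_timer(true_duration, resolution, rng) for _ in range(10_000)]

print(sorted(set(samples)))                   # only multiples of the resolution appear
print(sum(samples) / len(samples))            # the average lands near the true duration
```

Each individual reading is off by up to one tick in either direction, yet the mean of many readings approaches the true value, which is why this kind of error is treated statistically.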
A Model of Errors
Error   Measured value   Probability
-E      x - E            1/2
+E      x + E            1/2
A Model of Errors
Error 1   Error 2   Measured value   Probability
-E        -E        x - 2E           1/4
-E        +E        x                1/4
+E        -E        x                1/4
+E        +E        x + 2E           1/4
A Model of Errors
[Figure: bar chart of probability (axis from 0 to 0.6) versus measured value (x-E, x, x+E).]
Probability of Obtaining a
Specific Measured Value
A Model of Errors
Pr(X = x_i) = Pr(measure x_i)
  ∝ number of paths from real value to x_i
Pr(X = x_i) ~ binomial distribution
As number of error sources becomes large
  n → ∞,
  Binomial → Gaussian (Normal)
Thus, the bell curve
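The path-counting argument can be sketched in a few lines, assuming every error source independently shifts the reading by +E or -E with equal probability. The function name `measured_value_distribution` is an illustrative choice; it returns the exact binomial probabilities rather than simulating them.

```python
from math import comb

def measured_value_distribution(n_sources):
    """Exact distribution of the net error from n_sources errors of +E or -E.

    Returns {k: probability}, where the measured value is x + k*E.
    """
    total_paths = 2 ** n_sources              # every +/- combination is equally likely
    dist = {}
    for ups in range(n_sources + 1):
        k = ups - (n_sources - ups)           # `ups` sources add +E, the rest add -E
        dist[k] = comb(n_sources, ups) / total_paths
    return dist

print(measured_value_distribution(2))         # {-2: 0.25, 0: 0.5, 2: 0.25}

# With more error sources the probabilities trace out a bell shape.
for k, p in measured_value_distribution(12).items():
    print(f"{k:+3d}E {'#' * round(p * 100)}")
```

With two sources this reproduces the 1/4, 1/2, 1/4 split (two paths lead to x), and the text histogram for twelve sources already looks like the familiar bell curve.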
Frequency of Measuring Specific Values
[Figure: distribution of measured values, annotated with the mean of measured values, the true value, resolution, precision, and accuracy.]
Accuracy, Precision, Resolution
Systematic errors → accuracy
  How close mean of measured values is to true value
Random errors → precision
  Repeatability of measurements
Characteristics of tools → resolution
  Smallest increment between measured values
Quantifying Accuracy,
Precision, Resolution
Accuracy
Hard to determine true accuracy
Relative to a predefined standard
E.g. definition of a second
Resolution
Dependent on tools
Precision
  Quantify amount of imprecision using statistical tools
Confidence Interval for the
Mean
[Figure: distribution curve with interval endpoints c1 and c2; area 1 − α between them and α/2 in each tail.]
Normalize x
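$$z = \frac{\bar{x} - x}{s / \sqrt{n}}$$
$$n = \text{number of measurements}$$
$$\bar{x} = \text{mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
$$s = \text{standard deviation} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$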
Confidence Interval for the
Mean
Normalized z follows a Student's t distribution
  (n − 1) degrees of freedom
  Area left of c2 = 1 − α/2
  Tabulated values for t
Confidence Interval for the
Mean
As n → ∞, normalized distribution becomes Gaussian (normal)
Confidence Interval for the
Mean
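$$c_1 = \bar{x} - t_{1-\alpha/2;\,n-1} \frac{s}{\sqrt{n}}$$
$$c_2 = \bar{x} + t_{1-\alpha/2;\,n-1} \frac{s}{\sqrt{n}}$$
Then,
$$\Pr(c_1 \le x \le c_2) = 1 - \alpha$$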
An Example
Experiment   Measured value
1            8.0 s
2            7.0 s
3            5.0 s
4            9.0 s
5            9.5 s
6            11.3 s
7            5.2 s
8            8.5 s
An Example (cont.)
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = 7.94$$
$$s = \text{sample standard deviation} = 2.14$$
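The mean and sample standard deviation of the eight measurements can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

# The eight measured values from the example, in seconds.
measurements = [8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5]

n = len(measurements)
mean = statistics.mean(measurements)
s = statistics.stdev(measurements)    # sample standard deviation (n - 1 in the denominator)

print(n, round(mean, 2), round(s, 2))  # 8 7.94 2.14
```

Note that `statistics.stdev` divides by n − 1, matching the definition of s used here, whereas `statistics.pstdev` divides by n.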
An Example (cont.)
90% CI → 90% chance actual value in interval
90% CI → α = 0.10
  1 − α/2 = 0.95
n = 8 → 7 degrees of freedom
90% Confidence Interval
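Values of t for cumulative area a:

Degrees of freedom   a = 0.90   a = 0.95   a = 0.975
5                    1.476      2.015      2.571
6                    1.440      1.943      2.447
7                    1.415      1.895      2.365
∞                    1.282      1.645      1.960

$$a = 1 - \alpha/2 = 1 - 0.10/2 = 0.95$$
$$t_{a;\,n-1} = t_{0.95;\,7} = 1.895$$
$$c_1 = 7.94 - \frac{1.895\,(2.14)}{\sqrt{8}} = 6.5$$
$$c_2 = 7.94 + \frac{1.895\,(2.14)}{\sqrt{8}} = 9.4$$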
95% Confidence Interval
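Values of t for cumulative area a:

Degrees of freedom   a = 0.90   a = 0.95   a = 0.975
5                    1.476      2.015      2.571
6                    1.440      1.943      2.447
7                    1.415      1.895      2.365
∞                    1.282      1.645      1.960

$$a = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
$$t_{a;\,n-1} = t_{0.975;\,7} = 2.365$$
$$c_1 = 7.94 - \frac{2.365\,(2.14)}{\sqrt{8}} = 6.1$$
$$c_2 = 7.94 + \frac{2.365\,(2.14)}{\sqrt{8}} = 9.7$$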
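Both intervals can be reproduced with a short sketch. The t values 1.895 and 2.365 are taken from the table for 7 degrees of freedom rather than computed; the function name `confidence_interval` is an illustrative assumption. The unrounded mean and standard deviation are used, which is what yields the 6.1 lower bound at 95%.

```python
import math
import statistics

def confidence_interval(values, t_value):
    """Return (c1, c2) = mean -/+ t * s / sqrt(n) for the given tabulated t value."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)              # sample standard deviation
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width

measurements = [8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5]

ci90 = confidence_interval(measurements, t_value=1.895)   # t[0.95; 7] from the table
ci95 = confidence_interval(measurements, t_value=2.365)   # t[0.975; 7] from the table

print([round(c, 1) for c in ci90])   # [6.5, 9.4]
print([round(c, 1) for c in ci95])   # [6.1, 9.7]
```

The only thing that changes between the two calls is the t value, so the higher confidence level directly produces the wider interval.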
What does it mean?
90% CI = [6.5, 9.4]
90% chance real value is between 6.5, 9.4
95% CI = [6.1, 9.7]
95% chance real value is between 6.1, 9.7
Why is interval wider when we are more
confident?
Higher Confidence → Wider Interval?
[Figure: two distribution curves; the central 90% area spans 6.5 to 9.4, and the central 95% area spans 6.1 to 9.7.]
Key Assumption
Measurement errors are
Normally distributed.
Is this true for most measurements on real computer systems?
Key Assumption
Saved by the Central Limit Theorem
Sum of a large number of values from any distribution will be Normally (Gaussian) distributed.
What is a large number?
  Typically assumed to be > 6 or 7.
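A rough way to see the Central Limit Theorem at work is to sum values drawn from a clearly non-Normal distribution and check one Normal-like property of the result. The sketch below assumes exponentially distributed values and uses the fraction of results within one standard deviation of the mean as the check (about 0.683 for a Normal distribution). The function names, seed, and trial count are illustrative choices.

```python
import random
import statistics

def fraction_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) <= sd)
    return inside / len(values)

def sums_of_exponentials(k, trials, rng):
    """Each result is the sum of k independent exponential(1) values."""
    return [sum(rng.expovariate(1.0) for _ in range(k)) for _ in range(trials)]

rng = random.Random(42)                      # fixed seed so the run is repeatable
for k in (1, 7, 30):
    sums = sums_of_exponentials(k, 20_000, rng)
    # A Normal distribution gives about 0.683; a single exponential gives about 0.86.
    print(k, round(fraction_within_one_sd(sums), 3))
```

This checks only one property of the distribution, so it is a sanity check rather than a test of normality.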
How many measurements?
Width of interval inversely proportional to √n
Want to minimize number of measurements
Find confidence interval for mean, such that:
  Pr(actual mean in interval) = (1 − α)
$$(c_1, c_2) = \big((1 - e)\,\bar{x},\ (1 + e)\,\bar{x}\big)$$
How many measurements?
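$$(c_1, c_2) = (1 \mp e)\,\bar{x} = \bar{x} \mp z_{1-\alpha/2} \frac{s}{\sqrt{n}}$$
$$z_{1-\alpha/2} \frac{s}{\sqrt{n}} = e\,\bar{x}$$
$$n = \left( \frac{z_{1-\alpha/2}\, s}{e\,\bar{x}} \right)^2$$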
How many measurements?
But n depends on knowing mean and
standard deviation!
Estimate s with small number of
measurements
Use this s to find n needed for desired
interval width
How many measurements?
Mean = 7.94 s
Standard deviation = 2.14 s
Want 90% confidence mean is within 7% of actual mean.
  α = 0.10
  (1 − α/2) = 0.95
  Error = ±3.5%
  e = 0.035
How many measurements?
$$n = \left( \frac{z_{1-\alpha/2}\, s}{e\,\bar{x}} \right)^2 = \left( \frac{1.895\,(2.14)}{0.035\,(7.94)} \right)^2 = 212.9$$
213 measurements
→ 90% chance true mean is within ±3.5% interval
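The arithmetic can be reproduced with a small helper. The function name `measurements_needed` is an illustrative assumption; the inputs are the values used in the example, including 1.895 as the multiplier (the t value for 7 degrees of freedom from the initial 8 measurements).

```python
import math

def measurements_needed(mean, s, e, z):
    """n = (z * s / (e * mean))**2, for a confidence interval of mean * (1 -/+ e)."""
    return (z * s / (e * mean)) ** 2

n = measurements_needed(mean=7.94, s=2.14, e=0.035, z=1.895)

# Round up: a fractional measurement is not possible.
print(round(n, 1), math.ceil(n))   # 212.9 213
```

Because n grows with 1/e², halving the allowed error roughly quadruples the number of measurements required.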
Proportions
p = Pr(success) in n trials of binomial
experiment
Estimate proportion: p = m/n
m = number of successes
n = total number of trials
Proportions
$$c_1 = p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
$$c_2 = p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
Proportions
How much time does processor spend in OS?
Interrupt every 10 ms
Increment counters
  n = number of interrupts
  m = number of interrupts when PC within OS
Run for 1 minute
  n = 6000
  m = 658
Proportions
$$(c_1, c_2) = p \mp z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} = 0.1097 \mp 1.96 \sqrt{\frac{0.1097\,(1 - 0.1097)}{6000}} = (0.1018,\ 0.1176)$$
95% confidence interval for proportion
So 95% certain processor spends 10.2-11.8% of its time in OS
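The interval for the OS-time example can be reproduced as follows. The function name `proportion_confidence_interval` is an illustrative assumption; z = 1.96 is the Normal value for a 95% confidence level.

```python
import math

def proportion_confidence_interval(m, n, z):
    """Return (c1, c2) = p -/+ z * sqrt(p * (1 - p) / n), with p = m / n."""
    p = m / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# 6000 interrupts in one minute, 658 of them with the PC inside the OS.
c1, c2 = proportion_confidence_interval(m=658, n=6000, z=1.96)

print(round(c1, 4), round(c2, 4))   # 0.1018 0.1176
```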
Number of measurements for
proportions
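$$(1 - e)\,p = p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
$$e\,p = z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
$$n = \frac{z_{1-\alpha/2}^2\, p(1-p)}{(e\,p)^2}$$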
Number of measurements for
proportions
How long to run OS experiment?
Want 95% confidence
±0.5%
  e = 0.005
  p = 0.1097
Number of measurements for
proportions
$$n = \frac{z_{1-\alpha/2}^2\, p(1-p)}{(e\,p)^2} = \frac{(1.960)^2\,(0.1097)(1 - 0.1097)}{[0.005\,(0.1097)]^2} = 1{,}247{,}102$$
10 ms interrupts → 3.46 hours
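A short check of the sample count and the resulting run time, using the numbers from the example. The function name `trials_needed` is an illustrative assumption.

```python
import math

def trials_needed(p, e, z):
    """n = z**2 * p * (1 - p) / (e * p)**2, for an interval of p * (1 -/+ e)."""
    return z ** 2 * p * (1 - p) / (e * p) ** 2

n = math.ceil(trials_needed(p=0.1097, e=0.005, z=1.960))
seconds = n * 0.010                 # one sample per 10 ms interrupt
hours = seconds / 3600

print(n, round(hours, 2))           # 1247102 3.46
```

The required sample count is driven by the relative error e: tightening from the roughly ±7% achieved in one minute to ±0.5% raises the run time from one minute to several hours.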
Important Points
Use statistics to
Deal with noisy measurements
Aggregate large amounts of data
Errors in measurements are due to:
Accuracy, precision, resolution of tools
Other sources of noise
Systematic, random errors
Important Points: Model errors with bell curve
[Figure: bell curve of measured values, annotated with the true value, the mean of measured values, precision, resolution, and accuracy.]
Important Points
Use confidence intervals to quantify precision
Confidence intervals for
Mean of n samples
Proportions
Confidence level
Pr(actual mean within computed interval)
Compute number of measurements needed
for desired interval width