EMIS 7300 SYSTEMS ANALYSIS METHODS
FALL 2005
Dr. John Lipp
Copyright © 2003-2005 Dr. John Lipp
EMIS7300 Fall 2005 Copyright 2003-2005 Dr. John Lipp
S3P1-2
Course Outline
• Part 1: Rank (Order) and Non-Parametric Statistics.
• Part 2: Statistical Process Control.
• Part 3: Reliability.
• Mid-term Exam.
Today’s Topics
• Empirical Cumulative Distribution Function.
• Rank Transform.
• Sign Test.
• Tukey’s Two-Sample Quick Test.
• Circular Error Probability Confidence Interval.
Non-Parametric and Rank Statistics
• Non-parametric statistical procedures are designed without the use of the underlying data distribution and its parameters.
– The only assumption is that the data samples are statistically independent and come from the same distribution.
– Also known as distribution-free.
– Hypothesis tests and confidence intervals are on the median, quartiles, percentiles, or other quantiles.
Empirical Cumulative Distribution Function
• The sample CDF or empirical CDF is defined by
$$\hat{F}_X(x) = \frac{\#\{x_i \le x\}}{n}$$
which is equivalent to sorting the data, yi = sort(xi), and plotting yi vs. i/n as a stair-step function.

i   xi    yi    i/n
1   1.4   0     0.14
2   0     1.4   0.29
3   9.6   3.4   0.43
4   5.9   5.9   0.57
5   3.4   6.4   0.71
6   7.6   7.6   0.86
7   6.4   9.6   1

[Figure: stair-step plot of the empirical CDF, i/n versus yi.]
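The sorting recipe above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `ecdf` and `ecdf_at` are made up for this example, and the data are the seven values from the table.

```python
def ecdf(data):
    """Return the sorted values y_i and the step heights i/n of the empirical CDF."""
    y = sorted(data)
    n = len(y)
    heights = [(i + 1) / n for i in range(n)]
    return y, heights


def ecdf_at(data, x):
    """Evaluate F_hat(x) = #{x_i <= x} / n directly from the definition."""
    return sum(1 for xi in data if xi <= x) / len(data)


x = [1.4, 0, 9.6, 5.9, 3.4, 7.6, 6.4]  # the seven samples from the table
y, heights = ecdf(x)
for yi, h in zip(y, heights):
    print(f"{yi:4.1f}  {h:.2f}")
```

Both routes agree: the step height at yi is the same number that `ecdf_at(x, yi)` returns when there are no ties.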
Empirical Cumulative Distribution Function (cont.)
• The sample CDF is an unbiased estimator
$$E\left[\hat{F}_X(x)\right] = \frac{E\left[\#\{x_i \le x\}\right]}{n} = \frac{nF(x)}{n} = F(x)$$
• The variance is given by
$$\mathrm{Var}\left[\hat{F}_X(x)\right] = \frac{F(x)\left[1 - F(x)\right]}{n}$$
• The calculations of the expected values are actually easy
– The count #{xi ≤ x} has a binomial distribution with p = F(x)!
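The binomial argument can be checked numerically. The sketch below computes the exact mean and variance of K/n for K ~ Binomial(n, p) and compares them with p and p(1 – p)/n. The values n = 7 and p = 0.3 are arbitrary illustration values, with p standing in for F(x).

```python
from math import comb


def ecdf_moments(n, p):
    """Exact mean and variance of K/n, where K ~ Binomial(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pk for k, pk in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var


n, p = 7, 0.3  # illustrative values only; p plays the role of F(x)
mean, var = ecdf_moments(n, p)
print(mean, var, p * (1 - p) / n)
```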
Rank Transform
• The rank transform is simply replacing the data with the data’s ranks from sorting in ascending order.
• Using the ranks often simplifies calculations:
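– Rank Sample Mean (unaffected by ties)
$$\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i = \frac{1}{n}\sum_{i=1}^{n} i = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}$$
– Rank Sample Standard Deviation (affected by ties)
$$S_R^2 = \underbrace{\frac{1}{n-1}\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^2}_{\text{must use if ties}} = \frac{1}{n-1}\left[\sum_{i=1}^{n} R_i^2 - n\bar{R}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} i^2 - n\bar{R}^2\right] = \frac{1}{n-1}\left[\frac{n(n+1)(2n+1)}{6} - n\bar{R}^2\right] = \frac{n(n+1)}{12}$$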
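A minimal sketch of the rank transform follows. The slides do not say how tied values are ranked; this sketch assumes tied values share the average of their ranks, a common convention that leaves the rank mean at (n + 1)/2. The function names are hypothetical, and the data are the seven tie-free values from the empirical CDF example.

```python
def rank_transform(data):
    """Ranks from an ascending sort; tied values share the average of their ranks
    (an assumed tie convention, not stated in the slides)."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and data[order[end + 1]] == data[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    return ranks


def sample_mean_var(values):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


x = [1.4, 0, 9.6, 5.9, 3.4, 7.6, 6.4]  # no ties
n = len(x)
ranks = rank_transform(x)
mean, var = sample_mean_var(ranks)
print(ranks)
print(mean, (n + 1) / 2)       # rank mean vs. closed form
print(var, n * (n + 1) / 12)   # rank variance vs. closed form (valid without ties)
```

With ties, the mean still equals (n + 1)/2, but the variance has to be computed from the ranks directly, as the first form of the variance formula indicates.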
Sign Test
• Consider a hypothesis test on the median of a data set {xi}
$$H_0: \tilde{\mu} = C$$
$$H_1: \tilde{\mu} \ne C \text{ (two sided), or } \tilde{\mu} > C \text{ or } \tilde{\mu} < C \text{ (one sided)}$$
• The test is performed by subtracting C from the data {xi} and taking the sign, {si} = {sign(xi – C)}.
• The number of “+” and “–” values of {si} are counted, denoted r+ and r–, respectively.

Example (C = 10):

xi      13.5   9.8   11.4   12.2   7.9    8.6    9.1    10.6   11.3   10.1
xi-10   3.5    -0.2  1.4    2.2    -2.1   -1.4   -0.9   0.6    1.3    0.1
si      +      –     +      +      –      –      –      +      +      +
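Counting the signs for the example data takes only a few lines. In this sketch the function name `sign_counts` is hypothetical, and observations exactly equal to C are simply not counted, which is an assumption because the slides do not cover that case.

```python
def sign_counts(data, c):
    """Count observations above (r_plus) and below (r_minus) the hypothesized median c.
    Values exactly equal to c are ignored (an assumption)."""
    r_plus = sum(1 for x in data if x > c)
    r_minus = sum(1 for x in data if x < c)
    return r_plus, r_minus


x = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
r_plus, r_minus = sign_counts(x, 10)
print(r_plus, r_minus)  # 6 4
```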
Sign Test (cont.)
• What is the distribution of r+?
– r+ is a discrete random variable.
– r+ can be thought of as the count of successful “+”s in {si}.
– Under H0 this success rate is a constant, p = 0.5. Ergo, r+ has a binomial distribution,
$$f_{R^+}[r^+] = \binom{n}{r^+}\left(\frac{1}{2}\right)^{n}$$
[Figure: binomial PMF fR+[r+] versus r+ for r+ = 0, …, 10.]
Sign Test (cont.)
• For n large (n >> 10), can use a Z test with
$$Z = \frac{R^+ - 0.5n}{0.5\sqrt{n}}$$
• Otherwise, a table built from the binomial PDF is needed
– Two-sided

Acceptance Region   r+ = 5   4 ≤ r+ ≤ 6   3 ≤ r+ ≤ 7   2 ≤ r+ ≤ 8   1 ≤ r+ ≤ 9
α                   0.754    0.344        0.109        0.022        0.002

– One-sided

Acceptance Region   r+ ≤ 3   r+ ≤ 4   r+ ≤ 5   r+ ≤ 6   r+ ≤ 7   r+ ≤ 8   r+ ≤ 9
α                   0.828    0.623    0.377    0.172    0.055    0.011    0.001
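The tabulated values appear to correspond to n = 10 (the sample size of the running example). The sketch below rebuilds them from the binomial PMF with p = 0.5, treating each tabulated number as the probability that r+ falls outside the acceptance region under H0. The function names are hypothetical.

```python
from math import comb, sqrt


def sign_pmf(n):
    """PMF of r+ under H0: Binomial(n, 0.5)."""
    return [comb(n, r) / 2**n for r in range(n + 1)]


def alpha_two_sided(n, lo, hi):
    """P(r+ < lo or r+ > hi) under H0."""
    return 1 - sum(sign_pmf(n)[lo:hi + 1])


def alpha_one_sided(n, hi):
    """P(r+ > hi) under H0."""
    return 1 - sum(sign_pmf(n)[:hi + 1])


def z_statistic(r_plus, n):
    """Large-sample Z statistic for the sign test."""
    return (r_plus - 0.5 * n) / (0.5 * sqrt(n))


n = 10
two_sided = [alpha_two_sided(n, 5 - d, 5 + d) for d in range(5)]
one_sided = [alpha_one_sided(n, hi) for hi in range(3, 10)]
print([f"{a:.3f}" for a in two_sided])
print([f"{a:.3f}" for a in one_sided])
print(z_statistic(6, n))  # Z for the example's r+ = 6 (n = 10 is small for a Z test)
```

For the example data, r+ = 6 lies inside every two-sided acceptance region except the narrowest one (r+ = 5).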
Sign Test (cont.)
• The sign test can be used to test any quantile ξp, F(ξp) = p.
• The null hypothesis test is H0: ξp = C.
• The distribution of the test statistic r+ is binomial for p,
$$f_{R^+}[r^+] = \binom{n}{r^+}\, p^{\,n - r^+} (1 - p)^{r^+}$$
• Example: test H0: first quartile = 8.5 (q1 = ξ0.25 = 8.5)

xi       13.5   9.8   11.4   12.2   7.9    8.6   9.1   10.6   11.3   10.1
xi-8.5   5.0    1.3   2.9    3.7    -0.6   0.1   0.6   2.1    2.8    1.6
si       +      +     +      +      –      +     +     +      +      +

Acceptance Region   7 ≤ r+ ≤ 8   6 ≤ r+ ≤ 9   5 ≤ r+ ≤ 10
α                   0.4682       0.1344       0.0197
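The quartile example can be reproduced with a short sketch. Under H0 each observation exceeds C with probability 1 – p, so r+ is treated here as Binomial(n, 1 – p); with n = 10 and p = 0.25 this matches the tabulated values. The function name is hypothetical.

```python
from math import comb


def quantile_sign_alpha(n, p, lo, hi):
    """P(r+ outside lo..hi) when H0 says the p-quantile equals C.
    Under H0 each observation exceeds C with probability 1 - p."""
    q = 1 - p
    inside = sum(comb(n, r) * q**r * p ** (n - r) for r in range(lo, hi + 1))
    return 1 - inside


x = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
c, p = 8.5, 0.25
n = len(x)
r_plus = sum(1 for xi in x if xi > c)
regions = [(7, 8), (6, 9), (5, 10)]
alphas = [quantile_sign_alpha(n, p, lo, hi) for lo, hi in regions]
print(r_plus)
print([f"{a:.4f}" for a in alphas])
```

Here r+ = 9, which lies inside the 6 ≤ r+ ≤ 9 region but outside 7 ≤ r+ ≤ 8.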
Tukey’s Two-Sample Quick Test
• Plot two data samples {xi} and {yi} on the same graph, using a different symbol for each point.
• Count the number of points of {xi} that protrude past {yi} at one end, and the number of points of {yi} that protrude past {xi} at the opposite end.
– The total is denoted the end-count.
– If {xi} protrudes at both ends, or vice versa, then the end-count is 0.
[Figure: dot plot of {x} and {y} on a common axis from -2 to 3.]
Tukey’s Two-Sample Quick Test (cont.)
• Use the table below for the significance level α
– Confidence level, 1 – α, is the chance a difference in the medians exists between {xi} and {yi} (or their means, if the PDF is symmetric).

Sample Size n   α = 0.05   α = 0.01   α = 0.001
4-8             7          9          13
9-21            7          10         13
22-24           7          10         14
25+             8          10         14
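The end-count procedure can be sketched as follows. Several things here are assumptions rather than statements from the slides: the two samples have the same size n, ties between the samples are not handled, the table entries are read as the smallest end-count that is significant at each α, and the sample values themselves are invented for illustration.

```python
def end_count(x, y):
    """Tukey end-count: points of one sample beyond the other at the high end,
    plus points of the other sample beyond the first at the low end."""
    max_x, min_x = max(x), min(x)
    max_y, min_y = max(y), min(y)
    if max_x > max_y and min_x > min_y:
        # x protrudes at the high end, y protrudes at the low end
        return sum(1 for v in x if v > max_y) + sum(1 for v in y if v < min_x)
    if max_y > max_x and min_y > min_x:
        # y protrudes at the high end, x protrudes at the low end
        return sum(1 for v in y if v > max_x) + sum(1 for v in x if v < min_y)
    return 0  # one sample protrudes at both ends (or the extremes tie)


def critical_end_count(n, alpha):
    """Look up the table entry for sample size n and significance level alpha."""
    table = [
        (4, 8, {0.05: 7, 0.01: 9, 0.001: 13}),
        (9, 21, {0.05: 7, 0.01: 10, 0.001: 13}),
        (22, 24, {0.05: 7, 0.01: 10, 0.001: 14}),
        (25, None, {0.05: 8, 0.01: 10, 0.001: 14}),
    ]
    for lo, hi, row in table:
        if n >= lo and (hi is None or n <= hi):
            return row[alpha]
    raise ValueError("sample size is below the range covered by the table")


# Hypothetical samples, not taken from the slides.
x = [1.2, 1.9, 2.4, 2.8, 0.3, 1.5]
y = [-1.6, -0.9, 0.1, 0.6, -0.2, 1.0]
count = end_count(x, y)
needed = critical_end_count(len(x), 0.05)
print(count, needed, count >= needed)
```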
Circular Error Probability
• Circular Error Probability, or CEP, is specified in many weapon systems’ requirements.
• The CEP is the median radial miss distance.
• The standard model for the radial miss distance is the Rayleigh distribution.
[Figure: Rayleigh PDF fR(R) and CDF FR(R) versus R, with the CEP marked.]
Circular Error Probability (cont.)
• The appropriateness of the Rayleigh radial miss distance model tends to decrease as the system complexity increases.
• Point Estimator: the sample median.
– Need the distribution of R to analyze!
– Solution: Use non-parametric methods!
• Confidence Interval: sort the sample radial miss data so that
R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ … ≤ Rn
and find the value of k such that
P(CEP ≤ Rk) = 1 – α
• Finding the appropriate value of k takes a little manipulation.
Circular Error Probability (cont.)
• First, let m be the index of the largest radial miss that is less than or equal to the population median (= CEP)
1 ≤ m < n: R1 ≤ R2 ≤ R3 ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rn, or
m = 0: CEP ≤ R1 ≤ R2 ≤ R3 ≤ … ≤ Rn, or
m = n: R1 ≤ R2 ≤ R3 ≤ … ≤ Rn ≤ CEP
• The PDF of m, fM[m], is binomial with p = ½ !
– The radial miss distances are assumed to be statistically independent.
– The probability that a particular radial miss distance is less than the median is a constant ½ (by definition).
– m is the # of radial miss distances less than the median.
Circular Error Probability (cont.)
• The desired probability can be rewritten using the total probability rule:
$$P(\mathrm{CEP} \le R_k) = \sum_{m=0}^{n} P(\mathrm{CEP} \le R_k \mid m)\, f_M[m]$$
• Evaluate P(CEP ≤ Rk|m)
– If m < k: P(CEP ≤ Rk|m) = 1
R1 ≤ R2 ≤ R3 ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rk ≤ … ≤ Rn
– If m = k: P(CEP ≤ Rk|m) = 1
R1 ≤ R2 ≤ R3 ≤ … ≤ CEP ≤ Rk ≤ … ≤ Rn
– If m > k: P(CEP ≤ Rk|m) = 0
R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rn
Circular Error Probability (cont.)
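• That is,
$$P(\mathrm{CEP} \le R_k \mid m) = \begin{cases} 1 & m \le k \\ 0 & m > k \end{cases}$$
and thus
$$P(\mathrm{CEP} \le R_k) = \sum_{m=0}^{n} \begin{cases} 1 & m \le k \\ 0 & m > k \end{cases} f_M[m] = \sum_{m=0}^{k} f_M[m] = \sum_{m=0}^{k} \binom{n}{m} \left(\frac{1}{2}\right)^{n}$$
• A similar result holds for a two-sided confidence interval
$$P(R_l \le \mathrm{CEP} \le R_u) = \sum_{m=l}^{u} \binom{n}{m} \left(\frac{1}{2}\right)^{n}$$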
Circular Error Probability (cont.)
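• A one-sided test for n = 10,

k       0       1      2      3      4      5      6      7      8      9       10
1-αk    0.001   0.01   0.05   0.17   0.38   0.62   0.83   0.95   0.99   0.999   1.0
αk      0.999   0.99   0.95   0.83   0.62   0.38   0.17   0.05   0.01   0.001   0.0

• If the desired value of α is not on the table, linear interpolation can be used:
$$\mathrm{CEP} \le R_{k-1} + \frac{\alpha_{k-1} - \alpha}{\alpha_{k-1} - \alpha_k}\left(R_k - R_{k-1}\right) = \frac{\alpha_{k-1} - \alpha}{\alpha_{k-1} - \alpha_k} R_k + \frac{\alpha - \alpha_k}{\alpha_{k-1} - \alpha_k} R_{k-1}$$
where αk ≤ α ≤ αk-1.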
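The n = 10 table can be regenerated from the binomial sum. The sketch below follows the summation limits used in these slides (m = 0 through k); the function name `alpha_k` is hypothetical.

```python
from math import comb


def alpha_k(n, k):
    """alpha_k = 1 - sum_{m=0}^{k} C(n, m) / 2^n, with the limits as used in the slides."""
    return 1 - sum(comb(n, m) for m in range(k + 1)) / 2**n


n = 10
rows = [(k, 1 - alpha_k(n, k), alpha_k(n, k)) for k in range(n + 1)]
for k, conf, a in rows:
    print(f"{k:2d}  {conf:.3f}  {a:.3f}")
```

The printed values carry three decimals, so they differ slightly from the table, which rounds most entries to two.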
Circular Error Probability (cont.)
i    Raw Data   Sorted Data   αk
1    2.0214     0.3958        0.9997
2    1.8952     0.6757        0.9979
3    1.1592     0.8748        0.9894
4    1.6317     0.9548        0.9616
5    0.8748     1.0343        0.8949
6    1.2302     1.1592        0.7728
7    2.0064     1.2302        0.5982
8    0.6757     1.2775        0.4018
9    1.2775     1.5215        0.2272
10   2.4094     1.6221        0.1051
11   1.6221     1.6317        0.0384
12   0.3958     1.8952        0.0106
13   2.3211     2.0064        0.0021
14   0.9548     2.0214        0.0003
15   1.0343     2.3211        <0.0001
16   1.5215     2.4094        0.0

• The data in the table is Rayleigh with a median of 2ln(2) ≈ 1.3863.
• The sample median is 1.400 (the average of the 8th and 9th sorted values, 1.2775 and 1.5215).
• Select α = 0.05. n = 16.
• Looking at the table, k = 11.
• Using the interpolation formula,
CEP ≤ 0.8261 R11 + 0.1739 R10
• Final result:
CEP ≤ 1.630
with 95% confidence.
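The worked example can be retraced with a short sketch. It uses the same binomial sum as the table (limits m = 0 through k, as in the slides) and the linear interpolation between Rk-1 and Rk; the function names are hypothetical.

```python
from math import comb


def alpha_k(n, k):
    """alpha_k = 1 - sum_{m=0}^{k} C(n, m) / 2^n, with the limits as used in the slides."""
    return 1 - sum(comb(n, m) for m in range(k + 1)) / 2**n


def cep_upper_bound(sorted_r, alpha):
    """Interpolated one-sided upper bound on the CEP.
    Returns (k, weight on R_k, bound)."""
    n = len(sorted_r)
    for k in range(2, n + 1):
        a_k, a_prev = alpha_k(n, k), alpha_k(n, k - 1)
        if a_k <= alpha <= a_prev:
            w = (a_prev - alpha) / (a_prev - a_k)  # weight on R_k
            # sorted_r is 0-based, so R_k is sorted_r[k - 1]
            bound = w * sorted_r[k - 1] + (1 - w) * sorted_r[k - 2]
            return k, w, bound
    raise ValueError("alpha is outside the range that can be interpolated")


raw = [2.0214, 1.8952, 1.1592, 1.6317, 0.8748, 1.2302, 2.0064, 0.6757,
       1.2775, 2.4094, 1.6221, 0.3958, 2.3211, 0.9548, 1.0343, 1.5215]
k, w, bound = cep_upper_bound(sorted(raw), 0.05)
print(k, round(w, 4), round(1 - w, 4), round(bound, 3))
```

The weights differ from 0.8261 and 0.1739 only in the fourth decimal, because the slide interpolates between the rounded table values of αk.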
Homework
• Use the rank transform on the time data for the Hot Wheels launcher experiment and repeat the regression analysis for HW S2P4-1: modify your Excel spreadsheet to use the ranks instead of the raw data.