EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © 2003-2005 Dr. John Lipp


S3P1-2

Course Outline

• Part 1: Rank (Order) and Non-Parametric Statistics.

• Part 2: Statistical Process Control.

• Part 3: Reliability.

• Mid-term Exam.


S3P1-3

Today’s Topics

• Empirical Cumulative Distribution Function

• Rank Transform.

• Sign Test.

• Tukey’s Two Sample Quick Test.

• Circular Error Probability Confidence Interval.


S3P1-4

Non-Parametric and Rank Statistics

• Non-parametric statistical procedures are designed without the use of the underlying data distribution and its parameters.

– The only assumption is the data samples are statistically independent and come from the same distribution.

– Also known as distribution-free.

– Hypothesis tests and confidence intervals are on the median, quartiles, percentiles, or other quantiles.


S3P1-5

Empirical Cumulative Distribution Function

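• The sample CDF or empirical CDF is defined by

$$\hat{F}_X(x) = \frac{\#(x_i \le x)}{n}$$

and is equivalent to sorting the data yi = sort(xi) and plotting yi vs. i/n as a stair-step function.

| i | xi | yi | i/n |
|---|-----|-----|------|
| 1 | 1.4 | 0 | 0.14 |
| 2 | 0 | 1.4 | 0.29 |
| 3 | 9.6 | 3.4 | 0.43 |
| 4 | 5.9 | 5.9 | 0.57 |
| 5 | 3.4 | 6.4 | 0.71 |
| 6 | 7.6 | 7.6 | 0.86 |
| 7 | 6.4 | 9.6 | 1 |

[Figure: stair-step plot of the empirical CDF, i/n (vertical axis, 0 to 1.2) vs. yi (horizontal axis, 0 to 15).]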
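As a minimal sketch of the sort-and-step construction, the snippet below rebuilds the yi and i/n columns for the seven data points on this slide. The function names `ecdf` and `ecdf_at` are illustrative choices, not names from the course material.

```python
def ecdf(data):
    """Return the sorted values y_i and the step heights i/n of the sample CDF."""
    ys = sorted(data)
    n = len(ys)
    return ys, [(i + 1) / n for i in range(n)]


def ecdf_at(data, x):
    """Evaluate the sample CDF at one point: #(x_i <= x) / n."""
    return sum(1 for xi in data if xi <= x) / len(data)


# Data from the slide's table (n = 7)
sample = [1.4, 0, 9.6, 5.9, 3.4, 7.6, 6.4]
ys, steps = ecdf(sample)
for y, s in zip(ys, steps):
    print(f"{y:4.1f}  {s:.2f}")
```

Evaluating `ecdf_at(sample, 5.9)` counts the four points at or below 5.9 and returns 4/7.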


S3P1-6

Empirical Cumulative Distribution Function (cont.)

• The sample CDF is an unbiased estimator

$$E\{\hat{F}_X(x)\} = \frac{E\{\#(x_i \le x)\}}{n} = \frac{n F_X(x)}{n} = F_X(x)$$

• The variance is given by

$$Var\{\hat{F}_X(x)\} = \frac{F_X(x)\,[1 - F_X(x)]}{n}$$

• The calculations of the expected values are actually easy

– The count #(xi ≤ x) has a binomial distribution with p = F(x)!
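The binomial argument can be checked numerically. The sketch below assumes the count of samples at or below x is Binomial(n, F) and sums over its PMF to get the mean and variance of the sample CDF. The helper name `ecdf_moments` and the values n = 7, F = 0.3 are arbitrary illustrations.

```python
from math import comb


def ecdf_moments(n, F):
    """Exact mean and variance of K/n, where K ~ Binomial(n, F)."""
    pmf = [comb(n, k) * F**k * (1 - F) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * p for k, p in enumerate(pmf))
    second_moment = sum((k / n) ** 2 * p for k, p in enumerate(pmf))
    return mean, second_moment - mean**2


mean, var = ecdf_moments(n=7, F=0.3)
print(mean, var)            # should match F and F(1 - F)/n
print(0.3, 0.3 * 0.7 / 7)
```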


S3P1-7

Empirical Cumulative Distribution Function (cont.)


S3P1-8

Rank Transform

• The rank transform is simply replacing the data with the data’s ranks from sorting in ascending order.

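• Using the ranks often simplifies calculations:

– Rank Sample Mean (unaffected by ties)

$$\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i = \frac{1}{n}\sum_{i=1}^{n} i = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}$$

– Rank Sample Standard Deviation (affected by ties)

$$S_R^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} R_i^2 - n\bar{R}^2\right] \quad \text{(must use if ties)}$$

$$\phantom{S_R^2} = \frac{1}{n-1}\left[\sum_{i=1}^{n} i^2 - n\bar{R}^2\right] = \frac{1}{n-1}\left[\frac{n(n+1)(2n+1)}{6} - n\bar{R}^2\right] = \frac{n(n+1)}{12}$$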
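A short sketch of the rank transform follows. It assumes tied values share the average (mid) rank, which is the convention under which the rank mean stays at (n+1)/2. The slide does not state how ties are ranked, so treat that choice, and the function names, as assumptions.

```python
def rank_transform(data):
    """Replace each value by its ascending-sort rank; ties get the average rank."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and data[order[end + 1]] == data[order[start]]:
            end += 1
        midrank = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = midrank
        start = end + 1
    return ranks


def sample_variance(values):
    """Unbiased sample variance with the 1/(n - 1) divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


untied = rank_transform([1.4, 0, 9.6, 5.9, 3.4, 7.6, 6.4])
tied = rank_transform([3, 1, 3, 2])
print(untied, sample_variance(untied))  # variance equals n(n+1)/12 for n = 7
print(tied, sample_variance(tied))      # ties pull the variance below n(n+1)/12
```

With the tied input the mean rank is still (4+1)/2 = 2.5, but the variance drops to 1.5 from the no-ties value 4·5/12 ≈ 1.667, which is why the general formula is needed when ties occur.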


S3P1-9

Sign Test

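• Consider a hypothesis test on the median $\tilde{\mu}$ of a data set {xi}

$$H_0: \tilde{\mu} = C$$

$$H_1: \tilde{\mu} \ne C \text{ (two sided), or } \tilde{\mu} > C, \text{ or } \tilde{\mu} < C \text{ (one sided)}$$

• The test is performed by subtracting C from the data {xi} and taking the sign, {si} = {sign(xi – C)}.

• The number of “+” and “–” values of {si} are counted, denoted r+ and r–, respectively.

Example with C = 10:

| xi | 13.5 | 9.8 | 11.4 | 12.2 | 7.9 | 8.6 | 9.1 | 10.6 | 11.3 | 10.1 |
|---|---|---|---|---|---|---|---|---|---|---|
| xi – 10 | 3.5 | -0.2 | 1.4 | 2.2 | -2.1 | -1.4 | -0.9 | 0.6 | 1.3 | 0.1 |
| si | + | – | + | + | – | – | – | + | + | + |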
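The counting step can be sketched in a few lines using the ten data values from this slide and C = 10. The helper name `sign_counts` is hypothetical, and values exactly equal to C are simply not counted, which is an assumption since the slide does not address that case.

```python
def sign_counts(data, c):
    """Count values above c (r_plus) and below c (r_minus); values equal to c are skipped."""
    r_plus = sum(1 for x in data if x > c)
    r_minus = sum(1 for x in data if x < c)
    return r_plus, r_minus


x = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
r_plus, r_minus = sign_counts(x, 10)
print(r_plus, r_minus)
```

For this data the counts are r+ = 6 and r– = 4, matching the row of signs in the table.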


S3P1-10

Sign Test (cont.)

• What is the distribution of r+?

– r+ is a discrete random variable.

– r+ can be thought of as the count of successful “+”s in {si}.

– This success rate is a constant, p = 0.5. Ergo, r+ has a binomial distribution,

$$f_{R^+}[r^+] = \binom{n}{r^+}\left(\frac{1}{2}\right)^{n}$$

[Figure: bar plot of f_R+[r+] vs. r+ for r+ = 0 to 10; vertical axis 0 to 0.25.]


S3P1-11

Sign Test (cont.)

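• For n large (n >> 10), can use a Z test with

$$Z = \frac{R^+ - 0.5n}{0.5\sqrt{n}}$$

• Otherwise, a table built from the binomial PDF is needed (shown here for n = 10)

– Two-sided

| Acceptance Region | α |
|---|---|
| r+ = 5 | 0.754 |
| 4 ≤ r+ ≤ 6 | 0.344 |
| 3 ≤ r+ ≤ 7 | 0.109 |
| 2 ≤ r+ ≤ 8 | 0.022 |
| 1 ≤ r+ ≤ 9 | 0.002 |

– One-sided

| Acceptance Region | α |
|---|---|
| r+ ≤ 3 | 0.828 |
| r+ ≤ 4 | 0.623 |
| r+ ≤ 5 | 0.377 |
| r+ ≤ 6 | 0.172 |
| r+ ≤ 7 | 0.055 |
| r+ ≤ 8 | 0.011 |
| r+ ≤ 9 | 0.001 |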
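The significance levels for the sign test are one minus the binomial probability of landing inside the acceptance region. The sketch below assumes n = 10 and p = 0.5 under H0; the function names are illustrative. It also evaluates the large-sample Z statistic for a made-up case of 60 plus signs in 100 samples.

```python
from math import comb, sqrt


def binom_pmf(n, r, p=0.5):
    """Binomial probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def alpha_for_region(n, lo, hi, p=0.5):
    """Significance level: chance r+ falls outside lo..hi when H0 is true."""
    return 1.0 - sum(binom_pmf(n, r, p) for r in range(lo, hi + 1))


def sign_test_z(r_plus, n):
    """Large-sample Z statistic for the sign test."""
    return (r_plus - 0.5 * n) / (0.5 * sqrt(n))


print(alpha_for_region(10, 4, 6))  # two-sided region 4 <= r+ <= 6
print(alpha_for_region(10, 0, 7))  # one-sided region r+ <= 7
print(sign_test_z(60, 100))        # hypothetical large-sample case
```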


S3P1-12

Sign Test (cont.)

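• The sign test can be used to test any quantile $\eta_p$, where F($\eta_p$) = p.

• The null hypothesis is H0: $\eta_p$ = C.

• The distribution of the test statistic r+ is binomial; under H0 each sample exceeds $\eta_p$ with probability 1 – p,

$$f_{R^+}[r^+] = \binom{n}{r^+}(1-p)^{r^+}\,p^{\,n-r^+}$$

• Example: test H0: first quartile = 8.5 (q1 = $\eta_{0.25}$ = 8.5)

| xi | 13.5 | 9.8 | 11.4 | 12.2 | 7.9 | 8.6 | 9.1 | 10.6 | 11.3 | 10.1 |
|---|---|---|---|---|---|---|---|---|---|---|
| xi – 8.5 | 5.0 | 1.3 | 2.9 | 3.7 | -0.6 | 0.1 | 0.6 | 2.1 | 2.8 | 1.6 |
| si | + | + | + | + | – | + | + | + | + | + |

| Acceptance Region | α |
|---|---|
| 7 ≤ r+ ≤ 8 | 0.4682 |
| 6 ≤ r+ ≤ 9 | 0.1344 |
| 5 ≤ r+ ≤ 10 | 0.0197 |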
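The quartile example can be reproduced with a small sketch. It assumes that under H0 each sample lies above the p-th quantile with probability 1 – p, so r+ is Binomial(n, 1 – p). The function name `quantile_sign_alpha` is a placeholder.

```python
from math import comb


def quantile_sign_alpha(n, p, lo, hi):
    """Significance level for acceptance region lo <= r+ <= hi when testing the p-th quantile."""
    q = 1 - p  # chance a sample falls above the p-th quantile under H0
    inside = sum(comb(n, r) * q**r * p ** (n - r) for r in range(lo, hi + 1))
    return 1 - inside


x = [13.5, 9.8, 11.4, 12.2, 7.9, 8.6, 9.1, 10.6, 11.3, 10.1]
r_plus = sum(1 for xi in x if xi > 8.5)
print(r_plus)
for lo, hi in [(7, 8), (6, 9), (5, 10)]:
    print(lo, hi, round(quantile_sign_alpha(10, 0.25, lo, hi), 4))
```

Here r+ = 9, which lies inside the region 6 ≤ r+ ≤ 9 but outside 7 ≤ r+ ≤ 8.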


S3P1-13

Tukey’s Two-Sample Quick Test

• Plot two data samples {xi} and {yi} on the same graph, using a different symbol for each point.

• Count the number of points of {xi} that protrude past {yi} at one end, and the number of points of {yi} that protrude past {xi} at the opposite end.

– The total is denoted the end-count.

– If {xi} protrudes at both ends, or vice versa, then the end-count is 0.

[Figure: dot plot of {x} and {y} on a common axis from -2 to 3.]


S3P1-14

Tukey’s Two-Sample Quick Test (cont.)

• Use the table below for the significance level α (entries are the end-count needed at each level)

– Confidence level, 1 – α, is the chance a difference in the medians exists between {xi} and {yi} (or their means, if the PDF is symmetric).

| Sample Size n | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|
| 4-8 | 7 | 9 | 13 |
| 9-21 | 7 | 10 | 13 |
| 22-24 | 7 | 10 | 14 |
| 25+ | 8 | 10 | 14 |
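The end-count procedure and the table lookup can be sketched as below. The two samples are made-up numbers for illustration only. Returning 0 when the extremes tie is an assumption, since the slides do not cover ties, and the function names are hypothetical.

```python
# Critical end-counts copied from the table: (min n, max n, {alpha: end-count})
CRITICAL = [
    (4, 8, {0.05: 7, 0.01: 9, 0.001: 13}),
    (9, 21, {0.05: 7, 0.01: 10, 0.001: 13}),
    (22, 24, {0.05: 7, 0.01: 10, 0.001: 14}),
    (25, float("inf"), {0.05: 8, 0.01: 10, 0.001: 14}),
]


def end_count(x, y):
    """Tukey end-count: overhang of one sample at the top plus the other at the bottom."""
    if max(x) < max(y):
        x, y = y, x  # make x the sample holding the overall maximum
    if max(x) == max(y) or min(x) <= min(y):
        return 0  # one sample spans both ends (or the extremes tie)
    high = sum(1 for v in x if v > max(y))
    low = sum(1 for v in y if v < min(x))
    return high + low


def critical_end_count(n, alpha):
    """Look up the tabulated end-count for sample size n and level alpha."""
    for lo, hi, row in CRITICAL:
        if lo <= n <= hi:
            return row[alpha]
    raise ValueError("sample size below the table's range")


x = [1.2, 1.9, 2.4, 2.8, 3.0]    # hypothetical sample
y = [-1.5, -0.7, 0.1, 1.5, 2.0]  # hypothetical sample
count = end_count(x, y)
print(count, critical_end_count(len(x), 0.05))
```

In this made-up case the end-count is 6, below the tabulated 7 for α = 0.05, so the quick test would not flag a difference at that level.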


S3P1-15

Circular Error Probability

• Circular Error Probability, or CEP, is specified in many weapon systems’ requirements.

• The CEP is the median radial miss distance.

• The standard model for the radial miss distance is the Rayleigh distribution.

[Figure: Rayleigh PDF fR(R) and CDF FR(R) plotted vs. R, with the CEP marked on the R axis.]
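Under the Rayleigh model the CEP has a closed form. The sketch below uses the standard Rayleigh CDF, F(r) = 1 – exp(–r²/(2σ²)), whose median is σ·sqrt(2 ln 2). The scale parameter `sigma` and the function names are notation introduced here, not taken from the slides.

```python
from math import exp, log, sqrt


def rayleigh_cdf(r, sigma):
    """Rayleigh CDF with scale parameter sigma."""
    return 1.0 - exp(-(r**2) / (2.0 * sigma**2))


def rayleigh_cep(sigma):
    """Median radial miss distance (CEP) for a Rayleigh model."""
    return sigma * sqrt(2.0 * log(2.0))


cep = rayleigh_cep(1.0)
print(cep, rayleigh_cdf(cep, 1.0))  # the CDF at the CEP should be 0.5
```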


S3P1-16

Circular Error Probability (cont.)

• The appropriateness of the Rayleigh radial miss distance model tends to decrease as the system complexity increases.

• Point Estimator: the sample median.

– Need the distribution of R to analyze!

– Solution: Use non-parametric methods!

• Confidence Interval: sort the sample radial miss data so that

R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ … ≤ Rn

and find the value of k such that

P(CEP ≤ Rk) = 1 – α

• Finding the appropriate value of k takes a little manipulation.


S3P1-17


• First, let m be the index of the largest radial miss that is less than or equal to the population median (= CEP)

0 < m < n: R1 ≤ R2 ≤ R3 ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rn, or

m = 0: CEP ≤ R1 ≤ R2 ≤ R3 ≤ … ≤ Rn, or

m = n: R1 ≤ R2 ≤ R3 ≤ … ≤ Rn ≤ CEP

• The PDF of m, fM[m], is binomial with p = ½ !

– The radial miss distances are assumed to be statistically independent.

– The probability that a particular radial miss distance is less than the median is a constant ½ (by definition).

– m is the # of radial miss distances less than the median.


S3P1-18


• The desired probability can be rewritten using the total probability rule:

$$P(CEP \le R_k) = \sum_{m=0}^{n} P(CEP \le R_k \mid m)\, f_M[m]$$

• Evaluate P(CEP ≤ Rk|m)

– If m < k: P(CEP ≤ Rk|m) = 1

R1 ≤ R2 ≤ R3 ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rk ≤ … ≤ Rn

– If m = k: P(CEP ≤ Rk|m) = 1

R1 ≤ R2 ≤ R3 ≤ … ≤ CEP ≤ Rk ≤ … ≤ Rn

– If m > k: P(CEP ≤ Rk|m) = 0

R1 ≤ R2 ≤ R3 ≤ … ≤ Rk ≤ … ≤ Rm ≤ CEP ≤ … ≤ Rn


S3P1-19


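• That is,

$$P(CEP \le R_k \mid m) = \begin{cases} 1 & m \le k \\ 0 & m > k \end{cases}$$

and thus

$$P(CEP \le R_k) = \sum_{m=0}^{n} \begin{cases} 1 & m \le k \\ 0 & m > k \end{cases} f_M[m] = \sum_{m=0}^{k} f_M[m] = \frac{1}{2^n}\sum_{m=0}^{k}\binom{n}{m}$$

• A similar result holds for a two-sided confidence interval

$$P(R_l \le CEP \le R_u) = \frac{1}{2^n}\sum_{m=l}^{u}\binom{n}{m}$$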
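Both confidence levels are cumulative binomial sums with p = ½, so they are easy to tabulate. The sketch below follows the slides' indexing, in which the one-sided sum runs through m = k and the two-sided sum runs from l through u. The function names are illustrative.

```python
from math import comb


def one_sided_confidence(n, k):
    """P(CEP <= R_k) as the cumulative binomial sum for m = 0..k with p = 1/2."""
    return sum(comb(n, m) for m in range(0, k + 1)) / 2**n


def two_sided_confidence(n, lower, upper):
    """P(R_l <= CEP <= R_u) as the binomial sum for m = l..u with p = 1/2."""
    return sum(comb(n, m) for m in range(lower, upper + 1)) / 2**n


# Confidence levels for n = 10 and k = 0..10
for k in range(11):
    print(k, round(one_sided_confidence(10, k), 3))
print(round(two_sided_confidence(10, 2, 8), 3))
```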


S3P1-20


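• A one-sided test for n = 10,

| k | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 – αk | 0.001 | 0.01 | 0.05 | 0.17 | 0.38 | 0.62 | 0.83 | 0.95 | 0.99 | 0.999 | 1.0 |
| αk | 0.999 | 0.99 | 0.95 | 0.83 | 0.62 | 0.38 | 0.17 | 0.05 | 0.01 | 0.001 | 0.0 |

• If the desired value of α is not on the table, linear interpolation can be used:

$$CEP \le R_{k-1} + \frac{\alpha_{k-1} - \alpha}{\alpha_{k-1} - \alpha_k}\,(R_k - R_{k-1}) = \frac{\alpha_{k-1} - \alpha}{\alpha_{k-1} - \alpha_k}\,R_k + \frac{\alpha - \alpha_k}{\alpha_{k-1} - \alpha_k}\,R_{k-1}$$

where αk ≤ α ≤ αk-1.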
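A minimal sketch of the interpolation step is shown below. The α values 0.05 and 0.17 come from the n = 10 table (k = 7 and k = 6), while the two miss distances are hypothetical placeholders. The function name `interpolate_bound` is also an assumption.

```python
def interpolate_bound(alpha, alpha_k, alpha_km1, r_k, r_km1):
    """Linearly interpolate the CEP bound between R_(k-1) and R_k.

    Requires alpha_k <= alpha <= alpha_km1.
    """
    if not alpha_k <= alpha <= alpha_km1:
        raise ValueError("alpha must lie between alpha_k and alpha_(k-1)")
    weight_k = (alpha_km1 - alpha) / (alpha_km1 - alpha_k)
    return weight_k * r_k + (1.0 - weight_k) * r_km1


# alpha = 0.10 sits between alpha_7 = 0.05 and alpha_6 = 0.17 for n = 10
bound = interpolate_bound(0.10, alpha_k=0.05, alpha_km1=0.17, r_k=3.0, r_km1=2.0)
print(bound)
```

At the endpoints the interpolation returns the tabulated order statistic itself: α = αk gives Rk and α = αk-1 gives Rk-1.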


S3P1-21


| i | Raw Data | Sorted Data | αk |
|---|---|---|---|
| 1 | 2.0214 | 0.3958 | 0.9997 |
| 2 | 1.8952 | 0.6757 | 0.9979 |
| 3 | 1.1592 | 0.8748 | 0.9894 |
| 4 | 1.6317 | 0.9548 | 0.9616 |
| 5 | 0.8748 | 1.0343 | 0.8949 |
| 6 | 1.2302 | 1.1592 | 0.7728 |
| 7 | 2.0064 | 1.2302 | 0.5982 |
| 8 | 0.6757 | 1.2775 | 0.4018 |
| 9 | 1.2775 | 1.5215 | 0.2272 |
| 10 | 2.4094 | 1.6221 | 0.1051 |
| 11 | 1.6221 | 1.6317 | 0.0384 |
| 12 | 0.3958 | 1.8952 | 0.0106 |
| 13 | 2.3211 | 2.0064 | 0.0021 |
| 14 | 0.9548 | 2.0214 | 0.0003 |
| 15 | 1.0343 | 2.3211 | <0.0001 |
| 16 | 1.5215 | 2.4094 | 0.0 |

• The data in the table is Rayleigh with a median of 2ln(2) ≈ 1.3863.

• The sample median is (R8 + R9)/2 ≈ 1.400.

• Select α = 0.05. n = 16.

• Looking at the table, k = 11, since α11 = 0.0384 ≤ 0.05 ≤ α10 = 0.1051.

• Using the interpolation formula,

CEP ≤ 0.8261R11 + 0.1739R10

• Final result:

CEP ≤ 1.630

with 95% confidence.
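The whole example can be replayed end to end. The sketch below takes the 16 raw values from this slide, computes αk as one minus the cumulative binomial sum through m = k (the convention the tabulated αk values follow), picks the first k with αk ≤ α, and interpolates. Variable names are illustrative.

```python
from math import comb

raw = [2.0214, 1.8952, 1.1592, 1.6317, 0.8748, 1.2302, 2.0064, 0.6757,
       1.2775, 2.4094, 1.6221, 0.3958, 2.3211, 0.9548, 1.0343, 1.5215]
alpha = 0.05

r = sorted(raw)  # r[0] is R_1, r[k - 1] is R_k
n = len(r)


def alpha_k(n, k):
    """alpha_k = 1 - (cumulative binomial sum for m = 0..k with p = 1/2)."""
    return 1.0 - sum(comb(n, m) for m in range(k + 1)) / 2**n


# Smallest k whose alpha_k does not exceed the requested alpha
k = next(j for j in range(1, n + 1) if alpha_k(n, j) <= alpha)
a_k, a_km1 = alpha_k(n, k), alpha_k(n, k - 1)

# Interpolation weights on R_k and R_(k-1)
w_k = (a_km1 - alpha) / (a_km1 - a_k)
bound = w_k * r[k - 1] + (1.0 - w_k) * r[k - 2]

print(k, round(w_k, 4), round(1.0 - w_k, 4), round(bound, 3))
```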


S3P1-22

Homework

• Use the rank transform on the time data for the Hot Wheels launcher experiment and repeat the regression analysis for HW S2P4-1: modify your Excel spreadsheet to use the ranks instead of the raw data.
