Some Ideas for Detecting Spurious Observations Based on Mixture Models

Jim Lynch
NISS/SAMSI & University of South Carolina

Some Ideas for Detecting Spurious Observations

• Work with Dave Dickey and Francisco Vera
• Very Preliminary Ideas
• Primarily Motivated by Dave’s American Airlines Data and Proschan’s (1963) paper on pooling to explain a decreasing failure rate and, to a lesser extent, M. J. Bayarri’s talk on Multiple Testing


Outline

• 1. Introduction

• 2. Mixture Models

• 3. Some Ideas

• 4. Simulations

• 5. The American Airlines Data

Introduction

Some Motivation – AA Data (Largest Log Vol Removed)

• Some Time Series Diagnostics Suggest That Log Volume Ratio is an MA(1)

• Fit an MA(1) to the Log Volume Ratio of the AA Data

• Look At The Residuals
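The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the series is a simulated stand-in (not the AA data), the MA(1) coefficient is found by a simple conditional least squares grid search, and the chart limits use the conventional rule of center ± 2.66 × average moving range for an individuals chart. All function names are hypothetical.

```python
import random

def ma1_residuals(x, theta):
    """Recursive residuals e_t = x_t - theta * e_{t-1}, started at e_0 = 0."""
    resid, prev = [], 0.0
    for value in x:
        prev = value - theta * prev
        resid.append(prev)
    return resid

def fit_ma1(x, grid_size=399):
    """Conditional least squares: theta in (-1, 1) minimizing the residual sum of squares."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    grid = [-0.995 + 1.99 * k / (grid_size - 1) for k in range(grid_size)]
    best = min(grid, key=lambda th: sum(e * e for e in ma1_residuals(centered, th)))
    return mean, best, ma1_residuals(centered, best)

def individuals_limits(resid):
    """Individuals-chart limits: center +/- 2.66 * average moving range."""
    center = sum(resid) / len(resid)
    mr_bar = sum(abs(a - b) for a, b in zip(resid[1:], resid)) / (len(resid) - 1)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical stand-in series (NOT the AA data): MA(1) with theta = 0.5
rng = random.Random(0)
shocks = [rng.gauss(0.0, 0.4) for _ in range(271)]
series = [shocks[t] + 0.5 * shocks[t - 1] for t in range(1, 271)]

mean, theta_hat, resid = fit_ma1(series)
lcl, center, ucl = individuals_limits(resid)
# Indices of residuals falling outside the chart limits
flagged = [t for t, e in enumerate(resid) if e < lcl or e > ucl]
```

Residuals outside the limits are candidates for closer inspection; the point of the talk is that spurious observations need not show up this way.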

[Figure: Time Series Plot of RESI1. Axes: Index (1 to 270) vs. RESI1.]

[Figure: Individuals Chart of the Residual - Largest Removed From Analysis. Axes: Observation (1 to 270) vs. Individual Value. Center line X-bar = -0.000, UCL = 1.235, LCL = -1.236.]

Introduction

• Detecting spurious observations is an important area of research and has implications for anomaly detection (AD).

• The term spurious observation is used to distinguish it from an outlier, since outliers are usually extreme observations in the data while a spurious observation need not be.
  – E.g., one could imagine that sophisticated intruders into computer systems would make sporadic intrusions and try to mimic normal behavior as closely as possible.

Introduction

• Goal
  – To develop approaches to detect very transient spurious events where the objectives are
    • To detect when there are spurious events present and, if possible,
    • To identify them

Introduction

• The Basic Data Analytic Model

– X1,…, Xn iid ~ fp = (1-p) f0 + p f1

• f0 is the background model

• f1 models the spurious behavior

• The likelihood is then

$$\prod_{i=1}^{n}\{(1-p)f_0(X_i)+p f_1(X_i)\}=\sum_{j=0}^{n}\sum_{C_j}(1-p)^{j}p^{n-j}\prod_{i\in C_j}f_0(X_i)\prod_{i\in C_j^{c}}f_1(X_i),$$

where $C_j$ denotes a subset of size j from {1,...,n}.
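The expansion of the mixture likelihood into a sum over configurations can be checked numerically for a small n. The sketch below assumes f0 = N(0,1) and f1 = N(1,1) and a hypothetical value p = 0.3; the four data values are taken from the n=10 simulated sample shown later in the talk.

```python
from itertools import combinations
from math import exp, isclose, pi, prod, sqrt

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return exp(-0.5 * (x - mean) ** 2) / sqrt(2.0 * pi)

def mixture_likelihood(xs, p, f0, f1):
    """Product form: prod_i {(1-p) f0(x_i) + p f1(x_i)}."""
    return prod((1 - p) * f0(x) + p * f1(x) for x in xs)

def configuration_sum(xs, p, f0, f1):
    """Sum over all subsets C_j (indices assigned to f0) of size j = 0..n."""
    n = len(xs)
    total = 0.0
    for j in range(n + 1):
        for c_j in combinations(range(n), j):
            background = set(c_j)
            term = (1 - p) ** j * p ** (n - j)
            for i, x in enumerate(xs):
                term *= f0(x) if i in background else f1(x)
            total += term
    return total

xs = [-0.34964, 0.58061, 2.51937, 1.16238]
p = 0.3  # hypothetical mixing proportion

def f0(x):
    return normal_pdf(x, 0.0)

def f1(x):
    return normal_pdf(x, 1.0)

lhs = mixture_likelihood(xs, p, f0, f1)
rhs = configuration_sum(xs, p, f0, f1)
```

The enumeration visits 2^n configurations, so it is only practical as a check for small n.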

Introduction

• A “More Realistic” Model
  – Generate a configuration C with probability p(C)
  – Given C, for i ∈ C, Xi are iid ~ f0 and, for i ∈ C^c, Xi are iid ~ f1

• C and Cc model a spatial or temporal (e.g., a change-point) pattern

• You are “pooling” observations based on the configuration C

• The likelihood is then

$$\sum_{C}p(C)\prod_{i\in C}f_0(X_i)\prod_{i\in C^{c}}f_1(X_i)$$
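As one concrete instance of the “more realistic” model, consider a change-point pattern: C is the first k observations and C^c the rest. The sketch below evaluates the likelihood sum over the n+1 such configurations and the implied posterior over k. The uniform p(C), the choice f0 = N(0,1), f1 = N(3,1), and the data are all assumptions made for illustration.

```python
from math import exp, log, pi

def log_normal_pdf(x, mean):
    """Log density of N(mean, 1) at x."""
    return -0.5 * (x - mean) ** 2 - 0.5 * log(2.0 * pi)

def change_point_posterior(xs, mean0, mean1):
    """Posterior over k, where C = {first k observations} is the f0 set."""
    n = len(xs)
    log_terms = []
    for k in range(n + 1):
        log_prior = -log(n + 1)  # assumed uniform p(C) over the n+1 configurations
        log_lik = sum(log_normal_pdf(x, mean0) for x in xs[:k])
        log_lik += sum(log_normal_pdf(x, mean1) for x in xs[k:])
        log_terms.append(log_prior + log_lik)
    top = max(log_terms)
    weights = [exp(t - top) for t in log_terms]
    total = sum(weights)
    # log of sum_C p(C) prod_{i in C} f0 prod_{i in C^c} f1
    log_likelihood = top + log(total)
    return [w / total for w in weights], log_likelihood

xs = [0.1, -0.2, 0.05, 3.1, 2.9, 3.2]  # hypothetical data with a shift after the third value
posterior, log_likelihood = change_point_posterior(xs, mean0=0.0, mean1=3.0)
best_k = max(range(len(posterior)), key=posterior.__getitem__)
```

Restricting p(C) to a structured family like this keeps the sum small; an unrestricted p(C) would range over all 2^n subsets.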

Introduction

Some Approaches for Analyzing the “MR” Model

• Envision that the data are the effects of pooling observations from f0 and f1.
• Treat the data as if they are from a mixture model and use the mixture model to determine the mle, p*, of the mixing proportion.
  – Use p* to test H0: p=0 versus H1: p>0. (Under H0 and the mixture model, $n^{1/2}p^*$ converges in distribution to X, where X=0 with probability .5 and X=$|N(0,I_0^{-1})|$ with probability .5.)
  – If H0 is rejected, see if the mixture model can give insights into the configuration $C_j$.
    • E.g., do an empirical Bayes analysis with prior $p(C_j)=(1-p^*)^{j}(p^*)^{n-j}$. Then

$$P(C_j\mid X_1,\ldots,X_n)\propto(1-p^*)^{j}(p^*)^{n-j}\prod_{i\in C_j}f_0(X_i)\prod_{i\in C_j^{c}}f_1(X_i)$$
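A small numerical sketch of this empirical Bayes step: enumerate every configuration, normalize the weights, and read off the marginal probability that each observation belongs to the f0 set. Because the prior is a product over observations, each marginal should equal (1-p*) f0(Xi) / {(1-p*) f0(Xi) + p* f1(Xi)}, which the code compares directly. The value of p*, the densities f0 = N(0,1) and f1 = N(1,1), and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import exp

def normal_kernel(x, mean):
    """N(mean, 1) density up to the constant 1/sqrt(2*pi), which cancels after normalizing."""
    return exp(-0.5 * (x - mean) ** 2)

def configuration_posterior(xs, p_star, f0, f1):
    """P(C_j | data) for every subset C_j of indices assigned to f0."""
    n = len(xs)
    weights = {}
    for j in range(n + 1):
        for c_j in combinations(range(n), j):
            w = (1 - p_star) ** j * p_star ** (n - j)
            for i, x in enumerate(xs):
                w *= f0(x) if i in c_j else f1(x)
            weights[c_j] = w
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

xs = [-0.34964, 0.58061, 2.51937, 1.16238]
p_star = 0.3  # hypothetical mle of the mixing proportion

def f0(x):
    return normal_kernel(x, 0.0)

def f1(x):
    return normal_kernel(x, 1.0)

posterior = configuration_posterior(xs, p_star, f0, f1)

# Marginal probability that observation i is in the background set C_j
marginal = [sum(pr for c, pr in posterior.items() if i in c) for i in range(len(xs))]
# Direct per-observation formula implied by the product prior
direct = [(1 - p_star) * f0(x) / ((1 - p_star) * f0(x) + p_star * f1(x)) for x in xs]
```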

Introduction

Another Approach

• Since f1 models the spurious behavior, p~0.
• p~0 suggests using the locally most powerful (LMP) test statistic for testing H0: p=0 versus H1: p>0 as the basis for discovering whether spurious observations are present.
• The test statistic is closely related to the gradient plot introduced by Lindsay (1983) to determine when a finite mixture mle is the global mixture mle in the mixed distribution model.

Introduction

Another Approach

• The basis of this approach
  – Use the gradient plot to determine if the one point mixture mle is the global mixture mle.
  – When it isn’t, this suggests that some spurious behavior is present.
    • One can then use the components in the mle mixed distribution to calculate “assignment probabilities” for the data to indicate which observations might be considered spurious.
• The examples indicate that detecting the presence of spurious observations seems to be considerably simpler than identifying which ones they are.

Introduction

Mining Data Graphs

• Data (Maguire, Pearson and Wynn, 1952): Time Between Accidents with 10 or more fatalities
• At the right are the gradient plots for the 2 and 3 point mixture mle’s and the assignment function for the 3 pt mle (mixing over exponentials)
• The 2 and 3 pt mixture mle’s
  – support points: 592.9, 166.2; p: .175, .825
  – support points: 595.5, 171.6, 29.1; p: .171, .806, .023

Mixture Models

– X1,…, Xn iid ~ fp = (1-p) f0 + p f1
  • f0 is the background model
  • f1 models the spurious behavior
  • Since the spurious observations are sporadic/transient, p~0
– Denote the log likelihood by

$$\ell(f(X_1),\ldots,f(X_n))=\ell(f)=\log\prod_{i}f(X_i)$$

– Denote the gradient function of $\ell$ by

$$\Phi_0(f_1;f_0)=\lim_{p\to 0}\frac{\ell(f_p)-\ell(f_0)}{p}=\sum_{i=1}^{n}\frac{f_1(X_i)-f_0(X_i)}{f_0(X_i)}$$
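The limit defining the gradient function can be checked with a finite difference. The sketch assumes f0 = N(0,1) and f1 = N(1,1), so that f1(x)/f0(x) = exp(x - 1/2), and uses the n=10 simulated sample from the Simulations slides; the function names are hypothetical.

```python
from math import exp, log

# n=10 simulated sample from the talk (5 from N(0,1), then 5 from N(1,1))
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

def density_ratio(x):
    """f1(x)/f0(x) for f0 = N(0,1) and f1 = N(1,1)."""
    return exp(x - 0.5)

def log_lik_difference(p, xs):
    """l(f_p) - l(f_0) = sum_i log{(1 - p) + p * f1(x_i)/f0(x_i)}."""
    return sum(log((1 - p) + p * density_ratio(x)) for x in xs)

def gradient_function(xs):
    """Closed form: sum_i {f1(x_i) - f0(x_i)} / f0(x_i)."""
    return sum(density_ratio(x) - 1.0 for x in xs)

phi_0 = gradient_function(data)
step = 1e-6
finite_difference = log_lik_difference(step, data) / step
```

The two numbers agree up to a term of order p, which is the curvature of the log likelihood at p=0.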

Mixture Models – LMP

• Lemma
  The locally most powerful test for testing H0: p=0 versus H1: p>0 is based on $\Phi_0(f_1;f_0)$.
• Proof
  The LMP test for testing H0: p=p0 versus H1: p>p0 is based on the statistic

$$\frac{\partial}{\partial p}\sum_{i=1}^{n}\log f_p(X_i)\Big|_{p=p_0}=\sum_{i=1}^{n}\frac{f_1(X_i)-f_0(X_i)}{f_{p_0}(X_i)}.$$

  For p0=0 this reduces to

$$\Phi_0(f_1;f_0)=\sum_{i=1}^{n}\frac{f_1(X_i)-f_0(X_i)}{f_0(X_i)}.$$
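One way the LMP statistic might be used in practice is to calibrate it by simulation under H0. The sketch below is a minimal version, assuming f0 = N(0,1) and f1 = N(1,1) are both known, using the n=10 simulated sample from the Simulations slides, and choosing a Monte Carlo reference distribution rather than an asymptotic one; none of these choices come from the talk.

```python
import random
from math import exp

# n=10 simulated sample from the talk (5 from N(0,1), then 5 from N(1,1))
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

def density_ratio(x):
    """f1(x)/f0(x) for f0 = N(0,1) and f1 = N(1,1)."""
    return exp(x - 0.5)

def lmp_statistic(xs):
    """sum_i {f1(x_i) - f0(x_i)} / f0(x_i)."""
    return sum(density_ratio(x) - 1.0 for x in xs)

def monte_carlo_p_value(xs, draws=2000, seed=1):
    """Share of N(0,1) samples of the same size whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = lmp_statistic(xs)
    exceed = 0
    for _ in range(draws):
        null_sample = [rng.gauss(0.0, 1.0) for _ in xs]
        if lmp_statistic(null_sample) >= observed:
            exceed += 1
    # add-one correction keeps the estimate strictly positive
    return observed, (exceed + 1) / (draws + 1)

observed, p_value = monte_carlo_p_value(data)
```

Large values of the statistic point toward p>0; the statistic has a heavy right tail under H0, which is one reason to prefer simulation over a normal approximation at n=10.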

Mixture Model

• The Function $\Phi(f_1;f_0)$
  – Plays a prominent role in the analysis of data from mixture models, where it is essentially the gradient function.
  – Introduced by Lindsay (1983a&b and 1995) to determine when the mle for the mixing distribution with a finite number of points is the global mixture mle.

Mixture Model

Framework

• Family of densities $\{f_\theta:\theta\in\Theta\}$.
  – M is the set of probability measures on $\Theta$.
  – The mixed distribution over the family with mixing distribution Q is given by

$$f_Q=\int f_\theta\,dQ(\theta)$$

  – For X1,…, Xn iid from fQ, the likelihood and log likelihood are given by
    • $L(Q)=\prod_i f_Q(X_i)$ and $\ell(f_Q)=\log\prod_i f_Q(X_i)$
    • $\mathbf{f}_Q=(f_Q(X_1),\ldots,f_Q(X_n))$.

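Mixture Model

Framework

• The Directional Derivative

$$\Phi(f_{Q_1};f_{Q_0})=\lim_{\epsilon\to 0}\frac{\ell((1-\epsilon)f_{Q_0}+\epsilon f_{Q_1})-\ell(f_{Q_0})}{\epsilon}=\sum_{i=1}^{n}\frac{f_{Q_1}(X_i)-f_{Q_0}(X_i)}{f_{Q_0}(X_i)}=\sum_{i=1}^{n}\left\{\frac{f_{Q_1}(X_i)}{f_{Q_0}(X_i)}-1\right\}$$

For $D(\theta;Q)\equiv\Phi(f_\theta;f_Q)=\sum_{i=1}^{n}\left\{\frac{f_\theta(X_i)}{f_Q(X_i)}-1\right\}$, note that

$$\Phi(f_{Q_1};f_{Q_0})=\int D(\theta;Q_0)\,dQ_1(\theta).$$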

Mixture Model

A Diagnostic

• Theorem 4.1 of Lindsay (1983a)
  – A. The following three conditions are equivalent:
    • Q* maximizes L(Q)
    • Q* minimizes $\sup_\theta D(\theta;Q)$
    • $\sup_\theta D(\theta;Q^*)=0$.
  – B. Let f*=fQ*. The point (f*,f*) is a saddle point of $\Phi$, i.e.,
    $\Phi(f_{Q'};f^*)\le 0=\Phi(f^*;f^*)\le\Phi(f^*;f_{Q''})$ for Q’, Q’’ ∈ M.
  – C. The support of Q* is contained in the set of θ for which D(θ;Q*)=0.
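The diagnostic in part A can be tried on the n=10 simulated sample from the Simulations slides. The sketch below assumes the family N(θ,1), evaluates D(θ;Q) on a grid for the one point mle (the sample mean) and for the two point mle reported on the gradient plot slide, and compares the suprema. The grid and function names are illustrative choices.

```python
from math import exp

# n=10 simulated sample from the talk (5 from N(0,1), then 5 from N(1,1))
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

def gradient(theta, xs, support, weights):
    """D(theta; Q) = sum_i { f_theta(x_i) / f_Q(x_i) - 1 } for N(theta, 1) components."""
    total = 0.0
    for x in xs:
        # the normalizing constant 1/sqrt(2*pi) cancels in the ratio
        mixed = sum(w * exp(-0.5 * (x - s) ** 2) for s, w in zip(support, weights))
        total += exp(-0.5 * (x - theta) ** 2) / mixed - 1.0
    return total

grid = [-2.0 + 0.05 * k for k in range(81)]  # theta from -2 to 2

# One point mle: all mass at the sample mean
mu_hat = sum(data) / len(data)
d_one = [gradient(t, data, [mu_hat], [1.0]) for t in grid]

# Two point mle as reported on the slide (support points and weights)
d_two = [gradient(t, data, [-0.487880, 0.929969], [0.388813, 0.611187]) for t in grid]

sup_one = max(d_one)  # clearly positive: the one point fit is not the global mixture mle
sup_two = max(d_two)  # approximately zero: consistent with a global mixture mle
```

A positive supremum for the one point fit is the signal, in the sense of the theorem, that a mixture fits better than a single component.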

Mixture Model

The Assignment/Membership Function

Let

$$Q(A)=(1-p)\,\delta_{\theta_0}(A)+p\,\delta_{\theta_1}(A)$$

be a two point mixing distribution, where $\delta_\theta(A)=1$ if $\theta\in A$ and 0 if not. The assignment function based on Q for observation $X_i$ is

$$A_Q(X_i)=\frac{(1-p)\,f_{\theta_0}(X_i)}{f_Q(X_i)}$$
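A minimal sketch of the assignment function for the n=10 simulated sample, using the two point mle reported on the gradient plot slide. It assumes N(θ,1) components and takes the assignment function to be the posterior probability of the θ0 component, (1-p) f_θ0(Xi) / f_Q(Xi); that reading, like the function names, is an assumption.

```python
from math import exp, log

# n=10 simulated sample from the talk (5 from N(0,1), then 5 from N(1,1))
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

# Two point mle from the slide: support points and the weight p on theta1
theta0, theta1 = -0.487880, 0.929969
p = 0.611187  # so 1 - p = 0.388813 is the weight on theta0

def kernel(x, theta):
    """N(theta, 1) density up to the constant 1/sqrt(2*pi), which cancels below."""
    return exp(-0.5 * (x - theta) ** 2)

def assignment(x):
    """Posterior probability that x came from the theta0 component."""
    from_theta0 = (1 - p) * kernel(x, theta0)
    from_theta1 = p * kernel(x, theta1)
    return from_theta0 / (from_theta0 + from_theta1)

assign = [assignment(x) for x in data]
log_odds = [log(a / (1 - a)) for a in assign]
```

With equal variances the log odds are linear in x, so the assignment function only re-expresses the ordering of the observations; it cannot separate observations that overlap in value.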

Simulations n=10: 5 points N(0,1), 5 points N(1,1)

• 0 -0.34964
• 0 -1.77582
• 0 -0.92900
• 0 0.58061
• 0 -0.36032
• 1 2.51937
• 1 0.59549
• 1 1.16238
• 1 0.76632
• 1 1.57752

[Figure: Probability Plot of C3000, Normal - 95% CI. Mean 0.3787, StDev 1.261, N 10, AD 0.166, P-Value 0.912.]

Simulations n=10: 5 points N(0,1), 5 points N(1,1)

[Figure: Gradient Plot - 1 pt MLE fit, mu-hat = .379; 5 observations from N(0,1), 5 from N(1,1). Axes: mu vs. D(mu,Q1).]

[Figure: Gradient Plot - 2 pt MLE fit; 5 observations from N(0,1), 5 from N(1,1).]

2 pt MLE (support point, p): -.487880, .388813; .929969, .611187

Simulations

The Assignment Function

[Figure: Assignment Function, n=10 - 5 n(1,1), 5 n(0,1). Axes: observation vs. assign.]

[Figure: LogOdds, n=10 - 5 n(1,1), 5 n(0,1). Axes: observation vs. logodds.]

Simulations

n=30: 25 points N(0,1), 5 points N(1,1)

[Figure: Probability Plot of C3000, Normal - 95% CI. Mean 0.2243, StDev 1.250, N 30, AD 0.206, P-Value 0.858.]

[Figure: Gradient Plot - 2 pt MLE fit; 25 observations from N(0,1), 5 from N(1,1). Axes: mu vs. grad2pt.]

2 pt MLE (support point, p): -0.05537, 0.867670; 2.05801, 0.132330

Simulations

n=30: 25 points N(0,1), 5 points N(1,1)

[Figure: Assignment Function, 5 n(1,1), 25 n(0,1). Axes: observation vs. assign.]

[Figure: Logodds, n=30 - 5 n(1,1), 25 n(0,1). Axes: observation vs. logodds.]

Simulations

Another n=30: 25 points N(0,1), 5 points N(1,1)

[Figure: Probability plot, Another n=30 - 5 n(1,1) and 25 n(0,1), Normal - 95% CI. Mean 0.9866, StDev 1.222, N 30, AD 0.238, P-Value 0.761.]

[Figure: Gradient Plot - 2 pt MLE fit; another 25 observations from N(0,1), 5 from N(1,1). Axes: mu vs. grad2pt.]

2 pt MLE (support point, p): 0.78767, 0.921009; 3.30559, 0.078991

Simulations

Another n=30: 25 points N(0,1), 5 points N(1,1)

[Figure: Assignment Function, Another Sample n=30 - 5 n(1,1), 25 n(0,1). Axes: observation vs. assign.]

[Figure: LogOdds, Another Sample n=30 - 5 n(1,1), 25 n(0,1). Axes: observation vs. logodds.]

AA Data

• Francisco will discuss this and some other simulations in a moment.

Closing Comments

• Is there an analogue (or alternative) of these ideas for the SCAN (or for the SCAN framework)?
  – As an alternative, view the problem as having several (two) mechanisms creating observations
    • background
    • infectious material is present.
  – Just consider that the data are a pooling from all these sites. See if the data are a 2-component mixture. If so, try to “assign” the sites to these components. (You might use a thresholding of the assignment function to do this, or p in the LMP test statistic.)
  – Instead of the assignment function, consider the following based on the LMP test statistic. Define $L_i=(f_1(X_i)-f_0(X_i))/f_0(X_i)$. Let $L_{(1)}<L_{(2)}<\cdots<L_{(n)}$ and let j(i) denote the inverse rank, i.e., $L_{(i)}=L_{j(i)}$. For mixture or scanning purposes, consider the sets $C_i=\{j(n),\ldots,j(n-i+1)\}=\{k: L_{(n-i+1)}\le L_k\}$. For mixtures with mle p*, assign $C_i$ to f1 and $C_i^{c}$ to f0 where np*~i. For scanning purposes, look through the increasing sequence of sets $C_i$ for a spatial pattern to emerge.
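The ranking idea in the last bullet can be sketched directly. The example assumes f0 = N(0,1) and f1 = N(1,1), uses the n=10 simulated sample from the Simulations slides, and takes p* = 0.5 as a hypothetical value so that np* = 5; indices are 0-based and the variable names are illustrative.

```python
from math import exp

# n=10 simulated sample from the talk (5 from N(0,1), then 5 from N(1,1))
data = [-0.34964, -1.77582, -0.92900, 0.58061, -0.36032,
        2.51937, 0.59549, 1.16238, 0.76632, 1.57752]

def lmp_score(x):
    """L_i = {f1(x) - f0(x)} / f0(x) for f0 = N(0,1), f1 = N(1,1)."""
    return exp(x - 0.5) - 1.0

scores = [lmp_score(x) for x in data]

# Indices ordered from largest to smallest score: j(n), j(n-1), ..., j(1)
order = sorted(range(len(data)), key=scores.__getitem__, reverse=True)

# Increasing sequence of sets C_0, C_1, ..., C_n; C_i holds the i largest scores
nested_sets = [order[:i] for i in range(len(data) + 1)]

p_star = 0.5  # hypothetical mle of the mixing proportion
i_star = round(len(data) * p_star)
flagged = sorted(nested_sets[i_star])  # indices assigned to f1
```

For scanning, one would step through `nested_sets` in order and look at where the flagged indices fall in space or time rather than stopping at a single i.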


REFERENCES

Ferguson, T. S. (1967) Mathematical Statistics: A Decision Theoretical Approach. Academic Press, NY.

Grego, J., Hsi, Hsiu-Li, and Lynch, J. D. (1990). A strategy for analyzing mixed and pooled exponentials. Applied Stochastic Models and Data Analysis, 6, 59-70.

Lindsay, B.G. (1983a). The geometry of mixture likelihoods: a general theory. Ann. Statist., 11, 86-94.

Lindsay, B.G. (1983b). The geometry of mixture likelihoods, Part II: the exponential family. Ann. Statist., 11, 783-792.

Lindsay, B.G. (1995). Mixture Models: Theory, Geometry & Applications, NSF-CBMS lecture series, IMS/ASA

Maguire, B.A., Pearson, E.S., and Wynn, A.H.A. (1952) The time interval between industrial accidents. Biometrika, 39, 168-180.

Proschan, F. (1963). Theoretical explanation of decreasing failure rate. Technometrics, 5, 375-383.
