Sequential Multisensor Changepoint Detection with Sparsity

Yao Xie, David Siegmund

Statistics Seminar, Georgia Institute of Technology

October 3, 2012

Outline

• Background
• Mixture procedure
• Profile procedure
• Parallel mixture procedure
• Summary

Change-point detection using multiple sensors

• Detect malicious activity or traffic congestion in the internet
• Sensor networks monitoring traffic and the environment
• Epidemiology: monitoring the onset of infectious disease
• fMRI signal: activation of brain regions
• Anomalous patterns arising in social networks

Solar flare detection

• Video sequences: each pixel is a “sensor”
• Very high-dimensional: # sensors = 232 × 292 = 67744
• Goal: online detection of small and transient solar flares

[Figure: two solar images, at t = 100 and t = 226.]

Source: NASA

Multisensor changepoint detection

[Figure: time series $y_{1,t}, \ldots, y_{5,t}$ from five sensors, plotted for $t$ from 0 to 50 on a vertical scale of −10 to 10.]

Change-point occurs at time κ

Single-sensor change-point detection

“Change-point” detection in statistics and quality control

• Min-max formulation: Page (54), Lorden (71)
• Bayesian: Shiryayev (63), Roberts (66)
• A sequence of i.i.d. observations $y_1, y_2, \ldots$
• Unknown change-point $\kappa > 0$.

$$H_0: \; y_t \sim f_0, \quad t = 1, 2, \ldots$$
$$H_1: \; y_t \sim f_0, \quad t = 1, \ldots, \kappa; \qquad y_t \sim f_1, \quad t = \kappa + 1, \ldots$$

• Goal:
  • Low false-alarm rate: long average run length (ARL)
  • Small expected detection delay
• For a hypothesized $\kappa = k$:

$$\ell(t,k) = \frac{\prod_{m=1}^{k} f_0(y_m) \cdot \prod_{m=k+1}^{t} f_1(y_m)}{\prod_{m=1}^{t} f_0(y_m)} = \prod_{m=k+1}^{t} \frac{f_1(y_m)}{f_0(y_m)}$$

• Likelihood ratio change-point detection:

$$T = \inf\Big\{t \ge 1 : \max_{k<t} \log \ell(t,k) \ge b\Big\}$$

Normal distributions

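• $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\mu,1)$, $\mu > 0$
• Likelihood ratio: $\log \ell(t,k) = \sum_{m=k+1}^{t} \big(\mu y_m - \mu^2/2\big)$
• Stopping rule (CUSUM statistic):

$$T = \inf\Big\{t \ge 1 : \max_{k<t} \sum_{m=k+1}^{t} \Big(\mu y_m - \frac{\mu^2}{2}\Big) \ge b\Big\}$$

• When $\mu$ is unknown: $\hat{\mu}(k) = \big(\sum_{m=k+1}^{t} y_m\big)/(t-k)$
• Stopping rule becomes (GLR statistic):

$$T = \inf\Big\{t \ge 1 : \max_{k<t} \frac{\big(\sum_{m=k+1}^{t} y_m\big)^2}{t-k} \ge b\Big\}$$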
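The two stopping rules above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' code: the function names `cusum_path`, `glr_path` and `first_crossing` are hypothetical, the optional `window` argument is an added convenience, and the GLR statistic is computed exactly as written on the slide, $(S_t - S_k)^2/(t-k)$.

```python
def cusum_path(y, mu):
    """CUSUM statistic max_{k<t} sum_{m=k+1}^t (mu*y_m - mu^2/2), for t = 1..len(y)."""
    path, w = [], 0.0
    for obs in y:
        # Either extend the best earlier segment or restart with k = t - 1.
        w = max(w, 0.0) + mu * obs - mu * mu / 2.0
        path.append(w)
    return path


def glr_path(y, window=None):
    """GLR statistic max_{k<t} (S_t - S_k)^2 / (t - k), for t = 1..len(y)."""
    prefix = [0.0]
    for obs in y:
        prefix.append(prefix[-1] + obs)  # prefix[t] = y_1 + ... + y_t
    path = []
    for t in range(1, len(y) + 1):
        k_min = 0 if window is None else max(0, t - window)
        path.append(max((prefix[t] - prefix[k]) ** 2 / (t - k)
                        for k in range(k_min, t)))
    return path


def first_crossing(path, b):
    """Stopping time: first t (1-based) whose statistic is >= b, or None."""
    for t, value in enumerate(path, start=1):
        if value >= b:
            return t
    return None
```

The CUSUM recursion costs O(1) per observation because $\mu$ is known; the GLR statistic has no such recursion, which is one reason a finite window is often used.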

Example

µ = 0.5

Hard to see change in signal, visible in CUSUM and GLR:

[Figure: three panels over t = 0 to 500: the observations $y_t$, the CUSUM statistic, and the GLR statistic.]

Multisensor problem formulation

• $H_0$: sensors observe i.i.d. white noise

$$y_{n,t} \sim \mathcal{N}(0,1), \quad n = 1, \ldots, N, \quad t = 1, 2, \ldots$$

• $H_1$: there exists a changepoint that affects a subset of sensors

$$n \in S: \quad y_{n,t} \sim \mathcal{N}(0,1), \; t = 1, \ldots, \kappa; \qquad y_{n,t} \sim \mathcal{N}(\mu_n,1), \; t = \kappa + 1, \ldots$$
$$n \in S^c: \quad y_{n,t} \sim \mathcal{N}(0,1), \; t = 1, 2, \ldots$$

• $\kappa$: unknown changepoint time
• $\mu_n > 0$: unknown changepoint amplitudes
• $S$: unknown subset of affected sensors
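For experimenting with this model, data under $H_0$ or $H_1$ can be simulated directly. The sketch below is illustrative only; `simulate_sensors` and its `amplitudes` argument (a mapping from affected sensor index to $\mu_n$, i.e. the set $S$ with its amplitudes) are hypothetical names.

```python
import random


def simulate_sensors(n_sensors, n_times, kappa, amplitudes, seed=0):
    """Return y[n][t-1] for t = 1..n_times under the multisensor model.

    amplitudes maps each affected sensor n (the set S) to mu_n > 0; those
    sensors shift from N(0,1) to N(mu_n,1) for t > kappa. All other sensors
    stay N(0,1). An empty mapping gives the null hypothesis H0.
    """
    rng = random.Random(seed)
    data = []
    for n in range(n_sensors):
        row = []
        for t in range(1, n_times + 1):
            shift = amplitudes.get(n, 0.0) if t > kappa else 0.0
            row.append(rng.gauss(shift, 1.0))
        data.append(row)
    return data
```

For example, `simulate_sensors(100, 200, 100, {n: 1.0 for n in range(5)})` gives N = 100 sensors of which a fraction p = 0.05 change at time 100.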

Related multisensor changepoint detection methods

• Mei, 2010

$$T_{\text{Mei}} = \inf\Big\{t \ge 1 : \sum_{n=1}^{N} \max_{k<t} \sum_{m=k+1}^{t} \Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big) \ge b\Big\}$$

• Tartakovsky, Veeravalli, 2008

$$T_{\text{TV}} = \inf\Big\{t \ge 1 : \max_{k<t} \sum_{n=1}^{N} \sum_{m=k+1}^{t} \Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big) \ge b\Big\}$$

• $w$: window length; only the past $w$ samples are examined

Y. Mei (2010). Efficient scalable schemes for monitoring a large number of data streams. Biometrika.

A. Tartakovsky and V. Veeravalli (2008). Asymptotic optimal quickest change detection in distributed sensor network systems. Sequential Analysis.

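Generalized likelihood ratio for multiple sensors

• Form the local log-generalized likelihood ratio statistic

$$U_{n,k,t} = \max_{\mu_n} \sum_{m=k+1}^{t} \Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big)$$

• $S$ is unknown, so all $2^N$ possible subsets have to be searched

$$T = \inf\Big\{t \ge 1 : \max_{k<t} \max_{S \in \Omega} \sum_{n \in S} U_{n,k,t} \ge b\Big\}$$

• Size $|\Omega| = 2^N$: exponentially complex in $N$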
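The maximization over $\mu_n$ in the local statistic has a closed form, which is what makes it cheap to compute per sensor. A short derivation, writing $A_{n,k,t}$ (a symbol introduced here only for illustration) for the partial sum of sensor $n$:

```latex
\begin{aligned}
A_{n,k,t} &= \sum_{m=k+1}^{t} y_{n,m}, \\
\sum_{m=k+1}^{t}\Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big)
  &= \mu_n A_{n,k,t} - \frac{(t-k)\,\mu_n^2}{2}, \\
\hat{\mu}_n &= \frac{A_{n,k,t}}{t-k}
  \quad\text{(setting the derivative in } \mu_n \text{ to zero)}, \\
U_{n,k,t} &= \frac{A_{n,k,t}^2}{2\,(t-k)} .
\end{aligned}
```

If the maximization is restricted to $\mu_n \ge 0$, in line with the assumption $\mu_n > 0$, the same calculation gives $(A_{n,k,t}^{+})^2 / \big(2(t-k)\big)$, where $x^{+} = \max(x, 0)$.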

Two extremes

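• Sum procedure: sum over all sensors

$$T_{\text{sum}} = \inf\Big\{t \ge 1 : \max_{t-w \le k < t} \sum_{n=1}^{N} U_{n,k,t} \ge b\Big\}$$

• Max procedure: only use the statistic from one sensor

$$T_{\text{max}} = \inf\Big\{t \ge 1 : \max_{t-w \le k < t} \max_{1 \le n \le N} U_{n,k,t} \ge b\Big\}$$

• $w$: window length; only the past $w$ samples are examined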
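A minimal sketch of the two window-limited statistics at a single time $t$. It assumes the local statistic takes the closed form $U_{n,k,t} = (S_{n,t} - S_{n,k})^2 / (2(t-k))$ obtained by maximizing over an unrestricted $\mu_n$; the function names `local_glr` and `sum_and_max_stats` are hypothetical.

```python
def local_glr(prefix_row, k, t):
    """U_{n,k,t} = max over mu_n of sum_{m=k+1}^t (mu_n*y_{n,m} - mu_n^2/2)."""
    a = prefix_row[t] - prefix_row[k]      # partial sum y_{n,k+1} + ... + y_{n,t}
    return a * a / (2.0 * (t - k))         # assumption: mu_n unrestricted in sign


def sum_and_max_stats(y, t, w):
    """Statistics of T_sum and T_max at time t (1-based) for data y[n][m-1]."""
    prefixes = []
    for row in y:
        p = [0.0]
        for obs in row[:t]:
            p.append(p[-1] + obs)
        prefixes.append(p)
    best_sum = best_max = float("-inf")
    for k in range(max(0, t - w), t):      # window: t - w <= k < t
        u = [local_glr(p, k, t) for p in prefixes]
        best_sum = max(best_sum, sum(u))   # sum over all sensors
        best_max = max(best_max, max(u))   # single best sensor
    return best_sum, best_max
```

Each procedure stops at the first $t$ for which its statistic reaches the threshold $b$; the two thresholds generally differ for the same ARL.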

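Performance metrics

[Figure: sequential detection procedure. Sample path of a detection statistic against t, with N = 100, p = 0.05 and threshold b; annotated with $\mathbb{E}^{\infty}(T)$ and $\mathbb{E}^{k}(T-k)$.]

• Average run length: $\mathbb{E}^{\infty}\{T\}$
• Expected detection delay: $\mathbb{E}^{k}\{T - k \mid T > k\}$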

[Figure: test statistic against t (0 to 200) for the Mix, Max and Sum procedures, N = 100, p = 0.05.]

• Sum procedure: non-selective; includes too much noise from unaffected sensors.
• Max procedure: ignores information from all but one of the affected sensors.
• What is missing?

Sparsity

[Figure: solar image at t = 226.]

• The number of sensors affected by the changepoint can be small

Mixture Procedure

Mixture procedure

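• Typically a fraction $p$ of sensors is affected by the changepoint
• Fraction of affected sensors ($\mathcal{N}$ denotes the set of affected sensors):

$$p = |\mathcal{N}|/N \ll 1$$

• We assume each sensor is affected with probability $p_0$
• $p_0$ is a guess for $p$
• Generalized likelihood ratio statistic on each sensor

$$\log\big(1 - p_0 + p_0 e^{U_{n,k,t}^2/2}\big)$$

• Form a mixture procedure

$$T_{\text{mix}} = \inf\Big\{t \ge 1 : \max_{t-w \le k < t} \sum_{n=1}^{N} \log\big(1 - p_0 + p_0 e^{U_{n,k,t}^2/2}\big) \ge b\Big\}$$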
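A minimal sketch of the mixture statistic at one time $t$. Two readings are assumed here and are not spelled out on the slide: $U_{n,k,t}$ inside the exponent is taken to be the standardized partial sum $(S_{n,t} - S_{n,k})/\sqrt{t-k}$, so that $U^2/2$ is the local log-GLR, and only its positive part is used because $\mu_n > 0$. The names `g_mix` and `mixture_stat` are hypothetical.

```python
import math


def g_mix(u, p0):
    """g(u, p0) = log(1 - p0 + p0 * exp(u^2 / 2)); fine for moderate u."""
    return math.log(1.0 - p0 + p0 * math.exp(u * u / 2.0))


def mixture_stat(y, t, w, p0):
    """max over t-w <= k < t of sum_n g(U_{n,k,t}, p0), for data y[n][m-1]."""
    best = float("-inf")
    for k in range(max(0, t - w), t):
        total = 0.0
        for row in y:
            # Assumption: U is the standardized sum of y_{n,k+1}, ..., y_{n,t}.
            u = sum(row[k:t]) / math.sqrt(t - k)
            # Assumption: positive part, since amplitudes are positive.
            total += g_mix(max(u, 0.0), p0)
        best = max(best, total)
    return best
```

With $p_0 = 1$ the transformation reduces to $U^2/2$ and the statistic becomes a sum over all sensors; small $p_0$ suppresses sensors whose local statistic is small.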

Soft-thresholding

[Figure: $\log(1 - p_0 + p_0 e^{x})$ plotted against $x$ from 0 to 10 for $p_0 = 0.01$, $p_0 = 0.1$ and $p_0 = 1$.]

Choice of threshold b

Choice of $b$ involves a tradeoff between:

ARL
• Usually we want this to be a large number, ∼ 5000 or 10000
• Computationally expensive to simulate
• An accurate theoretical approximation is highly valued
• Theoretical analysis is hard in general

Expected detection delay
• A relatively small number, ∼ 10
• Fairly easy to simulate

Exponential approximation

[Figure: tail probability against m from 2000 to 10000, Monte Carlo versus theory.]

Figure: Tail probability $P\{T_{\text{mix}} > m\}$, $p_0 = 0.1$. Numerical values obtained from 500 Monte Carlo trials.

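Approximation of ARL

Theorem. As $b \to \infty$ and $N \to \infty$ with $b/N$ fixed, for $\theta$ satisfying $\dot{\psi}(\theta) = b/N$,

$$\mathbb{E}^{\infty}\{T\} \approx \frac{f(N,\theta,p_0)}{\int_{[2N\gamma(\theta)/w]^{1/2}}^{[2N\gamma(\theta)]^{1/2}} y\,\nu^2(y)\,dy}$$

where (dots denote derivatives)

$$g(x,p_0) = \log\big(1 - p_0 + p_0 e^{x^2/2}\big)$$
$$\psi(\theta) = \log \mathbb{E}\exp[\theta g(U,p_0)]$$
$$\gamma(\theta) = \tfrac{1}{2}\theta^2\, \mathbb{E}\big\{[\dot{g}(U,p_0)]^2 \exp[\theta g(U,p_0) - \psi(\theta)]\big\}$$
$$f(N,\theta,p_0) = \frac{\theta\,[2\pi\ddot{\psi}(\theta)]^{1/2}}{\gamma(\theta)\,N^{1/2}} \exp\big\{N[\theta\dot{\psi}(\theta) - \psi(\theta)]\big\}$$
$$\nu(x) = 2x^{-2}\exp\Big\{-2\sum_{n=1}^{\infty} n^{-1}\,\Phi\big(-|x|\,n^{1/2}/2\big)\Big\}$$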
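The function $\nu$ is the only ingredient of the approximation that is defined through an infinite series, so a small numerical sketch may help. The truncation rule (`tail_cut`) and the function names are assumptions made for illustration; the series is cut off once the normal tail $\Phi(-|x| n^{1/2}/2)$ is negligible.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def nu(x, tail_cut=10.0):
    """nu(x) = 2 x^-2 exp(-2 * sum_{n>=1} Phi(-|x| sqrt(n) / 2) / n)."""
    x = abs(x)
    if x == 0.0:
        return 1.0  # limiting value as x -> 0
    # Terms with |x| sqrt(n) / 2 > tail_cut are below Phi(-10) and are dropped.
    n_max = int((2.0 * tail_cut / x) ** 2) + 1
    series = sum(std_normal_cdf(-x * math.sqrt(n) / 2.0) / n
                 for n in range(1, n_max + 1))
    return 2.0 / (x * x) * math.exp(-2.0 * series)
```

Two sanity checks on the behaviour: for large $x$ the series is negligible and $\nu(x) \approx 2/x^2$, while for small $x$ the well-known approximation $\nu(x) \approx \exp(-0.583\,x)$ applies. Small $x$ needs on the order of $(20/x)^2$ terms, so the series is slow there.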

Monte Carlo vs. Theoretical Approximation

Table: Average run length (ARL) of $T_{\text{mix}}$, $w = 200$.

p0      b       Theory    Monte Carlo
0.3     31.2    5001      5504
0.3     32.3    10002     10221
0.1     19.5    5000      4968
0.1     20.4    10001     10093
0.03    12.7    5001      4830
0.03    13.5    10001     9948

Approximation of expected detection delay

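Theorem. As $b \to \infty$,

$$\mathbb{E}^{0}\{T\} = 2\Delta^{-2}\Big[b + \rho(\Delta) - |\mathcal{N}|\log p_0 - |\mathcal{N}|/2 + \mathbb{E}\min_{t \ge 0} S_t - (N - |\mathcal{N}|)\,\mathbb{E}\,g(U,p_0) + o(1)\Big].$$

• Total energy: $\Delta = \big[\sum_{n \in \mathcal{N}} \mu_n^2\big]^{1/2}$
• $S_t \triangleq \sum_{i=1}^{t} z_i$, $z_i \sim \mathcal{N}(\Delta^2/2, \Delta^2)$
• $\rho(\Delta) = \Delta^2/4 + 1 - \sum_{i=1}^{\infty} i^{-1}\,\mathbb{E}\,S_i^{-}$
• $\mathbb{E}\min_{t \ge 0} S_t = \rho(\Delta) - 1 - \Delta^2/4$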

Monte Carlo vs. Theoretical Approximation

Table: Expected detection delay of $T_{\text{mix}}$, ARL ≈ 5000, $\mu = 1$, and $w = 200$.

p       p0      b       Theory    Monte Carlo
0.3     0.3     31.2    3.5       3.2
0.1     0.3     31.2    6.2       6.5
0.3     0.1     19.5    5.2       3.6
0.1     0.1     19.5    7.2       6.7
0.03    0.1     19.5    13.9      14.3
0.03    0.03    12.7    13.9      14.2
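The delay approximation can be evaluated numerically with the $o(1)$ term dropped. The sketch below makes two assumptions that are not written on the slides: $U \sim \mathcal{N}(0,1)$, and $g$ is applied to the positive part of $U$, i.e. $g(u) = \log(1 - p_0 + p_0 e^{(u^+)^2/2})$. It also uses the standard identity $\mathbb{E}\,X^{-} = \sigma\phi(m/\sigma) - m\,\Phi(-m/\sigma)$ for $X \sim \mathcal{N}(m, \sigma^2)$. All function names are hypothetical. Under these assumptions the values come out close to the Theory column above.

```python
import math


def phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_g(p0, upper=12.0, steps=6000):
    """E g(U, p0) for U ~ N(0,1), g one-sided (assumption), trapezoid rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        g = math.log(1.0 - p0 + p0 * math.exp(u * u / 2.0))
        total += (0.5 if i in (0, steps) else 1.0) * g * phi(u)
    return total * h  # u < 0 contributes nothing because g(0) = 0


def sum_neg_parts(delta, terms=200):
    """sum_{i>=1} E[S_i^-] / i with S_i ~ N(i*delta^2/2, i*delta^2), truncated."""
    total = 0.0
    for i in range(1, terms + 1):
        mean, sd = i * delta ** 2 / 2.0, delta * math.sqrt(i)
        total += (sd * phi(mean / sd) - mean * Phi(-mean / sd)) / i
    return total


def delay_approx(b, p0, n_sensors, n_affected, mu):
    """Expected detection delay from the theorem, o(1) term dropped."""
    delta2 = n_affected * mu * mu            # Delta^2 when all mu_n equal mu
    s = sum_neg_parts(math.sqrt(delta2))
    rho = delta2 / 4.0 + 1.0 - s
    e_min = rho - 1.0 - delta2 / 4.0         # E min_t S_t, equal to -s
    bracket = (b + rho - n_affected * math.log(p0) - n_affected / 2.0
               + e_min - (n_sensors - n_affected) * expected_g(p0))
    return 2.0 / delta2 * bracket
```

For instance, `delay_approx(19.5, 0.1, 100, 10, 1.0)` corresponds to the row p = 0.1, p0 = 0.1 with N = 100 and evaluates to roughly 7.2.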

Comparison of detection delay

Table: Expected detection delays, N = 100, 500 Monte Carlo trials, ARL 5000.

p       Method             DD, µ = 1      DD, µ = 0.7
0.05    max                15.5           28.4
        Tmix (p0 = .1)     10.4 (10.1)    18.9 (19.9)
        Tsum               14.8           28.1
        TTV                15.7           27.0
        Mei                15.7           26.9

Comparison of expected detection delay, ARL = 5000

[Figure: expected detection delay (0 to 250) against p (0 to 0.3). Legend: Max; GLR; Mixture, p0 = 0.1; Mei; Modified TV.]

Hard-thresholding

• Avoids numerical stability issues in $\log\big(1 - p_0 + p_0 e^{x^2/2}\big)$
• We have an analytic approximation to the ARL of the hard-thresholding mixture procedure.

[Figure: $g(x) = \log(1 - p_0 + p_0 e^{x})$ and $h(x) = (\log p_0 + x)^{+}$ plotted against $x$ from 0 to 8.]

$$T_{\text{mix},h} = \inf\Big\{t : \max_{0 \le k < t} \sum_{n=1}^{N} \big[U_{n,k,t} + \log(p_0)\big]^{+} \ge b\Big\}$$
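The stability issue arises because $e^{x}$ overflows in floating point for large $x$. A minimal sketch comparing a log-sum-exp evaluation of the soft threshold $g$ with the hard threshold $h$; the function names are hypothetical, and the log-sum-exp rewrite is a standard numerical device rather than something stated in the talk.

```python
import math


def g_soft(x, p0):
    """Soft threshold g(x) = log(1 - p0 + p0 * e^x), safe for large x."""
    if p0 >= 1.0:
        return x
    a = math.log1p(-p0)          # log(1 - p0)
    b = math.log(p0) + x         # log(p0 * e^x)
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))   # log(e^a + e^b)


def h_hard(x, p0):
    """Hard threshold h(x) = (x + log p0)^+."""
    return max(x + math.log(p0), 0.0)
```

For $x \ge 0$ one has $g(x) \ge h(x)$, and the two agree closely once $x$ is well above $-\log p_0$, which is why the hard threshold is a reasonable surrogate.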

Solar flare detection

• Pre-processing: subtract the time-varying background using a tracking method

Yao Xie, Jiaji Huang, Rebecca Willett, Changepoint detection for high-dimensional time series with missing data, submitted, 2012.

[Figure: three detection statistics plotted against t from 0 to 250: Mei; $T_{\text{mix},h}$ with $p_0 = 1$; $T_{\text{mix},h}$ with $p_0 = 0.0001$.]

Profile Procedure

When magnitude has spatial profile

[Figure: “Sensor location and change-point profile”. Contour plot over x and y from 0 to 10, with contour levels from 0.05 to 0.3.]

Profile-based method

• The change-point amplitude at the $n$th sensor is determined by

$$\mu_n = \sum_{m=1}^{M} r_m\,\alpha_{z_m}(x_n),$$

• $M$ is the number of sources
• $z_m \in D$ is the (unknown) location of the $m$th source
• $x_n$ is the location of the $n$th sensor
• the scalar $r_m$ is the unknown strength of the $m$th signal
• profile function

$$\alpha_z(x) = \frac{1}{\sqrt{2\pi\beta}}\, e^{-\frac{1}{4\beta}\|x - z\|^2}, \quad x \in \mathbb{R}^2, \; \beta > 0$$

Siegmund, D. O. and Yakir, B. (2008). Detecting the emergence of a signal in a noisy image. Statistics and Its Interface.

• assuming one source
• maximizing over the possible source location $z$
• profile-based procedure

$$T_{\text{profile}} = \inf\Big\{t : \max_{t-w \le k < t} \max_{z \in D} \Big[\underbrace{\sum_{n=1}^{N} \alpha_z(x_n)\,U_{n,k,t}}_{\text{“matched filter”}}\Big]^2 \ge b\Big\}.$$
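A minimal sketch of the matched-filter search for one fixed pair $(k, t)$: given the local statistics $U_{n,k,t}$ at the sensor locations, it scans a finite set of candidate source locations. The normalization $(2\pi\beta)^{-1/2}$ of the Gaussian profile is one reading of the formula and does not affect which location wins; `alpha`, `profile_stat` and the candidate grid are hypothetical names and choices.

```python
import math


def alpha(z, x, beta):
    """Gaussian profile alpha_z(x) ~ exp(-||x - z||^2 / (4*beta)) in the plane."""
    d2 = (x[0] - z[0]) ** 2 + (x[1] - z[1]) ** 2
    return math.exp(-d2 / (4.0 * beta)) / math.sqrt(2.0 * math.pi * beta)


def profile_stat(u, sensors, candidates, beta):
    """max over z in candidates of [sum_n alpha_z(x_n) * U_n]^2, with its argmax."""
    best, best_z = float("-inf"), None
    for z in candidates:
        matched = sum(alpha(z, x, beta) * un for x, un in zip(sensors, u))
        if matched ** 2 > best:
            best, best_z = matched ** 2, z
    return best, best_z
```

The full procedure repeats this search for every $k$ in the window and stops when the maximum reaches $b$. As a noise-free check, if the $U_n$ themselves follow the profile of a source at the grid centre, the search returns that centre.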

Table: Comparison of Expected Detection Delays (DD)

Method     b       DD, r = 1    DD, r = 1.5
Profile    26.3    25.6         12.3
Mix        39.7    78.3         35.8

Parallel Mixture Procedure

Sensitivity to p

• Mix procedure: good enough?
• We do not know the true $p$
• When $p_0 \ne p$ and $N$ is large: $p_0 N$ is very different from $pN$

[Figure: expected detection delay (0 to 35) against p (0 to 0.35) for $p_0 = p$ and for $p_0 = 0.1$, N = 100.]

Parallel Procedure

• How to be more robust to uncertainty in $p$?
• Use two mixture procedures

$$T_{\text{parallel}} \triangleq \min\{T_{\text{mix}}(p_1,b_1),\, T_{\text{mix}}(p_2,b_2)\}.$$

• Choose $p_1$ and $p_2$ such that $p \in [p_1, p_2]$

Parallel mixture procedure for multi-sensor change-point detection, Yao Xie, David Siegmund, 2012 Joint Statistical Meetings, Jan. 2012.

ARL of parallel procedure

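• No closed-form ARL for the parallel procedure
• Choose $b_1$ and $b_2$ so that the ARLs of the two procedures are equal
• Use a very conservative lower bound
  • $P^{\infty}\{T_{\text{mix}}(p_i,b_i) \le 1000\} \approx 0.05$, $i = 1, 2$
  • By the Bonferroni inequality:

$$P^{\infty}\{\min[T_{\text{mix}}(p_1,b_1),\, T_{\text{mix}}(p_2,b_2)] \le 1000\} \le 0.1,$$

  so $\mathbb{E}^{\infty}\{T_{\text{parallel}}\} \ge 10000$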
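The last step can be made explicit under the exponential approximation used earlier in the talk for the run length. If $T$ is roughly exponential with mean ARL, then $P\{T \le m\} = 1 - e^{-m/\text{ARL}}$, so a bound on the false-alarm probability by time $m$ converts into a lower bound on the ARL. A small sketch; treating the parallel run length as exponential is an assumption, and `arl_lower_bound` is a hypothetical name.

```python
import math


def arl_lower_bound(prob_alarm_by_m, m):
    """If P(T <= m) <= prob and T is roughly exponential, E[T] >= -m / log(1 - prob)."""
    return -m / math.log(1.0 - prob_alarm_by_m)


single = 0.05                       # P(T_mix(p_i, b_i) <= 1000) for each procedure
union = min(1.0, 2 * single)        # Bonferroni bound for the minimum of the two
parallel_bound = arl_lower_bound(union, 1000)
```

Here `parallel_bound` is about 9,500, the same order as the figure of 10000 quoted above, which corresponds to the cruder estimate $m / 0.1$.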

Expected detection delay: parallel procedure

Table: Comparison of Parallel and Single Procedures

p        µn = µ    Tmc     Tmix (0.1)    Tparallel
0.1      0.7       5.7     6.5           6.4
0.005    2.0       48.5    27.1          22.9
0.005    1.0       94.7    54.5          45.8
0.25     7.5       8.4     12.0          10.5

Parallel sum procedure

• drawback of the parallel procedure: no closed-form ARL
• parallel sum procedure: linearly combine the statistics of $T_{\text{mix}}(p_i, b_i)$:

$$T_{\text{parallel},2} = \inf\{t : c_1 S(p_1) + c_2 S(p_2) > b\}$$

• can compute an ARL approximation
• slower than the parallel procedure in some cases

Ongoing: sequential changepoint detection with geometry

Courtesy: R. Calderbank

• Suppose the sparsity pattern has known correlation
• Example: network anomaly detection in the internet
• We can build a sparse transformation (e.g. expander graph) to boost signal strength

$$z = Ax$$

Joint work with Meng Wang and Robert Calderbank

Summary

• How to exploit sparsity structure in multisensor changepoint detection?
  • mixture procedure
  • profile-based procedure
  • parallel mixture procedure
• How to characterize their performance?
• How to exploit correlation between sensors?

Thank you!

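Generalized likelihood ratio

• $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\mu,1)$, $\mu$ is unknown
• Log-generalized likelihood ratio

$$\max_{\mu}\Big\{\mu(S_t - S_k) - \tfrac{1}{2}(t-k)\mu^2\Big\}$$

• Maximum-likelihood estimate $\hat{\mu} = (S_t - S_k)/(t-k)$
• Stopping rule becomes

$$T = \inf\Big\{t \ge 1 : \max_{k<t} \frac{(S_t - S_k)^2}{t-k} \ge b\Big\}$$

• Very precise approximation for ARL [Venkatraman, Siegmund, 1995]

$$\mathbb{E}(T) \sim \frac{(2\pi)^{1/2}\exp(b^2/2)}{b\int_0^{\infty} x\,\nu^2(x)\,dx}$$

$$\nu(x) = 2x^{-2}\exp\Big\{-2\sum_{n=1}^{\infty} n^{-1}\,\Phi\big(-x\,n^{1/2}/2\big)\Big\}$$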
