
Sequential Multisensor Changepoint Detection with Sparsity

Yao Xie, David Siegmund

Statistics Seminar, Georgia Institute of Technology

October 3, 2012

Outline

• Background
• Mixture procedure
• Profile procedure
• Parallel mixture procedure
• Summary

Change-point detection using multiple sensors

• Detect malicious activity or traffic congestion in the internet

• Sensor networks monitoring traffic, environment

• Epidemiology: monitoring the onset of infectious disease

• fMRI signal: activation of brain regions

• Anomalous patterns arising in social networks

Solar flare detection

• Video sequences: each pixel is a "sensor"
• Very high-dimensional: # sensors = 232 × 292 = 67744
• Goal: online detection of small and transient solar flares

[Figure: solar image frames at t = 100 and t = 226. Source: NASA]

Multisensor changepoint detection

[Figure: sensor observations over time 0 to 50; the change-point occurs at time κ.]

Single-sensor change-point detection

"Change-point" detection in statistics and quality control

• Min-max formulation: Page (54), Lorden (71)
• Bayesian: Shiryayev (63), Roberts (66)
• A sequence of i.i.d. observations $y_1, y_2, \ldots$
• Unknown change-point $\kappa > 0$

$$
H_0: \; y_t \sim f_0, \quad t = 1, 2, \ldots
$$

$$
H_1: \; y_t \sim f_0, \quad t = 1, \ldots, \kappa; \qquad y_t \sim f_1, \quad t = \kappa + 1, \ldots
$$

• Goal:
  • Low false-alarm rate: long average run length (ARL)
  • Small expected detection delay

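• For a hypothesized $\kappa = k$:

$$
\ell(t,k) = \frac{\prod_{m=1}^{k} f_0(y_m) \cdot \prod_{m=k+1}^{t} f_1(y_m)}{\prod_{m=1}^{t} f_0(y_m)} = \prod_{m=k+1}^{t} \frac{f_1(y_m)}{f_0(y_m)}
$$

• Likelihood ratio change-point detection:

$$
T = \inf\Big\{ t \ge 1 : \max_{k<t} \log \ell(t,k) \ge b \Big\}
$$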

Normal distributions

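• $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\mu,1)$, $\mu > 0$

• Likelihood ratio: $\log \ell(t,k) = \sum_{m=k+1}^{t} \left(\mu y_m - \mu^2/2\right)$

• Stopping rule (CUSUM statistic):

$$
\max_{k<t} \sum_{m=k+1}^{t} \left(\mu y_m - \frac{\mu^2}{2}\right) \ge b
$$

• When $\mu$ is unknown: $\hat{\mu}(k) = \left(\sum_{m=k+1}^{t} y_m\right) / (t - k)$

• Stopping rule becomes (GLR statistic):

$$
\max_{k<t} \frac{\left(\sum_{m=k+1}^{t} y_m\right)^2}{t - k} \ge b
$$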
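The two stopping rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the speakers' code: the function names, the window length, and the thresholds in the demo are assumptions chosen for readability. The CUSUM statistic uses the standard recursion for a running maximum of partial sums, and the GLR statistic is written in the slide's form, $(\sum y_m)^2/(t-k)$, restricted to a finite window to keep the cost bounded.

```python
def cusum_path(y, mu):
    """CUSUM statistic max_{k<t} sum_{m=k+1}^t (mu*y_m - mu^2/2) for t = 1..len(y)."""
    path, w = [], 0.0
    for obs in y:
        # Either extend the best earlier segment or restart at this sample.
        w = max(w, 0.0) + mu * obs - mu * mu / 2.0
        path.append(w)
    return path


def glr_path(y, window):
    """GLR statistic max_{t-window<=k<t} (sum_{m=k+1}^t y_m)^2 / (t-k) for t = 1..len(y)."""
    prefix = [0.0]
    for obs in y:
        prefix.append(prefix[-1] + obs)
    path = []
    for t in range(1, len(y) + 1):
        path.append(max((prefix[t] - prefix[k]) ** 2 / (t - k)
                        for k in range(max(0, t - window), t)))
    return path


def first_crossing(path, b):
    """Stopping time: first t (1-indexed) with statistic >= b, or None if never."""
    for t, value in enumerate(path, start=1):
        if value >= b:
            return t
    return None


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    kappa, mu = 300, 0.5  # hypothetical change-point and shift size
    y = [rng.gauss(0.0, 1.0) + (mu if t > kappa else 0.0) for t in range(1, 501)]
    # Thresholds below are illustrative only, not calibrated to any ARL.
    print("CUSUM alarm:", first_crossing(cusum_path(y, mu), b=6.0))
    print("GLR alarm:  ", first_crossing(glr_path(y, window=200), b=20.0))
```

Note that substituting $\hat{\mu}(k)$ into the log-likelihood ratio gives $(\sum y_m)^2 / (2(t-k))$, so the slide's form differs from the plug-in value only by a factor of 2 that can be absorbed into $b$.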

Example

$\mu = 0.5$. The change is hard to see in the signal, but visible in the CUSUM and GLR statistics.

[Figure: three panels over t = 0 to 500 showing the signal and the CUSUM and GLR statistics.]

Multisensor problem formulation

• $H_0$: sensors observe i.i.d. white noise

$$
y_{n,t} \sim \mathcal{N}(0,1), \quad n = 1, \ldots, N, \quad t = 1, 2, \ldots
$$

• $H_1$: there exists a changepoint that affects a subset of sensors

$$
n \in S: \quad y_{n,t} \sim \mathcal{N}(0,1), \; t = 1, \ldots, \kappa; \qquad y_{n,t} \sim \mathcal{N}(\mu_n,1), \; t = \kappa + 1, \ldots
$$

$$
n \in S^c: \quad y_{n,t} \sim \mathcal{N}(0,1), \; t = 1, 2, \ldots
$$

• $\kappa$: unknown changepoint time
• $\mu_n > 0$: unknown changepoint amplitudes
• $S$: unknown subset of affected sensors

Related multisensor changepoint detection methods

• Mei, 2010

$$
T_{\rm Mei} = \inf\Big\{ t \ge 1 : \sum_{n=1}^{N} \max_{k<t} \sum_{m=k+1}^{t} \Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big) \ge b \Big\}
$$

• Tartakovsky, Veeravalli, 2008

$$
T_{\rm TV} = \inf\Big\{ t \ge 1 : \max_{k<t} \sum_{n=1}^{N} \sum_{m=k+1}^{t} \Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big) \ge b \Big\}
$$

• $w$: window length; only the past $w$ samples are examined

Y. Mei, 2010, Efficient scalable schemes for monitoring a large number of data streams, Biometrika.

A. Tartakovsky and V. Veeravalli, 2008, Asymptotically optimal quickest change detection in distributed sensor systems, Sequential Analysis.

General likelihood ratio for multisensor

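• Form the local log-generalized likelihood ratio statistic

$$
U_{n,k,t} = \max_{\mu_n} \sum_{m=k+1}^{t} \Big(\mu_n y_{n,m} - \frac{\mu_n^2}{2}\Big)
$$

• Unknown $S$: one has to search all $2^N$ possible subsets

$$
T = \inf\Big\{ t \ge 1 : \max_{k<t} \max_{S \in \Omega} \sum_{n \in S} U_{n,k,t} \ge b \Big\}
$$

• Size $|\Omega| = 2^N$: exponentially complex in $N$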

Two extremes

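• Sum procedure: sum over all sensors

$$
T_{\rm sum} = \inf\Big\{ t \ge 1 : \max_{t-w \le k<t} \sum_{n=1}^{N} U_{n,k,t} \ge b \Big\}
$$

• Max procedure: only use the statistic from one sensor

$$
T_{\rm max} = \inf\Big\{ t \ge 1 : \max_{t-w \le k<t} \max_{1 \le n \le N} U_{n,k,t} \ge b \Big\}
$$

• $w$: window length; only the past $w$ samples are examined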
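As a rough sketch of how the two extremes differ computationally, the snippet below evaluates both windowed statistics at a single time $t$. It assumes the local log-GLR is maximized over $\mu_n > 0$ (the sign constraint from the problem formulation), which gives $U_{n,k,t} = [(S_{n,t} - S_{n,k})^+]^2 / (2(t-k))$ with $S_{n,t}$ the partial sum of sensor $n$. The function names and the list-of-lists data layout are illustrative choices, not part of the talk.

```python
def prefix_sums(series):
    """Partial sums S_0 = 0, S_1, ..., S_len(series)."""
    out = [0.0]
    for value in series:
        out.append(out[-1] + value)
    return out


def local_glr(prefix, k, t):
    """Local log-GLR U_{n,k,t} under the constraint mu_n > 0 (assumed here)."""
    d = max(prefix[t] - prefix[k], 0.0)
    return d * d / (2.0 * (t - k))


def sum_and_max_statistics(data, t, window):
    """Return (sum statistic, max statistic) at time t.

    data[n] holds the observations of sensor n; only the first t are used.
    """
    prefixes = [prefix_sums(series[:t]) for series in data]
    best_sum = best_max = 0.0
    for k in range(max(0, t - window), t):
        u = [local_glr(p, k, t) for p in prefixes]
        best_sum = max(best_sum, sum(u))   # combines every sensor
        best_max = max(best_max, max(u))   # keeps only the strongest sensor
    return best_sum, best_max


if __name__ == "__main__":
    # Three sensors, two of which shift upward after time 2.
    data = [[0, 0, 2, 2], [0, 0, 0, 0], [0, 0, 2, 2]]
    print(sum_and_max_statistics(data, t=4, window=4))
```

Each procedure raises an alarm the first time its statistic reaches the threshold $b$; the loop over $k$ costs on the order of $wN$ operations per time step.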

Performance metrics

[Figure: sequential detection procedure, marking the detection delay $E_k(T - k)$ and the run length $E_\infty(T)$; N = 100, p = 0.05.]

• Average run length: $E_\infty\{T\}$
• Expected detection delay: $E_k\{T - k \mid T > k\}$

[Figure: plot over 0 to 200; N = 100, p = 0.05.]

• Sum procedure: non-selective, includes too much noise from unaffected sensors.

• Max procedure: ignores information from all but one of the affected sensors.

• What is missing?

Sparsity

[Figure: solar image frame at t = 226.]

• The number of sensors affected by the changepoint can be small

Mixture Procedure

Mixture procedure

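• Typically a fraction $p$ of sensors is affected by the changepoint
• Fraction of affected sensors (with $\mathcal{N}$ the set of affected sensors):

$$
p = |\mathcal{N}| / N \ll 1
$$

• We assume each sensor is affected with probability $p_0$

• $p_0$ is a guess for $p$
• Generalized likelihood ratio statistic on each sensor:

$$
\log\left(1 - p_0 + p_0 e^{U_{n,k,t}^2/2}\right)
$$

• Form a mixture procedure:

$$
T_{\rm mix} = \inf\Big\{ t \ge 1 : \max_{t-w \le k<t} \sum_{n=1}^{N} \log\left(1 - p_0 + p_0 e^{U_{n,k,t}^2/2}\right) \ge b \Big\}
$$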
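The mixture statistic can be sketched as follows. One assumption needs flagging: the exponent $U^2/2$ is read here with $U_{n,k,t}$ taken as the standardized positive partial sum $(S_{n,t} - S_{n,k})^+ / \sqrt{t-k}$, so that $U^2/2$ equals the local log-GLR. The per-sensor term is evaluated with a log-sum-exp rearrangement so that large values of $U$ do not overflow; the function names are hypothetical.

```python
import math


def g_soft(x, p0):
    """log(1 - p0 + p0 * exp(x^2 / 2)), evaluated without overflow."""
    a = x * x / 2.0
    if p0 >= 1.0:
        return a                       # the mixture reduces to the plain sum
    lo = math.log1p(-p0)               # log(1 - p0)
    hi = math.log(p0) + a              # log(p0 * exp(a))
    m = max(lo, hi)
    return m + math.log(math.exp(lo - m) + math.exp(hi - m))


def mixture_statistic(data, t, window, p0):
    """max over t-window <= k < t of sum_n g_soft(U_{n,k,t}, p0) at time t."""
    prefixes = []
    for series in data:
        p = [0.0]
        for value in series[:t]:
            p.append(p[-1] + value)
        prefixes.append(p)
    best = float("-inf")
    for k in range(max(0, t - window), t):
        total = 0.0
        for p in prefixes:
            # Standardized positive partial sum (an assumed reading of U).
            u = max(p[t] - p[k], 0.0) / math.sqrt(t - k)
            total += g_soft(u, p0)
        best = max(best, total)
    return best


if __name__ == "__main__":
    data = [[0, 0, 2, 2], [0, 0, 0, 0], [0, 0, 2, 2]]
    print(mixture_statistic(data, t=4, window=4, p0=0.1))
```

With $p_0 = 1$ every sensor contributes $U^2/2$ and the statistic coincides with the sum procedure; smaller $p_0$ suppresses sensors whose local statistic is weak.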

Soft-thresholding

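[Figure: soft-thresholding transformation of the local statistic, plotted for x from 0 to 10, with curves for $p_0 = 0.01$, $p_0 = 0.1$, and $p_0 = 1$.]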

Choice of threshold b

The choice of $b$ involves a tradeoff between:

ARL
• Usually want this to be a big number, ∼ 5000 or 10000
• Computationally expensive to simulate
• An accurate theoretical approximation is highly valued
• Theoretical analysis is hard in general

Expected detection delay
• A relatively small number, ∼ 10
• Fairly easy to simulate

Exponential approximation

[Figure: Monte Carlo versus theory, plotted for m from 2000 to 10000.]

Figure: Tail probability $P\{T_{\rm mix} > m\}$, $p_0 = 0.1$. Numerical values obtained from 500 Monte Carlo trials.

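Approximation of ARL

Theorem. As $b \to \infty$, $N \to \infty$, with $b/N$ fixed, for $\theta$ such that $\dot{\psi}(\theta) = b/N$,

$$
E_\infty\{T\} \approx \frac{f(N, \theta, p_0)}{\int_{[2N\gamma(\theta)/w]^{1/2}}^{[2N\gamma(\theta)]^{1/2}} y\, \nu^2(y)\, dy}
$$

where

$$
g(x, p_0) = \log\left(1 - p_0 + p_0 e^{x^2/2}\right)
$$

$$
\psi(\theta) = \log E \exp[\theta g(U, p_0)]
$$

$$
\gamma(\theta) = \frac{1}{2} \theta^2 E\left\{ [\dot{g}(U, p_0)]^2 \exp[\theta g(U, p_0) - \psi(\theta)] \right\}
$$

$$
f(N, \theta, p_0) = \frac{\theta\, [2\pi \ddot{\psi}(\theta)]^{1/2}}{\gamma(\theta)\, N^{1/2}} \exp\left\{ N[\theta \dot{\psi}(\theta) - \psi(\theta)] \right\}
$$

$$
\nu(x) = 2x^{-2} \exp\Big\{ -2 \sum_{n=1}^{\infty} n^{-1} \Phi\left(-|x| n^{1/2}/2\right) \Big\}
$$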
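Of the ingredients in the ARL approximation, the function $\nu(x)$ is the easiest to evaluate directly from its series. The sketch below truncates the sum once terms fall under a tolerance; the tolerance, the cap on the number of terms, and the function names are assumptions made for illustration. $\Phi$ is computed from the complementary error function in the standard library.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def nu(x, tol=1e-12, max_terms=1_000_000):
    """nu(x) = 2 x^-2 exp(-2 sum_{n>=1} Phi(-|x| sqrt(n) / 2) / n), for x != 0."""
    x = abs(x)
    if x == 0.0:
        raise ValueError("nu(x) is evaluated here only for x != 0")
    total = 0.0
    for n in range(1, max_terms + 1):
        term = std_normal_cdf(-x * math.sqrt(n) / 2.0) / n
        total += term
        if term < tol:          # terms decrease monotonically in n
            break
    return 2.0 / (x * x) * math.exp(-2.0 * total)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, nu(x))
```

The number of terms needed grows roughly like $1/x^2$, so very small arguments are slow with this direct approach. The integral $\int y\,\nu^2(y)\,dy$ in the theorem could then be approximated with any standard quadrature rule over the stated limits.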

Monte Carlo vs. Theoretical Approximation

Table: Average run length (ARL) of $T_{\rm mix}$, $w = 200$.

p0      b       Theory    Monte Carlo
0.3     31.2    5001      5504
0.3     32.3    10002     10221
0.1     19.5    5000      4968
0.1     20.4    10001     10093
0.03    12.7    5001      4830
0.03    13.5    10001     9948

Approximation of expected detection delay

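Theorem. As $b \to \infty$,

$$
E_0\{T\} = 2\Delta^{-2} \Big[ b + \rho(\Delta) - |\mathcal{N}| \log p_0 - |\mathcal{N}|/2 + E\Big\{\min_{t \ge 0} S_t\Big\} - (N - |\mathcal{N}|)\, E\{g(U, p_0)\} + o(1) \Big].
$$

• Total energy: $\Delta = \big[\sum_{n \in \mathcal{N}} \mu_n^2\big]^{1/2}$

• $S_t \triangleq \sum_{i=1}^{t} z_i$, $z_i \sim \mathcal{N}(\Delta^2/2, \Delta^2)$

• $\rho(\Delta) = \Delta^2/4 + 1 - \sum_{i=1}^{\infty} i^{-1} E\{S_i^-\}$

• $E\{\min_{t \ge 0} S_t\} = \rho(\Delta) - 1 - \Delta^2/4$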

Monte Carlo vs. Theoretical Approximation

Table: Expected detection delay of $T_{\rm mix}$, ARL ≈ 5000, $\mu = 1$, and $w = 200$.

p       p0      b       Theory    Monte Carlo
0.3     0.3     31.2    3.5       3.2
0.1     0.3     31.2    6.2       6.5
0.3     0.1     19.5    5.2       3.6
0.1     0.1     19.5    7.2       6.7
0.03    0.1     19.5    13.9      14.3
0.03    0.03    12.7    13.9      14.2

Comparison of detection delay

Table: Expected detection delays (DD), N = 100, 500 Monte Carlo trials, ARL 5000.

p       Method              DD, µ = 1      DD, µ = 0.7
0.05    max                 15.5           28.4
        Tmix (p0 = 0.1)     10.4 (10.1)    18.9 (19.9)
        Tsum                14.8           28.1
        TTV                 15.7           27.0
        Mei                 15.7           26.9

Comparison of expected detection delay, ARL = 5000

[Figure: expected detection delay over the range 0 to 0.3, with curves for GLR, Mixture ($p_0 = 0.1$), and Modified TV.]

Hard-thresholding

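• Avoids the numerical stability issue in $\log\left(1 - p_0 + p_0 e^{x^2/2}\right)$

• We have an analytic approximation for the ARL of the hard-thresholding mixture procedure.

[Figure: soft-thresholding function $g(x)$ and hard-thresholding function $h(x) = (\log p_0 + x)^+$, plotted for x from 0 to 8.]

$$
T_{{\rm mix},h} = \inf\Big\{ t : \max_{0 \le k<t} \sum_{n=1}^{N} \left[ U_{n,k,t} + \log(p_0) \right]^+ \ge b \Big\}
$$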
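A small sketch of the hard-thresholding idea, under the assumption that $U_{n,k,t}$ here denotes the local log-GLR value for each sensor at a fixed pair $(k, t)$. The helper names are hypothetical. The comparison function `naive_soft` is included only to show where a direct evaluation of the soft version breaks down in double precision.

```python
import math


def h_hard(u, p0):
    """Hard threshold [u + log(p0)]^+ applied to one local statistic."""
    return max(u + math.log(p0), 0.0)


def hard_threshold_sum(local_stats, p0):
    """Sum over sensors of [U_n + log(p0)]^+ for one fixed (k, t)."""
    return sum(h_hard(u, p0) for u in local_stats)


def naive_soft(u, p0):
    """Direct evaluation of log(1 - p0 + p0 * exp(u)); overflows for large u."""
    return math.log(1.0 - p0 + p0 * math.exp(u))


if __name__ == "__main__":
    p0 = 0.1
    # Sensors with U below -log(p0) (about 2.30 here) contribute nothing.
    print(hard_threshold_sum([0.5, 5.0, 10.0], p0))
    # For moderate u the two transformations are close.
    print(naive_soft(10.0, p0), h_hard(10.0, p0))
    # For very large u the direct soft version fails, the hard one does not.
    try:
        naive_soft(800.0, p0)
    except OverflowError:
        print("naive soft-thresholding overflowed; hard threshold gives", h_hard(800.0, p0))
```

The threshold acts as a per-sensor gate: only sensors whose local statistic exceeds $-\log p_0$ enter the sum, which is the sparsity-promoting effect of choosing a small $p_0$.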

Solar flare detection

• Pre-processing: subtract the time-varying background using a tracking method

Yao Xie, Jiaji Huang, Rebecca Willett, Changepoint detection for high-dimensional time series with missing data, submitted, 2012.

[Figure: panels over frames 0 to 250, including the statistic of $T_{{\rm mix},h}$ with $p_0 = 1$ and with $p_0 = 0.0001$.]

Profile Procedure

When magnitude has spatial profile

[Figure: sensor locations and change-point profile.]

Profile-based method

• The change-point amplitude at the $n$th sensor is determined by

$$
\mu_n = \sum_{m=1}^{M} r_m \alpha_{z_m}(x_n),
$$

• $M$ is the number of sources
• $z_m \in D$ is the (unknown) location of the $m$th source
• $x_n$ is the location of the $n$th sensor
• The scalar $r_m$ is the unknown strength of the $m$th signal
• Profile function:

$$
\alpha_z(x) = \frac{1}{\sqrt{2\pi\beta}}\, e^{-\frac{1}{4\beta} \|x - z\|^2}, \quad x \in \mathbb{R}^2, \; \beta > 0
$$

Siegmund, D. O. and Yakir, B. (2008). Detecting the emergence of a signal in a noisy image. Statistics and Its Interface.

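• Assuming one source
• Maximizing over the possible source location $z$
• Profile-based procedure:

$$
T_{\rm profile} = \inf\Big\{ t : \max_{t-w \le k<t} \max_{z \in D} \Big[ \underbrace{\sum_{n=1}^{N} \alpha_z(x_n)\, U_{n,k,t}}_{\text{``matched filter''}} \Big]^2 \ge b \Big\}.
$$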
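The matched-filter step can be illustrated for one fixed pair $(k, t)$ as below. This is a sketch under stated assumptions: the profile is taken to be Gaussian, $\alpha_z(x) = (2\pi\beta)^{-1/2} \exp(-\|x - z\|^2 / (4\beta))$, the candidate source locations are a finite grid supplied by the caller, and all function and argument names are made up for the example.

```python
import math


def gaussian_profile(x, z, beta):
    """Assumed Gaussian profile alpha_z(x) for 2-D points x and z."""
    d2 = (x[0] - z[0]) ** 2 + (x[1] - z[1]) ** 2
    return math.exp(-d2 / (4.0 * beta)) / math.sqrt(2.0 * math.pi * beta)


def profile_statistic(sensor_xy, local_stats, candidate_sources, beta):
    """max over candidate z of [sum_n alpha_z(x_n) * U_n]^2 for one fixed (k, t)."""
    best = 0.0
    for z in candidate_sources:
        # Matched filter: weight each sensor by the profile centred at z.
        matched = sum(gaussian_profile(x, z, beta) * u
                      for x, u in zip(sensor_xy, local_stats))
        best = max(best, matched * matched)
    return best


if __name__ == "__main__":
    sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    stats = [2.0, 1.0, 1.0]                       # hypothetical U_{n,k,t} values
    grid = [(i * 0.5, j * 0.5) for i in range(3) for j in range(3)]
    print(profile_statistic(sensors, stats, grid, beta=1.0))
```

The full procedure would repeat this for each $k$ in the window and stop once the maximum reaches $b$; searching a grid of source locations replaces the $2^N$ subset search with a search over $|D|$ candidates.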

Table: Comparison of expected detection delays (DD)

Procedure    b       DD, r = 1    DD, r = 1.5
Profile      26.3    25.6         12.3
Mix          39.7    78.3         35.8

Parallel Mixture Procedure

Sensitivity to p

• Mix procedure: good enough?
• We do not know the true $p$
• When $p_0 \neq p$ and $N$ is large, $p_0 N$ is very different from $pN$

[Figure: curves for $p_0 = p$ and $p_0 = 0.1$ over the range 0 to 0.35; N = 100.]

Parallel Procedure

• How to be more robust to uncertainty in $p$?
• Use two mixture procedures:

$$
T_{\rm parallel} \triangleq \min\{ T_{\rm mix}(p_1, b_1),\, T_{\rm mix}(p_2, b_2) \}.
$$

• Choose $p_1$ and $p_2$ such that $p \in [p_1, p_2]$

Parallel mixture procedure for multi-sensor change-point detection, Yao Xie, David Siegmund, 2012 Joint Statistical Meetings, Jan. 2012.

ARL of parallel procedure

• No closed-form ARL for the parallel procedure
• Choose $b_1$ and $b_2$ so that the ARLs of the two procedures are equal
• Use a very conservative lower bound:
  • $P_\infty\{T_{\rm mix}(p_i, b_i) \le 1000\} \approx 0.05$, $i = 1, 2$
  • By the Bonferroni inequality:

$$
P_\infty\{\min[T_{\rm mix}(p_1, b_1), T_{\rm mix}(p_2, b_2)] \le 1000\} \le 0.1,
$$

So $E_\infty\{T_{\rm parallel}\} \ge 10000$
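The arithmetic behind this bound can be checked with a few lines, under the assumption (suggested by the exponential approximation of the tail of $T_{\rm mix}$ earlier in the talk) that the stopping time is roughly exponentially distributed under $P_\infty$. The helper names are hypothetical.

```python
import math


def bonferroni_bound(probabilities):
    """Union bound on the probability that at least one event occurs."""
    return min(1.0, sum(probabilities))


def arl_from_tail(m, prob_stop_by_m):
    """ARL implied by P(T <= m) if T is exactly exponential: -m / log(1 - P)."""
    return -m / math.log1p(-prob_stop_by_m)


def arl_first_order(m, prob_stop_by_m):
    """First-order version, using P(T <= m) ~ m / ARL when m << ARL."""
    return m / prob_stop_by_m


if __name__ == "__main__":
    bound = bonferroni_bound([0.05, 0.05])   # each procedure stops by 1000 w.p. ~0.05
    print("P(parallel stops by 1000) <=", bound)
    print("ARL, exact exponential:  ", arl_from_tail(1000, bound))
    print("ARL, first-order version:", arl_first_order(1000, bound))
```

The slide's figure of 10000 matches the first-order version, $1000 / 0.1$; treating $T$ as exactly exponential gives a slightly smaller value, about 9491, so the bound is best read as approximate.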

Expected detection delay: parallel procedure

Table: Comparison of parallel and single procedures

p        µn = µ    Tmc     Tmix (0.1)    Tparallel
0.1      0.7       5.7     6.5           6.4
0.005    2.0       48.5    27.1          22.9
0.005    1.0       94.7    54.5          45.8
0.25     7.5       8.4     12.0          10.5

Parallel sum procedure

• Drawback of the parallel procedure: no closed-form ARL
• Parallel sum procedure: linearly combine the statistics of $T_{\rm mix}(p_i, b_i)$:

$$
T_{{\rm parallel},2} = \inf\{ t : c_1 S(p_1) + c_2 S(p_2) > b \}
$$

• Can compute an ARL approximation
• Slower than the parallel procedure in some cases

Ongoing: sequential changepoint detection with geometry

Courtesy: R. Calderbank

• If the sparsity pattern has a known correlation structure

• Example: network anomaly detection in the internet

• We can build a sparse transformation (e.g. expander graph) to boost signal strength

$$
z = Ax
$$

Joint work with Meng Wang and Robert Calderbank

Summary

• How to exploit sparsity structure in multisensor changepoint detection?
  • Mixture procedure
  • Profile-based procedure
  • Parallel mixture procedure
• How to characterize their performance?
• How to exploit correlation between sensors?

Thank you!

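Generalized likelihood ratio

• $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\mu,1)$, $\mu$ is unknown
• Log-generalized likelihood ratio (with $S_t = \sum_{m=1}^{t} y_m$):

$$
\mu (S_t - S_k) - \frac{1}{2} (t - k) \mu^2
$$

• Maximum-likelihood estimate: $\hat{\mu} = (S_t - S_k)/(t - k)$

• Stopping rule becomes

$$
T = \inf\Big\{ t \ge 1 : \max_{k<t} \frac{(S_t - S_k)^2}{t - k} \ge b \Big\}
$$

• Very precise approximation for the ARL [Venkatraman, Siegmund, 1995]:

$$
E(T) \sim \frac{(2\pi)^{1/2} \exp(b^2/2)}{b \int_0^{\infty} x\, \nu^2(x)\, dx}
$$

$$
\nu(x) = 2x^{-2} \exp\Big\{ -2 \sum_{n=1}^{\infty} n^{-1} \Phi\left(-x n^{1/2}/2\right) \Big\}
$$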
