Towards a Detection Theory For Intelligence Applications Stephen Ahearn Jim Ferry Darren Lo Aaron Phillips http://www.metsci.com


Page 2: Towards a Detection Theory For Intelligence Applications

Motivation

Classical detection theory has been developed and applied for decades to detect and track stealthy targets
– Submarines, aircraft, missiles, …
– Using a variety of sensors: sonar, radar, IR, …
– Gives U.S. a distinct advantage over adversaries

We seek to develop an analogous theory of detection on networks
– To detect and track threat networks and activities that cannot be observed directly
– Exploiting diverse data: SIGINT, HUMINT, IMINT, …
– To maintain our advantage over today’s adversary and win the GWOT

Page 3: Towards a Detection Theory For Intelligence Applications

Classical Detection Theory Applications

[Figure: illustration labeled "Wind", "Current", and "Detection Theory", overlaid with detection-theory equations]

Detection theory enables detection and tracking of stealthy targets
– Such targets cannot be detected by analyzing sensor reports separately
– Only through principled data fusion does the signal stand out from noise
– The probabilistic framework correctly manages uncertainty and risk

Page 4: Towards a Detection Theory For Intelligence Applications

Detection Theory for Intelligence Applications

[Figure: panels "Vast Sea of Data" and "Data of Interest". Inputs: COMINT, HUMINT, ELINT, IMINT, MASINT. Stage labels: "Signal processing, data collection", "Data transformation", "Detection Theory". Outputs: Detect and track networks; Detect and track activities; Assess threat levels and intent.]

[Figure: example network linking flights AA #77 (Pentagon), AA #11 (WTC North), UA #93 (Pennsylvania), and UA #175 (WTC South) with Salem Alhamzi, Hani Hanjour, Nawaf Alhmazi, Khalid Almihdhar, Majed Moqed, Ahmed Alnami, Saeed Alghamdi, Hamza Alghamdi, Ahmed Alghamdi, Abdul Aziz Alomari, and Mohamed Atta. Adapted from “Connecting the Dots -- Tracking Two Identified Terrorists,” Valdis Krebs, 2001.]

Page 5: Towards a Detection Theory For Intelligence Applications

Approach: Two well-established theories

Likelihood Ratio Detection and Tracking
– Explicitly models noise and signal
– Principled Bayesian framework for managing uncertainty
– Used for decades for tracking stealthy kinematical targets
– Traditional domain has metric structure

Random graph theory
– Models transactional domain
– Discrete structure
– Rich mathematics

Page 6: Towards a Detection Theory For Intelligence Applications

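Likelihood Ratio Detection and Tracking (LRDT)

Requirements
– State space $\mathcal{X}^{+} = \mathcal{X} \cup \{\emptyset\}$
– Measurement model $L(z \mid x)$ (or $\mathcal{L}(z \mid x) = L(z \mid x) / L(z \mid \emptyset)$)
– Motion model $P_T(x_t \mid x_{t-\Delta t})$

Result
– Update equations, probability form, for $P(x_t)$:
  $P^{-}(x_t) = \int P_T(x_t \mid x_{t-\Delta t}) \, P(x_{t-\Delta t}) \, dx_{t-\Delta t}$
  $P(x_t) = \frac{1}{C} \, L(z_t \mid x_t) \, P^{-}(x_t)$
– Update equations, likelihood ratio form, for $\Lambda(x_t) = P(x_t) / P(\emptyset)$:
  $\Lambda^{-}(x_t) = \int P_T(x_t \mid x_{t-\Delta t}) \, \Lambda(x_{t-\Delta t}) \, dx_{t-\Delta t}$
  $\Lambda(x_t) = \mathcal{L}(z_t \mid x_t) \, \Lambda^{-}(x_t)$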
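One cycle of the likelihood-ratio recursion can be sketched in Python on a discrete state space, where the motion integral becomes a matrix-vector product. The five-cell ring, the random-walk transition matrix, the prior odds, and the measurement likelihood ratios below are hypothetical illustration values, not taken from the slides.

```python
def motion_update(lam, transition):
    """Lambda^-(x_t) = sum over x' of P_T(x_t | x') * Lambda(x')."""
    n = len(lam)
    return [sum(transition[i][j] * lam[j] for j in range(n)) for i in range(n)]

def information_update(lam_minus, meas_lr):
    """Lambda(x_t) = L(z_t | x_t) * Lambda^-(x_t), cell by cell."""
    return [lr * value for lr, value in zip(meas_lr, lam_minus)]

def posterior_from_ratio(lam):
    """Recover P(x_t) and P(null) from Lambda = P(x) / P(null) and sum(P) = 1."""
    p_null = 1.0 / (1.0 + sum(lam))
    return [value * p_null for value in lam], p_null

# Hypothetical motion model: stay with probability 0.6, step to either
# neighbour on a ring with probability 0.2 each.
# transition[i][j] = P_T(x_t = i | x_{t - dt} = j); each column sums to 1.
n_cells = 5
transition = [[0.0] * n_cells for _ in range(n_cells)]
for j in range(n_cells):
    transition[j][j] = 0.6
    transition[(j - 1) % n_cells][j] = 0.2
    transition[(j + 1) % n_cells][j] = 0.2

lam = [0.1 / n_cells] * n_cells          # hypothetical prior odds, spread evenly
meas_lr = [1.0, 1.0, 8.0, 1.0, 1.0]      # hypothetical measurement favouring cell 2

lam = information_update(motion_update(lam, transition), meas_lr)
p_cells, p_null = posterior_from_ratio(lam)
```

Working with Λ rather than P avoids computing the normalizing constant C at every step; probabilities are recovered only when needed.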

Page 7: Towards a Detection Theory For Intelligence Applications

Classical Likelihood Ratio Detection and Tracking (LRDT)

[Figure: Radar return data, measurement likelihood ratio surfaces (axes: distance from radar, time)]

[Figure: Cumulative likelihood ratio surface (fusing data using motion model). Peak value over target track ~ 10^7; second highest peak ~ 10]

Motion model describes movement of periscope
Fusion over time smooths out random fluctuations from noise and clutter
LR peaks accumulate on movement that fits the motion model (i.e., the periscope)
LR peaks dissipate for structures which move in other ways (e.g., waves)

Page 8: Towards a Detection Theory For Intelligence Applications

LRDT on Networks: a Simple Example

[Figure: probability of each candidate cell at t = 1, 5, 10, 15, 20, 25, 30, 35, on a color scale from P(x) = 0 to P(x) = 1; ground truth position of cell highlighted in purple]

State space: all 1140 possible triangles (i.e., “terrorist cells”)
Measurement model:
– Signal: the cell appears with probability 0.8
– Noise: each possible edge appears independently with probability 0.3
Motion model: cell swaps out a member with probability 0.1
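The triangle example can be sketched in Python as follows. Several details are assumptions rather than statements from the slide: 20 nodes (inferred because C(20, 3) = 1140), the cell "appearing" means all three of its edges are added on top of the noise graph, and a swap replaces one member with a uniformly chosen non-member.

```python
from itertools import combinations
from math import comb

N_NODES = 20      # assumption: inferred from comb(20, 3) == 1140
P_SIGNAL = 0.8    # the cell appears with this probability
P_NOISE = 0.3     # each edge appears independently as noise
P_SWAP = 0.1      # the cell swaps out a member

STATES = [frozenset(t) for t in combinations(range(N_NODES), 3)]

def triangle_edges(cell):
    """The three edges of a triangle, each as a frozenset of two nodes."""
    return {frozenset(e) for e in combinations(sorted(cell), 2)}

def measurement_lr(cell, observed_edges):
    """L(J | cell) / L(J | noise only) for one observed graph J."""
    if triangle_edges(cell) <= observed_edges:
        # All three edges seen: the cell appeared, or noise produced them.
        return (P_SIGNAL + (1 - P_SIGNAL) * P_NOISE ** 3) / P_NOISE ** 3
    # An edge is missing, so the cell cannot have appeared in this frame.
    return 1 - P_SIGNAL

def motion_update(prob):
    """Stay put with probability 0.9, otherwise swap one member uniformly."""
    n_neighbours = 3 * (N_NODES - 3)
    new = {s: (1 - P_SWAP) * p for s, p in prob.items()}
    for s, p in prob.items():
        share = P_SWAP * p / n_neighbours
        for leaving in s:
            for joining in range(N_NODES):
                if joining not in s:
                    new[(s - {leaving}) | {joining}] += share
    return new

def bayes_update(prob, observed_edges):
    """Posterior over cells is proportional to likelihood ratio times prior."""
    weighted = {s: measurement_lr(s, observed_edges) * p for s, p in prob.items()}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}
```

A tracking run under these assumptions would alternate `motion_update` and `bayes_update` over the observed graphs, starting from a uniform prior over `STATES`.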

Page 9: Towards a Detection Theory For Intelligence Applications

Overview

1. Graph-theoretic Underpinnings
– Graph-theoretic analogues of noise and signal

2. Tracking Plans in Networks
– Extension of LRDT to transactional domain

3. Hierarchical Hypothesis Management
– Novel methodologies required to mitigate combinatorial explosion of state space

4. References

Page 10: Towards a Detection Theory For Intelligence Applications

1. Graph-theoretic Underpinnings

Erdős-Rényi Random Graph Model G(n,p)
– Provides noise model for detection problem
– Simplest, most tractable random graph model

Inserted subgraph problem
– Provides signal model for detection problem

Likelihood ratio
– Optimal decision statistic for detection of inserted subgraph

Distribution of subgraph count

Other random graph models?

Page 11: Towards a Detection Theory For Intelligence Applications

Erdős-Rényi Random Graph Model G(n,p)

The notation G(n,p) denotes a random graph…
– on n vertices
– with each edge appearing independently with probability p

Very simple noise process model
– No correlation structure
– Well-studied, but still yields difficult problems
– First case to explore before moving on to more realistic network models (Random Collaboration, Geometric, Gaussian, etc.)

[Figure: instance of G(n,p) for n = 6, p = 0.5]
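Sampling an instance of G(n,p) takes only a few lines of Python. This is a minimal sketch; the function name, the edge representation (a set of two-element frozensets), and the fixed seed are illustrative choices.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """Return the edge set of one G(n, p) instance on vertices 0..n-1."""
    # Each of the C(n, 2) possible edges is kept independently with probability p.
    return {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}

rng = random.Random(0)          # arbitrary seed, for reproducibility only
edges = sample_gnp(6, 0.5, rng)
```

For n = 6 there are 15 possible edges, so an instance with p = 0.5 contains 7.5 edges on average.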

Page 12: Towards a Detection Theory For Intelligence Applications

Inserted Subgraph Problem

Evidence graph J
– Background noise: Erdős-Rényi random graph G(n,p).
– Target graph H may or may not be inserted somewhere.

Binary decision problem: Is H present or not?

Neyman–Pearson lemma: the likelihood ratio
$\Lambda_H(J) = \frac{P(J \mid H \text{ present})}{P(J \mid H \text{ not present})}$
is the optimal decision statistic, i.e., yields highest probability of detection for given false alarm rate.

Theorem [Mifflin, Boner]:
$\Lambda_H(J) = \frac{X_H(J)}{\mathrm{E}[X_H]} = \frac{\#\text{ copies of } H \text{ in } J}{\#\text{ copies of } H \text{ expected just from noise}}$
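The identity between the likelihood ratio and the normalized subgraph count follows from a short calculation, sketched here under stated assumptions: H is inserted with all of its edges at a placement chosen uniformly from the $M = \binom{n}{v(H)} \frac{v(H)!}{|\mathrm{Aut}(H)|}$ possible placements, the noise is an independent G(n,p), $N = \binom{n}{2}$ is the number of possible edges, and $q = 1 - p$. The symbols M and N are introduced here for the derivation.

```latex
\begin{align*}
P(J \mid H \text{ not present}) &= p^{e(J)} \, q^{N - e(J)} \\
P(J \mid H \text{ present})
  &= \frac{1}{M} \sum_{\substack{H' \cong H \\ H' \subseteq J}} p^{e(J) - e(H)} \, q^{N - e(J)}
   = \frac{X_H(J)}{M} \, p^{e(J) - e(H)} \, q^{N - e(J)} \\
\Lambda_H(J)
  &= \frac{P(J \mid H \text{ present})}{P(J \mid H \text{ not present})}
   = \frac{X_H(J)}{M \, p^{e(H)}}
   = \frac{X_H(J)}{\mathrm{E}[X_H]}
\end{align*}
```

The middle line uses the fact that a placement contributes only if all of its edges lie in J, and the last step uses $\mathrm{E}[X_H] = M \, p^{e(H)}$.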

Page 13: Towards a Detection Theory For Intelligence Applications

Inserted Subgraph Problem

[Figure: a noise process and a signal + noise process over people, companies, ports, groups, etc., with target graph H inserted in the latter]

H could represent four shipments of precursor items to four distinct, but linked entities

Likelihood ratio (optimal decision statistic):
$\Lambda_H(J) = \frac{P(J \mid \text{noise} + H)}{P(J \mid \text{noise})}$

Page 14: Towards a Detection Theory For Intelligence Applications

Likelihood Ratio Calculations

[Figure: target graph H]

Example 1: n = 100, p = 0.07, XH(J) = 200
– Are the 200 copies of H likely to have arisen by chance?
– Answer: $\Lambda_H(J) = \frac{X_H(J)}{\mathrm{E}[X_H]} = \frac{200}{883.1} = 0.226 < 1$: probably just noise

Example 2: n = 1000, p = 0.007, XH(J) = 2000
– Are the 2000 copies of H likely to have arisen by chance?
– Answer: $\Lambda_H(J) = \frac{X_H(J)}{\mathrm{E}[X_H]} = \frac{2000}{11.4} = 174.8 \gg 1$: probably contains target(s)

Need information about distribution of XH to set thresholds and establish performance boundaries!
– Expected value of XH easy.
– But that’s all that’s easy about it!

Page 15: Towards a Detection Theory For Intelligence Applications

Expected Value of Subgraph Count XH

E[XH]: average # of subgraphs in an instance of G(n,p)
– Some preliminary notation
  • v(H) = number of vertices of H
  • e(H) = number of edges of H
  • |Aut(H)| = number of automorphisms of H
– Simple formula for E[XH] (Erdős):
  $\mathrm{E}[X_H] = \binom{n}{v(H)} \, \frac{v(H)!}{|\mathrm{Aut}(H)|} \, p^{e(H)}$
  • $\binom{n}{v(H)}$: # of choices for vertex set of H
  • $\frac{v(H)!}{|\mathrm{Aut}(H)|}$: # of arrangements of H on this vertex set
  • $p^{e(H)}$: probability of all e(H) edges of H appearing

Example: [Figure: graph H] with v(H) = 4, e(H) = 4, |Aut(H)| = 2:
$\mathrm{E}[X_H] = \binom{n}{4} \, \frac{4!}{2} \, p^{4} = 180 \, p^{4}$ for n = 6
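The Erdős formula is easy to evaluate directly. The Python sketch below uses hypothetical function names; it takes v(H), e(H), and |Aut(H)| as inputs, computes the expected count, and forms the likelihood ratio X_H(J) / E[X_H] from an observed count.

```python
from math import comb, factorial

def expected_subgraph_count(n, p, v, e, aut):
    """E[X_H] = C(n, v) * v! / |Aut(H)| * p^e for noise G(n, p)."""
    # |Aut(H)| always divides v!, so integer division is exact here.
    placements = comb(n, v) * factorial(v) // aut
    return placements * p ** e

def likelihood_ratio(observed_count, n, p, v, e, aut):
    """Lambda_H(J) = X_H(J) / E[X_H]."""
    return observed_count / expected_subgraph_count(n, p, v, e, aut)

# The slide's example: v(H) = 4, e(H) = 4, |Aut(H)| = 2, n = 6, p = 0.5.
mean_count = expected_subgraph_count(6, 0.5, 4, 4, 2)   # 180 * 0.5**4 = 11.25
```

With these inputs there are 180 placements of H on 6 vertices, each present with probability 0.5^4, giving a mean of 11.25 copies.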

Page 16: Towards a Detection Theory For Intelligence Applications

Distribution of Subgraph Count XH

Example: distribution of XH for n = 6, p = 0.5
– Minimum possible value = 0 (probability = 18.1%)
– Maximum possible value = 180 (probability = 0.003%)
– Mean given by Erdős formula: $\mathrm{E}[X_H] = 180 \, p^{4} = 11.25$
– Variance: $\mathrm{var}[X_H] = 180 \, p^{4} (1 - p)(1 + 13p + 50p^{2} + 128p^{3}) = (14.2)^{2}$

[Figure: graph H, and instances of G(n,p) for n = 6, p = 0.5 labeled with their number of copies of H]

[Figure: histogram of XH (horizontal axis: XH = 0, 1, 2, …, 27, …; vertical axis: probability, 0 to 0.175), with average = 11.25 marked. Callout: $\Pr(\text{exactly 8 copies of } H) = 90 \, p^{6} (1 - p)^{7} (4 + p^{2}) = 4.67\%$ for p = 0.5]
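For n = 6 the whole distribution can be enumerated by brute force, since there are only 2^15 = 32,768 graphs and at p = 0.5 each is equally likely. The Python sketch below assumes H is the triangle with one pendant edge (the "paw"), which is the 4-vertex, 4-edge graph with exactly 2 automorphisms; the slide's own drawing of H is not available here.

```python
from itertools import combinations
from collections import Counter

N_VERTICES = 6
EDGES = list(combinations(range(N_VERTICES), 2))        # 15 possible edges
EDGE_BIT = {edge: 1 << i for i, edge in enumerate(EDGES)}
TRIPLES = list(combinations(range(N_VERTICES), 3))      # 20 vertex triples

def count_paws(mask):
    """Count copies of the paw in the graph encoded by an edge bitmask."""
    degree = [0] * N_VERTICES
    for (u, v), bit in EDGE_BIT.items():
        if mask & bit:
            degree[u] += 1
            degree[v] += 1
    copies = 0
    for a, b, c in TRIPLES:
        triangle = EDGE_BIT[(a, b)] | EDGE_BIT[(a, c)] | EDGE_BIT[(b, c)]
        if (mask & triangle) == triangle:
            # Each triangle vertex contributes one copy per neighbour
            # outside the triangle (its degree minus the 2 triangle edges).
            copies += degree[a] + degree[b] + degree[c] - 6
    return copies

def paw_count_distribution():
    """Exact distribution of the copy count for n = 6, p = 0.5."""
    n_graphs = 1 << len(EDGES)
    tally = Counter(count_paws(mask) for mask in range(n_graphs))
    return {k: c / n_graphs for k, c in sorted(tally.items())}

dist = paw_count_distribution()
mean = sum(k * pr for k, pr in dist.items())
variance = sum((k - mean) ** 2 * pr for k, pr in dist.items())
```

The resulting `variance`, `dist[0]`, and `dist[8]` can be compared with the slide's quoted values of (14.2)^2, 18.1%, and 4.67%.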

Page 17: Towards a Detection Theory For Intelligence Applications

Asymptotic Estimate of Variance

Theorem [Ferry]:
– Decompose H into core cr(H) and rooted trees Ti
– Color each node i of core by isomorphism class of Ti
– Then
  $\frac{\mathrm{var}[X_H]}{\mathrm{E}[X_H]} = \sum_{\pi \in G} \; \prod_{i \in V(\mathrm{cr}(H))} B(T_i, T_{\pi(i)}; d) + O(n^{-1})$
  where
  $G = \mathrm{Aut}(\mathrm{cr}(H)) \, / \, \mathrm{Aut}_{\chi}(\mathrm{cr}(H))$
  and B is computed by an exact recursive formula.

[Figure: example graph H, its core cr(H), a permutation π, and the corresponding products of B(·, ·; d) factors]

New result for study of distribution of XH
– Bollobás, 1981 only applies to strictly balanced graphs.
– Ruciński, 1988 does not estimate variance.
– Janson, Łuczak, Ruciński give a much worse estimate.

Page 18: Towards a Detection Theory For Intelligence Applications

Asymptotic Estimate of Variance

[Figure: example graph H]

With example H, formula yields
$\nu[X_H] = \frac{\mathrm{var}[X_H]}{\mathrm{E}[X_H]} = \frac{1}{192} \bigl( 192 + 2304 d + 12960 d^{2} + 47008 d^{3} + 127120 d^{4} + 276544 d^{5} + 503280 d^{6} + 784840 d^{7} + 1066956 d^{8} + 1278644 d^{9} + 1361424 d^{10} + 1296700 d^{11} + 1110904 d^{12} + 860672 d^{13} + 607208 d^{14} + 394332 d^{15} + 239356 d^{16} + 138208 d^{17} + 76436 d^{18} + 39914 d^{19} + 19041 d^{20} + 7875 d^{21} + 2698 d^{22} + 729 d^{23} + 147 d^{24} + 18 d^{25} \bigr)$

For n = 10^6, mean degree d = 10:
$\mathrm{E}[X_H] = \binom{n}{v(H)} \frac{v(H)!}{|\mathrm{Aut}(H)|} \left( \frac{d}{n} \right)^{e(H)} = \binom{10^{6}}{31} \frac{31!}{768} \left( \frac{10}{10^{6}} \right)^{34} = 1.3 \times 10^{13}$
$\sqrt{\mathrm{var}[X_H]} = \sqrt{\nu[X_H] \, \mathrm{E}[X_H]} = 5.4 \times 10^{18} \gg \mathrm{E}[X_H]$

[Figure: ν[XH] versus d for 0 ≤ d ≤ 10, on a logarithmic vertical axis from 10^4 to 10^24]
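The numbers quoted for n = 10^6 and d = 10 can be checked with a few lines of Python. This sketch assumes ν[X_H] denotes var[X_H] / E[X_H] and reads the polynomial coefficients off the slide in ascending powers of d; the variable names are illustrative.

```python
from math import comb, factorial, sqrt

# Coefficients of d^0 .. d^25 inside the bracket, before dividing by 192.
COEFFS = [
    192, 2304, 12960, 47008, 127120, 276544,
    503280, 784840, 1066956, 1278644,
    1361424, 1296700, 1110904, 860672,
    607208, 394332, 239356, 138208,
    76436, 39914, 19041, 7875, 2698,
    729, 147, 18,
]

def variance_to_mean_ratio(d):
    """nu[X_H] = var[X_H] / E[X_H] for the example H, as a polynomial in d."""
    return sum(c * d ** k for k, c in enumerate(COEFFS)) / 192

def expected_count(n, d, v=31, e=34, aut=768):
    """Erdos formula with edge probability p = d / n."""
    placements = comb(n, v) * factorial(v) // aut
    return placements * (d / n) ** e

n, d = 10 ** 6, 10
mean = expected_count(n, d)                        # about 1.3e13
std = sqrt(variance_to_mean_ratio(d) * mean)       # about 5.4e18
```

The standard deviation exceeds the mean by more than five orders of magnitude, so the count is far from concentrated around its expectation in this regime.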

Page 19: Towards a Detection Theory For Intelligence Applications

2. Tracking Plans in Networks

Extension of LRDT to transactional domain
– Noise model: Sequence of independent instances of G(n,p)
– Signal model: Pattern of inserted subgraphs

Susceptible to combinatorial explosion

Page 20: Towards a Detection Theory For Intelligence Applications

Noise Model

Classic Erdős-Rényi random graph model G(n, p)
– n = 30
– p = 0.3

Observe L instances $J_1, J_2, \ldots, J_L$

[Figure: an instance of G(30, 0.3)]

Page 21: Towards a Detection Theory For Intelligence Applications

Signal Model

Insert a sequence of graphs H1, H2, …, Hm into some fixed, but unknown, location.
The Hk's appear for τk time steps.
Each edge has a probability pV of being observed.
Total plan length: T = 40

[Figure: plan stages H1 (τ1 = 10), H2 (τ2 = 5), H3 (τ3 = 10), H4 (τ4 = 5), H5 (τ5 = 10), with one node labeled "Leader"]
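The mapping from internal plan time to the active stage can be sketched in Python as below. It assumes the stages run consecutively in index order (H1 first, H5 last), which the slide suggests but does not state; the function name is hypothetical.

```python
from itertools import accumulate

DURATIONS = [10, 5, 10, 5, 10]      # tau_1 .. tau_5 from the slide
PLAN_LENGTH = sum(DURATIONS)        # T = 40

def active_stage(tau):
    """Return k such that H_k is active at internal plan time tau (1-based)."""
    if not 1 <= tau <= PLAN_LENGTH:
        raise ValueError("tau must lie in 1..T")
    # Cumulative end times of the stages: 10, 15, 25, 30, 40.
    for k, end in enumerate(accumulate(DURATIONS), start=1):
        if tau <= end:
            return k
```

For example, plan times 11 through 15 fall in stage 2, so the subgraph H2 would be the one inserted into the evidence at those steps.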

Page 22: Towards a Detection Theory For Intelligence Applications

State Space

Must track location and internal plan time τ.
Possible time states: $\tau = \alpha$, $1 \le \tau \le T$, $\tau = \omega$
α indicates the plan has not yet started.
ω indicates the plan has finished.
State space: $\mathcal{X} = \{\emptyset\} \cup \{(H, \tau)\}$
∅ indicates no plan is present.
Use a diffuse prior on the state space.

Page 23: Towards a Detection Theory For Intelligence Applications

Size of the State Space

There are
$\binom{30}{3} \frac{3!}{2} = 12{,}180 \approx 10^{4}$
possible locations.
Possible time states: $\tau = \alpha$, $1 \le \tau \le 40$, $\tau = \omega$
There are 42 possible time states.
State space consists of
$(12{,}180)(42) + 1 = 511{,}561 \approx 5 \times 10^{5}$
states.
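The state-space count is simple arithmetic, reproduced here in Python as a sanity check. The function name and parameters are illustrative; the factor 3!/2 mirrors the slide's count of placements on each vertex triple.

```python
from math import comb, factorial

def plan_state_space_size(n_nodes, plan_length):
    """Locations times time states, plus one null state (no plan present)."""
    locations = comb(n_nodes, 3) * factorial(3) // 2   # 12,180 for n = 30
    time_states = plan_length + 2                      # alpha, 1..T, omega
    return locations * time_states + 1

size = plan_state_space_size(30, 40)
```

Doubling the number of nodes to 60 already raises the location count to 102,660, which illustrates how quickly the state space grows.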

Page 24: Towards a Detection Theory For Intelligence Applications

Motion Model

Advance the probability of each state as follows:

$P^{-}((H, \tau), t) = \begin{cases} (1 - S(t)) \, P((H, \alpha), t - 1) & \text{if } \tau = \alpha, \\ S(t) \, P((H, \alpha), t - 1) & \text{if } \tau = 1, \\ P((H, \tau - 1), t - 1) & \text{if } \tau = 2, \ldots, T, \\ P((H, T), t - 1) + P((H, \omega), t - 1) & \text{if } \tau = \omega, \end{cases}$

where

$S(t) = \frac{1}{L - t + 1}$
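The motion update for the time states of one fixed location H can be sketched in Python as follows. The dictionary representation and the string labels for α and ω are illustrative choices, and the value of L used in any call is up to the caller since the slides do not fix it.

```python
ALPHA, OMEGA = "alpha", "omega"   # not yet started / finished

def start_probability(t, n_obs):
    """S(t) = 1 / (L - t + 1), with L = n_obs observed instances."""
    return 1.0 / (n_obs - t + 1)

def advance_plan_time(prob, t, n_obs, plan_length):
    """Map P((H, tau), t - 1) to P^-((H, tau), t) for one location H.

    prob is a dict keyed by ALPHA, 1..plan_length, and OMEGA.
    """
    s = start_probability(t, n_obs)
    new = {
        ALPHA: (1 - s) * prob[ALPHA],   # plan still has not started
        1: s * prob[ALPHA],             # plan starts now
    }
    for tau in range(2, plan_length + 1):
        new[tau] = prob[tau - 1]        # plan advances one step
    new[OMEGA] = prob[plan_length] + prob[OMEGA]   # plan finishes or stays finished
    return new
```

The update only moves probability between time states of the same location, so the total mass assigned to each H is unchanged.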

Page 25: Towards a Detection Theory For Intelligence Applications

Measurement Model

Let J be an evidence graph. The likelihood function is defined by

$L(J \mid (H, \tau)) = p_{ER}^{\,e(J \setminus H_k)} \; q_{ER}^{\,N - e(J \cup H_k)} \; p_{*}^{\,e(J \cap H_k)} \; q_{*}^{\,e(H_k \setminus J)}$

and

$L(J \mid \emptyset) = p_{ER}^{\,e(J)} \; q_{ER}^{\,N - e(J)}$

where q = 1 – p and q* = qER qV.
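The two likelihoods can be sketched in Python with graphs represented as sets of edges. The sketch assumes N is the number of possible edges, C(n, 2), and that H_k is the plan stage active at the state's plan time; both readings are inferred from the formulas rather than stated on the slide.

```python
from math import comb

def plan_likelihood(J, H_k, n_nodes, p_er, p_v):
    """L(J | (H, tau)) where H_k is the stage active at plan time tau."""
    N = comb(n_nodes, 2)              # assumed: number of possible edges
    q_er = 1 - p_er
    q_star = q_er * (1 - p_v)         # plan edge missed by both noise and observation
    p_star = 1 - q_star
    return (p_er ** len(J - H_k)          # noise-only edges that appeared
            * q_er ** (N - len(J | H_k))  # edges absent from both J and H_k
            * p_star ** len(J & H_k)      # plan edges that appeared
            * q_star ** len(H_k - J))     # plan edges that did not appear

def null_likelihood(J, n_nodes, p_er):
    """L(J | no plan): every edge is pure noise."""
    N = comb(n_nodes, 2)
    return p_er ** len(J) * (1 - p_er) ** (N - len(J))
```

Dividing the two gives a ratio that depends only on the edges of H_k: a factor p*/pER for each plan edge seen in J and a factor qV for each plan edge not seen.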

Page 26: Towards a Detection Theory For Intelligence Applications

Update Equation

Update the probability distribution by

$P(x, t) = \frac{1}{C} \, L(J \mid x) \, P^{-}(x, t)$

where

$C = \sum_{x \in \mathcal{X}} L(J \mid x) \, P^{-}(x, t)$

Note that x can be ∅ or (H′, τ′).
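The update step is a plain Bayes normalization over all states. A minimal Python sketch, using dictionaries keyed by state with `None` standing in for the null state ∅ (an illustrative convention):

```python
def bayes_update(prior, likelihood):
    """P(x, t) = L(J | x) * P^-(x, t) / C, with C summed over all states."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    C = sum(unnormalized.values())
    return {x: weight / C for x, weight in unnormalized.items()}

# Hypothetical three-state example: null state plus two plan states.
prior = {None: 0.5, "a": 0.25, "b": 0.25}
likelihood = {None: 1.0, "a": 4.0, "b": 1.0}
posterior = bayes_update(prior, likelihood)
```

Because only the product of likelihood and prior matters up to the constant C, likelihood ratios against the null state can be used in place of raw likelihoods, which avoids underflow on large graphs.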

Page 27: Towards a Detection Theory For Intelligence Applications


Example: Plan starts at time t = 25

Embedded movie removed

Page 28: Towards a Detection Theory For Intelligence Applications

Summary: Tracking Plans in Networks

Proof of concept: LRDT can be extended to transactional domain

Can be generalized, e.g.:
– May allow “tips” about possible subterfuge
– May allow attributed nodes and links
– Use more complicated network and plan models
– May include a clutter model

Our best attempt to create a rule-based method to “score” nodes based on links and tips observed faltered
– E.g. 6 malefactors identified, 1 of which is correct
– Misled by tips

Difficulty: number of states grows quickly

Page 29: Towards a Detection Theory For Intelligence Applications

3. Hierarchical Hypothesis Management

Number of hypotheses in previous example: 5 × 10^5
– Numerically feasible to compute exactly

For larger problems:
– Number of hypotheses rapidly increases
– Impossible to maintain all hypotheses

Solution:
– Group detailed hypotheses into successively coarser ones
– Maintain probabilities on coarser hypotheses
  • High probability coarse hypotheses get resolved to finer levels
  • Low probability hypotheses perish

Example:
– Coarse hypothesis: “This edge is a member of the sought pattern”
– Finer hypothesis: “This set of edges is a subset of the pattern”
– Finest hypothesis: “This set of edges is the sought pattern”

Page 30: Towards a Detection Theory For Intelligence Applications

Feature Selection

As features become larger…
– they serve better to distinguish target from noise;
– computational intensity increases.

[Figure: feature levels k = 1, 2, 3, …, 11, ranging from "easy calculation, poor discrimination" at small k to "good discrimination, hard calculation" at large k]

Page 31: Towards a Detection Theory For Intelligence Applications

Example of HHM

[Figure: pattern H]

A larger problem: Find a given pattern H
– inserted 20% of the time
– in a fixed, unknown location (out of 5.2 × 10^17 possible locations)
– in 125 instances of random noise on 100 nodes.

Too hard for direct solution: in each instance, 160 trillion random copies of the pattern H obscure the real one.

HHM algorithm recursively prunes hypothesis space until feasible to detect the target. Final output is the configuration of the target pattern in the data.

Page 32: Towards a Detection Theory For Intelligence Applications

HHM in Action

[Figure: four histograms of piece counts, one per level (horizontal axes 0 to 120), with the retained top pieces marked]

1-edge pieces: Counts of all 4950 edges. Pick top 719 edges out of 4950.

2-edge pieces: In these 719 are 10222 2-edge pieces. Pick top 220 of these.

3-edge pieces: Among edges in top 220 2-edge pieces are 2553 3-edge pieces. Pick top 28 of these.

4-edge pieces: Among edges in top 28 3-edge pieces, there are 145 4-edge pieces. The edges in top 17 of these form the sought pattern. Pattern found!
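The level-by-level pruning can be sketched in Python as below. This is not the authors' implementation: it assumes a piece is scored by the number of instances that contain all of its edges, that pieces are connected edge sets grown one edge at a time from the edges surviving the previous level, and that a fixed number of pieces is kept per level rather than a probabilistic threshold.

```python
def piece_count(piece, instances):
    """Number of observed graphs containing every edge of the piece."""
    return sum(1 for J in instances if piece <= J)

def top_pieces(candidates, instances, keep):
    """Keep the highest-count pieces; ties are broken deterministically."""
    ranked = sorted(
        candidates,
        key=lambda pc: (-piece_count(pc, instances), sorted(map(sorted, pc))),
    )
    return ranked[:keep]

def grow(pieces):
    """Connected (k+1)-edge pieces built from edges in the kept k-edge pieces."""
    pool = set().union(*pieces) if pieces else set()
    grown = set()
    for piece in pieces:
        vertices = set().union(*piece)
        for edge in pool - piece:
            if edge & vertices:            # new edge must touch the piece
                grown.add(piece | {edge})
    return grown

def hhm(instances, keep_per_level):
    """Prune 1-edge, 2-edge, ... pieces level by level; return the survivors."""
    all_edges = set().union(*instances)
    singles = {frozenset([e]) for e in all_edges}
    pieces = top_pieces(singles, instances, keep_per_level[0])
    for keep in keep_per_level[1:]:
        pieces = top_pieces(grow(pieces), instances, keep)
    return pieces

# Hypothetical toy data: a 4-edge path present in every instance,
# plus one unrelated noise edge per instance.
def edge(u, v):
    return frozenset((u, v))

target = frozenset([edge(0, 1), edge(1, 2), edge(2, 3), edge(3, 4)])
instances = [set(target) | {edge(10 + i, 20 + i)} for i in range(6)]
found = hhm(instances, keep_per_level=[4, 3, 2, 1])
```

In the slide's run the analogous keep sizes are 719, 220, 28, and 17; here they are hand-picked for the toy data, which is exactly the threshold-setting question listed as a research area.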


Page 34: Towards a Detection Theory For Intelligence Applications

Summary: HHM

LRDT approaches can have enormous state spaces

In the classical domain, multigrid and particle methods have been developed to tame this problem
– Both of these rely on the existence of an underlying metric space

In the intelligence domain, state spaces often lack a metric structure, so new technology is needed

States can often be grouped into natural, user-defined hierarchies, which are then amenable to HHM

Key research areas:
– Optimal threshold setting, e.g. better than Monte Carlo
– Feature selection, e.g. connected k-sets are more discriminating than arbitrary k-sets.

Page 35: Towards a Detection Theory For Intelligence Applications

4. References

J. Ferry and D. Lo, “Fusing Transactional Data to Detect Threat Patterns,” Proc. 9th International Conference on Information Fusion, Florence, Italy, July 2006.

G. Godfrey, J. Cunningham and T. Tran, “A Bayesian, nonlinear particle filtering approach for tracking the state of terrorist operations,” Proceedings of the Military Applications Society Conference on Homeland Security in the 21st Century, Mystic CT, July 2006.

T. Mifflin, C. Boner and G. Godfrey, “Detecting Terrorist Activities in the 21st Century: A theory of detection for transactional networks,” Emergent Information Technologies and Enabling Policies for Counter-Terrorism, eds. R. Popp and J. Yen, Wiley IEEE, June 2006.

C. Boner, “Novel, Complementary Technologies for Detecting Threat Activities within Massive Amounts of Transactional Data,” Proceedings of the International Conference on Intelligence Analysis, Tysons Corner, May 2005.

C. Boner, “Automated Detection of Terrorist Activities through Link Discovery within Massive Databases,” Proceedings of the AAAI Spring Symposium on AI Technologies for Homeland Security, Palo Alto, March 2005.

T. Mifflin, C. Boner, G. Godfrey and J. Skokan, “A random graph model of terrorist transactions,” Proceedings of the IEEE Aerospace Conference, Big Sky, March 2004.