Towards a Detection Theory For Intelligence Applications


Stephen Ahearn, Jim Ferry, Darren Lo, Aaron Phillips

http://www.metsci.com

2

Motivation

Classical detection theory has been developed and applied for decades to detect and track stealthy targets
– Submarines, aircraft, missiles, …
– Using a variety of sensors: sonar, radar, IR, …
– Gives the U.S. a distinct advantage over adversaries

We seek to develop an analogous theory of detection on networks
– To detect and track threat networks and activities that cannot be observed directly
– Exploiting diverse data: SIGINT, HUMINT, IMINT, …
– To maintain our advantage over today's adversary and win the GWOT

3

Classical Detection Theory Applications

[Figure: classical detection-theory applications, e.g., periscope detection in the presence of wind and current, overlaid with the LRDT update equations shown on slide 6 and labeled "Detection Theory".]

Detection theory enables detection and tracking of stealthy targets
– Such targets cannot be detected by analyzing sensor reports separately
– Only through principled data fusion does the signal stand out from noise
– The probabilistic framework correctly manages uncertainty and risk

4

Vast Sea of Data / Data of Interest

[Figure: link chart of the four 9/11 flights, AA #77 (Pentagon), AA #11 (WTC North), UA #93 (Pennsylvania), UA #175 (WTC South), and the hijackers connecting them: Mohamed Atta, Hani Hanjour, Khalid Almihdhar, Nawaf Alhazmi, Salem Alhazmi, Majed Moqed, Ahmed Alnami, Saeed Alghamdi, Hamza Alghamdi, Ahmed Alghamdi, Abdul Aziz Alomari. Adapted from "Connecting the Dots -- Tracking Two Identified Terrorists," Valdis Krebs, 2001.]

Detection Theory for Intelligence Applications

[Diagram: intelligence sources (COMINT, HUMINT, ELINT, IMINT, MASINT) feed signal processing and data collection; a detection-theory layer (the LRDT equations shown on slide 6) then drives data transformation, detecting and tracking networks, detecting and tracking activities, and assessing threat levels and intent.]

5

Approach: Two well-established theories

Likelihood Ratio Detection and Tracking
– Explicitly models noise and signal
– Principled Bayesian framework for managing uncertainty
– Used for decades for tracking stealthy kinematic targets
– Traditional domain has metric structure

Random graph theory
– Models transactional domain
– Discrete structure
– Rich mathematics

6

Likelihood Ratio Detection and Tracking (LRDT)

Requirements
– State space
– Measurement model L(z | x) (or its likelihood-ratio form ℒ(z | x) = L(z | x) / L(z | ∅))
– Motion model P_T(x_t | x_{t−Δt})

Result
– Update equations, probability form: P(x_t)
– Update equations, likelihood-ratio form: Λ(x_t) = P(x_t) / P(∅)

State space (target states plus the no-target state):

$$\mathcal{X} = X \cup \{\varnothing\}$$

Prediction and update, probability form:

$$P^-(x_t) = \int P_T(x_t \mid x_{t-\Delta t})\,P(x_{t-\Delta t})\,dx_{t-\Delta t}, \qquad P(x_t) = \frac{1}{C}\,L(z_t \mid x_t)\,P^-(x_t)$$

Prediction and update, likelihood-ratio form:

$$\Lambda^-(x_t) = \int P_T(x_t \mid x_{t-\Delta t})\,\Lambda(x_{t-\Delta t})\,dx_{t-\Delta t}, \qquad \Lambda(x_t) = \mathcal{L}(z_t \mid x_t)\,\Lambda^-(x_t)$$
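To make the recursion concrete, here is a minimal Python sketch of one predict/update cycle on a discrete state space, where the integrals above become sums. The motion matrix and measurement likelihood ratios are toy assumptions for illustration, not values from the slides.

```python
# One LRDT predict/update cycle on a discrete state space.
# motion[i][j] plays the role of P_T(x_t = j | x_{t-dt} = i), and
# meas_lr[j] the role of the measurement likelihood ratio L(z_t | x_j).
def lrdt_step(lam, motion, meas_lr):
    n = len(lam)
    # Predict: Lambda^-(x_t) = sum over x' of P_T(x_t | x') * Lambda(x')
    lam_pred = [sum(motion[i][j] * lam[i] for i in range(n)) for j in range(n)]
    # Update: Lambda(x_t) = L(z_t | x_t) * Lambda^-(x_t)
    return [meas_lr[j] * lam_pred[j] for j in range(n)]

# Toy usage: two target states; values above 1 favor "target present".
lam = lrdt_step([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [3.0, 0.2])
```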

7

Classical Likelihood Ratio Detection and Tracking (LRDT)

[Figure: radar return data over time and distance from radar (measurement likelihood-ratio surfaces), next to the cumulative likelihood-ratio surface built up by fusing the data with the motion model. Peak value over the target track ~ 10⁷; second-highest peak ~ 10.]

Motion model describes movement of the periscope
– Fusion over time smooths out random fluctuations from noise and clutter
– LR peaks accumulate on movement that fits the motion model (i.e., the periscope)
– LR peaks dissipate for structures that move in other ways (e.g., waves)

8

LRDT on Networks: a Simple Example

[Figure: posterior snapshots at t = 1, 5, 10, 15, 20, 25, 30, 35, shaded from P(x) = 0 to P(x) = 1, with the ground-truth position of the cell highlighted in purple.]

State space: all 1140 possible triangles (i.e., "terrorist cells")
Measurement model:
– Signal: the cell appears with probability 0.8
– Noise: each possible edge appears independently with probability 0.3
Motion model: the cell swaps out a member with probability 0.1 (a runnable sketch follows below)
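This example is small enough to run exactly. Below is a sketch, under the stated models, of one tracking step over all 1140 candidate cells on n = 20 nodes; the closed-form measurement ratio follows because an active cell forces its three edges to appear. The ground-truth cell (0, 1, 2) is an arbitrary choice for illustration.

```python
import itertools, random

N, P_NOISE, P_SIG, P_SWAP = 20, 0.3, 0.8, 0.1
CELLS = list(itertools.combinations(range(N), 3))      # all C(20,3) = 1140 cells

def cell_edges(c):
    return set(itertools.combinations(c, 2))           # the cell's 3 edges

def meas_lr(c, J):
    # L(J | c) / L(J | no cell): an active cell (prob 0.8) forces its three
    # edges to appear, giving the ratio 0.2 + 0.8 * [edges present] / p^3.
    return (1 - P_SIG) + P_SIG * (cell_edges(c) <= J) / P_NOISE**3

def motion(prob):
    # Cell swaps out one member with prob 0.1, uniformly over the 3*17 swaps.
    out = {c: (1 - P_SWAP) * pr for c, pr in prob.items()}
    for c, pr in prob.items():
        share = P_SWAP * pr / (3 * (N - 3))
        for old in c:
            for new in set(range(N)) - set(c):
                out[tuple(sorted(set(c) - {old} | {new}))] += share
    return out

# One simulated time step: noise edges, maybe the true cell, then the update.
truth = (0, 1, 2)
J = {e for e in itertools.combinations(range(N), 2) if random.random() < P_NOISE}
if random.random() < P_SIG:
    J |= cell_edges(truth)
lam = {c: 1.0 for c in CELLS}                          # flat prior over cells
lam = {c: meas_lr(c, J) * v for c, v in motion(lam).items()}
```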

9

Overview

1. Graph-theoretic Underpinnings
– Graph-theoretic analogues of noise and signal

2. Tracking Plans in Networks
– Extension of LRDT to transactional domain

3. Hierarchical Hypothesis Management
– Novel methodologies required to mitigate combinatorial explosion of state space

4. References

10

1. Graph-theoretic Underpinnings

Erdős-Rényi Random Graph Model G(n,p)
– Provides noise model for detection problem
– Simplest, most tractable random graph model

Inserted subgraph problem
– Provides signal model for detection problem

Likelihood ratio
– Optimal decision statistic for detection of inserted subgraph

Distribution of subgraph count

Other random graph models?

11

Erdős-Rényi Random Graph Model G(n,p)

The notation G(n,p) denotes a random graph…
– on n vertices
– with each edge appearing independently with probability p

Very simple noise process model
– No correlation structure
– Well-studied, but still yields difficult problems
– First case to explore before moving on to more realistic network models (Random Collaboration, Geometric, Gaussian, etc.)

[Figure: an instance of G(n,p) for n = 6, p = 0.5.]
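Sampling from G(n,p) is a one-liner; a minimal sketch using only the standard library:

```python
import itertools, random

def sample_gnp(n, p):
    # Draw an instance of G(n, p): each of the C(n, 2) possible edges
    # appears independently with probability p.
    return {e for e in itertools.combinations(range(n), 2)
            if random.random() < p}

g = sample_gnp(6, 0.5)   # an instance like the one pictured above
```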

12

Inserted Subgraph Problem

Evidence graph J
– Background noise: Erdős-Rényi random graph G(n,p)
– Target graph H may or may not be inserted somewhere

Binary decision problem: Is H present or not?

Neyman–Pearson lemma: the likelihood ratio

$$\Lambda_H(J) = \frac{P(J \mid H \text{ present})}{P(J \mid H \text{ not present})}$$

is the optimal decision statistic, i.e., yields the highest probability of detection for a given false alarm rate.

Theorem [Mifflin, Boner]:

$$\Lambda_H(J) = \frac{X_H(J)}{\mathbb{E}[X_H]} = \frac{\#\text{ copies of } H \text{ in } J}{\#\text{ copies of } H \text{ expected just from noise}}$$

13

Inserted Subgraph Problem

[Figure: a noise process beside a signal + noise process over entities (people, companies, ports, groups, etc.), with the target graph H highlighted; H could represent four shipments of precursor items to four distinct, but linked, entities. The likelihood ratio Λ_H(J) = P(J | signal + noise) / P(J | noise) is the optimal decision statistic.]

14

Likelihood Ratio Calculations

Example 1: n = 100, p = 0.07, X_H(J) = 200
– Are the 200 copies of H likely to have arisen by chance?
– Answer:

$$\Lambda_H(J) = \frac{X_H(J)}{\mathbb{E}[X_H]} = \frac{200}{883.1} = 0.226 < 1 \quad\text{: probably just noise}$$

Example 2: n = 1000, p = 0.007, X_H(J) = 2000
– Are the 2000 copies of H likely to have arisen by chance?
– Answer:

$$\Lambda_H(J) = \frac{X_H(J)}{\mathbb{E}[X_H]} = \frac{2000}{11.4} = 174.8 > 1 \quad\text{: probably contains target(s)}$$

Need information about the distribution of X_H to set thresholds and establish performance boundaries!
– Expected value of X_H is easy.
– But that's all that's easy about it!

H = [the pattern graph shown on the slide]
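The decision statistic itself is just the observed count over the expected count; a trivial helper reproducing the two examples (the expected counts 883.1 and 11.4 are taken from the slide):

```python
def subgraph_count_lr(observed, expected_from_noise):
    # Mifflin-Boner statistic: Lambda_H(J) = X_H(J) / E[X_H]
    return observed / expected_from_noise

subgraph_count_lr(200, 883.1)   # ~0.23 < 1: probably just noise
subgraph_count_lr(2000, 11.4)   # ~175 > 1: probably contains target(s)
```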

15

Expected Value of Subgraph Count X_H

E[X_H]: average # of copies of H in an instance of G(n,p)
– Some preliminary notation:
• v(H) = number of vertices of H
• e(H) = number of edges of H
• |Aut(H)| = number of automorphisms of H
– Simple formula for E[X_H] (Erdős):

$$\mathbb{E}[X_H] = \underbrace{\binom{n}{v(H)}}_{\substack{\#\text{ of choices for}\\ \text{vertex set of } H}} \; \underbrace{\frac{v(H)!}{|\operatorname{Aut}(H)|}}_{\substack{\#\text{ of arrangements of } H\\ \text{on this vertex set}}} \; \underbrace{p^{e(H)}}_{\substack{\text{probability of all } e(H)\\ \text{edges of } H \text{ appearing}}}$$

Example: H = a triangle with a pendant edge, so v(H) = 4, e(H) = 4, |Aut(H)| = 2:

$$\mathbb{E}[X_H] = \binom{n}{4}\,\frac{4!}{2}\,p^4 = 180\,p^4 \quad \text{for } n = 6$$
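The Erdős formula is directly computable; a short sketch that reproduces the example's 180 p⁴:

```python
from math import comb, factorial

def expected_subgraph_count(n, p, v_h, e_h, aut_h):
    # E[X_H] = C(n, v(H)) * v(H)!/|Aut(H)| * p^e(H)
    return comb(n, v_h) * factorial(v_h) // aut_h * p**e_h

expected_subgraph_count(6, 0.5, 4, 4, 2)   # = 180 * 0.5**4 = 11.25
```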

16

Distribution of Subgraph Count X_H

Example: distribution of X_H for n = 6, p = 0.5
– Minimum possible value = 0 (probability = 18.1%)
– Maximum possible value = 180 (probability = 0.003%)
– Mean given by the Erdős formula:

$$\mathbb{E}[X_H] = 180\,p^4 = 11.25$$

– Variance:

$$\operatorname{var}[X_H] = 180\,p^4\,(1-p)\left(1 + 13p + 50p^2 + 128p^3\right) = 202.5,$$

a standard deviation of approximately 14.2.

[Figure: an instance of G(6, 0.5) containing 8 copies of H, alongside the probability histogram of X_H (values 0, 1, 2, …; probabilities up to roughly 0.18 at X_H = 0; average = 11.25). For example, Pr(exactly 8 copies of H) = 4.67% for p = 0.5.]
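The full distribution is hard analytically but easy to approximate by Monte Carlo for this small case. The sketch below counts copies of H (a triangle with a pendant edge, per the reconstruction above) in repeated draws of G(6, 0.5); the sample mean and standard deviation should land near the slide's 11.25 and roughly 14.

```python
import itertools, random, statistics

def sample_gnp_edges(n, p):
    return {frozenset(e) for e in itertools.combinations(range(n), 2)
            if random.random() < p}

def count_paws(g, n=6):
    # Each copy of H is a triangle plus one pendant edge (u, d), d outside it.
    total = 0
    for tri in itertools.combinations(range(n), 3):
        if all(frozenset(e) in g for e in itertools.combinations(tri, 2)):
            for u in tri:
                total += sum(1 for d in range(n)
                             if d not in tri and frozenset((u, d)) in g)
    return total

counts = [count_paws(sample_gnp_edges(6, 0.5)) for _ in range(20000)]
print(statistics.mean(counts), statistics.stdev(counts))  # ~11.25, ~14
```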

17

Asymptotic Estimate of Variance

Theorem [Ferry]:
– Decompose H into a core cr(H) and rooted trees T_i
– Color each node i of the core by the isomorphism class of T_i
– Then

$$\frac{\operatorname{var}[X_H]}{\mathbb{E}[X_H]} = \sum_{\pi \in G}\; \prod_{i \in V(\operatorname{cr}(H))} B\big(T_i, T_{\pi(i)}; d\big) \;+\; O(n^{-1}),$$

where

$$G = \operatorname{Aut}\big(\operatorname{cr}(H)\big) \big/ \operatorname{Aut}_\chi\big(\operatorname{cr}(H)\big)$$

and B is computed by an exact recursive formula.

New result for the study of the distribution of X_H
– Bollobás, 1981 only applies to strictly balanced graphs.
– Ruciński, 1988 does not estimate variance.
– Janson, Łuczak, Ruciński give a much worse estimate.

[Figure: an example H with its core cr(H), and the pairings of rooted trees whose factors B(·, ·; d) enter the product.]

18

Asymptotic Estimate of Variance

With the example H (v(H) = 31, e(H) = 34, |Aut(H)| = 768), for n = 10⁶ and mean degree d = 10 the formula yields

$$\mathbb{E}[X_H] = \binom{n}{v(H)}\,\frac{v(H)!}{|\operatorname{Aut}(H)|}\left(\frac{d}{n}\right)^{e(H)} = \binom{10^6}{31}\,\frac{31!}{768}\left(\frac{10}{10^6}\right)^{34} \approx 1.3 \times 10^{13}$$

$$\operatorname{var}[X_H] = \nu[X_H]\;\mathbb{E}[X_H] = 5.4 \times 10^{18}\;\mathbb{E}[X_H],$$

where the ratio ν[X_H] = var[X_H] / E[X_H] is given exactly by a polynomial in d (leading coefficients 1, 192, 2304, 12960, 47008, 127120, …, scaled by 1/192).

[Figure: the 31-vertex example graph H, and a log-scale plot of ν[X_H] against mean degree d for 0 ≤ d ≤ 10.]

19

2. Tracking Plans in Networks

Extension of LRDT to transactional domain
– Noise model: sequence of independent instances of G(n,p)
– Signal model: pattern of inserted subgraphs

Susceptible to combinatorial explosion

20

Noise Model

Classic Erdős-Rényi random graph model G(n, p)
– n = 30
– p = 0.3

Observe L instances J_1, J_2, …, J_L

[Figure: an instance of G(30, 0.3).]

21

Signal Model

Insert a sequence of graphs H_1, H_2, …, H_m into some fixed, but unknown, location.
Each H_k appears for τ_k time steps.
Each edge has a probability p_V of being observed.
Total plan length: T = 40

[Figure: a five-stage plan around a Leader node: H_1 (τ_1 = 10), H_2 (τ_2 = 5), H_3 (τ_3 = 10), H_4 (τ_4 = 5), H_5 (τ_5 = 10).]

22

State Space

Must track location and internal plan time τ.

Possible time states: 1 ≤ τ ≤ T, τ = α, or τ = ω
– α indicates the plan has not yet started.
– ω indicates the plan has finished.

State space:

$$\mathcal{X} = \big\{ (H, \tau) \big\} \cup \{\varnothing\},$$

where ∅ indicates no plan is present. Use a diffuse prior on the state space.

23

Size of the State Space

There are

$$\binom{30}{3}\,\frac{3!}{2} = 12{,}180 \approx 10^4$$

possible locations.

Possible time states: 1 ≤ τ ≤ 40, τ = α, τ = ω; there are 42 possible time states in all.

The state space therefore consists of

$$(12{,}180)(42) + 1 = 511{,}561 \approx 5 \times 10^5$$

states (a quick check appears below).
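A quick arithmetic check of the state-space size, assuming (as reconstructed above) 3!/|Aut(H)| = 3 placements per chosen vertex triple:

```python
from math import comb

locations = comb(30, 3) * 3          # 4060 * 3 = 12,180 possible locations
time_states = 40 + 2                 # tau in {1, ..., 40} plus alpha, omega
print(locations * time_states + 1)   # 511,561; the +1 is the no-plan state
```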

24

Motion Model

Advance the probability of each state as follows:

$$P\big((H,\tau),\,t\big) = \begin{cases} \big(1 - S(t)\big)\,P\big((H,\alpha),\,t-1\big) & \text{if } \tau = \alpha, \\ S(t)\,P\big((H,\alpha),\,t-1\big) & \text{if } \tau = 1, \\ P\big((H,\tau-1),\,t-1\big) & \text{if } \tau = 2, \ldots, T, \\ P\big((H,T),\,t-1\big) + P\big((H,\omega),\,t-1\big) & \text{if } \tau = \omega, \end{cases}$$

where

$$S(t) = \frac{1}{L - t + 1}.$$
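A sketch of this motion step, with states keyed as (H, tau) and the no-plan state represented as None; T = 40 is the slide's plan length and L the number of observation instances. Everything else follows the cases above.

```python
def advance(P, t, L, T=40):
    # One motion-model step. S(t) = 1/(L - t + 1) is the chance the plan
    # starts at time t, given that it has not started yet.
    S = 1.0 / (L - t + 1)
    out = {}
    for x, pr in P.items():
        if x is None:                      # no-plan state: untouched
            out[x] = pr
            continue
        H, tau = x
        if tau == 'alpha':                 # still waiting to start
            out[x] = (1 - S) * P[(H, 'alpha')]
        elif tau == 1:                     # plan starts now
            out[x] = S * P[(H, 'alpha')]
        elif tau == 'omega':               # finished plans stay finished
            out[x] = P[(H, T)] + P[(H, 'omega')]
        else:                              # tau = 2, ..., T: advance the clock
            out[x] = P[(H, tau - 1)]
    return out
```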

25

Measurement Model

Let J be an evidence graph. The likelihood function is defined by

$$L\big(J \mid (H_k, \tau)\big) = p_{ER}^{\,e(J \setminus H_k)}\; q_{ER}^{\,N - e(J \cup H_k)}\; p_*^{\,e(J \cap H_k)}\; q_*^{\,e(H_k \setminus J)}$$

and

$$L(J \mid \varnothing) = p_{ER}^{\,e(J)}\; q_{ER}^{\,N - e(J)},$$

where q = 1 − p, q* = q_ER q_V, p* = 1 − q*, and N is the total number of possible edges.

26

Update Equation

Update the probability distribution by

$$P(x, t) = \frac{1}{C}\,L(J \mid x)\,P(x, t^-),$$

where

$$C = \sum_{x \in \mathcal{X}} L(J \mid x)\,P(x, t^-).$$

Note that x can be (H′, τ′) or ∅.
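A sketch combining the measurement model and the Bayes update. States are None (no plan) or (H, tau); `active_edges(x)` is an assumed helper, not from the slides, mapping a state to the edge set of its currently active stage H_k (or None when no edges are expected, e.g., tau = alpha or omega); N is the number of possible edges.

```python
def likelihood(J, edges, N, p_er, p_v):
    # L(J | x): J is the observed edge set; `edges` is the active stage's
    # edge set, or None for noise-only states.
    q_er = 1 - p_er
    if edges is None:
        return p_er**len(J) * q_er**(N - len(J))
    q_star = q_er * (1 - p_v)              # q* = q_ER * q_V
    p_star = 1 - q_star
    return (p_er**len(J - edges) * q_er**(N - len(J | edges))
            * p_star**len(J & edges) * q_star**len(edges - J))

def update(P, J, N, p_er, p_v, active_edges):
    # Bayes update P(x, t) = C^{-1} L(J | x) P(x, t-).
    w = {x: likelihood(J, active_edges(x), N, p_er, p_v) * pr
         for x, pr in P.items()}
    C = sum(w.values())
    return {x: v / C for x, v in w.items()}
```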

27

Example: Plan starts at time t = 25

Embedded movie removed

28

Summary: Tracking Plans in Networks

Proof of concept: LRDT can be extended to the transactional domain

Can be generalized, e.g.:
– May allow "tips" about possible subterfuge
– May allow attributed nodes and links
– Use more complicated network and plan models
– May include a clutter model

Our best attempt to create a rule-based method to "score" nodes based on observed links and tips faltered
– E.g., 6 malefactors identified, 1 of which is correct
– Misled by tips

Difficulty: number of states grows quickly

29

3. Hierarchical Hypothesis Management

Number of hypotheses in previous example: 5 × 10⁵
– Numerically feasible to compute exactly

For larger problems:
– Number of hypotheses rapidly increases
– Impossible to maintain all hypotheses

Solution:
– Group detailed hypotheses into successively coarser ones
– Maintain probabilities on coarser hypotheses
• High-probability coarse hypotheses get resolved to finer levels
• Low-probability hypotheses perish

Example:
– Coarse hypothesis: "This edge is a member of the sought pattern"
– Finer hypothesis: "This set of edges is a subset of the pattern"
– Finest hypothesis: "This set of edges is the sought pattern"

30

Feature Selection

As features become larger…
– they serve better to distinguish target from noise;
– computational intensity increases.

[Diagram: hierarchy levels k = 1, 2, 3, …, 11, running from easy calculation, poor discrimination (small k) to good discrimination, hard calculation (large k).]

31

Example of HHM

A larger problem: find a given pattern H
– inserted 20% of the time
– in a fixed, unknown location (out of 5.2 × 10¹⁷ possible locations)
– in 125 instances of random noise on 100 nodes.

Too hard for direct solution: in each instance, 160 trillion random copies of the pattern H obscure the real one.

The HHM algorithm recursively prunes the hypothesis space until it is feasible to detect the target. The final output is the configuration of the target pattern in the data.

H = [the pattern graph shown on the slide]

32

HHM in Action

[Figure: four histograms over feature scores (axes 0–120): counts of all 4950 edges, the top 719 edges, 2-edge-piece counts, and the top 220 2-edge pieces.]

1-edge pieces: pick the top 719 edges out of 4950.

2-edge pieces: in these 719 edges are 10,222 2-edge pieces; pick the top 220 of these.

3-edge pieces: among edges in the top 220 2-edge pieces are 2553 3-edge pieces; pick the top 28 of these.

4-edge pieces: among edges in the top 28 3-edge pieces are 145 4-edge pieces. The edges in the top 17 of these form the sought pattern. Pattern found! (A sketch of this pruning cascade follows.)
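A compact sketch of the pruning cascade above. The score (how many instances contain the whole piece) is a simple stand-in for a piece's cumulative likelihood ratio, and arbitrary k-sets are used for brevity; the slides' version scores connected pieces, which keeps candidate counts far smaller (10,222 two-edge pieces rather than all pairs of 719 edges).

```python
import heapq, itertools

def hhm_prune(instances, all_edges, keep_schedule):
    # Score k-edge pieces across evidence instances, keep the best, and
    # grow (k+1)-edge pieces only from the surviving edges.
    def score(piece):
        return sum(all(e in inst for e in piece) for inst in instances)

    survivors, top = list(all_edges), []
    for k, keep in enumerate(keep_schedule, start=1):
        candidates = (frozenset(c)
                      for c in itertools.combinations(survivors, k))
        top = heapq.nlargest(keep, candidates, key=score)
        survivors = sorted({e for piece in top for e in piece})
    return top   # e.g., keep_schedule = [719, 220, 28, 17] mirrors this slide
```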

33

HHM in Action

Embedded movie removed

34

Summary: HHM

LRDT approaches can have enormous state spaces

In the classical domain, multigrid and particle methods have been developed to tame this problem
– Both of these rely on the existence of an underlying metric space

In the intelligence domain, state spaces often lack a metric structure, so new technology is needed

States can often be grouped into natural, user-defined hierarchies, which are then amenable to HHM

Key research areas:
– Optimal threshold setting, e.g., better than Monte Carlo
– Feature selection, e.g., connected k-sets are more discriminating than arbitrary k-sets

35

4. References

J. Ferry and D. Lo, "Fusing Transactional Data to Detect Threat Patterns," Proc. 9th International Conference on Information Fusion, Florence, Italy, July 2006.

G. Godfrey, J. Cunningham, and T. Tran, "A Bayesian, Nonlinear Particle Filtering Approach for Tracking the State of Terrorist Operations," Proc. Military Applications Society Conference on Homeland Security in the 21st Century, Mystic, CT, July 2006.

T. Mifflin, C. Boner, and G. Godfrey, "Detecting Terrorist Activities in the 21st Century: A Theory of Detection for Transactional Networks," in Emergent Information Technologies and Enabling Policies for Counter-Terrorism, eds. R. Popp and J. Yen, Wiley-IEEE, June 2006.

C. Boner, "Novel, Complementary Technologies for Detecting Threat Activities within Massive Amounts of Transactional Data," Proc. International Conference on Intelligence Analysis, Tysons Corner, VA, May 2005.

C. Boner, "Automated Detection of Terrorist Activities through Link Discovery within Massive Databases," Proc. AAAI Spring Symposium on AI Technologies for Homeland Security, Palo Alto, CA, March 2005.

T. Mifflin, C. Boner, G. Godfrey, and J. Skokan, "A Random Graph Model of Terrorist Transactions," Proc. IEEE Aerospace Conference, Big Sky, MT, March 2004.
