TOP SECRET//COMINT//REL TO USA, FVEY
SKYNET: Courier Detection via Machine Learning
R66F/JHU R66F
R66F* T1211 , T1211 S2I51
, S2I5/TD
June 5, 2012
Given a handful of courier selectors, can we find others that "behave similarly" by analyzing GSM metadata?
[Map: travel paths; labeled locations include Miram Shah (North Waziristan), Wana (South Waziristan), Faisalabad, and Lahore]
It's worth noting that:
• we are looking for different people using phones in similar ways
• without using any call chaining techniques from known selectors
• by scanning through all selectors seen in Pakistan that have not left Af/Pak (~55M)
From GSM metadata, we can measure aspects of each selector's and travel behavior
[Figure: travel paths map and a day-of-week chart (Sunday through Saturday)]
This presentation describes our search for AQSL couriers using behavioral profiling
Behavioral Feature Extraction
Cross Validation Experiment on AQSL Couriers
Preliminary SIGINT Findings
Counting unique UCELLIDs shows that couriers travel more often than typical Pakistani selectors
By examining multiple features at once, we can see some indicative behaviors of our courier selectors
[Figure: plot with axis "Average Daily Incoming Voice" (0 to 10)]
Looking at a hierarchical clustering derived from all 80 features, the AQSL groups mostly stay together
[Figure: hierarchical clustering dendrogram of IMSIs; leaf labels are "Gen Pop", "AQSL Remote Comms", and "AQSL Local Comms"; annotation: "4 days of Gen Pop collect"]
Now, we'll describe a cross validation experiment on the AQSL selectors that we were provided
Behavioral Feature Extraction
Cross Validation Experiment on AQSL Couriers
Our initial detector uses the centroid of the AQSL couriers to "find other selectors like these"
AQSL Cross-Validation Experiment
• 7 MSISDN/IMSI pairs
• Hold each pair out and score them when training the centroid on the rest
• Assume that random draws of Pakistani selectors are nontargets
• How well do we do?
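The hold-one-out procedure in the bullets above can be sketched as follows. This is a minimal illustration on synthetic numbers, not the method used in the deck: the Euclidean distance, the four-dimensional features, and every name here (centroid, distance, leave_one_out_scores) are assumptions, and the background vectors are scored against the centroid of all targets for simplicity.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def distance(a, b):
    """Euclidean distance; smaller means closer to the training centroid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leave_one_out_scores(targets):
    """Score each target against the centroid of the remaining targets."""
    scores = []
    for i, held_out in enumerate(targets):
        rest = targets[:i] + targets[i + 1:]
        scores.append(distance(held_out, centroid(rest)))
    return scores

# Synthetic stand-in data (assumed): 7 target vectors, 1000 background vectors.
rng = random.Random(0)
targets = [[rng.gauss(3.0, 1.0) for _ in range(4)] for _ in range(7)]
background = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1000)]

held_out_scores = leave_one_out_scores(targets)
full_centroid = centroid(targets)
background_scores = [distance(b, full_centroid) for b in background]
```

Comparing held_out_scores with background_scores at a range of thresholds gives the miss and false alarm rates that a tradeoff curve plots.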
[Figure: error tradeoff plot; x-axis: false alarm probability (%)]
Our initial detector uses the centroid of the AQSL couriers to "find other selectors like these"
AQSL Cross-Validation Experiment
• Initial experiments showed EER in 10-20% range
• Here, performance is much worse against these nontargets:
  • Seen in Pakistan
  • Not seen outside of Af/Pak
  • Not FVEY selectors
[Figure: error tradeoff plot; legend: Centroid(All Raw Features), Centroid(All Normalized Features), Centroid(Outgoing Raw Features), Centroid(Outgoing Normalized Features); x-axis: false alarm probability (%)]
Statistical algorithms are able to find the couriers at very low false alarm rates, if we're allowed to miss half of them
Random Forest Classifier
• 7 MSISDN/IMSI pairs
• Hold each pair out and then try to find them after learning how to distinguish remaining couriers from other Pakistanis (using 100k random selectors here)
• Assume that random draws of Pakistani selectors are nontargets
• 0.18% False Alarm Rate at 50% Miss Rate
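The "false alarm rate at 50% miss rate" metric can be sketched generically as below. This is an illustration on made-up scores: the function name, the convention that a higher score means more target-like, and the rounding of the miss count are all assumptions, since the deck does not say how the operating point was computed.

```python
def false_alarm_rate_at_miss_rate(target_scores, nontarget_scores, miss_rate=0.5):
    """False alarm rate at the threshold that misses about `miss_rate` of targets.

    Assumes higher score = more target-like; a sample is flagged when its
    score is >= the threshold.
    """
    ranked = sorted(target_scores, reverse=True)
    # Number of targets that must still be detected (at least one).
    n_detect = max(1, len(ranked) - int(miss_rate * len(ranked)))
    threshold = ranked[n_detect - 1]
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return false_alarms / len(nontarget_scores)

# Made-up scores: 7 held-out targets and 10 nontargets.
example_targets = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
example_nontargets = [0.95, 0.5, 0.35, 0.3, 0.2, 0.1, 0.05, 0.05, 0.01, 0.0]
example_rate = false_alarm_rate_at_miss_rate(example_targets, example_nontargets)
```

With only 7 targets an exact 50% miss rate is not reachable. This sketch misses 3 of the 7, so the estimate rests on very few examples.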
[Figure: error tradeoff plot; legend: Centroid(All Raw Features), Centroid(All Normalized Features), Centroid(Outgoing Raw Features), Centroid(Outgoing Normalized Features), Random Forest(All Raw Features), Random Forest(Outgoing Raw Features); x-axis: false alarm probability (%)]
We've been experimenting with several error metrics on both small and large test sets
Test sets: 100k Test Selectors; 55M Test Selectors

Training Data | Classifier | Features | False Alarm Rate at 50% Miss Rate | Mean Reciprocal Rank | Tasked Selectors in Top 500 | Tasked Selectors in Top 100
None | Random | None | 50% | 1/23k (simulated) | 0.64 (active/Pak) | 0.13 (active/Pak)
Known Couriers | Centroid | All | 20% | 1/18k | |
Known Couriers | Centroid | Outgoing | 43% | 1/27k | |
Known Couriers | Random Forest | Outgoing | 0.18% | 1/9.9 | 5 | 1
+ Anchory Selectors | Random Forest | Outgoing | | | |
Random Forest:
• 0.18% false alarm rate at 50% miss rate
• 7x improvement over random performance when evaluating its tasked precision at 100
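For reference, Mean Reciprocal Rank can be computed as in the sketch below. The deck does not say how ranks were assigned, so the ranking rule (one plus the number of nontargets that score strictly higher) and all names and scores are assumptions for illustration only.

```python
def rank_among(score, other_scores):
    """1-based rank of `score` when sorted descending with `other_scores`."""
    return 1 + sum(1 for s in other_scores if s > score)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluated items."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Made-up scores: three held-out items ranked against five nontargets.
mrr_nontargets = [0.9, 0.6, 0.4, 0.2, 0.1]
mrr_held_out = [0.95, 0.5, 0.3]
mrr_ranks = [rank_among(s, mrr_nontargets) for s in mrr_held_out]
mrr = mean_reciprocal_rank(mrr_ranks)
```

An MRR near 1/10 means the reciprocal ranks average about 0.1. A few items ranked very high can dominate that average, so it does not imply every item ranks near tenth.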
To get more training data we scraped selectors from S2I11 Anchory reports containing keyword "courier"
Anchory Selectors
• Searched for reports containing "S2I11" AND "courier"
• Filtered out non-mobile numbers and kept selectors with "interesting" travel patterns seen in SmartTracker
Adding selectors from Anchory reports to the training data reduced the false alarm rates even further
Anchory Selectors
• Searched for reports containing "S2I11" AND "courier"
• Filtered out non-mobile numbers and kept selectors with "interesting" travel patterns seen in SmartTracker
[Figure: error tradeoff plot; legend: Centroid(All Raw Features), Centroid(All Normalized Features), Centroid(Outgoing Raw Features), Centroid(Outgoing Normalized Features), Random Forest(All Raw Features), Random Forest(Outgoing Raw Features); x-axis: false alarm probability (%)]
We've been experimenting with several error metrics on both small and large test sets
Test sets: 100k Test Selectors; 55M Test Selectors

Training Data | Classifier | Features | False Alarm Rate at 50% Miss Rate | Mean Reciprocal Rank | Tasked Selectors in Top 500 | Tasked Selectors in Top 100
None | Random | None | 50% | 1/23k (simulated) | 0.64 (active/Pak) | 0.13 (active/Pak)
Known Couriers | Centroid | All | 20% | 1/18k | |
Known Couriers | Centroid | Outgoing | 43% | 1/27k | |
Known Couriers | Random Forest | Outgoing | 0.18% | 1/9.9 | 5 | 1
+ Anchory Selectors | Random Forest | Outgoing | 0.008% | 1/14 | 21 | 6
Random Forest trained on Known Couriers + Anchory Selectors:
• 0.008% false alarm rate at 50% miss rate
• 46x improvement over random performance when evaluating its tasked precision at 100
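The "improvement over random" figures appear to be the ratio of tasked selectors found in the top 100 to the random baseline of 0.13 from the table. The short check below reproduces that arithmetic. The function name is hypothetical, and reading the slide's figures as this ratio is an assumption.

```python
def precision_lift(tasked_in_top_k, random_expected_in_top_k):
    """Ratio of observed hits in the top k to the hits expected at random."""
    return tasked_in_top_k / random_expected_in_top_k

# Values taken from the table: random baseline 0.13 tasked selectors in the top 100.
lift_known = precision_lift(1, 0.13)    # Known Couriers: about 7.7
lift_anchory = precision_lift(6, 0.13)  # + Anchory Selectors: about 46.2
```

Both values match the quoted 7x and 46x once the fractional part is dropped. The underlying counts (1 and 6 selectors) are small, so the ratios are sensitive to a single additional hit.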
Now, we'll investigate some findings after running these classifiers on +55M Pakistani selectors via MapReduce
Preliminary SIGINT Findings
The highest scoring selector that traveled to Peshawar and Lahore is PROB AHMED ZAIDAN
[Map: travel paths]
• PROB AHMED MUWAFAK ZAIDAN
• MEMBER OF AL-QA'IDA
• MEMBER OF MUSLIM BROTHERHOOD
• WORKS FOR AL JAZEERA
In the top 500 scoring selectors, 21 are tasked, leading us to believe that we're on the right track
We have also discovered many untasked selectors with interesting travel patterns
Preliminary results indicate that we're on the right track, but much remains to be done
Cross Validation Experiment:
- Random Forest classifier operating at 0.18% false alarm rate at 50% miss
- Enhancing training data with Anchory selectors reduced that to 0.008%
- Mean Reciprocal Rank is ~1/10
Preliminary SIGINT Findings:
- Behavioral features helped discover similar selectors with "courier-like" travel patterns
- High number of tasked selectors at the top is hopefully indicative of the detector performing well "in the wild"