SMS WATCHDOG: PROFILING SOCIAL BEHAVIORS OF SMS USERS FOR ANOMALY DETECTION
Authors: Guanhua Yan, Stephan Eidenbenz, Emanuele Galli
Presented by: Ishtiaq Rouf
2
Overview of presentation
  Introduction to Short Message System (SMS)
    SMS architecture, tracing SMSs, SMS proxy
    Common threats to SMS systems, existing solutions
  Behavior analysis
    Statistically accurate metrics
  SMS Watchdog
    Detection types
  Performance analysis
    Accuracy and usefulness of protocol
3
An overview of the SMS architecture, SMS proxies, and common threats on SMS systems.
Short Message System
4
Short message system (SMS)
  SMS was introduced in the 1980s and has since become part of the fabric of our lives.
  Uses the signal paths needed to control telephony traffic. Not an intended use! Designed for emergencies only.
  More than 1 trillion SMSs are delivered each year, which makes SMS a lucrative target for attackers.
5
Threats to SMS systems
Common network attacks launched against SMS:
  Spamming: sending unsolicited messages
  Spoofing: falsely pretending to be a sender
  Phishing: trying to steal device information
6
Previously attempted solutions
  IP-based solutions:
    Signature-based detection schemes to examine mobile network traffic
    Power usage of mobile applications
    Machine-learning based approach to discriminate at the level of APIs
  Information-theoretical solutions:
    Analysis of message size distribution and service time distribution
    User clique analysis, similar to email spam protection
7
Limitations of traditional methods
  No determination of mobility
    Mobility of the malicious device is not considered
  One-size-fits-all solutions
    Attempting to use solutions that are not scaled for SMS
  Power requirements
    Solutions are not suitable for battery-operated devices
  Computational complexity
    Cellular phones have less computational ability compared to servers and workstations
8
Features of the proposed solution
  Apply a protection mechanism at the SMS Center
    Implemented at the server, where most control and information are available
  Collect usage data over five months to create a trace of usage
    Used to train a pattern recognition script
    An SMS proxy in Italy was used to collect data
  Four unique schemes used in combination
    A combination of four schemes is expected to work better than a single “silver bullet” solution
9
SMS Architecture
Alphabet soup:
  BSS – Base Station System
  SGSN – Serving GPRS Support Node
  GGSN – Gateway GPRS Support Node
  MSC – Mobile Switching Center
  SMSC – SMS Center (protection applied here)
10
An overview of statistical methods that can be useful in analyzing the trace of SMS users.
Behavior analysis
11
Trace analysis
  A “trace” of users was collected from the SMS proxy
  Interested in statistically time-invariant metrics
    Various statistical operations displayed different strengths
  Coefficient of variation (COV) is deemed a better metric than basic functions
    The ratio of the standard deviation to the mean: $\mathrm{COV} = \sigma / \mu$
  Entropy of the distributions was computed: $H = -\sum_i p_i \log p_i$
    $p_i$ is the fraction of SMSs sent to the i-th unique user
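
The two metrics on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the population standard deviation is an assumed choice, and base-2 logarithms are assumed because the slide does not state the base.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Ratio of the (population) standard deviation to the mean."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("COV is undefined when the mean is zero")
    return pstdev(values) / mu


def recipient_entropy(recipients):
    """Shannon entropy, in bits (assumed base 2), of the recipient distribution.

    `recipients` holds one entry per SMS sent; p_i is the fraction of
    messages addressed to the i-th unique recipient.
    """
    counts = Counter(recipients)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A user who splits messages evenly between two recipients has an entropy of 1 bit; a user who always writes to the same recipient has an entropy of 0.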
12
Usage analysis (1/4)
  Number of messages and unique senders/receivers per day over 5 months
  Usage increases as the number of users grows over time
13
Usage analysis (2/4)
  Average number of messages for persistent users (daily/weekly)
  Anomalous spikes make this metric unreliable
14
Usage analysis (3/4)
  Average number of receivers per persistent user (daily/weekly)
  Similar spike in usage observed
15
Usage analysis (4/4)
  Average entropies for persistent users (daily/weekly)
  Entropy is a better measure, but not a full solution
16
Window-based analysis
  High variation is inherent in many SMS users’ behaviors on a temporally periodic basis.
  A window-based approach can mitigate this and help bound the parameters better.
  Two parameters are selected, in particular:
    Number of blocks created in the dataset: 10 or 20 blocks
    Minimum number of SMSs sent by the users considered: 100 or 200 SMSs
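
A rough sketch of the window-based idea, under stated assumptions: a user's chronological recipient list is cut into equal-sized blocks, users with too few messages are skipped, and the COV of two per-block metrics is computed across blocks. The names `num_blocks` and `min_messages`, the handling of leftover messages, and the base-2 entropy are all illustrative choices, not details taken from the slides.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def split_into_blocks(recipients, num_blocks):
    """Split a chronological recipient list into num_blocks equal blocks.

    Trailing messages that do not fill a whole block are dropped (assumption).
    """
    size = len(recipients) // num_blocks
    if size == 0:
        raise ValueError("not enough messages for the requested number of blocks")
    return [recipients[i * size:(i + 1) * size] for i in range(num_blocks)]


def block_entropy(block):
    """Shannon entropy (bits) of the recipient distribution within one block."""
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cov(values):
    """Coefficient of variation; infinite when the mean is zero (assumption)."""
    mu = mean(values)
    return pstdev(values) / mu if mu else float("inf")


def window_cov(recipients, num_blocks=10, min_messages=100):
    """Return (COV of unique recipients, COV of entropy) across blocks.

    Returns None for users who sent fewer than min_messages SMSs.
    """
    if len(recipients) < min_messages:
        return None
    blocks = split_into_blocks(recipients, num_blocks)
    uniques = [len(set(b)) for b in blocks]
    entropies = [block_entropy(b) for b in blocks]
    return cov(uniques), cov(entropies)
```

For a perfectly regular user, every block looks the same and both COVs are zero; the closer the values are to zero, the more stable the metric is for that user.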
17
COV > 1 for window-based behaviors
Window-based behaviors of SMS users bear lower variation than their temporally periodic behaviors.
“COV > 1” means “high variation”: not useful for anomaly detection
18
Similarity measure The following equation is used to get the
recipient similarity metric:
Relative entropy is used as a comparison of distributions to determine similarity:
  Jensen-Shannon (JS) divergence used
  Unlike plain relative entropy, JS divergence is symmetric
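
For reference, the standard definition of JS divergence can be written as a short Python sketch. It assumes both distributions are given as lists over the same ordered set of recipients and uses base-2 logarithms, so the result lies between 0 and 1; the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Because each distribution is compared against the midpoint `m` rather than against the other distribution directly, swapping `p` and `q` gives the same value, and zero entries never cause a division by zero.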
19
COV > 1 for similarity measure Divergence analysis shows better
performance compared to previous metrics.
20
An overview of how SMS Watchdog is designed to make use of statistical analyses of behavioral patterns.
SMS Watchdog
21
Threat models
Two families of threats were considered:
  Blending attacks
    Occur when an SMS user’s account is used to send messages for a different person
    Examples: Trojan horse, spoofing, SMS proxy
  Broadcast attacks
    Mirror the behavior of mobile malware that sends out phishing or spamming messages
22
Workflow of SMS Watchdog
The proposed solution works in three steps:
  Monitoring
    Maintains a window size, h, for each user that has subscribed to this service
    Also keeps a count, k, of the number of SMSs sent
  Anomaly detection
    Watches for anomalous behaviors (explained later)
  Alert handling
    Sends an alert to the SMS user through a different medium
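
The three steps can be pictured with a minimal sketch. It assumes that detection runs once the per-user count k reaches the window size h and that the window is then reset; the `Watchdog` class, the `detector` callback, and the `alert` callback are hypothetical names used only for illustration.

```python
from collections import defaultdict


class Watchdog:
    """Toy monitor: buffers each subscriber's recipients until k reaches h."""

    def __init__(self, window_sizes, detector, alert):
        self.window_sizes = window_sizes   # user -> window size h
        self.detector = detector           # callable(user, recipients) -> bool
        self.alert = alert                 # callable(user), out-of-band channel
        self.pending = defaultdict(list)   # user -> recipients in current window

    def on_sms(self, sender, recipient):
        """Monitoring step, called for every SMS seen at the SMS Center."""
        if sender not in self.window_sizes:
            return  # user has not subscribed to the service
        window = self.pending[sender]
        window.append(recipient)           # len(window) plays the role of k
        if len(window) >= self.window_sizes[sender]:
            # Anomaly detection step on the completed window.
            if self.detector(sender, list(window)):
                self.alert(sender)         # alert handling step
            window.clear()                 # start counting the next window
```

A usage example: `Watchdog({"alice": 3}, my_detector, send_email_alert)` would check Alice's traffic after every third message and ignore users who are not in the dictionary.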
23
Anomaly detection
Anomaly detection is done in multiple steps:
  Decision on detection window size
    Minimize the COV of the JS-divergence after grouping recipients (to maximize the level of similarity)
  Mean-based anomaly detection
    Leverages the average number of unique recipients and the average entropy within each block (both show low variation)
    Checks whether the means of these two metrics vary radically
  Similarity-based anomaly detection
    In a light-weight version, it is proposed that historic information be condensed into a set of recipients and a distributional function
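
As a hedged illustration of the mean-based step, the sketch below flags a test window whose metric value lies far from the mean observed over the training blocks. The two-sided test and the `num_std` threshold are assumptions made for this example; the slides do not specify the exact decision rule.

```python
from statistics import mean, pstdev


def mean_based_anomaly(training_values, test_value, num_std=3.0):
    """Flag test_value if it is more than num_std standard deviations
    away from the mean of the per-block training values.

    training_values: one metric value per training block, for example
    the number of unique recipients or the entropy of each block.
    """
    mu = mean(training_values)
    sigma = pstdev(training_values)
    return abs(test_value - mu) > num_std * sigma
```

The same function can be applied once to the number of unique recipients and once to the entropy; a low-variation metric yields a small `sigma`, which is what makes a radical shift in the mean easy to spot.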
24
Threat determination metric
Each metric is computed over a block or the test sequence.
  Mean-based detection:
    Number of unique recipients (R-type detection)
    Entropy of the recipient distribution (H-type detection)
  Similarity-based detection:
    Set of top recipients (S-type detection)
    Normalized distribution of the number of SMSs sent to the top recipients within the sequence (D-type detection)
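
To make the four quantities concrete, the following sketch computes them for a single block. The `top_k` cutoff, the base-2 entropy, and the dictionary representation of the distribution are assumptions for illustration; the slides do not say how many top recipients are kept.

```python
import math
from collections import Counter


def block_metrics(recipients, top_k=5):
    """Return (r, h, s, d) for one block of recipients.

    r: number of unique recipients
    h: entropy (bits) of the recipient distribution
    s: set of the top_k most frequent recipients
    d: normalized distribution of SMS counts over those top recipients
    """
    counts = Counter(recipients)
    total = len(recipients)
    r = len(counts)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    top = counts.most_common(top_k)
    s = {name for name, _ in top}
    top_total = sum(c for _, c in top)
    d = {name: c / top_total for name, c in top}
    return r, h, s, d
```

The set `s` and the distribution `d` from a test sequence can then be compared against those of the training data, for instance with a set-overlap measure and a divergence between distributions.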
25
Evaluation of experimental performance observed by the authors.
Performance analysis
26
False positive rates Detector parameters
70% of data used for training, 30% for testing
= 10, , n = # of SMSs = Upper bound
Low false-positive rates were observed for all metrics.
27
Detecting blending attacks
  The entire dataset was divided into pairs of users
  Observations:
    Similarity-based (S- and D-type) schemes detect better
      Their detection metrics contain more information
    H- and D-type perform better than R- and S-type
      They consider not only the set of unique recipients, but also the distribution of the number of SMSs sent to each recipient
28
Detecting broadcast attacks
  Test dataset of each user is intermingled with maliciously sent messages
    The number of malicious messages sent is the “broadcast threshold”
  Unlike before, R-type is good at detecting the threat
    Considers message number only
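
One way to picture this experiment is the sketch below, which mixes a given number of malicious messages into a user's test sequence at random positions. It assumes each malicious message goes to a previously unseen recipient, which is how broadcast-style malware would typically behave; the `victim-` labels, the seed, and the function name are hypothetical.

```python
import random


def inject_broadcast(test_recipients, num_malicious, seed=0):
    """Return a copy of the test sequence with num_malicious messages,
    each to a new recipient, inserted at random positions.

    num_malicious plays the role of the broadcast threshold.
    """
    rng = random.Random(seed)
    mixed = list(test_recipients)
    for i in range(num_malicious):
        position = rng.randint(0, len(mixed))  # both endpoints are valid insert points
        mixed.insert(position, f"victim-{i}")
    return mixed
```

Under this assumption the number of unique recipients grows by one with every injected message, which is consistent with a recipient-count metric reacting strongly to broadcast attacks.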
29
Hybrid detection
Two hybrid schemes proposed:
  R/H/S/D
    Any flag is treated as anomalous
  S/D
    Only S- and D-type flags are treated as anomalous
Figure: performance of the hybrid detection schemes
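
The two hybrid schemes amount to a logical OR over a subset of the individual detector flags. A minimal sketch, assuming each detector has already produced a boolean flag keyed by its type letter (the `hybrid_flag` name and the dictionary layout are illustrative):

```python
def hybrid_flag(flags, scheme="R/H/S/D"):
    """Combine individual detector flags according to a hybrid scheme.

    flags: dict such as {"R": False, "H": True, "S": False, "D": False}
    scheme: "R/H/S/D" (any flag counts) or "S/D" (only S- and D-type count)
    """
    used = scheme.split("/")
    return any(flags[name] for name in used)
```

For example, a window flagged only by the R-type detector is anomalous under R/H/S/D but not under S/D.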
30
Self-reported limitations
SMS Watchdog fails to detect the following cases:
  SMS faking attacks
  Transient accounts that are set up for phishing
  Behavioral training that is not covered
31
Questions?