1 SnT – Interdisciplinary Centre for Security, Reliability and Trust
2 Bell Laboratories, Alcatel-Lucent
Identifying abnormal patterns in cellular communication flows
IPTCOMM 2013: Principles, Systems and Applications of IP Telecommunications
October 15-17, 2013
David Goergen1
Veena Mendiratta2
Radu State1
Thomas Engel1
OUTLINE
• Introduction
• Related Work
• Model / Metric
• D4D Dataset
• Evaluation
• Future work
• Conclusion
IPTCOMM 201304/10/23 2
Intro
• Analyzing large volumes of cellular communication records
  – Can help operators improve the overall quality of service they provide to their users
  – Allows operators to detect abnormal patterns and react accordingly
• Definition of model and metric to detect abnormal traffic
• Application to a country-level data set
• Correlation of the detected flows with events
D4D Dataset specification
• One country
• Time Period: 01.12.2011 to 28.04.2012
• 5 million users
• 1124 base stations (for mobile communications)
• More than 3 billion entries summarizing SMS and voice calls on an hourly basis
• 50000 mobile users tracked over these months with GPS and call records
D4D Dataset specification
• Set 1: Base station-to-base station ongoing calls
• Set 2: User movement among base stations
• Set 3: User movement among regional subdivisions
• Set 4: Communication sub-graphs
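To make the shape of an hourly base-station-to-base-station entry concrete, here is a small Python sketch of a record type and a line parser. The comma-separated column layout (timestamp, originating station, terminating station, number of calls, total duration) and the names `FlowRecord` and `parse_line` are assumptions for illustration, not the official D4D schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlowRecord:
    """One hourly aggregate between two base stations (hypothetical layout)."""
    hour: datetime    # start of the hourly slot
    origin: int       # originating base station id
    dest: int         # terminating base station id
    n_calls: int      # number of calls in that hour
    duration: float   # total call duration in that hour (unit assumed)


def parse_line(line: str) -> FlowRecord:
    # Assumed format: "YYYY-MM-DD HH:MM:SS,origin,dest,n_calls,duration"
    ts, origin, dest, n_calls, duration = line.strip().split(",")
    return FlowRecord(
        hour=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        origin=int(origin),
        dest=int(dest),
        n_calls=int(n_calls),
        duration=float(duration),
    )


rec = parse_line("2011-12-01 14:00:00,12,87,5,1320.0")
print(rec.origin, rec.dest, rec.n_calls)  # 12 87 5
```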
Related Work
• S. van den Elzen, D. H. Jorik Blaas, J.-K. Buenen, J. J. van Wijk, R. Spousta, A. Miao, S. Sala, and S. Chan. Exploration and Analysis of Massive Mobile Phone Data: A Layered Visual Analytics Approach. In NetMob, 2013.
• M. Cerinsek, J. Bodlaj, and V. Batagelj. Symbolic Clustering of Users and Antennae. In NetMob, 2013.
• G. Krings, F. Calabrese, C. Ratti, and V. D. Blondel. Urban Gravity: A Model for Intercity Telecommunication Flows. Journal of Statistical Mechanics: Theory and Experiment, 2009.
• V. A. Traag, A. Browet, F. Calabrese, and F. Morlot. Social Event Detection in Massive Mobile Phone Data Using Probabilistic Location Inference. In IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and IEEE Third International Conference on Social Computing (SocialCom), 2011.
Model / Metric
Comparison
Related Work Our method
Data processing on Hadoop cluster
• Hadoop 2.0.0-cdh4.3.0
• 4 nodes
  – Hexa-core 2.4 GHz Xeon
• 120 GB RAM
• HDFS: 27.54 TB
• 2 × 1 Gbit/s Ethernet, bonded
Hadoop job process
04/10/23 IIT RTC conference 10
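The job itself appears only as a diagram on the slide. As a framework-free illustration of the MapReduce pattern such a job could follow, the sketch below keys each record by (day, origin, destination) and sums calls and durations per key. It simulates the map, shuffle and reduce phases in plain Python; it does not use any Hadoop API, and the record layout and daily grouping are assumptions.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input record: (day, origin, dest, n_calls, duration)
Record = tuple[str, int, int, int, float]


def mapper(rec: Record) -> Iterator[tuple[tuple[str, int, int], tuple[int, float]]]:
    day, origin, dest, n_calls, duration = rec
    # Key by day and base-station pair; the value carries what the reducer sums.
    yield (day, origin, dest), (n_calls, duration)


def reducer(key: tuple[str, int, int], values: Iterable[tuple[int, float]]) -> tuple:
    total_calls = 0
    total_duration = 0.0
    for n, d in values:
        total_calls += n
        total_duration += d
    return key, total_calls, total_duration


def run_job(records: Iterable[Record]) -> list[tuple]:
    # Shuffle phase: group all mapper outputs by key, as the framework would.
    groups: dict[tuple[str, int, int], list[tuple[int, float]]] = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return [reducer(key, vals) for key, vals in sorted(groups.items())]


records = [
    ("2011-12-01", 1, 2, 3, 90.0),
    ("2011-12-01", 1, 2, 2, 60.0),
    ("2011-12-01", 2, 1, 1, 30.0),
]
for row in run_job(records):
    print(row)
# (('2011-12-01', 1, 2), 5, 150.0)
# (('2011-12-01', 2, 1), 1, 30.0)
```

Because the per-key operation is a plain sum, the same reducer logic could also serve as a combiner on each node to cut shuffle traffic.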
Metric parameters
• Analyzing the impact of α and the window size w
  – α ↑ → dataset ↑
  – w ↑ → dataset ↓
• Trade-off between granularity and loss of detail
• Chosen values: w = 10 and α = 0.5
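The metric's formula is given on the Model / Metric slide and cannot be recovered from this text. As a purely hypothetical stand-in that uses the same two parameters, the sketch below flags hour t when its value deviates from the mean of the preceding w values by more than α times that mean. It is not the authors' metric, and it is not meant to reproduce the α/w trends listed above.

```python
from statistics import mean


def flag_abnormal(series: list[float], w: int = 10, alpha: float = 0.5) -> list[int]:
    """Return indices t where series[t] deviates from the mean of the
    previous w values by more than alpha times that mean.

    Hypothetical sliding-window rule; only w and alpha come from the slides.
    """
    flagged = []
    for t in range(w, len(series)):
        baseline = mean(series[t - w:t])  # mean of the w preceding values
        if abs(series[t] - baseline) > alpha * baseline:
            flagged.append(t)
    return flagged


calls = [10.0] * 12 + [30.0] + [10.0] * 3   # one spike at index 12
print(flag_abnormal(calls))  # [12]
```

Note that the spike also inflates the baseline for the next w hours, which is one way window size trades detection sensitivity against smoothing.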
Abnormal number of calls
• Applying our metric
  – Circle: power failures
  – Square: president appeared in court
  – Triangle: invasion by rebellious fanatics
Abnormal duration of calls
• Applying our metric
  – Circle: power failures
  – Square: president appeared in court
  – Triangle: invasion by rebellious fanatics
Closer look at specific region
Detecting abnormal situation
Observation of the highlighted period
Clustering: mean duration vs mean number of calls
• Group A
  – Small number of calls and short to medium duration
• Group B
  – Large number of calls and short to medium duration
  – Usual diurnal pattern of night and day communication
• Group C is the outlier
  – Long duration and a number of calls close to the global average
  – All calls occur in the same time slot on different days
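The slides do not name the clustering algorithm. Assuming k-means with k = 3 (one cluster per group), a minimal pure-Python version over (mean number of calls, mean duration) points per base station might look like this; the function name, the farthest-first seeding and the toy data are illustrative choices, not taken from the paper.

```python
import math

Point = tuple[float, float]  # (mean number of calls, mean call duration)


def kmeans(points: list[Point], k: int, iters: int = 100) -> list[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    # Farthest-first seeding: deterministic and spreads the initial centroids out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels


points = [(2.0, 60.0), (3.0, 70.0), (2.5, 65.0),     # few calls, short duration
          (40.0, 60.0), (42.0, 75.0), (38.0, 70.0),  # many calls, short duration
          (20.0, 600.0)]                             # average calls, long duration
print(kmeans(points, k=3))  # [0, 0, 0, 2, 2, 2, 1]
```

The two features live on very different scales, so on real data standardising them before clustering would usually be advisable; the toy data is separable without it.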
Principal Component Analysis
Figure: PCA on number of calls (x-axis: PC1 to PC10; y-axis: variances)
Figure: PCA on the total duration (x-axis: PC1 to PC10; y-axis: variances)
PCA result
• Cross-referencing the results of both analyses (number of calls and total duration)
• 10 base stations most affected by PC1
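The slides do not say how "most affected by PC1" was measured. One plausible reading, assumed here, is the magnitude of each base station's loading on PC1, with time slots as rows and base stations as columns. The sketch computes PC1 by power iteration on the sample covariance matrix, ranks stations by absolute loading, and cross-references two analyses by intersecting their top lists; all names and data are hypothetical.

```python
import math


def pc1(matrix: list[list[float]], iters: int = 500) -> tuple[float, list[float]]:
    """Return (variance, loadings) of the first principal component.

    Rows are observations (e.g. time slots), columns are variables (base stations).
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0] * m  # all-ones start; a random start is more robust in general
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # data has no variance at all
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v = eigenvalue = variance along PC1.
    var = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
    return var, v


def top_stations(matrix: list[list[float]], names: list[str], k: int = 10) -> list[str]:
    """Names of the k stations with the largest absolute PC1 loading."""
    _, loadings = pc1(matrix)
    ranked = sorted(zip(names, loadings), key=lambda nl: abs(nl[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


names = ["S1", "S2", "S3"]
calls = [[10, 5, 100], [11, 5, 300], [9, 5, 50], [10, 5, 250]]
duration = [[60, 30, 900], [61, 30, 2000], [59, 30, 400], [60, 30, 1500]]
print(top_stations(calls, names, k=2))  # ['S3', 'S1']
print(sorted(set(top_stations(calls, names, k=2)) & set(top_stations(duration, names, k=2))))
# ['S1', 'S3']
```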
Future Work
• Further investigate the impact of the chosen parameters
  – Window size w and α
• Graph-theoretic analysis
  – Detect effects on the complete connected graph
  – Using the PageRank or HITS algorithm
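As a starting point for the graph analysis, a minimal weighted PageRank by power iteration is sketched below. Treating base stations as nodes and call counts as edge weights is an assumption, since the slides do not specify how the graph would be built. HITS would follow the same iterative pattern, with separate hub and authority scores instead of a single rank.

```python
def pagerank(edges: dict[str, dict[str, float]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Weighted PageRank by power iteration.

    edges[u][v] is the weight of the edge u -> v (e.g. calls from station u to v).
    Dangling nodes (no outgoing weight) spread their rank uniformly.
    """
    all_nodes = set(edges)
    for targets in edges.values():
        all_nodes.update(targets)
    nodes = sorted(all_nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    out_weight = {u: sum(edges.get(u, {}).values()) for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        # Teleportation term plus the uniformly redistributed dangling mass.
        new = {u: (1 - d) / n + d * dangling / n for u in nodes}
        for u, targets in edges.items():
            if out_weight[u] == 0:
                continue
            for v, wgt in targets.items():
                new[v] += d * rank[u] * wgt / out_weight[u]
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new
    return rank


edges = {"A": {"B": 3, "C": 1}, "B": {"C": 1}, "C": {"A": 1}}
r = pagerank(edges)
print(max(r, key=r.get))  # C
```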
Conclusion
• Our metric makes it possible to detect abnormal traffic
• Analysis of a large data set in a reasonable amount of time
Thank you for your attention
QUESTIONS?
Requirements