Anomaly Detection on ITS Data via
View Association
Junaidillah Fadlil, Hsing-Kuo Pao and Yuh-Jye Lee
National Taiwan University of Science & Technology (Taiwan Tech)
KDD 2013 Workshop on Outlier Detection and Description, August 11th, Chicago Page 2
Outline
Problem and motivation: anomaly detection on ITS
Related work
Datasets & Ground Truth
Detection by view association
Results of batch learning and online learning
Discussion
Conclusion
Problem and Motivation
Anomaly detection in sensor-deployed intelligent transportation systems (ITS)
Anomalous events in transportation systems include:
Traffic accidents
Emergency vehicles passing
Harsh weather conditions, etc.
This study focuses on traffic accidents
Plan: automatically create a police report for the anomalous events
Related Work
Chawla et al. modeled the road structure as a directed graph and then used PCA to detect anomalies. [Chawla et al., 2012]
Blum & Mitchell developed the co-training algorithm, which uses different views to label data for semi-supervised learning. [Blum & Mitchell, 1998]
Fu et al. proposed a hierarchical clustering method to classify vehicle motion trajectories in real traffic video. [Fu et al., 2005]
Piciarelli et al. proposed building a tree of clusters in an online fashion from trajectory data for behavior analysis. [Piciarelli et al., 2006]
Agovic et al. employed a manifold embedding method to examine anomalous cargo. [Agovic et al., 2009]
Nevertheless, anomaly detection remains important in the big-data era!
Datasets and Ground Truth
Datasets
Mobile Century Project Data (Century Data)
Collected on February 8, 2008
Includes both PeMS and GPS data
http://traffic.berkeley.edu/
Short-term but with more variety
California DOT Website Data (Caltrans Data)
Data recorded since 1993
Mainly PeMS data
http://pems.dot.ca.gov/
Long-term but with fewer data types
Focusing on the data of December 14, 2007 for now
Datasets (cont’d)
PeMS Data (Loop detector)
Computing the temporal mean speed (TMS) for every 5 (or 30) minutes
We associate each influence area with a detector station
GPS Data (Trajectory)
Stored by mobile devices
Each mobile phone records position (latitude & longitude) and velocity every 3 seconds
1388 trajectories collected
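The temporal mean speed step can be sketched as follows. This is only an illustration, assuming TMS is the arithmetic mean of the speed readings that fall inside each fixed interval; the function name, the input layout and the sample readings are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def temporal_mean_speed(readings, interval_s=300):
    """Average speed per fixed time interval (300 s = 5 minutes).

    readings: iterable of (timestamp_seconds, speed) pairs.
    Returns {interval_start_seconds: mean_speed}, ordered by interval start.
    Assumption: TMS is the plain arithmetic mean of the readings in a bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, speed in readings:
        start = int(t // interval_s) * interval_s  # start of the bin holding t
        sums[start] += speed
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Made-up readings: (seconds since midnight, speed)
readings = [(0, 60.0), (150, 50.0), (299, 40.0), (300, 30.0), (450, 10.0)]
tms = temporal_mean_speed(readings)
```

Passing `interval_s=1800` gives the 30-minute variant mentioned for the Caltrans data.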
Ground Truth
Anomalies to detect = accidents
Accident in Mobile Century Data
Time: 10:34 AM, February 8, 2008
Location: postmile 26.641
Consequence: traffic congestion lasting 34 mins
Accident in Caltrans Data
Time: 1 PM, December 14, 2007
Location: postmile 26.641
Consequence: traffic congestion lasting 38 mins
More anomaly data is needed!
Other types of anomalies should also be covered
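Detection Scheme
[Diagram: each view (view n, view m) is processed by its own pipeline and produces its own report; the per-view reports are combined into the final report.]
Per-view pipeline: Feature Extraction -> Data Representation (Isomap) -> Clustering (Hierarchical Clustering) -> Report
Combination: Report (view n) + Report (view m) -> Final Report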
Views and Feature Extraction
View      Data Source    Features
Flow      Century PeMS   1. mean of flow
                         2. std. of flow
                         3. skewness of flow
                         4. mean of △flow
                         5. std. of △flow
                         6. skewness of △flow
Speed     Century GPS    1. mean of speed
                         2. std. of speed
                         3. skewness of speed
                         4. mean of △speed
                         5. std. of △speed
                         6. skewness of △speed
Duration  Century GPS    1. mean of duration
                         2. std. of duration
                         3. skewness of duration
                         4. total duration
Flow      Caltrans PeMS  similar to Century PeMS
Speed     Caltrans PeMS  similar to Century GPS
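A minimal sketch of the six flow/speed features in the table above. The slides do not define the estimators, so this assumes that △ means the first difference between consecutive samples, that std. is the population standard deviation, and that skewness is the standardized third moment; the function names and the sample series are hypothetical.

```python
from statistics import mean, pstdev


def skewness(xs):
    """Standardized third moment (assumed definition of 'skewness')."""
    m = mean(xs)
    s = pstdev(xs)
    if s == 0:
        return 0.0  # a constant series has no asymmetry
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def view_features(series):
    """Six features of one window: mean/std/skewness of the signal and of
    its first difference. Needs at least two samples."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    return [
        mean(series), pstdev(series), skewness(series),
        mean(deltas), pstdev(deltas), skewness(deltas),
    ]


# Made-up flow counts for one time window
flow = [10, 12, 11, 13, 30, 12]
features = view_features(flow)
```

Each time window then becomes one six-dimensional point that the later representation and clustering steps operate on.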
Data Representation
Manifold learning approach
Utilizing Isomap in this work:
Construct the neighborhood graph using kNN
Compute the shortest path for each pair of points, e.g., with Dijkstra's algorithm
Apply the Multidimensional Scaling (MDS) method for visualization
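The three Isomap steps listed above can be sketched with NumPy as follows. This is an illustrative implementation, not the authors' code: it swaps Dijkstra for the Floyd-Warshall all-pairs update (simpler to vectorize, same result) and uses classical MDS; the function name and the toy data are assumptions.

```python
import numpy as np


def isomap(X, k=5, n_components=2):
    """Embed the rows of X into n_components dimensions with Isomap."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    # Step 1: symmetric kNN graph; np.inf marks "no edge"
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        # Skip position 0 of the sorted order, assumed to be the point itself
        nbrs = np.argsort(D[i])[1:k + 1]
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]

    # Step 2: all-pairs shortest paths (Floyd-Warshall instead of Dijkstra)
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("kNN graph is disconnected; increase k")

    # Step 3: classical MDS on the geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]  # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Toy data: ten points on a straight line in 3-D
X = np.array([[float(i), 0.0, 0.0] for i in range(10)])
Y = isomap(X, k=2, n_components=2)
```

With the settings reported later in the slides, the call would use `k=5` and `n_components=2`.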
Data Clustering
Given: projected data in low-dimensional space
A hierarchical clustering (singular distance) approach
Splitting data into two clusters: normal & anomalous groups!
Applying the 90-10 (or x% and 1 – x%) principle in the clustering process (90% normal data and 10% anomalous data)
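One possible reading of the clustering step, sketched in plain Python. Two things here are assumptions rather than facts from the slides: the linkage is taken to be single linkage (closest-pair distance between clusters), and the 90-10 principle is treated as an upper bound on the size of the anomalous cluster. Function names and the sample points are hypothetical.

```python
import math


def single_linkage_two_clusters(points):
    """Agglomerative clustering with single linkage, stopped at two clusters.
    Returns two lists of point indices. Needs at least two points."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest members of a and b
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 2:
        # Merge the closest pair of clusters (i < j, so popping j is safe)
        _, i, j = min(
            (gap(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters.pop(j)
    return clusters


def label_anomalies(points, max_anomaly_fraction=0.10):
    """Return indices of the smaller cluster if it respects the x% bound,
    otherwise an empty set (assumed interpretation of the 90-10 principle)."""
    a, b = single_linkage_two_clusters(points)
    small = min(a, b, key=len)
    if len(small) > max_anomaly_fraction * len(points):
        return set()
    return set(small)


# Made-up embedding: 28 tightly packed points plus 2 far-away points
points = [(0.1 * i, 0.0) for i in range(28)] + [(10.0, 10.0), (10.1, 10.0)]
anomalies = label_anomalies(points)
```

This naive version recomputes all cluster gaps on every merge, which is fine for a few hundred intervals per station but would need a proper linkage implementation for larger inputs.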
Report Generation
Based on the result of data clustering
One “raw” report generated (labeled) for each single view
Report association done by intersecting two or more reports, one from each view, to generate the final report
Compared to ground truth for evaluation
More complicated view association mechanisms can be applied
By view association, we may be able to automatically decide the parameter set needed by an unsupervised learning method such as anomaly detection!
View association can be done in different data representation spaces!
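Report association by intersection can be sketched as a set operation. The representation is an assumption: each per-view report is modeled as a set of flagged time-interval indices, and the function name and sample reports are made up for illustration.

```python
def associate_reports(*reports):
    """Final report = intervals flagged as anomalous by every view."""
    if not reports:
        return set()
    final = set(reports[0])
    for report in reports[1:]:
        final &= set(report)  # keep only intervals all views agree on
    return final


# Made-up per-view reports: indices of time intervals flagged as anomalous
flow_report = {3, 4, 17}
speed_report = {3, 4, 9}
duration_report = {4, 9}

flow_speed = associate_reports(flow_report, speed_report)
all_views = associate_reports(flow_report, speed_report, duration_report)
```

Intersection is the strictest combination rule: adding a view can only shrink the final report, so a false alarm must appear in every view to survive.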
Batch mode detection
Experimental Settings
k_iso = 5 (k for the kNN graph in Isomap)
Intrinsic dimensionality = 2
Combined views (experiments via Century Data):
- Flow & speed
- Speed & duration
- Flow & duration
- Flow, speed and duration
Dataset                          View                       Data Interval        Station ID (postmile)
Via Mobile Century Project Data  Century PeMS flow          5 mins, 1 station    400488 (24.007)
                                                                                 401561 (24.477)
                                                                                 400611 (24.917)
                                                                                 400284 (25.767)
                                                                                 400041 (26.027)
                                                                                 400165 (26.641)
                                 Century GPS speed          1 station
                                 Century GPS duration       1 station
Via Caltrans PeMS                Caltrans PeMS flow, speed  30 mins, 1 station   400165 (26.641)
Via the Mobile Century Project Data
Our final report indicates an accident near station 400165 (postmile 26.641), which matches the ground truth
Ground truth: 10:34 AM, postmile 26.641, a duration of 34 mins
The flow and speed view combination gives the best result
Results from all view combinations:
Flow and speed : 10:35 AM
Flow and duration : 10:50 AM
Speed and duration : 10:50 AM
Flow, speed and duration : 10:50 AM
Via the Mobile Century Project Data (cont’d)
[Figure: 2-D Isomap embeddings of each view, with normal and anomaly points marked.
Century GPS panels: Speed view (GPS), Duration view (GPS).
Century PeMS panels: Flow view (PeMS) at stations 400041, 400284, 400488, 401561, 400611 and 400165.]
Via the Caltrans PeMS Data
The speed and flow view combination also gives the best result!
Our final report records the accident as happening at 1:00 PM, which matches the ground truth
The anomalous points are well separated from the normal points in the low-dimensional representation
[Figure: low-dimensional embeddings of the Speed view and the Flow view, with Normal and Anomaly points marked.]
Online mode detection
Experimental Settings (Online detection)
Online detection via Caltrans PeMS
Focusing on the accident on December 14, 2007
Using the speed view to detect anomalous events
Using the previous 10 days’ data
Only including data from the same time of day as the test data
Only using previous days’ data without accidents
Choosing the previous days’ data for training:
Only using working days’ data
Including weekend days’ data (1-day weekend and 2-day weekend)
Detecting on an hourly basis
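Choosing the training days for online detection might look like the sketch below. It is a simplification under stated assumptions: weekends are Saturday and Sunday, public holidays are ignored, and a single `include_weekends` switch stands in for the 1-day and 2-day weekend variants; the function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def training_days(test_day, n_days=10, include_weekends=False,
                  accident_days=()):
    """Walk backwards from the day before test_day and collect n_days
    accident-free days, optionally skipping Saturdays and Sundays."""
    days = []
    day = test_day - timedelta(days=1)
    while len(days) < n_days:
        is_weekend = day.weekday() >= 5  # 5 = Saturday, 6 = Sunday
        if (include_weekends or not is_weekend) and day not in accident_days:
            days.append(day)
        day -= timedelta(days=1)
    return days


test_day = date(2007, 12, 14)  # accident day studied in the slides (a Friday)
workdays = training_days(test_day)
with_weekends = training_days(test_day, include_weekends=True)
```

For hourly detection, only the readings from the same hour on each selected day would then be used as the reference set for the test hour.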
Online Detection via the Caltrans PeMS Data
Using previous days’ data to detect anomalies at the current moment
The choice of working days (versus weekend days) in the training data influences the detection results:
(The “red squares” are the predicted anomalies; the “red crosses” are the real anomalies)
[Figure: low-dimensional embeddings with Normal and Anomaly points marked, one panel per training set: working days, one-day weekend and two-day weekend (the last two grouped as weekend days).]
Final Report
Working days        One-day weekend     Two-day weekend
12/14/2007 13:00    12/9/2007 13:00     12/8/2007 13:00
                                        12/9/2007 13:00
(The one-day and two-day weekend columns are grouped as weekend days)
Discussion
Need rigorous criteria to judge which views are “qualified” for association
Views should provide trustworthy information
Cannot allow contextual anomalies across views
Can allow contextual anomalies within a view
Isomap is used mainly for data representation and visualization
View association can be applied with or without Isomap
Deriving a better x% & 1 – x% principle
A strict or a soft principle?
Give more weight to the principle or to the data?
Conclusion
Proposed an ITS anomaly detector for traffic analysis
Detecting anomalies based on view association
Can automatically generate an anomaly report
Claimed benefits compared with other detectors:
The proposed method needs little parameter tuning
Can detect different types of anomalies given different training data
The method can work efficiently if implemented in parallel
Acknowledgement
The research was supported by
Taiwan National Science Council under Grant NSC101-2218-E-011-009
Taiwan National Science Council, National Taiwan University and Intel
Corporation under Grants NSC100-2911-I-002-001 and NTU101R7501-1
(Intel-NTU Connected Context Computing Center - http://ccc.ntu.edu.tw/)
Q & A