Discovering Interesting Sub-paths in Spatiotemporal Datasets: A Summary of Results
Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder
Outline
• Introduction
• Problem Formulation & Challenges
• Computational Solutions
• Experimental Evaluation
• Case Study
• Conclusion and Future Work
Interesting ST Sub-path
• Interesting subsets of ST paths
৹ Climate Change
৹ Transport Science
৹ Environmental Monitoring
Speed profile along a trajectory on I-95
Mississippi river
Source: http://ops.fhwa.dot.gov/tolling_pricing/value_pricing/pubs_reports/projectreports/i95managedlanes/index.htm
Sea level rise along coastal areas
Source: http://scienceblogs.com/intersection/2009/01/federal_report_warns_of_rising.php
Source: http://blog.seattlepi.com/environment/
Sub-path of Abrupt Change
• Spatial sub-path of Abrupt Change
৹ Sharp change in vegetation cover
৹ Transition between ecological zones (ecotones)
৹ Moves in response to climate change
The change is enduringly abrupt
Figure labels: W1 = [12N, 17N], W2, W3
A plot of vegetation cover along 18.5E longitude (the red line) from GIMMS vegetation dataset [1]
Vegetation Cover in Africa in NDVI (normalized difference vegetation index)
Related Work
• Interesting Sub-path Discovery
৹ Interesting point/unit sub-path: 1-D change point detection, e.g., CUSUM [3]; 2-D edge detection [4]
৹ Interesting sub-path with arbitrary length (our work)
Our Contributions
• Formalize the Interesting Sub-path Discovery problem
• A novel computational solution: SEP
• A cost model and an analysis of its performance
• A case study on a real application
Problem Formulation: Basic Concepts
• Interesting Sub-path (ISP):
(1) Interest measure: a function Fspi(i, j) → R, where R is the set of real numbers. Fspi is an algebraic function [5] (e.g., mean = sum/count).
(2) Interestingness test T: Fspi → {True, False}
(3) Example: "average increase is at least 3.5"
Location:         1 2 3 4 5 6 7 8 9 10 11 12
Attribute value:  1 8 2 3 2 7 12 16 13 18 23 12
Unit sub-path:    1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12
Difference value: 7 -6 1 -1 5 5 4 -3 5 5 -11
• Sub-path: A contiguous subset of a path
• Unit sub-path: two neighboring locations, length = 1.
৹ A value is associated with each unit sub-path.
• Dominant ISP (DISP):
৹ An ISP that is not a subset of any other ISP.
Slope of (5, 11) = 3.5
Aggregate Functions
Distributive: SUM, COUNT. SUM(1, 5)= SUM(SUM(1,3), SUM(3,5))
Algebraic: AVG. AVG = SUM/COUNT.
Holistic: MEDIAN
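The distinction between distributive and algebraic functions can be illustrated with the running example. The following is a minimal sketch, not the authors' code: the helper names `sum_count`, `combine` and `avg` are hypothetical, and locations are assumed to be 1-based as on the slides.

```python
from typing import List, Tuple

# Difference values of the 11 unit sub-paths from the slide example.
DIFFS: List[int] = [7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11]

def sum_count(i: int, j: int) -> Tuple[int, int]:
    """(SUM, COUNT) of the unit sub-path values between locations i and j."""
    units = DIFFS[i - 1 : j - 1]  # unit sub-paths i-(i+1) ... (j-1)-j
    return sum(units), len(units)

def combine(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Distributive step: merge the partial results of two adjacent sub-paths."""
    return a[0] + b[0], a[1] + b[1]

def avg(sc: Tuple[int, int]) -> float:
    """Algebraic step: AVG is derived from the distributive pair (SUM, COUNT)."""
    return sc[0] / sc[1]

whole = combine(sum_count(5, 8), sum_count(8, 11))
print(whole, avg(whole))  # (21, 6) 3.5
```

The combined pair equals the one obtained by scanning (5, 11) directly, which is the property the lookup-table idea relies on.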
Problem Statement
• Given
৹ A path S in an ST framework with n unit sub-paths
৹ A function f of values associated with each sub-path in S
৹ An interestingness measure (algebraic function) Fspi: R^n → R
৹ A test function T: R → {True, False}
• Find
৹ All the dominant interesting sub-paths (DISPs) in S
• Objective
৹ Reduce computational cost
• Constraints
৹ Correctness & completeness
Difference value: 7 -6 1 -1 5 5 4 -3 5 5 -11
Unit sub-path: 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12
Challenges
• No pre-defined maximum length for DISPs
৹ E.g., the length can range from 1 to |S|
• Pattern interestingness lacks monotonicity
৹ Interest measures are usually algebraic functions
৹ E.g., sub-path (8, 9) is not interesting although it lies inside the interesting sub-path (5, 11).
• The data volume can be very large
৹ Long time series / fine-resolution images
৹ GPS trajectories
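The lack of monotonicity can be checked on the running example. This is a small sketch under the example test "average increase is at least 3.5"; `avg_increase` and `THRESHOLD` are illustrative names, not part of the original slides.

```python
DIFFS = [7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11]  # difference values from the slides
THRESHOLD = 3.5  # "average increase is at least 3.5"

def avg_increase(i: int, j: int) -> float:
    """Average of the unit sub-path values between locations i and j (1-based)."""
    return sum(DIFFS[i - 1 : j - 1]) / (j - i)

# A sub-path of an interesting sub-path need not be interesting ...
print(avg_increase(5, 11) >= THRESHOLD, avg_increase(8, 9) >= THRESHOLD)  # True False
# ... and a super-path of an interesting sub-path need not be interesting either.
print(avg_increase(1, 2) >= THRESHOLD, avg_increase(1, 3) >= THRESHOLD)  # True False
```

Because interestingness is preserved in neither direction, pruning that relies on a monotone measure cannot be applied directly.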
Computational Solutions: Naive Approach
Step 1: ISP identification
• Exhaustively enumerate all the sub-paths
• Scan each sub-path to compute and test the interestingness
Step 2: Dominated ISP elimination
• For each ISP in the candidate set, eliminate all the ISPs it dominates
• Bottleneck 1: Repetitive scans of sub-paths to compute Fspi.
• Bottleneck 2: Many dominated sub-paths are generated.
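The two naive steps can be sketched as follows. This is an illustrative reading of the slide, assuming the example measure (average of the unit sub-path values) and the test "value >= threshold"; the function name `naive_disp` is hypothetical.

```python
from typing import List, Tuple

Path = Tuple[int, int]  # (start location, end location), 1-based

def naive_disp(diffs: List[float], threshold: float) -> List[Path]:
    n = len(diffs)  # number of unit sub-paths; locations are 1 .. n+1
    # Step 1: enumerate every sub-path and rescan it to compute the measure.
    isps: List[Path] = []
    for i in range(1, n + 1):
        for j in range(i + 1, n + 2):
            units = diffs[i - 1 : j - 1]
            if sum(units) / len(units) >= threshold:
                isps.append((i, j))

    # Step 2: drop every ISP that is contained in a different ISP.
    def dominated(p: Path, q: Path) -> bool:
        return p != q and q[0] <= p[0] and p[1] <= q[1]

    return [p for p in isps if not any(dominated(p, q) for q in isps)]

print(naive_disp([7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11], 3.5))  # [(1, 2), (5, 11)]
```

Both bottlenecks are visible here: each sub-path is rescanned with `sum`, and Step 1 collects many ISPs (for example (5, 6) or (9, 11)) that Step 2 later discards.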
Computational Solution: SEP Approach
• Solution 1: Build lookup tables for distributive functions
৹ E.g., SUM(3,5) = SUM(1,5) - SUM(1,3)
৹ Built in linear time, lookup in constant time
৹ Reversible aggregate functions [6]: sum, count, etc.
• Solution 2: Design efficient enumeration strategies
৹ Traverse the sub-path space in a certain order
৹ Following the dominance relationship
• The Sub-path Enumeration and Pruning (SEP) Approach
Sub-path: (1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8) (1,9) (1,10) (1,11) (1,12)
SUM: 7 1 2 1 6 11 15 12 17 22 11
Difference value: 7 -6 1 -1 5 5 4 -3 5 5 -11
Unit sub-path: 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12
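The lookup table for SUM is a prefix-sum array. A minimal sketch follows; the names `build_prefix`, `sub_sum` and `sub_avg` are assumptions made for illustration, and `prefix[k]` is taken to hold SUM(1, k+1).

```python
from itertools import accumulate
from typing import List

def build_prefix(diffs: List[float]) -> List[float]:
    """One linear scan: prefix[k] is the sum of the first k unit sub-path values."""
    return [0] + list(accumulate(diffs))

def sub_sum(prefix: List[float], i: int, j: int) -> float:
    """SUM(i, j) = SUM(1, j) - SUM(1, i), a constant-time lookup."""
    return prefix[j - 1] - prefix[i - 1]

def sub_avg(prefix: List[float], i: int, j: int) -> float:
    """AVG(i, j) = SUM(i, j) / COUNT(i, j), where COUNT(i, j) = j - i."""
    return sub_sum(prefix, i, j) / (j - i)

prefix = build_prefix([7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11])
print(prefix[1:])               # [7, 1, 2, 1, 6, 11, 15, 12, 17, 22, 11]
print(sub_sum(prefix, 3, 5))    # 0
print(sub_avg(prefix, 5, 11))   # 3.5
```

The printed prefix row matches the SUM row of the table above, and any sub-path measure built from SUM and COUNT can then be evaluated without rescanning.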
SEP with Row-wise Traversal
• Step 0: Build the lookup table by scanning the entire path
• Step 1: Sub-path enumeration
• Step 2: Dominated sub-path elimination (identical to that of the Naive approach)
[Figure: grid of all sub-paths (i, j), 1 ≤ i < j ≤ 12, listed row by row: 1-2 ... 1-12, 2-3 ... 2-12, ..., 10-11, 10-12, 11-12]
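A possible sketch of the row-wise design is given below. The slide does not spell out the order within a row, so this code assumes (loudly) that each row, i.e. each start location i, is scanned from the longest sub-path to the shortest and stops at the first ISP, since every shorter sub-path in that row is dominated by it. The function name `sep_row_wise` is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

def sep_row_wise(diffs: List[float], threshold: float) -> List[Tuple[int, int]]:
    n = len(diffs)
    prefix = [0] + list(accumulate(diffs))      # Step 0: lookup table
    candidates = []
    for i in range(1, n + 1):                   # Step 1: one row per start location
        for j in range(n + 1, i, -1):           # assumption: longest sub-path first
            if (prefix[j - 1] - prefix[i - 1]) / (j - i) >= threshold:
                candidates.append((i, j))
                break                           # the rest of row i is dominated by (i, j)
    # Step 2: dominated sub-path elimination, as in the Naive approach.
    return [p for p in candidates
            if not any(q != p and q[0] <= p[0] and p[1] <= q[1] for q in candidates)]

print(sep_row_wise([7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11], 3.5))  # [(1, 2), (5, 11)]
```

Under this assumption at most one candidate per row reaches Step 2, while each evaluation costs constant time thanks to the lookup table.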
SEP with Top-down Traversal
• Traversal space: Grid-based DAG (G-DAG)
• A breadth-first traversal on the G-DAG
৹ A node can be visited only if none of its predecessors is pruned.
৹ Determine the number of predecessors
[Figure: G-DAG over all sub-paths (i, j), 1 ≤ i < j ≤ 12, down to the unit sub-paths 1-2 ... 11-12; each node is labeled with its number of predecessors: 0 for the root, 1 for boundary nodes, 2 for inner nodes]
Experimental Evaluation (1)
• Pattern Length Ratio (PLR)
৹ Ratio of the longest DISP's length to the total number of unit sub-paths
Run time: Naive vs. the two SEP designs
[Figure: (1) PLR = 0.1 (worst case for SEP); (2) PLR = 1 (best case for SEP top-down)]
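The PLR definition can be written as a one-line computation. This is an illustrative sketch only: the function name is hypothetical, and the example input is the pair of DISPs that the running example yields for the test "average increase is at least 3.5", not data from the experiments.

```python
from typing import List, Tuple

def pattern_length_ratio(disps: List[Tuple[int, int]], n_units: int) -> float:
    """Length of the longest DISP divided by the number of unit sub-paths."""
    longest = max(j - i for i, j in disps)  # length = number of unit sub-paths covered
    return longest / n_units

# Running example: DISPs (1, 2) and (5, 11) on a path with 11 unit sub-paths.
print(pattern_length_ratio([(1, 2), (5, 11)], 11))
```

For the running example the longest DISP covers 6 of the 11 unit sub-paths, so the ratio is 6/11.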
Experimental Evaluation (2)
Performance of the two traversal designs with PLR varying from 0.1 to 1
Summary:
(1) SEP is scalable and efficient compared to the Naive approach.
(2) Top-down outperforms row-wise when the data has longer DISPs.
Case Study: Results on Spatial Paths
Input: GIMMS vegetation cover in NDVI, Aug. 1-15, 1981, Africa.
Output: Sub-paths with vegetation cover change in the above data.
• Interest Measure: “Sameness Degree (SD)”
• “Average value change” against “average value change that >=Θa”
Conclusion and Future Work
• Conclusion
৹ SEP is a novel computational solution to the Interesting Sub-path Discovery problem.
৹ It is effective, efficient and scalable.
৹ A cost model is studied to analyze the performance tradeoff.
• Future Work
৹ Improve algorithmic design and evaluation metrics
৹ Interesting spatial-temporal regions
৹ Applications in other domains (transport science, etc.)
Acknowledgements and References
• We would like to thank
৹ ACMGIS reviewers
৹ Sponsors of this work: NSF, USDOD
৹ Spatial Database and Data Mining Group @ UMN
৹ Kim Koffolt
References
[1] Tucker, C.J., J.E. Pinzon, M.E. Brown. Global inventory modeling and mapping studies. Global Land Cover Facility, University of Maryland, College Park, Maryland, 1981-2006.
[2] Joint Institute for the Study of the Atmosphere and Ocean (JISAO). Sahel rainfall index. http://jisao.washington.edu/data/sahel/.
[3] E. Page. Continuous inspection schemes. Biometrika, 41(1/2):100-115, 1954.
[4] J. Canny. A computational approach to edge detection. Readings in computer vision: issues, problems, principles, and paradigms, 184(87-116):86, 1987.
[5] S. Shekhar and S. Chawla. Spatial Databases: A Tour. Prentice Hall, 2003 (ISBN 013-017480-7).
[6] S. Cluet and G. Moerkotte. Efficient evaluation of aggregates on bulk types. In Proc. Int. Workshop on Database Programming Languages, 1995.
Sub-path of Abrupt Change
• Temporal sub-path of Abrupt Change
৹ Abrupt shift in precipitation, temperature, etc.
৹ Climate change detection.
Smoothed Sahel precipitation anomaly (JJASO)
Raw Sahel precipitation anomaly (JJASO)[2]
SEP with Top-down Traversal (2)
• Determine the number of predecessors
• Use an array to record the number of predecessors visited
1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12
Case # Predecessors
Root node 0
Boundary node 1
Inner node 2
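The three cases in the table can be expressed as a small function. This sketch assumes that the predecessors of a sub-path (i, j) in the G-DAG are the two sub-paths that extend it by one unit, (i-1, j) and (i, j+1), when they exist; that reading is consistent with the table but is not stated explicitly on the slide. The name `num_predecessors` is hypothetical.

```python
def num_predecessors(i: int, j: int, n: int) -> int:
    """Predecessors of sub-path (i, j) on a path with n unit sub-paths (locations 1 .. n+1)."""
    count = 0
    if i > 1:
        count += 1  # (i-1, j): extended by one unit on the left
    if j < n + 1:
        count += 1  # (i, j+1): extended by one unit on the right
    return count

n = 11
print(num_predecessors(1, 12, n))  # 0  (root node: the whole path)
print(num_predecessors(1, 5, n))   # 1  (boundary node)
print(num_predecessors(3, 7, n))   # 2  (inner node)
```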
SEP with Top-down Traversal (3)
• Step 0: Build the lookup table by scanning the entire path
• Step 1: ISP identification
• Step 2: Not needed
ptv[][]: number of predecessors still to visit; Q[]: queue for breadth-first traversal
Q.Enqueue(S)
While Q is not empty
    W = Q.Dequeue()
    Compute Fspi(W) using the lookup tables
    If T(Fspi(W)) == TRUE Then
        Output W
        Next Loop
    End If
    For each successor (i, j) of W
        Decrement ptv[i][j]
        If ptv[i][j] == 0 Then Q.Enqueue((i, j))
    End For
End While
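The pseudocode above can be turned into a runnable sketch. The version below assumes the example measure (average of unit sub-path values, tested against a threshold), treats "update ptv" as a decrement, and takes the successors of (i, j) to be (i+1, j) and (i, j-1); the function name `sep_top_down` is hypothetical and this is not the authors' implementation.

```python
from collections import deque
from itertools import accumulate
from typing import List, Tuple

def sep_top_down(diffs: List[float], threshold: float) -> List[Tuple[int, int]]:
    n = len(diffs)
    last = n + 1                                # last location on the path
    prefix = [0] + list(accumulate(diffs))      # Step 0: lookup table
    # ptv[i][j]: predecessors of (i, j) still to visit (0, 1 or 2 initially).
    ptv = [[(i > 1) + (j < last) for j in range(last + 1)] for i in range(last + 1)]
    queue = deque([(1, last)])                  # start from the root, the whole path
    disps = []
    while queue:
        i, j = queue.popleft()
        if (prefix[j - 1] - prefix[i - 1]) / (j - i) >= threshold:
            disps.append((i, j))                # interesting: output it and
            continue                            # leave its successors' counters untouched
        for a, b in ((i + 1, j), (i, j - 1)):   # successors: one unit shorter
            if a < b:
                ptv[a][b] -= 1
                if ptv[a][b] == 0:              # every predecessor visited, none interesting
                    queue.append((a, b))
    return disps

print(sorted(sep_top_down([7, -6, 1, -1, 5, 5, 4, -3, 5, 5, -11], 3.5)))  # [(1, 2), (5, 11)]
```

A node is reached only when none of its super-paths is interesting, so every output is already dominant, which is why no separate elimination step is needed.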
Theoretical Analysis
• n: Number of unit sub-paths
Approach                     Naive    SEP Row-wise   SEP Top-down
Best case time complexity    O(n^3)   O(n^2)         O(n)
Worst case time complexity   O(n^4)   O(n^2)         O(n^2)
Space complexity             O(n)     O(n)           O(n^2)
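As a rough sanity check on the Naive column, the number of sub-paths and the work of rescanning them can be counted directly. This is a sketch under the assumption that scanning a sub-path of length ℓ costs ℓ operations, which the slides do not state explicitly.

```latex
\begin{align*}
\#\text{sub-paths} &= \sum_{\ell=1}^{n} (n-\ell+1) = \frac{n(n+1)}{2} = O(n^2),\\
\text{scan cost}   &= \sum_{\ell=1}^{n} \ell\,(n-\ell+1) = \frac{n(n+1)(n+2)}{6} = O(n^3).
\end{align*}
```

For the running example with n = 11 this gives 66 sub-paths and 286 unit reads. With the lookup table each evaluation takes constant time, which is consistent with the O(n^2) entries for SEP; a pairwise dominance check over up to O(n^2) candidates would plausibly account for the O(n^4) worst case of the Naive approach, although the slides do not spell this out.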
Case Study: Results on Temporal Dimension
Temporal Sub-paths of abrupt precipitation change in the Sahel region, Africa.