Upload
hoan-nguyen
View
113
Download
3
Tags:
Embed Size (px)
Citation preview
Searching and MiningTrillions of Time Series Subsequencesunder Dynamic Time Warping
Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen,
Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh
Hoan Nguyen – Trung Minh Nguyen
2
Abstract
Optimizationsto search and mine
large databasesvery fast
3
Outline
Problem
Related work
Definitions
Method
Results
Conclusion
4
Problem
Similarity search is an important part of most time series data mining algorithm.
Dynamic Time Warping is the best measure to use but slow.
5
DefinitionsTime series
Time series T is an ordered list:
T = t1, t2, … ,tm
6
DefinitionsSubsequence
Subsequence Ti,k of time series T is a time series of length k start at position i:
T = t1, t2, … ,tm
7
DefinitionsDynamic Time Warping
8
Related workKnown optimizations
Squared distance
√❑
9
Related workKnown optimizations
Lower bounding
LB_KimFL LB_Keogh
10
Related workKnown optimizations
Early abandon
11
MethodEarly abandon Z-Normalization
Q
TT3
T2
T1
Z-N
orm
aliz
atio
n
Q’
T3’T2’
T1’
…
Long Time series
SubsequencesNormalized
Subsequences
QueryNormalized
Query
Normal approach
12
MethodEarly abandon Z-Normalization Novel approach
Early abandon with Z-normalization
1. Query is Z-normalized
2. Z-normalization of each subsequence will be calculated on the fly with the distance calculation.
3. If distance > best_so_far then early abandon both calculation
13
MethodRe-ordering Early Abandoning
Ordering is created based on the query.
14
MethodCascading Lower Bounds
Lower bounds are used in a cascade to prune candidates.
15
Results
Comparison between:
Naïve
- Z-normalization from start
- full ED(DTW) calculation
State-of-the-art (SOTA)
- Z-normalization from start
- early abandoning
- LB_Keogh bounding for DTW
UCRSuite
16
ResultsBaseline Tests on Random Walk
Million Billion Trillion0
5000
10000
15000
20000
25000
30000
UCR-ED
SOTA-ED
UCR-DTW
SOTA-DTWmin
ute
s
|𝑄|=128
17
ResultsBaseline Tests on Random Walk
Million Billion0
500
1000
1500
2000
2500
UCR-ED
SOTA-ED
UCR-DTW
SOTA-DTWseco
nd
s
|𝑄|=128
18
ResultsBaseline Tests on Random Walk
|𝑇|=2×106
19
ResultsEEG
Series10
100
200
300
400
500
600
3.4
494.3
UCR-ED
SOTA-ED
ho
urs
20
Conclusion
- The approach is very simple yet so effective.
- These optimizations can be applied to most measures but may not work for some, like: Hamming distance