Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping

Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh

Hoan Nguyen – Trung Minh Nguyen

Similar search with trillions of time series

Page 2: Similar search with trillions of time series

Abstract

Optimizations to search and mine large databases very fast

Page 3: Similar search with trillions of time series

Outline

Problem

Related work

Definitions

Method

Results

Conclusion

Page 4: Similar search with trillions of time series

Problem

Similarity search is an important part of most time series data mining algorithms.

Dynamic Time Warping (DTW) is the best measure to use, but it is slow.

Page 5: Similar search with trillions of time series

Definitions: Time series

A time series $T$ is an ordered list:

$T = t_1, t_2, \ldots, t_m$

Page 6: Similar search with trillions of time series

Definitions: Subsequence

A subsequence $T_{i,k}$ of a time series $T$ is a time series of length $k$ that starts at position $i$:

$T_{i,k} = t_i, t_{i+1}, \ldots, t_{i+k-1}$, with $1 \le i \le m-k+1$

Page 7: Similar search with trillions of time series

Definitions: Dynamic Time Warping
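
The definition shown on this slide did not survive extraction. As a minimal sketch, assuming the standard dynamic-programming formulation with squared point costs and a warping band of half-width `r` (the function name `dtw` and the parameter `r` are illustrative and not taken from the slides), DTW between two equal-length sequences could be computed like this:

```python
import math

def dtw(q, c, r):
    """Squared DTW distance between equal-length sequences q and c.

    r is the half-width of the warping band (an assumed parameter):
    cell (i, j) is only considered when |i - j| <= r.
    """
    n = len(q)
    # cost[i][j] = cheapest warping path aligning q[:i] with c[:j]
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both advance
                                 cost[i - 1][j],      # only q advances
                                 cost[i][j - 1])      # only c advances
    return cost[n][n]
```

With `r = 0` no warping is allowed and the result reduces to the squared Euclidean distance; a larger `r` lets shifted shapes align at the price of more cells to fill.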

Page 8: Similar search with trillions of time series

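Related work: Known optimizations

Squared distance

Omit the square root ($\sqrt{\cdot}$) from the distance: the square root is monotonic, so the ranking of nearest neighbors is unchanged.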

Page 9: Similar search with trillions of time series

Related work: Known optimizations

Lower bounding

LB_KimFL, LB_Keogh
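
The two bounds are only named on the slide. The sketch below shows one common way to write them, assuming squared costs, equal-length sequences of at least two points, and a warping band of half-width `r`; the function names and signatures are illustrative, not the authors' code.

```python
def lb_kim_fl(q, c):
    """O(1) bound from the first and last points only.

    Every warping path must align the two first points and the two
    last points, so their squared differences can never be avoided.
    Assumes len(q) == len(c) >= 2.
    """
    return (q[0] - c[0]) ** 2 + (q[-1] - c[-1]) ** 2

def lb_keogh(q, c, r):
    """O(n * r) bound from the envelope of q within the band r.

    A point of c that lies outside [min, max] of the query points it
    may be warped to must pay at least its distance to that envelope.
    """
    n = len(q)
    total = 0.0
    for i, x in enumerate(c):
        window = q[max(0, i - r): min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (lower - x) ** 2
    return total
```

If either bound already exceeds the best distance found so far, the candidate can be discarded without running DTW at all.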

Page 10: Similar search with trillions of time series

Related work: Known optimizations

Early abandon
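
As a minimal sketch of early abandoning, assuming a squared Euclidean distance and a running `best_so_far` threshold (the function name is illustrative), the accumulation stops as soon as the partial sum can no longer beat the best match:

```python
import math

def early_abandon_sq_ed(q, c, best_so_far):
    """Squared Euclidean distance, or math.inf if abandoned early."""
    total = 0.0
    for a, b in zip(q, c):
        total += (a - b) ** 2
        if total > best_so_far:
            # The terms are non-negative, so the sum can only grow:
            # this candidate cannot become the new best match.
            return math.inf
    return total
```

The same idea carries over to any distance that is a sum of non-negative terms.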

Page 11: Similar search with trillions of time series

Method: Early abandon Z-Normalization

Normal approach

[Figure: the long time series T is cut into subsequences T1, T2, T3. The query Q and every subsequence are Z-normalized first, giving the normalized query Q' and the normalized subsequences T1', T2', T3'.]

Page 12: Similar search with trillions of time series

Method: Early abandon Z-Normalization (novel approach)

Early abandon with Z-normalization

1. The query is Z-normalized once, up front.

2. The Z-normalization of each subsequence is computed on the fly, interleaved with the distance calculation.

3. If the partial distance exceeds best_so_far, both calculations are abandoned early.
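
The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the squared Euclidean distance, keeps a running sum and sum of squares to obtain each window's mean and standard deviation, and the names `znorm` and `search` are hypothetical.

```python
import math

def znorm(x):
    """Z-normalize a whole sequence (used for the query only)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    return [(v - mean) / std for v in x]

def search(t, query):
    """Return (best squared distance, start index) over all windows of t."""
    m = len(query)
    q = znorm(query)                  # step 1: normalize the query once
    best, best_pos = math.inf, -1
    ex = ex2 = 0.0                    # running sum and sum of squares
    for i, v in enumerate(t):
        ex += v
        ex2 += v * v
        if i >= m:                    # drop the point that left the window
            old = t[i - m]
            ex -= old
            ex2 -= old * old
        if i < m - 1:
            continue                  # first window not complete yet
        start = i - m + 1
        mean = ex / m
        var = ex2 / m - mean * mean
        if var <= 1e-12:
            continue                  # constant window: cannot be normalized
        std = math.sqrt(var)
        dist = 0.0
        for j in range(m):
            # step 2: normalize each point only when it is needed
            d = (t[start + j] - mean) / std - q[j]
            dist += d * d
            if dist > best:           # step 3: abandon both calculations
                break
        else:
            best, best_pos = dist, start
    return best, best_pos
```

Points after the abandon position are never normalized, which is where the saving comes from. Running sums accumulate floating-point error on very long series, so a production version would need to recompute them from scratch periodically.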

Page 13: Similar search with trillions of time series

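Method: Re-ordering Early Abandoning

The order in which points are compared is derived from the query: indices are visited in decreasing order of the absolute value of the Z-normalized query, so the terms likely to contribute most to the distance are added first.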
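
A small sketch of query-based re-ordering, assuming the ordering is by decreasing absolute value of the Z-normalized query and that both sequences are already normalized (function names are illustrative). It also reports how many points were examined, so the effect of the ordering is visible:

```python
import math

def abandon_order(q_norm):
    """Indices of the normalized query, largest absolute value first."""
    return sorted(range(len(q_norm)),
                  key=lambda i: abs(q_norm[i]), reverse=True)

def reordered_sq_ed(q_norm, c_norm, order, best_so_far):
    """Return (squared distance or math.inf, points examined)."""
    total = 0.0
    for count, i in enumerate(order, start=1):
        total += (q_norm[i] - c_norm[i]) ** 2
        if total > best_so_far:
            return math.inf, count    # abandoned after `count` points
    return total, len(order)
```

The ordering depends only on the query, so it is computed once and reused for every candidate. Passing `list(range(len(q_norm)))` as `order` gives the plain left-to-right behaviour for comparison.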

Page 14: Similar search with trillions of time series

Method: Cascading Lower Bounds

Lower bounds are used in a cascade to prune candidates.
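
The cascade can be sketched generically: bounds are tried from cheapest to most expensive, and the exact distance is computed only for candidates that survive all of them. The function `cascade_search` and the three demo functions are assumptions made for illustration; the demo uses the squared Euclidean distance with simple partial-sum bounds as stand-ins, whereas in a DTW setting they would be replaced by bounds such as LB_KimFL and LB_Keogh and by DTW itself.

```python
import math

def cascade_search(query, candidates, bounds, distance):
    """Nearest-neighbour search that tries cheap lower bounds first.

    bounds:   lower-bound functions f(query, cand), cheapest first.
    distance: the expensive exact distance g(query, cand).
    Returns (best distance, best index, number of exact distance calls).
    """
    best, best_idx, exact_calls = math.inf, -1, 0
    for idx, cand in enumerate(candidates):
        # any() stops at the first bound that already exceeds best,
        # so the costlier bounds are skipped for that candidate.
        if any(lb(query, cand) > best for lb in bounds):
            continue
        exact_calls += 1
        d = distance(query, cand)
        if d < best:
            best, best_idx = d, idx
    return best, best_idx, exact_calls

# Stand-in functions for the demo (assumptions, not the slide's bounds).
def sq_ed(q, c):
    return sum((a - b) ** 2 for a, b in zip(q, c))

def first_last(q, c):
    # O(1): two of the terms of sq_ed (needs at least two points)
    return (q[0] - c[0]) ** 2 + (q[-1] - c[-1]) ** 2

def first_half(q, c):
    # O(n/2): partial sum over the first half of the points
    h = len(q) // 2
    return sq_ed(q[:h], c[:h])

result = cascade_search(
    [0, 0, 0, 0],
    [[1, 1, 1, 1], [5, 0, 0, 5], [0, 3, 0, 0], [0, 0, 1, 0]],
    [first_last, first_half],
    sq_ed,
)
```

In this demo the second candidate is pruned by the O(1) bound and the third by the partial-sum bound, so only two of the four candidates reach the exact distance.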

Page 15: Similar search with trillions of time series

Results

Comparison between:

Naïve

- Z-normalization from start

- full Euclidean distance (ED) or DTW calculation

State-of-the-art (SOTA)

- Z-normalization from start

- early abandoning

- LB_Keogh bounding for DTW

UCRSuite

Page 16: Similar search with trillions of time series

Results: Baseline Tests on Random Walk

[Figure: search time in minutes (vertical axis 0 to 30000) for UCR-ED, SOTA-ED, UCR-DTW and SOTA-DTW on random-walk data of size Million, Billion and Trillion; query length $|Q| = 128$.]

Page 17: Similar search with trillions of time series

Results: Baseline Tests on Random Walk

[Figure: search time in seconds (vertical axis 0 to 2500) for UCR-ED, SOTA-ED, UCR-DTW and SOTA-DTW on random-walk data of size Million and Billion; query length $|Q| = 128$.]

Page 18: Similar search with trillions of time series

Results: Baseline Tests on Random Walk

$|T| = 2 \times 10^{6}$

Page 19: Similar search with trillions of time series

Results: EEG

[Figure: search time in hours (vertical axis 0 to 600) on EEG data; data labels 3.4 and 494.3; series UCR-ED and SOTA-ED.]

Page 20: Similar search with trillions of time series

Conclusion

- The approach is very simple yet very effective.

- These optimizations can be applied to most distance measures, but may not work for some, such as Hamming distance.