23
Efficient Proper Length Time Series Motif Discovery 1 Sorrachai Yingchareonthawornchai, Haemwaan Sivaraks, Thanawin Rakthanmanon*, Chotirat Ann Ratanamahatana Department of Computer Engineering Chulalongkorn University, Thailand. *Kasetsart University, Thailand.

ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Efficient Proper Length Time Series Motif Discovery

!1

Sorrachai Yingchareonthawornchai, Haemwaan Sivaraks, Thanawin Rakthanmanon*, Chotirat Ann Ratanamahatana

Department of Computer Engineering

Chulalongkorn University, Thailand. *Kasetsart University, Thailand.

Page 2: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Outline

• Introduction & Research problem • The Proposed Algorithm • Experiments

– Correctness – Speed – Parameter-freeness

!2

Page 3: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Time Series Motif

• Repeated occurring patterns in time series !!!!

• Definition? 1. A windowed pair with the closest

distance: Euclidean distance, Dynamic time warping, LCSS, etc.

!3

0 500 1000 1500 2000 2500-4

-2

0

2

Page 4: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Research Problem

• What are the proper lengths of motifs?

!4

0 200 400 600 800 1000 1200 1400 1600-200

-100

0

100

200

0 500 1000 1500 2000 2500-4

-3

-2

-1

0

1

Electrocardiography: ECG

Electroencephalography: EEG

Page 5: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

DEFINITION OF VARIABLE-LENGTH MOTIF

!5

Page 6: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Time Series Motif: Objective Function

• K-Compression Motifs – A motif class has > 1 occurrence

!6

argminH

DL(H) +DL(T |H)

DL(T ) = m ⇤ Entropy(T )Where

Page 7: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Faster search

!!

• Computationally Intractable • Approximation via maximizing difference

before and after compression using H. • Decomposability principle

!7

argminH

DL(H) +DL(T |H)

�DL(T, T |H) = DL(T )� (DL(T |H) +DL(H))

=X

1ik

(DL(To(i):w)�DL(T

o(i):w|H))�DL(H); k � 2

Page 8: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

PLMD: ALGORITHM

!8

Page 9: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Algorithm Overview

!9

100 1000 1300

L : {100,1000, 1300} total bitsave = 250

Proposed Update Rule

Candidate entry

Page 10: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Answer Update Rule

• Overlapping condition – Two entries are overlapped iff each element

overlaps another entity’s element • Update Rule

– If there is no overlapping entry • Then, add the entity to answer set

– Else • If bitsave is superior, then replace an old one

!10

Entry

Entry

t

Page 11: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

0 400 800 1200 1600

-2

0

2

4Motif (L = 286)

Too small !!!Proper

!!!Too big

Measured by MDL principle

Page 12: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Early Termination

• At some w, we may have already discovered all motifs

• Every new answer entries are “Too big” to update the answer set so far ! terminate

!12

Page 13: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

EXPERIMENTS

!13

Support webpage,www.cp.eng.chula.ac.th/~ann/icdm2013/

Page 14: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Metrics

!14

||

),()(

DP

dpunitAoDSAoD DP

∩=∑∩

),( DPS =

}),(|),{( overlappedarepanddPpDddpDP ∧∈∈=∩1V. Niennattrakul, D. Wanichsan, and C. A. Ratanamahatana. Accurate SubsequenceMatching on Data Stream under Time Warping Distance. (PAKDD’09), pp. 752-755, Pattaya, Thailand.

t

pd

Page 15: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Metrics

!• Correctness

!15

}{S=Ψ

∑Ψ∈Ψ

=S

SAoDSAoRscorrectnes )()(||

1

Page 16: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Experiment: Correctness

• Single pattern datasets • Multi-pattern datasets • Generated by NRW (Non repeated random

walk) + Patterns acquired from UCR time series classification/clustering homepage. • Ground Truth

!16 E. Keogh, X. Xi, L. Wei, C. A. Ratanamahatana, “The UCR time series classification/clustering

homepage,” 2008, http://www.cs.ucr.edu/aeamonn/time_series_data/

Page 17: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Results

!17

Dataset Pattern: # Rank AoR (d/p)* k-AoD

spd8** Oliveoil: 2 1 100%(2/2) 2-AoD: 99.65%

-2.2500

-1.5000

-0.7500

0.0000

0.7500

1.5000

2.2500

3.0000

0 50 100 150 200 250 300

Page 18: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Results

Input Output!

Datasets!

Length!

PatternPattern Length

# patter

n

AoR AoD CorrectnessProposed

%kBM*

%Proposed

%kBM*

%Proposed

%kBM*

%

  mpd1

  7486

Gun 150 5 100 80 98.69 99.34 !99.03

!92.2250Words 270 3 100 100 99.27 98.89

Fish 463 2 100 100 99.14 98.29 

mpd2 

8842CBF 128 8 100 75 98.1 98.47 !

99.16!

88.73Yoga 426 2 100 100 99.6 96.39Fish 463 2 100 100 99.7 94.31

OliveOil 570 2 100 100 99 90.35

  mpd3

  7609

FaceAll 131 9 88.89 66.67 98.51 100 !96.19

!70.25Adiac 176 5 100 60 98.33 97.74

50Words 270 3 100 66.67 99.27 85.09

Beef 470 2 100 100 99.58 98.95 

mpd4 

10979FaceAll 131 7 100 57.14 97.78 98.51 !

98.74!

68.31Gun 150 6 100 50 99.34 98.69Adiac 176 8 100 75 98.31 99.44

OSULeaf 427 2 100 100 99.53 93.45 !18

* P. Nunthanid, V. Niennattrakul, and C. A. Ratanamahatana, “Parameter-free motif discovery for time series data,” ECTI-CON, 2012, pp. 1 – 4.

Page 19: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Running time

!19

05

10152025

7486 7609 8842 10979

Hrs

length

kBMPLMD without Early Termination

PLMD with Early Termination

Page 20: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Anytime algorithm: correctness plot

!20

Page 21: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Real-world dataset

!21

Page 22: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Summary

• Problem of finding proper length of motifs • We propose automatic parameter selection scheme based on MDL principle • The algorithm is robust for finding meaningful subset of motifs. • Speedups: (Decomposability, Early Termination)

• Some cases can discover motifs very fast. • Anytime-algorithm: (Matching addition with answers)

•Keep updating answers.!22

Page 23: ICDM2013-Presentationyingchar/Slides/ICDM2013_slides.pdf · Title: ICDM2013-Presentation.key Created Date: 2/12/2014 4:21:38 AM

Thank you for your attention Questions Answers

!23