An Efficient Approach to Mine Flexible Periodic Patterns in Time Series Databases. Supervised by Dr. Chowdhury Farhan Ahmed (Associate Professor) and Md. Samiullah (Lecturer). Presented by Ashis Kumar Chanda and Swapnil Saha. Department of Computer Science and Engineering, University of Dhaka.

FPPM algorithm


Page 1: FPPM algorithm

1 I NAME OF PRESENTER

An Efficient Approach to Mine Flexible Periodic Patterns in Time Series Databases

Supervised by
Dr. Chowdhury Farhan Ahmed, Associate Professor
Md. Samiullah, Lecturer

Presented by
Ashis Kumar Chanda
Swapnil Saha

Department of Computer Science and Engineering
University of Dhaka

Page 2: FPPM algorithm

Topics to be covered

1 Introduction
2 Problem Definitions
3 Existing Algorithms
4 Motivation
5 Contribution
6 The Proposed Algorithm
7 Experimental Results
8 Conclusion

Page 3: FPPM algorithm

Introduction

Data Mining
• Extracting hidden patterns or structure
• Gain information from huge data

Example: Periodic amount of money withdrawn within a fixed time interval from an ATM booth in a specific location

Day | Time slots | Money amount (million), in slot order
Sun | 12 am - 8 am, 8 am - 4 pm, 4 pm - 12 am | 2, 6, 9
Mon | 12 am - 8 am, 8 am - 4 pm, 4 pm - 12 am | 1.2129
Thu | 12 am - 8 am, 8 am - 4 pm, 4 pm - 12 am | 1.5, 3, 4.5

Page 4: FPPM algorithm

Introduction (cont.)

Flexible Periodic Pattern: a pattern that skips one or more particular intermediate characters or events that are not interesting from the user's point of view

Example: Consider T = {abc adc abc}
Flexible pattern F = ‘a*c’, which covers ‘abc’ or ‘adc’
Where ‘*’ indicates any unimportant intermediate event
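
The example above can be checked with a short sketch. It assumes that ‘*’ skips exactly one event and that the spaces in T are only visual grouping; the helper names `matches_at` and `occurrences` are hypothetical and do not come from the slides.

```python
def matches_at(series: str, pattern: str, start: int) -> bool:
    """True if `pattern` occurs in `series` at index `start`.

    Assumption: '*' in the pattern matches any single event.
    """
    if start < 0 or start + len(pattern) > len(series):
        return False
    window = series[start:start + len(pattern)]
    return all(p == "*" or p == s for p, s in zip(pattern, window))


def occurrences(series: str, pattern: str) -> list:
    """All start positions at which the (possibly flexible) pattern matches."""
    return [i for i in range(len(series) - len(pattern) + 1)
            if matches_at(series, pattern, i)]


T = "abcadcabc"  # T = {abc adc abc} without the visual spacing
print(occurrences(T, "a*c"))  # [0, 3, 6]
print(occurrences(T, "abc"))  # [0, 6]
```

The flexible pattern is found in every segment, while the exact pattern ‘abc’ misses the middle one.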

Page 5: FPPM algorithm

Problem Definition

Flexible Pattern Mining: Given a time series database as a sequence of n characters or events, S = {e1, e2, e3 ... en}, a user-specified maximum event skipping threshold ϴ, and a support threshold σ,

mine all possible flexible periodic sequences of events, FP = {e1, e2, e3 ... ei} ∈ S, where i ≤ n,

that satisfy σ, considering variable starting positions st and allowing at most ϴ unimportant intermediate events.

Page 6: FPPM algorithm

Existing Algorithms

Most notable algorithms:

Algorithm | Mechanism | Authors | Year | Drawbacks
CONV | Convolution process | M. G. Elfeky et al. | 2005 | Fails in insertion, deletion process
WARP | Time warping technique | M. G. Elfeky et al. | 2005 | Only detects segment periodicity
STNR | Suffix tree | Rasheed et al. | 2010 | Lack of skipping intermediate events
Effective periodic pattern mining | Apriori based sequential pattern mining | Nishi et al. | 2013 | Huge candidate set, false pattern generation

Page 7: FPPM algorithm

Motivation

• Avoid the Apriori based approach
• Allow variable starting positions in generated sequences
• Detect all three types of periodicity in one run

Page 8: FPPM algorithm

Contribution

• Reduced redundant patterns
• Developed a new algorithm using a suffix tree like data structure to generate Flexible Periodic Patterns
• Also proposed a new periodicity detection algorithm
• Capable of mining all three types of periodicity in a single run
• Considered variable starting positions from the given time series sequence

Page 9: FPPM algorithm

Terms & Definitions

• Occurrence vector
• Confidence

T = {acbd afbd agbd}

Position: 0 1 2 3 4 5 6 7 8 9 10 11
Event:    a c b d a f b d a g b  d

Occurrence vector:
• occ_vec[a] = [0, 4, 8]
• occ_vec[c] = [1]
• occ_vec[b] = [2, 6, 10]

Perfect periodicity = (endpos - stpos + 1) / period
Confidence = actual periodicity / perfect periodicity

Confidence of ‘a’:
• actual periodicity = 3
• perfect periodicity = 3
• Confidence = 3 / 3

Confidence of ‘c’:
• actual periodicity = 1
• perfect periodicity = 3
• Confidence = 1 / 3
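
A minimal sketch of these two definitions follows. Three things are assumptions made so that the sketch reproduces the slide's numbers: the period is 4 (read off the positions 0, 4, 8), the series is terminated by ‘$’ so that endpos = 12, and the quotient in the perfect periodicity is rounded down. The function names are hypothetical.

```python
from collections import defaultdict


def occurrence_vectors(series: str) -> dict:
    """Map each event to the sorted list of positions where it occurs."""
    occ = defaultdict(list)
    for pos, event in enumerate(series):
        occ[event].append(pos)
    return dict(occ)


def confidence(occ_vec: list, period: int, st_pos: int, end_pos: int) -> float:
    """actual periodicity / perfect periodicity for one starting position."""
    hits = set(occ_vec)
    # Actual periodicity: occurrences found at st_pos, st_pos + period, ...
    actual = sum(1 for p in range(st_pos, end_pos + 1, period) if p in hits)
    # Perfect periodicity, rounded down (assumption, see lead-in).
    perfect = (end_pos - st_pos + 1) // period
    return actual / perfect if perfect else 0.0


T = "acbdafbdagbd$"      # '$' terminator is an assumption
END_POS = len(T) - 1     # 12
occ = occurrence_vectors(T)

print(occ["a"], occ["c"], occ["b"])        # [0, 4, 8] [1] [2, 6, 10]
print(confidence(occ["a"], 4, 0, END_POS))  # 1.0
print(confidence(occ["c"], 4, 1, END_POS))  # 0.333...
```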

Page 10: FPPM algorithm

Terms & Definitions

• Occurrence vector
• Confidence
• Length vector
• Ladder factor

[Figure: SSES tree for T = {abb$}, nodes A1 to A9, each edge labeled with a single symbol (a, b, or $)]

Length vector:
• len_vec[A2] = [3]
• len_vec[A6] = [2, 1]

Ladder factor (support threshold σ = 50%):
• lad_fact[A2] = 3
• lad_fact[A6] = 2

lad_fact = nth max(len_vec), where n = size of len_vec × σ
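
The ladder factor rule can be sketched as below. The slide does not say how a fractional n is handled; rounding n up (with a minimum of 1) is an assumption chosen because it reproduces lad_fact[A2] = 3 and lad_fact[A6] = 2. The function name `ladder_factor` is hypothetical.

```python
import math


def ladder_factor(len_vec: list, sigma: float) -> int:
    """Return the n-th largest entry of len_vec, with n = size of len_vec * sigma.

    Assumption: n is rounded up and is never smaller than 1.
    """
    n = max(1, math.ceil(len(len_vec) * sigma))
    return sorted(len_vec, reverse=True)[n - 1]


SIGMA = 0.5  # support threshold of 50%
print(ladder_factor([3], SIGMA))     # 3  (node A2)
print(ladder_factor([2, 1], SIGMA))  # 2  (node A6)
```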

Page 11: FPPM algorithm

The Proposed Algorithm

Key Features:
- Apply discretization technique on the given database
- Construct the Single Symbol Edge based Suffix (SSES) tree
- Calculate occurrence vector at the time of construction
- Traverse the tree level-wise
- Mine patterns following joining property
- Check each generated pattern through the proposed periodicity detection algorithm

Page 12: FPPM algorithm

SSES Tree Construction

T = {abcabbabb$}

[Figure: the full SSES tree for T, with nodes numbered 1 to 45, and the SSES tree for Period = 3, with nodes numbered 1 to 17 below the root; every edge carries a single symbol (a, b, c, or $)]
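
One way to picture the construction is a suffix trie in which every edge carries one symbol and every node records its occurrence vector while the tree is built. The sketch below is a naive construction, not necessarily the authors' procedure. It assumes that the tree for Period = 3 is the full tree cut off at depth 3 and that the suffix consisting only of ‘$’ is not inserted; both assumptions are inferred from the node counts in the figures (45 and 17). The names `Node`, `build_sses`, `count_nodes`, and `find` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # symbol -> Node (one symbol per edge)
        self.occ_vec = []   # start positions of the substring spelled by this node


def build_sses(series: str, max_depth=None) -> Node:
    """Insert every suffix symbol by symbol, optionally cut off at max_depth."""
    root = Node()
    for start in range(len(series)):
        if series[start] == "$":
            continue  # assumption: the terminator-only suffix is skipped
        end = len(series) if max_depth is None else min(len(series), start + max_depth)
        node = root
        for pos in range(start, end):
            node = node.children.setdefault(series[pos], Node())
            node.occ_vec.append(start)  # occurrence vector filled during construction
    return root


def count_nodes(node: Node) -> int:
    """Number of nodes in the subtree, including the node itself."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


def find(root: Node, pattern: str):
    """Follow the pattern from the root; None if the path does not exist."""
    node = root
    for symbol in pattern:
        node = node.children.get(symbol)
        if node is None:
            return None
    return node


T = "abcabbabb$"
print(count_nodes(build_sses(T)))     # 45
pruned = build_sses(T, max_depth=3)
print(count_nodes(pruned))            # 17
print(find(pruned, "b").occ_vec)      # [1, 4, 5, 7, 8]
print(find(pruned, "bb").occ_vec)     # [4, 7]
```

The naive insertion costs time proportional to the series length times the depth limit, which is acceptable for illustration but is not a claim about the efficiency of the proposed algorithm.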

Page 13: FPPM algorithm

Algorithm Demonstration

σ = 50%

[Figure: SSES tree for Period = 3, nodes 1 to 17, with labels L1, L4, L7, L5, L8]

Occurrence vector calculation

Unique event | occ_vec
b | [1, 4, 5, 7, 8]

Confidence calculation

Pattern | occ_vec | confidence | status
b | [1, 4, 7] | 100% | √

Patterns

Pattern | occ_vec
b | [1, 4, 5, 7, 8]

Page 14: FPPM algorithm

Algorithm Demonstration

σ = 50%

[Figure: part of the SSES tree under examination, with labels L1, L4, L7, L5, L8]

Occurrence vector calculation

Unique event | occ_vec
c | [1]
b | [4, 7]
a | [5]

Confidence calculation

Pattern | occ_vec | confidence | status
bc | [1] | 33% | χ
ba | [5] | 100% | √
bb | [4, 7] | 100% | √

Patterns

Pattern | occ_vec
b | [1, 4, 5, 7, 8]
bb | [4, 7]

Join:
ba | [5]
b* | [1, 4, 5, 7]

Page 15: FPPM algorithm

Algorithm Demonstration

σ = 50%

[Figure: part of the SSES tree under examination, with labels L1, L4, L7, L5, L8]

Occurrence vector calculation

Unique event | occ_vec
a | [1]
a | [1, 4]
b | [5]

Confidence calculation

Pattern | occ_vec | confidence | status
bba | [4] | 50% | √
b*a | [1, 4] | 66% | √
baa | [] | 0% | χ
bab | [5] | 100% | √
bbb | [] | 0% | χ
b*b | [5] | 100% | √

Patterns

Join:
ba | [5]
b* | [1, 4, 5, 7]

Pattern | occ_vec
bb | [4, 7]
b*a | [1, 4]
bab | [5]
bba | [4]
b*b | [5]
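
The confidence values in the demonstration can be reproduced with a brute-force sketch that scans T directly instead of traversing the SSES tree, so it illustrates the check and not the tree-based mining itself. It assumes that ‘*’ stands for exactly one event other than ‘$’, that the first occurrence of a pattern is taken as its starting position, that endpos is the index of ‘$’, and that the perfect periodicity is rounded down. The helper names are hypothetical.

```python
T = "abcabbabb$"
PERIOD = 3
SIGMA = 0.5  # σ = 50%


def occurrences(series: str, pattern: str) -> list:
    """Start positions where the pattern matches.

    Assumption: '*' stands for any single event except the terminator '$'.
    """
    found = []
    for start in range(len(series) - len(pattern) + 1):
        window = series[start:start + len(pattern)]
        if all(p == s or (p == "*" and s != "$") for p, s in zip(pattern, window)):
            found.append(start)
    return found


def confidence(series: str, pattern: str, period: int) -> float:
    """actual periodicity / perfect periodicity, starting at the first occurrence."""
    occ = occurrences(series, pattern)
    if not occ:
        return 0.0
    st_pos, end_pos = occ[0], len(series) - 1
    actual = sum(1 for p in range(st_pos, end_pos + 1, period) if p in occ)
    perfect = (end_pos - st_pos + 1) // period
    return actual / perfect if perfect else 0.0


for pat in ["bc", "ba", "bb", "bba", "b*a", "baa", "bab", "bbb", "b*b"]:
    conf = confidence(T, pat, PERIOD)
    status = "pass" if conf >= SIGMA else "fail"
    print(pat, occurrences(T, pat), f"{int(conf * 100)}%", status)
```

Under these assumptions the printed occurrence vectors and percentages agree with the confidence tables above, for example b*a with [1, 4] at 66% and bc with [1] at 33%.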

Page 16: FPPM algorithm

Final Result

T = {abcabbabb$}

[Figure: SSES tree for Period = 3, nodes 1 to 17]

Mined patterns

Pattern | occ_vec
a | [0, 3, 6]
ab | [0, 3, 6]
abb | [3, 6]
a*b | [3, 6]
b | [1, 4, 7]
bb | [4, 7]
ba | [5]
b*a | [1, 4]
bab | [5]
b*b | [5]
bba | [4]
c | [2]
ca | [2]
cab | [2]
c*b | [2]

Page 17: FPPM algorithm

Experimental Result

Page 18: FPPM algorithm

Conclusion

Summary:
• Mine Flexible Periodic Patterns using a suffix tree like structure
• Improve performance by pruning the tree
• Consider variable starting positions in the given time sequence

Future Works:
• Improve the proposed procedure to compare with noise-resilient features
• Develop an efficient way to execute in parallel on time series databases
• Reduce memory consumption

Page 19: FPPM algorithm

References

1. Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. Periodicity detection in time series databases. IEEE Trans. Knowl. Data Eng., 17(7):875-887, 2005.
2. Faraz Rasheed, Mohammed Al-Shalalfa, and Reda Alhajj. Adapting machine learning technique for periodicity detection in nucleosomal locations in sequences. In IDEAL, pages 870-879, 2007.
3. Manziba Akanda Nishi, Chowdhury Farhan Ahmed, Md. Samiullah, and Byeong-Soo Jeong. Effective periodic pattern mining in time series databases. Expert Syst. Appl., 40(8):3015-3027, 2013.
4. Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. WARP: Time warping for periodicity detection. In ICDM, pages 138-145, 2005.
5. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997.
6. Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
7. Piotr Indyk, Nick Koudas, and S. Muthukrishnan. Identifying representative trends in massive time series data sets using sketches. In VLDB, pages 363-372, 2000.
8. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In FOCS, pages 596-604, 1999.
9. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Meichun Hsu. PrefixSpan: Mining sequential patterns by prefix-projected growth. In ICDE, pages 215-224, 2001.
10. Faraz Rasheed, Mohammed Al-Shalalfa, and Reda Alhajj. Efficient periodicity mining in time series databases using suffix trees. IEEE Trans. Knowl. Data Eng., 23(1):79-94, 2011.
11. Faraz Rasheed and Reda Alhajj. STNR: A suffix tree based noise resilient algorithm for periodicity detection in time series databases. Appl. Intell., 32(3):267-278, 2010.
12. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995.
13. Andreas S. Weigend and Neil A. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1994.
14. Huei-Wen Wu and Anthony J. T. Lee. Mining closed flexible patterns in time-series databases. Expert Syst. Appl., 37(3):2098-2107, 2010.
15. Ramakrishnan Srikant and Rakesh Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT, pages 3-17, 1996.
16. Anthony K. H. Tung, Hongjun Lu, Jiawei Han, and Ling Feng. Breaking the barrier of transactions: Mining inter-transaction association rules. In KDD, pages 297-301, 1999.
17. Chang Sheng, Wynne Hsu, and Mong-Li Lee. Mining dense periodic patterns in time series data. In ICDE, page 115, 2006.
18. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov., 1(3):259-289, 1997.
19. Sheng Ma and Joseph L. Hellerstein. Mining partially periodic event patterns with unknown periods. In ICDE, pages 205-214, 2001.
20. Earl F. Glynn, Jie Chen, and Arcady R. Mushegian. Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3):310-316, 2006.
21. Walid G. Aref, Mohamed G. Elfeky, and Ahmed K. Elmagarmid. Incremental, online, and merge mining of partial periodic patterns in time-series databases. IEEE Trans. Knowl. Data Eng., 16(3):332-342, 2004.

Page 20: FPPM algorithm


Questions?

Page 21: FPPM algorithm


Thank You