Mining Serial Episode Rules with Time Lags over Multiple Data Streams Tung-Ying Lee, En Tzu Wang...

Mining Serial Episode Rules with Time Lags over Multiple Data Streams

Tung-Ying Lee, En Tzu Wang
Dept. of CS, National Tsing Hua Univ. (Taiwan)

Arbee L.P. Chen
Dept. of CS, National Chengchi Univ. (Taiwan)

DaWaK’08

Outline

Introduction
Related work
Preliminaries
◦ Support of a serial episode
◦ Support/confidence of a serial episode rule
◦ Data structure used in the algorithms
Algorithms
◦ LossyDL
◦ TLT
Experiments
Conclusions

Introduction

In many applications, data are generated in the form of continuous data streams.
◦ Continuously detecting the flow and occupancy of a road to characterize its congestion condition forms data streams
◦ When roads A and B have heavy traffic, road C will most likely be congested 5 minutes later
◦ The flow and occupancy values coming from the roads are regarded as a multi-stream environment, from which serial episode rules are found
◦ Serial episode rules with time lags (SER): X →lag Y

Related Work

Finding episodes/episode rules from static time series data has been studied for decades

Finding episodes over data streams
◦ Serial episodes [SSDBM04]
◦ Episodes [KDD07]

[Figure: examples of a serial episode, an episode, and a serial episode rule with its precursor and successor]

Preliminaries

Environment: a centralized system collecting n synchronized data streams DS1, DS2, …, DSn
◦ n-tuple event: the set of items coming from all streams at the same time
◦ itemset: a subset of an n-tuple event, e.g. {gA}
◦ serial episode: an ordered list of itemsets, e.g. (aA)(bB)

time: 1, 2, 3, 4, 5, 6, 7, 8
DS1:  a, b, b, c, g, a, b, f
DS2:  A, B, S, G, A, B, A, F
…
DSn:  (items not shown)

Each column of the table above is one n-tuple event.

Preliminaries (cont.)

Minimal occurrence: given a serial episode S, a time interval [a, b] is a minimal occurrence of S if
◦ S occurs in [a, b]
◦ S does not occur in any proper subinterval of [a, b]

A minimal occurrence [a, b] is valid if (b - a + 1) ≤ T, where T is a time bound given by users

MO(S): the set of all minimal occurrences of S
Supp(S): the number of valid minimal occurrences of S

Time bound T: 3

Serial episode | Minimal occurrences                          | Support
(aA)(bB)       | [1, 2], [6, 7], [11, 12], [13, 14], [18, 19] | 5
(gG)           | [5, 5], [10, 10], [15, 15], [17, 17]         | 4
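The definition of a valid minimal occurrence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `minimal_occurrences` and the representation (a list of item sets, one per time point, with time starting at 1) are hypothetical. The usage lines run it on the DS1/DS2 example stream from the Preliminaries slide.

```python
def minimal_occurrences(events, episode, time_bound):
    """Return the valid minimal occurrences of `episode` as (start, end) pairs.

    events[t - 1] is the n-tuple event at time t (a set of items);
    episode is a list of itemsets (sets) in serial order.
    """
    valid = []
    last_start = 0  # start of the most recent minimal occurrence found so far
    for end in range(1, len(events) + 1):
        if not episode[-1] <= events[end - 1]:
            continue  # the last itemset must occur exactly at time `end`
        # Match the remaining itemsets backwards, each at the latest possible time.
        t = start = end
        for itemset in reversed(episode[:-1]):
            t -= 1
            while t >= 1 and not itemset <= events[t - 1]:
                t -= 1
            start = t
            if t < 1:
                break
        if start < 1 or start <= last_start:
            continue  # nothing ends here, or the interval contains an earlier occurrence
        last_start = start
        if end - start + 1 <= time_bound:  # validity check against T
            valid.append((start, end))
    return valid


# Example stream from the slide: DS1 = a b b c g a b f, DS2 = A B S G A B A F
events = [set(e) for e in ("aA", "bB", "bS", "cG", "gA", "aB", "bA", "fF")]
print(minimal_occurrences(events, [{"a", "A"}, {"b", "B"}], 3))  # [(1, 2)]
print(minimal_occurrences(events, [{"a"}, {"b"}], 3))            # [(1, 2), (6, 7)]
```

For the episode (a)(b), the interval [1, 3] also contains an occurrence, but it is not minimal because [1, 2] is a proper subinterval that already contains one.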

A SER is R: S1 →(Lag = L) S2

Supp(R): |{[a, b] | [a, b] ∈ MO(S1), [a, b] is valid, and ∃ [c, d] ∈ MO(S2), [c, d] is valid, s.t. (c - a) = L}|

Conf(R) = Supp(R)/Supp(S1)

Time bound T: 3

Serial episode rule: (aA)(bB) →4 (gG)
Minimal occurrences: [1, 2]→[5, 5], [6, 7]→[10, 10], [11, 12]→[15, 15], [13, 14]→[17, 17]
Supp: 4, Conf: 4/5 = 0.8
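The support and confidence of a rule can be recomputed from the two minimal-occurrence lists. The sketch below is a hedged illustration: the helper names `is_valid`, `rule_support`, and `rule_confidence` are hypothetical, and the lag is measured from the start of the precursor occurrence to the start of the successor occurrence, as in the definition (c - a) = L.

```python
def is_valid(interval, time_bound):
    """A minimal occurrence [start, end] is valid if its length is at most T."""
    start, end = interval
    return end - start + 1 <= time_bound


def rule_support(mo_s1, mo_s2, lag, time_bound):
    """Count valid occurrences of S1 followed, `lag` time units later, by a valid occurrence of S2."""
    successor_starts = {c for (c, d) in mo_s2 if is_valid((c, d), time_bound)}
    return sum(
        1
        for (a, b) in mo_s1
        if is_valid((a, b), time_bound) and a + lag in successor_starts
    )


def rule_confidence(mo_s1, mo_s2, lag, time_bound):
    """Conf(R) = Supp(R) / Supp(S1), where Supp(S1) counts valid minimal occurrences."""
    supp_s1 = sum(1 for iv in mo_s1 if is_valid(iv, time_bound))
    return rule_support(mo_s1, mo_s2, lag, time_bound) / supp_s1 if supp_s1 else 0.0


# Minimal occurrences from the slide's example, T = 3, lag = 4
mo_precursor = [(1, 2), (6, 7), (11, 12), (13, 14), (18, 19)]  # (aA)(bB)
mo_successor = [(5, 5), (10, 10), (15, 15), (17, 17)]          # (gG)
print(rule_support(mo_precursor, mo_successor, 4, 3))     # 4
print(rule_confidence(mo_precursor, mo_successor, 4, 3))  # 0.8
```

The occurrence [18, 19] contributes nothing because no occurrence of (gG) starts at time 22.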

Problem Formulation: given 4 parameters
◦ the maximum time lag (Lmax)
◦ the minimum support (minsup)
◦ the minimum confidence (minconf)
◦ the time bound (T)

Find all SERs, e.g. R: S1 →(Lag = L) S2, satisfying
◦ L ≤ Lmax
◦ Supp(R) ≥ N × minsup (N: the number of received n-tuple events)
◦ Conf(R) ≥ minconf
◦ Calculating supports for serial episodes and SERs must take T into account

Example parameters: Lmax = 5, minsup = 0.2, minconf = 0.8, T = 3

Serial episode rule: (aA)(bB) →4 (gG)
Supp: 4, and 4 ≥ 19 × 0.2 = 3.8 (N = 19)
Conf: 4/5 = 0.8

Using the prefix tree for keeping serial episodes

S: a serial episode, X: an item
◦ S+X: X follows S
◦ S+_X: X and the last itemset in S appear at the same time

[Figure: prefix tree with the root at level 0 and nodes at levels 1 and 2; one path represents the serial episode (AB), another the serial episode (A)(B)]
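One way to picture the two kinds of extension is a prefix tree whose edges are labelled with both the extension type and the item. The following is a minimal sketch under assumptions, not the data structure from the paper: the `Node` class, the `insert` helper, and the edge labels "+" (new itemset) and "+_" (same itemset) are hypothetical names chosen to mirror the S+X and S+_X notation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # (kind, item) -> Node
    support: int = 0


def insert(root, episode):
    """Insert a serial episode (a list of item tuples) and return its edge path.

    The first item of each itemset starts a new itemset ("+" edge);
    any further item joins the current itemset ("+_" edge).
    """
    node, path = root, []
    for itemset in episode:
        for position, item in enumerate(itemset):
            kind = "+" if position == 0 else "+_"
            edge = (kind, item)
            node = node.children.setdefault(edge, Node())
            path.append(edge)
    return path


root = Node()
print(insert(root, [("A", "B")]))     # serial episode (AB)
print(insert(root, [("A",), ("B",)]))  # serial episode (A)(B)
```

Both episodes share the level-1 node for A and differ only in the type of the edge leading to B at level 2.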

LossyDL

The concept of LossyDL: keeping the valid minimal occurrences of a serial episode for generating rules

At time point = 3, a 2-tuple event (BC) arrives, T = 3
◦ Each item in the current 2-tuple event needs to be processed (traversing in a bottom-up order)
◦ The last two minimal occurrences need to be checked
◦ Processing C can generate (B)(C): [2, 3] and (BC): [3, 3]

[Figure: prefix tree whose nodes carry minimal occurrences such as [1, 1], [1, 2], [2, 2], [2, 3], and [3, 3]; the candidate [1, 3] is marked as not minimal]

Using Lossy Counting [VLDB02], whenever N ≡ 0 (mod 1/ε), the oldest minimal occurrence is removed
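The Lossy Counting style pruning step can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the class `OccurrenceStore`, its methods, the choice of rounding 1/ε up to an integer bucket width, and the decision to drop an episode once its list becomes empty are all hypothetical.

```python
import math
from collections import deque


class OccurrenceStore:
    """Keeps, per serial episode, a queue of its valid minimal occurrences."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # assumed bucket width, as in Lossy Counting
        self.lists = {}                      # episode -> deque of (start, end)

    def add(self, episode, interval):
        self.lists.setdefault(episode, deque()).append(interval)

    def tick(self, n):
        """Call after the n-th event: at bucket boundaries drop the oldest occurrence."""
        if n % self.width != 0:
            return
        for episode in list(self.lists):
            occurrences = self.lists[episode]
            occurrences.popleft()
            if not occurrences:          # assumption: empty episodes are forgotten
                del self.lists[episode]


store = OccurrenceStore(epsilon=0.5)     # bucket width 2, chosen only for the demo
store.add("(B)(C)", (2, 3))
store.add("(B)(C)", (4, 5))
store.add("(BC)", (3, 3))
store.tick(2)
print(sorted(store.lists))               # ['(B)(C)']
```

Removing one occurrence per bucket means a stored list undercounts the true support by at most about εN, which is why the rule-generation thresholds are relaxed by ε.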

LossyDL (Rule Generation)

Mining SERs
◦ Every two serial episodes with supports ≥ (minsup - ε) × N are checked to see whether any of their minimal occurrences can be combined. Then, Supp(R) can be computed
◦ Each R: S1 →(Lag = L) S2 is returned if Supp(R) ≥ (minsup - ε) × N and (Supp(R) + εN)/Supp(S1) ≥ minconf

Serial episode | Minimal occurrences
(aA)(bB)       | [1, 2], [6, 7], [11, 12], [13, 14], [18, 19]
(gG)           | [5, 5], [10, 10], [15, 15], [17, 17]

Serial episode rule | Minimal occurrences
(aA)(bB) →4 (gG)    | [1, 2]→[5, 5], [6, 7]→[10, 10], [11, 12]→[15, 15], [13, 14]→[17, 17]
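The relaxed output condition of LossyDL can be written as a small predicate. This is a sketch under assumptions: the function name `lossydl_accepts` is hypothetical, and the value eps = 0.02 in the usage lines is picked only for illustration.

```python
def lossydl_accepts(supp_rule, supp_precursor, n, minsup, minconf, eps):
    """Relaxed test: stored supports may undercount the true ones by up to eps * n."""
    if supp_precursor == 0:
        return False
    enough_support = supp_rule >= (minsup - eps) * n
    enough_confidence = (supp_rule + eps * n) / supp_precursor >= minconf
    return enough_support and enough_confidence


# Supp(R) = 4, Supp(S1) = 5, N = 19, minsup = 0.2, minconf = 0.8, eps = 0.02 (assumed)
print(lossydl_accepts(4, 5, 19, 0.2, 0.8, 0.02))  # True
print(lossydl_accepts(3, 5, 19, 0.2, 0.8, 0.02))  # False: 3 < (0.2 - 0.02) * 19 = 3.42
```

Adding εN to the numerator keeps rules whose stored support may have been reduced by pruning, so the test errs on the side of returning a rule rather than missing it.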

A lot of minimal occurrences are kept in LossyDL, but only the last two are used while updating
◦ Keeping supports instead of the minimal occurrence lists
◦ How to generate rules without the minimal occurrence lists?
◦ Answer: using the following observations to prune the insignificant rules

Observations
◦ X →L (AB) and X →L A: obviously Supp(X →L A) ≥ Supp(X →L (AB)), so X →L (AB) is not significant if X →L A fails either minsup or minconf
◦ (AB) →L (CD) and A →L C: obviously Supp(A →L C) ≥ Supp((AB) →L (CD)), so (AB) →L (CD) is not significant if Supp(A →L C) < Supp((AB)) × minconf

TLT (cont.)

Observations (cont.)
◦ Given a SER (A)(B) →5 (CD) and T = 3
◦ A →1 B or A →2 B, that is A →p B, 0 < p < T (T - 1 types)
◦ A →1 B →4 (CD), A →2 B →3 (CD), that is A →p B →(L - p) (CD)
◦ Supp(A →p B →(L - p) (CD)) ≤ min(Supp(A →p B), Supp(B →(L - p) C))
◦ (A)(B) →5 (CD) is not significant if Σp min(Supp(A →p B), Supp(B →(L - p) C)) < Supp((A)(B)) × minconf

Using the observations to prune insignificant rules

Time lag table (TLT)
◦ A →L B is a reduced SER if A and B are single items
◦ For finding S1 →Lmax S2, the reduced SERs having a time lag of at most Lmax + T - 1 (from the first itemset of the precursor to the last itemset of the successor) are needed
◦ Using Lmax + T - 1 Time Lag Tables to keep the supports of reduced SERs
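The last pruning observation amounts to an upper bound built from supports of reduced rules. The sketch below illustrates it for a precursor (A)(B) and a successor starting with item C. It is not the authors' code: the function names and the dictionaries mapping a lag to a support are hypothetical, and the counts in the usage lines are made-up numbers.

```python
def support_upper_bound(supp_a_to_b, supp_b_to_c, lag, time_bound):
    """Sum over 0 < p < T of min(Supp(A ->p B), Supp(B ->(lag - p) C)).

    supp_a_to_b[p] and supp_b_to_c[q] are supports of reduced rules at lags p and q.
    """
    return sum(
        min(supp_a_to_b.get(p, 0), supp_b_to_c.get(lag - p, 0))
        for p in range(1, time_bound)
    )


def can_prune(supp_a_to_b, supp_b_to_c, lag, time_bound, supp_precursor, minconf):
    """The rule cannot reach minconf if even the upper bound falls short."""
    bound = support_upper_bound(supp_a_to_b, supp_b_to_c, lag, time_bound)
    return bound < supp_precursor * minconf


# Hypothetical counts for (A)(B) ->5 (CD) with T = 3
supp_a_to_b = {1: 3, 2: 2}   # Supp(A ->1 B), Supp(A ->2 B)
supp_b_to_c = {4: 1, 3: 5}   # Supp(B ->4 C), Supp(B ->3 C)
print(support_upper_bound(supp_a_to_b, supp_b_to_c, 5, 3))  # min(3, 1) + min(2, 5) = 3
print(can_prune(supp_a_to_b, supp_b_to_c, 5, 3, supp_precursor=5, minconf=0.8))  # True
```

With a precursor support of 5 and minconf = 0.8, the rule would need a support of at least 4, so a bound of 3 is enough to discard it without any minimal-occurrence list.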

TLT (cont.)

The support and the last two minimal occurrences of a serial episode are kept in the prefix tree
◦ Keeping supports instead of keeping minimal occurrence lists
◦ Keeping the last two minimal occurrences for updating the supports
◦ Whenever N ≡ 0 (mod 1/ε), all supports are decreased by 1

In addition, the last Lmax + T - 1 n-tuple events are kept for updating the Time Lag Tables
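Maintaining the time lag tables from a sliding window of recent events might look like the sketch below. It rests on several assumptions that the slides do not spell out: the class name `TimeLagTables`, the nested-dictionary layout, lags starting at 1, and the omission of the periodic decrement are all illustrative choices.

```python
from collections import defaultdict, deque


class TimeLagTables:
    """One table per lag 1 .. Lmax + T - 1, counting item pairs (x, y) with y arriving `lag` after x."""

    def __init__(self, l_max, time_bound):
        self.max_lag = l_max + time_bound - 1
        self.recent = deque(maxlen=self.max_lag)  # the most recent earlier events
        self.tables = defaultdict(lambda: defaultdict(int))

    def update(self, event):
        """Process a newly arrived n-tuple event (an iterable of items)."""
        for lag, earlier in enumerate(reversed(self.recent), start=1):
            for x in earlier:
                for y in event:
                    self.tables[lag][(x, y)] += 1
        self.recent.append(frozenset(event))

    def support(self, x, y, lag):
        return self.tables[lag][(x, y)]


tlt = TimeLagTables(l_max=2, time_bound=2)
for event in ({"a", "A"}, {"b", "B"}, {"a", "A"}):
    tlt.update(event)
print(tlt.support("a", "b", 1))  # 1
print(tlt.support("a", "a", 2))  # 1
```

Because the window never holds more than Lmax + T - 1 events, the work per arriving event is bounded regardless of how long the streams run.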

TLT (Rule Generation)

Mining SERs
◦ Any two serial episodes with supports ≥ (minsup - ε) × N form a candidate SER
◦ A candidate SER is returned if it passes the pruning rules from the above observations

Experiments

Two real datasets
◦ PDOMEI: the dataset contains the dryness and climate indices derived by experts, usually used to predict droughts. Four streams with 28 distinct items
◦ Traffic: the dataset is the Twin Cities' traffic data near 50th St. during the first week of Feb. 2006. Three streams with 55 distinct items

Parameter setting
◦ ε = 0.1 × minsup
◦ Lmax = 10

Conclusions

We address the problem of finding significant serial episode rules with time lags over multiple data streams and propose two methods to solve it. TLT is more space-efficient, but LossyDL has higher precision

In the near future, we will combine these two methods into a hybrid method to investigate the balance between memory space and precision

Moreover, we will try to extend the problem of finding serial episode rules to that of finding general episode rules
