The Landmark Model: An Instance Selection Method for Time Series Data

C.-S. Perng, S. R. Zhang, and D. S. Parker

Instance Selection and Construction for Data Mining, Chapter 7, pp. 113-130

Cho, Dong-Yeon




Introduction

Complexity
- Patterns: continuous time series segments with particular features
- The reflection of events in time series is better represented by patterns.
- The complexity of processing patterns
  - The number of all possible segments for a time series of length N is N(N+1)/2.
  - Simply inspecting every one of these segments takes O(N^3) time.
  - Good instance selection algorithms are especially helpful here, since they can greatly reduce complexity by reducing the volume of data.
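As a quick check of these counts, the sketch below enumerates every contiguous segment of a series of length N and adds up how many points a naive scan of each segment would touch. The function name `segment_stats` is hypothetical and does not come from the chapter.

```python
def segment_stats(n):
    """Count the contiguous segments of a length-n series and the total
    number of points a naive scan of every segment would touch."""
    count = 0
    points_touched = 0
    for start in range(n):
        for end in range(start, n):      # segment covers indices start..end
            count += 1
            points_touched += end - start + 1
    return count, points_touched


for n in (10, 100):
    count, touched = segment_stats(n)
    # count equals N(N+1)/2; touched equals N(N+1)(N+2)/6, which grows as N^3
    print(n, count, touched)
```

The cubic growth of the second number is what makes reducing the data volume before pattern processing attractive.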


Similarity Model
- Euclidean distance does not match human intuition.
  - Example: 1,2,3,4,3 and 3,4,5,6,5

Previous works
- None of these proposed techniques supports a similarity model that can both capture the similarity and support efficient pattern querying of time series.
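The two example sequences on this slide have the same shape and differ only by a constant offset of 2, yet their Euclidean distance is clearly nonzero. The sketch below makes that concrete. Subtracting the mean is used here only as one simple way to cancel a constant offset. It is an illustrative assumption, not the chapter's similarity model, and the helper names are hypothetical.

```python
import math

a = [1, 2, 3, 4, 3]
b = [3, 4, 5, 6, 5]


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


def remove_mean(p):
    """Subtract the mean so that a constant vertical offset cancels out."""
    m = sum(p) / len(p)
    return [u - m for u in p]


raw = euclidean(a, b)                                  # every point differs by 2
aligned = euclidean(remove_mean(a), remove_mean(b))    # shapes coincide
print(raw, aligned)
```

The raw distance is the square root of 20, while the offset-corrected distance is zero up to floating-point rounding.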


Pattern Representation
- Two formats for temporal association rules to verify the cause-effect relation
  - Forward association: C1,…,Cn → E1,…,Em
  - Backward association: C1,…,Cn ← E1,…,Em
- Association rules can either be formulated as hypotheses and verified with data, or be discovered by a data mining process.
- It is still not clear what kinds of segments can represent events.
- What is the basic vocabulary for spelling out association rules?


Noise Removal and Data Smoothing
- Commonly-used smoothing techniques, such as moving averages, often lag or miss the most significant peaks and bottoms.
- These peaks and bottoms can be very meaningful, and smoothing or removing them can lose a great deal of information.
- Little previous work takes smoothing as an integral part of the process of pattern definition, index construction, and query processing.
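A small experiment illustrates the point about moving averages. The sketch below applies a trailing 3-point moving average to a flat series with one sharp peak. The series, the window size, and the function name are assumptions chosen for illustration.

```python
def moving_average(y, window):
    """Trailing moving average; early points average whatever is available."""
    out = []
    for i in range(len(y)):
        lo = max(0, i - window + 1)
        chunk = y[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


series = [10, 10, 10, 10, 20, 10, 10, 10, 10]   # single sharp peak at index 4
smoothed = moving_average(series, window=3)

# The peak of height 20 is flattened to 40/3 and smeared over indices 4..6,
# so its centre sits one step later than the true peak.
print(max(series), max(smoothed))
print(smoothed[4:7])
```

The true extreme value no longer appears anywhere in the smoothed series, which is the information loss the slide warns about.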


The Landmark Data Model and Similarity Model

The Landmark Concept
- Episodic memory: humans and animals depend on landmarks in organizing their spatial memory.
- Landmarks: (times, events)
  - Using landmarks instead of the raw data for processing
  - A point is an N-th order landmark of a curve if the N-th order derivative is 0 there.
  - Local maxima, local minima, and inflection points
- Tradeoff
  - The more different types of landmarks in use, the more accurately a time series will be represented.
  - Using fewer landmarks will result in storage savings and smaller index trees.
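For sampled data the derivative is not available directly, so a discrete stand-in is needed. The sketch below treats a sample as a first-order landmark (local maximum or minimum) when the slope changes sign around it. This discretization, the handling of plateaus (they are ignored), and the function name are assumptions, not the chapter's definition.

```python
def first_order_landmarks(y):
    """Return (index, value) pairs where the discrete slope changes sign,
    i.e. strict local maxima and minima of the sampled series."""
    landmarks = []
    for i in range(1, len(y) - 1):
        left = y[i] - y[i - 1]       # slope entering the point
        right = y[i + 1] - y[i]      # slope leaving the point
        if left * right < 0:         # sign change: peak or bottom
            landmarks.append((i, y[i]))
    return landmarks


series = [1, 2, 3, 4, 3, 2, 5]
print(first_order_landmarks(series))   # [(3, 4), (5, 2)]
```

Seven samples are reduced to two landmarks here. Endpoints are not reported because they have no slope on one side.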


Stock market data
- Almost half of the record
- The normalized error is reasonably small when the curve is reconstructed from the landmarks.
- The more volatile the time series, the less significant the higher-order landmarks.


Smoothing

Minimal Distance/Percentage Principle (MDPP)
- Given a minimal distance D and a minimal percentage P, remove landmarks $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ if

  $x_{i+1} - x_i < D$ and $\dfrac{|y_{i+1} - y_i|}{(|y_i| + |y_{i+1}|)/2} < P$
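The MDPP rule can be sketched in a few lines. The reading used here is that two adjacent landmarks are dropped together when they are closer than D in time and their relative change in value, |y_{i+1} - y_i| divided by the mean of |y_i| and |y_{i+1}|, is below P. The single left-to-right pass, the treatment of a zero denominator, and the function name `mdpp` are assumptions made for this illustration.

```python
def mdpp(landmarks, D, P):
    """One left-to-right pass of the Minimal Distance/Percentage Principle.

    landmarks: list of (x, y) pairs sorted by x.
    A pair of adjacent landmarks is removed when both conditions hold:
      x[i+1] - x[i] < D   and   |y[i+1] - y[i]| / ((|y[i]| + |y[i+1]|) / 2) < P
    """
    kept = []
    i = 0
    n = len(landmarks)
    while i < n:
        if i + 1 < n:
            (x0, y0), (x1, y1) = landmarks[i], landmarks[i + 1]
            scale = (abs(y0) + abs(y1)) / 2
            # Assumption: if both values are 0 the relative change is taken as 0.
            rel_change = 0.0 if scale == 0 else abs(y1 - y0) / scale
            if (x1 - x0) < D and rel_change < P:
                i += 2               # drop both landmarks of the pair
                continue
        kept.append(landmarks[i])
        i += 1
    return kept


points = [(0, 10.0), (5, 20.0), (6, 20.5), (12, 8.0)]
print(mdpp(points, D=3, P=0.05))   # [(0, 10.0), (12, 8.0)]
```

In the usage example the middle pair is one time step apart and differs by about 2.5 percent, so it is treated as a minor fluctuation and removed.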


The effect of the MDPP


Normalized error generated by the MDPP and DFT


Transformations

Six kinds of transformations
- Shifting: $SH_k(f)$ such that $SH_k(f(t)) = f(t) + k$, where k is a constant.
- Uniform Amplitude Scaling: $UAS_k(f)$ such that $UAS_k(f(t)) = k f(t)$, where k is a constant.
- Uniform Time Scaling: $UTS_k(f)$ such that $UTS_k(f(t)) = f(kt)$, where k is a positive constant.
- Uniform Bi-scaling: $UBS_k(f)$ such that $UBS_k(f(t)) = k f(t/k)$, where k is a positive constant.
- Time Warping: $TW_g(f)$ such that $TW_g(f(t)) = f(g(t))$, where g is positive and monotonically increasing.
- Non-uniform Amplitude Scaling: $NAS_g(f)$ such that $NAS_g(f(t)) = g(t)$, where for every t, $g'(t) = 0$ if and only if $f'(t) = 0$.
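The five transformations that have closed-form definitions can be written as functions that take a series f (modelled here as a Python function of t) and return the transformed series. This is only a sketch: the lowercase names, the use of closures, and the example functions `f` and `g` are assumptions. Non-uniform amplitude scaling is left out because it is defined by a condition on derivatives rather than by a formula.

```python
def sh(k):
    """Shifting: SH_k(f)(t) = f(t) + k."""
    return lambda f: (lambda t: f(t) + k)


def uas(k):
    """Uniform amplitude scaling: UAS_k(f)(t) = k * f(t)."""
    return lambda f: (lambda t: k * f(t))


def uts(k):
    """Uniform time scaling: UTS_k(f)(t) = f(k * t), with k > 0."""
    return lambda f: (lambda t: f(k * t))


def ubs(k):
    """Uniform bi-scaling: UBS_k(f)(t) = k * f(t / k), with k > 0."""
    return lambda f: (lambda t: k * f(t / k))


def tw(g):
    """Time warping: TW_g(f)(t) = f(g(t)), with g positive and increasing."""
    return lambda f: (lambda t: f(g(t)))


def f(t):
    return t * t            # example series, chosen only for illustration


def g(t):
    return 2 * t + 1        # positive and increasing for t >= 0


print(sh(2)(f)(3), uas(3)(f)(3), uts(2)(f)(3), ubs(2)(f)(3), tw(g)(f)(3))
```

Writing each transformation as a function on series makes it straightforward to chain several of them.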


The more transformations included in a similarity model, the more powerful the similarity model.


These transformations can be composed to form new transformations.
- The composition order is flexible: $F_u \circ G_v = G_v \circ F_u$
- The composition is idempotent: $F_u \circ F_v = F_w$

Two time series are defined to be similar if they differ only by a transform.
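A few concrete cases of composition can be checked numerically. The sketch below verifies three facts that follow directly from the definitions of shifting, uniform amplitude scaling, and uniform time scaling: shifting and time scaling can be applied in either order, two shifts collapse into one shift, and swapping a shift with an amplitude scaling works once the shift parameter is rescaled. These are particular instances chosen for illustration, not the chapter's general statement, and all names are hypothetical.

```python
def sh(k):
    """Shifting: f(t) + k."""
    return lambda f: (lambda t: f(t) + k)


def uas(k):
    """Uniform amplitude scaling: k * f(t)."""
    return lambda f: (lambda t: k * f(t))


def uts(k):
    """Uniform time scaling: f(k * t)."""
    return lambda f: (lambda t: f(k * t))


def compose(outer, inner):
    """Apply `inner` to the series first, then `outer`."""
    return lambda f: outer(inner(f))


def f(t):
    return t * t + 1        # example series, chosen only for illustration


ts = [0, 1, 2, 3]

# Shifting and time scaling give the same result in either order.
a = compose(sh(2), uts(3))(f)
b = compose(uts(3), sh(2))(f)
print([a(t) == b(t) for t in ts])

# Two shifts compose into a single shift (parameter 2 + 5 = 7).
c = compose(sh(2), sh(5))(f)
d = sh(7)(f)
print([c(t) == d(t) for t in ts])

# Shift after amplitude scaling equals scaling after a rescaled shift:
# 4 * f(t) + 2 == 4 * (f(t) + 2 / 4).
e = compose(sh(2), uas(4))(f)
g = compose(uas(4), sh(2 / 4))(f)
print([e(t) == g(t) for t in ts])
```

The third case shows that reordering may change the parameters even when the overall family of transformations stays the same.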


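Landmark Similarity

Dissimilarity measure
- Given two sequences of landmarks $L = \langle L_1, \ldots, L_n \rangle$ and $L' = \langle L'_1, \ldots, L'_n \rangle$, where $L_i = (x_i, y_i)$ and $L'_i = (x'_i, y'_i)$, the distance between the k-th landmarks is defined by

  $\Delta_k(L, L') = (\delta_k^{time}(L, L'), \delta_k^{amp}(L, L'))$

  where

  $\delta_k^{time}(L, L') = \begin{cases} \dfrac{|(x_k - x_{k-1}) - (x'_k - x'_{k-1})|}{(|x_k - x_{k-1}| + |x'_k - x'_{k-1}|)/2} & \text{if } 1 < k \le n \\ 0 & \text{otherwise} \end{cases}$

  $\delta_k^{amp}(L, L') = \begin{cases} 0 & \text{if } y_k = y'_k \\ \dfrac{|y_k - y'_k|}{(|y_k| + |y'_k|)/2} & \text{otherwise} \end{cases}$

- The distance between the two sequences is

  $\Delta(L, L') = (\delta^{time}(L, L'), \delta^{amp}(L, L'))$

- We define $(\delta^{time}, \delta^{amp}) < (\epsilon^{time}, \epsilon^{amp})$ if $\delta^{time} < \epsilon^{time}$ and $\delta^{amp} < \epsilon^{amp}$.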
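The per-landmark distances can be sketched as follows. The time component is read as the relative difference between the two sequences' gaps to the previous landmark, |(x_k - x_{k-1}) - (x'_k - x'_{k-1})| over the mean of the two gap sizes, and is 0 for the first landmark. The amplitude component is the relative difference |y_k - y'_k| over the mean of |y_k| and |y'_k|. How the per-landmark values are combined into a sequence-level pair is not recoverable from the slide, so the `max` aggregate used below is purely an assumption, as are all function names.

```python
def delta_time(L, Lp, k):
    """Relative difference of the time gaps before landmark k (0-based).
    The first landmark has no predecessor, so its distance is 0."""
    if k == 0:
        return 0.0
    gap = L[k][0] - L[k - 1][0]
    gap_p = Lp[k][0] - Lp[k - 1][0]
    scale = (abs(gap) + abs(gap_p)) / 2
    return abs(gap - gap_p) / scale if scale else 0.0


def delta_amp(L, Lp, k):
    """Relative difference of the amplitudes at landmark k (0-based)."""
    y, yp = L[k][1], Lp[k][1]
    if y == yp:
        return 0.0
    return abs(y - yp) / ((abs(y) + abs(yp)) / 2)


def dissimilarity(L, Lp, aggregate=max):
    """Pair (time, amp) for two equal-length landmark sequences.
    ASSUMPTION: per-landmark values are combined with `aggregate` (max)."""
    n = len(L)
    return (aggregate(delta_time(L, Lp, k) for k in range(n)),
            aggregate(delta_amp(L, Lp, k) for k in range(n)))


def pair_less(d, eps):
    """Componentwise comparison: both components must be smaller."""
    return d[0] < eps[0] and d[1] < eps[1]


L = [(0, 1.0), (2, 3.0), (4, 2.0)]
Lp = [(0, 1.0), (3, 3.0), (5, 2.5)]
d = dissimilarity(L, Lp)
print(d, pair_less(d, (0.5, 0.25)))
```

Because both components are relative quantities, the measure does not depend on the units used for time or amplitude.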


A landmark similarity measure is a binary relation on time series segments defined by a 5-tuple $LSM = \langle D, P, T, \epsilon^{time}, \epsilon^{amp} \rangle$.

Given two time series sequences $s_1$ and $s_2$, let $L_1$ and $L_2$ be the landmark sequences after MDPP(D, P) smoothing.

$(s_1, s_2) \in LSM$ if and only if $|L_1| = |L_2|$ and there exist two parameterized transformations $T_1$ and $T_2$ of $T$ whose dissimilarity satisfies $\delta^{time}(T_1(L_1), T_2(L_2)) < \epsilon^{time}$ and $\delta^{amp}(T_1(L_1), T_2(L_2)) < \epsilon^{amp}$.


Data Representation

Family of Time Series Segments
- Equivalent under the six transformations
- Replacing naïve landmark coordinates with various features of landmarks that are invariant under these transformations
- F = {y, h, v, hr, vr, vhr, pv}
  - $h_i = x_i - x_{i-1}$
  - $v_i = y_i - y_{i-1}$
  - $hr_i = h_{i+1} / h_i$
  - $vr_i = v_{i+1} / v_i$
  - $vhr_i = v_i / h_i$
  - $pv_i = v_i / y_i$

Invariant features under transformations
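The feature definitions translate directly into code. The sketch below computes h, v, hr, vr, vhr and pv from a landmark list and then checks two invariances that follow from the formulas: the ratios hr, vr and pv do not change when every amplitude is multiplied by a constant, and hr does not change when every time stamp is multiplied by a constant. The function name, the dictionary layout, and the sample landmarks are assumptions. The code also assumes strictly increasing x, distinct consecutive y values, and nonzero y, so that no division by zero occurs.

```python
import math


def landmark_features(landmarks):
    """Compute the features from (x, y) landmarks, keyed by landmark index i.
    Assumes strictly increasing x, distinct consecutive y, and nonzero y."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    n = len(landmarks)
    h = {i: xs[i] - xs[i - 1] for i in range(1, n)}     # time gap
    v = {i: ys[i] - ys[i - 1] for i in range(1, n)}     # amplitude change
    return {
        "y": dict(enumerate(ys)),
        "h": h,
        "v": v,
        "hr": {i: h[i + 1] / h[i] for i in range(1, n - 1)},
        "vr": {i: v[i + 1] / v[i] for i in range(1, n - 1)},
        "vhr": {i: v[i] / h[i] for i in range(1, n)},
        "pv": {i: v[i] / ys[i] for i in range(1, n)},
    }


def same(a, b):
    """Compare two feature dictionaries up to floating-point rounding."""
    return a.keys() == b.keys() and all(math.isclose(a[i], b[i]) for i in a)


base = [(0, 2.0), (2, 4.0), (5, 1.0), (6, 3.0)]
amp_scaled = [(x, 3 * y) for x, y in base]     # uniform amplitude scaling
time_scaled = [(2 * x, y) for x, y in base]    # uniform time scaling

fb = landmark_features(base)
fa = landmark_features(amp_scaled)
ft = landmark_features(time_scaled)

print(all(same(fb[k], fa[k]) for k in ("hr", "vr", "pv")))   # amplitude scaling
print(same(fb["hr"], ft["hr"]))                              # time scaling
print(same(fb["v"], fa["v"]))                                # v is not invariant
```

Choosing which features to store therefore determines which transformations a query can ignore.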


Conclusion

Landmark Model
- An instance selection system for time series
- This integrates similarity measures, data representation and smoothing techniques in a single framework.
- Minimal Distance/Percentage Principle (MDPP): the smoothing method for the Landmark Model
- This also supports a generalized similarity model which can ignore differences corresponding to six transformations.
- Intuitive to humans