

Expert Systems with Applications 38 (2011) 14732–14743

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Similarity measure based on piecewise linear approximation and derivative dynamic time warping for time series mining

Hailin Li a, Chonghui Guo a,*, Wangren Qiu b

a Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
b Research Center of Information and Control, Dalian University of Technology, Dalian 116024, China

Article info

Keywords: Similarity measure; Dynamic time warping; Piecewise linear approximation; Time series mining

0957-4174/$ - see front matter Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.05.007

* Corresponding author. Tel.: +86 41184708007.
E-mail addresses: [email protected] (H. Li), [email protected] (C. Guo), [email protected] (W. Qiu).

Abstract

We propose a new method to calculate the similarity of time series based on piecewise linear approximation (PLA) and derivative dynamic time warping (DDTW). The proposed method includes two phases. One is a divisive approach to piecewise linear approximation based on the middle curve of the original time series. Apart from producing attractive results, it can create line segments approximating the time series faster than conventional linear approximation. Meanwhile, the high-dimensional space can be reduced to a lower one, and the line segments approximating the time series are used to calculate the similarity. In the other phase, we utilize the main idea of DDTW to provide another similarity measure based on the line segments obtained in the first phase. We empirically compare our new approach to other techniques and demonstrate its superiority.

Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Time series are a ubiquitous data form related to time and distributed across various fields, such as stock data (Yang, Wang, & Philip, 2003) and data on production, consumption and web transactions (Samia & Conrad, 2007). Although some data have nothing to do with time, they can be transformed into the form of time series and studied with time series models and algorithms; for example, data on the shapes of tree leaves can be treated as time series (Ye & Keogh, 2009). Much valuable information hides in time series, including interesting patterns (Ajumobi, Pken, & Preda, 2004; Anthony, Wu, & Lee, 2009), anomaly points (Keogh, Lin, & Fu, 2005) and motifs (Lin, Keogh, Lonardi, & Patel, 2002). In most cases, we need to measure the similarity (Chen, Hong, & Tseng, 2009) or dissimilarity (distance) between two time series in advance. However, the curse of dimensionality of time series works against accurate similarity measurement.

There are many ways to reduce the dimensionality, such as the discrete Fourier transform (DFT) (Agrawal, Faloutsos, & Swami, 1993; Agrawal, Psaila, Wimmers, & Zait, 1995), singular value decomposition (SVD) (Korn, Jagadish, & Faloutsos, 1997), the discrete wavelet transform (DWT) (Chan & Fu, 1999), piecewise linear approximation (PLA) (Manjula, Morgan, & Layne, 2008), and symbolic aggregate approximation (SAX) (Keogh & Pazzani, 1998; Keogh et al., 2005) based on the piecewise aggregate approximation (PAA) (Hung & Duong, 2008). In particular, SAX and PLA are widely applied in many fields (Keogh, Chakrabarti, Pazzani, & Mehrotra, 2001; Keogh, Chu, Hart, & Pazzani, 2001; Lin & Keogh, 2006) and obtain very good results.

After reducing the dimensionality of time series data, Euclidean distance is a simple and useful similarity measure, but it has some disadvantages. For example, abnormal data in a time series affects the whole similarity measure. Moreover, it abandons the sequence query too early, which causes false alarms when indexing. Another popular method to compare time series in diverse areas is dynamic time warping (DTW) (Keogh & Pazzani, 1999; Keogh & Ratanamahatana, 2005). It offers a more reasonable measure for describing the relations between different time series by warping the time axis. Its improved version, called derivative dynamic time warping (DDTW) (Keogh & Pazzani, 2001), can produce more intuitive warpings and better results by considering the derivative of the time series.

In this paper, we propose a novel approach to measure the similarity of time series. Firstly, a divisive approach to piecewise linear approximation (DPLA), whose time complexity is lower than that of the conventional ones, is given to approximate time series. Secondly, we propose middle curve piecewise linear approximation (MPLA) based on DPLA to approximate time series. Our approach has at least two advantages. One is that the middle curve is more suitable for describing the local and whole trends: because some conventional methods based on the original time series find it difficult to express the trend and easily fall into local optima, it is reasonable to use a middle curve to represent the original time series in terms of its trends. The other is the lower time consumption of the approximation. Therefore, we adopt a middle curve based on DPLA to obtain a finite set of line segments for a time series. These two processes constitute the new piecewise linear approximation, analogous to PAA. Finally, due to the particularities of the line segments derived from MPLA, we provide another reasonable similarity measure based on DDTW.

The rest of the paper is organized as follows. Section 2 discusses some methods of piecewise linear approximation and SAX based on PAA; meanwhile, we analyze a few problems that need to be solved and introduce derivative dynamic time warping. Section 3 presents the MPLA algorithm based on the DPLA method and the similarity measure based on DDTW in detail. In Section 5, we demonstrate the superiority of our approach with some experiments. The last section concludes the whole work and offers some views on future research.

Fig. 1. New values in lower space by PAA.

2. Related work

2.1. Piecewise linear approximation

There are many kinds of piecewise linear approximation (Keogh & Chu et al., 2001) to reduce the dimensionality of time series data, which can be grouped into three classes.

Window-Sliding: points join the window one by one; once the next point makes the cost of the line segment approximating the subsequence exceed a threshold value, the line segment for the subsequence is created.

Bottom-Up: merge the adjacent segments until every possible merging cost is larger than a threshold value.

Top-Down: split the time series from the top down recursively until some threshold condition is met.

The main idea of the window-sliding algorithm is to slide the window up to the first point that does not fall into the window; at the same time, a line segment approximating the subsequence within the window is formed. The approximation depends on a threshold value, which causes pathologically poor results under some circumstances. Shatkay noticed the problem and gave some explanation (Shatkay & Zdonik, 1996); later, a modified version (Park, Lee, & Chu, 1999) was given to improve the algorithm. The two most important properties of this linear approximation are its linear time complexity and its online computation.

The bottom-up method is different from the sliding window. Firstly, it starts with m/2 segments approximating a time series of length m. Secondly, after merging the adjacent segments with the minimum merging cost, it must delete one of the two merged segments and recalculate the merging costs between the new segment and its neighbours. These steps are repeated until every merging cost is larger than the threshold value. Like the sliding window, the bottom-up approach is widely used for time series mining (Hunter & McIntosh, 1999) and its time complexity is linear in the length of the time series, i.e. O(Km), where K is the number of line segments. However, it is not an online algorithm.
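As a concrete illustration of the bottom-up idea, here is a minimal Python sketch. The quadratic interpolation-error cost `seg_cost` is an assumption for illustration (the text does not fix a particular merging cost), and the sketch recomputes all merge costs on every pass for clarity, whereas the incremental recalculation described above is what makes the method linear.

```python
# Bottom-up piecewise linear approximation: start with ~m/2 fine segments,
# then repeatedly merge the adjacent pair with the smallest merging cost
# until every possible merge would exceed the threshold.

def seg_cost(q, i, j):
    """Sum of squared vertical errors when Q(i:j) is replaced by the
    straight line through its two endpoints (an assumed, simple cost)."""
    if j - i < 2:
        return 0.0
    a = (q[j] - q[i]) / (j - i)
    return sum((q[t] - (q[i] + a * (t - i))) ** 2 for t in range(i + 1, j))

def bottom_up(q, threshold):
    # Initial segments join every other point: [0,2], [2,4], ...
    bounds = list(range(0, len(q) - 1, 2)) + [len(q) - 1]
    while len(bounds) > 2:
        # Cost of merging each adjacent pair of segments.
        costs = [seg_cost(q, bounds[k], bounds[k + 2])
                 for k in range(len(bounds) - 2)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > threshold:
            break                    # every remaining merge is too expensive
        del bounds[best + 1]         # merge by dropping the shared endpoint
    return bounds                    # endpoints of the final line segments

print(bottom_up([0, 1, 2, 3, 4, 3, 2, 1, 0], threshold=0.1))  # → [0, 4, 8]
```

On this symmetric ramp the two perfectly linear halves are merged for free, and only the peak at index 4 survives as an interior endpoint.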

In contrast to the bottom-up algorithm, the top-down algorithm approximates the time series with different line segments by searching for the best split location each time. At each recursive step the algorithm searches for the minimum divisive error. If the minimum error, attained at the ith point, is smaller than a threshold value e, the subsequence of time series Q is partitioned into two parts at the ith point. The algorithm then considers the two subsequences and repeats the above steps until all the minimum divisive errors are larger than the threshold value.

The top-down algorithm has also been widely used in diverse areas, such as frameworks for sequence mining, text mining and time series mining. Park et al. introduced a modified version in which they first mark every peak and valley after scanning the entire dataset (Park et al., 1999). However, the time complexity of the top-down algorithm is larger than that of the two previous methods: it is O(Km²). The reason is that each recursion must calculate the divisive errors of the two line segments produced when one point breaks a sequence into two subsequences. That is, each recursion must find the best breakpoint partitioning the present sequence so that the approximation error of the two subsequences on either side of the breakpoint is minimal.

The PLA algorithms represent time series with a set of line segments. Most of the algorithms have a low computational complexity which is linear in the length of the time series, but some have higher complexity (Bauman, Dorofeyuk, & Kornilov, 2004; Zhang & Wan, 2008) because they pursue optimal results. In Section 3, we will propose another algorithm based on the top-down algorithm, whose time complexity is linear in the length of the time series. We call it divisive piecewise linear approximation (DPLA).

2.2. Symbolic aggregate approximation

SAX transforms an original time series into discrete strings. The whole procedure includes two phases. The first, piecewise aggregate approximation (PAA), is a dimensionality-reduction process for time series. The second is a transformation from mean values to discrete strings, which is the symbolic procedure. Since SAX has two important advantages, dimensionality reduction and lower bounding (Lin, Keogh, & Lonardi, 2003), it is often used to mine time series.

A time series of length m can be represented as a vector Q = {q1, q2, …, qm}, which can be transformed into a w-dimensional space as a new vector Q′ = {q′1, q′2, …, q′w} according to

q′i = (1/k) · Σ_{j=k·(i−1)+1}^{k·i} qj,  i = 1, 2, …, w,  (1)

where k = m/w. From formula (1), we know that the time series data is divided into w equal-sized "frames" of length k, and each element of the new vector is the mean of the points falling within its frame. The time complexity of PAA for the similarity measure is linear in the length of the time series, i.e. O(m). The representative values of the original time series in the new space are shown in Fig. 1. SAX is a method that changes the mean values into discrete string representations (Lin et al., 2003). Fig. 2 shows the result of SAX after the PAA procedure.
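The two phases can be sketched in Python: formula (1) gives the PAA step, and the letter assignment below uses the standard equiprobable Gaussian breakpoints for a four-letter alphabet, which are an assumption here, since the text only names the letters A–D.

```python
def paa(q, w):
    """Formula (1): mean of each of w equal-sized frames, k = m / w."""
    m = len(q)
    k = m // w                      # assumes m is divisible by w
    return [sum(q[k * (i - 1): k * i]) / k for i in range(1, w + 1)]

# SAX-style symbolization: map each frame mean to a letter using breakpoints
# that cut the standard normal into equal-probability regions (values for a
# 4-letter alphabet; an assumption, the text does not list them).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]

def sax(q, w):
    symbols = []
    for v in paa(q, w):
        idx = sum(v > b for b in BREAKPOINTS)   # region index 0..3
        symbols.append("ABCD"[idx])
    return "".join(symbols)

series = [-2, -2, 0, 0, 2, 2, 0, 0]
print(paa(series, 4))   # → [-2.0, 0.0, 2.0, 0.0]
print(sax(series, 4))   # → ABDB
```

Note how the symbolic output records only which region each frame mean falls in, which is exactly the loss of local-trend information discussed next.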

It is easy to find that some special values in the original space are generalized in a way that cannot express the properties of the time series values well, such as the local trend, the whole trend and the individual point distribution. Several different cases hide the valuable information in the new space, as shown in Fig. 2. It means that in some cases the points within the frames have equal mean values under SAX, but unfortunately their corresponding Euclidean distance is still very large, which results in mismatches of time series.

Fig. 2. Symbolic representation of time series by SAX; the symbols only reflect the trend of some points.

In Fig. 2, the SAX algorithm cannot represent the trend of the original time series well. It only generalizes those particular segments of time series with four discrete letters (A, B, C, D) without any reflection of the local trend of the time series. The trend of a time series is very important: it not only describes the features of the subsequences but also embodies the whole trend of the time series. Therefore, we should retain these significant properties of time series.

Fig. 3. The warping path of dynamic time warping.

Fig. 4. The difference of the DTW and DDTW: (a) DTW; (b) DDTW.

2.3. Derivative dynamic time warping

Dynamic time warping (DTW), used for similarity measure, not only reflects the naturally common features of two different time series but can also compare time series of different lengths. Many papers regard it as a method to compute the true distance between two different objects. Meanwhile, its improved version, derivative dynamic time warping (DDTW), can warp the time axis more suitably and obtains better results than DTW.

Now suppose we have two time series, Q = {q1, q2, …, qn} and C = {c1, c2, …, cm}. There is an n-by-m matrix D whose element is d(i, j) = (qi − cj)².

The warping path W is a contiguous set of elements of D which denotes the mapping between Q and C. The lth element of W is defined as wl = d(i, j)l, so we have W = {w1, w2, …, wl, …, wk}, where k ∈ [max(m, n), m + n − 1); in particular, when m = n, k ∈ [m, 2m − 1). This is also illustrated in Fig. 3.

The warping path is typically subject to several constraints, shown as follows.

Boundary conditions: w1 = d(1, 1) and wk = d(m, n), meaning that the path must begin at element (1, 1) and end at element (m, n) of matrix D.

Continuity and monotonicity: given wl = (a, b), then wl−1 = (a′, b′), where 0 ≤ a − a′ ≤ 1 and 0 ≤ b − b′ ≤ 1. This not only restricts the allowable steps in the warping path to adjacent cells, but also forces the points in the warping path to be monotonically ordered in time.

We know that there are many warping paths satisfying these constraints, but only the best one is required. Therefore, we should choose it carefully according to the minimum warping cost

DTW(Q, C) = min { (1/k) · √( Σ_{l=1}^{k} wl ) }.  (2)

Generally, the optimal path can be found using a dynamic programming method:

r(i, j) = d(i, j) + min{ r(i, j − 1), r(i − 1, j − 1), r(i − 1, j) }.  (3)

The value of the cumulative distance r(i, j) is the distance d(i, j) found in the current cell plus the minimum of the cumulative distances of the three adjacent elements. We need to examine only these three, rather than all eight, adjacent elements because of the continuity and monotonicity of the warping path.
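The recurrence (3) can be evaluated with a straightforward dynamic program. This sketch returns the cumulative cost r(n, m) and omits the final normalization of formula (2) (the square root divided by path length), which would require tracking the path length k.

```python
# Dynamic-programming evaluation of recurrence (3): the cumulative distance
# r(i, j) is d(i, j) plus the minimum over the three admissible predecessor
# cells, which enforces the boundary, continuity and monotonicity constraints.

def dtw(q, c):
    n, m = len(q), len(c)
    INF = float("inf")
    r = [[INF] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0                               # boundary condition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2      # d(i, j)
            r[i][j] = d + min(r[i][j - 1],      # step right
                              r[i - 1][j - 1],  # diagonal step
                              r[i - 1][j])      # step up
    return r[n][m]   # cumulative cost; (2) then normalizes by path length

print(dtw([1, 2, 3], [1, 2, 2, 3]))   # → 0.0: warping absorbs the repeat
print(dtw([1, 2, 3], [1, 3, 3]))      # → 1.0
```

The first pair illustrates why DTW handles sequences of different lengths: the repeated value is matched twice at zero cost, whereas a point-by-point Euclidean comparison would not even be defined.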


Since DTW tries to explain variability of the time series values by warping the time axis, it causes unintuitive alignments where a single point on one time series maps to a large subsection of another; Keogh and Pazzani (2001) call this "singularity". Moreover, if qi and cj are, respectively, points on a rising trend of one time series and on a falling trend of the other, we should not map one to the other directly, because this feature is obvious. To deal with this problem, DDTW modifies DTW by considering higher-level features of shape; for example, the slope at a point is an important feature.

In DTW, the distance between qi and cj is d(i, j) = (qi − cj)². In DDTW, however, d(i, j) is replaced by the squared difference of the estimated derivatives of qi and cj. It uses

Di(q) = [ (qi − qi−1) + (qi+1 − qi−1)/2 ] / 2  (4)

to estimate the derivative at qi. Likewise, the derivatives of the points of C can be obtained by formula (4) and are denoted Dj(c). A new distance is then given by

D′(i, j) = (Di(q) − Dj(c))².  (5)

Now the original D is replaced by the new D′, and the remaining steps of DDTW are the same as those of DTW. Fig. 4 shows the difference between DTW and DDTW, illustrating that DDTW can reflect the trend of a sequence while DTW cannot.
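Formulas (4) and (5) can be sketched directly. Note that (4) is only defined for interior points; how the two boundary points are handled is not specified in the text.

```python
# Derivative estimate of formula (4): the average of the left slope and the
# slope through the two neighbouring points; defined for interior points only.

def deriv(q, i):
    return ((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2) / 2

def ddtw_dist(q, i, c, j):
    """Formula (5): squared difference of the estimated derivatives."""
    return (deriv(q, i) - deriv(c, j)) ** 2

q = [0, 1, 2, 3]               # steady ascent: slope +1 everywhere
c = [3, 2, 1, 0]               # steady descent: slope -1 everywhere
print(deriv(q, 1))             # → 1.0
print(ddtw_dist(q, 1, c, 2))   # → 4.0: opposite trends are far apart,
                               #   even though the raw values coincide
```

Here q[1] and c[2] both equal 1, so plain DTW assigns them a cell cost of 0 and may align them; the derivative-based distance of 4.0 keeps the rising point away from the falling one, which is exactly the behaviour shown in Fig. 4.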

3. New piecewise linear approximation and similarity measure

The representations of an original time series in the new space should not only reduce the dimensionality but also reflect the trends of the subsequences and of the whole time series. That means the representations must indicate the trends clearly, so as to distinguish ascending segments from descending segments. Since the slope of a line segment is an important feature of every sequence, the approximation had better reflect the changing trend of the time series. Traditionally, some kinds of piecewise linear approximation obtain line segments without considering the slope, like the PAA algorithm. As shown in Fig. 5, the marks 1 and 3 have the same mean, which cannot reflect the different local trends well. Some papers use the mean of the points in a frame to reduce the dimensionality and simultaneously let the slope express the trend of the sequence. For instance, LPAA, proposed by Hung and Duong (2008), is a method based on the original PAA and the slope to improve the similarity measure. It can alleviate, but not eliminate, the disadvantages for time series with strong turns at some points. In other words, LPAA loses the ability to express the trend after approximating a limited subsequence by PAA.

Fig. 5. The curve has a strong turning. It has three different trends (descending, flat and ascending), but PAA cannot express the sharp turning and trend well.

Fortunately, our proposed method can approximate the trend with the slope and reduce the dimensionality better.

3.1. Divisive piecewise linear approximation

Divisive piecewise linear approximation (DPLA) is a kind of top-down algorithm. Our motivation is to reduce the dimensionality and approximate the trends, including local trend changes of subsequences and whole trend changes of the time series. If one line segment L(i : j) approximates the subsequence Q(i : j) of time series Q, the approximation cost of the subsequence should be smaller than a threshold value e. In this paper, we do not consider all the points together to find the minimum approximation cost. Instead, we only find the point in the subsequence Q(i : j) farthest from the corresponding line segment L(i : j) and judge whether its distance is larger than the threshold value e or not. If it is, the farthest point is regarded as a breakpoint; otherwise, the line segment is taken as the best approximation of the subsequence.

For a subsequence Q(i : j) of a time series, there is a line segment L(i : j) that approximates it. The distance of the point qt (i ≤ t ≤ j) to the line segment L(i : j) is denoted as D(qt, L(i : j)). Because the two endpoints of the line segment L(i : j) are qi and qj, we have

L(i : j) = (qi·j − qj·i)/(j − i) + ((qj − qi)/(j − i))·t,  i ≤ t ≤ j.  (6)

Suppose the equation of the straight line segment is q = b + a·t, namely a·t − q + b = 0; then we have

a = (qj − qi)/(j − i),
b = (qi·j − qj·i)/(j − i).  (7)

So the distance of the point qt to L(i : j) is

D(qt, L(i : j)) = |a·t − qt + b| / √(a² + 1).  (8)

For a time series Q of length m, or a subsequence Q(i : j), with discrete real values, if we directly link every two adjacent points with m − 1 or j − i line segments, the distance of every point to its corresponding line segment is equal to 0. This means the line segment approximation of the time series is exact and the approximation cost is minimal (equal to 0). Although this is the best possible approximation, it achieves no dimensionality reduction and is meaningless. Therefore, we allow the distance between a point and the corresponding line segment to approach some value rather than the minimum; in other words, a subsequence is divided further only when the distance exceeds a defined threshold value e.

For subsequence Q(i : j), if the distance of the point ql (i ≤ l ≤ j) to the line segment L(i : j) is maximal and D(ql, L(i : j)) > e, the point ql is a breakpoint and the subsequence Q(i : j) is divided into two parts, Q(i : l) and Q(l : j); otherwise, the line segment approximates the subsequence Q(i : j) well, and it is not necessary to divide Q(i : j) any further.

The algorithm of divisive piecewise linear approximation is as follows.

Step 1: Input the time series Q(i : j) and the threshold value e. A vector Bp is used to store the breakpoints, k records the number of current breakpoints, and pos denotes the position of the newest breakpoint. Initially, i = 1 and j = m, where m is the length of the time series. Since the first and last points are special breakpoints, let k = 2, Bp(1) = q1 and Bp(2) = qm.

Step 2: For the time series Q(i : j), create the line segment L(i : j) according to formula (6). Set two variables l = i + 1 and best_so_far = 0.

Step 3: Calculate the distance of the point ql to the line segment L(i : j), that is, D(ql, L(i : j)).

Step 4: If D(ql, L(i : j)) > best_so_far, set best_so_far = D(ql, L(i : j)) and pos = l.

Step 5: l = l + 1. If l ≥ j, go to Step 6; otherwise, go back to Step 3.

Step 6: If best_so_far ≥ e, set k = k + 1 and Bp(k) = qpos, go back to Step 2, and let the two subsequences Q(i : pos) and Q(pos : j) redo Steps 2 to 6, respectively.

Step 7: Sort the elements of the vector Bp in ascending time order and output the sorted result.
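The seven steps above can be sketched as a recursive Python function; indices are 0-based here, and `point_line_dist` implements formulas (7) and (8).

```python
# A sketch of DPLA Steps 1-7: recursively split at the point farthest from
# the chord L(i:j) whenever that distance exceeds the threshold e, and
# collect the breakpoints.

def point_line_dist(q, i, j, t):
    """Perpendicular distance of q[t] to the chord through (i, q[i]), (j, q[j])."""
    a = (q[j] - q[i]) / (j - i)                # slope, formula (7)
    b = (q[i] * j - q[j] * i) / (j - i)        # intercept, formula (7)
    return abs(a * t - q[t] + b) / (a * a + 1) ** 0.5   # formula (8)

def dpla(q, e):
    bp = {0, len(q) - 1}                       # Step 1: endpoints are breakpoints

    def split(i, j):                           # Steps 2-6 applied to Q(i:j)
        if j - i < 2:
            return
        best_so_far, pos = 0.0, i
        for t in range(i + 1, j):              # Steps 3-5: farthest point
            d = point_line_dist(q, i, j, t)
            if d > best_so_far:
                best_so_far, pos = d, t
        if best_so_far >= e:                   # Step 6: record and recurse
            bp.add(pos)
            split(i, pos)
            split(pos, j)

    split(0, len(q) - 1)
    return sorted(bp)                          # Step 7

series = [0, 1, 2, 3, 4, 3, 2, 1, 0]
print(dpla(series, e=0.5))   # → [0, 4, 8]: one breakpoint at the peak
```

Linking adjacent breakpoints then yields the line segments; on the example series the two linear flanks are each covered by a single segment.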

Every breakpoint is one of the endpoints of the line segments approximating the time series; we only need to link adjacent breakpoints to create the line segments. As the above procedure shows, our approach depends on the threshold value e, as does the conventional top-down method. As shown in Fig. 6, different groups of line segments are produced for different values of e.

It is neither reasonable nor feasible to use a preset threshold value to control the approximation. Every time series in a large database has unique features; in other words, we should consider the special features of different time series. Therefore, it is important to automatically find a reasonable threshold value e that adaptively controls the number and the trends of the line segments approximating the time series. Usually, the standard deviation is used to describe the variability of a data set about its mean. We choose the standard deviation of the distances of the points in the time series to the first line segment L(1 : m) as the upper bound of the permissible divisive condition. That is,

e = STD(D(qt, L(1 : m))) = √( (1/m) · Σ_{t=1}^{m} ( D(qt, L(1 : m)) − (1/m) · Σ_{t=1}^{m} D(qt, L(1 : m)) )² ).  (9)

Fig. 6. The number and the trends of the line segments vary according to the different threshold values.
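Formula (9) can be computed directly from the distances of all points to the first chord L(1 : m); a minimal sketch:

```python
# Formula (9): the threshold e is the (population) standard deviation of the
# distances of all points to the first line segment L(1:m), so the split
# criterion adapts to each series instead of using a preset constant.

def adaptive_threshold(q):
    m = len(q)
    a = (q[-1] - q[0]) / (m - 1)               # chord through the endpoints
    # vertical form of the point-to-chord distance, intercept q[0] at t = 0
    dists = [abs(a * t - q[t] + q[0]) / (a * a + 1) ** 0.5 for t in range(m)]
    mean = sum(dists) / m
    return (sum((d - mean) ** 2 for d in dists) / m) ** 0.5

print(adaptive_threshold([0, 1, 2, 3, 4, 3, 2, 1, 0]))
```

For the flat-ended example the chord is horizontal, so the distances are simply the point values, and e comes out as their population standard deviation (about 1.31).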

We must choose the threshold e carefully, because it impacts the final results. At the beginning, DPLA creates the first line segment to approximate all points of the time series; the first and last points of the time series are its two endpoints. In some cases, if one or both of these endpoints deviate abnormally from most of the points, we cannot use this line segment to calculate e. Instead, we should search for the next suitable line segment that does not deviate from most of the points, and treat it as the first line segment L(1 : m) for calculating the threshold value e, as shown in Fig. 7.

Although DPLA is similar to the top-down algorithm, there are some obvious differences. One is that the time complexity of our approach is linear in the length of the time series: it is O(km), where k is the number of line segments and m is the length of the time series, whereas the time complexity of the traditional top-down method is O(km²). Another is that we use the distance measure and the standard deviation to replace the original approximation cost, which must be calculated for every point of the time series in the traditional method. In particular, we set the standard deviation as the threshold value, which makes the approximation of a time series adaptive and reflective of its own features.


Fig. 7. L1 is far away from most of the points in the time series, which makes the standard deviation large. We should look for the next line segment to replace it, and find that L2 embraces most of the points in subsequence Q(1:59). So we regard L2 as L1 and apply the formula to compute the value of e. Since the distance deviation of the points in subsequence Q(59:60) to L3 is equal to zero, L3 is not suitable to be regarded as L1; actually, it is one of the final line segments approximating the subsequence.



Actually, the distance D(qt, L(i : j)) of the point qt to the line segment L(i : j) can be replaced by the length of one right-angled side of a right-angled triangle. By the right triangle theorem, the hypotenuse is longer than either right-angled side. Therefore, to calculate the distance faster, we use

D(qt, L(i : j)) = |qt − (a·t + b)| = |qt − ( (qi·j − qj·i)/(j − i) + ((qj − qi)/(j − i))·t )|  (10)

instead of formula (8), which speeds up the whole divisive piecewise linear approximation algorithm.
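A small sketch of this speed-up: the vertical offset of formula (10) differs from the perpendicular distance of formula (8) only by the per-segment factor √(a² + 1), so within one segment the farthest point is the same under either measure and the square root can be avoided.

```python
# Formula (10) vs formula (8): the vertical offset |q_t - (a*t + b)| is the
# perpendicular distance scaled by the segment-wide constant sqrt(a^2 + 1).

def vertical_dist(q, i, j, t):
    a = (q[j] - q[i]) / (j - i)
    b = (q[i] * j - q[j] * i) / (j - i)
    return abs(q[t] - (a * t + b))             # formula (10)

def perpendicular_dist(q, i, j, t):
    a = (q[j] - q[i]) / (j - i)
    return vertical_dist(q, i, j, t) / (a * a + 1) ** 0.5   # formula (8)

q = [0, 3, 1, 2, 8]
v = vertical_dist(q, 0, 4, 1)
p = perpendicular_dist(q, 0, 4, 1)
print(v, p)   # the vertical offset is never smaller than the perpendicular one
```

Because the scaling constant depends only on the segment's slope, comparing vertical offsets within a segment ranks the candidate breakpoints identically; the threshold e is, strictly speaking, being compared against a slightly different (larger) quantity.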

3.2. Middle curve-based piecewise linear approximation

In Fig. 8, some line segments with sharp slopes cannot express the trends of the subsequences well. For example, the line segment marked "a" directly links its two endpoints in disregard of any point between them: all the points of the subsequence appear on the bottom-left side of the line segment, so this approximation is not reasonable. Likewise, for the subsequences marked "b", there are too many line segments approximating the trends. Actually, it is easy to find that these sequences are smooth except for some frequent small amplitudes, and could be approximated naturally by one or two line segments with slightly flat slopes.

Why does this happen in our algorithm? The reason is that we search for the breakpoints and link them directly, without considering the other points between two breakpoints, whenever the distance of every point within the sequence to the corresponding line segment is smaller than e.

Fig. 8. DPLA cannot express the slope well.

To overcome these disadvantages, we propose another piecewise linear approximation based on the middle curve (MPLA), which looks like the center line of an irregular pipeline. The whole algorithm has two phases. One is to find a middle curve to represent the original time series. The other is to use DPLA to approximate the middle curve. Because in the second phase we can regard the middle curve as the original time series and put it into the DPLA algorithm directly, we only consider how to create the middle curve in this subsection.

The middle curve is the center line of the time series, like the center line of an irregular pipeline. Therefore, we should create the pipeline in advance, which means that the peaks and valleys of the time series lie on, and construct, the edges of the pipeline. So we should mine the peaks and valleys in the time series and store them in two matrices, Up and Lw, which record the sites of peaks and valleys, respectively.

If there are r consecutive ascending points in some subsequence, then the last ascending point of that subsequence is a peak point. We store the peak points in the matrix Up. Likewise, we obtain the valleys' matrix Lw. In other words, let q_i be the first point of the subsequence Q(i : j). For each l (i < l ≤ r + i) with r + i + 1 ≤ j, if q_{l−1} < q_l and q_{r+i} > q_{r+i+1}, the point q_{r+i} is a peak point and is stored in the matrix Up. Similarly, for each l (i < l ≤ r + i) with r + i + 1 ≤ j, if q_{l−1} > q_l and q_{r+i} < q_{r+i+1}, the point q_{r+i} is a valley point and is stored in the matrix Lw.
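The run-based peak detection above can be sketched as follows; this is a stdlib-only illustration, with the function name and the treatment of equal adjacent values being our assumptions:

```python
def run_peaks(q, r):
    """If q ascends for at least r consecutive steps and then descends,
    the last ascending point is a peak. Returns (time, value) pairs.
    Valleys are found symmetrically on -q."""
    peaks = []
    run = 0                                   # length of the current ascent
    for i in range(1, len(q)):
        if q[i] > q[i - 1]:
            run += 1
        else:                                 # descent (or tie) ends the run
            if run >= r:
                peaks.append((i - 1, q[i - 1]))
            run = 0
    return peaks

print(run_peaks([0, 1, 2, 3, 2, 3, 2], 2))  # [(3, 3)]
```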

Sometimes a small number of consecutive points have several large amplitudes; that is, the differences between adjacent points in such a subsequence are very large. For example, in Fig. 9 the points marked "1", "2" and "3" are consecutive and have large amplitudes. In this case, the algorithm must recognize them and save the first and last points of the point group with the same trend for the middle curve. Because the points marked "1" to "4" belong to the same ascending group, our algorithm saves only the points marked "1" and "4" for the middle curve, rather than, say, the points marked "1" and "2". In other words, if the difference between adjacent points q_{i−1} and q_i is much larger than the threshold value e, we directly save the points q_{i−1} and q_i for the middle curve, which means the two points are upper and lower bound points simultaneously; such points are marked "C" in Fig. 11.

Fig. 9. Three points in the time series have large amplitudes and should be saved for the middle curve, considering the trends of the points.

Fig. 10. Interpolate four values into line segment L12.

Fig. 11. The upper bound and lower bound of the time series. The points marked "A" are lower bound points, the points marked "B" are upper bound points, and the points marked "C" are both upper and lower bound points.

Fig. 12. The middle curve and the original time series.


To execute the proposed algorithm conveniently, as illustrated in Fig. 10, we only interpolate several new values into the line segments (L12 and L23).

Given a line segment L(i : i + 1), if it is interpolated with n values, the interpolated line segment is L(l : l + n)' = (q'_l, q'_{l+1}, ..., q'_{l+n}), i.e.

q'_{l+j} = q_i + (q_{i+1} − q_i) · j/n,  0 ≤ j ≤ n.   (11)
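Formula (11) is ordinary linear interpolation between the two endpoint values; a minimal sketch (the function name is ours):

```python
def interpolate_segment(qi, qj, n):
    """Linearly interpolate n steps between the endpoint values qi and qj,
    following formula (11): q'_{l+k} = qi + (qj - qi) * k / n, 0 <= k <= n."""
    return [qi + (qj - qi) * k / n for k in range(n + 1)]

print(interpolate_segment(1.0, 3.0, 4))  # [1.0, 1.5, 2.0, 2.5, 3.0]
```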

By the above analysis, we obtain the upper bound matrix Up and the lower bound matrix Lw. We put them into a combination matrix T = [Up; Lw] and sort the elements of T by the time column. Finally, with the sorted T, we can calculate every point of the middle curve and store it in the matrix M, i.e.

M(i) = (T(i) + T(i + 1)) / 2,  i = 1, 2, ..., length(T) − 1.
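Assuming each bound point is stored as a (time, value) pair, the middle-curve computation can be sketched as (names are ours):

```python
def middle_curve(up, lw):
    """Merge the upper bound points Up and lower bound points Lw by time,
    then average each adjacent pair: M(i) = (T(i) + T(i+1)) / 2.
    Points are (time, value) tuples."""
    t = sorted(up + lw)                        # combined matrix T, sorted by time
    return [((t[i][0] + t[i + 1][0]) / 2,      # midpoint in time
             (t[i][1] + t[i + 1][1]) / 2)      # midpoint in value
            for i in range(len(t) - 1)]

print(middle_curve([(0, 1.0), (4, 2.0)], [(2, -1.0)]))
```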

The algorithm of middle curve-based piecewise linear approximation is as follows.

Step 1: Input the time series Q = q_1, q_2, ..., q_m. The vector Len stores the differences between adjacent points, namely Len(i − 1) = |q_i − q_{i−1}|, i = 2, 3, ..., m. The variable minLen is the mean of Len. Use the matrix Q' to record the values of the interpolated time series.

Step 2: Let j = 1, i = 2, k = 1 and q'_1 = q_1. Interpolate values into each irregular subsequence by the following sub-steps.

Step 2-1: If Len(i − 1) > minLen, the number of interpolated values is num = ⌊Len(i − 1)/minLen⌋ and the step length is l = Len(i − 1)/num; otherwise no values need interpolating, so go to Step 2-4.

Step 2-2: k = k + 1. If q_i > q_{i−1}, then q'_k = q_{i−1} + l · j; otherwise, q'_k = q_{i−1} − l · j.

Step 2-3: If j < num − 1, set j = j + 1 and go back to Step 2-2; otherwise, execute the next sub-step.

Step 2-4: k = k + 1, q'_k = q_i, j = 1. If i < m, set i = i + 1 and go back to Step 2-1; otherwise, go to Step 3.

Step 3: If q'_1 < q'_2, then tag = 1; otherwise, tag = 0. Here tag = 1 means the current values in the time series are ascending and tag = 0 means they are descending. At the beginning, append the first point of the time series Q' into both Lw and Up, i.e., Lw = append(Lw, q'_1) and Up = append(Up, q'_1). Set n = length(Q') and i = 2.

Step 4: If q'_i < q'_{i+1} and tag = 0, the trend of the points has changed from descending to ascending, so q'_i is the current minimum; append q'_i to the matrix Lw, namely Lw = append(Lw, q'_i), and change the tag value to tag = 1. If q'_i > q'_{i+1} and tag = 1, the trend has changed from ascending to descending, so q'_i is the current maximum; set Up = append(Up, q'_i) and tag = 0.

Step 5: i = i + 1. If i ≤ n − 1, go back to Step 4; otherwise, store the last point of the time series into both Lw and Up, namely Lw = append(Lw, q'_n) and Up = append(Up, q'_n).

Step 6: Combine the upper bound point values Up with the lower bound point values Lw, as in the middle-curve formula above, to calculate the point values of the middle curve M.

Step 7: Put the middle curve M into the DPLA algorithm; we then obtain the breakpoints Bp of the middle curve for the original time series.
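Steps 3-5 can be sketched as follows; this is a stdlib-only illustration with our own names, returning points as (time, value) pairs:

```python
def upper_lower_bounds(q):
    """Steps 3-5 of the MPLA algorithm: scan the (interpolated) series q',
    appending local minima to Lw and local maxima to Up, with the first
    and last points stored in both matrices."""
    up, lw = [(0, q[0])], [(0, q[0])]        # Step 3: first point in both
    tag = 1 if q[0] < q[1] else 0            # 1 = ascending, 0 = descending
    for i in range(1, len(q) - 1):           # Steps 4-5
        if q[i] < q[i + 1] and tag == 0:     # valley: descending -> ascending
            lw.append((i, q[i]))
            tag = 1
        elif q[i] > q[i + 1] and tag == 1:   # peak: ascending -> descending
            up.append((i, q[i]))
            tag = 0
    last = (len(q) - 1, q[-1])               # last point in both
    up.append(last)
    lw.append(last)
    return up, lw
```

For example, on q = [0, 1, 2, 1, 0, 1] the scan records the peak at index 2 and the valley at index 4, plus the two endpoints.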

By the above algorithm, we obtain the upper bound point values and lower bound point values shown in Fig. 11. They look like points on the edges of an irregular pipeline. By averaging the adjacent sorted bound points, we obtain the point values M of the middle curve shown in Fig. 12.

Fig. 13 illustrates the whole process of the middle curve-based piecewise linear approximation for time series. The data used in Fig. 13 seems to differ from the data used to illustrate our ideas; actually, they are identical except for a stretched time axis. From the bottom picture in Fig. 13, we find that our method not only reduces the dimensionality but also approximates the local and whole trends of the time series well.

3.3. Similarity measure based on MPLA and DDTW

We extract the features of the time series by MPLA. The features of a time series are a set of line segments with slopes. Moreover, different time series have different numbers of line segments. Therefore, we cannot use the Euclidean distance to measure the

Fig. 13. The process of creating the middle piecewise linear approximation.


similarity. Instead, the derivative dynamic time warping algorithm is a good choice.

Because of the particularity of the line segments (their number and length vary), we should not use derivative dynamic time warping directly. We define a warping window length to constrain the warping path, such as the Sakoe-Chiba band and the Itakura parallelogram (Rabiner & Juang, 1993; Sakoe & Chiba, 1990). For two time series Q and C, we get the breakpoints QBp = {qbp_1, qbp_2, ..., qbp_l} and CBp = {cbp_1, cbp_2, ..., cbp_s}, respectively. If r denotes half the length of the warping window in the original time series, then we use the following formula as the distance between two points:

Ld(i, j) = d(i, j) if |qbp_i(1) − cbp_j(1)| < r, and Ld(i, j) = d(i, j) + P(i, j) otherwise,   (12)

where bp(1) denotes the time coordinate and bp(2) the value coordinate of a breakpoint, d(i, j) = (qbp_i(2) − cbp_j(2))^2, P(i, j) = ((|qbp_i(1) − cbp_j(1)| − r) · w)^2, and w is the mean Euclidean distance of subsequences of length m chosen from the two time series, that is,

w = (1/m) · sqrt(Σ_{i=1}^{m} (q_i − c_i)^2),  m ≤ min(length(Q), length(C)).

From formula (12), we know that if the time difference between two endpoints from two line segments surpasses the window length r, we punish the distance between q(i) and c(j), so that time series in different groups have larger distance values.
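Formula (12) can be sketched as follows, with breakpoints given as (time, value) pairs (the function name is ours):

```python
def ld(qbp_i, cbp_j, r, w):
    """Windowed point distance of formula (12): qbp_i and cbp_j are
    (time, value) breakpoints from series Q and C; r is half the warping
    window length; w is the mean Euclidean deviation of the two series."""
    d = (qbp_i[1] - cbp_j[1]) ** 2            # squared value difference
    gap = abs(qbp_i[0] - cbp_j[0])            # time difference of endpoints
    if gap < r:
        return d
    return d + ((gap - r) * w) ** 2           # penalty for leaving the window
```

With r = 5 and w = 0.5, aligning breakpoints 10 time units apart adds a penalty of ((10 − 5) · 0.5)^2 = 6.25 to the value distance.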

Fig. 14. The warping result of middle line segments based on DDTW.

So we transform the data with formula (5) and calculate the similarity by DDTW. The warping result is shown in Fig. 14. It is well known that the complexities of DTW and DDTW are O(nm), where n and m are the lengths of Q and C, respectively. However, the time complexity of our approach is O(NM), where N and M are the numbers of endpoints of the line segments approximating Q and C. They are much smaller than n and m, that is, N ≪ n and M ≪ m.
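A textbook DTW recurrence over the two endpoint sequences illustrates the O(NM) cost; this is a generic sketch, not the paper's exact DDTW variant with the windowed distance:

```python
import math

def dtw(x, y, dist=lambda a, b: (a - b) ** 2):
    """Classic dynamic time warping over two sequences. Applied to the
    N and M segment endpoints rather than the raw series, the cost drops
    from O(n*m) to O(N*M) with N << n and M << m."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # cost of aligning x[i-1] with y[j-1], plus the best predecessor
            D[i][j] = dist(x[i - 1], y[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

In the paper's setting, `dist` would be the penalized distance of formula (12) over breakpoints rather than the plain squared difference used here.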

4. Experimental evaluation

Many experiments have already demonstrated that the SAX algorithm based on PAA and its extended versions are useful and feasible for time series mining (Keogh & Lin, 2002). Therefore, in this section, we mainly compare our method with the SAX algorithm.

4.1. Approximation results

We begin our experiments by approximating time series from the Stock data web page (2005) dataset, whose length is 2,119,415. We arbitrarily choose 50 subsequences from the time series; the length of every subsequence is larger than 1000. After executing MPLA, we obtain the resulting line segments. It is interesting to find that the average compress ratios are stable and approximately equal to 1/4. Fig. 15 shows the result of the average compress ratio.

Compress Ratio = (Number of line segments) / (Length of the time series).   (13)

Fig. 16 shows that MPLA automatically creates 73 line segments to approximate a time series with 300 points. The lines express the original time series well: they reflect not only the whole trend but also the local trend. Of course, if we want to use fewer line segments to express the original time series, we only need a man-made setting of the threshold e to control the line segments in the process of MPLA.

Since SAX is based on PAA, we perform another experiment to illustrate the average approximation errors of PAA and MPLA with respect to the original time series. We still use the Stock data web page (2005) dataset and arbitrarily choose 30 subsequences of length 2000. When reducing dimensionality by the two methods, we deliberately let the dimensionality number of PAA be larger than that of MPLA, as shown in Fig. 17(a). This means that the approximation error of SAX should be smaller than that of MPLA. However, as Fig. 17(b) shows, the error of MPLA is smaller than that of PAA in spite of its

Fig. 15. The average compress ratio for different lengths of time series.

Fig. 16. (a) The separated curves of line segments and time series. (b) The overlap of line segments and time series, which means the line segments fit the time series well.

Fig. 17. (a) The dimensionality number of PAA is greater than that of MPLA. (b) The error of MPLA is smaller than that of PAA when approximating the original time series of length 2000. This means the lower-dimensional MPLA approximates the time series better than the higher-dimensional PAA.

Fig. 18. The best cluster result of the Euclidean distance.


lower dimensionality. Therefore, MPLA can express the time series better than PAA.

4.2. Clustering

To evaluate the new similarity measure, we perform a clustering study on the synthetic control chart dataset (Alcock & Manolopoulos, 1999) from the UCI repository. Comparing hierarchical clusterings is one of the best ways to compare similarity measures, and the evaluation is typically objective: we simply observe which dissimilarity measure is close to the Euclidean distance. In view of the comparison of Euclidean, SAX, IMPACTS and SDA in Lin et al. (2003), we know that SAX was the best one whose distance measure is close to Euclidean. So we only compare our method with Euclidean and SAX and observe whether our method is better than SAX; if it is, our method is at least better than SAX, IMPACTS and SDA. We arbitrarily choose 14 time series to cluster with hierarchical clustering. The results, obtained with the different methods Euclidean, DDTW, SAX and MPLA, are shown in Figs. 18-21. Note that the 14 data objects come from 6 groups: normal {1, 2}, cyclic {3, 4, 5}, increasing trend {6, 7}, decreasing trend {8, 9}, upward shift {10, 11, 12}, downward shift {13, 14}.
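As a stdlib-only illustration of the comparison methodology (not the clustering package actually used for the figures), single-linkage agglomerative clustering over any pairwise dissimilarity can be sketched as:

```python
def single_linkage(dist, n):
    """Plain single-linkage agglomerative clustering over n items, given a
    pairwise dissimilarity function dist(i, j) on item indices.
    Returns the merge order as (left members, right members, distance)."""
    clusters = [{i} for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):        # find the closest cluster pair
            for b in range(a + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]            # merge b into a
        del clusters[b]
    return merges
```

Plugging in Euclidean distance, the SAX MINDIST, or the new MPLA/DDTW measure as `dist` yields the dendrograms being compared.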

From Figs. 18-21, we know that the clustering result of DDTW is the worst, so our method is at least better than DDTW. This also means that our method is an improved DDTW, because of the new similarity measure based on DDTW. Moreover, it may seem surprising that the result of SAX is better than that of the new similarity measure. Actually, for the 3rd, 4th and 5th time series, the distance between the 3rd and the 5th time series is larger than that between the 4th and 5th. However, since SAX only considers the whole

Fig. 19. The best cluster result of the derivative dynamic time warping (DDTW).

Fig. 20. The best cluster result of SAX, whose parameters are w = 10 and word_size = 9.

Fig. 21. The cluster result of the new similarity measure.

Fig. 22. The cluster result of SAX is rough when the two parameters are set to w = 10 and a = 4, which demonstrates that it does not consider the local trend at all.


trend and neglects the local trend, it prefers to classify the 3rd and 5th time series into the same group. Since our method considers the local and the whole trend of time series, it not only classifies the 4th and 5th time series into the same cluster in advance, but also clusters the 3rd, 4th and 5th time series into the same group.

Especially, for the 1st and 2nd time series, our method does not group them into the "normal" group at an early phase, which is the same as the Euclidean distance. But SAX prefers to combine the 2nd with the 1st and 4th, because it considers the means of the points within the frames rather than the local trend. To examine the local trend of time series, the cyclic group (3rd, 4th, 5th) should be analyzed, because the cyclic property better reflects the local trend. We find that SAX cannot classify the cyclic group into the same cluster, but our method is able to do so, and does so early. It is also important that the time series {6, 7}, {8, 9}, {10, 12}, {13, 14} are grouped into the same classes respectively, which SAX and the other methods cannot achieve. Therefore, our method can deal with the local trend and the whole trend simultaneously.

Apart from the above advantages, we point out that almost no parameters need to be set for MPLA, whereas SAX needs two parameters (the number of segments w and the alphabet size a), which are usually hard to decide. Fig. 20 is the best clustering result obtained by adjusting the two parameters. Although MPLA essentially has one threshold to set, it is easy to define by formula (9). With different parameters, SAX produces the rough clustering result shown in Fig. 22. This means that we must have prior information to set the parameters; otherwise we cannot get good clustering results.

In addition, our approach can recognize similar shapes in different time series. For example, the similar shapes in two time series are shown in Fig. 23. If we use SAX to measure the similarity, it cannot get the proper result of the shape comparison, which should be equal to 1. However, the similarity result of our approach is equal to 1, which reflects the trends of the whole sequence and the subsequences well. Therefore, our method is better at measuring the similarity of time series of equal or unequal length, which benefits similarity search in time series mining.

4.3. Similarity search

Similarity search in time series means mining the subsequences similar to a pattern sequence. We are now studying symbolic representations of the line segments created by MPLA. We initially use the cloud model (Li, Han, Shi, & Chan, 1998) to transform the angles or slopes into string representations; the similarity search is based on these symbolic representations.

The stock data is very large and contains enough information for similarity search, so we test our method on it. We choose the subsequence Q(200 : 250) as the pattern sequence and let the similarity search algorithm search the time series Q(1 : 2000). The results of the similarity search are shown in Fig. 24.

Fig. 23. (a) The two time series with very similar shapes in the same frame of axes; (b) the result of the SAX approximation; (c) the result of the new method's approximation.

Fig. 24. (a) The result of similarity search. (b) The zoomed view of the result of similarity search.


We find that the algorithm based on our method not only can find the pattern sequence marked "a" but also discovers the similar subsequences in the time series Q(1 : 2000), namely the subsequences marked "a", "b", "c" and "d". Therefore, similarity search based on our method also gets a good result.
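As a baseline for the task in this subsection (not the symbolic cloud-model algorithm itself), a naive sliding-window search can be sketched as:

```python
import math

def naive_similarity_search(series, pattern):
    """Slide a window of len(pattern) points over the series and return
    the start index of the window closest to the pattern in Euclidean
    distance, together with that distance."""
    m = len(pattern)
    best_i, best_d = 0, math.inf
    for i in range(len(series) - m + 1):
        d = math.sqrt(sum((series[i + k] - pattern[k]) ** 2 for k in range(m)))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

The symbolic approach replaces the inner Euclidean comparison with string matching over the segment representation, which is what makes searching a 2000-point series with a 50-point pattern cheap.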

5. Conclusions

In this paper, we propose a new method to measure the similarity of time series. DPLA adaptively divides time series into unequal segments and does not require any preset parameter. Moreover, its time complexity is O(kn), which is much lower than that of conventional top-down linear approximation. We use MPLA, which is based on DPLA, to approximate time series with line segments; it better reflects the trends of the subsequences and of the whole sequence. Moreover, the line segments produced by MPLA to approximate the middle curve of the original time series reduce the approximation error rate. Because of the particularity of the MPLA results, a modified derivative dynamic time warping is proposed to calculate the similarity of time series, which helps separate members of different groups. The empirical results demonstrate that the new similarity measure is an effective method for time series mining.

The experimental analysis shows that our method is better for dimensionality reduction and similarity measurement, but its time complexity is a little higher than that of SAX, because our method calculates the similarity with derivative dynamic time warping.

Symbolic representations for measuring time series can also be applied in our method. We have found a way (Li et al., 1998) to objectively describe the slopes or angles of line segments as symbolic representations. Our method may be a good choice for improving the present SAX-based methods for finding motifs and discovering rules. Of course, we can also use our method for other time series mining work, such as classification, clustering and other tasks.

Acknowledgments

This work is supported by the Natural Science Foundation of China (70871015 and 71031002) and the Fundamental Research Funds for the Central Universities (DUT11SX04). We also would like to thank Prof. Eamonn Keogh for the dataset and his source code.

References

Agrawal, R., Psaila, G., Wimmers, E. L., & Zait, M. (1995). Querying shapes of histories. In Proceedings of the 21st international conference on very large databases, Zurich, Switzerland (pp. 502-514).
Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th international conference on foundations of data organization and algorithms. Chicago: Springer-Verlag.
Ajumobi, U., Pken, B., & Preda, A. (2004). Discovering all frequent trends in time series. In Proceedings of the winter international symposium on information and communication technologies (Vol. 6, pp. 1-6).
Alcock, R. J., & Manolopoulos, Y. (1999). Time-series similarity queries employing a feature-based approach. In 7th Hellenic conference on informatics (pp. 1-9).
Anthony, J. T. L., Wu, H. W., & Lee, T. Y. (2009). Mining closed patterns in multi-sequence time-series databases. Data & Knowledge Engineering, 68, 1071-1090.
Bauman, E. V., Dorofeyuk, A. A., & Kornilov, G. V. (2004). Optimal piecewise-linear approximation algorithms for complex dependencies. Automation and Remote Control, 65, 1667-1674.
Chan, K., & Fu, W. (1999). Efficient time series matching by wavelets. In Proceedings of the 15th IEEE international conference on data engineering (pp. 117-126).
Chen, C. H., Hong, T. P., & Tseng, V. S. (2009). Mining fuzzy frequent trends from time series. Expert Systems with Applications, 36, 4147-4153.
Hung, N. Q., & Duong, T. A. (2008). An improvement of PAA for dimensionality reduction in large time series databases. In Proceedings of the 10th Pacific Rim international conference on artificial intelligence (Vol. 12, pp. 698-707).
Hunter, J., & McIntosh, N. (1999). Knowledge-based event detection in complex time series data. In Proceedings of the joint European conference on artificial intelligence in medicine and medical decision making (pp. 271-280).
Keogh, E., & Lin, J. (2002). <http://www.cs.ucr.edu/eamonn/>.
Keogh, E., & Pazzani, M. (1998). An enhanced representation of time series which allows fast and accurate classification, clustering, and relevance feedback. In Proceedings of the 4th international conference on knowledge discovery and data mining (Vol. 9, pp. 239-241).
Keogh, E., & Pazzani, M. (1999). Scaling up dynamic time warping to massive datasets. In Proceedings of the 3rd European conference on principles of data mining and knowledge discovery (Vol. 9, pp. 1-11).
Keogh, E., & Pazzani, M. (2001). Derivative dynamic time warping. In Proceedings of the 1st SIAM international conference on data mining (pp. 1-11).
Keogh, E., Chakrabarti, K., Pazzani, M. J., & Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3, 263-286.
Keogh, E., Chu, S., Hart, D., & Pazzani, M. (2001). An online algorithm for segmenting time series. In IEEE international conference on data mining (pp. 289-296).
Keogh, E., Lin, J., & Fu, A. (2005). HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the 5th IEEE international conference on data mining (Vol. 11, pp. 226-233).
Keogh, E., & Ratanamahatana, C. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 3, 358-386.
Korn, F., Jagadish, H. V., & Faloutsos, C. (1997). Efficiently supporting ad hoc queries in large datasets of time sequences. In Special Interest Group on Management of Data (SIGMOD'97) (pp. 289-300).
Li, D. Y., Han, J. W., Shi, X. M., & Chan, M. C. (1998). Knowledge representation and discovery based on linguistic atoms. Knowledge-Based Systems, 10, 431-440.
Lin, J., & Keogh, E. (2006). Group SAX: Extending the notion of contrast sets to time series and multimedia data. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (pp. 284-296).
Lin, J., Keogh, E., Lonardi, S., & Patel, P. (2002). Finding motifs in time series. In The 8th ACM international conference on knowledge discovery and data mining (pp. 53-68).
Lin, J., Keogh, E., & Lonardi, S. (2003). A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery (Vol. 7, pp. 2-11).
Manjula, A. I., Morgan, M. H., & Layne, T. W. (2008). A performance comparison of piecewise linear estimation methods. In Proceedings of the 2008 spring simulation multi-conference (Vol. 4, pp. 273-278).
Park, S., Lee, D., & Chu, W. W. (1999). Fast retrieval of similar subsequences in long sequence databases. In Proceedings of the 3rd IEEE knowledge and data engineering exchange workshop (pp. 60-67).
Rabiner, L., & Juang, B. (1993). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.
Sakoe, H., & Chiba, S. (1990). Dynamic programming algorithm optimization for spoken word recognition. In Readings in speech recognition (pp. 159-165).
Samia, M., & Conrad, F. (2007). A time-series representation for temporal web mining using a data band approach. In Proceedings of the 2007 conference on databases and information systems IV (Vol. 6, pp. 161-174).
Shatkay, H., & Zdonik, S. B. (1996). Approximate queries and representations for large data sequences. In Proceedings of the 12th international conference on data engineering (pp. 536-545).
Stock data web page (2005). <http://www.cs.ucr.edu/wli/FilteringData/stock.zip>.
Yang, J., Wang, W., & Philip, S. Y. (2003). Mining asynchronous periodic patterns in time series data. IEEE Transactions on Knowledge and Data Engineering, 15, 613-628.
Ye, L. X., & Keogh, E. (2009). Time series shapelets: A new primitive for data mining. In International conference on knowledge discovery and data mining, Paris, France (pp. 947-956).
Zhang, H., & Wan, S. N. (2008). Linearly constrained global optimization via piecewise-linear approximation. Journal of Computational and Applied Mathematics, 214, 111-120.