An Efficient Clustering Algorithm for Moving Object Trajectories

Hnin Su Khaing and Thandar Thein

Abstract— With the continuing spread of low-cost GPS devices, analyzing the trajectory data of moving objects has become a challenge. Analyzing such trajectories requires a mechanism for clustering moving objects effectively. Trajectory clustering has long been an important research direction in movement mining, but it remains unclear which of the existing algorithms is most effective. In this paper, we propose a clustering algorithm based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN cannot cluster data sets well when they contain large differences in density, and it is time consuming. We address these problems with the proposed algorithm, which enhances DBSCAN by reducing its running time. Finally, we evaluate the proposed algorithm on a real trajectory dataset by comparing it with DBSCAN. The evaluation results show that the proposed algorithm runs faster and yields a lower error than DBSCAN.

Keywords—DBSCAN, MoveMine, Moving Object Trajectory,

Trajectory Clustering.

I. INTRODUCTION

With the widespread use of location-aware devices such as mobile phones and GPS-enabled devices, a huge amount of moving object data has been collected. This has led to a growing research area concerned with the automatic analysis of animal behavior and traffic management using computer vision techniques. Many researchers have paid considerable attention to trajectory data modeling, indexing and query processing, and have proposed new models specifically dedicated to moving objects and their trajectories.

Motivated by this, the MoveMine system was designed for the discovery of various kinds of movement patterns and knowledge in numerous applications such as traffic control, climatological forecasting and animal movement analysis. For instance, animal migration demonstrates that the movements of creatures are temporally and spatially correlated. In biological domains, many researchers have discovered that some wild animals form large social groups when migration occurs. The study of animals' social behavior and wildlife migration is more concerned with the movement patterns of a group of animals than with those of each individual.

The MoveMine system integrates two functions: moving object pattern mining and trajectory mining. Trajectory data associated with moving objects is one of the fields in which the data volume has increased considerably. This growth makes it challenging to find moving animals that belong to the same group. Trajectory clustering is part of trajectory mining, and many algorithms based on data mining techniques exist for it.

Hnin Su Khaing is with University of Computer Studies, Mandalay, Myanmar. (e-mail: [email protected]).

Thandar Thein is with University of Computer Studies, Yangon, Myanmar. (e-mail: [email protected]).

In general, many data mining methods have been developed for analyzing moving animals. In particular, the task of clustering is to find objects that have moved in a similar way. DBSCAN is one of the algorithms used for clustering trajectory data. It can find a number of clusters starting from the estimated density distribution of the corresponding nodes, but it cannot cluster well when the data contain large differences in density.

The goal of this work is to propose an efficient clustering algorithm that solves these problems of DBSCAN for moving object trajectories. The algorithm is composed of three phases: partitioning, clustering and grouping. In the partitioning phase, we divide the trajectory data into k partitions. In the clustering phase, we apply DBSCAN to each partition. Finally, in the grouping phase, we group the separated clusters.

The rest of this paper is organized as follows. Section II presents the related work and Section III describes the background theory for trajectory clustering. Section IV discusses the proposed clustering algorithm and Section V presents the evaluation. Section VI concludes the paper.

II. RELATED WORK

Trajectory clustering plays a major role in moving object trajectory mining. There are many studies of trajectory data in areas such as transportation management and behavioral analysis. The author of [3] studied the similarity between trajectory sets of moving objects. He designed a similarity metric between trajectory sets, where each set is generated by one moving object, and based on this measure he proposed an algorithm to cluster trajectory sets. To demonstrate the effectiveness and efficiency of his algorithm, he conducted intensive experiments using mobile phone data.

To reduce the need to estimate complex parameters, as well as the complexity and computational cost for the human analyst, a vector field k-means clustering technique was proposed in [8]. It brings together ideas from visualization [2], data clustering and scalar field design to find a locally optimal clustering, and its authors demonstrated how it can find global patterns and handle partial trajectories.

3rd International Conference on Computational Techniques and Artificial Intelligence (ICCTAI'2014) Feb. 11-12, 2014 Singapore

An extended k-means technique for clustering moving objects was proposed in [9]. The authors use direction as a heuristic to determine the number of clusters, and the silhouette coefficient as a measure of the quality of their approach. They demonstrated its performance and accuracy on both real and synthetic datasets.

The authors of [1] presented a density-based k-nearest neighbors clustering algorithm for trajectory data which resolves the problem of DBSCAN's sensitivity to user-defined parameters. The method has three main features: it discovers clusters of arbitrary shape, it has a strong ability to handle noise, and its input parameter is easy to set, with a recommended value that is more accurate than others. They conducted extensive experiments on two real datasets of moving vehicles in Milan (Italy) and Athens (Greece).

To predict the locations of moving objects, the authors of [10] clustered periodic trajectories using a compact representation of spatio-temporal trajectories. They proposed an algorithm that uses cluster centroids to predict future locations, tested it on real-world data, and evaluated the precision and recall of the results.

A new partition-and-group framework for trajectory clustering (TRACLUS) was proposed in [4]. In this algorithm, a trajectory is partitioned into a set of line segments, and similar line segments are then grouped together into a cluster. The partitioning algorithm is based on the minimum description length (MDL) principle. The authors demonstrated that TRACLUS correctly discovers common sub-trajectories from real trajectory data.

In this paper, a new clustering algorithm is proposed, and we show that it is more efficient and effective than DBSCAN by comparing the two on a real-world trajectory dataset.

III. PRELIMINARY CONCEPTS

Despite the growing demand from diverse applications, few scalable tools are available for mining massive and sophisticated moving object data. The MoveMine system groups its methods into two categories based on their nature: pattern mining and trajectory mining [10].

A. Pattern Mining

The first category is moving object pattern mining, which emphasizes the analysis of discrete locations with temporal information [11]. It includes swarm patterns, periodic patterns and follower patterns, as shown in Fig. 1.

B. Trajectory Mining

Trajectory mining, shown in Fig. 2, focuses on mining trajectories as geometric shapes, for example clustering hurricane paths across years and finding outliers among them [11]. Trajectory clustering is the process of grouping a set of physical or abstract objects into classes of similar objects by applying a clustering algorithm such as k-means, k-nearest neighbors or DBSCAN, depending on the trajectory dataset.

Fig. 1 Pattern Mining

A trajectory outlier is an object that is different from or inconsistent with the remaining set of data. Outliers can be detected by distribution-based, distance-based, density-based or deviation-based algorithms [6]. Trajectory classification is the construction of a model for predicting the class labels of moving objects based on their trajectories and other features.

Fig. 2 Trajectory Mining

C. Clustering Techniques

Clustering is a dynamic field of research in data mining and

an unsupervised learning process because there are no class

labels to help. A cluster is a collection of data objects that are

similar to one another within the same cluster and are

dissimilar to the objects in other clusters. A cluster of data

objects can be treated collectively as one group and so may be

considered as a form of data compression. In general, the

major clustering methods can be classified into the following

categories.

A partitioning method first creates an initial set of k partitions, where the parameter k is the number of partitions to construct. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another. Typical partitioning methods include k-means, k-medoids and CLARANS. A hierarchical method creates a hierarchical decomposition of the given set of data objects. It can be classified as either agglomerative (bottom-up) or divisive (top-down), based on how the hierarchical decomposition is formed. A density-based method clusters objects based on the notion of density. It grows clusters either according to the density of neighborhood objects (as in DBSCAN) or according to some density function (as in DENCLUE). A grid-based method first quantizes the object space into a finite number of cells that form a grid structure, and then performs clustering on the grid structure. A model-based method hypothesizes a model for each of the clusters and finds the best fit of the data to that model [5].

D. Distance Measure

In our analysis scenario, we evaluate the distance between points, given by their latitude and longitude, using the Euclidean distance [5] shown in equation (1), where X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n).

\[ dist(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2} \qquad (1) \]
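As a minimal illustration of the Euclidean distance in (1), the following Python sketch computes it for two points of equal dimension. The function name `euclidean_distance` and the sample coordinates are assumptions made for this example and do not come from the original implementation.

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance between two equal-length coordinate tuples, as in (1)."""
    if len(x1) != len(x2):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

# Two hypothetical two-dimensional points (illustrative values only).
p = (0.0, 0.0)
q = (3.0, 4.0)
print(euclidean_distance(p, q))  # 5.0
```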

IV. PROPOSED EFFICIENT CLUSTERING ALGORITHM

The proposed trajectory clustering algorithm consists of three phases: partitioning, clustering and grouping. First, in the partitioning phase, we decompose the trajectory data into k partitions. Second, we apply the clustering phase to each partition. Finally, in the grouping phase, we regroup the separated clusters. The architecture of the proposed clustering algorithm is shown in Fig. 3.

Fig. 3 System Flow for Proposed Clustering Algorithm

A. Partitioning

We first perform the partitioning phase on the trajectory dataset in order to improve the efficiency of our algorithm. DBSCAN [7] spends considerable time computing the similarity measure, so to reduce its computation time we divide the data into k partitions. This phase mainly targets huge amounts of data, and it requires a parameter k, the number of partitions.
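The partitioning step can be sketched as follows. The text does not state how the points are assigned to partitions, so this example assumes contiguous chunks of nearly equal size. The function name `partition` and the sample data are hypothetical.

```python
def partition(points, k):
    """Split a list of points into k contiguous partitions of near-equal size.

    Assumption: contiguous chunking; the paper does not specify the scheme.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    base, extra = divmod(len(points), k)
    partitions = []
    start = 0
    for i in range(k):
        # The first `extra` partitions receive one additional point.
        size = base + (1 if i < extra else 0)
        partitions.append(points[start:start + size])
        start += size
    return partitions

# Ten illustrative points split into three partitions.
points = [(float(i), float(i)) for i in range(10)]
parts = partition(points, 3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Because DBSCAN's brute-force neighbor search compares each point with every other point in its input, running it on k smaller partitions reduces the number of distance computations, which is the efficiency argument made above.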

B. Clustering

After partitioning the trajectory data, we apply the clustering algorithm. Given the k partitions from the previous step, we apply DBSCAN to each partition. DBSCAN needs two parameters: epsilon (eps), the distance within which points are considered neighbors when forming a cluster, and the minimum number of points (MinPts) required to form a cluster.

In this phase, the algorithm starts with an arbitrary point that has not been visited and uses the Euclidean distance in (1) to find its neighbor points (NeighborPts) within eps. If the number of neighbors is less than MinPts, the point is marked as noise. Otherwise, a new cluster (C) is started and expanded: the eps-neighborhoods of the points in NeighborPts are examined in turn, and all points found within eps of those neighbors are added to C. This process continues until the connected cluster is completely found.

Algorithm: Efficient Clustering Algorithm
Input: number of partitions K, epsilon eps, minimum points MinPts, threshold t, trajectory dataset D
Output: set of trajectory clusters

Set C to be 0;
Partition(D, K);
Grouping(t);

/* PARTITIONING PHASE */
Partition(D, K)
    divide D into K partitions D1, ..., DK   // partition the data into K parts
    for each (Dk ∈ {D1, ..., DK}) do
        Clustering(Dk, eps, MinPts);

/* CLUSTERING PHASE */
Clustering(D, eps, MinPts)
    for each unvisited point (P ∈ D) do
        mark P as visited;
        NeighborPts = regionQuery(P, eps);   // find the neighbors using the distance function
        if (sizeof(NeighborPts) < MinPts) then
            mark P as Noise;
        else
            C++;
            expandCluster(P, NeighborPts, C, eps, MinPts);

function expandCluster(P, NeighborPts, C, eps, MinPts)
    add P to cluster C;
    for each (Q ∈ NeighborPts) do
        if (Q is not visited) then
            mark Q as visited;
            NeighborPts' = regionQuery(Q, eps);   // find the neighbors using the distance function
            if (sizeof(NeighborPts') >= MinPts) then
                NeighborPts = NeighborPts joined with NeighborPts';   // join the neighbor sets
        if (Q is not yet a member of any cluster) then
            add Q to cluster C;
    return;

function regionQuery(P, eps)
    compute the Euclidean distance (1) from P to every point;   // calculate distance
    return all points within P's eps-neighborhood;

/* GROUPING PHASE */
Grouping(t)
    for each (c ∈ C) do
        mean(c);   // calculate the mean value of each cluster
        diff = difference between the mean of c and the mean of the previous cluster;
        if (diff < t) then
            c = join the two clusters;   // join the two clusters
    return joined clusters;

Fig. 4 Proposed Clustering Algorithm
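The clustering phase can be illustrated with a small runnable sketch. This is a textbook DBSCAN with a brute-force region query, written for two-dimensional points. It is not the authors' Java implementation. The names `dbscan`, `region_query` and `NOISE`, and the sample points and parameter values, are assumptions for this example. In the proposed algorithm this routine would be applied to each partition separately.

```python
import math

NOISE = -1  # label used for points that belong to no cluster

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id starting at 0, or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Two tight groups and one isolated point (illustrative values only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```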



C. Grouping

We now present the grouping of the resulting clusters. To improve the effectiveness of the clustering algorithm, we group the clusters produced in each partition. This phase is necessary because the partitioning phase can spread a cluster over several partitions, so that parts of a dense region end up in separate, uncertain clusters.

In this phase, we calculate the mean value of each cluster and then compare each cluster with the others. We need to define a threshold (t) for grouping two or more clusters: if the difference between the mean values is less than the threshold, the clusters are grouped. We verify the effectiveness of our algorithm by measuring the Sum Squared Error (SSE).
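One possible reading of the grouping phase is sketched below. It assumes that the "mean value" of a cluster is its centroid and that the "difference" is the Euclidean distance between centroids. Clusters are merged greedily in the order given. These choices, the names `centroid` and `group_clusters`, and the sample data are assumptions for illustration.

```python
import math

def centroid(cluster):
    """Mean point of a non-empty cluster of (x, y) tuples."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def group_clusters(clusters, t):
    """Greedily merge clusters whose centroids are closer than threshold t."""
    groups = []
    for cluster in clusters:
        cx, cy = centroid(cluster)
        for group in groups:
            gx, gy = centroid(group)
            if math.hypot(cx - gx, cy - gy) < t:
                group.extend(cluster)  # close enough: join the two clusters
                break
        else:
            groups.append(list(cluster))  # no close group: keep it separate
    return groups

# Three illustrative clusters; the first two lie close together.
clusters = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(2.0, 0.0), (3.0, 0.0)],
    [(100.0, 100.0), (101.0, 100.0)],
]
merged = group_clusters(clusters, t=5.0)
print([len(g) for g in merged])  # [4, 2]
```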

V. EXPERIMENTAL EVALUATION

In this section, we evaluate the proposed clustering algorithm on a trajectory data set and compare it with DBSCAN. We also describe the data set used in the experiments and discuss the experimental results.

A. Experimental Study

An animal trajectory dataset is used to evaluate the effectiveness of the proposed clustering algorithm. It was generated by the Starkey project. The data set contains the radio-telemetry locations (with other information) of elk, deer and cattle from 1993 through 1996. We use the elk movements in 1993, the deer movements in 1995 and the cattle movements in 1993. The elk data comprise 33 trajectories and 15422 points, the deer data 32 trajectories and 20065 points, and the cattle data 41 trajectories and 19556 points. Each record has 20 fields, such as UTMGrid, UTMGridEast and UTMGridNorth, with point coordinates defined in the Universal Transverse Mercator (UTM) system [13].

We extract the x (UTMGridEast) and y (UTMGridNorth) coordinates from the telemetry data for our experiments. We evaluate the proposed clustering algorithm by comparing it with DBSCAN on this trajectory data.
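The coordinate extraction step might look like the sketch below. It assumes the telemetry table is available as comma-separated text with a header row. The field names UTMGridEast and UTMGridNorth are taken from the description above. The `Id` column, the function name `load_points` and the two sample rows are made up for illustration and are not real Starkey records.

```python
import csv
import io

def load_points(csv_text, x_field="UTMGridEast", y_field="UTMGridNorth"):
    """Read (x, y) coordinate pairs from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row[x_field]), float(row[y_field])) for row in reader]

# Synthetic sample in an assumed layout (not real telemetry data).
sample = """Id,UTMGridEast,UTMGridNorth
a1,375000,5012000
a1,375030,5012060
"""
print(load_points(sample))
```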

B. Performance Metrics

We measure the computation time for varying sizes of the animal trajectory data, comparing DBSCAN with the proposed clustering algorithm. In our study, we find that changing the data size affects the number of clusters. We also measure the clustering quality using the Sum Squared Error (SSE). SSE lets us measure the clustering quality independently of the features used for clustering and of the number of clusters produced. It is defined in (2), where num_clus is the number of clusters, C_i is the i-th cluster, |C_i| is its size and dist(x, y) is the Euclidean distance in (1).

\[ SSE = \sum_{i=1}^{num_{clus}} \left( \frac{1}{2|C_i|} \sum_{x \in C_i} \sum_{y \in C_i} dist(x, y)^2 \right) \qquad (2) \]
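The SSE in (2) can be computed directly from its definition, as in the following sketch. The function names `squared_distance` and `sse` and the sample clusters are assumptions for this example. The double sum runs over all ordered pairs in a cluster, which is why the factor 1/(2|C_i|) appears.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sse(clusters):
    """Sum Squared Error over a list of clusters, following equation (2)."""
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        pair_sum = sum(squared_distance(x, y) for x in cluster for y in cluster)
        total += pair_sum / (2 * len(cluster))
    return total

# Illustrative clusters: one pair of points 2 units apart and one singleton.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
print(sse(clusters))  # 2.0
```

For the first cluster the ordered pairs contribute 0 + 4 + 4 + 0 = 8, and dividing by 2|C_i| = 4 gives 2. The singleton cluster contributes 0.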

We conducted the experiments on a Core i7 machine with 8 GB of main memory running Windows 7. We implemented our algorithm in Java (JDK 1.7) using Eclipse Juno.

C. Result Discussion

This experiment studies the effect of changing the trajectory data size on the clustering computation time of both DBSCAN and the proposed algorithm. We find that our clustering algorithm performs well on large datasets: as the data size increases, it requires less computation time than DBSCAN, which takes more time to cluster all objects. Fig. 5 shows that the performance gain is more significant on large datasets. When the data size changes, the running time of our algorithm changes only slightly.

[Plot: Time (millisec), 0 to 4500, versus Animal Trajectory Data Size, 1000 to 9000; series: DBSCAN, Proposed Algorithm]

Fig. 5 Performance Comparison of DBSCAN and Proposed Algorithm

Fig. 6 shows the SSE values of the proposed algorithm and DBSCAN. The SSE of our algorithm is drastically lower than that of DBSCAN, that is, the error of the proposed algorithm is smaller. A small SSE indicates that our algorithm assigns points to clusters correctly.

[Plot: SSE, 0 to 50, versus Animal Trajectory Data Size; series: DBSCAN, Proposed Algorithm]

Fig. 6 Sum Square Error of DBSCAN vs. Proposed Algorithm

We also study how changing the data size affects the number of clusters. A smaller number of clusters means a larger cluster size. Our algorithm clusters well regardless of changes in the data size. We find that DBSCAN depends on the data size because of its cluster expansion step, so it needs more computation time and produces a larger number of clusters. We address these problems with an efficient clustering algorithm for large trajectory datasets.


[Plot: No. of Cluster, 0 to 2500, versus Data Size, 1000 to 9000; series: DBSCAN, Proposed Algorithm]

Fig. 7 Accuracy of DBSCAN vs. Proposed Algorithm

D. Effect of Parameter Values

We study the effect of changing the value of the parameter eps on the clustering result. If we use a smaller eps, we discover a larger number of clusters. However, if the value of eps is less than 30, we find that only one cluster is discovered by the DBSCAN algorithm. We have tested the effects of varying parameter values for both algorithms. To study the effect of the epsilon value on the number of clusters, we conducted experiments with various epsilon values. According to the experimental results shown in Fig. 8, when the epsilon value is less than 45, the number of clusters is smaller; when it is between 45 and 55, the optimal number of clusters is achieved; and when it is greater than 55, the number of clusters decreases. Fig. 8 shows the clustering results for epsilon values between 35 and 125.

[Plot: No. of Cluster, 0 to 400, versus Epsilon (eps), 35 to 125; series: Proposed Algorithm, DBSCAN]

Fig. 8 Effect of eps values on number of clusters

VI. CONCLUSION

In this paper, we proposed an efficient clustering algorithm for trajectory data. It is composed of three phases: partitioning, clustering and grouping. The DBSCAN clustering algorithm cannot cluster well when the data contain large differences in density, and its distance calculation is time consuming. To overcome the time consumption issue, we first partition the dataset and then cluster the trajectories by applying the DBSCAN algorithm in each partition. Finally, we perform the grouping phase to integrate the spread clusters. To evaluate the effectiveness of the proposed algorithm, we conducted a performance evaluation and analyzed the results by comparing the proposed algorithm with DBSCAN.

REFERENCES

[1] A. K. Akasapu, P. S. Rao, L. K. Sharma and S. K. Satpathy, "Density Based k-Nearest Neighbors Clustering Algorithm for Trajectory Data", International Journal of Advanced Science and Technology, Vol. 31, June 2011.
[2] G. McArdle, A. Tahir and M. Bertolotto, "Spatio-Temporal Clustering of Movement Data: An Application to Trajectories Generated by Human-Computer Interaction", ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume I-2, 2012, XXII ISPRS Congress, 25 August – 01 September 2012, Melbourne, Australia.
[3] J. Dai, "A Novel Moving Object Trajectories Clustering Approach for Very Large Datasets", in: Proceedings of the 2nd International Conference on Computer Science and Electronic Engineering (ICCSEE 2013).
[4] J. G. Lee, J. Han and K.-Y. Whang, "Trajectory Clustering: A Partition-and-Group Framework", in SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM, 2007, p. 593-604.
[5] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd edition, Morgan Kaufmann, p. 348 and 398, 2006.
[6] J. G. Lee, J. Han and X. Li, "Trajectory Outlier Detection: A Partition and Detect Framework", IEEE International Conference on Data Engineering (ICDE 2008), April 7-12, 2008, p. 140-149.
[7] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases", in: Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD'96), pages 226–231, Portland, OR, Aug. 1996.
[8] N. Ferreira, J. Klosowski, C. E. Scheidegger and C. T. Silva, "Vector Field k-Means: Clustering Trajectories by Fitting Multiple Vector Fields", Eurographics Conference on Visualization (EuroVis) 2013, Volume 32 (2013), Number 3.
[9] O. Ossama, H. M. O. Mokhtar and M. E. El-Sharkawi, "An extended k-means technique for clustering moving objects", Egyptian Informatics Journal, Cairo University, March 2011, Volume 12, Issue 1, p. 45-51.
[10] S. Elnekave, M. Last and O. Maimon, "Predicting Future Locations Using Clusters' Centroids", in: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, ACMGIS'07, November 7–9, 2007, Seattle, WA, USA.
[11] Z. Li, M. Ji, J. G. Lee, L. A. Tang, Y. Yu, J. Han and R. Kays, "MoveMine: Mining Moving Object Databases", in: Proceedings of SIGMOD'10, ACM SIGMOD International Conference on Management of Data, June 6–11, 2010, Indianapolis, Indiana, USA.
[12] Z. Li, J. Han, M. Ji, L. Tang, Y. Yu and B. Ding, "MoveMine: Mining Moving Object Data for Discovery of Animal Movement Patterns", ACM Transactions on Intelligent Systems and Technology (TIST), Volume 2, Issue 4, July 2011, Article 37, ACM, New York, NY, USA.
[13] http://www.fs.fed.us/pnw/starkey/data/tables

