Abstract— With the increasing and continuous diffusion of low-cost GPS devices, analyzing moving object trajectory data has become a challenge. Analyzing moving object trajectories requires a mechanism for effectively clustering moving objects. Trajectory clustering has long been an important research direction in movement mining, but it remains unclear which of the existing algorithms is the most effective. In this paper, we propose a clustering algorithm based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN cannot cluster data sets with large differences in densities well, and its computation is time consuming. Our algorithm addresses this problem by enhancing DBSCAN to reduce its computation time. Finally, we evaluate the proposed algorithm on a real trajectory dataset by comparing it with DBSCAN. The evaluation results show that the proposed clustering algorithm provides better performance and lower error than DBSCAN.
Keywords—DBSCAN, MoveMine, Moving Object Trajectory,
Trajectory Clustering.
I. INTRODUCTION
With the widespread use of location-aware devices such as mobile phones and GPS-enabled devices, huge amounts of moving object data have been collected. This has led to a growing research area of automatic analysis, such as animal behavior analysis and traffic management using computer vision techniques. Many researchers pay close attention to trajectory data modeling, indexing and query processing, and propose new models dedicated specifically to moving objects and their trajectories.
Based on this motivation, the MoveMine system was designed to discover various kinds of movement patterns and knowledge in numerous applications, such as traffic control, climatological forecasting and animal movement analysis. For instance, animal migration shows that the movements of creatures are temporally and spatially correlated. In biological domains, many researchers have discovered that some wild animals form large social groups during migration. Studies of animals' social behavior and wildlife migration are more concerned with the movement patterns of a group of animals than with those of each individual.
The MoveMine system integrates two functions: moving object pattern mining and trajectory mining. Trajectory data associated with moving objects is one of the fields whose volume has increased considerably, which makes it challenging to find moving animals that belong to the same group. Trajectory clustering is part of trajectory mining, and many algorithms based on data mining techniques exist for it.
Hnin Su Khaing is with University of Computer Studies, Mandalay, Myanmar. (e-mail: [email protected]).
Thandar Thein is with University of Computer Studies, Yangon, Myanmar. (e-mail: [email protected]).
In general, many data mining methods have been developed for analyzing moving animals, differing in the nature of the methods. In particular, the goal of clustering in data analysis is to find objects that move in a similar way. DBSCAN is one of the algorithms used for clustering trajectory data. It can find a number of clusters starting from the estimated density distribution of the corresponding nodes, but it cannot cluster data with very large densities well.
The goal of this work is to propose an efficient clustering algorithm that addresses the problem of DBSCAN for moving object trajectories. The algorithm is composed of three phases: partitioning, clustering and grouping. In the partitioning phase, we divide the trajectory data into k partitions. In the clustering phase, we apply DBSCAN to each partition, and finally we group the separated clusters.
The rest of this paper is organized as follows: Section II presents related work, and Section III describes the background theory for trajectory clustering. Section IV discusses the proposed clustering algorithm, and Section V presents its evaluation. Finally, Section VI concludes the paper.
II. RELATED WORK
Trajectory clustering plays a major role in moving object trajectory mining, and there are many studies of trajectory data for applications such as transportation management and behavioral analysis. The author of [3] studied the similarity between sets of moving object trajectories. He designed a similarity metric between trajectory sets, where each set is generated by one moving object, and, based on this measure, proposed an algorithm for clustering trajectory sets. To demonstrate the effectiveness and efficiency of the algorithm, he conducted extensive experiments on mobile phone data.
To reduce the burden of estimating complex parameters, as well as the complexity and computational cost for the human analyst, a vector field k-means clustering technique was proposed in [8]. It combines ideas from visualization [2], data clustering and scalar field design to find locally optimal clusters, and the authors demonstrated how it can find global patterns and handle partial trajectories.
An Efficient Clustering Algorithm for
Moving Object Trajectories
Hnin Su Khaing and Thandar Thein
3rd International Conference on Computational Techniques and Artificial Intelligence (ICCTAI'2014) Feb. 11-12, 2014 Singapore
An extended k-means technique for clustering moving objects was proposed in [9]. The authors use direction as a heuristic to determine the number of clusters, use the silhouette coefficient to measure the quality of their approach, and showed its performance and accuracy on both real and synthetic datasets.
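The silhouette coefficient used as the quality measure in [9] can be sketched as follows. This is the standard textbook definition, not code from [9]; the function name `silhouette` and the convention of scoring points in singleton clusters as 0 are illustrative assumptions.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points.

    For point i: a(i) = mean distance to the other members of its own cluster,
    b(i) = lowest mean distance to the members of any other cluster,
    s(i) = (b(i) - a(i)) / max(a(i), b(i)).
    Assumption: points in singleton clusters score 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # the distance from p to itself is 0, so dividing by len(own) - 1 averages over the others
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters; negative values indicate points that lie closer to another cluster than to their own.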
The authors of [1] presented a density-based k-nearest neighbors clustering algorithm for trajectory data that resolves DBSCAN's sensitivity to user-defined parameters. This clustering method has three main features: it discovers clusters of arbitrary shape, it handles noise well, and its input parameter is easy to set, with a recommended value that is more accurate than those of other methods. They conducted extensive experiments on two real datasets of moving vehicles in Milan (Italy) and Athens (Greece).
To predict the locations of moving objects, the authors of [10] clustered periodic trajectories using a compact representation of spatio-temporal trajectories. They proposed an algorithm that uses the cluster centroids to predict future locations, and evaluated the precision and recall of the predictions on real-world data.
A new partition-and-group framework for trajectory clustering (TRACLUS) was proposed in [4]. In this algorithm, a trajectory is partitioned into a set of line segments, and similar line segments are then grouped together into clusters. For partitioning, they used the minimum description length (MDL) principle. They demonstrated that TRACLUS correctly discovers common sub-trajectories from real trajectory data.
In this paper, we propose a new clustering algorithm and show, using a real-world trajectory dataset, how it is more efficient and effective than others.
III. PRELIMINARY CONCEPTS
Despite the growing demands of diverse applications, few scalable tools have been available for mining massive and sophisticated moving object data. Based on the nature of their methods, the functions of the MoveMine system fall into two categories: pattern mining and trajectory mining [10].
A. Pattern Mining
The first category, moving object pattern mining, emphasizes the analysis of discrete locations with temporal information [11]. It includes swarm patterns, periodic patterns and follower patterns, as shown in Fig. 1.
B. Trajectory Mining
Trajectory mining (Fig. 2) focuses more on mining trajectories associated with geometric shapes, such as clustering hurricane paths across years and finding outliers among them [11]. Trajectory clustering is the process of grouping a set of physical or abstract objects into classes of similar objects by applying clustering algorithms such as k-means, k-nearest neighbors and DBSCAN, depending on the trajectory dataset.
Fig. 1 Pattern Mining
A trajectory outlier is an object that differs from, or is inconsistent with, the rest of the data. Outliers can be detected by distribution-based, distance-based, density-based and deviation-based algorithms [6]. Trajectory classification constructs a model that predicts the class labels of moving objects based on their trajectories and other features.
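As an illustration of the distance-based family mentioned above, the classic DB(pct, dmin) formulation flags an object as an outlier when at least a fraction pct of the other objects lie farther than dmin from it. The sketch below applies that rule to individual points rather than whole trajectories; the function name and the parameters `pct` and `dmin` are hypothetical and not taken from [6].

```python
import math

def distance_based_outliers(points, pct, dmin):
    """Return the indices of DB(pct, dmin)-outliers: points for which at least
    a fraction pct of the other points lie farther away than dmin."""
    n = len(points)
    if n < 2:
        return []
    outliers = []
    for i, p in enumerate(points):
        # count the other points outside the dmin-neighborhood of p
        far = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) > dmin)
        if far / (n - 1) >= pct:
            outliers.append(i)
    return outliers
```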
Fig. 2 Trajectory Mining
C. Clustering Techniques
Clustering is a dynamic field of research in data mining. It is an unsupervised learning process, because no class labels are available to guide it. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. A cluster of data objects can be treated collectively as one group, so clustering may be considered a form of data compression. In general, the major clustering methods can be classified into the following categories.
A partitioning method first creates an initial set of k partitions, where the parameter k is the number of partitions to construct. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another. Typical partitioning methods include k-means, k-medoids and CLARANS. A hierarchical method creates a hierarchical decomposition of the given set of data objects and can be classified as either agglomerative (bottom-up) or divisive (top-down), depending on how the decomposition is formed. A density-based method clusters objects based on the notion of density: it grows clusters either according to the density of neighborhood objects (as in DBSCAN) or according to some density function (as in DENCLUE). A grid-based
method first quantizes the object space into a finite number of
cells that form a grid structure, and then performs clustering on
the grid structure. A model-based method hypothesizes a
model for each of the clusters and finds the best fit of the data
to that model [5].
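The iterative relocation idea behind partitioning methods can be sketched with Lloyd's k-means on 2-D points. This is a minimal illustration assuming randomly chosen initial centroids and Euclidean distance; `k_means`, `iters` and `seed` are hypothetical names, not part of any library.

```python
import math
import random

def k_means(points, k, iters=100, seed=0):
    """Lloyd's k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until no point is relocated."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct initial points
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no object moved to another group: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids
```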
D. Distance Measure
In our analysis scenario, we evaluate the distance between points, given by their latitude and longitude, using the Euclidean distance [5] shown in equation (1):
$$ \mathrm{dist}(X_1, X_2) = \sqrt{\sum_{i=1}^{n} \left(x_{1i} - x_{2i}\right)^2} \tag{1} $$
where X_1 = (x_{11}, x_{12}, …, x_{1n}) and X_2 = (x_{21}, x_{22}, …, x_{2n}).
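Equation (1) translates directly into code. A minimal sketch follows; note that applying it to raw latitude/longitude degrees is only an approximation, because a degree of longitude shrinks toward the poles, whereas on projected UTM coordinates (as used in Section V) it gives distances in meters. The function name `euclidean` is hypothetical.

```python
import math

def euclidean(x1, x2):
    """dist(X1, X2) = sqrt(sum_i (x1i - x2i)^2), as in equation (1)."""
    if len(x1) != len(x2):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
```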
IV. PROPOSED EFFICIENT CLUSTERING ALGORITHM
The proposed trajectory clustering algorithm consists of three phases: partitioning, clustering and grouping. First, the partitioning phase decomposes the trajectory data into k partitions. Second, the clustering phase is applied to each partition. Finally, the grouping phase reassembles the separated clusters. The architecture of the proposed clustering algorithm is shown in Fig. 3.
Fig. 3 System Flow for Proposed Clustering Algorithm
A. Partitioning
First, we perform the partitioning phase on the trajectory dataset to improve the efficiency of our algorithm. DBSCAN [7] spends much of its running time on similarity (distance) computations; to reduce this computation time, we enhance it by dividing the data into k partitions. This phase mainly targets large amounts of data, and it requires a parameter k, the number of partitions.
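The paper does not specify how the k partitions are formed. The sketch below assumes one simple strategy, sorting the points by their first coordinate (e.g. UTM easting) and cutting them into k near-equal strips; the function `partition` is hypothetical. Under this assumption the saving is easy to estimate: brute-force neighborhood queries over n points cost about n² distance computations, whereas k partitions of n/k points cost about k·(n/k)² = n²/k.

```python
def partition(points, k):
    """Split points into k contiguous partitions of near-equal size.

    Assumption: partitions are vertical strips obtained by sorting on the first
    coordinate (e.g. UTM easting); the paper leaves the strategy open.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    # integer cut points i*n//k give partition sizes that differ by at most one
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
```

A cluster that straddles a strip boundary is split into pieces by this scheme, which is the situation the grouping phase is meant to repair.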
B. Clustering
After partitioning the trajectory data, we apply the clustering algorithm. Given the k partitions from the previous step, we apply DBSCAN to each partition. DBSCAN requires two parameters: epsilon (eps), the neighborhood radius within which clusters are formed, and the minimum number of points (MinPts) that a neighborhood must contain to form a cluster.
This phase starts with an arbitrary point that has not been visited and computes its similarity to the other points using the Euclidean distance in (1) to find its neighbor points (NeighborPts) within eps. If the number of neighbors is less than MinPts, the point is marked as noise. Otherwise, a new cluster C is started and expanded: for each neighbor, its own eps-neighborhood is retrieved, and if that neighborhood also contains at least MinPts points, those points are joined to NeighborPts. All points found in this way are added to cluster C. This process continues until the connected cluster is completely found.
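The clustering phase applied to a single partition can be sketched as a plain DBSCAN with a brute-force `region_query`. This is an illustrative reimplementation of the standard algorithm [7], not the authors' Java code; the names `dbscan`, `region_query` and `NOISE` are assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    def region_query(i):
        # brute-force scan; the eps-neighborhood includes point i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: join its neighbors
    return labels
```

Points first marked as noise can later be relabeled as border points if they fall inside a core point's neighborhood, which matches the original DBSCAN definition.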
Algorithm: Efficient Clustering Algorithm
Input: number of partitions K, epsilon eps, minimum points MinPts, threshold t, trajectory dataset D
Output: set of trajectory clusters
C = 0; // cluster counter
Partition(D, K);
Grouping(t);

/*PARTITIONING PHASE*/
Partition(D, K)
  divide D into K partitions D1, ..., DK
  for each (Dk ∈ {D1, ..., DK}) do
    Clustering(Dk, eps, MinPts);

/*CLUSTERING PHASE*/
Clustering(D, eps, MinPts)
  for each (unvisited point P ∈ D) do // P taken in arbitrary order
    mark P as visited;
    NeighborPts = regionQuery(P, eps); // find the neighbor points using the distance function
    if (sizeof(NeighborPts) < MinPts) then
      mark P as Noise;
    else
      C++;
      expandCluster(P, NeighborPts, C, eps, MinPts);

function expandCluster(P, NeighborPts, C, eps, MinPts)
  add P to cluster C;
  for each (point P' ∈ NeighborPts) do
    if (P' is not visited) then
      mark P' as visited;
      NeighborPts' = regionQuery(P', eps); // find the neighbor points using the distance function
      if (sizeof(NeighborPts') >= MinPts) then
        NeighborPts = NeighborPts joined with NeighborPts'; // join the neighbor sets
    if (P' is not yet a member of any cluster) then
      add P' to cluster C;
  return;

function regionQuery(P, eps)
  // calculate the Euclidean distance (1) from P to every point
  return all points within P's eps-neighborhood

/*GROUPING PHASE*/
Grouping(t)
  for each (cluster c ∈ C) do
    mean(c); // calculate the mean value of each cluster
    diff = difference between the mean value of c and that of the previous cluster;
    if (diff < t) then
      c = join the two clusters; // join the two clusters
  return joined clusters
Fig. 4 Proposed Clustering Algorithm
C. Grouping
We now present the grouping of the resulting clusters. To improve the effectiveness of the clustering algorithm, we group the clusters obtained in each partition. This phase is necessary because partitioning can spread a dense region over several partitions, which produces uncertain, fragmented clusters; grouping merges these spread clusters back together.
In this phase, we calculate the mean value of each cluster and compare it with those of the other clusters. A threshold (t) must be defined for grouping two or more clusters: if the difference is less than the threshold, the clusters are grouped. We verify the effectiveness of our algorithm by measuring the Sum of Squared Errors (SSE).
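A possible reading of the grouping phase is sketched below: compute each cluster's mean point and merge clusters whose means are closer than t. The paper does not define the "difference" between means; this sketch assumes it is the Euclidean distance, compares every pair of clusters (the pseudocode compares each cluster only with the previous one), and merges transitively with a small union-find. The function name `group_clusters` is hypothetical.

```python
import math

def group_clusters(clusters, t):
    """Merge clusters (lists of points) whose means are closer than t.

    Means are computed once on the input clusters. Merging is transitive:
    if A is close to B and B is close to C, all three form one group.
    """
    def centroid(c):
        return tuple(sum(xs) / len(c) for xs in zip(*c))

    parent = list(range(len(clusters)))  # union-find over cluster indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    means = [centroid(c) for c in clusters]
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            if math.dist(means[a], means[b]) < t:
                parent[find(a)] = find(b)
    groups = {}
    for i, c in enumerate(clusters):
        groups.setdefault(find(i), []).extend(c)
    return list(groups.values())
```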
V. EXPERIMENTAL EVALUATION
In this section, we evaluate the proposed clustering algorithm on a trajectory dataset and compare it with DBSCAN. We also describe the dataset used in the experiments and discuss the experimental results.
A. Experimental Study
The animal trajectory dataset is used to evaluate the effectiveness of the proposed clustering algorithm. It was generated by the Starkey project and contains the radio-telemetry locations (with other information) of elk, deer and cattle from 1993 through 1996. We use the elk's movements in 1993, the deer's movements in 1995 and the cattle's movements in 1993. The elk data has 33 trajectories and 15422 points, the deer data 32 trajectories and 20065 points, and the cattle data 41 trajectories and 19556 points. The points are given in Universal Transverse Mercator (UTM) coordinates, and the data has 20 fields, such as UTMGrid, UTMGridEast and UTMGridNorth [13].
We extract the x (UTMGridEast) and y (UTMGridNorth) coordinates from the telemetry data for our experiments, and evaluate the proposed clustering algorithm by comparing it with DBSCAN on this trajectory data.
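The coordinate extraction step might look like the following, assuming the telemetry table has been exported as a CSV file with a header row naming the `UTMGridEast` and `UTMGridNorth` columns. The export format, the `Id` column in the demo and the coordinate values are made up for illustration; only the two column names come from the dataset description. The function name `load_points` is hypothetical.

```python
import csv
import io

def load_points(file_obj):
    """Read (x, y) = (UTMGridEast, UTMGridNorth) pairs from CSV text with a header row.

    Assumption: the telemetry table was exported as comma-separated text;
    rows with an empty or missing coordinate field are skipped.
    """
    points = []
    for row in csv.DictReader(file_obj):
        east, north = row.get("UTMGridEast"), row.get("UTMGridNorth")
        if east and north:
            points.append((float(east), float(north)))
    return points

if __name__ == "__main__":
    # made-up rows, for illustration only
    demo = io.StringIO("Id,UTMGridEast,UTMGridNorth\n1,100,200\n2,,250\n3,110.5,210\n")
    print(load_points(demo))  # [(100.0, 200.0), (110.5, 210.0)]
```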
B. Performance Metrics
We compare the computation time of DBSCAN and the proposed clustering algorithm on animal trajectory data of varying size. In our study, we found that changing the data size affects the number of clusters. We also measure the clustering quality using the Sum of Squared Errors (SSE). To measure the clustering quality independently of the features used for clustering and of the number of clusters produced, our analysis uses the SSE in (2).
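$$ SSE(C) = \sum_{i=1}^{num_{clus}} \left( \frac{1}{2|C_i|} \sum_{x \in C_i} \sum_{y \in C_i} \mathrm{dist}(x, y)^2 \right) \tag{2} $$
where num_clus is the number of clusters, C_i is the i-th cluster and dist(x, y) is the Euclidean distance (1) between points x and y.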
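Equation (2) can be computed directly from its pairwise form, which costs O(|C_i|²) per cluster. A standard identity shows that (1/(2|C_i|)) Σ_x Σ_y dist(x, y)² equals the sum of squared distances from each point to the cluster mean, so the same value can be obtained in linear time. The sketch below implements both forms; the function names are hypothetical.

```python
import math

def sse_pairwise(clusters):
    """Eq. (2): sum over clusters C_i of (1 / (2|C_i|)) * sum_{x, y in C_i} dist(x, y)^2."""
    total = 0.0
    for c in clusters:
        if c:  # skip empty clusters to avoid division by zero
            total += sum(math.dist(x, y) ** 2 for x in c for y in c) / (2 * len(c))
    return total

def sse_centroid(clusters):
    """Equivalent linear-time form: squared distances from each point to its cluster mean."""
    total = 0.0
    for c in clusters:
        if c:
            mean = tuple(sum(coord) / len(c) for coord in zip(*c))
            total += sum(math.dist(x, mean) ** 2 for x in c)
    return total
```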
We conducted the experiments on a Core i7 machine with 8 GB of main memory running Windows 7, and implemented our algorithm in JDK 1.7 using Eclipse Juno.
C. Result Discussion
This experiment studies the effect of changing the data size on the clustering computation time of both DBSCAN and the proposed algorithm. We find that our clustering algorithm performs well on large datasets: as the data size increases, it requires less computation time than DBSCAN, which takes more time to cluster all objects. Fig. 5 shows that the performance gain is more significant on large datasets. As the data size changes, the running time of our algorithm changes only slightly.
[Plot: running time (ms) of DBSCAN and the proposed algorithm versus animal trajectory data size, 1000 to 9000]
Fig. 5 Performance Comparison of DBSCAN and Proposed Algorithm
Fig. 6 compares the SSE values of DBSCAN and the proposed algorithm. The SSE of our algorithm is drastically lower than that of DBSCAN, that is, the error of the proposed algorithm is smaller. We take this lower SSE as an indication that our algorithm groups the points more correctly.
[Plot: SSE of DBSCAN and the proposed algorithm versus animal trajectory data size]
Fig. 6 Sum Square Error of DBSCAN vs. Proposed Algorithm
We also study how changing the data size affects the number of clusters. A smaller number of clusters means an increase in cluster size. Our algorithm clusters well regardless of changes in data size. In contrast, the results of DBSCAN depend on the data size because of its cluster expansion step, so it requires more computation time and produces a larger number of clusters. We address these problems with an efficient clustering algorithm for large trajectory datasets.
[Plot: number of clusters found by DBSCAN and the proposed algorithm versus data size, 1000 to 9000]
Fig. 7 Accuracy of DBSCAN vs. Proposed Algorithm
D. Effect of Parameter Values
We study the effect of changing the value of eps on the clustering result. With a smaller eps, we discover a larger number of clusters. However, if eps is less than 30, we find that clusters are discovered only by the DBSCAN algorithm. We have tested the effects of varying parameter values for both algorithms. To study the effect of the epsilon value on the number of clusters, we conducted the experiment with various epsilon values. According to the experimental results shown in Fig. 8, when the epsilon value is less than 45, the number of clusters is smaller; when it is between 45 and 55, the optimal number of clusters is achieved; and when it is greater than 55, the number of clusters decreases. Fig. 8 shows the clustering results for eps values between 35 and 125.
[Plot: number of clusters found by the proposed algorithm and DBSCAN versus epsilon (eps), 35 to 125]
Fig. 8 Effect of eps values on number of clusters
VI. CONCLUSION
In this paper, we propose an efficient clustering algorithm for trajectory data. It consists of three phases: partitioning, clustering and grouping. The DBSCAN clustering algorithm cannot cluster data with very large densities well, and its distance calculations are time consuming. To overcome the time-consuming issue, we first partition the dataset and then cluster the trajectories by applying DBSCAN to each partition. Finally, we perform the grouping phase to integrate the spread clusters. To evaluate the effectiveness of the proposed algorithm, we conducted a performance evaluation and analyzed the results by comparing the proposed algorithm with DBSCAN.
REFERENCES
[1] A.K. Akasapu, P.S. Rao, L. K. Sharma and S. K. Satpathy, “Density
Based k-Nearest Neighbors Clustering Algorithm for Trajectory Data”,
International Journal of Advanced Science and Technology, Vol. 31,
June 2011.
[2] G. McArdle, A. Tahir, M. Bertolotto, "Spatio-Temporal Clustering of
Movement Data: An Application to Trajectories Generated by Human-
Computer Interaction", ISPRS Annals of the Photogrammetry, Remote
Sensing and Spatial Information Sciences, Volume I-2, 2012, XXII
ISPRS Congress, 25 August – 01 September 2012, Melbourne,
Australia.
[3] J. Dai, "A Novel Moving Object Trajectories Clustering Approach for Very Large Datasets", in: Proceedings of the 2nd International Conference on Computer Science and Electronic Engineering (ICCSEE 2013).
[4] J.G. Lee, J. Han, and K.-Y. Whang, "Trajectory Clustering: A Partition-and-Group Framework", in: SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, New York, NY, USA: ACM, 2007, pp. 593-604.
[5] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd edition, Morgan Kaufmann, 2006, pp. 348 and 398.
[6] J.G. Lee, J. Han and X. Li, "Trajectory Outlier Detection: A Partition-and-Detect Framework", in: IEEE International Conference on Data Engineering (ICDE 2008), April 7-12, 2008, pp. 140-149.
[7] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases", in: Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 226-231, Portland, OR, Aug. 1996.
[8] N. Ferreira, J. Klosowski, C. E. Scheidegger, C. T. Silva, "Vector Field k-Means: Clustering Trajectories by Fitting Multiple Vector Fields", Eurographics Conference on Visualization (EuroVis) 2013, Volume 32 (2013), Number 3.
[9] O. Omnia, H. M. O. Mokhtar, M. E. El-Sharkawi, "An extended k-means technique for clustering moving objects", Egyptian Informatics Journal, Cairo University, Volume 12, Issue 1, March 2011, pp. 45-51.
[10] S. Elnekave, M. Last, O. Maimon, "Predicting Future Locations Using Clusters' Centroids", in: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems (ACMGIS'07), November 7-9, 2007, Seattle, WA, USA.
[11] Z. Li, M. Ji, J.G. Lee, L.A. Tang, Y. Yu, J. Han and R. Kays, "MoveMine: Mining Moving Object Databases", in: Proceedings of SIGMOD'10, ACM SIGMOD International Conference on Management of Data, June 6-11, 2010, Indianapolis, Indiana, USA.
[12] Z. Li, J. Han, M. Ji, L. Tang, Y. Yu, B. Ding, "MoveMine: Mining Moving Object Data for Discovery of Animal Movement Patterns", ACM Transactions on Intelligent Systems and Technology (TIST), Volume 2, Issue 4, July 2011, Article 37, ACM, New York, NY, USA.
[13] http://www.fs.fed.us/pnw/starkey/data/tables