Route separation strategies for human movement datasets
Marcell Feher, Krisztian Fekete, Kristof Csorba, Bertalan Forstner PhD.
Budapest University of Technology and Economics, Budapest, Hungary
Email: [marcell.feher | krisztian.fekete | kristof.csorba | bertalan.forstner]@aut.bme.hu
Abstract—Learning patterns of human movement is a complex and hard task that involves several computationally expensive algorithms. This issue is even more pronounced in a mobile environment, since handheld devices have significantly less memory and computing power than a typical PC. In this paper we compare novel, mobile-optimized methods for separating trajectories in a human routine recognition framework.
Keywords-algorithm;gps;trajectory;pattern
I. INTRODUCTION
The growing penetration of internet connectivity all over
the world has raised the popularity and spread of community
networks and smartphones in the last decade. Both have an
extremely large user base and could benefit from each
other's advantages. For instance, the Global Positioning
System (GPS) modules of modern smartphones and feature
phones offer a good opportunity for community networks to
integrate location-based services (LBS). If an LBS provider
releases a mobile application instead of a website optimized
for small screens, exploiting GPS functionality becomes
possible. Beyond simple features based on spatial
information, such as sharing one's current position with
friends or checking in at points of interest, more complex
services can be implemented. Our research focuses on a
recommendation module of a mobile community network
for finding the most suitable time and place for a meeting,
based on the trajectory patterns (referred to as routines) of
users.
The concept of this service, entitled MeetYouThere, is based
on the assumption that the majority of people follow
predictable patterns during their everyday activities [1].
Our main idea was to utilize this well-known phenomenon
by designing a framework capable of tracking the positions
of users and extracting patterns from the emerging
spatiotemporal data. These patterns may occur every day,
like going to work on weekdays, or rarely, for instance
visiting friends every second Saturday.
Identifying routines involves several practical and
mathematical problems, such as reducing the size of the
measurement data, processing new locations, finding matching
tracks, or determining the proximity of trajectories. In this
paper we examine the task of deciding whether two tracks are
similar.
Trajectories consist of both spatial and temporal
information (where and when the movement occurred). Within
the confines of this paper we focus on the former only; the
time dimension is not taken into consideration in proximity
measurement. In this manner we allow tracks to be similar
even if they took place at different times.
The track of a trajectory can easily be represented by a
directed path graph G(V,E), where V holds the points of
orientation change and E stands for the edges between
vertices, showing the path between the points. To a first
approximation, the set of vertices consists of (x, y) pairs,
where x is proportional to the longitudinal and y is
proportional to the latitudinal coordinate of the spatial
location. Vertices of the graph may be located at almost any
position, therefore the number of common points among
real-life trajectories is extremely low, as shown in Figure 1(a).
However, in the application environment presented here
(MeetYouThere), some assumptions can greatly increase
this number, resulting in a significant speedup and
simplification of the proximity computation process.
Assumptions:
1) The vast majority of potential users live in an urban area
2) Meeting point arrangement is effective within cities
3) Almost every trajectory can be described as a sequence
of street segments in a city
Adopting these assumptions allows arbitrary coordinates to
be quantized onto the street network. Thus the vertices of
all path graphs originate from the same domain, the set of
intersections, as shown in Figure 1(b).
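The quantization step can be illustrated with a minimal sketch. It assumes planar (x, y) coordinates and simply snaps each measured point to the nearest intersection; the function name `snap_to_intersections` and the unit grid of intersections are hypothetical, and this is not the authors' quantization algorithm, which the paper lists among the open problems.

```python
import math


def snap_to_intersections(points, intersections):
    """Map each (x, y) point to its nearest intersection.

    Consecutive points that snap to the same intersection are
    collapsed, so the result is a sequence of distinct vertices.
    """
    snapped = []
    for p in points:
        nearest = min(intersections, key=lambda q: math.dist(p, q))
        if not snapped or snapped[-1] != nearest:
            snapped.append(nearest)
    return snapped


# Hypothetical intersections on a 3 x 3 unit street grid.
grid = [(x, y) for x in range(3) for y in range(3)]
# Hypothetical raw track with arbitrary coordinates.
track = [(0.1, 0.0), (0.4, 0.1), (0.9, 0.2), (1.1, 0.9), (1.0, 1.8)]
route = snap_to_intersections(track, grid)
print(route)
```

After snapping, two tracks recorded along the same streets share vertices exactly, which is what the methods in Section III rely on. A production system would more likely match points to street segments than to single intersections.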
In this paper we present methods for determining the
proximity of routes given in this way, and discuss the
advantages and disadvantages of each from the perspective
of meeting point arrangement.
The algorithmic approach to this problem can be formalized
as shown in Algorithm 1.
2012 19th IEEE International Conference and Workshops on Engineering of Computer-Based Systems
978-0-7695-4664-3/12 $26.00 © 2012 IEEE
DOI 10.1109/ECBS.2012.35
Figure 1. (a) Tracks with arbitrary coordinates; (b) tracks quantized onto the map
Algorithm 1 Pseudo code of the route proximity decision
function IsRoutesNear(Route1, Route2)
    RouteDiff_start = calculate spatial distance between Route1_0 and Route2_0
    RouteDiff_end = calculate spatial distance between Route1_length(Route1) and Route2_length(Route2)
    if RouteDiff_start > MAX_ENDPOINT_DIST || RouteDiff_end > MAX_ENDPOINT_DIST then
        return false
    else
        if MeasureProximity(Route1, Route2) < THRESHOLD then
            return true
        else
            return false
        end if
    end if
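A runnable sketch of this decision is given here, assuming planar coordinates and Euclidean distance. The constant values and the `measure_proximity` callback are placeholders chosen for illustration. The sketch rejects a pair as soon as either the start points or the end points are farther apart than the limit, which is an interpretation of the requirement that the endpoints of both routes be close to each other.

```python
import math

MAX_ENDPOINT_DIST = 1.5  # hypothetical limit, in the units of the coordinates
THRESHOLD = 1.0          # hypothetical limit on the proximity measure


def is_routes_near(route1, route2, measure_proximity):
    """Decide whether two routes (lists of (x, y) vertices) are near.

    measure_proximity is any function returning a difference value
    for two routes; smaller values mean closer routes.
    """
    start_diff = math.dist(route1[0], route2[0])
    end_diff = math.dist(route1[-1], route2[-1])
    # Cheap rejection: both pairs of endpoints have to be close.
    if start_diff > MAX_ENDPOINT_DIST or end_diff > MAX_ENDPOINT_DIST:
        return False
    return measure_proximity(route1, route2) < THRESHOLD


r1 = [(0, 0), (1, 0), (1, 1)]
r2 = [(0, 0), (0, 1), (1, 1)]
far = [(5, 5), (6, 5), (6, 6)]
# A constant stand-in for MeasureProximity, for demonstration only.
print(is_routes_near(r1, r2, lambda a, b: 0.5))
print(is_routes_near(r1, far, lambda a, b: 0.5))
```

The endpoint test runs in constant time, so the more expensive proximity measure is only evaluated for pairs that can plausibly belong to the same routine.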
Accordingly, our goal is to present possible algorithms for
the function MeasureProximity.
This paper is organized as follows. In the next section,
existing solutions for route proximity measurement are
discussed, then our contribution, an examination of several
methods, is presented. After that, further fields of our
research are introduced, followed by a conclusion and
acknowledgments. The paper closes with the list of
references.
II. RELATED WORK
The proximity of tracks can be approached in many ways, for
instance by inexact graph matching [2], [3], [4], [5], [6].
This field has proved successful in different research areas
such as character recognition [7], [8], indexing images and
videos [9], [10], image registration [11], sketch-photo
recognition [?] and shape analysis [12]. The difference
between graphs can be measured in numerous ways, but one of
them has gained much research interest in recent years.
Graph Edit Distance (referred to as GED) is defined as the
cost of the least expensive set of graph operations needed
to make two graphs isomorphic.
GED is not a single method or index; several variants have
been introduced since it was published by Sanfeliu and
Fu [13]. Their initial idea was to count the nodes and edges,
together with their deletions and insertions, that are
necessary to transform the model graph into the data graph.
Based on this method, extensions to subgraph edit distance
[14], [15], distance between strings [6], the relationship
between maximum common subgraph size and GED, and further
areas have been developed.
Although GED is an excellent index for various purposes, it
cannot be fully applied to calculating route proximity,
since it lacks information about the geographical distance
between the edges and nodes of the corresponding graphs.
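This limitation can be made concrete with a small sketch. It uses the string edit distance of Wagner and Fischer [6] over vertex sequences as a simple stand-in for GED on path graphs, which is an assumption of this example and not a full GED implementation. The vertex labels are hypothetical intersection identifiers.

```python
def edit_distance(seq1, seq2):
    """Wagner-Fischer edit distance with unit costs for
    insertion, deletion and substitution."""
    prev = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, start=1):
        curr = [i]
        for j, b in enumerate(seq2, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # substitute a with b
        prev = curr
    return prev[-1]


base = ["A", "B", "C", "D"]
small_detour = ["A", "B2", "C", "D"]  # B2: hypothetical intersection one block from B
large_detour = ["A", "Z", "C", "D"]   # Z: hypothetical intersection far away from B

print(edit_distance(base, small_detour))
print(edit_distance(base, large_detour))
```

Both detours differ from the base route by one substituted vertex, so the edit distance is the same for both, regardless of how far the substituted intersection lies from the original one.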
III. METHODS OF ROUTE PROXIMITY MEASUREMENT
In this section we examine the possibilities for comparing
the similarity of tracks. The primary objective of the
presented methods is to determine whether the tested
trajectories belong to the same routine by computing the
extent to which they differ from each other.
A. Distance of centroids
As a first approximation, we focus on the centroids of the
paths and use their distance to determine the proximity of
the original tracks. The calculation of the centroid
distance is shown in Algorithm 2.
Algorithm 2 Pseudo code for centroid distance calculation
function CentroidDistance(Route1, Route2)
    n1 = number of vertices in Route1
    n2 = number of vertices in Route2
    Centroid1 = $\left( \frac{1}{n_1} \sum_{i=1}^{n_1} \mathrm{Route1}_{i,x},\ \frac{1}{n_1} \sum_{i=1}^{n_1} \mathrm{Route1}_{i,y} \right)$
    Centroid2 = $\left( \frac{1}{n_2} \sum_{i=1}^{n_2} \mathrm{Route2}_{i,x},\ \frac{1}{n_2} \sum_{i=1}^{n_2} \mathrm{Route2}_{i,y} \right)$
    Distance = calculate spatial distance between Centroid1 and Centroid2
    return Distance
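A direct Python transcription of the centroid calculation is sketched here, assuming planar coordinates and Euclidean distance as the spatial distance. The two example routes are hypothetical; they are mirror images of each other and were chosen so that their centroids coincide.

```python
import math


def centroid(route):
    """Arithmetic mean of the vertices of a route given as (x, y) pairs."""
    n = len(route)
    return (sum(x for x, _ in route) / n, sum(y for _, y in route) / n)


def centroid_distance(route1, route2):
    """Euclidean distance between the centroids of two routes."""
    return math.dist(centroid(route1), centroid(route2))


# Two hypothetical routes that swing to opposite sides of the x axis.
zigzag_up_down = [(0, 0), (1, 2), (2, 0), (3, -2), (4, 0)]
zigzag_down_up = [(0, 0), (1, -2), (2, 0), (3, 2), (4, 0)]
print(centroid_distance(zigzag_up_down, zigzag_down_up))
```

The measure needs a single pass over each route, which suits devices with limited computing power, but the example pair yields a distance of zero although the routes share only three of their five vertices.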
As shown in Figure 2, significantly different tracks might
produce a small distance between centroids. In spite of this
weak point, the method is beneficial when routes consist of
only a few edges and large symmetric excursions are not
possible.
Figure 2. Centroids (vertices with blue stroke) of two significantly different routes
B. Rate of identical edges
When path graphs are built using the same set of vertices,
an appropriate method for measuring the similarity of routes
is to look for identical sections. In this case the
similarity indicator is simply the ratio of the number of
matching sections to the total number of sections. The
closer the paths, the higher this rate is. Here we are not
comparing the vertices, but looking for overlapping edges.
If both tracks consist of the same number of edges, the rate
is unambiguous:
SimilarityRate = #(identical edges) / #(all edges).
Otherwise, the two graphs produce different values, since
the number of overlapping edges is divided by each graph's
own edge count. In this case, considering the main goal of
the algorithm, the smaller mistake is to take the smaller
ratio as the indicator of proximity. The pseudo code of the
algorithm is as follows:
Algorithm 3 Pseudo code for determining rate of identical edges
procedure IdenticalEdgesRate(Route1, Route2)
    n1 = number of edges in Route1
    n2 = number of edges in Route2
    identicalEdges = calculate number of identical edges in Route1 and Route2
    if n1 >= n2 then
        return identicalEdges/n1
    else
        return identicalEdges/n2
    end if
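The procedure can be sketched in Python as follows. The sketch assumes routes are vertex sequences over the shared set of intersections and treats edges as directed pairs of consecutive vertices. Storing the edges in sets is an implementation choice of this example, so an edge traversed twice by the same route is counted once.

```python
def edges(route):
    """Set of directed edges (pairs of consecutive vertices) of a route."""
    return set(zip(route, route[1:]))


def identical_edges_rate(route1, route2):
    """Shared edges divided by the larger edge count, i.e. the
    smaller of the two possible ratios."""
    e1, e2 = edges(route1), edges(route2)
    if not e1 or not e2:
        return 0.0
    return len(e1 & e2) / max(len(e1), len(e2))


# Hypothetical routes on a unit grid: three shared edges, one differing edge.
straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
turns_off = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1)]
print(identical_edges_rate(straight, turns_off))
```

For the example pair the rate is 3/4, and it stays the same no matter how far the differing edge leads away from the other route.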
Although an advantage of this method is that it indicates
the distance of routes along their full track, it also has
disadvantages. On the one hand, a naive pairwise search for
identical edges does not run in linear time; on the other
hand, the rate does not show how far apart the paths
actually are.
As shown in Figure 3, a small number of differing sections
may cause a significant difference between routes. In the
figure, for instance, the algorithm produces an identity
ratio of 0.75, which leads to the wrong conclusion that 75
per cent of the routes are the same.
Figure 3. False high identical segment rate
C. Space between routes
The algorithm discussed in this section is meant to
eliminate the main disadvantage of the former method, namely
that it obscures the real distance between routes. If the
geospatial extent of tracks is important, calculating the
size of the region between them seems beneficial. This
operation is defined on polygons, so every factor that
prevents the shape from being closed must be eliminated. In
this particular case, the endpoints of the paths may cause
problems. Although according to Algorithm 1 they must be
close to each other, they are not required to be identical.
The polygon is therefore formed by connecting the first
vertices of the two routes and likewise their last vertices,
as shown in Figure 4. The area bordered by the two paths can
then be calculated, which provides a good indicator of the
difference.
Figure 4. Space between routes
Examining the disadvantages of the method, aggregation leads
to problems again. The algorithm neglects the maximum
distance between the routes; it provides only the sum of the
deviations.
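One way to compute the enclosed area is the shoelace formula, sketched here under the assumption of planar coordinates. The polygon is built by walking along the first route and returning along the second one in reverse, which implicitly connects the two end vertices and the two start vertices. The function name `area_between` is hypothetical. If the routes cross each other, the signed contributions on the two sides of the crossing partly cancel, so the result underestimates the enclosed region in that case.

```python
def area_between(route1, route2):
    """Area of the polygon formed by route1 followed by route2 reversed,
    computed with the shoelace formula."""
    polygon = list(route1) + list(reversed(route2))
    twice_area = 0.0
    # Pair every vertex with its successor, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical routes around a unit block, sharing both endpoints.
around_left = [(0, 0), (0, 1), (1, 1)]
around_right = [(0, 0), (1, 0), (1, 1)]
print(area_between(around_left, around_right))

# Hypothetical parallel routes whose endpoints are close but not identical.
lower_street = [(0, 0), (2, 0)]
upper_street = [(0, 1), (2, 1)]
print(area_between(lower_street, upper_street))
```

The returned value is a single aggregate: a long, narrow gap and a short, wide excursion can produce the same area.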
D. Segmenting tracks
For all the methods shown, a problem emerges because the
routes under examination are processed in their full length,
therefore local and global deviations cannot be told apart.
Under particular circumstances, this always leads to a wrong
decision when determining proximity.
In order to examine paths on a local level, they must be
decomposed into smaller pieces, referred to as segments.
This procedure raises a series of problems and questions,
which will be presented in detail in later articles.
1) Quantity of segments: Determining the number of
segments is not trivial. On the one hand, routes should be
cut into as many segments as possible in order to gain fine
granularity; on the other hand, by cutting a track into
pieces that are too short, we lose information about the
bearing of the routes. Whether to take the current speed
along the track into consideration while segmenting is also
an open question.
2) Cutting direction: The direction of segmentation is still
a field of research. Current possibilities include cutting
parallel to a coordinate axis, perpendicular to the dominant
direction of progress, or adaptively changing direction
along the track.
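As an illustration of the first cutting option only, the following sketch groups the vertices of a route into vertical bands, that is, it cuts parallel to the y axis. The function name `segment_by_x_bands` and the `band_width` parameter are hypothetical, and the sketch makes no claim about which segmentation the authors will adopt. Edges that cross a band boundary are not split in this simplified version.

```python
import math
from itertools import groupby


def segment_by_x_bands(route, band_width):
    """Group consecutive vertices by the vertical band they fall into.

    A vertex (x, y) belongs to band floor(x / band_width); a new
    segment starts whenever the route enters a different band.
    """
    def band(p):
        return math.floor(p[0] / band_width)

    return [list(group) for _, group in groupby(route, key=band)]


# Hypothetical route heading east with a slight drift to the north.
route_east = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
print(segment_by_x_bands(route_east, 2))
```

A smaller `band_width` yields more and shorter segments, which is the trade-off between granularity and bearing information described above.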
IV. FUTURE WORK
The presented contribution is still under development,
and further tasks of the complex meeting recommendation
service must be researched. The authors are working on
answers to problems such as:
• a proper segmentation method
• a suitable algorithm for quantizing GPS coordinates
onto the street network
• new methods of route proximity measurement
• partitioned proximity calculation algorithms, since
different methods may fit best on different route segments
• teaching the system with the help of user feedback
V. CONCLUSION
Offering suitable meeting points is a complex and hard
task, of which a small part, route proximity measurement,
has been discussed. We have shown methods together with
their advantages and drawbacks, both individually and
compared to each other. Although all of them are useful in
particular cases, a smart combination may be needed in
order to achieve the overall goal of determining the
proximity of arbitrary routes.
ACKNOWLEDGMENTS
This work is connected to the scientific program of the
“Development of quality-oriented and cooperative R+D+I
strategy and functional model at BUTE” project. This project
is supported by the New Hungary Development Plan (Project
ID: TAMOP-4.2.1/B-09/1/KMR-2010-0002).
REFERENCES
[1] A.-L. Barabasi, Bursts - The Hidden Pattern Behind Everything We Do. Dutton Adult, 2010.
[2] S. Umeyama, “An eigendecomposition approach to weighted graph matching problems,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 695–703, 1988.
[3] H. Bunke, “Recent developments in graph matching,” Pattern Recognition, International Conference on, vol. 2, p. 2117, 2000.
[4] T. Caelli and S. Kosinov, “An eigenspace projection clustering method for inexact graph matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 515–519, 2004.
[5] A. D. J. Cross, R. C. Wilson, and E. R. Hancock, “Inexact graph matching using genetic search,” Pattern Recognition, vol. 30, no. 6, pp. 953–970, 1997.
[6] R. A. Wagner and M. J. Fischer, “The string-to-string correction problem,” J. ACM, vol. 21, pp. 168–173, January 1974.
[7] J. Rocha and T. Pavlidis, “A shape analysis model with applications to a character recognition system,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 393–404, 1994.
[8] Y.-K. Wang, K.-C. Fan, and J.-T. Horng, “Genetic-based search for error-correcting graph isomorphism,” IEEE Transactions on Systems, Man, and Cybernetics: Part B - Cybernetics, vol. 27, pp. 588–597, 1997.
[9] M. V. Suisse, K. Shearer, H. Bunke, and S. Venkatesh, “Video indexing and similarity retrieval by largest common subgraph detection using decision trees,” 2000.
[10] D. Tao, X. Tang, and X. Li, “Which components are important for interactive image searching?,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 18, pp. 3–11, Jan. 2008.
[11] W. J. Christmas, J. Kittler, and M. Petrou, “Structural matching in computer vision using probabilistic relaxation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 749–764, 1995.
[12] T. B. Sebastian, P. N. Klein, and B. B. Kimia, “Recognition of shapes by editing their shock graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, 2004.
[13] A. Sanfeliu and K. Fu, “A distance measure between attributed relational graphs for pattern recognition,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 353–362, 1983.
[14] B. Messmer and H. Bunke, “A new algorithm for error-tolerant subgraph isomorphism detection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, pp. 493–504, May 1998.
[15] B. Messmer and H. Bunke, “Efficient subgraph isomorphism detection: a decomposition approach,” Knowledge and Data Engineering, IEEE Transactions on, vol. 12, pp. 307–323, Mar./Apr. 2000.