
[IEEE 2012 19th IEEE International Conference and Workshops on Engineering of Computer Based Systems (ECBS) - Novi Sad, Serbia (2012.04.11-2012.04.13)] 2012 IEEE 19th International


Route separation strategies for human movement datasets

Marcell Feher, Krisztian Fekete, Kristof Csorba, Bertalan Forstner PhD.

Budapest University of Technology and Economics
Budapest, Hungary

Email: [marcell.feher | krisztian.fekete | kristof.csorba | bertalan.forstner]@aut.bme.hu

Abstract—Learning patterns of human movement is a complex and hard task that involves several computationally expensive algorithms. The issue is even more pressing in a mobile environment, since handheld devices have significantly less memory and computing power than a typical PC. In this paper we compare novel, mobile-optimized methods for separating trajectories in a human routine recognition framework.

Keywords-algorithm;gps;trajectory;pattern

I. INTRODUCTION

The growing penetration of internet connectivity all over the world has raised the popularity and spread of community networks and smartphones in the last decade. Both have extremely large user bases and could exploit each other's advantages. For instance, the Global Positioning System (GPS) modules of modern smartphones and feature phones offer a good opportunity for community networks to integrate location-based services (LBS). If an LBS provider releases a mobile application instead of a website optimized for small screens, exploiting GPS functionality becomes possible. Beyond simple features based on spatial information, such as sharing one's current position with friends or checking in at points of interest, more complex services can be implemented. Our research focuses on a recommendation module of a mobile community network that finds the most suitable time and place for a meeting, based on the trajectory patterns (referred to as routines) of users. The concept of this service, entitled MeetYouThere, is based on the assumption that the majority of people follow predictable patterns in their everyday activities [1]. Our main idea was to utilize this well-known phenomenon by designing a framework capable of tracking the positions of users and extracting patterns from the resulting spatiotemporal data. These patterns may occur every day, like going to work on weekdays, or rarely, for instance visiting friends every second Saturday.

Identifying routines involves several practical and mathematical problems, such as reducing the size of the measurement data, processing new locations, finding matching tracks, and determining the proximity of trajectories. In this paper we examine the task of deciding whether two tracks are similar.

Trajectories consist of both spatial and temporal information (where and when the movement occurred). In this paper we focus on the former only; the time dimension is not taken into consideration when measuring proximity. In this manner we allow tracks to be similar even if they took place at different times.

The track of a trajectory can easily be represented by a directed path graph G(V, E), where V holds the points where the orientation changes, and E is the set of edges between vertices, showing the path between the points. As a first approximation, the set of vertices consists of (x, y) pairs, where x is proportional to the longitudinal and y to the latitudinal coordinate of the spatial location. Vertices of the graph may be located at almost any position, therefore the number of common points among real-life trajectories is extremely low, as shown in Figure 1(a).

However, in the presented application environment (MeetYouThere), some assumptions can greatly increase the share of common points, resulting in a significantly faster and simpler proximity computation.

Assumptions:

1) The vast majority of potential users live in an urban area

2) Meeting point arrangement is effective within cities

3) Almost every trajectory can be described as a sequence of street segments in a city

Adopting these assumptions allows arbitrary coordinates to be quantized onto the street network. Thus the vertices of all path graphs originate from the same domain, the set of intersections, as shown in Figure 1(b).
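The quantization step can be illustrated with a deliberately naive sketch that snaps every measured point to its nearest intersection. This is not the authors' method (a suitable quantization algorithm is listed among their open problems); the function name `snap_to_intersections`, the planar coordinates, and the sample data are all assumptions made for illustration.

```python
import math

def snap_to_intersections(points, intersections):
    """Replace each (x, y) point by its nearest intersection.

    Consecutive points that snap to the same intersection are merged,
    so the result is a sequence of distinct path-graph vertices.
    Coordinates are treated as planar, which is an assumption.
    """
    snapped = []
    for p in points:
        nearest = min(intersections, key=lambda q: math.dist(p, q))
        if not snapped or snapped[-1] != nearest:
            snapped.append(nearest)
    return snapped

# Hypothetical street network with three intersections.
intersections = [(0, 0), (10, 0), (10, 10)]
# Hypothetical noisy GPS measurements along two street segments.
raw_track = [(0.4, -0.3), (1.2, 0.5), (9.1, 0.8), (10.3, 9.6)]
snapped_track = snap_to_intersections(raw_track, intersections)
```

After snapping, two tracks recorded along the same streets share their vertices exactly, which is the property the comparison methods in this paper rely on.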

In this paper we present methods for determining the proximity of routes given in this way, and discuss the advantages and disadvantages of each from the perspective of meeting point arrangement.

The algorithmic approach to this problem can be formalized as shown in Algorithm 1.

2012 19th IEEE International Conference and Workshops on Engineering of Computer-Based Systems

978-0-7695-4664-3/12 $26.00 © 2012 IEEE

DOI 10.1109/ECBS.2012.35


Figure 1. (a) Tracks with arbitrary coordinates. (b) Tracks quantized onto the map.

Algorithm 1 Pseudo code of the route proximity decision

function IsRoutesNear(Route1, Route2)
    RouteDiff_start = spatial distance between Route1_0 and Route2_0
    RouteDiff_end = spatial distance between Route1_{length(Route1)} and Route2_{length(Route2)}
    if RouteDiff_start > MAX_ENDPOINT_DIST or RouteDiff_end > MAX_ENDPOINT_DIST then
        return false
    else
        if MeasureProximity(Route1, Route2) < THRESHOLD then
            return true
        else
            return false
        end if
    end if

Accordingly, our goal is to present possible algorithms for the function MeasureProximity.
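As a rough illustration, the decision procedure can be sketched in Python with the proximity measure passed in as a parameter. The constant values, the planar distance, and the stand-in measure are assumptions. The sketch rejects a pair as soon as either pair of endpoints is too far apart, which is one reading of the requirement stated later that the endpoints must be close to each other.

```python
import math

MAX_ENDPOINT_DIST = 50.0  # hypothetical limit, in coordinate units
THRESHOLD = 100.0         # hypothetical limit for the proximity measure

def is_routes_near(route1, route2, measure_proximity):
    """Decide whether two routes (lists of (x, y) vertices) are near."""
    start_diff = math.dist(route1[0], route2[0])
    end_diff = math.dist(route1[-1], route2[-1])
    # Cheap pre-filter: both pairs of endpoints must be close.
    if start_diff > MAX_ENDPOINT_DIST or end_diff > MAX_ENDPOINT_DIST:
        return False
    # Only then run the (possibly expensive) proximity measure.
    return measure_proximity(route1, route2) < THRESHOLD

# Stand-in measure for demonstration only; real candidates follow in Section III.
def always_zero(route1, route2):
    return 0.0

route_a = [(0, 0), (100, 0), (100, 100)]
route_b = [(5, 0), (100, 5), (105, 100)]
route_c = [(0, 0), (0, 300), (400, 300)]
near_ab = is_routes_near(route_a, route_b, always_zero)  # endpoints 5 units apart
near_ac = is_routes_near(route_a, route_c, always_zero)  # end points far apart
```

Passing the measure as an argument keeps the endpoint pre-filter separate from the methods compared in the rest of the paper.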

This paper is organized as follows. In the next section, existing solutions for route proximity measurement are discussed, then our contribution, the examination of several methods, is presented. After that, further directions of our research are introduced, followed by a conclusion and acknowledgments. The paper closes with the list of references.

II. RELATED WORK

The proximity of tracks can be approached in many ways, for instance by inexact graph matching [2], [3], [4], [5], [6]. This field has proved successful in different research areas such as character recognition [7], [8], indexing images and videos [9], [10], image registration [11], sketch-photo recognition [?], and shape analysis [12]. The difference between graphs can be measured in numerous ways, but one of them has gained much research interest in recent years. Graph Edit Distance (referred to as GED) is defined as the cost of the least expensive set of graph operations needed to make two graphs isomorphic.

GED is not a single method or index; several variants have been introduced since it was published by Sanfeliu and Fu [13]. Their initial idea was to count the nodes and edges, together with their deletions and insertions, that are necessary to transform the model graph into the data graph. Based on this method, extensions such as subgraph edit distance [14], [15], distance between strings [6], the relationship between maximum common subgraph size and GED, and further areas have been developed.

Although GED is an excellent index for various purposes, it cannot be fully applied to calculating route proximity, since it lacks information about the geographical distance between the edges and nodes of the corresponding graphs.
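The limitation can be made concrete with the string edit distance of Wagner and Fischer [6], applied here to routes written as sequences of intersection labels. The labels and the unit costs are assumptions chosen for illustration; the sketch is not part of the MeetYouThere framework.

```python
def edit_distance(seq1, seq2):
    """Wagner-Fischer dynamic programme with unit costs.

    Returns the minimum number of insertions, deletions and
    substitutions that turn seq1 into seq2.
    """
    previous = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, start=1):
        current = [i]
        for j, b in enumerate(seq2, start=1):
            cost = 0 if a == b else 1
            current.append(min(previous[j] + 1,          # delete a
                               current[j - 1] + 1,       # insert b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

# Hypothetical routes: both detours replace intersection C with another one.
base = ["A", "B", "C", "D"]
detour_next_block = ["A", "B", "X", "D"]    # X is assumed to be close to C
detour_across_town = ["A", "B", "Y", "D"]   # Y is assumed to be far from C
d_near = edit_distance(base, detour_next_block)
d_far = edit_distance(base, detour_across_town)
```

Both detours receive the same distance, because a purely structural edit cost carries no information about how far apart the substituted intersections are.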

III. METHODS OF ROUTE PROXIMITY MEASUREMENT

In this section we examine the possibilities for comparing the similarity of tracks. The primary objective of the presented methods is to determine whether the tested trajectories belong to the same routine by computing the extent to which they differ from each other.

A. Distance of centroids

As a first approximation, we focus on the centroids of the paths and use their distance to determine the proximity of the original tracks. The calculation of the centroid distance is shown in Algorithm 2.

Algorithm 2 Pseudo code for centroid distance calculation

function CentroidDistance(Route1, Route2)
    n1 = number of vertices in Route1
    n2 = number of vertices in Route2
    Centroid1 = ((\sum_{i=1}^{n1} Route1_{i,x}) / n1, (\sum_{i=1}^{n1} Route1_{i,y}) / n1)
    Centroid2 = ((\sum_{i=1}^{n2} Route2_{i,x}) / n2, (\sum_{i=1}^{n2} Route2_{i,y}) / n2)
    Distance = spatial distance between Centroid1 and Centroid2
    return Distance

As shown in Figure 2, significantly different tracks might produce a small distance between centroids. In spite of this weak point, the method is beneficial when routes consist of only a few edges and large symmetric excursions are not possible.
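A minimal Python sketch of the centroid method follows, assuming planar (x, y) vertices and Euclidean distance. The two sample routes are hypothetical and were chosen to reproduce the weakness described above: they are mirror images with symmetric excursions.

```python
import math

def centroid(route):
    """Mean of the vertex coordinates of a route (list of (x, y) pairs)."""
    n = len(route)
    return (sum(x for x, _ in route) / n, sum(y for _, y in route) / n)

def centroid_distance(route1, route2):
    """Euclidean distance between the vertex centroids of two routes."""
    return math.dist(centroid(route1), centroid(route2))

# Two routes that swing to opposite sides of the line from (0, 0) to (10, 0).
route_north_first = [(0, 0), (2, 8), (8, -8), (10, 0)]
route_south_first = [(0, 0), (2, -8), (8, 8), (10, 0)]
distance = centroid_distance(route_north_first, route_south_first)
```

Both centroids fall on (5, 0), so the distance is zero although the routes are up to 16 units apart at matching vertices.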

Figure 2. Centroids (vertices with blue stroke) of two significantly different routes

B. Rate of identical edges

When the path graphs are built from the same set of vertices, an appropriate method for measuring the similarity of routes is to look for identical sections. In this case the similarity indicator is none other than the ratio of the number of matching sections to the total number of sections. The closer the paths, the higher this rate is. Here we are not comparing the vertices but looking for overlapping edges. If both tracks consist of the same number of edges, the rate is unambiguous: SimilarityRate = #(identical edges) / #(all edges). Otherwise, the two graphs produce different values, since the number of overlapping edges is divided by each graph's own edge count. In this case, considering the main goal of the algorithm, the smaller error is made by taking the smaller ratio as the indicator of proximity. The pseudo code of the algorithm is as follows:

Algorithm 3 Pseudo code for determining the rate of identical edges

procedure IdenticalEdgesRate(Route1, Route2)
    n1 = number of edges in Route1
    n2 = number of edges in Route2
    identicalEdges = number of identical edges in Route1 and Route2
    if n1 >= n2 then
        return identicalEdges / n1
    else
        return identicalEdges / n2
    end if

Although an advantage of this method is that it indicates the distance of routes along their full track, it also has disadvantages. On the one hand, finding identical edges cannot be performed in linear time; on the other hand, the rate does not show how far apart the paths actually are.
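The rate can be sketched in Python as follows. The sketch assumes that routes are lists of hashable vertex labels with at least two vertices each, that edges are directed, and that no edge is traversed twice within a route; the sample labels are hypothetical.

```python
def directed_edges(route):
    """Set of directed edges between consecutive vertices of a route."""
    return set(zip(route, route[1:]))

def identical_edges_rate(route1, route2):
    """Share of identical edges, divided by the larger edge count.

    Dividing by the larger count yields the smaller of the two
    possible ratios, as in Algorithm 3.
    """
    n1 = len(route1) - 1
    n2 = len(route2) - 1
    identical = len(directed_edges(route1) & directed_edges(route2))
    return identical / max(n1, n2)

# Two four-edge routes that differ only in their last edge.
route1 = ["A", "B", "C", "D", "E"]
route2 = ["A", "B", "C", "D", "F"]
rate = identical_edges_rate(route1, route2)
```

Here `rate` is 0.75 no matter where vertex F lies, so the value is the same whether the routes end one block apart or on opposite sides of the city.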

As shown in Figure 3, a small number of differing sections may cause a significant difference between routes. In the figure, for instance, the algorithm produces an identity ratio of 0.75, which leads to the wrong conclusion that 75 per cent of the routes are the same.

Figure 3. False high identical segment rate

C. Space between routes

The algorithm discussed in this section is meant to eliminate the main disadvantage of the former method, namely that it obscures the real distance between routes. If the geospatial extent of tracks is important, calculating the size of the region between them seems beneficial. This operation is defined on polygons, so every factor that prevents the shape from being closed must be eliminated. In this particular case, the endpoints of the paths may cause problems. Although according to Algorithm 1 they must be close to each other, they are not required to be identical. The polygon is therefore formed by connecting the first and the last vertices of the routes, as shown in Figure 4. The area bordered by the two paths can then be calculated, which provides a good indicator of the difference.

Figure 4. Space between routes

Examining the disadvantages of the method, aggregation leads to problems again. The algorithm neglects the maximum distance between the routes; it provides only the sum of the deviations.
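One possible way to compute the enclosed area is the shoelace formula, applied to the polygon obtained by walking along the first route and back along the second. The paper does not name a specific area formula, so this choice is an assumption, as are the planar coordinates and the sample routes. If the routes cross each other, the signed contributions partly cancel, so the sketch is only meaningful for routes that do not intersect between their endpoints.

```python
def area_between_routes(route1, route2):
    """Area of the polygon formed by route1 followed by route2 reversed.

    Uses the shoelace formula; the closing edges between the routes'
    first and last vertices are added implicitly.
    """
    polygon = list(route1) + list(reversed(route2))
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Two routes around opposite sides of a 10 x 10 block.
square_area = area_between_routes([(0, 0), (0, 10), (10, 10)],
                                  [(0, 0), (10, 0), (10, 10)])
# A straight route and one running parallel to it, 1 unit away, for 100 units.
strip_area = area_between_routes([(0, 0), (100, 0)],
                                 [(0, 0), (0, 1), (100, 1), (100, 0)])
```

Both pairs enclose an area of 100, although the first pair diverges far more than the second, which illustrates why the aggregated value hides the maximum distance.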

D. Segmenting tracks

For all the methods shown, a problem emerges because the routes under examination are processed in their full length, therefore local and global deviations cannot be told apart. Under particular circumstances, this always leads to a wrong decision when determining proximity.

In order to examine paths at the local level, they must be decomposed into smaller pieces, referred to as segments. This procedure raises a series of problems and questions, which will be presented in detail in later articles.

1) Quantity of segments: Determining the number of segments is not trivial. On the one hand, routes should be cut into as many segments as possible in order to gain fine granularity; on the other hand, by cutting a track into pieces that are too short, we lose information about the bearing of the routes. Whether to take the current speed along the tracks into consideration while segmenting is also an open question.
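To make the trade-off tangible, the simplest conceivable segmentation cuts a route into a fixed number of pieces with roughly equal edge counts. This is only an illustrative baseline under the assumption that the segment count is given and does not exceed the number of edges; it ignores length, speed and direction, and it is not the segmentation method the authors intend to publish.

```python
def split_route(route, segment_count):
    """Cut a route into consecutive pieces with near-equal edge counts.

    Neighbouring pieces share their boundary vertex, so every edge of
    the original route belongs to exactly one piece.
    """
    edge_count = len(route) - 1
    segments = []
    for k in range(segment_count):
        start = k * edge_count // segment_count
        end = (k + 1) * edge_count // segment_count
        segments.append(route[start:end + 1])
    return segments

# Hypothetical five-vertex route cut into two and into four pieces.
route = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
halves = split_route(route, 2)
single_edges = split_route(route, 4)
```

With four pieces each segment is a single edge, which is the extreme case mentioned above where the overall bearing of the route can no longer be read from a segment.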

2) Cutting direction: The segmentation direction is still a field of research. Current possibilities include cutting parallel to a coordinate axis, perpendicular to the dominant direction of progress, or adaptively changing the direction along the track.

IV. FUTURE WORK

The presented contribution is still under development, and further tasks of the complex meeting recommendation service must be researched. The authors are working on answers to problems such as:

• proper segmentation method

• suitable algorithm for quantizing GPS coordinates onto the street network

• new methods of route proximity measurement

• partitioned proximity calculation algorithms, since different methods may fit different route segments best

• training the system with the help of user feedback

V. CONCLUSION

Offering suitable meeting points is a complex and hard task, of which a small part, route proximity measurement, has been discussed. We have shown methods together with their advantages and drawbacks, both on their own and compared to each other. Although all of them are useful in particular cases, a smart combination may be needed to achieve the overall goal of determining the proximity of arbitrary routes.

ACKNOWLEDGMENTS

This work is connected to the scientific program of the “Development of quality-oriented and cooperative R+D+I strategy and functional model at BUTE” project. This project is supported by the New Hungary Development Plan (Project ID: TAMOP-4.2.1/B-09/1/KMR-2010-0002).

REFERENCES

[1] A.-L. Barabasi, Bursts - The Hidden Pattern Behind Everything We Do. Dutton Adult, 2010.

[2] S. Umeyama, “An eigendecomposition approach to weighted graph matching problems,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 695–703, 1988.

[3] H. Bunke, “Recent developments in graph matching,” Pattern Recognition, International Conference on, vol. 2, p. 2117, 2000.

[4] T. Caelli and S. Kosinov, “An eigenspace projection clustering method for inexact graph matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 515–519, 2004.

[5] A. D. J. Cross, R. C. Wilson, and E. R. Hancock, “Inexact graph matching using genetic search,” Pattern Recognition, vol. 30, no. 6, pp. 953–970, 1997.

[6] R. A. Wagner and M. J. Fischer, “The string-to-string correction problem,” J. ACM, vol. 21, pp. 168–173, January 1974.

[7] J. Rocha and T. Pavlidis, “A shape analysis model with applications to a character recognition system,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 393–404, 1994.

[8] Y.-K. Wang, K. chin Fan, and J. tzong Horng, “Genetic-based search for error-correcting graph isomorphism,” IEEE Transactions on Systems, Man, and Cybernetics: Part B - Cybernetics, vol. 27, pp. 588–597, 1997.

[9] M. V. Suisse, K. Shearer, H. Bunke, and S. Venkatesh, “Video indexing and similarity retrieval by largest common subgraph detection using decision trees,” 2000.

[10] D. Tao, X. Tang, and X. Li, “Which components are important for interactive image searching?,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 18, pp. 3–11, Jan. 2008.

[11] W. J. Christmas, J. Kittler, and M. Petrou, “Structural matching in computer vision using probabilistic relaxation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 749–764, 1995.

[12] T. B. Sebastian, P. N. Klein, and B. B. Kimia, “Recognition of shapes by editing their shock graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, 2004.

[13] A. Sanfeliu and K. Fu, “A distance measure between attributed relational graphs for pattern recognition,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 353–362, 1983.

[14] B. Messmer and H. Bunke, “A new algorithm for error-tolerant subgraph isomorphism detection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, pp. 493–504, May 1998.

[15] B. Messmer and H. Bunke, “Efficient subgraph isomorphism detection: a decomposition approach,” Knowledge and Data Engineering, IEEE Transactions on, vol. 12, pp. 307–323, Mar./Apr. 2000.
