

Data Min Knowl Disc
DOI 10.1007/s10618-016-0452-3

Online route prediction based on clustering of meaningful velocity-change areas

Fernando Terroso-Saenz 1 · Mercedes Valdes-Vela 1 · Antonio F. Skarmeta-Gomez 1

Received: 1 June 2015 / Accepted: 20 January 2016
© The Author(s) 2016

Responsible editor: Pierre Baldi.

Fernando Terroso-Saenz [email protected]
Mercedes Valdes-Vela [email protected]
Antonio F. Skarmeta-Gomez [email protected]

1 Department of Information and Communication Engineering, Faculty of Computer Science, University of Murcia, Campus de Espinardo S/N, 30100 Murcia, Spain

Abstract Personal route prediction has emerged as an important topic within the mobility mining domain. In this context, many proposals apply an off-line learning process before being able to run the on-line prediction algorithm. The present work introduces a novel framework that integrates the route learning and the prediction algorithm in an on-line manner. By means of a thin-client and server architecture, it also puts forward a new concept for route abstraction based on the detection of spatial regions where certain velocity features of routes frequently change. The proposal is evaluated on real-world and synthetic datasets and compared with a well-established mechanism, exhibiting quite promising results.

Keywords Route prediction · Density-based clustering · Mobility mining

1 Introduction

Nowadays, handheld and wearable devices have become instrumental tools for most of our daily tasks. Among other features, such devices have been steadily enriched with more precise positioning sensors, like GPS. This has allowed the collection of a large amount of high-resolution digital traces which, in turn, has eased the development of the mobility mining discipline (Körner et al. 2012).

In this discipline, one of the most prominent tasks has been route prediction due to its broad range of possible applications, like pervasive navigation systems (Steinfeld et al. 1996), route-specific in-car control systems (Deguchi et al. 2004), intelligent resource and cell allocation for wireless communication networks (Liou and Huang 2005) or predictive queries for moving object databases (Hendawi and Mokbel 2012).

Most route prediction approaches consist of a chained process comprising three stages: route abstraction, route pattern mining/probabilistic model generation and, eventually, route prediction. If this process refers to a particular individual or to a group of specific individuals sharing common features, then it is regarded as personal route prediction.

Although several studies already exist, personal route prediction is not fully supported by existing solutions, mainly due to the following challenges (Chen et al. 2011):

– First of all, people tend to move freely so that their trajectories may not be constrained to particular road networks or pre-defined paths, so current grid or route-map representation approaches are not suitable for certain scenarios.

– Secondly, as a person’s routines change, so do their mobility patterns. Unfortunately, existing mechanisms usually rely on off-line mining processes that cannot smoothly adapt to these changes.

– Moreover, personal route prediction should take into account the limited computational capabilities of existing handheld devices. This issue is not usually considered by current approaches.

– Last but not least, location data is quite sensitive in terms of privacy for many people. Therefore, personal predictors should include mechanisms to counteract possible privacy-leak threats.

In this context, the present paper puts forward PRoPTurn, Personal Route Predictor based on Turn areas, a novel probabilistic approach for personal route prediction for high-resolution location data. PRoPTurn has been designed bearing in mind the aforementioned challenges to overcome the problems of previous algorithms. For that goal, the proposed solution introduces new mechanisms in each of the three prediction stages distributed in a mobile-client and server architecture with privacy protection.

Regarding the route abstraction stage, the underlying idea of PRoPTurn is that a route can be uniquely identified by means of the points where its velocity (speed and direction) changes remarkably. On the basis of these points, it is possible to extract those areas where a person’s regular routes usually change their direction, such as crossroads, roundabouts or crossings, along with his or her frequent origins and destinations like home or office. Consequently, as Fig. 1 shows, a route is represented as a sequence of these areas. This new representation is suitable even when the target person moves freely through large areas or areas with poor road-map coverage.

As for the probabilistic model generation, the introduced solution opts for storing the frequency information of the routes as a multigraph model at the same time the target person moves. This on-line mechanism avoids any type of previous off-line training.


Fig. 1 Overview of PRoPTurn route abstraction and multigraph generation in the 2D space

Concerning route prediction, a lightweight mechanism to deal with dynamic mobility patterns has been developed. This mechanism takes into account the novelty of the ongoing route with respect to the patterns detected so far, along with its current direction, to make a new prediction.

To sum up, the salient contributions of this work are: (i) the definition of a new velocity-based route abstraction, (ii) a mechanism using density-based clustering and event-based processing to make up that abstraction, (iii) a new multigraph storage for on-line pattern mining, and (iv) a route prediction mechanism to deal with incremental route patterns.

The remainder of the paper is structured as follows. Section 2 is devoted to describing in detail the logic structure and the processing stages of the proposed system. Then, Sect. 3 discusses the main results of the performed experiments. Next, an overview about personal route prediction is put forward in Sect. 4. Finally, the main conclusions and the future work are summed up in Sect. 5.

Table 1 PRoPTurn notation

Acronym      Meaning
SP           Stop point
TP           Turn point
MVA          Meaningful velocity area
MSA          Meaningful stop area
MTA          Meaningful turn area
$R^q$        Query route
$R_{abs}$    Abstraction of the route $R$
$G_{stat}$   Multigraph comprising the mobility model

2 PRoPTurn

This section is devoted to explaining in detail the goal and the architecture of the proposal. For the sake of clarity, Table 1 summarizes the key acronyms used in the following sections.

2.1 Prediction target

The main goal of PRoPTurn is to provide accurate and early prediction of the movement of a person in the short and long term. Regarding this movement, as the bottom of Fig. 1 shows, we assume that the mobility routines of a person define a set of meaningful stops like his or her home, office, school and so forth. In our setting, such meaningful stops have been defined by means of a spatial and time-based approach.

Definition 1 A meaningful stop for a person $P$, $ST_p$, is a spatial region where $P$ frequently remains stationary for a certain period of time $T_{st}$.

Since the movements of a person are usually constrained to these stops, a set of regular routes can be stated as follows:

Definition 2 A route of a person $P$, $R_p$, is the continuous movement from an origin $O_r$ to a destination $D_r$ where both $O_r$ and $D_r$ are meaningful stops $ST_p$.¹

Consequently, PRoPTurn focuses on predicting the destination $D_r$ and certain future features of the continuous movement of a route $R_p$ at the same time it is being covered. This prediction is initiated when the person departs from $O_r$, and it is updated in real time while the person moves towards his or her actual destination.

¹ In the present work, we use the terms route and trajectory interchangeably to refer to this continuous movement of a person.


Fig. 2 PRoPTurn client-server structure. The route abstraction stage comprises the GPS data cleaner, TP-SP detectors, TP aggregator and MVA detector modules. The model generation stage contains the multigraph builder component, and the route prediction stage is composed of the prediction maker and the prediction handler modules

2.2 Architecture overview

In order to achieve the prediction features explained before, PRoPTurn has been designed to run on handheld devices so as to benefit from their continuously improving location sensors. Nonetheless, due to the computational and memory constraints of this type of device, the system architecture has been split into two different parts: a client side that runs on the mobile device of each user, and a server side in charge of the most demanding tasks.

The PRoPTurn logic structure, depicted in Fig. 2, shows that each module is in charge of a task related to one of the three prediction stages: route abstraction, probabilistic model generation and route prediction.

2.3 Route abstraction

2.3.1 Route abstraction format

As previously stated, PRoPTurn makes use of the velocity features of the routes so as to compose a representation based on their Regions Of Interest (ROIs). These velocity-based ROIs have been named meaningful velocity areas (MVAs), and they represent the usual areas where the routes of a particular user change some of their velocity features. Depending on the particular feature taken into account, it is possible to distinguish between meaningful turn areas (MTAs) (for direction) and meaningful stop areas (MSAs) (for speed):

Definition 3 A meaningful turn area (MTA) is a special type of MVA representing a geographic area where one or more routes $R_p$ of a person $P$ usually change their direction by a factor $\Delta_{dir}$. This area might match certain infrastructure elements like crossroads or roundabouts but it is not limited to them.

Definition 4 A meaningful stop area (MSA) is a special type of MVA representing a geometrical region containing a meaningful stop $ST_p$ of a person $P$.


As a result, the route abstraction generated by PRoPTurn takes the following form:

Definition 5 The abstraction $R_{abs}$ of a route $R$ is a MVA sequence $\{MSA_o \to MTA_j \to \dots \to MTA_{j+n} \to MSA_d\}$ where

– $MSA_o$ represents the origin $O_r$ of $R$.
– $MTA_j, \dots, MTA_{j+n}$ ($n \ge 0$) are the intermediate MTAs covered by $R$.
– $MSA_d$ stands for the final destination $D_r$ of $R$.

For example, Fig. 1 shows that route 2 ($R2$) is abstracted as the sequence $R2_{abs} = \{MSA_1 \to MTA_2 \to MSA_3\}$ as these are the MVAs covered by the route.
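As an illustration only, the abstraction format of Definition 5 could be held in a small data structure such as the following Python sketch. The `MVA` class, its `kind` and `ident` fields and the function name are hypothetical choices made for this example and do not come from the PRoPTurn implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MVA:
    kind: str   # "MSA" (stop area) or "MTA" (turn area)
    ident: int  # hypothetical numeric label, e.g. 2 for MTA_2


def matches_definition_5(seq: List[MVA]) -> bool:
    """True if seq reads MSA_o -> MTA_j -> ... -> MTA_{j+n} -> MSA_d, n >= 0."""
    if len(seq) < 3:  # origin, at least one MTA, destination
        return False
    ends_ok = seq[0].kind == "MSA" and seq[-1].kind == "MSA"
    return ends_ok and all(m.kind == "MTA" for m in seq[1:-1])


# Route 2 of Fig. 1: MSA_1 -> MTA_2 -> MSA_3
r2_abs = [MVA("MSA", 1), MVA("MTA", 2), MVA("MSA", 3)]
```

The check follows the literal reading of Definition 5, in which at least one intermediate MTA is present.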

2.3.2 Route abstraction process

PRoPTurn makes up the aforementioned abstraction of the ongoing route at the same time it is being covered. This process involves two steps.

1. Meaningful velocity episodes extraction Firstly, the trace of raw locations of the route is analysed to detect remarkable episodes of direction changes or low speed. When one of these episodes occurs, the single location situated at the middle of the episode is extracted. Such a location is named Turn Point (TP) (for direction-change episodes) or Stop Point (SP) (for low-speed episodes).

2. Meaningful velocity areas detection For each TP or SP, a second step detects if it is included in any MVA. If that is the case, the MVA is appended as a new element of the ongoing route’s abstraction.

As Fig. 2 depicts, while the mobile device side is in charge of the TP-SP extraction, the server side is responsible for the MVA detection because this detection may exceed the computational and memory capabilities of handheld devices.

2.3.3 TP-SP extraction

This abstraction step is undertaken by the GPS data cleaner, TP detector, SP detector and TP aggregator modules of the mobile device (see Fig. 2). In order to find TPs and SPs in real time, the first three modules have been developed by adopting the Complex Event Processing (CEP) approach (Etzion and Niblett 2010).

CEP is a software technology to process streams of information items, so-called events, in a timely manner in order to detect a palette of target situations, taking the form of complex events, by means of predefined event-based rules. CEP has recently been tested as a suitable solution to perform local data processing on mobile platforms (Stipkovic et al. 2013; Dunkel et al. 2013). Consequently, most of the inner logic of the mobile side has been implemented by means of different event-based processing rules.

GPS data cleaner The present version of PRoPTurn only relies on the mobile device’s GPS sensor to extract the routes’ raw locations. For that goal, the sensor periodically provides the system with a new measurement. Following a CEP point of view, each new measurement is reflected as a new GPS event which basically comprises the tuple $\langle x, y, t \rangle$ reporting the latitude-longitude coordinates $\langle x, y \rangle$ at instant $t$.


Fig. 3 Example of the TP generation process. (a) GPS data cleaning: each dot represents a new GPS event at instant t and the circles the lower bound $d_{min}$ of the distance threshold. The light ones (t = 1, 3, 5, 6, 9, 10) are those that are filtered in and converted into filtered GPS events. (b) TP detection: calculation of the bearing for each pair of filtered GPS events (20° NNE, 30° NNE, 50° NE, 110° ESE and 120° SE in the example). The codes in brackets indicate the translation of the bearing to a compass rose. (c) TP detection: generation of a new TP because the bearing increment of the filtered GPS events sequence (20° → 120°) is over $\Delta_{dir}$ (45°)

However, under certain conditions, a GPS sensor may return erroneous or irrelevant locations (Zhang and Goodchild 2002). Hence, it is necessary to filter its raw measurements before further processing.

This cleaning is done by means of a distance-based filter that discards outliers and irrelevant locations. If the distance between a new GPS event and the most recent filtered data (filtered GPS event) is within a range $[d_{min} : d_{max}]$, then the location reported by the GPS event is neither irrelevant nor an outlier. In that case, the GPS event is mapped to a new filtered GPS event (which, in turn, will be used in the following execution of the filter). In this scope, it is worth mentioning that there are other solutions based on kernel methods or Gaussian mixture models (de Vries 2012) able to perform a more detailed GPS-data cleaning. However, they are not fully compliant with real-time environments due to their computational complexity.

For the sake of clarity, Fig. 3a shows an example of this filtering process, and Appendix 1 includes the pseudocode of the event-based rule implementing the aforementioned filter.
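The distance-based filter described above can be sketched in a few lines of Python. This is not the CEP rule of Appendix 1: it is a plain function written for illustration, and it assumes planar coordinates in metres (rather than latitude-longitude) and that the first measurement is always accepted, neither of which is stated in the text. The names `GpsEvent` and `clean` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

GpsEvent = Tuple[float, float, float]  # (x, y, t); planar metres assumed


def clean(events: Iterable[GpsEvent], d_min: float, d_max: float) -> List[GpsEvent]:
    """Keep only events whose distance to the last kept event is in [d_min, d_max]."""
    filtered: List[GpsEvent] = []
    for ev in events:
        if not filtered:
            filtered.append(ev)  # assumption: the first fix is accepted as is
            continue
        last = filtered[-1]
        d = math.hypot(ev[0] - last[0], ev[1] - last[1])
        if d_min <= d <= d_max:  # neither irrelevant (too close) nor an outlier
            filtered.append(ev)
    return filtered
```

Note that each accepted event becomes the reference for the next comparison, which mirrors the statement that a filtered GPS event is used in the following execution of the filter.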

TP detector The key goal of this module is to detect the turn episodes that arise in the ongoing route and extract a single TP representing each of them.

For the detection of meaningful turns, we have followed a navigational approach, as turns are reflected as variations of the route’s bearing. For example, changing the direction from north to east implies a route bearing decrease from 360° to 90°.

Consequently, TPs are detected by looking for sequences of filtered GPS events $fgps_{seq} : \{fgps_i \to fgps^1_j \to \dots \to fgps^n_j \to fgps_f\}$ satisfying two conditions. First of all, the bearing steadily increases/decreases for each pair of consecutive events. Secondly, the bearing increment/decrement considering the whole sequence $fgps_{seq}$ must reach a predefined threshold $\Delta_{dir}$. If both conditions are fulfilled, then the middle point of $fgps_{seq}$ is extracted as a new TP reporting the discovered turn. A graphical representation of this process is depicted in Fig. 3b, c.
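The two conditions above can be illustrated with the following Python sketch. It is a batch function rather than the event rule of Appendix 1, and it rests on several assumptions that are not in the text: planar coordinates with x pointing east and y pointing north, bearing differences wrapped into [-180°, 180°), and the middle point taken as the middle element of the sequence. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x east, y north); planar coordinates assumed


def bearing(p: Point, q: Point) -> float:
    """Compass bearing from p to q in degrees, clockwise from north, in [0, 360)."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0


def signed_delta(b1: float, b2: float) -> float:
    """Smallest signed change from b1 to b2, in [-180, 180) degrees."""
    return (b2 - b1 + 180.0) % 360.0 - 180.0


def is_turn(bearings: Sequence[float], delta_dir: float) -> bool:
    """Steady increase (or decrease) whose total change reaches delta_dir."""
    deltas = [signed_delta(a, b) for a, b in zip(bearings, bearings[1:])]
    if not deltas:
        return False
    steady = all(d > 0 for d in deltas) or all(d < 0 for d in deltas)
    return steady and abs(sum(deltas)) >= delta_dir


def turn_point(points: List[Point], delta_dir: float = 45.0) -> Optional[Point]:
    """Middle element of the sequence if it forms a turn, otherwise None."""
    bearings = [bearing(p, q) for p, q in zip(points, points[1:])]
    return points[len(points) // 2] if is_turn(bearings, delta_dir) else None
```

With the bearings of Fig. 3b (20°, 30°, 50°, 110°, 120°) and a threshold of 45°, `is_turn` reports a turn because the bearing grows at every step and the total increment is 100°.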


Fig. 4 Snapshot of close and consecutive TPs (yellow pins) from a single user’s route (Color figure online)

Following the CEP paradigm, two event processing rules have been defined, one for bearing increment detection and another for decrement detection. The pseudocode of the first one is included in Appendix 1.

SP detector Concerning the stop episodes, if a person remains stationary during a certain period of time, then the new GPS events will be discarded by the distance filter’s $d_{min}$ boundary. As a result, no new filtered GPS events will be made up during that period. Hence, the module searches for long time periods with no filtered GPS events. If this happens, a new SP reporting a stop of the ongoing route is created comprising the location of the last filtered GPS event. The pseudocode of the event rule implementing such a logic is also included in Appendix 1.
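A minimal Python sketch of this silence-based stop detection follows. The threshold name `t_stop` is hypothetical, since the text only speaks of "long time periods", and the function is a simplification of the event rule in Appendix 1 rather than a transcription of it.

```python
from typing import Optional, Tuple

GpsEvent = Tuple[float, float, float]  # (x, y, t)


def detect_stop(last_filtered: Optional[GpsEvent], now: float,
                t_stop: float) -> Optional[Tuple[float, float]]:
    """Return an SP at the last filtered location if no filtered GPS event
    has been produced for at least t_stop time units, otherwise None."""
    if last_filtered is None:
        return None
    x, y, t = last_filtered
    return (x, y) if now - t >= t_stop else None
```

In practice such a check would be triggered periodically, for instance by a timer, because by construction no new event arrives while the user is stationary.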

TP aggregator Under certain circumstances, a single route $R$ might give rise to several consecutive TPs located quite close to each other within a particular area, as Fig. 4 depicts. This can be mainly due to measurement inaccuracies given low GPS coverage or just roaming movements of the user.

Such a group of consecutive and close TPs can actually be compressed into a single TP representing the whole roaming episode of the route. For that reason, the TP aggregator component performs a micro-clustering in order to group TPs that potentially stand for the same movement episode. Then, only one point of each micro-cluster is used for further processing whereas the rest are discarded.

To that end, the module searches for clusters taking the form of a sequence of consecutive TPs from the same route $R$, $TP^{clus}_R = \{tp_i \to tp_{i+1} \to \dots \to tp_{i+n}\}$, so that

I. $dist(tp_{i+j}, tp_{i+j+1}) \le d_{max}$, $\forall i \ge 0$, $\forall j$ with $n \ge j \ge 0$
II. $dist(\zeta(TP^{clus}_R), tp_i) \le d_{max}$, $\forall tp_i \in TP^{clus}_R$
III. $|tp_{i+j+1}.timestamp - tp_{i+j}.timestamp| < t_{max}$, $\forall i \ge 0$, $\forall j$ with $n \ge j \ge 0$

where $\zeta(TP^{clus}_R)$ represents the middle point of the micro-cluster. Whilst the first two conditions ensure that all the TPs are close to each other in spatial terms, the third one imposes a time constraint so that all the points are contained in the same route.

Algorithm 1 shows the mechanism to compose the aforementioned sequence. As we can see, if a new TP does not satisfy the three conditions listed above (lines 5–8), a representative point $tp_{rep}$ of the micro-cluster, comprising its centroid and final bearing (the current bearing of the ongoing route), is set for further processing before the micro-cluster is reset. Moreover, in order to avoid TP starvation, an inner process automatically composes the $tp_{rep}$ of the current cluster if no TPs are generated during the last $t_{max}$ time units.

Algorithm 1: TP aggregation method
Input: A just-detected TP tp_new
Output: The up-to-date TP microcluster TP^clus_R of the ongoing route R

 1 tp_rep ← ∅
 2 if dist(tp_new, tp_prev) ≤ d_max ∧ dist(ζ(TP^clus_R), tp_new) ≤ d_max ∧ |tp_new.timestamp − tp_prev.timestamp| < t_max then
 3     TP^clus_R ← TP^clus_R ∪ tp_new
 4     TP^clus_R.bear_final ← tp_new.bear_final
 5 else
 6     tp_rep ← new TP(ζ(TP^clus_R), TP^clus_R.bear_ini, TP^clus_R.bear_final)
 7     TP^clus_R ← tp_new
 8     TP^clus_R.bear_final ← tp_new.bear_final
 9 tp_prev ← tp_new
10 return tp_rep
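For readers who prefer executable code, Algorithm 1 could be rendered roughly as the Python sketch below. Several details are assumptions made for this example and are not specified in the pseudocode: ζ is computed as the arithmetic mean of the cluster points, the initial bearing of the micro-cluster is taken from its first TP, the representative TP carries the timestamp of the last TP of the cluster, and the starvation timer is omitted. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TP:
    x: float
    y: float
    timestamp: float
    bear_ini: float
    bear_final: float


class TPAggregator:
    def __init__(self, d_max: float, t_max: float) -> None:
        self.d_max, self.t_max = d_max, t_max
        self.cluster: List[TP] = []      # current micro-cluster
        self.prev: Optional[TP] = None   # tp_prev in Algorithm 1

    def _middle(self):
        n = len(self.cluster)
        return (sum(p.x for p in self.cluster) / n,
                sum(p.y for p in self.cluster) / n)

    def _fits(self, tp: TP) -> bool:
        cx, cy = self._middle()
        return (math.hypot(tp.x - self.prev.x, tp.y - self.prev.y) <= self.d_max
                and math.hypot(tp.x - cx, tp.y - cy) <= self.d_max
                and abs(tp.timestamp - self.prev.timestamp) < self.t_max)

    def push(self, tp_new: TP) -> Optional[TP]:
        """Feed one TP; return the representative TP of the closed cluster, if any."""
        tp_rep = None
        if self.cluster and self._fits(tp_new):
            self.cluster.append(tp_new)              # lines 3-4
        else:
            if self.cluster:                          # line 6
                cx, cy = self._middle()
                tp_rep = TP(cx, cy, self.cluster[-1].timestamp,
                            self.cluster[0].bear_ini, self.cluster[-1].bear_final)
            self.cluster = [tp_new]                   # lines 7-8
        self.prev = tp_new                            # line 9
        return tp_rep
```

The final bearing of the micro-cluster is read from its last element instead of being stored separately, which is equivalent to lines 4 and 8 of the pseudocode.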

Finally, the representative TPs along with the SPs are sent to the server side for further processing after being encoded by the privacy provider component. Thus, bearing in mind that the communication sensors are among the most battery-draining components of a mobile phone (Carroll and Heiser 2010), the mobile part limits the information delivered to the server, as it only emits the locations that represent meaningful velocity variations instead of the whole digital trace.

2.3.4 MVA detection

The second step of the abstraction process is undertaken by the MVA detector module (see Fig. 2) and it intends to infer whether a just-received SP or TP is included in any MVA.

In this context, PRoPTurn does not rely on any predefined MVAs. Instead, the MVA detector incrementally generates the MVAs of a user at the same time it receives his/her SPs and TPs.

The rationale for the MVA detection is that, according to Sect. 2.3.2, these areas are geographical regions where the routes of a user frequently change their speed or bearing in a significant manner. Since such changes give rise to TPs and SPs, a MVA can be regarded as a particular region where many TPs or SPs occur quite close to each other. For instance, Fig. 5 shows a snapshot of TPs from a user at a particular crossroads. Since the user frequently turns in this area, many TPs are detected quite close to each other. Therefore, the region comprising all these points can be regarded as a potential MVA.

Fig. 5 Snapshot of the TPs (yellow pins) extracted from the routes of a user (colored lines) in a crossroads (Color figure online)

As a result, a density-based clustering of SPs and TPs has been applied for MVA detection. In short, density-based clustering is based on the concept of the local neighbourhood $N$ of a point $p$, that is, the set of points that are within a certain straight-line distance $Eps$ of $p$,

$N(p) = \{q \in S \mid dist(p, q) \le Eps\}$

where $S$ is the set of available points. If $|N(p)|$ reaches a certain threshold $MinPoints$, then $N(p)$ is considered a cluster. Furthermore, $N(p)$ is density-joinable to $N(q)$ ($p \ne q$) if $N(p) \cap N(q) \ne \emptyset$.
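These definitions translate almost directly into code. The Python sketch below uses a linear scan over $S$ (no spatial index) and planar points given as tuples; the function names are hypothetical, and the cluster test uses "greater than or equal to", in line with the centroid condition used later in the section.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float]


def neighbourhood(p: Point, S: Iterable[Point], eps: float) -> Set[Point]:
    """N(p): every point of S within distance eps of p (p itself included if in S)."""
    return {q for q in S if math.dist(p, q) <= eps}


def is_cluster(p: Point, S: Iterable[Point], eps: float, min_points: int) -> bool:
    """N(p) is considered a cluster when it holds at least min_points points."""
    return len(neighbourhood(p, S, eps)) >= min_points


def density_joinable(p: Point, q: Point, S: Iterable[Point], eps: float) -> bool:
    """N(p) and N(q) are density-joinable when p != q and they share a point."""
    S = list(S)
    return p != q and bool(neighbourhood(p, S, eps) & neighbourhood(q, S, eps))
```

For example, with points at x = 0, 1, 2 and 10 on a line and `eps = 1.5`, the neighbourhoods of the points at 0 and 2 are joinable through the point at 1, while the point at 10 is isolated.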

The density clustering algorithm applied for MVA detection is a slightly modified version of the online landmark discovery algorithm (LDA) described in Terroso-Saenz et al. (2015).

In brief, LDA firstly intends to detect the set of centroids $C$ from the received points $S$, where $C = \{c_1, c_2, \dots, c_n\} \subset S$ and

– $|N(c_i)| \ge MinPoints$, $\forall c_i \in C$,
– $dist(c_i, c_j) > Eps$, $i \ne j$, $\forall c_i, c_j \in C$

On the basis of the detected centroids, a landmark $L$ is defined as a group of centroids where the neighbourhood of each centroid is density-joinable with the neighbourhood of at least one other centroid in the same landmark, as Fig. 6 illustrates. Therefore, the LDA basically returns, given a point $p$, the landmark $L$ of the centroid $c$ whose neighbourhood $N(c)$ comprises $p$, if any. At the same time, it also detects the different centroids and landmarks.

Fig. 6 Example of a landmark returned by the LDA

Fig. 7 Example of two MVAs that are merged since the new incoming location p makes both MVAs have MinPointsToMerge points in common. As a result, a new MVA comprising the previous ones is created (MVA_1_2)

For PRoPTurn, this algorithm has been adapted so that the returned landmarks now play the role of the MVAs. Furthermore, the original algorithm has also been improved in certain aspects to make it more robust.

As a result, the applied algorithm basically comprises three steps. In the first one (lines 3–9 of Algorithm 2), the algorithm tries to detect whether the incoming TP or SP $p$ is already within the neighbourhood of any existing centroid. In that case, $p$ is associated with the first discovered centroid (lines 5–7). Furthermore, the algorithm merges the MVAs of all the centroids whose neighbourhoods include $p$ (lines 8–9). In order to avoid the over-merging problem of the original LDA, two different MVAs are now merged only if the neighbourhoods of two of their centroids share at least MinPointsToMerge points. This way, it is possible to adjust the overlap required for two MVAs to be merged. For instance, Fig. 7 depicts two MVAs (MVA_1, MVA_2) with one centroid each ($c_1$, $c_2$) that are merged when a new point $p$ makes both MVAs have three points in common, which is the value of MinPointsToMerge in this example.
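The merge criterion can be illustrated in isolation with the short Python sketch below. It follows the wording of the paragraph above ("at least MinPointsToMerge points"), whereas the comparison printed in Algorithm 2 is strict; which of the two the authors intend is not resolved here. The helper names are hypothetical and the neighbourhoods are computed by linear scan.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float]


def neighbourhood(c: Point, points: Iterable[Point], eps: float) -> Set[Point]:
    """All stored points within distance eps of c."""
    return {q for q in points if math.dist(c, q) <= eps}


def should_merge(c1: Point, c2: Point, points: Iterable[Point],
                 eps: float, min_points_to_merge: int) -> bool:
    """True when the neighbourhoods of centroids c1 and c2 share
    at least min_points_to_merge points."""
    points = list(points)
    shared = neighbourhood(c1, points, eps) & neighbourhood(c2, points, eps)
    return len(shared) >= min_points_to_merge
```

In a situation like that of Fig. 7, two centroids whose neighbourhoods overlap in two points are kept apart, and the arrival of a third shared point triggers the merge when MinPointsToMerge is 3.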

The second step (lines 10–19) is executed if the incoming point $p$ is not included in any existing centroid’s neighbourhood. In that case, the algorithm first checks if $p$ can be considered a centroid (line 11). If that condition is satisfied, then $p$ is linked with the points in its neighbourhood as their centroid (lines 16–17), and its new landmark is merged with any other surrounding landmark (lines 18–19).

Algorithm 2: The MVA discovery algorithm.
Input: A SP or TP location p
Output: The MVA m containing p, if any.

 1 m ← null
 2 S ← S ∪ p
 3 for each q ∈ N(p) do
 4     if q ∈ C then
 5         if p.centroid = null then
 6             p.centroid ← q
 7             m ← q.MVA
 8         else if |N(q) ∩ N(p.centroid)| > MinPointsToMerge then
 9             m ← Merge(p.centroid.MVA, q.MVA)
10 if m = null then
11     if |N(p)| ≥ MinPoints then
12         C ← C ∪ p
13         p.MVA ← new_MVA(p)
14         m ← p.MVA
15         for each q ∈ N(p) do
16             if q.centroid = null then
17                 q.centroid ← p
18             else if |N(q.centroid) ∩ N(p)| > MinPointsToMerge then
19                 m ← Merge(p.MVA, q.centroid.MVA)
20     else
21         minD ← ∞
22         for each q ∈ N(p) do
23             d ← dist(p, q)
24             if d < Eps × CloseFactor ∧ d < minD ∧ q.centroid.MVA ≠ null then
25                 minD ← d
26                 m ← q.centroid.MVA
27 if m ≠ null ∧ p.type = TP then
28     m.bear_final ← p.bear_final
29 return m

Next, the algorithm has been extended with a new phase with respect to the original LDA (lines 21–26). Its goal is to overcome a common problem of density-based clustering that occurs when a point is quite close to a cluster’s border but not inside it. In a classical approach, the algorithm will not return any cluster. However, common sense dictates that such a point can be viewed as part of its closest cluster. Hence, this new phase returns the closest MVA to $p$ provided that the distance is less than $Eps \times CloseFactor$ ($CloseFactor > 1$) (line 24). Unlike the first phase, $p$ is not labelled with the resulting MVA $m$ as it is not actually included in any centroid’s neighbourhood of such MVA.
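A possible reading of this third phase is sketched below in Python. The sketch assumes that the candidate points are all stored points already labelled with an MVA (passed in as a dictionary), rather than only those in $N(p)$, so that the widened radius $Eps \times CloseFactor$ has an effect; this interpretation, like the function and parameter names, is an assumption made for the example.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def closest_mva(p: Point, labelled: Dict[Point, str],
                eps: float, close_factor: float) -> Optional[str]:
    """Return the MVA of the nearest labelled point lying closer than
    eps * close_factor to p, or None if there is no such point."""
    best, min_d = None, math.inf
    for q, mva in labelled.items():
        d = math.dist(p, q)
        if d < eps * close_factor and d < min_d:
            min_d, best = d, mva
    return best
```

The point `p` itself is not added to `labelled`, which mirrors the remark that `p` is not labelled with the returned MVA.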


The MVA detector module executes two isolated instances of the algorithm, one for TPs and another for SPs. Whilst the former generates MTAs, the latter composes the MSAs. In that sense, it is important to note that when the algorithm instance for TPs returns a MTA, such area is enriched with the final bearing reported by the TP (lines 27–29). Since this point represents the last turn of the ongoing trajectory, its final bearing indicates the current bearing of the route. This extra information will help to provide better prediction capabilities, as detailed in Sect. 2.5.

In the end, the adopted density-based clustering for MVA detection makes it possible to compose a pseudo-map of frequent stops (MSAs) and turns (MTAs) of each single user. This pseudo-map is independent of the type of ecosystem the user moves around (e.g. urban or rural) as it only depends on the velocity features of the user’s routes. If a rural user has routes with more turns than an urban user, the system will generate a more populated MSA-MTA map for the former than for the latter.

Computational complexity Computing the neighbourhood of a point is $O(n^2)$ without a spatial index and $O(n \log n)$ with R-trees (Guttman 1984), where $n$ is the number of stored locations, and the join computation of two clusters is also $O(n^2)$ without a spatial index and $O(n \log n)$ with R-trees. Therefore, the complexity of the first and second steps is $O(n^2 \log n) \approx O(n^2)$ whereas the third one is $O(n)$. Hence, the overall complexity of the algorithm is $O(n^2)$.

2.3.5 Query-route composition

The output of the two instances of the MVA discovery algorithm is used to make up,on the fly, the abstraction of the ongoing route (aka query route) Rq

abs . This processdepends on whether these instances were capable of actually detecting an MVA andits type.

In case of a MSA (lines 1–9 of Algorithm 3), it indicates that the user has reacheda meaningful stop and, hence, the current route has come to an end. Therefore, thesystem appends it as the final destination of the current route and restarts the queryroute with the same MSA as its origin (lines 3–4). Moreover, the abstraction of thejust-finished routeR f

abs is sent to themultigraph builder for the model generation taskas it is explained in Sect. 2.4 (line 9).

In case of an MTA, it is appended to the current sequence, enlarging it (line 12). Moreover, the current bearing of the ongoing route is updated with the last reported turn (line 14).

It is important to notice that not all the TPs and SPs delivered by the mobile device will be mapped to MVAs. Therefore, they will not be reflected as elements in $R^{q}_{abs}$. Nonetheless, in case of an SP, which reports a meaningful stop $ST_p$, a new (empty) query route must be started. This is because, according to Definition 2, a route is delimited by two meaningful stops acting as its origin and destination. This situation is handled by lines 6–8 of Algorithm 3.

As a result of this lack of correspondence between TPs-SPs and MVAs, the abstraction of a completed route $R^{f}_{abs}$ may not always take the general form of Definition 2. These gaps are due to the online approach of PRoPTurn, where the route abstractions and their atomic elements (MVAs) are created in parallel. This defines a convergence period for each route until its abstraction is stable and, hence, the same in each of its repetitions.

Algorithm 3: The query-route generation
Input: An MVA m and its associated type mtype ({MSA, MTA})
Output: The ongoing route abstraction $R^{q}_{abs}$
 1 if mtype = MSA then
 2     if m ≠ null then
 3         $R^{f}_{abs}$ ← append($R^{q}_{abs}$, m)
 4         $R^{q}_{abs}$ ← m
 5         call Prediction_Maker($R^{q}_{abs}$)
 6     else
 7         $R^{f}_{abs}$ ← $R^{q}_{abs}$
 8         $R^{q}_{abs}$ ← ∅
 9     call MultiGraph_Builder($R^{f}_{abs}$)
10 else /* the MVA is actually a MTA */
11     if m ≠ null then
12         $R^{q}_{abs}$ ← append($R^{q}_{abs}$, m)
13         call Prediction_Maker($R^{q}_{abs}$)
14         $R^{q}_{abs}$.current_bear ← m.bear_final
15 return $R^{q}_{abs}$

Definition 6 The convergence period of a route $R$ is the time interval during which its abstraction $R_{abs}$ may change from one repetition to another by incorporating new MVAs into the sequence.

Hence, the convergence period can be viewed as the time window required by PRoPTurn to detect all the MTAs and MSAs composing the route's abstraction.
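The control flow of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `MVA` and `QueryRoute` classes and the two callbacks (`on_prediction` for the prediction maker, `on_finished` for the multigraph builder) are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MVA:
    """A meaningful velocity-change area; kind is "MSA" or "MTA" (hypothetical type)."""
    name: str
    kind: str
    final_bearing: Optional[float] = None  # only meaningful for MTAs


@dataclass
class QueryRoute:
    """Abstraction of the ongoing route plus its current bearing (hypothetical type)."""
    mvas: List[MVA] = field(default_factory=list)
    current_bearing: Optional[float] = None


def update_query_route(route: QueryRoute, mva: Optional[MVA], mva_type: str,
                       on_prediction: Callable, on_finished: Callable) -> QueryRoute:
    """Mirror of Algorithm 3; mva is None when the SP/TP was not mapped to an MVA."""
    if mva_type == "MSA":
        if mva is not None:
            finished = route.mvas + [mva]   # line 3: close the route at this MSA
            route.mvas = [mva]              # line 4: restart from the same MSA
            on_prediction(route)            # line 5
        else:
            finished = list(route.mvas)     # line 7: an unmapped stop still ends the route
            route.mvas = []                 # line 8: start an empty query route
        on_finished(finished)               # line 9: hand over to the multigraph builder
    elif mva is not None:                   # the MVA is an MTA
        route.mvas.append(mva)              # line 12
        on_prediction(route)                # line 13
        route.current_bearing = mva.final_bearing  # line 14
    return route
```

Passing the two downstream modules as callbacks keeps the sketch self-contained; in the architecture described here they would be the prediction maker and the multigraph builder.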

2.4 Model generation

For this stage, PRoPTurn encodes the statistical information from the historical routes as a directed multigraph $G_{stat}$, where each vertex represents a unique MVA.

As we can see from the illustrative example of Fig. 8, the multigraph approach allows the same pair of MVAs to be connected by different directed edges, each one labelled with the identifier, frequency and destination MSA of a particular route. This way, a route is encoded in $G_{stat}$ as a sequence of exclusive edges connecting the MVA vertices in the same order as the route's abstraction. For example, the multigraph of Fig. 8 indicates that route R2, whose abstraction is $R^{2}_{abs}$ = {MSA3 → MTA2 → MTA4 → MSA1}, has been covered 12 times by the user. Labelling the edges with the MSA comprising the final destination of each route will be very useful to provide an early prediction of the destination, as we will see in Sect. 2.5.


Fig. 8 Example of the PRoPTurn multigraph structure. Each edge is labelled with the tuple {routeid:frequency:destination}. It comprises four different routes, $R^{1}_{abs}$ = {MSA1 → MTA1 → MTA2 → MTA3 → MSA2} (green edges), $R^{2}_{abs}$ = {MSA3 → MTA2 → MTA4 → MSA1} (red edges), $R^{3}_{abs}$ = {MSA1 → MTA4 → MTA3 → MTA2 → MSA3} (aquamarine edges), $R^{4}_{abs}$ = {MSA1 → MTA4 → MTA3 → MTA2 → MSA1} (purple edges) (Color figure online)

$G_{stat}$ is updated by the multigraph builder module on the basis of each finished route abstraction $R^{f}_{abs}$ delivered by the MVA detector component (see Fig. 2). This is done by means of a two-step procedure.

– Firstly, the module checks whether $R^{f}_{abs}$ is already fully included in $G_{stat}$. A route is considered fully included in $G_{stat}$ if and only if all its MVAs are already vertices of $G_{stat}$ and there exists a route identifier whose associated edges connect these MVAs in the same order as the abstraction. If that is the case, the identifier of that route is extracted. Otherwise, a new identifier is generated.

– Secondly, the frequency attribute of each edge associated with this identifier is incremented. In case of a new identifier, a new set of edges connecting the MVAs is created. During this step, if the incoming abstraction comprises a new MVA, a new vertex representing this new area is also generated.

Figure 9 shows two common cases of $G_{stat}$ update that take as reference the model in Fig. 8. Figure 9a shows the case where $R^{f}_{abs}$ is fully included in $G_{stat}$ as route R1. In this case, the frequency of the edges representing that route is incremented by 1 (from 8 to 9) to reflect the fact that it has been covered by the user again. Figure 9b shows the case where $R^{f}_{abs}$ represents a new route that, in addition, covers a new MVA (MSA4). As a result, $G_{stat}$ is enlarged with a new vertex and a set of edges to represent the new route, labelled as R5. Besides, this example also depicts the $R^{f}_{abs}$ of a route in its convergence period, as it does not define an MSA element as origin.
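The two-step update can be illustrated with a small Python sketch. It is written under simplifying assumptions that are not part of the original description: the class and attribute names are hypothetical, route identifiers are assigned sequentially, and the "fully included" check is approximated by an exact match of the whole MVA sequence.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Edge:
    route_id: int
    frequency: int
    destination: str  # last MVA of the route (an MSA once the route has converged)


class RouteMultigraph:
    """Directed multigraph: several labelled edges may join the same pair of MVAs."""

    def __init__(self) -> None:
        self.vertices = set()
        # (source MVA, target MVA) -> parallel edges, one per route using that hop
        self.edges: Dict[Tuple[str, str], List[Edge]] = defaultdict(list)
        self.routes: Dict[Tuple[str, ...], int] = {}  # abstraction -> route id

    def update(self, abstraction: List[str]) -> int:
        key = tuple(abstraction)
        # Distinct hops in order, so a repeated hop yields a single edge per route.
        hops = list(dict.fromkeys(zip(abstraction, abstraction[1:])))
        route_id = self.routes.get(key)
        if route_id is None:
            # Step 1: the route is not fully included, so a new identifier is created.
            route_id = len(self.routes) + 1
            self.routes[key] = route_id
            self.vertices.update(abstraction)  # each new MVA becomes a vertex once
            for hop in hops:
                self.edges[hop].append(Edge(route_id, 0, abstraction[-1]))
        # Step 2: increment the frequency of every edge of this route.
        for hop in hops:
            for edge in self.edges[hop]:
                if edge.route_id == route_id:
                    edge.frequency += 1
        return route_id
```

Repeating a known route only touches the counters of its own edges, whereas a new route adds one edge per hop and at most one vertex per previously unseen MVA.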

This multigraph approach allows a completed route to be reflected in the model by inserting each new MVA only once. On the contrary, in a tree-based approach a new route is inserted as one or multiple branches comprising several nodes (Chen et al. 2011). Furthermore, this lightweight update method makes it possible to integrate new routes incrementally on the fly. This feature is quite useful to adapt smoothly to changes in a user's mobility routines, for instance when a regular route is split into two different ones because the user decides to take a new path to the same destination. This is not possible in previous off-line processes, where the probabilistic model is hard-coded beforehand.

Fig. 9 Examples of multigraph updates. (a) A previously-covered route updates the multigraph by increasing the frequency of its (green) edges. (b) A novel route updates the multigraph by adding a new vertex and a set of (orange) edges (Color figure online)

2.5 Route prediction

Each time the MVA detector module enlarges or restarts $R^{q}_{abs}$ by appending a new MVA (see Algorithm 3), the new query route is delivered to the prediction maker in order to provide a new route prediction, as Fig. 2 shows. Apart from $R^{q}_{abs}$, this module also takes as input the multigraph $G_{stat}$.

On the basis of these two elements, the prediction maker focuses on forecasting the MSA containing the final destination and the next MTA to be covered by the ongoing route. In order to provide the prediction with an adjustable reliability, the system also considers a domain-dependent parameter minProb that defines the minimum probability for a prediction to be considered a suitable outcome.

In a nutshell, the proposed solution searches $G_{stat}$ for the routes that fit $R^{q}_{abs}$. Next, the final-MSA and next-MTA prediction is made by using the selected routes' edges and the current bearing of $R^{q}_{abs}$. As Algorithm 4 shows, the prediction method can be divided into three steps.

First of all, the real-time and incremental approach for MVA detection might give rise to the situation where $R^{q}_{abs}$ comprises some MVAs that have not been included in $G_{stat}$ yet. This is because new MVAs are reflected in $G_{stat}$ only when the route comprising them is completed (see Algorithm 3). This will cause a mismatch between $R^{q}_{abs}$ and $G_{stat}$ even though the underlying route has actually been covered before. Figure 10 illustrates this situation. As we can see, the ongoing route $R^{q}$ is quite similar to the historical routes R3 and R4 (already included in $G_{stat}$). Since the MinPoints parameter of Algorithm 2 was set to 3, it gives rise to a new MTA (MTA5) and, hence, its abstraction $R^{q}_{abs}$ = {MSA1 → MTA5 → MTA4} does not fit any previous (historical) route.

Fig. 10 Example where the query route $R^{q}$ gives rise to a new MTA (MTA5) that causes an artificial mismatch with its two most similar historical routes (R3, R4)

To solve this issue, the prediction algorithm only takes into account the MVAs of $R^{q}_{abs}$ that are already contained in $G_{stat}$. Therefore, the first step of the prediction algorithm is to remove from the incoming query route those MVAs that have been detected during the ongoing route's lifetime (line 1 of Algorithm 4). Going back to our example, this removes MTA5 from $R^{q}_{abs}$, yielding $R^{q'}_{abs}$ = {MSA1 → MTA4}. This first removal step allows the model generation and the route prediction phases to coexist in the same processing loop.
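As a minimal sketch of this first step, the removal can be expressed as an order-preserving filter against the vertices of the multigraph. The function name mirrors `remove_new_MVAs` of Algorithm 4, but the signature and the set-based representation of the vertices are assumptions made for illustration.

```python
from typing import List, Set


def remove_new_mvas(query_route: List[str], known_vertices: Set[str]) -> List[str]:
    """Keep, in order, only the MVAs that are already vertices of the multigraph."""
    return [mva for mva in query_route if mva in known_vertices]


# Vertices of the multigraph of Fig. 8; MTA5 has just been discovered and is not yet in it.
known = {"MSA1", "MSA2", "MSA3", "MTA1", "MTA2", "MTA3", "MTA4"}
adapted = remove_new_mvas(["MSA1", "MTA5", "MTA4"], known)
```

With the query route of Fig. 10, `adapted` keeps MSA1 and MTA4 and drops MTA5.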

Once $R^{q}_{abs}$ has been adapted to $G_{stat}$, the algorithm detects the historical routes that best fit the query route. Since each historical route is reflected in $G_{stat}$ as a sequence of exclusive edges, this detection is done by searching for the maximum set of edges that connect, in the same order, the MVAs of $R^{q'}_{abs}$ in $G_{stat}$. This task is carried out by the function select_edges (lines 15–35 of Algorithm 4). This function incrementally intersects the outbound edges of all the MVAs in $R^{q'}_{abs}$ (lines 16–22) so that, in each iteration of the loop, the function only retains the set $E_{res}$ of candidate edges that connect the MVAs in the specified order.

However, due to the data sparsity problem, the query route might not match any historical route, in which case the aforementioned process leads to an empty set. In order to cope with this problem and provide an alternative set of candidate edges, we have defined a lightweight heuristic. This heuristic is based on the intuition that a route's origin largely determines its potential destinations. For instance, if a person usually goes to the shopping mall or to the gym after work, and we know that this person has departed from the office, we could infer that the final destination will probably be either the shopping mall or the gym, even though he or she is actually taking a new route to reach them.

The aforementioned heuristic is included in the select_edges function as a special case (lines 23–29 of Algorithm 4). In this case, the outbound-edge intersection is restricted to the first and last MVAs of $R^{q'}_{abs}$. Hence, only the first MVA of $R^{q}_{abs}$ and the most recent MTA are taken into account, skipping the intermediate points of the route.
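The edge selection and its sparsity fallback can be sketched as follows. The sketch rests on an interpretation that should be read as an assumption: edges leaving different vertices are intersected through the route identifier they carry, and the returned candidates are the edges leaving the last MVA of the query route. The bearing refinement is left out, and the names `Edge`, `build_graph` and `route_ids` are hypothetical. The sample data reproduce the four routes of Fig. 8 with the frequencies quoted in the text.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Edge(NamedTuple):
    source: str
    target: str
    route_id: int
    frequency: int
    destination: str


Graph = Dict[str, List[Edge]]  # MVA name -> outbound edges


def build_graph(routes: Dict[int, Tuple[List[str], int]]) -> Graph:
    """Encode each route as a chain of edges labelled id:frequency:destination."""
    graph: Graph = {}
    for route_id, (mvas, freq) in routes.items():
        for src, dst in zip(mvas, mvas[1:]):
            graph.setdefault(src, []).append(Edge(src, dst, route_id, freq, mvas[-1]))
    return graph


def route_ids(graph: Graph, mva: str) -> Set[int]:
    """Identifiers of the routes that have an edge leaving the given MVA."""
    return {e.route_id for e in graph.get(mva, [])}


def select_edges(route: List[str], graph: Graph) -> List[Edge]:
    """Candidate edges leaving the last MVA of the (already adapted) query route."""
    if not route:
        return []
    first, last = route[0], route[-1]
    ids = route_ids(graph, first)
    for mva in route[1:]:
        ids &= route_ids(graph, mva)  # routes leaving every MVA seen so far
        if not ids:
            # Sparsity fallback: rely only on the first and last MVAs.
            ids = route_ids(graph, first) & route_ids(graph, last)
            break
    return [e for e in graph.get(last, []) if e.route_id in ids]


FIG8 = {
    1: (["MSA1", "MTA1", "MTA2", "MTA3", "MSA2"], 8),
    2: (["MSA3", "MTA2", "MTA4", "MSA1"], 12),
    3: (["MSA1", "MTA4", "MTA3", "MTA2", "MSA3"], 18),
    4: (["MSA1", "MTA4", "MTA3", "MTA2", "MSA1"], 16),
}
graph = build_graph(FIG8)
exact = select_edges(["MSA1", "MTA4"], graph)
fallback = select_edges(["MSA1", "MTA1", "MTA4"], graph)
```

For the query {MSA1 → MTA4}, `exact` holds the R3 and R4 edges leaving MTA4. The second query has no historical route covering its three MVAs, so the fallback reduces it to its first and last MVAs and yields the same two edges.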

Previous heuristics to deal with the data sparsity problem are not completely suitable in this scenario (Chen et al. 2011). Since they are based on sub-trajectories of consecutive elements, they are not prepared to process routes that are still in their convergence period and whose abstractions, hence, might change in each new repetition (which, in turn, will change their intermediate sub-trajectories). On the contrary, the adopted heuristic only relies on the origin and last MVA of the query route, discarding any intermediate sub-trajectory. By reducing the number of required MVAs, the impact of new MVAs in the abstraction is minimized. However, for a person with highly dynamic movement, who has many different routes departing from the same MSA, the accuracy of the heuristic might decrease. Lastly, it should also be remarked that this heuristic benefits from the multigraph structure of $G_{stat}$, where each MVA is represented as a single vertex, which makes it possible to detect the common routes of any pair of MVAs rapidly.

Algorithm 4: The PRoPTurn prediction method
Input:
– Query route $R^{q}_{abs}$
– multigraph $G_{stat}$
– minimum probability minProb
Output:
– $msa_{dest}$ containing the possible route's destination
– $mta_{next}$ comprising the potential next meaningful route's turn
 1 $R^{q'}_{abs}$ ← remove_new_MVAs($R^{q}_{abs}$)
 2 $E$ ← select_edges($R^{q'}_{abs}$, $G_{stat}$)
 3 $msa_{dest}$ ← ∅
 4 foreach $m$ ∈ MSAS($E$) do
 5     $prob_m \leftarrow \frac{\sum_{e \in E \mid e.MSA = m} e.freq}{\sum_{e \in E} e.freq}$
 6     if $prob_m$ ≥ minProb then
 7         $msa_{dest}$ ← $msa_{dest}$ ∪ $m$
 8 $mta_{next}$ ← ∅
 9 foreach $t$ ∈ ENDING_VERTICES($E$) do
10     $prob_t \leftarrow \frac{\sum_{e \in E \mid e.end\_vertice = t} e.freq}{\sum_{e \in E} e.freq}$
11     if $prob_t$ ≥ minProb then
12         $mta_{next}$ ← $mta_{next}$ ∪ $t$
13 return $msa_{dest}$, $mta_{next}$
14
15 function select_edges($R^{q'}_{abs}$, $G_{stat}$)
16     mva ← $R^{q'}_{abs}$.get(0)
17     $E_{res}$ ← get_outbound_edges($G_{stat}$, mva)
18     i ← 1
19     while i < $R^{q'}_{abs}$.length do
20         mva ← $R^{q'}_{abs}$.get(i)
21         $E_m$ ← get_outbound_edges($G_{stat}$, mva)
22         $E_{res}$ ← $E_{res}$ ∩ $E_m$
23         if $E_{res}$ = ∅ then
24             $mva_i$ ← $R^{q'}_{abs}$.get(0)
25             $mva_e$ ← $R^{q'}_{abs}$.lastMVA
26             $E_i$ ← get_outbound_edges($G_{stat}$, $mva_i$)
27             $E_e$ ← get_outbound_edges($G_{stat}$, $mva_e$)
28             $E_{res}$ ← $E_i$ ∩ $E_e$
29             return $E_{res}$
30         i++
31     foreach $e$ ∈ $E_{res}$ do
           /* the function returns the MVA at the end of edge e in Gstat */
32         mva ← get_ending_vertice($G_{stat}$, $e$)
33         if bearing($R^{q'}_{abs}$.lastMVA, mva) ∉ [$R^{q}_{abs}$.current_bear ± $\nabla_{dir}$] then
34             $E_{res}$ ← $E_{res}$ − $e$
35     return $E_{res}$

Next, the resulting edges $E_{res}$ and, thus, the potential prediction candidates are refined by considering the current bearing of the ongoing route (lines 31–34 of Algorithm 4). The rationale behind this refinement is that a route moves from one turn to the next one by following a straight movement. Consequently, since the current bearing of the ongoing route $R^{q}$ is known (see Algorithm 3), the prediction outcome should be consistent with that value. To be specific, the predicted next turn (MTA) should be reached by following a bearing similar to the current one (± $\nabla_{dir}$). Therefore, those edges associated with MTAs that would be reached only if a completely different direction were followed are discarded from the prediction process.

Figure 11 depicts an illustrative example of the aforementioned edge refinement. Given the route $R^{q}_{abs}$ = {MSA12 → MTA10 → MTA18}, suppose that four potential MTAs emerge as candidate predictions according to the frequency information in $G_{stat}$, {MTA1, MTA2, MTA5, MTA9}. However, only MTA1 and MTA9 are actually compatible with the current direction of the ongoing route (200°). This causes the other two MTAs to be filtered out as prediction candidates.

[Figure 11: the current bearing of $R^{q}$ is 200° (SSW), so the accepted window spans from 155° (current bearing − $\nabla_{dir}$) to 245° (current bearing + $\nabla_{dir}$); bearing(MTA18, MTA9) = 230° and bearing(MTA18, MTA1) = 190°]

Fig. 11 Example where two MTAs, MTA2 and MTA5, are discarded as potential next-turn predictions as they are not in the same direction as the current bearing of the ongoing route
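The bearing check of this refinement has to cope with the wrap-around at 0°/360°, which the following sketch makes explicit. The function names are hypothetical. The bearings towards MTA9 and MTA1 are the ones reported in Fig. 11, whereas those towards MTA2 and MTA5 are invented here purely for illustration.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def within_bearing(candidate: float, current: float, tolerance: float) -> bool:
    """True when the candidate bearing lies in [current - tolerance, current + tolerance]."""
    return angular_difference(candidate, current) <= tolerance


current_bearing, tolerance = 200.0, 45.0
# Bearing from MTA18 to each candidate (MTA2 and MTA5 values are made up).
bearings = {"MTA9": 230.0, "MTA1": 190.0, "MTA2": 100.0, "MTA5": 300.0}
kept = sorted(name for name, b in bearings.items()
              if within_bearing(b, current_bearing, tolerance))
```

Comparing angular differences instead of raw interval bounds keeps the test valid when the window straddles north, for example a current bearing of 350° with a tolerance of 45°.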


Keeping on with our exemplary query route ($R^{q'}_{abs}$ = {MSA1 → MTA4}) and given the $G_{stat}$ depicted in Fig. 8, select_edges would return the set of edges $E_{res}$ = {3:18:MSA3, 4:16:MSA1}, as R3 and R4 are the only two routes of $G_{stat}$ that connect MSA1 and MTA4 in the same order as $R^{q}$.

Finally, lines 3–13 comprise the third and last step of the prediction method, which is responsible for actually generating the prediction outcome. This is done by exclusively using the set of edges $E$ from the previous step.

For the ending MSA prediction (lines 3–7), the method makes use of the edge attribute indicating the destination MSA of the underlying route. Thus, the method first calculates, for each destination MSA $m$ in $E$ (MSAS($E$)), its probability $prob_m$ of being the final destination of the ongoing route. This probability is calculated on the basis of the frequency attribute of the edges in $E$ (line 5). Lastly, the parameter minProb is used to filter out those regions whose probability is too low to be considered a suitable outcome.

As for the next MTA prediction (lines 8–12), a similar approach is applied. In this case, the candidate MTAs are the ending vertices of the edges in $E$ (ENDING_VERTICES($E$)). These MTAs are the regions visited by the user just after covering the sequence in $R^{q'}_{abs}$ according to the historical routes in $E$.

Lastly, the method returns the two sets ($msa_{dest}$ for the final MSA and $mta_{next}$ for the next MTA) as its prediction results. Note that both sets may comprise a variable number of elements in each execution, depending on the probability of each candidate MSA or MTA.
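The frequency-based probabilities of lines 3–13 of Algorithm 4 can be sketched in a few lines of Python. The `Edge` type, the `predict` function and the sample edges (MTA7, MTA8, MSA5, MSA6 and their frequencies) are hypothetical and only serve to show the computation.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set, Tuple


class Edge(NamedTuple):
    target: str        # ending vertex of the edge (candidate next MTA)
    frequency: int     # times the underlying route has been covered
    destination: str   # destination MSA of the underlying route


def predict(edges: List[Edge], min_prob: float) -> Tuple[Set[str], Set[str]]:
    """Return (msa_dest, mta_next): candidates whose probability reaches min_prob."""
    total = sum(e.frequency for e in edges)
    if total == 0:
        return set(), set()
    by_dest, by_target = defaultdict(int), defaultdict(int)
    for e in edges:
        by_dest[e.destination] += e.frequency   # numerator of line 5
        by_target[e.target] += e.frequency      # numerator of line 10
    msa_dest = {m for m, f in by_dest.items() if f / total >= min_prob}
    mta_next = {t for t, f in by_target.items() if f / total >= min_prob}
    return msa_dest, mta_next


# Made-up candidate edges: MSA5 gathers 40 of the 50 traversals, MTA7 also 40 of 50.
sample = [Edge("MTA7", 30, "MSA5"), Edge("MTA7", 10, "MSA6"), Edge("MTA8", 10, "MSA5")]
moderate = predict(sample, 0.6)
strict = predict(sample, 0.9)
```

With a threshold of 0.6 only MSA5 and MTA7 (probability 0.8 each) survive, while a threshold of 0.9 leaves both sets empty, which is how minProb trades coverage for reliability.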

Returning to our exemplary scenario, the system would extract the sets MSAS($E$) = {MSA3, MSA1} and ENDING_VERTICES($E$) = {MTA3} from the edge set $E_{res}$. On the basis of these two sets, the system would forecast that, after MTA4, the next meaningful turn will be in MTA3 with probability 1.0 ($\frac{18+16}{18+16}$), whereas the final MSA will be either MSA3 with probability 0.53 ($\frac{18}{16+18}$) or MSA1 with probability 0.47 ($\frac{16}{16+18}$). Provided that minProb was set to 0.5, the final prediction of the system would be $msa_{dest}$ = {MSA3} and $mta_{next}$ = {MTA3}.

Finally, the two sets comprising the route prediction are delivered back to the mobile device. As Fig. 2 shows, they are processed by the prediction handler. This module is in charge of sharing that information with other local or remote location-based services interested in forecasted movement information. Nevertheless, these services are out of the scope of the present work.

To sum up, the prediction approach proposed by PRoPTurn introduces two key improvements.

– Unlike previous prediction solutions that rely on fixed and previously generated models, the proposed method fosters an alternative online approach where the model evolution and the prediction inference are integrated in a single loop.

– Secondly, the system copes with the data sparsity problem by means of a heuristic that only takes into account the first and last MVAs of the query route in case of a route mismatch. Unlike current solutions based on sub-trajectory predictions, the adopted solution is better suited to environments with evolving route abstractions.


Fig. 12 Example of PRoPTurn privacy protection. (a) Movement of TPs and SPs before being sent to the server. (b) Example of decoding an MVA location from the server before being delivered to the prediction handler

2.6 Privacy protection mechanism

When it comes to dealing with location data, users are usually quite sensitive about possible privacy leaks that can expose such information to undesired third parties. Therefore, it is paramount to develop solutions to counteract these potential safety threats. In that sense, existing solutions usually follow two courses of action: (1) applying a noise function to perturb the location dataset (Pham et al. 2010) or (2) carefully deleting certain points of the dataset to hide sensitive areas (Xue et al. 2013).

In PRoPTurn, we have developed a lightweight privacy mechanism that runs solely on the mobile device side. Basically, the mechanism, supported by the privacy provider component, encodes the SPs and TPs before they are sent to the server and decodes the MVAs emitted by the server as prediction results. Consequently, the aforementioned solutions based on the deletion of specific data points are not feasible, as all the detected SPs and TPs are required for the subsequent MVA composition.

To be specific, the proposed solution comprises three steps: (1) take the location of each SP, TP or MVA; (2) move that location a certain distance δ in a certain direction β (in case of an SP or TP) or −β (in case of a received MVA); and (3) replace the original SP/TP/MVA location with the new shifted one. The latter will be the location transmitted to the server or processed by the prediction handler in the mobile device. An example of this process is depicted in Fig. 12.

This way, if a privacy leak occurred during the transmission or directly in the server, the uncovered locations would not be the original ones but the shifted ones.

An instrumental aspect of this solution is that the δ and β parameters must always take the same values for a given user. This way, all her locations are shifted in the same way so that their relative distances remain the same. This is quite important for the MVA discovery algorithm.
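The shift and its inverse can be sketched as a rigid translation. The sketch assumes planar coordinates in metres and a bearing β measured clockwise from north, and it reads the "−β" of the decoding step as moving the same distance in the opposite direction. These choices, as well as the function names, are assumptions made for illustration and are not stated in the original description.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates in metres (assumed)


def _offset(delta: float, beta_deg: float) -> Point:
    """Displacement of length delta along bearing beta (degrees clockwise from north)."""
    beta = math.radians(beta_deg)
    return delta * math.sin(beta), delta * math.cos(beta)


def encode(point: Point, delta: float, beta_deg: float) -> Point:
    """Shift an SP/TP location before it leaves the mobile device."""
    dx, dy = _offset(delta, beta_deg)
    return point[0] + dx, point[1] + dy


def decode(point: Point, delta: float, beta_deg: float) -> Point:
    """Undo the shift on an MVA location received from the server."""
    dx, dy = _offset(delta, beta_deg)
    return point[0] - dx, point[1] - dy
```

Because every location of a user receives the same displacement vector, distances and bearings between her points are unchanged, which is the property the MVA discovery and the bearing-based refinement depend on.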

Therefore, δ and β are randomly generated when the mobile device side is executed for the first time and stored for subsequent executions. For that purpose, certain device parameters like its MAC address or a timestamp could be used. This way, each single user will have a different ⟨δ, β⟩ tuple stored in her personal mobile device.

Table 2 GL dataset general information

| Users (total) | Locations (total) | Routes (total) | Time period (total) | Locations (per route) | Time length (per route) |
|---|---|---|---|---|---|
| 20 | 4259546 | 10606 | 2007-04-12 → 2012-07-27 | 418 | ∼27′ |

Finally, this approach has certain similarities with the privacy mechanism described in (Pham et al. 2010), as it also proposes a perturbation of the location data based on position displacement. However, in that approach, the new locations in the resulting dataset do not preserve their relative distances and positions. Since this is an important requirement of the system in order to provide bearing-based predictions (see Algorithm 4), that mechanism would not be completely feasible in our framework.

In addition, since the solution only involves basic mathematics, it is suitable for computationally constrained scenarios such as handheld devices.

3 Experimental results

In order to provide a comprehensive view of our proposal, we evaluated PRoPTurn on a real-world and a synthetic dataset. Besides, we compared our proposal with a well-known predictive approach in the mobility mining domain.

3.1 Experiment setup

Datasets In order to test PRoPTurn, two different datasets have been used.

– The GeoLife dataset (GL) (Zheng et al. 2010), a public collection of human trajectories produced by 178 users carrying different GPS feeds over a period of more than three years. From this dataset, we extracted a representative subset of the 20 users with the most locations. The resulting dataset, whose general information is shown in Table 2, was the one used for the present evaluation. More information about the target users can be found in Appendix 2. Figure 13 depicts the trace of the GL dataset. This subset falls into the square of latitude 40.17–39.86 and longitude 116.14–116.34, covering a large area of Beijing (China).

– We also used the Brinkhoff simulator (BK) to generate a collection of synthetic trajectories on the road map of Oldenburg (Germany). This map took the form of a graph comprising 6105 nodes and 7035 edges. At each time step, each moving object generated one location by using the default time and distance units of the generator. As a result, a dataset comprising 10074 different trajectories with 154000 locations was generated.

Settings The PRoPTurn evaluation was conducted on a PC running Ubuntu 12.04 with 2 GiB of memory, an Intel(R) Core 2 at 2.66 GHz and Java Runtime Environment 7.0 (JRE 7) with 1.5 GiB of allocated memory. Furthermore, Table 3 summarizes the default configuration of PRoPTurn for the evaluation.

Fig. 13 Digital trace of the GL dataset

Table 3 PRoPTurn default configuration

| Parameter | PRoPTurn module | Value |
|---|---|---|
| $d_{min}$, $d_{max}$ | GPS data cleaner | 100 m, 800 m |
| $\nabla_{dir}$ | TP detector | 45° |
| $T_{st}$ | SP detector | 5′ |
| Eps, minPoints | MVA detector | 300 m, 15 |
| minPointsToMerge | MVA detector | 7 |
| minProb | Prediction maker | 0.9 |

Reference approach PRoPTurn has been compared with the probabilistic path predictor R2–D2 (Zhou et al. 2013). Based on a grid-based route abstraction, the foremost novelty of that work is a mechanism to extract on the fly the historical trajectories suitable to predict the future movement of an object of interest. Like our approach, this mechanism intends to overcome the drawbacks of offline solutions where the historical trajectories are abandoned once the models or patterns have been generated. Table 4 shows the setting of the key parameters of R2–D2 for both datasets. The selected configurations were the ones that provided the best results for each dataset.

Table 4 Reference approach configuration

| Parameter | Description | Value for GL | Value for BK |
|---|---|---|---|
| θ | Confidence threshold | 0.2 | 0.2 |
| $d_{mic}$ | Cell width | √2 × 3 | √2 |
| α | Ccore function | 1/16 | 1/16 |
| h | Backward steps | 3 | 3 |

Measurements The evaluation of the framework has been carried out in the light of two different measurements, the prediction rate (PR) and the distance error (DE). PR counts the number of query routes for which at least one tuple ⟨$mta_{next}$, $msa_{dest}$⟩ is provided as prediction. By means of this factor, we intend to measure the coverage of the proposal. It should be made clear that PR counts the predictions for each version of the query route (whenever a new element is appended to its sequence). Therefore, PR can be defined by means of the following formula,

$$PR = \frac{\#\, R^{q} \text{ with prediction}}{\#\, R^{q}}$$

DE is the average of all distance deviations across each prediction of all query routes. This measure indicates how far PRoPTurn deviates from the true next MTA or final MSA. In order to measure the distance between two MVAs, a representative location for each MVA $m$ is calculated as the averaged point of all its centroids, $\zeta(m)$. Thus, DE is calculated by the following formula,

$$DE = \frac{\mathrm{dist}(\zeta(mta^{pred}_{next}), \zeta(mta^{real}_{next})) + \mathrm{dist}(\zeta(msa^{pred}_{dest}), \zeta(msa^{real}_{dest}))}{2}$$

where $mta^{pred}_{next}$ and $msa^{pred}_{dest}$ are the predictor outcome, $mta^{real}_{next}$ and $msa^{real}_{dest}$ are the actual next turn and destination of the ongoing route, and $\mathrm{dist}$ is the Euclidean distance. If the system provides more than one $mta^{pred}_{next}$ or $msa^{pred}_{dest}$, the distance is calculated as the average over all the predicted MVAs with respect to the actual one.
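Both measurements can be computed with a few helper functions. The following sketch assumes planar coordinates and represents each MVA by the list of its centroids. All function names are hypothetical, and the per-prediction DE value would still have to be averaged over every prediction of every query route.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

Point = Tuple[float, float]


def prediction_rate(has_prediction: Sequence[bool]) -> float:
    """PR: share of query-route versions for which a prediction was produced."""
    return sum(has_prediction) / len(has_prediction)


def representative(centroids: Sequence[Point]) -> Point:
    """zeta(m): averaged point of all the centroids of an MVA."""
    return mean(x for x, _ in centroids), mean(y for _, y in centroids)


def avg_distance(predicted: Sequence[Point], actual: Point) -> float:
    """Average Euclidean distance from each predicted MVA to the actual one."""
    return mean(math.dist(p, actual) for p in predicted)


def distance_error(pred_mtas: Sequence[Point], real_mta: Point,
                   pred_msas: Sequence[Point], real_msa: Point) -> float:
    """DE of one prediction; arguments are representative locations zeta(m)."""
    return (avg_distance(pred_mtas, real_mta) + avg_distance(pred_msas, real_msa)) / 2
```

For instance, a predicted next MTA located 5 units away from the real one, together with two predicted MSAs each 1 unit away from the real destination, gives a DE of (5 + 1) / 2 = 3 units.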

3.2 Evaluation of the ST-TP detection mechanism

One of the key reasons to detect SPs and TPs locally on the mobile side was to reduce the load of data sent to the server side. Therefore, we first decided to evaluate the effectiveness of such an approach. To do so, we defined three different policies for the mobile side to send data to the server, each one with a different level of local processing, namely

– P1: all the filtered GPS events from the GPS data cleaner module are directly sent to the central server (the SP and TP detection would be done on the server).

– P2: all the SPs and TPs are sent to the central server without a previous local microclustering (the TP aggregation would be done on the server).

– P3: all the SPs and the aggregated TPs are sent to the server. This last policy is the actual operational mode of PRoPTurn.


Table 5 Number of points sent to the server with different policies

| Policy | GL # sent points | BK # sent points | Avg. reduction |
|---|---|---|---|
| P1 | 2506053 (−41.1 %) | 143823 (−6 %) | −21.5 % |
| P2 | 180269 (−95.7 %) | 30376 (−80.2 %) | −87.9 % |
| P3 | 51803 (−98.7 %) | 12024 (−92.2 %) | −95.4 % |

Table 5 shows the number of points sent to the server from a mobile phone under the aforementioned policies and the reduction with respect to the total number of locations of the GL (4,259,546 points) and BK (154,000 points) datasets.

From the results, we can see that by only emitting the SPs and TPs (policy P2) we can reduce the traffic load by over 87 % with respect to the raw data. However, if we also aggregate the TPs locally on the mobile side (policy P3), the gain is even greater, achieving a 95 % reduction of the emitted data. In that sense, results are better on the GL dataset for two reasons. Firstly, the simulated data comprises fewer outliers, so GPS filtering discards fewer incoming locations. Secondly, since BK objects are constrained by a road map, consecutive and close TPs are less likely to occur and, thus, to fire the microclustering procedure.

All in all, this shows that the local processing performed by PRoPTurn can remarkably reduce the communication between the mobile side and the back-end server.

3.3 Effect of MVA size

One of the factors that most affects PRoPTurn performance is the spatial size of the MVAs. This size is mainly defined by the Eps and minPoints parameters of the MVA discovery algorithm. Figure 14 depicts the prediction rate (PR) and the distance error of the proposal for different Eps×minPoints configurations: 100 × 5, 300 × 15, 600 × 30, 1000 × 50, 2000 × 100 and 3000 × 150. As we can see, increasing the MVA size leads to a decrease in the prediction accuracy of the next MTA and the final MSA. This is particularly noticeable in the distance error. This is because, if we increase the size of an MVA (the spatial region it covers), the distance between its representative location and the actual location of the next meaningful turn or final destination will inevitably increase. Apart from that, increasing minPoints means that a particular region (crossroads, roundabout, corner, and the like) needs to be covered more times to give rise to an MVA. Hence, the convergence period becomes longer, as more repetitions of a route are required for all its SPs and TPs to be mapped to MVAs. This eventually leads to a decrease in the prediction rate due to the instability of the system during that period.

Figure 14 also shows that the 100 × 5 and 300 × 15 configurations provide quite similar prediction accuracy. Consequently, in order to obtain more conclusive results, we decided to study the prediction length of the configurations. Figure 15a shows the distance between the current location of the user and the next MTA forecasted by PRoPTurn. This length determines how far ahead the system can anticipate the user's movement. In that sense, the 300 × 15 configuration substantially improves on the 100 × 5 one for both GL and BK datasets. This is because, as Fig. 15b depicts, the 300 × 15 configuration generates fewer MVAs per user and, hence, the average distance between them is longer. This, in turn, also causes a longer prediction length.

Fig. 14 Effect of the MVA size on the prediction accuracy for both GL and BK datasets. (a) Impact on the GL dataset (x axis: area size in meters; y axes: PR and DE in meters). (b) Impact on the BK dataset (x axis: area size in units; y axes: PR and DE in units)

Fig. 15 (a) Length of the prediction given different MVA sizes (x axis: area size, GL meters | BK units; y axis: prediction length, GL meters | BK units). (b) Number of MVAs (MTAs + MSAs) and average distance among them for different sizes (x axis: area size, GL meters | BK units; y axes: number of MVAs and distance among MVAs, GL km | BK kunits)

Table 6 PRoPTurn time latency for different Eps×minPoints configurations

| Eps×minPoints | 100×5 | 300×15 | 600×30 | 1000×50 | 2000×100 | 3000×150 |
|---|---|---|---|---|---|---|
| Time latency | 1.81 ms | 1.42 ms | 1.25 ms | 1.23 ms | 1.15 ms | 1.10 ms |

Finally, we also studied the effect of the MVA size on the prediction time latency of the system. Table 6 shows the average latency over both datasets for the target Eps×minPoints configurations. The inverse correlation between the MVA size and the time latency arises because a larger area size involves a smaller number of MVAs (as Fig. 15b depicts). This number is a key factor in the performance of the MVA discovery algorithm: the more MVAs under consideration, the more time is required to map an SP/TP to the most suitable MVA.
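The dependence of the latency on the number of MVAs can be illustrated with a short Python sketch. It assumes, purely for illustration, that an SP/TP is mapped by a linear scan over the MVA centroids and that the closest centroid within Eps wins; the function `map_to_mva` and the sample coordinates are hypothetical and the paper does not state that the mapping is implemented this way.

```python
import math

def map_to_mva(point, centroids, eps):
    """Return the index of the closest centroid within `eps` meters,
    or None if no centroid is close enough.
    The loop visits every centroid, so the cost grows linearly with their number."""
    best_idx, best_dist = None, eps
    for idx, centroid in enumerate(centroids):
        d = math.dist(point, centroid)
        if d <= best_dist:
            best_idx, best_dist = idx, d
    return best_idx

# Hypothetical MVA centroids (x, y) in meters.
centroids = [(0.0, 0.0), (500.0, 0.0), (0.0, 800.0)]

print(map_to_mva((480.0, 30.0), centroids, eps=300))     # 1
print(map_to_mva((2000.0, 2000.0), centroids, eps=300))  # None
```

With such a scan, halving the number of centroids roughly halves the work per incoming point, which would be consistent with the trend in Table 6. A spatial index could reduce this cost, but that is outside what the paper describes.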


Fig. 16 MVAs generated by PRoPTurn on GL dataset. a MSAs’ centroids. b MTAs’ centroids

All in all, the Eps×minPoints configuration selected for the remainder of the evaluation was 300×15, as it provided the best trade-off between detection rate, distance error, prediction range and time latency.

For the sake of completeness, since the GL dataset comprises real routes, the centroids of the MVAs generated by the selected configuration for this dataset are depicted in Fig. 16. The MSA centroids (Fig. 16a) are organized in two major areas, one at the north-west of the region under study and the other as a long line at the center of the city. The former comprises different colleges and student accommodations, whereas the latter includes long avenues in the city center with a varied range of public facilities. In both cases, these regions are consistent with the fact that they represent potential origins and destinations of everyday trips. Regarding MTAs, their centroids (Fig. 16b) are more spread across the region under study and cover many intersections of the road network. This is consistent with the MTA meaning given in Definition 3.

3.4 Effect of the convergence time

Another key aspect of the PRoPTurn evaluation was the effect of the convergence period on the prediction process. Figure 17 summarizes the results achieved for both the GL and BK datasets. Whilst the bottom part shows the percentage of TPs and SPs that were mapped to an MVA as PRoPTurn processed the dataset of each user, the upper part correlates this evolution with the PR and DE of the system.

As we can see, the system needed, on average, to process 30 % of each user’s dataset to reach a stable percentage of mapped SPs and TPs (around 92 %). This is also the point where the PR reached its convergence point (0.62), as its variation for the rest of the experiment was only ±0.5.

This fast convergence is mainly due to two factors. On the one hand, the low value assigned to minPoints makes the early detection of MVAs possible, as only 15 close TPs or SPs are required to generate an MVA. On the other hand, the heuristic applied in case of a query route mismatch makes it possible to compose a prediction outcome by using only two MVAs. This allows the system to provide an early prediction even when only a few MVAs have been generated.

Fig. 17 Study of the convergence-period impact on the prediction accuracy. Upper part: PR and DE against the percentage of the dataset processed. Bottom part: percentage of TPs and SPs mapped to an MVA against the percentage of the dataset processed

Table 7 Comparison of the prediction performance with and without considering the current bearing of the ongoing route

Prediction method   GL PR          GL DE (m)       BK PR          BK DE (units)
No bearing-based    0.80           378.4           0.81           152.4
Bearing-based       0.60 (−25 %)   277.1 (−17 %)   0.68 (−22 %)   120.4 (−21 %)

The variation of the bearing-based prediction with respect to the no-bearing-based one is shown in brackets

Finally, as the experiment proceeded, the number of MVAs increased, which, in turn, caused a steady decrease of the distance error.

3.5 Evaluation of the bearing-based prediction

An important step in the prediction mechanism discards the candidate MVAs that cannot be reached by following the current bearing of the ongoing route (see Sect. 2.5). Consequently, in this section we evaluate whether taking this feature into account actually helps to provide better prediction outcomes. This can be done by simply enabling or disabling the last loop in the select_routes function of the prediction mechanism (lines 31–34 of Algorithm 4). Table 7 shows the PR and DE of PRoPTurn when the Rq current bearing is or is not taken into account given different prediction lengths.

Since considering the bearing imposes more requirements on the Gstat edges to be evaluated, in many situations the system discards all the edges initially selected. As a result, Table 7 shows that the PR is reduced by at least 22 % when the bearing-based prediction is activated. However, this requirement makes it possible to select a more promising set of candidate edges (and, thus, MVAs). Therefore, the results also show that considering the bearing reduces the DE by a meaningful factor.
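The bearing-based filtering can be sketched in Python as follows. This is only an illustration under stated assumptions: planar coordinates, a hypothetical tolerance `max_diff` of 45 degrees and hypothetical helper names (`bearing_deg`, `angular_diff`, `filter_by_bearing`). It does not reproduce lines 31–34 of Algorithm 4.

```python
import math

def bearing_deg(origin, target):
    """Planar bearing from origin to target in degrees, clockwise from north (+y)."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def angular_diff(a, b):
    """Smallest absolute difference between two bearings, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def filter_by_bearing(position, heading, candidates, max_diff=45.0):
    """Keep the candidates whose direction from `position` deviates
    at most `max_diff` degrees from the current heading."""
    return [name for name, loc in candidates.items()
            if angular_diff(bearing_deg(position, loc), heading) <= max_diff]

# Hypothetical candidate MVAs around the current position (0, 0).
candidates = {"north_junction": (0, 500),
              "east_junction": (400, 0),
              "south_junction": (0, -300)}

print(filter_by_bearing((0, 0), 10.0, candidates))   # ['north_junction']
print(filter_by_bearing((0, 0), 270.0, candidates))  # []
```

The second call shows the situation described above: when no candidate lies along the current bearing, every candidate is discarded and no prediction can be made, which lowers the PR while the surviving predictions tend to be closer to the actual outcome.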

Finally, it is important to note that the system achieved better results with the BK dataset. This is mainly because the movement of the objects in that scenario is constrained by a road network. To be specific, when a moving object drives across a road-map node (previously detected as an MTA) from which several segments depart, taking into account the current bearing of the object makes it possible to detect which outbound segment the object is taking and, thus, to predict the ending node of that segment as the next turn more accurately.

Fig. 18 Comparison with the reference framework (PRoPTurn vs. R2–D2) for minimum prediction lengths >50, >100, >150, >200 and >250. a PR on GL dataset. b PR on BK dataset. c DE (m) on GL dataset. d DE (units) on BK dataset

3.6 Comparison with the reference approach

The last evaluated point was a comparison with the R2–D2 predictor. Since R2–D2 only predicts future locations but not the final destination of the query route, only PRoPTurn’s next-MTA predictions were considered so as to properly compare both frameworks. Figure 18 shows the main results of this study with respect to different minimum prediction-length settings.

To begin with, Fig. 18a, b depicts the PR of both approaches for the GL and BK datasets. Regarding this measurement, our solution is slightly outperformed by R2–D2 for short prediction lengths (>50 m/units). However, it clearly outperformed R2–D2 for long lengths. This improvement becomes clearer as the minimum prediction length grows. As a matter of fact, when the minimum prediction length is set to 150 m or more in the BK scenario, R2–D2 was able to give predictions only 21 % of the time, whereas our proposal provided predictions 60 % of the time.

The reason why PRoPTurn works better than R2–D2 for long prediction distances lies in the type of ground truth each approach relies on. R2–D2’s predictions are directly based on ongoing routes of other surrounding moving objects that are similar to the query route. This makes it easy to compose short-term predictions. However, in the long term, the external ongoing routes used to compose short-term predictions move towards quite different destinations and thus start to diverge from one another. Therefore, if the required prediction length is increased, it becomes more difficult to find a proper set of ongoing routes that remain similar throughout a long path. As a result, the R2–D2 prediction outcome is rather limited in terms of long distances.

On the contrary, PRoPTurn’s predictions are based on MTAs previously generated from historical routes. Therefore, it does not suffer from the disturbance of external ongoing routes for long prediction lengths. Moreover, the generated MTAs are usually quite spread across the area of influence of the user (as discussed in Sect. 3.3). These two factors allow our solution to provide longer predictions in terms of space than R2–D2.

Furthermore, Fig. 18c, d shows that PRoPTurn suffered from a slight DE increase with respect to the reference framework for short-term predictions (>50 and >100). Nonetheless, while the R2–D2 DE was affected by the required prediction length, the PRoPTurn DE did not increase remarkably with it. As a result, for long prediction lengths, our solution clearly outperformed the reference framework.

This lack of correlation between the required prediction length and the distance error of our approach is due to the combination of its MVA-based route abstraction and its prediction mechanism. According to Sect. 2.5, the system only provides a one-step prediction for the next MTA, that is, it only forecasts the MTAs that are candidates for the immediately following MTA of the query route. Due to the spread distribution of the MTAs, these candidates can be located at quite different positions, each one providing a quite different prediction length. Since all of them are one-step predictions, they do not suffer from the accuracy degradation of other methods based on Markov chains, which need to concatenate predictions in order to reach a particular prediction length (as is the case of R2–D2).
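As a rough illustration of the degradation caused by concatenating predictions, assume (this is a simplification that is not stated in the paper) that each chained step is correct independently with the same probability $p$. A prediction built from $k$ concatenated steps is then correct with probability $P_k$:

```latex
P_k = p^{k}, \qquad \text{e.g. } p = 0.9:\quad P_1 = 0.9, \quad P_5 = 0.9^{5} \approx 0.59 .
```

Under this assumption a one-step prediction keeps the accuracy $P_1$ regardless of how far away the predicted MTA lies, whereas a chained predictor loses accuracy with every additional step needed to reach the required prediction length.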

3.6.1 Hybrid approach

From the results described above, we found that both approaches complement each other: while R2–D2 performs better for short-term distances, PRoPTurn seems a better solution when it comes to providing longer predictions. Bearing this in mind, we developed a hybrid model prototype, combining both approaches, capable of providing two types of outcomes at the same time, a short-term and a long-term prediction.

Figure 19 shows the flow of information of the model, where ST dist defines the maximum distance length of a short-term prediction and LT dist is the minimum distance length of a long-term prediction. As we can see, the model prioritizes R2–D2 as the provider of short-term predictions and PRoPTurn for long-term ones.
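One possible reading of this prioritization is sketched below in Python. The exact flow of Fig. 19 is not reproduced here, so the fallback order, the `Prediction` type, the function `hybrid_predict` and the restriction to one prediction per method are assumptions made for illustration only.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    source: str      # "R2-D2" or "PRoPTurn"
    distance: float  # distance from the current location to the predicted one

def hybrid_predict(r2d2, propturn, st_dist, lt_dist):
    """Return a (short_term, long_term) pair of predictions (each may be None).
    R2-D2 is preferred for the short term (distance <= st_dist) and
    PRoPTurn for the long term (distance >= lt_dist); the other method
    is used as a fallback when the preferred one does not qualify."""
    def pick(preferred, fallback, accept):
        for pred in (preferred, fallback):
            if pred is not None and accept(pred.distance):
                return pred
        return None

    short_term = pick(r2d2, propturn, lambda d: d <= st_dist)
    long_term = pick(propturn, r2d2, lambda d: d >= lt_dist)
    return short_term, long_term

st, lt = hybrid_predict(Prediction("R2-D2", 80.0),
                        Prediction("PRoPTurn", 300.0),
                        st_dist=100.0, lt_dist=200.0)
print(st.source, lt.source)  # R2-D2 PRoPTurn
```

The fallback branch is what would allow PRoPTurn to contribute some short-term outcomes and R2–D2 some long-term ones in such a design.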


Fig. 19 Flow of information of the hybrid model

Fig. 20 PRHB and DE of the hybrid model (HM) with respect to the two original solutions for the [ST dist, LT dist] configurations [50,150], [100,200] and [150,250]. a PRHB, broken down into ST only, LT only and ST + LT outcomes. b Average DE

Figure 20 shows the averaged results achieved by the hybrid model (HM) for different configurations of [ST dist, LT dist] pairs given the two target datasets. We also compare its results with those of the two original approaches.

Since these two approaches deliver predictions at a different pace (R2–D2 each time a new location is received, PRoPTurn each time a new turn is detected), we redefined the PR measurement to properly undertake the evaluation. Thus, we divided each route into time slots of 5 min and counted the total number of slots during which the model under evaluation generated at least one prediction. Hence, the new PR formula, PRHB, was defined as

$$\mathrm{PR}_{HB} = \frac{\#\,\text{time slots with prediction}}{\#\,\text{time slots}}$$

Table 8 Contribution of each method to the HM outcome

Method      ST pred.                           LT pred.
            [50,150]   [100,200]   [150,250]   [50,150]   [100,200]   [150,250]
PRoPTurn    0.08       0.02        0.04        0.71       0.78        0.90
R2–D2       0.92       0.98        0.96        0.29       0.22        0.10
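The PRHB measurement can be sketched in Python as follows. The function name `pr_hb`, the use of timestamps in seconds and the treatment of an incomplete final slot as a full slot are assumptions of this sketch, since the paper only gives the ratio itself.

```python
def pr_hb(route_start, route_end, prediction_times, slot_seconds=300):
    """Fraction of fixed-length time slots of a route that contain at
    least one prediction. All times are expressed in seconds."""
    duration = route_end - route_start
    # Ceiling division: an incomplete final slot counts as a slot (assumption).
    n_slots = max(1, -(-duration // slot_seconds))
    covered = {int((t - route_start) // slot_seconds)
               for t in prediction_times
               if route_start <= t < route_end}
    return len(covered) / n_slots

# A 20-minute route (four 5-minute slots) with predictions at minutes 1, 2 and 12.
print(pr_hb(0, 1200, [60, 120, 720]))  # 0.5
```

The two predictions in the first slot count once, so a method that predicts very often within the same slot gets no advantage over one that predicts once per slot, which is what makes the two predictors comparable.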

Figure 20a depicts this measurement and also indicates the type of outcome in each time slot: only a short-term (ST) prediction, only a long-term (LT) one or a prediction comprising both an ST and an LT forecast location. From the results, we can see that HM was able to improve the prediction capability of its two composing methods. Moreover, HM outperformed its two original methods in providing both ST and LT predictions. However, the improvement is clearer with short prediction lengths ([50,150]) than with longer ones.

In order to find the reason for this decrease, Table 8 shows the contribution of each composing method to the HM outcome. For example, for the [100,200] configuration, 22 % of the LT predictions of HM came from R2–D2. From these data, we can see that the importance of PRoPTurn for LT predictions grew as the required prediction lengths were increased. Since this type of prediction is the dominant one according to Fig. 20a, the prediction capability of HM gradually became similar to that of PRoPTurn.

Concerning the DE, Fig. 20b shows the averaged distance for both ST and LT predictions. In the three configurations, HM obtained a DE between those of the two original approaches, closer to R2–D2 for the shortest prediction lengths ([50,150]) and closer to PRoPTurn for longer configurations. Again, this is due to the increasing influence of PRoPTurn when longer distance ranges are configured.

To sum up, the hybrid method prototype shows that it is feasible to combine R2–D2 and PRoPTurn in order to come up with a more robust predictive approach that combines the accuracy of the former for short distances and the high PR of the latter for long-term predictions. Nonetheless, the development of more intelligent mechanisms to combine the outcomes of both solutions should be further investigated.

3.7 Results discussion

From the performed evaluation we can draw interesting conclusions.

– First of all, the early detection of SPs and TPs in the mobile phone reduces the amount of data sent from the user’s hand-held device to the central server.

– Secondly, the size of the MVA affects the performance of the proposal to a high degree. Therefore, a detailed study of the application domain should be carried out to select the most suitable configuration before running the framework.

– Next, the convergence period unsurprisingly has an important effect on the prediction process. However, its impact can be minimized by using a suitable minPoints configuration along with the origin-current MVA heuristic.

– Besides, taking into account the current bearing of the target route reduces the distance error by roughly 20 % (17–21 % in Table 7), especially with routes constrained to road maps, at the cost of a lower PR.

– Lastly, the comparison with R2–D2 shows that this framework provides more reliable predictions for short distances, whereas PRoPTurn has proved to be a more suitable solution when it comes to providing more distant forecast locations. Therefore, bearing in mind that the strengths of both approaches complement each other, they have been merged into a single solution that makes use of R2–D2’s or PRoPTurn’s output in order to provide, at the same time, short-term and long-term predictions.

4 Related work

Route prediction is based on the assumption that people commonly follow daily routines and, thus, have only a limited set of frequently-visited locations (Pappalardo et al. 2015). This makes people’s regular trips quite predictable due to their high level of repetition (Song et al. 2010; Lin et al. 2012).

As a result, a foremost trend for route prediction is based on pattern matching. In brief, solutions within this trend compare the route in progress (aka query route) with a set of route patterns (created on the basis of previously observed routes). If a match is found, the selected (group of) pattern(s) is used to make a prediction. In general terms, the algorithms following this line of work comprise the three stages mentioned in Sect. 1, so it is possible to find different approaches for each of them in the literature.

4.1 Route abstraction

As for the route abstraction phase, a common method consists of adapting the query route’s trajectory (regarded as sequential data of timestamped locations) to a more suitable representation for pattern matching. In this frame, existing works can be classified into three different alternatives.

Firstly, the representation based on grid spatial partitioning consists of dividing the area of interest into square cells of the same size, so that each route is represented as a sequence of cells according to the sequence of its locations (Zhou et al. 2013; Xue et al. 2013; He et al. 2012; Krumm 2006). A major drawback of this type of representation is that defining a proper grid granularity for the whole area of interest is not trivial. Another important downside is the sharp boundary problem, which arises when route locations close to the boundary of cells fall into different cells, leading to a failure in the prediction process. Although some solutions have already been proposed (Wang et al. 2013), they involve extra computation steps, like distance-based membership functions, that might hamper prediction performance under computation-constrained scenarios.
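The grid-based abstraction and its sharp boundary problem can be illustrated with a short Python sketch. The cell size, the planar coordinates and the function names `to_cell` and `to_cell_sequence` are hypothetical and do not correspond to any of the cited works.

```python
import math

def to_cell(point, cell_size):
    """Map an (x, y) location in meters to the (col, row) index of its square cell."""
    return (math.floor(point[0] / cell_size), math.floor(point[1] / cell_size))

def to_cell_sequence(route, cell_size):
    """Represent a route as its sequence of cells, collapsing consecutive repeats."""
    cells = []
    for point in route:
        cell = to_cell(point, cell_size)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells

# Two runs along the same street, 2 m apart, on either side of the x = 100 boundary.
run_a = [(99.0, 10.0), (99.0, 60.0), (99.0, 110.0)]
run_b = [(101.0, 10.0), (101.0, 60.0), (101.0, 110.0)]

print(to_cell_sequence(run_a, 100))  # [(0, 0), (0, 1)]
print(to_cell_sequence(run_b, 100))  # [(1, 0), (1, 1)]
```

Although both runs follow practically the same path, their cell sequences share no element, so an exact pattern match between them would fail. This is the effect that distance-based membership functions try to soften.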

A second trend makes use of cartography to represent a route as the sequence of streets (or segments) a person has covered (Krumm et al. 2013; Qiu et al. 2013; Krumm 2010; Ziebart et al. 2008). Although this approach seems suitable for vehicular prediction, people sometimes move through areas which are poorly covered by street-maps (or not covered at all, like open sea), which makes the mandatory map-matching process rather difficult. In addition, this approach relies on a street-map provider, which involves additional computational and/or memory needs for the predictor.

Our work belongs to a third line of work that represents routes as sequences of Regions of Interest (ROIs). In this context, our solution puts forward an innovative approach for ROI-based abstraction. Unlike most works, which define ROIs as frequently-visited regions (Chen et al. 2011; Giannotti et al. 2007; Jeung et al. 2007), our approach states that a route’s ROIs can be its origin and destination (MSAs) along with the areas where the direction of its trajectory remarkably changes (MTAs). Besides, those methods generally make use of primitive grid-based spatial partitions to generate the ROIs, so they might inherit the above-mentioned problems of grid-based abstractions. On the contrary, PRoPTurn does not require any previous spatial partition or external street-map database to undertake the route abstraction stage, and thus it does not suffer from the aforementioned drawbacks of previous approaches.

PRoPTurn presents some similarities with the route abstraction solution proposed by Alvarez-Garcia et al. (2010). That work intends to detect trajectory forks or crossings of routes (aka support points) by directly processing the raw location traces with no spatial partition or street-maps involved. These support points are afterwards used to represent the routes. In contrast, MTAs represent a more general view of these points, as they represent any part of a route where its direction meaningfully changes. This makes it possible to represent a route by means of more elements (MTAs) and, thus, to provide a more detailed representation.

Regarding the density-based clustering for MVA detection, Zhou et al. (2004) also follow such an approach to extract meaningful stop places from GPS logs. Like the LDA used here, that work also applies density join functions to merge clusters. Nonetheless, it requires the whole dataset of location points in advance. On the contrary, the LDA has been designed for online environments where the location data is received on the fly.

Finally, Civilis et al. (2005) also propose the timely detection of speed- or direction-change episodes of routes to predict the future movement of objects. However, its domain of application is restricted to moving object databases and, unlike our approach, that work focuses on cars’ routes relying on a segment-based abstraction.

4.2 Probabilistic model generation

Once routes’ trajectories have been converted into sequences of representative cells/segments/regions, approaches for the next step have mainly followed two different trends, trajectory pattern mining and probabilistic model generation.

On the one hand, trajectory pattern mining has been widely studied, and works within this discipline can be classified into three lines of research, namely frequent item mining, trajectory clustering and graph-based trajectory mining (Lin and Hsu 2014).

On the other hand, in recent years a host of studies have put forward novel approaches to generate probabilistic models for route prediction without explicitly creating trajectory patterns. In this type of approach, a database of historical routes is usually used to build such models. In that sense, Bayesian networks (Krumm et al. 2013; Krumm 2006), n-th-order Markov models (Xue et al. 2013), hidden Markov models (Zhou et al. 2013; Qiu et al. 2013; Alvarez-Garcia et al. 2010; Jeung et al. 2007) and Markov decision processes (Ziebart et al. 2008) have been some of the applied solutions.

For this stage, this work studies the incremental generation of the probabilistic model in real time instead of carrying out an off-line training. In this context, a similar approach was already studied by Krumm (2006). Nonetheless, that mechanism relies on a previously generated travel-time database. On the contrary, PRoPTurn does not require any a priori information, as it generates the probabilistic model from scratch.

Regarding the particular model, unlike existing tree-based solutions (Chen et al. 2011), PRoPTurn provides a new multigraph-based structure to represent such a model. Each MVA is represented by a unique vertex, whereas the multi-edges contain frequency features extracted from the historical routes. Such a representation makes it quite easy to integrate new route information into the model and to detect the common routes between two MVAs. Both features have proved to be quite useful for the incremental model generation and the route prediction heuristic.
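A multigraph of this kind could be sketched in Python as shown next. The class `RouteMultigraph`, the choice of keying parallel edges by a route identifier and the use of a plain traversal counter as the only frequency feature are assumptions of this sketch; the actual edge attributes are those defined in the earlier sections of the paper.

```python
from collections import defaultdict

class RouteMultigraph:
    """Vertices are MVA ids. Parallel edges between two MVAs are keyed
    by a route id and carry a traversal counter (assumed frequency feature)."""

    def __init__(self):
        # edges[(src, dst)][route_id] -> number of times the hop was observed
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_route(self, route_id, mva_sequence):
        """Incrementally fold one observed route into the model."""
        for src, dst in zip(mva_sequence, mva_sequence[1:]):
            self.edges[(src, dst)][route_id] += 1

    def routes_between(self, src, dst):
        """Route ids with a direct hop from src to dst, most frequent first."""
        counts = self.edges.get((src, dst), {})
        return sorted(counts, key=counts.get, reverse=True)

    def next_candidates(self, src):
        """Total observed frequency of each MVA directly reachable from src."""
        totals = defaultdict(int)
        for (s, d), counts in self.edges.items():
            if s == src:
                totals[d] += sum(counts.values())
        return dict(totals)

g = RouteMultigraph()
g.add_route("home-work", ["A", "B", "C"])
g.add_route("home-work", ["A", "B", "C"])
g.add_route("home-gym", ["A", "B", "D"])
print(g.routes_between("A", "B"))  # ['home-work', 'home-gym']
print(g.next_candidates("B"))      # {'C': 2, 'D': 1}
```

Adding a route only touches the edges it traverses, so the model can grow while predictions are being served, and the routes shared by two MVAs can be read directly from the parallel edges between them.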

4.3 Route prediction

As for route prediction, we can distinguish among works that aim to detect the final destination of the ongoing route (Xue et al. 2013; Alvarez-Garcia et al. 2010; Krumm 2006), those that intend to predict the target person’s future movement (either in the short or in the long term) (Zhou et al. 2013; Krumm 2010) and finally those whose goal is to detect both features (Krumm et al. 2013; Qiu et al. 2013; Chen et al. 2011; Ziebart et al. 2008).

In these three cases, sometimes a prediction cannot be made since it is not possible to find a match between the whole query route and the probabilistic model. In the personal route prediction context, this is due to the diversity of movements of a person. This phenomenon, which has been coined the data sparsity problem, has been addressed by some works in recent years. For example, Xue et al. (2013) proposed to segment the query route and make predictions based on the resulting sub-trajectories instead of the whole one. Chen et al. (2011) apply a heuristic that skips certain parts of the query route when a match failure occurs before starting the match process again.

For this step, PRoPTurn intends to detect the next meaningful turn along with the final destination of the ongoing route. To that end, it makes two key contributions. First of all, the proposed route prediction mechanism has been designed to deal with the incremental generation of the probabilistic model. This way, the system is able to make a prediction at the same time as the multigraph is built. This real-time feature is novel with respect to previous literature, as most works assume that the probabilistic model is static when it comes to making a prediction. Secondly, it faces the data sparsity problem by means of a lightweight heuristic that only considers the origin and the most recent MVA of the query route in case of a match failure. This heuristic simplifies the skip-and-follow heuristic described by Chen et al. (2011) and, as described in Sect. 3, it still provides suitable accuracy rates.
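The idea of falling back to the origin and the most recent MVA can be sketched in Python as follows. This is a simplified illustration over a plain list of historical MVA sequences rather than over the multigraph; the function `predict_next`, the contiguous-run definition of a full match and the majority vote are assumptions and should not be read as the prediction algorithm of the paper.

```python
from collections import Counter

def predict_next(history, query):
    """Predict the MVA that follows the last MVA of `query`.
    1) Use historical routes that contain the whole query as a contiguous run.
    2) On a mismatch, fall back to routes that start at the query origin
       and later visit the most recent MVA of the query."""
    def followers(match):
        votes = Counter()
        for route in history:
            idx = match(route)
            if idx is not None and idx + 1 < len(route):
                votes[route[idx + 1]] += 1
        return votes

    def full_match(route):
        n = len(query)
        for i in range(len(route) - n + 1):
            if route[i:i + n] == query:
                return i + n - 1
        return None

    def origin_current_match(route):
        if route and route[0] == query[0] and query[-1] in route[1:]:
            return route.index(query[-1], 1)
        return None

    votes = followers(full_match) or followers(origin_current_match)
    return votes.most_common(1)[0][0] if votes else None

history = [["A", "B", "C", "D"], ["A", "B", "C", "D"], ["A", "E", "C", "F"]]
print(predict_next(history, ["A", "B", "C"]))  # D (whole query matched)
print(predict_next(history, ["A", "X", "C"]))  # D (fallback on origin A and current C)
print(predict_next(history, ["Z", "C"]))       # None
```

In the second call the unseen MVA "X" breaks the full match, yet a prediction is still produced because only the origin and the current MVA are required to agree with the history.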


Lastly, from a Complex Event Processing (CEP) perspective, Hong et al. (2013) propose a solution to timely forecast a user’s locations in order to adapt his or her event-based queries on the basis of such locations. To do so, that work relies on the outcome of well-known prediction techniques, like the locations returned by a navigation system. In that sense, our research could enhance the mentioned work by providing further long-term predictions. Moreover, Barouni and Moulin (2012) define a theoretical CEP framework to deal with spatio-temporal events by means of new spatial operators based on fuzzy logic. The present research also processes this type of event by means of CEP in order to detect SPs and TPs. However, the cited work centres on spatio-temporal patterns among high-level events (e.g. traffic accidents, demonstrations, etc.), whereas the present research focuses on patterns related to GPS locations, which can be regarded as more fine-grained data.

5 Conclusions and future work

In this day and age, mobility mining and, in particular, route prediction have attracted great attention from the research community. Although several methodologies for accurate prediction of user mobility can be found in the literature, most of them follow two foremost trends: firstly, the usage of space partitioning or map-matching for route abstraction; secondly, the off-line generation of probabilistic models based on historical routes. These two trends limit the application and adaptation of existing solutions to free-movement and dynamic scenarios.

The present work puts forward PRoPTurn, which introduces new solutions so as to come up with a complete on-line prediction pipeline. Regarding route abstraction, a real-time client-server method combining Complex Event Processing and density clustering of velocity-change points has been proposed. As for on-line model generation, a novel multigraph structure has been defined so as to minimize the impact of new routes and enhance the interconnection of its vertices. Concerning prediction generation, a heuristic to deal with query route mismatches has been proposed along with a smooth integration with the endlessly changing probabilistic model.

Consequently, the proposed solution is suitable for domains where the user movement is not constrained by a road network. Moreover, its online model generation can adapt to shifts in the user’s mobility, making it an interesting solution for those domains where the regeneration of an off-line model is not feasible. The evaluation using real-world and synthetic datasets has shown the capability of the system to provide early predictions and the suitability of the adopted methodology.

Finally, further work will focus on improving stop and turn detection in the mobile client so as to use sensors that drain less battery than GPS, like the accelerometer or gyroscope. Such sensors may be used to detect the beginning of stop and turn episodes of a trajectory, and then reactivate the GPS sensor to estimate the actual location of such episodes. In addition, the usage of adaptive GPS sampling will also be studied as a potential solution to reduce the battery consumption of the client side. Lastly, the development of mobile applications on top of PRoPTurn so as to provide final users with novel solutions for their common mobility problems will also be a major goal.


Acknowledgments This research is partially funded by the Spanish Ministry of Economy and Competitiveness’ project “Dynamic and Emergent intelligent for Smart Cities based on Internet of Things” TIN2014-52099-R and the European Commission through the ENTROPY-649849 EU Project.

Appendix 1: Event-based rules

Broadly speaking, event-processing rules usually comprise two different parts: (1) a condition part, where the requirements for the rule to fire are listed, and (2) an action part, which indicates the actions to be performed if the condition part is fulfilled. Hereafter, the pseudocode of the rules included in PRoPTurn is listed.

rule: 'GPS data cleaning'
CONDITION FilteredGPSEvent fgps -> GPSEvent gps
          AND dist(gps.location, fgps.location) ∈ [dmin : dmax]
ACTION    new FilteredGPSEvent(gps)

where the -> stands for the followed-by operator.
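The cleaning rule can be approximated in plain Python as shown below. This sketch replaces the event-processing engine with a simple loop; the function `clean_gps`, the planar coordinates in meters and the decision to always keep the first fix (the rule itself needs a previous filtered event to compare against) are assumptions.

```python
import math

def clean_gps(fixes, d_min, d_max):
    """Keep a fix only if its distance to the last kept fix lies in
    [d_min, d_max]. The first fix is always kept to bootstrap the filter."""
    kept = []
    for fix in fixes:
        if not kept or d_min <= math.dist(fix, kept[-1]) <= d_max:
            kept.append(fix)
    return kept

# Hypothetical raw fixes: a near-duplicate at (1, 0) and an outlier at (5000, 0).
raw = [(0, 0), (1, 0), (30, 0), (5000, 0), (60, 0)]
print(clean_gps(raw, d_min=10, d_max=200))  # [(0, 0), (30, 0), (60, 0)]
```

The lower bound removes fixes that add no information while the device barely moves, and the upper bound removes implausible jumps.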

rule: 'TP detection'
CONDITION FilteredGPSEvent fgps_i ->
          [1:n] FilteredGPSEvent fgps_j ->
          FilteredGPSEvent fgps_f
          AND bearing(fgps_j[n], fgps_f) - bearing(fgps_i, fgps_j[1]) > Δdir
          AND bearing(fgps_j[k], fgps_j[k+1]) > bearing(fgps_j[k+1], fgps_j[k+2])
ACTION    new TP(fgps_j[n/2], bearing(fgps_j[n], fgps_f))

where [1:n] stands for a range between 1 and n events.
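The turn-point check can be sketched in Python over a window made of the entry event, the intermediate events and the exit event. This is a loose reading of the rule rather than the PRoPTurn implementation: the names `TurnPoint`, `bearing_deg`, `angle_diff` and `detect_turn` are hypothetical, bearing differences are wrapped to (-180, 180] so that turns crossing north are handled, the monotonicity condition is applied symmetrically to left and right turns, and the middle intermediate event is used as the turn location.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees


@dataclass(frozen=True)
class TurnPoint:
    location: Point
    bearing: float  # outgoing bearing in degrees


def bearing_deg(p: Point, q: Point) -> float:
    """Initial compass bearing from p to q, in [0, 360)."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dlmb = math.radians(q[1] - p[1])
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_diff(a: float, b: float) -> float:
    """Signed smallest difference a - b, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def detect_turn(points: Sequence[Point], delta_dir: float) -> Optional[TurnPoint]:
    """points = [entry, intermediate_1..intermediate_n, exit]."""
    if len(points) < 3:
        return None
    inner = points[1:-1]
    b_in = bearing_deg(points[0], inner[0])
    b_out = bearing_deg(inner[-1], points[-1])
    total = angle_diff(b_out, b_in)
    if abs(total) <= delta_dir:
        return None  # direction change below the threshold
    # Headings of the segments between consecutive intermediate events
    headings = [bearing_deg(p, q) for p, q in zip(inner, inner[1:])]
    steps = [angle_diff(h2, h1) for h1, h2 in zip(headings, headings[1:])]
    if any(s * total <= 0 for s in steps):
        return None  # heading does not change steadily in the turn direction
    mid = inner[(len(inner) - 1) // 2]
    return TurnPoint(location=mid, bearing=b_out)
```

A window that heads north and then bends east produces a turn point at the middle intermediate event, whereas a straight window or one whose heading swings back and forth produces none.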

rule: 'SP detection'
CONDITION (FilteredGPSEvent fgps ->
           NOT FilteredGPSEvent.within(T_st minutes))
          OR
          (NOT FilteredGPSEvent.within(T_st minutes) ->
           FilteredGPSEvent fgps)
ACTION    new SP(fgps)

where .within defines the time window with no filtered GPS events for the rule to fire.
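A batch replay of the stop-point rule might look like the Python sketch below. An event-processing engine would fire the first branch from a timer, while this sketch, as a simplifying assumption, infers both branches from the gaps between consecutive timestamps. The names `FilteredGPSEvent`, `detect_stop_points` and the optional `now` argument are hypothetical, and the very first event of the stream is not treated as a stop point.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FilteredGPSEvent:
    lat: float
    lon: float
    timestamp: float  # seconds


def detect_stop_points(
    events: Sequence[FilteredGPSEvent],
    t_st_minutes: float,
    now: Optional[float] = None,
) -> List[FilteredGPSEvent]:
    """Return the events that open or close a silence of at least T_st minutes."""
    gap = t_st_minutes * 60.0
    stops: List[FilteredGPSEvent] = []
    for prev, cur in zip(events, events[1:]):
        if cur.timestamp - prev.timestamp >= gap:
            stops.append(prev)  # event followed by a silent window
            stops.append(cur)   # event arriving after a silent window
    # Optionally check whether the last event is followed by silence up to `now`
    if events and now is not None and now - events[-1].timestamp >= gap:
        stops.append(events[-1])
    return stops
```

Passing `now` lets a caller decide whether the most recent event already counts as the start of a stop; leaving it out restricts the result to silences that are fully contained in the replayed data.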


Appendix 2: Geolife Users’ profiles

# user  Total                                        Per route
        Locations  Routes  Time period               Locations  Time length
1       867,170    2111    2007-07-21 → 2012-06-17   408        22′
2       205,168    982     2008-10-23 → 2009-07-29   208        19′
3       280,256    838     2007-04-12 → 2012-07-27   334        26′
4       180,324    691     2008-10-23 → 2009-07-05   260        26′
5       343,401    559     2008-09-14 → 2009-09-13   614        26′
6       240,135    523     2008-03-01 → 2009-02-17   459        25′
7       175,850    496     2009-01-13 → 2009-07-29   354        22′
8       261,627    450     2008-12-15 → 2009-07-11   581        33′
9       280,076    443     2008-10-30 → 2009-07-04   632        32′
10      116,404    392     2008-04-28 → 2009-09-24   296        23′
11      123,604    390     2007-04-18 → 2011-03-10   316        30′
12      180,034    387     2007-12-07 → 2008-12-15   465        34′
13      74,978     357     2008-10-23 → 2009-07-05   210        21′
14      168,990    324     2008-02-13 → 2009-09-28   521        35′
15      147,514    321     2008-10-20 → 2009-04-17   459        20′
16      157,084    317     2008-04-02 → 2009-02-22   495        28′
17      125,441    312     2007-04-28 → 2009-09-28   402        20′
18      138,703    254     2008-07-21 → 2009-09-11   546        40′
19      120,110    247     2008-10-23 → 2009-03-22   486        36′
20      72,677     227     2009-02-11 → 2009-07-12   320        31′
Total   4,259,546  10,606  2007-04-12 → 2012-07-27   418        27′

References

Alvarez-Garcia J, Ortega J, Gonzalez-Abril L, Velasco F (2010) Trip destination prediction based on pastGPS log using a Hidden Markov Model. Expert Syst. Appl. 37(12):8166–8171. doi:10.1016/j.eswa.2010.05.070

Barouni F, Moulin B (2012) An extended complex event processing engine to qualitatively determinespatiotemporal patterns. In: Proceedings of Global Geospatial Conference 2012, Quebec City, pp201–2133

Carroll A, Heiser G (2010) An analysis of power consumption in a smartphone. In: Proceedings of the 2010USENIX Conference on USENIX Annual Technical Conference, USENIX Association, Boston, MA,USENIXATC’10, pp 21–21, http://dl.acm.org/citation.cfm?id=1855840.1855861

Chen L, Lv M, Ye Q, Chen G, Woodward J (2011) A personal route prediction system based on trajectorydata mining. Inf. Sci. 181(7):1264–1284, doi:10.1016/j.ins.2010.11.035

Civilis A, Jensen CS, Pakalnis S (2005) Techniques for efficient road-network-based tracking of movingobjects. IEEE Trans Knowl Data Eng 17(5):698–712

Deguchi Y, Kuroda K, Shouji M, Kawabe T (2004) HEV charge/discharge control system based on navi-gation information. Technical report, SAE Technical Paper

de Vries G (2012) Kernel methods for vessel trajectories. PhD thesis, University of AmsterdamDunkel J, Bruns R, Stipkovic S (2013) Event-based smartphone sensor processing for ambient assisted liv-

ing. In: 2013 IEEE Eleventh international symposium on autonomous decentralized systems (ISADS),pp 1–6. doi:10.1109/ISADS.2013.6513422

Etzion O, Niblett P (2010) Event processing in action, 1st edn. Manning Publications Co., Greenwich

123

Page 39: Online route prediction based on clustering of …...Data Min Knowl Disc DOI 10.1007/s10618-016-0452-3 Online route prediction based on clustering of meaningful velocity-change areas

Online route prediction based on clustering of meaningful...

Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: Proceedings of the 13thACM SIGKDD international conference on knowledge discovery and data mining, KDD 07. ACM,New York, pp 330–339. doi:10.1145/1281192.1281230

Guttman A (1984) R-trees: A dynamic index structure for spatial searching. In: Proceedings of the 1984ACM SIGMOD international conference on management of data, SIGMOD ’84. ACM, New York,pp 47–57. doi:10.1145/602259.602266

He W, Li D, Zhang T, An L, Guo M, Chen G (2012) Mining regular routes from gps data for ridesharingrecommendations. In: Proceedings of the ACMSIGKDD international workshop on urban computing,UrbComp ’12. ACM, New York, pp 79–86. doi:10.1145/2346496.2346510

Hendawi AM, Mokbel MF (2012) Predictive spatio-temporal queries: a comprehensive survey and futuredirections. In: Proceedings of the First ACM SIGSPATIAL international workshop on mobile geo-graphic information systems, MobiGIS ’12. ACM, New York, pp 97–104. doi:10.1145/2442810.2442828

Hong K, Lillethun D, Ramachandran U, Ottenwälder B, Koldehofe B (2013) Opportunistic spatio-temporalevent processing for mobile situation awareness. In: Proceedings of the 7th ACM international con-ference on distributed event-based systems, DEBS ’13. ACM, New York, pp 195–206. doi:10.1145/2488222.2488266

Jeung H, Shen H, Zhou X (2007) Mining trajectory patterns using Hidden MarkovModels. In: Song I, EderJ, Nguyen T (eds) Data warehousing and knowledge discovery. Lecture Notes in Computer Science,vol 4654. Springer, Berlin, pp 470–480

Körner C, May M, Wrobel S (2012) Spatiotemporal modeling and analysis-introduction and overview. KI- Künstliche Intell 26(3):215–221. doi:10.1007/s13218-012-0215-2

KrummJ (2006) Real time destination prediction based on efficient routes. Technival Report, SAETechnicalPaper

KrummJ (2010)Wherewill they turn: predicting turn proportions at intersections. Pers.UbiquitousComput.14(7):591–599. doi:10.1007/s00779-009-0248-1

Krumm J, Gruen R, Delling D (2013) From destination prediction to route prediction. J Locat Based Serv7(2):98–120. doi:10.1080/17489725.2013.788228

LinM, HsuWJ (2014)Mining GPS data for mobility patterns: a survey. PervasiveMobile Comput 12(0):1–16. doi:10.1016/j.pmcj.2013.06.005

Lin M, Hsu WJ, Lee ZQ (2012) Predictability of individuals’ mobility with high-resolution positioningdata. In: Proceedings of the 2012 ACM conference on ubiquitous computing, UbiComp ’12. ACM,New York, pp 381–390. doi:10.1145/2370216.2370274

Liou SC, Huang YM (2005) Trajectory predictions in mobile networks. Int J Inf Technol 11(11):109–122Pappalardo L, Simini F, Rinzivillo S, Pedreschi D, Giannotti F, Barabási AL (2015) Returners and explorers

dichotomy in human mobility. Nat Commun 6Pham N, Ganti R, Uddin Y, Nath S, Abdelzaher T (2010) Privacy-preserving reconstruction of multidimen-

sional data maps in vehicular participatory sensing. In: Silva J, Krishnamachari B, Boavida F (eds)Wireless sensor networks. Lecture Notes in Computer Science, vol 5970. Springer, Berlin, pp 114–130

Qiu D, Papotti P, Blanco L (2013) Future locations prediction with uncertain data. In: Blockeel H, KerstingK, Nijssen S, Železny F (eds) Machine learning and knowledge discovery in databases. Lecture Notesin Computer Science, vol 8188. Springer, Berlin, pp 417–432

Song C, Qu Z, Blumm N, Barabási AL (2010) Limits of predictability in human mobility. Science327(5968):1018–1021

Steinfeld A,Manes D, Green P, Hunter D (1996) Destination entry and retrieval with the ali-scout navigationsystem. Technical Report UMTRI-96-30. University of Michigan, Transportation Research Institute(UMTRI)

Stipkovic S, Bruns R, Dunkel J (2013) Pervasive computing by mobile complex event processing. In: 2013IEEE 10th international conference on e-business engineering (ICEBE), pp 318–323. doi:10.1109/ICEBE.2013.49

Terroso-Saenz F, Valdes-Vela M, Campuzano F, Botia JA, Skarmeta-Gómez AF (2015) A complex eventprocessing approach to perceive the vehicular context. Inf Fusion 21(0):187–209. doi:10.1016/j.inffus.2012.08.008

Wang L, Hu K, Ku T, Yan X (2013) Mining frequent trajectory pattern based on vague space partition.Knowl Based Syst 50(0):100–111, doi:10.1016/j.knosys.2013.06.002

123

Page 40: Online route prediction based on clustering of …...Data Min Knowl Disc DOI 10.1007/s10618-016-0452-3 Online route prediction based on clustering of meaningful velocity-change areas

F. Terroso-Saenz et al.

Xue A, Zhang R, Zheng Y, Xie X, Huang J, Xu Z (2013) Destination prediction by sub-trajectory synthesisand privacy protection against such prediction. In: 2013 IEEE 29th international conference on dataengineering (ICDE), pp 254–265. doi:10.1109/ICDE.2013.6544830

Zhang J, Goodchild MF (2002) Uncertainty in geographical information. Taylor & Francis, LondonZheng Y, Xie X, Ma WY (2010) Geolife: a collaborative social networking service among user, location

and trajectory. IEEE Data Eng Bull 33(2):32–39Zhou C, Frankowski D, Ludford P, Shekhar S, Terveen L (2004) Discovering personal gazetteers: an

interactive clustering approach. In: Proceedings of the 12th annual ACM international workshop onGeographic information systems. ACM, pp 266–273

Zhou J, TungAK,WuW,NgWS (2013) A “semi-lazy” approach to probabilistic path prediction in dynamicenvironments. In: Proceedings of the 19th ACM SIGKDD international conference on knowledgediscovery and data mining, KDD ’13. ACM, New York, pp 748–756. doi:10.1145/2487575.2487609

Ziebart BD, Maas AL, Dey AK, Bagnell JA (2008) Navigate like a cabbie: Probabilistic reasoning fromobserved context-aware behavior. In: Proceedings of the 10th international conference on ubiquitouscomputing, UbiComp ’08. ACM, New York, pp 322–331. doi:10.1145/1409635.1409678

123