Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa)
www-kdd.isti.cnr.it

Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti

A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009


Page 2:

Wireless networks infrastructures are the nerves of our territory

besides offering their services, they gather highly informative traces about the human mobile activities

Miniaturization, wearability, pervasiveness will produce traces of increasing

• positioning accuracy

• semantic richness

Page 3:

From the analysis of the traces of our mobile phones it is possible to reconstruct our mobile behaviour, the way we collectively move

This knowledge may help us improve decision-making in many mobility-related issues:

• Planning traffic and public mobility systems in metropolitan areas

• Planning physical communication networks

• Forecasting traffic-related phenomena

• Organizing logistics systems

• Prediction

Page 5:

Predicting the next location of a trajectory can improve a large set of services such as:

• Navigational services

• Traffic management

• Location-based advertising

• Services pre-fetching

• Simulation

[Figure: a trajectory whose possible next locations, marked "?", carry the values .4, .8 and .35]

Page 6:

How to realize this idea:

• Extract patterns from all the available movements in a certain area, instead of from the individual history of an object

• Use these local movement patterns as predictive rules

• Build a prediction tree as a global model

[Figure: Trajectory dataset → Local patterns → Prediction Tree]

Page 7:

Select the set of interesting trajectories

Extract T-Patterns (A set of Local models)

Merge T-Patterns (Global model)

Use the Condensed model as predictor

Validation

Evaluation

Page 8:

The local pattern we use is the T-Pattern. It describes the common behavior of a group of users in space and time.

F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. KDD 2007: 330-339.

Page 9:

Generating all rules from each T-pattern and using them to build a classifier is too expensive.

[Figure: a T-Pattern with transition times α1, α2, α3 and the rules R1, R2, R3, R4 generated from it]

Page 10:

To avoid rule generation, the T-Pattern set is organized as a prefix tree.

For each node v:

• Id identifies the node v

• Region is a spatial component of the T-Pattern

• Support is the support of the T-Pattern

For each edge j:

• [a,b] corresponds to the time interval αn of the T-Pattern
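The node and edge fields listed above can be sketched as a small prefix tree. This is only an illustration under stated assumptions: the names `Node`, `PredictionTree`, `insert` and `size` are hypothetical, regions are plain strings, edges are matched by exact interval equality, edges leaving the root carry no interval, and a node keeps the largest support among the patterns that pass through it. The original method may treat intervals and supports differently.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # [a, b] time interval on an edge (units assumed)
EdgeKey = Tuple[Optional[Interval], str]  # (interval on the edge, region of the child)


@dataclass
class Node:
    id: int                      # identifies the node
    region: Optional[str]        # spatial component; None for the root
    support: float = 0.0         # support of the T-Pattern ending here
    children: Dict[EdgeKey, "Node"] = field(default_factory=dict)


class PredictionTree:
    def __init__(self) -> None:
        self._ids = count()
        self.root = Node(id=next(self._ids), region=None)

    def insert(self, regions: List[str], intervals: List[Interval], support: float) -> None:
        """Insert one T-Pattern r0 -[a0,b0]-> r1 -[a1,b1]-> ... as a path from the root."""
        if len(intervals) != len(regions) - 1:
            raise ValueError("need exactly one interval between consecutive regions")
        node = self.root
        # Assumption: the edge from the root to the first region has no interval.
        edges: List[Optional[Interval]] = [None] + list(intervals)
        for interval, region in zip(edges, regions):
            key = (interval, region)
            child = node.children.get(key)
            if child is None:
                child = Node(id=next(self._ids), region=region)
                node.children[key] = child
            # Assumption: shared prefixes keep the maximum support seen so far.
            child.support = max(child.support, support)
            node = child

    def size(self) -> int:
        """Number of nodes, root included."""
        stack, n = [self.root], 0
        while stack:
            node = stack.pop()
            n += 1
            stack.extend(node.children.values())
        return n


tree = PredictionTree()
tree.insert(["A", "B", "C"], [(9, 15), (10, 12)], support=0.20)
tree.insert(["A", "B", "D"], [(9, 15), (8, 70)], support=0.31)
```

Because the two patterns share the prefix A → B with the same interval, they share two nodes and only branch at the last region.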

Page 11:

Three steps:

1. Search for best match

2. Candidate generation

3. Make predictions

[Figure labels: Best Match, Prediction]

How to compute the Best Match?

Page 12:

The spatio-temporal distance is computed between the trajectory segment (bounded in time using the previous transition time) and the current node of the path.

• Case a: the trajectory segment intersects the region of the node

• Case b: the enlarged trajectory segment intersects the region

• Case c: the enlarged trajectory segment doesn't intersect the region

Here th_t is the time tolerance window defined by the user.
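The three cases can be told apart with a short check. The sketch below only classifies the case; it does not assign a score, since the score formulas are not given in the text. Everything in it is an assumption made for illustration: the trajectory is a list of sampled `(t, x, y)` points rather than a continuous curve, the region is an axis-aligned rectangle, "enlarged" is read as widening the time window by `th_t` on both sides, and the names `match_case` and `in_rect` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]        # (t, x, y) sample of the trajectory
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) region


def in_rect(p: Point, r: Rect) -> bool:
    """True if the spatial part of the sample lies inside the rectangle."""
    _, x, y = p
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def match_case(points: List[Point], region: Rect,
               t_start: float, t_end: float, th_t: float) -> str:
    """Return 'a', 'b' or 'c' for the time-bounded segment [t_start, t_end]."""

    def hits(lo: float, hi: float) -> bool:
        # Does any sample taken within [lo, hi] fall inside the region?
        return any(lo <= p[0] <= hi and in_rect(p, region) for p in points)

    if hits(t_start, t_end):
        return "a"  # the segment itself intersects the region
    if hits(t_start - th_t, t_end + th_t):
        return "b"  # only the segment enlarged by th_t intersects it
    return "c"      # not even the enlarged segment intersects it


trajectory = [(0, 0, 0), (5, 1, 1), (12, 5, 5), (20, 9, 9)]
region = (4, 4, 6, 6)
print(match_case(trajectory, region, 10, 15, th_t=3))  # a
print(match_case(trajectory, region, 14, 18, th_t=3))  # b
print(match_case(trajectory, region, 14, 18, th_t=1))  # c
```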

Page 13:

The path score is the aggregation of all punctual scores along a path.

The Best Match is the path having: the maximum path score; at least one admissible prediction.
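The selection rule can be sketched as a filter followed by a maximum. The types here are hypothetical: `CandidatePath` is an illustrative record, and "at least one admissible prediction" is modeled, as an assumption, by a non-empty list of candidate next regions.

```python
from typing import List, NamedTuple, Optional


class CandidatePath(NamedTuple):
    node_ids: List[int]      # nodes of the path in the prediction tree
    path_score: float        # aggregated punctual scores
    predictions: List[str]   # assumed: candidate next regions, empty if none


def best_match(paths: List[CandidatePath]) -> Optional[CandidatePath]:
    """Highest-scoring path among those with at least one admissible prediction."""
    admissible = [p for p in paths if p.predictions]
    return max(admissible, key=lambda p: p.path_score, default=None)


candidates = [
    CandidatePath([1, 2, 3], 0.90, []),          # best score, but nothing to predict
    CandidatePath([1, 2, 4], 0.79, ["C"]),
    CandidatePath([1, 5], 0.40, ["D", "E"]),
]
chosen = best_match(candidates)
```

In this example the path scoring 0.90 is skipped because it offers no prediction, so the path scoring 0.79 is chosen.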

[Figure: worked example on a three-node path, with time labels 10 min, 15 min, 8 min, 10 min, 11 min and 16 min. Punctual scores: 1, .58 and .8. Path score: .79]

Page 14:

• Average generalizes the distances between the trajectory and each node

• Sum is based on the concept of depth: longer matching paths accumulate a higher score

• Max is the optimistic choice: the best punctual score is selected as the path score

• Context-dependent aggregations can take other aspects of the problem into consideration
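The three standard aggregations are easy to compare on the punctual scores from the earlier example (1, .58 and .8). The sketch below is illustrative; `path_score` and `AGGREGATIONS` are hypothetical names, and the string keys are an assumed encoding of the aggregation choice.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Assumed encoding of the aggregation choice as short strings.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "avg": mean,
    "sum": sum,
    "max": max,
}


def path_score(punctual_scores: Sequence[float], agg: str = "avg") -> float:
    """Aggregate the punctual scores collected along one path."""
    if not punctual_scores:
        raise ValueError("a path needs at least one punctual score")
    return AGGREGATIONS[agg](punctual_scores)


scores = [1.0, 0.58, 0.8]
print(round(path_score(scores, "avg"), 2))  # 0.79
print(round(path_score(scores, "sum"), 2))  # 2.38
print(round(path_score(scores, "max"), 2))  # 1.0
```

With the average, the result rounds to .79, which agrees with the path score shown in that example.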

Page 15:

The WhereNext algorithm can be tuned using its parameters:

- th_t: time window tolerance

- th_s: space window tolerance

- th_score: minimum prediction score threshold

- th_agg: the aggregation function used to compute the path score (Avg, Sum or Max)

Page 16:

It is very hard to determine the best set of T-patterns to build our model:

• a big set of T-patterns leads to very slow prediction

• a small set of T-patterns leaves gaps in coverage

For this reason we have defined a way to measure the predictive power of a T-Pattern set.

Page 17:

An evaluation function is defined to estimate the predictive power of a T-Pattern set.

• SpatialCoverage: the space coverage of the regions contained in the T-Pattern set

• DatasetCoverage: how much the T-Pattern set represents the trajectories

• RegionSeparation: the precision of the regions in the T-Pattern set

[Figure: Testing the a priori evaluation (Model 1, Model 2)]

Page 18:

You are here

Page 19:

The results are evaluated using the following measures:

Accuracy: the number of correctly predicted locations (in space and time) divided by the total number of trajectories to be predicted.

Average Error: the average distance between the real trajectories in the predicted interval and the region predicted.

Prediction rate: the number of trajectories which have a prediction divided by the total number of trajectories to be predicted.
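The three measures can be computed from one record per test trajectory. The sketch below is a minimal illustration: `Outcome` and `evaluate` are hypothetical names, the error is assumed to be a precomputed distance, and averaging the error only over trajectories that received a prediction is an assumption of this sketch.

```python
from typing import Dict, List, NamedTuple, Optional


class Outcome(NamedTuple):
    predicted: bool          # did the predictor return a location?
    correct: bool            # was it right in both space and time?
    error: Optional[float]   # distance to the predicted region, None if no prediction


def evaluate(outcomes: List[Outcome]) -> Dict[str, Optional[float]]:
    total = len(outcomes)
    if total == 0:
        raise ValueError("no trajectories to evaluate")
    predicted = [o for o in outcomes if o.predicted]
    errors = [o.error for o in predicted if o.error is not None]
    return {
        # Both ratios use the total number of trajectories to be predicted.
        "accuracy": sum(o.correct for o in predicted) / total,
        "prediction_rate": len(predicted) / total,
        # Assumption: averaged over trajectories that received a prediction.
        "average_error": sum(errors) / len(errors) if errors else None,
    }


results = evaluate([
    Outcome(True, True, 0.0),
    Outcome(True, False, 300.0),
    Outcome(False, False, None),
    Outcome(True, True, 0.0),
])
print(results)
```

Since accuracy is divided by all trajectories to be predicted, it can never exceed the prediction rate.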

[Figure: two examples showing the Original trajectory, the Cut and the Predicted Location; the second also marks the Error]

Page 20:

We used a real-life GPS dataset obtained from 17,000 vehicles in the urban area of Milan.

Training set: 4000 trajectories between 7 am and 10 am on Wednesday.

Test set: 500 trajectories between 7 am and 10 am on Thursday.

Page 21:

[Plots: Predicted vs th_score; Average Error vs th_space]

Page 22:

[Plots: Accuracy vs Average Error; Single Users Accuracy and Prediction rate]

Page 23:

A visual example of the application on Milan mobility data. The context is traffic management and we want to predict how the traffic will move in the city center.

We have built a predictor on a “good” set of T-patterns which includes the city gates of Milan.

Part of the GeoPKDD integrated platform. F. Giannotti, D. Pedreschi, et al. GeoPKDD: Geographic privacy-aware knowledge discovery and delivery (European project), 2008.

Page 24:

- A new technique to predict the next locations of a trajectory based on previous movements of all the objects without considering any information about the users.

- The time information is not only used to order the events: it is intrinsic to the T-Patterns used to build the Prediction tree.

- The user can tune the method to obtain a good accuracy and prediction rate.

- We are experimenting with the method in real-world applications.