Mastering the Spatio-Temporal Knowledge Discovery Process
PhD Candidate: Roberto Trasarti
PhD Thesis discussion
University of Pisa
Spatio-Temporal context
Research on moving-object data analysis has recently been fostered by the widespread diffusion of new techniques and systems for monitoring, collecting and storing location-aware data, generated by a wealth of technological infrastructures such as the Global Positioning System (GPS), the Global System for Mobile Communications (GSM) and sensor networks.
Knowledge Discovery Process
Knowledge discovery is a multi-step process that involves data preprocessing, pattern mining and pattern post-processing.
Motivations
Lack of a unifying framework in which mining tools are specific components of the knowledge discovery process.
Having elements from different worlds causes an impedance mismatch.
Data Models ?
Related Works
In the literature there are no proposals addressing the problem of a uniform framework.
There are approaches based on moving-object databases, such as Secondo and Hermes, which provide some primitives.
The thesis work was inspired by well-known works in the literature on the inductive database vision.
The proposed Framework
The Two-Worlds model: a conceptual framework that lays the foundation for the proposed data mining query language and the developed system.
This thesis proposes (i) a uniform way to represent the entities of the two worlds, data and models, and (ii) a set of operators between the two worlds.
The object relational database paradigm
Database: $D = \{S_1, \dots, S_n\}$
Schema: $S_j = \{T_1, \dots, T_m\}$
Table: $T_i = \langle a_1, \dots, a_h \rangle$
Attribute: $a_r \in A$, with attribute types $A = \{\text{Numerical}, \text{Categorical}, \text{Descriptive}, \text{Object}\}$
Numerical: types that describe a number together with its precision.
Categorical: a value belonging to a predefined set, in a predefined format.
Descriptive: any string of characters.
Object: a complex type that can contain other attributes, lists and methods.
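To make the hierarchy concrete, here is a minimal Python sketch assuming plain dataclasses. The names `AttrType`, `Attribute`, `Table`, `Schema` and `Database` are hypothetical illustrations, not the classes of the GeoPKDD system.

```python
from dataclasses import dataclass, field
from enum import Enum


class AttrType(Enum):
    """The four attribute types of A."""
    NUMERICAL = "numerical"      # a number with its precision
    CATEGORICAL = "categorical"  # a value from a predefined set
    DESCRIPTIVE = "descriptive"  # any string of characters
    OBJECT = "object"            # complex type with attributes, lists, methods


@dataclass
class Attribute:
    name: str
    kind: AttrType


@dataclass
class Table:
    """T_i = <a_1, ..., a_h>: an ordered list of attributes."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Schema:
    """S_j = {T_1, ..., T_m}."""
    name: str
    tables: list[Table] = field(default_factory=list)


@dataclass
class Database:
    """D = {S_1, ..., S_n}."""
    schemas: list[Schema] = field(default_factory=list)

    def attribute_types(self) -> set[AttrType]:
        # Every attribute type used by any table of any schema.
        return {a.kind for s in self.schemas for t in s.tables for a in t.attributes}


trajectories = Table("Trajectories", [Attribute("id", AttrType.NUMERICAL),
                                      Attribute("trajobj", AttrType.OBJECT)])
db = Database([Schema("mobility", [trajectories])])
print(sorted(k.value for k in db.attribute_types()))  # ['numerical', 'object']
```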
Object representation of Data and Models
Using the object relational paradigm, we represent both data and models as objects.
The set of attribute types A can be partitioned into three subsets: $A_s$, $A_d$ and $A_m$.
[Figure: ObjectType hierarchy. $A_d$ (data types, Data World): Spatial object, Temporal object, Moving object. $A_m$ (model types, Model World): T-Pattern object, Cluster object, Flock object.]
Data Types
A spatial object is an object that has a geometric shape and a position in space.
A temporal object is an object that has an absolute temporal reference and a duration.
A moving object is an object that changes in time and space.
[Figure: examples plotted on x, y and t axes]
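A minimal sketch of the three data types, under simplifying assumptions that are not from the thesis: a spatial object is reduced to an axis-aligned rectangle, time is measured in seconds, and a moving object is a time-ordered list of `(t, x, y)` samples whose position between samples is linearly interpolated. All class and method names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpatialObject:
    """Geometric shape and position, simplified to an axis-aligned rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


@dataclass
class TemporalObject:
    """Absolute temporal reference (start, in seconds) and a duration."""
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


@dataclass
class MovingObject:
    """Time-ordered (t, x, y) samples; positions in between are linearly interpolated."""
    samples: list[tuple[float, float, float]]

    def position_at(self, t: float) -> tuple[float, float]:
        times = [s[0] for s in self.samples]
        if not times[0] <= t <= times[-1]:
            raise ValueError("t is outside the object's lifespan")
        i = bisect_right(times, t)
        if i == len(times):  # t is exactly the last timestamp
            _, x, y = self.samples[-1]
            return x, y
        (t0, x0, y0), (t1, x1, y1) = self.samples[i - 1], self.samples[i]
        r = (t - t0) / (t1 - t0)
        return x0 + r * (x1 - x0), y0 + r * (y1 - y0)


car = MovingObject([(0, 0, 0), (10, 100, 0), (20, 100, 50)])
print(car.position_at(15))  # (100.0, 25.0)
```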
Data-World
The D-World represents the entities to be analyzed, as well as their properties and mutual relationships.
Intuitively, the D-World is the set of entities that describe the trajectory dataset, a set of regions, a partition of the day, or any combination of these.
The D-World is a set of tables whose attributes all have types in $A_d \cup A_s$.
Model Types
A T-Pattern is a concise description of frequent behaviors, in terms of both space and time.
A Cluster captures the spatio-temporal affinity among a set of moving objects with respect to a distance function.
A Flock captures the spatio-temporal coincidence of a set of moving objects that move together.
[Figure: example T-pattern over RegionA, RegionB and RegionC, with transition times of 10 min and 5 min]
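The following sketch tests whether a trajectory supports a T-pattern, assuming the common formulation in which a trajectory must visit the pattern's regions in order, with each transition time within a tolerance `tau` of the pattern's typical transition time. Regions are simplified to rectangles; `Region`, `supports` and the example coordinates are hypothetical, and the 10 min and 5 min transitions are assumed to label A to B and B to C respectively.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]  # (x, y, t), t in seconds


@dataclass
class Region:
    """Axis-aligned rectangle standing in for an arbitrary region."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def supports(traj: list[Point], regions: list[Region],
             times: list[float], tau: float) -> bool:
    """True if traj passes through regions[0], ..., regions[k] in order and the
    i-th transition time differs from times[i-1] by at most tau seconds."""
    if len(times) != len(regions) - 1:
        raise ValueError("need one transition time per pair of consecutive regions")

    def match(step: int, start: int, prev_t: float) -> bool:
        if step == len(regions):
            return True
        for j in range(start, len(traj)):
            x, y, t = traj[j]
            if not regions[step].contains(x, y):
                continue
            if step > 0 and abs((t - prev_t) - times[step - 1]) > tau:
                continue
            if match(step + 1, j + 1, t):  # otherwise backtrack and try a later point
                return True
        return False

    return match(0, 0, 0.0)


a, b, c = Region(0, 0, 10, 10), Region(20, 0, 30, 10), Region(40, 0, 50, 10)
pattern = ([a, b, c], [600, 300])  # A -(10 min)-> B -(5 min)-> C
print(supports([(5, 5, 0), (25, 5, 620), (45, 5, 900)], *pattern, tau=60))   # True
print(supports([(5, 5, 0), (25, 5, 620), (45, 5, 1200)], *pattern, tau=60))  # False
```

The search is exhaustive, so it stays correct when an early match in a region would make a later transition time fail; a production miner would index the trajectory instead.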
Model-World
The M-World contains all the movement patterns extracted from the data with their properties and relationships.
The M-World contains the collection of models discovered at the different stages of the knowledge discovery process.
The M-World is a set of tables whose attributes all have types in $A_m \cup A_s$.
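The two membership conditions for the D-World and the M-World can be checked mechanically. This sketch uses hypothetical type names and assumes, as this document does not state, that $A_s$ holds the standard types Numerical, Categorical and Descriptive.

```python
# Hypothetical type names; the content of A_S is an assumption.
A_S = {"Numerical", "Categorical", "Descriptive"}
A_D = {"SpatialObject", "TemporalObject", "MovingObject"}  # data types
A_M = {"TPattern", "Cluster", "Flock"}                     # model types


def worlds(attr_types: set[str]) -> set[str]:
    """Return the worlds whose definition a table with these attribute types satisfies."""
    result = set()
    if attr_types <= A_D | A_S:
        result.add("D-World")
    if attr_types <= A_M | A_S:
        result.add("M-World")
    return result


print(worlds({"Numerical", "MovingObject"}))  # {'D-World'}
print(worlds({"Numerical", "TPattern"}))      # {'M-World'}
print(worlds({"MovingObject", "TPattern"}))   # set()
```

Under these definitions a table mixing data and model types belongs to neither world, while a table using only $A_s$ types satisfies both conditions.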
Two-Worlds Operators
Operators can be intra-world or inter-world, and several classes of operators have been defined for each type.
Data Constructor Operators
This class of operators builds objects in the D-World starting from the raw data, realizing the data acquisition step of the knowledge discovery process.
A generic Data Constructor operator is defined as $OP_{constructor}(T, p) \rightarrow T_d$
Model Constructor Operators
This kind of operator realizes the extraction of models from the D-World through data mining algorithms.
A generic Model Constructor operator is defined as $OP_{mining}(T_d, p) \rightarrow T_m$
Transformation Operators
Transformation operators are intra-world tasks aimed at manipulating data and models.
These operations are the means for expressing data pre-processing and post-processing tasks.
A generic D-Transformation operator is defined as $OP_{D\text{-}Transf}(T_d, p) \rightarrow T'_d$
A generic M-Transformation operator is defined as $OP_{M\text{-}Transf}(T_m, p) \rightarrow T'_m$
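As an example of a D-Transformation, the sketch below resamples every moving object of a table at a fixed time step, producing a new D-World table. It assumes a hypothetical representation of $T_d$ as a dict from object id to time-ordered `(t, x, y)` samples; `op_resample` is an illustrative name, and linear interpolation is an assumption rather than the thesis's resampling algorithm.

```python
Sample = tuple[float, float, float]  # (t, x, y)


def interpolate(samples: list[Sample], t: float) -> tuple[float, float]:
    """Linearly interpolated position at time t; samples must be sorted by t."""
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            r = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
            return x0 + r * (x1 - x0), y0 + r * (y1 - y0)
    raise ValueError("t is outside the sampled interval")


def op_resample(table: dict[str, list[Sample]], step: float) -> dict[str, list[Sample]]:
    """D-Transformation sketch OP(T_d, p) -> T'_d, where p is the sampling step."""
    out = {}
    for oid, samples in table.items():
        if len(samples) < 2:  # nothing to interpolate
            out[oid] = list(samples)
            continue
        t, end = samples[0][0], samples[-1][0]
        resampled = []
        while t <= end:
            x, y = interpolate(samples, t)
            resampled.append((t, x, y))
            t += step
        out[oid] = resampled
    return out


print(op_resample({"car1": [(0, 0.0, 0.0), (60, 600.0, 0.0)]}, step=30))
# {'car1': [(0, 0.0, 0.0), (30, 300.0, 0.0), (60, 600.0, 0.0)]}
```

The input table is left untouched and a new one is returned, matching the $T_d \rightarrow T'_d$ shape of the operator.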
Relation Operators
Relation operators include both intra-world and inter-world operations; their objective is to create relations between data, between models, and between data and models.
A generic DD-Relation operator is defined as $OP_{DD\text{-}Relation}(T_{dd}, f) \rightarrow TR_{dd}$
A generic MM-Relation operator is defined as $OP_{MM\text{-}Relation}(T_{mm}, f) \rightarrow TR_{mm}$
A generic DM-Relation operator is defined as $OP_{DM\text{-}Relation}(T_{dm}, f) \rightarrow TR_{dm}$
The predicate f can be any of a large variety of predicates; however, the semantics of a predicate depends on the type of the data (resp. model) objects to which it is applied.
Predicates of relation operators
[Table: predicates available between pairs of object types (Spatial Object, Temporal Object, Moving Object, T-Pattern, Cluster, Flock), grouped into DD, DM and MM relations. Cells of each row:
Spatial Object: Intersects, Contains, Equals | Intersects, Contains | Intersects, Contains | Intersects, Contains | Intersects, Contains
Temporal Object: Intersects, Contains, Equals | Intersects, Contains | Intersects, Contains
Moving Object: Intersects, Contains, Equals | Intersects, Contains, Entails | Intersects, Contains, Entails | Intersects, Contains, Entails
T-Pattern: Intersects, Contains, Equals
Cluster: Intersects, Contains, Equals
Flock: Intersects, Contains, Equals]
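A relation operator can be sketched as evaluating the predicate f on every pair of objects and keeping the pairs that satisfy it. The sketch below assumes simplified geometry (rectangular regions, trajectories as `(t, x, y)` samples), and its `intersects` only tests sample points, not the segments between them; `op_relation` and the predicate functions are hypothetical names.

```python
from typing import Callable

Sample = tuple[float, float, float]       # (t, x, y)
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def _inside(rect: Rect, x: float, y: float) -> bool:
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def intersects(traj: list[Sample], rect: Rect) -> bool:
    # Simplification: only the sample points are tested, not the segments between them.
    return any(_inside(rect, x, y) for _, x, y in traj)


def contains(rect: Rect, traj: list[Sample]) -> bool:
    # The region contains the trajectory if every sample lies inside it.
    return all(_inside(rect, x, y) for _, x, y in traj)


def op_relation(left: dict, right: dict,
                f: Callable[[object, object], bool]) -> list[tuple[str, str]]:
    """OP_Relation(T, f) -> TR: all (left id, right id) pairs for which f holds."""
    return [(lid, rid)
            for lid, lobj in left.items()
            for rid, robj in right.items()
            if f(lobj, robj)]


trajs = {"t1": [(0, 1, 1), (10, 5, 5)], "t2": [(0, 20, 20), (10, 25, 25)]}
regions = {"center": (0, 0, 10, 10)}
print(op_relation(trajs, regions, intersects))  # [('t1', 'center')]
print(op_relation(regions, trajs, contains))    # [('center', 't1')]
```

Both calls are DD relations (moving objects against spatial objects); a DM relation would pass a model table as one side with a predicate such as Entails.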
Data Mining Query Language
We defined a data mining query language to support the user during knowledge discovery tasks.
It offers three advantages: the compositionality of the operators, iterative querying, and the repeatability of the process.
DMQL Grammar
DMQL := DataConstructionOperator | ModelConstructionOperator | TransformationOperator | RelationOperator | SQLStandard
TransformationOperator := 'CREATE TRANSFORMATION' TableName 'USING' TransformationName 'FROM (' SqlCall ')' ['SET' Parameters]
RelationOperator := 'CREATE RELATION' TableName 'USING' RelationPredicate 'FROM (' SqlCall ')'
DataConstructionOperator := 'CREATE DATA' TableName 'BUILDING' DataConstructorName 'FROM (' SqlCall ')' ['SET' Parameters]
ModelConstructionOperator := 'CREATE MODELS' TableName 'USING' ModelConstructorName 'FROM (' SqlCall ')' ['SET' Parameters]
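A sketch of how this grammar could be recognized with regular expressions, assuming whitespace-insensitive keywords and a SET clause made of `key = value` pairs joined by AND. `parse_dmql` is a hypothetical helper, not the GeoPKDD parser; in particular, it assumes the last closing parenthesis ends the FROM subquery, so parameter values must not contain parentheses.

```python
import re

# One head pattern per DMQL operator; the FROM/SET tail is shared.
_HEADS = {
    "DataConstruction": r"CREATE\s+DATA\s+(?P<table>\S+)\s+BUILDING\s+(?P<op>\S+)",
    "ModelConstruction": r"CREATE\s+MODELS\s+(?P<table>\S+)\s+USING\s+(?P<op>\S+)",
    "Transformation": r"CREATE\s+TRANSFORMATION\s+(?P<table>\S+)\s+USING\s+(?P<op>\S+)",
    "Relation": r"CREATE\s+RELATION\s+(?P<table>\S+)\s+USING\s+(?P<op>\S+)",
}
_TAIL = r"\s+FROM\s*\((?P<sql>.*)\)(?:\s+SET\s+(?P<params>.*))?\s*$"


def parse_dmql(query: str) -> dict:
    """Classify a DMQL statement and split it into its parts."""
    q = " ".join(query.split())  # collapse newlines and repeated spaces
    for kind, head in _HEADS.items():
        m = re.match(head + _TAIL, q, flags=re.IGNORECASE)
        if m is None:
            continue
        params = {}
        if m.group("params"):
            for item in re.split(r"\s+AND\s+", m.group("params"), flags=re.IGNORECASE):
                key, _, value = item.partition("=")
                params[key.strip()] = value.strip()
        return {"kind": kind, "table": m.group("table"), "operator": m.group("op"),
                "sql": m.group("sql").strip(), "params": params}
    return {"kind": "SQLStandard", "sql": q}  # anything else is treated as plain SQL


q = ("CREATE MODELS ClusteringTable USING OPTICS FROM (SELECT t.id, t.trajobj "
     "FROM Trajectories t) SET OPTICS.eps = 50 AND OPTICS.min_size = 100")
print(parse_dmql(q)["params"])  # {'OPTICS.eps': '50', 'OPTICS.min_size': '100'}
```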
The Design of the GeoPKDD system
The GeoPKDD system is an implementation of the Two-Worlds model and the Data Mining Query Language.
Object Relational Database and Database Manager
As described above, the object relational database stores the representation of both data and models and provides the full power of SQL.
The database manager implements a middle layer that, through translation libraries, decouples the system from the underlying database technologies.
Language Parser and Controller
The parser identifies the various types of queries and builds an execution plan for each of them as a sequence of actions for the controller.
Example:
CREATE MODELS ClusteringTable USING OPTICS FROM (SELECT t.id, t.trajobj FROM Trajectories t) SET OPTICS.distance_method = Route Similarity AND OPTICS.eps = 50 AND OPTICS.min_size = 100
Plan:
1. Retrieve[ Select t.id, t.trajobj from Trajectories t ]
2. Translate[ Data type: Moving point ]
3. Execute[ Mining algorithm: Optics algorithm, Parameters: ... ]
4. Translate[ Model type: Cluster ]
5. Store[ Table Name: ClusteringTable ]
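The five-step plan can be reproduced from the parsed parts of a CREATE MODELS query. This sketch assumes a hypothetical `CATALOG` that maps each mining operator to its input data type and output model type; `build_plan` is an illustrative name, and the real controller's plan structure may differ.

```python
# Hypothetical catalog: mining operator -> (input data type, output model type).
CATALOG = {
    "OPTICS": ("Moving point", "Cluster"),
    "T-PATTERN": ("Moving point", "T-Pattern"),
}


def build_plan(table: str, operator: str, sql: str, params: dict[str, str]) -> list[str]:
    """Execution plan for a CREATE MODELS query, following the five steps above."""
    if operator not in CATALOG:
        raise KeyError(f"no mining algorithm registered for {operator!r}")
    data_type, model_type = CATALOG[operator]
    return [
        f"Retrieve[ {sql} ]",                    # run the FROM subquery
        f"Translate[ Data type: {data_type} ]",  # rows -> data objects
        f"Execute[ Mining algorithm: {operator}, Parameters: {params} ]",
        f"Translate[ Model type: {model_type} ]",  # algorithm output -> model objects
        f"Store[ Table Name: {table} ]",           # write the M-World table
    ]


plan = build_plan("ClusteringTable", "OPTICS",
                  "SELECT t.id, t.trajobj FROM Trajectories t",
                  {"OPTICS.eps": "50", "OPTICS.min_size": "100"})
for n, action in enumerate(plan, start=1):
    print(f"{n}. {action}")
```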
Algorithms Manager
This component is a plug-in module capable of managing different sets of libraries.
Each library implements a different set of operators of the proposed Two-Worlds framework.
Algorithms Libraries
Data construction library: Moving object reconstruction algorithm, Spatial object builder algorithm, Temporal object builder algorithm
Model construction library: T-Pattern algorithm, OPTICS algorithm, T-Flock algorithm
Transformation library: Resampling algorithm, Intersection algorithm, Object filtering, T-Anonymity algorithms
Relation library: all the predicates
CREATE DATA MobilityData BUILDING MOVING_POINTS FROM (SELECT userid,lon,lat,datetime
FROM MobilityRawData ORDER BY userid,datetime)
SET MOVING_POINT.MAX_SPACE_GAP = 2000 m AND MOVING_POINT.MAX_TIME_GAP = 1800 sec
CREATE MODELS Patterns USING T-PATTERN FROM (Select t.id, t.trajobj from Trajectories t) SET T-PATTERN.support = .02 AND T-PATTERN.time = 120 sec
CREATE TRANSFORMATION AnonimizedData USING NWA FROM (SELECT t.id, t.trajobj FROM Trajectories t) SET ANONYMIZATION.K = 10 AND ANONYMIZATION.TIME_SLOT = 600 sec
CREATE RELATION EntailmentTable USING ENTIAL FROM (SELECT t.id, t.trajobj, p.id, p.obj FROM Trajectories t, Patterns p)
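The CREATE DATA query above builds moving points from raw GPS fixes. A plausible reading of its parameters, assumed here rather than taken from the thesis, is that a new trajectory starts whenever two consecutive fixes of the same user are more than `MAX_SPACE_GAP` metres or `MAX_TIME_GAP` seconds apart. The sketch implements that reading with a spherical-Earth haversine distance; `build_moving_points` is a hypothetical name, and rows must arrive ordered by user and time, as the query's ORDER BY guarantees.

```python
import math
from itertools import groupby

Row = tuple[str, float, float, float]  # (userid, lon, lat, unix seconds)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres (spherical Earth, R = 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def build_moving_points(rows: list[Row], max_space_gap: float = 2000.0,
                        max_time_gap: float = 1800.0) -> list[tuple[str, list[Row]]]:
    """Split each user's time-ordered fixes into trajectories whenever the gap
    to the previous fix exceeds max_space_gap metres or max_time_gap seconds."""
    trajectories = []
    for user, fixes in groupby(rows, key=lambda r: r[0]):
        current: list[Row] = []
        for fix in fixes:
            if current:
                _, lon0, lat0, t0 = current[-1]
                _, lon1, lat1, t1 = fix
                if (t1 - t0 > max_time_gap
                        or haversine_m(lon0, lat0, lon1, lat1) > max_space_gap):
                    trajectories.append((user, current))
                    current = []
            current.append(fix)
        if current:
            trajectories.append((user, current))
    return trajectories


fixes = [("u1", 9.19, 45.46, 0), ("u1", 9.191, 45.46, 60), ("u1", 9.192, 45.46, 4000)]
print([len(p) for _, p in build_moving_points(fixes)])  # [2, 1]
```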
Extending the system
The GeoPKDD system can be extended at several levels:
Architecture level: new components
Algorithm level: new algorithms
Type level: new data types or model types
Add-ons: Reasoning component
This component exploits application-domain knowledge encoded in an ontology to infer a semantic interpretation of the discovered patterns.
SELECT id, trajobj FROM Trajectories t WHERE SEM_CONCEPT(trajobj) = 'TouristTrajectory'
Add-ons: Location Prediction
The goal is to construct a predictive model from the set of T-patterns extracted from a set of trajectories.
Given a new trajectory, the predictive model can be used to predict its next location.
[Figure: trajectory dataset, local patterns, prediction tree]
CREATE TRANSFORMATION TPatternTree USING TPATTERN_TREE FROM (SELECT p.id, p.TpatternObj FROM PatternTable p)
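A much simplified sketch of the idea behind a prediction tree: T-patterns are merged into a prefix tree over region ids, and the predicted next location is the most supported child reached by the longest suffix of the regions visited so far. This is an assumption-laden illustration, not the thesis's location-prediction algorithm: it ignores transition times, and `Node`, `build_tree`, `predict_next` and the supports are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    support: float = 0.0


def build_tree(patterns: list[tuple[list[str], float]]) -> Node:
    """Merge T-patterns (region sequences with a support) into a prefix tree;
    each node keeps the highest support of the patterns passing through it."""
    root = Node()
    for regions, support in patterns:
        node = root
        for region in regions:
            node = node.children.setdefault(region, Node())
            node.support = max(node.support, support)
    return root


def predict_next(root: Node, visited: list[str]) -> Optional[str]:
    """Match the longest suffix of `visited` that is a path from the root and
    return its most supported child; None when no suffix leads anywhere."""
    for start in range(len(visited)):
        node: Optional[Node] = root
        for region in visited[start:]:
            node = node.children.get(region)
            if node is None:
                break
        if node is not None and node.children:
            best = node
            return max(best.children, key=lambda c: best.children[c].support)
    return None


tree = build_tree([(["A", "B", "C"], 0.05), (["A", "B", "D"], 0.02), (["B", "E"], 0.10)])
print(predict_next(tree, ["X", "A", "B"]))  # C
```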
Add-ons: K-Best Map Matching
A new way to perform map matching.
In real cases the shortest-path assumption can be violated when other external factors play a role (e.g. traffic congestion).
CREATE DATA K-MobilityData BUILDING K-MOVING_POINTS FROM (SELECT userid, lon, lat, datetime FROM MobilityRawData ORDER BY userid, datetime) SET K-MOVING_POINTS.K = 5 AND K-MOVING_POINTS.MAP = StreetMapFile.wkt
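To illustrate why keeping K candidates helps when the shortest-path assumption fails, the brute-force sketch below enumerates the loop-free paths between two nodes of a small road graph and returns the K shortest. It is an assumption for illustration only: the actual K-best map matching also scores candidates against the GPS fixes, exhaustive enumeration is only feasible on tiny graphs, and the graph, node names and lengths are hypothetical.

```python
import heapq
from typing import Iterator, Optional

Graph = dict[str, dict[str, float]]  # node -> {neighbour: edge length in metres}


def simple_paths(g: Graph, src: str, dst: str,
                 path: Optional[list[str]] = None) -> Iterator[list[str]]:
    """Yield every loop-free path from src to dst (depth-first search)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in g.get(src, {}):
        if nxt not in path:
            yield from simple_paths(g, nxt, dst, path)


def path_length(g: Graph, path: list[str]) -> float:
    return sum(g[a][b] for a, b in zip(path, path[1:]))


def k_best_paths(g: Graph, src: str, dst: str, k: int) -> list[tuple[float, list[str]]]:
    """Keep the k shortest candidate paths instead of only the shortest one."""
    return heapq.nsmallest(k, ((path_length(g, p), p) for p in simple_paths(g, src, dst)))


roads = {"A": {"B": 100, "C": 150}, "B": {"D": 100}, "C": {"D": 80}, "D": {}}
print(k_best_paths(roads, "A", "D", k=2))
# [(200, ['A', 'B', 'D']), (230, ['A', 'C', 'D'])]
```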
A Case Study in an Urban Mobility Scenario
A set of experiments performed on a real-world case study demonstrates the capabilities of the GeoPKDD system and how it can be exploited to extract useful knowledge from raw mobility data.
Dataset: GPS traces of 17K private cars in Milan, Italy, covering one week of ordinary mobility, for a total of 200K trips (trajectories).
Data donated by
Demo
GeoPKDD system: equipped with a very simple GUI that enables the user to write DMQL queries and visualize the results.
M-Atlas: the new generation of the GUI, where DMQL is used to build complex analyses by creating scripts.
Contributions
The contributions of the thesis are:
the creation of a theoretical framework to manage the complex knowledge discovery process on mobility data
the definition of a DMQL that realizes the operators of the framework
the implementation of a real system capable of handling large amounts of data
three extensions of the system: a reasoning component, k-best map matching and location prediction algorithms
an extensive study and analysis of a real case study
Achievements
The GeoPKDD system was one of the two project demonstrators and has been successfully presented in the final review of the GeoPKDD project.
Presented at the European Parliament as one of the selected projects of the Future and Emerging Technologies (FET) program.
Published at several conferences, such as KDD, ICDM, EDBT and AGILE.
It is used in a collaboration with the Milan Mobility Agency to understand mobility.
It is currently used in collaboration with Orange Telecom for the “Big Paris” project
Publications
2010
[12] Roberto Trasarti, Fosca Giannotti, Mirco Nanni, Dino Pedreschi, Chiara Renso: A Query Language for Mobility Data Mining. International Journal of Data Warehousing and Mining (IJDWM), 2010.
[11] Mirco Nanni, Roberto Trasarti, Chiara Renso, Fosca Giannotti, Dino Pedreschi: Advanced Knowledge Discovery on Movement Data with the GeoPKDD system. EDBT 2010.
2009
[10] Mirco Nanni, Roberto Trasarti, Fosca Giannotti: K-BestMatch reconstruction and comparison of trajectory data. SSTDM - ICDM 2009.
[9] Fosca Giannotti, Roberto Trasarti: Mobility, Data Mining and Privacy: The GeoPKDD Paradigm. SIAM Journal (IM09).
[8] Fosca Giannotti, Mirco Nanni, Dino Pedreschi, Chiara Renso, Roberto Trasarti: Mining Social Mobility Behaviors from GPS data. SCMPS - SocialCom 2009.
[7] Roberto Trasarti, Miriam Baglioni, Chiara Renso: DAMSEL: a System for Progressive Querying and Reasoning on Movement data. FlexDBIST 2009.
[6] Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti: WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009, 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
[5] Fosca Giannotti, Chiara Renso, Roberto Trasarti: GeoPKDD – Geographic Privacy-aware Knowledge Discovery. FET 2009, The European Future Technologies Conference.
[4] Miriam Baglioni, Jose de Macedo, Chiara Renso, Roberto Trasarti, Monica Wachowicz: Towards semantic interpretation of movement behavior. 12th AGILE International Conference on Geographic Information Science.
2008
[3] Riccardo Ortale, E. Ritacco, Nikos Pelekis, Roberto Trasarti, Gianni Costa, Fosca Giannotti, Giuseppe Manco, Chiara Renso, Yannis Theodoridis: The DAEDALUS Framework: Progressive Querying and Mining of Movement Data. ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008).
[2] Fabio Pinelli, Anna Monreale, Roberto Trasarti, Fosca Giannotti: Location prediction within the mobility data analysis environment Daedalus. First International Workshop on Computational Transportation Science (IWCTS), MOBIQUITOUS 2008, ACM Digital Library.
[1] Riccardo Ortale, E. Ritacco, Nikos Pelekis, Roberto Trasarti, Gianni Costa, Fosca Giannotti, Giuseppe Manco, Chiara Renso, Yannis Theodoridis: DAEDALUS: A knowledge discovery analysis framework for movement data. SEBD 2008: 191-198.
Journals
Conference Proceedings
Demos or Posters
Thank you
Questions?