AD HOC DATA INTEGRATION FOR MOBILE GIS APPLICATIONS

Ramya Venkateswaran (ramya@geo.uzh.ch)

Contents

1. Scenario

2. Research Objective

3. Introduction: Overview of the GenW2 project

4. Motivation: Why is Ad hoc Data Integration needed?

5. State of the Art

6. Research Questions: Discuss 3 research questions

7. Methods: TourGuide and friends

8. Next Steps: Data Enrichment and Quality control


1. Scenario

Scenario of Usage

I will be vacationing in Paris and I want to visit some of the famous palaces, history-related places and other tourist locations in Paris.

Other sources? Recommendations from:

• People
• Tourist Guides
• Albums & Images
• Tourist & Travel Websites

Scenario of Usage (continued)

I’d still like to go to Paris...

TourGuide: recommendations from the same sources (People, Tourist Guides, Albums & Images, Tourist & Travel Websites).

2. Research Objective

Objective of my research

Ad hoc Data Integration

Data Integration
• Flavour Based integration
• Ad hoc DI vs. Traditional DI
• TourGuide

Data enrichment
• POI Enrichment
• Website credibility

Data quality control
• Completeness
• Correctness
• Credibility
• User feedback

3. Overview and Introduction

Overview of the GenW2 Project

Short for: Generalization for portrayal in Web and Wireless mapping

Develop new methods for web and wireless mapping

Focus on ad hoc integration of heterogeneous information and on-the-fly map generalization in a mobile context.

The GenW2 Framework

[Figure: GenW2 framework architecture. Labelled components: User, Query, Parser, Parsed Query, Ruleset & Association Component, Spatio-Temporal Event Handler, Privacy Controller and Firewall, Information Retrieval Component, Data Integrator, Data sources (Web, Static datasets), Internal Database (Facts DB, MRDB), Unranked dataset, Intelligent Ranker, Ranked dataset, Filter & Relevance Component, Generalization, Visualization, Result.]


[Figure: GenW2 framework, Data Integrator detail. Labelled components: Parser, Parsed Query, Data Integrator (Web Interaction Component, Wrapper Generator, Data Transformer), Data sources (Web: image metadata, web services, web pages; Static datasets), Unranked dataset, Intelligent Ranker, Ranked dataset, Facts DB, MRDB.]

Types of Data sources

• MRDB
• Facts DB
• Image metadata
• Web services
• Web pages
• Static datasets

4. Motivation: Why is Ad hoc Data Integration needed?

Motivation

So many data sources and so little structure

Web as a database – Too much information to ignore!

Ad hoc integration – Need based according to scenario and flavour, unlike search engines.

Importance of recording certain facts that can enrich the MRDB and the integration process.

5. State of the Art

Relevant Domains

Domains relevant to Ad hoc Data Integration:

• Recommendation Systems
• Information Filtering
• Information Retrieval
• Collaborative Filtering


Integration, IR and decision systems

• Different concepts and methods in data integration
• Data integration from multiple sources
• Geospatial data mining and integration (Knoblock et al., 2001, Michalowski et al., 2004)
• Mashup of web data for the overall importance of landmarks (Grabler et al., 2008)
• SPIRIT: design, techniques and implementation (Purves et al., 2007, Jones et al., 2002, Bucher et al., 2005)
• Geo-parsing, geocoding and IR techniques (Clough et al., 2005)
• Methods for marking tourist locations and a guide that is 'context aware' (Abowd et al., 2004)
• Activity-based model of decisions that are affected by activity-travel behavior, which also predicts activities (Arentze and Timmermans, 2004)
• Voluntary information from a community, collaborative semantics, recommendation systems (Schlieder, 2007)

Data Enrichment

Methods and algorithms for the provision of auxiliary data and its use for controlling an automated adaptive generalization process (Neun, 2007)


Data quality and assessment

• Framework for efficient and accurate integration of geospatial data from a large number of sources; positional accuracy, completeness (Thakkar et al., 2007)
• VGI (Volunteered Geographic Information): trust models for gazetteers (Keßler et al., 2009)

Observations from literature

• Considerable work and methods for traditional data integration; a variety of methods in IR and GIR
• Less work and fewer methods for data integration from multiple and dynamic sources (the focus is on semantics rather than data and context) and for recording reusable facts
• Considerable work on user modeling, activities and activity recommendation
• Data enrichment work for improving generalization

Challenges

• Datasets are dynamic and heterogeneous rather than static
• Auxiliary data
• Determining parameters (user categories, activities, habits etc., not a single user or set of preferences)
• Point of complete integration
• Methods to test and evaluate the effectiveness

6. Research Questions

RQ1 – Flavour Based Integration

Given an activity and unrelated data that is heterogeneous and dynamic, what is an effective method of data integration, so that the results are streamlined towards information about events and places for a set of users?

• Flavour based data integration from various sources
• Ad hoc DI vs. Traditional DI
• TourGuide – an example of web data integration

RQ2 – Data Enrichment

How can the Generalization for portrayal in Web and Wireless mapping (GenW2) framework record and exploit valuable reusable information, obtained from the preceding data integration?

• Facts DB
• Activity-Location pairs
• Data source credibility (Keßler et al., 2009)
• User feedback

RQ3 – Quality of data

What are the different metrics that can be used to control and/or assess the quality of the integrated data?

Measurement of quality?
• Quality of data by completeness (Thakkar et al., 2007)
• Quality of data by correctness (Thakkar et al., 2007)

Another metric for quality assessment
• Quality of data by collective user feedback
• Credibility rank of information sources (Keßler et al., 2009)

Evaluation Methodology

7. Methods

Flavour Based Data Integration

Recommendation Systems

Information Filtering

Information Retrieval

Collaborative Filtering


Definition - Flavour Based Data Integration

Recommendation Systems

Information Filtering

Information Retrieval

Collaborative Filtering

“The central idea here is to base personalized recommendations for users on information obtained from other, ideally likeminded, users.” (Billsus and Pazzani, 1998).

“use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices” (Resnick and Varian 1997).

“a field of study designed for creating a systematic approach to extracting information that a particular person finds important from a larger stream of information” (Canavese, 1994).

“the goal of an information [retrieval] system is for the user to obtain information from the knowledge resource which helps her/him in problem management” (Belkin, 1984)


Keyphrases in FBDI

• Systematic approach to extracting information
• Obtain information from one or many knowledge resource/s
• Recommendations for user groups or user categories
• Opinions of a community of users
• Keyword, flavour or activity such as tourism, history, sport, culture, shopping etc.

Definition of FBDI

FBDI is an activity-based, systematic approach to extracting and integrating information from multiple knowledge sources, depending on the habits of certain user groups or user categories, and capable of learning over time.

Flavour = typical activities of a certain user group

Examples – Tourism, Shopping, Sports, Historical excursions, Cultural excursions etc.


Adaptive tour guide for Paris

• Flavour Based Integration with web as data source
• Only web as the database (Grabler et al., 2008)
• Integration of data on:
  • Tourism
  • Transport
  • User feedback
  • User rating
  • Facebook profile
  • Dopplr profile
• Scheduler

Data Integrator

Example of web data integration

Functional components (Baumgartner et al., 2009)
• Web interaction component: Lonelyplanet, wikitravel, virtualtourist, tripadvisor and official tourist website
• Wrapper generator: OpenKapow RoboMaker
• Data transformer: DOM parser for RSS and XML formats


Web data Extraction

Semi automatic wrappers
• Academic: XWRAP (Liu et al., 2000), Lixto (Baumgartner et al., 2001), Wargo (Pan et al., 2002)
• Commercial: RoboMaker (Kapow Technologies), WebQL (QL2 Software Inc.)

Automatic wrapper induction
• WIEN (Kushmerick et al., 1997)
• Stalker (Muslea et al., 2001)
• DEByE (Laender et al., 2000)


Data Integrator

Example of web data integration
• First part: Google
• Second part: functional components (Baumgartner et al., 2009)
  • Web interaction component: lonelyplanet, wikitravel, virtualtourist, tripadvisor and official tourist website
  • Wrapper generator: OpenKapow RoboMaker
  • Data transformer: DOM parser for RSS and XML formats


Intelligent Ranker and Scheduler

• Third step of integration.
• Applies different profiles to the data, like Facebook and Dopplr.
• Arranges the data in a ranked form depending on matches from user interests and activities.
• Brute force cumulative ranking algorithm:
  3 – Explicitly mentioned
  2 – Description match
  1 – Suggested by other users
• Merges data from public transport
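One way to read the cumulative ranking above is as a sum of the listed weights over all matches found for a place. The sketch below follows that reading; the evidence labels, the tie-breaking by name and the function name `rank_places` are assumptions for illustration, not the actual ranker.

```python
from collections import defaultdict

# Weights as listed on the slide; the dictionary keys are hypothetical labels.
WEIGHTS = {
    "explicit": 3,     # place explicitly mentioned in the user's profile
    "description": 2,  # place description matches an interest or activity
    "suggested": 1,    # place suggested by other users
}


def rank_places(evidence):
    """Sum the weight of every (place, kind) match and sort by total score.

    Ties are broken alphabetically by place name, which is an assumption.
    """
    scores = defaultdict(int)
    for place, kind in evidence:
        scores[place] += WEIGHTS[kind]
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


evidence = [
    ("Louvre", "explicit"),
    ("Louvre", "suggested"),
    ("Versailles", "description"),
    ("Eiffel Tower", "suggested"),
    ("Versailles", "suggested"),
]
ranked = rank_places(evidence)
for place, score in ranked:
    print(place, score)
```

With this sample input the Louvre scores 4, Versailles 3 and the Eiffel Tower 1.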


Facts DB

• Location information from the MRDB and map LOD with place
• Activity-Location pairs
• Facts DB structure

Facts DB Structure

• High-level structure
• Lower-level structure: a Database Object maps to more locations
• Limit to two levels
• Inverse Page Lookup

Example Database Object:

Activity | LocationFrom | LocationTo | Name | Rank | User Feedback
Shopping | 47°22′40″N, 8°32′25″E | 47.3671°N, 8.5409°E | Bahnhofstrasse | 3 | Shop for watches, jewelry, clothes
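The example row can be written down as a small record type. This is only a sketch: the field names follow the column headers, while the `Fact` class, the `dms_to_decimal` helper and the activity index are assumptions added for illustration and do not describe the actual Facts DB schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass
class Fact:
    """One activity-location pair, mirroring the columns of the example row."""
    activity: str
    location_from: LatLng
    location_to: LatLng
    name: str
    rank: int
    user_feedback: str


def dms_to_decimal(degrees: int, minutes: int, seconds: float) -> float:
    """Convert degrees/minutes/seconds to decimal degrees (north and east positive)."""
    return degrees + minutes / 60 + seconds / 3600


def index_by_activity(facts: List[Fact]) -> Dict[str, List[Fact]]:
    """Group facts by activity so that all locations for a flavour can be looked up."""
    index: Dict[str, List[Fact]] = defaultdict(list)
    for fact in facts:
        index[fact.activity].append(fact)
    return index


bahnhofstrasse = Fact(
    activity="Shopping",
    location_from=(dms_to_decimal(47, 22, 40), dms_to_decimal(8, 32, 25)),
    location_to=(47.3671, 8.5409),
    name="Bahnhofstrasse",
    rank=3,
    user_feedback="Shop for watches, jewelry, clothes",
)
facts_by_activity = index_by_activity([bahnhofstrasse])
print(facts_by_activity["Shopping"][0].name)
```

Storing both coordinates in decimal degrees keeps the two endpoints comparable, since the example row mixes a degrees/minutes/seconds value with a decimal one.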

Data Quality

Evaluation through completeness and correctness

Example: shopping stores in Bahnhofstrasse
• Extract lat-lng
• Shop name, website, details and contact details
• Shop opening and closing times
• Evaluate against manually collected data for completeness and correctness.
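The evaluation against manually collected data could be sketched as two simple ratios. The definitions used here (completeness as the share of reference shops that were extracted, correctness as the share of attribute values of matched shops that agree with the reference) are assumptions for illustration and are not necessarily the measures of Thakkar et al. (2007); the shop names and opening times are invented.

```python
def completeness(extracted, reference):
    """Share of reference shops that also appear in the extracted data."""
    if not reference:
        return 1.0
    found = set(extracted) & set(reference)
    return len(found) / len(reference)


def correctness(extracted, reference):
    """Share of reference attribute values reproduced exactly, over matched shops only."""
    total = 0
    agree = 0
    for shop in set(extracted) & set(reference):
        for attribute, expected in reference[shop].items():
            total += 1
            if extracted[shop].get(attribute) == expected:
                agree += 1
    return agree / total if total else 1.0


# Manually collected reference data (hypothetical values).
reference = {
    "Shop A": {"opens": "09:00", "closes": "18:30"},
    "Shop B": {"opens": "09:30", "closes": "19:00"},
    "Shop C": {"opens": "10:00", "closes": "18:00"},
}
# Data produced by the integration step (hypothetical values).
extracted = {
    "Shop A": {"opens": "09:00", "closes": "18:30"},
    "Shop B": {"opens": "10:00", "closes": "19:00"},
}

completeness_score = completeness(extracted, reference)
correctness_score = correctness(extracted, reference)
print(round(completeness_score, 2), correctness_score)
```

In this example two of three reference shops were found, and three of the four attribute values of the matched shops agree with the reference.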

8. Next Steps

Next Steps

• Formalizing parameters and methods for integration
• Improve scoring algorithm for places
• Structure of Facts DB for efficient storage and retrieval
• Develop quality control methods, such as considering user feedback and credibility

Open Questions

• At what point is the data integrated? When is it complete?
• Qualitative vs. Quantitative
• Error recovery and correction mechanism in FactsDB?
• Mapping of place’s score to LOD?

Milestones

Year 1: Fall 2008, Spring 2009
Year 2: Fall 2009, Spring 2010
Year 3: Fall 2010, Spring 2011

• Literature review
• Develop overall framework
• Start to develop research questions and focus area.
• Literature review
• Develop research questions
• Define use cases
• Make a prototype of one use case - TourGuide
• Develop concept and methods for RQ1
• Implement parts of TourGuide
• Develop user tests for input to RQ2 and RQ3
• Continue work on RQ1. Formalise parameters.
• Analyse input from user tests and combine with other parameters for RQ2
• Continue work with RQ2 and start RQ3
• Formalise parameters for data quality control
• Perform evaluation of data, define and implement quality assessing/controlling parameters for FBDI
• Finalize publications
• Thesis write-up

Summary: Expected contributions

• Working system and framework for ad hoc data integration that will work for certain flavours
• Methodology of Flavour based data integration (RQ1)
  • Structure
  • Algorithm for efficient data source selection depending on “flavour”
  • Algorithm for scoring different places depending on a number of parameters
• Concept and structure of FactsDB that will work with data from the MRDB for enrichment (RQ2)
• Improved and adapted parameters and a mechanism for checking the quality of the integrated data and some test cases (RQ3)


Thank you!

Ramya Venkateswaran (ramya@geo.uzh.ch)

Demo and slides at http://www.geo.uzh.ch/~ramya/kolloquium/
