Contents
1. Scenario
2. Research Objective
3. Introduction: Overview of the GenW2 project
4. Motivation: Why is Ad hoc Data Integration needed?
5. State of the Art
6. Research Questions: Discuss 3 research questions
7. Methods: TourGuide and friends
8. Next Steps: Data Enrichment and Quality control
1. Scenario
Scenario of Usage
I will be vacationing in Paris and I want to visit some of the famous palaces, history-related places and other tourist locations in Paris.
Recommendations from other sources?
• People
• Tourist Guides
• Albums & Images
• Tourist & Travel Websites
Scenario of Usage
I’d still like to go to Paris...
Recommendations from: TourGuide
Other sources? People, Tourist Guides, Albums & Images, Tourist & Travel Websites
2. Research Objective
Objective of my research: Ad hoc Data Integration
Data Integration
• Flavour Based integration
• Ad hoc DI vs. Traditional DI
• TourGuide
Data enrichment
• POI Enrichment
• Website credibility
Data quality control
• Completeness
• Correctness
• Credibility
• User feedback
3. Overview and Introduction
Overview of the GenW2 Project
Short for: Generalization for portrayal in Web and Wireless mapping
Develop new methods for web and wireless mapping
Focus on ad hoc integration of heterogeneous information and on-the-fly map generalization in a mobile context.
The GenW2 Framework
[Figure: GenW2 framework architecture. Components: User, Privacy Controller and Firewall, Parser, Ruleset & Association Component, Spatio-Temporal Event handler, Information retrieval component, Data Integrator, Intelligent Ranker, Filter & Relevance Component, Generalization, Visualization. Data stores and sources: Internal Database, Static datasets, Facts DB, MRDB, Web. Data flow labels: Query, Parsed Query, Unranked dataset, Ranked dataset, Data, Result.]
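The GenW2 Framework: Data Integrator
[Figure: detail of the GenW2 framework around the Data Integrator. Components: Parser, Data Integrator (Web Interaction Component, Wrapper Generator, Data Transformer), Intelligent Ranker. Data sources: Web (Web pages, Web services, Image metadata), Static datasets, Facts DB, MRDB. Data flow labels: Parsed Query, Data, Unranked dataset, Ranked dataset. Parts of the diagram carry the numbered markers 1, 2 and 3.]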
Types of Data sources
• MRDB
• Facts DB
• Image metadata
• Web services
• Web pages
• Static datasets
4. Motivation - Why is Ad hoc Data Integration needed?
Motivation
So many data sources and so little structure
Web as a database – Too much information to ignore!
Ad hoc integration – Need based according to scenario and flavour, unlike search engines.
Importance of recording certain facts that can enrich the MRDB and the integration process.
5. State of the Art
Relevant Domains
Recommendation Systems
Information Filtering
Information Retrieval
Collaborative Filtering
Integration, IR and decision systems
• Different concepts and methods in Data Integration
• Data Integration from multiple sources
• Geospatial data mining and integration (Knoblock et al., 2001, Michalowski et al., 2004)
• Mashup of web data for overall importance of landmarks (Grabler et al., 2008)
• SPIRIT – Design, techniques and implementation (Purves et al., 2007, Jones et al., 2002, Bucher et al., 2005)
• Geo parsing, geo coding and IR techniques (Clough et al., 2005)
• Methods for marking tourist locations and a guide that is 'context aware' (Abowd et al., 2004)
• Activity-based model of decisions that are shaped by activity-travel behaviour, and that also predicts activities (Arentze and Timmermans, 2004)
• Voluntary information from a community, collaborative semantics, recommendation systems (Schlieder, 2007)
Data Enrichment
Methods and algorithms for the provision of auxiliary data and its use for controlling an automated adaptive generalization process (Neun, 2007)
Data quality and assessment
Framework for efficient and accurate integration of geospatial data from a large number of sources
Positional accuracy, completeness (Thakkar et al., 2007)
VGI (Volunteered Geographic Information) Trust models for Gazetteers (Keßler et al., 2009)
Observations from literature
Considerable work and methods for traditional data integration, variety of methods in IR and GIR
Less work and methods for data integration from multiple and dynamic sources (focus on semantics rather than data and context) and for recording reusable facts.
Considerable work on user modeling, activities and activity recommendation
Data enrichment work for improving generalization
Challenges
• Datasets are dynamic and heterogeneous rather than static
• Auxiliary data
• Determining parameters (user categories, activities, habits etc., not a single user or set of preferences)
• Point of complete integration
• Methods to test and evaluate the effectiveness
6. Research Questions
RQ1 – Flavour Based Integration
Given an activity and unrelated data that is heterogeneous and dynamic, what is an effective method of data integration, so that the results are streamlined towards information about events and places for a set of users?
• Flavour based data integration from various sources
• Ad hoc DI vs. Traditional DI
• Tour guide – An example of web data integration
RQ2 – Data Enrichment
How can the Generalization for portrayal in Web and Wireless mapping (GenW2) framework record and exploit valuable reusable information, obtained from the preceding data integration?
• Facts DB
• Activity-Location pairs
• Data source credibility (Keßler et al., 2009)
• User feedback
RQ3 – Quality of data
What are the different metrics that can be used to control and/or assess the quality of the integrated data?
Measurement of Quality?
• Quality of data by completeness (Thakkar et al., 2007)
• Quality of data by correctness (Thakkar et al., 2007)
Another metric for Quality Assessment
• Quality of data by collective user feedback
• Credibility rank of information sources (Keßler et al., 2009)
Evaluation Methodology
7. Methods
Flavour Based Data Integration
Recommendation Systems
Information Filtering
Information Retrieval
Collaborative Filtering
Definition - Flavour Based Data Integration
Recommendation Systems
Information Filtering
Information Retrieval
Collaborative Filtering
“The central idea here is to base personalized recommendations for users on information obtained from other, ideally likeminded, users.” (Billsus and Pazzani, 1998).
“use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices” (Resnick and Varian 1997).
“a field of study designed for creating a systematic approach to extracting information that a particular person finds important from a larger stream of information” (Canavese, 1994).
“the goal of an information [retrieval] system is for the user to obtain information from the knowledge resource which helps her/him in problem management” (Belkin, 1984)
Keyphrases in FBDI
• Systematic approach to extracting information
• Obtain information from one or many knowledge resource/s
• Recommendations for user groups or user categories
• Opinions of a community of users
• Keyword, flavour or activity such as tourism, history, sport, culture, shopping etc.
Definition of FBDI
FBDI is an activity-based, systematic approach to extract and integrate information from multiple knowledge sources depending on habits of certain user groups or user categories, capable of learning over time.
Flavour = typical activities of a certain user group
Examples – Tourism, Shopping, Sports, Historical excursions, Cultural excursions etc.
Adaptive tour guide for Paris
• Flavour Based Integration with web as data source
• Only web as the database (Grabler et al., 2008)
• Integration of data on: Tourism, Transport, User feedback, User Rating, Facebook profile, Dopplr profile
• Scheduler
Data Integrator
Example of web data integration
Functional components (Baumgartner et al., 2009)
• Web interaction component: Lonelyplanet, wikitravel, virtualtourist, tripadvisor and official tourist website
• Wrapper generator: OpenKapow RoboMaker
• Data transformer: DOM parser for RSS and XML formats
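The data transformer step can be illustrated with a minimal sketch, assuming the wrapper output arrives as an RSS 2.0 feed. It uses Python's standard `xml.dom.minidom` DOM parser. The sample feed, the `example.org` URLs and the names `parse_rss_items` and `_text` are hypothetical and not part of the TourGuide implementation.

```python
from xml.dom import minidom

# Hypothetical wrapper output: a tiny RSS 2.0 feed with two places.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Hypothetical Paris feed</title>
    <item>
      <title>Louvre</title>
      <link>http://example.org/louvre</link>
      <description>Former royal palace, now a museum.</description>
    </item>
    <item>
      <title>Versailles</title>
      <link>http://example.org/versailles</link>
      <description>Palace and gardens outside Paris.</description>
    </item>
  </channel>
</rss>"""


def _text(node, tag):
    """Return the stripped text of the first <tag> child, or '' if absent."""
    elements = node.getElementsByTagName(tag)
    if not elements or elements[0].firstChild is None:
        return ""
    return elements[0].firstChild.data.strip()


def parse_rss_items(rss_text):
    """Walk the DOM tree and turn every <item> into a plain dictionary."""
    dom = minidom.parseString(rss_text)
    return [
        {
            "name": _text(item, "title"),
            "url": _text(item, "link"),
            "description": _text(item, "description"),
        }
        for item in dom.getElementsByTagName("item")
    ]


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["name"], entry["url"])
```

The resulting dictionaries are one possible uniform record format that a later ranking step could consume.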
Web data Extraction
Semi automatic wrappers
• Academic: XWRAP (Liu et al., 2000), Lixto (Baumgartner et al., 2001), Wargo (Pan et al., 2002)
• Commercial: RoboMaker (Kapow Technologies), WebQL (QL2 Software Inc.)
Automatic wrapper induction
• WIEN (Kushmerick et al., 1997)
• Stalker (Muslea et al., 2001)
• DEByE (Laender et al., 2000)
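To make the idea of a wrapper concrete, here is a minimal sketch of executing a left-right (LR) delimiter wrapper, the simplest wrapper class considered in the WIEN line of work. The delimiters are written by hand here; learning them automatically from labelled pages is the induction step and is not shown. The function name `lr_extract` and the sample page are hypothetical.

```python
def lr_extract(page, delimiters):
    """Extract tuples from a page using one (left, right) delimiter pair
    per attribute, scanning the page from left to right."""
    if not delimiters:
        return []
    rows = []
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return rows  # no further record on the page
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return rows  # unterminated value, stop extraction
            row.append(page[start:end])
            pos = end + len(right)
        rows.append(tuple(row))


# Hypothetical page fragment listing places and their type.
PAGE = (
    "<li><b>Louvre</b> <i>Museum</i></li>"
    "<li><b>Versailles</b> <i>Palace</i></li>"
)
rows = lr_extract(PAGE, [("<b>", "</b>"), ("<i>", "</i>")])
print(rows)
```

A wrapper of this kind breaks as soon as the page layout changes, which is one reason to prefer semi-automatic or induced wrappers for dynamic web sources.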
Data Integrator
Example of web data integration
• Google as a first part of integration
• Second part – Functional components (Baumgartner et al., 2009)
  • Web interaction component: lonelyplanet, wikitravel, virtualtourist, tripadvisor and official tourist website
  • Wrapper generator: OpenKapow RoboMaker
  • Data transformer: DOM parser for RSS and XML formats
Intelligent Ranker and Scheduler
• Third step of integration.
• Applies different profiles to the data, like Facebook and Dopplr.
• Arranges the data in a ranked form depending on matches from user interests and activities.
• Brute force cumulative ranking algorithm
  3 – Explicitly mentioned
  2 – Description match
  1 – Suggested by other users
• Merges data from public transport
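The cumulative ranking can be sketched as follows. The weights 3, 2 and 1 come from the slide. How each condition is detected is an assumption made for illustration: "explicitly mentioned" is read as the place name appearing in the user's profile, "description match" as an interest keyword occurring in the place description, and "suggested by other users" as a boolean flag. The names `Place`, `score_place` and `rank_places` are hypothetical.

```python
from dataclasses import dataclass

# Weights taken from the slide.
WEIGHT_EXPLICIT = 3      # place explicitly mentioned in the user profile
WEIGHT_DESCRIPTION = 2   # an interest keyword matches the description
WEIGHT_SUGGESTED = 1     # place suggested by other users


@dataclass
class Place:
    name: str
    description: str = ""
    suggested_by_others: bool = False


def score_place(place, mentioned_places, interests):
    """Add up the weights of every condition the place satisfies."""
    score = 0
    if place.name.lower() in mentioned_places:
        score += WEIGHT_EXPLICIT
    description = place.description.lower()
    for interest in interests:
        # Assumption: each matching interest contributes separately.
        if interest.lower() in description:
            score += WEIGHT_DESCRIPTION
    if place.suggested_by_others:
        score += WEIGHT_SUGGESTED
    return score


def rank_places(places, mentioned_places, interests):
    """Brute force: score every place, then sort by descending score."""
    return sorted(
        places,
        key=lambda p: score_place(p, mentioned_places, interests),
        reverse=True,
    )


places = [
    Place("Galeries Lafayette", "department store"),
    Place("Versailles", "royal palace and gardens", suggested_by_others=True),
    Place("Louvre", "art museum in a former palace"),
]
ranked = rank_places(places, mentioned_places={"louvre"}, interests=["palace"])
print([p.name for p in ranked])
```

Because every place is scored against every interest, the cost grows with the product of both list sizes, which is acceptable for a prototype but is one motivation for improving the scoring algorithm later.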
Facts DB
• Location information from the MRDB and map LOD with place
• Activity-Location pairs
• Facts DB structure
Facts DB Structure
• High Level Structure
• Lower level structure – Database Object maps to more locations
• Limit to two levels
• Inverse Page Lookup
Database Object
Activity | LocationFrom | LocationTo | Name | Rank | User Feedback
Shopping | 47°22′40″N, 8°32′25″E | 47.3671°N, 8.5409°E | Bahnhofstrasse | 3 | Shop for watches, jewelry, clothes
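One way to picture the Database Object is as a record keyed by activity. The sketch below is an illustration under assumptions, not the Facts DB design: the field names follow the column headers on the slide, coordinates are stored as decimal degrees, and a higher rank is assumed to mean a better match (the slides do not say). `Fact`, `FactsDB` and `dms_to_decimal` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


@dataclass(frozen=True)
class Fact:
    activity: str
    location_from: Tuple[float, float]  # (latitude, longitude)
    location_to: Tuple[float, float]
    name: str
    rank: int
    user_feedback: str


class FactsDB:
    """In-memory stand-in for the Facts DB: activity -> list of facts."""

    def __init__(self):
        self._by_activity = defaultdict(list)

    def add(self, fact):
        self._by_activity[fact.activity.lower()].append(fact)

    def lookup(self, activity):
        # Assumption: higher rank first.
        facts = self._by_activity.get(activity.lower(), [])
        return sorted(facts, key=lambda f: f.rank, reverse=True)


db = FactsDB()
db.add(
    Fact(
        activity="Shopping",
        location_from=(dms_to_decimal(47, 22, 40), dms_to_decimal(8, 32, 25)),
        location_to=(47.3671, 8.5409),
        name="Bahnhofstrasse",
        rank=3,
        user_feedback="Shop for watches, jewelry, clothes",
    )
)
results = db.lookup("shopping")
print(results[0].name, results[0].location_from)
```

Keying on the activity keeps the activity-location pairs directly retrievable, which matches the two-level structure described on the slide.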
Data Quality
Evaluation through completeness and correctness
Example: Shopping stores in Bahnhofstrasse
• Extract lat-lng
• Shop name, website, details and contact details
• Shop opening and closing times
• Evaluate against manually collected data for completeness and correctness.
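The evaluation against manually collected data can be sketched with two simple measures. The definitions used here are assumptions chosen for illustration, not the formal definitions of Thakkar et al. (2007): completeness is the share of reference shops that were extracted at all, and correctness is the share of compared field values that agree with the reference. All shop names and values are made up.

```python
def completeness(extracted, reference):
    """Share of reference records that appear in the extracted data."""
    if not reference:
        return 1.0
    found = sum(1 for key in reference if key in extracted)
    return found / len(reference)


def correctness(extracted, reference, fields):
    """Share of field values, over records present in both sets,
    that are identical to the manually collected reference."""
    compared = 0
    correct = 0
    for key, ref_record in reference.items():
        if key not in extracted:
            continue
        for field_name in fields:
            compared += 1
            if extracted[key].get(field_name) == ref_record.get(field_name):
                correct += 1
    return correct / compared if compared else 0.0


# Hypothetical manually collected reference data.
reference = {
    "Shop A": {"website": "http://a.example", "opens": "09:00"},
    "Shop B": {"website": "http://b.example", "opens": "09:30"},
    "Shop C": {"website": "http://c.example", "opens": "10:00"},
}
# Hypothetical integration result: Shop C is missing, Shop B has a wrong time.
extracted = {
    "Shop A": {"website": "http://a.example", "opens": "09:00"},
    "Shop B": {"website": "http://b.example", "opens": "10:00"},
}

comp = completeness(extracted, reference)
corr = correctness(extracted, reference, ["website", "opens"])
print(round(comp, 3), corr)
```

Keeping the two measures separate shows whether a low-quality result stems from missing records or from wrong values in the records that were found.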
8. Next Steps
Next Steps
• Formalizing parameters and methods for integration (Link)
• Improve scoring algorithm for places
• Structure of Facts DB for efficient storage and retrieval
• Develop quality control methods, such as considering user feedback and credibility
Open Questions
• At what point is the data integrated? When is it complete?
• Qualitative vs. Quantitative
• Error recovery and correction mechanism in FactsDB?
• Mapping of place’s score to LOD?
Milestones
Year 1: Fall 2008, Spring 2009
Year 2: Fall 2009, Spring 2010
Year 3: Fall 2010, Spring 2011
• Literature review
• Develop overall framework
• Start to develop research questions and focus area.
• Literature review
• Develop research questions
• Define use cases
• Make a prototype of one use case - TourGuide
• Develop concept and methods for RQ1
• Implement parts of TourGuide
• Develop user tests for input to RQ2 and RQ3
• Continue work on RQ1. Formalise parameters.
• Analyse input from user tests and combine with other parameters for RQ2
• Continue work with RQ2 and start RQ3
• Formalise parameters for data quality control
• Perform evaluation of data, define and implement quality assessing/controlling parameters for FBDI
• Finalize publications
• Thesis write-up
Summary: Expected contributions
• Working system and framework for ad hoc data integration, that will work for certain flavours
• Methodology of Flavour based data integration (RQ1)
  • Structure
  • Algorithm for efficient data source selection depending on “flavour”
  • Algorithm for scoring different places depending on a number of parameters.
• Concept and structure of FactsDB that will work with data from the MRDB for enrichment (RQ2)
• Improved and adapted parameters and a mechanism for checking the quality of the integrated data and some test cases (RQ3)
Thank you!
Ramya Venkateswaran ([email protected])
Demo and slides at http://www.geo.uzh.ch/~ramya/kolloquium/