The Database and Info. Systems Lab., University of Illinois at Urbana-Champaign
Light-weight Domain-based Form Assistant:
Querying Web Databases On the Fly
Zhen Zhang, Bin He, and Kevin C. Chang
MetaQuerier 2
The Context: MetaQuerier @ UIUC
Exploring and integrating the deep Web
Explorer (FIND sources; builds a db of dbs):
• source discovery
• source modeling
• source indexing
Integrator (QUERY sources; provides a unified query interface):
• source selection
• schema integration
• query mediation
Example sources: Amazon.com, Cars.com, 411localte.com, Apartments.com
MetaQuerier 3
The Need: Querying alternative sources in the same domain
• Sources are proliferating in the same domain
• A 2004 survey found that 10% of Web sites are “deep”, totaling 450,000 DBs on the Web
• Each query can often find many useful DBs
• Different queries need different sources
How to query across dynamic sources?
MetaQuerier 4
The Problem: Query translation on-the-fly
Challenge: No pre-configured source-specific translation knowledge
Requirements:
• Within a domain: Source generality
• Across domains: Domain portability
MetaQuerier 5
Dynamic query translation – Essential tasks
Reconcile three levels of query heterogeneity:
• Attribute level: schema matching
• Predicate level: predicate mapping
• Query level: query rewriting
MetaQuerier 6
Demo: Form Assistant to help navigate the deep Web.
MetaQuerier 7
Translation objective: Closest among the valid
Input: Source query Qs on source form S; target query form T
Output: Union query Qt* (a union, ∪, of queries on T)
Filter: σ title contain “red storm” and price < 35 and age > 12
[Figure: Query Translation from the source form, filled in with “Tom Clancy”, to the target form]
Two goals: Syntactically valid, semantically close
MetaQuerier 8
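What is valid? Each source has a query model
Vocabulary: predicate templates { P1, P2, P3, P4, P5 }
Syntax: valid combinations of predicate templates { F1, F2, F3, F4, F5, F6, F7, F8 }
[Table: templates P1 to P5 (rows) against forms F1 to F8 (columns); P1, P2, P3, and P4 are each checked in two forms and P5 in four; column positions not recoverable]
[Figure: example form with templates P1 to P5, filled in with “Tom Clancy”, illustrating F5 and F6]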
MetaQuerier 9
What is close? Define semantic closeness.
Minimal subsuming Cmin:
• No false negatives: miss no correct answer
• Minimal false positives: contain the fewest extra answers
• Clear semantics: database content independent
• Modular translation: reduce translation complexity
Example (price ranges):
s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t3: [45, 65]
t1 ∨ t2: [0, 45]
t2 ∨ t3: [25, 65]
Cmin?
MetaQuerier 10
What is close? Define semantic closeness.
Minimal subsuming Cmin:
• No false negatives: miss no correct answer
• Minimal false positives: contain the fewest extra answers
• Clear semantics: DB content independent
• Modular translation: reduce translation complexity
Example (price ranges):
s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t3: [45, 65]
t1 ∨ t2: [0, 45]
t1 ∨ t2 ∨ t3: [0, 65]
Cmin: t1 ∨ t2, the smaller of the two unions that cover s
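The minimal-subsuming criterion can be illustrated on the slide's price ranges. The following is a minimal sketch, not the authors' implementation: it assumes closed intervals written as (low, high) tuples, reads the garbled range labels as s = [0, 35], t1 = [0, 25], t2 = [25, 45], t3 = [45, 65], measures "extra answers" by interval width, and only considers unions that form one contiguous interval. The helper names `union_span`, `subsumes`, and `c_min` are hypothetical.

```python
from itertools import combinations

# Ranges read from the slide; closed intervals are an assumption of this sketch.
s = (0, 35)
targets = {"t1": (0, 25), "t2": (25, 45), "t3": (45, 65)}

def union_span(names):
    """Merge the named ranges; return one interval, or None if they leave a gap."""
    ivs = sorted(targets[n] for n in names)
    low, high = ivs[0]
    for lo, hi in ivs[1:]:
        if lo > high:              # gap between consecutive ranges
            return None
        high = max(high, hi)
    return (low, high)

def subsumes(span, query):
    """True if span covers every value of query (no correct answer is missed)."""
    return span is not None and span[0] <= query[0] and query[1] <= span[1]

def c_min(query):
    """Among contiguous unions that subsume the query, pick the least extra width."""
    best = None
    for k in range(1, len(targets) + 1):
        for names in combinations(sorted(targets), k):
            span = union_span(names)
            if subsumes(span, query):
                extra = (span[1] - span[0]) - (query[1] - query[0])
                if best is None or extra < best[0]:
                    best = (extra, names, span)
    return best

print(c_min(s))  # (10, ('t1', 't2'), (0, 45))
```

Measuring closeness by interval width rather than by counting rows in an actual database keeps the choice independent of database content, in the spirit of the "clear semantics" point above.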
MetaQuerier 11
What mechanism?
[Figure: Source Query → Enumerate valid → Search for closest → Target Query]
[Figure: Source Query → Query Translation (Attribute Match, Predicate Mapping, Query Rewriter) → Target Query]
Cmin?
MetaQuerier 12
System architecture: Modular & lightweight
• Modularized mechanism
• Lightweight domain knowledge
Input: Source query Qs, target query form QI (each form passes through a Form Extractor)
• Attribute Matcher: Syntax-based schema matching
• Predicate Mapper: Type-based search-driven mapping
• Query Rewriter: Constraint-based query rewriting
Output: Target query Qt*
Domain knowledge: Domain-specific thesaurus, domain-specific type handlers
References on the slide: [RahmBernstein-VLDBJ01], [Halevy-VLDBJ01], [ZhangHC-SIGMOD04], [HeChang-SIGMOD03], [WuYDM-SIGMOD04]; one component is marked “?”
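The modular architecture on this slide can be sketched as three small functions chained together. This is only an illustration of the module boundaries under loud assumptions: the thesaurus entries, the query representation as (attribute, operator, value) triples, and every function name are hypothetical, and the predicate mapper here is a placeholder that does not search for Cmin.

```python
# Hypothetical toy data; none of these names come from the slides.
THESAURUS = {"writer": "author", "cost": "price"}   # domain-specific thesaurus

def match_attribute(attr, target_templates):
    """Attribute Matcher: map a source attribute onto a target attribute by name."""
    name = THESAURUS.get(attr, attr)
    return name if name in target_templates else None

def map_predicate(attr, op, value, target_templates):
    """Predicate Mapper (placeholder): keep the predicate only if the target
    template for this attribute supports the same operator."""
    return (attr, op, value) if op in target_templates[attr] else None

def rewrite_query(predicates):
    """Query Rewriter (placeholder): conjoin whatever predicates were mapped."""
    return [p for p in predicates if p is not None]

def translate(source_query, target_templates):
    mapped = []
    for attr, op, value in source_query:
        target_attr = match_attribute(attr, target_templates)
        if target_attr is not None:
            mapped.append(map_predicate(target_attr, op, value, target_templates))
    return rewrite_query(mapped)

source_query = [("writer", "contain", "Tom Clancy"), ("cost", "<", 35)]
target_templates = {"author": {"contain"}, "price": {"<", "between"}}
print(translate(source_query, target_templates))
# [('author', 'contain', 'Tom Clancy'), ('price', '<', 35)]
```

Because each stage only sees the output of the previous one, any stage could be swapped for a stronger implementation without touching the others.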
MetaQuerier 13
The core challenge: Predicate mapping
Tasks:
• Choose operator
• Fill in values
Objective: Minimal subsuming
[Figure: Predicate Mapping from the input predicate to the output, a union (∪) of target predicates t*]
MetaQuerier 14
Is source-specific translation applicable?
[Figure: matrix, entries not recoverable]
Example rules:
adult = $t → passenger = $t
… …
price < $t →
  if $t < 25: [price:between:0,25]
  elseif $t < 45: ……
  …
MetaQuerier 15
Enable source-generic predicate mapping?
What is the scope of translation?
What is the mechanism of translation?
MetaQuerier 16
The right scope?
Survey of 150 sources for the Correspondence Matrix.
Correspondences occur within localities!
MetaQuerier 17
The right scope? Correspondence locality
Type-based translation:
• Correspondences occur within localities
• Translation by type handler
[Figure: Predicate Mapper. Inputs: source predicate s and target template P. A Type Recognizer dispatches to a Domain Specific Handler, Text Handler, Numeric Handler, or Datetime Handler. Output: target predicate t*]
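The type-based dispatch in this figure can be sketched as a recognizer plus a handler table. This is an assumption-laden illustration: the slides do not say how the Type Recognizer works, so the Python type checks below are a crude stand-in, the handlers only tag their input, and `DOMAIN_HANDLERS` is a hypothetical hook for domain-specific handlers keyed by attribute name.

```python
from datetime import date

# Placeholder handlers: each just tags the predicate with the handler type.
def text_handler(source_pred, template):
    return ("text", source_pred, template)

def numeric_handler(source_pred, template):
    return ("numeric", source_pred, template)

def datetime_handler(source_pred, template):
    return ("datetime", source_pred, template)

# Hypothetical hook: domain-specific handlers keyed by attribute name (empty here).
DOMAIN_HANDLERS = {}

TYPE_HANDLERS = {
    "text": text_handler,
    "numeric": numeric_handler,
    "datetime": datetime_handler,
}

def recognize_type(value):
    """Type Recognizer stand-in: classify a predicate value by its Python type."""
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, date):
        return "datetime"
    return "text"

def map_predicate(source_pred, template):
    """Dispatch to a domain-specific handler if one exists, else by value type."""
    attr, _op, value = source_pred
    handler = DOMAIN_HANDLERS.get(attr) or TYPE_HANDLERS[recognize_type(value)]
    return handler(source_pred, template)

print(map_predicate(("price", "<", 35), "between")[0])             # numeric
print(map_predicate(("title", "contain", "red storm"), "any")[0])  # text
```

Supporting a new type then means registering one handler, rather than writing rules against every existing source.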
MetaQuerier 18
The right mechanism: Is a pairwise-rule based mechanism suitable?
[Figure: existing templates 1 to n, plus a new template n+1]
Adding one template requires adding 2n rules, and requires knowledge of the old templates.
Rule:
attr < $t
  if $t < 25: [attr:between:0,25]
  elseif $t < 45: ……
  …
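The 2n figure follows from a short count, assuming one directed rule is written for every ordered pair of distinct templates (an assumption that matches the slide's number; the symbol R is introduced here only for illustration):

```latex
\begin{align*}
R(n)          &= n(n-1)               && \text{rules among } n \text{ templates} \\
R(n+1)        &= (n+1)\,n             && \text{rules among } n+1 \text{ templates} \\
R(n+1) - R(n) &= n\,[(n+1) - (n-1)] = 2n && \text{new rules for the added template}
\end{align*}
```

Half of the new rules translate from the new template into each old one and half translate in the opposite direction, which is why the rule author needs to know all of the old templates.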
MetaQuerier 19
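More extendable mechanism? Search-driven.
• Values of the type (virtual database), from -infinity to +infinity
• Templates of the same type
• Evaluate over the “database” (evaluator)
• Evaluation results
• Search for closest
Example:
s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t1 ∨ t2: [0, 45]
[Figure: source predicate s and target predicates t each pass through an evaluator; the evaluation results feed the search for the closest mapping]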
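The evaluator idea can be sketched by running every predicate over a small sample of the value domain and comparing the result sets. This is a minimal sketch under assumptions: the virtual database is taken to be a grid of prices from 0 to 100 in steps of 5, the template constructors `less_than` and `between` are hypothetical, and the ranges are read as s: price < 35, t1: between 0 and 25, t2: between 25 and 45.

```python
# Virtual database: sample values of the numeric type (an assumed grid of prices).
VIRTUAL_DB = list(range(0, 101, 5))

# Each template only needs an evaluator: does the predicate hold for a value?
def less_than(t):
    return lambda v: v < t

def between(lo, hi):
    return lambda v: lo <= v <= hi

def evaluate(pred):
    """Evaluate a predicate over the virtual database; return the matching values."""
    return {v for v in VIRTUAL_DB if pred(v)}

s = evaluate(less_than(35))
t1 = evaluate(between(0, 25))
t2 = evaluate(between(25, 45))

union = t1 | t2
print(s <= union)         # True: the union misses no answer of s
print(sorted(union - s))  # [35, 40, 45]: extra answers the union would return
```

Adding a new template of the same type only requires one new evaluator function, with no rules written against the existing templates.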
MetaQuerier 20
Greedy search to construct Cmin mapping
• Find the mapping iteratively
• In each iteration, greedily choose the target predicate that covers the most of what is still uncovered
Example:
s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t3: [45, 65]
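The greedy construction can be sketched as a set-cover loop over evaluation results. This is an illustrative sketch, not the authors' code: the value grid, the `between` helper, and `greedy_mapping` are assumptions, and ranges are read as s: price < 35, t1: 0 to 25, t2: 25 to 45, t3: 45 to 65. Greedy covering is a heuristic, so in general it is not guaranteed to return the smallest possible union.

```python
VALUES = range(0, 101, 5)   # assumed sample of the numeric type

def between(lo, hi):
    """Result set of a hypothetical 'between' template over the sample values."""
    return {v for v in VALUES if lo <= v <= hi}

source = {v for v in VALUES if v < 35}            # s: price < 35
candidates = {"t1": between(0, 25), "t2": between(25, 45), "t3": between(45, 65)}

def greedy_mapping(source, candidates):
    """Repeatedly pick the candidate covering the most still-uncovered values."""
    uncovered, chosen = set(source), []
    while uncovered:
        name = max(candidates, key=lambda n: len(candidates[n] & uncovered))
        gain = candidates[name] & uncovered
        if not gain:
            raise ValueError("source predicate cannot be covered")
        chosen.append(name)
        uncovered -= gain
    return chosen

print(greedy_mapping(source, candidates))  # ['t1', 't2']
```

In the first iteration t1 covers six of the seven sampled source values; in the second, t2 covers the remaining value 30, and t3 is never chosen because it covers nothing in s.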
MetaQuerier 21
Experiments
• Translating 120 queries in total
• Between randomly paired sources from 8 domains
• With domain thesaurus but no type handler
• Accuracy as ratio of correct conditions per query
Datasets: Basic: 3 domains; New: 5 domains
[Figure: Average accuracy]
[Figure: Error distribution over Matching, Extraction, and Mapping, with shares of 18%, 40%, and 42%; the pairing of labels and values is not recoverable]
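The accuracy measure can be made concrete with a small sketch. It assumes one reading of "ratio of correct conditions per query": the fraction of expected target conditions that the translation produced, averaged over queries. The sample conditions and the function names are hypothetical and are not experimental data from the slides.

```python
def query_accuracy(translated, expected):
    """Fraction of the expected conditions that appear in the translated query."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(translated)) / len(expected)

def average_accuracy(pairs):
    """Mean per-query accuracy over (translated, expected) pairs."""
    scores = [query_accuracy(t, e) for t, e in pairs]
    return sum(scores) / len(scores)

# Hypothetical toy queries: the first is fully correct, the second misses one condition.
pairs = [
    ({("author", "contain", "Tom Clancy"), ("price", "<", 45)},
     {("author", "contain", "Tom Clancy"), ("price", "<", 45)}),
    ({("author", "contain", "Tom Clancy")},
     {("author", "contain", "Tom Clancy"), ("price", "<", 45)}),
]
print(average_accuracy(pairs))  # 0.75
```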
MetaQuerier 22
Conclusion
System: Form assistant for querying Web databases
Problem: Dynamic query translation
Contributions:
• Framework: Light-weight domain-based architecture
• Techniques: Type-based search-driven predicate mapping
MetaQuerier 24
Experiment: Accuracy distribution
[Figures: accuracy distribution for the Basic dataset; accuracy distribution for the New dataset]
MetaQuerier 25
Text handler: Search space
• Conceptually, the union of all target predicates
• Practically, a closed-world assumption
MetaQuerier 26
Text handler: Closeness estimation
• Ideally, logic reasoning
• Practically, evaluation-by-materialization: materialize the query against a “complete” database
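Evaluation-by-materialization for text predicates can be sketched by enumerating a tiny "complete" database under a closed-world assumption. Everything concrete here is an assumption for illustration: the vocabulary (the two query words plus a filler word "x"), the three-word limit on titles, and the hypothetical operators `contains_phrase`, `all_words`, and `any_word`.

```python
from itertools import product

# Closed world (assumed): the words of the source predicate plus one filler word.
WORDS = ["red", "storm", "x"]
# "Complete" database: every word sequence of length 1 to 3 over that vocabulary.
DATABASE = [" ".join(p) for n in (1, 2, 3) for p in product(WORDS, repeat=n)]

def contains_phrase(phrase):
    # Pad with spaces so the phrase only matches on word boundaries.
    return lambda title: f" {phrase} " in f" {title} "

def all_words(*words):
    return lambda title: all(w in title.split() for w in words)

def any_word(*words):
    return lambda title: any(w in title.split() for w in words)

def materialize(pred):
    """Run a predicate against the complete database and keep the matches."""
    return {t for t in DATABASE if pred(t)}

source = materialize(contains_phrase("red storm"))
targets = {
    "all_words": materialize(all_words("red", "storm")),
    "any_word": materialize(any_word("red", "storm")),
}

# Keep targets that subsume the source, then prefer the fewest extra answers.
valid = {name: rows for name, rows in targets.items() if source <= rows}
closest = min(valid, key=lambda name: len(valid[name] - source))
print(closest)  # all_words
```

Both targets subsume the phrase query, but `all_words` returns fewer extra titles (for example it still admits "storm red" but not "red" alone), so it is the closer mapping in this toy setting.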