
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign

Light-weight Domain-based Form Assistant:

Querying Web Databases On the Fly

Zhen Zhang, Bin He, and Kevin C. Chang

MetaQuerier 2

The Context: MetaQuerier @ UIUC
Exploring and integrating the deep Web

Explorer: source discovery, source modeling, source indexing

Integrator: source selection, schema integration, query mediation

FIND sources: db of dbs

QUERY sources: unified query interface

Example sources: Amazon.com, Cars.com, 411localte.com, Apartments.com

MetaQuerier 3

The Need: Querying alternative sources in the same domain

Sources are proliferating in the same domain.
A 2004 survey found that 10% of Web sites are “deep”, totaling 450,000 DBs on the Web.

Each query can often find many useful DBs.
Different queries need different sources.

How to query across dynamic sources?

MetaQuerier 4

The Problem: Query translation on-the-fly

Challenge: No pre-configured source-specific translation knowledge

Requirements:
Within a domain: Source generality
Across domains: Domain portability

MetaQuerier 5

Dynamic query translation: Essential tasks

Reconcile three levels of query heterogeneity:
Attribute level: schema matching
Predicate level: predicate mapping
Query level: query rewriting

MetaQuerier 6

Demo: Form Assistant to help navigate the deep Web.

MetaQuerier 7

Translation objective: Closest among the valid

Input: source query Qs on source form S (example value: Tom Clancy), and target query form T

Output: union query Qt* on the target form, plus a filter: σ title contains “red storm” and price < 35 and age > 12

Two goals: syntactically valid, semantically close

MetaQuerier 8

What is valid? Each source has a query model

Vocabulary: predicate templates { P1, P2, P3, P4, P5 }

Syntax: valid combinations of predicate templates { F1, F2, F3, F4, F5, F6, F7, F8 }

[Figure: a query form annotated with templates P1 to P5, and a matrix of check marks relating templates P1 to P5 to forms F1 to F8; forms F5 and F6 are illustrated with the example value Tom Clancy]
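A minimal sketch of how such a query model could be represented and checked. The assignment of templates to forms below is hypothetical (the slide's actual matrix is not reproduced here), and `is_valid` is an illustrative name, not part of the system.

```python
# Vocabulary: the predicate templates a source form offers.
VOCABULARY = {"P1", "P2", "P3", "P4", "P5"}

# Syntax: which template combinations the form accepts.
# HYPOTHETICAL assignment, for illustration only.
SYNTAX = {
    "F1": frozenset({"P1"}),
    "F2": frozenset({"P1", "P5"}),
    "F3": frozenset({"P2", "P5"}),
}


def is_valid(templates):
    """Return the name of the form matching this template combination, else None."""
    chosen = frozenset(templates)
    if not chosen <= VOCABULARY:
        return None  # uses a template outside the vocabulary
    for name, combo in sorted(SYNTAX.items()):
        if chosen == combo:
            return name
    return None


print(is_valid(["P1", "P5"]))  # F2
print(is_valid(["P3"]))        # None
```

A translated query is syntactically valid only if every one of its conjunctions maps to some accepted combination.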

MetaQuerier 9

What is close? Define semantic closeness.

Minimal subsuming Cmin:
No false positive: Miss no answer
Minimizing false negative: Fewest extra answers
Clear semantics: DB content independent
Modular translation: Reduce translation complexity

[Figure: ranges on a number line]
s: 0 to 35
t1: 0 to 25
t2: 25 to 45
t3: 45 to 65
t1 v t2: 0 to 45
t1 v t2 v t3: 0 to 65
Which is Cmin?
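One way to state the minimal-subsuming objective formally, identifying each predicate with the set of values it accepts. The notation is illustrative rather than taken from the slides, and it assumes the source predicate s covers the range 0 to 35, as the price < 35 example suggests.

```latex
\[
t^{*} \in C_{\min}(s)
\iff
s \subseteq t^{*}
\ \text{ and there is no valid } t' \text{ such that } s \subseteq t' \subsetneq t^{*}.
\]

\[
s = [0, 35]: \qquad
t_1 = [0, 25] \not\supseteq s, \qquad
t_1 \lor t_2 = [0, 45] \supseteq s, \qquad
t_1 \lor t_2 \subsetneq t_1 \lor t_2 \lor t_3 = [0, 65].
\]
```

Under these assumptions t1 v t2 subsumes s, and t1 v t2 v t3 also subsumes s but adds more extra answers, so t1 v t2 is the minimal subsuming choice.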

MetaQuerier 10

What mechanism?

Conceptually: Source Query -> Enumerate valid -> Search for closest -> Target Query

Query Translation: Source Query -> Attribute Match -> Predicate Mapping -> Query Rewriter -> Target Query

Cmin?

MetaQuerier 11

System architecture: Modular & lightweight

Input: source query Qs and target query form QI, each passing through a Form Extractor

Modularized mechanism:
Attribute Matcher: Syntax-based schema matching
Predicate Mapper: Type-based search-driven mapping
Query Rewriter: Constraint-based query rewriting

Lightweight domain knowledge:
Domain-specific thesaurus
Domain-specific type handlers

Output: target query Qt*

Citations shown on the slide: [RahmBernstein-VLDBJ01], [Halevy-VLDBJ01], ?, [ZhangHC-SIGMOD04], [HeChang-SIGMOD03], [WuYDM-SIGMOD04]

MetaQuerier 12

The core challenge: Predicate mapping

Tasks: Choose operator; Fill in values

Objective: Minimal subsuming

Input: source predicate s
Output: union of target predicates t*

MetaQuerier 13

Is source-specific translation applicable?

Example rules:
adult = $t passenger = $t ……
price<$t if $t<25: [price:between:0,25] elseif $t<45: ……

MetaQuerier 14

Enable source-generic predicate mapping?

What is the scope of translation?

What is the mechanism of translation?

MetaQuerier 15

The right scope? Survey 150 sources for the Correspondence Matrix.

Correspondences occur within localities!

MetaQuerier 16

The right scope? Correspondence locality, hence type-based translation

Input: source predicate s, target template P

Predicate Mapper: a Type Recognizer dispatches to one of the handlers
Domain Specific Handler
Text Handler
Numeric Handler
Datetime Handler

Output: target predicate t*

Correspondences occur within localities, so translation is done by type handlers.
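The type-based dispatch can be sketched as below. This is an assumption-laden illustration: the `(attribute, operator, value)` tuple layout, the function names, and the rule that domain-specific attributes are checked first are all hypothetical, not the system's actual interface.

```python
from datetime import date


def recognize_type(value):
    """Rough type recognizer: classify a predicate value by its Python type."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numeric"
    if isinstance(value, date):
        return "datetime"
    return "text"


def select_handler(source_pred, domain_attrs=()):
    """Pick the handler for a source predicate (attribute, operator, value).

    Attributes listed in domain_attrs go to the domain-specific handler
    (an assumed convention); everything else is dispatched by value type.
    """
    attr, _op, value = source_pred
    if attr in domain_attrs:
        return "domain"
    return recognize_type(value)


print(select_handler(("price", "<", 35)))                  # numeric
print(select_handler(("title", "contains", "red storm")))  # text
print(select_handler(("depart", "=", date(2005, 6, 14))))  # datetime
print(select_handler(("airport", "=", "ORD"), {"airport"}))  # domain
```

Because the handlers depend only on the data type, the same handler can serve any source in any domain, which is what keeps the domain knowledge lightweight.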

MetaQuerier 17

The right mechanism: Is a pairwise-rule-based mechanism suitable?

[Figure: templates 1 to n, plus a new template n+1]

Adding one template requires adding 2n rules, and requires knowledge of the old templates.

Rule:
attr<$t if $t<25: [attr:between:0,25] elseif $t<45: ……
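The 2n figure follows from counting directed rules, assuming one rule per ordered pair of distinct templates:

```latex
\[
R(n) = n(n-1),
\qquad
R(n+1) - R(n) = (n+1)n - n(n-1) = 2n .
\]
```

Each existing template needs one rule to the new template and one rule from it, so the total rule set grows quadratically with the number of templates.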

MetaQuerier 18

More extendable mechanism? Search-driven.

Values of the type form a virtual database (from -infinite to +infinite).
Templates of the same type are evaluated over this “database” by an evaluator.
The evaluation results are searched for the closest mapping.

[Figure: ranges on a number line]
s: 0 to 35
t1: 0 to 25
t2: 25 to 45
t1 v t2: 0 to 45
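A minimal sketch of evaluating predicates over a virtual database. It assumes the integers 0 to 100 stand in for all values of the numeric type, closed ranges, and a source range of 0 to 35; the helper names are illustrative.

```python
# Hypothetical virtual database: a finite sample standing in for all values of the type.
VIRTUAL_DB = range(0, 101)


def between(lo, hi):
    """Predicate accepting values in the closed range [lo, hi]."""
    return lambda v: lo <= v <= hi


def union(*preds):
    """Disjunction of predicates."""
    return lambda v: any(p(v) for p in preds)


def materialize(pred):
    """Evaluate a predicate over the virtual database and return its answer set."""
    return {v for v in VIRTUAL_DB if pred(v)}


s = between(0, 35)
t1, t2 = between(0, 25), between(25, 45)

answers_s = materialize(s)
print(answers_s <= materialize(t1))                    # False: t1 misses 26..35
print(answers_s <= materialize(union(t1, t2)))         # True: t1 v t2 subsumes s
print(len(materialize(union(t1, t2)) - answers_s))     # 10 extra values (36..45)
```

Comparing answer sets this way needs no rule that relates one template to another; a new template only has to be evaluable.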

MetaQuerier 19

Greedy search to construct Cmin mapping

Find the mapping iteratively.
In each iteration, greedily choose the target predicate that covers the most of the still-uncovered range.

[Figure: ranges on a number line]
s: 0 to 35
t1: 0 to 25
t2: 25 to 45
t3: 45 to 65
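The greedy iteration can be sketched as a set-cover loop. This is an illustration under assumptions (closed integer ranges, a source range of 0 to 35, a sampled universe of 0 to 100), not the authors' implementation, and a greedy cover is not guaranteed to be minimal in general.

```python
def values_in(interval, universe):
    """Values of the universe that fall inside a closed interval (lo, hi)."""
    lo, hi = interval
    return {v for v in universe if lo <= v <= hi}


def greedy_cover(source, targets, universe):
    """Choose target predicates until the source range is fully covered.

    Returns the chosen names in order, or None if no full cover exists.
    """
    uncovered = values_in(source, universe)
    candidates = {name: values_in(iv, universe) for name, iv in targets.items()}
    chosen = []
    while uncovered:
        # Pick the candidate covering the most uncovered values (ties: name order).
        name = max(sorted(candidates),
                   key=lambda n: len(candidates[n] & uncovered),
                   default=None)
        if name is None or not candidates[name] & uncovered:
            return None
        chosen.append(name)
        uncovered -= candidates.pop(name)
    return chosen


targets = {"t1": (0, 25), "t2": (25, 45), "t3": (45, 65)}
print(greedy_cover((0, 35), targets, range(0, 101)))  # ['t1', 't2']
```

The first iteration picks t1 (it covers 0 to 25), the second picks t2 for the remaining 26 to 35, and t3 is never needed.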

MetaQuerier 20

Experiments

Translating 120 queries in total
Between randomly paired sources from 8 domains
With domain thesaurus but no type handler
Accuracy measured as the ratio of correct conditions per query

Datasets: Basic (3 domains), New (5 domains)

[Figure: average accuracy for the Basic and New datasets]
[Figure: error distribution; categories: Matching, Extraction, Mapping; slices: 18%, 40%, 42% (pairing not shown)]
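The accuracy metric can be read as follows. This sketch assumes the denominator is the number of expected conditions in the correct translation, which the slide does not spell out; the condition tuples are made-up examples.

```python
def query_accuracy(translated, expected):
    """Fraction of the expected conditions that appear in the translation."""
    if not expected:
        return 1.0
    correct = sum(1 for cond in expected if cond in translated)
    return correct / len(expected)


def average_accuracy(pairs):
    """Mean per-query accuracy over (translated, expected) pairs."""
    return sum(query_accuracy(t, e) for t, e in pairs) / len(pairs)


# Hypothetical example: one of two conditions translated correctly.
expected = [("price", "between", (0, 45)), ("title", "contains", "red storm")]
partial = [("price", "between", (0, 45))]

print(query_accuracy(partial, expected))                             # 0.5
print(average_accuracy([(partial, expected), (expected, expected)]))  # 0.75
```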

MetaQuerier 21

Conclusion

System: Form assistant for querying Web databases

Problem: Dynamic query translation

Contributions:
Framework: Light-weight domain-based architecture
Techniques: Type-based search-driven predicate mapping

Insight: Holistic integration holds promise!

MetaQuerier 22

Thank You!

For more information:

http://metaquerier.cs.uiuc.edu
kcchang@cs.uiuc.edu

MetaQuerier 23

What is close? Define semantic closeness.

Minimal subsuming Cmin:
No false positive: Miss no correct answer
Minimizing false negative: Contain fewest extra answers
Clear semantics: Database content independent
Modular translation: Reduce translation complexity

[Figure: ranges on a number line]
s: 0 to 35
t1: 0 to 25
t2: 25 to 45
t3: 45 to 65
t1 v t2: 0 to 45
t2 v t3: 25 to 65
Which is Cmin?

MetaQuerier 24

Experiment: Accuracy distribution

[Figures: accuracy distribution for the Basic dataset; accuracy distribution for the New dataset]

MetaQuerier 25

Text handler: Search space

Conceptually, the union of all target predicates
Practically, a closed-world assumption

MetaQuerier 26

Text handler: Closeness estimation

Ideally, logic reasoning
Practically, evaluation-by-materialization

Materialize the query against a “complete” database
