The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly Zhen Zhang, Bin He, and Kevin C. Chang


Page 1

The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign

Light-weight Domain-based Form Assistant:

Querying Web Databases On the Fly

Zhen Zhang, Bin He, and Kevin C. Chang

Page 2

The Context: MetaQuerier @ UIUC
Exploring and integrating the deep Web

Explorer
• source discovery
• source modeling
• source indexing

Integrator
• source selection
• schema integration
• query mediation

FIND sources

QUERY sources

db of dbs

unified query interface

Amazon.com, Cars.com, 411localte.com, Apartments.com

Page 3

The Need: Querying alternative sources in the same domain

Sources are proliferating in the same domain
• A 2004 survey found that 10% of Web sites are “deep”
• totaling 450,000 DBs on the Web

Each query can often find many useful DBs
Different queries need different sources

How to query across dynamic sources?

Page 4

The Problem: Query translation on-the-fly

Challenge: No pre-configured source-specific translation knowledge

Requirements:
• Within domain: Source generality
• Across domain: Domain portability

Page 5

Dynamic query translation: Essential tasks

Reconcile three levels of query heterogeneities
• Attribute level: schema matching
• Predicate level: predicate mapping
• Query level: query rewriting

Page 6

Demo: Form Assistant to help navigate the deep Web.

Page 7

Translation objective: Closest among the valid

Input: Source query Qs on source form S; target query form T

Output: Union query Qt*, followed by a filter: σ title contains “red storm” and price < 35 and age > 12

Two goals: syntactically valid, semantically close
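The role of the filter can be sketched as follows: the union query may return extra answers, so the source conditions are re-applied to the combined results. This is a minimal illustration, not the authors' implementation; the `Book` record, its field names (including the meaning of `age`), and `union_then_filter` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    # Hypothetical record type; field names are assumptions for illustration.
    title: str
    price: float
    age: int


def source_filter(book):
    """Re-apply the source conditions listed in the slide's filter."""
    return ("red storm" in book.title.lower()
            and book.price < 35
            and book.age > 12)


def union_then_filter(sub_query_results):
    """Union the results of the target sub-queries, then filter them."""
    seen = set()
    answers = []
    for results in sub_query_results:
        for book in results:
            if book not in seen and source_filter(book):
                seen.add(book)
                answers.append(book)
    return answers


# Two target sub-queries that overlap and also return extra answers.
results = [
    [Book("Red Storm Rising", 29.0, 14), Book("Red Storm Rising", 40.0, 14)],
    [Book("Red Storm Rising", 29.0, 14), Book("Storm Front", 10.0, 16)],
]
answers = union_then_filter(results)
```

Only the first record survives: the duplicate is dropped by the union, and the other two fail the price or title condition.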

Page 8

What is valid? Each source has a query model

• Vocabulary: predicate templates { P1, P2, P3, P4, P5 }
• Syntax: valid combinations of predicate templates { F1, F2, F3, F4, F5, F6, F7, F8 }

[Figure: a query form annotated with templates P1 to P5, and a matrix marking which of the forms F1 to F8 use each template; F5 and F6 are shown as examples]
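A query model of this kind can be sketched as a vocabulary plus a set of allowed template combinations. The slide's actual matrix of templates against forms is not recoverable here, so the combinations in `SYNTAX` below are invented purely for illustration, and `matching_form` is a hypothetical helper.

```python
# Vocabulary: the predicate templates the source understands.
VOCABULARY = frozenset({"P1", "P2", "P3", "P4", "P5"})

# Syntax: which combinations of templates form a valid query.
# These assignments are made up; the slide's real matrix is not reproduced.
SYNTAX = {
    "F1": frozenset({"P1"}),
    "F2": frozenset({"P2"}),
    "F5": frozenset({"P1", "P5"}),
    "F6": frozenset({"P2", "P5"}),
}


def matching_form(templates):
    """Return the form name a set of templates instantiates, or None."""
    used = frozenset(templates)
    if not used <= VOCABULARY:
        return None  # uses a template outside the vocabulary
    for name, combination in SYNTAX.items():
        if combination == used:
            return name
    return None  # templates are known, but the combination is not allowed
```

Under this sketch, a translated query is syntactically valid only if `matching_form` returns a form name.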

Page 9

What is close? Define semantic closeness.

Minimal subsuming Cmin
• No false positive: Miss no answer
• Minimizing false negative: Fewest extra answers
• Clear semantics: DB content independent
• Modular translation: Reduce translation complexity

s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t3: [45, 65]
t1 ∨ t2: [0, 45]
t1 ∨ t2 ∨ t3: [0, 65]
Cmin?
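The range example can be checked with a small exhaustive search. This is only a sketch under assumptions: ranges are treated as half-open integer intervals over a finite stand-in universe, "fewest extra answers" is measured by counting values, and `minimal_subsuming` is a hypothetical helper rather than the authors' algorithm.

```python
from itertools import combinations

# Finite stand-in for the value domain (an assumption of this sketch).
UNIVERSE = range(0, 100)


def materialize(lo, hi):
    """Values v with lo <= v < hi."""
    return frozenset(v for v in UNIVERSE if lo <= v < hi)


def minimal_subsuming(source, targets):
    """Try every union of target predicates; keep the subsuming one
    with the fewest extra values (smaller unions win ties)."""
    best = None
    for k in range(1, len(targets) + 1):
        for names in combinations(sorted(targets), k):
            covered = frozenset().union(*(targets[n] for n in names))
            if source <= covered:  # subsumes: misses no answer
                extra = len(covered - source)
                if best is None or extra < best[0]:
                    best = (extra, names)
    return best[1] if best else None


s = materialize(0, 35)
targets = {
    "t1": materialize(0, 25),
    "t2": materialize(25, 45),
    "t3": materialize(45, 65),
}
cmin = minimal_subsuming(s, targets)
```

Here `t1 ∨ t2` subsumes `s` with 10 extra values, while `t1 ∨ t2 ∨ t3` also subsumes it but with 30, so the former is chosen.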

Page 10

What mechanism?

[Figure: Source Query to Target Query, first as “Enumerate valid” then “Search for closest”; then as Query Translation through Attribute Match, Predicate Mapping, and Query Rewriter. Cmin?]

Page 11

System architecture: Modular & lightweight

Form Extractor

Source query Qs; target query form QI

Modularized mechanism
• Attribute Matcher: Syntax-based schema matching
• Predicate Mapper: Type-based search-driven mapping
• Query Rewriter: Constraint-based query rewriting

Target query Qt*

Lightweight domain knowledge
• Domain-specific thesaurus
• Domain-specific type handlers

[RahmBernstein-VLDBJ01] [Halevy-VLDBJ01] ? [ZhangHC-SIGMOD04] [HeChang-SIGMOD03] [WuYDM-SIGMOD04]

Page 12

The core challenge: Predicate mapping

Tasks
• Choose operator
• Fill in values

Objective
• Minimal subsuming

Output: Union of target predicates t*

Page 13

Is source-specific translation applicable?

Source-specific rules:

adult = $t → passenger = $t
… …

price < $t →
  if $t < 25: [price:between:0,25]
  elseif $t < 45: ……
  …

Page 14

Enable source-generic predicate mapping?

What is the scope of translation?

What is the mechanism of translation?

Page 15

The right scope? Survey 150 sources for the Correspondence Matrix.

Correspondences occur within localities!

Page 16

The right scope? Correspondence locality

Type-based translation: Predicate Mapper
• Input: source predicate s, target template P
• Type Recognizer
• Handlers: Domain Specific Handler, Text Handler, Numeric Handler, Datetime Handler
• Output: target predicate t*

Correspondences occur within localities
Translation by type-handler
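The dispatch from a recognized type to a handler might look like the sketch below. The recognition heuristics (try parsing as a number, then as an ISO date), the handler names, and the rule that a registered domain-specific handler takes precedence are all assumptions made for illustration; the slide does not specify them.

```python
from datetime import datetime


def _all_parse(values, parser):
    """True if every sample value is accepted by the parser."""
    try:
        for v in values:
            parser(v)
    except ValueError:
        return False
    return True


def recognize_type(sample_values):
    """Rough type recognizer over sample string values (hypothetical)."""
    if not sample_values:
        return "text"
    if _all_parse(sample_values, float):
        return "numeric"
    if _all_parse(sample_values, lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "datetime"
    return "text"


def choose_handler(attribute, sample_values, domain_handlers=None):
    """Pick a handler name: domain-specific first, else by recognized type."""
    domain_handlers = domain_handlers or {}
    if attribute in domain_handlers:
        return domain_handlers[attribute]
    return recognize_type(sample_values)
```

For example, `choose_handler("price", ["10", "25.5"])` selects the numeric handler, while an attribute registered in `domain_handlers` bypasses type recognition.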

Page 17

The right mechanism: Is a pairwise-rule-based mechanism suitable?

[Figure: templates 1 … n and a new template n+1]

Adding one template requires adding 2n rules, and requires knowledge of the old templates.
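The 2n figure follows from a simple count, assuming one translation rule per ordered pair of distinct templates (that reading of "pairwise rule" is an assumption). With n templates there are n(n − 1) rules; with n + 1 there are (n + 1)n, a difference of 2n. The function names below are hypothetical.

```python
def pairwise_rule_count(n):
    """Rules needed among n templates: one per ordered pair."""
    return n * (n - 1)


def rules_added_by_new_template(n):
    """Extra rules when template n+1 joins n existing templates."""
    return pairwise_rule_count(n + 1) - pairwise_rule_count(n)


# The new template needs a rule to and from each of the n old templates.
added_for_ten = rules_added_by_new_template(10)
```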

Rule:

attr < $t →
  if $t < 25: [attr:between:0,25]
  elseif $t < 45: ……
  …

Page 18

More extendable mechanism? Search-driven.

• Values of the type (virtual database), from -infinity to +infinity
• Templates of the same type
• Evaluator: evaluate s and t over the “database”
• Search for closest among the evaluation results

s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t1 ∨ t2: [0, 45]

Page 19

Greedy search to construct Cmin mapping
• Find the mapping iteratively
• In each iteration, greedily choose the candidate covering the most uncovered values

s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t3: [45, 65]
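The greedy step can be sketched as a set-cover loop over materialized values. This is an illustration under assumptions, not the authors' code: ranges are half-open integer intervals over a finite stand-in universe, ties are broken by name, and `greedy_cover` is a hypothetical helper. Greedy selection of this kind is a heuristic and is not guaranteed to return the minimal union in general.

```python
# Finite stand-in for the value domain (an assumption of this sketch).
UNIVERSE = range(0, 100)


def materialize(lo, hi):
    """Values v with lo <= v < hi."""
    return frozenset(v for v in UNIVERSE if lo <= v < hi)


def greedy_cover(source, targets):
    """Repeatedly pick the target predicate that covers the most
    still-uncovered source values; return None if coverage is impossible."""
    uncovered = set(source)
    chosen = []
    while uncovered:
        best = max(sorted(targets), key=lambda n: len(targets[n] & uncovered))
        gain = targets[best] & uncovered
        if not gain:
            return None  # no predicate covers the remaining values
        chosen.append(best)
        uncovered -= gain
    return chosen


s = materialize(0, 35)
targets = {
    "t1": materialize(0, 25),
    "t2": materialize(25, 45),
    "t3": materialize(45, 65),
}
mapping = greedy_cover(s, targets)
```

On the slide's example, the first iteration picks `t1` (25 uncovered values) and the second picks `t2` (the remaining 10), giving `t1 ∨ t2`.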

Page 20

Experiments
• Translating 120 queries in total
• Between randomly paired sources from 8 domains (Basic: 3 domains; New: 5 domains)
• With domain thesaurus but no type handler
• Accuracy as ratio of correct conditions per query

[Figures: average accuracy; error distribution across Extraction, Matching, and Mapping, with slices of 18%, 40%, and 42% (the slice-to-label pairing is not given in the text)]
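One plausible reading of the accuracy metric is sketched below: per query, the fraction of expected conditions that the translation reproduced, then averaged over queries. The choice of denominator and the representation of conditions as strings are assumptions; the slide states only "ratio of correct conditions per query". The sample conditions are made up and are not experimental data.

```python
def query_accuracy(translated, expected):
    """Fraction of the expected conditions found in the translation."""
    if not expected:
        return 1.0
    correct = sum(1 for cond in expected if cond in translated)
    return correct / len(expected)


def average_accuracy(pairs):
    """Mean per-query accuracy over (translated, expected) pairs."""
    return sum(query_accuracy(t, e) for t, e in pairs) / len(pairs)


# Illustrative conditions only.
pairs = [
    (["title:red storm", "price:0-45"], ["title:red storm", "price:0-45"]),
    (["title:red storm"], ["title:red storm", "price:0-45"]),
]
overall = average_accuracy(pairs)
```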

Page 21

Conclusion

System: Form assistant for querying Web databases

Problem: Dynamic query translation

Contributions:
• Framework: Light-weight domain-based architecture
• Techniques: Type-based search-driven predicate mapping

Insight: Holistic integration holds promise!

Page 22

Thank You!

For more information:

http://metaquerier.cs.uiuc.edu

Page 23

What is close? Define semantic closeness.

Minimal subsuming Cmin
• No false positive: Miss no correct answer
• Minimizing false negative: Contain fewest extra answers
• Clear semantics: Database content independent
• Modular translation: Reduce translation complexity

s: [0, 35]
t1: [0, 25]
t2: [25, 45]
t3: [45, 65]
t1 ∨ t2: [0, 45]
t2 ∨ t3: [25, 65]
Cmin?

Page 24

Experiment: Accuracy distribution

[Figures: accuracy distribution for the Basic dataset; accuracy distribution for the New dataset]

Page 25

Text handler: Search space
• Conceptually, union of all target predicates
• Practically, closed-world assumption

Page 26

Text handler: Closeness estimation
• Ideally, logic reasoning
• Practically, evaluation-by-materialization: materialize the query against a “complete” database
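Evaluation-by-materialization for text predicates can be sketched as follows: run the source predicate and each candidate target predicate over a small stand-in database, then keep the subsuming candidate with the fewest extra rows. The sample titles, the operator names (`contains_phrase`, `all_words`, `any_word`), and `closest_subsuming` are all hypothetical; the slide does not list the text operators or the database used.

```python
# Tiny stand-in for the "complete" database (illustrative titles only).
DATABASE = [
    "red storm rising",
    "storm of the red sea",
    "the red badge of courage",
    "perfect storm",
    "clear and present danger",
]


def contains_phrase(phrase):
    return lambda text: phrase in text


def all_words(words):
    return lambda text: all(w in text.split() for w in words)


def any_word(words):
    return lambda text: any(w in text.split() for w in words)


def materialize(pred):
    """Row indices of the database that satisfy the predicate."""
    return frozenset(i for i, text in enumerate(DATABASE) if pred(text))


def closest_subsuming(source, candidates):
    """Among candidates whose rows include all source rows,
    return the name of the one with the fewest extra rows."""
    src = materialize(source)
    best_name, best_extra = None, None
    for name, pred in candidates.items():
        rows = materialize(pred)
        if src <= rows:
            extra = len(rows - src)
            if best_extra is None or extra < best_extra:
                best_name, best_extra = name, extra
    return best_name


source = contains_phrase("red storm")
candidates = {
    "any_word": any_word(["red", "storm"]),
    "all_words": all_words(["red", "storm"]),
}
choice = closest_subsuming(source, candidates)
```

In this toy database, `all_words` returns one extra title and `any_word` returns three, so `all_words` is the closer subsuming mapping for the phrase query.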