MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang


Page 1: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web

Kevin C. Chang

Page 2: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 2

The previous Web: things are just on the surface

Page 3: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 3

The current Web: Getting “deeper” with non-trivial access

Page 4: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 4

MetaQuerier: Exploring and integrating deep Web

Explorer
  • source discovery
  • source modeling
  • source indexing

Integrator
  • source selection
  • schema integration
  • query mediation

FIND sources

QUERY sources

db of dbs

unified query interface

Amazon.com

Cars.com

411localte.com

Apartments.com

Page 5: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 5

Toward large scale integration

We are facing very different “large scale” scenarios!
  • Many sources on the Web, on the order of 10^5
  • Such integration must be dynamic and ad-hoc:
      • Dynamic discovery: sources are dynamically changing
      • On-the-fly integration: queries are ad-hoc and need different sources

Page 6: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 6

Our proposal: MetaQuerier for the deep Web
  • MetaExplorer: April 2002 -- IIS-0233199 CAREER: Dynamic Ad-hoc Information Integration across the Internet
  • MetaIntegrator: August 2003 -- IIS-0313260 ITR: Shallow Integration over the Deep Web: A Holistic Approach

This talk: midterm report – Lessons learned!

Page 7: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 7

Lesson #1:

Be careful with what you propose.

Because you may actually get it.

Page 8: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 8

The challenge boils down to: how to deal with “deep” semantics across a large scale?

“Semantics” is the key in integration!
  • How to understand a query interface? Where is the first condition? What’s its attribute?
  • How to match query interfaces? What does “author” on this source match on that?
  • How to translate queries? How to ask this query on that source?
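The three questions above can be made concrete with a small sketch. It assumes a query is a list of conditions, each a triple of attribute, operator, and value, and that a matching between two sources is a simple attribute-to-attribute dictionary. The names `Condition`, `ATTRIBUTE_MATCHING`, and `translate`, and the sample attributes, are hypothetical and are not taken from the MetaQuerier system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One query condition, as it might be read off a query interface."""
    attribute: str
    operator: str
    value: str


# Hypothetical matching between the attributes of two book sources.
ATTRIBUTE_MATCHING = {"author": "writer", "title": "book name"}


def translate(query, matching):
    """Rewrite each condition for the target source.

    Returns (translated, unsupported): conditions whose attribute has a
    match are renamed; the rest are reported back instead of being dropped
    silently.
    """
    translated, unsupported = [], []
    for cond in query:
        target_attr = matching.get(cond.attribute)
        if target_attr is None:
            unsupported.append(cond)
        else:
            translated.append(Condition(target_attr, cond.operator, cond.value))
    return translated, unsupported


query = [
    Condition("author", "contains", "tom clancy"),
    Condition("isbn", "equals", "0000000000"),
]
translated, unsupported = translate(query, ATTRIBUTE_MATCHING)
```

In this toy setting, understanding an interface means producing the `Condition` triples, matching means building the dictionary, and translation is the rewrite step. Real interfaces also differ in operators and value formats, which this sketch ignores.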

Page 9: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 9

Lesson #2:

Think not only about the right techniques but also about the right goals.

“As needs are so great, compromise is possible.” -- Carey and Haas

Page 10: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 10

Our goals defined

Domain-based integration
  • Sources in the same domain are simpler to integrate
  • Such sources are useful to integrate

Semi-transparent integration
  • Bring users to the right sources
  • Help users to interact as automatically as possible

Page 11: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 11

Lesson #3:

Send your scouts.

Survey the frontier before you go to the battle.

Page 12: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 12

Our survey found…

Challenge reassured:
  • 450,000 online databases
  • 1,258,000 query interfaces
  • 307,000 deep web sites
  • 3-7 times increase in 4 years

Insight revealed:
  • Web sources are not arbitrarily complex
  • “Amazon effect”: convergence and regularity naturally emerge

Page 13: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 13

“Amazon effect” in action…

Attributes converge in a domain!

Constraint patterns converge even across domains!
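One simple way to check for attribute convergence is to track how the attribute vocabulary grows as more sources from one domain are examined. If attributes converge, the curve flattens quickly. The sketch below is a minimal illustration under that assumption. The function `vocabulary_growth` and the five toy book interfaces are made up for this example and are not survey data.

```python
def vocabulary_growth(interfaces):
    """Return the number of distinct attributes seen after each interface."""
    seen = set()
    growth = []
    for attrs in interfaces:
        seen.update(attrs)
        growth.append(len(seen))
    return growth


# Toy, hand-made interfaces for a hypothetical "books" domain.
toy_interfaces = [
    {"author", "title", "isbn"},
    {"author", "title", "subject"},
    {"title", "author", "isbn"},
    {"author", "title", "publisher"},
    {"title", "isbn", "subject"},
]

growth = vocabulary_growth(toy_interfaces)
print(growth)  # [3, 4, 4, 5, 5]
```

Five interfaces with three attributes each could contribute up to fifteen distinct names, yet the toy vocabulary stops at five. A flattening curve of this kind is what convergence within a domain would look like.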

Page 14: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 14

Lesson #4:

The challenge may as well be an opportunity.

Large scale is not only a challenge but also an opportunity.

Page 15: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 15

Large scale itself presents an opportunity -- Shallow integration across holistic sources

  • Shallow observable clues: the “underlying” semantics often relate to the “observable” presentations through some way of connection.
  • Holistic hidden regularities: such connections often follow some implicit properties, which reveal themselves holistically across sources.

[Figure labels: Semantics (to be discovered); Presentations (observed); Some Way of Connection; Hidden Regularities; Reverse Analysis]

Page 16: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 16

Some evidence for holistic integration

Evidence 1: [SIGMOD04] Query Interface Understanding: Hidden-syntax parsing

Evidence 2: [SIGMOD03, KDD04] Matching Query Interfaces: Hidden-model discovery

[Figure labels: attribute; operator; value]

Page 17: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 17

Evidence for holistic integration

Evidence 1: [SIGMOD04] Query Interface Understanding by Hidden-syntax parsing

[Figure labels: Query Capabilities; Visual Patterns; Hidden Syntax (Grammar); Syntactic Composer; Syntactic Analyzer]

Evidence 2: [SIGMOD03, KDD04] Query Interface Matching by Hidden-model discovery

[Figure labels: Attribute Matchings; Attribute Occurrences; Hidden Generative Model; Statistic Generator; Statistic Analyzer]
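The matching evidence works from statistics over attribute occurrences across many interfaces. The sketch below is a toy illustration of that general idea and is not the algorithm of the cited papers. It assumes that two attributes which are each frequent, yet never appear together in one interface, are candidate synonyms. The function `candidate_synonyms`, the `min_support` threshold, and the sample interfaces are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_synonyms(interfaces, min_support=2):
    """Pairs of frequent attributes that never co-occur in any interface."""
    # How many interfaces each attribute appears in.
    freq = Counter(a for attrs in interfaces for a in set(attrs))
    # How many interfaces each (alphabetically ordered) pair appears in.
    cooccur = Counter()
    for attrs in interfaces:
        for pair in combinations(sorted(set(attrs)), 2):
            cooccur[pair] += 1
    frequent = sorted(a for a, n in freq.items() if n >= min_support)
    return [(a, b) for a, b in combinations(frequent, 2) if cooccur[(a, b)] == 0]


# Toy, hand-made attribute occurrences for four book interfaces.
toy_occurrences = [
    {"author", "title"},
    {"writer", "title"},
    {"author", "title", "isbn"},
    {"writer", "title", "isbn"},
]

matches = candidate_synonyms(toy_occurrences)
print(matches)  # [('author', 'writer')]
```

The sketch only works because it sees several interfaces at once. With a single pair of sources there would be no occurrence statistics to analyze, which is the sense in which large scale becomes an opportunity.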

Page 18: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 18

Putting together: The MetaQuerier system

[Figure: MetaQuerier architecture over The Deep Web]

Back-end: Semantics Discovery
  • Database Crawler
  • Interface Extraction
  • Source Clustering
  • Schema Matching

Front-end: Query Execution
  • Query Translation
  • Source Selection
  • Result Compilation

Deep Web Repository
  • Unified Interfaces
  • Subject Domains
  • Query Capabilities
  • Query Interfaces

[Other figure labels: Grammar; Type Patterns; Query Web databases; Find Web databases]

Page 19: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 19

Lesson #5:

Use undergraduates.

Then it might be possible to build systems at schools.

Page 20: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 20

Conclusion: Toward large scale integration

Status: Completed or in progress
  • Deep Web survey [SIGMOD-Record Sep’04]
  • Query-interface understanding [SIGMOD’04]
  • Schema matching [SIGMOD’03, KDD’04]
  • Source clustering [CIKM’04]
  • Query translation [VLDB-IIWeb’04]
  • Shallow, holistic integration approach [VLDB-IIWeb’04, SIGMOD-Record Dec’04]

Current focus: System integration for building an integration system

Page 21: MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 21

Thank You!

For more information: http://[email protected]

Welcome to see our demo tomorrow!