Query Relaxation Using Malleable Schemas
Xuan Zhou, Julien Gaugaz, Wolf-Tilo Balke, Wolfgang Nejdl
L3S Research Center
Leibniz University Hanover, Germany
Presented by Aaron Stewart
BYU CS 652, Spring 2009
Problem
+ = ?
Problem
• Multiple data sources
• Unmatched schemas
Approach
1. Malleable schemas
2. Discover correlations
3. Relax user queries
Malleable Schemas
• Allow duplicate fields
• Allow related fields
Malleable Schemas
first_name, sur_name
name
Malleable Schemas
contents
body
In Practice: Tables
• “…a malleable schema… contains imprecise and overlapping definitions of attributes or relationships.”
• “In this way, a malleable schema can capture such heterogeneous data structures as in Figure 1.”
In Practice: Tables
Entities (database records, rows)
Attributes (database fields, columns)
Equivalently: Distinct tables
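The table view above can be sketched as follows. This is a minimal illustration, assuming entities are kept as sparse records over the union of all attributes; the sample records and the helpers `attributes_of` and `as_table` are hypothetical and do not come from the paper.

```python
# Hypothetical records from two sources with overlapping attributes.
entities = [
    {"first_name": "Ada", "sur_name": "Lovelace"},  # source 1 splits the name
    {"name": "Alan Turing"},                         # source 2 uses a single field
]

def attributes_of(records):
    """Return the union of attribute names, i.e. the columns of one wide table."""
    columns = set()
    for record in records:
        columns.update(record)
    return sorted(columns)

def as_table(records):
    """Lay records out as rows over the shared columns; missing values are None."""
    columns = attributes_of(records)
    return columns, [[r.get(c) for c in columns] for r in records]

columns, rows = as_table(entities)
```

Each row is an entity and each column an attribute; the overlapping `name` and `first_name`/`sur_name` columns coexist instead of being forced into one definition.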
Query Relaxation Planning
• Multiple queries
  – Different columns or tables
  – As few queries as possible
• Exponential number of relaxed queries
  – Evaluate in order of precision
  – Stop at k results
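The "evaluate in order of precision, stop at k results" idea can be sketched as below. This is only an illustration, not the paper's planner: it assumes the candidate relaxed queries and their estimated precisions are already given, and the names `top_k`, `run_query`, and the `(precision, query)` pair format are hypothetical.

```python
def top_k(queries, run_query, k):
    """Evaluate queries in descending estimated precision until k results are found.

    queries:   list of (estimated_precision, query) pairs (assumed format).
    run_query: callable returning an iterable of results for one query.
    Returns the collected results and the number of queries actually evaluated.
    """
    results = []
    seen = set()
    evaluated = 0
    # Most precise query first; sorting on the precision only.
    for _, query in sorted(queries, key=lambda pair: pair[0], reverse=True):
        evaluated += 1
        for item in run_query(query):
            if item not in seen:  # skip results already returned by an earlier query
                seen.add(item)
                results.append(item)
                if len(results) == k:
                    return results, evaluated
    return results, evaluated


# Toy usage with made-up data: each "query" is just a key into a dict.
data = {"q_name": ["a", "b"], "q_title": ["b", "c"], "q_label": ["d"]}
queries = [(0.5, "q_title"), (0.9, "q_name"), (0.2, "q_label")]
results, evaluated = top_k(queries, lambda q: data[q], 3)
```

In this toy run the least precise query is never issued, which is the point of stopping at k results.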
Query Relaxation Planning
A1 A2
relaxed attribute
child attributes
Query Relaxation Planning
• A “relaxed query always yields better precision than its child queries, so that it should always be evaluated prior to its child queries”
Parent/Child Relationship
• We would think A is the parent, and A1 and A2 are the children, but…
• Put them in order of correlation probability
  – If P(A|A1) > P(A|A2)
  – Then A => A1 => A2
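The ordering rule above can be sketched in a few lines. The probabilities are invented for illustration, and `relaxation_order` is a hypothetical helper, not a function from the paper.

```python
def relaxation_order(original, correlation):
    """Return the chain: original attribute, then alternatives by descending P(A|Ai).

    correlation maps each alternative attribute Ai to an assumed estimate of P(A|Ai).
    """
    ranked = sorted(correlation, key=correlation.get, reverse=True)
    return [original] + ranked


# Made-up estimates: P(A|A1) = 0.7 > P(A|A2) = 0.4, so A1 comes before A2.
chain = relaxation_order("A", {"A2": 0.4, "A1": 0.7})
```

Here `chain` lists A first, then A1, then A2, matching A => A1 => A2.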
Query Relaxation Planning
Query Relaxation
Experiments
• Data sets
  – IMDB Movies
  – Amazon.com DVDs and VHS videos
Results
Analysis
• Strengths
  – Handles mixed schemas
  – Well-designed algorithms (IMO)
• Future work
  – Speed