Data Integration Aggregate Query Answering under Uncertain Schema Mappings Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian Presented By Stephen Lynn

Overview

  • Aggregate Queries
  • Probabilistic Schema Mapping
  • Goals/Objectives
  • Aggregate Processing (3 proposals)
  • By-Table Algorithm
  • By-Tuple Algorithm
  • Evaluation
  • Analysis

Aggregate Queries

COUNT, MIN, MAX, SUM, AVG

ID  Price  Quantity
1   2.30   2
2   3.20   4
3   7.34   1
4   8.29   20
5   3.32   3

All five are computable by simple PTIME algorithms.
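
As a rough illustration of why these aggregates are cheap on a single, certain table, the sketch below computes all five over the example rows. Aggregating over the Price column is an assumption made only for this example; the slide does not say which column the queries target.

```python
# Rows from the example table: (ID, Price, Quantity).
rows = [
    (1, 2.30, 2),
    (2, 3.20, 4),
    (3, 7.34, 1),
    (4, 8.29, 20),
    (5, 3.32, 3),
]

# Assumption: aggregate over Price (the slide does not name a column).
prices = [price for _, price, _ in rows]

# Each aggregate needs only a linear scan of the column, so all are O(n).
aggregates = {
    "COUNT": len(prices),
    "MIN": min(prices),
    "MAX": max(prices),
    "SUM": sum(prices),
    "AVG": sum(prices) / len(prices),
}

for name, value in aggregates.items():
    print(f"{name}(Price) = {value:.2f}")
```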

Probabilistic Schema Mappings

By-Table vs By-Tuple

  • By-Tuple: consider all possible mappings for each tuple
  • By-Table: a single mapping for the entire table

Example mapping probabilities: P(date→postedDate) = 0.7, P(date→reducedDate) = 0.3
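
To make the difference concrete, here is a minimal sketch that enumerates the possible worlds under each semantics, using the two mapping probabilities from the slide. The function names are hypothetical, and the by-tuple case assumes each tuple picks its mapping independently, so a world's probability is the product of its per-tuple choices.

```python
from itertools import product

# Mapping probabilities from the slide: the target attribute "date" maps to.
MAPPINGS = {"postedDate": 0.7, "reducedDate": 0.3}

def by_table_worlds(n_tuples):
    """One mapping is chosen for the whole table: m possible worlds."""
    return [((target,) * n_tuples, prob) for target, prob in MAPPINGS.items()]

def by_tuple_worlds(n_tuples):
    """Each tuple picks a mapping on its own: m**n possible worlds.

    Assumption: choices are independent across tuples, so the world
    probability is the product of the chosen mapping probabilities.
    """
    worlds = []
    for assignment in product(MAPPINGS, repeat=n_tuples):
        prob = 1.0
        for target in assignment:
            prob *= MAPPINGS[target]
        worlds.append((assignment, prob))
    return worlds

print(len(by_table_worlds(3)), "by-table worlds")
print(len(by_tuple_worlds(3)), "by-tuple worlds")
```

The exponential growth in the by-tuple case is why enumerating worlds directly is not a practical way to answer by-tuple queries.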

Goals/Objectives

  • Impact analysis of probabilistic schemas on aggregate queries
  • Aggregate query algorithms
  • Time complexity analysis
  • Evaluation

Aggregation Methods

  • Range
  • Distribution
  • Expected Value

Method Relationships

Distribution
  • Most time consuming
  • Most information

Range
  • Computed directly from distribution

Expected Value
  • Computed directly from distribution

More efficient ways to compute
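
The relationship between the three methods can be sketched as follows: given a distribution over aggregate values, the range and the expected value fall out in one pass. The distribution below is hypothetical illustration data, and the function names are not taken from the paper.

```python
def range_from_distribution(dist):
    """Smallest and largest aggregate values with non-zero probability."""
    values = [value for value, prob in dist.items() if prob > 0]
    return min(values), max(values)

def expected_from_distribution(dist):
    """Probability-weighted mean of the aggregate values."""
    return sum(value * prob for value, prob in dist.items())

# Hypothetical distribution of a COUNT answer: value -> probability.
count_distribution = {3: 0.7, 5: 0.3}

print(range_from_distribution(count_distribution))
print(expected_from_distribution(count_distribution))
```

Going the other way is not possible: a range or an expected value alone does not determine the distribution, which is why the distribution carries the most information.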

By-Table Algorithm

All PTIME computable
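
A minimal sketch of the by-table idea, assuming the query is re-evaluated once per candidate mapping and each answer inherits that mapping's probability. The rows, the date predicate, and the function name are hypothetical; only the two mapping probabilities come from the slides.

```python
from collections import defaultdict

# Hypothetical source rows; attribute names echo the mapping example.
ROWS = [
    {"postedDate": "2008-01-05", "reducedDate": "2008-02-10", "price": 2.30},
    {"postedDate": "2008-01-20", "reducedDate": "2008-01-25", "price": 3.20},
    {"postedDate": "2008-02-03", "reducedDate": "2008-02-15", "price": 7.34},
]
MAPPINGS = [("postedDate", 0.7), ("reducedDate", 0.3)]

def by_table_distribution(rows, mappings, aggregate, predicate):
    """Evaluate the aggregate once per mapping: m ordinary queries."""
    dist = defaultdict(float)
    for attribute, prob in mappings:
        selected = [row for row in rows if predicate(row[attribute])]
        # Mappings that yield the same answer pool their probability.
        dist[aggregate(selected)] += prob
    return dict(dist)

# COUNT of rows whose "date" falls before February 2008.
count_before_feb = by_table_distribution(
    ROWS, MAPPINGS, aggregate=len, predicate=lambda d: d < "2008-02-01"
)
print(count_before_feb)
```

Because each mapping costs one ordinary aggregate query, the total work is m times a PTIME computation, which is consistent with the claim on the slide.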

By-Tuple Algorithm (COUNT)

O(n * m)
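
One way to reach an O(n * m) bound for COUNT without enumerating worlds is to visit each of the n tuples once and test it under each of the m mappings. The sketch below computes the range and the expected value that way. It is an illustration under stated assumptions (independent per-tuple mapping choices, all mapping probabilities positive, hypothetical rows and predicate), not a transcription of the algorithm shown on the slide.

```python
# Hypothetical source rows; attribute names echo the mapping example.
ROWS = [
    {"postedDate": "2008-01-05", "reducedDate": "2008-02-10"},
    {"postedDate": "2008-01-20", "reducedDate": "2008-01-25"},
    {"postedDate": "2008-02-03", "reducedDate": "2008-02-15"},
]
MAPPINGS = [("postedDate", 0.7), ("reducedDate", 0.3)]

def by_tuple_count(rows, mappings, predicate):
    """Range and expected value of COUNT in O(n * m) predicate checks."""
    low = high = 0
    expected = 0.0
    for row in rows:                        # n tuples
        matches = 0
        p_match = 0.0
        for attribute, prob in mappings:    # m mappings
            if predicate(row[attribute]):
                matches += 1
                p_match += prob
        if matches == len(mappings):
            low += 1        # counted in every possible world
        if matches > 0:
            high += 1       # counted in at least one possible world
        expected += p_match  # linearity of expectation
    return (low, high), expected

count_range, count_expected = by_tuple_count(
    ROWS, MAPPINGS, predicate=lambda d: d < "2008-02-01"
)
print(count_range, count_expected)
```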

Example By-Tuple (COUNT)

Time Complexity

Evaluation

Empirical evaluation
  • Real-world dataset (eBay)
  • Synthetic dataset

Evaluate time complexity
  • Vary tuple numbers
  • Vary attribute mappings

Evaluation Results

Analysis

Strengths
  • Effect of probabilistic schemas on aggregates
  • Nice PTIME algorithms

Weaknesses
  • Evaluation was obvious
  • By-Table results biased by database optimizations

Future Work
  • Improve algorithms
  • Extend to sub-queries
  • Heuristics