A Model and Algorithms for Pricing Queries Tang Ruiming, Wu Huayu, Bao Zhifeng, Stephane Bressan, Patrick Valduriez


A Model and Algorithms for Pricing Queries

Tang Ruiming, Wu Huayu, Bao Zhifeng,

Stephane Bressan, Patrick Valduriez


Overview

Aggdata


Overview

Windows Azure Marketplace


Motivation and existing work

• People may want to buy data by asking queries.

• As stated by Koutris et al. in [Koutris et al., 2012], current pricing schemes have limitations:

• Assign prices to entire datasets.

• Assign prices to predefined views, and consumers are restricted to these views.

• May lead to arbitrage situations, e.g. ten free accounts, each limited to 10 applications, can be combined to get 100 applications.

• In the frameworks of [Koutris et al., 2012], [Koutris et al., 2013], [Li et al., 2012]:

• Assign prices to pre-defined views.

• The price of a query is the price of the cheapest set of pre-defined views that determines the query. (NP-hard)


Framework

• In our framework:

• Assign prices to individual tuples.

• For a query, we track the source tuples contributing to the query result (provenance).

• Each contributing source tuple is charged only once, no matter how many times it contributes (nature of information goods [Balazinska et al., 2011]).


Minimal provenance

• (provenance) Let Q be a query and D be a database. Q(D) is the query result. A provenance of Q(D) is a set of tuples L in D such that $Q(D) \subseteq Q(L)$.

• (minimal provenance) A minimal provenance of Q(D) is a provenance L of Q(D) such that $\forall L',\ L' \subseteq L \Rightarrow L' = L$,

• where L' is a provenance of Q(D).
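The two definitions can be tried out on a toy database. The sketch below is only an illustration: it assumes a query is a Python function from a set of tuples to a set of result tuples, and the `cities` query and the database `D` are hypothetical. Minimal provenances are found by brute force over all subsets, so it is usable only on very small inputs.

```python
from itertools import combinations

def is_provenance(query, database, subset):
    """L is a provenance of Q(D) when Q(D) is contained in Q(L)."""
    return query(database) <= query(subset)

def minimal_provenances(query, database):
    """Brute force: keep provenances that have no proper subset
    which is itself a provenance."""
    tuples = list(database)
    found = []
    # Enumerate subsets by increasing size, so that every smaller
    # minimal provenance is already in `found` when a candidate is tested.
    for size in range(len(tuples) + 1):
        for combo in combinations(tuples, size):
            candidate = frozenset(combo)
            if not is_provenance(query, database, candidate):
                continue
            if not any(smaller < candidate for smaller in found):
                found.append(candidate)
    return found

# Hypothetical query: the set of cities that have at least one shop.
def cities(db):
    return {city for (_shop, city) in db}

# Hypothetical database of (shop, city) tuples.
D = frozenset({("s1", "Paris"), ("s2", "Paris"), ("s3", "Rome")})

for L in minimal_provenances(cities, D):
    print(sorted(L))
```

Here either Paris shop is enough to produce "Paris", so the result has two minimal provenances, each pairing one Paris shop with the Rome shop.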


Minimal provenance


Pricing function

• The price setting function maps each tuple in the database to its price.

• The pricing function takes a query as input and returns its price.

• Properties of the pricing function:

• Contribution monotonicity: if a query uses fewer source tuples than another query, the price of the first query should be lower.

• Contribution arbitrage-freedom: if a query uses fewer source tuples than a set of queries, the price of the first query should be lower than the total price of the set of queries.

• Bounded price: the price of a query is never higher than the total price of the source tuples in the relations involved in the query.


Pricing function

• The price of a query Q in a database D is defined as the price of the cheapest minimal provenance of Q(D):

$pr_D(Q) = \min_{L \in M(Q,D)} \|L\|_p$

• where M(Q,D) is the set of minimal provenances of Q(D) and $\|L\|_p$ is the p-norm of L. Increasing the value of p decreases the value of the p-norm. The data seller can use p to adjust prices for different categories of data consumers.
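A small numeric sketch of this definition follows. It assumes that the p-norm of a provenance is the p-norm of the vector of its tuple prices; the slides do not spell this out. The tuple names and prices are hypothetical.

```python
def p_norm(provenance, price, p):
    """p-norm of the prices of the tuples in a provenance (assumed reading)."""
    return sum(price[t] ** p for t in provenance) ** (1.0 / p)

def query_price(minimal_provenances, price, p):
    """Price of a query: the cheapest minimal provenance under the p-norm."""
    return min(p_norm(L, price, p) for L in minimal_provenances)

# Hypothetical price setting and minimal provenances of one query.
price = {"t1": 3.0, "t2": 4.0, "t3": 6.0}
M = [frozenset({"t1", "t2"}), frozenset({"t3"})]

print(query_price(M, price, p=1))  # {t3} is cheapest: 6.0
print(query_price(M, price, p=2))  # {t1, t2} is cheapest: 5.0
```

Moving from p = 1 to p = 2 lowers the price of the two-tuple provenance from 7 to 5 and leaves the single-tuple one at 6, so the choice of p can change which provenance is cheapest.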


Algorithms for price computation

• We assume that for each result tuple, its set of minimal provenances is available.

• We aim to find the cheapest minimal provenance of the set of result tuples.

• We prove that this problem is NP-hard.

• Exact algorithm:

• enumerates all the provenances of the query result (an exponential number);

• chooses the cheapest one.
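The exact approach can be sketched as follows. This is not necessarily the authors' implementation: it assumes non-negative prices, takes one list of minimal provenances per result tuple as input, and enumerates every way of picking one minimal provenance per result tuple. The function names and the example data are hypothetical.

```python
from itertools import product

def p_norm(tuples, price, p):
    """p-norm of the prices of a set of source tuples."""
    return sum(price[t] ** p for t in tuples) ** (1.0 / p)

def exact_price(provenances_per_result, price, p=1):
    """Try every combination of one minimal provenance per result tuple
    and return (price, source tuples) of the cheapest union."""
    best = None
    for choice in product(*provenances_per_result):
        # A source tuple shared by several choices appears once in the union,
        # so it is charged only once.
        union = frozenset().union(*choice)
        cost = p_norm(union, price, p)
        if best is None or cost < best[0]:
            best = (cost, union)
    return best

# Hypothetical prices and minimal provenances for two result tuples.
price = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 2.0}
provenances_per_result = [
    [frozenset({"a", "b"}), frozenset({"c"})],   # result tuple r1
    [frozenset({"a", "d"}), frozenset({"b"})],   # result tuple r2
]

cost, chosen = exact_price(provenances_per_result, price, p=1)
print(cost, sorted(chosen))
```

The number of combinations is the product of the list lengths, which grows exponentially with the number of result tuples.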


Approximation algorithms

• We devise some approximation algorithms.

• Worst-case ratio: $\sqrt[p]{n}$

Khanna et al. prove that this problem is approximable within a polynomial factor in the size of the input [Khanna et al., 2000].
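A worst-case ratio of $\sqrt[p]{n}$ can be motivated by the following sketch, which is not taken from the slides. It assumes that n is the number of result tuples, that prices are non-negative, and that the approximation picks, for each result tuple i, its cheapest minimal provenance $L_i$. The symbols $s(t)$ (price of tuple t) and $L^*$ (cheapest provenance of the whole result) are introduced here for illustration. Since $L^*$ contains a minimal provenance of every result tuple, each $L_i$ costs no more than $L^*$:

```latex
\begin{align*}
\|L_i\|_p &\le \|L^*\|_p \qquad \text{for each } i = 1, \dots, n, \\
\Big\| \bigcup_{i=1}^{n} L_i \Big\|_p^p
  &= \sum_{t \in \bigcup_i L_i} s(t)^p
  \;\le\; \sum_{i=1}^{n} \|L_i\|_p^p
  \;\le\; n \, \|L^*\|_p^p, \\
\Big\| \bigcup_{i=1}^{n} L_i \Big\|_p &\le \sqrt[p]{n} \; \|L^*\|_p .
\end{align*}
```

For p = 1 and n = 10 this gives a ratio of at most 10.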


Approximation algorithms

• Heuristic 1: choose the cheapest minimal provenance for each individual result tuple independently. (greedy algorithm)

• Heuristic 2: choose the minimal provenance with the lowest average price for each individual result tuple independently. (greedy algorithm)

• Heuristic 3: as Heuristic 1, but taking previous choices into account. (semi-greedy)

• Heuristic 4: as Heuristic 2, but taking previous choices into account. (semi-greedy)
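The four heuristics can be sketched with a single function. This is one possible reading and may differ from the authors' implementation: "average price" is taken to be the mean tuple price of a provenance, and "consider previous choices" is taken to mean that source tuples already selected cost nothing further. The function name, flags and example data are hypothetical.

```python
def p_norm(tuples, price, p):
    """p-norm of the prices of a set of source tuples."""
    return sum(price[t] ** p for t in tuples) ** (1.0 / p)

def average_price(tuples, price):
    """Mean price per source tuple (assumed meaning of 'average price')."""
    return sum(price[t] for t in tuples) / len(tuples)

def heuristic_price(provenances_per_result, price, p=1,
                    use_average=False, reuse=False):
    """Pick one minimal provenance per result tuple and price the union.

    use_average=False, reuse=False -> Heuristic 1
    use_average=True,  reuse=False -> Heuristic 2
    use_average=False, reuse=True  -> Heuristic 3
    use_average=True,  reuse=True  -> Heuristic 4
    """
    chosen = set()
    for candidates in provenances_per_result:
        def score(L):
            # With reuse, tuples selected earlier are treated as already paid.
            remaining = (L - chosen) if reuse else L
            if not remaining:
                return 0.0
            if use_average:
                return average_price(remaining, price)
            return p_norm(remaining, price, p)
        chosen |= min(candidates, key=score)
    return p_norm(chosen, price, p), frozenset(chosen)

# Hypothetical instance on which Heuristics 1 and 3 differ.
price = {"a": 3.0, "b": 2.0, "c": 2.0}
provenances_per_result = [
    [frozenset({"a"})],
    [frozenset({"a"}), frozenset({"b"})],
    [frozenset({"a"}), frozenset({"c"})],
]

print(heuristic_price(provenances_per_result, price))              # Heuristic 1
print(heuristic_price(provenances_per_result, price, reuse=True))  # Heuristic 3
```

On this instance Heuristic 1 buys a, b and c for a price of 7, while Heuristic 3 notices that a is already paid for and reuses it, for a price of 3.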


Experiments

• Effectiveness: the ratio between approximate price and exact price

• Efficiency: running time of approximation algorithms.


• Setup:

• The number of result tuples is 10 for measuring effectiveness (the ratio in the worst case is 10).

• The number of result tuples varies from 1,000 to 5,000 for measuring efficiency.

• For each result tuple, the number of minimal provenances and the size of each minimal provenance are sampled from [1,5] with uniform distribution.
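A synthetic instance of this shape could be generated as sketched below. Only the two uniform [1,5] draws come from the setup above. The number of source tuples, the price range, the seed and the function name are assumptions made for illustration, and the sketch does not check that the sampled sets for a result tuple are distinct or incomparable.

```python
import random

def sample_instance(num_results, num_source_tuples=50, max_price=100, seed=0):
    """Return (price, provenances_per_result) for a synthetic query result."""
    rng = random.Random(seed)
    # Assumed price setting: integer prices drawn uniformly from [1, max_price].
    price = {t: float(rng.randint(1, max_price))
             for t in range(num_source_tuples)}
    provenances_per_result = []
    for _ in range(num_results):
        count = rng.randint(1, 5)  # number of minimal provenances
        provenances_per_result.append([
            # Size of each minimal provenance is also drawn from [1, 5].
            frozenset(rng.sample(range(num_source_tuples), rng.randint(1, 5)))
            for _ in range(count)
        ])
    return price, provenances_per_result

price, instance = sample_instance(num_results=10)
print(len(instance), [len(candidates) for candidates in instance])
```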


Effectiveness (50,000 runs)


Efficiency


Conclusion

• We propose a framework for pricing queries based on the source tuples that contribute to the query result.

• The price of a query is the price of the cheapest minimal provenance of the query result.

• We propose a baseline algorithm to compute the exact price of a query and four heuristics to compute an approximate price.

• We conduct experiments to show the effectiveness and efficiency of the heuristics.