Cleaning Uncertain Data with Quality Guarantees Reynold Cheng, Jinchuan Chen, Xike Xie 2008 VLDB Presented by SHAO Yufeng


Outline

Background

Related works

Data and Query model

PWS-quality model

Cleaning procedure

Experiment results

Uncertain Database (old model)

Uncertainty is inherent in various applications

Examples: RFID data, sensor network data, data protected for privacy reasons

In many models it is infeasible to eliminate all uncertainty

Uncertain Database (new model)

Previous models focus on querying the uncertain database

But what if we are able to reduce SOME of the uncertainty in this kind of database?

A new model is required to produce an optimal solution

Example 1: Sensor probing

Some sensors in the sensor network might have transmission problems and cannot update their data

Commands can be sent to refresh some sensors

New, certain data are obtained

Probing is limited by bandwidth and battery power, so sensors cannot be probed too often

Example 2: Movie rating

Movie ratings (IMDB, Netflix) collected from customers might contain some uncertainty

Managers can communicate with customers to verify the rating data

New, certain movie rating data are obtained

Verification is limited by manpower or other resources

Cleaning Data

[Diagram] Without cleaning: Uncertain DB -> Query -> Ambiguous result

[Diagram] With cleaning: Uncertain DB -> Cleaning procedure -> LESS uncertain DB -> Query -> LESS ambiguous result

Real model example

A database of some products and their (uncertain) prices

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

The price of product a has two possible values: 120 (prob. 0.7) or 80 (prob. 0.3)

Query Example 1:

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

Query 1 (Range query): Select products with Price ($) in the range [100, 110]

Possible world results: ({b1,c2}, 0.18), ({b1,c3}, 0.12), ({b1}, 0.3), ({c2}, 0.12), ({c3}, 0.08), (∅, 0.2)
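The possible world results of the range query can be reproduced by brute force. The following is a minimal Python sketch, not code from the paper: the `DB` dictionary layout and the function name `range_pw_results` are illustrative assumptions, and the enumeration relies on every x-tuple's probabilities summing to 1.

```python
from collections import defaultdict
from itertools import product

# Product table: x-tuple -> list of (key, price, existential probability).
DB = {
    "a": [("a1", 120, 0.7), ("a2", 80, 0.3)],
    "b": [("b1", 110, 0.6), ("b2", 90, 0.4)],
    "c": [("c1", 140, 0.5), ("c2", 110, 0.3), ("c3", 100, 0.2)],
    "d": [("d1", 10, 1.0)],
}


def range_pw_results(db, lo, hi):
    """Map each distinct PW-result (a set of keys) to its probability."""
    dist = defaultdict(float)
    # A possible world picks exactly one alternative from every x-tuple.
    for world in product(*db.values()):
        prob = 1.0
        for _, _, e in world:
            prob *= e
        result = frozenset(key for key, price, _ in world if lo <= price <= hi)
        dist[result] += prob
    return dict(dist)


results = range_pw_results(DB, 100, 110)
for keys, prob in sorted(results.items(), key=lambda kv: -kv[1]):
    print(sorted(keys), round(prob, 3))
```

The enumeration visits 2 x 2 x 3 x 1 = 12 possible worlds here, which is fine for a toy table but grows exponentially with the number of x-tuples.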

Query Example 2:

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

Query 2 (Max query): Select the product with the highest price

Possible world results: ({c1}, 0.5), ({a1}, 0.35), ({b1, c2}, 0.054), ({b1}, 0.036), ({c2}, 0.036), ({c3}, 0.024)
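The result distribution of the max query can be checked the same way, by enumerating possible worlds. This is a hedged sketch rather than the authors' code: `DB` and `max_pw_results` are hypothetical names, and tuples that tie for the highest price (b1 and c2 at 110) are assumed to be returned together.

```python
from collections import defaultdict
from itertools import product

# Product table: x-tuple -> list of (key, price, existential probability).
DB = {
    "a": [("a1", 120, 0.7), ("a2", 80, 0.3)],
    "b": [("b1", 110, 0.6), ("b2", 90, 0.4)],
    "c": [("c1", 140, 0.5), ("c2", 110, 0.3), ("c3", 100, 0.2)],
    "d": [("d1", 10, 1.0)],
}


def max_pw_results(db):
    """Map each distinct PW-result of the max query to its probability."""
    dist = defaultdict(float)
    for world in product(*db.values()):
        prob = 1.0
        for _, _, e in world:
            prob *= e
        top = max(price for _, price, _ in world)
        # Assumption: every tuple that reaches the top price is in the result.
        result = frozenset(key for key, price, _ in world if price == top)
        dist[result] += prob
    return dict(dist)


max_results = max_pw_results(DB)
for keys, prob in sorted(max_results.items(), key=lambda kv: -kv[1]):
    print(sorted(keys), round(prob, 3))
```

Because the PW-results partition the possible worlds, their probabilities have to add up to 1, which is a quick sanity check on any hand-computed list.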

Clean up example

Suppose we have some amount of resources to clean up some data

Assume we clean up the information related to products a and c

New database with less uncertainty:

Key  Product ID  Price ($)  Prob.
a2   a           80         1
b1   b           110        0.6
b2   b           90         0.4
c3   c           100        1
d1   d           10         1

Clean up example (Cont.)

New database with less uncertainty:

Key  Product ID  Price ($)  Prob.
a2   a           80         1
b1   b           110        0.6
b2   b           90         0.4
c3   c           100        1
d1   d           10         1

Run query 1 again: Select products with Price ($) in the range [100, 110]

New possible world results: ({b1,c3}, 0.6), ({c3}, 0.4)

Old possible world results: ({b1,c2}, 0.18), ({b1,c3}, 0.12), ({b1}, 0.3), ({c2}, 0.12), ({c3}, 0.08), (∅, 0.2)

The result on the cleaned database is clearly less uncertain, but the clean up procedure is limited by a budget


Important related works

Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar: Evaluating Probabilistic Queries over Imprecise Data. SIGMOD Conference 2003: 551-562

Mentions the idea of cleaning for Max/Min and range queries, but gives no real implementation

P. Andritsos, A. Fuxman, and R. Miller: Clean answers over dirty databases: A probabilistic approach. ICDE 2006

Introduces a query rewriting technique

Important related works (Cont.)

Jinchuan Chen, Reynold Cheng: Quality-Aware Probing of Uncertain Data with Resource Constraints. SSDBM 2008

Similar cleaning method

Uncertainty is represented by continuous pdfs

Supports fewer query types (only range queries)

Chris Mayfield, Jennifer Neville, Sunil Prabhakar: ERACER: A Database Approach for Statistical Inference and Data Cleaning. SIGMOD 2010

Uses attribute-level correlation to provide optimized clean up


System Structure

[Diagram] Components: User, Query Engine, Probabilistic Database, Quality Manager (Quality Evaluator, Data Cleaning Algorithm), Cleaning Manager, External Data Sources

Labeled flows: query request, query answer, PWS-quality score, cleaning budget, cleaning set, cleaning request, data update

Important Notations

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

Tuple: t_i (n tuples in total), one row of the table

x-tuple: τ_k (m x-tuples in total), the rows that belong to one product; for example, {a1, a2} is one x-tuple

Uncertain attribute: Price ($)

Existential probability: e_i, the Prob. column


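Query in possible world model

[Diagram] Probabilistic DB -> possible worlds -> PW-results -> final query answer; the PWS-quality is computed from the PW-results

Values shown in the diagram: possible world probabilities 0.18, 0.1, 0.1; PW-results ({b1,c2}, 0.18) and ({b1,c3}, 0.1); final query answer (b1, 0.28), (c2, 0.18), (c3, 0.1); PWS-quality -1.44

Qualification probability (p_i) of tuple c2: 0.18

Qualification probability (P_k) of x-tuple c: 0.28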

Probabilistic Range Query (PRQ)

Given a closed interval $[a, b]$, where $a, b \in \mathbb{R}$ and $a \le b$, a PRQ returns a set of tuples $(t_i, p_i)$, where $p_i$ is the non-zero probability that $v_i \in [a, b]$.

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

Range query: Select products with Price ($) in the range [100, 110]

Possible world result set: ({b1,c2}, 0.18), ({b1,c3}, 0.12), ({b1}, 0.3), ({c2}, 0.12), ({c3}, 0.08), (∅, 0.2)

The number attached to each PW-result r_j is its probability of occurrence, q_j

Probabilistic Maximum Query (PMaxQ)

A PMaxQ returns a set of tuples $(t_i, p_i)$, where $p_i$, the qualification probability of $t_i$, is the non-zero probability that $v_i \ge v_j$ for all $j = 1, \ldots, n$ with $j \ne i$.

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

Query: Select the product with the highest price

Possible world results: ({c1}, 0.5), ({a1}, 0.35), ({b1, c2}, 0.054), ({b1}, 0.036), ({c2}, 0.036), ({c3}, 0.024)
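Qualification probabilities for a max query do not require enumerating possible worlds. A tuple qualifies when it exists and no other x-tuple takes a strictly higher price, and x-tuples are independent, so the probability is a product. The sketch below is illustrative only: `DB` and `pmaxq_qualification` are hypothetical names, and ties are assumed to qualify, following the "greater than or equal" condition in the definition.

```python
# Product table: x-tuple -> list of (key, price, existential probability).
DB = {
    "a": [("a1", 120, 0.7), ("a2", 80, 0.3)],
    "b": [("b1", 110, 0.6), ("b2", 90, 0.4)],
    "c": [("c1", 140, 0.5), ("c2", 110, 0.3), ("c3", 100, 0.2)],
    "d": [("d1", 10, 1.0)],
}


def pmaxq_qualification(db):
    """Qualification probability p_i of every tuple under a max query."""
    probs = {}
    for name, alts in db.items():
        for key, price, e in alts:
            p = e  # the tuple itself must exist
            for other, other_alts in db.items():
                if other != name:
                    # Probability that x-tuple `other` takes no strictly higher price.
                    p *= 1.0 - sum(e2 for _, v2, e2 in other_alts if v2 > price)
            probs[key] = p
    return probs


qp = pmaxq_qualification(DB)
print({key: round(p, 3) for key, p in qp.items() if p > 1e-12})
```

Summing the values of the tuples in one x-tuple gives that x-tuple's qualification probability P_k.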


PWS-quality

Suppose we have two sets of possible world results:

[Figure: two distributions over PW-results. Probabilities shown: 0.2, 0.1, 0.1, 0.1, 0.2, 0.9, 0.1, 0.3. Result labels shown: {a2,b1}, {a1,b2,c1}, {b3,c2}, {b1}, {a1,c1}]

We need a measure that tells which result is more uncertain, and by how much

Solution:

Use an entropy-like measure to calculate the PWS-quality (degree of uncertainty)

PWS-Quality: Calculation

Let $q_j$ be the probability of getting the distinct PW-result $r_j$

Let $d$ be the number of distinct PW-results

$$S(D, Q) = \sum_{j=1}^{d} q_j \log q_j$$

The score S(D, Q) is never positive; the larger the score, the better the quality

0 means no uncertainty (only one possible world result exists)

PWS-quality example

Suppose there are three possible world results ({b1}, {a1,c1}, {b2}) whose probabilities are 0.5, 0.4 and 0.1

PWS score (base-2 logarithm, the base that reproduces the -1.17 scores in the later computation example):

S(D,Q) = 0.5*log(0.5) + 0.4*log(0.4) + 0.1*log(0.1) = -1.361
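The score of a result distribution is a one-line computation. A minimal sketch follows, with two assumptions that are not stated on the slide: the function name `pws_quality` is made up for illustration, and the logarithm is taken in base 2, the base under which the distribution (0.7, 0.18, 0.12) from the later computation example scores -1.17.

```python
import math


def pws_quality(probs):
    """PWS-quality S = sum_j q_j * log2(q_j) of a PW-result distribution."""
    # Zero-probability results contribute nothing (0 * log 0 is taken as 0).
    return sum(q * math.log2(q) for q in probs if q > 0)


print(round(pws_quality([0.5, 0.4, 0.1]), 3))    # three PW-results
print(round(pws_quality([0.7, 0.18, 0.12]), 2))  # distribution after cleaning x-tuple c
print(pws_quality([1.0]) == 0)                   # a single certain result scores 0
```

The score is the negated Shannon entropy of the distribution, so a more concentrated distribution gets a score closer to 0.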

PWS-quality problem

However, calculating the PWS-quality over all possible worlds is too expensive

The number of possible world results might be exponential

Need to speed up the algorithm

x-Form of PWS-Quality

$$S(D, Q) = \sum_{k=1}^{m} g(k, D, Q)$$

g(k, D, Q) is a function of the existential and qualification probabilities of the tuples in the k-th x-tuple

The score is the sum of the quality information of all result x-tuples

Only x-tuples whose tuples appear in the query answer need to be considered

x-Form of PRQ (Range Query)

$$g(k, D, Q) = \sum_{t_i \in \tau_k} p_i \log e_i + (1 - P_k) \log(1 - P_k), \quad \text{where } P_k = \sum_{t_i \in \tau_k} p_i$$

Each g(k, D, Q) requires only $O(|\tau_k|)$ time

$p_i$ and $P_k$ are the qualification probabilities of the tuple $t_i$ and the x-tuple $\tau_k$, and both are easy to compute
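The x-form for range queries can be checked against the brute-force definition on the product table. In the sketch below, which is an illustration and not the authors' implementation, `g_prq` computes one x-tuple's term (for a range query, p_i equals e_i when the price is in range and 0 otherwise) and `brute_force_quality` enumerates possible worlds. The names, the base-2 logarithm and the query range [100, 110] are assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import product

# Product table: x-tuple -> list of (key, price, existential probability).
DB = {
    "a": [("a1", 120, 0.7), ("a2", 80, 0.3)],
    "b": [("b1", 110, 0.6), ("b2", 90, 0.4)],
    "c": [("c1", 140, 0.5), ("c2", 110, 0.3), ("c3", 100, 0.2)],
    "d": [("d1", 10, 1.0)],
}
LO, HI = 100, 110


def g_prq(alts, lo, hi):
    """x-form term g(k, D, Q) of one x-tuple for a range query."""
    in_range = [e for _, price, e in alts if lo <= price <= hi]  # p_i = e_i
    rest = 1.0 - sum(in_range)                                   # 1 - P_k
    g = sum(e * math.log2(e) for e in in_range)
    if rest > 1e-12:  # skip the 0 * log 0 case
        g += rest * math.log2(rest)
    return g


def brute_force_quality(db, lo, hi):
    """S(D, Q) from the definition, enumerating every possible world."""
    dist = defaultdict(float)
    for world in product(*db.values()):
        prob = math.prod(e for _, _, e in world)
        dist[frozenset(k for k, v, _ in world if lo <= v <= hi)] += prob
    return sum(q * math.log2(q) for q in dist.values() if q > 0)


s_xform = sum(g_prq(alts, LO, HI) for alts in DB.values())
s_brute = brute_force_quality(DB, LO, HI)
print(round(s_xform, 4), round(s_brute, 4))
```

The x-form touches each tuple once, while the enumeration grows with the product of the x-tuple sizes.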

x-Form of PMaxQ (Max Query)

Suppose the i-th tuple of $\tau_k$ is $t_{k,i}$, with the tuples sorted in descending order of $v_{k,i}$

$$g(k, D, Q) = \sum_{i=1}^{|\tau_k|} \left( p_{k,i} \log e_{k,i} + \omega_{k,i} \log\Bigl(1 - \sum_{j=1}^{i} e_{k,j}\Bigr) \right)$$

where

$$\omega_{k,i} = \left( \frac{p_{k,i}}{e_{k,i}} - \frac{p_{k,i+1}}{e_{k,i+1}} \right) \Bigl(1 - \sum_{j=1}^{i} e_{k,j}\Bigr), \qquad \frac{p_{k,|\tau_k|+1}}{e_{k,|\tau_k|+1}} = 0$$

Computing g(k, D, Q) for PMaxQ requires $O(|\tau_k|^2)$ time

Details of the proof are given at the end of the presentation
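The max-query x-form can be sanity-checked numerically. The sketch below assumes the formula reads g(k) = sum over i of [p_i log e_i + w_i log(1 - e_1 - ... - e_i)], with the tuples of the x-tuple sorted by descending price and w_i = (p_i/e_i - p_{i+1}/e_{i+1})(1 - e_1 - ... - e_i), the last ratio being 0. That reading, the weight name `omega`, the base-2 logarithm and all function names are assumptions; the test instance is the product table with x-tuple c already cleaned to c3, chosen because it has no price ties.

```python
import math
from collections import defaultdict
from itertools import product

# Tie-free instance: x-tuple -> list of (key, price, existential probability).
DB = {
    "a": [("a1", 120, 0.7), ("a2", 80, 0.3)],
    "b": [("b1", 110, 0.6), ("b2", 90, 0.4)],
    "c": [("c3", 100, 1.0)],
    "d": [("d1", 10, 1.0)],
}


def qual_prob(db, name, price, e):
    """p_i: the tuple exists and no other x-tuple takes a higher price."""
    p = e
    for other, alts in db.items():
        if other != name:
            p *= 1.0 - sum(e2 for _, v2, e2 in alts if v2 > price)
    return p


def g_pmax(db, name):
    """Assumed x-form term g(k, D, Q) of one x-tuple for a max query."""
    alts = sorted(db[name], key=lambda t: -t[1])  # descending price
    p = [qual_prob(db, name, v, e) for _, v, e in alts]
    e = [t[2] for t in alts]
    ratio = [pi / ei for pi, ei in zip(p, e)] + [0.0]  # trailing ratio is 0
    g, prefix = 0.0, 0.0
    for i in range(len(alts)):
        prefix += e[i]
        rest = 1.0 - prefix
        if p[i] > 0:
            g += p[i] * math.log2(e[i])
        if rest > 1e-12:  # skip the 0 * log 0 case
            omega = (ratio[i] - ratio[i + 1]) * rest
            g += omega * math.log2(rest)
    return g


def brute_force_quality(db):
    """S(D, Q) from the definition, enumerating every possible world."""
    dist = defaultdict(float)
    for world in product(*db.values()):
        prob = math.prod(e for _, _, e in world)
        top = max(v for _, v, _ in world)
        dist[frozenset(k for k, v, _ in world if v == top)] += prob
    return sum(q * math.log2(q) for q in dist.values() if q > 0)


s_xform = sum(g_pmax(DB, name) for name in DB)
s_brute = brute_force_quality(DB)
print(round(s_xform, 4), round(s_brute, 4))
```

Agreement on one small instance does not prove the formula, but it is a cheap guard against a misread subscript.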

x-form PWS-quality summary

By transforming the original PWS-quality calculation into the x-form calculation, we avoid the exponential computation time

Total computation time: O(m log(n/m))

Compared with the query time, the x-form PWS-quality calculation time is small (as the experiments show)


Cleaning with limited budget

With a limited budget of, say, 10 units, which x-tuples should we clean?

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c1   c           140        0.5
c2   c           110        0.3
c3   c           100        0.2
d1   d           10         1

Cleaning costs shown next to the table: 5 units, 7 units, 10 units

Example of cleaning

After cleaning, the existential probability of the remaining tuple becomes 1

The x-tuple contracts to a single tuple with a certain attribute value

Key  Product ID  Price ($)  Prob.
a1   a           120        0.7
a2   a           80         0.3
b1   b           110        0.6
b2   b           90         0.4
c3   c           100        1
d1   d           10         1

Quality improvement

Expected quality after cleaning

The set of x-tuples that we are going to clean is represented by X = {τ_1, ..., τ_|X|}

Quality improvement

But computing the quality improvement directly takes exponential time

Computation example:

Key  Product ID  Price ($)  Prob.  QP
a1   a           120        0.7    0.35
a2   a           80         0.3    0
b1   b           110        0.6    0.09
b2   b           90         0.4    0
c1   c           140        0.5    0.5
c2   c           110        0.3    0.09
c3   c           100        0.2    0.024
d1   d           10         1      0

QP: qualification probability

Query 2 (Max query): Select the product with the highest price

Suppose we decide to clean up x-tuple c

Computation example (Cont.):

Query 2 (Max query): Select the product with the highest price

We decide to clean up x-tuple c; one possible case is that c3 is the real-world value, so c1 and c2 are removed

Key  Product ID  Price ($)  Prob.  QP
a1   a           120        0.7    0.7
a2   a           80         0.3    0
b1   b           110        0.6    0.18
b2   b           90         0.4    0
c3   c           100        1      0.12
d1   d           10         1      0

New PWS-quality S(D', Q) = -1.17

Computation example (Cont.):

Query 2 (Max query): Select the product with the highest price

We decide to clean up x-tuple c; another possible case is that c2 is the real-world value, so c1 and c3 are removed

Key  Product ID  Price ($)  Prob.  QP
a1   a           120        0.7    0.7
a2   a           80         0.3    0
b1   b           110        0.6    0.18
b2   b           90         0.4    0
c2   c           110        1      0.3
d1   d           10         1      0

The possible world results are ({a1}, 0.7), ({b1, c2}, 0.18) and ({c2}, 0.12), since b1 and c2 tie at 110

New PWS-quality S(D', Q) = -1.17

Computation example (Cont.):

Key  Product ID  Price ($)  Prob.  QP
a1   a           120        0.7    0.35
a2   a           80         0.3    0
b1   b           110        0.6    0.09
b2   b           90         0.4    0
c1   c           140        0.5    0.5
c2   c           110        0.3    0.09
c3   c           100        0.2    0.024
d1   d           10         1      0

Query 2 (Max query): Select the product with the highest price

To clean up x-tuple c, there are 3 possible real-world outcomes (c1, c2 or c3)

Expected quality after cleaning x-tuple c = 0 * 0.5 + (-1.17) * 0.3 + (-1.17) * 0.2 = -0.585
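The expected quality after cleaning one x-tuple can be computed directly from its definition: fix the x-tuple to each of its alternatives in turn, score the resulting database, and weight by the alternative's existential probability. The sketch below is a brute-force illustration under assumed names (`DB`, `max_quality`, `expected_quality_after_cleaning`) and an assumed base-2 logarithm; it is the exponential method that the x-form is meant to avoid.

```python
import math
from collections import defaultdict
from itertools import product

# Product table: x-tuple -> list of (key, price, existential probability).
DB = {
    "a": [("a1", 120, 0.7), ("a2", 80, 0.3)],
    "b": [("b1", 110, 0.6), ("b2", 90, 0.4)],
    "c": [("c1", 140, 0.5), ("c2", 110, 0.3), ("c3", 100, 0.2)],
    "d": [("d1", 10, 1.0)],
}


def max_quality(db):
    """PWS-quality of the max query, enumerating every possible world."""
    dist = defaultdict(float)
    for world in product(*db.values()):
        prob = math.prod(e for _, _, e in world)
        top = max(v for _, v, _ in world)
        dist[frozenset(k for k, v, _ in world if v == top)] += prob
    return sum(q * math.log2(q) for q in dist.values() if q > 0)


def expected_quality_after_cleaning(db, name):
    """E[S(D', Q)] when x-tuple `name` is cleaned."""
    total = 0.0
    for key, price, e in db[name]:
        cleaned = dict(db)
        cleaned[name] = [(key, price, 1.0)]  # this alternative turns out to be real
        total += e * max_quality(cleaned)
    return total


before = max_quality(DB)
expected = expected_quality_after_cleaning(DB, "c")
print(round(before, 3), round(expected, 3), round(expected - before, 3))
```

With unrounded per-outcome scores the expectation comes out slightly below the slide's -0.585, which uses -1.17 for the two uncertain outcomes.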

x-form quality improvement

In x-form, the quality improvement becomes

$$I(X, D, Q) = -\sum_{\tau_k \in X} g(k, D, Q)$$

X is the set of x-tuples that we are going to clean

Proof sketch: rewrite the original E(S(D', Q)) as a sum of g over the x-tuples in X plus a sum of g over the remaining x-tuples; after cleaning, the first sum is equal to 0 and the second sum is unchanged

Optimal Data Cleaning Algorithm

Expressing the quality improvement in x-form gives the following optimization problem:

$$\text{Maximize } -\sum_{k=1}^{Z} b_k \, g(k, D, Q) \quad \text{subject to } \sum_{k=1}^{Z} b_k c_k \le C, \quad b_k \in \{0, 1\},\ k = 1, \ldots, Z$$

c_k: the cleaning cost of the k-th x-tuple

C: total cleaning budget

Z: total number of x-tuples with p_i in (0,1)

b_k: 1 if the k-th x-tuple is selected for cleaning, 0 otherwise

This can be transformed into the 0/1 knapsack problem

DP algorithm

Time complexity: $O(CZ)$

Space complexity: $O(CZ^2)$

C: total budget

Z: number of x-tuples
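A textbook 0/1 knapsack dynamic program fits this selection problem when the cleaning costs are positive integers. The sketch below is a generic version, not necessarily the variant used in the paper: `select_xtuples_dp` is a hypothetical name, `gains[k]` stands for the improvement -g(k, D, Q) of x-tuple k, and the gain values in the example are invented for illustration (only the costs 5, 7 and 10 and the budget of 10 appear in the deck).

```python
def select_xtuples_dp(gains, costs, budget):
    """Return (best total gain, indices of the x-tuples to clean)."""
    # best[c] holds the best (gain, chosen indices) using total cost at most c.
    best = [(0.0, ())] * (budget + 1)
    for k, (gain, cost) in enumerate(zip(gains, costs)):
        # Go through budgets downwards so each x-tuple is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = best[c - cost][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + (k,))
    return best[budget]


gains = [0.9, 0.3, 1.5]  # hypothetical improvements for three x-tuples
costs = [5, 7, 10]       # cleaning costs in units
value, chosen = select_xtuples_dp(gains, costs, 10)
print(value, chosen)
```

The double loop does one step per (x-tuple, budget) pair, which matches the O(CZ) running time; keeping the chosen index set in every table cell is what pushes memory use above O(C).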

Other heuristic methods:

Random: Select x-tuples at random

MaxQP: Select the x-tuples with the highest qualification probability

Greedy: Rank x-tuples by expected quality improvement per unit of cleaning cost and select those with the highest ratio
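The greedy heuristic can be sketched in a few lines. As before, this is an illustration under assumptions: `select_xtuples_greedy` is a hypothetical name, `gains[k]` stands for the expected improvement of cleaning x-tuple k, and the gain values are invented; the rule for skipping x-tuples that no longer fit the budget is one reasonable choice among several.

```python
def select_xtuples_greedy(gains, costs, budget):
    """Pick x-tuples by descending gain-per-cost ratio while the budget allows."""
    order = sorted(range(len(gains)), key=lambda k: gains[k] / costs[k], reverse=True)
    chosen, spent = [], 0
    for k in order:
        if spent + costs[k] <= budget:  # skip x-tuples that do not fit any more
            chosen.append(k)
            spent += costs[k]
    return chosen


gains = [0.9, 0.3, 1.5]  # hypothetical improvements for three x-tuples
costs = [5, 7, 10]       # cleaning costs in units
picked = select_xtuples_greedy(gains, costs, 10)
print(picked, sum(gains[k] for k in picked))
```

On this instance greedy spends 5 units on the x-tuple with the best ratio and then cannot afford anything else, while spending the whole budget on the third x-tuple would gain more. The heuristic is fast but carries no optimality guarantee.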


Experiment setup

Size of DB: 10K x-tuples, 100K tuples (synthetic); 4,999 x-tuples, 10,037 tuples (Netflix movie ratings)

Prob. distributions: Gaussian (variance = 100)

Cleaning cost: Uniform in [1, 10]

Resource budget: [20, 500], default = 30

PWS-quality (S) vs. database size (z) (PRQ)

[Figure: S (y-axis, -6000 to 0) against z (x-axis, 200 to 2000); series: Gaussian, Uniform]

Quality evaluation performance (PRQ)

[Figure: time in ms (y-axis, 0 to 120) against database size z (x-axis, 0 to 18000); series: Query Evaluation, Quality Calculation]

Running time for clean up selection (PMaxQ)

[Figure: time in ms (y-axis, 10^-2 to 10^3) against total budget C (x-axis, 10^0 to 10^3), both axes logarithmic; series: Basic, Random, MaxQP, DP, Greedy]

Quality improvement vs. budget (PRQ)

[Figure: quality improvement I (y-axis, 0 to 35) against total budget C (x-axis, 10 to 100); series: Random, MaxQP, DP, Greedy]

Quality improvement vs. budget (PMaxQ)

[Figure: quality improvement I (y-axis, 0.5 to 3) against total budget C (x-axis, 10 to 50); series: Random, MaxQP, DP, Greedy]

Quality improvement vs. budget (PRQ, real data)

[Figure: quality improvement I (y-axis, 0 to 15) against total budget C (x-axis, 0 to 100); series: Random, MaxQP, DP, Greedy]


Thank you

Q & A

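Appendix: Deriving x-form of PRQ

Start from the definitions of the PWS-quality and of the qualification probability:

$$S(D, Q) = \sum_{j=1}^{d} q_j \log q_j, \qquad p_i = \sum_{j=1,\ t_i \in r_j}^{d} q_j$$

The probability of PW-result $r_j$ is

$$q_j = \prod_{t_i \in r_j} e_i \prod_{\tau_k \cap r_j = \emptyset} (1 - P_k)$$

so

$$S(D, Q) = \sum_{j=1}^{d} q_j \log\Bigl( \prod_{t_i \in r_j} e_i \prod_{\tau_k \cap r_j = \emptyset} (1 - P_k) \Bigr)$$

Expanding the logarithm for each PW-result:

$$S(D, Q) = q_1 (\ldots + \log e_i + \ldots + \log(1 - P_k) + \ldots) + q_2 (\ldots) + \ldots + q_d (\ldots)$$

Collecting terms, $\log e_i$ appears with total weight $\sum_{j:\, t_i \in r_j} q_j = p_i$, and $\log(1 - P_k)$ appears with total weight $1 - P_k$, where $P_k = \sum_{t_i \in \tau_k} p_i$. Therefore

$$S(D, Q) = \sum_{k=1}^{m} \Bigl( \sum_{t_i \in \tau_k} p_i \log e_i + (1 - P_k) \log(1 - P_k) \Bigr)$$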

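Appendix: Deriving x-form of PMaxQ

$$S(D, Q) = \sum_{j=1}^{d} q_j \log q_j$$

The probability of PW-result $r_j$ is

$$q_j = \prod_{t_i \in r_j} e_i \prod_{\tau_k \cap r_j = \emptyset} \Pr(\tau_k.v < r_j.v)$$

$$\Pr(\tau_k.v < r_j.v) = 1 - \sum_{l=1}^{s(j,k)} e_{k,l}$$

where $s(j,k)$ is a number in $[0, |\tau_k|]$

$$S(D, Q) = \sum_{j=1}^{d} q_j \log\Bigl( \prod_{t_i \in r_j} e_i \prod_{\tau_k \cap r_j = \emptyset} \Pr(\tau_k.v < r_j.v) \Bigr)$$

$$S(D, Q) = \sum_{i=1}^{n} p_i \log e_i + \sum_{k=1}^{m} \sum_{i=1}^{|\tau_k|} \omega_{k,i} \log\Bigl(1 - \sum_{j=1}^{i} e_{k,j}\Bigr)$$

where $\omega_{k,i} = \left( \frac{p_{k,i}}{e_{k,i}} - \frac{p_{k,i+1}}{e_{k,i+1}} \right) \bigl(1 - \sum_{j=1}^{i} e_{k,j}\bigr)$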