
Best-Effort Top-k Query Processing Under Budgetary Constraints


Michal Shmueli-Scheuer

(IBM Haifa Research Lab and UCI)

Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum


Motivating Example

[Diagram: queries are sent to a top-k engine, which returns the top-k results]

• Mobile applications – highly impatient users need fast results.

• Mediation systems – achieve high query throughput.

• Online analytics (e.g., logs) – achieve high query throughput.


Traditional top-k query

• Pre-computed score lists over multiple attributes.

• Scores are combined by a monotone aggregation function.

• Two access modes: sorted access (cost Cs) and random access (cost Cr).

• Objective: compute the k objects with the highest aggregated scores.

R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2
…
Rm: c 0.9, b 0.6, g 0.5, …, a 0.4
(m lists over n objects, each sorted by score)


NRA algorithm (Fagin et al.)

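Top-2, f = SUM; each object is shown as [worst score, best score].

R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2

After one sorted access per list (high1 = 0.9, high2 = 0.87):
Top-k: a [0.9, 1.77], d [0.87, 1.77]
Candidates: none

high_i is the last score read from list i, and min_k is the worst score of the k-th object in the top-k. NRA stops when min_k > the best score of every candidate (and of every unseen object, whose best score is at most Σ high_i).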


After the second round (b from R1, a from R2; high1 = 0.6, high2 = 0.85):
Top-k: a [1.75, 1.75], d [0.87, 1.47]
Candidates: b [0.6, 1.45]
min_k = 0.87 < 1.45, so NRA continues.


After the third round (c from R1, f from R2; high1 = 0.5, high2 = 0.25):
Top-k: a [1.75, 1.75], d [0.87, 1.37]
Candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]
min_k = 0.87 > 0.85, the largest best score among the candidates (unseen objects are bounded by 0.5 + 0.25 = 0.75), so NRA stops with the top-2 {a, d}.
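The three NRA steps above can be reproduced with a short round-robin sketch. It assumes SUM aggregation and keeps only the list entries shown on the slides: the elided "…" entries are left out, and f is taken as 0.25 in R2, the value the second and third steps use. The function name `nra` is hypothetical; this illustrates NRA's worst/best-score bookkeeping rather than the authors' implementation.

```python
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]  # (object id, score), sorted by score descending


def nra(lists: List[ScoredList], k: int) -> Tuple[List[str], int]:
    """Round-robin NRA with f = SUM; returns (top-k objects, #sorted accesses)."""
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score seen}
    high = [0.0] * m                        # high_i: last score read from list i
    depth, accesses = 0, 0
    while True:
        progressed = False
        for i, lst in enumerate(lists):     # one sorted access per list per round
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high[i] = score
                accesses += 1
                progressed = True
            else:
                high[i] = 0.0               # exhausted list: unseen scores count as 0
        depth += 1
        # worst score = sum of known scores; best score adds high_i for unseen lists
        worst = {o: sum(s.values()) for o, s in seen.items()}
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        ranked = sorted(worst, key=lambda o: worst[o], reverse=True)
        top, candidates = ranked[:k], ranked[k:]
        if len(top) == k:
            min_k = worst[top[-1]]
            # bound on any non-top-k object, including objects never seen
            bound = max([best[o] for o in candidates] + [sum(high)])
            if min_k > bound:
                return top, accesses
        if not progressed:                  # all lists exhausted
            return top, accesses


R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5), ("d", 0.4)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.2)]
print(nra([R1, R2], k=2))  # (['a', 'd'], 6)
```

With these lists the sketch stops after six sorted accesses with {a, d}, matching the third step; it checks the stopping condition once per round rather than after every single access.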


Top-k with Budget Constraints

R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2
R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, …
Exact top-2: d 1.7, t 1.52

Access costs: sorted access cost Cs = 1, random access cost Cr = 3; f = SUM.

Budget = 10?
• NRA (top-2) needs 12·Cs = 12; within the budget it reaches precision 0.5.
• TA needs 7·Cs + 7·Cr = 28; within the budget it reaches precision 0.

Given a budget B, maximize result quality.
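The cost arithmetic on this slide can be checked with a tiny helper. The class `AccessBudget` and its methods are hypothetical; the sketch assumes, as on the slide, Cs = 1 and Cr = 3, and charges accesses against B until the next one would overrun it.

```python
from dataclasses import dataclass


@dataclass
class AccessBudget:
    """Hypothetical cost tracker: a sorted access costs cs, a random access costs cr."""
    budget: float
    cs: float = 1.0
    cr: float = 3.0
    spent: float = 0.0

    def cost(self, n_sorted: int, n_random: int) -> float:
        return n_sorted * self.cs + n_random * self.cr

    def try_charge(self, n_sorted: int = 0, n_random: int = 0) -> bool:
        """Charge the accesses only if they fit in the remaining budget."""
        c = self.cost(n_sorted, n_random)
        if self.spent + c > self.budget:
            return False
        self.spent += c
        return True


b = AccessBudget(budget=10)
print(b.cost(12, 0))  # NRA on the slide: 12 * Cs = 12.0
print(b.cost(7, 7))   # TA on the slide: 7 * Cs + 7 * Cr = 28.0
# With B = 10 and one sorted access at a time, exactly 10 sorted accesses fit.
n = 0
while b.try_charge(n_sorted=1):
    n += 1
print(n)  # 10
```

Both complete runs exceed B = 10, which is why each algorithm has to be cut off early and returns only a partial answer.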


Contributions

• Sorted accesses
  – Efficient plan
  – Adaptive solution

• Sorted and random accesses
  – Efficient plan
  – Adaptive solution

• Experiments


Results Under Limited Budget


[Figure: the k results for an unlimited budget vs. the results for a limited budget]


Efficient Plan – Sorted Accesses

• Assume that we know the k results for an unlimited budget (R_EXACT).

• Plan: the number of sorted accesses on each list, e.g., {L1,4} {L2,2}.

• Interesting positions: the positions at which the k objects of R_EXACT appear in the lists.

Top-2: o1, o5
L1: o1 (P1), o5 (P2), o8, o6
L2: o1 (Q1), o2, o5 (Q2), o4, o3


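Efficient Plan – Sorted Accesses

• Goal: find a plan t such that

$$ t = \arg\max_{t' \in T,\ |t'| \le B} \left| R_{t'} \cap R_{exact} \right| $$

where T is the set of all plans, |t'| is the cost of plan t', and R_{t'} is the top-k result it produces.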

The result of the optimal plan is denoted R_OPT.

Optimal plan for B = 5 on the lists above: {L1,2} {L2,3}.
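Because the optimal plan only needs to stop at interesting positions, R_OPT can be found offline by trying, for each list, only depth 0 or an interesting position. The sketch below assumes SUM aggregation, ranks the seen objects by worst score, and breaks precision ties in favour of the plan that spends more of the budget. The function names are hypothetical, and so are the scores: the slide gives only the object order.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

ScoredList = List[Tuple[str, float]]


def result_of_plan(lists: List[ScoredList], depths: Tuple[int, ...], k: int) -> Set[str]:
    """Top-k by worst score (SUM of scores seen) after reading each list to its depth."""
    worst: Dict[str, float] = {}
    for lst, d in zip(lists, depths):
        for obj, score in lst[:d]:
            worst[obj] = worst.get(obj, 0.0) + score
    return set(sorted(worst, key=lambda o: worst[o], reverse=True)[:k])


def optimal_plan(lists: List[ScoredList], r_exact: Set[str], budget: int, cs: int = 1):
    k = len(r_exact)
    # interesting positions: 1-based depths at which an R_EXACT object appears
    choices = [[0] + [p + 1 for p, (o, _) in enumerate(lst) if o in r_exact]
               for lst in lists]
    best_key, best_plan = (-1.0, -1), None
    for depths in product(*choices):
        cost = sum(depths) * cs
        if cost > budget:
            continue
        precision = len(result_of_plan(lists, depths, k) & r_exact) / k
        if (precision, cost) > best_key:  # higher precision first, then more budget used
            best_key, best_plan = (precision, cost), depths
    return best_plan, best_key[0]


# Hypothetical scores; with them the exact top-2 is {o1, o5}.
L1 = [("o1", 0.9), ("o5", 0.8), ("o8", 0.5), ("o6", 0.3)]
L2 = [("o1", 0.9), ("o2", 0.8), ("o5", 0.7), ("o4", 0.6), ("o3", 0.5)]
print(optimal_plan([L1, L2], {"o1", "o5"}, budget=5))  # ((2, 3), 1.0)
```

The enumeration is exponential in the number of lists, which fits the later remark that the offline solution is only feasible for moderate k and m.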


Sorted Accesses

• Observation: prefer high scores.

[Diagram: lists L1, L2, L3; O2 appears in all three lists, O1 in L1 and L2]


Observations – contd.

• Prefer large score reductions.

[Figure: score distributions (0–1) of the <title> and <description> lists for the query title=“war”, description=“weapon”]


Score Utilities

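Example list (sorted): o2 1, o4 0.9, o5 0.8, o3 0.7, o1 0.6. Reading the first three entries gives:

• Score gain: the average score read, (1 + 0.9 + 0.8) / 3 = 0.9

• Score reduction: the drop of the list's upper bound, 1 − 0.8 = 0.2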
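A minimal sketch of the two utilities for reading the next x entries of a sorted list. It assumes, as in the example, that the gain is the average score read and the reduction is the drop from the first to the last score read; the names `score_gain`, `score_reduction` and `util` are hypothetical, and the paper may normalise these quantities differently.

```python
from typing import List


def score_gain(scores: List[float], x: int) -> float:
    """Average of the next x scores (assumed definition of gain)."""
    return sum(scores[:x]) / x if x > 0 else 0.0


def score_reduction(scores: List[float], x: int) -> float:
    """Drop of the list's upper bound high_i after reading x entries (assumed)."""
    return scores[0] - scores[x - 1] if x > 0 else 0.0


def util(scores: List[float], x: int, lam: float) -> float:
    """util(L_i, x) = lam * gain + (1 - lam) * reduction."""
    return lam * score_gain(scores, x) + (1 - lam) * score_reduction(scores, x)


L = [1.0, 0.9, 0.8, 0.7, 0.6]  # remaining part of the list, sorted by score
print(round(score_gain(L, 3), 2), round(score_reduction(L, 3), 2))  # 0.9 0.2
print(round(util(L, 3, lam=0.5), 2))  # 0.55
```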


Optimization Problem

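$$ \text{maximize} \sum_{i=1}^{m} util(L_i, x_i) \quad \text{s.t.} \quad \sum_{i=1}^{m} x_i \le b $$

where m is the number of lists, x_i is the number of sorted accesses on list L_i, and b is the budget.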

• Bi-objective optimization problem:

util(L_i, x) = λ · gain + (1 − λ) · reduction

Heuristics:

• Fair Heuristic

• Rank Heuristic
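For a handful of lists and a small b, the optimization problem can be solved exactly by enumerating every allocation (x_1, …, x_m) with Σ x_i ≤ b; this is the brute-force baseline that heuristics such as Fair and Rank avoid on larger inputs. The sketch assumes the gain is the average score read and the reduction is the drop from the first to the last score read; all names and numbers are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def util(scores: List[float], x: int, lam: float) -> float:
    """lam * gain + (1 - lam) * reduction, with the assumed gain/reduction definitions."""
    if x == 0:
        return 0.0
    x = min(x, len(scores))
    gain = sum(scores[:x]) / x              # average score read
    reduction = scores[0] - scores[x - 1]   # drop of the list's upper bound
    return lam * gain + (1 - lam) * reduction


def best_allocation(lists: List[List[float]], b: int, lam: float) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively maximize sum_i util(L_i, x_i) subject to sum_i x_i <= b."""
    best, best_value = None, float("-inf")
    for xs in product(range(b + 1), repeat=len(lists)):  # (b + 1)^m allocations
        if sum(xs) > b:
            continue
        value = sum(util(s, x, lam) for s, x in zip(lists, xs))
        if value > best_value:
            best, best_value = xs, value
    return best, best_value


lists = [[1.0, 0.9, 0.8, 0.7, 0.6], [0.9, 0.5, 0.2, 0.1]]
xs, value = best_allocation(lists, b=3, lam=0.5)
print(xs, round(value, 3))  # (1, 2) 1.05
```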


Adaptive

[Figure: the weights λ (gain) and 1 − λ (reduction) changing over time]


Adaptive

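[Diagram: lists L1, L2, L3 with their bounds high at times t1 and t2; a top-k queue (o1 [ws, bs], o2 [ws, bs], o3 [0.8, bs]) and a candidate queue (o4 [0.6, bs], o6 [ws, bs]), where ws and bs are worst and best scores]

Probability that a candidate c can still enter the top-k (Theobald et al. VLDB04):

$$ p(c) = P\Big[\, S(c) + \sum_{i \notin E(c)} S_i > \mathrm{min}_k \;\Big|\; S_i \le \mathrm{high}_i \Big] $$

where E(c) is the set of lists in which c has already been seen and S(c) is its known (worst) score.

Example: δ(o4) = min_k − worst(o4) = 0.8 − 0.6 = 0.2.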


Adaptive

$$ \hat{p}_k = \frac{1}{|cand.set|} \sum_{c \in cand.set} p_k(c) $$

[Figure: TREC query, k = 100]
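The probability p(c) is where score statistics enter the adaptive method; Theobald et al. estimate it from per-list score statistics. As a simple stand-in, the sketch below assumes each unknown score S_i is uniform on [0, high_i], estimates p(c) by Monte Carlo sampling, and averages over the candidate set as in p̂_k. The uniform model, the function names and the example bounds are assumptions.

```python
import random
from typing import Dict, List, Tuple


def candidate_probability(worst: float, missing_highs: List[float], min_k: float,
                          samples: int = 100_000, seed: int = 0) -> float:
    """Estimate P[worst + sum_{i not in E(c)} S_i > min_k], S_i ~ Uniform(0, high_i)."""
    delta = min_k - worst                  # score c still needs to reach the top-k
    if delta < 0:
        return 1.0
    if sum(missing_highs) <= delta:        # even the best case cannot exceed min_k
        return 0.0
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(samples)
        if sum(rng.uniform(0.0, h) for h in missing_highs) > delta
    )
    return hits / samples


def average_probability(candidates: Dict[str, Tuple[float, List[float]]], min_k: float) -> float:
    """p_hat_k: mean of p(c) over the candidate set."""
    if not candidates:
        return 0.0
    probs = [candidate_probability(w, highs, min_k) for w, highs in candidates.values()]
    return sum(probs) / len(probs)


# o4 as on the slide: worst score 0.6, min_k = 0.8, so delta = 0.2.
# The missing-list bounds (0.5 and 0.3) and candidate o6's score are hypothetical.
cands = {"o4": (0.6, [0.5]), "o6": (0.1, [0.3])}
print(candidate_probability(0.6, [0.5], 0.8))  # close to 0.6
print(average_probability(cands, 0.8))         # close to 0.3
```

Under the uniform assumption with a single missing list the exact value is (high − δ) / high, here (0.5 − 0.2) / 0.5 = 0.6, which the sampler approximates.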


Efficient Plan – Random Accesses

• Observation: random accesses can always be scheduled after all sorted accesses have finished, since moving them there does not change precision:

schedule 1: {SA … RA … SA …}

schedule 2: {SA … SA … RA …}

precision(schedule 1) = precision(schedule 2)


Observations – contd.

• Random accesses are only useful for objects in R_EXACT.

[Diagram: list L2 (o1, o2, o5, where o5 is not in R_EXACT) with top-k and candidate queues; one panel is labeled “Precision remains the same”, the other “Precision reduced”]


Random Accesses

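• Gather candidates with sorted accesses, probe them with random accesses.

• When to switch from SA to RA?
  – Too late: not enough RAs are left to prune the candidates.
  – Too early: not enough good candidates, so RAs are wasted.

[Figure: the split between λ and 1 − λ over time]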


Random Accesses

• Switch from sorted to random accesses when S + R > B, where R = (1 − λ) · S:
  – S: total cost of sorted accesses,
  – R: total cost of random accesses.

• Which items to access? Those that maximize the expected score.
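Read literally, the switching rule keeps issuing sorted accesses while the projected total S + R, with R = (1 − λ) · S, still fits in B, and then spends the remainder on random accesses for the candidates with the largest best(o) − min_k (the ranking used in the offline random-access solution). The sketch follows that reading; the function names, the one-RA-per-candidate simplification and the example numbers are assumptions.

```python
from typing import Dict, List, Tuple


def switch_point(budget: float, lam: float, cs: float, cr: float) -> Tuple[int, int]:
    """Return (#sorted accesses, #random accesses) under the rule S + (1 - lam) * S <= B.

    cs must be positive, otherwise the loop never ends.
    """
    n_sorted = 0
    while True:
        s = (n_sorted + 1) * cs              # cost if one more sorted access is made
        if s + (1 - lam) * s > budget:       # S + R would exceed B: switch to RA
            break
        n_sorted += 1
    remaining = budget - n_sorted * cs
    return n_sorted, int(remaining // cr)


def ra_targets(best: Dict[str, float], min_k: float, n_random: int) -> List[str]:
    """Probe candidates with the largest best(o) - min_k; skip hopeless ones."""
    hopeful = [o for o in best if best[o] > min_k]
    hopeful.sort(key=lambda o: best[o] - min_k, reverse=True)
    return hopeful[:n_random]                # assumes one RA per candidate


n_sa, n_ra = switch_point(budget=100, lam=0.5, cs=1, cr=10)
print(n_sa, n_ra)  # 66 3
print(ra_targets({"o9": 1.2, "o5": 0.9, "o7": 1.0, "o8": 0.7}, min_k=0.8, n_random=n_ra))
# ['o9', 'o7', 'o5']
```

With λ = 1 the rule never reserves budget for random accesses, which matches the sorted-access-only setting.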


Experimental Data

• TREC Terabyte
  – 25M web pages
  – 50 queries with an average length of 3 words

• IMDB
  – 375,000 movies
  – 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}

• Synthetic data
  – Zipf, #lists = [2, 6], #objects = [10,000, 1,000,000]

• Aggregation function: Sum


Evaluation Methods

• Percentage of optimal precision:

$$ \frac{\mathit{precision}_{alg}}{\mathit{precision}_{opt}} $$

• SME, computed from R_alg, R_opt and R_exact
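A small sketch of the first metric, assuming precision is measured against R_EXACT as |R ∩ R_EXACT| / k, in line with how the slides compare result sets; the function names and example sets are hypothetical.

```python
from typing import Set


def precision(result: Set[str], r_exact: Set[str]) -> float:
    """|R ∩ R_EXACT| / k."""
    return len(result & r_exact) / len(r_exact)


def percentage_of_optimal(r_alg: Set[str], r_opt: Set[str], r_exact: Set[str]) -> float:
    """precision_alg / precision_opt, as a percentage."""
    p_opt = precision(r_opt, r_exact)
    return 100.0 * precision(r_alg, r_exact) / p_opt if p_opt > 0 else 0.0


r_exact = {"a", "b", "c", "d"}
r_opt = {"a", "b", "c", "x"}   # best plan within the budget: precision 0.75
r_alg = {"a", "b", "y", "z"}   # heuristic result: precision 0.5
print(round(percentage_of_optimal(r_alg, r_opt, r_exact), 1))  # 66.7
```

Normalising by R_OPT rather than R_EXACT keeps the metric meaningful when even the best plan within the budget cannot reach precision 1.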


Results – Sorted Accesses

TREC, k = 100

[Figure: percentage of optimal precision (50%–90%) vs. budget (#SA: 500–5000) for NRA, KBA, Fair and Ranking]

• The smaller the budget, the larger the improvement.


Varied k

IMDB, B = 400

[Figure: percentage of optimal precision (20%–90%) vs. k (20, 50, 100) for NRA, KBA, Fair and Ranking]

• The lower k, the larger the improvement.


Number of Lists

Zipf, k = 100, B = 4000

[Figure: percentage of optimal precision (40%–100%) vs. number of lists (2–6) for NRA, KBA, Fair and Ranking]

• The more lists, the larger the improvement.


Results – Random Accesses

[Figures: TREC, k = 100, Cr = 10 and TREC, k = 100, Cr = 100]


Related Work

• Minimize the budget for optimal results:
  – The algorithm computes the exact results with minimum cost (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02).
  – This is the dual problem.

• Anytime top-k:
  – The algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing (Arai et al. VLDB07).
  – No optimizations are performed.

• Approximate top-k:
  – Approximate results with probabilistic guarantees (Theobald et al. VLDB04, Fagin et al. 2001).


Conclusions

• First attempt to deal with budget constraints in top-k query processing.

• With sorted accesses only, the average precision is around 70%.

• There is a tradeoff between RAs and SAs: when RAs are relatively cheap, schedules that include RAs perform better.


Thank You!


Top-k query

• Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.

• Top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T.

• Assumption: the scoring function f is monotone: f(r1,…,rm) ≤ f(r1′,…,rm′) if ri ≤ ri′ for all i.

• Two access modes:
  – sorted access (cost Cs)
  – random access (cost Cr)

• Objective: compute the top-k with minimum cost.
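For reference, R_EXACT (the answer with an unlimited budget) can be computed by aggregating every object's scores over all lists, at the price of reading every list completely. This sketch assumes each list is given as a dictionary from object to score, with 0 for an object missing from a list; `exact_top_k` is a hypothetical name, and f can be any monotone aggregation function that accepts an iterable.

```python
import heapq
from typing import Callable, Dict, Iterable, List


def exact_top_k(lists: List[Dict[str, float]], k: int,
                f: Callable[[Iterable[float]], float] = sum) -> List[str]:
    """Top-k objects by f over all m lists (missing scores count as 0)."""
    objects = set().union(*(lst.keys() for lst in lists))
    scores = {o: f(lst.get(o, 0.0) for lst in lists) for o in objects}
    return heapq.nlargest(k, scores, key=lambda o: scores[o])


L1 = {"o1": 0.9, "o5": 0.8, "o8": 0.5}
L2 = {"o1": 0.9, "o2": 0.8, "o5": 0.7}
print(exact_top_k([L1, L2], k=2))  # ['o1', 'o5']
```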


Sorted Accesses

• Observations:
  – An object with high scores has a higher potential to be part of the top-k.
  – An object with “mediocre” scores does not help.

Prefer high scores

[Diagram: lists L1, L2, L3, each containing O1]


Example

[Figure: wireless zone; labels “useless” and “Q”]


Applications

• Mobile applications – highly impatient users need fast results.

• Mediation systems – achieve high query throughput.

• Online analytics (e.g., logs) – achieve high query throughput.


Motivating Example

[Diagram: user queries arrive at a mediator engine that accesses remote servers; given the query throughput (#queries per time unit), the engine allocates time for each query]


Terminology

1. Sorted access
2. Random access
3. high_i
4. Top-k queue
5. Candidates queue
6. min_k
7. worstScore(d)
8. bestScore(d)


Efficient Offline Solution – Sorted

• Goal: find a trace t such that

$$ t = \arg\max_{t' \in T,\ |t'| \le B} \left| R_{t'} \cap R_{exact} \right| $$

Its result is denoted R_OPT (same lists L1, L2 and interesting positions P1, P2 as in the sorted-access plan example).

Traces for B = 5 (sorted accesses on L1, L2):
t1 = (0, 5), t2 = (1, 4), t3 = (2, 3), t4 = (3, 2), t5 = (4, 1), t6 = (5, 0)

• Feasible for k up to 100 and m up to 10.


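Efficient Offline Solution – Sorted

• Proof (by contradiction):
  – Assume that such a t does not exist, and choose a trace s that is within the budget and has optimal precision. Construct s′, where each s′_i is the largest interesting position of P_i that is less than or equal to s_i.
  – By construction, the score of any object in s is the same as in s′. Hence s′ costs no more than s and returns the same result, so it is an optimal trace that stops only at interesting positions, a contradiction.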


Fair Heuristic

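• Assume budget = b.

$$ SA(L_i) = b \cdot \frac{util(L_i, x)}{\sum_{j=1}^{m} util(L_j, x)} $$

$$ util(L_i, x) = \lambda \cdot util_s(L_i, x) + (1 - \lambda) \cdot util_r(L_i, x) $$

where util_s and util_r are the score-gain and score-reduction utilities.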

Runs in batches
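One batch of the Fair heuristic splits b sorted accesses in proportion to each list's utility. The slide does not say how fractional shares are rounded; the sketch below assumes largest-remainder rounding so that each batch uses exactly b accesses, and `fair_allocation` is a hypothetical name.

```python
import math
from typing import List


def fair_allocation(utils: List[float], b: int) -> List[int]:
    """SA(L_i) = b * util(L_i, x) / sum_j util(L_j, x), rounded to integers summing to b."""
    total = sum(utils)
    if total <= 0:
        # no list looks useful: spread the batch evenly (an assumption)
        shares = [b / len(utils)] * len(utils)
    else:
        shares = [b * u / total for u in utils]
    alloc = [math.floor(s) for s in shares]
    leftover = b - sum(alloc)
    # hand the leftover accesses to the largest fractional parts (stable on ties)
    order = sorted(range(len(utils)), key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


print(fair_allocation([0.5, 0.25, 0.25], b=8))  # [4, 2, 2]
print(fair_allocation([1.0, 1.0, 1.0], b=10))   # [4, 3, 3]
```

In a batched run, the utilities would be recomputed from the lists' current positions before each new batch.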


Efficient Offline Solution – Random

• Budget for RAs = B − |t| · Cs

[Diagram: the top-k (o1, o4, o2, o3), R_EXACT, and the remaining objects (o9, o5, o7, o8, …, o10, o14, …) ranked by best(o) − min_k, where best(o) = worst(o) + RA]


Motivation

• Many applications operate under budget constraints, yet still need to perform top-k queries.

[Diagram: a user query reaches a mediator engine that performs budget-aware query processing over remote servers]


Future work

• Different access costs for different lists

• Time-aware top-k

• Top-k with budget constraints for P2P