Optimizing and Parallelizing Ranked Enumeration
Konstantin Golenberg (The Hebrew University of Jerusalem), Benny Kimelfeld (IBM Research – Almaden), Yehoshua Sagiv (The Hebrew University of Jerusalem)
VLDB 2011, Seattle, WA



Page 2: Background: DB Search at HebrewU

eu brussels search

• Initial implementation was too slow…
• Purchased a multi-core server
• Didn’t help: cores were usually idle
  – Due to the inherent flow of the enumeration technique we used
• Needed a deeper understanding of ranked enumeration to benefit from parallelization
  – This paper

Demo in SIGMOD’10, implementation in SIGMOD’08, algorithms in PODS’06

Page 3: Outline

• Lawler-Murty’s Ranked Enumeration
• Optimizing by Progressive Bounds
• Parallelization / Core Utilization
• Conclusions

Page 4: Ranked Enumeration

User → Problem → huge number (e.g., 2^|Problem|) of ranked answers: best answer, 2nd best answer, 3rd best answer, . . .

Examples:
• Various graph optimizations
  – Shortest paths
  – Smallest spanning trees
  – Best perfect matchings
• Top results of keyword search on DBs (graph search)
• Most probable answers in probabilistic DBs
• Best recommendations for schema integration

“Complexity”:
• What is the delay between successive answers?
• How much time to get top-k?

Here

(Can’t afford to instantiate all answers)

Page 5: Abstract Problem Formulation

Input:
• O = a collection of objects
• A = answers a ⊆ O (huge, described by a condition on subsets of O)
• score(): score(a) is high ⇔ a is of high quality
  – Example scores of answers: 21, 31, 28, 27, 17, …

Goal: find top-k answers a1, a2, a3, …, ak (example scores: 32, 31, 28, …, 17)

Page 6: Graph Search in The Abstraction

Input:
• Data graph G
• Set Q of keywords

O = edges of G
A = subtrees (edge sets) a containing all keywords in Q (w/o redundancy, see [GKS 2008])
score(a): 1 / weight(a), IR measures, etc.

Goal: find top-k answers

Page 7: What is the Challenge?

• 1st (top) answer (score 32): optimization problem
• 2nd answer (score 31): ??
• . . .
• j-th answer (score 17): ??
  – ≠ previous (j-1) answers
  – best remaining answer

How to handle these constraints? (j may be large!)

Conceivably, much more complicated than top-1!

Page 8: Lawler-Murty’s Procedure [Murty, 1968] [Lawler, 1972]

Lawler-Murty’s gives a general reduction:

Finding top-k answers reduces to finding the top-1 answer under simple constraints (if the latter is in PTIME, then so is the former)

We understand optimization much better!

Often, amounts to classical optimization, e.g., shortest path (but sometimes it may get involved, e.g., [KS 2006])

Other general top-k procedure: [Hamacher & Queyranne 84], very similar!

Page 9: Among the Uses of Lawler-Murty’s

Graph/Combinatorial Algorithms:
• Shortest simple paths [Yen 1972]
• Minimum spanning trees [Gabow 1977, Katoh et al., 1981]
• Best solutions in resource allocation [Katoh et al. 1981]
• Best perfect matchings, best cuts [Hamacher & Queyranne 1985]
• Minimum Steiner trees [KS 2006]

Bioinformatics:
• Yen’s algorithm to find sets of metabolites connected by chemical reactions [Takigawa & Mamitsuka 2008]

Data Management:
• ORDER-BY queries [KS 2006, 2007]
• Graph/XML search [GKS 2008]
• Generation of forms over integrated data [Talukdar et al. 2008]
• Course recommendation [Parameswaran & Garcia-Molina 2009]
• Querying Markov sequences [K & Ré 2010]

Page 10: Lawler-Murty’s Method: Conceptual

start

Page 11: 1. Find & Print the Top Answer

In principle, at this point we should find the second-best answer. But instead…

Page 12: 2. Partition the Remaining Answers

Partition defined by a set of simple constraints:
• Inclusion constraint: “must contain [a given object]”
• Exclusion constraint: “must not contain [a given object]”

Page 13: 3. Find the Top of Each Set

Page 14: 4. Find & Print the Second Answer

Next answer: best among all the top answers in the partitions

Page 15: 5. Further Divide the Chosen Partition

… and so on … (until k answers are printed)
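The five conceptual steps (print the top answer, partition the rest by inclusion/exclusion constraints, find the top of each partition, print the best of those, divide the chosen partition again) can be sketched on a toy instance. The toy problem is an assumption made only for illustration and is not from the talk: answers are the m-element subsets of a weighted object set, and the score is the sum of the weights. The names `best_answer` and `lawler_murty` are hypothetical.

```python
import heapq
from itertools import count


def best_answer(weights, m, include, exclude):
    """Top-1 under simple constraints (toy problem, assumed for illustration):
    the best m-subset that contains `include` and avoids `exclude`."""
    free = sorted((o for o in weights if o not in include and o not in exclude),
                  key=lambda o: (-weights[o], o))
    need = m - len(include)
    if need < 0 or need > len(free):
        return None  # this partition has no answers
    answer = frozenset(include) | frozenset(free[:need])
    return sum(weights[o] for o in answer), answer


def lawler_murty(weights, m, k):
    """Return up to k (score, answer) pairs in decreasing score order."""
    tie = count()  # tie-breaker so the heap never compares sets
    queue = []     # one entry per partition: its best answer and constraints

    def push(include, exclude):
        found = best_answer(weights, m, include, exclude)
        if found is not None:
            score, answer = found
            heapq.heappush(queue, (-score, next(tie), answer, include, exclude))

    push(frozenset(), frozenset())
    results = []
    while queue and len(results) < k:
        neg_score, _, answer, include, exclude = heapq.heappop(queue)
        results.append((-neg_score, answer))
        # Partition the remaining answers of this partition: the i-th part
        # keeps the first i-1 free objects of `answer` and excludes the i-th.
        fixed = include
        for obj in sorted(answer - include):
            push(fixed, exclude | {obj})
            fixed = fixed | {obj}
    return results


if __name__ == "__main__":
    for score, answer in lawler_murty({"a": 5, "b": 4, "c": 3, "d": 1}, 2, 4):
        print(score, sorted(answer))
```

The only problem-specific piece is `best_answer`; the enumeration loop itself never looks inside an answer beyond its object set, which is the point of the reduction.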

Page 16: Lawler-Murty’s: Actual Execution

Output (printed already): 34, 30
Partition Reps. + Best of Each (best of each partition): 24 (best), 19, 18

Page 17: Lawler-Murty’s: Actual Execution

Output: 34, 30, 24
Partition Reps. + Best of Each: 19, 18
For each new partition, a task to find the best answer

Page 18: Lawler-Murty’s: Actual Execution

Output: 34, 30, 24
Partition Reps. + Best of Each: 22 (best…), 21, 19, 18, 18

Page 19: Outline

• Lawler-Murty’s Ranked Enumeration
• Optimizing by Progressive Bounds
• Parallelization / Core Utilization
• Conclusions

Page 20: Typical Bottleneck

Output: 34, 30
Partition Reps. + Best of Each: 24, 14, 12

Page 21: Typical Bottleneck

Output: 34, 30, 24
Partition Reps. + Best of Each: 14, 12
New tasks: 22, 20, 15 (In top k?)

Page 22: Progressive Upper Bound

• Throughout the execution, an optimization alg. often upper bounds its final solution’s score
• Progressive: bound gets smaller in time
• Often, nontrivial bounds, e.g.,
  – Dijkstra's algorithm: distance at the top of the queue
    • Similarly: some Steiner-tree algorithms [DreyfusWagner72]
  – Viterbi algorithms: max intermediate probability
  – Primal-dual methods: value of dual LP solution

Example over time: ≤24, ≤22, ≤18, ≤14, final score 12
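As a small illustration of the Dijkstra case, the sketch below exposes the distance at the top of the queue after every pop. It assumes non-negative edge weights and a graph given as an adjacency dictionary; the generator name `dijkstra_progressive` and the event tuples are hypothetical. Each reported distance is a lower bound on the final path weight, so under a score such as 1 / weight it corresponds to an upper bound on the final score that shrinks over time.

```python
import heapq


def dijkstra_progressive(graph, source, target):
    """Yield ('bound', d) after each settled node, then ('done', distance).

    Every d is a lower bound on the final shortest-path weight to `target`
    (assuming non-negative edge weights), and the bounds never decrease.
    """
    dist = {source: 0}
    queue = [(0, source)]
    settled = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in settled:
            continue  # stale queue entry
        settled.add(node)
        if node == target:
            yield ("done", d)
            return
        yield ("bound", d)  # no path to the target can weigh less than d
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    yield ("done", float("inf"))  # target unreachable


if __name__ == "__main__":
    toy = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 1)]}
    for event in dijkstra_progressive(toy, "s", "t"):
        print(event)
```

Writing the optimizer as a generator is one convenient way to let a caller pause it between bounds, which is what freezing relies on.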

Page 23: Freezing Tasks (Simplified)

Output: 34, 30
Partition Reps. + Best of Each: 24, 14, 12

Page 24: Freezing Tasks (Simplified)

Output: 34, 30, 24
Partition Reps. + Best of Each: 14, 12
Tasks for the new partitions report progressive bounds (≤24, ≤23 and ≤24, ≤23, ≤22); answers found so far: 22, 20

Page 25: Freezing Tasks (Simplified)

Output: 34, 30, 24
Partition Reps. + Best of Each: 14, 12, 22, 20
A running task has bounds ≤24, ≤23, ≤20; since 22 > 20, it is frozen

Page 26: Freezing Tasks (Simplified)

Output: 34, 30, 24
Partition Reps. + Best of Each: 14, 12, 22 (best), 20, and a frozen task (≤20)
Bounds of the frozen task once resumed: ≤24, ≤23, ≤20, ≤18, ≤16, ≤15, final answer 15
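The freezing idea can be sketched as a scheduler over resumable tasks. This is a simplified illustration under stated assumptions, not the paper's algorithm: each task is a generator that reports shrinking upper bounds and then its final score, `scripted_task` is a hypothetical stand-in that replays a fixed bound sequence (the numbers follow the slides), and partitioning of printed answers is left out.

```python
import heapq


def scripted_task(bounds, final_score):
    """Hypothetical stand-in for an optimization run: replays its bounds."""
    for bound in bounds:
        yield ("bound", bound)
    yield ("done", final_score)


def make_heap(tasks):
    """Start every task with an infinite upper bound."""
    heap = [(-float("inf"), task_id, False, task) for task_id, task in tasks.items()]
    heapq.heapify(heap)
    return heap


def pop_best(heap, steps):
    """Advance only the task with the highest upper bound, one step at a time.

    A finished task at the top of the heap has a score that is at least every
    other task's upper bound, so it is the next answer. The other tasks stay
    frozen in the heap at whatever bound they reached.
    """
    while heap:
        neg_value, task_id, finished, task = heapq.heappop(heap)
        if finished:
            return task_id, -neg_value
        kind, value = next(task)
        steps[task_id] = steps.get(task_id, 0) + 1
        heapq.heappush(heap, (-value, task_id, kind == "done", task))
    return None


if __name__ == "__main__":
    steps = {}
    heap = make_heap({
        "A": scripted_task([24, 23, 22], 22),
        "B": scripted_task([24, 23, 20, 18, 16, 15], 15),
    })
    print(pop_best(heap, steps), steps)  # B is frozen once its bound drops to 20
    print(pop_best(heap, steps), steps)  # B is resumed only when it is needed
```

If only the top answer of this round is ever requested, task B never runs past the bound ≤20, which is where the saving comes from.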

Page 27: Improvement of Freezing

Experiments: Graph Search. 2 Intel Xeon processors (2.67GHz), 4 cores each (8 total); 48GB memory

[Bar charts: running time (ms), Simple Lawler-Murty vs. w/ Freezing, for k = 10, 100. Panels: Mondial, DBLP (part), DBLP (full)]

On average, freezing saved 56% of the running time

Page 28: Outline

• Lawler-Murty’s Ranked Enumeration
• Optimizing by Progressive Bounds
• Parallelization / Core Utilization
• Conclusions

Page 29: Straightforward Parallelization

Output: 34, 30
Awaiting Tasks: 24, 14, 12

Page 30: Straightforward Parallelization

Output: 34, 30, 24
Awaiting Tasks: 14, 12
New tasks (run in parallel): 22, 15, 20

Page 31: Straightforward Parallelization

Output: 34, 30, 24
Awaiting Tasks: 14, 12, 20, 22, and 15
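A minimal sketch of this straightforward scheme: after each printed answer, the top-1 tasks of the new partitions are handed to a thread pool, and the main loop waits for all of them before choosing the next answer. The toy problem (m-element subsets scored by total weight) and the names `best_answer`, `split`, and `parallel_top_k` are assumptions for illustration only.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def best_answer(weights, m, include, exclude):
    """Top-1 under constraints for the toy problem (assumed for illustration)."""
    free = sorted((o for o in weights if o not in include and o not in exclude),
                  key=lambda o: (-weights[o], o))
    need = m - len(include)
    if need < 0 or need > len(free):
        return None
    answer = frozenset(include) | frozenset(free[:need])
    return sum(weights[o] for o in answer), answer


def split(answer, include, exclude):
    """Constraints of the partitions that cover everything except `answer`."""
    parts, fixed = [], include
    for obj in sorted(answer - include):
        parts.append((fixed, exclude | {obj}))
        fixed = fixed | {obj}
    return parts


def parallel_top_k(weights, m, k, threads=4):
    results, queue, tie = [], [], 0
    pending = [(frozenset(), frozenset())]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while len(results) < k:
            # One task per new partition; list() blocks until all are done.
            solved = list(pool.map(lambda p: best_answer(weights, m, *p), pending))
            for (inc, exc), found in zip(pending, solved):
                if found is not None:
                    heapq.heappush(queue, (-found[0], tie, found[1], inc, exc))
                    tie += 1
            if not queue:
                break
            neg_score, _, answer, inc, exc = heapq.heappop(queue)
            results.append((-neg_score, answer))
            pending = split(answer, inc, exc)
    return results


if __name__ == "__main__":
    print(parallel_top_k({"a": 5, "b": 4, "c": 3, "d": 1}, 2, 4))
```

The sketch shows the control flow only: each round offers at most as many tasks as there are new partitions, and nothing proceeds until the slowest one returns. In CPython, threads also share the interpreter lock, so CPU-bound tasks would need processes or native code to run truly in parallel.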

Page 32: Not so fast…

Typical: running time reduced by 30%

Same for 2, 3, …, 8 threads!

Page 33: Idle Cores while Waiting

Output: 34, 30
Awaiting Tasks: 24, 14, 12

Page 34: Idle Cores while Waiting

Output: 34, 30, 24
Awaiting Tasks: 14, 12
New tasks: 22, 15, 20 (some cores idle while waiting)

Page 35: Early Popping

Output: 34, 30, 24
Awaiting Tasks: 14, 12, 20, 22
Bounds shown on running tasks: ≤24, ≤23, ≤20; ≤22; ≤22; ≤19
Comparison shown: 22 > 20

Skipped issues:
• Thread synchronization
  – semaphores, locking, etc.
• Correctness

Page 36: Improvement of Early Popping

Experiments: Graph Search. 2 Intel Xeon processors (2.67GHz), 4 cores each (8 total); 48GB memory

[Charts: running time as % of Lawler-Murty (0%–150%) vs. number of threads (1, 2, 4, 6, 8), for short, medium-size & long queries. Panels: Mondial, DBLP (part)]

Page 37: Early Popping vs. (Serial) Freezing

Experiments: Graph Search. 2 Intel Xeon processors (2.67GHz), 4 cores each (8 total); 48GB memory

[Charts: running time as % of Serial Freezing (0–300) vs. number of threads (1, 2, 4, 6, 8), for short, medium-size & long queries. Panels: Mondial, DBLP (part)]

• Need 4 threads to start gaining
• And even then, fairly poor…

Page 38: Combining Freezing & Early Popping

• We discuss additional ideas and techniques to further utilize the cores
  – Not here, see the paper
• Main speedup by combining early popping with freezing
  – Cores kept busy… on high-potential tasks
  – Thread synchronization is quite involved
• At the high level, the final algorithm has the following flow:

Page 39: Combining: General Idea

• Output
• Computed Answers (to-print)
• Partition Reps. as Frozen Tasks
• Threads work on frozen tasks: they return frozen + new tasks, and computed answers

Values shown in the diagram: 34, 30; 17, 25, 15; 24, 20, 12; 26

Page 40: Combining: General Idea

• Output
• Computed Answers (to-print)
• Partition Reps. as Frozen Tasks
• Threads work on frozen tasks: they return frozen + new tasks, and computed answers

Values shown in the diagram: 34, 30; 17, 25, 15; 24, 20, 12; 20

Page 41: Combining: General Idea

Main task just pops computed results to print… but validates that no frozen task can yield a better result

• Output
• Computed Answers (to-print)
• Partition Reps. as Frozen Tasks
• Threads work on frozen tasks: they return frozen + new tasks, and computed answers

Values shown in the diagram: 34, 30; 17, 25, 15, 20; 22; 24; 22; 20, 12
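The validation step of the main task can be sketched in a few lines. This is a single-threaded illustration with made-up numbers; the function name `pop_validated` and the data layout (a heap of negated scores plus a list of frozen-task upper bounds) are assumptions, and the thread synchronization that the talk calls quite involved is omitted entirely.

```python
import heapq


def pop_validated(computed, frozen_bounds):
    """Pop the best computed score only if no frozen task can still beat it.

    computed: heap of negated scores of already computed answers.
    frozen_bounds: current upper bounds of the frozen tasks.
    Returns the score to print, or None if the main task has to wait.
    """
    if not computed:
        return None
    best = -computed[0]
    if any(bound > best for bound in frozen_bounds):
        return None  # a frozen task might still produce a better answer
    return -heapq.heappop(computed)


if __name__ == "__main__":
    computed = [-25, -17, -15]
    heapq.heapify(computed)
    print(pop_validated(computed, [26, 20]))  # a frozen task is bounded by 26
    print(pop_validated(computed, [24, 20]))  # all frozen bounds are below 25
```

While the main task waits, the worker threads keep advancing the frozen tasks, so the blocking bound eventually drops or turns into a computed answer.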

Page 42: Combined vs. (Serial) Freezing

Experiments: Graph Search. 2 Intel Xeon processors (2.67GHz), 4 cores each (8 total); 48GB memory

[Charts: running time as % of Serial Freezing (0%–120%) vs. number of threads (1, 2, 4, 6, 8), for short, medium and long queries. Panels: Mondial, DBLP]

Now, significant gain (≈50%) already w/ 2 threads

Page 43: Improvement of Combined

Experiments: Graph Search. 2 Intel Xeon processors (2.67GHz), 4 cores each (8 total); 48GB memory

[Charts: running time as % of Lawler-Murty (0%–50%) vs. number of threads (1, 2, 4, 6, 8), for short, medium and long queries. Panels: DBLP, Mondial. Annotations: 4%-5%, 3%-10%]

On average, with 8 threads we got 5.7% of the original running time

Page 44: Outline

• Lawler-Murty’s Ranked Enumeration
• Optimizing by Progressive Bounds
• Parallelization / Core Utilization
• Conclusions

Page 45: Conclusions

• Considered Lawler-Murty’s ranked enumeration
  – Theoretical complexity guarantees
  – …but a direct implementation is very slow
  – Straightforward parallelization poorly utilizes cores
• Ideas: progressive bounds, freezing, early popping
  – In the paper: additional ideas, combination of ideas
• Most significant speedup by combining these ideas
  – Flow substantially differs from the original procedure
  – 20x faster on 8 cores
• Test case: graph search; focus: general apps
  – Future: additional test cases

Questions?