Optimizing and Parallelizing Ranked Enumeration

Konstantin Golenberg (The Hebrew University of Jerusalem)
Benny Kimelfeld (IBM Research – Almaden)
Yehoshua Sagiv (The Hebrew University of Jerusalem)

VLDB 2011, Seattle, WA

Background: DB Search at HebrewU

Example query: eu brussels search

• Initial implementation was too slow…
• Purchased a multi-core server
• Didn’t help: cores were usually idle
  – Due to the inherent flow of the enumeration technique we used
• Needed a deeper understanding of ranked enumeration to benefit from parallelization
  – This paper

(Demo in SIGMOD’10, implementation in SIGMOD’08, algorithms in PODS’06)

Outline

• Lawler-Murty’s Ranked Enumeration
• Optimizing by Progressive Bounds
• Parallelization / Core Utilization
• Conclusions

Ranked Enumeration

[Figure: a user poses a problem and receives ranked answers: best answer, 2nd best answer, 3rd best answer, …]

Huge number (e.g., 2^|Problem|) of ranked answers

Examples:
• Various graph optimizations
  – Shortest paths
  – Smallest spanning trees
  – Best perfect matchings
• Top results of keyword search on DBs (graph search)
• Most probable answers in probabilistic DBs
• Best recommendations for schema integration

“Complexity”:
• What is the delay between successive answers?
• How much time to get top-k?

(Can’t afford to instantiate all answers)

Goal: Find top-k answers

Abstract Problem Formulation

• Input: O = a collection of objects
• Answers: a ⊆ O
  – Huge collection, described by a condition on O’s subsets
• score(): a high score(a) means that a is of high quality

[Figure: answers with their scores (e.g., 32, 31, 28, …); the output is the sequence a1, a2, a3, …, ak]

Goal: Find top-k answers

Graph Search in the Abstraction

Input:
• Data graph G
• Set Q of keywords

O = edges of G

Answers a ⊆ O: subtrees (edge sets) a containing all keywords in Q (w/o redundancy, see [GKS 2008])

score(a): 1/weight(a), IR measures, etc.

What is the Challenge?

• 1st (top) answer: an optimization problem
• 2nd answer, …
• j-th answer:
  – ≠ previous (j-1) answers
  – best remaining answer

How to handle these constraints? (j may be large!)

Conceivably, much more complicated than top-1!

Lawler-Murty’s Procedure
[Murty, 1968] [Lawler, 1972]

Lawler-Murty’s procedure gives a general reduction:

Finding top-k answers reduces to finding the top-1 answer under simple constraints
(if the top-1 problem is in PTIME, then so is the top-k problem)

• We understand optimization much better!
• Often, it amounts to classical optimization, e.g., shortest path
  (but sometimes it may get involved, e.g., [KS 2006])

Other general top-k procedure: [Hamacher & Queyranne 84], very similar!

Among the Uses of Lawler-Murty’s

Graph/Combinatorial Algorithms:
• Shortest simple paths [Yen 1972]
• Minimum spanning trees [Gabow 1977, Katoh et al. 1981]
• Best solutions in resource allocation [Katoh et al. 1981]
• Best perfect matchings, best cuts [Hamacher & Queyranne 1985]
• Minimum Steiner trees [KS 2006]

Bioinformatics:
• Yen’s algorithm to find sets of metabolites connected by chemical reactions [Takigawa & Mamitsuka 2008]

Data Management:
• ORDER-BY queries [KS 2006, 2007]
• Graph/XML search [GKS 2008]
• Generation of forms over integrated data [Talukdar et al. 2008]
• Course recommendation [Parameswaran & Garcia-Molina 2009]
• Querying Markov sequences [K & Ré 2010]

Lawler-Murty’s Method: Conceptual

1. Find & Print the Top Answer
   In principle, at this point we should find the second-best answer. But instead…

2. Partition the Remaining Answers
   The partition is defined by a set of simple constraints:
   • Inclusion constraint: “must contain” a given object
   • Exclusion constraint: “must not contain” a given object

3. Find the Top of Each Set

4. Find & Print the Second Answer
   Next answer: best among all the top answers in the partitions

5. Further Divide the Chosen Partition
   … and so on … (until k answers are printed)
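The five steps above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes that the objects are weighted items, that an answer is any set of exactly m items, and that the score is the sum of weights. The names `best_answer` and `lawler_murty` are hypothetical. Top-1 under inclusion/exclusion constraints is then easy: keep the included items and add the heaviest allowed ones.

```python
import heapq

def best_answer(weights, m, include, exclude):
    """Top-1 under simple constraints (toy problem, an assumption of this sketch):
    the best m-item set that contains `include` and avoids `exclude`."""
    free = [o for o in weights if o not in include and o not in exclude]
    need = m - len(include)
    if need < 0 or need > len(free):
        return None  # the partition is empty
    chosen = heapq.nlargest(need, free, key=lambda o: (weights[o], o))
    answer = frozenset(include) | frozenset(chosen)
    return sum(weights[o] for o in answer), answer

def lawler_murty(weights, m, k):
    """Return up to k answers as (score, answer), in decreasing score order."""
    queue, tie = [], 0  # max-heap via negated scores; `tie` breaks equal scores
    first = best_answer(weights, m, frozenset(), frozenset())
    if first is not None:
        heapq.heappush(queue, (-first[0], tie, first[1], frozenset(), frozenset()))
    results = []
    while queue and len(results) < k:
        neg_score, _, answer, inc, exc = heapq.heappop(queue)
        results.append((-neg_score, answer))  # best among all partition tops
        # Split the chosen partition: the i-th part keeps the first i-1
        # free items of the answer and excludes the i-th one.
        fixed = set(inc)
        for o in sorted(answer - inc):
            sub_inc, sub_exc = frozenset(fixed), exc | {o}
            best = best_answer(weights, m, sub_inc, sub_exc)
            if best is not None:
                tie += 1
                heapq.heappush(queue, (-best[0], tie, best[1], sub_inc, sub_exc))
            fixed.add(o)
    return results

if __name__ == "__main__":
    for score, answer in lawler_murty({"a": 5, "b": 4, "c": 3, "d": 1}, m=2, k=4):
        print(score, sorted(answer))
```

Every answer other than the printed one misses at least one of its free items, so it falls into exactly one of the new parts; this is why no answer is lost or printed twice.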

Lawler-Murty’s: Actual Execution

[Figure sequence: the output holds the answers printed already (scores 34, 30); a queue holds the partition representatives with the best answer of each (e.g., 24, 18, later also 19, 18, 21). The best entry of the queue is printed next.]

For each new partition, a task to find the best answer

Typical Bottleneck

[Figure sequence: output 34, 30; tasks for the new partitions eventually produce 22, 20, 15, 14, 12. In top k?]

Progressive Upper Bound

• Throughout the execution, an optimization algorithm often upper-bounds its final solution’s score
• Progressive: the bound gets smaller over time
• Often, nontrivial bounds, e.g.,
  – Dijkstra's algorithm: distance at the top of the queue
    (similarly: some Steiner-tree algorithms [DreyfusWagner72])
  – Viterbi algorithms: max intermediate probability
  – Primal-dual methods: value of dual LP solution

[Figure: a task’s bound over time: ≤24, ≤22, ≤18, ≤14]
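The Dijkstra bullet can be made concrete with a short sketch. It assumes nonnegative edge weights and a score of 1/distance; the function name `dijkstra_with_bounds` and the event format are hypothetical. The distance popped from the queue never decreases, and every unsettled node (including the target) is at least that far away, so 1/d is a progressive upper bound on the final score.

```python
import heapq

def dijkstra_with_bounds(graph, source, target):
    """Yield ("bound", d) whenever a node at distance d is settled, then
    ("done", distance). `graph` maps a node to a list of (neighbor, weight)
    pairs with nonnegative weights (an assumption of this sketch)."""
    dist = {source: 0}
    queue = [(0, source)]
    settled = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in settled:
            continue  # stale queue entry
        settled.add(u)
        if u == target:
            yield ("done", d)
            return
        # The target is still unsettled, so its distance is at least d.
        yield ("bound", d)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    yield ("done", float("inf"))  # target unreachable

if __name__ == "__main__":
    g = {"s": [("a", 1), ("b", 4)], "a": [("b", 1), ("t", 5)], "b": [("t", 1)]}
    for event in dijkstra_with_bounds(g, "s", "t"):
        print(event)
```

Writing the task as a generator lets a caller stop after any bound and resume later, which is the hook that freezing needs.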

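Freezing Tasks (Simplified)

[Figure sequence: output 34, 30; queue entries 14, 12. A running task’s upper bound shrinks over time (≤24, ≤23, ≤22, …). When the bound drops below a score already in the queue (bound ≤20 vs. a computed 22: 22 > 20), the task is frozen instead of being run to completion. In the last frame, 24 has been printed, the queue holds 22, 20, 14, 12, and the bound sequence continues ≤18, ≤16, ≤15.]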
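A simplified scheduler in the spirit of freezing can be sketched as follows. It is an illustration under stated assumptions, not the paper's algorithm: the set of tasks is fixed (no new partitions are created), each task is a generator that yields non-increasing valid upper bounds and finally returns its score, and the names `make_task` and `top_k_with_freezing` are hypothetical. A task runs only while its bound is at least the best key among the other queue entries; otherwise it is pushed back with its current bound as the key.

```python
import heapq

def make_task(bounds, final_score):
    """Hypothetical task: yields shrinking upper bounds, then returns its score."""
    def run():
        for b in bounds:
            yield b
        return final_score
    return run()

def top_k_with_freezing(tasks, k):
    """tasks: dict name -> generator. Returns (answers, steps), where `steps`
    counts how many times a task was advanced."""
    # Heap entries: (-key, name, finished); key is a bound or a final score.
    heap = [(-float("inf"), name, False) for name in tasks]
    heapq.heapify(heap)
    answers, steps = [], 0
    while heap and len(answers) < k:
        neg_key, name, finished = heapq.heappop(heap)
        if finished:
            # Every other key is an upper bound that is not larger,
            # so this is the next-best answer.
            answers.append((name, -neg_key))
            continue
        rival = -heap[0][0] if heap else -float("inf")
        while True:
            try:
                bound = next(tasks[name])
                steps += 1
            except StopIteration as stop:
                heapq.heappush(heap, (-stop.value, name, True))
                break
            if bound < rival:
                heapq.heappush(heap, (-bound, name, False))  # freeze the task
                break
    return answers, steps

if __name__ == "__main__":
    tasks = {
        "A": make_task([34, 30, 24], 24),
        "B": make_task([23, 22, 20], 20),
        "C": make_task([22, 18, 14], 14),
    }
    print(top_k_with_freezing(tasks, k=2))
```

In this run the top two answers are found without ever finishing task C, which is the kind of saving that freezing aims for.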

Improvement of Freezing

Experiments: graph search. 2 Intel Xeon processors (2.67GHz), 4 cores each (8 total); 48GB memory

[Plot: simple Lawler-Murty vs. Lawler-Murty w/ freezing on Mondial, DBLP (part) and DBLP (full), each for k = 10, 100]

On average, freezing saved 56% of the running time

Straightforward Parallelization

[Figure sequence: output 34, 30, then 24; the awaiting tasks are solved in parallel and their results (14, 12, 20, 22) enter the queue]

Not so fast…
• Typical: running time reduced by 30%
• Same for 2, 3, …, 8 threads!
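A straightforward parallelization can be sketched with a thread pool. This is an assumed structure for illustration only: `top1` and `split` are hypothetical callbacks supplied by the application, and the function name is made up. The new partitions of each round are solved concurrently, but the main thread waits for all of them before it can choose the next answer; that barrier is where cores go idle. (In CPython, threads also do not speed up CPU-bound work, so the sketch shows the control flow rather than a real speedup.)

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_lawler_murty(root, top1, split, k, workers=4):
    """top1(partition) -> (score, answer) or None if the partition is empty.
    split(partition, answer) -> list of sub-partitions covering the rest.
    Both callbacks are assumptions of this sketch."""
    results, queue, tie = [], [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = [root]
        while len(results) < k:
            # Barrier: every new partition must be solved before the next pop.
            for part, best in zip(pending, pool.map(top1, pending)):
                if best is not None:
                    tie += 1
                    heapq.heappush(queue, (-best[0], tie, best[1], part))
            if not queue:
                break
            neg_score, _, answer, part = heapq.heappop(queue)
            results.append((-neg_score, answer))
            pending = split(part, answer)
    return results
```

Each round offers only as many tasks as the last answer produced partitions, and the slowest task of the round delays everyone, which is consistent with the flat gain reported above.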

Idle Cores while Waiting

[Figure sequence: output 34, 30, then 24; queue 14, 12; cores are idle while the main thread waits for the running tasks]

Early Popping

[Figure: a running task’s bound shrinks over time (≤24, ≤23, ≤20) while the queue holds 22, 20, 14, 12; once 22 > 20, the answer with score 22 is popped without waiting for the task to finish]

Skipped issues:
• Thread synchronization
  – semaphores, locking, etc.
• Correctness
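The test behind early popping, as far as the figure shows it, is a comparison between the best computed answer and the bounds of the tasks that are still running. The helper below is a hypothetical sketch of that test only; it leaves out the synchronization that the slide lists as a skipped issue.

```python
def can_pop_early(candidate_score, running_bounds):
    """True if no running task can still beat the candidate, i.e. every
    running task's current upper bound is at most the candidate's score."""
    return all(bound <= candidate_score for bound in running_bounds)

if __name__ == "__main__":
    # A running task's bound shrinks over time while the best computed answer scores 22.
    for bound in (24, 23, 20):
        print(bound, can_pop_early(22, [bound]))
```

With the bound at 24 or 23 the main thread must wait; once the bound reaches 20, the answer with score 22 is safe to print.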

Improvement of Early Popping

[Plots: Mondial and DBLP (part); short, medium-size & long queries; x-axis: number of threads (1, 2, 4, 6, 8)]

Early Popping vs. (Serial) Freezing

[Plots: Mondial and DBLP (part); short, medium-size & long queries; x-axis: number of threads (1, 2, 4, 6, 8)]

• Need 4 threads to start gaining
• And even then, fairly poor…

Combining Freezing & Early Popping

• We discuss additional ideas and techniques to further utilize the cores
  – Not here, see the paper
• Main speedup by combining early popping with freezing
  – Cores kept busy… on high-potential tasks
  – Thread synchronization is quite involved
• At the high level, the final algorithm has the following flow:

Combining: General Idea

• Two queues: computed answers (to print) and partition representatives as frozen tasks
• Threads work on frozen tasks
  – Frozen + new tasks go back to the task queue
  – Computed answers go to the answer queue
• Main task just pops computed results to print… but validates: no better results by frozen tasks

[Figure sequence: output 34, 30; computed answers 24, 20, 12; frozen tasks with bounds 25, 17, 15; in the last frame a new computed answer with score 22 appears]

Combined vs. (Serial) Freezing

[Plots: Mondial and DBLP; y-axis: % (20% to 100%); x-axis: number of threads (1, 2, 4, 6, 8); series: short, medium & long queries]

Now, significant gain (≈50%) already w/ 2 threads

Improvement of Combined

[Plots, including Mondial; x-axis: number of threads (1, 2, 4, 6, 8); series: short, medium & long queries; annotations: 4%-5% and 3%-10%]

On average, with 8 threads we got 5.7% of the original running time

Conclusions

• Considered Lawler-Murty’s ranked enumeration
  – Theoretical complexity guarantees
  – …but a direct implementation is very slow
  – Straightforward parallelization poorly utilizes cores
• Ideas: progressive bounds, freezing, early popping
  – In the paper: additional ideas, combination of ideas
• Most significant speedup by combining these ideas
  – Flow substantially differs from the original procedure
  – 20x faster on 8 cores
• Test case: graph search; focus: general apps
  – Future: additional test cases

Questions?
