A Parallelization of State-of-the-Art Graph Bisection Algorithms

Page 1: A Parallelization of State-of-the-Art Graph Bisection Algorithms

A Parallelization of State-of-the-Art Graph Bisection Algorithms

Nan Dun, Kenjiro Taura, Akinori Yonezawa
Graduate School of Information Science and Technology
The University of Tokyo

Page 2: A Parallelization of State-of-the-Art Graph Bisection Algorithms

July 31, Kochi SWoPP 2006

Problem Description

• Graph Partition
  - Goal: to minimize the cut
  - K-partition
  - Bisection (bipartition)

• Problem Complexity
  - Finding the best partition, or even an approximate partition, is NP-hard 1)2)

• Solutions
  - Heuristics
  - Non-deterministic
  - On the Grid

[Figure: example graph with six vertices, numbered 1 to 6, bisected into L = {1, 2, 3} and R = {4, 5, 6}]

Graph partitioning problem: given an undirected graph G = (V, E), find a partition (L, R) of V that satisfies |L| = |R| and minimizes the number of edges between L and R.
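The problem statement can be made concrete with a small sketch. The edge list below is hypothetical (the edges of the slide's figure do not survive in the text), and the names `edge_cut` and `best_bisection` are illustrative only. The exhaustive search is shown only to make the objective explicit; it enumerates every balanced split and is not usable beyond tiny graphs.

```python
from itertools import combinations

def edge_cut(edges, left):
    """Count the edges that have exactly one endpoint in `left`."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)

def best_bisection(vertices, edges):
    """Exhaustive search over all balanced splits (tiny graphs only)."""
    vertices = sorted(vertices)
    half = len(vertices) // 2
    return min((edge_cut(edges, c), c) for c in combinations(vertices, half))

# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
cut_lr = edge_cut(edges, {1, 2, 3})        # cut of L = {1, 2, 3}, R = {4, 5, 6}
best_cut, best_left = best_bisection(range(1, 7), edges)
print(cut_lr, best_cut, best_left)
```

For this made-up graph, L = {1, 2, 3} is already an optimal bisection, since only the edge (3, 4) crosses the cut.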

Page 3: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Practical Application

• In Mathematics
  - Analysis of sparse systems of linear equations

• In Computer Science
  - Modeling data placement on distributed memory to minimize communication

• In Various Other Domains
  - VLSI design
  - Transportation networks
  - Communication networks

Page 4: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Bisection Flow

• Bisection Initialization
  - Random Initialization
  - Half-Half Initialization
  - Region Growing

• Bisection Refinement
  - Kernighan-Lin 3)4)
  - Tabu Search 7)
    - Fixed Tabu Search
    - Reactive Tabu Search

[Figure: flow diagram with the labels Bisection Initialization, Initial Bisection, Bisection Refinement, Final Bisection]

Page 5: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Min-Max Greedy Growing 7)

• Min: search for the vertices whose addition causes the minimal edge-cut
• Max: break ties by maximizing internal connections

[Figure: growing a set ("addset") with candidate vertices A, B and C]
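A minimal sketch of the Min and Max rules, under several assumptions that are not on the slide: the two sets start from two random seed vertices, they grow alternately, and the graph is given as a dict of neighbour sets. The function name `min_max_greedy` is illustrative, and the quadratic scan over free vertices is for clarity rather than speed.

```python
import random

def min_max_greedy(adj, seed=0):
    """Grow two sets alternately; adj maps vertex -> set of neighbours."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    a, b = rng.sample(vertices, 2)          # assumed: random seed vertices
    sides = [{a}, {b}]
    free = set(vertices) - {a, b}
    turn = 0
    while free:
        own, other = sides[turn], sides[1 - turn]
        # Min: fewest edges into the other set (new cut edges).
        # Max: among ties, most edges into the growing set.
        best = min(sorted(free),
                   key=lambda v: (len(adj[v] & other), -len(adj[v] & own)))
        own.add(best)
        free.remove(best)
        turn = 1 - turn                     # alternate to keep the sets balanced
    return sides[0], sides[1]

# Hypothetical graph: two triangles joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
left, right = min_max_greedy(adj, seed=1)
print(sorted(left), sorted(right))
```

The result depends on the seed vertices, which is one reason the later slides run the initial phase many times with different seeds.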

Page 6: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Kernighan-Lin 3)4)

1. Calculate the gain of each vertex
2. Search for a series of pairs that leads to the maximal edge-cut reduction if swapped
3. Swap the pairs of vertices obtained in step 2, and lock them from further swaps in the current pass
4. Iterate steps 1, 2 and 3 until the edge-cut stops improving

[Figure: Swapping Pair of Vertices; vertices A, B, C, D shown before and after the swap]

* gain := # of internal edges - # of external edges

gain(B) = -1, gain(C) = -2

ΔCut of swapping B, C = gain(B) + gain(C) + 2 = -1
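The gain arithmetic can be checked with a short sketch. The four-vertex graph below is hypothetical: it was chosen only so that gain(B) = -1 and gain(C) = -2 as on the slide, and it is not necessarily the graph in the figure. With the slide's definition, moving one vertex changes the cut by its gain; an edge between the two swapped vertices stays cut after the swap, so it is corrected by + 2, which is why the formula fits a pair of adjacent vertices.

```python
def gain(adj, side, v):
    """Slide's definition: internal edges minus external edges of v."""
    internal = sum(side[u] == side[v] for u in adj[v])
    external = len(adj[v]) - internal
    return internal - external

def swap_delta(adj, side, a, b):
    """Change in edge-cut if a and b (on opposite sides) are swapped."""
    return gain(adj, side, a) + gain(adj, side, b) + 2 * (b in adj[a])

def cut_size(adj, side):
    """Each undirected edge is seen twice in adj, hence the division."""
    return sum(side[u] != side[v] for u in adj for v in adj[u]) // 2

# Hypothetical graph with edges A-B, A-C, B-C, B-D; sides 0 and 1.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
side = {"A": 0, "B": 0, "C": 1, "D": 1}

gain_b, gain_c = gain(adj, side, "B"), gain(adj, side, "C")
delta = swap_delta(adj, side, "B", "C")
before = cut_size(adj, side)
side["B"], side["C"] = side["C"], side["B"]   # perform the swap
after = cut_size(adj, side)
print(gain_b, gain_c, delta, before, after)
```

Recomputing the cut from scratch after the swap confirms that the predicted ΔCut matches the actual change.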

Page 7: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Tabu Search 7)

• Kernighan-Lin Like
  - Swaps pairs of vertices according to their gains

• Temporarily Forbidden
  - Previously swapped vertices are temporarily forbidden to move for a period of time (Tabu Length)
  - Tabu Length: a fraction (Tabu Fraction) of |V|
  - E.g.: Tabu Fraction = 0.01, |V| = 1000, Tabu Length = 0.01 x |V| = 10
  - Previously swapped pairs are allowed to move again after 10 other swaps
  - The purpose is to escape local minima
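The bookkeeping behind the tabu length can be sketched as follows. The class and function names are hypothetical, the lower bound of 1 on the tabu length is an assumption, and one search step is taken to mean one swap.

```python
def tabu_length(tabu_fraction, num_vertices):
    """Tabu Length = Tabu Fraction x |V| (at least 1, by assumption)."""
    return max(1, round(tabu_fraction * num_vertices))

class TabuList:
    def __init__(self, length):
        self.length = length
        self.release = {}   # vertex -> first step at which it may move again

    def forbid(self, vertex, step):
        """Record that `vertex` was swapped at `step`."""
        self.release[vertex] = step + self.length + 1

    def is_tabu(self, vertex, step):
        return step < self.release.get(vertex, 0)

length = tabu_length(0.01, 1000)
tabu = TabuList(length)
tabu.forbid("u", step=0)                     # "u" is swapped at step 0
blocked = [tabu.is_tabu("u", s) for s in range(1, 11)]   # the 10 other swaps
print(length, all(blocked), tabu.is_tabu("u", 11))
```

Storing a release step per vertex keeps the tabu test constant-time, instead of scanning a queue of recently moved vertices.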

Page 8: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Graph Types – Tabu Lengths

• Vertex Degree
  - Denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths 8)

• Distribution of Vertex Degree
  - Graphs with a uniform distribution of vertex degree tend to have a single best-fitting tabu length

[Figure: two plots of Edge-Cut versus Tabu Fraction (0.01 to 0.25). Graphs shown: |V| = 17758, |E| = 54196, Degree: Max 573, Min 1, Avg. 6.1; and |V| = 35000, |E| = 346572, Degree: Max 43, Min 3, Avg. 19.8]

Page 9: A Parallelization of State-of-the-Art Graph Bisection Algorithms

RRTS 7)

• Synthesis of Heuristics
  - The heuristics complement each other

• Reactive
  - Tries each tabu length to see which is better
  - Adaptive to various graphs

• Best Quality
  - Goes beyond local minima

• Long Running Time
  - Scoring phase

REACTIVE RANDOMIZED TABU SEARCH
  Score each tabu length by small runs of TS
  do I times
    Initial bisection by Min-Max
    do J times
      TS with high-scored tabu length
      Refine by Kernighan-Lin runs

R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.
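The control flow of the pseudocode can be sketched as a skeleton in which every phase is passed in as a callable. All parameter names are hypothetical, the nesting of the two loops follows the reading of the slide given here, and the stubs at the bottom stand in for the real heuristics only to exercise the loop structure.

```python
def rrts(graph, tabu_fractions, outer_runs, inner_runs,
         score, initial_bisection, tabu_search, kernighan_lin, cut_size):
    """Skeleton only: each phase is a caller-supplied function."""
    # Scoring phase: lower score is better (assumed to be an edge-cut).
    best_fraction = min(tabu_fractions, key=lambda f: score(graph, f))
    best = None
    for _ in range(outer_runs):                  # "do I times"
        part = initial_bisection(graph)          # Min-Max initial bisection
        for _ in range(inner_runs):              # "do J times"
            part = tabu_search(graph, part, best_fraction)
            part = kernighan_lin(graph, part)
            if best is None or cut_size(graph, part) < cut_size(graph, best):
                best = part
    return best_fraction, best

# Stub phases: a "partition" is just a number standing in for its cut size.
fraction, result = rrts(
    graph=None,
    tabu_fractions=[0.01, 0.02, 0.03],
    outer_runs=2,
    inner_runs=3,
    score=lambda g, f: abs(f - 0.02),
    initial_bisection=lambda g: 10,
    tabu_search=lambda g, p, f: p - 1,
    kernighan_lin=lambda g, p: p - 1,
    cut_size=lambda g, p: p,
)
print(fraction, result)
```

Passing the phases as functions keeps the skeleton independent of any particular graph representation.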

Page 10: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Multi-level for Large Graphs

• Coarsen Phase
  - Coarsen a large graph to a smaller one by using a matching scheme
  - Multi-level coarsening

• Bisection Phase
  - Bisecting small graphs is usually very fast

• Uncoarsen Phase
  - Map back to the original graph
  - Perform refinement in each uncoarsening phase

• METIS 5)12)

[Figure: Matching Scheme]
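One coarsening level can be sketched with a simple greedy maximal matching. This is a generic illustration, not METIS's own matching code; the function name `coarsen` and the weighted-dict graph format are assumptions.

```python
def coarsen(adj):
    """One coarsening level via a greedy maximal matching.

    adj maps vertex -> {neighbour: edge weight} for an undirected graph.
    Returns the coarse graph and the fine-to-coarse vertex map.
    """
    coarse_of = {}
    next_id = 0
    for v in adj:
        if v in coarse_of:
            continue
        coarse_of[v] = next_id
        # Match v with its first still-unmatched neighbour, if any.
        for u in adj[v]:
            if u not in coarse_of:
                coarse_of[u] = next_id
                break
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for v, nbrs in adj.items():
        for u, w in nbrs.items():
            cv, cu = coarse_of[v], coarse_of[u]
            if cv != cu:   # edges inside a matched pair disappear
                coarse[cv][cu] = coarse[cv].get(cu, 0) + w
    return coarse, coarse_of

# Path graph 0-1-2-3 with unit weights.
adj = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
coarse, coarse_of = coarsen(adj)
print(coarse, coarse_of)
```

The map `coarse_of` is what the uncoarsening phase needs: a bisection of the coarse graph is projected back by giving each fine vertex the side of its coarse vertex.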

Page 11: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Comparison of Heuristics

        METIS           RRTS100              FTS10000
Graph   cut     time    cut     time         cut     time
G1      130     0.01    130     168.11       130     1.22
G2      366     0.07    353     696.49       354     13.85
G3      311     0.10    311     935.56       306     32.85
G4      6337    0.04    6257    353.45       6316    3.77
G5      950     0.17    Timeout (1 hour)     929     31.55

                                   Degree
Graph        |V|     |E|      Avg    Min   Max   Best Tabu Fraction
G1:fe_4elt   11143   32818    5.89   3     12    0.02
G2:fe_pwt    36519   144794   7.93   0     15    0.02
G3:fe_body   45087   163734   7.26   0     28    0.02
G4:mem       17758   54196    6.10   1     573   0.14
G5:wing      62032   121544   3.92   2     4     0.01

Page 12: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Comparison of Heuristics

• METIS
  - Extremely fast: uses the multi-level technique
  - High-quality bisections, but worse than RRTS: the multi-level technique lacks global optimization during the coarsening phase

• RRTS
  - Very slow: the scoring phase is time-consuming
  - “Ever-best” bisections: adaptive to many kinds of graphs

• FTS with Known Tabu Length
  - Much faster than RRTS
  - Results comparable to RRTS

Page 13: A Parallelization of State-of-the-Art Graph Bisection Algorithms

A Naive Parallelization

• Run RRTS independently on each node
  - Simply equivalent to scaling up the number of iterations

• Generate different seeds for different nodes
  - Heuristics are sensitive to the initial bisection
  - 10% to 20% improvement

[Figure: Dispatch Graphs to nodes that each run RRTS100, then Synthesize Results]
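The dispatch-and-synthesize pattern can be sketched on a single machine. Several things here are stand-ins: threads play the role of Grid nodes, a seeded random balanced bisection plays the role of one RRTS run, and the names `run_once` and `naive_parallel` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def edge_cut(edges, left):
    return sum((u in left) != (v in left) for u, v in edges)

def run_once(vertices, edges, seed):
    """Stand-in for one RRTS run: a seeded random balanced bisection."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    left = frozenset(order[: len(order) // 2])
    return edge_cut(edges, left), seed, left

def naive_parallel(vertices, edges, num_nodes):
    """Dispatch one independent run per node, then keep the smallest cut."""
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(lambda s: run_once(vertices, edges, s),
                                range(num_nodes)))   # node i uses seed i
    return min(results, key=lambda r: r[0])

# Hypothetical graph: two triangles joined by the edge (3, 4).
vertices = list(range(1, 7))
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
best_cut, best_seed, best_left = naive_parallel(vertices, edges, num_nodes=8)
print(best_cut, best_seed, sorted(best_left))
```

Because the runs never communicate, the only benefit is that the minimum over more samples can only get better, which is the sense in which this is equivalent to scaling up iterations.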

Page 14: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Statistical Properties of Cut-size

• Incidence of Bests
  - Average quality is good
  - Only 0.25% of the results are the best

• General Property
  - The distribution becomes more peaked as |V| grows
  - The distribution tends towards Gaussian 8)
  - Mean and variance scale linearly with |V|

[Figure: histogram of Count (0 to 80) versus Edge-Cut (1050 to 1370); |V| = 35000, |E| = 346572, Degree: Max 43, Min 3, Avg 19.80; RRTS100 on 400 nodes provided by Grid Challenge Federation]

Page 15: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Issues of Parallelizing Heuristics

• Hard with the Message-Passing Model (MPI)
  - J.R. Gilbert and E. Zmijewski 9): A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming
  - Par-METIS (Parallel METIS): only the “coarsen-uncoarsen” part is parallelized

• Hard to Be Efficient (statistical property)
  - Even if we could parallelize a heuristic efficiently, the fraction of iterations that reach the best bisection is still small
  - If we cooperatively run independent instances on the Grid:
    - How many nodes will lead to the best partition?
    - When will a good threshold be reached?
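The "how many nodes" question can be given a rough first answer under a strong assumption that the slides do not state: runs are independent and each one reaches the best cut with the same probability p. The value p = 0.0025 is the 0.25% incidence reported on the previous slide; the function names are hypothetical.

```python
import math

def prob_at_least_one_best(p, num_nodes):
    """P(at least one of num_nodes independent runs reaches the best cut)."""
    return 1.0 - (1.0 - p) ** num_nodes

def nodes_needed(p, target):
    """Smallest n with 1 - (1 - p)^n >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

p = 0.0025                        # 0.25% incidence of the best cut
n90 = nodes_needed(p, 0.90)
print(n90, prob_at_least_one_best(p, n90), prob_at_least_one_best(p, 400))
```

Under this model the required node count grows roughly like 1/p, which is why simply adding nodes is an expensive way to reach the best bisection.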

Page 16: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Contribution of Phases

• Initial Phase
  - Reduces a large portion of the edge-cut
  - Good initial partitions lead to good final partitions
  - Running time is consistent across runs; good initial partitions save time for refinement

• TS and KL Phase
  - Reductions tend to be alike
  - More iterations, better results

[Figure: bar chart of ΔEdge-Cut (axis 900 to 1900) for the Best Edge-Cuts 912, 1035, 1075, 1079; series: Final, KL, FTS, Init]

Page 17: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Results from Same Initial Bisections

• Given the Same Initial Partitions
  - The best initial partitions lead to the best final partitions
  - FTS and KL tend to be deterministic
  - Fewer swaps are available

• Diversity of edge-cut can be cancelled by distributing only one phase
  - Running FTS and KL on one node is enough

[Figure: histogram of Count (0 to 50) of final edge-cuts after performing FTS and KL on the same initial partitions, 50 nodes; final cuts 915 (Init: 1078), 985 (Init: 1156), 987 (Init: 1197), 1000 (Init: 1185)]

Page 18: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Multi-level Scoring

• Mainly Used to Adapt to Large-Scale Graphs
  - If |V| = 1000, Tabu = 0.01 x 1000 = 10
  - If |V| = 100000, Tabu = 0.01 x 100000 = 1000

• Tuning the Tabu Length to Fit Specific Graphs Better
  - Level-1 scoring distinguishes graphs by their types
  - Level-2 scoring tests for a better tabu length for the specific graph

[Figure: two plots of Edge-Cut (Avg. Cut and Min. Cut) versus tabu fraction; Level-1 Tabu Fraction from 0.01 to 0.25, Level-2 Tabu Fraction from 0.001 to 0.019]
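A sketch of two-level scoring, with assumptions stated up front: the level-1 grid runs from 0.01 to 0.25 in steps of 0.03 as on the plot axis, and the level-2 grid is taken to be a ten times finer grid centred on the level-1 winner, which reproduces the 0.001 to 0.019 axis when the winner is 0.01. The rule for deriving level 2 from level 1 is not given on the slide, and `score` is a hypothetical stand-in for short TS runs that return an edge-cut.

```python
def two_level_scoring(score, level1, refine_steps=3):
    """score(fraction) -> edge-cut, lower is better (hypothetical callable)."""
    best1 = min(level1, key=score)
    step2 = (level1[1] - level1[0]) / 10      # assumed: ten times finer grid
    level2 = [round(best1 + k * step2, 6)
              for k in range(-refine_steps, refine_steps + 1)]
    level2 = [f for f in level2 if f > 0]     # drop non-positive fractions
    best2 = min(level2, key=score)
    return best1, level2, best2

level1 = [round(0.01 + 0.03 * k, 2) for k in range(9)]   # 0.01 ... 0.25

def toy_score(fraction):
    """Made-up score with its minimum near 0.012; not measured data."""
    return 900 + 1000 * abs(fraction - 0.012)

best1, level2, best2 = two_level_scoring(toy_score, level1)
print(best1, level2, best2)
```

Each candidate fraction is scored independently, so both levels map naturally onto groups of nodes that score one tabu length each.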

Page 19: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Final Approaches

• Not to Use Multi-level Partitioning
  - To preserve the “best” quality

• Not to Parallelize the Heuristic Itself
  - Not a good trade-off

• To Parallelize the Scoring Phase
  - One group of nodes scores one tabu length
  - With the multi-level scoring technique

• To Parallelize the Initial Phase Only
  - Remove the diversity of edge-cut as soon as possible: take advantage of distributed execution to remove the diversity of edge-cut
  - Reduce computing effort as much as possible: further refinement can be done on a single node

• To Use the GXP Cluster Shell
  - “mw” command: mw M {{ W }}

Page 20: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Full Picture

[Figure: pipeline with three stages. Multi-Level Scoring: nodes labeled S: 0.01 to S: 0.07 yield the High-Scored Level-1 Tabu Fraction, and nodes labeled S: 0.001 to S: 0.007 yield the High-Scored Level-2 Tabu Fraction. Initial Phase: several Init nodes yield the Best Initial Partitions. Refinement Phase: FTS and KL]

Page 21: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Conclusions

• Bisection Quality
  - “Ever-best” partitions
  - Edge-Cut_OUR ≤ Edge-Cut_RRTS ≤ Edge-Cut_METIS

• Bisection Time
  - Comparable and reasonable
  - Time_METIS < Time_OUR << Time_RRTS
  - Speed-up of 10 compared with RRTS

• Adapted to Grid Environment
  - Scalable performance
  - Convenient usage
  - Good fault tolerance

Page 22: A Parallelization of State-of-the-Art Graph Bisection Algorithms

Thank you for your attention!