A Parallelization of State-of-the-Art Graph Bisection Algorithms
Nan Dun, Kenjiro Taura, Akinori Yonezawa
Graduate School of Information Science and Technology
The University of Tokyo
July 31, Kochi, SWoPP 2006
Problem Description
• Graph partition
  - Goal: minimize the cut
  - K-partition
  - Bisection (bipartition)
• Problem complexity
  - Finding the best partition, or even approximate partitions, is NP-hard [1][2]
• Solutions
  - Heuristics
  - Non-deterministic
  - On the Grid
Graph Partition Problem
Given an undirected graph G=(V,E), find a partition (L,R) of V with |L|=|R| that minimizes the number of edges between L and R.
Example: L={1,2,3}, R={4,5,6}
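The edge-cut objective just defined can be checked with a small sketch. The edge list below is only an illustrative guess at a six-vertex example (two triangles joined by one edge), not taken from the slide's figure:

```python
# Edge-cut of a bisection: count edges whose endpoints lie on different sides.
def edge_cut(edges, L):
    return sum(1 for u, v in edges if (u in L) != (v in L))

# Illustrative six-vertex graph: triangles {1,2,3} and {4,5,6} joined by (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(edge_cut(edges, {1, 2, 3}))  # 1: only (3, 4) crosses between L and R
```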
Practical Application
• In mathematics
  - Analysis of sparse systems of linear equations
• In computer science
  - Modeling data placement on distributed memory to minimize communication
• In various other domains
  - VLSI design
  - Transportation networks
  - Communication networks
Bisection Flow
• Bisection initialization
  - Random initialization
  - Half-half initialization
  - Region growing
• Bisection refinement
  - Kernighan-Lin [3][4]
  - Tabu search [7]: fixed tabu search, reactive tabu search
Flow: bisection initialization produces an initial bisection; bisection refinement turns it into the final bisection.
Min-Max Greedy Growing [7]
• Min: search for vertices that cause the minimal edge-cut
• Max: break ties by maximizing internal connections
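A minimal sketch of the min-max rule above, assuming an adjacency-set representation; the function and variable names are illustrative, not from the paper:

```python
# Grow one side of the bisection from a seed vertex until it holds half the
# vertices. Adding v changes the cut by (external - internal), so we minimize
# that (Min) and break ties by the largest internal connection count (Max).
def min_max_grow(adj, seed):
    side = {seed}
    while len(side) < len(adj) // 2:
        best, best_key = None, None
        for v in adj:
            if v in side:
                continue
            internal = len(adj[v] & side)       # edges into the growing side
            external = len(adj[v]) - internal   # edges that would cross the cut
            key = (external - internal, -internal)
            if best_key is None or key < best_key:
                best, best_key = v, key
        side.add(best)
    return side, set(adj) - side
```

For two triangles joined by a single edge, growing from vertex 1 recovers the natural split into the two triangles.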
Kernighan-Lin [3][4]
1. Calculate the gain of each vertex
2. Search for a series of pairs that yields the maximal edge-cut reduction if swapped
3. Swap the pairs of vertices obtained in step 2, and lock them against further swaps in the current pass
4. Iterate steps 1-3 until the edge-cut stops improving
Swapping a pair of vertices:
gain(v) := # of internal edges - # of external edges
gain(B) = -1, gain(C) = -2
ΔCut of swapping B and C = gain(B) + gain(C) + 2 = -1
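The gain arithmetic above can be reproduced in a short sketch. The four-vertex graph below is constructed so that gain(B) = -1 and gain(C) = -2; it is not necessarily the slide's figure:

```python
# gain(v) := # internal edges - # external edges, relative to v's own side;
# swapping B and C (which are adjacent) changes the cut by gain(B) + gain(C) + 2.
def gain(adj, side, v):
    internal = len(adj[v] & side)
    return internal - (len(adj[v]) - internal)

def swap_delta(adj, side_a, a, b):
    bonus = 2 if b in adj[a] else 0   # +2 only when a and b are adjacent
    return gain(adj, side_a, a) + gain(adj, set(adj) - side_a, b) + bonus

adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(swap_delta(adj, {"A", "B"}, "B", "C"))  # -1, as on the slide
```

A negative delta means the swap reduces the cut, so (B, C) is a profitable pair.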
Tabu Search [7]
• Kernighan-Lin-like: swap pairs of vertices according to their gains
• Temporarily forbidden
  - Previously swapped vertices are temporarily forbidden to move for a period of time (the tabu length)
  - Tabu length: a fraction (the tabu fraction) of |V|
  - E.g. tabu fraction = 0.01, |V| = 1000, so tabu length = 0.01 x |V| = 10: previously swapped pairs are allowed to move again after 10 other swaps
• Purpose: to escape local minima
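The tabu bookkeeping described above might look like the following sketch; the class and method names are illustrative:

```python
# A vertex that has just moved stays "tabu" for tabu_length subsequent moves.
class TabuList:
    def __init__(self, n_vertices, tabu_fraction):
        self.tabu_length = max(1, int(tabu_fraction * n_vertices))
        self.expiry = {}   # vertex -> move count at which it becomes free again
        self.moves = 0

    def is_tabu(self, v):
        return self.expiry.get(v, 0) > self.moves

    def record_move(self, v):
        self.moves += 1
        self.expiry[v] = self.moves + self.tabu_length

tabu = TabuList(n_vertices=1000, tabu_fraction=0.01)  # tabu length = 10
tabu.record_move("a")
# "a" stays forbidden until 10 other moves have happened, as in the example.
```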
Graph Types – Tabu Lengths
• Vertex degree: denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths [8]
• Degree distribution: graphs with a uniform vertex-degree distribution tend to have a unique best-fitting tabu length
[Charts: edge-cut vs. tabu fraction (0.01-0.25) for two graphs. Left: |V| = 17758, |E| = 54196, degree max 573, min 1, avg 6.1. Right: |V| = 35000, |E| = 346572, degree max 43, min 3, avg 19.8]
RRTS [7]
• Synthesis of heuristics: the heuristics complement each other
• Reactive: try each tabu length to see which is better; adaptive to various graphs
• Best quality: goes beyond local minima
• Long running time: the scoring phase
Reactive Randomized Tabu Search:
  score each tabu length by small runs of TS
  do I times:
    initial bisection by Min-Max
    do J times: TS with the high-scored tabu length
    refine by Kernighan-Lin runs
R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.
Multi-level for Large Graphs
• Coarsening phase
  - Coarsen large graphs into smaller ones using a "matching scheme"
  - Multi-level coarsening
• Bisection phase
  - Bisecting small graphs is usually very fast
• Uncoarsening phase
  - Map back to the original graph
  - Perform refinement at each uncoarsening level
• METIS [5][12]
[Figure: matching scheme]
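One coarsening step via a matching might be sketched as below. This uses a simple random matching purely for illustration; METIS's actual matching schemes (such as heavy-edge matching) are more refined:

```python
import random

# Pair up adjacent unmatched vertices; each pair collapses into one coarse vertex.
def coarsen_once(adj, rng=None):
    rng = rng or random.Random(0)
    matched = {}
    order = list(adj)
    rng.shuffle(order)
    for v in order:
        if v in matched:
            continue
        free = [u for u in adj[v] if u not in matched]
        if free:
            u = rng.choice(free)
            matched[v] = matched[u] = frozenset((v, u))
        else:
            matched[v] = frozenset((v,))   # unmatched vertex survives alone
    # Build the coarser graph over the collapsed vertices.
    coarse = {cv: set() for cv in matched.values()}
    for v, cv in matched.items():
        for u in adj[v]:
            if matched[u] != cv:
                coarse[cv].add(matched[u])
    return coarse, matched
```

A bisection of the coarse graph is then projected back through `matched` during uncoarsening, with refinement applied at each level.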
Comparison of Heuristics
Graph   METIS cut   METIS time   RRTS100 cut   RRTS100 time   FTS10000 cut   FTS10000 time
G1      130         0.01         130           168.11         130            1.22
G2      366         0.07         353           696.49         354            13.85
G3      311         0.10         311           935.56         306            32.85
G4      6337        0.04         6257          353.45         6316           3.77
G5      950         0.17         timeout (1 hour)             929            31.55
Graph        |V|     |E|      Degree (avg/min/max)   Best tabu fraction
G1: fe_4elt  11143   32818    7.93 / 0 / 15          0.02
G2: fe_pwt   36519   144794   5.89 / 3 / 12          0.02
G3: fe_body  45087   163734   7.26 / 0 / 28          0.02
G4: mem      17758   54196    6.10 / 1 / 573         0.14
G5: wing     62032   121544   3.92 / 2 / 4           0.01
Comparison of Heuristics
• METIS: extremely fast
  - Uses the multi-level technique
  - High-quality bisections, but worse than RRTS: the multi-level approach lacks global optimization during the coarsening phase
• RRTS: very slow
  - The scoring phase is time-consuming
  - "Ever-best" bisections; adaptive to many kinds of graphs
• FTS with a known tabu length
  - Much faster than RRTS, with comparable results
A Naive Parallelization
• Run RRTS independently on each node: simply equivalent to scaling up the number of iterations
• Generate different seeds for different nodes: the heuristics are sensitive to their initialization; 10%-20% improvement
[Diagram: dispatch the graph to many nodes, run RRTS100 independently on each, then synthesize the results]
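The naive scheme amounts to something like the sketch below. `bisect_once` is a hypothetical stand-in for one full seeded RRTS run, and a thread pool stands in for the Grid nodes:

```python
import random
from multiprocessing.dummy import Pool  # thread pool standing in for Grid nodes

def bisect_once(seed):
    # Placeholder for one full RRTS run; returns (edge_cut, run identifier).
    rng = random.Random(seed)            # a different seed per node
    return rng.randint(900, 1100), seed

def parallel_best(n_nodes=8):
    with Pool(n_nodes) as pool:
        results = pool.map(bisect_once, range(n_nodes))
    return min(results)                  # synthesize: keep the smallest cut
```

Because each run is independent, the scheme needs no communication, which is why it maps easily onto Grid nodes.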
Statistical Properties of Cut-size
• Incidence of bests: average quality is good, but only 0.25% of runs reach the best
• General property
  - The distribution becomes more peaked as |V| grows
  - The distribution tends towards Gaussian [8]
  - Mean and variance scale linearly with |V|
[Histogram: count vs. edge-cut (1050-1370); |V| = 35000, |E| = 346572, degree max 43, min 3, avg 19.80]
RRTS100 on 400 nodes provided by Grid Challenge Federation
Issues of Parallelizing Heuristics
• Hard with the message-passing model (MPI)
  - J.R. Gilbert and E. Zmijewski [9]: A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming
  - Par-METIS (parallel METIS) parallelizes only the coarsen-uncoarsen part
• Hard to make efficient (a statistical property)
  - Even if we could parallelize the heuristic efficiently, the fraction of iterations that reach the best bisections would still be small
  - If we cooperatively run independent instances on the Grid: how many nodes will lead to the best partition? When does a good threshold arrive?
Contribution of Phases
• Initial phase
  - Reduces a large portion of the edge-cut
  - Good initial partitions lead to good final partitions
  - Takes consistent time across runs; good initial partitions leave more time for refinement
• TS and KL phases
  - Reductions tend to be alike
  - More iterations, better results
[Chart: ΔEdge-cut contributed by the Init, FTS, and KL phases for runs with best edge-cuts 912, 1035, 1075, and 1079]
Results from the Same Initial Bisections
• Given the same initial partitions
  - The best initial partition leads to the best final partition
  - FTS and KL tend to be deterministic: fewer swaps are available
• The diversity of edge-cuts can be eliminated by distributing only one phase: running FTS and KL on one node is enough
[Histogram: FTS and KL performed on the same initial partitions on 50 nodes; final cuts 915 (init 1078), 985 (init 1156), 987 (init 1197), 1000 (init 1185)]
Multi-level Scoring
• Mainly used to adapt to large-scale graphs
  - If |V| = 1000, tabu length = 0.01 x 1000 = 10
  - If |V| = 100000, tabu length = 0.01 x 100000 = 1000
• Tunes the tabu length to fit specific graphs better
  - Level-1 scoring distinguishes graphs by type
  - Level-2 scoring tests finer tabu lengths for the specific graph
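The two-level idea can be sketched as a coarse sweep followed by a finer one. The sweep boundaries and the `score` callback (a stand-in for a short TS run returning an edge-cut) are illustrative assumptions, not the paper's exact procedure:

```python
# Level 1 sweeps a coarse range of tabu fractions; level 2 re-sweeps a
# roughly decade-finer window around the level-1 winner.
def two_level_scoring(score, level1=(0.01, 0.25), steps=9, width=0.01):
    def sweep(lo, hi):
        fracs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
        return min(fracs, key=score)     # lower edge-cut = better score
    best1 = sweep(*level1)
    return sweep(max(level1[0] / 10, best1 - width), best1 + width)
```

With a score function minimized near 0.013, the level-1 sweep picks 0.01 and the level-2 sweep homes in close to 0.013.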
[Charts: avg. and min. edge-cut vs. level-1 tabu fraction (0.01-0.25) and level-2 tabu fraction (0.001-0.019)]
Final Approaches
• Do not use multi-level partitioning: to preserve the "best" quality
• Do not parallelize the heuristics themselves: not a good trade-off
• Parallelize the scoring phase: one group of nodes scores one tabu length, using the multi-level scoring technique
• Parallelize the initial phase only
  - Remove the diversity of edge-cuts as soon as possible, taking advantage of the distribution of runs
  - Reduce computing effort as much as possible; further refinement can be done on a single node
• Use the GXP cluster shell: the "mw" command: mw M {{ W }}
Full Picture
[Diagram: multi-level scoring (level-1 tabu fractions 0.01-0.07, level-2 tabu fractions 0.001-0.007) selects the high-scored tabu fraction; the initial phase runs in parallel to find the best initial partitions, which feed the refinement phase (FTS and KL)]
Conclusions
• Bisection quality: "ever-best" partitions
  - Edge-Cut(OUR) ≤ Edge-Cut(RRTS) ≤ Edge-Cut(METIS)
• Bisection time: comparable and reasonable
  - Time(METIS) < Time(OUR) << Time(RRTS)
  - Speedup of about 10 compared to RRTS
• Adapted to the Grid environment: scalable performance, convenient usage, good fault tolerance
Thank you for your attention!