Multilevel Optimization in VLSICAD - Springer978-1-4757-3748-6/1.pdf · Multilevel Optimization in VLSICAD edited by Jason Cong University of California, Los Angeles, U.S.A. and Joseph

Multilevel Optimization in VLSICAD

COMBINATORIAL OPTIMIZATION

VOLUME 14

Through monographs and contributed works the objective of the series is to publish state of the art expository research covering all topics in the field of combinatorial optimization. In addition, the series will include books which are suitable for graduate level courses in computer science, engineering, business, applied mathematics, and operations research.

Combinatorial (or discrete) optimization problems arise in various applications, including communications network design, VLSI design, machine vision, airline crew scheduling, corporate planning, computer-aided design and manufacturing, database query design, cellular telephone frequency assignment, constraint directed reasoning, and computational biology. The topics of the books will cover complexity analysis and algorithm design (parallel and serial), computational experiments and applications in science and engineering.

Series Editors:

Ding-Zhu Du, University of Minnesota
Panos M. Pardalos, University of Florida

Advisory Editorial Board:

Afonso Ferreira, CNRS-LIP ENS Lyon
Jun Gu, University of Calgary
David S. Johnson, AT&T Research
James B. Orlin, M.I.T.
Christos H. Papadimitriou, University of California at Berkeley
Fred S. Roberts, Rutgers University
Paul Spirakis, Computer Tech Institute (CTI)

The titles published in this series are listed at the end of this volume.

Multilevel Optimization in VLSICAD

edited by

Jason Cong
University of California, Los Angeles, U.S.A.

and

Joseph R. Shinnerl
University of California, Los Angeles, U.S.A.

Springer-Science+Business Media, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-1-4419-5240-0
ISBN 978-1-4757-3748-6 (eBook)
DOI 10.1007/978-1-4757-3748-6

Printed on acid-free paper

All Rights Reserved

© 2003 Springer Science+Business Media Dordrecht

Originally published by Kluwer Academic Publishers in 2003

Softcover reprint of the hardcover 1st edition 2003

No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Contents

List of Figures ix

List of Tables xiii

Preface xv

Chapter 1 Multigrid Solvers and Multilevel Optimization Strategies 1
Achi Brandt and Dorit Ron

1 Unconstrained quadratic optimization and basic multiscale concepts 4
2 Linear geometric multigrid 7
3 Algebraic multigrid (AMG) 13
4 Numerical homogenization: High-accuracy coarsening 19
5 Non-symmetric and highly indefinite matrices 21
6 Non-local equations: Dense matrices 23
7 Non-quadratic optimization: Nonlinear systems 27
8 Constrained optimization and eigenproblems 34
9 Non-deterministic systems 42
10 Global optimization: Multilevel annealing 49
11 Graph and hypergraph problems 53
12 Multilevel formulation 59

Chapter 2 An Exploration of Multilevel Combinatorial Optimisation 71
Chris Walshaw

1 The Graph Partitioning Problem 76
1.1 Multilevel graph partitioning 77
1.2 Multilevel refinement: experimental results 83
1.3 Multilevel landscapes: experimental results 89
1.4 Variant problems 96
2 The Travelling Salesman Problem 97
2.1 A multilevel algorithm for the travelling salesman problem 99
2.2 Multilevel refinement: experimental results 103
2.3 Multilevel landscapes: experimental results 107
3 Summary 112
3.1 A generic multilevel strategy 112
3.2 Related work 114
3.3 Typical runtime 115
3.4 Solution-based coarsening and iterated multilevel algorithms 116
3.5 Review of the experimental data 117
3.6 Conclusions and future research 118

Chapter 3 Multilevel Hypergraph Partitioning 125
George Karypis

1 Hypergraph Partitioning - Problem Definition 126
1.1 Extensions on the Basic Problem 128
1.2 Methods for Computing a k-way Partitioning 129
2 The Multilevel Paradigm for Hypergraph Partitioning 130
3 The Various Phases of the Multilevel Paradigm 132
3.1 Coarsening Phase 132
3.2 Initial Partitioning Phase 140
3.3 Uncoarsening and Refinement Phase 141
4 Why Does the Multilevel Paradigm Work? 146
5 Extensions of the Multilevel Paradigm 148
6 Direction of Future Research 151

Chapter 4 Multilevel Circuit Placement 155
Tony F. Chan, Jason Cong, Tim Kong and Joseph R. Shinnerl

1 Problem Formulation and Approximations 157
1.1 Objective and Constraint Representations 158
1.2 Hypergraph and Graph Models 161
1.3 Global Placement and Detailed Placement 161
2 Contemporary Methodology 162
2.1 Annealing Methods 162
2.2 Analytical Methods 163
2.3 Partitioning-Based Methods 165
2.4 Hybrid Methods 167
2.5 Multilevel Simulated-Annealing-Based Methods 169
3 mPL: A Multilevel Placement Algorithm 172
3.1 Bottom-Up Hierarchy Construction 174
3.2 Nonlinear Programming by a Penalized Interior-Point Method 178
3.3 Discrete Refinement 184
3.4 Numerical Experiments 186
4 Conclusions 188

Chapter 5 Multilevel VLSI Routing 195
Jason Cong, Min Xie, Yan Zhang

1 Introduction to VLSI Routing Problem 196
2 Overview of the Routing Flow 200
3 Coarsening Process and Resource Reservation 202
3.1 Merging Resource 204
3.2 Resource Reservation 205
4 Initial Routing 207
5 History-Based Incremental Refinement 209
5.1 The Path Searching Algorithm 210
5.2 History-Based Iterative Refinement 211
6 Experimental Results 213

Chapter 6 Optimization for Reconfigurable Systems Using Hierarchical Abstraction 219
Elaheh Bozorgzadeh, Adam Kaplan, Ryan Kastner, Seda Ogrenci Memik and Majid Sarrafzadeh

1 Introduction 220
2 Compilation: Data Communication Minimization 225
2.1 Problem Definition 226
2.2 Algorithm 229
3 Customized Resource Allocation 239
3.1 Gain and Overlap Model 242
3.2 Problem Formulation 244
3.3 Overlap Graph 245
3.4 Customized Block Selection Algorithm 250
4 Simultaneous Scheduling and Binding: Customized Resource Utilization/Latency Minimization Trade-off 253
4.1 Problem Formulation 253
4.2 Proposed Scheduling Algorithm 254
5 Conclusions 262

Chapter 7 Practical Aspects of Multiscale Optimization Methods for VLSICAD 265
Robert Michael Lewis and Stephen G. Nash

1 The VLSICAD Optimization Model 267
2 Existing Approaches 269
3 The Multiscale Optimization Algorithm 272
4 Properties of the Multiscale Algorithm 274
5 Practical Issues 281
6 Computational Examples 285
6.1 A Hyperbolic Model Problem 285
6.2 An Elliptic Model Problem 288
7 Conclusions 288

Index 293

List of Figures

1.1 FMG algorithm with one V cycle per level 12
2.1 The multilevel scheme applied to a simple objective function 75
2.2 An example of multilevel partitioning 78
2.3 An example of coarsening via matching and contraction 79
2.4 Example partitions of a small graph 84
2.5 Plots of convergence behaviour for the partitioning test suites 87
2.6 Plots of convergence behaviour including iterated multilevel partitioning results 89
2.7 An example of the large sparse (semi-regular) GPP test instances 92
2.8 Enumeration results for the small GPP instances 94
2.9 Sampling results for the large GPP instances 95
2.10 An example of a multilevel TSP algorithm at work 99
2.11 TSP matching examples 101
2.12 Example tours for a small TSP instance 103
2.13 Plots of convergence behaviour for the travelling salesman test suite 105
2.14 A small coarsened TSP instance 107
2.15 Enumeration results for the small TSP instances 111
2.16 Sampling results for the large TSP instances 111
2.17 A schematic of the multilevel refinement algorithm 112
3.1 A sample circuit and its hypergraph representation 126
3.2 The various phases of the multilevel hypergraph bisection 131
3.3 Coarsenings induced by vertex matchings 135
3.4 Natural clusters obscured by edge-coarsening 137
3.5 Heavy-edge matching minimizes the exposed edge weight 147
3.6 Escape from local minimum with edge-cut reducing moves 147
3.7 The effect of coarsening on the size of the hyperedges 148

3.8 The effect of restricted coarsening 149
4.1 A Good Placement 157
4.2 A Bad Placement of the Same Circuit in Figure 4.1 158
4.3 The V-Cycle Flow of mPL 173
4.4 The CAPFOREST algorithm for all-edges xy-mincut estimation 175
4.5 The multilevel ESC algorithm for recursive graph coarsening 177
4.6 Sequential Unconstrained Minimization Techniques (SUMT) 180
4.7 Generic Linesearch Algorithm 182
4.8 f-neighborhood 185
4.9 Search Tree from A 186
5.1 Traditional Two Level Routing Flow 196
5.2 Hierarchical Routing Flow 198
5.3 3-level Routing Flow 199
5.4 Multilevel Routing Flow 200
5.5 Limitation of Hierarchical Approaches 202
5.6 Three Dimensional Routing Graph 203
5.7 Resource Estimation Model 204
5.8 Path Cost Example 204
5.9 Merging of Contour List 205
5.10 The Effect of Resource Reservation 206
5.11 Reservation Calculation 207
5.12 Approximate Multicommodity Flow Algorithm 210
5.13 Constrained Maze Refinement 212
6.1 Different VLSICAD Flow Methodologies 221
6.2 Overall Design Flow for Reconfigurable Computing Systems 223
6.3 A Control Data Flow Graph 227
6.4 Distributed Control and Centralized Control 228
6.5 Conversion of Straight-line Code to SSA and SSA Conversion with Control Flow 231
6.6 SSA form and the corresponding floorplan 232
6.7 SSA form with the φ-node spatially distributed 234
6.8 Spatial SSA Algorithm 235
6.9 Different Customized Block Candidates on a Data Flow Graph 241
6.10 Overlap Graph 246

6.11 Supernodes and Subnodes in Overlap Graph 247
6.12 Overlap MultiGraph 249
6.13 Candidate Node i Being Added to Cluster C 251
6.14 Pseudocode of Customized Block Selection Algorithm 252
6.15 Bipartite Graph Representation and Corresponding Point Set on the x-y Plane 258
6.16 Non-crossing bipartite matching example 259
6.17 Pseudocode for max_weighted_k_chain() Procedure 260
6.18 Pseudocode for Overall Scheduling Algorithm 261
7.1 Comparison of approaches for the advection problem 287
7.2 Comparison of approaches for the Dirichlet-to-Neumann map 289

List of Tables

4.1 Comparison of Different Clustering Algorithms 178
4.2 Test Circuits 187
4.3 Impact of Nonlinear Programming on Circuit biomed: 15% Reduction in Wirelength 187
4.4 Comparison with GORDIAN-L-DOMINO 188
5.1 Examples Used for Multilevel Routing; Lr denotes the number of routing layers; Lc, the number of levels. 214
5.2 Comparison of 3-level and Multilevel Routing Results 214
5.3 Comparison of Hierarchical Routing and Multilevel Routing Results 215

Preface

During the past thirty years, the computer-aided design of very large-scale integrated circuits (VLSICAD) has been an enabling force behind the exponential growth in the performance and capacity of integrated circuits according to Moore's Law. As this growth continues toward gigascale integration, however, the obstacles to further improvements in design and design automation become ever more daunting. The goal of this book is to support the innovation needed to meet the challenge.

The prevailing VLSICAD methodology is challenged by the rapid increases in both design complexity and the interconnect-to-device delay ratio. On-chip integration, as expressed in transistors per chip, increases at approximately 58% per year compounded, doubling roughly every 18 months. Total wirelength is increasing in a similar fashion, from around 2 km today to around 10 km projected by 2009, with 8 or 9 routing layers. Accompanying this growth is a qualitative transition in the relationship between on-chip communication and computation. In deep submicron designs, the interconnect delay far exceeds the device delay and is the dominant factor determining system performance. Typically, more time is now required to transmit data between different chip components than to generate it by computation.
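The two growth figures quoted above can be checked against each other with a few lines of arithmetic. The sketch below assumes only the 58% annual rate from the text and solves (1 + g)^t = 2 for the doubling time t; the variable names are illustrative.

```python
import math

annual_growth = 0.58  # 58% per year compounded, as quoted in the text

# Solve (1 + g)^t = 2 for t in years, then convert to months.
doubling_years = math.log(2) / math.log(1 + annual_growth)
doubling_months = 12 * doubling_years

print(f"doubling time: {doubling_months:.1f} months")
```

The result is close to 18 months, so the two figures are consistent.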

These changes are steadily weakening the abstraction hierarchy traditionally used to divide VLSI design into separate tasks suitable for implementation by multiple design teams. The current VLSI design flow typically proceeds in the following sequence: behavioral level design, register-transfer-level design, logic design, physical design. The success of this approach depends heavily on a tight correlation between the abstract model at the higher level and the implementation at the lower level. Such a correlation, however, is difficult to maintain, as the existing abstractions are largely incapable of modeling the performance, reliability, and complexity of the interconnect. Consequently, many iterations over the flow sequence are typically required to meet timing requirements. The lack of adequate physical models in the behavioral and logical design stages leads ultimately to instability in the design process, which grows ever more problematic as design sizes increase.

Multilevel methods, also known as multiscale methods, construct a hierarchy of successively coarser problems from the bottom up by recursive aggregation. They employ iterative improvement at each of the resulting levels, transfer these improvements up and down the hierarchy, and eventually terminate with a solution at the original, finest level. Typically, they converge in the optimal time order to solutions equal or superior to those obtained by non-hierarchical means. Widely used in the solution of integral and differential equations, they are beginning to be viewed also as a generic framework for solving difficult large-scale constrained global optimization problems. Throughout scientific computation, these methods are increasingly seen as indispensable to scalable solutions, but the associated knowledge seems insufficient to meet the immediate demands of VLSICAD.
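The control flow just described (aggregate from the bottom up, solve at the coarsest level, then interpolate and improve level by level) can be sketched generically. What follows is a minimal illustration, not any chapter's algorithm: the driver `multilevel_solve` and all of its callbacks are hypothetical names, and the toy problem (splitting a list of weights into two balanced halves) is chosen only to make the sketch runnable. It makes a single pass up and down the hierarchy, whereas practical methods may cycle repeatedly.

```python
def multilevel_solve(problem, size, coarsen, solve_coarsest, interpolate, refine, min_size):
    """Generic one-pass multilevel driver; every callback is supplied by the caller."""
    # Build the hierarchy from the bottom up by recursive aggregation.
    hierarchy = [problem]
    while size(hierarchy[-1]) > min_size:
        coarser = coarsen(hierarchy[-1])
        if size(coarser) >= size(hierarchy[-1]):
            break  # coarsening stalled; stop building levels
        hierarchy.append(coarser)

    # Solve the coarsest problem, then interpolate and refine toward the finest level.
    solution = solve_coarsest(hierarchy[-1])
    for level in range(len(hierarchy) - 2, -1, -1):
        solution = interpolate(hierarchy[level], solution)
        solution = refine(hierarchy[level], solution)
    return solution


# Toy instantiation (an assumption for illustration): balance weights across two sides.
def coarsen(weights):
    # Aggregate consecutive pairs of items into single coarse items.
    return [sum(weights[i:i + 2]) for i in range(0, len(weights), 2)]

def solve_coarsest(weights):
    # Greedy: place items, heaviest first, on the currently lighter side.
    sides, load = [0] * len(weights), [0, 0]
    for i in sorted(range(len(weights)), key=lambda i: -weights[i]):
        side = 0 if load[0] <= load[1] else 1
        sides[i] = side
        load[side] += weights[i]
    return sides

def interpolate(fine_weights, coarse_sides):
    # Each fine item inherits the side of the coarse item that absorbed it.
    return [coarse_sides[i // 2] for i in range(len(fine_weights))]

def imbalance(weights, sides):
    return abs(sum(w if s == 0 else -w for w, s in zip(weights, sides)))

def refine(weights, sides):
    # Local search: flip single items while a flip strictly reduces the imbalance.
    sides, improved = list(sides), True
    while improved:
        improved = False
        for i in range(len(weights)):
            before = imbalance(weights, sides)
            sides[i] ^= 1
            if imbalance(weights, sides) < before:
                improved = True
            else:
                sides[i] ^= 1  # undo a flip that did not help
    return sides


weights = [7, 3, 5, 1, 8, 2, 6, 4]
sides = multilevel_solve(weights, len, coarsen, solve_coarsest, interpolate, refine, min_size=2)
print(sides, imbalance(weights, sides))
```

Keeping the driver free of problem-specific logic mirrors the view of multilevel methods as a generic framework: only the callbacks change from one problem to the next.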

The first direct application of a multilevel algorithm to VLSICAD came in 1991 with FastCap, a package for capacitance extraction based on the Fast Multipole Method of Greengard and Rokhlin. Developed by Jacob White and Keith Nabors at MIT, FastCap's run-time is several orders of magnitude less than that required by the previous state of the art. The first widely successful use of multilevel optimization in VLSICAD came in 1997 with the hMETIS package for circuit partitioning, described in Chapter 3 of this book. Currently, efforts are underway to extend the multilevel framework to solve other VLSICAD problems, such as placement (Chapter 4) and routing (Chapter 5).

Experience so far confirms the power and generality of the multiscale approach but also points to large gaps in our understanding. For a given problem, what properties are essential to the level-specific relaxation algorithm? What are the requirements of an effective hypergraph coarsening scheme? Does simple clustering suffice, or can results be improved by associating a given point at a finer level with several points at the adjacent coarser level? If so, then how? When the problem to be solved can be approximated by a linear system of equations, many satisfactory answers to these questions are available. But in the case of a large-scale, combinatorial optimization problem, linearization is impractical. Most multilevel algorithms in this setting use iterative improvement only in the interpolation phase. Can they be improved by appropriate use of relaxation during coarsening as well?

The work presented here is contributed by participants of the 2001 Workshop on Multilevel Optimization in VLSICAD sponsored by the UCLA Institute for Pure and Applied Mathematics. The purpose of the workshop was to increase interaction among scientists, engineers, and mathematicians studying either multilevel optimization or VLSICAD. The diverse content of this volume reflects the diverse interests of the workshop participants.

Chapter 1 presents a broad and general overview of practical methods for multiscale optimization. A concise presentation of basic principles is followed by a description of the geometric origins of multiscale methods in partial differential equations (PDE) and their algebraic extensions. Further descriptions of techniques for non-local equations, nonquadratic nonlinear systems, eigenproblems, ill-posed problems, and constrained, global optimization convey some sense of the scope, variation, and generality of the multiscale heuristic. The view in this chapter is toward the essential mathematical properties common to useful algorithms as well as specific adaptations to distinct problem domains.

Chapter 2 relates recent successes of multiscale algorithms in two classic problems of large-scale combinatorial optimization. Methods for the graph partitioning and traveling salesman problems are considered in detail. The approach taken here is to find coarsening strategies that can be used with existing local-search algorithms to accelerate convergence and improve solution quality. Coarsening in the graph partitioning problem is achieved by recursive maximal matching. The Kernighan-Lin algorithm is used for relaxation. Balancing the weights of the clusters can be done during coarsening or relaxation, or both. For the traveling salesman problem, coarsening proceeds by repeatedly matching pairs of adjacent vertices and fixing the corresponding edges. Chains of edges can be replaced by single edges retaining only the endpoints of the chain. When edges are unpacked during interpolation, alternative connections are investigated according to the chained Lin-Kernighan heuristic. Experiments on both problems produce similar results. When the data graph input is not too dense, the approach consistently improves on the given local-search method. Beyond a certain density threshold, however, the multilevel approach taken here is less competitive. The difficulty seems to lie not so much in the lack of sparsity as in the lack of natural coarsening heuristics for the denser problems.
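One coarsening step of the kind described for graph partitioning, matching followed by contraction, might look as follows. This is a sketch under stated assumptions rather than the chapter's implementation: the function names are hypothetical, the graph is stored as a dictionary of weighted edges, and edges are visited heaviest first, which is one common way to obtain a maximal matching.

```python
from collections import defaultdict

def maximal_matching(edges):
    """Greedy maximal matching; heaviest-first order is an assumption of this sketch."""
    matched = {}
    for (u, v), _weight in sorted(edges.items(), key=lambda kv: -kv[1]):
        if u not in matched and v not in matched:
            matched[u], matched[v] = v, u
    return matched

def contract(vertex_weight, edges, matched):
    """Merge each matched pair into one coarse vertex and rebuild the edge set."""
    cluster, coarse_weight = {}, []
    for v in vertex_weight:
        if v in cluster:
            continue
        cid = len(coarse_weight)
        cluster[v] = cid
        weight = vertex_weight[v]
        if v in matched:
            cluster[matched[v]] = cid
            weight += vertex_weight[matched[v]]
        coarse_weight.append(weight)  # cluster weight = sum of member weights

    coarse_edges = defaultdict(int)
    for (u, v), w in edges.items():
        cu, cv = cluster[u], cluster[v]
        if cu != cv:  # edges inside a cluster disappear
            coarse_edges[(min(cu, cv), max(cu, cv))] += w  # parallel edges add up
    return coarse_weight, dict(coarse_edges), cluster


# A path a-b-c-d with unit vertex weights.
vertex_weight = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 2}
matched = maximal_matching(edges)
coarse_weight, coarse_edges, cluster = contract(vertex_weight, edges, matched)
print(coarse_weight, coarse_edges)
```

Summing vertex weights during contraction is what allows cluster weights to be balanced at coarser levels.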

Chapter 3 surveys the hMETIS family of multiscale algorithms for hypergraph partitioning, a problem of direct and enormous significance both in VLSICAD and elsewhere. Coarsening in hMETIS proceeds by the first-choice method, in which each vertex is matched to another vertex for which the sum of the hyperedge weights they share is maximum. The matching is independent of the order in which vertices are considered and results in connected components of vertices rather than just isolated pairs. Each such component becomes a cluster at the adjacent coarser level. Hyperedges at the finer level are transferred directly to the coarser level but shrink in cardinality along the way, as their constituent vertices are merged into clusters. Singleton hyperedges are eliminated and duplicate hyperedges are merged. In this way, cutsize at coarser levels corresponds directly to cutsize at the finest level as well. Relaxation is an accelerated variation of the Fiduccia-Mattheyses algorithm (FM). Because the best partition at the coarsest level need not lead to the best partition at the finest level, multiple partitions at the coarsest level are retained and propagated back toward the finest level. For efficiency, a fixed fraction of these are discarded at each level of the interpolation phase until only one candidate remains.
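The transfer of hyperedges to the coarser level can be illustrated in a few lines. The sketch below is not hMETIS code: `contract_hyperedges` is a hypothetical helper, the vertex-to-cluster map is taken as given, and duplicate hyperedges are merged by adding their weights, an assumption made here so that the weight of cut hyperedges is preserved between levels.

```python
from collections import defaultdict

def contract_hyperedges(hyperedges, cluster):
    """Map hyperedges onto clusters, dropping singletons and merging duplicates."""
    merged = defaultdict(int)
    for pins, weight in hyperedges:
        # Vertices merged into the same cluster collapse to one pin,
        # so the hyperedge shrinks in cardinality.
        coarse_pins = frozenset(cluster[v] for v in pins)
        if len(coarse_pins) < 2:
            continue  # a singleton hyperedge can never be cut; eliminate it
        merged[coarse_pins] += weight  # duplicates merge (weights summed: an assumption)
    return dict(merged)


cluster = {"a": 0, "b": 0, "c": 1, "d": 2}
hyperedges = [
    ({"a", "b"}, 1),       # collapses to a singleton and is eliminated
    ({"a", "c", "d"}, 1),  # becomes {0, 1, 2}
    ({"b", "c", "d"}, 2),  # also {0, 1, 2}: a duplicate, merged with the previous one
    ({"a", "b", "c"}, 1),  # shrinks to {0, 1}
]
coarse = contract_hyperedges(hyperedges, cluster)
print(coarse)
```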

Chapter 4 describes recent work on both top-down hierarchical placement and multilevel placement. In the authors' multilevel scheme, mPL, coarsening proceeds by vertex matching on a clique-model graph approximation. Vertices are matched according to estimates of their min-cut connectivity, computed in O(N log N) time by the CAPFOREST algorithm of Nagamochi and Ibaraki. At the coarsest levels (500-1000 clusters), continuous approximation and nonconvex nonlinear programming are used. After legalization by linear assignment, the resulting placements are further refined by permuting clusters with a variation of Goto's algorithm. At finer levels, only this discrete relaxation is used. Compared to GORDIAN-L, mPL obtains an order-of-magnitude speed-up, but at some cost in solution quality (5-10%).

Chapter 5 describes recent work on multilevel gridless VLSI routing. The given routing region is first partitioned into small rectangular regions called tiles. A line-sweeping algorithm is used to estimate the routing resources available to each tile. Clusters of adjacent tiles are merged to form the next coarser level, resources of clusters being sums of the resources of their components. A multicommodity flow method is employed at the coarsest level to produce an initial global routing. Modified maze-search routing is used as relaxation during interpolation. At the last and finest level of the interpolation phase, gridless detailed routing is used to determine precise wire locations. Compared to existing methods, improved solutions are obtained while the run-time is decreased by factors of 3 to 75.
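The rule that a cluster's routing resources are the sum of its tiles' resources can be shown on a small grid. This sketch assumes, purely for illustration, that tiles sit on a rectangular grid, that each tile has a single scalar resource estimate, and that clusters are 2x2 blocks of adjacent tiles; `merge_tiles` is a hypothetical name, and the chapter's actual resource model is richer than this.

```python
def merge_tiles(resources):
    """Merge 2x2 blocks of adjacent tiles; a cluster's resource is the sum of its tiles."""
    rows, cols = len(resources), len(resources[0])
    coarse = []
    for r in range(0, rows, 2):
        coarse_row = []
        for c in range(0, cols, 2):
            # Blocks on the boundary may hold fewer than four tiles.
            block = [resources[i][j]
                     for i in range(r, min(r + 2, rows))
                     for j in range(c, min(c + 2, cols))]
            coarse_row.append(sum(block))
        coarse.append(coarse_row)
    return coarse


tiles = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
level1 = merge_tiles(tiles)   # 2x2 grid of clusters
level2 = merge_tiles(level1)  # a single cluster covering the whole region
print(level1, level2)
```

Because each level only sums the level below, the total resource over the region is the same at every level of the hierarchy.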

Chapter 6 departs from the multilevel view, focusing on optimization at different stages in the hierarchical design of a reconfigurable system. The design paradigm supports incremental changes and enhanced flexibility across all levels of abstraction, from compilation through high-level synthesis to resource binding and physical implementation. Compilation translates high-level reconfiguration source code to a control data flow graph representation. Communication overhead, measured by counting reads and writes to RAM, is minimized by a variation of Static Single Assignment. Customized block selection is performed by constrained greedy clustering on an overlap graph. Simultaneous scheduling and binding is reduced to bipartite matching subproblems on a data flow graph. Together, the algorithms in this chapter form a complete flow for the hierarchical design of a reconfigurable system.

Finally, Chapter 7 describes a continuous multilevel approach to discretized optimization problems with differential-equation constraints. Two model problems are considered: one with hyperbolic PDE constraints (a linear advection equation) and one with elliptic PDE constraints (Laplace's equation). The first of these is related to substrate current modeling, the second, to force-directed overlap removal. While solving the differential equations alone by multigrid might not be effective, a multilevel method for solving the entire optimization problem at once is highly successful.

The continuing exponential scaling of on-chip integration poses organizational challenges that can be met only by fundamental advances in the scalability and scope of VLSI design algorithms. Practical algorithms are a synthesis of general ideas and problem-specific heuristics. The multilevel idea demonstrates clear potential for improved algorithms, but making effective syntheses for real problems is far from easy. We hope that this book will facilitate further advances of multilevel optimization in VLSICAD.

Acknowledgments

This book is a byproduct of the 2001 Workshop on Multilevel Optimization in VLSICAD hosted by the UCLA Institute for Pure and Applied Mathematics (IPAM). The editors thank all participants of the workshop and the excellent IPAM staff. In particular, for their invaluable assistance with the workshop's organization and administration, we recognize the efforts of Directors Tony Chan and Mark Green; Assistant Director Eilish Hathaway; Information Systems Manager Carl Hunt; and Administrative Specialist Lee Melreit.

Financial support for our work in editing this book comes from the Semiconductor Research Consortium (SRC grant 99-TJ-686), the National Science Foundation (NSF grant CCR-9901153), and gifts from the IBM and Intel corporations. We gratefully acknowledge their contributions.

Jason Cong and Joseph Shinnerl
{cong,shinnerl}@cs.ucla.edu

University of California, Los Angeles
September, 2002
