Large-Scale Optimization in VLSI CAD

Igor Markov
http://www.eecs.umich.edu/~imarkov

Nov 1, 2000 Igor Markov, U. of Michigan

Goals/Outline of the Talk

Give a general idea about the field
– success stories and applications
– potential for cross-pollination
What drives the field
Reusable Intellectual Property in CAD
Consequences of “large-scale”
Sample wide-open problems


General (VLSI CAD)

Very Large Scale Integration
– numerous components + interconnect
– emergent properties
• not apparent in isolated components
Computer-Aided Design
– better than human design (super-human!)
– and then some

FOR MORE INFO...

http://www.eecs.umich.edu/~imarkov/EECS527


Integrated Circuits

Excellent examples of “large systems”
– manufacturing is enormously expensive
• research can prevent blunders… and pays off
– two Moore’s laws keep everyone busy
• circuits are growing
• circuit design is getting harder
– shrinking market windows
• must design quickly (or else…)
– digital circuits are amenable to automated manipulation
• have a lot of regularity (easier to represent)


Just How Large?

As large as we can handle
– a priori (physical) limits are at least 20 years away
– pushing the boundaries is our goal
Current limits
– need to solve many NP-hard problems
– poor understanding and poor mathematical models
– lack of efficient algorithms
(typical problem sizes will follow)


Design via Optimization

Think of all possible design solutions
– “solution space”
– need to choose one solution (or several)
What parameters should be optimized?
– “objective functions” f1(x), f2(x),…
Need to observe design constraints
The EDA revolution of the 1980s:
– searching, combinatorial and mathematical optimization may outperform engineering intuition when implemented in software


A Meta-Approach to Optimization

Global Optimization
– often cannot optimize “accurate” objectives
• they can be hopeless to evaluate
• e.g., min routed wirelength as f(placement)
– find simpler objectives that correlate well
– ditto for constraints
Detailed Optimization
– improve global solutions by local search
• can now worry about weird constraints
• can optimize a better measure of signal delay, etc.


Consequences of “Large-Scale”

Runtimes must scale near-linearly
– strict limitation on used primitives (e.g., no Gaussian elimination)
– wide-spread use of multi-level methods
Same goes for memory consumption
– cannot represent graphs as dense matrices
– use random sampling/walks instead of enumeration
Trading solution quality for runtime
– especially for randomized algorithms
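A back-of-the-envelope sketch of why dense matrices are ruled out. The numbers are assumptions chosen for illustration (a 1M-vertex graph, one byte per matrix entry, an average of 4 neighbors per vertex, 4-byte vertex indices); they are not taken from the talk.

```python
def dense_matrix_bytes(num_vertices, bytes_per_entry=1):
    """Memory for a full num_vertices x num_vertices adjacency matrix."""
    return num_vertices * num_vertices * bytes_per_entry


def adjacency_list_bytes(num_vertices, avg_degree, bytes_per_index=4):
    """Memory for adjacency lists: one stored index per (vertex, neighbor) pair."""
    return num_vertices * avg_degree * bytes_per_index


n = 1_000_000  # assumed problem size
print(dense_matrix_bytes(n))                  # quadratic in n
print(adjacency_list_bytes(n, avg_degree=4))  # linear in n
```

Under these assumptions the dense matrix needs about a terabyte, while the sparse representation needs about 16 MB and grows linearly with the circuit.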


Historic Opportunism

In early days of VLSI CAD…
– the Electronic Design Automation revolution
• enabling, but short-lived results (can easily do better)
• e.g., “this new algorithm addresses objective f(x)”
– many proposed approaches never picked up
As ICs became larger, most CAD tools could not handle leading-edge circuits…
– “algorithms for Deep SubMicron circuits”
– soon turned out that many algos were weak
• partitioning, placement, SAT, etc.


Competitiveness

Outdated algorithms cause costly software rewrites and lost opportunity
– commercial tools may sell for $400,000+
Learning circuit physics, optics, semiconductor technologies, applied math, CS theory, AI, databases, proper software design, etc. is well worth the effort
– competitive edge
As a result of competitiveness, VLSI CAD offers
– some of the best algorithms, very strong implementations
– frequent contributions to other fields


Success Stories

Min-cut [hyper-] graph partitioning
– (“very good” solutions)
– 200K 0/1 variables, 1-2 mins of CPU time
Minimal Steiner trees (optimal)
– hundreds of points in 1 second
Provably good routing (approximation)
– 500K nets in several hours (!!!)


Min-cut Partitioning

Given
– [hyper-] graph
– k bins
• each accommodates up to N vertices
Seek
– to assign each vertex to a bin
Minimize
– # of [hyper-] edges between bins
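The formulation above can be sketched in a few lines. This only evaluates a given assignment (objective and capacity constraint); it is not a partitioning algorithm. The function names and the toy hypergraph are made up for illustration.

```python
def cut_size(hyperedges, assignment):
    """Number of hyperedges whose vertices land in more than one bin."""
    return sum(1 for edge in hyperedges
               if len({assignment[v] for v in edge}) > 1)


def is_feasible(assignment, num_bins, capacity):
    """True if every bin holds at most `capacity` vertices."""
    counts = [0] * num_bins
    for bin_index in assignment.values():
        counts[bin_index] += 1
    return all(c <= capacity for c in counts)


# Toy instance (hypothetical): 4 vertices, 4 hyperedges, k = 2 bins, N = 2.
hyperedges = [("a", "b"), ("b", "c", "d"), ("c", "d"), ("a", "d")]
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}

print(cut_size(hyperedges, assignment))                    # 2
print(is_feasible(assignment, num_bins=2, capacity=2))     # True
```

Here the hyperedges ("b", "c", "d") and ("a", "d") span both bins, so the cut is 2.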


Min-cut Partitioning (cont’d)

Numerous apps in VLSI CAD + beyond
– supercomputing, data mining, Internet,…
Progress in partitioning algorithms
– started in 1972 and still going
• many approaches invented / discarded
– now can auto-partition 1M-gate circuits
• better than manually, with free software
• couldn’t, even commercially, just 3 years ago
• (this has nothing to do with Deep SubMicron)


Min-cut Partitioning (cont’d)

UCLA MLPart (ASPDAC 2000)
– faster than hMetis per start
– returns better solutions on average
– never more than 5% worse than hMetis
– sometimes (ibm06, 2%aa) 30% better
– available in source code (C++) and binaries
• at “the bookshelf”, free for any use w/o notification
Used at Cadence, Intel, start-ups
Vital to UCLA Capo placer


Steiner Minimal Trees

Given
– k points in the plane
Seek
– a Steiner tree connecting the points
• add extra points
• connect all points by straight-line segments
Minimize
– total edge-length of the tree


Steiner Minimal Trees (cont’d)

Applications
– routing signal nets
– connecting cities by highways
1989, Scientific American
– “cannot find an SMT for 100 US cities”
1999, SODA (Warme/Zachariasen)
– with GeoSteiner can do that in <1 sec
– implementation available in source code


Routing of Multiple Nets

Given
– n-tuples of locations to be connected
• with Steiner trees (think of signal nets)
Constraints (not trivial to satisfy!)
– routes cannot occupy same space
Minimize
– total length of routes, “congestion”


Routing of Multiple Nets

One of the first circuit design automations (late 1960s)
Has enormous solution space
A classic AI problem
Current commercial tools (e.g., Cadence)
– up to a day for 500K nets, no guarantees
ISPD 2000, Albrecht (using multi-commodity flows)
– 500K nets in several hours, within 20% of opt.
– (IBM Power 3 chip)


What Makes a Break-through? (or at least a splash)

Study sample splashes
Is it enough to minimize a function? (function - relevant, minimization - efficient)
– Yes
– Yes, but …
– No
– Absolutely not


Background: VLSI Placement

[Figure: a bad placement next to a good placement]


Global WL-driven Placement

Objective
– total Half-Perimeter WireLength (HPWL)
– approximates Steiner Minimal Tree
UCLA Capo placer (DAC 2000)
– beats Cadence QPlace on many benchmarks
• <50K gates; unpublished: 30% better on a 280K-gate benchmark
• compared by “routed WL” after Cadence WarpRoute
• in congestion-driven mode; 1 routing violation = failure
– used for research at IBM, Intel, Philips; CMU,…
– available in source code (C++), free for any use
– (timing-driven mode not yet released)
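The HPWL objective named above is the half-perimeter of each net's bounding box, summed over all nets. A minimal sketch follows; the cell names and coordinates are invented for illustration and have nothing to do with Capo's internals.

```python
def hpwl(net, positions):
    """Half-perimeter of the bounding box of one net's cell locations."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, positions):
    """Sum of per-net HPWL: the global placement objective."""
    return sum(hpwl(net, positions) for net in nets)


# Hypothetical placement: cell name -> (x, y)
positions = {"a": (0, 0), "b": (3, 0), "c": (3, 4), "d": (1, 1)}
nets = [("a", "b"), ("a", "c", "d"), ("b", "d")]

print(total_hpwl(nets, positions))  # 3 + 7 + 3 = 13
```

Evaluating HPWL takes time linear in the number of pins, which is one reason it serves as the "simpler objective that correlates well" with routed wirelength.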


Background: Detailed Placement

Detailed circuit placement
– given locations of circuit elements (“cells”), improve them by local changes (e.g., swaps)
– minimize total length of signal nets
“Local”, but large-scale problem
– entails a very large number of small sub-problems
Practically important
– local improvements directly translate to large scale
– very similar to floorplanning (a high-level problem)


Background: Detailed Placement

• Naïve “detailed” optimization
- consider 7-8 “cells” at a time
- enumerate all permutations
- compute HPWL for each
- pick the best permutation
- repeat for another group of 7-8
• Larger groups give better solutions
- practical limit: 0.01 sec per group
• Use Branch-and-bound for each group (ISPD ’99)
• Overall linear runtime
• Easy parallelization (optimize many groups in parallel)
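The naïve enumeration described above can be sketched as follows. This is a simplified illustration, not the ISPD ’99 branch-and-bound: it assumes a single row of equal-width cells (so a cell's x-coordinate is its index in the row), uses 1-D HPWL, and recomputes the full cost for every permutation, whereas a real tool would re-evaluate only the nets touching the group. All names are hypothetical.

```python
from itertools import permutations


def hpwl_1d(nets, x):
    """1-D half-perimeter wirelength: span of each net along the row."""
    return sum(max(x[c] for c in net) - min(x[c] for c in net) for net in nets)


def optimize_window(order, start, size, nets):
    """Try every permutation of order[start:start+size]; keep the cheapest."""
    best_order, best_cost = list(order), None
    window = order[start:start + size]
    for perm in permutations(window):
        candidate = order[:start] + list(perm) + order[start + size:]
        x = {cell: i for i, cell in enumerate(candidate)}
        cost = hpwl_1d(nets, x)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = candidate, cost
    return best_order, best_cost


def sweep(order, nets, size=3):
    """Optimize disjoint groups of `size` cells, left to right."""
    order = list(order)
    for start in range(0, len(order) - size + 1, size):
        order, _ = optimize_window(order, start, size, nets)
    x = {cell: i for i, cell in enumerate(order)}
    return order, hpwl_1d(nets, x)


row = ["a", "b", "c", "d", "e", "f"]
nets = [("a", "c"), ("d", "f"), ("c", "d")]
print(sweep(row, nets, size=3))
```

A group of 8 cells already has 8! = 40,320 permutations, which is why the per-group time budget caps the group size and why branch-and-bound pruning pays off.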


Optimal Interleaving

ICCAD 2000, Hur and Lillis (TR available)
Can handle 30+ elements at a time
– easier to implement than B&B
– the order constraint turns out very mild
Very good result
– but, seemingly, nothing more than min f(x)!
Example: the sequences A B C D E and 1 2 3 4 5 can be interleaved as A 1 2 B C 3 4 D 5 E
Solved optimally in O(n^2) time by Dynamic Programming
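To show where the O(n^2) bound comes from, here is a sketch of a dynamic program over prefix pairs. It is not the Hur/Lillis algorithm: it assumes a simplified, separable cost `slot_cost(cell, slot)` (a hypothetical stand-in for wirelength), while the published method handles net-based wirelength. The table has (n+1)(m+1) states with constant work per state.

```python
def optimal_interleaving(a, b, slot_cost):
    """Merge sequences a and b, keeping each one's internal order, so that
    the sum of slot_cost(cell, slot) is minimal.

    dp[i][j] = best cost of placing a[:i] and b[:j] into slots 0..i+j-1.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    choice = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            slot = i + j - 1  # slot taken by the last cell placed
            if i > 0:
                cost = dp[i - 1][j] + slot_cost(a[i - 1], slot)
                if cost < dp[i][j]:
                    dp[i][j], choice[i][j] = cost, "a"
            if j > 0:
                cost = dp[i][j - 1] + slot_cost(b[j - 1], slot)
                if cost < dp[i][j]:
                    dp[i][j], choice[i][j] = cost, "b"
    # Walk back through the recorded choices to recover the merged order.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if choice[i][j] == "a":
            i -= 1
            merged.append(a[i])
        else:
            j -= 1
            merged.append(b[j])
    merged.reverse()
    return dp[n][m], merged


# Hypothetical cost: distance of a cell from its preferred slot.
preferred = {"A": 0, "B": 3, "1": 1, "2": 2}
cost, merged = optimal_interleaving(
    ["A", "B"], ["1", "2"], lambda cell, slot: abs(slot - preferred[cell]))
print(cost, merged)  # 0 ['A', '1', '2', 'B']
```

The order constraint is what makes the state space small: a state needs only the two prefix lengths, not the full set of cells already placed.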


Popularity Comparison with GeoSteiner

The Hur/Lillis algorithm
– appeared several months ago (on paper)
– already implemented by several groups
• with great results
… but Warme’s GeoSteiner
– is barely used
– source code published 2 years ago
– instead, simple heuristics that are slower are used
Difference: ease of reuse!
– of result itself and/or of its representation


Intellectual Property in CAD

Reuse?
– today hundreds of VLSI CAD engineers are implementing the same, known, but difficult algorithms
Breakthroughs typically produce validated and reusable intellectual property
– yet another algorithm to min f(x) does not automatically qualify for validated, reusable CAD IP
• applicability, generality, quality of description, etc.
– CAD IP is not just algorithms and code
– CAD IP: benchmarks, evaluation techniques, empirical studies/results, algorithm analyses, etc.
Studies of CAD IP suggest:
– to effectively reuse, need infrastructure


Intellectual Property in CAD

GSRC Bookshelf for Fundamental Algorithms in CAD
– a repository for reusable CAD IP, a publication medium
– a way to communicate with industry
• problem formulations are also considered CAD IP
– http://vlsicad.cs.ucla.edu/GSRC/bookshelf
Existing bookshelf “slots” include
– SAT, Graph Coloring, Hypergraph Partitioning, Mathematical Optimization, Circuit Placement, Clock Tree Routing, Global Routing, Interconnect Optimization, etc.
Leading-edge implementations (free for all uses)
• UCLA Physical Design Tools (graph partitioners, placers, etc.)
• many more (SAT solvers from U. Michigan, GeoSteiner, etc.)


Reuse and Education

Both are necessary to sustain Moore’s laws
– not enough designers to implement new chips
– not enough CAD engineers to automate design
Need to teach/study reusable design
– hardware, software/CAD IP (similar? different?)
– note: typical “promising” research demos not reusable
Design of reusable software
– “theory” has been available for years (processes, code metrics, interface languages, modeling, robust public-domain tools, etc.)
– need [more] infrastructure, practice, experience of reuse
– first: reuse software
– then: design reusable software


Research Directions (1)

“Citius, Altius, Fortius”
– faster, leaner implementations
– higher-quality solutions
– stronger impact on applications
– aid available: latest advances in CS theory, Mathematics, AI, software engineering, etc.
Large-scale computing aspects of VLSI CAD
– memory locality (big deal for irregular circuits)
– “memory-less” algorithms (and trade-offs)



Research Directions (2)

Quantified suboptimality of heuristics
– (for NP-hard problems)
– how close can we get to optima in practice?
• estimate suboptimality of specific solutions
• study dependence on input distributions
• related to CS theory / approximation algos
– example: detection of symmetries in Logic Synthesis
• Kravets/Sakallah, ICCAD 2000 and TR
Lower bounds and impossibility arguments for fundamental algorithms


Research Directions (3)

Using better, but still computable, models of reality
– simulation as a driver for optimization
– modeling semiconductor effects
• Alpert et al, ISPD 2000: a new interconnect delay model, better than Elmore delay; all optimizations assuming Elmore are open to “porting”
• inductance, noise, etc.
– effects of statistical variations
CAD for new types of semiconductor technologies and styles
– subwavelength lithography (optical proximity correction, etc.)
– System-On-Chip (high-level partitioning, etc.)
CAD for analog circuits (including RF, MW)



Research Directions (4)

Self-conscious optimization tools
– prediction and estimation
• of solution quality before optimization
• SLIP 2001 - http://www.ee.pdx.edu/~slip
• GTX - http://www.gigascale.org/gtx
– calibration (which solutions/tools are good?)
Support for intelligent/expert users
– “computer-aided” does not always mean “w/o people”
– efficient visualization, diagnostics and interactivity
• how do you visualize a partitioning solution?
• how do you visualize many unrouted 2-pin nets in same row?


Conclusions

Large-scale optimization in VLSI CAD
– dynamic and challenging field
– benefits from other fields and gives back
– IP reuse is paramount
– research is respected and economically justified
– opportunities available

FOR MORE INFO...

http://www.eecs.umich.edu/~imarkov/
