Hierarchical Physical Design Methodology for Multi-Million Gate Chips
Session 11
Wei-Jin Dai
Overview
• Introduction
• Challenges of hierarchical design
• Hierarchical methodology – Full chip physical prototyping
• Performance data
• Summary
Introduction
• As chip size and complexity grow, a hierarchical design approach becomes necessary
• Over the last 12 months, the number of chips designed with a hierarchical approach has increased sharply
• The advantage of the hierarchical approach is divide-and-conquer
The Challenges
• How to get full-chip (10 million+ gates) physical reality early on to identify potential problems?
• How to have a convergent process that reaches design closure from beginning to end?
• How to achieve die utilization similar to the “flat” approach?
• How to achieve clock speed and skews similar to the “flat” approach?
• How to automatically generate optimal pin assignments for each module?
• How to automatically come up with realistic timing budgets for each module?
• How to achieve top-level timing/signal integrity closure?
Creating the Physical Prototype
• A full-chip flat prototype delivers the complete physical, timing, clock, and power data
  – Eliminates the guesswork of traditional block-based approaches
• Drives the partitioning into manageable blocks
Flat Full-Chip Delivers an Accurate Physical Prototype
Prototyping Starts Early in the Flow
• Most accurate view possible at all design stages
• Physical timing budgeting drives synthesis
[Figure: prototyping runs through the Estimation, Refinement, Optimization, and Design Completion stages as the input matures from RTL/black box to 75% netlist/black box to complete netlist, producing initial and then refined timing budgets]
Hierarchical Design Flow
[Figure: hierarchical design flow, with these elements]
• Inputs: LEF/GDSII, RTL/black box, process data
• Flat full-chip physical prototype: quick synthesis, floor planning, placement, CTS, trial route
• Physically feasible? Decision on die size, timing, clock skew, power, and SI, with a NO branch
• Physical partitioning: pin assignment, timing budget, clock spec, and power grid, handed off as partition data for each block
• Block implementation: place, CTS, optimize
• Top-level implementation: CTS, optimization, power
• Other labeled data: DEF placement, chip-level timing constraints, optimized top-level netlist
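The "Physically Feasible?" decision in the flow can be sketched as a comparison of prototype metrics against chip-level targets. This is only an illustration: the slides name the five criteria (die size, timing, clock skew, power, SI) but not how they are measured, so the field names, units, and thresholds below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrototypeMetrics:
    # Hypothetical measurements taken from a flat full-chip prototype run.
    die_area_mm2: float
    worst_slack_ns: float
    clock_skew_ps: float
    power_w: float
    si_violations: int


@dataclass
class Targets:
    # Hypothetical chip-level limits; real projects define their own.
    max_die_area_mm2: float
    min_slack_ns: float
    max_clock_skew_ps: float
    max_power_w: float
    max_si_violations: int


def failed_checks(m: PrototypeMetrics, t: Targets) -> list:
    """Return the names of the feasibility criteria that are not met."""
    checks = {
        "die size": m.die_area_mm2 <= t.max_die_area_mm2,
        "timing": m.worst_slack_ns >= t.min_slack_ns,
        "clock skew": m.clock_skew_ps <= t.max_clock_skew_ps,
        "power": m.power_w <= t.max_power_w,
        "SI": m.si_violations <= t.max_si_violations,
    }
    return [name for name, ok in checks.items() if not ok]


def physically_feasible(m: PrototypeMetrics, t: Targets) -> bool:
    """Partitioning proceeds only when every criterion passes."""
    return not failed_checks(m, t)
```

Returning the list of failed criteria, rather than a bare yes/no, shows which part of the prototype needs another iteration before partitioning.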
Hierarchical Partitioning
• Pin assignment
• Timing budgeting
• Clock tree generation
• Power grid planning
[Figure: partitioning, followed by independent block-level implementation, followed by SoC assembly]
Accurate Pin Assignment
• Full-chip prototype results in optimal pin placement
  – Results in narrower channels and reduced die size
  – Reduces the routing congestion
  – Improves the chip timing
[Figure: accurate physical prototype (flat full-chip) shown beside the top-level partition view]
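One way to picture how a flat prototype can suggest pin locations is to place each block pin where a connection leaves the block outline. The slides do not describe the actual algorithm, so the sketch below rests on a stated assumption: each connection is approximated by a straight line between an endpoint inside the block and one outside it. The function name and coordinates are hypothetical.

```python
def boundary_crossing(block, inside, outside):
    """Return the point where the segment inside -> outside exits the block.

    block   : (x0, y0, x1, y1) rectangle, with x0 < x1 and y0 < y1
    inside  : (x, y) endpoint located within the block
    outside : (x, y) endpoint located beyond the block outline
    """
    x0, y0, x1, y1 = block
    px, py = inside
    dx, dy = outside[0] - px, outside[1] - py

    # Parametric exit time: the first boundary reached when moving
    # from the inside point toward the outside point.
    t_exit = 1.0
    if dx > 0:
        t_exit = min(t_exit, (x1 - px) / dx)
    elif dx < 0:
        t_exit = min(t_exit, (x0 - px) / dx)
    if dy > 0:
        t_exit = min(t_exit, (y1 - py) / dy)
    elif dy < 0:
        t_exit = min(t_exit, (y0 - py) / dy)

    return (px + t_exit * dx, py + t_exit * dy)


# A cell at the centre of a 10 x 10 block driving a load to the right
# suggests a pin in the middle of the right-hand edge.
print(boundary_crossing((0, 0, 10, 10), (5, 5), (15, 5)))
```

Pins placed this way sit on the edge facing their connections, which is consistent with the narrower channels and reduced congestion listed above.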
Timing Budgeting
Each block requires:
• Clock definition
• set_input_delay
• set_output_delay
• set_drive
• set_load
• Path exceptions (false, multicycle paths)
[Figure: Block 1, Block 2, and Block 3]
Accurate timing budgets result in predictable timing convergence
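As a rough illustration of how a budget turns into the per-block constraints listed above, the sketch below splits one clock period across a path that runs from Block 1, over top-level wiring, into Block 2. It assumes a simple proportional split of estimated delays; the slides do not say which budgeting rule the tool applies. The function names, port names, clock name, and delay values are all hypothetical.

```python
def budget_path(period_ns, est_delays_ns):
    """Scale estimated segment delays so that they sum to the clock period."""
    total = sum(est_delays_ns.values())
    if total <= 0:
        raise ValueError("estimated delays must sum to a positive value")
    scale = period_ns / total
    return {segment: delay * scale for segment, delay in est_delays_ns.items()}


def block_constraints(period_ns, budgets, clock, out_port, in_port):
    """Turn path budgets into one SDC line per block (hypothetical port names)."""
    # Block 1 treats everything after its output pin as external delay.
    out_delay = period_ns - budgets["block1"]
    # Block 2 treats everything before its input pin as arrival time.
    in_delay = budgets["block1"] + budgets["top"]
    return {
        "block1": f"set_output_delay {out_delay:.2f} -clock {clock} [get_ports {out_port}]",
        "block2": f"set_input_delay {in_delay:.2f} -clock {clock} [get_ports {in_port}]",
    }


# Hypothetical prototype estimates for one path, with a 10 ns clock period.
budgets = budget_path(10.0, {"block1": 3.0, "top": 1.0, "block2": 4.0})
for line in block_constraints(10.0, budgets, "clk", "data_out", "data_in").values():
    print(line)
```

Because the segment estimates come from the full-chip prototype rather than from guesses, each block is implemented against a share of the period that reflects its real placement and wiring.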
Hierarchical Clock Tree Synthesis
• Accurate physical timing data enables the creation of an optimal clock tree
  – Block-level followed by top-level clock tree
• Final clock tree routing generates near zero skew
  – Balanced tree at the top level
Worst block skew + zero top-level skew = 150 ps total clock skew
[Figure: balanced clock tree over blocks with skews of 150 ps, 120 ps, 50 ps, 50 ps, 100 ps, and 130 ps]
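The skew sum on this slide can be checked with a few lines. The six skew values are the ones shown in the figure; the block names are placeholders, since the slide does not label the blocks.

```python
# Block-level clock skews in picoseconds, as shown in the figure.
# The keys are placeholder names; the slide does not name the blocks.
block_skews_ps = {"A": 150, "B": 120, "C": 50, "D": 50, "E": 100, "F": 130}


def total_clock_skew(block_skews, top_level_skew_ps=0):
    """Worst block skew plus the skew of the top-level tree."""
    return max(block_skews.values()) + top_level_skew_ps


print(total_clock_skew(block_skews_ps))
```

With a balanced top-level tree contributing zero skew, the chip-level figure is set entirely by the worst block.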
Full Chip Power Analysis
Hierarchical Power Grid Design
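• Power/ground (P/G) networks are planned at the full-chip level
• The P/G network is automatically pushed down into the blocks during partitioning
[Figure: full-chip power grid and the portion pushed down into a block]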
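A minimal sketch of the push-down idea, assuming the chip-level grid is described by the x-coordinates of its vertical stripes and that a block simply inherits the stripes that cross it, expressed in block-local coordinates. The slides do not describe the tool's data model, so the function name and all coordinates are hypothetical.

```python
def push_down_stripes(chip_stripes_x, block_x0, block_x1):
    """Return the chip-level stripes that cross a block, in block-local x."""
    return [x - block_x0 for x in chip_stripes_x if block_x0 <= x <= block_x1]


# Chip-level vertical stripes every 100 units; a block spanning x = 150..350
# inherits the stripes at chip x = 200 and x = 300.
print(push_down_stripes([0, 100, 200, 300, 400], 150, 350))
```

Because the block-level stripes are derived from the chip-level plan, they line up with the top-level grid again at assembly time.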
Performance Data
Design Description         Netlist to SDF Time
1.8M cells; 200 macros     6 hours
900K cells                 3 hours
2.3M cells; 700 macros     14 hours
2M cells; 100+ macros      5 hours
2.8M cells                 10 hours
1.7M cells; 70 macros      5 hours
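To compare these runs on a common scale, the cell counts and run times can be turned into a rough throughput figure. The numbers come from the table; the cells-per-hour metric is an added convenience, not something the slides report.

```python
# (design description, cell count, netlist-to-SDF hours) from the table.
DESIGNS = [
    ("1.8M cells; 200 macros", 1_800_000, 6),
    ("900K cells", 900_000, 3),
    ("2.3M cells; 700 macros", 2_300_000, 14),
    ("2M cells; 100+ macros", 2_000_000, 5),
    ("2.8M cells", 2_800_000, 10),
    ("1.7M cells; 70 macros", 1_700_000, 5),
]


def cells_per_hour(cells, hours):
    """Average number of cells processed per hour of run time."""
    return cells / hours


for name, cells, hours in DESIGNS:
    print(f"{name:24s} {cells_per_hour(cells, hours):>10,.0f} cells/hour")
```

On this measure the design with 700 macros is the slowest, which suggests that macro count affects run time as well as cell count.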
High Performance Environment
Stage               Speedup   First Encounter   Traditional
Design Import       60x       4 min             4 hr
Detail Place        1x        2 hr 50 min       3 hr 20 min
Detail Route*       56x       8 min             7 hr 30 min
RC Extract          57x       6 min             5 hr 45 min
Delay Calculation   33x       7 min             3 hr 50 min
Timing Analysis     7x        20 min            2 hr 15 min
IPO                 5x        1 hr 50 min       9 hr
Design Iteration    6x        5 hr 25 min       35 hr 40 min
(*) SPC Trial Route
• Design: 580K cells, 0.25um process, 5LM, 100MHz
• Data collected on a 500MHz processor workstation
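The chart's figures can be cross-checked by summing the per-stage times. The pairing of values to stages and to the two flows is read from the chart layout and should be treated as an assumption. In particular, the Detail Place times are assigned so that the First Encounter column stays the faster one. Under that reading, the seven stage times add up exactly to the Design Iteration entry for each flow.

```python
# (stage, First Encounter minutes, traditional-flow minutes), as read from the chart.
STAGES = [
    ("Design Import", 4, 4 * 60),
    ("Detail Place", 2 * 60 + 50, 3 * 60 + 20),
    ("Detail Route", 8, 7 * 60 + 30),
    ("RC Extract", 6, 5 * 60 + 45),
    ("Delay Calculation", 7, 3 * 60 + 50),
    ("Timing Analysis", 20, 2 * 60 + 15),
    ("IPO", 60 + 50, 9 * 60),
]


def totals(stages):
    """Sum the per-stage minutes for each flow."""
    first_encounter = sum(fe for _, fe, _ in stages)
    traditional = sum(trad for _, _, trad in stages)
    return first_encounter, traditional


def fmt(minutes):
    """Format minutes the way the chart labels them."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} hr {mins} min"


fe_total, trad_total = totals(STAGES)
print(fmt(fe_total), "vs", fmt(trad_total))
print(f"overall speedup: {trad_total / fe_total:.1f}x")
```

The same ratio of traditional time to First Encounter time reproduces the per-stage labels, for example 240 min / 4 min = 60x for Design Import.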
High Accuracy of the Prototype
• The prototype closely correlates with post-route layout
  – Comparison to ‘tape-out’ back-end flow
  – More than 90% of the interconnect and IO path delays within 2%
Design: 5LM, 0.25um, 580K cells, 620K nets, 572 I/Os, 4 blocks
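A correlation figure of this kind can be computed by comparing each path delay from the prototype against its post-route value. The sketch below shows one way to do that. The function name and the sample delays are made up for illustration and are not data from the slides.

```python
def fraction_within(prototype_ns, post_route_ns, tolerance=0.02):
    """Fraction of paths whose prototype delay is within tolerance of post-route."""
    if len(prototype_ns) != len(post_route_ns):
        raise ValueError("both lists must describe the same paths")
    if not prototype_ns:
        raise ValueError("at least one path is required")
    hits = sum(
        1
        for proto, final in zip(prototype_ns, post_route_ns)
        if abs(proto - final) <= tolerance * abs(final)
    )
    return hits / len(prototype_ns)


# Made-up delays for four paths; the third one differs by about 9%.
prototype = [1.00, 2.03, 3.00, 4.50]
post_route = [1.01, 2.00, 3.30, 4.50]
print(fraction_within(prototype, post_route))
```

Measuring the error relative to the post-route delay keeps the 2% criterion meaningful for both short and long paths.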
Summary
SoC Hierarchical Methodology
• Build a full-chip physical prototype early on
  – Start at RTL
  – Identify problems early
• Achieve design closure before partitioning
  – Close full-chip timing
  – Optimize die size
  – Meet power requirements
  – Resolve signal integrity issues
• Maintain the design closure throughout the design process