Maxim Integrated Products
Ultra High Speed (5 GHz) Block Custom
Physical Design Flow with ICC
Prakash Jayasekharan Senior PD Engineer
Suman Musunuru Senior Design Engineer
Maxim Integrated Products
Agenda
• Challenges in High speed Physical Design
- Design Constraints, Library and Design issues
• Custom solutions with Synopsys ICC flow
- Matrix re-characterization, Synthesis improvements,
placement sensitive flow, CTS waveform balancing,
Signal EM, power
• Timing/STA correlation results
- Star-RC vs Calibre, ICC vs PT-SI
• Conclusion/Takeaways
• Appendix A
• Appendix B (scripts)
Design Constraints
• 65nm SOC design
- 2.4 million gates
- Block A and Block B @ 5 GHz (200 ps period)
- 5% late, 10% early derating (both clock and data), 5% jitter
- Target skew ~15 ps
- Transition ~20 ps
- Pulse width ~80 ps
- IR < 3% peak (or 30 mV Weff)
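The constraints above imply a small per-cycle budget. The sketch below works through the arithmetic, assuming the 5% jitter is a fraction of the clock period and that jitter and target skew come straight off the period; both are simplifying assumptions not stated on the slide, and derating is ignored.

```python
# Clock budget arithmetic for the block constraints quoted above.
# Assumptions (not stated on the slide): jitter is 5% of the period, and
# jitter plus skew subtract directly from the period; derating is ignored.

FREQ_GHZ = 5.0
TARGET_SKEW_PS = 15.0
MIN_PULSE_PS = 80.0
JITTER_FRACTION = 0.05

period_ps = 1000.0 / FREQ_GHZ                      # 1 / 5 GHz, expressed in ps
jitter_ps = JITTER_FRACTION * period_ps
usable_ps = period_ps - jitter_ps - TARGET_SKEW_PS
min_duty = MIN_PULSE_PS / period_ps                # narrowest allowed pulse as a duty cycle

print(f"period {period_ps:.0f} ps, jitter {jitter_ps:.0f} ps, "
      f"usable {usable_ps:.0f} ps, min duty {min_duty:.0%}")
```

Under these assumptions, an 80 ps minimum pulse on a 200 ps clock corresponds to a 40% duty cycle.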
Library Issues
• Re-characterization of timing libraries
- Traditional library tables add pessimism to timing delay
calculation (setup/delay values worse by at least 10 ps)
[Figure: .lib vs. SPICE comparison]
...Library issues
• Extra pessimism is not tolerable because
- 10 ps per cell accumulates along a path and becomes significant
- Paths become too tight to fix
• Library is mostly made of weak drive strength buffers,
complex gates. Realistic fanout <5
• Asymmetric clock cells cause low pulse width
• IR drop not part of timing delay tables in .lib
- No way of analyzing timing effect of IR drop
Design Issues
• An alternative lower-frequency architecture was not pursued
- It would consume 2x the area and power
• Very good skew and transition times required
- Very fast transition => higher switching power
=> higher insertion delay
- Weak clock tree cells cause more insertion delay
• > 70% of the logic is sequential. Setup (reg2reg)
timing is critical
• Decap cells for peak IR released late in the flow
- could not be added in block A
...Design Issues
• Small coupling caps (1 fF) due to size of design
- Small nets in the design do not get extracted and can be
dropped. Use coupling_abs_threshold to reduce the threshold
• 4 corners for IR/EM, 3 corners for timing
- High voltage, high temp corner added finally for IR/EM
Voltage (V)  Temp (C)  Tag     Description
0.9          125       WCCOM   Traditional worst case timing
1.1          -40       LTCOM   Traditional best case timing
0.9          -40       WCLCOM  Temp inversion corner
1.1          125       MLCOM   Worst EM/IR/Leakage
[Figure: the four corners on Voltage (0.9, 1.1) vs. Temp (-40C, 125C) axes]
Matrix re-characterization
BEFORE (3x3)
timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_3x3) {
    index_1("0.003, 0.2019, 0.9");
    index_2("0.003, 0.2019, 0.9");
    values("0.00995, 0.0199, 0.06965",\
           "0.08955, 0.1095, 0.2089",\
           "0.2189, 0.1791, 0.3184");
  }
}
AFTER (10x10): 10x10 reduces extra pessimism
timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_10x10) {
    index_1("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    index_2("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    values("0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965",\
           "0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965",\
           "0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955",\
           "0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194",\
           "0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691",\
           "0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288",\
           "0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686",\
           "0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885",\
           "0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085",\
           "0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184");
  }
}
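To see why table density matters, the sketch below looks up the 3x3 setup table at two breakpoints that exist only in the 10x10 table and compares the results with the 10x10 entries. It assumes the timer interpolates bilinearly between breakpoints and that rows follow index_1 and columns follow index_2; the actual interpolation scheme is tool-dependent, and the helper names are hypothetical.

```python
from bisect import bisect_right

# Breakpoints and setup values (ns) copied from the 3x3 table on the slide.
IDX3 = [0.003, 0.2019, 0.9]
SETUP3 = [
    [0.00995, 0.0199, 0.06965],
    [0.08955, 0.1095, 0.2089],
    [0.2189, 0.1791, 0.3184],
]

def bilinear(xs, ys, table, x, y):
    """Bilinear lookup; rows follow xs (index_1), columns follow ys (index_2)."""
    def locate(axis, v):
        # Pick the segment containing v; the index is clamped so points on or
        # beyond the table edges use the outermost segment.
        i = min(max(bisect_right(axis, v) - 1, 0), len(axis) - 2)
        return i, (v - axis[i]) / (axis[i + 1] - axis[i])

    i, tx = locate(xs, x)
    j, ty = locate(ys, y)
    low = table[i][j] + ty * (table[i][j + 1] - table[i][j])
    high = table[i + 1][j] + ty * (table[i + 1][j + 1] - table[i + 1][j])
    return low + tx * (high - low)

# Two breakpoints present only in the 10x10 table, with its characterized values.
fine_points = {
    (0.2278, 0.03092): 0.0796,
    (0.1371, 0.1371): 0.0796,
}

for (x, y), fine in fine_points.items():
    coarse = bilinear(IDX3, IDX3, SETUP3, x, y)
    print(f"({x}, {y}): 3x3 -> {coarse * 1000:.1f} ps, 10x10 -> {fine * 1000:.1f} ps")
```

With these numbers the 3x3 lookup lands about 17 ps above the 10x10 entry at the first point and about 5 ps below it at the second, which shows how far a three-breakpoint axis can drift from densely characterized data.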
Synthesis Improvements
• Very slow cells like XOR, 4:1 Mux, AOI gates prohibited
- Some sensitive logic hand-instantiated to prevent AOI or XOR
selection
• Register cloning/fanout optimization to reduce fanout
- 10-15% increase in sequential area, but helps reduce flop delay
- set_register_replication (DC) can be used
[Figure: register cloning. One register driving load cap C becomes two registers, each driving load cap C/2]
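The load-splitting idea behind register cloning can be sketched in a few lines. This is an illustration only, not what the synthesis tool does internally: the function name, the greedy balancing strategy, and the sink capacitances are all hypothetical.

```python
def split_load(sink_caps_ff, n_clones):
    """Greedy balance: hand the largest sinks first to the lightest clone."""
    groups = [[] for _ in range(n_clones)]
    loads = [0.0] * n_clones
    for cap in sorted(sink_caps_ff, reverse=True):
        k = loads.index(min(loads))     # clone currently driving the least load
        groups[k].append(cap)
        loads[k] += cap
    return groups, loads

# Hypothetical register driving eight sinks of 2 fF each, so C = 16 fF.
sinks = [2.0] * 8
groups, loads = split_load(sinks, 2)
print(loads)   # each clone now drives C/2
```

Each clone sees half the original load, at the cost of one extra register, which is the area-for-delay trade the slide describes.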
Placement Sensitive Flow
• Cell placement is closely controlled at all stages
• Bad timing is due to:
- Placement of cells under loose constraints
- Heavy buffer insertion to close timing
• Clocks are over-constrained by 10%, and incremental
psynopt runs improve timing
- Best possible flop placement achieved
• Clock latency is set to simulate post-CTS derating during
placement
Placement..
Default timing flow
1. create_placement + psynopt: WNS -0.05, 50 paths
2. clock_opt: WNS -0.10, 60 paths (derating)
3. route_opt + route_opt -incr: WNS -0.18, 90 paths (SI + wires)
Placement..
PS flow (placement sensitive flow diagram)
Steps: create_placement + psynopt; psynopt(1); psynopt(2); clock_opt -only_cts; route_opt + route_opt -incr; route_opt -incr (reg2reg only)
Step notes: 40 ps uncertainty; don't upsize, just move; allow buffer resizing; remove extra uncertainty (24 ps); don't move registers; SI + wires
WNS readings in slide order: -0.05, 50 paths; +0.005, 10 paths; -0.10, 80 paths; -0.025, 50 paths; -0.08, 20 paths; -0.015, 10 paths (waived)
CTS-Waveform Balancing
• Getting around the asymmetry of the clock cells
- Decision: use the same unequal-duty-cycle inverter back to back
to avoid pulse width issues
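The reason a back-to-back pair helps can be shown with edge-time arithmetic. The sketch below assumes both inverters have identical rise and fall delays (same load and input slew); the delay values and function names are hypothetical.

```python
PERIOD_PS = 200

def invert(t_rise, t_fall, d_rise, d_fall):
    """Return (rise, fall) edge times at an inverter output.

    The output rises d_rise after the input falls, and falls d_fall
    after the input rises.
    """
    return t_fall + d_rise, t_rise + d_fall

def high_width(t_rise, t_fall):
    """High-pulse width of a periodic clock from its edge times."""
    return (t_fall - t_rise) % PERIOD_PS

# Hypothetical asymmetric cell: slow output rise, fast output fall (ps).
D_RISE, D_FALL = 12, 8

clk_in = (0, 100)                                   # ideal 50% duty-cycle clock
after_one = invert(*clk_in, D_RISE, D_FALL)
after_two = invert(*after_one, D_RISE, D_FALL)

print(high_width(*clk_in), high_width(*after_one), high_width(*after_two))
```

A single stage shifts the high pulse by the rise/fall mismatch, while the second identical stage swaps the roles of the two edges and cancels the distortion, so the pulse width at the pair's output matches the input.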
CTS-others
• Register placement is fixed
• Fast transition times help speed up Ck-Q timing
- Also reduces setup times at the flops
• Final duty cycle tolerance: 40/60%
• Very small skew eliminates the need for hold fixing
Power Analysis
• Both blocks are in a special power domain (not shared by top)
• Target < 3% (i.e. 33 mV)
• IR drop achieved @MLCOM (1.1 V, 125C) is 14 + 17 = 31 mV
[Figure: Pads, block B, block A, top core]
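The IR numbers on this slide can be checked directly. The sketch assumes the 3% target is taken against the 1.1 V MLCOM supply, which is consistent with the 33 mV figure quoted above.

```python
SUPPLY_MV = 1100          # MLCOM corner supply, 1.1 V
TARGET_PERCENT = 3        # IR drop target from the slide
drops_mv = [14, 17]       # the two contributions quoted on the slide

limit_mv = SUPPLY_MV * TARGET_PERCENT / 100
total_mv = sum(drops_mv)
margin_mv = limit_mv - total_mv

print(f"limit {limit_mv:.0f} mV, achieved {total_mv} mV, margin {margin_mv:.0f} mV")
```

The achieved drop sits 2 mV inside the target, so the margin at this corner is thin.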
Power EM
• EM, Rj issues due to high current through buses with
insufficient Vias (Important run for high speed)
• ICC custom route tool used to add extra Via2, M24x2 array 2x pin width
Signal EM
Statistical EM
Timing clean-up (worst functional mode for power)
Simulate/generate VCD/SAIF file
SAIF-based EM
* Fix signal EM
Iterations
Fix signal EM (if any)
Fix minor DRCs/antennas
Repeat for critical functional modes
* fix_signal_em (or) script
STA
Reduced timing iterations
...Signal EM
• Sample EM fix with repair file (clock widened 2x to 4x)
Correlation
• Bottom-up flow to make sure ICC settings are close
enough to PrimeTime and Star-RC (SolvNet: IC Compiler
Correlation Checklist Trilogy)
• Extraction settings
OPERATING_TEMPERATURE: 25
COUPLE_TO_GROUND: NO
COUPLING_ABS_THRESHOLD: 1e-15
MODE: 400
EXTRACT_VIA_CAPS: YES
• Noise/timing settings
set db_load_ccs_noise_data true
set timing_crpr_threshold_ps 0
set si_filter_accum_aggr_noise_peak_ratio 0.2
Star-RC vs Calibre SPEF
• Block B: Star-RC within 8% mean
ICC vs PT-SI slack
• Block B: ICC (4ps) slightly pessimistic vs PT (2ps)
[Figure: two histograms of # Paths vs. WNS (ns)]
Conclusion / Takeaways
• Fix Library Issues
- Good range of cells with decent strengths for optimization
- Cell names must be user friendly to limit use (for better EM/IR)
- Larger matrices for setup/pulse timing to prevent timing
pessimism
- Symmetric clock cells tagged with special naming
- "Don't use" cells should be clearly marked
• Fix process corners (e.g. MLCOM, WCLCOM)
- Special situations like Temperature inversion for timing, High
Temp corners for leakage, peak IR drop should be known well in
advance
…Conclusion / Takeaways
• Think top level
- Think about the next stage and the top level
• Correlate (SolvNet: IC Compiler Correlation Checklist Trilogy)
- Star-RC / ICC extraction should be correlated to device level
- PT-SI and ICC noise settings should be checked
• Tune ICC to meet requirements (e.g. custom
placement, custom cts, custom router, etc…)
- Get to know all options available
- Script for Reusability
Thanks…
Synopsys Hotline
• Filed and accepted requests for EM gui and temperature scaling
• Retaining FILLs in soft block after flattening
• Ability to check min grid during zroute verify
Others
1. KhanKap Mounarath – Sr. Scientist, Maxim
2. DSM group/ Library , Maxim EDA
3. Bill Sicaras - Synopsys AC
Appendix A
• PT-SI and Spice correlation
Spice level simulation performed on the worst path
Startpoint: clk_div_0/div_by2_by4_0/sig_i4_reg (rising edge-triggered flip-flop clocked by dac_clk1)
Endpoint: clk_div_0/div_by2_by4_0/sig_i4_reg (rising edge-triggered flip-flop clocked by dac_clk1)
Path Group: dac_clk1
Path Type: max
∑ (launch clock delay + CK-Q delay + combinational delay to the
endpoint register) is within 5% for Block B
Appendix B (scripts)
Script used for placement
## Source the common settings for placement and optimization
source common_placement_settings_icc.tcl
set placer_max_cell_density_threshold 0.68
## 15% of the clock period which is 200ps is 30ps
## 30ps plus 10ps uncertainty is 40ps overconstraining
set_timing_derate -late 1.15
set_clock_uncertainty 0.01 [all_clocks]
set_critical_range 0.090 cd18_decoder_dac
## INITIAL PLACEMENT
create_placement -effort high -congestion -congestion_effort high
legalize_placement
## FIRST ROUND OF optimizations
set_dont_touch [get_cells *]
set_dont_touch [get_nets *]
psynopt
## tighten the output paths
set_clock_uncertainty 0.015 [all_clocks]
set_clock_latency 0.200 [get_clocks dac_clk]
set_clock_latency 0.100 [get_clocks dac_clko]
psynopt
## SECOND ROUND OF optimization
## Remove the dont touches and let the tool optimize the
## timing more (upsize cells etc.)
remove_attribute [get_cells -hier *] dont_touch -quiet
remove_attribute [get_nets -hier *] dont_touch -quiet
## do not optimize some sensitive logic
set_dont_touch [get_cells U*]
psynopt
## save cell and report timing ##
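The over-constraint comments in the placement script can be checked with a few lines of arithmetic. The sketch follows the script's own comments in treating the 15% derate as a flat 15% of the clock period, which is an approximation rather than how a timer applies derating path by path.

```python
PERIOD_PS = 200            # 5 GHz clock
DERATE_PERCENT = 15        # from "set_timing_derate ... 1.15"
UNCERTAINTY_PS = 10        # from "set_clock_uncertainty 0.01"

derate_margin_ps = PERIOD_PS * DERATE_PERCENT // 100
overconstraint_ps = derate_margin_ps + UNCERTAINTY_PS

print(f"derate margin {derate_margin_ps} ps, total over-constraint {overconstraint_ps} ps")
```

This reproduces the 30 ps and 40 ps figures in the script comments.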
Appendix B
Script used for CTS
# DON'T MOVE CAREFULLY PLACED CELLS
set_dont_touch_placement [get_cells -hier *_reg*]
set_attribute [get_cells -hier spr*] is_fixed true
remove_clock_tree -clock_trees {dac_clk dac_clko} -honor_dont_touch
reset_clock_tree_references
define_routing_rule decoder_clk_shield_rule -default_reference_rule -taper_level 0 \
-multiplier_width 2 -multiplier_spacing 1 -shield
## CONTROL TRANSITION FOR CLOCKS
## RELAX BUFFER LEVELS TO help fix fanout
set_clock_tree_options -layer_list $runOption(input,clkRoutelayerList) \
-routing_rule decoder_clk_shield_rule -use_default_routing_for_sinks 1 -target_skew 0.010 \
-max_buffer_levels 9 -max_transition .024
set_clock_tree_options -clock_trees dac_clk -routing_rule decoder_clk_shield_rule \
-use_default_routing_for_sinks 1 -target_skew 0.010 -max_buffer_levels 9
set_max_fanout 2 [get_ports dac_clk]
set_max_fanout 2 [get_ports dac_clko]
## Tighter transition on output clk. timing is ok.
set_clock_tree_options -clock_trees dac_clko -max_buffer_levels 3 -max_transition 0.022
check_clock_tree -clocks dac_clk
report_clock_tree -summary -clock_trees dac_clk -level_info
report_clock_tree -show_all_sinks
report_clock_tree -settings > clktree/settings.rpt
update_clock_latency
## Turn on removal and recovery check ##
set enable_recovery_removal_arcs true
## Perform clock tree synthesis only
clock_opt -only_cts -operating_condition min_max