Ultra High Speed (5 GHz) Block Custom Physical Design Flow with ICC

Prakash Jayasekharan, Senior PD Engineer

Suman Musunuru, Senior Design Engineer

Maxim Integrated Products


Agenda

• Challenges in High speed Physical Design

- Design Constraints, Library and Design issues

• Custom solutions with Synopsys ICC flow

- Matrix re-characterization, synthesis improvements, placement-sensitive flow, CTS waveform balancing, signal EM, power

• Timing/STA correlation results

- Star-RC vs Calibre, ICC vs PT-SI

• Conclusion/Takeaways

• Appendix A

• Appendix B (scripts)


Design Constraints

• 65 nm SoC design

- 2.4 million gates

- Block A and Block B at 5 GHz (200 ps period)

- Derating: 5% late, 10% early (both clock and data); 5% jitter

- Target skew ~15 ps, transition ~20 ps, pulse width ~80 ps

- IR drop < 3% peak (or 30 mV Weff)
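As a rough worked example of what these numbers leave for logic, the sketch below converts the 5 GHz target into a period and subtracts jitter and target skew. How the flow actually budgets jitter and skew against the period is not stated on the slide, so the plain subtraction is an assumption made for illustration only.

```python
# Illustrative budget arithmetic for the constraints on this slide.
# Assumption: jitter and target skew are simply subtracted from the period.

FREQ_GHZ = 5.0          # block clock frequency (from the slide)
JITTER_FRACTION = 0.05  # 5% jitter (from the slide)
TARGET_SKEW_PS = 15.0   # target skew (from the slide)
PULSE_WIDTH_PS = 80.0   # pulse width target (from the slide)


def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


period = period_ps(FREQ_GHZ)
jitter = JITTER_FRACTION * period
budget = period - jitter - TARGET_SKEW_PS

# An ideal 50% duty cycle gives half a period per phase; the pulse width
# target shows how much of that phase may be lost in the clock tree.
half_period = period / 2.0
pulse_margin = half_period - PULSE_WIDTH_PS

print(f"period       : {period:.0f} ps")
print(f"jitter       : {jitter:.0f} ps")
print(f"left for path: {budget:.0f} ps")
print(f"pulse margin : {pulse_margin:.0f} ps")
```

The late and early derates are left out here because they scale individual delays rather than the period.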


Library Issues

• Re-characterization of timing libraries

- Traditional library tables add pessimism to timing delay calculation (setup times and delays worse by at least 10 ps)

[Figure: labels ".lib" and "spice"]


...Library issues

• Extra pessimism is not tolerable because

- 10 ps per cell accumulates along a path and becomes significant

- Paths become too tight to fix

• Library is mostly made of weak drive strength buffers and complex gates; realistic fanout < 5

• Asymmetric clock cells cause low pulse width

• IR drop not part of timing delay tables in .lib

- No way of analyzing timing effect of IR drop
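To see why a per-cell error of this size matters at 5 GHz, the sketch below accumulates 10 ps of pessimism per cell along a path and expresses it as a share of the 200 ps period. The path depths are hypothetical values chosen for illustration; the slides do not give actual logic depths.

```python
# Accumulated library pessimism as a share of the clock period.
# PERIOD_PS and PESSIMISM_PER_CELL_PS come from the slides;
# the path depths in the loop are hypothetical.

PERIOD_PS = 200.0
PESSIMISM_PER_CELL_PS = 10.0


def accumulated_pessimism_ps(depth: int) -> float:
    """Total pessimism for a path with `depth` cells."""
    return depth * PESSIMISM_PER_CELL_PS


def share_of_period(depth: int) -> float:
    """Accumulated pessimism as a fraction of the clock period."""
    return accumulated_pessimism_ps(depth) / PERIOD_PS


for depth in (1, 3, 5):  # hypothetical path depths
    total = accumulated_pessimism_ps(depth)
    print(f"{depth} cells: {total:.0f} ps = {share_of_period(depth):.0%} of the period")
```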


Design Issues

• Alternative lower-frequency architecture not pursued

- Would consume 2x area and power

• Very good skew and transition times required

- Very fast transition => higher switching power => higher insertion delay

- Weak clock tree cells cause more insertion delay

• > 70% of the logic is sequential; setup (reg2reg) timing is critical

• Decap cells for peak IR were released late in the flow

- Could not be added in block A


...Design Issues

• Small coupling caps (1 fF) due to size of design

- Small nets in the design do not get extracted and can be dropped. Use coupling_abs_threshold to reduce the threshold.

• 4 corners for IR/EM, 3 corners for timing

- High-voltage, high-temperature corner added last for IR/EM

Voltage (V)  Temp (C)  Tag     Description
0.9          125       WCCOM   Traditional worst case timing
1.1          -40       LTCOM   Traditional best case timing
0.9          -40       WCLCOM  Temp inversion corner
1.1          125       MLCOM   Worst EM/IR/Leakage

[Figure: corner plot with axes Voltage (0.9, 1.1) and Temp (-40C, 125C)]


Matrix re-characterization

BEFORE (3x3)

timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_3x3) {
    index_1("0.003, 0.2019, 0.9");
    index_2("0.003, 0.2019, 0.9");
    values("0.00995, 0.0199, 0.06965",\
           "0.08955, 0.1095, 0.2089",\
           "0.2189, 0.1791, 0.3184");
  }
}

AFTER (10x10): the 10x10 table reduces extra pessimism

timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_10x10) {
    index_1("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    index_2("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    values("0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965",\
           "0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965",\
           "0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955",\
           "0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194",\
           "0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691",\
           "0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288",\
           "0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686",\
           "0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885",\
           "0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085",\
           "0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184");
  }
}
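The effect of table size can be illustrated by looking up the same point in both tables. The sketch below is a minimal example, assuming plain bilinear interpolation between grid points, table values in ns, and a query point of index_1 = 0.003, index_2 = 0.1371; the slides do not state the interpolation scheme or the units, and the helper names are made up for this example.

```python
from bisect import bisect_right

IDX3 = [0.003, 0.2019, 0.9]
VAL3 = [[0.00995, 0.0199, 0.06965],
        [0.08955, 0.1095, 0.2089],
        [0.2189, 0.1791, 0.3184]]

IDX10 = [0.003, 0.009191, 0.03092, 0.07243, 0.1371,
         0.2278, 0.3472, 0.4976, 0.6812, 0.9]
VAL10 = [
    [0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965],
    [0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965],
    [0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955],
    [0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194],
    [0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691],
    [0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288],
    [0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686],
    [0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885],
    [0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085],
    [0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184],
]


def _cell(axis, x):
    """Index of the lower grid point of the interval containing x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(index_1, index_2, values, x1, x2):
    """Bilinear interpolation in a table whose rows follow index_1."""
    i = _cell(index_1, x1)
    j = _cell(index_2, x2)
    u = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    t = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    low = (1 - t) * values[i][j] + t * values[i][j + 1]
    high = (1 - t) * values[i + 1][j] + t * values[i + 1][j + 1]
    return (1 - u) * low + u * high


X1, X2 = 0.003, 0.1371  # query point chosen for illustration
coarse = lookup(IDX3, IDX3, VAL3, X1, X2)
fine = lookup(IDX10, IDX10, VAL10, X1, X2)
diff_ps = (coarse - fine) * 1000.0  # ns to ps, assuming the tables are in ns

print(f"3x3   : {coarse:.5f}")
print(f"10x10 : {fine:.5f}")
print(f"extra setup from the 3x3 table: {diff_ps:.1f} ps")
```

At this point the 3x3 table has to interpolate across a wide interval and reports a setup time several picoseconds larger than the 10x10 grid value. The size and sign of the difference depend on the query point, so this is one sample rather than a bound.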


Synthesis Improvements

• Very slow cells such as XOR, 4:1 mux, and AOI gates are prohibited

- Some sensitive logic is hand instantiated to prevent AOI or XOR selection

• Register Cloning/Fanout optimization to reduce fanout

- 10-15% increase in sequential area, but helps reduce flop delay

- set_register_replication (DC) can be used

[Figure: register cloning splits one register with load cap C into two registers, each with load cap C/2]


Placement Sensitive Flow

• Cell placement is closely controlled at every stage

• Bad timing is due to:

- Placement of cells under loose constraints

- High buffer insertion to close timing

• Clocks are over-constrained by 10%, and incremental psynopt runs improve timing

- Best possible flop placement achieved

• Clock latency is set to simulate post-CTS derating during placement


Placement..

Default timing flow

Step                         WNS    Paths  Degradation due to
create_placement + psynopt   -0.05  50
clock_opt                    -0.10  60     Derating
route_opt + route_opt -incr  -0.18  90     SI + wires


Placement..

[Figure: PS flow diagram]

Steps shown: create_placement + psynopt; clock_opt -only_cts; route_opt + route_opt -incr; psynopt (1); psynopt (2); route_opt -incr (reg2reg only)

WNS labels shown: -0.05, 50 paths; +0.005, 10 paths; -0.10, 80 paths; -0.025, 50 paths; -0.08, 20 paths; -0.015, 10 paths (waived)

Annotations: SI + wires; 40 ps uncertainty; don't upsize, just move; allow buffer resizing; remove extra uncertainty (24 ps); don't move registers


CTS-Waveform Balancing

• Working around clock cell asymmetry

- Decided to use the same unequal-duty-cycle inverter back to back to avoid pulse width issues


CTS-others

• Register placement is fixed

• Fast transition times help speed up Ck-Q timing

- Also reduces setup times at the flops

• Final duty cycle tolerance: 40/60%

• Very small skew eliminates the need for hold fixing


Power Analysis

• Both blocks are in a special power domain (not shared with top)

• Target < 3% (i.e. 33 mV)

• IR drop achieved at MLCOM (1.1 V, 125 C) is 14 + 17 = 31 mV

[Figure: diagram showing pads, block B, block A, and top core]
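The target and the achieved figure above can be checked with a few lines of arithmetic. The sketch assumes the 3% target is taken against the 1.1 V supply of the MLCOM corner, which is what the 33 mV figure implies. The slide does not say which part of the power network each of the 14 mV and 17 mV contributions belongs to, so they are treated as two unnamed components.

```python
# IR drop check against the 3% target.
# Assumption: the percentage is relative to the 1.1 V MLCOM supply.

SUPPLY_V = 1.1
TARGET_FRACTION = 0.03
COMPONENTS_MV = (14.0, 17.0)  # the two contributions quoted on the slide

target_mv = SUPPLY_V * TARGET_FRACTION * 1000.0
achieved_mv = sum(COMPONENTS_MV)
margin_mv = target_mv - achieved_mv
achieved_percent = achieved_mv / (SUPPLY_V * 1000.0) * 100.0

print(f"target   : {target_mv:.0f} mV")
print(f"achieved : {achieved_mv:.0f} mV ({achieved_percent:.2f}% of supply)")
print(f"margin   : {margin_mv:.0f} mV")
```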


Power EM

• EM and Rj issues due to high current through buses with insufficient vias (an important run for high-speed designs)

• ICC custom route tool used to add extra Via2, M24x2 array 2x pin width


Signal EM

[Figure: signal EM flow diagram]

Steps shown:

- Statistical EM

- Timing cleanup (worst functional mode for power)

- Simulate and generate VCD/SAIF file

- SAIF-based EM

- Fix signal EM *

- Fix signal EM (if any)

- Fix minor DRCs/antennas

- STA

- Repeat for critical functional modes

* fix_signal_em or a script

Diagram annotations: iterations; reduced timing iterations


...Signal EM

• Sample EM fix with repair file (clock widened 2x to 4x)


Correlation

• Bottom-up flow to make sure ICC settings are close enough to PrimeTime and Star-RC (SolvNet: IC Compiler Correlation Checklist Trilogy)

• Extraction settings

OPERATING_TEMPERATURE: 25
COUPLE_TO_GROUND: NO
COUPLING_ABS_THRESHOLD: 1e-15
MODE: 400
EXTRACT_VIA_CAPS: YES

• Noise / timing settings

set db_load_ccs_noise_data true
set timing_crpr_threshold_ps 0
set si_filter_accum_aggr_noise_peak_ratio 0.2


Star-RC vs Calibre spef

• Block B: Star-RC within 8% mean


ICC vs PT-SI slack

• Block B: ICC (4ps) slightly pessimistic vs PT (2ps)

[Figure: two histograms of # Paths against WNS (ns); axis labels shown: 0.000, -0.002, 0.005, 0.011 and 0.000, -0.004, 0.005, 0.009]


Conclusion / Takeaways

• Fix Library Issues

- Good range of cells with decent strengths for optimization

- Cell names must be user friendly to limit use (for better EM/IR)

- Larger matrices for setup/pulse timing to prevent timing pessimism

- Symmetric clock cells tagged with special naming

- Don't-use cells should be clearly marked

• Fix Process Corners (e.g. MLCOM, WCLCOM)

- Special situations such as temperature inversion for timing, high-temperature corners for leakage, and peak IR drop should be known well in advance


…Conclusion / Takeaways

• Think top level

- Think about the next stage and the top level

• Correlate (SolvNet : IC Compiler Correlation Checklist Trilogy )

- Star-RC / ICC extraction should be correlated to device level

- PT-SI and ICC noise settings should be checked

• Tune ICC to meet requirements (e.g. custom placement, custom CTS, custom router, etc.)

- Get to know all options available

- Script for Reusability


Thanks…

Synopsys Hotline

• Filed and accepted requests for EM gui and temperature scaling

• Retaining FILLs in soft blocks after flattening

• Ability to check min grid during zroute verify

Others

1. KhanKap Mounarath – Sr. Scientist, Maxim

2. DSM group/ Library , Maxim EDA

3. Bill Sicaras - Synopsys AC


Appendix A

• PT-SI and Spice correlation

Spice level simulation performed on the worst path

Startpoint: clk_div_0/div_by2_by4_0/sig_i4_reg (rising edge-triggered flip-flop clocked by dac_clk1)

Endpoint: clk_div_0/div_by2_by4_0/sig_i4_reg (rising edge-triggered flip-flop clocked by dac_clk1)

Path Group: dac_clk1

Path Type: max

∑ (launch clock delay + CK-Q delay + combinational delay to the endpoint register) is within 5% for Block B


Appendix B (scripts)

Script used for placement

## Source the common settings for placement and optimization
source common_placement_settings_icc.tcl
set placer_max_cell_density_threshold 0.68

## 15% of the clock period which is 200ps is 30ps
## 30ps plus 10ps uncertainty is 40ps overconstraining
set_timing_derate -late 1.15
set_clock_uncertainty 0.01 [all_clocks]
set_critical_range 0.090 cd18_decoder_dac

## INITIAL PLACEMENT
create_placement -effort high -congestion -congestion_effort high
legalize_placement

## FIRST ROUND OF optimizations
set_dont_touch [get_cells *]
set_dont_touch [get_nets *]
psynopt

## tighten the output paths
set_clock_uncertainty 0.015 [all_clocks]
set_clock_latency 0.200 [get_clocks dac_clk]
set_clock_latency 0.100 [get_clocks dac_clko]
psynopt

## SECOND ROUND OF optimization
## Remove the dont touches and let the tool optimize the
## timing more (upsize cells etc.)
remove_attribute [get_cells -hier *] dont_touch -quiet
remove_attribute [get_nets -hier *] dont_touch -quiet

## do not optimize some sensitive logic
set_dont_touch [get_cells U*]
psynopt

## save cell and report timing ##
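The over-constraint described in the script comments can be reproduced as arithmetic. The sketch follows the comments literally: it treats the 1.15 late derate as costing 15% of the 200 ps period and adds the 10 ps clock uncertainty, assuming the script's time unit is ns. A derate scales path delays rather than the period itself, so the real penalty depends on the path; the whole-period figure is the comment's approximation.

```python
# Over-constraint arithmetic from the placement script comments.
# Assumption: time values in the script are in ns (0.01 ns = 10 ps).

PERIOD_PS = 200.0            # 5 GHz clock period
LATE_DERATE = 1.15           # set_timing_derate late factor
CLOCK_UNCERTAINTY_PS = 10.0  # set_clock_uncertainty 0.01


def overconstraint_ps(period_ps: float, late_derate: float,
                      uncertainty_ps: float) -> float:
    """Derate penalty over a full period plus the clock uncertainty."""
    derate_penalty = (late_derate - 1.0) * period_ps
    return derate_penalty + uncertainty_ps


total = overconstraint_ps(PERIOD_PS, LATE_DERATE, CLOCK_UNCERTAINTY_PS)
print(f"over-constraint: {total:.0f} ps "
      f"({total / PERIOD_PS:.0%} of the period)")
```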


Appendix B

Script used for CTS

# DON'T MOVE CAREFULLY PLACED CELLS
set_dont_touch_placement [get_cells -hier *_reg*]
set_attribute [get_cells -hier spr*] is_fixed true
remove_clock_tree -clock_trees {dac_clk dac_clko} -honor_dont_touch
reset_clock_tree_references
define_routing_rule decoder_clk_shield_rule -default_reference_rule -taper_level 0 \
    -multiplier_width 2 -multiplier_spacing 1 -shield

## CONTROL TRANSITION FOR CLOCKS
## RELAX BUFFER LEVELS TO help fix fanout
set_clock_tree_options -layer_list $runOption(input,clkRoutelayerList) \
    -routing_rule decoder_clk_shield_rule -use_default_routing_for_sinks 1 \
    -target_skew 0.010 -max_buffer_levels 9 -max_transition .024
set_clock_tree_options -clock_trees dac_clk -routing_rule decoder_clk_shield_rule \
    -use_default_routing_for_sinks 1 -target_skew 0.010 -max_buffer_levels 9
set_max_fanout 2 [get_ports dac_clk]
set_max_fanout 2 [get_ports dac_clko]

## Tighter transition on output clk. timing is ok.
set_clock_tree_options -clock_trees dac_clko -max_buffer_levels 3 -max_transition 0.022
check_clock_tree -clocks dac_clk
report_clock_tree -summary -clock_trees dac_clk -level_info
report_clock_tree -show_all_sinks
report_clock_tree -settings > clktree/settings.rpt
update_clock_latency

## Turn on removal and recovery check ##
set enable_recovery_removal_arcs true

## Perform clock tree synthesis only
clock_opt -only_cts -operating_condition min_max