03/25/22 Ph.D. Preliminary Exam 1 Ph.D. Preliminary Exam Mozammel Hossain Colorado State University Department of Electrical and Computer Engineering Nest Circuit Lead, IBM, Austin, TX Advisor: Prof. Tom W. Chen Committee Members: Prof. Yashwant Malaiya Dr. Sudeep Pasricha Dr. Ali Pezeshki

Page 1: Ph.D. Preliminary Exam

04/19/23Ph.D. Preliminary Exam 1

Ph.D. Preliminary Exam

Mozammel Hossain
Colorado State University
Department of Electrical and Computer Engineering
Nest Circuit Lead, IBM, Austin, TX

Advisor: Prof. Tom W. Chen

Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki

Page 2: Ph.D. Preliminary Exam

Research Area

• Synthesis-based design and implementation methodology of:
    – High-speed, high-performing units (LBS)
    – Sync-Async interface timing
    – Arrays with clock gating
• To convert to synthesizable macro

Page 3: Ph.D. Preliminary Exam

Outline

• Introduction
• Overview of Present Synthesis Methodology
• Future: Research and Innovation in Synthesis Methodology
    – Problem definitions
        1. Large Block Synthesis (LBS): L2 Cache Unit
        2. Sync-Async Interface timing
        3. Clock Gating support for Array Design
    – Approaches
    – Preliminary results
• Conclusion and Future Work
• Acknowledgement

Page 4: Ph.D. Preliminary Exam

Introduction

• The technology market demands faster turnaround of IC designs, and designers struggle to meet performance requirements.
• Costs for design and validation, and time to market, are increasing.
• Past generations of microprocessors used more custom circuit design to win the battle for tighter cycle times.
• Design is moving towards a common synthesizable methodology, in most cases sacrificing the desired speed of the chip in favor of new functionality and time to market.

Page 5: Ph.D. Preliminary Exam

Introduction: Design Methodology

Page 6: Ph.D. Preliminary Exam

Introduction: Macro Design Spectrum

[Figure: macro design spectrum; horizontal axis: Design Effort, vertical axis: Design Customization]

0) "Vanilla" synthesis
1) VHDL structuring, parm customization
2) Preplace LCB/latches
3) Embed custom components
4) Custom prerouting
5) Custom design (conventional)

ATTRIBUTE BLOCK_DATA of add64 : label is "LOGIC_STYLE=/xxxx/";

Page 7: Ph.D. Preliminary Exam

Introduction: Trend of Design Methodology over the Last 16 Years

[Figure: bar chart, "Macro Design History in L2 Unit for Last 16 Years". x-axis: Design Methodology (Custom, Hybrid, RLM, LBS); y-axis: % of Macro Type (0% to 90%); series: 90nm, 65nm, 45nm, 32nm, 22nm, 14nm]

Page 8: Ph.D. Preliminary Exam

Overview of Present Synthesis Methodology

Synthesis
– VHDL: compile VHDL
– PDSRTL: front-end synthesis
– PDSEMPAD: early-mode padding
– MAR: routing
– RAPIDS: post-routing optimization
– PROMOTE: promote routed design

Run all backend tools (PDV, extraction, timing)
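The ordering of the synthesis stages can be sketched as a simple sequential runner. This is only an illustration: the stage names and descriptions come from the slide, while `FLOW`, `run_flow`, and the `run_stage` callback are hypothetical and do not invoke any real tool.

```python
# Stage names and descriptions are taken from the slide; the runner itself
# is hypothetical and does not call any real synthesis tool.
FLOW = [
    ("VHDL", "compile VHDL"),
    ("PDSRTL", "front-end synthesis"),
    ("PDSEMPAD", "early-mode padding"),
    ("MAR", "routing"),
    ("RAPIDS", "post-routing optimization"),
    ("PROMOTE", "promote routed design"),
]


def run_flow(run_stage, start_at="VHDL"):
    """Run stages in order from start_at and stop at the first failure.

    run_stage is a caller-supplied callable that takes a stage name and
    returns True on success. Returns the list of stages that completed.
    """
    names = [name for name, _ in FLOW]
    completed = []
    for name in names[names.index(start_at):]:
        if not run_stage(name):
            break
        completed.append(name)
    return completed


# Example: pretend routing fails, so the flow stops before RAPIDS.
print(run_flow(lambda stage: stage != "MAR"))
```

Restarting from a later stage (for example `start_at="MAR"`) mirrors rerunning only the back half of the flow after a routing fix.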

Page 9: Ph.D. Preliminary Exam

Overview of Present Synthesis Methodology

[Figure: tool flow diagram; labels: Backend tools, PDV, RLMB, Cadence Space, MAR/Rapids]

Page 10: Ph.D. Preliminary Exam

Overview of Present Synthesis Methodology

Slack sharing example:

• Look at timing across multiple latches
• Consider sharing positive slack

[Figure labels: "Broken path"; "Has margin to share"]

Page 11: Ph.D. Preliminary Exam

Overview of Present Synthesis Methodology

Slack sharing example:

• Delayed 1st clock by 17 ps
• Balanced slack of +3 ps across 2 latches

[Figure labels: "Balanced Slack" on both paths]
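The arithmetic behind this example can be sketched in a few lines of Python. The function name `balance_slack` and the starting slacks of -14 ps and +20 ps are illustrative assumptions, chosen only because they are consistent with the slide's 17 ps clock delay and +3 ps balanced slack; the slide does not state the starting values. The model assumes that delaying a latch clock by d adds d to the slack of the path arriving at that latch and removes d from the path leaving it.

```python
def balance_slack(slack_in_ps, slack_out_ps):
    """Return (clock_delay_ps, balanced_slack_ps) for a latch between two paths.

    Delaying the latch clock by d gives the incoming path d more slack and
    the outgoing path d less, so both are equal at d = (out - in) / 2.
    """
    delay = (slack_out_ps - slack_in_ps) / 2
    balanced = (slack_in_ps + slack_out_ps) / 2
    return delay, balanced


# Assumed starting point: the broken path fails by 14 ps and the
# neighbouring path has 20 ps of margin to share.
delay, balanced = balance_slack(-14, 20)
print(delay, balanced)
```

The total slack across the two paths is unchanged; sharing only redistributes it, which is why a path can be fixed this way only when its neighbour has margin to give.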

Page 12: Ph.D. Preliminary Exam

Overview of Present Synthesis Methodology

Works very well on:

– Traditional control macros with 2.5-5M transistors, or about 20K-40K latches
– Timing non-critical macros
– Macros without embedded IP
– Macros without parent's blockages
    • Unit buffer, latches, clock blockages
– Slack sharing within a synchronous clock domain
– Designs without clock gating after the Local Clock Buffer (LCB)

Page 13: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

1. Problem definition: Large Block Synthesis (LBS)

– The current methodology does not work well for much bigger designs such as the L2 Cache Unit (20M transistors).
    • Need techniques such as IP pre-placement, dataflow structuring, and hierarchical embedded synthesis.
    • Need techniques for wire traits, soft hierarchy, and interior pin congestion analysis in timing-critical and wiring-critical areas.
– Develop a synthesis methodology to support:
    • Significantly shorter design cycle
    • Significant reduction in physical design resources
    • Potential area reduction

Page 14: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

LBS test case to develop methodology:

– Why the L2 Cache Unit?
    • Area-challenged unit
    • Has both 1:1 and 2:1 clocking methodology
        – 1:1 clocking runs at the same clock speed as the core clock
        – Paths on 1:1 clocking are highly timing challenged
    • Requires dual-voltage routing and clock gating
    • Combination of dataflow and control macros
    • Big unit that challenges tool flow run time and data management

Page 15: Ph.D. Preliminary Exam

Why L2 unit as test case?

[Figure: chip diagram; labels: C0 through C11, L3 Unit, Core, L2 Unit]

Page 16: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

LBS: Why L2 unit as test case?

Total cache size: 512 KByte

• >4 GHz with core interface, control, and dataflow interface
• >2 GHz with cache, dir, address, L3, and fabric interface

Unit size: >4.0 sq mm in 22nm
Total black boxes: 82
# of transistors including cache: 44M
# of synthesizable transistors: 19M

Page 17: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

LBS: Physical Design Resource Comparison with Proposed Methodology

Physical Design Resources | Traditional Approach (man-months) | Synthesizable Unit Approach (man-months)
Ckt. Designer             | 18                                | 0
Unit Timer                | 6                                 | 0
Unit Integrator           | 6                                 | 0
Unit Ckt. Lead            | 6                                 | 12
Total Resources           | 36                                | 12

Page 18: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

2. Problem definition: Synthesis timing methodology for Sync-Async interface

– Slack sharing cannot be done at the Sync-Async interface.
    • It can result in a metastable condition.
    • Need to develop a methodology.
– To handle slack sharing in the synthesis and timing environment:
    • Identify the latches involved.
    • Turn off slack sharing.
– For design automation.

Page 19: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

Slack sharing cannot be done at the Sync-Async interface

Page 20: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

Slack sharing at the Sync-Async interface can result in a metastability condition

[Figure label: "Metastability at latch point"]

Page 21: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

3. Problem definition: Clock Gating support for Array Design in Synthesis Methodology

• The compilable array offers a fixed menu with limited read/write ports.
    – It does not support clock gating.
• The current methodology does not allow any gates between the LCB (Local Clock Buffer) and the latch, to prevent electrical rule violations.
• Wiring, gate placement, and timing constraints need to be developed.
• Minimum custom design: only the array column.
• Potential benefits:
    – Around 20% reduction in physical design resources
    – Significantly shorter design cycle
    – Apply learning to other array designs for more savings
    – Potential area saving in the synthesis flow

Page 22: Ph.D. Preliminary Exam

Future: Research and Innovation in Synthesis Methodology

Proposed Array Design in Synthesis Methodology

• LCB: Local Clock Buffer
• Generate CLK for MS latch

Page 23: Ph.D. Preliminary Exam

Approaches:

Pre-Placing Hard IP in LBS
– Pseudo algorithm

    begin_place
      place <inst_name> xloc <> yloc <> <rot> movetype=fixed
    end_place
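The pre-placement pseudo algorithm above can be generated from a list of hard IP instances. The sketch below assumes the line layout shown on the slide (with `movetype=fixed` on the `place` line); the function name `placement_block`, the instance names, the coordinates, and the rotation codes are hypothetical.

```python
def placement_block(instances):
    """Build a pre-placement block in the slide's pseudo format.

    instances: iterable of (inst_name, xloc, yloc, rot) tuples.
    """
    lines = ["begin_place"]
    for inst_name, xloc, yloc, rot in instances:
        lines.append(
            f"  place {inst_name} xloc {xloc} yloc {yloc} {rot} movetype=fixed"
        )
    lines.append("end_place")
    return "\n".join(lines)


# Hypothetical hard IP instances, coordinates, and rotation codes.
print(placement_block([
    ("dir_array0", 120.0, 40.0, "R0"),
    ("dir_array1", 120.0, 260.0, "MX"),
]))
```

Keeping the instance list in one place makes it easy to regenerate the block when the floorplan changes.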

Wire Trait Example in LBS
– Pseudo parms file

    <Flow>: <wire_code> <time gain> <routing layers>
    synthesis_layer_traits : W20S10L15 3 3 M2 X3
    fine_opt_layer_traits : W20S10L15 3 3 M2 X3

Page 24: Ph.D. Preliminary Exam

Approaches: Soft-Hierarchy in LBS

Algorithm:

    inst_name=rlctl prefix=l2rlctl xlow=< > ylow=< > width= height=

where
    <inst_name>: user-specified name to recognize gates
    prefix: the name of the logic gates used in VHDL
    xlow, ylow: lower-left coordinates
    width, height: width and height of the macro in microns
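The soft-hierarchy fields above can be read as a named rectangle that groups all gates sharing a name prefix. The following sketch, under that assumption, checks which gates of a group were placed outside their rectangle. `SoftRegion`, `misplaced`, the gate names, and all coordinates are hypothetical; only the field names and the `rlctl` / `l2rlctl` values come from the slide.

```python
from dataclasses import dataclass


@dataclass
class SoftRegion:
    inst_name: str   # user-specified name for the group
    prefix: str      # gate-name prefix used in the VHDL
    xlow: float      # lower-left x, microns
    ylow: float      # lower-left y, microns
    width: float     # microns
    height: float    # microns

    def owns(self, gate_name):
        """True if the gate belongs to this soft-hierarchy group."""
        return gate_name.startswith(self.prefix)

    def contains(self, x, y):
        """True if (x, y) lies inside the region's rectangle."""
        return (self.xlow <= x <= self.xlow + self.width
                and self.ylow <= y <= self.ylow + self.height)


def misplaced(region, placement):
    """Return gates owned by the region that sit outside its rectangle."""
    return sorted(
        name for name, (x, y) in placement.items()
        if region.owns(name) and not region.contains(x, y)
    )


# Hypothetical coordinates and gate names, for illustration only.
region = SoftRegion("rlctl", "l2rlctl", xlow=100, ylow=200, width=50, height=80)
placement = {
    "l2rlctl_nand_17": (120, 230),   # owned and inside
    "l2rlctl_latch_3": (400, 230),   # owned but outside
    "l2dflow_mux_9": (400, 400),     # different prefix, ignored
}
print(misplaced(region, placement))
```

Because the hierarchy is soft, a check like this reports violations instead of forbidding them; a placer could equally treat the rectangle as a preference.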

Page 25: Ph.D. Preliminary Exam

Approaches: Synthesis Parms in LBS

VT Upgrade
    *user_native_vt: 1
    *user_alternate_vt: 2 3

Interior PIN
    *pds_assign_interior_pins: true
    *pds_pin_spec: "<metal layer> <width> <height>"
    *pds_horizontal_pin_spacing: "<metal layer> <Spacing>"
    *pds_vertical_pin_spacing: "<metal layer> <Spacing>"

Rapids

Page 26: Ph.D. Preliminary Exam

Approaches: Congestion Analysis

• Routing resource allocation at top level
• Negotiate routing resources with macro (IP)
• Negotiate PIN placement with macro (IP)

Page 27: Ph.D. Preliminary Exam

Approaches: Synthesis Methodology at Sync-Async Interface

Application of Sync-Async Latch:

[Figure: circuit diagram; labels: data_in, Logic, Latch, Sync-Async Latch, Latch, Logic, data_out; clocks: NCLK, NASYNC, NASYNC]

Page 28: Ph.D. Preliminary Exam

Approaches: Synthesis Methodology at Sync-Async Interface

Pseudo algorithm to exclude the Sync-Async latch in slack borrowing:
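A minimal sketch of the exclusion step, assuming each latch is described by the clock that launches its data and the clock that captures it. This is not the author's algorithm: the function name `slack_sharing_candidates`, the latch names, and the rule "exclude a latch when exactly one of its two clocks is asynchronous" are assumptions made for illustration. The clock names NCLK and NASYNC are taken from the Sync-Async latch figure.

```python
def slack_sharing_candidates(latches, async_clocks):
    """Split latches into those eligible for slack sharing and those excluded.

    latches: dict mapping latch name -> (launch_clock, capture_clock).
    async_clocks: set of clock names treated as asynchronous.
    Assumed rule: a latch sits on the sync-async interface, and is excluded,
    when exactly one of its two clocks is asynchronous.
    """
    eligible, excluded = [], []
    for name, (launch, capture) in sorted(latches.items()):
        crosses = (launch in async_clocks) != (capture in async_clocks)
        (excluded if crosses else eligible).append(name)
    return eligible, excluded


# Hypothetical latch names; clock names follow the slide's figure.
latches = {
    "lat_in": ("NCLK", "NCLK"),
    "lat_sa": ("NCLK", "NASYNC"),
    "lat_out": ("NASYNC", "NASYNC"),
}
eligible, excluded = slack_sharing_candidates(latches, {"NASYNC"})
print(eligible, excluded)
```

In a real flow the excluded list would feed a user-controlled parm that turns slack sharing off for those latches, so the rest of the design can still benefit from it.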

Page 29: Ph.D. Preliminary Exam

Preliminary Results: Placed and Timed Gates of L2

# of transistors including cache: 44M
# of synthesizable transistors: 19M

Page 30: Ph.D. Preliminary Exam

Preliminary Results: Slack Take Down of L2

[Figure: bar chart, "Slack Data". x-axis: Negative Slacks, in bins >40ps, 40-36ps, 35-31ps, 30-26ps, 26-21ps, 20-16ps, 15-11ps, 10-6ps, 5-0ps; y-axis: # of Failed Paths (0 to 40000); series: Base line, Soft Hierarchy, Vt Upgrade, Interior PIN, Rapids, eFinale]

Page 31: Ph.D. Preliminary Exam

Preliminary Results: Clock Gating at Array interface

• Clock gating is not working
• Red shape/line: current routing and placement
    – Violates timing at array cell, electrical check
• Blue shape/line: desired routing and placement

[Figure labels: LCB, LCB, LCB, L, L]

Page 32: Ph.D. Preliminary Exam

Preliminary Results: Clock Gating at Array interface

• Clock gating is not working

Page 33: Ph.D. Preliminary Exam

Conclusion and Future Work

• With robust tool sets, the newly proposed synthesis methodology, and design guidelines, the L2 cache unit can take almost 50% fewer resources to design, even without dedicated unit timing and integration resources.
    – Preliminary data is very promising.
    – Further experiments will target 10% less unit area once the design is closed.
• A methodology for timing at the Sync-Async interface in the synthesis flow is being developed with user-controlled parms.
• Clock-gating work is in progress in collaboration with the tool development team at IBM.
    – Saves 20% of design effort in the present application in RF design
    – Could lead to more physical design effort savings in all types of array design, e.g. SRAM, CAM, DRAM

Page 34: Ph.D. Preliminary Exam

Acknowledgement

Advisor: Prof. Tom W. Chen

Committee Members: Prof. Yashwant Malaiya, Dr. Sudeep Pasricha, Dr. Ali Pezeshki

IBM: Joshua Friedrich, Dr. Vikas Agarwal, Chirag Desai, John Badar

Page 36: Ph.D. Preliminary Exam

Personal Background

Educational
– BS in Electrical Engineering, BUET, Dhaka, Bangladesh
– ME in Electrical Engineering, CUNY, New York

Professional
– Product Development Engineer, Advanced Micro Devices (AMD), TX: 1994 – 1997
    Circuit design, critical timing path analysis, layout for K5 development team
– Hardware Development Engineer, Mentor Graphics Corporation, NJ: 1997 – 1999
    Test chip, data path design, Verilog model for ROM/RAM
– Member of Technical Staff, Hewlett Packard (HP), CO: 1999 – 2002
    Circuit design for FPU, high-speed IO driver, place and route, timing analysis
– Senior Engineer, International Business Machines (IBM), TX: 2003 – Present

Fabric Unit interim/co-Circuit Lead P6 GX, TP, CLIB, PC Unit Circuit Lead: P6 DD1 L2, L3, NCU Circuit Lead: P6 DD2 L2, NCU Circuit Lead: P7 DD1, DD2 Nest Circuit Lead for P8, P9