
Neeraj Paliwal Senior Engineering Manager Advanced Processor Development



CAD Challenges For Designing A High Frequency Multi-Core SoC Implementation Of The First-Generation CELL Processor

Neeraj Paliwal
Senior Engineering Manager
Advanced Processor Development
IBM Corporation, Austin TX


Outline

Introduction: Design Goals
Design Goal: Design Challenges
Challenges: CAD Methodology
CAD Methodology Details
Lessons Learned: Recommendation
Conclusion


Digital Media Applications


Design Goals

Design for natural human interaction
– Realism requires Supercomputer attributes with extreme floating point capabilities
    2 TFLOPS in the new Playstation3 System
Set new performance standard
– Exploits parallelism while achieving high frequency
    Multiple HF Cores
Foster innovation in Design & Methodology
– Holistic Design approach
– Scalability and Flexibility through Modular design


Design Challenges

Triple Constraints
– Power
– Frequency
– Cost
Design Trends
– SoC and Giga Scale Integration
– Multi-Core on a Chip
Time to Market


System Trends Toward Integration

Increased integration is driving processors to take on many functions typically associated with systems
– Integration forces processor developers to address off-load and acceleration in the design of the processor
– Integration of bridge chip functionality

[Figure: system block diagrams. Labels: Processor, Accel, Northbridge, Southbridge, Memory, IO; Cell Processor, Memory, IO]


Giga Scale Integration

[Figure: integration diagram. Labels: CPU, Media, Security, Config. IO, Synergistic Processor, Mem. Contr., 64b Power Processor, Media Processor, Security Processor, Network Processor, Streaming Graphics Processor, NIC, GPU, Hardwired Function, Programmable ASIC, Cell]

Need an innovative Design Methodology for High Frequency Multi-Core SoC


Implementation Challenges

Technology Scaling
– Minimize cross chip variations in delay and leakage
– Array bit cell stability, writability, yield
– Growing impact of wire RC vs. device speed
11FO4 design within air-cooled power envelope
– Power, Clock, Signal Distribution variation due to hot spots, inductance effects, etc.
– Multi Clock domains
– Intra-Chip interconnections
– Global Optimization with “triple constraints”: Frequency, Power, Cost (Die Size and Yield)


Holistic Design Approach

Design
– Cover all aspects of the design
    Circuits, Cores, Chips, System, Software
Development process
– Fast Convergence
    Top Down / Bottom Up
    Early Design Planning / Final Convergence
– Adaptability and Scalability
    Long duration projects need to allow for refinement of ideas
Organizational structure
– Building the best processor development team spans the globe
– Enable learning and adaptation to changes in the market


Design Methodology Philosophy

Micro architecture definition must go hand-in-hand with physical floorplan definition – wire delays are major component of performance
“Divide and Conquer”
– Chip hierarchy: macros, units, islands, partitions and chip
– Macro is lowest level floorplannable object
– Physical partitioning represented in RTL
– Each level of hierarchy verified independently (DRC, LVS, Equivalence checking)
Formal Equivalence Checking required between RTL and schematic
– Latch points must match – no retiming
– Performed hierarchically up to the chip level
VHDL drives physical design
Derived data is audited
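The "Divide and Conquer" hierarchy can be pictured as a tree in which every level is checked on its own, children before parents. The following is a minimal Python sketch under stated assumptions, not the STI tooling: the `Block` type, the `verify_bottom_up` helper, and the stand-in check functions are hypothetical names introduced for illustration. Only the level names and the three check names come from the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hierarchy levels named on the slide, from lowest to highest.
LEVELS = ["macro", "unit", "island", "partition", "chip"]

# Checks named on the slide; each level is verified independently.
CHECKS = ["DRC", "LVS", "equivalence"]


@dataclass
class Block:
    """One node of the chip hierarchy (hypothetical illustration type)."""
    name: str
    level: str
    children: List["Block"] = field(default_factory=list)


def verify_bottom_up(block: Block,
                     run_check: Callable[[Block, str], bool]
                     ) -> Dict[str, Dict[str, bool]]:
    """Run every check on every block, children before parents.

    Returns {block name: {check name: passed}}.
    """
    results: Dict[str, Dict[str, bool]] = {}
    for child in block.children:
        # A child must sit strictly below its parent in the hierarchy.
        assert LEVELS.index(child.level) < LEVELS.index(block.level)
        results.update(verify_bottom_up(child, run_check))
    results[block.name] = {check: run_check(block, check) for check in CHECKS}
    return results


def all_clean(results: Dict[str, Dict[str, bool]]) -> bool:
    """True only if every check passed on every block."""
    return all(all(checks.values()) for checks in results.values())


# Example hierarchy; "macro_a" is a made-up macro name.
macro = Block("macro_a", "macro")
unit = Block("sfx", "unit", [macro])
island = Block("spu_core", "island", [unit])
partition = Block("spc", "partition", [island])
chip = Block("chip", "chip", [partition])


def always_pass(block: Block, check: str) -> bool:
    """Stand-in for a real checker; always reports success."""
    return True


def lvs_fails_on_sfx(block: Block, check: str) -> bool:
    """Stand-in checker that injects one LVS failure at unit level."""
    return not (block.name == "sfx" and check == "LVS")


results = verify_bottom_up(chip, always_pass)
broken = verify_bottom_up(chip, lvs_fails_on_sfx)
```

Because results are keyed per block, a failure stays attributed to the level that owns it rather than surfacing only in a chip-level run.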


Schematic Illustration of Design Hierarchy


STI Development Process

[Figure: development process diagram. Labels: Customer Reqs., Business Plan, Design Specs, High-Level Design, Logic Design, RTL Design, Circuit/Physical Design & Integration, Verification, Global Processes, Hardware Validation, Software Development, Mfg. Data, Workloads, S/W Dev. Kit, To Manufacturing, Sample Hardware To Customers]


STI Chip Design Flow

[Figure: tool flow diagram. Recoverable labels: Chip/Unit VHDL, Custom VHDL, Array VHDL, RLM VHDL, Portals, DADB, MESA, AWAN, Sim env (Fusion, Specman), Testcases, GenesysPro, XGEN, Portals/BooleDozer, TestPat, TECH, ChipBench or Cadence Floorplan, Placement, PDSrtl, Routing, Cadence Route, Einstimer, EinsTLT, DCM Timing Rule, DCM Rules, Cadence Composer, Cadence/GYM Layout Editor, Layout, Merged Layout, DeviceVIM, PhysVIM, PowerSpice, Ultrasim, Verity, ESPCV, SVV, LVS, ERIE, Niagara DRC/LVS, 3DX, PDM, Global Noise, Macro Noise, Noise Rule, Echk, Gatemaker, TPG, Design Audit, CPAM, LAVA, Power Rule, TexPower]


Design Data Management

Seven sites & 450+ designers
– Need a way to verify that every check has been run on every piece of data that is going on the chip => this process is called Audit
– Over the course of the chip development, snapshots of the chip data are going to be needed so that different design teams can work with data that is of a certain quality. A level can be created to identify that data => this process is called Promote
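The Audit and Promote ideas can be sketched in a few lines of Python. This is only an illustration under assumptions, not the project's data management system: the `DesignData` type, the check names in `REQUIRED_CHECKS`, and the rule that a level is defined by a set of required checks are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical list of checks every piece of chip data must pass.
REQUIRED_CHECKS: Set[str] = {"DRC", "LVS", "timing", "noise"}


@dataclass
class DesignData:
    """One piece of design data plus the checks it has passed so far."""
    name: str
    passed_checks: Set[str] = field(default_factory=set)


def audit(items: List[DesignData],
          required: Set[str] = REQUIRED_CHECKS) -> Dict[str, Set[str]]:
    """Return {item name: missing checks} for every item that is not clean."""
    missing: Dict[str, Set[str]] = {}
    for item in items:
        gap = required - item.passed_checks
        if gap:
            missing[item.name] = gap
    return missing


def promote(items: List[DesignData], level_name: str, required: Set[str],
            levels: Dict[str, List[str]]) -> List[str]:
    """Record a snapshot of the items that meet this level's quality bar.

    Assumption: a level's quality is expressed as a set of required checks.
    """
    snapshot = [item.name for item in items if required <= item.passed_checks]
    levels[level_name] = snapshot
    return snapshot


# Made-up example data.
items = [
    DesignData("macro_a", {"DRC", "LVS", "timing", "noise"}),
    DesignData("macro_b", {"DRC"}),
]
levels: Dict[str, List[str]] = {}
gaps = audit(items)
early = promote(items, "early", {"DRC"}, levels)
final = promote(items, "final", REQUIRED_CHECKS, levels)
```

Here `gaps` names what still has to be run before tape-out, while each entry in `levels` identifies the data another team can safely pick up at that quality.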


Circuit Design Philosophy

Strict design guidelines to minimize design variations
– Layout topology check and DFM rules for yield
– Circuit topology and electrical checks
– Global active clock pulse limiter for dynamic circuits
– Hold time margin scales with clock path delay
Reduce design sensitivity to technology leakage
– Limited dynamic logic circuit usage
– No Low-Vt devices
Array yield focus
– Array redundancy for bit cell stability fails
– Reduced cell stress during read


Clock Philosophy

Clock Distribution using Grid-Tree approach
– Minimal global clock skew – HOLD margin built into latch timing rule
– Do not include clock arrival times in chip static timing – eliminates dependency on clock distribution analysis
– Clock Distribution area is pre-allocated and tuned concurrently with unit integration

[Figure: clock distribution. Label: Main Mesh]


Timing Practices – “Fast Convergence”

Macro partitioning encouraged to be on timing/latch boundaries
Unit/Partition/Chip level static timing done early and often - progressively improving accuracy
– Shell rules -> schematic based rules -> layout extracted rules
– Steiner routes -> add wire codes -> 3D extraction -> noise uplift
All latches treated as hard timing boundaries, no transparency
Transistor level static timing required for all macros
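Treating every latch as a hard timing boundary means each latch-to-latch segment is timed against a full cycle on its own, with no time borrowed across the latch. The sketch below illustrates that idea in Python under simplifying assumptions: delays are in picoseconds, setup time and clock skew are ignored, and the function name, node names, and delay values are hypothetical. It is not a model of Einstimer or EinsTLT.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (from node, to node, delay in ps)


def latch_to_latch_slack(edges: List[Edge], latches: Set[str],
                         cycle_ps: float) -> Dict[str, float]:
    """Worst-case slack at each capturing latch.

    Each latch is split into a launch pin (arrival time 0) and a capture
    pin, so no path ever continues through a latch: no transparency.
    Assumes the combinational logic between latches has no loops.
    """
    def launch(n: str) -> Tuple[str, str]:
        return (n, "out") if n in latches else (n, "comb")

    def capture(n: str) -> Tuple[str, str]:
        return (n, "in") if n in latches else (n, "comb")

    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        u, v = launch(src), capture(dst)
        succ[u].append((v, delay))
        indeg[v] += 1
        nodes.update((u, v))

    # Longest-path arrival times in topological order (Kahn's algorithm).
    arrival = {n: 0.0 for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    while ready:
        u = ready.popleft()
        for v, delay in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    return {name: cycle_ps - arrival[(name, kind)]
            for (name, kind) in nodes if kind == "in"}


# Made-up netlist: latches A, B, C with gates g1..g3 between them.
example_edges = [
    ("A", "g1", 40.0), ("A", "g2", 30.0), ("g2", "g1", 20.0),
    ("g1", "B", 50.0),
    ("B", "g3", 60.0), ("g3", "C", 70.0),
]
slack = latch_to_latch_slack(example_edges, {"A", "B", "C"}, cycle_ps=120.0)
```

In this example the late path into C is reported as a violation at C even though B has slack to spare, which is the consequence of disallowing transparency: the segments cannot trade time with each other.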


Hierarchical Timing Example

Timing at 4 Levels of Hierarchy:
    Unit (e.g. sfx)
    Island (e.g. spu core)
    Partition (e.g. spc)
    Chip
Hierarchical approach breaks down larger problem into manageable pieces (Units)
The chip timing run times all paths across all hierarchies
Internal Macro Timing Closed via EinsTLT but ALL paths visible in chip run

[Figure: nested hierarchy diagram. Labels: Chip, Partition, Island, Unit A, Unit B, Macro]


Noise Analysis Example

Macro Analysis: Noise analysis with focus on transistors and wires
Unit/Chip Analysis: Global analysis with focus on behavior of wires


Power Management Practices

Dynamic power is controlled by fine-grain clock gating
Leakage power is managed by adding lower vt devices only where necessary
Accurate power estimation
– Macro level uses circuit simulation and generates a power rule (0-50% input switching)
– Partition/Chip level uses behavior simulation with specific workloads and macro level power rules
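The two-level power estimation can be illustrated with a small Python sketch. Everything beyond the slide's outline is an assumption: the `PowerRule` type, the linear interpolation between the 0% and 50% switching points, the milliwatt units, and the example numbers are hypothetical, and the real power rule format is not described in the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PowerRule:
    """Macro power in mW at 0% and 50% input switching (hypothetical)."""
    p_at_0: float
    p_at_50: float

    def power(self, switching: float) -> float:
        """Power at a given input switching factor in the range 0.0 to 0.5."""
        if not 0.0 <= switching <= 0.5:
            raise ValueError("rule is only characterized for 0-50% switching")
        # Linear interpolation between the two characterized points
        # (an assumption; a real rule may be a table or a fitted curve).
        return self.p_at_0 + (self.p_at_50 - self.p_at_0) * (switching / 0.5)


def chip_power(rules: Dict[str, PowerRule],
               activity: Dict[str, float]) -> float:
    """Sum macro power using per-macro switching from a workload simulation."""
    return sum(rule.power(activity[name]) for name, rule in rules.items())


# Made-up macros and workload activity.
rules = {
    "macro_a": PowerRule(p_at_0=10.0, p_at_50=50.0),
    "macro_b": PowerRule(p_at_0=4.0, p_at_50=20.0),
}
activity = {"macro_a": 0.25, "macro_b": 0.5}
total_mw = chip_power(rules, activity)
```

The split mirrors the slide: the expensive circuit simulation happens once per macro to produce the rule, and the chip-level estimate then only needs switching activity from behavioral simulation of a workload.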


Integration Flow

VHDL To Finished Layout
Common Code And Methodology Infrastructure With RLM
Additional Steps Unique To Unit Construction
– Generate Power Busses
– Buffer Planning/Insertion
– Generate hierarchy design constraints
– Decap Insertion
– Unit Clock Router, minimize power
– Routing with noise awareness, wire bending
– Generate Power and Redundant Vias
– Verification and Analysis: Extraction, Timing, IREM, Noise, Meth Check, Density Check, Yield Rule Check, DRC/LVS, Verity
Saved Parameters For Each Design Making Rebuild Simple
– Use Of Existing Designs As Template For New Designs


Hot Spot Analysis

Extensive thermal analysis early in the design cycle
Power maps created for use with package and heat sink models
Steady state and transient thermal behavior simulated
Analysis feedback to chip floorplan and thermal sensor design


Hierarchical Verification

Top Down Specification / Bottom up Implementation
Test Generation: provide simulation with good stimulus
Model Build, Simulation, and Analysis
Formal Verification


Test / Pervasive Design Practices

Distributed test functions
– LBIST engine for cores
– ABIST engine for arrays
Distributed debug features
– Common debug bus
– Centralized trace array
Centralized test and pervasive control
– Common strategy for logic debug and performance monitoring
– Monitor some activity externally
Early focus on design bring up
– At speed test (internal chip scan, ABIST, programmable LBIST)
– On chip logic analyzer for debug
– On chip performance monitor
– Isolate, start, stop, step controls for lab debug


Lessons Learned                    Recommendation

Data Translation Time              Open Access DB
Early PDV Planning                 Black box approach
Layout automation                  Migration and DFM friendly layouts
Synthesis to layout loop           Physical/DFM aware synthesis
Hardware resource                  Linux based CAD flow for better ROI and TAT
Communication                      Wiki based documentation system
Multiple sites and IT/OS Issues    Regression suite


Conclusions

The CELL processor, a multi-core design, was successfully implemented using
– Innovative design methodology
– Good design practices
– Rules for modularity and reuse
– Triple Constraints for optimum design point
Correct operation has been observed with good Frequency range (over 3.2GHz)
Sony/SCEI announced PS3 System in 5/05
Recommendations being implemented in the next generation chips!


Acknowledgement

The Authors: Dac Pham (APDAC 2006 Presentation), Han-Werner Anderson, Erwin Behnen, Mark Bolliger, Sanjay Gupta, Peter Hofstee, Paul Harvey, Charles Johns, Jim Kahle, Atsushi Kameyama, John Keaty, Bob Le, Sang Lee, Tuyen Nguyen, John Petrovick, Mydung Pham, Juergen Pille, Stephen Posluszny, Mack Riley, Joseph Verock, James Warnock, Steve Weitzel, Dieter Wendel.

Deep collaboration and many contributions from the entire SONY-Toshiba-IBM team who worked tirelessly side-by-side on the design of this processor.

The executive management teams of the three companies who provided management insight and created the right business conditions for this project.


Thank You