
© 2006 IBM Corporation. IBM Research. © 2007 IBM Corporation. Multi-Core Design Automation Challenges. John Darringer, IBM T. J. Watson Research Center


IBM Research

© 2007 IBM Corporation

Multi-Core Design Automation Challenges

John Darringer

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

DAC 2007


Scaling no longer provides traditional performance boost

Power limits everything

Advances will come from entire performance stack

[Figure: System Performance Requires An Integrated Approach. The performance stack spans Technology, Chip Level, System Level, and Application, with contributions including circuits, new devices, dense SRAM and eDRAM, optics, packaging and cooling, multiple cores, SMT, accelerators, memory, interconnect, power management, power optimization, compiler support, assist threads, dynamic optimization, fast computation, middleware, languages, software tuning, and efficient programming.]

[Figure: Device performance vs. production date (1998 to 2008), contrasting the recent trend with the historical trend.]


Innovation in System Design

[Figure: dual-core chip floorplan showing L2 cache slices, L3 directory/control, and per-core units (IFU, BXU, IDU, ISU, LSU, FXU, FPU)]

Power 4: Multi-Core, 2001

Power 5: Multi-Thread, 2004

CELL: Accelerators, 2006

Power 6: 4.7 GHz, 2007


Trend to Modular, Application-Optimized Systems

Growing use of diverse modular components

Chip integration may evolve to component assembly

Challenge is in system-level design

– Optimizing architecture for specific applications

[Figure: modular components (core, accelerator, cache, memory, ...) assembled into systems such as SMPs and blades]


Multi-Core ASICs

Multi-core ASIC SoCs are common today

– Address a broad range of markets

– Enable high functional integration

– Provide rapid time to market

One example from 2004

– Cisco Silicon Packet Processor

– 188 32-bit RISC processors

– 47 BIPS


Multi-Core Processors

Power-efficient, reusable cores

Application-matched accelerators

Flexible, scalable interconnect

Optimized memory hierarchy

High speed I/O

Energy management

Deliver system performance

Rapid chip assembly to serve diverse markets


CHALLENGE

System Design

– Continued performance growth

– Increasing power efficiency

– Optimizing for new applications

Design Automation

– Custom design efficiency

– ASIC productivity

– Design and verification

Enablers

– Physical Architecture

– Integrated Early Analysis

– Multi-Core Verification


Physical Architecture

Complement logical architecture

Streamline chip integration

Plan for interconnect

Provide predictable results

Multiple strategies

– Fixed layout per block

– Parametric or generated

– Extended synthesis

[Figures: Example Logical Architecture; Example Physical Architecture]


Modular Components

Components need a self-contained vertical stack

– with clean interfaces to enable automated integration

[Figure: Current Chips vs. Future Chips. Current “component”: mixed fabric and component function with a custom interface. Future component: component function, interface, and component fabric as separate layers.]

Current chips: custom crafting of clock, data, and power meshes

Future chips: automated connection with parametric fabric
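The idea of components with clean interfaces that attach automatically to a parametric fabric can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the fabric is a hypothetical `rows` x `cols` mesh, every component must expose a hypothetical standard port set (`clk`, `pwr`, `data`), and `Component`, `build_mesh`, and `connect` are illustrative names, not part of any IBM flow.

```python
from dataclasses import dataclass, field

# Hypothetical standard interface every component must expose to attach
# to the fabric automatically (an illustrative assumption).
REQUIRED_PORTS = {"clk", "pwr", "data"}


@dataclass
class Component:
    name: str
    ports: set = field(default_factory=set)


def build_mesh(rows, cols):
    """Return the tiles and nearest-neighbor links of a rows x cols mesh."""
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    links = []
    for r, c in tiles:
        if c + 1 < cols:
            links.append(((r, c), (r, c + 1)))  # east neighbor
        if r + 1 < rows:
            links.append(((r, c), (r + 1, c)))  # south neighbor
    return tiles, links


def connect(components, rows, cols):
    """Assign components to tiles in order and attach every standard port."""
    tiles, links = build_mesh(rows, cols)
    if len(components) > len(tiles):
        raise ValueError("fabric too small for the component list")
    netlist = []
    for comp, tile in zip(components, tiles):
        missing = REQUIRED_PORTS - comp.ports
        if missing:
            # A custom interface cannot be connected automatically.
            raise ValueError(f"{comp.name} lacks standard ports: {sorted(missing)}")
        for port in sorted(REQUIRED_PORTS):
            netlist.append((comp.name, port, tile))
    return netlist, links


# Usage: three cores and one accelerator on a 2 x 2 fabric.
parts = [Component(f"core{i}", {"clk", "pwr", "data"}) for i in range(3)]
parts.append(Component("accel", {"clk", "pwr", "data"}))
netlist, links = connect(parts, rows=2, cols=2)
print(len(netlist), len(links))  # 12 4
```

Because the interface check runs before anything is attached, a component with a nonstandard interface is rejected rather than hand-crafted into the mesh, which is the trade-off the slide draws between current and future chips.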


Custom Design

Careful interconnect design

– Communication

– Clock distribution

– Power and ground

Better power efficiency

– Clock gating, power gating

– Detailed transistor sizing

High bandwidth memory and I/O

Higher frequency operation


Challenges of Modular Design

[Figure: multi-core chip floorplans comparing custom and modular layout of the cores]

Custom Layout

– Flexible shape and orientation

– Optimum mesh for power and clock

– Distributed communication and test

– Manually optimized

Modular Layout

– Constrained shape and orientation

– Separate power and clock per core

– Parametric interconnect fabric

– Automatic connection to fabric


Custom Clock Design

Distribution network

– Latches and clocked gates

– Control skew and jitter

– Minimize power

– Survive variation and noise

Interconnect models

– Inductance critical

– Transmission line

– Buffer placement

Hand-optimized

– Still an art

Phillip Restle
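A first-order way to reason about skew in a clock distribution network is the Elmore delay of an RC tree. The sketch below is an assumption-laden toy: the tree and its resistance and capacitance values are hypothetical, wires are lumped RC, and it ignores the inductance and transmission-line effects the slide flags as critical, which is part of why real clock design is still hand-optimized.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay (seconds) from the root to every node of an RC tree.

    parent[n]: parent of node n (None for the root)
    res[n]:    resistance in ohms of the wire from parent[n] to n
    cap[n]:    capacitance in farads loading node n
    """
    children = {n: [] for n in parent}
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)
    root = next(n for n, p in parent.items() if p is None)

    # Downstream capacitance: the node's own load plus everything below it.
    down = {}

    def subtree_cap(n):
        down[n] = cap[n] + sum(subtree_cap(c) for c in children[n])
        return down[n]

    subtree_cap(root)

    # Each wire contributes R_wire * C_downstream to every node below it.
    delay = {root: 0.0}

    def walk(n):
        for c in children[n]:
            delay[c] = delay[n] + res[c] * down[c]
            walk(c)

    walk(root)
    return delay


def skew(delay, sinks):
    """Clock skew: spread of arrival times across the clocked sinks."""
    arrivals = [delay[s] for s in sinks]
    return max(arrivals) - min(arrivals)


# Toy tree: a driver feeds node "a", which branches to two latch groups
# over unequal wires (all values hypothetical).
parent = {"drv": None, "a": "drv", "l1": "a", "l2": "a"}
res = {"a": 100.0, "l1": 50.0, "l2": 80.0}
cap = {"drv": 0.0, "a": 10e-15, "l1": 20e-15, "l2": 20e-15}
d = elmore_delays(parent, res, cap)
print(f"skew = {skew(d, ['l1', 'l2']) * 1e12:.2f} ps")  # skew = 0.60 ps
```

The unequal branch resistances alone produce the skew here; balancing them, or adjusting buffer placement, is the kind of tuning the slide describes.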


Custom Power Distribution

Distribute to all devices

Multiple voltage domains

Simulate detailed power demand

Model chip and package

Consider ground coupling

Balance mesh and trees

Allocate decoupling capacitors

Focus on resonant frequency

Explore clock/power gating scenarios

Howard Chen
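The "focus on resonant frequency" item can be made concrete with the first-order LC resonance formed by package inductance and on-chip decoupling capacitance, f = 1/(2π√(LC)). This is a sketch only: the 0.1 nH and 100 nF values are hypothetical and serve to show the trend that adding decap lowers the resonance (quadrupling C halves f) rather than to model any real package.

```python
import math


def resonant_frequency(l_henry, c_farad):
    """First-order LC resonance, f = 1 / (2*pi*sqrt(L*C)), in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


L_PKG = 0.1e-9    # hypothetical effective package inductance: 0.1 nH
C_DECAP = 100e-9  # hypothetical on-chip decoupling capacitance: 100 nF

f0 = resonant_frequency(L_PKG, C_DECAP)
f4 = resonant_frequency(L_PKG, 4 * C_DECAP)  # four times the decap
print(f"{f0 / 1e6:.1f} MHz -> {f4 / 1e6:.1f} MHz")  # 50.3 MHz -> 25.2 MHz
```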


Challenges of Modular Design

Custom Wiring

– Optimized over chip

– Resources shared

– Variation minimized

– Complex analysis and integration

Modular Wiring

– Optimized at block level

– Fixed resource allocation

– Some variation in results

– Requires automated integration


Spectrum of Strategies

Modular Reuse

– Fixed physical architecture

– Careful block design

– Custom within block

– Automated block connect

– Predictable results

– Good for planned cases

– Stresses design

Extended Synthesis

– Generated physical architecture

– More abstract layout

– Heavy physical synthesis

– Unique block configuration

– Results will vary

– Flexible restructuring

– Stresses tools

Spectrum: Fixed Layout … Parametric … Generated


Systems Demand Early Analysis

To explore many more options

– Cores, Accelerators, Interconnect, Memory Hierarchy, …

To consider many design criteria simultaneously

– Power, Performance, Latency, Hotspots, Reliability, …

To optimize system for specific market

Environment exists for early functional modeling

But today’s tools are not linked to physical design
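Exploring many options against several criteria at once is design-space exploration. A minimal sketch, assuming toy linear models: `estimate` and all of its coefficients are hypothetical stand-ins for calibrated performance, power, and area models, and the code keeps only configurations that fit a power budget and that no other configuration beats on every criterion (the Pareto front).

```python
from itertools import product


def estimate(cores, accels, l2_mb):
    """Toy linear models; every coefficient is hypothetical."""
    perf = 1.0 * cores + 2.5 * accels + 0.3 * l2_mb     # relative throughput
    power = 10.0 * cores + 6.0 * accels + 1.5 * l2_mb   # watts
    area = 12.0 * cores + 8.0 * accels + 5.0 * l2_mb    # mm^2
    return {"perf": perf, "power": power, "area": area}


def dominates(a, b):
    """True if a is no worse than b on every criterion and better on one."""
    no_worse = (a["perf"] >= b["perf"] and a["power"] <= b["power"]
                and a["area"] <= b["area"])
    better = (a["perf"] > b["perf"] or a["power"] < b["power"]
              or a["area"] < b["area"])
    return no_worse and better


def explore(core_opts, accel_opts, l2_opts, power_budget):
    """Return the Pareto-optimal configurations that fit the power budget."""
    feasible = []
    for cfg in product(core_opts, accel_opts, l2_opts):
        metrics = estimate(*cfg)
        if metrics["power"] <= power_budget:
            feasible.append((cfg, metrics))
    return [(cfg, m) for cfg, m in feasible
            if not any(dominates(other, m) for _, other in feasible)]


front = explore([2, 4, 8], [0, 1, 2], [2, 4, 8], power_budget=100.0)
for cfg, m in sorted(front, key=lambda p: p[1]["power"]):
    print(cfg, m)
```

In a real flow the `estimate` step would come from linked performance, power, and physical models, which is the link the slide says today's tools lack.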


Early System Analysis

[Figure: separate performance models, power analysis, thermal analysis, and interconnect analysis, each used by the design team with its own design, technology, package, implementation, and floorplan assumptions]

Loosely coupled disciplines with multiple experts and distinct models


Performance Modeling Is Changing

New parallel workloads emerging

– Execution-driven vs. trace-driven

Shifting to multi-core designs

– Stresses balance of model performance and accuracy

Complex interconnect fabric and memory hierarchy

– Bus, switch, network, asynchronous,…

Increasing use of SystemC

– For early software development and component sharing
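To illustrate the execution-driven vs. trace-driven distinction, here is a trace-driven model in its simplest form: a recorded address trace replayed through a direct-mapped cache. The cache geometry and traces are hypothetical. A fixed trace cannot react to timing changes in the modeled design, so it cannot capture how parallel threads would interleave differently on a different system, one reason parallel workloads push toward execution-driven models.

```python
def direct_mapped_hit_rate(trace, num_lines, line_bytes):
    """Replay a byte-address trace through a direct-mapped cache; return the hit rate."""
    tags = [None] * num_lines  # one tag per cache line, None = invalid
    hits = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines   # which line the block maps to
        tag = block // num_lines    # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: fetch the block, evicting the old one
    return hits / len(trace) if trace else 0.0


# Hypothetical trace: two sweeps over 256 bytes with a 4-byte stride.
sweep = list(range(0, 256, 4)) * 2
print(direct_mapped_hit_rate(sweep, num_lines=4, line_bytes=64))  # 0.96875

# Two blocks that map to the same line thrash each other.
print(direct_mapped_hit_rate([0, 256, 0, 256], num_lines=4, line_bytes=64))  # 0.0
```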


Early Physical Planning is Essential

Interconnect requires full chip layout

– Estimate component area before implementation

– Need more accurate methods

– Have to plan for all facilities to predict chip size

Placement coupled to many factors

– Interconnect performance

– Power

– Thermal and reliability concerns

– Yield
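Predicting chip size early means summing area estimates for every facility, not just the cores. A minimal sketch, assuming a flat whitespace allowance for wiring and placement slack and a target aspect ratio; the block names, areas, and the 25% allowance are hypothetical.

```python
import math


def estimate_die(block_areas_mm2, whitespace=0.25, aspect=1.0):
    """Return (area in mm^2, width in mm, height in mm) for a die holding every block.

    whitespace: fractional allowance for wiring, placement slack, and spacing
    aspect:     target width / height ratio
    """
    raw = sum(block_areas_mm2.values())
    total = raw * (1.0 + whitespace)
    width = math.sqrt(total * aspect)
    height = total / width
    return total, width, height


# Hypothetical area estimates: cores alone would understate the die.
blocks = {"cores": 4 * 12.0, "l2": 40.0, "fabric": 10.0, "io": 18.0, "pervasive": 4.0}
area, w, h = estimate_die(blocks)
print(f"{area:.0f} mm^2, {w:.2f} x {h:.2f} mm")  # 150 mm^2, 12.25 x 12.25 mm
```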


Modeling Interconnects in Multi-Core Designs

[Figure: four cores, each with its cache, connected through an interconnect fabric to a memory controller; links are modeled as async/sync interfaces with parametric delay to capture interconnect delays]

Interconnect delays

– Affect performance

– Depend on placement

– Require accurate modeling
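Because interconnect delay depends on placement, an early performance model can turn a floorplan into the parametric delay of each async/sync interface. The sketch below assumes Manhattan routing between block centers (coordinates in mm), a hypothetical 100 ps/mm repeated-wire delay, and rounding up to whole clock cycles; the placements are illustrative.

```python
import math


def link_latency_cycles(src_xy, dst_xy, delay_ps_per_mm, clock_ghz):
    """Latency in whole clock cycles for a Manhattan route between block centers (mm)."""
    length_mm = abs(src_xy[0] - dst_xy[0]) + abs(src_xy[1] - dst_xy[1])
    delay_ps = length_mm * delay_ps_per_mm
    period_ps = 1000.0 / clock_ghz
    return max(1, math.ceil(delay_ps / period_ps))


MEM_CTRL = (9.0, 5.0)  # hypothetical placement of the memory controller
for core_xy in [(1.0, 1.0), (7.0, 4.0)]:
    print(core_xy, link_latency_cycles(core_xy, MEM_CTRL, delay_ps_per_mm=100.0, clock_ghz=4.0))
```

Moving the core from (1, 1) to (7, 4) shortens the route from 12 mm to 3 mm and cuts the link from 5 cycles to 2 at 4 GHz, the kind of placement sensitivity the slide describes.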


Power Is a Key Criterion, but Hard to Predict

Need estimate before implementation

– Voltage/frequency scaling, voltage islands, clock gating, leakage

Not just core, but many diverse chip components

– Core, cache, interconnect, controllers, I/O, pervasive

Model “interesting” states and transitions

Scale known implementations

– Complex measurement process for calibration

– Requires data from chip layout
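One way to estimate power before implementation is to scale a known implementation to a new voltage/frequency point and weight each block by the time it spends in its "interesting" states. A minimal sketch under loud assumptions: dynamic power scales as C·V²·f with C unchanged, leakage scales linearly with V (a deliberately crude placeholder, since real leakage depends strongly on voltage and temperature), and the per-state activity factors and measured values are hypothetical. It is not a calibrated model.

```python
# Hypothetical fraction of full dynamic power drawn in each state.
STATE_ACTIVITY = {"active": 1.0, "clock_gated": 0.1, "power_gated": 0.0}


def scale_to_new_point(dyn_w, leak_w, v_old, f_old, v_new, f_new):
    """Scale a measured block's power to a new voltage/frequency point.

    Dynamic power is taken as proportional to C * V^2 * f (C unchanged);
    leakage is scaled linearly with V, a crude placeholder only.
    """
    dyn = dyn_w * (v_new / v_old) ** 2 * (f_new / f_old)
    leak = leak_w * (v_new / v_old)
    return dyn, leak


def chip_power(blocks, residency):
    """Time-weighted chip power in watts.

    blocks:    name -> (dynamic W, leakage W) at the target operating point
    residency: name -> {state: fraction of time}, fractions summing to 1
    """
    total = 0.0
    for name, (dyn, leak) in blocks.items():
        for state, frac in residency[name].items():
            leak_w = 0.0 if state == "power_gated" else leak  # gating cuts leakage
            total += frac * (dyn * STATE_ACTIVITY[state] + leak_w)
    return total


# A core measured at 1.1 V / 2.0 GHz, rescaled to 1.0 V / 1.5 GHz.
core = scale_to_new_point(20.0, 5.0, v_old=1.1, f_old=2.0, v_new=1.0, f_new=1.5)
blocks = {"core": core, "l2": (4.0, 3.0)}
residency = {
    "core": {"active": 0.6, "clock_gated": 0.3, "power_gated": 0.1},
    "l2": {"active": 1.0},
}
print(f"core: {core[0]:.2f} W dynamic, {core[1]:.2f} W leakage")
print(f"chip: {chip_power(blocks, residency):.2f} W")
```

The calibration step the slide mentions would replace the measured inputs and activity factors with data taken from real chips and layouts.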


Integrated Early System Analysis

[Figure: design, floorplan, package, and technology assumptions feed a shared implementation model; performance, power, interconnect, and thermal results are used by the design team to optimize and then hand off]

Couple all forms of early analysis

Share data in central repository

Industry standard data model

– OpenAccess

Hand-off to chip integration

– Assumptions, blocks, layout, …

Graphic interface for editing

Stage is set for optimization


Multi-Core Verification

Verification has always been the greatest challenge

Complexity grows with each generation

Challenge is to exploit reuse with multi-core designs

– Requires clear interface definition

[Figure: Traditional Approach vs. Multi-Core Approach to verifying a chip with several cores, showing verification and system verification]


Core Verification

Complexity growing

– Clock/power gating, voltage and frequency scaling

Formal methods are used

– Checking RTL = netlist

– Checking assertions

– Proving implementation equivalent to reference model

Simulation still dominates

Need higher level of specification

– Improve quality

– Stretch synthesis and verification tools

Reuse verification environment
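Proving that an implementation matches a reference model can be illustrated, at toy scale, by exhaustive comparison. This sketch checks a bit-level ripple-carry adder against an arithmetic reference over every 4-bit input pair and returns the first counterexample; `WIDTH` and the function names are hypothetical, and production equivalence checkers rely on SAT- or BDD-based reasoning rather than enumeration, which becomes infeasible beyond a few inputs.

```python
from itertools import product

WIDTH = 4  # hypothetical datapath width; enumeration cost grows as 4**WIDTH


def ref_add(a, b):
    """Reference model: the exact (WIDTH + 1)-bit sum."""
    return a + b


def ripple_add(a, b):
    """Implementation: ripple-carry adder built from single-bit XOR/AND/OR."""
    carry, out = 0, 0
    for i in range(WIDTH):
        x, y = (a >> i) & 1, (b >> i) & 1
        out |= (x ^ y ^ carry) << i            # sum bit uses the incoming carry
        carry = (x & y) | (carry & (x ^ y))    # generate or propagate
    return out | (carry << WIDTH)


def find_counterexample(impl, ref):
    """Compare impl and ref on every input pair; return the first mismatch or None."""
    for a, b in product(range(1 << WIDTH), repeat=2):
        if impl(a, b) != ref(a, b):
            return (a, b)
    return None


print(find_counterexample(ripple_add, ref_add))  # None
```

Dropping the propagate term from the carry (keeping only `x & y`) makes the check return the counterexample (1, 3).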


System Verification

More complex systems

– Many cores, accelerators, networks, asynchronous links

Memory and network contention is a critical area

Formal methods have made an impact

– Verifying abstract memory protocols

Simulation is still the final check

Need system-level test case generation

– Use system knowledge to expose resource contention issues
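System-level test generation can use knowledge of the memory organization to provoke contention deliberately. A minimal sketch, assuming a hypothetical bank-interleaved memory (`NUM_BANKS`, `LINE_BYTES`) and treating each operation slot as simultaneous issue across cores: addresses are biased toward a few "hot" banks so that cores collide far more often than uniform random traffic would.

```python
import random

NUM_BANKS = 8     # hypothetical bank-interleaved memory
LINE_BYTES = 128  # hypothetical interleaving granularity


def bank_of(addr):
    return (addr // LINE_BYTES) % NUM_BANKS


def gen_contention_test(num_cores, ops_per_core, hot_banks, seed=0, hot_bias=0.8):
    """Per-core load/store streams biased toward a few shared 'hot' banks."""
    rng = random.Random(seed)  # seeded so failing tests can be replayed
    streams = []
    for _ in range(num_cores):
        ops = []
        for _ in range(ops_per_core):
            if rng.random() < hot_bias:
                bank = rng.choice(hot_banks)      # deliberately contended bank
            else:
                bank = rng.randrange(NUM_BANKS)   # background traffic
            line = rng.randrange(1024) * NUM_BANKS + bank
            kind = "load" if rng.random() < 0.5 else "store"
            ops.append((kind, line * LINE_BYTES))
        streams.append(ops)
    return streams


def count_conflicts(streams):
    """Count issue slots in which two or more cores target the same bank."""
    conflicts = 0
    for slot in zip(*streams):
        banks = [bank_of(addr) for _, addr in slot]
        if len(banks) != len(set(banks)):
            conflicts += 1
    return conflicts


biased = gen_contention_test(4, 200, hot_banks=[0], seed=1)
uniform = gen_contention_test(4, 200, hot_banks=[0], seed=1, hot_bias=0.0)
print(count_conflicts(biased), "vs", count_conflicts(uniform), "conflicting slots")
```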


Summary

Exciting and challenging times

– Designing application optimized multi-core systems

– Delivering custom efficiency with ASIC productivity

Focus areas

– Physical Architecture to streamline chip integration

– Integrated Early Analysis to explore design space

– Multi-core verification that exploits reuse

Long history of invention in today’s RTL flow

Innovation is needed now at the system level


Acknowledgements

Thanks to the following people

– Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.