Implementing Best-in-Class Arm Cores Across Market Segments with Arm Artisan POP IP and PIK IP Technologies
Zhaowei Teng, Sr. Manager Technical Marketing, Arm Physical Design Group
Arm Technology Symposia, October 2019



© 2019 Arm Limited

Data Consumption is Revolutionizing the Infrastructure

[Figure: trillions of devices generate massive amounts of data, carried over 5G to edge nodes that filter and react (local decisions); critical data moves on to cloud data centers, which analyze and store it.]


Growing Complexity of Cloud Data Processing

[Figure: a cloud rack in a cloud data center (analyze & store). Labels: 64-core application processor; 24-core N+S+S processor; processes: application, networking, storage, security.]


Translating Arm RTL Benefits Into Silicon

HOW TO
• Translate the year-over-year performance improvements of >15% for compute through 2020 in silicon?
• Optimize implementation for new cores and advanced process nodes?
• Ensure fast turn-around time for implementing new Arm cores?

[Block diagram: Arm® Neoverse N1 CPU processor. Labels: Core 1 and 2; Armv8.2-A 32b/64b CPU; NEON™ SIMD engine; crypto extensions; I-cache w/parity; D-cache w/ECC; private L2 cache w/ECC; SCU; ACP; peripheral port; async bridges; optional shared L3 w/DSU; Arm CoreSight™ multicore debug and trace; AMBA5 CHI / AMBA4 ACE; direct connect to CMN-600 mesh (CHI).]


PDG – Processor Co-development
Custom IP for each core and market segment

• Optimized core RTL for each market segment
• Optimized physical IP for each market segment and core

[Figure: processor types CPU, GPU, NPU; market segments Infra, Client, Auto, ML.]


Co-Optimization of Processor Technology with POP

• Identify process technology differentiation
• Process benchmark details to showcase results
• Quick process adoption with detailed CPU implementation

[Flow diagram: POP IP development. Stages: requirements / spec; processor implementation trials; final implementation; EAC. Each iteration pairs Cadence or Synopsys flow tuning with physical IP tuning. Deliverables: physicals, revised physicals, final physicals, with feedback between iterations.]

Arm POP IP implementation teams go through many iterations of flow and physical IP tuning to provide a complete implementation solution with optimized design for fast technology adoption.


Optimized Neoverse Core Implementation with POP
Optimal CPU implementation with POP - predictive PPA, low risk, improved Time To Market (TTM)

POP is the Arm brand for products that include the physical IP and methodology to implement Arm processor cores:
• Artisan Physical IP: CPU-optimized physical IP
• POP Reference Scripts: RTL-GDS scripts for major EDA tool chains
• POP User Guide: comprehensive implementation methodology
• Artisan® Architect Products: design utilities to improve implementation

POP Landing Team Support

Infrastructure Roadmap
• Cosmos Platform: 16nm
• Neoverse N1 (Ares) Platform: 7nm
• Zeus Platform: 7/5nm
• Poseidon Platform: 5nm

Available 2019


Neoverse N1 Frequency Optimization with POP
The Value of POP Optimization

[Chart: frequency (MHz), axis 2400 to 3400, by optimization step: Start POP alpha implementation; Beta-quality FCIs; Floorplan changes; PNR flow updates; PBA optimization; PNR flow fixes; Sign-off margin updates; Memory PPA tuning; Critical-path optimization; PNR flow & FP tuning; Final. Bar labels in extraction order: 2560, 200, 150, 100, 100, 120, 130, 140, 130, 130, 40, 3260.]


POP Implementation: Sign-Off Checks

[Chart: implementation effort (0% to 100%) by implementation stage (dev, alpha, beta, EAC), broken down by task: sleep signal stitching, UPF verification, signal EM, dynamic-power optimization, leakage recovery, hold fixing, CPU vector verification, IR drop, DRC cleanup, frequency push.]

Physical IP feedback & enhancements across all stages


POP IP: Fast Cache Instances
Bottleneck analysis reveals memories are in critical paths

Using industry standard instances

10% increased performance with FCIs

Higher performance targets achieved using Arm FCIs

Many critical paths are resolved with better memory timing arcs


Arm is Enabling Partners

• Memories are no longer in critical path
• Current high-performance computing repeat POP customer
• Achieve >2.5GHz with POP
• Server chip reached 3.3GHz+ with Neoverse N1


Data Consumption is Revolutionizing the Infrastructure to Securely Connect Devices

[Figure: trillions of devices, massive amounts of data, 5G, edge nodes (filter & react, local decisions), critical data, cloud data centers (analyze & store).]


A Trillion Securely Connected Devices in 2035

• IoT devices push the limits on energy and power efficiency expectations

• Ultra-low voltage design is key to energy and power efficiency gains for IoT

• Arm seeks to unlock ultra-low voltage design use and IoT device proliferation

Very low power, low voltage, very low cost


Arm CPU Implementation with Low Power Flow

Synthesis
• Logic depth analysis to ascertain synthesis efficiency
• Optimal cell choice for power and performance efficiency
• Focus on area recovery through repeated netlist optimizations

Cell Selection
• Flow tuning for usage of multi-bit flops to reduce area and dynamic power
• Mixing optimal “Vt/CL” combinations rather than all “Vt/CL”
• Restricted use of higher drives for timing
• SC track selection based on design need

Place and Route
• Floorplan tuning for best-in-class power, performance and area
• Clock tree cell types (INV/BUF) and Vt/CL selected to optimize Pdyn
• Estimating and fixing crosstalk/congestion from pre-route to keep TNS in check


Low-Power Flow (Supports Industry Standard EDA Tools)

Synthesis: Design Compiler® Graphical / Genus®
• Quantitative library analysis to select the Vt/CL
• Cell profiling
• Timing-driven multi-bit register banking and de-banking
• Physical-aware clock gating

Place and route: IC Compiler™ II / Innovus®
• Low power placement
• Incremental multi-bit register banking and de-banking
• Selecting the right INV/BUF strategy for building the clock tree
• High effort leakage flow

Timing ECO: PrimeTime® ECO / Tempus®
• PBA based aggressive leakage recovery with minimal timing impact
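
The idea behind leakage recovery with minimal timing impact can be illustrated with a toy model. The Python sketch below is not Arm's flow and not the algorithm of any EDA tool, and it does not model path-based analysis. It assumes a single timing path with a known positive slack, and hypothetical per-cell delay penalties and leakage savings for swapping a cell to a higher-Vt variant. It greedily accepts the swaps with the best leakage saving per picosecond of added delay while the path slack stays non-negative. All names and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapCandidate:
    """A hypothetical low-Vt cell that could be swapped to a higher-Vt variant."""
    name: str
    delay_penalty_ps: float   # extra delay if swapped (assumed value)
    leakage_saving_nw: float  # leakage saved if swapped (assumed value)


def recover_leakage(candidates, path_slack_ps):
    """Greedily pick swaps on one path without driving slack negative.

    Returns (accepted cell names, remaining slack in ps, total saving in nW).
    """
    # Rank by saving per unit of added delay; zero-penalty swaps come first.
    ranked = sorted(
        candidates,
        key=lambda c: c.leakage_saving_nw / c.delay_penalty_ps
        if c.delay_penalty_ps > 0 else float("inf"),
        reverse=True,
    )
    accepted, slack, saved = [], path_slack_ps, 0.0
    for cell in ranked:
        # Accept a swap only if the path still meets timing afterwards.
        if cell.delay_penalty_ps <= slack:
            accepted.append(cell.name)
            slack -= cell.delay_penalty_ps
            saved += cell.leakage_saving_nw
    return accepted, slack, saved


if __name__ == "__main__":
    path = [
        SwapCandidate("u1", delay_penalty_ps=4.0, leakage_saving_nw=12.0),
        SwapCandidate("u2", delay_penalty_ps=6.0, leakage_saving_nw=9.0),
        SwapCandidate("u3", delay_penalty_ps=3.0, leakage_saving_nw=3.0),
    ]
    print(recover_leakage(path, path_slack_ps=8.0))
```

A production flow works on every path through a cell at once and re-times after each change, so this sketch only conveys the trade-off, not the mechanics.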


Reducing Power While Maintaining Performance

Reduce total power by ~10% while keeping same performance

• 6.5 track cell architecture

• Library analysis for Vt selection

• Leakage power optimization

• Enabling multi-bit flops

• Clock tree cell selection

(Source: Arm internal measurement)

Power Summary (normalized power ratio, baseline = 100)
• Dynamic power: baseline 100, final 90
• Leakage power: baseline 100, final 92
• Total power: baseline 100, final 91
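
As a quick check of the ~10% claim, the normalized values in the Power Summary chart can be turned into percentage reductions. This minimal Python sketch reads the chart as dynamic 90, leakage 92 and total 91 against a baseline of 100; the dictionary layout and the function name are illustrative only.

```python
# Normalized power ratios (baseline, final) as read from the Power Summary chart.
POWER_SUMMARY = {
    "dynamic": (100, 90),
    "leakage": (100, 92),
    "total": (100, 91),
}


def reduction_percent(baseline, final):
    """Percentage reduction of `final` relative to `baseline`."""
    return 100.0 * (baseline - final) / baseline


if __name__ == "__main__":
    for name, (baseline, final) in POWER_SUMMARY.items():
        print(f"{name}: {reduction_percent(baseline, final):.0f}% lower")
```

On these numbers the total power reduction works out to 9%, which is what the slide rounds to "~10%".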


Arm POP IP Enables Success with Arm CPU Cores at 22nm ULP

Significant Performance Improvement and Faster time to market using Arm POP IP

Blog at Arm.com with Novatek quote

https://community.arm.com/soc/b/blog/posts/novatek-advancing-digital-television-with-arm-pop-ip-on-tsmc-22nm-ulp


POP Advantage: Different POPs for Different Criteria

Parameters: Without POP vs. POP
• Logic: standard platform vs. standard + PPA-tuned custom cells
• Memories: compiler based vs. FCIs optimized for Fmax, Pmin and min area
• Arm Tech: standard platform vs. custom VIAs + custom NDRs
• Implementation: iRM flow released with CPU RTL vs. POP RFM flow with optimized floorplan tuned for best-in-class PPA

Performance Optimized POP
• 15%↑ Fmax
• 10%↓ Power
• 5%↓ Area

Power Optimized POP
• 5%↑ Frequency
• 20%↓ Power
• ~8%↓ Area

Area Optimized POP
• 5%↑ Frequency
• 5%↓ Power
• 15%↓ Area
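
The three POP variants trade frequency, power and area differently. As an illustration only, the Python sketch below encodes the slide's headline deltas and picks the variant with the largest quoted gain on a chosen metric. The data structure, the function name, treating Fmax and frequency as one metric, and treating "~8%" as 8 are assumptions made for the example, not part of any Arm deliverable.

```python
# Headline deltas in percent, as quoted on the slide.
# Positive numbers are gains: higher frequency, lower power, lower area.
POP_VARIANTS = {
    "performance": {"frequency": 15, "power": 10, "area": 5},
    "power": {"frequency": 5, "power": 20, "area": 8},  # area quoted as ~8%
    "area": {"frequency": 5, "power": 5, "area": 15},
}


def best_variant(metric):
    """Return the variant name with the largest quoted gain for `metric`."""
    if metric not in ("frequency", "power", "area"):
        raise ValueError(f"unknown metric: {metric}")
    return max(POP_VARIANTS, key=lambda name: POP_VARIANTS[name][metric])


if __name__ == "__main__":
    for metric in ("frequency", "power", "area"):
        print(metric, "->", best_variant(metric))
```

Each variant wins on the metric it is named for, while still quoting smaller gains on the other two.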


Arm POP Differentiation

Feature, with POP and other solutions compared:
• Physical IP: fast cache instances and logic tuned specifically for the Arm CPU implementation. POP: Yes. Other solutions: Limited.
• Synergistic development of CPU RTL, flow and POP IP. POP: Yes. Other solutions: No.
• Includes tuned physical IP, floorplan, and reference RTL to GDS scripts delivered with CPU EAC delivery. POP: Yes. Other solutions: No.
• Tightly coupled interactive partner engagement to deliver a tuned CPU implementation (L1, L2, ECC, Crypto & other options) recipe for partner’s product. POP: Yes. Other solutions: Limited.
• Customized floorplan for both CPU & top-level. POP: Yes. Other solutions: Limited.
• Reference RTL to GDS scripts for both Cadence and Synopsys EDA flows. POP: Yes. Other solutions: 1 EDA flow only.
• Sign-off PVT corner for setup/hold & metal stacks aligned to user’s requirements. POP: Yes. Other solutions: Limited.
• Complete IR analysis (static, dynamic & in-rush), signal EM and signoff DRC is checked for quoting PPA. POP: Yes. Other solutions: Limited.
• Logic Power Grid Architect (PGA) utility for floorplanning and power grid insertion running on both Cadence and Synopsys EDA tools. POP: Yes. Other solutions: Not optimized for Arm POP IP; 1 EDA flow.
• Time To Market advantage achieving POP published PPA targets. POP: Greatest advantage, 5 to 8 months TTM savings for advanced cores. Other solutions: No.


POP IP Engagement Model for CPU Hardening

Milestones: Kick-off meeting, 1st Review, F2F Meeting, 2nd Review, EAC Release, Tape Out

• Sync-up on RTL config & design spec
• Download DEV POP IP set-up & reproduce POP PPA
• Download BET POP IP set-up & reproduce POP PPA
• On-site visit (if aligned): final alignment & concerns
• Additional opts & IR/DRC/hold time clean
• Final sign-off, integration, POP consultancy and support

Review process to download DEV POP

Establish project’s schedule and final PPA targets

Final alignment on POP checklist

Partner independently to integrate BIST/monitors/3rd-party IP etc. once bring-up is complete

Address issues in reproducing DEV POP IP PPA

Updates on POP BET IP PPA

Partner environment and IP download

Post-Beta POP EAC status

Discussions around DFT strategy

Updates on Partner queries and concerns

On-site visit schedule and agenda discussions

Post on-site PPA status update

Strategy for final hand-off

Preparation for final sign-off

Ownership of GDS, final verification & signoff lies with Partner

Feedback and lessons learnt from the engagement

POP-RFM EAC release Complete


POP and PIK Availability and Active Partners

CPU/Tech 5nm 7nm 11nm 12nm 14nm 16nm 22nm 28nm

Cortex-A53 ⚫ ⚫ ⚫ ⚫

Cortex-A57 ⚫ ⚫

Cortex-A72 ⚫

Cortex-A73 ⚫ ⚫ ⚫ ⚫

Cortex-A55 ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫

Cortex-A75 ⚫ ⚫ ⚫ ⚫ ⚫

Cortex-A76 ⚫ ⚫ ⚫

Cortex-A77 ⚫

Neoverse N1 ⚫ ⚫

Neoverse E1 ⚫

HerculesSL ⚫ ⚫

Zeus ⚫ ⚫

⚫ Available with Active Customers

⚫ In Development


Processor Implementation Spanning Markets and Technology

PIK (Processor Implementation Kit)

• Available for 22nm and older technologies

• Generic flow tailored towards meeting market requirements in terms of low power and low area

• Provides a 2-4 month advantage in terms of TTM

• No license fee

POP IP

• For latest cores on most advanced process nodes

• Specialized flow tailored more towards highest possible performance for a given core with partner sign-off conditions and configuration

• Provides a 3-6 month advantage in terms of TTM because of higher core complexity

• Extra license fee

• Enablement as part of the “Solution Packages”


Summary

• The Arm ecosystem is driving innovation and choice for infrastructure, from data center to network core to edge to access

• Arm physical IP enables semiconductor solutions from high performance server chips to optimized SoCs, to FPGAs, ASICs, and CoT-based paths to solutions

• Artisan POP IP provides optimal CPU implementation from 5nm to 28nm - predictive PPA, low risk, improved Time To Market (TTM)

• Arm is the industry’s leading supplier of foundation physical IP and processor implementation solutions to address the performance, power and cost requirements for all application markets on advanced FinFET nodes


Thank You, Danke, Merci, 谢谢, ありがとう, Gracias, Kiitos, 감사합니다, धन्यवाद, شكرًا, תודה

© 2019 Arm Limited


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks