Zhaowei Teng, Sr. Manager Technical Marketing, Arm Physical Design Group, October 2019
Implementing Best-in-Class Arm Cores Across Market Segments with Arm Artisan POP IP and PIK IP Technologies
Arm Technology Symposia
© 2019 Arm Limited
Data Consumption is Revolutionizing the Infrastructure
[Figure: Massive amounts of data from trillions of devices flow over 5G to the edge, where data is filtered and acted on through local decisions; critical data is sent on to cloud data centers to be analyzed and stored.]
Growing Complexity of Cloud Data Processing
[Figure: Cloud data centers (analyze and store); a cloud rack with a 64-core application processor that processes applications, and a 24-core N+S+S processor for networking, storage and security.]
Translating Arm RTL Benefits Into Silicon
How to:
• Translate the year-over-year compute performance improvements of >15% through 2020 into silicon
• Optimize implementation for new cores and advanced process nodes
• Ensure fast turnaround time when implementing new Arm cores
[Figure: Arm® Neoverse N1 CPU block diagram: Armv8.2-A 32b/64b CPU with NEON™ SIMD engine and crypto extensions; I-cache with parity and D-cache with ECC; private L2 cache with ECC; peripheral port; async bridges; Arm CoreSight™ multicore debug and trace; SCU; optional shared L3 with DSU and ACP; AMBA 5 CHI / AMBA 4 ACE interfaces; direct connect to the CMN-600 mesh over CHI.]
PDG – Processor Co-development: Custom IP for Each Core and Market Segment
• Optimized core RTL (CPU, GPU, NPU) for each market segment
• Optimized physical IP for each market segment and core
• Market segments: Infra, Client, Auto, ML
Co-Optimization of Processor Technology with POP
[Figure: POP IP development flow: identify process technology differentiation; process benchmark details to showcase results; quick process adoption with detailed CPU implementation; EAC processor implementation trials; Cadence or Synopsys flow tuning and physical IP tuning; final implementation. Physical IP moves from requirements/spec to physicals, revised physicals and final physicals, with feedback between iterations.]
Arm POP IP implementation teams go through many iterations of flow and physical IP tuning to provide a complete implementation solution with an optimized design for fast technology adoption.
Optimized Neoverse Core Implementation with POP
Optimal CPU implementation with POP: predictive PPA, low risk, improved time to market (TTM)
• Artisan® Physical IP: CPU-optimized physical IP
• POP Reference Scripts: RTL-to-GDS scripts for major EDA tool chains
• POP User Guide: comprehensive implementation methodology
• Artisan® Architect Products: design utilities to improve implementation
POP is the Arm brand for the products that include the physical IP and methodology to implement Arm processor cores.
[Figure: Infrastructure roadmap with POP landing team support: Cosmos Platform (16nm), Neoverse N1 (Ares) Platform (7nm, available 2019), Zeus Platform (7/5nm), Poseidon Platform (5nm).]
The Value of POP Optimization
Neoverse N1 Frequency Optimization with POP
[Chart: Neoverse N1 frequency (MHz) by implementation step, from 2560 MHz at the start of the POP alpha implementation to 3260 MHz at the final step. Steps: beta-quality FCIs, floorplan changes, PNR flow updates, PBA optimization, PNR flow fixes, sign-off margin updates, memory PPA tuning, critical-path optimization, PNR flow and floorplan tuning.]
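As a quick check on what the chart implies, the sketch below converts the start (2560 MHz) and final (3260 MHz) frequencies into a relative gain and a clock-period reduction. Only those two endpoints come from the chart. The helper names are illustrative.

```python
# Endpoints read from the Neoverse N1 frequency chart (MHz).
START_MHZ = 2560.0  # start of the POP alpha implementation
FINAL_MHZ = 3260.0  # final frequency after all tuning steps


def percent_gain(start_mhz: float, final_mhz: float) -> float:
    """Relative frequency improvement in percent."""
    return (final_mhz - start_mhz) / start_mhz * 100.0


def period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds (a 1 MHz clock has a 1,000,000 ps period)."""
    return 1e6 / freq_mhz


gain = percent_gain(START_MHZ, FINAL_MHZ)
before, after = period_ps(START_MHZ), period_ps(FINAL_MHZ)
print(f"frequency gain: {gain:.1f}%")
print(f"clock period: {before:.1f} ps -> {after:.1f} ps ({before - after:.1f} ps shorter)")
```

The roughly 27% gain corresponds to about 84 ps removed from the cycle time. The individual tuning steps had to recover that budget between them.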
POP Implementation: Sign-Off Checks
[Chart: implementation effort (0 to 100%) at each implementation stage (dev, alpha, beta, EAC) for sleep signal stitching, UPF verification, signal EM, dynamic-power optimization, leakage recovery, hold fixing, CPU vector verification, IR drop, DRC cleanup, and frequency push.]
Physical IP feedback and enhancements across all stages
POP IP: Fast Cache Instances (FCIs)
Using industry-standard instances: bottleneck analysis reveals that memories are on critical paths.
10% increased performance with FCIs: higher performance targets are achieved using Arm FCIs, and many critical paths are resolved with better memory timing arcs.
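The bottleneck analysis described above comes down to asking what fraction of failing timing paths start or end at a memory. The sketch below is minimal and assumes path records have already been extracted from a static timing report. The instance names, memory prefixes and slack values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical instance-name prefixes for memory macros; real designs use their own naming.
MEMORY_PREFIXES = ("u_l1d_ram", "u_l1i_ram", "u_l2_ram")


@dataclass
class TimingPath:
    startpoint: str
    endpoint: str
    slack_ps: float  # negative slack means the path fails timing


def touches_memory(path: TimingPath) -> bool:
    """True if the path starts or ends at a memory instance."""
    return path.startpoint.startswith(MEMORY_PREFIXES) or path.endpoint.startswith(MEMORY_PREFIXES)


def memory_bottleneck(paths, slack_threshold_ps: float = 0.0):
    """Return (#critical paths, #critical paths through memories, memory share)."""
    critical = [p for p in paths if p.slack_ps < slack_threshold_ps]
    via_memory = [p for p in critical if touches_memory(p)]
    share = len(via_memory) / len(critical) if critical else 0.0
    return len(critical), len(via_memory), share


# Hypothetical path records for illustration.
paths = [
    TimingPath("u_l1d_ram/Q[3]", "u_core/alu/reg_7/D", -12.0),
    TimingPath("u_core/dec/reg_1/CK", "u_l2_ram/ADDR[5]", -8.0),
    TimingPath("u_core/alu/reg_2/CK", "u_core/wb/reg_9/D", -3.0),
    TimingPath("u_core/lsu/reg_4/CK", "u_core/lsu/reg_5/D", 15.0),
]
n_crit, n_mem, share = memory_bottleneck(paths)
print(f"{n_mem} of {n_crit} failing paths touch a memory ({share:.0%})")
```

A high memory share suggests that faster memory timing arcs, which is what FCIs provide, will pay off more than further logic optimization.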
Arm is Enabling Partners
Memories are no longer on the critical path.
• Current high-performance computing customer (repeat POP customer): achieves >2.5 GHz with POP
• A server chip reached 3.3 GHz+ with Neoverse N1
Data Consumption is Revolutionizing the Infrastructure to Securely Connect Devices
[Figure: Massive amounts of data from trillions of devices flow over 5G to the edge, where data is filtered and acted on through local decisions; critical data is sent on to cloud data centers to be analyzed and stored.]
A Trillion Securely Connected Devices in 2035
• IoT devices push the limits on energy and power efficiency expectations
• Ultra-low voltage design is key to energy and power efficiency gains for IoT
• Arm seeks to unlock the use of ultra-low voltage design and the proliferation of IoT devices
Very low power, low voltage, very low cost
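The emphasis on ultra-low voltage follows from the standard CMOS switching-power relation, P = αCV²f. At a fixed switched capacitance, energy per operation falls with the square of the supply voltage. The short sketch below uses hypothetical operating points: 0.8 V and 0.5 V are illustrative values, not figures from the slide.

```python
def dynamic_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * V^2 * f, in watts."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def dynamic_energy_ratio(v_new: float, v_ref: float) -> float:
    """Energy per operation at fixed switched capacitance scales with V^2."""
    return (v_new / v_ref) ** 2


# Hypothetical operating points: 0.8 V nominal vs. 0.5 V ultra-low-voltage operation.
ratio = dynamic_energy_ratio(0.5, 0.8)
print(f"switching energy per operation at 0.5 V: {ratio:.1%} of the 0.8 V value")

# Hypothetical block: 10% activity, 1 nF switched capacitance, 0.8 V, 1 GHz.
print(f"{dynamic_power_w(0.1, 1e-9, 0.8, 1e9):.3f} W")
```

The sketch ignores leakage. It also ignores the lower clock frequency a design can reach near threshold, which matters in practice for IoT devices.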
Arm CPU Implementation with Low Power Flow
Synthesis
• Logic depth analysis to ascertain synthesis efficiency
• Optimal cell choice for power and performance efficiency
• Focus on area recovery through repeated netlist optimizations
Cell Selection
• Flow tuning for the use of multi-bit flops to reduce area and dynamic power
• Mixing optimal Vt/CL (threshold voltage / channel length) combinations rather than using all Vt/CL options
• Restricted use of higher drive strengths for timing
• Standard-cell (SC) track selection based on design need
Place and Route
• Floorplan tuning for best-in-class power, performance and area
• Clock tree cell types (INV/BUF) and Vt/CL selected to optimize dynamic power (Pdyn)
• Estimating and fixing crosstalk and congestion from pre-route onward to keep total negative slack (TNS) in check
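Multi-bit flop banking, listed above under cell selection, reduces area and clock-tree load because several register bits share one clock pin. The greedy sketch below is minimal and rests on several assumptions. The flop records and placement coordinates are hypothetical, and so are the 4-bit bank limit and the 5 µm distance limit. Production synthesis and place-and-route tools do this with timing awareness, which this sketch omits.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Flop:
    name: str
    clock: str   # clock net
    enable: str  # clock-gating enable; flops in one bank must share clock and enable
    x: float     # placement coordinates in microns (hypothetical)
    y: float


def bank_flops(flops, max_bits=4, max_dist_um=5.0):
    """Greedily merge compatible single-bit flops into multi-bit banks.

    Flops are grouped by (clock, enable) and sorted by position. A new bank is
    started when the current one is full or the next flop is too far (Manhattan
    distance) from the bank's first flop.
    """
    def compat(f):
        return (f.clock, f.enable)

    banks = []
    for _, group in groupby(sorted(flops, key=compat), key=compat):
        current = []
        for f in sorted(group, key=lambda g: (g.x, g.y)):
            if current:
                anchor = current[0]
                too_far = abs(f.x - anchor.x) + abs(f.y - anchor.y) > max_dist_um
                if len(current) == max_bits or too_far:
                    banks.append(current)
                    current = []
            current.append(f)
        if current:
            banks.append(current)
    return banks


flops = [Flop(f"r{i}", "clk", "en0", float(i), 0.0) for i in range(6)]
flops.append(Flop("s0", "clk", "en1", 0.0, 0.0))
banks = bank_flops(flops)
print([[f.name for f in b] for b in banks])
print(f"clock sinks: {len(flops)} single-bit flops -> {len(banks)} banked cells")
```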
Low-Power Flow (Supports Industry-Standard EDA Tools)
Synthesis: Synopsys Design Compiler® Graphical / Cadence Genus®
• Quantitative library analysis to select the Vt/CL
• Cell profiling
• Timing-driven multi-bit register banking and de-banking
• Physical-aware clock gating
Place and route: Synopsys IC Compiler™ II / Cadence Innovus®
• Low-power placement
• Incremental multi-bit register banking and de-banking
• Selecting the right INV/BUF strategy for building the clock tree
• High-effort leakage flow
Timing sign-off and ECO: Synopsys PrimeTime® ECO / Cadence Tempus®
• Path-based analysis (PBA) driven aggressive leakage recovery with minimal timing impact
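PBA-based leakage recovery swaps cells to higher-threshold variants, which leak less but switch more slowly, wherever path slack allows. The greedy sketch below is simplified. It assumes each path's slack is known and that a swap adds only its own delay to the paths through that cell. The cell data are hypothetical. Real ECO tools re-time the design after swaps rather than relying on this additive model.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    vt: str              # current threshold-voltage flavour, e.g. "LVT" or "HVT"
    leak_nw: float       # leakage at the current Vt (nW, hypothetical)
    hvt_leak_nw: float   # leakage after swapping to HVT
    hvt_extra_ps: float  # added delay after swapping to HVT (assumed > 0)


def recover_leakage(cells, paths, slack_ps, margin_ps=0.0):
    """Greedy Vt swap: move a cell to HVT only if every path through it keeps slack >= margin.

    paths: dict mapping path name -> list of cell names on that path
    slack_ps: dict mapping path name -> current slack in ps (updated in place)
    Returns the total leakage saved in nW.
    """
    through = {c.name: [p for p, members in paths.items() if c.name in members] for c in cells}
    saved = 0.0
    # Visit cells with the best leakage saving per picosecond of added delay first.
    order = sorted(cells, key=lambda c: (c.leak_nw - c.hvt_leak_nw) / c.hvt_extra_ps, reverse=True)
    for c in order:
        if c.vt == "HVT":
            continue
        if all(slack_ps[p] - c.hvt_extra_ps >= margin_ps for p in through[c.name]):
            for p in through[c.name]:
                slack_ps[p] -= c.hvt_extra_ps
            saved += c.leak_nw - c.hvt_leak_nw
            c.vt = "HVT"
    return saved


cells = [
    Cell("u1", "LVT", 10.0, 2.0, 5.0),
    Cell("u2", "LVT", 8.0, 2.0, 4.0),
    Cell("u3", "LVT", 6.0, 1.0, 10.0),
]
paths = {"p1": ["u1", "u2"], "p2": ["u3"]}
slack = {"p1": 6.0, "p2": 3.0}
saved = recover_leakage(cells, paths, slack)
print(f"saved {saved} nW; Vt now {[c.vt for c in cells]}; slack {slack}")
```

Here only u1 is swapped. After that swap, path p1 has 1 ps of slack left, which is too little to absorb u2's extra 4 ps.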
Reducing Power While Maintaining Performance
Reduce total power by ~10% while keeping the same performance:
• 6.5-track cell architecture
• Library analysis for Vt selection
• Leakage power optimization
• Enabling multi-bit flops
• Clock tree cell selection
Power Summary (normalized power ratio; source: Arm internal measurement)
• Dynamic power: baseline 100, final 90
• Leakage power: baseline 100, final 92
• Total power: baseline 100, final 91
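The total bar is a weighted average of the two components. The weights are how baseline power splits between dynamic and leakage, which the slide does not state. The sketch below therefore tries a few assumed splits. A 50/50 split reproduces the reported total of 91.

```python
def normalized_total(dyn: float, leak: float, dyn_share: float) -> float:
    """Total power normalized to baseline = 100.

    dyn, leak: each component normalized to its own baseline (baseline = 100)
    dyn_share: fraction of the baseline total that is dynamic power (assumed)
    """
    if not 0.0 <= dyn_share <= 1.0:
        raise ValueError("dyn_share must be between 0 and 1")
    return dyn_share * dyn + (1.0 - dyn_share) * leak


# Final values from the power summary: dynamic 90, leakage 92.
for share in (0.3, 0.5, 0.7):
    print(f"dynamic share {share:.0%}: total = {normalized_total(90, 92, share):.1f}")
```

Both components improved by similar amounts, so the total stays close to 91 across this range of splits.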
Arm POP IP Enables Success with Arm CPU Cores at 22nm ULP
Significant Performance Improvement and Faster time to market using Arm POP IP
Blog at Arm.com with Novatek quote
https://community.arm.com/soc/b/blog/posts/novatek-advancing-digital-television-with-arm-pop-ip-on-tsmc-22nm-ulp
POP Advantage: Different POPs for Different Criteria
Parameters, without POP vs. with POP:
• Logic: standard platform vs. standard + PPA-tuned custom cells
• Memories: compiler-based vs. FCIs optimized for Fmax, Pmin and minimum area
• Arm tech: standard platform vs. custom vias + custom NDRs
• Implementation: iRM flow released with the CPU RTL vs. POP RFM flow with an optimized floorplan tuned for best-in-class PPA
Performance Optimized POP: 15%↑ Fmax, 10%↓ power, 5%↓ area
Power Optimized POP: 5%↑ frequency, 20%↓ power, ~8%↓ area
Area Optimized POP: 5%↑ frequency, 5%↓ power, 15%↓ area
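One way to read the three variants is to compare simple figures of merit. The sketch below makes an assumption: it treats each percentage as an independent change against the same non-POP baseline. The slide does not say, for example, whether the power figures are measured at equal frequency. The ~8% area value is used as 8%.

```python
# Relative changes versus an implementation without POP, as listed on the slide.
VARIANTS = {
    "Performance Optimized POP": {"freq": 0.15, "power": -0.10, "area": -0.05},
    "Power Optimized POP": {"freq": 0.05, "power": -0.20, "area": -0.08},
    "Area Optimized POP": {"freq": 0.05, "power": -0.05, "area": -0.15},
}


def figures_of_merit(deltas):
    """Frequency, frequency per unit power and per unit area, relative to the non-POP baseline (= 1.0)."""
    freq = 1.0 + deltas["freq"]
    return {
        "freq": freq,
        "freq_per_power": freq / (1.0 + deltas["power"]),
        "freq_per_area": freq / (1.0 + deltas["area"]),
    }


def best_variant(metric):
    """Name of the variant that maximizes the chosen figure of merit."""
    return max(VARIANTS, key=lambda name: figures_of_merit(VARIANTS[name])[metric])


for metric in ("freq", "freq_per_power", "freq_per_area"):
    print(f"{metric}: {best_variant(metric)}")
```

Under this reading, each variant wins on the metric it is named for, which is the point of offering different POPs for different criteria.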
Arm POP Differentiation (Feature: POP / Other solutions)
• Physical IP (fast cache instances and logic) tuned specifically for the Arm CPU implementation: Yes / Limited
• Synergistic development of CPU RTL, flow and POP IP: Yes / No
• Tuned physical IP, floorplan, and reference RTL-to-GDS scripts delivered with the CPU EAC delivery: Yes / No
• Tightly coupled, interactive partner engagement to deliver a tuned CPU implementation recipe (L1, L2, ECC, Crypto and other options) for the partner's product: Yes / Limited
• Customized floorplan for both CPU and top level: Yes / Limited
• Reference RTL-to-GDS scripts for both Cadence and Synopsys EDA flows: Yes / 1 EDA flow only
• Sign-off PVT corners for setup/hold and metal stacks aligned to the user's requirements: Yes / Limited
• Complete IR analysis (static, dynamic and in-rush), signal EM and sign-off DRC checked when quoting PPA: Yes / Limited
• Logic Power Grid Architect (PGA) utility for floorplanning and power grid insertion, running on both Cadence and Synopsys EDA tools: Yes / Not optimized for Arm POP IP; 1 EDA flow
• Time-to-market advantage in achieving POP-published PPA targets: Greatest advantage, 5 to 8 months TTM savings for advanced cores / No
POP IP Engagement Model for CPU Hardening
Milestones: Kick-off meeting → 1st review → F2F meeting → 2nd review → EAC release → Tape-out
Activities:
• Sync-up on RTL configuration and design spec
• Download DEV POP IP set-up and reproduce POP PPA
• Download BET POP IP set-up and reproduce POP PPA
• On-site visit (if aligned): final alignment and concerns
• Additional optimizations and IR/DRC/hold-time clean-up
• Final sign-off, integration, POP consultancy and support
Review and discussion points:
• Review process to download DEV POP
• Establish the project's schedule and final PPA targets
• Final alignment on the POP checklist
• Partner independently integrates BIST/monitors/3rd-party IP, etc., once bring-up is complete
• Address issues in reproducing DEV POP IP PPA
• Updates on POP BET IP PPA
• Partner environment and IP download
• Post-beta POP EAC status
• Discussions around DFT strategy
• Updates on partner queries and concerns
• On-site visit schedule and agenda discussions
• Post-on-site PPA status update
• Strategy for final hand-off
• Preparation for final sign-off
• Ownership of GDS, final verification and sign-off lies with the partner
• Feedback and lessons learned from the engagement
• POP-RFM EAC release complete
POP and PIK Availability and Active Partners
CPU/Tech 5nm 7nm 11nm 12nm 14nm 16nm 22nm 28nm
Cortex-A53 ⚫ ⚫ ⚫ ⚫
Cortex-A57 ⚫ ⚫
Cortex-A72 ⚫
Cortex-A73 ⚫ ⚫ ⚫ ⚫
Cortex-A55 ⚫ ⚫ ⚫ ⚫ ⚫ ⚫ ⚫
Cortex-A75 ⚫ ⚫ ⚫ ⚫ ⚫
Cortex-A76 ⚫ ⚫ ⚫
Cortex-A77 ⚫
Neoverse N1 ⚫ ⚫
Neoverse E1 ⚫
HerculesSL ⚫ ⚫
Zeus ⚫ ⚫
⚫ Available with active customers
⚫ In Development
Processor Implementation Spanning Markets and Technology
PIK (Processor Implementation Kit)
• Available for 22nm and older technologies
• Generic flow tailored towards meeting market requirements in terms of low power and low area
• Provides a 2-4 month advantage in terms of TTM
• No license fee
POP IP
• For latest cores on most advanced process nodes
• Specialized flow tailored more towards highest possible performance for a given core with partner sign-off conditions and configuration
• Provides a 3-6 month advantage in terms of TTM because of higher core complexity
• Extra license fee
• Enablement as part of the “Solution Packages”
Summary
• The Arm ecosystem is driving innovation and choice for infrastructure, from data center to network core to edge to access
• Arm physical IP enables semiconductor solutions from high performance server chips to optimized SoCs, to FPGAs, ASICs, and CoT-based paths to solutions
• Artisan POP IP provides optimal CPU implementation from 5nm to 28nm - predictive PPA, low risk, improved Time To Market (TTM)
• Arm is the industry’s leading supplier of foundation physical IP and processor implementation solutions to address the performance, power and cost requirements for all application markets on advanced FinFET nodes
Thank You / Danke / Merci / 谢谢 / ありがとう / Gracias / Kiitos / 감사합니다 / धन्यवाद / شكرًا / תודה
© 2019 Arm Limited
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks