© 2017 Arm Limited Arm Tech Symposia 2017 DynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for 5G Networks Jeff Maguire | Senior Product Manager Infrastructure IP Product Management | Arm

DynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for …test.armtechforum.com.cn/attached/article/BJ-C4_DynamIQ... · 2019-09-04 · Level 3 cache partition in a networking


© 2017 Arm Limited Arm Tech Symposia 2017

DynamIQ Processor Designs Using Cortex-A75

& Cortex-A55 for 5G Networks

Jeff Maguire | Senior Product Manager

Infrastructure IP Product Management | Arm


Agenda

5G networks

Ecosystem software to support Arm based deployments of network hardware

Arm Cortex-A75 and Cortex-A55

Conclusions


5G Networks


What is fog computing?

CLOUD

FOG COMPUTING

A system-level horizontal architecture that distributes computing, storage, and networking closer to users

AR/VR, Autonomous Vehicles, IoT/AI


5G Dynamics Across the Network: Edge to Cloud

[Diagram: packet flows from Devices (IoT, Automotive, Industrial, Medical) through Access/Edge and Edge Compute to Core / DC; each tier is labelled with a combination of Acceleration (A), Compute (C), and Storage (S)]

5G vs. LTE
• Latency: 30-50x
• Throughput: 100x
• Connections: 100x
• Mobility: 1.5x


Network/Wireless Infrastructure: Three Distinct Segments – Two Distinct Approaches

Segments
• 5G Antenna / Base Station: CPU, Air Interface Processing
• MEC: Server, Network and RAN Offload
• Core Network/Data Centre: Server, Network Offload

Approaches
• Typically standardized platform designs
• Typically customized platform designs


Network/Wireless Infrastructure: Edge Access

Arm: Meet diverse needs with heterogeneous platforms

• Mix of latency, throughput and power

• Supplement sub-systems approach with accelerator offload.

• Sub-system/packet processing platform

• ASIC based designs and dedicated platforms

• Merchant Silicon from multiple suppliers

CCIX and connection to FPGA technology

Edge Network Architecture

5G Air Interface Management
• Antenna, RRH, DFE, Distributed Base Band, Multi-Access Compute Acceleration.
• IoT Gateway and End applications.
• Optimized power, performance, latency

[Diagram: 5G Antenna / Base Station with CPU and Air Interface Processing]

Hardware acceleration for advanced signal processing and terabit throughput.

Customized platform designs

Packet processing / front end / edge design.


5G Network Hierarchy Design

• Multi-Access Edge Compute, Distributed Evolved Packet Core, Distributed Applications Processing, Distributed Content Delivery

• Application of virtualized/containerized workloads

Network/Wireless Infrastructure: Core Network

Leverage energy efficient Arm servers with specifics for network infrastructure

Linaro Server group activity, VNF porting activity, OPNFV actions, AIDC, OSEC

Standardized general purpose compute platforms on Arm

Performance/Density/Scalability

• Support dedicated offload through silicon/software.

• DPDK/ODP support

Control plane functions, content and applications delivery can use general purpose virtualized cloud computing.

[Diagram: Core Network/Data Centre with Server and Network Offload]

Server-like Architecture – more efficient on Arm


Network/Wireless Infrastructure: Mobile Edge Compute

• MEC will be critical in meeting 5G goals

• Low latency, low power required

• E2E Automotive assisted driving, IoT and industrial use cases

• 5G network Slicing dependent

• Designs range from chassis, multi chip to SOC designs

Unique designs tailored to MEC form factor will prevail.

Mix of:
• Core network needs (software driven, compute performance)
• Edge Network needs (Latency, throughput, power optimized)

5G network slicing – power and latency

[Diagram: MEC with Server, Network and RAN Offload]

Workload optimized general purpose compute and accelerators for user plane processing.

Edge Compute Architecture


Linaro Segment Groups - Infrastructure

Networking - LNG
OSS for networking
• Real Time Support
• Virtualization, Core isolation
• OpenDataPlane (ODP)
• OPNFV integration
ODP cross-platform support for SoC accelerators

Enterprise - LEG
OSS for Arm servers
• ATF/UEFI/ACPI
• KVM/Xen
• Armv8 optimization
• OpenJDK, Hadoop, OpenStack, Docker
Reduces fragmentation, cost, accelerates time to market for servers based on Arm


Networking-centric Ecosystem Enabling Deployments

[Diagram: ecosystem stack]
• Arm SoC
• Firmware & Network Boot, ACPI
• Virtualization and HAL
• Operating System and Virtual Infrastructure Managers
• Key Applications, VNFs, and Middleware
• OEMs and ODMs


WorksOnArm, and AIDC

AIDC
• Initiated and hosted by Arm
• Public vehicle to engage infrastructure developer community (H/W & S/W)

WorksOnArm
• Initiated and hosted by Packet.net
• Public vehicle to engage open source developers and share software readiness


Pharos Labs: Fully compliant NFVI software platform

[Diagram labels: POD (3x Control Nodes, 2x Compute Nodes), Build Server, Jump Server, ARM64, x86, VPN, Firewall, Gateway Router, Internet]

• April 2016: 1st ARM-based OPNFV Lab by Enea
• Orange Labs, China Mobile, and others to be announced replicating Arm NFVI platform in their labs

Complete Pharos Pod @ ~1 cu. ft. / 250W
Marvell MACCHIATObin boards (Quad-CA72)

Arm reference platform
Colorado Reference Stack


Cortex-A75 and Cortex-A55


Scalable solutions from edge to core

[Diagram: example SoCs for an access point and for data center compute, built from Cortex-A CPUs, accelerators, system cache, CoreLink CMN-600, DMC-620, NIC-450, IO, 100GbE, and PCIe]

Scalable system configurations with CoreLink CMN-600 (smallest to largest)
• Bandwidth: 20 GB/s to 1 TB/s
• System cache: 0MB to 128MB
• DDR channels: 1 to 8
• Cortex-A CPUs: 4 to 256

DMC-620: dynamic memory controller
CMN: coherent mesh network interconnect


New DynamIQ-based CPUs for new possibilities

Arm Cortex-A75: >50% more performance compared to current devices
Arm Cortex-A55: 2.5x higher power efficiency compared to current devices

Estimated device performance using SPECINT2006; final device results may vary.
Comparison using Cortex-A72 at 2.4GHz vs Cortex-A75 at 3GHz.
Comparison using Cortex-A53 in 28nm devices vs Cortex-A55 in 16nm devices.


New DynamIQ-based CPUs for new possibilities

All comparisons at ISO process and frequency.

Panels: Arm Cortex-A75 (baseline: Cortex-A72) and Arm Cortex-A55 (baseline: Cortex-A53). The extracted text does not show which series belongs to which panel.

First series
• LMBench memcpy: 1.97x
• SPECfp2006: 1.42x
• SPECint2006: 1.21x

Second series
• LMBench memcpy: 1.30x
• SPECfp2006: 1.39x
• SPECint2006: 1.23x
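The ">50% more performance" estimate on the previous slide compares Cortex-A72 at 2.4GHz with Cortex-A75 at 3GHz, while the figures on this slide are at ISO frequency. The sketch below shows one way the two could relate, assuming (as a simplification not stated in the slides) that performance scales linearly with clock frequency. The function name is hypothetical, and both SPECint2006 figures from this slide are tried because the extracted text does not say which one belongs to Cortex-A75.

```python
# Illustrative arithmetic only; the slides do not state how the
# ">50%" estimate was derived. Linear frequency scaling is an assumption.

def combined_speedup(freq_old_ghz, freq_new_ghz, iso_frequency_gain):
    """Scale an ISO-frequency gain by the clock-frequency ratio."""
    return (freq_new_ghz / freq_old_ghz) * iso_frequency_gain

# The two SPECint2006 ISO-frequency figures shown on the slide.
for gain in (1.21, 1.23):
    total = combined_speedup(2.4, 3.0, gain)
    print(f"ISO gain {gain:.2f}x -> combined {total:.2f}x")
```

Under that assumption either figure lands above 1.5x, which is consistent with the ">50%" claim.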


Next-generation features

Dot product and half-precision float for AI/ML processing.

Virtualization Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements.

Cache stashing and atomic operations improve multicore networking performance and reduce latency.

Cache clean to persistence to support storage class memory.

Infrastructure class RAS enhancement including data poisoning and improved error management.
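To make the dot product feature concrete, here is a plain-Python sketch of the arithmetic pattern that 8-bit dot product instructions are generally used to accelerate: four signed 8-bit products summed into one 32-bit accumulator lane. The function names, the four-lane vector shape, and the wrap-around behaviour are illustrative assumptions, not an Arm API.

```python
def dot_accumulate_lane(acc, a_bytes, b_bytes):
    """Add the dot product of four signed 8-bit pairs to a 32-bit accumulator."""
    if len(a_bytes) != 4 or len(b_bytes) != 4:
        raise ValueError("each lane consumes exactly four 8-bit values per input")
    for value in list(a_bytes) + list(b_bytes):
        if not -128 <= value <= 127:
            raise ValueError("inputs must fit in a signed 8-bit integer")
    total = acc + sum(x * y for x, y in zip(a_bytes, b_bytes))
    # Wrap the result to a signed 32-bit value.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


def dot_accumulate(acc_lanes, a, b):
    """Apply the lane operation across a vector (lane i uses elements 4i..4i+3)."""
    return [
        dot_accumulate_lane(acc, a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
        for i, acc in enumerate(acc_lanes)
    ]


if __name__ == "__main__":
    print(dot_accumulate([0, 0, 0, 0], list(range(16)), [1] * 16))
```

Quantized neural-network layers reduce largely to this multiply-accumulate pattern, which is why the feature is listed for AI/ML processing.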


DynamIQ cluster

[Diagram: DynamIQ cluster with Core 0 through Core 7, each with a private L2, connected through asynchronous bridges to the DynamIQ Shared Unit (DSU), which contains the snoop filter, L3 cache, power management, bus I/F, and ACP and peripheral port I/F]

DynamIQ Shared Unit (DSU)

Streamlines traffic across bridges

Advanced power management features

Improved performance with large private cache

Support for multiple performance domains

Connection to high-performance interconnect, with stashing support

Supports large amounts of partitionable local memory

Low latency interfaces for closely coupled accelerators

Any mix of DynamIQ cores, up to four premium


Level 3 cache partition in a networking application

Infrastructure
• Process 1 = data plane
• Process 2 = control plane
• Packet processing data sent through low latency ACP interface

Example configuration with two Core groups in a DynamIQ cluster

[Diagram: Core0 through Core7 split into Core group 1 (Process 1) and Core group 2 (Process 2); the L3 cache is divided into partitions labelled Group 1, Group 2, and "Reserved for external accelerators via ACP"; sensors or I/O agents connect through the ACP]
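The partitioning idea on this slide can be sketched as a small data model. This is a toy illustration that assumes the L3 is divided by cache ways; the way count, the share sizes, and all names below are hypothetical and are not taken from the slide or from any DSU register definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    owner: str  # who may allocate into this slice of the L3
    ways: int   # number of cache ways in the slice (hypothetical)


def build_l3_layout(total_ways, shares):
    """Split total_ways among owners; shares maps owner name -> way count."""
    if any(ways <= 0 for ways in shares.values()):
        raise ValueError("every partition needs at least one way")
    if sum(shares.values()) != total_ways:
        raise ValueError("shares must add up to the total number of ways")
    return [Partition(owner, ways) for owner, ways in shares.items()]


def ways_for(layout, owner):
    """Return how many ways a given owner may allocate into."""
    return sum(p.ways for p in layout if p.owner == owner)


if __name__ == "__main__":
    # Hypothetical split mirroring the slide: data plane, control plane,
    # and a slice reserved for accelerators attached through the ACP.
    layout = build_l3_layout(16, {
        "group1_data_plane": 8,
        "group2_control_plane": 4,
        "acp_accelerators": 4,
    })
    for part in layout:
        print(f"{part.owner}: {part.ways} ways")
```

Keeping the data plane and control plane in separate slices means a burst of packet traffic cannot evict control-plane working data, which is the motivation the slide implies.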


Increasing performance through cache stashing

Enables reads/writes into the shared L3 cache or per-core L2 cache.

Allows closely coupled accelerators and I/O agents to gain access to core memory.

AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing.

More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases.
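A toy cost model can show why stashing helps: data placed in a cache near the consuming core is cheaper to read than data fetched from DRAM. The cost values below are hypothetical placeholders in arbitrary units, not measurements of any Arm product, and the function name and parameters are assumptions made for illustration.

```python
# Hypothetical relative access costs (arbitrary units); placeholders only.
ACCESS_COST = {"L2": 1.0, "L3": 3.0, "DRAM": 20.0}


def average_read_cost(stash_target=None, still_cached=1.0):
    """Average cost for a core to read data delivered by an I/O agent.

    stash_target: None when the agent writes to DRAM only, else "L2" or "L3".
    still_cached: fraction of stashed lines still present in the target
    cache when the core reads them (the rest fall back to DRAM).
    """
    if stash_target is None:
        return ACCESS_COST["DRAM"]
    if not 0.0 <= still_cached <= 1.0:
        raise ValueError("still_cached must be between 0 and 1")
    hit_cost = ACCESS_COST[stash_target]
    return still_cached * hit_cost + (1.0 - still_cached) * ACCESS_COST["DRAM"]


if __name__ == "__main__":
    for target in (None, "L3", "L2"):
        print(target, average_read_cost(target, still_cached=0.9))
```

The `still_cached` parameter captures the main caveat: stashing only pays off if the core consumes the data before it is evicted.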

Stash critical data to any cache level

[Diagram: an accelerator or I/O agent attached to CoreLink CMN-600, with Agile System Cache, DMC-620 controllers, and DDR4; Cortex-A clusters with per-CPU L2 caches and a shared L3 cache. Inset: DynamIQ cluster of cores 0–7, each with L1 and L2 caches, connected through asynchronous bridges to the DynamIQ Shared Unit (DSU) with snoop filter, L3 cache, power management, bus I/F, and ACP and peripheral port I/F]


Networking: Compute and packet-processing solution

Build the right mix of compute

• High-performance Cortex-A75

• High-efficiency throughput Cortex-A55

• DSP, accelerators

Scale CoreLink CMN-600 for application

• Access points: 2~5W

• Macro BTS: 20~30W

• Telecom server: 60~100W

[Diagram: CoreLink CMN-600 as a scalable coherent SoC memory system with partitionable affinity, connecting:
• Control plane processing (Cortex-A75 with Level-3 Cache): performance compute, timing critical processing
• Data plane processing (Cortex-A55 with Level-3 Cache): throughput compute, optional local accelerators
• DSP / accelerators, Agile System Cache, DMC-620, PCIe, Enet, and multi-chiplet / multi-socket links]


Data center server blade: Maximize compute density

Cortex-A75 can deliver 1.4x to 2.9x higher performance.

• >1.4x performance with 32 core Cortex-A75 + CMN-600 vs. 32 core Cortex-A72 + CCN-512

• 2.9x performance with 64 core Cortex-A75 +CMN-600

• Higher rate performance than competitive architectures per socket

• 1-1.5W/core enables sustained operation at maximum performance

Power: 60W compute.
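The per-core and total power figures above can be related with simple arithmetic. The sketch below is illustrative only: the slide does not say which core count the 60W figure refers to, and the helper names are hypothetical.

```python
def core_power_range(cores, watts_per_core=(1.0, 1.5)):
    """Return (low, high) total core power in watts for a given core count."""
    low, high = watts_per_core
    return cores * low, cores * high


def cores_within_budget(budget_watts, watts_per_core):
    """Return how many whole cores fit in a power budget at a per-core figure."""
    return int(budget_watts // watts_per_core)


if __name__ == "__main__":
    for cores in (32, 64):
        low, high = core_power_range(cores)
        print(f"{cores} cores at 1-1.5W/core: {low:.0f}W to {high:.0f}W")
    for per_core in (1.0, 1.5):
        fit = cores_within_budget(60, per_core)
        print(f"60W at {per_core}W/core: up to {fit} cores")
```

At 1 to 1.5W per core, a 60W compute budget corresponds to roughly 40 to 60 cores, so the 32-core configuration fits within it under these figures.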

[Diagram: server SoC floorplan with 16 cache blocks, eight DMC-600 memory controllers each paired with DDR4-3200, and 100GbE, CML, and PCIe interfaces]


Enabling 5G transformation

Transition to 5G networks and new use cases is transforming network architectures.

• Requires efficient compute closer to the edge

• Mobile Edge Compute (MEC) nodes combine server-like design (software driven, high aggregate performance) with edge network needs (latency, throughput, power optimized)

Software investments from Arm and ecosystem partners to enable real 5G deployment.

Cortex-A75 & Cortex-A55 together with CMN and other Arm IP enable scalable design.

• Address the compute requirements across the network in a range of power budgets


Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks