DynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for 5G Networks

David Koenen, Sr. Product Manager, Arm

Arm Tech Symposia 2017, Taipei

© 2017 Arm Limited




Agenda

5G networks

Ecosystem software to support Arm based deployments of network hardware

Arm Cortex-A75 and Cortex-A55

Conclusions


5G Networks


5G Dynamics Across the Network: Edge to Cloud

[Figure: packet flows from devices (IoT, Automotive, Industrial, Medical) through Access/Edge and Edge Compute to Core / DC. Each stage combines acceleration, compute, and storage.]

5G vs. LTE

• Latency: 30-50x

• Throughput: 100x

• Connections: 100x

• Mobility: 1.5x


4G logical RAN architecture

[Figure: 4G logical RAN architecture. Each eNodeB (eNB) pairs Remote Units (RU: radio) with a Baseband Unit (BBU: L1 / Phy, MAC, RLC, PDCP, management and control) over CPRI. The eNBs connect to the Evolved Packet Core (EPC), which contains the MME, SGW, PGW, HSS, and PCRF; interface labels shown are S11, S5, S6a, and Gx.]

RU : Radio Unit

BBU : Baseband Unit

EPC : Evolved Packet Core

Source : Ericsson review, 3GPP TR 38.801


5G logical RAN architecture

[Figure: 5G logical RAN architecture. RUs connect to Distributed Units (DU: L1 / Phy, MAC, RLC, management). The DUs connect to a Central Unit (CU: PDCP, switch, management), and the CU connects to the EPC/NGC. DU and CU together form the gNB.]

RU : Radio Unit

DU : Distributed Unit

CU : Central Unit

EPC : Evolved Packet Core

NGC : Next Generation Core

Source : Ericsson review, 3GPP TR 38.801


Network/Wireless Infrastructure: Three Distinct Segments, Two Distinct Approaches

[Figure: three segments: 5G Antenna, Mobile Edge Compute, and Core Network/Data Centre. Platform blocks shown: base station (CPU with air interface processing), server with network and RAN offload, and server with network offload. Approach labels: "Typically specialized platform designs" and "Typically standardized platform designs".]

Edge Network Architecture: 5G Air Interface Management

• Antenna, RRH, DFE, Distributed Base Band, Multi-Access Compute Acceleration.

• IoT Gateway and End applications.

• Optimized power, performance, scalability, latency

Mobile Edge Compute Architecture

• Performance critical functions to the edge

• Application of virtualized / containerized workloads

Server-like Architecture (more efficient on Arm): 5G Network Hierarchy Design

• Multi-Access Edge Compute, Distributed Evolved Packet Core, Distributed Applications Processing, Distributed Content Delivery.

• Application of virtualized / containerized workloads.


Networking-centric Ecosystem Enabling Deployments

• Arm SoC

• Firmware & Network Boot ACPI

• Virtualization and HAL

• Operating System and Virtual Infrastructure Managers

• Key Applications, VNFs, and Middleware

• OEMs and ODMs


Linaro Segment Groups - Infrastructure

Networking - LNG

OSS for networking

• Real Time Support

• Virtualization, Core isolation

• OpenDataPlane (ODP)

• OPNFV integration

ODP cross-platform support for SoC accelerators

Enterprise - LEG

OSS for Arm servers

• ATF/UEFI/ACPI

• KVM/Xen

• Armv8 optimization

• OpenJDK, Hadoop, OpenStack, Docker

Reduces fragmentation, cost, accelerates time to market for servers based on Arm


WorksOnArm and AIDC

AIDC

• Initiated and hosted by Arm

• Public vehicle to engage infrastructure developer community (H/W & S/W)

WorksOnArm

• Initiated and hosted by Packet.net

• Public vehicle to engage open source developers and share software readiness


Demonstrating Scalability: the PicoPod

The World’s Tiniest Pharos Pod: The NFV PicoPod

Fully Functional OPNFV Pharos Pod @ ~1 cu. ft.

1 Jump Server, 3 Controllers & 2 Compute

12 port Switch, Power Supply, Fans

Marvell MACCHIATObin boards w/ Marvell Armada 8040 SoC (Quad Arm Cortex-A72)

Dual 10G NICs, 16Gb RAM, PCIe, SATA & more

Runs Danube release NFVi – sample VNFs


Cortex-A75 and Cortex-A55


Networking requires diverse platforms

[Figure: example platform mixes built from GPP, DSP, NPU, PLD, accelerator, memory, networking, storage, and acceleration blocks. Platforms shown: wireless access point / vCPE, switch / router blade, base station / modem, scale out / server (a 4 x 4 grid of GPPs), edge content caching & optimization, vCPE, and NFV appliance.]


Scalable solutions from edge to core

[Figure: two system configurations built around CoreLink CMN-600, an access point and data center compute. Blocks shown: Cortex-A CPUs, accelerator, system cache, I/O, NIC-450, PCIe, 100GbE, and DMC-620 memory controllers; the data center configuration shows a 4 x 4 grid of CPUs.]

Scalable system configurations (access point / data center compute)

• Cortex-A CPUs: 4 / 256

• DDR channels: 1 / 8

• System cache: 0 MB / 128 MB

• CoreLink CMN-600 bandwidth: 20 GB/s / 1 TB/s

DMC-620 : dynamic memory controller

CMN : mesh network interconnect


New DynamIQ-based CPUs for new possibilities

All comparisons at ISO process and frequency.

[Chart: two panels of relative performance, Arm Cortex-A75 (baseline: Cortex-A72) and Arm Cortex-A55 (baseline: Cortex-A53). The two groups of values below are listed in the order they appear in the extracted slide.]

• LMBench memcpy 1.97x, SPECfp2006 1.42x, SPECint2006 1.21x

• LMBench memcpy 1.30x, SPECfp2006 1.39x, SPECint2006 1.23x


Next-generation features

Dot product and half-precision float for AI/ML processing.

Virtualization Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements.

Cache stashing and atomic operations improve multicore networking performance and reduce latency.

Cache clean to persistence to support storage class memory.

Infrastructure-class RAS enhancements, including data poisoning and improved error management.
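
As an illustration of the dot-product feature listed above, the sketch below emulates in Python the kind of operation an 8-bit dot-product instruction performs: four signed 8-bit products are summed into each 32-bit accumulator lane. The function names (`to_int8`, `sdot_lane`, `sdot`) are illustrative only and are not an Arm API; the four-lane, 16-element layout is an assumption matching a 128-bit vector register.

```python
def to_int8(x):
    """Wrap an integer into the signed 8-bit range, as a register lane would."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def sdot_lane(acc, a4, b4):
    """One 32-bit lane: acc plus four signed 8-bit products, wrapped to int32."""
    assert len(a4) == len(b4) == 4
    total = acc + sum(to_int8(a) * to_int8(b) for a, b in zip(a4, b4))
    total &= 0xFFFFFFFF
    return total - 0x1_0000_0000 if total & 0x8000_0000 else total


def sdot(acc, a, b):
    """Four lanes: 16 int8 inputs per operand, four int32 accumulators."""
    assert len(acc) == 4 and len(a) == len(b) == 16
    return [sdot_lane(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
            for i in range(4)]


# Each lane sums its own group of four products.
lanes = sdot([0, 0, 0, 0], list(range(16)), [1] * 16)
```

Accumulating in 32 bits is what lets quantized AI/ML kernels sum many 8-bit products without overflow between steps.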


DynamIQ cluster

[Figure: DynamIQ cluster with Core 0 to Core 7 connected through asynchronous bridges to the DynamIQ Shared Unit (DSU), which contains the snoop filter, L3 cache, power management, bus interface, and ACP and peripheral port interfaces.]

DynamIQ Shared Unit (DSU)

Streamlines traffic across bridges

Advanced power management features

Latency and bandwidth optimizations

Support for multiple performance domains

Scalable interfaces for edge to cloud applications

Supports large amounts of local memory

Low latency interfaces for closely coupled accelerators


Level 3 cache partition in a networking application

Infrastructure

Example configuration with two core groups in a DynamIQ cluster

• Process 1 = data plane

• Process 2 = control plane

• Packet processing data sent through low latency ACP interface

[Figure: Core0 to Core7 are divided into core group 1 (Process 1) and core group 2 (Process 2). The L3 cache is partitioned into a Group 1 region, a Group 2 region, and a region reserved for external accelerators (sensors or I/O agents) via ACP.]
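
The partitioning idea on this slide can be sketched as a small ownership model: the L3 cache is divided into equal partitions, and each partition is given to exactly one owner (a core group or the ACP-attached agents). This is only a conceptual model, not DSU register programming; the class name, the owner names, and the choice of four partitions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class L3PartitionPlan:
    num_partitions: int = 4                      # assumed granularity, illustrative only
    owners: dict = field(default_factory=dict)   # partition index -> owner name

    def assign(self, owner, partitions):
        """Give `owner` exclusive use of the listed partitions."""
        for p in partitions:
            if not 0 <= p < self.num_partitions:
                raise ValueError(f"partition {p} out of range")
            if p in self.owners:
                raise ValueError(f"partition {p} already owned by {self.owners[p]}")
        for p in partitions:
            self.owners[p] = owner

    def share(self, owner):
        """Fraction of the L3 privately held by `owner`."""
        held = sum(1 for o in self.owners.values() if o == owner)
        return held / self.num_partitions


plan = L3PartitionPlan()
plan.assign("group1_data_plane", [0, 1])      # Process 1 on core group 1
plan.assign("group2_control_plane", [2])      # Process 2 on core group 2
plan.assign("acp_accelerators", [3])          # reserved for traffic arriving via ACP
```

Keeping ownership exclusive is the point of the exercise: data-plane traffic cannot evict control-plane lines, and stashed accelerator data has its own region.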


Increasing performance through cache stashing

Enables reads/writes into the shared L3 cache or per-core L2 cache.

Allows closely coupled accelerators and I/O agents to gain access to core memory.

AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing.

More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases.

[Figure: Stash critical data to any cache level. An accelerator or I/O agent attached to CoreLink CMN-600 (Agile System Cache, DMC-620, DDR4) can place data in the L3 cache of a Cortex-A cluster or in the L2 cache of an individual CPU. Inset: DynamIQ cluster with Core0 to Core7, each with L1 and L2 cache, connected through asynchronous bridges to the DynamIQ Shared Unit (DSU) with snoop filter, L3 cache, power management, bus interface, and ACP and peripheral port interfaces.]
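
A toy model can show why stashing helps. In the sketch below, a device delivers packet buffers and the CPU then reads each one once; with stashing the device places the line in the cache first, so the read hits. The model has no capacity limit, eviction, or timing, and every name in it (`ToyCache`, `receive_packets`) is hypothetical.

```python
class ToyCache:
    """A set of resident line addresses; capacity and eviction are not modelled."""

    def __init__(self):
        self.lines = set()
        self.hits = 0
        self.misses = 0

    def stash(self, addr):
        """An I/O agent places a line directly in the cache."""
        self.lines.add(addr)

    def cpu_read(self, addr):
        """Count a hit if the line is resident, otherwise a miss plus a fill."""
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines.add(addr)   # line is fetched from memory and installed


def receive_packets(cache, addrs, stashing):
    """The device writes each packet buffer, then the CPU reads it once."""
    for addr in addrs:
        if stashing:
            cache.stash(addr)      # data lands in cache ahead of the read
        # without stashing, the data is only in DRAM at this point
        cache.cpu_read(addr)
    return cache.hits, cache.misses


with_stash = receive_packets(ToyCache(), range(8), stashing=True)
without_stash = receive_packets(ToyCache(), range(8), stashing=False)
```

In this simplified model every first touch is a miss without stashing and a hit with it, which is the effect the slide describes for packet-processing data.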


Networking: Compute and packet-processing solution

Build the right mix of compute

• High-performance Cortex-A75

• High-efficiency throughput Cortex-A55

• DSP, accelerators

Scale CoreLink CMN-600 for application

• Access points: 2~5W

• Macro BTS: 20~30W

• Telecom server: 60~100W

[Figure: a Cortex-A75 cluster and a Cortex-A55 cluster, each with a Level-3 cache, plus a DSP, a local accelerator, an accelerator, Ethernet, and I/O, connected through CoreLink CMN-600 with Agile System Cache to two DMC-620 memory controllers.]

Control plane processing

• Performance compute

• Timing critical processing

Data plane processing

• Throughput compute

• Optional local accelerators

Scalable coherent SoC memory system


Data center server blade: Maximize compute density

Cortex-A75 can deliver 1.4x to 2.9x higher performance.

• >1.4x performance with 32-core Cortex-A75 + CMN-600 vs. 32-core Cortex-A72 + CCN-512

• 2.9x performance with 64-core Cortex-A75 + CMN-600

• Higher rate performance than competitive architectures per socket

• 1-1.5W/core enables sustained operation at maximum performance

Power: 60W compute.
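
To relate the 1-1.5 W/core figure above to the power classes on the earlier networking slide, a back-of-envelope calculation is enough. The sketch assumes the entire budget goes to CPU cores, which ignores interconnect, memory, and I/O power, and it applies the server-blade W/core range to every class, which the slides do not state. The names are illustrative.

```python
import math

W_PER_CORE = (1.0, 1.5)   # from the slide: 1-1.5 W per core

POWER_CLASSES_W = {       # budgets from the networking slide
    "access point": (2, 5),
    "macro BTS": (20, 30),
    "telecom server": (60, 100),
}


def core_count_range(budget_w):
    """(fewest, most) cores if the whole budget went to CPU cores."""
    low_w, high_w = budget_w
    fewest = math.floor(low_w / W_PER_CORE[1])   # small budget, 1.5 W cores
    most = math.floor(high_w / W_PER_CORE[0])    # large budget, 1.0 W cores
    return fewest, most


ranges = {name: core_count_range(b) for name, b in POWER_CLASSES_W.items()}
```

Under these assumptions the ranges are upper bounds; a real design would fit fewer cores once the rest of the SoC is accounted for.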

[Figure: server blade floorplan with a 4 x 4 grid of cache blocks, eight DMC-600 memory controllers each attached to DDR4-3200, and 100GbE, PCIe, and CML ports.]
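
The figure shows eight DDR4-3200 channels. A rough theoretical peak for that memory system can be worked out as transfers per second times bytes per transfer times channels. The 8-byte (64-bit) data width per channel is an assumption based on a standard DDR4 channel, not something the slide states, and sustained bandwidth will be lower than this peak.

```python
def ddr_peak_gb_per_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak in GB/s: transfers/s x bytes per transfer x channels."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


one_channel = ddr_peak_gb_per_s(3200, channels=1)     # single DDR4-3200 channel
eight_channels = ddr_peak_gb_per_s(3200, channels=8)  # the server blade in the figure
```

This works out to about 25.6 GB/s per channel and about 204.8 GB/s for eight channels.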


Enabling 5G transformation

Transition to 5G networks and new use cases is transforming network architectures.

• Requires efficient compute closer to the edge

• Mobile Edge Compute (MEC) nodes combine server-like design (software driven, high aggregate performance) with edge network needs (latency, throughput, power optimized)

Software investments from Arm and ecosystem partners to enable real 5G deployment.

Cortex-A75 & Cortex-A55 together with CMN and other Arm IP enable scalable design.

• Address the compute requirements across the network in a range of power budgets

For more information go to:

https://developer.arm.com


Thank You! Danke! Merci! 謝謝! ありがとう! Gracias! Kiitos! 감사합니다 धन्यवाद


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks