© 2017 Arm Limited Arm Tech Symposia 2017
CCIX: a new coherent multichip interconnect for
accelerated use cases
Akira Shimizu | Senior Manager, Operator relations | Arm
Interconnects at different scales
SoC interconnect.
• Connectivity for on-chip processor, accelerator, IO and memory elements
Server node interconnect - ‘scale-up’.
• Simple multichip interconnect (typically PCIe) topology on a PCB motherboard with simple switches and expansion connectors
Rack interconnect - ‘scale-out’.
• Scale-out capabilities with complex topologies connecting 1000’s of server nodes and storage elements
Key drivers for interconnect technology
• Decline of Moore’s law forcing more heterogeneous compute.
• Big data analytics growing at 11.7% CAGR.
• 5G wireless applications requiring 10x more bandwidth, 10x lower latency by 2021.
• Increase in distributed data forcing more network intelligence at faster data rates (10GbE -> 100GbE -> 400GbE).
• Data bandwidth and sharing growth projected at 10x-50x increase vs. present PCIe by 2021.
CCIX™ cache coherent interconnect for accelerators
New class of interconnect for accelerated applications.
Mission of the CCIX Consortium is to develop and promote adoption of an industry standard specification to enable coherent interconnect technologies between general-purpose processors and acceleration devices for efficient heterogeneous computing.
https://www.ccixconsortium.com/
CCIX Consortium Inc
Formed January 2016, incorporated in February 2017.
Complete ecosystem with 34 members and growing.
Hardware specification available for design starts for member companies.
CCIX pronounced: (c’ siks).
Membership tiers: Promoters, Contributors, Adopters. Members include Arteris, Inc.; Guizhou Huaxintong Semiconductor Technology Co., Ltd.; INVECAS INC; Netronome; Phytium Technology Co., Ltd.; PLDA; Shanghai Zhaoxin Semiconductor Co., Ltd.; Silicon Laboratories Inc.; SmartDV Technologies India Private Ltd.
Applications benefiting from CCIX
4G and 5G base stations
Data center search
Embedded computing
High-performance (super)computing
In-memory database processing
Intelligent network acceleration
Machine / deep learning
Mobile edge computing
Video analytics
CCIX multichip connectivity
High performance, low latency.
• CCIX defines 25GT/s (3x performance*)
• Examining 56GT/s (7x performance*) and beyond
• Enabling low latency via light transaction layer
Flexible, scalable interconnect topologies.
• Flexible point-to-point, daisy chained and switched topologies
Seamless integration.
• Runs on existing PCIe transport layer and management stack
• Supports all major instruction set architectures (ISA)
[Diagram: processor, accelerator, smart network and persistent memory connected through a CCIX switch.]
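As a rough sanity check of the “3x performance” claim above, the sketch below compares per-direction bandwidth of a 16-lane link at 8GT/s (PCIe Gen3) and 25GT/s. The 128b/130b encoding is an assumption carried over from PCIe Gen3/Gen4, not something this slide states.

```python
# Rough per-direction bandwidth of an n-lane serial link.
# 128b/130b encoding is an assumption (PCIe Gen3/Gen4 style),
# not a figure taken from the CCIX specification.
def link_bandwidth_gbytes(gt_per_s, lanes, encoding=128 / 130):
    bits_per_s = gt_per_s * 1e9 * lanes * encoding
    return bits_per_s / 8 / 1e9  # GB/s per direction

pcie_gen3 = link_bandwidth_gbytes(8, 16)   # ~15.8 GB/s
ccix_25gt = link_bandwidth_gbytes(25, 16)  # ~49.2 GB/s
print(ccix_25gt / pcie_gen3)               # ~3.1x, matching the "3x" claim
```

The encoding factor cancels in the ratio, so the speed-up is simply 25/8 ≈ 3.1x regardless of the encoding assumed.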
System topology examples
[Diagrams: topologies built from processors, accelerators, memory and a switch joined by CCIX links, including hybrid configurations that also use PCIe.]
Direct attached, daisy chain, mesh and switched topologies.
DMA Engines: The problem with traditional accelerators
Operating system vendors are interested in the opportunity for workload-optimized accelerators.
The traditional DMA approach is to provide a special (Linux) kernel driver for every unique accelerator.
• Requires skilled kernel developers (a driver for each accelerator); the failure mode is catastrophic (system crash/downtime)
The operating systems used tomorrow have already been deployed; updates are 9-12 months apart.
• Drivers must be in “upstream” Linux before we support them, a year+ turnaround for every accelerator
“Trilby”: a DMA-engine-driven, FPGA-based workload accelerator built by Jon Masters for research into the barriers to adoption in the enterprise; it uses the traditional approach of a kernel driver and operating system hacks.
Coherent virtual memory eliminates data transfer overhead
Non-coherent system without Shared Virtual Memory (SVM): software must manage cache maintenance and data copying (“clean and copy”) between processor and accelerator.
Cache coherent system with Shared Virtual Memory (SVM): hardware-managed cache maintenance, with a shared address space and direct memory access between processor and accelerator.
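The contrast above can be sketched as a toy model. The class and method names below are illustrative only, not a real CCIX or accelerator API:

```python
# Toy model of the two flows on this slide; names are illustrative.

class NonCoherentAccelerator:
    """Without SVM, software flushes caches and copies data
    to device-local memory, then copies results back."""
    def offload(self, host_buffer):
        local_copy = list(host_buffer)          # clean + DMA copy in
        local_copy = [x * 2 for x in local_copy]  # device computes
        return list(local_copy)                 # copy results back out

class CoherentSVMAccelerator:
    """With SVM, the device works on the host buffer in place;
    hardware keeps the caches coherent, so no copies are needed."""
    def offload(self, host_buffer):
        for i, x in enumerate(host_buffer):     # same virtual address
            host_buffer[i] = x * 2
        return host_buffer

data = [1, 2, 3]
print(CoherentSVMAccelerator().offload(data))  # [2, 4, 6]
```

Both paths produce the same result; the difference is that the SVM path eliminates the explicit copies and software cache maintenance.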
CCIX acceleration functions that just work in the cloud
[Diagram: a physical machine (processors, DRAM, caches, MMU, IOMMU, other resources and SoC devices) plus other external devices (disks, NICs, FPGAs, GPUs, crypto and other accelerators) hosting a Virtual Machine Monitor (VMM)/hypervisor, guest OSes, virtual machines, containers, apps, JVMs and VNFs across hyper-privileged, privileged and non-privileged levels, with firmware and option ROMs (optional, system-dependent); CCIX functions are exposed at each level.]
CCIX layered architecture
Protocol Layer – coherency protocol, memory read & write flows
Link Layer – formats CCIX messages for the target transport
Transaction Layer – adds optimized packets, manages credit-based flow control
Physical Layer – dual-mode PHY to support extended data rates
[Diagram: the CCIX Protocol Layer and CCIX Link Layer feed a CCIX Transaction Layer that sits alongside the PCIe Transaction Layer; CCIX messages and PCIe packets share the PCIe Data Link Layer and a common CCIX/PCIe Physical Layer (Tx/Rx).]
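The dual transaction layers over a shared link and PHY can be modeled as a toy pipeline. Everything below is illustrative only, not the CCIX specification:

```python
# Toy sketch: CCIX and PCIe transaction layers side by side,
# sharing one data link layer and PHY. Illustrative names only.

def ccix_transaction_layer(msg):
    return {"kind": "ccix", "body": msg}     # optimized CCIX packet

def pcie_transaction_layer(tlp):
    return {"kind": "pcie", "body": tlp}     # standard PCIe TLP

def data_link_layer(packet):
    packet["seq"] = 1                        # sequencing, CRC, retries...
    return packet

def phy_transmit(packet):
    return packet                            # serialize onto the wire

def send(payload, is_ccix):
    tl = ccix_transaction_layer if is_ccix else pcie_transaction_layer
    return phy_transmit(data_link_layer(tl(payload)))

print(send("snoop", True)["kind"])      # ccix
print(send("mem-read", False)["kind"])  # pcie
```

The point of the layering is visible in `send`: only the transaction-layer choice differs, so CCIX coherency traffic reuses the existing PCIe link and physical infrastructure unchanged.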
CCIX example request to home data flows
[Diagrams: four request-to-home flow examples built from requesting caches (ReqCache), home agents (Home), link agents (LA) and memory — accelerator shares processor memory; daisy chain to shared processor memory; shared processor and accelerator memory; shared memory with aggregation.]
Improved efficiency with CCIX transaction layer
• Reduced latency with a lightweight transaction layer
• Improved packet efficiency with an optimized CCIX header
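To see why a smaller header improves packet efficiency, consider the simple ratio below. The byte counts are placeholders chosen for illustration, not values from the CCIX specification:

```python
# Payload efficiency = payload / (payload + header).
# Header sizes here are illustrative placeholders, not CCIX spec values.
def efficiency(payload_bytes, header_bytes):
    return payload_bytes / (payload_bytes + header_bytes)

payload = 64                    # one 64-byte cache line
print(efficiency(payload, 24))  # heavier header: ~0.73
print(efficiency(payload, 12))  # optimized header: ~0.84
```

For cache-line-sized transfers the header is a large fraction of each packet, so shrinking it yields a direct gain in usable bandwidth.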
CCIX port aggregation to boost bandwidth and transactions
CCIX defines a hashing function to steer requests across multiple links.
Aggregation effectively multiplies the bandwidth.
Aggregation could also be used to increase the number of transactions (e.g., 50GT/s vs 25GT/s).
PCIe requires separate address spaces; requests cannot be hashed.
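Hash-based steering can be sketched as below. The hash here is a simple cache-line-granule modulo chosen for illustration; it is not the hashing function the CCIX specification defines:

```python
# Sketch of address-hash link steering for port aggregation.
# A cache-line modulo stands in for the CCIX-defined hash function.

LINE = 64  # bytes per cache line

def steer(address, num_links):
    """Pick which aggregated link carries the request for this address."""
    return (address // LINE) % num_links

# Consecutive cache lines alternate across two aggregated links,
# roughly doubling usable bandwidth for streaming accesses.
lines = [steer(a, 2) for a in range(0, 4 * LINE, LINE)]
print(lines)  # [0, 1, 0, 1]
```

Because every link can reach the same single address space, the hash can spread one request stream across all links, which is exactly what the separate address spaces of plain PCIe aggregation prevent.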
[Diagrams: CCIX with port aggregation — one requesting cache hashed across two links to a home agent and a single memory — versus PCIe with aggregation, where processor and accelerator caches must split memory (Mem0/Mem1) into separate address spaces behind separate PCIe links.]
CCIX SoC integration example
[Diagram: example CoreLink CMN-600 mesh design — crosspoints (XP) linking four DMC-620 memory controllers and CML/RN-I nodes, with CXS and AXI interfaces to third-party PCIe/CCIX IP; the PCIe and CCIX Transaction Layers share the Data Link Layer and a 16-lane PHY running at up to 25Gbps.]
Arm CCIX demonstration vehicle
Arm’s DynamIQ and CoreLink CMN-600 technology.
Cadence CCIX and PCIe controller and PHY IP.
TSMC 7nm process technology.
CCIX connectivity to Xilinx’s Virtex UltraScale+ FPGA.
Xilinx, Arm, Cadence, and TSMC Announce World's First CCIX Silicon Demonstration Vehicle in 7nm Process Technology
Scale-up server node performance with CCIX
CCIX is a class of interconnect providing high performance and low latency for new accelerator use cases.
Easy adoption and simplified development by leveraging today’s data center infrastructure.
IP available from Arm and ecosystem to optimize CCIX SoC today.
• Server, FPGAs, GPUs, network/storage adapters, intelligent networks and custom ASICs
For more information go to:
https://developer.arm.com
Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
© Arm 2017
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks