Low Latency Analytics and Edge Connected Vehicles with RapidIO Interconnect
Devashish Paul, Director Strategic Marketing
June 2016 © Integrated Device Technology
IDT Company Overview
Founded 1980
Workforce Approximately 1,800 employees
Headquarters San Jose, California
Mixed-signal application-specific solutions
#1 Serial Switching – 100% 4G Infrastructure with RapidIO
#1 Memory Interface – Industry Leader DDR4
#1 Silicon Timing Devices – Broadest Portfolio
800+ Issued and Pending Patents Worldwide
Low Latency Analytics and Edge Connected Vehicles with RapidIO
Agenda
Computing Trends/Distributed Computing
RapidIO 20-50 Gbps Technology
100 ns Low Latency Analytics Architectures
CERN OpenLab collaboration
Analytics examples with low latency RapidIO
Connected Vehicles Use Case
5G Mobile Edge Computing + Edge Connected Appliances
The Network is the Data Center
Ecommerce
Fleet Management
Semi-Autonomous Vehicles
Traffic management
• Low Latency
• Energy Efficient
• Analytics Workloads
• At Network Edge
• RapidIO
• IEEE 1588
• Timing
• RF Products
• Memory Interface
• Retimers
• Sensors
The Network is the Data Center
Ubiquitous Computing:
IDT connects, synchronizes, times and makes sense
of the human-and-machine connected world
Network and Data Center Convergence
• Apps moving from data center to co-locate with access node (base station or wired access node)
• Supporting real-time communication to mobile devices (phones, cars, IoT)
• Tight time synchronization between apps running on distributed servers and in data center
• Need low latency interconnect
Edge Computing an essential element
of 5G Rollouts
Wired and Wireless
Access nodes
Servers/
Analytics
PPC
GPU
Data Center
Servers
Cloud
RAN
4G BTS
Towards 5G/MEC: Co-located CPU and Accelerators
Mobile Edge Computing
RapidIO Interconnect combines the best attributes
of PCIe® and Ethernet in a multi-processor fabric
[Diagram: attributes RapidIO combines from Ethernet and PCIe: Scalability, Low Latency, Hardware Terminated, Guaranteed Delivery (Ethernet: in software only)]
Clustering Fabric Needs
• Lowest Deterministic System Latency
• Scalability
• Peer to Peer / Any Topology
• Embedded Endpoints
• Energy Efficiency
• Cost per performance
• HW Reliability and Determinism
Heterogeneous Edge Analytics with RapidIO
• Heterogeneous compute workloads
• No CPU cycles spent on protocol termination
• Energy efficiency
• 20 to 50 Gbps embedded interconnect
• Scalable fat node: connect multiple boards in Edge Appliance
• Push video encode/decode, streaming, rendering and analytics into the network edge
Board-level RapidIO Interconnect Fabric
FPGA/GPU
accelerators
CPU
Storage
Cable / connector
Backplane
RapidIO Rack Scale Interconnect Fabric
Other nodes
Other nodes
RapidIO Backplane Interconnect Fabric
Edge analytics requires low-latency compute for HPC-like workloads
Video server
• 10/20/40/50 Gbps per port – 6.25/10/12.5 Gbps per lane
• 100+ Gbps interconnect in definition
• Hardware termination at PHY layer: 3-layer protocol
• Lowest latency interconnect: ~100 ns
• Inherently scales to large systems with 1000s of nodes
• Over 15 million RapidIO switches shipped
• > 2x Ethernet (10GbE)
• Over 110 million 10-20 Gbps ports shipped
• 100% 4G interconnect market share
Supported by Large Eco-system
© 2015 RapidIO.org
RapidIO Ecosystem and Market Progression
Performance evolves for multiple system platform generations
2000 2002 2004 2006 2008 2010 2012 2014 2016+
1st Gen Standard
2nd Gen Standard
3rd Gen Standard
4th Gen Standard
Space Interconnect
Bridging
Data Center
Computing
Storage
RapidIO Gen3 (10xN) released with path to 25 Gbaud
NASA Space Interconnect Standard
Next Generation Spacecraft Interconnect Standard
RapidIO selected over InfiniBand / Ethernet / Fibre Channel / PCIe
NGSIS members: BAE, Honeywell, Boeing, Lockheed Martin, Sandia, Cisco, Northrop Grumman, Loral, LGS, Orbital Sciences, JPL, Raytheon, AFRL
Open Compute Project High Performance Computing
IDT Co-Chairs Opencompute.org HPC Initiative
RapidIO.org ARM 64-bit Scale Out Group
• 10s to 100s of cores and sockets
• ARM AMBA® protocol mapping to RapidIO protocols
- AMBA 4 AXI4/ACE mapping to RapidIO protocols
- AMBA 5 CHI mapping to RapidIO protocols
• Migration path from AXI4/ACE to CHI and future ARM protocols
• Supports heterogeneous computing
• Supports Wireless/HPC/Data Center applications
Latency
Scalability
Hardware Termination
Energy Efficiency
Source – www.rapidio.org, Linley Processor conf
Better Link Utilization
• RapidIO offers better link utilization
– RapidIO Link Rate: 14.46 Gbps (Gen2 x4) without encoding overhead
[Chart: Data Rate [Gbit/s] vs. Packet Size [Bytes], showing RapidIO Link Rate and RapidIO User Payload Rate; labeled point (65536, 14.46572591); >90% utilization; link rate 14.46 Gbps]
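The ">90% utilization" claim can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes a Gen2 x4 link runs four lanes at 5 Gbaud with 8b/10b line encoding (neither value is stated on the slide), and the function names are hypothetical.

```python
# Sketch: compare the 14.46 Gbps figure quoted on the slide against an
# assumed post-encoding link rate. Lane speed and encoding are assumptions.
LANES = 4                      # x4 link (from the slide)
LANE_GBAUD = 5.0               # assumption: Gen2 lane speed of 5 Gbaud
ENCODING_EFFICIENCY = 8 / 10   # assumption: 8b/10b line encoding


def post_encoding_rate_gbps(lanes, lane_gbaud, efficiency):
    """Link rate left for packets after line-encoding overhead."""
    return lanes * lane_gbaud * efficiency


def utilization(achieved_gbps, link_gbps):
    """Fraction of the post-encoding link rate that is actually used."""
    return achieved_gbps / link_gbps


link_gbps = post_encoding_rate_gbps(LANES, LANE_GBAUD, ENCODING_EFFICIENCY)
quoted_gbps = 14.46            # figure quoted on the slide
util = utilization(quoted_gbps, link_gbps)
print(f"{link_gbps:.1f} Gbps after encoding, {util:.1%} utilization")
```

Under these assumptions the quoted rate is roughly 90% of a 16 Gbps post-encoding link, which is consistent with the slide's claim.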
Low CPU Overhead
• RapidIO keeps CPU available for actual user application
– RapidIO keeps CPU overhead below 1%
– IDT RapidIO 20G PCIe-RapidIO NIC and Intel x86
• Ethernet
– Ethernet Intel driver software with TCP/IP
– Intel x86 CPU x5560 and Intel X520 2x10GbE PCIe-Ethernet NIC
[Chart: CPU Utilization [%] vs. Packet Size [Bytes], comparing Ethernet CPU Utilization and RapidIO CPU Utilization]
RapidIO CPU Overhead less than 1%
CPU Capacity available for User Processing
Significant Power Savings
• RapidIO offers significant power savings
– Low-power PCIe-RapidIO NIC with low CPU overhead
– IDT RapidIO 20G PCIe-RapidIO NIC and Intel x86
• Ethernet
– Ethernet Intel driver software
– Intel x86 CPU x5560 and Intel X520 2x10GbE PCIe-Ethernet NIC
[Chart: Power Saving [W] vs. Packet Size [Bytes], showing Power Saving with RapidIO [W] and CPU Power Saving with RapidIO [W]]
RapidIO saves ~15 W/CPU
Power saving around 1.2 kW per rack (80 sockets)
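The per-rack figure follows directly from the per-CPU saving. A minimal sketch of the arithmetic, using only the two numbers on the slide (the function name is hypothetical):

```python
# Sketch: scale the per-CPU saving quoted on the slide to a full rack.
WATTS_SAVED_PER_CPU = 15   # "~15 W/CPU" from the slide
SOCKETS_PER_RACK = 80      # "80 socket" rack from the slide


def rack_saving_kw(watts_per_cpu, sockets):
    """Total saving in kilowatts for a rack with the given socket count."""
    return watts_per_cpu * sockets / 1000


saving_kw = rack_saving_kw(WATTS_SAVED_PER_CPU, SOCKETS_PER_RACK)
print(f"{saving_kw} kW per rack")
```

This treats every socket as saving the same ~15 W, which is a simplification of the per-packet-size data in the chart.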
20 Gbps RapidIO2 Analytics Kit
PCIe-RapidIO NIC (Tsi721)
RapidIO Switch Unit (CPS1432/CPS1848)
Driver Software
Generally available May 20, 2016
[Diagram: software stack with labels Fabric Management, TCP/IP, Remote Memory Access, Device Drivers, Hardware (NIC/Switch), User Space, Kernel Space]
RapidIO Interconnect at CERN OpenLab
RapidIO at CERN Openlab: LHC and Data Center
• RapidIO Low latency interconnect fabric
• Heterogeneous computing
• Large scalable multi processor systems
• Desire to leverage multi core x86, ARM, GPU, FPGA, DSP with uniform fabric
• Desire programmable upgrades during operations before shut downs
CERN OpenLab: ROOT Analytics on RapidIO
CERN OpenLab: Event Building Use Case on RapidIO
Analytics Optimized 100ns 50 Gbps Switch
• RXS2448
– 600 Gbps Full-Duplex Serial RapidIO® Switch
– 50 Gbps per port
– 33 x 33 mm package
– 48 lanes at 12.5 Gbps
– Up to 24 Serial RapidIO Ports
– RapidIO Specification (Rev 3.2) Compliant
• RXS1632
– 400 Gbps Full-Duplex Serial RapidIO Switch
– 50 Gbps per port
– 29 x 29 mm package
– 32 lanes at 12.5 Gbps
– Up to 16 Serial RapidIO Ports
– RapidIO Specification (Rev 3.2) Compliant
[Block diagram: Clocking & Reset, Power Mgmt, JTAG, I2C; Maintenance Buffer and Control; Scheduler and Switch Fabric; repeated Stack / Port 0/1 / SerDes blocks]
©2016. IDT.
50 Gbps per port | 300mW per 10 Gbps data | 100ns latency
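The headline numbers for the two switches can be cross-checked from the lane counts. The sketch below is illustrative: the class and method names are hypothetical, and the power estimate assumes the 300 mW per 10 Gbps figure scales linearly to full load, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RapidIOSwitch:
    """Illustrative model; names are hypothetical, values come from the slide."""
    name: str
    lanes: int
    lane_gbps: float

    def aggregate_gbps(self) -> float:
        # Aggregate bandwidth = number of lanes x per-lane rate.
        return self.lanes * self.lane_gbps

    def ports_at(self, port_gbps: float) -> int:
        # How many ports of the given speed fit into the available lanes.
        lanes_per_port = port_gbps / self.lane_gbps
        return int(self.lanes // lanes_per_port)

    def power_estimate_w(self, mw_per_10gbps: float = 300.0) -> float:
        # Assumption: the per-10-Gbps power figure scales linearly.
        return self.aggregate_gbps() / 10.0 * mw_per_10gbps / 1000.0


rxs2448 = RapidIOSwitch("RXS2448", lanes=48, lane_gbps=12.5)
rxs1632 = RapidIOSwitch("RXS1632", lanes=32, lane_gbps=12.5)

for sw in (rxs2448, rxs1632):
    print(sw.name, sw.aggregate_gbps(), sw.ports_at(50.0), sw.power_estimate_w())
```

By this arithmetic a 50 Gbps port uses four 12.5 Gbps lanes, so the RXS2448 offers 12 such ports; configuring it for the quoted maximum of 24 ports would leave two lanes per port.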
Local Interconnect Fabric
Cable / Connector
CPU/Accelerator
Storage
I/O Hub
20 to 50 Gbps RapidIO Ports
Scalable Low Latency Analytics Server Fat Node
20 to 50 Gbps
embedded ports
300 mW per 10 Gbps
100 ns latency
Distributed switching
Direct connection to BBU
CPU/Accelerator
Storage
I/O Hub
Edge Node with I/O Hub
Local Interconnect Fabric
FPGA/GPU
CPU
Storage
Cable / Connector
Edge Node Native Interconnect
20 to 50 Gbps RapidIO Ports
RapidIO Top of Rack Switching
Edge Analytics Systems
APPLICATION ISSUES SOLVED
● 50 Gbps per port with 95% link utilization
● 100 ns latency
● Power efficient 300 mW per 10 Gbps
[Diagram: RXS switches connecting SoC, FPGA and DSP nodes through connectors; BBU / base station; fabric cluster for rack switching (C-RAN); backplane-connected servers with CPU and FPGA/GPU]
Edge Computing Appliance
Distributed
low latency switching
Optimized for needs
of Edge Analytics
Low Latency Switch for Edge Computing Scale Out
0.75 Tbps 1U 19-inch 100 ns switch with 20 Gbps ports
• 38 x 20 Gbps ports
• Sub 200W switching power
• Supports 42U rack-level scale out
• Available now
Roadmap to 4.8 Tbps: 2U 100 ns switch with 50 Gbps ports
• 96 x 50 Gbps ports
• Sub 400W switching power
• Supports redundant ports to 42U rack and intra-rack scale out
• 2H 2016
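The switching capacities quoted for the two boxes follow from port count times port speed. A minimal sketch of that arithmetic (the function name is hypothetical; the port counts and speeds are from the slide):

```python
# Sketch: aggregate switching capacity from port count and per-port speed.
def fabric_capacity_tbps(ports: int, port_gbps: float) -> float:
    """Aggregate capacity in Tbps for a switch with identical ports."""
    return ports * port_gbps / 1000.0


tor_1u = fabric_capacity_tbps(38, 20)   # 1U switch, 38 x 20 Gbps ports
tor_2u = fabric_capacity_tbps(96, 50)   # 2U roadmap switch, 96 x 50 Gbps ports
print(f"1U: {tor_1u:.2f} Tbps, 2U: {tor_2u:.1f} Tbps")
```

The 1U result is 0.76 Tbps, which the slide rounds to 0.75 Tbps; the 2U result matches the 4.8 Tbps roadmap figure.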
5G|Mobile Edge Computing |HPC| Video Analytics | Low Latency Financial Trading
12x 50 Gbps
per port
Switch Silicon
X86 Edge Based Analytics Appliance
• Key Components:
• 19-inch 1U ToR switch with 38 x 20 Gbps ports
• 19 inch 1U networking server with built in 320 Gbps RapidIO Switching
• Quad Intel Broadwell Sockets
• 4 TB storage in 8 bays
• 320 Gbps built in switching in Video Server
• 100ns latency per switch hop
• Up to 160 sockets per rack with any to any connectivity
• TCP/IP software driver for RapidIO + IDT Open Source Software
• Available Q2 2016
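Because latency is quoted per switch hop, the switching contribution to end-to-end latency grows with the number of switches on the path. The sketch below assumes a hypothetical three-hop path (server switch, ToR switch, server switch) and counts only switch latency; NIC, cable, and software delays are not included.

```python
# Sketch: switching latency along a path, at 100 ns per switch hop.
SWITCH_HOP_NS = 100  # per-hop switch latency from the slide


def switching_latency_ns(hops: int, per_hop_ns: int = SWITCH_HOP_NS) -> int:
    """Sum of switch latencies for a path crossing `hops` switches."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * per_hop_ns


# Hypothetical path between two servers in the same rack.
path = ["server switch", "ToR switch", "server switch"]
total_ns = switching_latency_ns(len(path))
print(f"{len(path)} hops -> {total_ns} ns of switching latency")
```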
Analytics Server and ToR
RapidIO | x86 | Low Latency | Energy Efficient
GPU Acceleration Nodes
• Computing
• 4 x Nvidia Tegra K1 GPU per Compute slot
• 16x per 19 Inch 1U server
• Analytics acceleration or cloud based rendering of gaming content
• Interconnect Fabric
• RapidIO Switching + 4x RapidIO Gen2 NIC
• Performance
• 1.28 Tflops/Compute (4 GPU)
• 140 Gbps Switching Fabric
• 100nsec RapidIO switching latency
Low Latency GPU Analytics Module
http://www.gocct.com/sheets/AG/photograph/hires/aga1xm1d.jpg
http://www.gocct.com/sheets/AG/aga1xm1d.htm
GPU+ARM RapidIO Video Analytics
RapidIO Switch
Appliance
Low Power ARM+GPU
Cluster with RapidIO
~29 frames/sec/node
~100 ns RapidIO switching latency
> TFlops Processing/node
~4W per GPU node
[Chart: CPU utilization (CPU %), goodput and throughput (Gbps) vs. packet size (Bytes)]
Edge Video Analytics: IBM Power8/Nvidia Tesla/RapidIO Switching
Roadmap to 4.8 Tbps
2U 100 ns Switch
With 50 Gbps ports
Building the OpenPower Edge Node
• Low latency RapidIO ToR switching, 38 x 20 Gbps ports
• 100 ns IDT RapidIO switch latency, 95% link utilization
• Multi socket Power8 servers
• 38 sockets per rack any to any compute for video analytics
• Option to add Nvidia Tesla class GPU for acceleration per Power8 Node
• IDT Versaclock5 Timing
Edge Social Data Analytics
• Analyze user impressions on World Cup Final 2014 (Germany/Argentina)
– HPAC Lab project to analyze World Cup 2014 Twitter data using Hadoop and visualize using Tableau Public on HPAC Platform
Orange Silicon Valley
5G Lab Germany: Edge Analytics for Autonomous Vehicles
Low Latency Edge Computing
Key to Tactile Internet
Automotive Sensor and Sensor Fusion
Keys to Next-Generation Automotive Designs
IDT now offers an even greater array of solutions
• Automotive sales channel provides immediate leverage for existing IDT products
5G Lab Germany: Edge Analytics for Autonomous Vehicles
• Network Edge Connected Vehicles
– HPC-like workload at the network edge
• GPU/x86/ARM/OpenPower based analytics
• Low latency RapidIO fabric
– In-vehicle sensor fusion in real time with low latency
– Leverage OCP innovations (Edge Appliance and ToR)
– Multi-vehicle sensors analysed at network edge by “Supercomputing at the Edge”
Autonomous Vehicle
Video Analytics/Object Recognition
Deep Learning/Object Analytics
Connected Vehicle Edge Analytics Workloads
Offload Use Cases
Fleet Management
Municipal Self Driving
Snow Removal
Police Services
Traffic management
Weather/Hazard avoidance
• Low Latency
• Energy Efficient
• Analytics Workloads
• At Network Edge
• Many workloads HPC-like requiring low latency
• Requires mission critical reliability at the appliance level
• Leverages 5G network and associated handoffs between 5G nodes
RapidIO is Key Enabler for this Edge Use Case
RapidIO Analytics and Connected Vehicles
• Low Latency Servers
• 20 to 50 Gbps RapidIO interconnect with 100ns latency
• Heterogeneous Computing
• X86, OpenPower, ARM
• GPU and FPGA assist
• Push video encode/decode, streaming, rendering and analytics into the network edge
Board-level RapidIO Interconnect Fabric
FPGA/GPU
accelerators
CPU
Storage
Cable / connector
Backplane
RapidIO Rack Scale Interconnect Fabric
Other nodes
Other nodes
RapidIO Backplane Interconnect Fabric
HPC-like workloads in data center, edge and vehicles
Video server
Backup
Packet Structure – PHY Layer
• PHY Layer Fields
Packet Structure – Transport Layer
• Transport Layer Fields
Packet Structure – Logical Layer
• Logical Layer Fields