Embedded Computing
without Compromise
GTC Israel 2017
Evolution of the Rugged GPGPU Computer
Session: SIL7127
Dan Mor – PLM, Aitech Systems
Agenda
• Current Aitech GPGPU systems
• NVIDIA Jetson TX1 and TX2 evaluation
• Conclusions
• New Aitech Products
GPGPU Product Line
Current Aitech GPGPU Products
A191 Block Diagram
[Figure: block diagram. Recoverable labels: Power Supply (18–36 V input power); C873 4th Gen. Core i7 SBC; C530 GPGPU Board; Frame Grabber Mezzanine; On-Board SSD; 2.5" SSD (optional); PCIe x8; SATA; connectors J1, J2, J3; I/O: RGBHV, DVI/HDMI, SD-SDI, Composite Video, USB, Gigabit Ethernet, Serial]
We need a SWaP (size, weight, and power) system…
Jetson TX1
Small form factor (SFF): 50 x 87 mm
SoM with Linux support
Good for SWaP systems
Supercomputing performance
Quad-core ARM® Cortex®-A57 CPUs
GPU: NVIDIA Maxwell™, 1 TFLOP/s with 256 CUDA® cores
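The 1 TFLOP/s figure can be related to the 256 CUDA cores with a back-of-the-envelope estimate. The sketch below assumes one fused multiply-add (2 FLOPs) per core per clock, a GPU clock of about 998 MHz, and a doubled rate for FP16 arithmetic. None of these assumptions is stated on the slide, and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(cuda_cores, clock_mhz, flops_per_core_per_clock=2, fp16_factor=1):
    """Theoretical peak throughput in GFLOP/s (not a measured value)."""
    return cuda_cores * clock_mhz * flops_per_core_per_clock * fp16_factor / 1000.0

# Assumed clock: ~998 MHz. Assumed: FP16 runs at twice the FP32 rate.
fp32_peak = peak_gflops(256, 998)
fp16_peak = peak_gflops(256, 998, fp16_factor=2)

print(f"FP32 peak: {fp32_peak:.0f} GFLOP/s")
print(f"FP16 peak: {fp16_peak:.0f} GFLOP/s")
```

Under these assumptions the headline 1 TFLOP/s corresponds to the FP16 peak, with FP32 at roughly half that.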
400-pin board-to-board connector
Pin-out will remain compatible with future module versions
Draws 1 W or less while idle
8-10 W under typical CUDA load
Up to 15 W TDP when the module is fully utilized
Automatic scaling of CPU, GPU, and memory
1 TFLOPS (GTX 770M is 1.36 TFLOPS)
HW encoder (H.264/H.265) and decoder
4K video processing; MIPI CSI x4 cameras or six CSI x2 cameras
Jetson TX1 Evaluation - Non-Graphical Benchmark
Lower numbers mean faster CUDA calculation on the GPU. “TX1 – Max” is the Jetson TX1 running at maximum GPU frequency.
The C873 & C530 combination draws about 120 W, yet it is only 1.8x faster than the Jetson TX1, which draws only 15 W.
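The comparison above can be restated as performance per watt. This is a minimal sketch that uses only the slide's approximate figures (1.8x speed, about 120 W, 15 W); `perf_per_watt_ratio` is a hypothetical helper, and treating both power numbers as the power drawn during the benchmark is an assumption.

```python
def perf_per_watt_ratio(speedup_a_over_b, power_a_w, power_b_w):
    """Return how many times better B's performance per watt is than A's.

    System A is `speedup_a_over_b` times faster than system B.
    """
    perf_per_watt_a = speedup_a_over_b / power_a_w
    perf_per_watt_b = 1.0 / power_b_w
    return perf_per_watt_b / perf_per_watt_a

# A = C873 & C530 (about 120 W), B = Jetson TX1 (15 W)
ratio = perf_per_watt_ratio(1.8, 120.0, 15.0)
print(f"TX1 performance per watt advantage: {ratio:.1f}x")
```

With these inputs the TX1 delivers roughly 4.4 times more CUDA throughput per watt, even though the larger system is faster in absolute terms.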
Jetson TX1 Evaluation - Conclusions
The Jetson TX1 gets a real boost in rendering and CUDA calculation power
CUDA calculation performance:
TX1 vs TK1: TX1 is 2x to 4x faster
TX1 vs C873 & C530 (770M): C873 & C530 (770M) is only 1.8x faster
If Linux is not an obstacle for our customers, a Jetson TX1 based product will be a success
Comparison table: TX2 vs TX1
Feature | Jetson TX2 | Jetson TX1
GPU | NVIDIA Pascal™, 256 CUDA cores | NVIDIA Maxwell™, 256 CUDA cores
CPU | HMP Dual Denver 2/2 MB L2 + Quad ARM® A57/2 MB L2 | Quad ARM® A57/2 MB L2
Memory | 8 GB 128-bit LPDDR4, 58.3 GB/s | 4 GB 64-bit LPDDR4, 25.6 GB/s
Display | 2x DSI, 2x DP 1.2 / HDMI 2.0 / eDP 1.4 | 2x DSI, 1x eDP 1.4 / DP 1.2 / HDMI
PCIe | Gen 2, 1x4 + 1x1 OR 2x1 + 1x2 | Gen 2, 1x4 + 1x1
Data Storage | 32 GB eMMC, SDIO, SATA | 16 GB eMMC, SDIO, SATA
Other | CAN, UART, SPI, I2C, I2S, GPIOs | UART, SPI, I2C, I2S, GPIOs
USB | USB 3.0 + USB 2.0 | same as TX2
Connectivity | 1 Gigabit Ethernet, 802.11ac WLAN, Bluetooth | same as TX2
Mechanical | 50 mm x 87 mm (400-pin compatible board-to-board connector) | same as TX2
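The memory rows above tie bandwidth to bus width: bandwidth equals bytes per transfer times transfers per second. The sketch below works backwards from the table's own numbers to the implied transfer rate. `implied_transfer_rate_mts` is a hypothetical helper, and reading GB/s as 10^9 bytes per second is an assumption.

```python
def implied_transfer_rate_mts(bandwidth_gb_s, bus_width_bits):
    """Transfer rate in MT/s implied by a peak bandwidth and a bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1000.0 / bytes_per_transfer

tx1_rate = implied_transfer_rate_mts(25.6, 64)    # TX1: 64-bit bus
tx2_rate = implied_transfer_rate_mts(58.3, 128)   # TX2: 128-bit bus

print(f"TX1 implied rate: {tx1_rate:.0f} MT/s")
print(f"TX2 implied rate: {tx2_rate:.0f} MT/s")
print(f"TX2 / TX1 bandwidth: {58.3 / 25.6:.2f}x")
```

Most of the TX2 bandwidth gain comes from doubling the bus width, with a smaller contribution from a higher transfer rate.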
Dual Operating Modes
Non-graphical benchmark (CUDA algorithms): time for 10 iterations [ms], lower is better
n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1
4096 | 22.533 | 68.4 | 16.421 | -67% | 27%
8192 | 81.491 | 272.97 | 65.24 | -70% | 20%
16384 | 206.799 | 527.47 | 154 | -61% | 25.5%
TX2 has better performance than TX1 when using MAXN power mode
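The percentage columns of the GPU table above can be reproduced from the timings. The formulas below are inferred from the numbers, not stated on the slide: the MAXN column matches the time saved as a fraction of the TX1 time, while the MAXQ column matches the time lost as a fraction of the TX2 time. `change_pct` is a hypothetical helper.

```python
# Times for 10 iterations [ms], copied from the table: (TX1, TX2 MAXQ, TX2 MAXN)
timings_ms = {
    4096: (22.533, 68.4, 16.421),
    8192: (81.491, 272.97, 65.24),
    16384: (206.799, 527.47, 154.0),
}

def change_pct(t_tx1, t_tx2, baseline):
    """Time difference (TX1 minus TX2) as a percentage of `baseline`."""
    return (t_tx1 - t_tx2) / baseline * 100.0

results = {}
for n, (tx1, maxq, maxn) in timings_ms.items():
    maxq_pct = change_pct(tx1, maxq, baseline=maxq)  # inferred: TX2 time as baseline
    maxn_pct = change_pct(tx1, maxn, baseline=tx1)   # inferred: TX1 time as baseline
    results[n] = (maxq_pct, maxn_pct)
    print(f"{n}: MAXQ {maxq_pct:.1f}%, MAXN {maxn_pct:.1f}%, "
          f"MAXN speedup {tx1 / maxn:.2f}x")
```

Because the two columns appear to use different baselines, a plain time ratio such as `tx1 / maxn` is an easier way to compare power modes directly.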
CPU benchmark (n-body algorithm running on the CPU): time for 10 iterations [ms], lower is better
n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1
4096 | 30492.172 | 57837.430 | 7169.735 | -47% | 76.5%
8192 | 121315.578 | 232723.719 | 11340.421 | -48% | 90%
TX2 has better CPU performance than TX1 when using MAXN power mode
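For readers unfamiliar with the workload, the sketch below times 10 iterations of a naive all-pairs n-body step on the CPU. It is an illustrative pure-Python stand-in, not the benchmark used in this evaluation. The softening term, time step, body count, unit masses, and the `step` and `time_iterations` names are all assumptions chosen for the example.

```python
import random
import time

SOFTENING = 1e-3  # assumed softening term; avoids division by zero
DT = 0.01         # assumed integration time step

def step(pos, vel, mass):
    """Advance all bodies by one time step using all-pairs gravity (G = 1)."""
    n = len(pos)
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):  # O(n^2): every body interacts with every other
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            dist_sq = dx * dx + dy * dy + dz * dz + SOFTENING
            scale = mass[j] / dist_sq ** 1.5
            ax += dx * scale
            ay += dy * scale
            az += dz * scale
        acc.append((ax, ay, az))
    # Update velocities and positions only after all forces are known.
    for i in range(n):
        for k in range(3):
            vel[i][k] += acc[i][k] * DT
            pos[i][k] += vel[i][k] * DT

def time_iterations(n_bodies, iterations=10, seed=0):
    """Return the wall-clock time in ms for `iterations` steps."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n_bodies)]
    vel = [[0.0, 0.0, 0.0] for _ in range(n_bodies)]
    mass = [1.0] * n_bodies
    start = time.perf_counter()
    for _ in range(iterations):
        step(pos, vel, mass)
    return (time.perf_counter() - start) * 1000.0

elapsed_ms = time_iterations(64)
print(f"64 bodies, 10 iterations: {elapsed_ms:.1f} ms")
```

The work grows with the square of the body count, so doubling the bodies roughly quadruples the time when throughput stays constant.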
Conclusions
• TX2 gets a boost in GPU CUDA calculation power over TX1 when using MAXN power mode
MAXN power mode: increase of about 24% in performance (max power consumption 15 W)
MAXQ power mode: decrease of about 66% in performance (max power consumption 7.5 W)
• TX2 gets a boost in CPU calculation power over TX1 when using MAXN power mode
MAXN power mode: increase of about 83% in performance (max power consumption 15 W)
MAXQ power mode: decrease of about 47% in performance (max power consumption 7.5 W)
• The SW release is a "Developer Preview Release", so I expect considerable improvement and optimization in the near future
As seen above, half the power (MAXQ) comes with about half the performance.
Full power (MAXN) comes with a boost for both the GPU (CUDA, 24%) and the CPU (83%).
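The summary percentages above appear to be simple averages of the per-size percentages in the GPU and CPU benchmark tables. That reading is inferred from the numbers, not stated on the slides. The sketch below recomputes the averages from the table values.

```python
from statistics import mean

# "TX2 vs TX1" percentages copied from the benchmark tables
gpu_maxn = [27, 20, 25.5]
gpu_maxq = [-67, -70, -61]
cpu_maxn = [76.5, 90]
cpu_maxq = [-47, -48]

averages = {
    "GPU MAXN": mean(gpu_maxn),
    "GPU MAXQ": mean(gpu_maxq),
    "CPU MAXN": mean(cpu_maxn),
    "CPU MAXQ": mean(cpu_maxq),
}

for label, value in averages.items():
    print(f"{label}: {value:+.1f}%")
```

These averages land within a fraction of a percentage point of the 24%, 66%, 83%, and 47% quoted in the conclusions.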
Special Features
Technical Features
A176 – Cyclone GPGPU Fanless Small FF RediBuilt™ Supercomputer
A176 Cyclone Based on NVIDIA Jetson TX1/TX2
Pinout will remain compatible with future module versions
Draws 1 W or less while idle
8-10 W under typical CUDA load
Up to 17 W when the CPU and GPU are fully utilized
Automatic scaling of CPU, GPU, and memory
1 TFLOPS
Hardware encoder (H.264/H.265) and decoder
Ultra Small Form Factor: 129 mm [5.1"] square, 840 g [1.85 lbs.]
A176 Block Diagram
[Figure: block diagram. Recoverable labels: NVIDIA Jetson TX1 System on Module (4 GB LPDDR4 RAM, 16 GB eMMC 5.1 Flash, Quad-Core ARM CPU, NVIDIA GPU); Front Panel Connectors; Discrete I/O; USB 2.0; UART; I2C; Gigabit Ethernet; DVI/HDMI Output; Mini SATA SSD; PCIe; Optional Expansion Module; Optional I/O (8 x Composite Inputs, 1 x SDI Input); Isolated Power Supply; Line Filter; ETR]
A176 Highlights
SWaP Optimized Rugged HPEC
Ultra Small Form Factor – 129 mm [5.1"] square, < 1 kg [2.2 lbs.]
NVIDIA® Jetson™ TX1 System on Module
NVIDIA Maxwell™ Architecture GPU, with 256 CUDA cores
ARM® Cortex® A57 Quad-Core CPU
1 TFLOPS
H.264/H.265 HW Encoder
Best Available Performance per Watt – 60 GFLOPS/W
SATA SSD with Quick Erase & Secure Erase
4 GB LPDDR4
Video Capture
SDI (SD/HD) w/dedicated H.264 encoder
Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
I/O
Gigabit Ethernet
UART Serial
USB 2.0
Discretes
DVI/HDMI Output
Composite Input
SDI Input
CUDA, OpenGL, OpenGL ES, EGL
Low Power Consumption
Development Platforms Available
Additional expansions:
1. Dual Channel 1553
2. ARINC 429
3. Camera Link Frame Grabber
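The 60 GFLOPS/W figure above is consistent with dividing the 1 TFLOPS peak by the fully utilized power quoted elsewhere in this deck (up to 17 W for the A176, 15 W TDP for the bare module). The sketch below performs only that division. Treating peak throughput and maximum power as simultaneous is an assumption, and `gflops_per_watt` is a hypothetical helper.

```python
def gflops_per_watt(peak_tflops, power_w):
    """Peak throughput divided by power draw, in GFLOPS per watt."""
    return peak_tflops * 1000.0 / power_w

a176_efficiency = gflops_per_watt(1.0, 17.0)    # A176 fully utilized
module_efficiency = gflops_per_watt(1.0, 15.0)  # bare Jetson TX1 at TDP

print(f"A176 at 17 W:   {a176_efficiency:.1f} GFLOPS/W")
print(f"Module at 15 W: {module_efficiency:.1f} GFLOPS/W")
```

Both results bracket the rounded 60 GFLOPS/W quoted on the slide.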
Technical Features
C535 – Typhoon GPGPU 3U VPX Supercomputer Board
C535 Typhoon Highlights
Rugged 3U VPX HPEC Board – SBC with on-board GPGPU
NVIDIA® Jetson™ TX1 System on Module
NVIDIA Maxwell™ Architecture GPU, with 256 CUDA cores
ARM® Cortex® A57 Quad-Core CPU
1 TFLOPS
H.264/H.265 HW Encoder
Best Available Performance per Watt – 60 GFLOPS/W
SATA SSD with Quick Erase & Secure Erase
4 GB LPDDR4
Video Capture
SDI (SD/HD) w/dedicated H.264 encoder
Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
I/O
Gigabit Ethernet
UART Serial
USB 2.0
Discretes
DVI/HDMI Output
Composite Input
SDI Input
CUDA, OpenGL, OpenGL ES, EGL
Low Power Consumption
Development Platforms Available
C535 Block Diagram
[Figure: block diagram. Recoverable labels: NVIDIA Jetson TX1 System on Module (4 GB LPDDR4 RAM, 16 GB eMMC 5.1 Flash, Quad-Core ARM CPU, NVIDIA GPU); Front Panel Connectors; Discrete I/O; USB 2.0; UART; I2C; Gigabit Ethernet; DVI/HDMI Output; Mini SATA SSD; SD; PCIe Switch with PCIe x4 links; Optional Expansion Module; Optional I/O (8 x Composite Inputs, 1 x SDI Input); PSU; ETR]
A176/C535 – Interface Expansions
Currently available:
• FG – Simultaneously captures 8 composite PAL/NTSC inputs
• FG – HD/SD-SDI – H.264 dedicated encoder (streaming)
Available upon request:
• FG – CameraLink input
• ARINC-429 – 6 channels
• 1553 – 2 channels
Special Features
Technical Features
EV176 Development System for A176/C535
EV176 Development System for A176 Cyclone
Start SW development right now!
Applications
GPU rendering (navigation, maps, etc.)
CUDA based (algorithms)
Image Processing (CUDA accelerated)
Radars
Flight Simulators
Video recorders/streaming
Surveillance
Autonomous Vehicles/Drones
Smart Cities
GPGPU extensions to existing systems
Thank you!