Taylor Newill
Senior Manager, Product Management
3rd December 2019
Run Faster on OCI
Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Agenda
1. Oracle's HPC Journey
2. Current Infrastructure
3. Current HPC
4. Next Generation HPC
5. Performance, Performance, Performance
Agenda 1: Oracle's HPC Journey
"The acquisition of Sun transforms the IT industry, combining best-in-class enterprise software and mission-critical computing systems."
Larry Ellison
CEO, Oracle
Oracle's HPC Journey
• DEC, 1994
• Sun, 2010
• Mellanox, 2010
• Bare Metal, 2016
• 2x Pascal, 2017
• 8x Volta, 2Q2018
• RDMA, 1Q2019
• X9, 2019-2020
Agenda 2: Current Infrastructure
Gaurav Duggal
Vice President, Jio Reliance
"We achieve a 10X cost reduction compared with our existing solution, and get DDoS prevention built in. Oracle's NVMe performance is 1,000X better than our existing solution's SSD, allowing 10X more connections on the same machine shape."
Oracle Cloud Infrastructure Global Footprint
2018: Ashburn, Phoenix, London, Frankfurt (map legend: Commercial, Government)
Oracle Cloud Infrastructure Global Footprint
September 2019: 16 Regions Live: Ashburn, Phoenix, Chicago, Toronto, Sao Paulo, London, Frankfurt, Zurich, Mumbai, Tokyo, Seoul, Sydney (map legend: Commercial, Government, Microsoft Azure Interconnect)
Oracle Cloud Infrastructure Global Footprint
End of CY2020: 36 Oracle Regions: Ashburn, Phoenix, Bay Area, Chicago, Toronto, Montreal, Sao Paulo, Belo Horizonte, Chile, London, Newport (Wales), Amsterdam, Frankfurt, Zurich, Israel, Jeddah, Saudi 2, Dubai, UAE 2, Mumbai, Hyderabad, Singapore, Tokyo, Osaka, Seoul, Chuncheon, Sydney, Melbourne, South Africa, plus US Gov, Europe, and Asia government regions (map legend: Commercial, Government, Microsoft Azure Interconnect)
[Architecture diagram: a region contains multiple datacenters across Availability Domains 1-3; the physical network, virtual network, and compute & storage layers support bare metal, VMs, Engineered Systems, NVMe storage, and any middlebox (IDS/IPS, ...)]
Agenda 3: Current HPC
Timur Bazhirov, Mohammad Mohammadi
Founders, Exabyte.io
"Oracle shows the best performance due to the combination of the latest generation of computing hardware and low-latency / high-bandwidth interconnect network that facilitates efficient scaling."
HPC – BM.HPC2.36 (December 2019): Highest CPU performance in the cloud
• Processor: 36 cores, Intel 6154, 3.0 GHz base, 3.7 GHz turbo
• Memory: 384 GB RAM (10.6 GB RAM/core), 2666 MHz DDR4
• Network: cluster networking, <2 µs latency, RoCEv2 RDMA, 1x 100 Gbps NIC, 1x 25 Gbps NIC
• VMs: security-hardened hypervisor, flexible sizing, dense I/O option
• Local attached storage: NVMe SSDs, 6.4 TB (1 disk), millions of IOPS
• Remote attached storage: up to 1 PB of block storage, NVMe block volumes, 32 TB/volume
GPU – BM.GPU3.8: Performance on par with DGX
• Processor: 52 CPU cores, 8 GPUs, NVLink
• Memory: 768 GB RAM (96 GB RAM/GPU), 2666 MHz DDR4
• Bare metal: instance isolation, highest IOPS, high throughput, low latency
• Next gen: new processors, more cores, more memory, local storage
• Remote attached storage: up to 1 PB of block storage, NVMe block volumes, 32 TB/volume
AMD – BM.Standard.E2.64: Highest memory-bandwidth processor
• Processor: 64 cores, AMD Epyc 7551, 8 memory channels
• Memory: 512 GB RAM (8 GB RAM/core), 2666 MHz DDR4
• Network: 2x 25 Gbps
• VMs: 1-24 cores, security-hardened hypervisor, flexible sizing, dense I/O option
• Price: $0.03/core-hour
• Remote attached storage: up to 1 PB of block storage, NVMe block volumes, 32 TB/volume
Cluster Networking – Up to 20,000 Cores
Clustered RDMA network: 1.5 µs latency, 100 Gb/s, connecting CPU servers, GPU servers*, Exadata*, and block storage*. For high-performance workloads (HPC, database, big data, AI), including the hardest product development workloads: CFD, crash simulations, reservoir modelling, DNA sequencing. A provisioning sketch follows.
*Planned
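Provisioning one of these RDMA clusters is scriptable. A minimal sketch using the OCI Python SDK's cluster network API; all OCIDs and the availability domain below are placeholders, and the pool size is illustrative:

```python
# Sketch: create an RDMA cluster network with the OCI Python SDK.
# All OCIDs and the availability domain are placeholders, not real values.
import oci

config = oci.config.from_file()  # reads ~/.oci/config
client = oci.core.ComputeManagementClient(config)

details = oci.core.models.CreateClusterNetworkDetails(
    compartment_id="ocid1.compartment.oc1..example",
    instance_pools=[
        oci.core.models.CreateClusterNetworkInstancePoolDetails(
            instance_configuration_id="ocid1.instanceconfiguration.oc1..example",
            size=4,  # e.g. four BM.HPC2.36 nodes = 144 cores on one RDMA fabric
        )
    ],
    placement_configuration=oci.core.models.ClusterNetworkPlacementConfigurationDetails(
        availability_domain="AD-1",
        primary_subnet_id="ocid1.subnet.oc1..example",
    ),
)
cluster = client.create_cluster_network(details).data
print(cluster.lifecycle_state)  # PROVISIONING until the pool is up
```

All instances in the pool land on the same RDMA fabric, which is what makes the latency figures above possible.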
Why are we better?
In other clouds, data has to go through a VM and a hypervisor before it can reach the underlying server hardware, and the VMs are automatically distributed across the entire datacenter. This adds significant latency and unpredictability in performance.
[Diagram: VMs behind per-server hypervisors, spread across racks and top-of-rack switches, reaching the RDMA switch only through those layers]
Why are we better?
Oracle connects the servers directly to the RDMA switch: no hypervisor, no virtualization, no jitter. We let customers choose their own placement to maximize stability (see the latency sketch below).
[Diagram: bare metal servers in a rack wired directly to the RDMA switch]
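The latency claims are easy to sanity-check once a cluster is up. A minimal MPI ping-pong sketch with mpi4py (assumed installed; run one rank per node, e.g. `mpirun -np 2 python pingpong.py`):

```python
# Ping-pong latency microbenchmark: small messages bounced between two ranks.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = bytearray(8)   # tiny message: measures latency, not bandwidth
reps = 10000

comm.Barrier()
t0 = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
t1 = time.perf_counter()

if rank == 0:
    # Each rep is one round trip; one-way latency is half the round-trip time.
    print(f"one-way latency: {(t1 - t0) / reps / 2 * 1e6:.2f} µs")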
ISV Ecosystem
• Manufacturing, automotive, aerospace
• Artificial intelligence & deep learning
• Visual effects rendering
• Open source HPC applications
Agenda 4: Next Generation HPC
Koji Komura
General Manager of VE Development Department, DX Propulsion Center, Denso Techno Co., Ltd.
"There are three major benefits of Oracle Cloud HPC solutions: low cost, high performance, and support for the latest technology. Denso Techno will use the Oracle Cloud HPC solution to establish our basic technology."
Announcing: New AMD and Intel Based BM/VM Shapes
Availability: Q1 – Q4 2020
• E2 Standard Instances: AMD Rome architecture; both bare metal & VMs; 1-128 cores at 2.2 GHz base, up to 3.4 GHz boost; up to 2 TB memory; 2x 50G bandwidth
• X9 Standard Instances: next-generation Intel Xeon; both bare metal & VMs; 1-64 cores at 2.4 GHz; up to 1 TB memory; 1x 50G bandwidth
• X9 HPC Instances: next-generation Intel Xeon; 1-36 high-frequency cores (3.x GHz+); 2x 100G RDMA; local NVMe SSDs; scales to tens of thousands of cores per cluster

GPUs and Cluster Networking: seamless deployment of bare metal RDMA clusters in OCI
• Available today: HPC instances with 36 cores, 3.7 GHz, 384 GB RAM, 6.4 TB NVMe, 100G RDMA
• Coming soon: GPU bare metal instances with 8x GPUs, 2 TB RAM, 25 TB NVMe, 8x 100G RDMA
Coming Soon: Elastic Instances
• Flexible and elastic compute instances: pick your cores, pick the memory, and they are ready to launch (see the sketch below)
• Zero downtime when scaling: never experience downtime when you scale cores or memory
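A minimal sketch of what "pick your cores, pick the memory" looks like through the OCI Python SDK; the flexible shape name and all OCIDs below are illustrative placeholders:

```python
# Sketch: launch a flexible-shape instance, choosing cores and memory independently.
# Shape name and OCIDs are placeholders.
import oci

config = oci.config.from_file()
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    compartment_id="ocid1.compartment.oc1..example",
    availability_domain="AD-1",
    display_name="flex-demo",
    shape="VM.Standard.E3.Flex",  # assumed flexible shape name
    shape_config=oci.core.models.LaunchInstanceShapeConfigDetails(
        ocpus=8,            # pick your cores...
        memory_in_gbs=128,  # ...and your memory, independently
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example",
    ),
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..example",
    ),
)
instance = compute.launch_instance(details).data
```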
Agenda 5: Performance, Performance, Performance
Dr. Christopher Woods
Research Software Fellow, University of Bristol
"We were able to process the large data sets obtained by the microscope on the cloud in a fraction of the time and at much lower cost than previously thought possible. We took a 90-day process and were able to complete it in under 5 days with Oracle Cloud HPC."
Network Performance Implications
OpenFOAM
List price is only $2.70/hr per node
Performance: Ansys Fluent Thickener Model (DPM Resolve)
[Chart: wall clock with DPM resolve, Fluent Thickener (1,500 iterations, fluid + particles) at 128 and 256 cores; OCI 6,047-6,884 s, Quasar 6,116-9,198 s, Azure 11,952-12,577 s; OCI cuts Azure's wall-clock time by 45-49%]
[Chart: scalability, Fluent Thickener (1,500 iterations, fluid + particles) at 192 cores: OCI 152%, Quasar 150%, Azure 105%; average performance improvement of OCI vs. Azure: 47%]
Performance: Rocky DEM (GPU test) – Spherical
Runtimes per particle-size case (Workstation / Azure / OCI):
• Ø 20 mm (947,351 particles): 8.03 / 3.97 / 3.22 (OCI -19% vs. Azure)
• Ø 50 mm (60,631): 1.12 / 0.79 / 0.44 (-44%)
• Ø 300 mm (281): 0.19 / 0.14 / 0.09 (-36%)
• Ø various (272,360): 7.00 / 4.22 / 3.33 (-21%)
OCI gives a 30% average performance improvement over Azure.
Performance: Rocky DEM (GPU test) – Sphered Polyhedron
[Chart: runtimes for four particle-size cases: Ø 20 mm (1,237,182 particles), Ø 50 mm (70,180), Ø 300 mm (367), Ø various (355,794), on Workstation, Azure, and OCI; OCI is 11-25% faster than Azure per case]
OCI gives a 19% average performance improvement over Azure.
Benchmark – SPEC Speed 2017

SPEC Speed 2017 Integer - Peak:
• OCI BM.HPC2.36: 9.07
• Cisco UCS B200 M5: 9.14
• Cisco UCS C220 M5: 9.29
• Cisco UCS C240 M5: 9.30

SPEC Speed 2017 Floating Point - Base:
• OCI BM.HPC2.36: 118
• Cisco UCS B200 M5: 121
• Cisco UCS C220 M5: 132
• Cisco UCS C240 M5: 133
Benchmark – SPEC Rate 2017

SPEC Rate 2017 Integer - Peak:
• OCI BM.HPC2.36: 226
• Cisco UCS B200 M5: 221
• Cisco UCS C220 M5: 228
• Cisco UCS C240 M5: 228

SPEC Rate 2017 Floating Point - Base:
• OCI BM.HPC2.36: 198
• Cisco UCS B200 M5: 202
• Cisco UCS C220 M5: 210
• Cisco UCS C240 M5: 213
Benchmark – SPECrate2017_int base
Benchmark – LINPACK
[Chart: Rpeak (GFLOPS) and Rpeak/core across the compared systems: 3500 / 97.222, 3465 / 96.25, 3328 / 64, 835.2 / 46.4, 2126 / 59.06]
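Rpeak here is the theoretical peak: cores × clock × FLOPs per cycle. A quick check for BM.HPC2.36, assuming Skylake-SP's two AVX-512 FMA units per core (32 double-precision FLOPs/cycle):

```python
# Theoretical peak for one BM.HPC2.36 node (Intel Xeon 6154).
# Assumes 2 AVX-512 FMA units per core -> 32 double-precision FLOPs/cycle.
cores, base_ghz, flops_per_cycle = 36, 3.0, 32
rpeak_gflops = cores * base_ghz * flops_per_cycle
print(rpeak_gflops)  # 3456.0, consistent with the ~3465-3500 GFLOPS in the chart
```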
Benchmark – STREAM 1GB Triad
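STREAM's triad kernel measures sustainable memory bandwidth with a[i] = b[i] + q*c[i]. A NumPy sketch of what the benchmark does (the real STREAM is C/Fortran with OpenMP; the array size here is illustrative):

```python
# STREAM triad sketch: time a fused read-read-write sweep over large arrays.
import numpy as np
import time

n = 50_000_000  # ~400 MB per float64 array; scale toward 1 GB as desired
b = np.random.rand(n)
c = np.random.rand(n)
q = 3.0

t0 = time.perf_counter()
a = b + q * c  # the triad kernel
dt = time.perf_counter() - t0

# 24 bytes move per element: read b, read c, write a.
print(f"{3 * n * 8 / dt / 1e9:.1f} GB/s")
```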
Comparison of NetSuite Performance
GPU Performance
[Chart: Billion Word Language Model Benchmark, words per second at 1x, 2x, and 4x GPUs; series: White Box V100-PCIe, Oracle OCI V100-SXM2, DGX1 V100-SXM2]
STMV – Tobacco Molecule
Nanoseconds per day (taller is better):
• CPU only: 0.3
• NVIDIA 2x P100: 1.8
• OCI GPU2.2: 3.5
• OCI GPU3.8: 4.6
LS-DYNA Results

configuration        cores  nodes  type    mode  time (seconds)
oracle 64x1 (64)     64     1      oracle  mpp   5030
oracle 48x1 (48)     48     1      oracle  mpp   5631
oracle 24x1 (24)     24     1      oracle  mpp   4572
oracle 64x2 (128)    128    2      oracle  mpp   3314
oracle 64x4 (256)    256    4      oracle  mpp   2633
oracle 32x8 (256)    256    8      oracle  mpp   787
oracle 64x8 (512)    512    8      oracle  mpp   2474
oracle 24x15 (360)   360    15     oracle  mpp   702
oracle 32x15 (480)   480    15     oracle  mpp   644
Best performance for HPC in the cloud
"We benchmark the performance of the latest cloud hardware with HPL, two VASP simulation cases, one GROMACS case and MPI Benchmarks. Our findings demonstrate that Oracle Cloud outperforms other cloud vendors due to the latest generation of the hardware and fast interconnect network." (Exabyte.io team)
Better than VMs. Same as on-premises.
Oracle bare metal performance vs. other cloud provider VMs:
• LS-DYNA (structural fluid analysis): 86%
• ANSYS Fluent (computational fluid dynamics): 90%
• MILC (MIMD lattice computation): 88%
• WRF (Weather Research and Forecasting model): 93%
• HPL (High-Performance Linpack, floating point): 89%
• Stream (vector applications, memory bandwidth): 82%
Performance - LAMMPS
[Chart: LAMMPS Chute Flow of 32M atoms: timesteps per day vs. BM.HPC2.36 cores (36, 72, 144, 216)]
[Chart: LAMMPS FENE bedspring benchmark over Ethernet: timesteps per day vs. BM.HPC2.36 cores (36, 72, 144, 216)]
HPC Network Fabric Efficiency Comparison
Performance
[Chart: Fluent rating for the sedan_4m model vs. nodes (36 cores, 3.7 GHz each), 1 to 28 nodes; series: VM+TCP, VM+RDMA, BM+RDMA (higher is better)]
Scaling
[Chart: scaling efficiency for aircraft_wing_14m vs. cells per core (0 to 450,000); series: BM+TCP, VM+TCP, BM+RDMA (higher is better)]
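Scaling efficiency in these charts is speedup relative to a baseline run divided by the resource ratio. A minimal sketch with hypothetical timings:

```python
def scaling_efficiency(t_base, nodes_base, t_scaled, nodes_scaled):
    """Speedup over the baseline divided by the increase in nodes (1.0 = perfect scaling)."""
    speedup = t_base / t_scaled
    return speedup / (nodes_scaled / nodes_base)

# Hypothetical: a 1-node run takes 4000 s and an 8-node run takes 620 s.
print(f"{scaling_efficiency(4000, 1, 620, 8):.0%}")  # ~81%
```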
Scaling vs. Azure
[Chart: scaling efficiency on a StarCCM+ 105M-cell model vs. cells per core (0 to 3,500,000; fewer cells per core means more network traffic); series: BM.HPC2.36, Azure H16r, Azure A9, Azure H16mr (higher is better)]
Cost vs. Azure
[Chart: cost ($) for 100 iterations on the StarCCM+ 105M-cell model vs. cluster size (128, 256, 512, 1024 cores), lower is better; BM.HPC2.36 at $0.075/core-hour vs. Azure H16r at $0.134 and Azure H16mr at $0.178]
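The cost curves reduce to a simple product: wall-clock hours × cores × per-core-hour price (the $0.075 figure is consistent with the $2.70/hr node price quoted earlier: 36 × $0.075 = $2.70). A sketch with a hypothetical runtime:

```python
def job_cost(runtime_hours, cores, price_per_core_hour):
    # Cost of a fixed job: time x cores x per-core-hour price.
    return runtime_hours * cores * price_per_core_hour

# Hypothetical 15-minute, 256-core run at the listed per-core prices:
print(job_cost(0.25, 256, 0.075))  # OCI BM.HPC2.36: $4.80
print(job_cost(0.25, 256, 0.134))  # Azure H16r:     $8.58
```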
Big Data
[Chart: sort execution time (sec) vs. sort size (5, 10, 15, 20 GB); series: 10 Gbps, RDMA]
[Chart: TeraSort execution time (sec) vs. file size (20, 30, 40 GB); series: 10 Gbps, RDMA]
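TeraSort-style benchmarks sort a large keyed dataset across the cluster, and the shuffle is where the interconnect shows up. A schematic PySpark stand-in for that pattern (not the actual TeraSort harness; names and sizes are illustrative):

```python
# Distributed sort sketch: sortByKey triggers the all-to-all shuffle,
# which is the network-bound step where RDMA beats 10 Gbps above.
import random
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sort-sketch").getOrCreate()
sc = spark.sparkContext

n = 10_000_000  # scale toward the 20-40 GB inputs used in the benchmark
records = sc.parallelize(range(n)).map(lambda i: (random.random(), i))
records.sortByKey().count()  # count() forces execution of the shuffle
spark.stop()
```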
What you have seen
• Oracle Cloud is making significant investments in HPC and can support enterprise end-to-end HPC workloads
• Oracle Cloud has HPC solutions for the full range of HPC workloads
• Oracle provides HPC performance as good as or better than on-premises HPC clusters
• Oracle Cloud Infrastructure provides the best price-performance for HPC
Thank you
Taylor Newill
Senior Manager, Product Management, Oracle Cloud