Disclaimer

• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
VIRT1983BU
#VMworld #VIRT1983BU

Making the Complicated Simple: Cycle Harvesting from the Virtual Desktop Infrastructure Estate for Financial Modeling and Simulation

Wayne Longcore, VP IT Client Services, Jackson National Life
Josh Simons, Chief Technologist for HPC, VMware
Agenda
1 Virtualized HPC
2 JNL Environment
3 Cycle Harvesting at JNL
4 Performance
5 Benefits
6 Futures
Virtualized HPC
Science, Research, Engineering, & Financial Applications on vSphere
Our Goal and Approach
• Increase agility and decrease time to discovery for researchers, scientists, and engineers
• Provide IT with the ability to efficiently provision, allocate, manage and ensure compliance of research compute infrastructure across an increasingly broad range of technical and business requirements
• By leveraging VMware’s proven, enterprise-class virtualization and cloud technologies to meet the performance requirements of research computing and HPC workloads, and
• Bringing to bear novel capabilities not available in traditional HPC environments
HPC Workloads

• Scientific or technical workloads
• Often floating-point intensive
• Often storage intensive
• Often parallel

Application domains:
• Mechanical Design/Drafting
• Chemical Engineering
• Economics/Financial
• Weather
• Electronic Design Automation (EDA)
• Geosciences
• Defense
• Computer-Aided Engineering (CAE)
• Bioscience
• Government Lab
• University/Academic

[Diagram: an HPC cluster running both MPI (Message Passing Interface) jobs and throughput jobs]
Virtual Machine Benefits for HPC

[Diagram: an App / OS stack inside a VM running on a hypervisor atop hardware, alongside a traditional stack running directly on hardware]

Virtual Machines offer:
• Heterogeneity
• Multi-tenant data security
• Fault isolation
• Reproducibility
• Fault resiliency
• Dynamic load balancing
• Performance
Virtualized HPC Performance

Representative throughput and MPI examples (performance ratios – higher is better)

[Charts: performance ratios for a Monte Carlo simulation (Runs 1–4) and for science & engineering applications: FLUENT, GROMACS, LAMMPS, LS-DYNA, NAMD, OpenFOAM]
Introducing vSphere Scale-Out for Big Data and HPC Workloads

A new package that provides all the core features required for scale-out workloads at an attractive price point:

• Features: Hypervisor, vMotion, vShield Endpoint, Storage vMotion, Storage APIs, Big Data Extensions, Distributed Switch, I/O Controls & SR-IOV, Host Profiles / Auto Deploy, and more
• Packaging: sold in packs of 8 CPUs at a cost-effective price point
• Licensing: EULA enforced for use with Big Data/HPC workloads only
Cycle Harvesting
• Long history of harvesting spare cycles in HPC and more broadly
• SETI@home
• Performance
– On average, about 980 TFLOPS
– 104,000 active users
– 156,000 active hosts
Cycle Harvesting
• Berkeley Open Infrastructure for Network Computing (BOINC)
– 100 projects
– On average, about 110 petaFLOPS
– 203,000 active users
– 1,109,000 active hosts
• HTCondor – University of Wisconsin
– Well-established HPC distributed resource manager and opportunistic scheduler
– Support for VMware virtual machines
• These approaches require adding software infrastructure and complexity to achieve cycle harvesting
• Is there another way for VMware customers?
Stats: http://boincstats.com
BOINC: https://boinc.berkeley.edu
Jackson National Life Environment
Making the complicated awesomely simple
Architectural Imperatives
• Measure everything based on user experience
• Stay current (Win10 & Office 365 Pro Plus, etc.)
• Focus on user productivity (uptime & responsiveness)
• Shared Nothing Architecture
– Reduce fault domains to fewer than 30 users
– Increase throughput via parallelization
• High core count for peak usage
– Subpar performance is unacceptable even 1% of the day
– Do not believe in expensive Overstuffed Rackmounts with limited cores and overcommitted buses
• Extremely limited memory overcommit, even in a failover situation
• Spread departments and functions across data centers
• “Re-leveling Scripts” based on sizing factors that match our chargeback factors
– Charge based on size, not based on use (like a laptop)
User-Specific Virtual Desktops: Low Cost – High Performance
• Use low cost components with no bottlenecks to memory and disk
• $255/year AVG Per VDI for hardware, including all storage & GPUs (a laptop per year?)
• Differentiated service levels – Treat each Virtual Desktop like a different laptop
Model                 | 2017 Description of Config
Bronze Desktop        | 2 cores, 5 GB RAM, Windows 10, 100 GB HD, 10,000–15,000 IOPS
Silver Desktop        | 2 cores, 8 GB RAM, Windows 10, 100 GB HD, 10,000–15,000 IOPS
Gold Desktop          | 4 cores, 8 GB RAM, Windows 10, 128 GB HD, 10,000–15,000 IOPS
Platinum Desktop      | 6 cores, 10 GB RAM, GPU, Windows 10, 128–256 GB HD, 10,000–15,000 IOPS
Diamond Desktop       | 8 cores, 12 GB RAM, GPU, Windows 10, 100+ GB HD, 40,000–70,000 IOPS
Concierge Desktop I   | 8 cores, 16 GB RAM, GPU, Windows 10, 100+ GB HD, 40,000–200,000 IOPS
Concierge Desktop II  | 14 cores, 32 GB RAM, GPU, Windows 10, 100+ GB HD, 40,000–200,000 IOPS
Concierge Desktop III | 14 cores, 64 GB RAM, GPU, Windows 10, 100+ GB HD, 40,000–200,000 IOPS
Concierge Desktop IV  | 28 cores, 96 GB RAM, GPU, Windows 10, 100+ GB HD, 40,000–526,000 IOPS
Blade Design

[Diagram: hostname 6001a, a blade with 2 CPUs, each with 8 to 14 cores:
• SSD 1: 1.6 TB, single VMFS partition; NFS VMDK exposed only via NFS mount; vSphere Replication to SSD #2 on host 6001c
• SSD 2: 1.6 TB, single VMFS partition; NFS VMDK exposed only via NFS mount; vSphere Replication from SSD #1 on host 6001c]
VDI Design – vCPU Allocation

[Diagram: typical blade: 2 CPUs, 8–14 cores each
• NFS VM: 2 vCPUs
• VDI VMs: various sizes, up to 5:1 CPU oversubscription]
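To make the oversubscription ceiling above concrete, here is a minimal sketch in Python; it is illustrative, not Jackson's tooling, and the 28-core host size is borrowed from the utilization slides later in this deck:

```python
# Illustrative only: the VDI vCPU budget implied by the 5:1
# oversubscription ceiling shown above.
def vdi_vcpu_budget(physical_cores: int, ratio: int = 5) -> int:
    """vCPUs that may be allocated to VDI VMs at the stated ceiling."""
    return physical_cores * ratio

# A 28-core blade at 5:1 allows up to 140 VDI vCPUs, with the
# 2 vCPU NFS VM running alongside.
print(vdi_vcpu_budget(28))  # -> 140
```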
Data Center Design: Active – Active

vSphere 6.0 Update 2, transitioning to vSphere 6.5

[Diagram: Data Center A (13 enclosures of 16 blades each, plus 16 rack mounts) linked to Data Center C (identical to Data Center A) by multiple 10 Gb fiber links]
Environmental Summary

• 444 hosts being cycle harvested
• 10,468 cores
• 71 TB of RAM used by cycle harvesters, out of 155 TB of physical RAM
• 280 TB of SSD used, out of 2,007 TB of physical local SSD
• Total capacity of 71.6 million IOPS (4K, 50% read/write)
• Total throughput of 4.022 Tb/s to disk
• 6,768 VDI desktops
• 888 cycle-harvesting desktop VMs
Cycle Harvesting
Harvesting unused VDI compute cycles for Risk Analysis
Cycle Harvesting Virtual Machine Overlay

[Diagram: typical blade: 2 CPUs, 8–14 cores each
• NFS VM: 2 vCPUs
• VDI VMs: various sizes, up to 5:1 CPU oversubscription
• Cycle Harvesting VMs: one per socket, each with vCPUs equal to the number of cores in a CPU]
Cycle Harvesting VM Configuration

• Virtual CPUs: equal to the number of cores per socket
• Virtual RAM: 7 GB per core
• Virtual Disk: 30 GB of scratch per core, plus 60 GB for boot

Drive C:/ (boot disk, 60 GB): Windows 10, MG Alfa, etc.; NFS mount (eventually will be a VMDK on the local SSD's VMFS)
Drive D:/ (scratch disk, 10–30 GB): VMDK that sits directly on the local SSD's VMFS; MSFT DFS for job data
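The sizing rules above are simple enough to express directly. A minimal sketch, assuming only the per-core ratios from this slide (the function name and dict layout are illustrative, not Jackson's tooling):

```python
# Illustrative sizing helper based on the ratios on this slide:
# vCPUs = cores per socket, 7 GB RAM per core, 30 GB scratch per core,
# and a 60 GB boot disk (Windows 10, MG Alfa, etc.).
def harvester_vm_size(cores_per_socket: int) -> dict:
    return {
        "vcpus": cores_per_socket,            # one vCPU per physical core
        "ram_gb": 7 * cores_per_socket,       # 7 GB RAM per core
        "boot_gb": 60,                        # Drive C:/ boot disk
        "scratch_gb": 30 * cores_per_socket,  # Drive D:/ scratch on local SSD VMFS
    }

# A 14-core socket yields a 14 vCPU / 98 GB RAM harvester VM with
# 420 GB of scratch; a two-socket blade runs two such VMs.
print(harvester_vm_size(14))
```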
Cycle Harvesting Virtual Disk Overlay

[Diagram: hostname 6001a, a blade with 2 CPUs, each with 8 to 14 cores:
• SSD 1: 1.6 TB, single VMFS partition; NFS VMDK exposed only via NFS mount (vSphere Replication to SSD #2 on host 6001c); cycle harvesting VM 1 VMDK, not replicated
• SSD 2: 1.6 TB, single VMFS partition; NFS VMDK exposed only via NFS mount (vSphere Replication from SSD #1 on host 6001c); cycle harvesting VM 2 VMDK, not replicated]
Cycle Harvesting Job Submission

[Diagram: a Windows HPC Server dispatches work to HPC job slots inside the cycle harvesting VMs
• Cycle Harvesting VMs: CPU shares set to 100 per vCPU
• VDI VMs: CPU shares set to 1000 per vCPU
• NFS VM: 2 vCPUs]
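The shares mechanism is what keeps harvesting transparent to desktop users: with custom CPU shares of 100 per vCPU on harvester VMs versus 1000 per vCPU on VDI VMs, the ESXi scheduler gives desktops a 10:1 priority whenever they contend for CPU. A minimal pyVmomi sketch of how such shares could be set follows; the vCenter address, credentials, and the "harvest-" naming convention are assumptions, not Jackson's actual tooling:

```python
# Sketch: set custom CPU shares of 100 per vCPU on cycle-harvesting VMs.
# VM discovery by name prefix is a stand-in for whatever inventory
# grouping is actually used.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab convenience; use real certs in production
si = SmartConnect(host="vcenter.example.com", user="admin",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    if not vm.name.startswith("harvest-"):  # hypothetical naming convention
        continue
    vcpus = vm.config.hardware.numCPU
    spec = vim.vm.ConfigSpec()
    spec.cpuAllocation = vim.ResourceAllocationInfo(
        shares=vim.SharesInfo(level=vim.SharesInfo.Level.custom,
                              shares=100 * vcpus))  # VDI VMs keep 1000/vCPU
    vm.ReconfigVM_Task(spec=spec)

view.Destroy()
Disconnect(si)
```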
Performance Analysis
Representative performance details from a single host
VDI VM Activity

[Chart: VDI VM CPU %USED over time]
VDI and Harvester CPU Utilization
All VMs on a 28-core Host

[Chart: host CPU utilization. Peak utilization = 100% × (28 cores) × (1.25 HT factor) = 3500]
Harvesters and VDI VMs
Harvesters and VDI VMs – Detail
A Further Note on Licensing
• vSphere Desktop, the vSphere edition that underpins VMware Horizon, is restricted by EULA to running desktop VMs
• Jackson harvester VMs satisfy this requirement by
– Running a desktop OS (Windows 10)
– Running the Horizon View Agent
– Being tied to users within Jackson’s Named User limit
Benefits
Primary Benefits
• Cost savings due to avoided cloud costs
• No new skills required to administer solution
• Cycle harvesting is transparent to both VDI and HPC users
• Increased virtual infrastructure reliability
• Better VDI experience for desktop users
• Enables end-users (actuaries) to think beyond just batch scheduling
Futures
Future Directions
• Harvesting GPU compute cycles
– Testing OpenCL-based approaches (see the sketch after this list)
• 1M stream processors online by Q2 2018
• 60K Xeon core-equivalent capacity
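As a hedged illustration of what OpenCL-based testing might start from, the sketch below enumerates GPU devices and tallies their compute units. It uses the pyopencl package; it is not Jackson's actual code, and compute units are only a rough capacity proxy:

```python
# Illustrative only: discover OpenCL-visible GPUs and sum their compute
# units as a rough proxy for harvestable GPU capacity.
import pyopencl as cl

total_cus = 0
for platform in cl.get_platforms():
    for dev in platform.get_devices(device_type=cl.device_type.GPU):
        print(f"{dev.name}: {dev.max_compute_units} compute units, "
              f"{dev.global_mem_size // 2**20} MiB global memory")
        total_cus += dev.max_compute_units

print(f"Total GPU compute units visible: {total_cus}")
```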
Extreme Performance Series – Las Vegas
• SER2724BU Performance Best Practices
• SER2723BU Benchmarking 101
• SER2343BU vSphere Compute & Memory Schedulers
• SER1504BU vCenter Performance Deep Dive
• SER2734BU Byte Addressable Non-Volatile Memory in vSphere
• SER2849BU Predictive DRS – Performance & Best Practices
• SER1494BU Encrypted vMotion Architecture, Performance, & Futures
• STO1515BU vSAN Performance Troubleshooting
• VIRT1445BU Fast Virtualized Hadoop and Spark on All-Flash Disks
• VIRT1397BU Optimize & Increase Performance Using VMware NSX
• VIRT2550BU Reducing Latency in Enterprise Applications with VMware NSX
• VIRT1052BU Monster VM Database Performance
• VIRT1983BU Cycle Harvesting from the VDI Estate for Financial Modeling
• VIRT1997BU Machine Learning and Deep Learning on VMware vSphere
• FUT2020BU Wringing Max Perf from vSphere for Extremely Demanding Workloads
• FUT2761BU Sharing High Performance Interconnects across Multiple VMs
Extreme Performance Series – Hands-on Labs
Don’t miss these popular Extreme Performance labs:
• HOL-1804-01-SDC: vSphere 6.5 Performance Diagnostics & Benchmarking
– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools.
• HOL-1804-02-CHG: vSphere Challenge Lab
– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems.
Performance Survey
The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools and interfaces, and where we can improve.

Scan this QR code to access a short survey and provide us direct feedback.
Alternatively: www.vmware.com/go/perf
Thank you!