HPC in the Cloud BOF
Sara Jeanes, Boyd Wilson, Amy Cannon
Internet2 & Omnibond (CloudyCluster)
© 2016 Internet2
Things to Consider (Discussion Topics)
• People & Disciplines
• Workloads & Technology
• Funding & Integration
• Cloud vs. Datacenter Costs
• Break
• Optional Demo: Hands-On
HPC in the Cloud
HPC Support
• People + People
  – Researchers: HPC or parallel computation opens doors, but a majority of researchers can't do it alone
  – CI Practitioners
• Disciplines
  – Sciences, Engineering, Arts, Humanities, Social Sciences
  – Machine Learning will extend this; more will come
• Additional pressure on resources (funding, support, and infrastructure)
Workloads
• Pleasingly Parallel (P2) / High Throughput Computing
• Message Passing Interface (MPI), sketched below
  – Light communication
  – Heavy communication
• Data-Intensive Computing / Big Data (Hadoop ecosystem +)
• Graphics Processing Unit (GPU)
• Field-Programmable Gate Array (FPGA)
• Interactive Computation (Jupyter)
• Machine Learning
• Real Time
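A minimal sketch of the light-communication MPI case, assuming Python with mpi4py is available on the cluster: each rank does pleasingly parallel Monte Carlo work, and communication is a single reduce at the end.

    # Monte Carlo estimate of pi: embarrassingly parallel sampling per rank,
    # with a single reduce at the end (light-communication MPI).
    # Run with: mpiexec -n 4 python pi_mpi.py
    import random
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    samples = 1_000_000          # per-rank work, no communication while sampling
    random.seed(rank)            # a different stream per rank
    hits = sum(random.random() ** 2 + random.random() ** 2 <= 1.0
               for _ in range(samples))

    total = comm.reduce(hits, op=MPI.SUM, root=0)   # the only message exchange
    if rank == 0:
        print("pi ~", 4.0 * total / (samples * size))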
Technology
Where to start with HPC in the Cloud?
Here: combine all the services yourself?
Technology
Self-Service Elastic HPC
[Architecture diagram: Login, Scheduler (Torque/Slurm) with CCQ, Auto-Scaling Compute for HPC jobs, OrangeFS HPC Parallel Storage, EFS, S3, DynamoDB, WebDAV/Globus.]
Create a fully operational HPC cluster in minutes, complete with:
• Storage: OrangeFS on EBS, S3, EFS
• Compute: job-driven elastic compute through CCQ
• Scheduler: Torque/Maui & Slurm with the CCQ meta-scheduler
• HPC Libraries: Boost, CUDA Toolkit, Docker, FFTW, FLTK, GCC, Gengetopt, GRIB2, GSL, Hadoop, HDF5, ImageMagick, JasPer, NetCDF, NumPy, Octave, OpenCV, OpenMPI, PROJ, R, Rmpi, SciPy, SWIG, WGRIB, UDUNITS, .NET Core, Singularity, Queue, Picard, and xrootd
• HPC Software: AmberTools, ANN, ATLAS, BLAS, BLAST, Blender, Burrows-Wheeler Aligner, CESM, GROMACS, LAMMPS, NCAR, NCL, NCO, NWChem, OpenFOAM, PAPI, ParaView, Quantum ESPRESSO, SAMtools, WRF, Galaxy, VTK, SU2, Dakota, GATK, and Jupyter Notebook
• You can also install your own software in a custom AMI or in EFS
• All from an easy-to-use web UI on mobile, tablet, or desktop
• iRODS and XDMoD are targeted for a future release
• On average, 5% of the instance charges; no upfront costs
Technology
Federated Web Authentication
• Shibboleth
• OAuth
Collaborate
• Create collaborations
• Invite other collaborators to CloudyCluster
• Initially can share Google Drive folders
Technology
CCQ - Elastic HPC Dispatching
[Diagram: a Login Instance and the Scheduler (Torque/Slurm, backed by DynamoDB) sit in a public subnet; CCQ dispatches to auto-scaled Compute Groups.]
1. Submit the job through CCQ.
2. CCQ holds the job, then determines and launches the instances needed.
3. CCQ sends the job to the scheduler when ready.
4. The scheduler launches the job normally.
5. If no jobs are in the queue for that instance type near the billing hour, the instances are terminated.
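A hypothetical sketch of the dispatch loop those five steps describe; launch_instances, queue_is_empty, and the other helper names are invented for illustration and are not CCQ's actual API.

    # Sketch of CCQ-style dispatching (invented helper names, not CCQ's API).
    BILLING_WINDOW_SECS = 300  # assumed threshold "near the billing hour"

    def dispatch(job, scheduler, cloud):
        # Steps 1-2: CCQ holds the job and launches the instances it needs,
        # rather than sending it straight to Torque/Slurm.
        instances = cloud.launch_instances(job.instance_type, count=job.node_count)
        cloud.wait_until_running(instances)
        # Steps 3-4: once the nodes are up, hand the job to the scheduler,
        # which launches it normally.
        scheduler.submit(job, nodes=instances)

    def reap_idle(scheduler, cloud):
        # Step 5: near the billing hour, terminate compute groups whose
        # instance type has no jobs waiting in the queue.
        for group in cloud.compute_groups():
            if (scheduler.queue_is_empty(group.instance_type)
                    and group.seconds_to_billing_hour() < BILLING_WINDOW_SECS):
                cloud.terminate(group.instances)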
Technology
Remote Visualization
• A flip of the switch enables secure VNC
Technology
Serverless
• Launch code based on events (see the Lambda sketch below)
Machine Learning as a Service
• Natural language understanding (NLU)
• Text-to-speech (TTS)
• Amazon: Rekognition, Polly, Lex
• IBM: Watson
• Google: TensorFlow
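A minimal sketch combining the two ideas above, assuming Python with boto3: an AWS Lambda handler fired by an S3 upload event labels the new image with Rekognition. It illustrates the event-driven + MLaaS pattern, not anything CloudyCluster ships.

    # Lambda handler triggered by an S3 upload: "launch code based on events",
    # then call an ML service (Rekognition) on the uploaded object.
    import boto3

    rekognition = boto3.client("rekognition")

    def handler(event, context):
        # The S3 trigger delivers the bucket name and object key in the event.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=10,
        )
        labels = [label["Name"] for label in response["Labels"]]
        print(f"{key}: {labels}")
        return labels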
Funding for Cloud HPC
• NIH Cloud Credits pilot: up to $6M to be released for current NIH investigators, who get the credits directly from the provider. CloudyCluster is a conformant platform.
• NSF Big Data Sciences and Engineering program: $29M, plus $9M in public cloud credits (from AWS, Azure, and Google) to be given directly to researchers.
Integration
CCQHub Project: HPC Job Routing
Project goals (sketched below):
• Route HPC jobs on premise or to the cloud
• Stage data prior to launching the job
• Return job results when complete
• Scale cloud resources with CloudyCluster
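A hypothetical sketch of the routing decision those goals imply; every name here (free_slots, stage_data, fetch_results) is invented for illustration, not CCQHub's API.

    # Sketch of CCQHub-style routing (invented names, not CCQHub's API).
    def fetch_results(handle):
        ...  # copy output files back to the submitter when the job completes

    def route(job, on_prem, cloudy_cluster):
        # Prefer the on-premise cluster when it has capacity; otherwise
        # burst to the cloud and let CloudyCluster scale resources.
        target = on_prem if on_prem.free_slots() >= job.node_count else cloudy_cluster
        target.stage_data(job.input_files)          # stage data prior to launch
        handle = target.submit(job)
        target.on_complete(handle, fetch_results)   # return results when complete
        return handle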
New Features in V1.3
• Shared home directories in EFS
• Configurable EBS volumes per instance for OrangeFS
• Encrypted EBS volume options
• Enforce S3 object encryption
• MFA support
• Support for CCQHub
• New libraries, including machine learning codes: mlpack, .NET Core, NuPIC, Octave, OpenCV, Picard, Queue, scikit-learn, TensorFlow, and Theano
[Architecture diagram, updated for V1.3: Login, Torque/Slurm scheduler, CCQ, Auto-Scaling Compute, OrangeFS HPC Parallel Storage with configurable EBS volumes per instance, shared home directories in EFS, S3 with enforced object encryption, DynamoDB, WebDAV/Globus, multi-factor authentication, encrypted EBS volume option.]
Cost
COTS Servers + Power + Power Equipment + Free DC Building
• $14,375 (5-year, 36 cores across 2 CPUs, 64 GB RAM, rack, cabling, PDU) = $0.0091 per core hour (no UPS/generator, no electricity, no cooling, no sysadmin, no netadmin, no building; 5-year warranty); 1/2 kW power per server
• UPS + generator + transfer switch (power equipment) = $0.13 per kWh (100% utilized)
• Electricity charges: $0.08 per kWh for the server, plus half again for AC units ($0.04 per kWh) = $0.12 per kWh
• Total UPS/generator/electricity/cooling = $0.25 per kWh
• Cost per 1/2 kW server (at 100% capacity of the DC) = $0.125/hr / 36 cores = $0.0035 per core hour for power/cooling (free building)
• Network: $2,500 per 10G port = $0.057/hr / 36 cores = $0.0016 per core hour
• Total = $0.014 with power/cooling and servers 100% utilized (no system admin, no network admin)
• Total = $0.021 if power/cooling is 50% utilized and servers are 100% utilized (free building, no system or network admin)
• Total = $0.023 if power/cooling is 50% utilized and servers are 85% utilized (benchmarks, upgrades, offline nodes; free building, no system or network admin)
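The headline per-core-hour figures can be checked directly; a small Python sketch reproducing the slide's 100%-utilization arithmetic:

    # Reproduce the datacenter figures (100% utilization, free building).
    HOURS_5YR = 5 * 365 * 24   # 43,800 hours over the 5-year life
    CORES = 36

    hardware = 14_375 / (HOURS_5YR * CORES)   # -> ~$0.0091 per core hour
    power_cooling = 0.25 * 0.5 / CORES        # $0.25/kWh * 0.5 kW -> ~$0.0035
    network = 2_500 / HOURS_5YR / CORES       # $2,500 per 10G port -> ~$0.0016

    print(f"hardware {hardware:.4f}  power/cooling {power_cooling:.4f}  "
          f"network {network:.4f}  total {hardware + power_cooling + network:.3f}")
    # -> total ~0.014 per core hour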
Cost
• Cloud (AWS): c4.8xlarge, 36 cores, 60 GB RAM
  – On-demand: $1.591 per instance hour, $0.0442 per core hour
    • Can get the latest CPU/GPU as soon as it's available
  – Reserved 3-year: $0.852 per instance hour, $0.0237 per core hour (buying like hardware)
  – Current spot in Oregon: $0.60 per instance hour, $0.0167 per core hour
  – Current spot in Ohio: $0.39 per instance hour, $0.0108 per core hour
    • Spot can be interrupted, so checkpoint, or run small jobs with restart capability
  – Does not include any quantity discounts, etc. (Netflix doesn't pay retail)
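The same arithmetic for the AWS prices quoted above:

    # Per-core-hour rates for a 36-core c4.8xlarge at the quoted prices.
    CORES = 36
    prices = {
        "on-demand": 1.591,
        "reserved-3yr": 0.852,
        "spot (Oregon)": 0.60,
        "spot (Ohio)": 0.39,
    }
    for name, hourly in prices.items():
        print(f"{name}: ${hourly / CORES:.4f} per core hour")
    # Spot in Ohio (~$0.0108) already undercuts the ~$0.014 datacenter
    # estimate above, and that estimate excludes admin labor entirely.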
HPC center pricing (taken off the web)
• Includes some sort of labor
Electron Costs per State (retail)
Thank you…
Optional Demo / Hands-On (with free AWS credit)