© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Adam Boeglin, HPC Solutions Architect
Monday, October 31, 2016
Launch a thousand core HPC cluster in minutes with AWS CfnCluster
Webinar Highlights
• What is CfnCluster and when to use it
• Architecture guidance to fit your security models
• How to install and configure CfnCluster
• Demo: Review of CfnCluster and managing compute at scale
Introduction to CfnCluster
• AWS CloudFormation + Cluster = CfnCluster
• Simple to install, easy to manage
• Everything you need to get a cluster up and running in minutes
  • Head node with scheduler
  • Shared NFS Storage
    • /home
    • /shared
  • OpenMPI
  • Compute nodes that grow and shrink on demand
Workloads Well Suited for CfnCluster
• Computational Fluid Dynamics
• Semiconductor Design
• Weather Modeling
• Genomics and Molecular Simulation
• Seismic and reservoir simulations
• 3D rendering and visualizations
• … anything that uses a traditional HPC scheduler
Cluster HPC and Grid HPC
Cluster HPC
• Tightly coupled, latency-sensitive applications
• Use larger EC2 compute instances, placement groups, Enhanced Networking
Grid HPC
• Loosely coupled, pleasingly parallel
• Requires very little node-to-node interaction
Grids of Clusters
• Use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs
Computational Fluid Dynamics: ANSYS Fluent
• AWS c4.8xlarge
• 140M cells
• F1 car CFD benchmark
http://www.ansys-blog.com/simulation-on-the-cloud/
https://aws.amazon.com/hpc/cfncluster/
Configuration Options
• Operating System
  • Amazon Linux
  • CentOS 6
  • CentOS 7
  • Ubuntu 14.04
• Scheduler
  • Sun Grid Engine (SGE)
  • OpenLava
  • Torque
  • SLURM
• Storage Size & IOPS
• EBS & Instance Store Encryption
• Scaling Speed & Limits
• Provisioning Scripts
Many AWS services to tie it all together
• CloudFormation manages the state of the cluster
• Amazon CloudWatch & Auto Scaling let the compute fleet grow and shrink on demand
• Amazon SQS & Amazon SNS allow compute nodes to signal to the master when they’re online
• AWS Identity and Access Management (IAM) allows for fine-grained access control
• Amazon S3 for storage of CloudFormation templates
Standalone CfnCluster
[Diagram labels: Internet Gateway (IGW); region-1a with Master Server and Auto Scaling Compute Fleet; CloudFormation; Amazon S3; DynamoDB; Amazon SQS; CloudWatch]
Isolated CfnCluster
[Diagram labels: Internet Gateway (IGW); Public Subnet with VPC NAT gateway; Private Subnet with Master Server and Auto Scaling Compute Fleet; CloudFormation; Amazon S3; DynamoDB; Amazon SQS; CloudWatch]
Private Subnet Route Table
• VPC Traffic -> Local
• 0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
• VPC Traffic -> Local
• 0.0.0.0/0 -> Internet Gateway
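The route tables above can be read as a lookup: for a given destination address, the most specific matching route decides where the traffic goes. The following is a minimal sketch of that lookup, not AWS code. The VPC CIDR `10.0.0.0/16`, the target names, and the helper `next_hop` are assumptions made for illustration only.

```python
import ipaddress

# Hypothetical VPC CIDR, chosen only for illustration.
VPC_CIDR = "10.0.0.0/16"

# Each table maps a destination CIDR to a target, mirroring the slide.
PRIVATE_SUBNET_ROUTES = {
    VPC_CIDR: "local",
    "0.0.0.0/0": "nat-gateway",
}
PUBLIC_SUBNET_ROUTES = {
    VPC_CIDR: "local",
    "0.0.0.0/0": "internet-gateway",
}


def next_hop(routes, destination):
    """Return the target of the most specific route that matches destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if address in ipaddress.ip_network(cidr)
    ]
    # The longest prefix (the most specific route) wins.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


if __name__ == "__main__":
    tables = (("private", PRIVATE_SUBNET_ROUTES), ("public", PUBLIC_SUBNET_ROUTES))
    for label, routes in tables:
        for destination in ("10.0.0.44", "198.51.100.7"):
            print(label, destination, "->", next_hop(routes, destination))
```

In this sketch, traffic between cluster nodes stays local in both subnets, while outbound traffic from the private subnet leaves through the NAT gateway rather than directly through the Internet Gateway.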
Isolated CfnCluster w/ VPN
[Diagram labels: Bastion Server; Internet Gateway (IGW); Public Subnet with VPC NAT gateway; Private Subnet with Master Server and Auto Scaling Compute Fleet; Corporate Data Center; Engineer; VPN Connection; CloudFormation; Amazon S3; DynamoDB; Amazon SQS; CloudWatch]
Private Subnet Route Table
• VPC Traffic -> Local
• Corp IP Range -> VPN
• 0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
• VPC Traffic -> Local
• Corp IP Range -> VPN
• 0.0.0.0/0 -> Internet Gateway
Private CfnCluster w/ VPN & Proxy
[Diagram labels: Private Subnet with Master Server and Auto Scaling Compute Fleet; Corporate Data Center with Proxy Server; VPN Connection; Internet Connection; CloudFormation; Amazon S3; DynamoDB; Amazon SQS; CloudWatch]
Private Subnet Route Table
• VPC Traffic -> Local
• Corp IP Range -> VPN
• 0.0.0.0/0 -> VPN
Creating an IAM User
• Create an IAM user with Administrative privileges
  • Fine-grained access controls can be set up later
• Generate an Access & Secret key and keep it safe
Create an SSH Key
• Generate or import the key you’ll use for user login
Installing the CfnCluster CLI
• On your desktop or a bastion server
$ sudo pip install cfncluster
Creating the Base Configuration
• First, create the base config required to start a cluster.
$ cfncluster configure
Edit the configuration file to meet your needs
• Reference the configuration docs
  • http://cfncluster.readthedocs.io/en/latest/configuration.html
$ vim ~/.cfncluster/config
Launch the Cluster
$ cfncluster create mycluster
• Cluster creation usually takes ~15 minutes
• Completely managed by CloudFormation
Submit your first job
[ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -pe mpi 2
#$ -S /bin/bash
#
module load openmpi-x86_64
mpirun -np 2 hostname
[ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
Your job 1 ("hw.qsub") has been submitted
[ec2-user@ip-10-0-0-17 ~]$ qstat
job-ID  prior   name     user      state  submit/start at      queue             slots  ja-task-ID
---------------------------------------------------------------------------------------------------
     1  0.55500 hw.qsub  ec2-user  r      02/01/2015 05:57:25  [email protected]       2
[ec2-user@ip-10-0-0-17 ~]$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 110 Feb  1 05:57 hw.qsub
-rw-r--r-- 1 ec2-user ec2-user  26 Feb  1 05:57 hw.qsub.o1
[ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
ip-10-0-0-44
ip-10-0-0-45
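The job script in this session ties two numbers together: the slot count requested with `-pe mpi` and the process count passed to `mpirun -np`. A minimal sketch of generating such a script for a different slot count follows. The helper `render_sge_job` and its parameters are hypothetical names introduced here, and the script lines are copied from the hw.qsub example on the slide.

```python
def render_sge_job(slots, command="hostname", parallel_env="mpi"):
    """Build an SGE job script shaped like hw.qsub for the given slot count."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    lines = [
        "#!/bin/bash",
        "#",
        "#$ -cwd",                         # run in the submission directory
        "#$ -j y",                         # merge stderr into stdout
        f"#$ -pe {parallel_env} {slots}",  # request slots in the parallel environment
        "#$ -S /bin/bash",                 # interpret the script with bash
        "#",
        "module load openmpi-x86_64",
        f"mpirun -np {slots} {command}",   # keep -np equal to the requested slots
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a 4-slot variant that can then be submitted with qsub.
    with open("hw4.qsub", "w") as handle:
        handle.write(render_sge_job(4))
```

Generating the script from one `slots` value avoids the two numbers drifting apart when the job is scaled up.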
EBS Snapshots for Software & Storage Management
• Install your applications and store any working data to /shared
• Create a snapshot of that volume
• Re-use that snapshot every time you launch your cluster
ebs_snapshot_id = snap-xxxxx
[Diagram labels: Master Server; Root & Home Volume (/ & /home); NFS Shared Volume (/shared); Amazon EBS Snapshot (snap-xxxxx)]
Upgrading Hardware is Easy!
• Simple upgrade from Ivy Bridge to Haswell
1. Let all compute nodes stop
2. Edit ~/.cfncluster/config and change
compute_instance_type = c3.8xlarge
to
compute_instance_type = c4.8xlarge
3. Update the cluster
$ cfncluster update mycluster
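Step 2 above is a one-line edit, and it can also be scripted. The sketch below changes `compute_instance_type` in an INI-style config with Python's standard `configparser`. The section name `[cluster default]`, the sample text, and the helper `set_compute_instance_type` are assumptions for illustration; check the section names in your own ~/.cfncluster/config before adapting it.

```python
import configparser
import io

# Hypothetical config excerpt; the section name is an assumption.
SAMPLE_CONFIG = """\
[cluster default]
compute_instance_type = c3.8xlarge
"""


def set_compute_instance_type(config_text, new_type, section="cluster default"):
    """Return config_text with compute_instance_type set to new_type."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    parser.set(section, "compute_instance_type", new_type)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(set_compute_instance_type(SAMPLE_CONFIG, "c4.8xlarge"))
```

Note that `configparser` drops comments when it rewrites a file, so for a hand-annotated config the manual edit in vim may be the better choice.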
Demo: Launching a Cluster
Thank you!