(CMP306) Dynamic, On-Demand Windows HPC Clusters On AWS

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Timothy DiLauro, AWS Solutions Architect

Julien Lépine, AWS Solutions Architect

October 2015

CMP306

On-Demand Windows HPC on AWS

Windows Clusters for Dynamic Needs

What to Expect from the Session

• HPC on AWS
• AWS Architecture for HPC
• AWS Architecture for Windows HPC
• Best Practices for Windows HPC
• Demonstration

HPC on AWS

Why AWS for HPC?

• Low cost with flexible pricing
• Efficient clusters
• Unlimited infrastructure
• Faster time to results
• Concurrent clusters on demand
• Increased collaboration

Popular HPC workloads on AWS

• Genome processing
• Modeling and simulation
• Government and educational research
• Monte Carlo simulations
• Transcoding and encoding
• Computational chemistry

Benefits of Agility

[Figure: capacity versus demand. Rigid on-premises resources are sized to predicted demand, so capacity above actual demand is waste and demand above capacity causes customer dissatisfaction. Elastic cloud-based resources are scaled to actual demand.]

Cost Benefits of HPC in the Cloud

Cloud: pay-as-you-go model
• Use only what you need
• Multiple pricing models

On-premises: capital expense model
• High upfront capital cost
• High cost of ongoing support

AWS Journey for HPC Customers

• Dev, test, eval: development and test; evaluation and training
• True production: build new production apps; migrate production apps
• Mission critical: build mission-critical apps; migrate mission-critical apps
• All-in: corporate standard; “Cloud First”

AWS Architecture for HPC

On-Demand HPC on AWS

With AWS, you can deploy multiple clusters that run at the same time and match each cluster's architecture to its jobs.

AWS Architecture for HPC

• Amazon Virtual Private Cloud (Amazon VPC)
• Amazon Simple Storage Service (Amazon S3)
• Amazon Elastic Block Store (Amazon EBS)
• Amazon Elastic Compute Cloud (Amazon EC2)
• Amazon CloudWatch
• AWS CloudFormation
• Auto Scaling

Amazon Elastic Compute Cloud: continuing to enable customer choice and right sizing of clusters

[Figure: Amazon EC2 instance types available each year from 2006 (m1.small only) through 2015, marking new versus existing types. The 2015 additions include the m4, c4, and d2 families, t2.large, and g2.8xlarge.]

Auto Scaling and Amazon CloudWatch

Match the compute capacity of the cluster to the demands of its job queue.

[Figure: Windows HPC Job Manager queue metrics feeding Amazon CloudWatch, which drives an Auto Scaling group of compute nodes]
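The scaling loop on this slide can be sketched as a function that turns job-queue depth into a desired compute-node count. This is a minimal sketch, assuming the head node publishes queued and running core counts as custom CloudWatch metrics and that the Auto Scaling group's desired capacity is set from the result; the function name, its parameters, and the 18-cores-per-node figure in the test are hypothetical, not part of Microsoft HPC Pack or the AWS APIs.

```python
import math


def desired_compute_nodes(queued_cores: int, running_cores: int,
                          cores_per_node: int, min_nodes: int,
                          max_nodes: int) -> int:
    """Hypothetical desired capacity for the compute-node Auto Scaling group.

    queued_cores:   cores requested by jobs waiting in the HPC job queue
    running_cores:  cores requested by jobs already running
    cores_per_node: cores each compute node contributes to the cluster
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    demand = queued_cores + running_cores
    # Round up so a partial node's worth of work still gets a node.
    needed = math.ceil(demand / cores_per_node)
    # Clamp to the group's configured minimum and maximum size.
    return max(min_nodes, min(max_nodes, needed))
```

Scaling in needs more care than this sketch shows: an Auto Scaling group may terminate a node that is still running tasks, so the target computed here should be paired with a way to drain nodes before they are removed.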

Amazon Elastic Block Store

Network-attached, persistent block storage volumes for Amazon EC2

• Designed for five nines (99.999%) of availability
• Attaches to Amazon EC2 instances in the same Availability Zone
• Point-in-time snapshots to Amazon S3
• Encryption enabled with a checkbox

Volume types: Magnetic, General Purpose (SSD), Provisioned IOPS (SSD)

When performance matters, use SSD-backed volumes!

Amazon EBS: Windows Boot Volume

Decrease instance launch time by using General Purpose (SSD) boot volumes.

• Default 30 GB volume
• Initial I/O credit balance of 5.4 million
• Bursts for up to 30 minutes at 3,000 IOPS
• Earns 90 I/O credits per second (a baseline of 3 IOPS per GB)
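The figures for the boot volume follow from the General Purpose (SSD) credit model: a volume earns credits at its baseline of 3 IOPS per GB and spends them at the burst rate. The sketch below reproduces that arithmetic for the default 30 GB volume. It models only the numbers quoted on the slide (5.4 million credits, 3,000 IOPS burst) and assumes the balance starts full; it ignores throughput limits and any other volume constraints.

```python
def gp2_baseline_iops(size_gb: int, iops_per_gb: int = 3) -> int:
    """Baseline IOPS a General Purpose (SSD) volume earns: 3 IOPS per GB."""
    return size_gb * iops_per_gb


def burst_seconds(size_gb: int, credits: float = 5_400_000,
                  burst_iops: int = 3000) -> float:
    """Seconds a full credit balance can sustain the burst rate.

    While bursting, the volume spends burst_iops credits per second but keeps
    earning its baseline, so the balance drains at (burst - baseline) per second.
    """
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= burst_iops:
        return float("inf")  # the baseline alone already meets the burst rate
    return credits / (burst_iops - baseline)


def refill_seconds(size_gb: int, credits: float = 5_400_000) -> float:
    """Seconds of idle time needed to earn a full credit balance from empty."""
    return credits / gp2_baseline_iops(size_gb)


print(f"burst: {burst_seconds(30) / 60:.1f} min, "
      f"refill: {refill_seconds(30) / 3600:.1f} h")
```

For the 30 GB boot volume this works out to 5,400,000 / (3,000 - 90), roughly 1,856 seconds or just over 30 minutes of burst, which is why a fresh instance boots quickly; refilling an empty balance at 90 credits per second takes about 16.7 hours of idle time.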

Amazon Simple Storage Service

Store input and result datasets for dynamic and transient Windows HPC clusters

Redundancy
• Durability: designed for 99.999999999%
• Availability: designed for 99.99% (99.9% SLA)

Capacity
• Consumption-based storage model
• Virtually unlimited capacity

Security
• Encryption in transit: HTTPS/TLS
• Encryption at rest: SSE, SSE-C, SSE-KMS

Ease of use
• Storage classes: Standard, RRS, Glacier
• Lifecycle policies: archive, expiration
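Eleven nines of durability is easier to reason about as an expected loss rate. The sketch below assumes the design figure applies independently to each object per year, which is the usual way the target is explained; it illustrates scale only and is not a guarantee or an SLA.

```python
def expected_annual_object_loss(num_objects: int, nines: int = 11) -> float:
    """Expected number of objects lost per year at the given durability."""
    # 99.999999999% (11 nines) means a 10**-11 annual loss probability per
    # object; deriving it from the nine count avoids the floating-point
    # error of computing 1 - 0.99999999999.
    loss_probability = 10.0 ** -nines
    return num_objects * loss_probability


def years_per_lost_object(num_objects: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_object_loss(num_objects, nines)


print(round(years_per_lost_object(10_000_000)))
```

For 10 million stored objects this gives, on average, one lost object every 10,000 years.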

Amazon S3

Migrate data to AWS and Windows HPC clusters with AWS Tools for PowerShell

• Bucket: mybucket
• Key prefix: SampleScripts
• Local folder: .\Scripts

Copy data to Amazon S3 and enable server-side encryption (SSE):

Write-S3Object -BucketName mybucket -Folder .\Scripts -KeyPrefix SampleScripts\ -ServerSideEncryption AES256

Copy data from Amazon S3 to a local folder:

Read-S3Object -BucketName mybucket -KeyPrefix SampleScripts -Folder .\

AWS CloudFormation

Consistently and easily deploy Windows HPC clusters based on workflow needs

• Create templates that describe the AWS resources used to run your application
• Provision identical copies of a stack
• Store templates in a source control system
• Track all changes made to your infrastructure stack
• Modify and update resources in a controlled and predictable way
• Choose just the resources and configurations you need
• Customize your template with parameters

Templated resource provisioning. Infrastructure as code. Declarative and flexible.
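To make “infrastructure as code” and template parameters concrete, the sketch below emits a minimal CloudFormation template as JSON from Python. It is not the reference template linked at the end of this deck: the parameter and resource names (ComputeInstanceType, ComputeImageId, ComputeNode) and the allowed instance types are hypothetical, and a real cluster template would also define the VPC, security groups, and the compute-node Auto Scaling group.

```python
import json


def compute_node_template(default_type: str = "c4.8xlarge") -> dict:
    """Build a minimal, hypothetical CloudFormation template for one compute node."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: parameterized Windows HPC compute node",
        "Parameters": {
            # Chosen by the caller when the stack is created.
            "ComputeInstanceType": {
                "Type": "String",
                "Default": default_type,
                "AllowedValues": ["c4.4xlarge", "c4.8xlarge", "r3.8xlarge"],
            },
            # AMI ID of the prepared Windows Server image.
            "ComputeImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "ComputeNode": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    # Ref resolves to the parameter value at stack creation.
                    "InstanceType": {"Ref": "ComputeInstanceType"},
                    "ImageId": {"Ref": "ComputeImageId"},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(compute_node_template(), indent=2))
```

Keeping the generator (or the JSON it prints) in source control gives the change tracking described above, and creating two stacks from the same template with the same parameters yields identical copies.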

AWS Architecture for HPC

Core infrastructure
• Users directory
• Bastion host

Cluster infrastructure
• Head node
• Compute nodes

[Figure: an Amazon VPC containing the core infrastructure (users directory, bastion host) and a cluster (head node and compute nodes)]

AWS Architecture for HPC

Choose the right deployment architecture for the use case.

| Component | Hybrid or “burst” | All-in AWS |
|---|---|---|
| Core infrastructure: users directory | On-premises | AWS Directory Service |
| Core infrastructure: bastion host | AWS | Amazon EC2 |
| Cluster infrastructure: head node | AWS | Amazon EC2 |
| Cluster infrastructure: compute nodes | AWS | Amazon EC2 |
| Cluster infrastructure: storage | On-premises/AWS | Amazon S3 |
| User workstations | On-premises | Amazon WorkSpaces |

AWS Architecture for HPC

“Burst” to virtually unlimited compute capacity in AWS

[Figure: an on-premises environment (HPC users, workstations, core infrastructure, HPC head node) bursting into an Amazon VPC that hosts a bastion host, core infrastructure, and a cluster with a head node and compute nodes]

AWS Architecture for HPC

Deploy users, infrastructure, and cluster all in AWS

[Figure: a single Amazon VPC containing users and workstations, a bastion host, core infrastructure, and a cluster with a head node and compute nodes]

AWS Architecture for Windows HPC

Windows Server on AWS

• Easy licensing: license included (OS priced per hour) or bring your own license (BYOL)
• Optimized AWS software for Windows: EC2Config, drivers
• Experience: since October 2008, across every use case and every industry
• OS choice: Windows Server 2003 R2, 2008, 2008 R2, 2012, 2012 R2
• Microsoft portfolio: SQL Server, SharePoint, Exchange, Lync
• Customizable systems: 50+ EC2 instance types, 32- and 64-bit, CPU and GPU

AWS Architecture for Windows HPC

Networking best practices for Windows HPC clusters

• Network design: use both public and private subnets, and manage their sizing
• Availability: use a multi-AZ design
• Access control: use a VPC endpoint and NAT for external access

[Figure: subnet layout across Availability Zones A and B: public subnets 10.0.0.0/24 and 10.0.1.0/24, each with NAT; private subnets 10.0.10.0/24 and 10.0.11.0/24 for the core infrastructure; and a VPC endpoint]
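The addressing above can be checked before it goes into a template. This sketch assumes the four /24 subnets live in a 10.0.0.0/16 VPC (the VPC CIDR is not shown on the slide) and that each public/private pair sits in the same Availability Zone; the subnet names are hypothetical. It uses Python's ipaddress module to confirm that every subnet fits inside the VPC, that none overlap, and how many addresses each leaves for instances after the five that AWS reserves in every subnet.

```python
import ipaddress
from itertools import combinations

# Assumed VPC CIDR; the slide only shows the subnet ranges.
VPC = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical names; AZ placement is assumed from the diagram.
SUBNETS = {
    "public-a": ipaddress.ip_network("10.0.0.0/24"),
    "public-b": ipaddress.ip_network("10.0.1.0/24"),
    "private-a": ipaddress.ip_network("10.0.10.0/24"),
    "private-b": ipaddress.ip_network("10.0.11.0/24"),
}

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def validate_plan(vpc, subnets):
    """Raise ValueError if a subnet falls outside the VPC or two subnets overlap."""
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} {net} is outside {vpc}")
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{a} {net_a} overlaps {b} {net_b}")


def usable_addresses(net) -> int:
    """Addresses left for instances after AWS's per-subnet reservations."""
    return net.num_addresses - AWS_RESERVED_PER_SUBNET


validate_plan(VPC, SUBNETS)
for name, net in SUBNETS.items():
    print(f"{name:10} {str(net):15} {usable_addresses(net)} usable")
```

Each /24 leaves 251 usable addresses, which is the practical meaning of “manage sizing”: it caps how many compute nodes one private subnet can hold, so a larger cluster needs a larger private subnet.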

AWS Architecture for Windows HPC

Core infrastructure best practices for Windows HPC clusters

• Domain controller: a highly available extension of your existing environment
• Remote Desktop Gateway: strengthens your security posture

[Figure: a domain controller (DC) in each private subnet (10.0.10.0/24 and 10.0.11.0/24) across Availability Zones A and B, with a Remote Desktop Gateway (RDGW) in a public subnet]

AWS Architecture for Windows HPC

Cluster infrastructure best practices for Windows HPC clusters

• Head node: size it independently of the compute nodes; use the General Purpose instance family
• Compute nodes: use Auto Scaling groups and cluster instances
• S3 bucket: persistent, secure, available storage for cluster input and results

[Figure: a cluster (head node and compute nodes) in the private subnets across Availability Zones A and B, reading and writing an S3 bucket through a VPC endpoint]

AWS Architecture for Windows HPC

All at once: a complete Windows HPC infrastructure on AWS

[Figure: Availability Zones A and B with public subnets (NAT, RDGW), private subnets with a domain controller in each, a cluster with a head node and compute nodes, and an S3 bucket reached through a VPC endpoint]

AWS Architecture for Windows HPC

Launch multiple clusters, each right-sized to complete its work in the time specified

[Figure: three clusters, each with its own head node and compute nodes, sharing the same core infrastructure (domain controllers, RDGW, NAT), S3 bucket, and VPC endpoint across Availability Zones A and B]
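Right-sizing a cluster to a deadline reduces to a capacity calculation: given the total compute a piece of work needs and the time allowed, how many nodes does its cluster require? The sketch below assumes the work divides evenly across cores (an embarrassingly parallel workload such as a Monte Carlo run) and ignores node startup time and stragglers; the function, the job names, and the 18-cores-per-node figure are hypothetical.

```python
import math


def nodes_for_deadline(total_core_hours: float, deadline_hours: float,
                       cores_per_node: int, efficiency: float = 1.0) -> int:
    """Smallest node count that finishes total_core_hours within deadline_hours.

    efficiency (0 < efficiency <= 1) discounts cores lost to scheduling,
    I/O waits, or uneven task sizes.
    """
    if deadline_hours <= 0 or cores_per_node <= 0 or not 0 < efficiency <= 1:
        raise ValueError("deadline, cores_per_node and efficiency must be positive")
    core_hours_per_node = deadline_hours * cores_per_node * efficiency
    return math.ceil(total_core_hours / core_hours_per_node)


# Hypothetical workloads as (total core-hours, deadline in hours).
jobs = {"risk-eod": (2_000, 4), "pricing": (500, 1), "backtest": (10_000, 12)}
for name, (core_hours, deadline) in jobs.items():
    print(name, nodes_for_deadline(core_hours, deadline, cores_per_node=18))
```

Because each workload gets its own cluster, its deadline drives its own size, rather than one shared cluster being sized for the combined peak.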

Best Practices for Windows HPC

Secure Windows HPC Workloads on AWS

Leverage native AWS security features to strengthen the security posture of Windows HPC.

• AWS resource access: grant access to AWS resources through policies in IAM roles
• Encryption at rest: enable encryption on EBS volumes and specify server-side encryption for objects in Amazon S3
• Private access: reach the input and output results stored in Amazon S3 through VPC endpoints
• Auditability: enable AWS CloudTrail on your AWS account
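Granting S3 access through an IAM role, rather than through keys stored on the nodes, usually comes down to a small identity policy attached to the role the cluster instances use. This sketch builds a least-privilege policy that lets the cluster read inputs from and write results to one bucket. The bucket name mybucket comes from the PowerShell example earlier; the input/ and results/ prefixes are hypothetical.

```python
import json

BUCKET = "mybucket"


def cluster_s3_policy(bucket: str) -> dict:
    """Least-privilege IAM policy: list the bucket, read input/, write results/."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object actions apply to object ARNs (bucket/key).
                "Sid": "ReadInputs",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/input/*",
            },
            {
                "Sid": "WriteResults",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/results/*",
            },
        ],
    }


print(json.dumps(cluster_s3_policy(BUCKET), indent=2))
```

Splitting bucket-level and object-level actions across the two kinds of ARN matters: putting s3:ListBucket on an object ARN, or s3:GetObject on the bare bucket ARN, silently grants nothing.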

Optimized network for Windows HPC

Get the most out of AWS networking for your HPC workloads.

• Enhanced Networking: SR-IOV provides higher packets-per-second (PPS) performance, lower latency, and very low network jitter
• Placement groups: instances get low-latency, full-bisection 10 Gbps bandwidth between one another
• EBS optimization: get up to 4,000 Mbps of additional throughput dedicated to your storage traffic
• AWS PV drivers / Intel drivers: keep them current with the latest versions

Optimized processing with Windows HPC

Get the most out of your instances for your HPC workloads.

• Hyper-threading: most current-generation AWS instances provide hyper-threading; keep it or disable it depending on your workload
• Turbo Boost: the latest generation of instances gives you control over processor C-states and P-states
• The right instance: choose your constraints (price, CPU, GPU, RAM, network) and pick the instance type that fits your use case
• The right storage: choose the amount and type of instance storage or Amazon EBS storage you need, and use storage services such as Amazon S3
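Choosing the right instance and deciding on hyper-threading interact: each vCPU on these instance types is a hyper-thread, so a node with hyper-threading disabled exposes half as many cores to the scheduler. The sketch filters a small catalog by physical cores and memory per physical core. The vCPU and memory figures are Amazon EC2's published specifications for these 2015-era types and should be checked, along with pricing, against current documentation; the thresholds in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int          # each vCPU is one hyper-thread
    memory_gib: float

    @property
    def physical_cores(self) -> int:
        # Two hyper-threads per physical core.
        return self.vcpus // 2


CATALOG = [
    InstanceType("c4.8xlarge", 36, 60),
    InstanceType("r3.8xlarge", 32, 244),
    InstanceType("m4.10xlarge", 40, 160),
    InstanceType("g2.8xlarge", 32, 60),
]


def candidates(catalog, min_cores: int, min_gib_per_core: float):
    """Instance types meeting core and memory-per-core needs with hyper-threading off."""
    return [
        t.name
        for t in catalog
        if t.physical_cores >= min_cores
        and t.memory_gib / t.physical_cores >= min_gib_per_core
    ]


print(candidates(CATALOG, min_cores=16, min_gib_per_core=4))
```

With 16 cores and 4 GiB per core as the constraints, only r3.8xlarge and m4.10xlarge qualify; relaxing memory to 3 GiB per core brings c4.8xlarge back in. GPU needs are not modeled here.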

Automated Windows HPC computing

Get your cluster as code, running in minutes from scratch.

• Windows PowerShell®: automate all installation and configuration of the instances
• AWS Tools for Windows PowerShell: make your cluster aware of the infrastructure it is running on
• Auto Scaling: automate provisioning and scaling of your cluster so your workloads finish when you need them
• AWS CloudFormation: deploy your clusters in a few clicks and create test clusters in minutes
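One way a node becomes aware of the infrastructure it runs on is the EC2 instance metadata service, reachable from inside every instance at http://169.254.169.254/latest/meta-data/. The sketch below reads a few standard metadata paths with Python's standard library. It assumes it runs on an EC2 instance and uses plain unauthenticated GET requests; it will time out anywhere else, and instances configured to require IMDSv2 session tokens need an extra token request that this sketch does not make. The PATHS keys are hypothetical labels.

```python
import urllib.request

METADATA_ROOT = "http://169.254.169.254/latest/meta-data/"

# Standard metadata paths a compute node might use to describe itself.
PATHS = {
    "instance_id": "instance-id",
    "instance_type": "instance-type",
    "availability_zone": "placement/availability-zone",
    "private_ip": "local-ipv4",
}


def metadata_url(path: str) -> str:
    """Full metadata URL for a relative path such as 'instance-type'."""
    return METADATA_ROOT + path.lstrip("/")


def read_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value; only works from inside an EC2 instance."""
    with urllib.request.urlopen(metadata_url(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def describe_node() -> dict:
    """Collect basic facts about the instance this code runs on."""
    return {key: read_metadata(path) for key, path in PATHS.items()}
```

On a compute node, calling describe_node() returns its instance ID, type, Availability Zone, and private IP, which a startup script could use, for example, to decide how many cores to register with the head node.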

Demonstration

Windows HPC AWS CloudFormation Template

Enable automated deployments of clusters with a pre-built template.

[Figure: an Amazon VPC containing the core infrastructure (DC, RDGW) and a cluster with a head node and compute nodes]

AWS CloudFormation Templates: Prerequisites

Things to do before starting the template

Select your region and base image:
• VPC + subnet: just input the selected CIDR
• Instance types: for all instances
• (Optional) Placement group: create a VPC placement group

Prepare the installation media, then snapshot it:
• Download Microsoft HPC Pack and unzip it to \HPCPack2012R2-Full
• Extract the SQL Server installation to \SQLInstall
• Download the Intel SR-IOV drivers and extract them to \PROWinx64
• Download the latest AWS PV drivers and extract them to \AWSPVDriverSetup

Select the installation configuration:
• Define the domain configuration and credentials
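Before snapshotting the installation-media volume, it is worth confirming that the four folders listed in the prerequisites are present and non-empty. This sketch assumes the media sits at the root of a single volume, passed in as a parameter (the D:\ drive letter in the example is hypothetical), and checks only the folder names from the slide; it does not validate installer contents or versions.

```python
from pathlib import Path

# Folder names from the prerequisites slide.
REQUIRED_FOLDERS = [
    "HPCPack2012R2-Full",   # Microsoft HPC Pack, unzipped
    "SQLInstall",           # SQL Server installation files
    "PROWinx64",            # Intel SR-IOV drivers
    "AWSPVDriverSetup",     # AWS PV drivers
]


def missing_media(root) -> list:
    """Return the required folders that are absent or empty under root."""
    root = Path(root)
    missing = []
    for name in REQUIRED_FOLDERS:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Running missing_media("D:\\") on the media volume and getting an empty list back means every folder is in place and the volume is ready to snapshot.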

AWS CloudFormation Template: Core

Building the core Windows infrastructure

Base network:
• VPC + public subnet: select your CIDR
• DHCP option set: configured to use the DC
• Security groups: for the bastion and the cluster

Core infrastructure:
• Domain controller in a new forest
• Remote Desktop bastion host (outside the domain)
• Domain user with “Join Computer to Domain” privileges

AWS CloudFormation Template: Cluster

Building the Microsoft HPC cluster on AWS

Head node:
• Multi-role: database, HPC head node, file share
• Monitored: Amazon CloudWatch custom metrics

Compute nodes:
• Automated: configured automatically to join the cluster
• Scalable: an Auto Scaling group resizes the cluster based on load
• Up to date: AWS and Intel drivers are upgraded automatically

Windows HPC AWS CloudFormation Template

In under 30 minutes, your cluster is ready to accept jobs.

Getting Started Collateral

QwikLAB: Launching Microsoft HPC Pack on AWS:

https://www.qwiklab.com/focuses/preview/1604?search=19103

Reference CloudFormation Template:

https://github.com/awslabs/aws-cfn-windows-hpc--template

Thank you!