Microsoft Innovation Center for Technical Computing MICROSOFT AZURE IN HPC SCENARIOS Lukasz Miroslaw, Ph.D. [email protected] 18.11.2015, MICROSOFT SWITZERLAND

Microsoft Azure in HPC scenarios



Challenges


57% of users are dissatisfied with their desktop computing capacity*

* Source: US Council of Competitiveness: http://www.compete.org, theubercloud.com

Computing: too slow

Memory: too small

Fig. Sometimes solving a problem with IT is hard.

Challenges


$70,000 server => $1M cost over 3 years

High costs of IT infrastructure

Low Cost of running in the cloud

The cost model assumes that hardware makes up 7% of total costs.
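
The $1M figure follows from the two numbers on this slide: if hardware is 7% of total costs, a $70,000 server implies a total of $70,000 / 0.07. A minimal sketch of that arithmetic (the function name `total_cost` is an illustrative assumption, not part of the talk):

```python
def total_cost(hardware_cost: float, hardware_share: float) -> float:
    """Total cost implied when hardware is a fixed share of all costs."""
    if not 0 < hardware_share <= 1:
        raise ValueError("hardware_share must be in (0, 1]")
    return hardware_cost / hardware_share


server_price = 70_000   # USD, from the slide
share = 0.07            # hardware share of total costs, from the slide

tco = total_cost(server_price, share)
print(f"Estimated total cost over 3 years: ${tco:,.0f}")
```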


Fig. Cost of a GFLOP in U.S. dollars on different Microsoft Azure nodes and a private HSR cluster.

Motivation


Agenda


Use Case #1: Remote physical simulations with external partners.

Use Case #2: Scale out physical simulations in the cloud.

Use Case #3: Stellar Classification, Prediction of Energy Efficiency in buildings

Conclusions.

Agenda


Use Case #1: Remote physical simulations with external partners. Azure IaaS, RemoteApp, Azure Batch

Use Case #2: Scale out physical simulations in the cloud. SimplyHPC, HPC Pack

Use Case #3: Stellar Classification, Prediction of Energy Efficiency in buildings. AzureML

Conclusions.

What is Computational Fluid Dynamics?

CFD is the science of simulating fluid flow, heat and mass transfer, and chemical reactions.


What is Computational Fluid Dynamics?

Airflow simulation around sky-diving Santa Claus.


* Source: Desktop Engineering


Use Case #1: Collaborative Simulations of Electrical Arcs

Goal #1: Develop a Cloud-based algorithm for electrical arc simulation

Microsoft Azure Research Award in 2014

Contact: Kenji Takeda (Microsoft Research)

Goal #2: Provide the simulation tool to partners in Brazil and Germany

Ongoing collaborations

Streamer International (CTI Project)

Panasonic

Fraunhofer SCAI

WEG


1st Use case: Instant ANSYS


VM: D14 with 16-core CPU, 112 GB RAM, Windows Server 2012

MpCCI, ANSYS preinstalled

Storage: locally redundant, automatically scalable

License Server (LS) on A0 in Germany

Customer

VM

LS

INSTANT ANSYS


No Installation. No configuration. No up-front costs.

Access to powerful VMs with ANSYS already preinstalled and preconfigured.

Access to redundant and highly available storage.

Disaster Recovery and 99.5% SLA.

Connection to on-premises infrastructure with IPSec VPN.

Use Case #1: INSTANT ANSYS


IaaS DEMO

2nd Use case: Linux VM


The UberCloud: Making Technical Computing available in the Cloud

UberCloud Community: +2500 companies and individuals: +60 cloud providers, +80 software providers, several hundred consulting firms and individual experts.

OpenFOAM added to Azure Marketplace

Docker containerization

www.ubercloud.com

2nd Use case: Linux VM


DEMO

The compute environment you ordered is now ready.

Access your compute environment via remote desktop connection (Chrome 8+, Firefox 7+, Opera 11+, IE 9+)

Launch

Your password for remote desktop access is: TN1b39pv4Djw

Azure RemoteApp


Deliver apps from the cloud, cost-effectively

Simplify your infrastructure

Run Windows apps anywhere

Centralize your apps, help secure your data

Azure RemoteApp


Windows applications as a service accessible from anywhere.

Costs


VM with 16 cores and 112 GB RAM costs 2.11 CHF / hour (D14)

1 TB of Storage costs 30 CHF / month

RemoteApp starting price: $10 / user / month (40h included)

Online Calculator
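
As a rough illustration of how the pay-per-use prices above add up, the sketch below estimates a monthly bill from VM hours and stored terabytes. The rates are the ones quoted on this slide; the function name and the usage figures (40 hours, 1 TB) are hypothetical, and real bills depend on region, currency, and current pricing.

```python
VM_RATE_CHF_PER_HOUR = 2.11       # D14 VM, as quoted on the slide
STORAGE_RATE_CHF_PER_TB = 30.0    # per TB and month, as quoted on the slide


def monthly_cost(vm_hours: float, storage_tb: float) -> float:
    """Estimated monthly cost in CHF for one VM plus storage."""
    return vm_hours * VM_RATE_CHF_PER_HOUR + storage_tb * STORAGE_RATE_CHF_PER_TB


# Hypothetical usage: 40 hours of simulation and 1 TB of storage in one month.
estimate = monthly_cost(vm_hours=40, storage_tb=1)
print(f"Estimated monthly cost: {estimate:.2f} CHF")
```

Because the VM is billed by the hour, stopping it when no simulation is running is what keeps the estimate low; the same VM left on for a whole month (about 730 hours) would cost roughly 1,540 CHF at this rate.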

Azure in Education

Faculty will receive a 12-month, $250/month account

Students will receive a 6-month, $100/month account

Short Summary


+ Powerful VMs that can be started/stopped on demand increase the productivity in our group.

+ Virtual images with the OS and different software versions avoid problems with backward compatibility.

+ Students and team members can manage their own VMs, which reduces the costs of support.

- Storage File Service can be easily mapped to a drive on the VM but not on premises.

- Only a single user can access one VM.

Scalability Tests on Microsoft Azure

HPC Pack IaaS Demo

SimplyHPC: Light-weight Cloud Orchestrator for MS Azure

What is SimplyHPC?

Framework


SimplyHPC:

1) Distributed framework for Microsoft Azure,

2) Set of PowerShell scripts.

SimplyHPC = Simpler Deployment

Performance and Scalability

Example #1: Solving linear systems with PETSc and HPCG


Fig. Performance in GFlops of PETSc solving ruep (right) matrix system and HPCG Benchmark (left) on different Microsoft Azure nodes and a private HSR cluster.

Performance and Scalability


Example #2: ANSYS CFX

Performance and Scalability


Fig. Strong scaling of ANSYS CFX for the compressor case (11 million nodes).

Example #2: ANSYS CFX
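
Strong scaling keeps the problem size fixed (here, the same mesh) and measures how the run time drops as cores are added. The usual metrics are speedup, T(1) / T(p), and parallel efficiency, speedup / p. The sketch below shows both; the timings are hypothetical placeholders, not the measurements behind the figure.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup relative to the single-core (or baseline) run time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency: 1.0 means perfect (linear) scaling."""
    return speedup(t_serial, t_parallel) / workers


# Hypothetical wall-clock times in seconds per core count (NOT data from the talk).
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}

baseline = timings[1]
for cores, seconds in timings.items():
    s = speedup(baseline, seconds)
    e = efficiency(baseline, seconds, cores)
    print(f"{cores:>2} cores: speedup {s:.2f}, efficiency {e:.2f}")
```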

Azure Batch


Batch is a managed service for batch processing or batch computing: running a large volume of similar tasks to get some desired result.
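
The pattern behind batch computing can be shown locally: one task function, many parameter sets, a pool of workers. The sketch below uses only Python's standard library as an analogy. It does not use the Azure Batch SDK, and `run_task` is a hypothetical stand-in for a real simulation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(inlet_velocity: float) -> float:
    """Placeholder for one simulation run; returns a dummy result."""
    return inlet_velocity ** 2


def run_batch(parameters, max_workers: int = 4):
    """Run the same task for every parameter and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, parameters))


# Hypothetical parameter sweep over three inlet velocities.
results = run_batch([1.0, 2.0, 3.0])
print(results)
```

In a managed service the worker pool is a set of cloud VMs rather than local threads, but the shape of the job (one task definition, many inputs) stays the same.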

Short Summary

SimplyHPC: a framework to simplify cluster deployment and job submission.

Set of light-weight PowerShell scripts to submit, execute and monitor multi-threaded jobs on Windows Azure.

Easy to use. No cloud-related knowledge necessary.

Run the jobs from the command line and download the results directly to your Azure Storage.

Up to 9x faster than native MS HPC Pack scripts.

Available at https://github.com/vbaros/SimplyHPC


L Miroslaw, V Baros, M Pantic, H Nordborg, Unified Cloud Orchestration Framework for Elastic High Performance Computing on Microsoft Azure, NAFEMS World Congress 2015

Short Summary


+ The scaling properties of Microsoft Azure are comparable to those of the on-premises cluster.

HSR Cluster: 7.3 days (176 hours), limited availability.

Microsoft Azure: 4.9 days (118 hours), ca. 50% faster, 100% availability.

+ Dynamic scaling (up- / downscaling) and instant access to the newest hardware reduce the costs.

+ (Un)limited computing at a competitive price.

Cluster composed of 32 x A8 nodes (=256 cores) costs 32 x 2.11 CHF/h = ca. 68 CHF/h

- Upscaling > 100 cores should be planned in advance.
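
The figures on this slide can be checked with a few lines of arithmetic. The sketch below recomputes the speedup from the two run times and the hourly cluster price from the node count and rate; all inputs come from the slide, while the variable names and the derived per-core price are illustrative.

```python
hsr_hours = 176      # run time on the HSR cluster, from the slide
azure_hours = 118    # run time on Microsoft Azure, from the slide

speedup_factor = hsr_hours / azure_hours    # how many times faster Azure was
time_saved = 1 - azure_hours / hsr_hours    # fraction of wall-clock time saved

nodes = 32                  # A8 nodes, from the slide
cores_per_node = 8          # 256 cores / 32 nodes
rate_chf_per_hour = 2.11    # per node, from the slide

cluster_rate = nodes * rate_chf_per_hour
per_core_hour = cluster_rate / (nodes * cores_per_node)

print(f"Speedup: {speedup_factor:.2f}x, time saved: {time_saved:.0%}")
print(f"Cluster: {cluster_rate:.2f} CHF/h, {per_core_hour:.3f} CHF per core-hour")
```

A speedup of about 1.49 matches the "ca. 50% faster" on the slide; expressed as wall-clock time, the Azure run was about one third shorter.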

Microsoft Azure Machine Learning Studio

Three types of knowledge:

Know-What (facts)

Know-How (processes)

Know-Why (reasons)


Image credit: Univ. Hamburg

AzureML Studio

Key goals of Machine Learning:

Prediction

Classification

Clustering

Collaborative Filtering


Image credits: OpenCV, Snipview, Stanford

AzureML: Stellar Classification

Classification Challenge:

HYG database* is a compilation of stellar data from three main catalogues.

Contains ca. 120k stars, 37 spectral characteristics.

2D classification scheme based on temperature (color index) and brightness (absolute magnitude).

Data is incomplete and may contain a few misclassifications.

Prediction Engine developed in AzureML
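
To make the two-feature classification idea concrete, here is a minimal nearest-centroid sketch over (color index, absolute magnitude) pairs. It is not the AzureML model from the talk: the training points are rough illustrative values, not rows from the HYG database, and the three class labels are an assumption made for the example.

```python
from math import dist

# Rough illustrative (color index B-V, absolute magnitude) pairs, NOT HYG data.
training = {
    "main sequence": [(0.65, 4.8), (0.30, 2.5), (1.20, 7.0)],
    "giant": [(1.00, 0.5), (1.40, -0.5)],
    "white dwarf": [(0.00, 11.5), (0.30, 13.0)],
}


def centroid(points):
    """Mean point of a list of (x, y) pairs."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


centroids = {label: centroid(points) for label, points in training.items()}


def classify(color_index: float, abs_magnitude: float) -> str:
    """Return the label whose centroid is closest in the 2D feature plane."""
    star = (color_index, abs_magnitude)
    return min(centroids, key=lambda label: dist(star, centroids[label]))


print(classify(0.1, 12.0))
print(classify(1.3, 0.0))
```

The features are left unscaled here, so absolute magnitude dominates the distance; a real model would normalize the inputs and would have to cope with the incomplete and mislabeled records mentioned above.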


Credits: Michael Pantic (HSR)

* http://www.astronexus.com/hyg

AzureML Example: Heating Load Prognosis


Image credit: SAB Magazine

Input:

- Roof area

- Overall height

- Glazing area

- Surface area

- ...

Output:

- Heating load prediction

AzureML Workflow


Machine Learning Workflow

1. Hypothesis

2. Data Preparation

3. Model

4. Test

5. Evaluate

A. Tsanas, A. Xifara: 'Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools', Energy and Buildings, Vol. 49, pp. 560-567, 201

- 8 physical characteristics from 768 buildings

- Goal: predict buildings' heating load and cooling load

- Architects need to compare several building designs before selecting the final approach
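
Steps 2 to 5 of the workflow above can be sketched in plain Python. This is a stand-in for what AzureML Studio does graphically, under strong simplifying assumptions: a single input feature, a one-variable least-squares model, and synthetic data generated in the script rather than the 768-building dataset.

```python
import random

random.seed(0)

# 2. Data preparation: synthetic (surface_area, heating_load) pairs, NOT the real data.
data = [(x, 0.05 * x + 2.0 + random.gauss(0, 0.5)) for x in range(500, 800, 10)]
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


# 3. Model: ordinary least squares with one feature.
def fit(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit(train)

# 4. Test: predict the held-out buildings.
predictions = [slope * x + intercept for x, _ in test]

# 5. Evaluate: root-mean-square error on the test set.
rmse = (sum((p - y) ** 2 for p, (_, y) in zip(predictions, test)) / len(test)) ** 0.5
print(f"slope={slope:.3f}, intercept={intercept:.2f}, RMSE={rmse:.2f}")
```

Holding out a test set before fitting is the important part: the error is measured on buildings the model has not seen, which is what an architect comparing new designs would care about.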

AzureML: Cost Model


AzureML: Short Summary


Very fast prototyping. Load the system with data, test different Machine Learning methods.

Platform for Internet of Things: Event Hubs, Stream Analytics.

Share the models & results.

Deploy web services fast.

Develop your own methods in Python and R.

Summary


Computing and storage at a competitive price.

High availability, data redundancy, and disaster recovery services are included.

Data transfer takes some time.

Up- and downscaling resources dynamically. Higher productivity.

"Cloudify" your system's complexity.