
High-Performance Computing With Windows Ryan Waite General Program Manager Windows Server HPC Group Microsoft Corporation


Outline

Part 1: Overview
  Why Microsoft has gotten into HPC
  What our V1 product offers
  Some future directions

Part 2: Drill-down
  A few representative V1 features (for those who are interested)

Part 1: Overview

Evolving Tools Of The Scientific Process

Instruments
  Experiments done with a telescope by Galilei 400 years ago inaugurated the scientific method
  Microscope, laser, x-ray, collider, accelerator allowed peering further and deeper into matter

HPC
  Automation and acceleration of the scientific and engineering process itself
  Digital instruments, data mining, simulation, experiment steering

[Diagram: the scientific process as a cycle: 1. Observation, 2. Hypothesis, 3. Prediction, 4. Validation]

The Next Challenge: Taking HPC Mainstream

Volume economics of industry standard hardware and commercial software applications are rapidly bringing HPC capabilities to a broader set of users

But HPC is still only accessible to the few computational scientists who can master a domain science, program parallel, distributed algorithms, and use/manage a supercomputer

Microsoft HPC Strategy: taking HPC to the mainstream
  Enabling broad HPC adoption and making HPC into a high volume market in which everyone can have their own personal supercomputer
  Enabling domain scientists who are not computer scientists to partake in the HPC revolution

Evidence Of Standardization And Commoditization

Industry usage rising
GigE is gaining (50% of systems)
Clusters over 70%
x86 is leading (Pentium 41%, EM64T 16%, Opteron 11%)

HPC Market Trends

Source: IDC, 2005

[Chart: 2005 systems and 2005 growth by market segment; the segment labels did not survive extraction. System counts shown: 981; 4,988; 21,733; 163,441. Growth figures shown: -3%, 30%, 36%, 33%.]

<$250K: 97% of systems, 55% of revenue

Even The Low End Is Powerful

Year | 1991 | 1998 | 2005
System | Cray Y-MP C916 | Sun HPC10000 | Small Form Factor PCs
Architecture | 16 x Vector, 4GB, Bus | 24 x 333MHz UltraSPARC II, 24GB, SBus | 4 x 2.2GHz Athlon64, 4GB, GigE
OS | UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
GFlops | ~10 | ~10 | ~10
Top500 # | 1 | 500 | N/A
Price | $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers | Government Labs | Large Enterprises | Every Engineer and Scientist
Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media
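The price drops quoted in the table can be checked with a few lines of arithmetic. The sketch below uses only the figures from the slide and treats "< $4,000" as an upper bound of $4,000, which is an assumption made for the calculation; the function names are illustrative.

```python
# Figures from the slide; every system delivers roughly 10 GFlops.
systems = [
    ("1991 Cray Y-MP C916", 40_000_000, 10),
    ("1998 Sun HPC10000", 1_000_000, 10),
    ("2005 small form factor PCs", 4_000, 10),  # slide says "< $4,000"; upper bound assumed
]


def price_per_gflop(price_usd, gflops):
    """Dollars paid for each GFlop of capability."""
    return price_usd / gflops


def drop_factors(rows):
    """Ratio of each system's price to the price of the next system in the list."""
    prices = [price for _, price, _ in rows]
    return [earlier / later for earlier, later in zip(prices, prices[1:])]


for name, price, gflops in systems:
    print(f"{name}: ${price_per_gflop(price, gflops):,.0f} per GFlop")

print(drop_factors(systems))  # the 40x and 250x drops quoted on the slide
```

With equal performance across the three columns, the drop in price per GFlop is the same as the drop in system price.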

Top Challenges

Setup is painful
  Takes a long time to get clusters up and running
Clusters are separate islands
  Lack of integration into IT infrastructure
Job management
  Lack of integration into end-user apps
Application availability
  Limited ecosystem of applications that can exploit parallel processing capabilities

“Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users… A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems.”

High-End Computing Revitalization Task Force, 2004 (Office of Science and Technology Policy, Executive Office of the President)

Windows Compute Cluster Server 2003

Simplified cluster deployment, job submission and status monitoring

Better integration with existing Windows infrastructure allowing customers to leverage existing technology and skill-sets

Familiar development environment allows developers to write parallel applications from within the powerful Visual Studio IDE

Windows Compute Cluster Server 2003: Leveraging Existing Windows Infrastructure

[Diagram labels:]
  Integration with IT infrastructure
  Operations Manager
  Systems Management Server
  Windows Update Services
  Remote Installation Services
  Kerberos authentication
  Group policies
  Resource management
  Admin console
  Performance monitor
  Command line interface
  Job scheduler
  Secure job execution
  Secure MPI

CCS Key Features

Node deployment and administration
  Task-based configuration for head and compute nodes
  UI and command line-based node management
  Monitoring with Performance Monitor (Perfmon), Microsoft Operations Manager (MOM), Server Performance Advisor (SPA), and 3rd-party tools

Integration with existing Windows and management infrastructure
  Integrates with Active Directory, Windows security technologies, management, and deployment tools

Extensible job scheduler
  3rd-party extensibility at job submission and/or job assignment
  Submit jobs from command line, UI, or directly from applications
  Simple job management, similar to print queue management

Secure and performant MPI
  User credentials secured in job scheduler and compute nodes
  MPI stack based on MPICH2 reference implementation
  Support for high performance interconnects through Winsock Direct

Integrated development environment
  OpenMP support in Visual Studio, Standard Edition
  Parallel debugger in Visual Studio, Professional Edition

HPC Institutes

University of Virginia, Charlottesville, VA, U.S.A.
University of Tennessee, Knoxville, TN, U.S.A.
Cornell Theory Center, Ithaca, NY, U.S.A.
University of Utah, Salt Lake City, UT, U.S.A.
TACC – University of Texas, Austin, TX, U.S.A.
National Center for Supercomputing Applications, IL, U.S.A.
Southampton University, Southampton, UK
HLRS – University of Stuttgart, Stuttgart, Germany
Shanghai Jiao Tong University, Shanghai, PRC
Tokyo Institute of Technology, Tokyo, Japan
Nizhni Novgorod University, Nizhni Novgorod, Russia

An Example Of Porting To Windows: Weather Research and Forecasting (WRF) Model

Large collaborative effort, led by NCAR, to develop a next-generation community model with a direct path to operations

Applications
  Atmospheric research
  Numerical weather prediction
  Coupled modeling systems

Current release WRFV2.1.2
  ~1/3 million lines, Fortran 90 and some C using MPI, OpenMP
  Traditionally developed for Unix HPC systems
  Two dynamical cores
  Full range of physics options

Rapid community growth: more than 3,000 registered users

Operational capabilities
  U.S. Air Force Weather Agency
  National Centers for Environmental Prediction (NOAA)
  KMA (Korea), IMD (India), CWB (Taiwan), IAF (Israel), WSI (U.S.)

WRF On Windows

Motivation
  Extend the range of systems available to WRF users
  Stability and consistency with respect to Linux
  Take advantage of Microsoft and 3rd party (e.g., Portland Group) development tools and environments

WRF ported under SUA (Subsystem for UNIX-based Applications) and running on development AMD64 clusters using Compute Cluster Pack
  Of 360k lines, fewer than 750 changed to compile and link under SUA
  Largest number of changes involved the WRF build mechanism (Makefiles, scripts)
  Level of effort and nature of tasks were not unlike porting to any new version of UNIX

Details of porting experience described in a white paper available from Microsoft and at http://www.mmm.ucar.edu/wrf/WG2/wrf_port_notes.htm

An Example Of Application Integration With HPC: Scaling Excel

[Diagram: Excel scaling from Desktop (Excel “12”) to Servers (Excel Services) to Clusters (Excel Services on Windows Compute Cluster Server 2003)]

[Diagram: Excel Services. The Excel “12” client is used to author and publish spreadsheets and to open a spreadsheet/snapshot; a 100% thin browser is used to view and interact; custom applications get access through Web Services.]

Excel And Windows CCS

Customer requirements
  Faster spreadsheet calculation
  Free up client machines from long-running calculations
  Time/mission critical calculations that must run
  Parallel iterations on models

Example scenarios
  Schedule overnight risk calculations
  Farm out analytical library calculations
  Scale-out Monte Carlo iterations, parametric sweeps
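Monte Carlo iterations scale out because each batch of iterations is independent of the others. The sketch below illustrates that structure only: it estimates pi with independently seeded chunks and uses a local thread pool as a stand-in for cluster nodes. It does not use any Excel or Compute Cluster Server API, and the function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed, iterations):
    """One independent unit of work: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # a private generator per chunk keeps chunks independent
    hits = 0
    for _ in range(iterations):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks, iterations_per_chunk, workers=4):
    """Farm the chunks out to workers, then combine the partial counts."""
    seeds = range(chunks)
    sizes = [iterations_per_chunk] * chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(run_chunk, seeds, sizes))
    return 4.0 * hits / (chunks * iterations_per_chunk)


print(estimate_pi(chunks=8, iterations_per_chunk=20_000))
```

Because each chunk depends only on its seed and size, the result is the same however many workers run it. A parametric sweep has the same shape, with a parameter value in place of the seed. Python threads do not speed up CPU-bound work, so the pool here shows the decomposition rather than the performance gain.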

Evolution Of HPC

Evolving scenarios and their key factors

Batch computing on supercomputers (manual, batch execution)
  Compute cycles are scarce and require careful partitioning and allocation
  Cluster systems administration major challenge
  Applications split into UI and compute parts

Interactive computing on departmental clusters (interactive computation and visualization)
  Compute cycles are cheap
  Interactive applications integrate UI/compute parts
  Emergence of turnkey personal clusters

Complex workflow spanning applications
  Compute and data resources are diffused throughout the enterprise
  Distributed application, systems and data management is the key source of complexity
  Multiple applications are organized into complex workflows and data pipelines
  Focus on service orientation and web services

[Diagram labels: IT Mgr, SQL]

Cheap Cycles And Personal Supercomputing

IBM Cell processor
  256 Gflops today
  4 node personal cluster => 1 Tflops
  32 node personal cluster => Top100

Microsoft Xbox
  3 custom PowerPCs + ATI graphics processor
  1 Tflops today
  $300
  8 node personal cluster => “Top100” for $2500 (ignoring all that you don’t get for $300)

Intel many-core chips
  “100’s of cores on a chip in 2015” (Justin Rattner, Intel)
  “4 cores”/Tflop => 25 Tflops/chip

The key challenge
  How to program these things

Concurrent programming will be an important area of investments for all of Microsoft (not just HPC)
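The figures on this slide follow from simple multiplication. The sketch below reproduces them, assuming peak rates add linearly across nodes and taking 100 cores as the low end of "100's of cores"; both are assumptions made for the calculation, not claims from the talk.

```python
def cluster_gflops(nodes, gflops_per_node):
    """Aggregate peak rate, assuming node rates simply add up."""
    return nodes * gflops_per_node


# IBM Cell: 4 nodes at 256 Gflops each is about 1 Tflops.
cell_cluster = cluster_gflops(4, 256)
print(cell_cluster)            # 1024 Gflops

# Xbox: 8 nodes at $300 each; the slide rounds the total up to $2500.
xbox_cluster_price = 8 * 300
print(xbox_cluster_price)      # 2400

# Many-core: 100 cores at "4 cores per Tflop" gives 25 Tflops per chip.
tflops_per_chip = 100 / 4
print(tflops_per_chip)         # 25.0
```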

“Grid Computing”

A catch-all marketing term
  Desktop cycle-stealing
  Managed HPC clusters
  Internet access to giant, distributed repositories
  Virtualization of data center IT resources
  Out-sourcing to “utility data centers”
  “Software as a service”
  Parallel databases

HPC Grids And Web Services

Compute grid
  Forest of clusters
  Coordinated scheduling of resources

Data grid
  Distributed storage facilities
  Coordinated management of data

Web Services
  Glue for heterogeneous platforms/applications/systems
  Cross- and intra-organization integration
  Standards-based distributed computing
  Interoperability and composability

[Diagram labels: Cluster-Based HPC, Intra-Organization HPC, Virtual Organizations]

Part 2: Drill-Down

Technologies

Platform
  Windows Server 2003 SP1 64-bit Edition
  x64 processors (Intel EM64T and AMD Opteron)
  Ethernet, Ethernet over RDMA and Infiniband support

Administration
  Prescriptive, simplified cluster setup and administration
  Scripted, image-based compute node management
  Active Directory based security
  Scalable job scheduling and resource management

Development
  MPICH2 from Argonne National Laboratory with performance and security enhancements
  Cluster scheduler programmable via Web Services and DCOM
  Visual Studio 2005: OpenMP, Parallel Debugger
  Partner delivered Fortran compilers and numerical libraries

Head Node Installation

Head Node installs only on x64
  Windows 2003 Compute Cluster Edition
  Windows 2003 SP1 Standard and Enterprise
  Windows 2003 R2

Installation
  Leverages appliance-like functionality
  Scripted installation
  Warnings if system is misconfigured
  To Do list to assist with final configuration

Walkthrough
  Windows Server 2003 is installed on the head node
  System may have been pre-installed using OPK
  User launches Compute Cluster Kit setup
  To Do list starts up, guiding User through next steps
  User joins Active Directory domain
  User installs IP over IB drivers for InfiniBand cards if not pre-installed
  Wizard assists with multi-NIC routing and configuration
  Remote Installation Service is configured for imaging compute nodes

Compute Node Installation

Automated installation
  Remote Installation Service provides simple imaging solution
  May use third-party system imaging tools for compute nodes
  Requires private network

Walkthrough
  User racks up compute nodes
  Starts Add Node wizard
  Powers up a group of compute nodes
  Compute nodes PXE boot
  RIS and installation scripts will
    Install operating system: W2K3 SP1
    Install drivers
    Join appropriate domain
    Install compute cluster software (CD2)
    Join cluster
  Exiting wizard turns off RIS

[Diagram labels: Head Node, Compute Node, Compute Node; networks: Corpnet, Ethernet, Infiniband]

Node Management

Not building a new systems management paradigm
  Leveraging Windows infrastructure for simple management
  MMC, Perfmon, Event Viewer, Remote Desktop
  Can integrate with enterprise management infrastructure, such as Microsoft Operations Manager

Compute Cluster MMC snap-in
  Supports specific actions
    Pause Node
    Resume Node
    Open CD Drive
    Reboot Node
    Execute Command
    Remote Desktop Connection
    Start PerfMon
    Delete
    Properties
  Can operate on multiple nodes at once

Compute Cluster Admin Console

[Screenshot: Compute Cluster Admin Console (menus: File, Action, View, Favorites, Window, Help) for cluster "Bio Lab 1 (Compute Cluster)", with To Do List, Node Management and Queue Management in the navigation tree. The Node Management view lists:]

Compute Node Name | Node Status | Job Status | Job Name | Job Time | Owner
Node1 | Active | Executing | Bob’s Blast Job | 47 | NTDEV\bobmu
Node2 | Active | Executing | Bob’s Blast Job | 51 | NTDEV\bobmu
Node3 | Active | Executing | Bob’s Blast Job | 41 | NTDEV\bobmu
Node4 | Active | Executing | Orange Temp | 1245 | NTDEV\suej
Node5 | Active | Idle | | |
Node6 | Paused | Executing | Bob’s Blast Job | 42 | NTDEV\bobmu
Node7 | Active | Executing | Orange Temp | 1245 | NTDEV\suej
Node8 | Paused | Executing | Agent B, Matrix 27 | 60102 | NTDEV\enrico
Node9 | Paused | Idle | | |
Node10 | Paused | Idle | | |
Node11 | Active | Idle | | |
Node12 | Active | Idle | | |
Node13 | Active | Executing | Agent B, Matrix 27 | 60102 | NTDEV\enrico
Node14 | Active | Executing | Patching | 465 | CC\admin
Node15 | Active | Executing | Patching | 680 | CC\admin
Node16 | Active | Executing | Patching | 465 | CC\admin
Node17 | Installing | | | |
Node18 | Installing | | | |
Node19 | Installing | | | |
Node20 | Installing | | | |


Job/Task Conceptual Model

Serial Job: one task running one process
Parallel MPI Job: one task running multiple processes that communicate via IPC
Parameter Sweep Job: multiple tasks, each running one process
Task Flow Job: multiple tasks
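The four job shapes in the conceptual model can be written down as a small data structure. The sketch below is illustrative only: the `Job` and `Task` classes and the command strings are hypothetical and are not the product's object model. The task and process counts match the diagram.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    command: str            # hypothetical command line, for illustration only
    processes: int = 1      # more than one process means the processes communicate (IPC)


@dataclass
class Job:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def total_processes(self):
        """Number of processes the job needs across all of its tasks."""
        return sum(task.processes for task in self.tasks)


# Serial job: one task, one process.
serial = Job("serial", [Task("solver.exe")])

# Parallel MPI job: one task whose processes talk to each other.
mpi = Job("parallel-mpi", [Task("mpiexec solver.exe", processes=2)])

# Parameter sweep job: many independent single-process tasks.
sweep = Job("parameter-sweep", [Task(f"solver.exe --case {i}") for i in range(3)])

# Task flow job: several tasks that together form one workflow.
flow = Job("task-flow", [Task(f"stage{i}.exe") for i in range(4)])

for job in (serial, mpi, sweep, flow):
    print(job.name, len(job.tasks), job.total_processes())
```

Keeping the process count on the task rather than on the job is what lets one structure cover both the MPI case (one task, many processes) and the sweep case (many tasks, one process each).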

Job Scheduler Stack

[Diagram labels:]
  Layers: Interface Layer, Scheduling Layer, Execution Layer
  Nodes: Client Node, Head Node, Compute Node
  Interfaces: Web Services (WSE 3.0), COM API, Command Line Interface, User Console, Admin Console, Third-party Applications
  Components: User Interface Handlers, Object Model, Queueing, Job Management, Resource Management, Node Manager (six instances)
  Phases: Admission, Allocation, Activation
  Other: User, Admin, Jobs/Tasks

Job Scheduler

Job scheduler provides two features: ordering and allocation

Job ordering
  Priority-based first-come, first-serve (FCFS)
  Backfill supported for jobs with time limits

Resource allocation
  License-aware scheduling through plug-ins
  Parallel application node allocation policies

Extensible
  Core engine based on embedded SQL engine
  Resource and job descriptions are based on XML
  3rd parties can extend by plugging into submission and execution phases to implement queuing and licensing policies

Job submission
  Jobs submitted via UI, API, command line, or web service

Security
  Jobs on compute nodes execute in the security account of the submitting user, allowing secure access to networked resources

Cleanup
  Jobs executed in Job Objects on compute nodes, facilitating cleanup
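Priority-based FCFS ordering with backfill can be sketched in a few lines. The code below is a simplified illustration of the general technique, not the product's scheduler: the names are hypothetical, it counts CPUs only, and it assumes that running jobs report a known end time. A later job is backfilled only if it declares a time limit and would finish before the blocked head-of-queue job could start.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    priority: int                      # larger value runs first
    submit_order: int                  # ties broken first-come, first-served
    cpus: int
    time_limit: Optional[int] = None   # None means no declared run-time limit


def schedule_pass(queue: List[Job], free_cpus: int,
                  running: List[Tuple[int, int]], now: int = 0) -> List[str]:
    """Return the names of jobs to start now.

    running holds (cpus, end_time) pairs for jobs that are already executing.
    """
    ordered = sorted(queue, key=lambda j: (-j.priority, j.submit_order))
    releases = list(running)           # CPUs that will be released, with times
    started = []
    reserve_at = None                  # when the blocked head job can start
    for job in ordered:
        if reserve_at is None:
            if job.cpus <= free_cpus:
                started.append(job.name)
                free_cpus -= job.cpus
                if job.time_limit is not None:
                    releases.append((job.cpus, now + job.time_limit))
                continue
            # Head job is blocked: find when enough CPUs will have been released.
            available = free_cpus
            for cpus, end_time in sorted(releases, key=lambda r: r[1]):
                available += cpus
                if available >= job.cpus:
                    reserve_at = end_time
                    break
            if reserve_at is None:
                break                  # no reservation possible, so no backfill
        elif (job.time_limit is not None and job.cpus <= free_cpus
              and now + job.time_limit <= reserve_at):
            started.append(job.name)   # backfill: done before the reservation
            free_cpus -= job.cpus
    return started


queue = [
    Job("big", priority=2, submit_order=0, cpus=8, time_limit=50),
    Job("short", priority=1, submit_order=1, cpus=2, time_limit=60),
    Job("unbounded", priority=1, submit_order=2, cpus=2),
    Job("long", priority=1, submit_order=3, cpus=2, time_limit=200),
]
print(schedule_pass(queue, free_cpus=4, running=[(4, 100)]))
```

In the example, "big" needs all 8 CPUs and cannot start until the running job ends at time 100. Only "short" is backfilled, because "unbounded" has no time limit and "long" would still be running at the reservation time. This is why the slide ties backfill to jobs with time limits.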

Queue Management

Job management model similar to print queue management
  Leverage familiar user paradigm

Queue management operations
  Delete
  Change properties
    Priority
    Run time
    # of CPUs
    Preferred nodes
    CPUs per node
    All in one
    License parameters
    Uniform attributes
    Notification

Compute Cluster

[Screenshot: Compute Cluster console (menus: File, Action, View, Favorites, Window, Help). Console Root contains "Bio Lab 1 (Compute Cluster)" with To Do List, Compute Nodes and Queue Management. The Queue Management view lists:]

Order | Priority | Name | Owner | Status
1 | 2 | Bob’s Blast Job | Domain\Bobr | Running
2 | 1 | Lodica Calc | Domain\Sue | Running
3 | 2 | Sue | Domain\Sue | Running
4 | 2 | Agent B, Matrix 27 | Domain\Tam.. | Running
5 | 1 | Better work this time! | Domain\Crai.. | Running
6 | 2 | Orange Temp | Domain\Ryan | Running

Tasks of Bob’s Blast Job (owner Domain\Bobr):

Task | Status
000434 to 000435 | Completed
000436 | Running (Bnode19)
000437 | Running (Bnode20)
000438 | Running (Bnode30)
000439 | Running (Bnode21)
000440 | Running (Bnode26)
000441 | Running (Bnode27)
000442 | Running (Bnode18)
000443 to 000450 | Queued


Networking

Focusing on industry standard interconnect technologies
  MPI implementation tuned to Winsock
  Automatic RDMA support through Winsock Direct (SAN provider required from IHV)

Gigabit Ethernet
  Expect to be the mainstream choice
  RDMA + GigE offers compelling latency

Infiniband
  Emerging as a leading high end solution
  Engaged with all IB vendors
  OpenIB group developing a Windows IB stack
  Planning to support IB in WHQL

Resources

Microsoft HPC web site (evaluation copies available)
  http://www.microsoft.com/hpc/

Microsoft Windows Compute Cluster Server 2003 community site
  http://www.windowshpc.net/

Windows Server x64 information
  http://www.microsoft.com/64bit/
  http://www.microsoft.com/x64/

Windows Server System information
  http://www.microsoft.com/wss/

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.