High-Performance Computing With Windows
Ryan Waite
General Program Manager
Windows Server HPC Group
Microsoft Corporation

Outline
Part 1: Overview
  Why Microsoft has gotten into HPC
  What our V1 product offers
  Some future directions
Part 2: Drill-down
  A few representative V1 features (for those who are interested)

Evolving Tools Of The Scientific Process
Instruments
  Experiments done with a telescope by Galilei 400 years ago inaugurated the scientific method
  Microscope, laser, x-ray, collider, accelerator allowed peering further and deeper into matter
HPC
  Automation and acceleration of the scientific and engineering process itself
  Digital instruments, data mining, simulation, experiment steering
Cycle: 1. Observation, 2. Hypothesis, 3. Prediction, 4. Validation

The Next Challenge: Taking HPC Mainstream
Volume economics of industry standard hardware and commercial software applications are rapidly bringing HPC capabilities to a broader number of users
But HPC is still only accessible to the few computational scientists who can master a domain science, program parallel, distributed algorithms, and use/manage a supercomputer
Microsoft HPC Strategy – taking HPC to the mainstream
  Enabling broad HPC adoption and making HPC into a high volume market in which everyone can have their own personal supercomputer
  Enabling domain scientists who are not computer scientists to partake in the HPC revolution

Evidence Of Standardization And Commoditization
Industry usage rising
GigE is gaining (50% of systems)
Clusters over 70%
x86 is leading (Pentium 41%, EM64T 16%, Opteron 11%)

HPC Market Trends
Source: IDC, 2005
[Chart: values listed in extraction order; segment labels and the pairing of systems with growth rates were not recovered]
2005 Systems: 981; 4,988; 21,733; 163,441
2005 Growth: -3%; 30%; 36%; 33%
<$250K – 97% of systems, 55% of revenue

Even The Low End Is Powerful
             | 1991 | 1998 | 2005
System       | Cray Y-MP C916 | Sun HPC10000 | Small Form Factor PCs
Architecture | 16 x Vector, 4GB, Bus | 24 x 333MHz Ultra-SPARCII, 24GB, SBus | 4 x 2.2GHz Athlon64, 4GB, GigE
OS           | UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
GFlops       | ~10 | ~10 | ~10
Top500 #     | 1 | 500 | N/A
Price        | $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers    | Government Labs | Large Enterprises | Every Engineer and Scientist
Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media
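
The price drops quoted in the table can be checked with a few lines of arithmetic. This is only a sketch: the figures are copied from the slide, the 2005 price is treated as exactly $4,000 although the slide gives it as an upper bound, and the cost-per-GFlops figure is derived here rather than stated in the deck.

```python
# Figures copied from the slide; "< $4,000" is treated as 4,000 (an upper bound).
systems = [
    ("1991 Cray Y-MP C916", 40_000_000, 10),
    ("1998 Sun HPC10000", 1_000_000, 10),
    ("2005 small form factor PCs", 4_000, 10),
]


def price_drops(rows):
    """Return the price ratio between each system and the one that follows it."""
    return [rows[i][1] / rows[i + 1][1] for i in range(len(rows) - 1)]


def dollars_per_gflops(rows):
    """Return the price divided by the (approximate) GFlops for each system."""
    return {name: price / gflops for name, price, gflops in rows}


drops = price_drops(systems)
cost = dollars_per_gflops(systems)

print(drops)
for name, value in cost.items():
    print(f"{name}: ${value:,.0f} per GFlops")
```

The two ratios match the "40x drop" and "250x drop" annotations in the Price row.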

Top Challenges
Setup is painful
  Takes a long time to get clusters up and running
Clusters are separate islands
  Lack of integration into IT infrastructure
Job management
  Lack of integration into end-user apps
Application availability
  Limited eco-system of applications that can exploit parallel processing capabilities

“Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users… A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems.”
High-End Computing Revitalization Task Force, 2004 (Office of Science and Technology Policy, Executive Office of the President)

Windows Compute Cluster Server 2003
Simplified cluster deployment, job submission and status monitoring
Better integration with existing Windows infrastructure allowing customers to leverage existing technology and skill-sets
Familiar development environment allows developers to write parallel applications from within the powerful Visual Studio IDE

Leveraging Existing Windows Infrastructure
[Diagram labels]
Operations manager
Systems Management Server
Windows Update services
Secure job execution
Remote Installation services
Admin console
Performance monitor
Command line interface
Kerberos authentication
Resource management
Group policies
Integration with IT infrastructure
Job scheduler
Secure MPI

CCS Key Features
Node deployment and administration
  Task-based configuration for head and compute nodes
  UI and command line-based node management
  Monitoring with Performance Monitor (Perfmon), Microsoft Operations Manager (MOM), Server Performance Advisor (SPA), and 3rd-party tools
Integration with existing Windows and management infrastructure
  Integrates with Active Directory, Windows security technologies, management, and deployment tools
Extensible job scheduler
  3rd-party extensibility at job submission and/or job assignment
  Submit jobs from command line, UI, or directly from applications
  Simple job management, similar to print queue management
Secure and performant MPI
  User credentials secured in job scheduler and compute nodes
  MPI stack based on MPICH2 reference implementation
  Support for high performance interconnects through Winsock Direct
Integrated development environment
  OpenMP support in Visual Studio, Standard Edition
  Parallel debugger in Visual Studio, Professional Edition

HPC Institutes
University of Virginia, Charlottesville, VA U.S.A.
University of Tennessee, Knoxville, TN U.S.A.
Cornell Theory Center, Ithaca, NY U.S.A.
University of Utah, Salt Lake City, UT U.S.A.
TACC – University of Texas, Austin, TX U.S.A.
Southampton University, Southampton, UK
HLRS – University of Stuttgart, Stuttgart, Germany
Shanghai Jiao Tong University, Shanghai, PRC
Tokyo Institute of Technology, Tokyo, Japan
Nizhni Novgorod University, Nizhni Novgorod, Russia
National Center for Supercomputing Applications, IL U.S.A.

An Example Of Porting To Windows: Weather research and forecasting model
Large collaborative effort, led by NCAR, to develop next-generation community model with direct path to operations
Applications
  Atmospheric research
  Numerical weather prediction
  Coupled modeling systems
Current release WRFV2.1.2
  ~1/3 million lines, Fortran 90 and some C using MPI, OpenMP
  Traditionally developed for Unix HPC systems
  Two dynamical cores
  Full range of physics options
Rapid community growth – more than 3,000 registered users
Operational capabilities
  U.S. Air Force Weather Agency
  National Centers for Environmental Prediction (NOAA)
  KMA (Korea), IMD (India), CWB (Taiwan), IAF (Israel), WSI (U.S.)

WRF On Windows
Motivation
  Extend the range of systems available to WRF users
  Stability and consistency with respect to Linux
  Take advantage of Microsoft and 3rd party (e.g., Portland Group) development tools, environments
WRF ported under SUA and running on development AMD64 clusters using Compute Cluster Pack
  Of 360k lines, fewer than 750 changed to compile and link under SUA
  Largest number of changes involved the WRF build mechanism (Makefiles, scripts)
  Level of effort and nature of tasks were not unlike porting to any new version of UNIX
Details of porting experience described in a white paper available from Microsoft and at http://www.mmm.ucar.edu/wrf/WG2/wrf_port_notes.htm

An Example Of Application Integration With HPC: Scaling Excel
[Diagram labels]
Desktop: Excel “12”
Servers: Excel Services
Clusters: Excel Services on Windows Compute Cluster Server 2003
Excel “12” client: Author and Publish Spreadsheets; Open Spreadsheet/Snapshot
Browser (100% thin): View and Interact
Custom applications: Web Services Access
Excel Services

Excel And Windows CCS
Customer requirements
  Faster spreadsheet calculation
  Free-up client machines from long-running calculations
  Time/mission critical calculations that must run
  Parallel iterations on models
Example scenarios
  Schedule overnight risk calculations
  Farm out analytical library calculations
  Scale-out Monte Carlo iterations, parametric sweeps
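
A scale-out Monte Carlo run works because the iterations are independent: they can be split into chunks, each chunk computed separately, and the partial results combined. The sketch below illustrates that pattern in plain Python. It is not Excel Services or Compute Cluster Server code; the pi estimate, the function names, and the use of a local thread pool as a stand-in for cluster nodes are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """One independent work item: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # a per-chunk seed keeps chunks independent and repeatable
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks=8, samples_per_chunk=20_000, workers=4):
    """Split the iterations into chunks, run them concurrently, and combine the counts."""
    seeds = range(chunks)
    # The thread pool only stands in for cluster nodes; on a real cluster each
    # chunk would be submitted as its own task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(lambda seed: count_hits(seed, samples_per_chunk), seeds))
    return 4.0 * sum(hits) / (chunks * samples_per_chunk)


estimate = estimate_pi()
print(f"pi is approximately {estimate:.3f}")
```

A parametric sweep has the same shape: replace the seed with the parameter value and keep each result instead of summing them.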

Evolution Of HPC
Evolving Scenarios | Key Factors
Batch computing on supercomputers
  Compute cycles are scarce and require careful partitioning and allocation
  Cluster systems administration major challenge
  Applications split into UI and compute parts
Interactive computing on departmental clusters
  Compute cycles are cheap
  Interactive applications integrate UI/compute parts
  Emergence of turnkey personal clusters
Complex workflow spanning applications
  Compute and data resources are diffused throughout the enterprise
  Distributed application, systems and data management is the key source of complexity
  Multiple applications are organized into complex workflows and data pipelines
  Focus on service orientation and web services
[Diagram labels: Manual, batch execution; Interactive Computation and Visualization; IT Mgr; SQL]

Cheap Cycles And Personal Supercomputing
IBM Cell processor
  256 Gflops today
  4 node personal cluster: 1 Tflops
  32 node personal cluster: Top100
Microsoft Xbox
  3 custom PowerPCs + ATI graphics processor
  1 Tflops today
  $300
  8 node personal cluster: “Top100” for $2500 (ignoring all that you don’t get for $300)
Intel many-core chips
  “100’s of cores on a chip in 2015” (Justin Rattner, Intel)
  “4 cores”/Tflop: 25 Tflops/chip
The key challenge
  How to program these things
  Concurrent programming will be an important area of investments for all of Microsoft (not just HPC)
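
The cluster figures on this slide follow from multiplying per-node numbers. The sketch below reproduces that arithmetic from the slide's own inputs. Two assumptions are made: "100's of cores" is read as 100 cores, and the $2500 figure is taken to be a rounding of eight $300 units.

```python
CELL_GFLOPS = 256        # IBM Cell processor, from the slide
XBOX_TFLOPS = 1.0        # Microsoft Xbox, from the slide
XBOX_PRICE = 300         # dollars, from the slide
CORES_PER_CHIP = 100     # assumption: "100's of cores" read as 100
CORES_PER_TFLOP = 4      # from the slide's '"4 cores"/Tflop'


def cluster_tflops(nodes, gflops_per_node):
    """Aggregate peak rate of a cluster in Tflops, ignoring interconnect losses."""
    return nodes * gflops_per_node / 1000


cell_4 = cluster_tflops(4, CELL_GFLOPS)      # about 1 Tflops
cell_32 = cluster_tflops(32, CELL_GFLOPS)    # about 8 Tflops
xbox_8_tflops = 8 * XBOX_TFLOPS
xbox_8_cost = 8 * XBOX_PRICE                 # hardware only
chip_tflops = CORES_PER_CHIP / CORES_PER_TFLOP

print(cell_4, cell_32, xbox_8_tflops, xbox_8_cost, chip_tflops)
```

These are peak numbers; sustained performance on real applications depends on how well the code can be parallelized, which is the challenge the slide goes on to name.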

“Grid Computing”
A catch-all marketing term
  Desktop cycle-stealing
  Managed HPC clusters
  Internet access to giant, distributed repositories
  Virtualization of data center IT resources
  Out-sourcing to “utility data centers”
  “Software as a service”
  Parallel databases

HPC Grids And Web Services
Compute grid
  Forest of clusters
  Coordinated scheduling of resources
Data grid
  Distributed storage facilities
  Coordinated management of data
Web Services
  Glue for heterogeneous platforms/applications/systems
  Cross- and intra-organization integration
  Standards-based distributed computing
  Interoperability and composability

Technologies
Platform
  Windows Server 2003 SP1 64-bit Edition
  x64 processors (Intel EM64T and AMD Opteron)
  Ethernet, Ethernet over RDMA and Infiniband support
Administration
  Prescriptive, simplified cluster setup and administration
  Scripted, image-based compute node management
  Active Directory based security
  Scalable job scheduling and resource management
Development
  MPICH-2 from Argonne National Labs with performance and security enhancements
  Cluster scheduler programmable via Web Services and DCOM
  Visual Studio 2005 – OpenMP, Parallel Debugger
  Partner delivered Fortran compilers and numerical libraries

Head Node Installation
Head Node installs only on x64
  Windows 2003 Compute Cluster Edition
  Windows 2003 SP1 Standard And Enterprise
  Windows 2003 R2
Installation
  Leverages appliance-like functionality
  Scripted installation
  Warnings if system is misconfigured
  To Do list to assist with final configuration
Walkthrough
  Windows Server 2003 is installed on the head node
  System may have been pre-installed using OPK
  User launches Compute Cluster Kit setup
  To Do list starts up, guiding User through next steps
  User joins Active Directory domain
  User installs IP over IB drivers for InfiniBand cards if not pre-installed
  Wizard assists with multi-NIC routing and configuration
  Remote Installation Service is configured for imaging compute nodes

Compute Node Installation
Automated installation
  Remote Installation Service provides simple imaging solution
  May use third-party system imaging tools for compute nodes
  Requires private network
Walkthrough
  User racks up compute nodes
  Starts Add Node wizard
  Powers up a group of compute nodes
  Compute nodes PXE boot
  RIS and installation scripts will
    Install operating system: W2K3 SP1
    Install drivers
    Join appropriate domain
    Install compute cluster software (CD2)
    Join cluster
  Exiting wizard turns off RIS
[Diagram labels: Corpnet; Infiniband; Ethernet; Head Node; Compute Node (x2)]

Node Management
Not building a new systems management paradigm
  Leveraging Windows infrastructure for simple management
  MMC, Perfmon, Event Viewer, Remote Desktop
  Can integrate with enterprise management infrastructure, such as Microsoft Operations Manager
Compute Cluster MMC snap-in
  Supports specific actions
    Pause Node
    Resume Node
    Open CD Drive
    Reboot Node
    Execute Command
    Remote Desktop Connection
    Start PerfMon
    Delete
    Properties
  Can operate on multiple nodes at once

Compute Cluster Admin Console
[Screenshot: menu bar File, Action, View, Favorites, Window, Help; tree: Compute Cluster Admin Console, Bio Lab 1 (Compute Cluster), To Do List, Node Management, Queue Management]
Compute Node Name | Node Status | Job Status | Job Name | Job Time | Owner
Node1  | Active     | Executing | Bob’s Blast Job    | 47    | NTDEV\bobmu
Node2  | Active     | Executing | Bob’s Blast Job    | 51    | NTDEV\bobmu
Node3  | Active     | Executing | Bob’s Blast Job    | 41    | NTDEV\bobmu
Node4  | Active     | Executing | Orange Temp        | 1245  | NTDEV\suej
Node5  | Active     | Idle
Node6  | Paused     | Executing | Bob’s Blast Job    | 42    | NTDEV\bobmu
Node7  | Active     | Executing | Orange Temp        | 1245  | NTDEV\suej
Node8  | Paused     | Executing | Agent B, Matrix 27 | 60102 | NTDEV\enrico
Node9  | Paused     | Idle
Node10 | Paused     | Idle
Node11 | Active     | Idle
Node12 | Active     | Idle
Node13 | Active     | Executing | Agent B, Matrix 27 | 60102 | NTDEV\enrico
Node14 | Active     | Executing | Patching           | 465   | CC\admin
Node15 | Active     | Executing | Patching           | 680   | CC\admin
Node16 | Active     | Executing | Patching           | 465   | CC\admin
Node17 | Installing
Node18 | Installing
Node19 | Installing
Node20 | Installing

Job/Task Conceptual Model
[Diagram]
Serial Job: Task, Proc
Parallel MPI Job: Task, Proc + Proc (IPC)
Parameter Sweep Job: Task, Proc (x3)
Task Flow Job: Task (x4)
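
The four job shapes in the diagram can be expressed as one small data model: a job is a list of tasks, a task runs one or more processes, and tasks may depend on one another. The sketch below is an illustration in Python, not the Compute Cluster Server object model. The class names, fields, command strings, and the dependency layout chosen for the task flow job are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    command: str                 # hypothetical command line
    processes: int = 1           # an MPI task runs several cooperating processes
    depends_on: List[int] = field(default_factory=list)  # indices of prerequisite tasks


@dataclass
class Job:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def kind(self):
        """Classify the job by the shape of its task list."""
        if len(self.tasks) == 1:
            return "parallel MPI" if self.tasks[0].processes > 1 else "serial"
        if any(task.depends_on for task in self.tasks):
            return "task flow"
        return "parameter sweep"


serial = Job("serial", [Task("solver.exe")])
mpi = Job("mpi", [Task("mpiexec solver.exe", processes=2)])
sweep = Job("sweep", [Task(f"solver.exe --param {p}") for p in (1, 2, 3)])
flow = Job("flow", [
    Task("prepare.exe"),
    Task("solve_a.exe", depends_on=[0]),
    Task("solve_b.exe", depends_on=[0]),
    Task("merge.exe", depends_on=[1, 2]),   # assumed layout; the slide shows only four tasks
])

kinds = [job.kind() for job in (serial, mpi, sweep, flow)]
print(kinds)
```

Keeping the distinction in the task list rather than in separate job classes means a scheduler only needs to understand tasks, process counts, and dependencies.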

Job Scheduler Stack
[Diagram labels; grouping is approximate]
Layers: Interface Layer; Scheduling Layer; Execution Layer
Nodes: Client Node; Head Node; Compute Node
Interfaces: WS (WSE 3.0); COM API; Command Line Interface; User Console; Admin Console; Third-party Applications
Components: User Interface Handlers; Object Model; Queueing; Job Management; Resource Management; Node Manager (x6)
Phases: Admission; Allocation; Activation
Roles: User; Admin
Flow: Jobs/Tasks

Job Scheduler
Job scheduler provides two features: Ordering and allocation
Job ordering
  Priority-based first-come, first-serve (FCFS)
  Backfill supported for jobs with time limits
Resource allocation
  License-aware scheduling through plug-ins
  Parallel application node allocation policies
Extensible
  Core engine based on embedded SQL engine
  Resource and job descriptions are based on XML
  3rd parties can extend by plugging into submission and execution phases to implement queuing and licensing policies
Job submission
  Jobs submitted via UI, API, command line, or web service
Security
  Jobs on compute nodes execute in the security account of the submitting user, allowing secure access to networked resources
Cleanup
  Jobs executed in Job Objects on compute nodes, facilitating cleanup
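
Priority-based FCFS with backfill can be sketched as a single scheduling pass: start jobs in priority and submission order until one does not fit, work out when that blocked job could start, then let later jobs run only if they declare a time limit that ends before then. The code below is a simplified illustration of that general technique, not the product's scheduler. The `Job` fields, the `schedule_pass` function, and the CPU-count resource model are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    priority: int                # higher runs first
    cpus: int
    time_limit: Optional[int]    # minutes; None means no limit was declared


def schedule_pass(queue: List[Job], free_cpus: int,
                  running: List[Tuple[int, int]], now: int = 0) -> List[str]:
    """Return the names of jobs to start now.

    queue is in submission order; running holds (end_time, cpus) for active jobs.
    """
    # Priority first; sorted() is stable, so equal priorities stay first-come, first-serve.
    ordered = sorted(queue, key=lambda job: -job.priority)
    started = []
    i = 0
    while i < len(ordered) and ordered[i].cpus <= free_cpus:
        free_cpus -= ordered[i].cpus
        started.append(ordered[i].name)
        i += 1
    if i == len(ordered):
        return started

    # The job at position i is blocked: find when enough CPUs will be released.
    blocked = ordered[i]
    available, reserve_at = free_cpus, None
    for end_time, cpus in sorted(running):
        available += cpus
        if available >= blocked.cpus:
            reserve_at = end_time
            break

    # Backfill: a later job may start only if it fits and is certain to finish
    # before the blocked job's reserved start, which requires a time limit.
    for job in ordered[i + 1:]:
        fits = job.cpus <= free_cpus
        in_time = (job.time_limit is not None and reserve_at is not None
                   and now + job.time_limit <= reserve_at)
        if fits and in_time:
            free_cpus -= job.cpus
            started.append(job.name)
    return started


queue = [
    Job("big", priority=2, cpus=8, time_limit=120),
    Job("small-long", priority=1, cpus=2, time_limit=None),
    Job("small-short", priority=1, cpus=2, time_limit=30),
    Job("small-late", priority=1, cpus=2, time_limit=90),
]
# 4 CPUs are free now; a running job releases 4 more at minute 60.
started = schedule_pass(queue, free_cpus=4, running=[(60, 4)])
print(started)
```

In this example only the job with a 30-minute limit is backfilled: the job without a limit cannot be shown to finish in time, which is why the slide ties backfill to jobs with time limits. The sketch protects only the first blocked job and ignores end times of jobs started in the same pass.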

Queue Management
Job Management model similar to print queue management
  Leverage familiar user paradigm
Queue management operations
  Delete
  Change properties
    Priority
    Run time
    # of CPUs
    Preferred nodes
    CPUs per node
    All in one
    License parameters
    Uniform attributes
    Notification

Compute Cluster
[Screenshot: menu bar File, Action, View, Favorites, Window, Help; tree: Console Root, Bio Lab 1 (Compute Cluster), To Do List, Compute Nodes, Queue Management]
Order | Priority | Name | Owner | Status
1 | 2 | Bob’s Blast Job | Domain\Bobr | Running
    000434 | Domain\Bobr | Completed
    000435 | Domain\Bobr | Completed
    000436 | Domain\Bobr | Running – Bnode19
    000437 | Domain\Bobr | Running – Bnode20
    000438 | Domain\Bobr | Running – Bnode30
    000439 | Domain\Bobr | Running – Bnode21
    000440 | Domain\Bobr | Running – Bnode26
    000441 | Domain\Bobr | Running – Bnode27
    000442 | Domain\Bobr | Running – Bnode18
    000443 | Domain\Bobr | Queued
    000444 | Domain\Bobr | Queued
    000445 | Domain\Bobr | Queued
    000446 | Domain\Bobr | Queued
    000447 | Domain\Bobr | Queued
    000448 | Domain\Bobr | Queued
    000449 | Domain\Bobr | Queued
    000450 | Domain\Bobr | Queued
2 | 1 | Lodica Calc | Domain\Sue | Running
3 | 2 | Sue | Domain\Sue | Running
4 | 2 | Agent B, Matrix 27 | Domain\Tam.. | Running
5 | 1 | Better work this time! | Domain\Crai.. | Running
6 | 2 | Orange Temp | Domain\Ryan | Running

Networking
Focusing on industry standard interconnect technologies
  MPI implementation tuned to Winsock
  Automatic RDMA support through Winsock Direct (SAN provider required from IHV)
Gigabit Ethernet
  Expect to be the mainstream choice
  RDMA + GigE offers compelling latency
Infiniband
  Emerging as a leading high end solution
  Engaged with all IB vendors
  OpenIB group developing a Windows IB stack
  Planning to support IB in WHQL

Resources
Microsoft HPC web site (evaluation copies available)
  http://www.microsoft.com/hpc/
Microsoft Windows Compute Cluster Server 2003 community site
  http://www.windowshpc.net/
Windows Server x64 information
  http://www.microsoft.com/64bit/
  http://www.microsoft.com/x64/
Windows Server System information
  http://www.microsoft.com/wss/

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.