
Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Page 1: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Department of Electronics and Computer Science & School of Engineering Sciences, University of Southampton

All trademarks used herein are the property of their respective owners

Advanced Cluster Computing Consortium (AC3) First Annual Meeting

“Roadmaps to the Future of Cluster Computing”

Held at Cornell Theory Center, 2nd June 2000

Meeting Review by Kenji Takeda ([email protected]), School of Engineering Sciences

The author thanks Microsoft Research for their support

Page 2: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Talk Outline

• Industry Standard Cluster Computing: R&D to the Enterprise

• Future of High Performance Computing: Intel Roadmap

• Cluster Computing Roadmap: Dell

• Cluster Benchmarks: Dell and CTC

• Cluster Computing with Windows 2000: MSR

• Cluster Computing Made Easy: New Tools for Scalable Servers and Services (CTC)

• Mining Large Databases: Present and Future (CTC)

• Performance, Scalability and Future Plans: MSTI

• Cluster Computing at NCSA

• Panel Sessions

• Reflections and Conclusions


Page 3: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

AC3 Background

• Cornell Theory Center has many years of supercomputing experience

• Needed a new mission once IBM SP2 work ended
• Support computational science and push boundaries
• Formed AC3 with major industry partners: Dell, Intel and Microsoft

“Increase the space/domain where large-scale problems of computational science are effectively solved using industry standard cluster computing”

Thomas Coleman, Director, Cornell Theory Center

Page 4: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Industry Standard Cluster Computing: R&D to the Enterprise

“Cluster computing is ready for Prime Time. It doesn’t have to be hard” – David Lifka, CTC

• Proof by example
– Installed 256-CPU Dell Velocity cluster: 64 x quad 550MHz Xeons with Giganet interconnect
– Site installation took 10 hours
– Two weeks from installation to full production service
– Over 100 Cornell projects now use the cluster
• Over 60 corporate partners involved
– Want to use Windows and move away from UNIX

David Lifka, Associate Director, Cornell Theory Center

Page 5: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Industry Standard Solutions

• Microsoft Windows NT/2000
– Market volumes drive the market in new directions
– 80% of the market is Windows NT/2000
– Administration skill base widely available
– Future killer apps
– New generation brought up on Windows; they expect a high level of functionality and more than a command-line interface
• Big Iron supercomputers
– 4-5 times more expensive than a Windows cluster solution
– High maintenance costs
– Performance and reliability gap has closed

David Lifka, Associate Director, Cornell Theory Center

Page 6: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Windows 2000 Issues

• Major reliability improvements over NT 4.0
• Windows preserves all aspects of the server market
• Deployable across the enterprise
• Coordinated development
• Desktop to teraflops with one OS, leading to lower TCO and consistent user interfaces
• CTC moving all its services to Windows 2000:
– Email, print servers, backup, file servers, web servers, etc.

David Lifka, Associate Director, Cornell Theory Center

Page 7: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

CTC Systems Growth

• AC3 Velocity cluster has spawned huge interest
• New clusters coming online:
– Velocity+: 64 x dual 733MHz PIII systems with Giganet
– National Plant Genomics cluster: 48 CPUs, Gbit Ethernet
– Social Economics Research cluster: 32 CPUs. Cheaper than upgrading memory on the existing SGI system! Looking to move US National Census data servers to Windows 2000 soon
– AFS servers for Windows 2000: 7 x dual PIII systems
– 8 serial nodes: PowerEdge 2450 servers with 1 Gbyte/node
• Testing 16- and 32-way systems (Unisys, Sequent and NEC)
• Early testing of Itanium and 64-bit Windows 2000

David Lifka, Associate Director, Cornell Theory Center

Page 8: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Future of High Performance Computing Roadmap: Intel

• Intel has been in the supercomputer business for a long time
• ASCI Red is still the world’s fastest machine, after a PIII upgrade
• Changing definition of the supercomputer:
– 1980s: Vector SMP (all custom components)
– 1990s: MPP (COTS CPUs, everything else custom)
– 2000s: Clusters (COTS everything)
• Why has clustering only now taken off?
– PCs have closed the performance gap
– COTS networking has reached serious performance levels with Gigabit Ethernet, Giganet, Myrinet…

Timothy Mattson, Senior Research Scientist, Intel Corporation

Page 9: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Intel Processor Roadmap

[Processor roadmap chart, 1999-2002 – IA-32 Xeon line: Cascades, Foster, future IA-32; IA-64 line: Itanium, McKinley, Madison, Deerfield]

Itanium highlights:
• 800MHz and up
• 20 ops/clock cycle
• 2 Gflops on LINPACK 1000
• 2.1 Gbytes/s bus for 4-way SMP
• 128-bit integer and FP registers

Timothy Mattson, Senior Research Scientist, Intel Corporation

Page 10: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

COTS Networking: VIA

• VIA (Virtual Interface Architecture) spearheaded by Intel, Microsoft and Compaq, plus 130 other companies
• Sets up a direct data channel that bypasses the kernel (a purely illustrative sketch follows this slide)
• VIA is here today – mature and stable
• VIA has its problems though:
– PCI bottleneck, although improving with 2nd-generation PCI-66 cards
– Targeted at clusters, not the mass market
• InfiniBand is the future…

Timothy Mattson, Senior Research Scientist, Intel Corporation
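To make the kernel-bypass idea above concrete, here is a minimal, purely illustrative C sketch of the pattern VIA formalises: buffers are registered (pinned) once, send descriptors are posted to queues the NIC reads directly from user space, and completions are polled for without any system call on the data path. The vi_* type and function names are hypothetical placeholders for illustration only, not the real VIPL API.

```c
/* Illustrative kernel-bypass send loop in the style of VIA.
 * All vi_* names are hypothetical placeholders, not real VIPL calls. */
#include <stddef.h>

typedef struct vi_connection vi_connection;               /* opaque endpoint handle */
typedef struct { void *addr; size_t len; } vi_descriptor;

/* Hypothetical primitives a user-level VIA-style library would provide: */
vi_connection *vi_connect(const char *peer);                      /* set up endpoint and queues */
int vi_register_memory(vi_connection *c, void *buf, size_t len);  /* pin pages once (kernel)    */
int vi_post_send(vi_connection *c, vi_descriptor *d);             /* enqueue, no system call    */
int vi_poll_completion(vi_connection *c);                         /* spin on completion queue   */

int send_blocks(const char *peer, char *data, size_t block, int nblocks)
{
    vi_connection *conn = vi_connect(peer);
    if (!conn)
        return -1;

    /* Registration is the only kernel-involved step; do it once up front. */
    if (vi_register_memory(conn, data, block * (size_t)nblocks) != 0)
        return -1;

    for (int i = 0; i < nblocks; i++) {
        vi_descriptor d = { data + (size_t)i * block, block };
        vi_post_send(conn, &d);            /* NIC DMAs straight from user memory */
        while (!vi_poll_completion(conn))
            ;                              /* polling: low latency, burns CPU    */
    }
    return 0;
}
```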

Page 11: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

COTS Networking: Infiniband

• Scalable, high-performance I/O for the mass market
• Extends native message passing from CPU to memory to SAN and beyond…
• Done using a Host Channel Adapter (HCA) to different I/O devices, including other nodes
• 1st-generation devices due Q3 2001
– Probably not best for HPC; optimised for small-medium (e-business) clusters
• Intel aiming “to be the leader in InfiniBand for clustering and e-business solutions”

“InfiniBand is a great hardware implementation of VIA”

Timothy Mattson, Senior Research Scientist, Intel Corporation

Page 12: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Community Cluster Development Kit

• Clusters are good for research labs but too fiddly
• They are too hard to set up and use; there is little support, too many options with no clear winners, and too many learning curves to climb
• Need fully integrated common cluster computing stacks, therefore Intel is supporting the…

Community Cluster Computing Development Kit – a snapshot of best-known methods, but not a new standard

“It’s the software, stupid!”

Timothy Mattson, Senior Research Scientist, Intel Corporation

Page 13: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Computing Roadmap: Dell

• Scalable Enterprise Computing (SEC)

• Convergence of High Availability and High Performance Computing

• HPC is a building block for SEC:

– Firewalls

– Application clusters

– Data mining engines

Reza Rooholamini, Cluster Development, Dell

[Cluster architecture diagram: parallel apps, MPI and the OS on compute nodes; master node, file server and gateway connected to the LAN/WAN]

Page 14: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Dell Cluster Solutions

• HPC product approach:
– Collaborate with universities and research institutes
– Partner with major component providers
– Prototyping, benchmarking and sizing
– Case studies and white papers

[Dell cluster solution stack diagram: PowerEdge servers, PowerVault storage, fast interconnect, VIA, message passing, operating systems, configuration tools, application development tools, parallel apps]

Reza Rooholamini, Cluster Development, Dell

Page 15: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Benchmarks (Dell)

• 32-CPU Dell test systems:
– 8 x Dell 6350 4-way SMPs: Fast Ethernet, Gigabit Ethernet, Giganet and Myrinet
– 16 x Dell 2450 2-way SMPs: Fast Ethernet, Gigabit Ethernet, Giganet and Myrinet
• NAS Parallel Benchmarks:
– Quad-processor significantly slower (30%) than dual-processor
– Single processors faster than dual-processor systems
– BUT 4-way has the best price/performance
• Giganet (MPI/Pro) better than Myrinet (MPICH-GM)

Jenwei Hsieh, Cluster Development, Dell, and George Coulouris, CTC

Page 16: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Benchmarks (CTC)

• Giganet bandwidth (a generic measurement sketch follows this slide):
– 113 Mbytes/s using the raw Giganet cLAN driver
– 87 Mbytes/s using MPI/Pro, up to 103 Mbytes/s for very large messages
• NAS Parallel Benchmarks:
– LU and BT scale linearly with Giganet, and up to 16 nodes with Fast Ethernet

Jenwei Hsieh, Cluster Development, Dell, and George Coulouris, CTC
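Bandwidth numbers like those above are usually obtained with a simple two-process streaming test. The sketch below is a generic MPI version, not the actual CTC benchmark: rank 0 repeatedly sends a large buffer to rank 1 and divides bytes moved by elapsed time. The 4 Mbyte message size and repetition count are arbitrary choices; run it with two processes over whichever interconnect and MPI implementation (MPI/Pro, MPICH, …) is being compared.

```c
/* Minimal MPI bandwidth test: rank 0 streams large messages to rank 1. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 100;
    const int nbytes = 4 * 1024 * 1024;     /* 4 Mbyte messages (arbitrary) */
    int rank;
    MPI_Status status;
    char *buf = malloc(nbytes);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);             /* start both ranks together */
    double t0 = MPI_Wtime();

    for (int i = 0; i < reps; i++) {
        if (rank == 0)
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("Bandwidth: %.1f Mbytes/s\n",
               (double)nbytes * reps / (t1 - t0) / 1.0e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```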

Page 17: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Real Application Benchmarks (CTC)

• Protein folding simulations
– Windows-based visualisation tools developed; see www.tc.cornell.edu/reports/NIH/resource/CapBiologyTools
• FEM code with 1.5 million degrees of freedom
– Superlinear scaling to 128 CPUs with PIII-733MHz and Giganet (speedup and efficiency are defined after this slide)
– Per-node CPU utilisation decreases as the number of SMP CPUs increases

Per-processor performance comparison:
Blue Horizon SP2 (222MHz): 44.3
Pentium Xeon 550MHz (W2k): 46.0
Pentium III 650MHz (Linux): 59.1
Pentium 733MHz (W2k): 59.2

[Chart: speedup vs. number of processors for the SP2 and the PIII cluster]

Jenwei Hsieh, Cluster Development, Dell, and George Coulouris, CTC
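Since the slide reports superlinear scaling, the standard definitions are worth writing down; this is textbook material rather than anything specific to the CTC runs. With $T(p)$ the run time on $p$ CPUs:

$S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}$

Superlinear scaling means $S(p) > p$, i.e. $E(p) > 1$; for the 128-CPU FEM run this says the job ran more than 128 times faster than on one CPU, which typically happens because the per-CPU working set starts to fit in cache as the mesh is partitioned.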

Page 18: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Computing with Windows 2000

• $3 million+ annual commitment to HPC research
• Supported projects include:
– MPICH on Windows 2000 (Argonne National Labs)
– NCSA VMI driver for Myrinet and Giganet
– Maui scheduler (from Utah), www.cs.byu.edu
– UTK: SInRG Grid environment
– Globus: ported to Windows NT; working on Windows 2000 support using Active Directory services
– Condor scheduler
– Parallel visualisation: Kai Li using OpenGL on Windows 2000
– NCSA: high-performance DCOM over VIA

Todd Needham, Manager of Research Programs, Microsoft

Page 19: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Enterprise Windows 2000

• Union of HPC and e-business technology
• 100% overlap of tools, e.g. cluster management
• Need to improve the out-of-the-box experience
• MS built an 800-CPU Celeron 400MHz cluster to test EP applications and DCOM scalability
• MSR Cambridge:
– Performance prediction tools as a runtime component in the user application
• MS Redmond:
– Winsock Direct, data mining, scalable servers

Todd Needham, Manager of Research Programs, Microsoft

Page 20: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Future Technologies for Windows HPC

• Parallel file systems
• Development tools and debuggers
– Toolworks and TotalView
• Parallel and scalable commercial applications
• Better desktop cluster transparency, e.g. Jack Dongarra’s Excel interface to NetSolve
• Visual Studio v7: IDE for 3rd-party plug-ins
• 64-bit Windows 2000

Todd Needham, Manager of Research Programs, Microsoft

Page 21: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Computing Made Easy: New Tools for Scalable Servers and Services

• ISIS, HORUS and ENSEMBLE virtual synchrony execution model (1987-98)
– Groups of processes with multicast comms between them
– Notification of failures and rejoins
– State transfer, allowing addition of nodes to a running job
• HORUS and ENSEMBLE are modular, with plug & play software components
– NYSE, Swiss Stock Exchange
– French Air Traffic Control
– Next-generation AEGIS system

Ken Birman, Professor, Computer Science, Cornell University

Page 22: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

QUINTET

• Focus on management
• e-Business solutions: huge real clusters managed as single entities, such as Hotmail
• Exploit high-performance networks
• Scalable cluster management
• Cluster-aware application development
• Enterprise clusters come in many flavours

No single management system is suitable for all needs

Ken Birman, Professor, Computer Science, Cornell University

Page 23: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

5 Lessons Learned for Scalability

1. Turn scale to an advantage

2. Progress under all circumstances

3. Avoid transparency at the server side (it always hurts; the last 5% is impossible)

4. Do not solve all problems in the communications stack

5. Exploit intelligent, non-portable runtimes

Ken Birman, Professor, Computer Science, Cornell University

Page 24: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Quintet Design

• Build a component framework for the design and construction of cluster management systems
• Farm Manager (a hypothetical callback sketch follows this slide)
– Node membership and failure detection
– Reliable comms and lightweight state-sharing
• Farm Services
• Cluster Designer
– Tool to construct islands of specialised clusters with farms
– Generate cluster profiles
– Collection of user interfaces and visualisation tools

Ken Birman, Professor, Computer Science, Cornell University
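As a rough illustration of what a Farm Manager’s membership and failure-detection service might expose to a cluster application, here is a small callback-style interface sketch in C. The quintet_* and farm_* names are hypothetical, invented for this note; they are not Quintet’s actual API.

```c
/* Hypothetical membership/failure-notification interface for a farm manager.
 * All names are illustrative only, not Quintet's real API. */
#include <stdio.h>

typedef struct {
    int         node_id;
    const char *hostname;
} farm_node;

typedef struct {
    void (*on_join)  (const farm_node *node, void *ctx);   /* node joined the farm   */
    void (*on_leave) (const farm_node *node, void *ctx);   /* node left cleanly      */
    void (*on_fail)  (const farm_node *node, void *ctx);   /* node declared failed   */
    void (*on_state_transfer)(const farm_node *node, void *ctx); /* bring a new node up to date */
    void *ctx;
} farm_membership_callbacks;

/* Stub registration call standing in for the farm manager's real entry point. */
static int quintet_register_membership(const char *farm_name,
                                       const farm_membership_callbacks *cb)
{
    (void)farm_name; (void)cb;
    return 0;   /* in a real system this would subscribe to membership events */
}

static void log_join(const farm_node *n, void *ctx) { (void)ctx; printf("join: %s\n", n->hostname); }
static void log_fail(const farm_node *n, void *ctx) { (void)ctx; printf("FAILED: %s\n", n->hostname); }

int main(void)
{
    farm_membership_callbacks cb = {
        .on_join = log_join, .on_leave = NULL,
        .on_fail = log_fail, .on_state_transfer = NULL, .ctx = NULL,
    };
    return quintet_register_membership("app-dev-cluster", &cb);
}
```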

Page 25: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Quintet Configuration

• Automatic component configuration for core comms
– Exploit SANs
– Security/secrecy
– Failure detection
– Membership consensus
– Message ordering
• Consensus membership (on the AC3 Velocity cluster)
– Changes: clean 200 μs, dirty 500-7000 μs
– Component membership changes: 50-70 μs
– Fault-tolerant distributed lock manager
• Lock acquire: 70-100 μs
• Node initialisation: 400 μs for 40,000 locks

Ken Birman, Professor, Computer Science, Cornell University

Page 26: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Profiles

• Application development cluster
– Process, job, installation and version control
– Debug service, distributed logging, MS Visual Studio integration and resource measurement
• Game server cluster
– 10,000-user Quake server
– Client management services, application load request routing, synchronisation, state sharing, shared VM services
• Wolfpack/MS Cluster Services compatible profile

Quintet first public release (alpha) in Q3 2000

Ken Birman, Professor, Computer Science, Cornell University

Page 27: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Mining Large Databases: Present and Future

• Data mining is reaching maturity
• DBMS technology: high availability, maintainability, seamless integration with business processes
• Current technology:
– Scalable data mining algorithms
– Consolidation in the industry
– Talk of crossing the chasm

Johannes Gehrke, Assistant Professor, Computer Science, Cornell University

Page 28: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Data Mining: Future Technology

• Autopilot: automatic algorithm and parameter selection
• Privacy: the internet may provide the first tools for users to control access to data about themselves
• Scalability: market basket data and ‘clickstream’ data, e.g. Yahoo logs 2-4 Gbytes/hr to data mine
• Data stream model (a toy sketch follows this slide):
– Model maintenance
– Change detection
– Trend detection, finding sequences in slowly moving data

Johannes Gehrke, Assistant Professor, Computer Science, Cornell University
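As a toy illustration of the “model maintenance plus change detection” idea for clickstream-style data, the C sketch below keeps two exponentially decayed counts per item, a short-horizon one and a long-horizon one, and flags an item whose recent rate runs well ahead of its long-term trend. The decay factors, threshold and item set are arbitrary illustrative choices, not anything presented in the talk.

```c
/* Toy data-stream model: per-item fast/slow decayed counts with change detection. */
#include <stdio.h>

#define NITEMS 4

typedef struct {
    double fast;   /* short-horizon decayed count (~10 recent events)  */
    double slow;   /* long-horizon decayed count (~1000 recent events) */
} item_model;

static item_model model[NITEMS];

static void observe(int item)
{
    const double fast_decay = 0.90;    /* illustrative constants */
    const double slow_decay = 0.999;
    const double threshold  = 3.0;

    /* Model maintenance: decay every item, then credit the observed one. */
    for (int i = 0; i < NITEMS; i++) {
        model[i].fast *= fast_decay;
        model[i].slow *= slow_decay;
    }
    model[item].fast += 1.0;
    model[item].slow += 1.0;

    /* Change detection: rescale the slow count to the fast horizon and compare. */
    double expected = model[item].slow * (1.0 - slow_decay) / (1.0 - fast_decay);
    if (model[item].fast > threshold * (expected + 1.0))
        printf("change detected for item %d\n", item);
}

int main(void)
{
    int stream[] = {0, 1, 2, 1, 0, 3, 3, 3, 3, 3, 3, 3};   /* item 3 suddenly becomes hot */
    for (unsigned i = 0; i < sizeof stream / sizeof stream[0]; i++)
        observe(stream[i]);
    return 0;
}
```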

Page 29: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Performance, Scalability, Future Plans: MPI Software Technology

• MSTI’s objectives in software design:
– Performance
– Scalability
– Functionality
– Ease of use
– Reliability
– Robustness
– Production-quality support at a reasonable price
– Mitigate risk, control cost of ownership

Rossen Dimitrov, MPI Software Technology

Page 30: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

MPI/Pro Features

• User-level thread safety
• Asynchronous and synchronous completion notification; user runtime switch (½ RTT quoted; a generic ping-pong sketch follows this slide)
– Interrupt-driven for lower CPU overhead, higher latency (42 μs)
– Polling for low latency, higher CPU utilisation (19 μs)
• Independent message progress
• Low CPU overhead, high degree of overlapping
• Optimised collective communications, derived datatypes, persistent mode of communication
• Increased internal concurrency
• Multi-driver support: Giganet, SMP and TCP

Rossen Dimitrov, MPI Software Technology
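One-way latencies such as the 42 μs and 19 μs figures above are conventionally reported as half the round-trip time of a small-message ping-pong. The sketch below is a generic MPI version of that measurement, not MSTI’s own test; the repetition count and one-byte message are arbitrary choices.

```c
/* Minimal MPI ping-pong: reports half the average round-trip time. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    char byte = 0;
    int rank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("Latency (1/2 RTT): %.1f microseconds\n",
               (t1 - t0) / reps / 2.0 * 1.0e6);

    MPI_Finalize();
    return 0;
}
```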

Page 31: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

MSTI Future Developments

• Support model:
– Value proposition is quality and support
– Support-only model (free downloads available)
– Goal is to make cluster computing a business
• MPI/Pro
– MPI-2 support (2001)
– Interconnect configuration tool
• Cluster CoNTroller
– Time sharing through Windows sessions
– Gang scheduling
– Windows 2000 Directory Services

Rossen Dimitrov, MPI Software Technology

Page 32: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Cluster Computing at NCSA

• NCSA: NSF-funded national center, 1986-present
• Large number of parallel computer systems:
– 7 x SGI Origin 2000 systems = 1536 processors
– 1 x Exemplar = 64 processors
– 256-processor NT supercluster
– 100 Windows NT CPUs in test beds and for serial jobs
– 100 Tbytes disk store; generating about 1 Tbyte every 2 weeks
• Applications move easily to clusters, due to source-level portability

Rob Pennington, Technical Program Manager, Cluster Computing, NCSA

Page 33: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Challenges

• Technical and application challenges
– Compilers, performance tools, MPI debugging
– Storage performance: the biggest problem, as clusters are unbalanced system architectures
– Administration tools
– Heterogeneous systems
– Integration with the Grid
• Organisational challenges
– Integration with existing infrastructure
– Managing user accounts

Rob Pennington, Technical Program Manager, Cluster Computing, NCSA

Page 34: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Clusters in the Alliance

• Three large clusters for members of the Alliance:
– NT Supercluster @ NCSA: 256 CPUs
– Roadrunner cluster @ University of New Mexico: 512 CPUs
– Argonne National Lab IBM cluster: 512 CPUs

“Develop locally, run globally”

• Local clusters used for development and parameter studies
• Require compatible environments for development and job scheduling across Windows and UNIX
• Constantly evaluating technologies – OS, CPUs, interconnect, middleware

Rob Pennington, Technical Program Manager, Cluster Computing, NCSA

Page 35: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Evolution of Cluster Systems

Rob Pennington, Technical Program Manager, Cluster Computing, NCSA

• 192 cluster CPUs in 1998; 1600+ cluster CPUs in 2000
• Job startup streamlined: from 15 mins for a 128-node job (in 1998) to 1 minute now
• Significant user requirement for serial nodes
• Reliability issues
– Windows NT nodes NEVER blue screen
– One hardware failure per 100 machines per month
– Peripheral failures only, not motherboards or CPUs
– Use an OpenGL cluster monitor tool to keep track of nodes

Page 36: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

NCSA Cluster Performance

• Quantum chromodynamics: a memory-intensive code
– Memory leaks found in HPVM, now fixed (version 1.9)
– 5% slower using dual CPUs than single CPUs
– Not suitable for quad-processor systems at all
• ARPI3D CFD code
– Code had inefficient MPI; recoded to improve performance
– Compute part works well now, MPI part stays constant
– I/O is a major bottleneck with this code
– NT scales better for I/O than Linux

Rob Pennington, Technical Program Manager, Cluster Computing, NCSA

Page 37: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Clusters Futures

• 2000: Teraflop clusters possible with 1000 x 1GHz IA-32 nodes
• 2001: Teraflop machines with around 350 IA-64 nodes, assuming 3 Gflop/s per CPU (see the arithmetic after this slide)

• Major problem is I/O bottleneck though, and SANs are expensive!

• Possible to use I/O nodes, with fibre-channel and Myrinet TCP to cross-mount file systems

Rob Pennington, Technical Program Manager, Cluster Computing, NCSA
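The node counts above are straightforward peak-performance arithmetic, taking roughly 1 Gflop/s per 1GHz IA-32 node and the slide’s assumed 3 Gflop/s per IA-64 CPU:

$1000 \times 1\,\text{Gflop/s} = 1\,\text{Tflop/s}, \qquad \frac{1\,\text{Tflop/s}}{3\,\text{Gflop/s per node}} \approx 333\ \text{nodes}$

which is consistent with the “around 350” IA-64 nodes quoted.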

Page 38: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

High Performance Computing with Clusters: Panel Session I

• For big application codes, use Cygwin tools for building (www.sourceware.cygwin.com)

• Use scripts to wrap native Windows compilers, making them look like UNIX ones

• Can be tedious to get around compiler flag and filename conventions

• Wish list:
– C++ standard compliance

– C++ compiler robustness

– Performance and debugging tools

Page 39: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

High Performance Computing with Clusters: Panel Session II

• Molecular dynamics code users are happy:
– Velocity (550MHz Xeons) is 2.2-2.4 times faster than the previous SP2
– Velocity+ (733MHz PIII) is 1.3-1.4 times faster than the Velocity cluster

• Intel C compiler about 30% faster than MS Visual C++ on stochastic processes code (lots of random number generation)

• Windows 2000 runs faster than Windows NT on real applications

Page 40: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Future for HPC Software on Windows Platform: Panel Session I

• Open question (from NAG): can Windows provide a transparent cluster? Pieces are coming together
• Software vendors cite supporting different flavours of Linux as a problem
• Intel maintains that HPC is very important to them
• Todd Needham of Microsoft speculates: “Windows 2000 on Itanium Rocks!”
• Microsoft sees 100% overlap in OS components for Enterprise Computing and HPC

Page 41: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Future for HPC Software on Windows Platform: Panel Session II

• MPI Software Technology Inc. see many different types of HPC users
– Support Windows NT, Windows 2000, and different UNIXes
– Different problems with different OSs
– Windows: memory pinning time higher than Linux; better security than Linux; lack of tools on Windows is crippling
– Linux: SMP support not great; many variants are a problem
• Windows cluster out-of-the-box experience not great
• Not many production deployments of Windows clusters, so people are not taking it seriously – yet!
• The Beowulf group has a strong quasi-community

Page 42: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Future for HPC Software on Windows Platform: Panel Session III

Five-year prognosis for Windows clusters:
• Performance + security + IT + TCO issues prevail
• Bright future with a level playing field; good for competition
• Academia will be biased towards using Linux
• Outside academia will be more Windows 2000 oriented

• User Beware! Petaflop computing will need a new paradigm though, to supersede MPI

Page 43: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

Reflections and Conclusions

• Cornell Theory Center has demonstrated industry standard Windows clusters by example
• Performance is as good as, or better than, Big Iron
• HPC is becoming mainstream as a business tool
• Convergence in hardware and software between e-business/enterprise computing and HPC
• Cluster management software is maturing fast
• Lack of software development tools is a key problem

Page 44: Advanced Cluster Computing Consortium (AC3) First Annual Meeting

More Information…

For more information about the Cornell Theory Center Advanced Cluster Computing Consortium (AC3) see:

http://www.tc.cornell.edu/

For more information about Windows Clusters in general see:

http://www.windowsclusters.org