The IEEE CS Task Force on Cluster Computing (TFCC). William Gropp, Mathematics and Computer Science, Argonne National Lab, www.mcs.anl.gov/~gropp. Thanks to Mark Baker, University of Portsmouth, UK, http://www.dcs.port.ac.uk/~mab

Page 1: The IEEE CS Task Force on Cluster Computing (TFCC)

The IEEE CS Task Force on Cluster Computing (TFCC)

William Gropp
Mathematics and Computer Science
Argonne National Lab
www.mcs.anl.gov/~gropp

Thanks to Mark Baker
University of Portsmouth, UK
http://www.dcs.port.ac.uk/~mab

Page 2: The IEEE CS Task Force on Cluster Computing (TFCC)

A Little History

• In 1998 there was obviously huge interest in clusters, so it seemed natural to set up a focused group in this area.
• A Cluster Computing Task Force was proposed to the IEEE CS.
• The TFCC was approved and started operating in February 1999; it has been going for just over 2 years.

Page 3: The IEEE CS Task Force on Cluster Computing (TFCC)

Proposed Activities

• Act as an international forum to promote cluster computing research and education, and participate in setting up technical standards in this area.
• Be involved with issues related to the design, analysis, and development of cluster systems as well as the applications that use them.
• Sponsor professional meetings, produce publications, set guidelines for educational programs, and help co-ordinate academic, funding agency, and industry activities.
• Organize events and hold a number of workshops that would span the range of activities sponsored by the Task Force.
• Publish a bi-annual newsletter to help the community keep abreast of activities in the field.

Page 4: The IEEE CS Task Force on Cluster Computing (TFCC)

IEEE CS Task Forces

• A TF is expected to have a finite term of existence, normally a period of 2-3 years; continued existence beyond that point is generally not appropriate.
• A TF is expected either to increase its scope of activities such that establishment of a Technical Committee (TC) is warranted, or to be merged into existing TCs.
• The TFCC will submit an application to the CS to become a TC later this year.

Page 5: The IEEE CS Task Force on Cluster Computing (TFCC)

Why a separate TFCC?

• It brings together all the activities and technologies used with Cluster Computing into one area, so instead of tracking four or five IEEE TCs there is one.
• Cluster Computing is NOT just Parallel, Distributed, OSs, or the Internet; it is a mix of them all, and consequently different.
• The TFCC is an appropriate body for focusing activities and publications associated with Cluster Computing.

Page 6: The IEEE CS Task Force on Cluster Computing (TFCC)

http://www.ieeetfcc.org

Page 7: The IEEE CS Task Force on Cluster Computing (TFCC)

TFCC Mailing Lists

Currently three email lists have been set up:

• [email protected]: a discussion list open to anyone interested in the TFCC; see the TFCC page for information on how to subscribe.
• [email protected]: a closed executive committee mailing reflector.
• [email protected]: a closed advisory committee mailing reflector.

Page 8: The IEEE CS Task Force on Cluster Computing (TFCC)

Annual Conference: ClusterXY

• 1st IEEE International Workshop on Cluster Computing (Cluster 1999), Melbourne, Australia, December 1999; about 105 attendees from 16 countries. http://www.clustercomp.org
• 2nd IEEE International Conference on Cluster Computing (Cluster 2000), Chemnitz, Germany, November 2000; anticipate 160 attendees. http://www.tu-chemnitz.de/cluster2000
• 3rd IEEE International Conference on Cluster Computing (Cluster 2001), Newport Beach, California, October 8-11, 2001; expect 250-300 attendees. http://andy.usc.edu/cluster2001

Page 9: The IEEE CS Task Force on Cluster Computing (TFCC)

Associated Events: GRID’XY

• 1st IEEE/ACM International Workshop on Grid Computing (Grid2000), Bangalore, India, December 17, 2000 (attendees from 15 countries). http://www.gridcomputing.org
• 2nd IEEE/ACM International Workshop on Grid Computing (Grid2001), at SC2001, November 2001

Page 10: The IEEE CS Task Force on Cluster Computing (TFCC)

Supercomputing

• “Birds of a Feather” sessions at SC99 and SC2000.
• The aims of the meetings are to gather interested parties and bring them up to date, and also to present a set of short talks and start a discussion on a variety of topics.
• There will probably be another at SC01, depending on community interest.

Page 11: The IEEE CS Task Force on Cluster Computing (TFCC)

Other Activities

• Book donation program
• Cluster Computing Archive: www.ieeetfcc.org/ClusterArchive.html
• TopClusters Project: www.TopClusters.org
• TFCC Whitepaper: www.dcs.port.ac.uk/~mab/tfcc/WhitePaper
• TFCC Newsletter: www.eg.bucknell.edu/~hyde/tfcc

Page 12: The IEEE CS Task Force on Cluster Computing (TFCC)

TopClusters Project

http://www.TopClusters.org

• TFCC collaboration with the Top500 project.
• Numeric, I/O, Web, Database, and Application level benchmarking of clusters.
• Joint BOF with Top500 at SC2000 on cluster-based benchmarking.
• Ongoing effort…

Page 13: The IEEE CS Task Force on Cluster Computing (TFCC)

TFCC Whitepaper

• A Whitepaper on Cluster Computing, submitted to the International Journal of High-Performance Applications and Supercomputing, November 2000.
• Snapshot of the state of the art of Cluster Computing.
• Preprint: www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/

Page 14: The IEEE CS Task Force on Cluster Computing (TFCC)

TFCC Membership

• Over 300 registered members.
• Free membership open to all, but a few benefits may be restricted (reduced registration fee for IEEE members).
• Over 450 on the TFCC mailing list <[email protected]>

Page 15: The IEEE CS Task Force on Cluster Computing (TFCC)

Future Plans

• We plan to submit an application to the IEEE CS Technical Activities Board (TAB) to attain full Technical Committee status.
• The TAB sees the TFCC as a success, and we hope that our application will be successful.
• If we achieve TC status, we will need the continuing help of the TFCC's current volunteers, and we will need to encourage new ones.

Page 16: The IEEE CS Task Force on Cluster Computing (TFCC)

Summary

• A successful conference series has been started, with commercial sponsorship.
• Promoting cluster-based technologies through TFCC sponsorship.
• Helping the community with our book donation program.
• Engendering debate and discussion through the mailing forum.
• Keeping the community informed with our information-rich TFCC Web site.

Page 17: The IEEE CS Task Force on Cluster Computing (TFCC)

Scalable Clusters

• TopCluster.org list:
    • 26 clusters with 128+ nodes
    • 8 with 500+ nodes
    • 34 with 64-127 nodes
    • Most run Linux
    • Most dedicated to applications
• Where are scalable tools developed and tested?
• Caveats:
    • Does not include MPP-like systems (IBM SP, SGI Origin, Compaq, Intel TFLOPs, etc.)
    • Not a complete list
    • Only clusters explicitly contributed to topcluster.org

Page 18: The IEEE CS Task Force on Cluster Computing (TFCC)

What is Scalability?

• Most common definition in use:
    • Works for n+1 nodes if it works for n, for small n
• Practical definition:
    • Operations complete “fast enough”
        • 0.5 to 3 seconds for “interactive”
    • Operations are reliable
        • Approach to scalability must not be fragile

Page 19: The IEEE CS Task Force on Cluster Computing (TFCC)

Issues in Clusters and Scalability

• Developing and Testing Tools
    • Requires convenient access to large-scale system
    • Can this co-exist with production computing?
• Too many different tools
    • Why not adopt Unix philosophy?
• Example solution: Scalable Unix Tools
• Following slides thanks to Rusty Lusk and Emil Ong

Page 20: The IEEE CS Task Force on Cluster Computing (TFCC)

What Are the Scalable Unix Tools?

• Parallel versions of common Unix commands like ps, ls, cp, …, with appropriate semantics
• A few new commands in the same spirit but without a serial counterpart
• Designed for users
• New this spring: release of a high-performance implementation based on MPI
• One of the original “official” Ptools projects
• Original definition published in Proceedings of the Scalable High Performance Computing Conference
    • http://www.mcs.anl.gov/~gropp/papers/1994/shpcc-paper.ps

Page 21: The IEEE CS Task Force on Cluster Computing (TFCC)

Motivation

• Basic Unix commands (ls, grep, find, …) are quintessential tools.
• Simple syntax and semantics (except maybe find syntax)
• Have same component interface (lines of text, stdin, stdout)
• Unix redirection (<, >, and especially |) allows tools to be easily combined into powerful command lines
• “Old-fashioned”: no GUI, little interactivity

Page 22: The IEEE CS Task Force on Cluster Computing (TFCC)

Motivation, continued

• Many parallel machines have Unix and at least partially distinct file systems on each node.
• A user needs simple and familiar ways to
    • Copy a file to local file space on each node
    • Find all processes running on all nodes
    • Test for conditions on all nodes
    • Avoid getting swamped with output
• On large machines these commands are not useful unless they take advantage of parallelism in their execution.

Page 23: The IEEE CS Task Force on Cluster Computing (TFCC)

Design Goals

• Familiar to Unix users
    • Similar names (we chose pt<Unix-name>)
    • Same arguments, similar semantics
• Interact well with traditional Unix commands, facilitating construction of powerful command lines
• Run at interactive speeds (requires scalability in parallel process manager startup and handling of I/O)

Page 24: The IEEE CS Task Force on Cluster Computing (TFCC)

Part I: Parallel Versions of Traditional Commands

ptcp ptmv ptrm ptln ptmkdir ptrmdir ptchmod ptchgrp ptchown pttest[ao]

• Select nodes to run on by
    • -all
    • -m <file of hostnames>
    • -M <hostlist>
        • ‘donner dasher blitzen’
        • ‘ccn%d@1-32,42,65-96’
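The two -M examples above suggest a compact host list notation. The following Python sketch shows one way such a list could be expanded into hostnames. It is not the tools' own parser: the function name `expand_hostlist` is hypothetical, and the grammar (whitespace-separated items, `PATTERN@RANGES` with `%d` replaced by each number) is an assumption inferred only from the two examples on the slide.

```python
import re


def expand_hostlist(spec):
    """Expand a host list such as 'ccn%d@1-32,42,65-96' into hostnames.

    Assumed syntax (inferred from the slide's examples, not from the tools):
      * items are separated by whitespace;
      * an item without '@' is a literal hostname;
      * an item 'PATTERN@RANGES' substitutes each number in RANGES for the
        '%d' in PATTERN, where RANGES is a comma-separated list of single
        numbers or inclusive 'lo-hi' ranges.
    """
    hosts = []
    for item in spec.split():
        if "@" not in item:
            hosts.append(item)  # plain hostname, e.g. 'donner'
            continue
        pattern, ranges = item.split("@", 1)
        for part in ranges.split(","):
            match = re.fullmatch(r"(\d+)(?:-(\d+))?", part)
            if match is None:
                raise ValueError(f"bad range {part!r} in {item!r}")
            lo = int(match.group(1))
            hi = int(match.group(2)) if match.group(2) else lo
            for n in range(lo, hi + 1):  # ranges are treated as inclusive
                hosts.append(pattern.replace("%d", str(n)))
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("donner dasher blitzen"))
    print(len(expand_hostlist("ccn%d@1-32,42,65-96")), "hosts")
```

Under these assumptions the second example names ccn1 through ccn32, ccn42, and ccn65 through ccn96.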

Page 25: The IEEE CS Task Force on Cluster Computing (TFCC)

Part II: Traditional Commands Producing Lots of Output

• ptcat, ptls, ptfind
• Have potential to produce lots of output, and the source is also of interest
• With -h option:

ptls -M node%d@1-3 -h
[node1]
myfile1
[node2]
[node3]
myfile1
myfile2
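The -h output above labels each node's lines with the node that produced them. A minimal Python sketch of that presentation follows. The function name `format_with_headers` is hypothetical, and the reading of the slide's sample (node2 has no matching files but still gets a header) is an assumption, not documented behavior of ptls.

```python
def format_with_headers(results):
    """Render per-node output with a '[host]' header before each node's lines.

    `results` is a list of (hostname, list_of_output_lines) pairs, already in
    display order. A node with no output still gets its header line, so the
    reader can tell it was queried.
    """
    lines = []
    for host, output in results:
        lines.append(f"[{host}]")
        lines.extend(output)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("node1", ["myfile1"]),
        ("node2", []),
        ("node3", ["myfile1", "myfile2"]),
    ]
    print(format_with_headers(sample))
```

Keeping the source next to the data lets the combined output still be piped through ordinary line-oriented tools.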

Page 26: The IEEE CS Task Force on Cluster Computing (TFCC)

Performance of ptcp

[Figure: time to copy a 10 MB file; total bandwidth]

• Copying a single 10 MB file to 241 nodes in 14 seconds
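As a rough check on the figure quoted above, the aggregate rate can be worked out directly. This Python sketch assumes "total bandwidth" means all bytes delivered across every node divided by elapsed time; the slide's plot may define it differently, and the function names are illustrative only.

```python
def aggregate_bandwidth_mb_per_s(file_mb, nodes, seconds):
    """Total data delivered to all nodes per second, in MB/s."""
    return file_mb * nodes / seconds


def per_node_rate_mb_per_s(file_mb, seconds):
    """Rate at which a single node receives its copy, in MB/s."""
    return file_mb / seconds


if __name__ == "__main__":
    total = aggregate_bandwidth_mb_per_s(10, 241, 14)
    single = per_node_rate_mb_per_s(10, 14)
    print(f"aggregate: {total:.1f} MB/s, per node: {single:.2f} MB/s")
```

With the slide's numbers this gives roughly 172 MB/s in aggregate, while each node individually sees about 0.7 MB/s.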

Page 27: The IEEE CS Task Force on Cluster Computing (TFCC)

Watching ptcp

ptcp -all bigfile BIGFILE

X=1
while true; do \
  ptexec -all 'echo "`hostname`: `ls -s BIGFILE \
    | awk \
    "{print \\"percentage\\" \$ (1)/98 \\" blue \
    red\\"}\"`"' | ptdisp -h
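The awk expression in the loop above divides the first field of the `ls -s` output, the file's size in blocks, by 98. One plausible reading, assumed here and not stated on the slide, is that the finished file occupies about 9800 blocks, so the quotient is a completion percentage for each node. The Python sketch below mirrors that arithmetic; the function names and the 9800-block figure are illustrative assumptions.

```python
def blocks_from_ls_s(ls_line):
    """Return the block count from one line of `ls -s` output, e.g. '4900 BIGFILE'."""
    return int(ls_line.split()[0])


def percent_complete(blocks_on_disk, blocks_per_percent=98):
    """Completion percentage, assuming the full file is 100 * blocks_per_percent blocks."""
    return blocks_on_disk / blocks_per_percent


if __name__ == "__main__":
    # Hypothetical snapshot of one node partway through the copy.
    line = "4900 BIGFILE"
    print(f"{percent_complete(blocks_from_ls_s(line)):.0f}% copied")
```

The block size reported by `ls -s` varies between systems, so the divisor would need adjusting for a different file size or platform.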

Page 28: The IEEE CS Task Force on Cluster Computing (TFCC)

Percentage of Completion

Page 29: The IEEE CS Task Force on Cluster Computing (TFCC)

Percentage of Completion

Page 30: The IEEE CS Task Force on Cluster Computing (TFCC)

Availability

• Open source
• Get from http://www.mcs.anl.gov/sut
• All source, man pages
• Configure, make, on Linux, Solaris, Irix, AIX
• Needs MPI implementation with mpirun
• Developed with Linux, MPICH, MPD, on Chiba City at Argonne

Page 31: The IEEE CS Task Force on Cluster Computing (TFCC)

Chiba City Scalability Testbed

http://www-unix.mcs.anl.gov/chiba/

Page 32: The IEEE CS Task Force on Cluster Computing (TFCC)

Some Other Efforts in Scalable Clusters

• Large Programs
    • DOE Scientific Discovery through Advanced Computing (SciDAC)
    • NSF Distributed Terascale Facility (DTF)
• OSCAR
    • Goal is a “cluster in a box” CD
• PVFS (Parallel Virtual File System)
• Many Smaller Efforts
    • www.beowulf.org, etc.
• Commercial Efforts
    • Scyld, etc.