PITTSBURGH SUPERCOMPUTING CENTER
Ultimate Integration
Joseph Lappa
Pittsburgh Supercomputing Center ESCC/Internet2 Joint Techs Workshop
Agenda
• Supercomputing 2004 Conference
• Application: Ultimate Integration
• Resource Overview
• Did it work?
• What did we take from it?
Supercomputing 2004
• Annual Conference
  – Supercomputers
  – Storage
  – Network hardware
• Bandwidth Challenge
  – Original reason for the application
  – Didn’t apply due to time constraints
Application Requirements
• Runs on Lemieux (PSC’s supercomputer)
• Application Gateways (AGWs)
• Cisco CRS-1
  – 40 Gb/sec OC-768 cards
  – Few exist
• Single application
• Usable with another demo on the show floor, if possible
Ultimate Integration Application
• Checkpoint Recovery System program
  – Garden-variety Laplace solver instrumented to save its memory state in checkpoint files
  – Checkpoints memory to remote network clients
  – Runs on 34 Lemieux nodes
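The checkpointing idea can be sketched in a few lines. This is a toy model, not PSC's actual code: a small Jacobi/Laplace relaxation that periodically serializes its full state to a blob, standing in for the demo solver that streamed its memory state to remote checkpoint servers over the network.

```python
# Toy sketch (not PSC's code): a Laplace solver that checkpoints its state.
import pickle

def jacobi_step(grid):
    """One Jacobi relaxation sweep over the interior of a 2-D grid."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1])
    return new

def checkpoint(step, grid):
    """Serialize solver state; the demo sent such blobs out over GigE."""
    return pickle.dumps({"step": step, "grid": grid})

def restore(blob):
    state = pickle.loads(blob)
    return state["step"], state["grid"]

# Hot top edge, cold everywhere else.
n = 8
grid = [[0.0] * n for _ in range(n)]
grid[0] = [100.0] * n

blob = None
for step in range(1, 51):
    grid = jacobi_step(grid)
    if step % 10 == 0:            # checkpoint every 10 sweeps
        blob = checkpoint(step, grid)

step, restored = restore(blob)
assert step == 50 and restored == grid   # recovery reproduces the state
```

In the real system the blob went to remote checkpoint receivers rather than staying in memory, but the recover-exactly-what-was-saved property is the same.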
Lemieux TCS System
• 750 Compaq AlphaServer ES45 nodes (SMP)
  – Four 1 GHz Alpha processors
  – 4 GB of memory
• Interconnection
  – Quadrics cluster interconnect
  – Shared memory library
Application Gateways
• 750 GigE connections are very expensive
• Reuse the Quadrics network to attach cheap Linux boxes with GigE
  – 15 AGWs, each with:
    • Single-processor Xeon
    • 1 Quadrics card
    • 2 Intel GigE cards
  – Each GigE card maxes out at 990 Mb/sec
  – Only need 30 GigE to fill the link to TeraGrid
  – Web100 kernel
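A quick check of the slide's numbers (my arithmetic, not from the talk): 30 GigE ports at ~990 Mb/s each is just under 30 Gb/s, which is why 15 dual-port AGWs were enough to fill the TeraGrid link.

```python
# Back-of-the-envelope AGW capacity check.
gige_rate_mbps = 990        # per-port ceiling quoted on the slide
ports_per_agw = 2           # 2 Intel GigE cards per AGW
agws = 15

aggregate_gbps = gige_rate_mbps * ports_per_agw * agws / 1000
print(aggregate_gbps)       # 29.7 Gb/s from 30 GigE ports
```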
Network
• Cisco 6509
  – Sup720
  – WS-X6748-SFP
  – Two WS-X6704-10GE
• Used 4 10GE interfaces
• OSPF load balancing was my real worry
  – >30 GigE streams over 4 links
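The load-balancing worry comes from per-flow hashing: equal-cost multipath spreads flows, not packets, so the split over 4 links only evens out with many flows. The sketch below simulates that with an illustrative CRC-based hash and made-up flow tuples; it is not Cisco's actual hash algorithm.

```python
# Illustrative ECMP simulation: hash each of 30 flows onto one of 4 links.
import zlib
from collections import Counter

links = Counter()
for src in range(30):                            # 30 GigE sender streams
    flow = f"10.0.0.{src}:5001->192.168.1.1:5001"
    links[zlib.crc32(flow.encode()) % 4] += 1    # pick 1 of 4 10GE links

print(dict(links))   # close to 30/4 flows per link, but rarely exactly even
```

With only a handful of heavy flows the hash can easily land several on the same 10GE link, which is why >30 parallel streams made the 4-way split workable.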
Network
• Cisco CRS-1
  – 40 Gb/sec per slot
  – 16 slots
• For the demo:
  – Two OC-768 cards
    • Ken Goodwin’s and Kevin McGratten’s big worry was the OC-768 transport
  – Two 8-port 10GE cards
• Running production IOS-XR code
  – Had problems with tracking hardware
• Ran both routers without 2 of the switching fabrics, with no effect on traffic
Network
• Cisco CRS-1
  – One in the Westinghouse machine room
  – One on the show floor
• Forklift needed to place it
  – 7 feet tall
  – 939 lbs empty
  – 1657 lbs fully loaded
The Magic Box
• Stratalight OTS 4040 transponder “compresses” the 40 Gb/s signal to fit into the spectral bandwidth of a traditional 10G wave
  – http://www.stratalight.com/
  – Uses proprietary encoding techniques
• The Stratalight transponder was connected to the mux/demux of the 15454 as an alien wavelength
Time Dependencies
• OC-768 wasn’t worked on until one week before the conference
OC-768
Where Does the Data Land?
• Lustre filesystem
  – http://www.lustre.org/
• Developed by Cluster File Systems
  – http://www.clusterfs.com/
• POSIX-compliant, open-source, parallel file system
• Separates metadata and data objects to allow for speed and scaling
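The metadata/data split can be illustrated with a toy model (this is not Lustre's code; class names and the object-ID scheme are made up for the sketch): a metadata server only records which objects on which OSTs make up a file, while the bulk data is written straight to the OSTs, so large I/O never touches the MDS.

```python
# Toy model of Lustre's MDS/OST separation.
class MDS:
    """Metadata server: maps filenames to a striping layout, holds no data."""
    def __init__(self):
        self.inodes = {}     # filename -> list of (ost, object_id)

    def create(self, name, stripes, osts):
        layout = [(osts[i % len(osts)], f"{name}#{i}") for i in range(stripes)]
        self.inodes[name] = layout
        return layout

class OST:
    """Object storage target: holds the actual data objects."""
    def __init__(self):
        self.objects = {}

    def write(self, obj_id, data):
        self.objects[obj_id] = data

mds, osts = MDS(), [OST() for _ in range(5)]   # 5 OSTs, as on the show floor
layout = mds.create("ckpt.dat", stripes=5, osts=osts)

data = bytes(range(25))
chunk = len(data) // len(layout)
for k, (ost, obj_id) in enumerate(layout):     # client writes stripes directly
    ost.write(obj_id, data[k*chunk:(k+1)*chunk])

# Reads consult the MDS once for the layout, then go straight to the OSTs.
assert b"".join(ost.objects[oid] for ost, oid in mds.inodes["ckpt.dat"]) == data
```

Because every OST absorbs only its stripe of the traffic, adding OSTs scales bandwidth without the metadata server becoming a bottleneck.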
The Show Floor
• 8 checkpoint servers with 10GigE and InfiniBand connections
• 5 Lustre OSTs connected via InfiniBand, with 2 SCSI disk shelves (RAID5)
• Lustre metadata server (MDS) connected via InfiniBand
The Demo
How well did it run?
• Laplace solver w/ checkpoint recovery
  – Using 16 Application Gateways (32 GigE connections): 31.1 Gb/s
  – Only 32 Lemieux nodes were available
• iperf
  – Using 17 Application Gateways + 3 single-GigE-attached machines: 35 Gb/s
• Zero SONET errors reported on the interface
• Over 44 TB were transferred
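For context on that 31.1 Gb/s figure (my arithmetic, not from the talk): with each GigE port capped near 990 Mb/s, 32 ports give a hard ceiling of 31.68 Gb/s, so the application was running at roughly 98% of the attainable rate.

```python
# Sanity check: observed throughput vs. the 32-port GigE ceiling.
ceiling_gbps = 32 * 990 / 1000          # 31.68 Gb/s aggregate line rate
observed_gbps = 31.1                    # from the demo
utilization = round(observed_gbps / ceiling_gbps * 100, 1)
print(utilization)                      # 98.2 (% of available capacity)
```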
The Team
Just Demoware?
• AGWs
  – qsub command now has an AGW option
  – Can do accounting (and possibly billing)
  – MySQL database with Web100 stats
  – Validated that the AGW was a cost-effective solution
• OC-768 metro can be done by mere mortals
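The accounting idea above amounts to logging per-connection counters and summing them per user. A hypothetical sketch, with sqlite3 standing in for MySQL; the table schema and column names are illustrative, not PSC's actual database of Web100 stats.

```python
# Hypothetical AGW accounting table fed from Web100-style per-connection stats.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE agw_transfers (
                  user TEXT, agw TEXT, data_bytes_out INTEGER)""")

# One row per connection, e.g. populated from the Web100 kernel's counters.
db.executemany("INSERT INTO agw_transfers VALUES (?, ?, ?)",
               [("alice", "agw01", 4 * 10**9),
                ("alice", "agw02", 6 * 10**9),
                ("bob",   "agw01", 2 * 10**9)])

# Accounting (or billing) query: total bytes moved per user.
for user, total in db.execute(
        "SELECT user, SUM(data_bytes_out) FROM agw_transfers GROUP BY user"):
    print(user, total)
```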
Just Demoware??
• Application receiver
  – Laplace solver ran at PSC
  – Checkpoint receiver program tested/run at both NCSA and SDSC
    • Ten IA64 compute nodes as receivers
    • ~10 Gb/sec network to network (/dev/null)
      – 990 Mb/sec × 10 streams
Thank You