PITTSBURGH SUPERCOMPUTING CENTER
SOS7: “Machines Already Operational”
NSF’s Terascale Computing System
SOS-7, March 4-6, 2003
Mike Levine, PSC
Outline
• Overview of TCS, the US-NSF’s Terascale Computing System.
• Answering 3 questions:
  – Is your machine living up to performance expectations? …
  – What is the MTBI (mean time between interrupts)? …
  – What is the primary complaint, if any, from users?
[See also PSC web pages & Rolf’s info.]
Q1: Performance
• Computational and communications performance is very good!
  – Alpha processors & ES45 servers: very good
  – Quadrics bw & latency: very good
  – ~74% of peak on Linpack; >76% on LSMS
• More work is needed on disk IO.
• This has been a very easy “port” for most users: easier than some Cray-to-Cray upgrades.
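To make the fraction-of-peak figures concrete, a minimal sketch that converts them into implied sustained rates. It assumes both percentages are measured against the full 6 Tf nominal peak quoted on the Compute Nodes slide; the slide does not say what configuration the LSMS run used, and LSMS is reported as ">76%", so its result is at best a lower bound.

```python
# Convert a fraction-of-peak figure into an implied sustained rate.
# Assumption: both runs are measured against the full 6 Tf nominal peak.

PEAK_TF = 6.0  # 750 nodes x 4 CPUs x 2 Gf per CPU (nominal)


def sustained_tflops(peak_tflops: float, fraction_of_peak: float) -> float:
    """Return the sustained rate implied by running at a given fraction of peak."""
    if not 0.0 <= fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must be between 0 and 1")
    return peak_tflops * fraction_of_peak


if __name__ == "__main__":
    for code, frac in [("Linpack", 0.74), ("LSMS", 0.76)]:
        print(f"{code}: ~{sustained_tflops(PEAK_TF, frac):.2f} Tf sustained")
```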
Q2: MTBI (Monthly Average)
[Chart: MTBI in hours, monthly average (axis 0 to 14 hours); bar values as labeled: 6.3, 8.7, 8.0, 11.3, 10.3, 11.3, 10.1, 9.4, 10.3, 9.9, 11.1]
• Compare with theoretical prediction of 12 hrs.
• Expect further improvement (fixing systematic problems).
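The slide does not say how the 12-hour prediction was derived. As one common way to reason about such numbers, a minimal sketch: observed MTBI is the hours in a window divided by the interrupts in it, and a series-system estimate divides a per-component MTBF by the component count, assuming independent, exponentially distributed failures where any single failure interrupts the system. The 720-hour month with 60 interrupts and the 9,000-hour per-node MTBF are hypothetical inputs chosen only to illustrate the arithmetic.

```python
import math


def observed_mtbi(hours_in_window: float, interrupts: int) -> float:
    """Mean time between interrupts over an observation window (hours)."""
    if interrupts < 0:
        raise ValueError("interrupts must be non-negative")
    if interrupts == 0:
        return math.inf  # no interrupts observed in this window
    return hours_in_window / interrupts


def series_system_mtbi(component_mtbf_hours: float, n_components: int) -> float:
    """Predicted system MTBI when any one of n identical components interrupts
    the whole system. With independent exponential failures the failure rates
    add, so the system MTBI is the component MTBF divided by n."""
    return component_mtbf_hours / n_components


if __name__ == "__main__":
    # Hypothetical month: 30 days (720 h) with 60 interrupts.
    print(observed_mtbi(720, 60))
    # Hypothetical 9,000 h per-node MTBF across 750 nodes.
    print(series_system_mtbi(9000, 750))
```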
Time Lost to Unscheduled Events
[Chart: node hours lost per week to unscheduled events (axis 0 to 4,500; total = 126,000 node hours per week)]
• Purple: nodes requiring cleanup
• Worst case is ~3%
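The chart's total of 126,000 node hours per week matches 750 compute nodes × 168 hours. A minimal sketch of the loss fraction follows; the 3,780 lost node hours are a hypothetical input, picked to reproduce the ~3% worst case.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def available_node_hours(nodes: int, hours: int = HOURS_PER_WEEK) -> int:
    """Total node hours available in the period."""
    return nodes * hours


def fraction_lost(lost_node_hours: float, nodes: int) -> float:
    """Fraction of available node hours lost to unscheduled events."""
    return lost_node_hours / available_node_hours(nodes)


if __name__ == "__main__":
    print(available_node_hours(750))
    # Hypothetical worst week: 3,780 node hours lost.
    print(f"{fraction_lost(3780, 750):.1%}")
```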
Q3: Complaints
• #1: “I need more time” (not a complaint about performance)
  – Actual usage >80% of wall clock
  – Some structural improvements still in progress
  – Not a whole lot more is possible!
• Work needed on
  – Rogue OS activity [recall Prof. Kale’s comment]
  – MPI & global reduction libraries [ditto]
  – System debugging and fragility
  – IO performance: we have delayed full disk deployment to avoid data corruption & instabilities.
  – Node cleanup: we detect & hold out problem nodes until staff clean them.
• All in all, the users have been VERY pleased. [ditto]
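Why rogue OS activity matters at this scale can be shown with a simple, illustrative model (not PSC's analysis): if each process is independently interrupted during a bulk-synchronous step with probability p, a global reduction at the end of the step must wait whenever at least one of the N processes is hit. The per-process probability of 0.001 is hypothetical.

```python
def p_step_delayed(p_per_process: float, n_processes: int) -> float:
    """Probability that at least one of n processes is interrupted during a
    step, assuming independent interruptions with probability p each. A global
    reduction at the end of the step waits for that slowest process."""
    if not 0.0 <= p_per_process <= 1.0:
        raise ValueError("p_per_process must be a probability")
    return 1.0 - (1.0 - p_per_process) ** n_processes


if __name__ == "__main__":
    # Hypothetical p = 0.001; n runs from one node's CPUs up to the full machine.
    for n in (4, 64, 512, 3000):
        print(n, round(p_step_delayed(0.001, n), 3))
```

With these numbers, a hiccup that touches only 0.1% of process-steps delays roughly 95% of steps at 3,000 processes, which is why OS interference and reduction-library performance show up together on the work list.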
Full Machine Job
This system is capable of doing big science.
TCS (Terascale Computing System) & ETF
• Sponsored by the U.S. National Science Foundation
• Serving the “very high end” for US academic computational science and engineering
• Designed to be used, as a whole, on single problems (recall the full machine job)
• Full range of scientific and engineering applications
• Compaq AlphaServer SC hardware and software technology
• #6 in Top 500 (largest open facility in the world: Nov 2001)
• TCS-1: in general production since April, 2002
• Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure)
• DTF (Distributed Terascale Facility) project to build and integrate multiple systems
  – NCSA, SDSC, Caltech, Argonne. Multi-lambda, transcontinental interconnect
• ETF aka TeraGrid (Extensible Terascale Facility), integrating TCS with DTF, forming
  – A heterogeneous, extensible scientific/engineering cyberinfrastructure Grid
Infrastructure: PSC - TCS machine room (@ Westinghouse)
(Did not require a new building; just a pipe & wire upgrade; not maxed out)
• ~8k ft² used
• ~2.5k existing room (16 yrs old)
Floor Layout
Geometrical constraints invariant twixt US & Japan
[Floor plan: switch, compute nodes, servers, disks, control]
Full System: Physical Structure
Terascale Computing System
Compute Nodes
• 750 ES45 4-CPU servers
  – +13 inline spares
  – (+2 login nodes)
• 4 EV68s/node
  – 1 GHz = 2 Gf [6 Tf]
• 4 GB memory [3.0 TB]
• 3*18.2 GB disk [41 TB]
  – System
  – User temporary
  – Fast snapshots
  – [~90 GB/s]
• Tru64 Unix
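The bracketed totals on this slide follow from the per-node figures. A quick check, assuming 2 flops per cycle per EV68 (which is what "1 GHz = 2 Gf" implies), decimal units (1 TB = 1,000 GB), and counting only the 750 compute nodes, not the spares or login nodes:

```python
NODES = 750            # compute nodes, excluding spares and login nodes
CPUS_PER_NODE = 4      # EV68 processors per ES45
GFLOPS_PER_CPU = 2.0   # assumption: 1 GHz x 2 flops/cycle
MEM_GB_PER_NODE = 4
DISKS_PER_NODE = 3
DISK_GB = 18.2

peak_tf = NODES * CPUS_PER_NODE * GFLOPS_PER_CPU / 1000   # teraflops
mem_tb = NODES * MEM_GB_PER_NODE / 1000                    # terabytes (decimal)
disk_tb = NODES * DISKS_PER_NODE * DISK_GB / 1000          # terabytes (decimal)

if __name__ == "__main__":
    print(f"peak ~{peak_tf:.0f} Tf, memory ~{mem_tb:.0f} TB, node disk ~{disk_tb:.0f} TB")
```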
ES45 nodes
• 5 nodes per cabinet
• 3 local disks/node
Quadrics Network
• 2 “rails”
• Higher bandwidth
  – (~250 MB/s/rail)
• Lower latency
  – 2.5 µs put latency
• 1 NIC/node/rail
• Federated switch (/rail)
• “Fat-tree” (bisection bw ~0.2 TB/s)
• User virtual memory mapped
• Hardware retry
• Heterogeneous
  – (Alpha Tru64 & Linux, Intel Linux)
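The latency and bandwidth figures combine in the usual latency-bandwidth (Hockney) model, t(n) = latency + n / bandwidth. A minimal sketch, assuming a transfer can be striped across both rails at the full ~250 MB/s each and that the fat tree offers full bisection bandwidth; both are simplifying assumptions, not measurements. Under them, the bisection estimate for 750 nodes comes out near 0.19 TB/s, in line with the slide's ~0.2 TB/s.

```python
LATENCY_S = 2.5e-6      # put latency (seconds)
BW_PER_RAIL = 250e6     # ~250 MB/s per rail, in bytes/s
RAILS = 2               # assumption: a transfer stripes evenly over both rails


def transfer_time(nbytes: int, rails: int = RAILS) -> float:
    """Hockney model: start-up latency plus size over aggregate bandwidth."""
    return LATENCY_S + nbytes / (BW_PER_RAIL * rails)


def half_bandwidth_size(rails: int = RAILS) -> float:
    """Message size n_1/2 that reaches half the asymptotic bandwidth:
    n / t(n) = bw / 2  <=>  n = latency * bw."""
    return LATENCY_S * BW_PER_RAIL * rails


def bisection_bandwidth(nodes: int, rails: int = RAILS) -> float:
    """Bytes/s across a full-bisection fat tree (assumed): half the nodes
    send to the other half, each at its full link rate on every rail."""
    return (nodes / 2) * BW_PER_RAIL * rails


if __name__ == "__main__":
    print(f"8 B message:  {transfer_time(8) * 1e6:.2f} us")
    print(f"1 MB message: {transfer_time(1_000_000) * 1e3:.2f} ms")
    print(f"n_1/2 ~ {half_bandwidth_size():.0f} bytes")
    print(f"bisection ~ {bisection_bandwidth(750) / 1e12:.3f} TB/s")
```

Messages well below n_1/2 (about 1.25 KB here) are latency-bound, which is why the low put latency matters as much as the per-rail bandwidth for global reductions.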
Central Switch Assembly
• 20 cabinets in center
• Minimize max internode distance
• 3 out of 4 rows shown
• 21st LL switch, outside (not shown)
Quadrics wiring overhead (view towards ceiling)
Management & Control
• Quadrics switch control:
  – Internal SBC & Ethernet
• “Insight Manager” on PCs
• Dedicated systems
  – Cluster/node monitoring & control
  – RMS database
  – Ethernet & serial link
Interactive Nodes
• Dedicated: 2*ES45
• +8 on compute nodes
• Shared function nodes
• User access
• Gigabit Ethernet to WAN
• Quadrics connected
• /usr & indexed store (ISMS)
File Servers
• 64, on compute nodes
• 0.47 TB/server [30 TB]
• ~500 MB/s [~32 GB/s]
• Temporary user storage
  – Direct IO
  – /tmp
• [Each server has 24 disks on 8 SCSI chains on 4 controllers, to sustain full drive bw.]
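The server totals and the disk layout can be checked the same way. A sketch assuming decimal units and that each server's ~500 MB/s is spread evenly over its 24 disks; the per-disk figure is therefore a rough requirement, not a measured drive rate.

```python
SERVERS = 64
TB_PER_SERVER = 0.47
MB_S_PER_SERVER = 500
DISKS, SCSI_CHAINS, CONTROLLERS = 24, 8, 4   # per server

capacity_tb = SERVERS * TB_PER_SERVER                # aggregate /tmp capacity
aggregate_gb_s = SERVERS * MB_S_PER_SERVER / 1000    # aggregate bandwidth
disks_per_chain = DISKS // SCSI_CHAINS               # disks sharing one SCSI chain
chains_per_controller = SCSI_CHAINS // CONTROLLERS   # chains per controller
mb_s_per_disk = MB_S_PER_SERVER / DISKS              # assumption: even spread

if __name__ == "__main__":
    print(f"{capacity_tb:.1f} TB, {aggregate_gb_s:.0f} GB/s")
    print(f"{disks_per_chain} disks/chain, {chains_per_controller} chains/controller")
    print(f"~{mb_s_per_disk:.1f} MB/s per disk")
```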
Summary
• 750+ ES45 compute nodes
• 3000 EV68 CPUs @ 1 GHz
• 6 Tf
• 3 TB memory
• 41 TB node disk, ~90 GB/s
• Multi-rail fat-tree network
• Redundant monitor/ctrl
• WAN/LAN accessible
• File servers: 30 TB, ~32 GB/s
• Buffer disk store, ~150 TB
• Parallel visualization
• Mass store, ~1 TB/hr, >1 PB
• ETF coupled (hetero)
[Diagram: Quadrics network connecting TCS at 340 GB/s (1520Q), Application Gateways at 4.5 GB/s (20Q), Viz at 3.6 GB/s (16Q), and Buffer Disk at 3.6 GB/s (16Q)]
Visualization
• Intel/Linux
• Newest software
• ~16 nodes
• Parallel rendering
• HW/SW compositing
• Quadrics connected
• Image output
  – Web pages + WAN coupled
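The slide names HW/SW compositing without details. For readers unfamiliar with the idea, a minimal sketch of generic sort-last depth compositing: each rendering node draws part of the scene, and the partial images are merged per pixel by keeping the fragment nearest the viewer. This illustrates the technique only; it is not a description of PSC's compositing hardware or software. Images here are flat lists of (color, depth) pixels, a representation chosen for the example.

```python
import math
from typing import List, Tuple

Color = Tuple[int, int, int]
Pixel = Tuple[Color, float]          # (RGB color, depth; smaller is nearer)
BACKGROUND: Pixel = ((0, 0, 0), math.inf)


def composite(partials: List[List[Pixel]]) -> List[Pixel]:
    """Merge equally sized partial images by a per-pixel depth test."""
    if not partials:
        return []
    size = len(partials[0])
    if any(len(img) != size for img in partials):
        raise ValueError("all partial images must have the same size")
    out = [BACKGROUND] * size
    for img in partials:
        for i, (color, depth) in enumerate(img):
            if depth < out[i][1]:   # nearer fragment wins
                out[i] = (color, depth)
    return out


if __name__ == "__main__":
    red = [((255, 0, 0), 0.5), ((255, 0, 0), 0.9)]
    blue = [((0, 0, 255), 0.7), ((0, 0, 255), 0.1)]
    print(composite([red, blue]))
```

Parallel schemes such as binary swap distribute this merge across the rendering nodes; the loop above does it serially for clarity.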
Buffer Disk & HSM
• Quadrics coupled (~225 MB/s/link)
• Intermediate between TCS & HSM
• Independently managed
• Private transport from TCS
[Diagram, continued: HSM (LSCi) with archive disk, >360 MB/s to tape; connected to WAN/LAN & SDSC]
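The bandwidth labels on the diagram are consistent with counting Quadrics links at the ~225 MB/s per link quoted here, reading "(16Q)" as 16 links; that reading is an assumption, but the arithmetic lines up (1520 × 225 MB/s ≈ 342 GB/s, which the diagram rounds to 340). The same sketch converts the >360 MB/s tape rate into TB per hour in decimal units, consistent with the ~1 TB/hr mass-store figure in the summary.

```python
LINK_MB_S = 225  # approximate per-link Quadrics rate; "nQ" read as n links (assumption)


def aggregate_gb_s(links: int, per_link_mb_s: float = LINK_MB_S) -> float:
    """Aggregate bandwidth in GB/s for a number of parallel links."""
    return links * per_link_mb_s / 1000


def mb_s_to_tb_per_hour(mb_s: float) -> float:
    """Convert a sustained MB/s rate to TB per hour (1 TB = 10**6 MB)."""
    return mb_s * 3600 / 1e6


if __name__ == "__main__":
    for name, links in [("TCS", 1520), ("Application Gateways", 20),
                        ("Viz", 16), ("Buffer Disk", 16)]:
        print(f"{name}: {aggregate_gb_s(links):.1f} GB/s ({links}Q)")
    print(f"tape: ~{mb_s_to_tb_per_hour(360):.2f} TB/hr")
```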
Application Gateways
• Quadrics coupled (~225 MB/s/link)
• Coupled to ETF backbone by GigE
  – 30 Gb/s
[Diagram, continued: Multi GigE to ETF Backbone @ 30 Gb/s]
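A gateway's end-to-end rate is capped by the slower of its two sides. A minimal sketch comparing the Quadrics side (20 links at ~225 MB/s, reading "20Q" as a link count) with the 30 Gb/s GigE side, ignoring protocol overhead. Splitting the 30 Gb/s into 30 one-gigabit links is a hypothetical choice for the example; only the total matters here.

```python
def gbit_s_to_gbyte_s(gbit_s: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbit_s / 8


def gateway_throughput_gb_s(quadrics_links: int, per_link_mb_s: float,
                            gige_links: int, gige_gbit_s: float = 1.0) -> float:
    """Sustainable gateway rate (GB/s): the minimum of the Quadrics side and
    the GigE side, ignoring protocol overhead."""
    quadrics_side = quadrics_links * per_link_mb_s / 1000
    wan_side = gbit_s_to_gbyte_s(gige_links * gige_gbit_s)
    return min(quadrics_side, wan_side)


if __name__ == "__main__":
    # Hypothetical: 30 x 1 Gb/s GigE links make up the 30 Gb/s backbone connection.
    print(f"~{gateway_throughput_gb_s(20, 225, 30):.2f} GB/s")
```

With these inputs the GigE side (3.75 GB/s) is the limit, leaving headroom on the 4.5 GB/s Quadrics side.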
The Front Row
Yes, those are Pittsburgh sports’ colors.