Swiss-T1 : A Commodity MPI computing solution

Ralf Gruber, EPFL-SIC/CAPA/Swiss-Tx, Lausanne

March 2000
Content:
1. Distributed Commodity HPC
2. Characterisation of machines and applications
3. Swiss-Tx project
July 1998
Past : SUPERCOMPUTER

Table: Manufacturers and what happened

Manufacturer          What happened
Cray Research         Taken over by SGI
Convex                Taken over by HP
Connection Machines   Disappeared
KSR                   Disappeared
Intel Paragon         Stopped supercomputing
Japanese companies    Still existing (not main)
Tera Computers        Developing since 6 years

Why it happened:
- Produced own processors
- Developed own memory switches
- Needed special memories
- Developed own operating system
- Developed own compiler
- Special I/O : HW and SW
- Own communication system
Processor performance evolution
SMP/NUMA

Table: Manufacturers and their parallel servers

Manufacturer   Parallel server
DIGITAL        Wildfire
SUN            Starfire
IBM            SP-2
HP             Exemplar
SGI            Origin 2000
…              …

Present situation:
- Off the shelf processors
- Off the shelf memory switches
- Off the shelf memories
- Special parts of operating system
- Special compiler extensions
- Special I/O and SW
- Own communication system
What is the trend ?
Commodity Computing (MPI/PCI)

PC clusters/Linux:
- Fast Ethernet: Beowulf

SOS cooperation (Alpha):
- Myrinet/DS10: C-Plant (SNL)
- T-Net/DS20: Swiss-T1 (EPFL)

Customised commodity:
- Quadrics/ES40: Compaq/Sierra

- Off the shelf processors
- Off the shelf memory switches
- Off the shelf memories
- Off the shelf local I/O HW and SW
- Off the shelf operating systems
- Off the shelf compilers
- New communication system
- New distributed file/IO system
4th SOS workshop on Distributed Commodity HPC

Participants: SNL, ORNL, Swiss-Tx, LLNL, LANL, ANL, NASA, LBL, PSC, DOE, UNM, Syracuse, Compaq, IBM, Cray, Sun, SMEs

Content: Vision, Clusters, Interconnects, Integration, OS, I/O, Applications, Usability, Crystal ball
Distributed commodity HPC User's Group

Goals:
- Characterise the machines
- Characterise the applications
- Match machines to applications
Characterise processors, machines, and applications

Performance measures:
- Processors: Vmac = peak processor performance / peak memory bandwidth
- Parallel machines: mac = effective processor performance / effective network performance
- Applications: app = operation count / words to be sent
15 June 1998
In a box: Vmac values

Vmac = R [Mflop/s] / M [Mword/s]

Table: Vmac values for Alpha 21164 and 21264 boxes and NEC SX-4

Machine            N   R      M      Vmac
AlphaServer 1200   2   2133   138    15
DS20               2   2000   667    3
DS20+              2   2667   667    4
NEC SX-4           1   2000   2000   1
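The Vmac column follows directly from the formula above; a minimal sketch, using the R and M values from the table (rounding to the nearest integer is an assumption to match the printed figures):

```python
def v_mac(r_mflops, m_mwords):
    """Vmac = peak processor performance [Mflop/s] / peak memory bandwidth [Mword/s]."""
    return r_mflops / m_mwords

# (R, M) pairs from the table above.
boxes = {
    "AlphaServer 1200": (2133, 138),
    "DS20": (2000, 667),
    "DS20+": (2667, 667),
    "NEC SX-4": (2000, 2000),
}
for name, (r, m) in boxes.items():
    print(name, round(v_mac(r, m)))
```

A low Vmac (NEC SX-4) means the memory system can keep the processor fed; a high Vmac (AlphaServer 1200) means performance depends on cache reuse.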
Between boxes: mac value

mac = N * R [Mflop/s] * <d> / C [Mword/s]

Table: mac of different machines

Machine    Type       Nproc  Peak       Eff perf   Eff bw     mac
                             [Gflop/s]  [Gflop/s]  [Gword/s]
Gravitor   Beowulf    128    50         6.4*       0.064      100
Swiss-T1   T-Net      64     64         13         0.32       40
Swiss-T1   FE         64     64         13         0.032      400
Baby T1    C+PCI      12     12         2.4        0.072      30
Origin2K   NUMA/MPI   80     32         9          1          9
NEC SX4    vector     8      16         8          6.4        1.3

Effective performance measured with MATMULT, * estimated. Effective bandwidth measured with point-to-point communication.
The app value

app = Operations / Communicated words

- Material sciences (3D Fourier analysis): app ~ 50. Beowulf insufficient, Swiss-T1 just about right.
- Crash analysis (3D non-linear FE): app > 1000. Beowulf sufficient, latency?
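These two examples can be read as a matching rule. The threshold form below (an application fits a machine when app >= mac) is my reading of the slides, not a formula stated on them:

```python
# Hypothetical matching rule: an application is well matched to a machine
# when it performs at least as many operations per communicated word
# (gamma_app) as the machine's compute/communication ratio (gamma_mac) demands.
def is_well_matched(gamma_app, gamma_mac):
    return gamma_app >= gamma_mac

# mac values taken from the earlier table; app ~ 50 for 3D Fourier analysis.
print(is_well_matched(50, 100))  # Beowulf (mac = 100): insufficient
print(is_well_matched(50, 40))   # Swiss-T1 T-Net (mac = 40): just about right
```

Crash analysis with app > 1000 would then fit every machine in the table, leaving latency as the remaining question, as the slide notes.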
The app value for Finite Elements

app = Operations / Communicated words

FE operations:
- Ops ∝ Nb of volume nodes
- Ops ∝ Nb of variables per node, squared
- Ops ∝ Nb of non-zero matrix elements
- Ops ∝ Nb of operations per matrix element

FE communication:
- Comm ∝ Nb of surface nodes
- Comm ∝ Nb of variables per node

FE app:
- app ∝ Nb of nodes in one direction
- app ∝ Nb of variables per node
- app ∝ Nb of non-zero matrix elements
- app ∝ Nb of operations per matrix element
- app ∝ Nb of surfaces
The app value

Statistics for 3D brick problem (Finite elements)

Table: Current day case, 4096 elements

Nb of   Nb of    Nb interface   Mflop    Mflop/cycle   kB/cycle   kB/cycle   app
Subd    Nodes    Nodes          /cycle   /proc         transfer   /proc
1       5049     0              13.5     13.5          0.0        0.0        -
2       5202     153            13.5     6.8           7.2        3.6        15074
4       5508     459            13.5     3.4           21.5       5.4        5028
16      6366     1317           13.5     0.8           61.7       3.9        1755
32      6960     1911           13.6     0.4           89.6       2.8        1211
64      7572     2523           13.6     0.2           118.3      1.8        918
128     8796     3747           13.6     0.1           175.6      1.4        620
Fat-tree/Crossbars 16x16
N=8, P=8, N*P=64 PUs, X=12, BiW=32, L=64
Circulant graphs/Crossbars 12x12

K=2 (1/3): N=8, P=8, X=8, BiW=8, L=16
K=3 (1/3/5): N=11, P=6, X=11, BiW=18, L=33
K=4 (1/3/5/7): N=16, P=4, X=16, BiW=32, L=64
Fat-tree/Circulant graphs

Table: Comparison of Fat-tree and circulant graph architectures

Parameter        Fat-tree   Circulant graph   Circulant graph   Circulant graph
                            K=2 (1/3)         K=3 (1/3/5)       K=4 (1/3/5/7)
Crossbar 16x16   12         -                 -                 -
Crossbar 12x12   -          8                 11                16
N                8          8                 11                16
P                8          8                 6                 4
N*P              64         64                66                64
D                2          2                 2                 2
Dm               1.75       1.25              1.28              1.38
BiW              32         8                 18                32
L                64         16                33                64
w                1          3                 3                 3
T=wP^2           64         192               108               48

N : Number of computing nodes
P : Number of boxes per node
N*P : Total number of boxes
D : Maximum distance between two nodes
Dm : Average distance between two nodes (load for a point-to-point operation)
BiW : Bisectional width
L : Number of links
w : Load factor for an all-to-all communication operation
T : Number of steps, or load, to perform an all-to-all operation
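The D and Dm rows for the circulant graphs can be reproduced by breadth-first search. A sketch, assuming Dm averages the distance from one node to all N nodes including the node itself (this reproduces the tabulated values up to rounding):

```python
from collections import deque

def circulant_metrics(n, offsets):
    """Diameter D and average distance Dm of the circulant graph C_n(offsets).

    BFS from node 0 suffices because circulant graphs are vertex-transitive:
    every node sees the same distance distribution.
    """
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for k in offsets:
            for v in ((u + k) % n, (u - k) % n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return max(dist.values()), sum(dist.values()) / n

print(circulant_metrics(8, (1, 3)))         # K=2 (1/3)
print(circulant_metrics(11, (1, 3, 5)))     # K=3 (1/3/5)
print(circulant_metrics(16, (1, 3, 5, 7)))  # K=4 (1/3/5/7)
```

All three graphs have diameter 2, so adding offsets trades links (L) for bisectional width (BiW) without lengthening worst-case routes.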
The Swiss-Tx machines
September 1998
Table: The Swiss-Tx machines

Machine           Installation       #P    Peak      Memory   Disk     Archive   Operating system          Connection system
                  Date   Place             Gflop/s   GBytes   GBytes   TBytes
Swiss-T0          12.97  EPFL        8     8         2        64       1**       Digital Unix              EasyNet bus, FE bus
Swiss-T0(Dual)*   10.98  EPFL        16    16        8        170      -         Windows NT/Digital Unix   EasyNet bus, FE switch
Baby T1*          8.99   EPFL        16    16        8        170      -         Tru64 Unix                Crossbar 12x12, FE switch
                  4.00   DGM
Swiss-T1          1.00   EPFL        70    70        35       950      1**       Tru64 Unix                Crossbar 12x12, FE switch
Swiss-T2          ?      ?           504   1008      252      9000     -         Not decided               Crossbar 12x12, FE switch

* Baby T1 is an upgrade of T0(Dual)   ** Archive ported from T0 to T1
Swiss-T1

Components:
- 32 computational DS20E
- 2 frontend DS20E
- 1 development DS20E
- 300 GB RAID disks
- 600 GB distributed disks
- 1 TB DLT archive
- Fast/Gigabit Ethernet
- Tru64/TruCluster Unix
- LSF, GRD/Codine
- Totalview, Paradyn
- MPICH/PVM

T-Net network technology:
- (8+1) 12x12 crossbars, 100 MB/s
- 32 bit PCI adapter, 75 MB/s (64 bit PCI adapter, 180 MB/s)
- Flexible, non-blocking
- Reliable
- Optimal routing
- FCI latency 5 µs
- MPI latency 18 µs
- Monitoring system
- Remote control
- Up to 3 Tflop/s ( < 100)
Swiss-T1 Architecture
Swiss-T1 Routing table
Table: Routing table for the Swiss-T1 machine
     1   2   3   4   5   6   7   8
1    -   2   2   4   4   6   8   8
2    1   -   3   7   5   3   7   5
3    2   2   -   4   4   6   8   8
4    1   7   3   -   5   7   7   3
5    4   2   4   4   -   6   6   8
6    1   3   3   7   5   -   7   1
7    8   2   8   4   6   6   -   8
8    1   5   3   3   5   1   7   -
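Reading each entry as the next node on the route from the row node to the column node (my interpretation of the table), one can check that every pair of nodes is reached in at most two hops, consistent with the network diameter D = 2:

```python
# Routing table from the slide; ROUTE[i][j] is taken here to be the next
# node on the path from node i to node j (this reading is an assumption).
ROUTE = {
    1: {2: 2, 3: 2, 4: 4, 5: 4, 6: 6, 7: 8, 8: 8},
    2: {1: 1, 3: 3, 4: 7, 5: 5, 6: 3, 7: 7, 8: 5},
    3: {1: 2, 2: 2, 4: 4, 5: 4, 6: 6, 7: 8, 8: 8},
    4: {1: 1, 2: 7, 3: 3, 5: 5, 6: 7, 7: 7, 8: 3},
    5: {1: 4, 2: 2, 3: 4, 4: 4, 6: 6, 7: 6, 8: 8},
    6: {1: 1, 2: 3, 3: 3, 4: 7, 5: 5, 7: 7, 8: 1},
    7: {1: 8, 2: 2, 3: 8, 4: 4, 5: 6, 6: 6, 8: 8},
    8: {1: 1, 2: 5, 3: 3, 4: 3, 5: 5, 6: 1, 7: 7},
}

def path(src, dst):
    """Follow next-hop entries from src until dst is reached."""
    hops = [src]
    while hops[-1] != dst:
        hops.append(ROUTE[hops[-1]][dst])
    return hops

# Every route terminates within two hops (diameter D = 2 of the topology).
assert all(len(path(i, j)) <= 3 for i in ROUTE for j in ROUTE if i != j)
```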
Swiss-T1: Software in a Box
March 2000
* Digital Unix    Compaq   Operating system in each box
* F77/F90         Compaq   Fortran compilers
* HPF             Compaq   High Performance Fortran
* C/C++           Compaq   C and C++ compilers
* DXML            Compaq   Digital math library in each box
* MPI             Compaq   SMP message passing interface
* Posix threads   Compaq   Threading in a box
* OpenMP          Compaq   Multiprocessor usage in a box through directives
* KAP-F           KAI      To parallelise a Fortran code in a multiprocessor box
* KAP-C           KAI      To parallelise a C program in a multiprocessor box
Swiss-T1: Software between Boxes
March 2000
* LSF         Platform Inc.   Load Sharing Facility for resource management
* Totalview   Dolphin         Parallel debugger
* Paradyn     Madison/CSCS    Profiler to help parallelising programs
* MPI-1/FCI   SCS AG          Message passing interface between boxes running over TNET
* MPICH       Argonne         Message passing interface running over Fast Ethernet
* PVM         UTK             Parallel virtual machine running over Fast Ethernet
* BLACS       UTK             Basic linear algebra communication subroutines
* ScaLAPACK   UTK             Linear algebra matrix solvers
* MPI I/O     SCS/LSP         Message passing interface for I/O
* MONITOR     EPFL            Monitoring of system parameters
* NAG         NAG             Math library package
* Ensight     Ensight         4D visualisation
* MEMCOM      SMR SA          Data management system for distributed architectures
* Shmem       EPFL            Interface Cray to Swiss-Tx
Baby T1 Architecture
Swiss-T1 : Alternative network
Swiss-T2 : K-Ring architecture
Create SwissTx Company
Commercialise T-Net
Commercialise dedicated machines
Transfer knowhow in parallel application technology
Between boxes: mac value

mac = N * R [Mflop/s] * <d> / C [Mword/s]

Table: The mac values for Swiss-T0, Swiss-T0(Dual) and Swiss-T1 for MATMUL

Machine      Network         N      R       %     N*R      C      <d>    mac
T0           Bus             8      8000    5*    400*     4*     1      100
T0(Dual)     Bus             8*2    16533   6*    1000*    4*     1      250
Baby T1      Switch          6*2    12000   20*   2400*    90*    1      27
T1(local)    Switch          4*2    8000    20*   1600*    60**   1      27
T1(global)   Switch          32*2   64000   20*   12800*   400**  1.25   40
T1           Fast Ethernet   32*2   64000   20*   12800*   80**   1      160

* measured (SAXPY and Parkbench)  ** expected
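The mac column follows from the formula above, with the effective performance N*R taken as peak performance times measured efficiency. A small sketch using values from the table:

```python
def gamma_mac(peak_mflops, efficiency, c_mwords, mean_distance=1.0):
    """mac = N*R [Mflop/s] * <d> / C [Mword/s], with N*R = peak * efficiency."""
    return peak_mflops * efficiency * mean_distance / c_mwords

print(round(gamma_mac(8000, 0.05, 4)))           # T0 (Bus)
print(round(gamma_mac(64000, 0.20, 400, 1.25)))  # T1 (global, T-Net switch)
print(round(gamma_mac(64000, 0.20, 80)))         # T1 (Fast Ethernet)
```

Note how the same 64-processor Swiss-T1 yields mac = 40 over T-Net but 160 over Fast Ethernet: the weaker network demands four times more operations per communicated word from the application.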
Time Schedule

Milestones: 1.6.98, 1.11.99, 31.10.00

1st phase: EasyNet bus based prototypes
- Swiss-T0(Dual), 16 processors, Windows NT
- Swiss-T0(Dual), 16 processors, Digital Unix

2nd phase: T-Net switch based prototype/production machines
- Baby T1, 12 processors, Digital Unix
- Swiss-T1, 68 processors, Digital Unix
- Swiss-T2, 504 processors, OS not defined
Phase I: Machines installed
Swiss-T0: 23 December 97 (accepted 25 May 98)
Swiss-T0(Dual): 29 September 98 (accepted 11 Dec. 98 / NT)
Swiss-T0(Dual): 29 September 98 (accepted 22 Jan. 99 / Unix)
Swiss-T1 Baby: 19 August 99 (accepted 18 Oct. 99 / Unix)
Swiss-T1: 21 Jan. 2000
Swiss-T1 Node Architecture
2nd Phase Swiss-Tx: The 8 WPs

Managing Board: Michel Deville
Technical Team: Ralf Gruber
Management: Jean-Michel Lafourcade

WP1: Hardware development - Roland Paul, SCS
WP2: Communication software development - Martin Frey, SCS
WP3: System and user environment - Michel Jaunin, SIC-EPFL
WP4: Data management issues - Roger Hersch, DI-EPFL
WP5: Applications - Ralf Gruber, CAPA/SIC-EPFL
WP6: Swiss-Tx concept - Pierre Kuonen, DI-EPFL
WP7: Management - Jean-Michel Lafourcade, CAPA/DGM-EPFL
WP8: SwissTx Spin-off Company - Jean-Michel Lafourcade, CAPA/DGM-EPFL
2nd Phase Swiss-Tx: The MUSTs

WP1: PCI adapter page table / 64 bit PCI adapter
WP2: Dual processor FCI / Network monitoring / Shmem
WP3: Management / Automatic SI / Monitoring / PE / Libraries
WP4: MPI-I/O / Distributed file management
WP5: Applications
WP6: Swiss-Tx architecture / Autoparallelisation
WP7: Management
WP8: SwissTx Spin-off Company