
Zellescher Weg 16, Trefftz-Bau (HRSK-Anbau), Room HRSK/151, Tel. +49 351 - 463 - 39871

Guido Juckeland ([email protected])

Center for Information Services and High Performance Computing (ZIH)

Introduction to High Performance Computing at ZIH

Getting started

Agenda

Before you can get on – Paperwork

When you first get on – Using ssh, VPN, environment modules, available file systems

Things to know about the hardware you are/will be using

Slides at: http://wwwpub.zih.tu-dresden.de/~juckel/slides


Before you can get on – Paperwork


Project Proposal

• No login without a valid HPC project!

• Every HPC user account has to be associated with at least one project

• The project has to be endorsed (headed) by a Saxon research group leader

• Applications (pdf): http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare

• Online project application: https://formulare.zih.tu-dresden.de/antraege/antrag/antrag_form.html

• A small amount of CPU time can be granted immediately

• The proposal is peer-reviewed and decided upon (peers from all over Saxony)

• Projects have a limited lifetime – you need to reapply for follow-up projects

Login Application

• Paperwork at: http://www.tu-dresden.de/zih/hpc

• You need your project leader's signature on the application

• What you get:

• ZIH standard login (e-mail account, personal storage, anti-virus software, VPN access, WLAN access via Eduroam, …)

• Account on the HPC systems you applied for

• Automatic entry in the ZIH HPC mailing lists (announcements and forum)

• Accounts usually expire every year at the end of October! You need to extend your login!

When you first get on – Using ssh, VPN, environment modules, available file systems


Access from within the TUD network

• You are on the TUD campus (your IP address starts with 141.30 or 141.76)

• Simply ssh/sftp to the machine address (e.g. ssh deimos.hrsk.tu-dresden.de)

• No web access or similar (so do not try mars.hrsk.tu-dresden.de in your browser)

Access from outside the TUD network

• You are sitting at an MPI or FhG institute (or at home)

• No direct access from outside the TUD local network (hardware firewall)

• 2 Options:

• Double ssh connection (tough for file transfers)

– First ssh to one of the central ZIH login servers (login1.zih.tu-dresden.de or login2.zih.tu-dresden.de) using your standard ZIH login

– ssh to the desired HPC machine from there

• Use a ZIH VPN connection (preferred solution)

– Download and install a ZIH VPN client (more information under: http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/datennetz_dienste/vpn)

– Establish a VPN connection using your ZIH standard login

– Then open an ssh/sftp connection from your computer to the desired HPC system
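For illustration, a minimal sketch of the double-ssh route (the user name is a placeholder, and the -J jump option is an assumption: it requires a reasonably recent OpenSSH client):

    # hop 1: central ZIH login server, hop 2: the HPC machine
    ssh [email protected]
    ssh deimos.hrsk.tu-dresden.de

    # or, with a newer OpenSSH client, both hops in one command:
    ssh -J [email protected] deimos.hrsk.tu-dresden.de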


SSH fingerprints of the HRSK machines

mars.hrsk.tu-dresden.de:

1024 cf:89:20:a8:aa:36:3f:1f:7b:5e:f4:8e:57:99:15:35 ssh_host_dsa_key.pub

1024 1a:cc:4e:4f:ff:5f:b0:bc:25:9d:84:9f:39:12:d7:6d ssh_host_key.pub

1024 08:3b:da:02:1d:ff:a8:cf:26:27:96:16:86:07:a2:a9 ssh_host_rsa_key.pub

neptun.hrsk.tu-dresden.de:

1024 b0:0b:2c:3d:66:d9:d2:49:ec:fc:d1:89:6d:5b:4c:f7 ssh_host_key.pub

deimos10[1-4].hrsk.tu-dresden.de:

1024 48:f7:d6:37:d0:cf:b0:f4:49:67:b6:1f:c1:44:7d:9f ssh_host_dsa_key.pub

1024 5f:11:98:8a:29:20:c8:65:78:75:d7:a0:bb:d4:74:93 ssh_host_key.pub

1024 22:42:72:c6:38:57:71:03:90:72:2b:2c:72:e7:d0:cd ssh_host_rsa_key.pub

phobos.hrsk.tu-dresden.de:

1024 91:bd:d0:b0:8b:60:75:40:bc:4a:54:9d:54:2a:dc:b8 ssh_host_dsa_key.pub

1024 1b:1c:29:1f:d2:5c:a9:0b:ac:e6:cf:28:1c:4f:92:8f ssh_host_key.pub

1024 b8:14:54:9a:f5:06:f8:d5:da:cb:51:a8:21:fb:db:bd ssh_host_rsa_key.pub
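When you connect to a machine for the first time, ssh shows the host key fingerprint; compare it against the list above before accepting (the exact prompt wording varies between ssh versions):

    $ ssh mars.hrsk.tu-dresden.de
    The authenticity of host 'mars.hrsk.tu-dresden.de' can't be established.
    RSA key fingerprint is 08:3b:da:02:1d:ff:a8:cf:26:27:96:16:86:07:a2:a9.
    Are you sure you want to continue connecting (yes/no)? yes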

You are on – what do you find?

• HRSK: Standard Linux Enterprise installation (SuSE SLES 10 SP 2)

• Phobos: SuSE SLES 9 SP 3

• SX-6: SuperUX (Special UNIX environment)

• Similar to a Desktop Linux (some special programs missing)

• GCC, automake, and all the standard tools are there

• Only a limited number of GUI tools available (usually not needed)

• Caution: The amount of CPU time on the login nodes is limited to 5 minutes

• This can cause problems for large file transfers – contact us in this case

• 3rd party software, or anything that is not in the Linux distribution, is provided via environment modules


Modules for environment variables

Non-standard software is installed into special paths (not in the standard search path for applications)

Modules set environment variables so that applications and libraries find their binaries/shared objects

Show installed modules: module avail

Show currently loaded modules: module list

Load a module: module load <name>

Unload a module: module rm <name>

Exchange modules: module switch <1> <2>
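A typical session might look like this (the module names intel and gcc are only examples; check module avail for the names actually installed):

    module avail              # what is installed?
    module load intel         # e.g. pick up the Intel compiler
    module list               # verify what is currently loaded
    module switch intel gcc   # exchange one module for another
    module rm gcc             # unload again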


HRSK-Software

Installed Software on the HRSK Systems (not complete, not all on all systems):

Compilers:

– GCC

– Intel

– Pathscale

– PGI

Debuggers:

– ddd

– ddt

– idb

– Totalview

– Valgrind

Libraries:

– acml
– atlas
– blacs
– blas
– boost
– hypre
– lapack
– mkl/clustermkl
– netcdf
– petsc

Applications:

– Abaqus
– Ansys
– CFX
– Comsol
– CP2K
– Fluent
– Gamess
– Gaussian
– Gromacs
– Hmmer
– Lammps
– LS-Dyna
– Maple
– Mathematica
– Matlab
– MSC
– Namd
– Numeca
– Octave
– R
– Tecplot


File system layout


Altix 4700

CXFS

– The same on all Altix partitions

– work [ /work ]
  • contains /work/home[0-9]/
  • 8.8 TB
  • Backup

– fastfs [ /fastfs ]
  • 60 TB
  • DMF, no backup
  • Fastest file system

– scratch [ /scratch ]
  • Local – only visible per Altix partition
  • Fast alternative to /tmp


Deimos

Lustre

– work [ /work ]
  • contains /work/home[0-9]/
  • global 16 TB
  • Backup

– fastfs [ /fastfs ]
  • global 48 TB
  • no backup
  • Fastest available file system

local (ext3)

– scratch [ /scratch ]
  • local per node (about 40 GB per core)


Deimos (2)

NFS

– /hpc_fastfs
  • /fastfs from the Altix
  • dmf commands are also available here to access the archive
  • Deimos-only users can archive data here as well

– /hpc_work
  • /work from the Altix
  • Incl. the home directories there

Project directories

• You are by default in a user group that has the same name as your project

• Your project has a shared "Home" and "Fastfs" directory for you to share applications and data

• There are symbolic links in your home directory to the project directories

• Please use them and do not install software into each of your project members' home directories!
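To find the links, simply list your home directory (the link names shown here are placeholders; the actual names match your project):

    $ ls -l $HOME
    ...  myproject        -> shared project home directory
    ...  myproject_fastfs -> shared project directory on /fastfs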


DMF - Commands

DMF copies data back and forth automatically

Manual invocation possible to migrate data between disk and tape

dmput

– Moves data from disk to tape

– “-r” also removes the data from disk after moving

– Moving is done in the background

dmls

– Extended ls

– Displays the location of the file data (ONL=disk; OFL=tape; DUL=on disk and tape; MIG=currently moving to tape; UNM=currently being moved back to disk)

dmget

– Recalls data from tape to disk

Use dmput/dmget calls on full directories if needed!
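A minimal sketch of the manual workflow (the directory name results/ is only an example; see the dmput/dmget man pages for the exact behavior on directories):

    dmls -l results/     # where does the file data live? (ONL, OFL, DUL, ...)
    dmput -r results/    # migrate everything to tape and free the disk space
    dmget results/       # recall everything to disk before the next run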


I/O Recommendations

Temporary data -> /fastfs

Compile in /scratch

Source code etc. -> home

Checkpoints -> fastfs

Archive results as tar files (no need to compress) to /fastfs or /hpc_fastfs and run dmput -r on them afterwards (see the sketch after this list)

Parallel file systems are bad for small I/O! (e.g. compilation)

Large I/O bandwidth with

– Lots of clients

– Lots of processes (that may even write to the same file)

– Large I/O blocks
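The archiving recommendation above as a short sketch (project and file names are placeholders):

    cd /fastfs/myproject
    tar -cf run42_results.tar run42/   # plain tar, no compression needed
    dmput -r run42_results.tar         # migrate to tape, free the disk space

    # months later, to get the data back:
    dmget run42_results.tar && tar -xf run42_results.tar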

Things to know about the hardware you are/will be using



SGI Altix 4700

SGI Altix 4700 (5 partitions)

– 1024 x 1.6 GHz / 18 MB L3 cache Itanium II / Montecito CPUs (2048 cores)
– 13.1 TFlop/s peak performance
– 6.6 TB memory (4 GB/core)
– NumaLink4
– Local disks + 68 TB SAN
– SuSE SLES 10 incl. SGI ProPack 4
– Intel compilers and tools
– Vampir
– Allinea DDT debugger
– Batch system LSF


CPU

Intel Itanium II (Montecito), ca. 1.7 billion transistors

IA-64 (not x86!!!)

1.6 GHz

Dual-Core

per Core:

– L1: 16 KB Data (no floating-point data) / 16 KB instructions

– L2: 256 KB Data / 1024 KB Instructions

– L3: 9 MB

Instruction bundles of 128 bits

3 instructions per bundle

No out-of-order execution

Performance depends heavily on the compiler (do not use GCC!)


Connection to local memory and the rest of the system

[Diagram: an Itanium II socket connects through the SHUB 2.0 hub to the local DDR2 DIMMs (10.7 GB/s) and to the rest of the system via NumaLink4 (2 x 6.4 GB/s)]


The whole system architecture

1 Chip (2 Cores) per blade

8 Blades per IRU

4 IRUs per Rack

32 Racks

1024 Chips

2048 Cores spread over 5 partitions

one partition = 1 computer (1 operating system instance)


jupiter - Topology

[Topology diagram; image credit: SGI]


Altix partitions

On all partitions: 4 CPUs set aside for the operating system

mars:

– 384 GB main memory

– 32 processors for login

– 346 processors for batch operations

jupiter, saturn, uranus

– 2 TB main memory

– 506 CPUs for batch operations

neptun

– 124 processors for interactive use

– 2 FPGAs

– 4 graphics boards


User's view on the Altix

Login via SSH -> terminal emulation

Boot-CPU-Set with 4 processors

SuSE Enterprise Server 10 SP 2

Standard Linux-Kernel

Batch system places user requests on the rest of the available processors (also on the other partitions)

[Diagram: users log in via ssh through the firewall to the mars login partition; LSF distributes batch jobs across jupiter, saturn, and uranus; neptun provides the FPGAs and graphics boards]
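Batch jobs are submitted through LSF. A minimal sketch (the resource values and program name are placeholders; site-specific queues and options are documented separately):

    bsub -n 16 -W 2:00 -o job.out ./my_app   # 16 slots, 2 h wall time, output to job.out
    bjobs                                     # check the state of your jobs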


Linux Networx PC-Farm (Deimos)

1292 AMD Opteron x85 dual-core CPUs (2.6 GHz)

726 compute nodes with 2, 4 or 8 CPU cores

2 GiByte main memory per core

2 Infiniband interconnects (MPI and I/O fabric)

68 TByte SAN-Storage

70, 150 or 290 GByte scratch disk per node

OS: SuSE SLES 10

Batch system: LSF

Compilers: Pathscale, PGI, Intel, GNU

3rd party applications: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC,…


Deimos - Partitions

2 Master Nodes

– Not accessible for users, PC-Farm management

4 Login Nodes

– Quad-core nodes

– Accessible with DNS Round Robin under deimos.hrsk.tu-dresden.de

Single-, dual- and quad-nodes

– 1, 2 or 4 CPUs

– 4, 8 or 16 GiByte main memory (24 Quads with 32 GiByte)

– 80, 160 or 300 GByte local disks

Split into phase 1 and phase 2 nodes

– Identical hardware

– Differences in the connection to the MPI and the I/O fabric (see later)


Deimos – Layout of a single-CPU node

[Diagram: one AMD Opteron 185 with 4 GiByte memory, attached via HyperTransport to the peripheral devices (Infiniband, Ethernet, disk)]


Deimos – Layout of a dual-CPU node

[Diagram: two AMD Opteron 285 CPUs, each with 4 GiByte memory, coupled via HyperTransport; the peripheral devices (Infiniband, Ethernet, disk) hang off one HyperTransport link]


Deimos - Layout of a quad-CPU Node

[Diagram: four AMD Opteron 885 CPUs, each with 4 GiByte memory, connected to each other via HyperTransport; the peripheral devices (Infiniband, Ethernet, disk) attach to one of the CPUs]


Deimos Infiniband-Layout (rough sketch)

[Diagram: every node is attached to both the MPI network and the I/O network]


Deimos MPI-Fabric

+-------------------+       +--------------------+       +-------------------+
|     Switch 1      |       |      Switch 2      |       |     Switch 3      |
|                   |  30x  |                    |  30x  |                   |
|     Rack 05       |-------|      Rack 20       |-------|      Rack 25      |
|                   |       |                    |       |                   |
| all Phase1 Nodes  |       | Phase2 Duals+Quads |       | Phase 2 Singles   |
+-------------------+       +--------------------+       +-------------------+

3 x 288-port Voltaire ISR 9288 IB switches with 4x Infiniband ports


Deimos I/O Fabric

Tree structure with

– 1 x 192-port Voltaire ISR 9288 IB switch with 4x Infiniband ports (Rack 07)

– 36 x 24-port Mellanox IB switches (4x), passive

[Diagram: the Voltaire core switch forms the root of the tree; 24-port Mellanox leaf switches fan out to the Phase 1 and Phase 2 nodes]