Introduction to HPC at UNBC
The Enhanced High Performance Computing Center
Dr. You Qin (Jean) Wang
February 13, 2008



Summary of the presentation:

Who needs HPC?

What kind of software do we have?

What kind of hardware do we have?

How to access the HPC systems?

Parallel programming basics

Who needs HPC?
HPC Domains of Applications at UNBC

Atmospheric Science, Environmental Science, Geophysics, Chemistry, Computer Science, Forestry, Physics, Engineering

Who needs HPC?

We use HPC to solve problems that can't be solved in a reasonable amount of time using a single desktop computer.

Problems solved using HPC:
Need a large quantity of RAM
Require a large quantity of CPUs

HPC Users Summary

On February 6, 2008:

Total Users: 73

Professors: 16

Post-doctoral: 7

Ph.D. students: 5

Master's students and others: 45

What kind of software do we have?

IDL + ENVI
MATLAB + Toolboxes
Tecplot
STATA
NAG Fortran Library
FLUENT
PGI Compilers
Intel Compilers

What kind of software do we have?

IDL – the ideal software for data analysis, visualization, and cross-platform application development.

ENVI – the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.

What kind of software do we have?
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

MATLAB Toolboxes:
– Curve Fitting
– Distributed Computing
– Image Processing
– Mapping
– Neural Network
– Statistics

What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang.

Pressure Contour around a Prolate Spheroid

What kind of software do we have?

Why use STATA?

STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.

What kind of software do we have?
The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
– Set environment variables before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT – Flow Modeling Software

What kind of hardware do we have?

SGI Altix 3000 – 64 processors

Linux Cluster – 128 processors (Opteron)

File Server

Windows Terminal Server

10 Workstations in HPC Lab

GeoWall systems for visualization

SGI Altix 3000 – columbia.unbc.ca
64 Processors
– Intel Itanium2 (1.5 GHz)
– 4 MB cache

64 GB RAM
– 1 GB/processor

NumaLink interconnect
– 6.4 GB/s
– Fat Tree

10GbE network connection
SuSE Linux Enterprise Server 9

Linux Cluster – andrei.unbc.ca
64 Nodes (128 processors) + Head Node
– AMD Opteron (2.1 GHz) (2/node)
– 144 GB RAM (2/node + 16 for head)

GigE interconnect
– Two Nortel switches
– Network access via head node

Operating System
– SuSE 9.3

Storage
– 1.7 TB of local storage on head node for software and local copies

File Server
SGI Altix 350
– 4p, 8 GB RAM

SGI TP9100
– 6 TB storage
– RAID 5 with hot spare

10GbE network connection
Maintains tape backups

Windows Terminal Server – ithaca.unbc.ca

Dell PowerEdge 6800
– 4p (Intel Xeon 2.4 GHz)
– 8 GB RAM

Local RAID for system volume
– 600 GB volume

Accessible from anywhere

Runs Windows applications

Workstations at HPC Lab

Dell Precision 470
– 2 Intel Xeon processors (3.2 GHz)
– 2 GB RAM
– NVidia Quadro FX3400, 256 MB
– 2 Dell 20" LCD displays

GeoWall Systems
Two systems
Both have a 2-processor server, 1.5 TB RAID5
GeoWall Room (8-111) has rear-projected display
Portable unit has front-projected display

How to access the HPC systems

From Windows to Windows

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Windows:
rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Linux

ssh -X yqwang@columbia.unbc.ca

ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux

Download the software "Xmanager 2.0" from

http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the hpc file system

Under Windows:
– Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
– replacing LOGIN with your UNI login

How to access the HPC systems

How to mount the hpc file system

On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of a directory that the system will be mounted to

Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your job via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.
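A minimal PBS job script shows the shape of a submission. The job name, resource request, and program name below are illustrative placeholders, not the actual queue configuration on Columbia or Andrei:

```shell
#!/bin/bash
#PBS -N myjob                 # job name (placeholder)
#PBS -l nodes=1:ppn=2         # illustrative resource request: 1 node, 2 processors
#PBS -l walltime=01:00:00     # wall-clock limit of one hour
#PBS -j oe                    # merge stdout and stderr into one output file

cd $PBS_O_WORKDIR             # run from the directory the job was submitted from
./myprogram                   # placeholder for your compiled executable
```

Submit with `qsub myjob.pbs` and watch the queue with `qstat`.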

Parallel Programming Basics
What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory: MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: the two of you will take 35 minutes instead of 30.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI… parallelizing data

OpenMP… parallelizing tasks

MPI

Harry Potter Volume 1 -> Translator -> Spanish, French
Harry Potter Volume 2 -> Translator -> Spanish, French

(the data – the volumes – is split across translators)

OpenMP

Harry Potter Volume 1, Volume 2 -> Translator -> Spanish
Harry Potter Volume 1, Volume 2 -> Translator -> French

(the tasks – the target languages – are split across translators)

Compilers
Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)
For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMPcode.f
icc -options -openmp myOpenMPcode.c

Automatic Parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man ifort -M /opt/intel/fc/9.0/man

man icc -M /opt/intel/cc/9.0/man

On andrei:

man pgCC -M /usr/local/pgi/linux86/6.0/man

man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:

– Shared-memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C …

#include <stdio.h>

int main(void)
{
#pragma omp parallel
    printf("Hello world\n");

    return 0;
}

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp)
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time ./test-openmp.exe"

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 2: Introduction to HPC at UNBC

Summary of the presentationSummary of the presentation

Who needs HPCWho needs HPC

What kind of software do we haveWhat kind of software do we have

What kind of hardware do we haveWhat kind of hardware do we have

How to access the HPC systemsHow to access the HPC systems

Parallel programming basicsParallel programming basics

Who needs HPCWho needs HPCHPC Domains of Applications at UNBCHPC Domains of Applications at UNBC

Atmospheric ScienceAtmospheric ScienceEnvironmental ScienceEnvironmental ScienceGeophysicsGeophysicsChemistryChemistryComputer ScienceComputer ScienceForestForestPhysicsPhysicsEngineering Engineering

Who needs HPCWho needs HPC

We use HPC to solve problems that We use HPC to solve problems that cant be solved in a reasonable cant be solved in a reasonable amount of time using a single amount of time using a single desktop computerdesktop computer

Problems solved using HPCProblems solved using HPC Needs large quantity of RAMNeeds large quantity of RAM Requires large quantity of CPUsRequires large quantity of CPUs

HPC Users SummeryHPC Users Summery

On February 6 2008On February 6 2008

Total Users 73Total Users 73

Professors 16Professors 16

Post-doctoral 7Post-doctoral 7

Ph D students 5Ph D students 5

Master Students and Others 45Master Students and Others 45

What kind of software do we haveWhat kind of software do we have

IDL + ENVIIDL + ENVIMATLAB + ToolboxesMATLAB + ToolboxesTecplotTecplotSTATASTATANAG Fortran LibraryNAG Fortran LibraryFLUENTFLUENTPGI CompilersPGI CompilersIntel CompilersIntel Compilers

What kind of software do we haveWhat kind of software do we have

IDL ndash the ideal software for data IDL ndash the ideal software for data analysis visualization and cross-analysis visualization and cross-platform application developmentplatform application development

ENVI - the premier software solution ENVI - the premier software solution to quickly easily and accurately to quickly easily and accurately extract information from geospatial extract information from geospatial imageryimagery

What kind of software do we haveWhat kind of software do we haveMATLAB is a high-level technical computing MATLAB is a high-level technical computing language and interactive environment for language and interactive environment for algorithm development data visualization data algorithm development data visualization data analysis and numeric computationanalysis and numeric computation

MATLAB ToolboxesMATLAB Toolboxesndash Curve FittingCurve Fittingndash Distributed ComputingDistributed Computingndash Image ProcessingImage Processingndash Mapping Mapping ndash Neural NetworkNeural Networkndash StatisticsStatistics

What kind of software do we haveWhat kind of software do we have

Two images plotted Two images plotted using Tecplot by using Tecplot by Dr Jean WangDr Jean Wang

Pressure Contour Pressure Contour around a Prolate around a Prolate Spheroid Spheroid

What kind of software do we haveWhat kind of software do we have

Why use STATAWhy use STATA

STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics

What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today

Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job

LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE

optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe

What kind of software do we haveWhat kind of software do we have

FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software

What kind of hardware do we haveWhat kind of hardware do we have

SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor

Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)

File Server File Server

Windows Terminal ServerWindows Terminal Server

10 Workstations in HPC Lab10 Workstations in HPC Lab

Geowall systems for visualizationGeowall systems for visualization

SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache

64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor

NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree

10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9

Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)

(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for

head)head)

GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head

nodenode

Operating SystemOperating Systemndash Suse 93Suse 93

StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on

head node for software and head node for software and local copieslocal copies

File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM

SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot

sparespare

10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup

Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca

Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM

Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume

Accessible from anywhereAccessible from anywhere

Runs windows applicationsRuns windows applications

Workstations at HPC LabWorkstations at HPC Lab

Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your jobs via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.
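For illustration, a minimal PBS job script might look like the sketch below. The job name, resource requests, and program name are hypothetical; the actual queues and limits on Columbia and Andrei may differ.

```shell
#!/bin/bash
# Hypothetical PBS job script: the #PBS lines are directives read by the scheduler.
#PBS -N myjob                 # job name
#PBS -l nodes=2:ppn=2         # request 2 nodes with 2 processors each
#PBS -l walltime=01:00:00     # one hour wall-clock limit
#PBS -j oe                    # merge standard output and standard error

cd "$PBS_O_WORKDIR"           # PBS starts jobs in $HOME; move to the submission directory
mpirun -np 4 ./myprog         # run the MPI executable on the 4 requested processors
```

The script would be submitted with "qsub myjob.pbs"; "qstat" then shows the job's place in the queue, and "qdel" removes it.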

Parallel Programming Basics
What is parallelism?

Less fish vs. more fish!

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory: MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: the two of you will take about 35 minutes instead of the ideal 30.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource eventually has diminishing returns.

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
Managing the multiple processes
Communication between processes
Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI... parallelizing data

OpenMP... parallelizing tasks

MPI
Harry Potter Volume 1 -> one translator produces the Spanish and French versions
Harry Potter Volume 2 -> another translator produces the Spanish and French versions
(the data, i.e. the volumes, are divided among the workers)

OpenMP
Harry Potter Volumes 1 and 2 -> the Spanish translator
Harry Potter Volumes 1 and 2 -> the French translator
(the tasks, i.e. the languages, are divided among the workers)

Compilers
Compilers on ACT cluster (andrei):
GNU: C/C++, g77
PGI: C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel: C/C++, Fortran
GNU: C/C++, g77

PGI Compilers (cluster)

For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...

More than one "mpirun": SGI MPI and MPICH

Intel Compilers
How to compile a parallel code

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMPcode.f
icc -options -openmp myOpenMPcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man -M /opt/intel/fc/9.0/man ifort

man -M /opt/intel/cc/9.0/man icc

On andrei:

man -M /usr/local/pgi/linux86/6.0/man pgCC

man -M /usr/local/pgi/linux86/6.0/man pgf90

Getting Started with OpenMP

Key points:

Shared-memory multiprocessor nodes
Parallel programming using compiler directives
Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>
int main(void)
{
#pragma omp parallel
  {
    printf("Hello world\n");
  }
  return 0;
}

Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
icc test.cpp -openmp -o test-openmp.exe
ifort test.f -openmp -o test-openmp.exe
OMP_NUM_THREADS=32
export OMP_NUM_THREADS
time ./test-openmp.exe

Two work directories: /home/user-id & /hpchome/user-id

/home/user-id:
CTS server
Email box
Login files
Backed up daily
Contact the help desk for increasing the space

/hpchome/user-id:
HPC server
Research area
Backed up once a week
Contact Jean Wang for increasing the space


HPC Users Summary

As of February 6, 2008:

Total users: 73

Professors: 16

Post-doctoral fellows: 7

Ph.D. students: 5

Master's students and others: 45

What kind of software do we have?

IDL + ENVI
MATLAB + Toolboxes
Tecplot
STATA
NAG Fortran Library
FLUENT
PGI Compilers
Intel Compilers

What kind of software do we have?

IDL: the ideal software for data analysis, visualization, and cross-platform application development.

ENVI: the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.

What kind of software do we have?
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

MATLAB Toolboxes:
Curve Fitting
Distributed Computing
Image Processing
Mapping
Neural Network
Statistics

What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang:

Pressure contour around a prolate spheroid

What kind of software do we have?

Why use STATA?

STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.

What kind of software do we have?
The NAG Fortran Library: the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
Set the environment variable before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT: flow modeling software

What kind of hardware do we have?

SGI Altix 3000: 64 processors

Linux Cluster: 128 processors (Opteron)

File Server

Windows Terminal Server

10 workstations in the HPC Lab

GeoWall systems for visualization

SGI Altix 3000 - columbia.unbc.ca
64 processors:
Intel Itanium2 (1.5 GHz)
4 MB cache

64 GB RAM:
1 GB/processor

NUMAlink interconnect:
6.4 GB/s
Fat tree

10GbE network connection
SUSE Linux Enterprise Server 9

Linux Cluster - andrei.unbc.ca
64 nodes (128 processors) + head node:
AMD Opteron (2.1 GHz), 2/node
144 GB RAM (2 GB/node + 16 GB for the head)

GigE interconnect:
Two Nortel switches
Network access via the head node

Operating system:
SuSE 9.3

Storage:
1.7 TB of local storage on the head node for software and local copies

File Server
SGI Altix 350:
4 processors, 8 GB RAM

SGI TP9100:
6 TB storage
RAID 5 with hot spare

10GbE network connection
Maintains tape backup

Windows Terminal Server - ithaca.unbc.ca

Dell PowerEdge 6800:
4 processors (Intel Xeon 2.4 GHz)
8 GB RAM

Local RAID for the system volume:
600 GB volume

Accessible from anywhere

Runs Windows applications

Workstations at the HPC Lab

Dell Precision 470:
2 Intel Xeon processors (3.2 GHz)
2 GB RAM
NVIDIA Quadro FX3400, 256 MB
2 Dell 20" LCD displays

GeoWall Systems
Two systems.
Both have a 2-processor server and 1.5 TB RAID 5.
The GeoWall Room (8-111) has a rear-projected display.
The portable unit has a front-projected display.

How to access the HPC systems

From Windows to Windows:

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to UNI


Page 4: Introduction to HPC at UNBC


On February 6, 2008:

Total users: 73
Professors: 16
Post-doctoral fellows: 7
Ph.D. students: 5
Master's students and others: 45

What kind of software do we have?

IDL + ENVI
MATLAB + toolboxes
Tecplot
STATA
NAG Fortran Library
FLUENT
PGI compilers
Intel compilers

What kind of software do we have?

IDL - the ideal software for data analysis, visualization, and cross-platform application development.

ENVI - the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.

What kind of software do we have?

MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

MATLAB toolboxes:
- Curve Fitting
- Distributed Computing
- Image Processing
- Mapping
- Neural Network
- Statistics

What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang: pressure contours around a prolate spheroid.

What kind of software do we have?

Why use STATA? STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.

What kind of software do we have?

The NAG Fortran Library - the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library - set the environment variables before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT - flow modeling software.

What kind of hardware do we have?

SGI Altix 3000 - 64 processors
Linux cluster - 128 processors (Opteron)
File server
Windows terminal server
10 workstations in the HPC Lab
GeoWall systems for visualization

SGI Altix 3000 - columbia.unbc.ca

64 processors
- Intel Itanium2 (1.5 GHz)
- 4 MB cache
64 GB RAM
- 1 GB/processor
NUMAlink interconnect
- 6.4 GB/s
- fat-tree topology
10 GbE network connection
SUSE Linux Enterprise Server 9

Linux cluster - andrei.unbc.ca

64 nodes (128 processors) + head node
- AMD Opteron (2.1 GHz), 2 per node
- 144 GB RAM (2 GB per node + 16 GB for the head)
GigE interconnect
- two Nortel switches
- network access via the head node
Operating system
- SuSE 9.3
Storage
- 1.7 TB of local storage on the head node for software and local copies

File server

SGI Altix 350
- 4 processors, 8 GB RAM
SGI TP9100
- 6 TB storage
- RAID 5 with hot spare
10 GbE network connection
Maintains tape backup

Windows terminal server - ithaca.unbc.ca

Dell PowerEdge 6800
- 4 processors (Intel Xeon 2.4 GHz)
- 8 GB RAM
Local RAID for the system volume
- 600 GB volume
Accessible from anywhere
Runs Windows applications

Workstations at the HPC Lab

Dell Precision 470
- 2 Intel Xeon processors (3.2 GHz)
- 2 GB RAM
- NVIDIA Quadro FX3400, 256 MB
- 2 Dell 20" LCD displays

GeoWall systems

Two systems; both have a 2-processor server and 1.5 TB RAID 5. The GeoWall Room (8-111) has a rear-projected display; the portable unit has a front-projected display.

How to access the HPC systems

From Windows to Windows:

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to UNI.

How to access the HPC systems

From Linux to Windows:

rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca

Log on to UNI.

How to access the HPC systems

From Linux to Linux:

ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca

[pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
[pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:

Download the "Xmanager 2.0" software from
http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the HPC file system, under Windows:

Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN, replacing LOGIN with your UNI login.

How to access the HPC systems

How to mount the HPC file system, on a Linux machine:

smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN

replacing MOUNTPOINT with the name of a directory that the file system will be mounted to.

Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as MATLAB, IDL, etc.

Submit your jobs via PBS on both Columbia and Andrei.

What is PBS?

Portable Batch System (PBS) is job-scheduling software. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.

Parallel programming basics

What is parallelism?

Less fish vs. more fish.

What is parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of parallelism

Shared memory: auto-parallelization, OpenMP, MPI
Distributed memory: MPI

The jigsaw puzzle analogy: serial computing

Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.

Shared memory parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of the ideal 30.

The more the merrier?

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half on Tom's.

Now you both can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed parallelism

Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- managing the multiple processes
- communication between processes
- synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI: parallelizing data
OpenMP: parallelizing tasks

MPI

Translator 1 takes Harry Potter Volume 1 and translates it into Spanish and French.
Translator 2 takes Harry Potter Volume 2 and translates it into Spanish and French.

OpenMP

Translator 1 takes Harry Potter Volumes 1 and 2 and translates them into Spanish.
Translator 2 takes Harry Potter Volumes 1 and 2 and translates them into French.

Compilers

Compilers on the ACT cluster (andrei):
- GNU: C, C++, g77
- PGI: C, C++, f77, f90

Compilers on the Altix 3000 (columbia):
- Intel: C, C++, Fortran
- GNU: C, C++, g77

PGI compilers (cluster)

For the 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For the 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx

Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On the cluster:

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:

/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...

There is more than one "mpirun": SGI MPI and MPICH.

Intel compilers

How to compile a parallel code:

MPI codes:

ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:

ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:

ifort -parallel mycode.f
icc -parallel mycode.c

More about compilers

On columbia:

man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:

man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:
- shared-memory multiprocessor nodes
- parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive

Parallel regions in C:

#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("Hello world\n");
    }
    return 0;
}

Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and running

Intel (-openmp) or SGI (-mp):

icc test.cpp -openmp -o test-openmp.exe
ifort test.f -openmp -o test-openmp.exe
OMP_NUM_THREADS=32
export OMP_NUM_THREADS
time ./test-openmp.exe

Two work directories: /home/user-id and /hpchome/user-id

/home/user-id
- CTS server
- email box
- login files
- backed up daily
- contact the help desk to increase the space

/hpchome/user-id
- HPC server
- research area
- backed up once a week
- contact Jean Wang to increase the space

Page 5: Introduction to HPC at UNBC

HPC Users SummeryHPC Users Summery

On February 6 2008On February 6 2008

Total Users 73Total Users 73

Professors 16Professors 16

Post-doctoral 7Post-doctoral 7

Ph D students 5Ph D students 5

Master Students and Others 45Master Students and Others 45

What kind of software do we haveWhat kind of software do we have

IDL + ENVIIDL + ENVIMATLAB + ToolboxesMATLAB + ToolboxesTecplotTecplotSTATASTATANAG Fortran LibraryNAG Fortran LibraryFLUENTFLUENTPGI CompilersPGI CompilersIntel CompilersIntel Compilers

What kind of software do we haveWhat kind of software do we have

IDL ndash the ideal software for data IDL ndash the ideal software for data analysis visualization and cross-analysis visualization and cross-platform application developmentplatform application development

ENVI - the premier software solution ENVI - the premier software solution to quickly easily and accurately to quickly easily and accurately extract information from geospatial extract information from geospatial imageryimagery

What kind of software do we haveWhat kind of software do we haveMATLAB is a high-level technical computing MATLAB is a high-level technical computing language and interactive environment for language and interactive environment for algorithm development data visualization data algorithm development data visualization data analysis and numeric computationanalysis and numeric computation

MATLAB ToolboxesMATLAB Toolboxesndash Curve FittingCurve Fittingndash Distributed ComputingDistributed Computingndash Image ProcessingImage Processingndash Mapping Mapping ndash Neural NetworkNeural Networkndash StatisticsStatistics

What kind of software do we haveWhat kind of software do we have

Two images plotted Two images plotted using Tecplot by using Tecplot by Dr Jean WangDr Jean Wang

Pressure Contour Pressure Contour around a Prolate around a Prolate Spheroid Spheroid

What kind of software do we haveWhat kind of software do we have

Why use STATAWhy use STATA

STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics

What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today

Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job

LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE

optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe

What kind of software do we haveWhat kind of software do we have

FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software

What kind of hardware do we haveWhat kind of hardware do we have

SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor

Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)

File Server File Server

Windows Terminal ServerWindows Terminal Server

10 Workstations in HPC Lab10 Workstations in HPC Lab

Geowall systems for visualizationGeowall systems for visualization

SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache

64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor

NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree

10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9

Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)

(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for

head)head)

GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head

nodenode

Operating SystemOperating Systemndash Suse 93Suse 93

StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on

head node for software and head node for software and local copieslocal copies

File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM

SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot

sparespare

10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup

Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca

Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM

Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume

Accessible from anywhereAccessible from anywhere

Runs windows applicationsRuns windows applications

Workstations at HPC LabWorkstations at HPC Lab

Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel Programming Basics
What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory: MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of the ideal 30.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI… parallelizing data

OpenMP… parallelizing tasks

MPI
- Harry Potter Volume 1 → Spanish translator, French translator
- Harry Potter Volume 2 → Spanish translator, French translator

OpenMP
- Spanish translator → Harry Potter Volume 1, Harry Potter Volume 2
- French translator → Harry Potter Volume 1, Harry Potter Volume 2

Compilers
Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

PGI Compiler
For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic Parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man ifort -M /opt/intel/fc/9.0/man

man icc -M /opt/intel/cc/9.0/man

On andrei:

man pgCC -M /usr/local/pgi/linux86/6.0/man

man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:
- Shared memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("Hello, world!\n");
    }
    return 0;
}

Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello, world!'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
- "icc test.cpp -openmp -o test-openmp.exe"
- "ifort test.f -openmp -o test-openmp.exe"
- "OMP_NUM_THREADS=32"
- "export OMP_NUM_THREADS"
- "time ./test-openmp.exe"

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backup daily
- Contact help desk for increasing the space

/hpchome/user-id:
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space

What kind of software do we have?

IDL + ENVI
MATLAB + Toolboxes
Tecplot
STATA
NAG Fortran Library
FLUENT
PGI Compilers
Intel Compilers

What kind of software do we have?

IDL – the ideal software for data analysis, visualization, and cross-platform application development.

ENVI – the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.

What kind of software do we have?
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

MATLAB Toolboxes:
- Curve Fitting
- Distributed Computing
- Image Processing
- Mapping
- Neural Network
- Statistics

What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang

Pressure Contour around a Prolate Spheroid

What kind of software do we have?

Why use STATA?

STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.

What kind of software do we have?
The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
- Set environment variables before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

/opt/intel/fc/9.0/bin/ifort -r8 test.for -L /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT – Flow Modeling Software

What kind of hardware do we have?

SGI Altix 3000 – 64 processors

Linux Cluster – 128 processors (Opteron)

File Server

Windows Terminal Server

10 Workstations in HPC Lab

GeoWall systems for visualization

SGI Altix 3000 – columbia.unbc.ca
64 processors
- Intel Itanium2 (1.5 GHz)
- 4 MB cache

64 GB RAM
- 1 GB/processor

NUMAlink interconnect
- 6.4 GB/s
- Fat tree

10GbE network connection
SUSE Linux Enterprise Server 9

Linux Cluster – andrei.unbc.ca
64 nodes (128 processors) + head node
- AMD Opteron (2.1 GHz) (2/node)
- 144 GB RAM (2/node + 16 for head)

GigE interconnect
- Two Nortel switches
- Network access via head node

Operating System
- SuSE 9.3

Storage
- 1.7 TB of local storage on head node for software and local copies

File Server
SGI Altix 350
- 4p, 8 GB RAM

SGI TP9100
- 6 TB storage
- RAID 5 with hot spare

10GbE network connection
Maintains tape backup

Windows Terminal Server – ithaca.unbc.ca

Dell PowerEdge 6800
- 4p (Intel Xeon 2.4 GHz)
- 8 GB RAM

Local RAID for system volume
- 600 GB volume

Accessible from anywhere

Runs Windows applications

Workstations at HPC Lab

Dell Precision 470
- 2 Intel Xeon processors (3.2 GHz)
- 2 GB RAM
- NVidia Quadro FX3400, 256 MB
- 2 Dell 20" LCD displays

GeoWall Systems
Two systems. Both have a 2-processor server and 1.5 TB RAID5. The GeoWall Room (8-111) has a rear-projected display; the portable unit has a front-projected display.



Page 7: Introduction to HPC at UNBC

What kind of software do we have?

IDL – the ideal software for data analysis, visualization, and cross-platform application development.

ENVI – the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.

What kind of software do we have?

MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

MATLAB Toolboxes:
- Curve Fitting
- Distributed Computing
- Image Processing
- Mapping
- Neural Network
- Statistics

What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang:

Pressure contour around a prolate spheroid.

What kind of software do we have?

Why use STATA?

STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.

What kind of software do we have?

The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
- Set environment variables before you run your job:

  LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
  export LM_LICENSE_FILE

- Compile and link:

  /opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT – flow modeling software.

What kind of hardware do we have?

- SGI Altix 3000 – 64 processors
- Linux cluster – 128 processors (Opteron)
- File server
- Windows terminal server
- 10 workstations in the HPC Lab
- GeoWall systems for visualization

SGI Altix 3000 – columbia.unbc.ca

- 64 processors
  - Intel Itanium2 (1.5 GHz)
  - 4 MB cache
- 64 GB RAM
  - 1 GB/processor
- NUMAlink interconnect
  - 6.4 GB/s
  - fat tree
- 10GbE network connection
- SUSE Linux Enterprise Server 9

Linux Cluster – andrei.unbc.ca

- 64 nodes (128 processors) + head node
  - AMD Opteron (2.1 GHz), 2/node
  - 144 GB RAM (2/node + 16 for the head)
- GigE interconnect
  - two Nortel switches
  - network access via the head node
- Operating system
  - SUSE 9.3
- Storage
  - 1.7 TB of local storage on the head node for software and local copies

File Server

- SGI Altix 350
  - 4 processors, 8 GB RAM
- SGI TP9100
  - 6 TB storage
  - RAID 5 with hot spare
- 10GbE network connection
- Maintains tape backups

Windows Terminal Server – ithaca.unbc.ca

- Dell PowerEdge 6800
  - 4 processors (Intel Xeon 2.4 GHz)
  - 8 GB RAM
- Local RAID for the system volume
  - 600 GB volume
- Accessible from anywhere
- Runs Windows applications

Workstations at the HPC Lab

- Dell Precision 470
  - 2 Intel Xeon processors (3.2 GHz)
  - 2 GB RAM
  - NVIDIA Quadro FX3400, 256 MB
  - 2 Dell 20" LCD displays

GeoWall Systems

- Two systems
- Both have a 2-processor server and 1.5 TB RAID 5
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display

How to access the HPC systems

From Windows to Windows:

- Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI

How to access the HPC systems

From Linux to Windows:

- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI

How to access the HPC systems

From Linux to Linux:

- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
  - [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  - [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:

- Download the "Xmanager 2.0" software from:
  http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the HPC file system, under Windows:

- Simply right-click on My Computer, select "Map Network Drive", and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN, replacing LOGIN with your UNI login.

How to access the HPC systems

How to mount the HPC file system, on a Linux machine:

- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the file system will be mounted to.

Reminder to HPC users

- Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

- Submit your jobs via PBS on both Columbia and Andrei.

What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.

If you want to know more about PBS, please contact Dr. Jean Wang.
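For a first job submission, a minimal PBS batch script might look like the sketch below. The job name, resource requests, and program name are all hypothetical placeholders, not taken from the slides; check the local queue configuration before using real values.

```shell
#!/bin/sh
#PBS -N mpihello            # job name (hypothetical)
#PBS -l nodes=2:ppn=2       # request 2 nodes with 2 processors each
#PBS -l walltime=00:10:00   # wall-clock limit of 10 minutes
#PBS -j oe                  # merge stdout and stderr into one file

cd $PBS_O_WORKDIR           # run from the directory qsub was called in
mpirun -np 4 ./mpihello     # launch the (hypothetical) MPI program
```

Submit the script with `qsub job.pbs` and check its status with `qstat`.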

Parallel Programming Basics

What is parallelism?

Less fish vs. more fish.

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

- Shared memory: auto-parallel, OpenMP, MPI
- Distributed memory: MPI

The Jigsaw Puzzle Analogy

Serial computing:

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: the two of you will take about 35 minutes instead of the ideal 30.

The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces), and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have diminishing returns.

Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism

- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization (everyone stops until everyone is ready)

OpenMP and MPI Programming Paradigms

- MPI … parallelizing data
- OpenMP … parallelizing tasks

MPI

- Harry Potter Volume 1 -> Spanish translator, French translator
- Harry Potter Volume 2 -> Spanish translator, French translator

(The data – the volumes – is split up, and each piece gets its own translators.)

OpenMP

- Spanish translator: Harry Potter Volume 1 and Volume 2
- French translator: Harry Potter Volume 1 and Volume 2

(The tasks – the languages – are split up, and each translator handles the whole text.)

Compilers

Compilers on the ACT cluster (andrei):
- GNU – C/C++, g77
- PGI – C/C++, f77, f90

Compilers on the Altix 3000 (columbia):
- Intel – C/C++, Fortran
- GNU – C/C++, g77

PGI Compilers (cluster)

For 32-bit compilers, set PATH as:

  export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

  export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx

Compilers for MPI Codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On the cluster:
  /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
  /opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
  /opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI Codes

  /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
  pgf77 -o mpihello mpihello.f -lfmpich -lmpich
  mpif77 -o mpihello mpihello.f -lfmpich -lmpich
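For reference, a C analogue of the mpihello.f program compiled above might look like the following. This is a sketch, not the course's actual file; build it with one of the MPI compile commands shown (e.g. mpicc) and launch it with mpirun.

```c
/* Minimal MPI "hello" in C: every process reports its rank.
   Compile with an MPI wrapper, e.g.:  mpicc mpihello.c -o mpihello
   Run with, e.g.:                     mpirun -np 4 ./mpihello      */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                       /* shut MPI down cleanly */
    return 0;
}
```

Each of the -np processes runs the same program; they are distinguished only by their rank.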

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

There is more than one "mpirun" – SGI MPI and MPICH.

Intel Compilers

How to compile a parallel code:

MPI codes:
  ifort -options myMPIcode.f -lmpi
  icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
  ifort -options -openmp myOpenMPcode.f
  icc -options -openmp myOpenMPcode.c

Automatic parallelization:
  ifort -parallel mycode.f
  icc -parallel mycode.c

More About Compilers

On columbia:
  man ifort -M /opt/intel/fc/9.0/man
  man icc -M /opt/intel/cc/9.0/man

On andrei:
  man pgCC -M /usr/local/pgi/linux86/6.0/man
  man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting Started with OpenMP

Key points:
- Shared memory, multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive

Parallel regions in C …

==============================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
==============================

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk for increasing the space

/hpchome/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang for increasing the space


everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 9: Introduction to HPC at UNBC

What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang:

Pressure Contour around a Prolate Spheroid

What kind of software do we have?

Why use STATA?

STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.

What kind of software do we have?
The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library
– Set environment variables before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT – Flow Modeling Software

What kind of hardware do we have?

SGI Altix 3000 – 64 processors

Linux Cluster – 128 processors (Opteron)

File Server

Windows Terminal Server

10 Workstations in HPC Lab

GeoWall systems for visualization

SGI Altix 3000 – columbia.unbc.ca
64 Processors
– Intel Itanium2 (1.5 GHz)
– 4 MB cache
64 GB RAM
– 1 GB/processor
NUMAlink interconnect
– 6.4 GB/s
– Fat tree
10GbE network connection
SUSE Linux Enterprise Server 9

Linux Cluster – andrei.unbc.ca
64 Nodes (128 processors) + Head Node
– AMD Opteron (2.1 GHz) (2/node)
– 144 GB RAM (2 GB/node + 16 GB for head)
GigE interconnect
– Two Nortel switches
– Network access via head node
Operating System
– SUSE 9.3
Storage
– 1.7 TB of local storage on head node for software and local copies

File Server
SGI Altix 350
– 4p, 8 GB RAM
SGI TP9100
– 6 TB storage
– RAID 5 with hot spare
10GbE network connection
Maintains tape backup

Windows Terminal Server – ithaca.unbc.ca
Dell PowerEdge 6800
– 4p (Intel Xeon 2.4 GHz)
– 8 GB RAM
Local RAID for system volume
– 600 GB volume
Accessible from anywhere
Runs Windows applications

Workstations at HPC Lab

Dell Precision 470
– 2 Intel Xeon processors (3.2 GHz)
– 2 GB RAM
– NVIDIA Quadro FX3400, 256 MB
– 2 Dell 20" LCD displays

GeoWall Systems
Two systems; both have a 2-processor server and 1.5 TB RAID 5. The GeoWall Room (8-111) has a rear-projected display; the portable unit has a front-projected display.

How to access the HPC systems

From Windows to Windows:

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Windows:
rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Linux:

ssh -X yqwang@columbia.unbc.ca

ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:

Download the software "Xmanager 2.0" from

http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the HPC file system

Under Windows:
– Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN,
– replacing LOGIN with your UNI login.

How to access the HPC systems

How to mount the HPC file system

On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of the directory that the system will be mounted to.

Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your job via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.
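A PBS job is described by a short shell script whose #PBS comment lines request resources, and is submitted with qsub. The script below is only a sketch: the job name, resource requests, and mpirun invocation are illustrative placeholders, and the exact settings to use on Columbia and Andrei should be confirmed with Dr. Jean Wang.

```shell
#!/bin/bash
#PBS -N mpihello            # job name (placeholder)
#PBS -l nodes=2:ppn=2       # request 2 nodes, 2 processors per node
#PBS -l walltime=01:00:00   # wall-clock limit of one hour
#PBS -j oe                  # merge stdout and stderr into one file

# PBS starts the job in your home directory; change to the
# directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Launch the MPI program on the 4 requested processors.
mpirun -np 4 ./mpihello
```

Submit the script with "qsub myjob.pbs", monitor it with "qstat", and cancel it with "qdel job-id".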

Parallel Programming Basics
What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared memory – Auto Parallel, OpenMP, MPI

Distributed memory – MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: the two of you will take 35 minutes instead of an hour.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement; say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI… parallelizing data

OpenMP… parallelizing tasks

MPI

Translator 1: Harry Potter Volume 1 -> Spanish and French
Translator 2: Harry Potter Volume 2 -> Spanish and French

(Each translator takes one volume of the data and performs every task on it.)

OpenMP

Spanish translator: Harry Potter Volume 1 and Volume 2 -> Spanish
French translator: Harry Potter Volume 1 and Volume 2 -> French

(Each translator takes one task and applies it to all of the data.)

Compilers
Compilers on ACT cluster (andrei):

GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMPcode.f
icc -options -openmp myOpenMPcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc

On andrei:

man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90

Getting started with OpenMP

Key points:

– Shared memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C…

==============================
#include <stdio.h>
int
main (void)
{
  #pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
==============================

Fortran OpenMP compiler directive
Parallel regions in Fortran…

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact the help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 10: Introduction to HPC at UNBC

What kind of software do we haveWhat kind of software do we have

Why use STATAWhy use STATA

STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics

What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today

Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job

LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE

optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe

What kind of software do we haveWhat kind of software do we have

FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software

What kind of hardware do we haveWhat kind of hardware do we have

SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor

Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)

File Server File Server

Windows Terminal ServerWindows Terminal Server

10 Workstations in HPC Lab10 Workstations in HPC Lab

Geowall systems for visualizationGeowall systems for visualization

SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache

64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor

NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree

10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9

Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)

(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for

head)head)

GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head

nodenode

Operating SystemOperating Systemndash Suse 93Suse 93

StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on

head node for software and head node for software and local copieslocal copies

File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM

SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot

sparespare

10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup

Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca

Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM

Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume

Accessible from anywhereAccessible from anywhere

Runs windows applicationsRuns windows applications

Workstations at HPC LabWorkstations at HPC Lab

Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systems

From Linux to Linux

ssh -X yqwang@columbia.unbc.ca

ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux

Download the software "Xmanager 2.0" from

http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the HPC file system

Under Windows:
– Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
– replacing LOGIN with your UNI login

How to access the HPC systems

How to mount the HPC file system

On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of a directory that the system will be mounted to

Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications, such as Matlab, IDL, etc.

Submit your jobs via PBS on both Columbia and Andrei.

What is PBS?

Portable Batch System (or simply PBS) is computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.

If you want to know more about PBS, please contact Dr. Jean Wang.
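
The slides don't show a job script, so here is a minimal sketch of what a PBS submission script might look like. The job name, node counts, and walltime are illustrative values, not settings taken from the UNBC systems:

```shell
#!/bin/sh
# Illustrative PBS job script -- submit with: qsub myjob.pbs
#PBS -N mpihello              # job name (hypothetical)
#PBS -l nodes=2:ppn=2         # request 2 nodes, 2 processors per node
#PBS -l walltime=01:00:00     # 1 hour wall-clock limit
#PBS -j oe                    # merge stdout and stderr into one file

# PBS sets PBS_O_WORKDIR to the directory qsub was run from; fall back
# to the current directory so the script also runs standalone.
cd "${PBS_O_WORKDIR:-.}"
echo "Job started on $(hostname)"
# mpirun -np 4 ./mpihello     # the actual work would go here
```

Submit with `qsub`, check the queue with `qstat`, and delete a job with `qdel` – the standard PBS commands.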

Parallel Programming Basics

What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory – Auto Parallel, OpenMP, MPI

Distributed Memory – MPI

The Jigsaw Puzzle Analogy

Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism

Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of 30 minutes.

The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism

Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI… parallelizing data

OpenMP… parallelizing tasks

MPI

(Diagram: Harry Potter Volume 1 – Spanish / French – Translator; Harry Potter Volume 2 – Spanish / French – Translator. Each volume of the data is handled by its own translators.)

OpenMP

(Diagram: Harry Potter Volume 1 + Volume 2 – Spanish – Translator; Harry Potter Volume 1 + Volume 2 – French – Translator. Each translator takes one task, the translation into a single language, across all the data.)

Compilers

Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

PGI Compiler

For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic Parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:
– Shared memory, multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive

Parallel regions in C …

================================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
================================

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time ./test-openmp.exe"

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backed up daily
– Contact the help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backed up once a week
– Contact Jean Wang for increasing the space

What kind of software do we have?

The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
– Set environment variables before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe

What kind of software do we have?

FLUENT – Flow Modeling Software

What kind of hardware do we have?

SGI Altix 3000 – 64 processors

Linux Cluster – 128 processors (Opteron)

File Server

Windows Terminal Server

10 Workstations in HPC Lab

GeoWall systems for visualization

SGI Altix 3000 – columbia.unbc.ca

64 Processors
– Intel Itanium2 (1.5 GHz)
– 4 MB cache

64 GB RAM
– 1 GB/processor

NumaLink interconnect
– 6.4 GB/s
– Fat Tree

10GbE network connection
SuSE Linux Enterprise Server 9

Linux Cluster – andrei.unbc.ca

64 Nodes (128 processors) + Head Node
– AMD Opteron (2.1 GHz), 2 per node
– 144 GB RAM (2 GB per node + 16 GB for the head)

GigE interconnect
– Two Nortel switches
– Network access via the head node

Operating System
– SuSE 9.3

Storage
– 1.7 TB of local storage on the head node for software and local copies

File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM

SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot

sparespare

10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup

Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca

Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM

Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume

Accessible from anywhereAccessible from anywhere

Runs windows applicationsRuns windows applications

Workstations at HPC LabWorkstations at HPC Lab

Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories – /home/user-id & /hpc/home/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpc/home/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 12: Introduction to HPC at UNBC

What kind of software do we have?

FLUENT – Flow Modeling Software

What kind of hardware do we have?

SGI Altix 3000 – 64 processors
Linux Cluster – 128 processors (Opteron)
File Server
Windows Terminal Server
10 Workstations in HPC Lab
GeoWall systems for visualization

SGI Altix 3000 – columbia.unbc.ca
64 Processors
– Intel Itanium2 (1.5GHz)
– 4Mb Cache
64 Gb RAM
– 1Gb/processor
NumaLink interconnect
– 6.4Gb/s
– Fat Tree
10GbE network connection
Suse Linux Enterprise Server 9

Linux Cluster – andrei.unbc.ca
64 Nodes (128 processors) + Head Node
– AMD Opteron (2.1GHz) (2/node)
– 144Gb RAM (2/node + 16 for head)
GigE interconnect
– Two Nortel switches
– Network access via head node
Operating System
– Suse 9.3
Storage
– 1.7Tb of local storage on head node for software and local copies

File Server
SGI Altix 350
– 4p, 8Gb RAM
SGI TP9100
– 6Tb Storage
– RAID 5 with hot spare
10GbE network connection
Maintains tape backup

Windows Terminal Server – ithaca.unbc.ca
Dell PowerEdge 6800
– 4p (Intel Xeon 2.4GHz)
– 8Gb RAM
Local RAID for system volume
– 600Gb volume
Accessible from anywhere
Runs Windows applications

Workstations at HPC Lab
Dell Precision 470
– 2 Intel Xeon Processors (3.2GHz)
– 2Gb RAM
– NVidia Quadro FX3400 256Mb
– 2 Dell 20" LCD displays

GeoWall Systems
Two systems
Both have a 2-processor server, 1.5Tb RAID5
GeoWall Room (8-111) has rear-projected display
Portable unit has front-projected display

How to access the HPC systems

From Windows to Windows:
From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
Computer: pg-hpc-ts-01.unbc.ca
Log on to UNI

How to access the HPC systems

From Linux to Windows:
rdesktop -a15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
Log on to UNI

How to access the HPC systems

From Linux to Linux:
ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:
Download the software "Xmanager 2.0" from
http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the HPC file system:
Under Windows:
– Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
– replacing LOGIN with your UNI login

How to access the HPC systems

How to mount the HPC file system:
On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of a directory that the system will be mounted to

Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your job via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.

Parallel Programming Basics
What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI
Distributed Memory – MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 min instead of 30 min.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 min instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.
All data are private.
Processes communicate by passing messages.
The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI… parallelizing data
OpenMP… parallelizing tasks

MPI
Harry Potter Volume 1 – Spanish Translator, French Translator
Harry Potter Volume 2 – Spanish Translator, French Translator

OpenMP
Harry Potter Volume 1, Harry Potter Volume 2 – Spanish Translator
Harry Potter Volume 1, Harry Potter Volume 2 – French Translator

Compilers
Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi


Page 13: Introduction to HPC at UNBC

What kind of hardware do we haveWhat kind of hardware do we have

SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor

Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)

File Server File Server

Windows Terminal ServerWindows Terminal Server

10 Workstations in HPC Lab10 Workstations in HPC Lab

Geowall systems for visualizationGeowall systems for visualization

SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache

64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor

NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree

10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9

Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)

(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for

head)head)

GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head

nodenode

Operating SystemOperating Systemndash Suse 93Suse 93

StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on

head node for software and head node for software and local copieslocal copies

File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM

SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot

sparespare

10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup

Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca

Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM

Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume

Accessible from anywhereAccessible from anywhere

Runs windows applicationsRuns windows applications

Workstations at HPC LabWorkstations at HPC Lab

Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" - SGI MPI and MPICH

Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:
- Shared memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive

Parallel regions in C:

#include <stdio.h>
int
main (void)
{
  #pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}

Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe

Two work directories - /home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backup daily
- Contact help desk for increasing the space

/hpchome/user-id:
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space


SGI Altix 3000 - columbia.unbc.ca

- 64 processors: Intel Itanium2 (1.5Ghz), 4Mb cache
- 64 Gb RAM (1Gb/processor)
- NUMAlink interconnect: 6.4Gb/s, fat tree
- 10GbE network connection
- Suse Linux Enterprise Server 9

Linux Cluster - andrei.unbc.ca

- 64 nodes (128 processors) + head node
  - AMD Opteron (2.1Ghz) (2/node)
  - 144Gb RAM (2/node + 16 for head)
- GigE interconnect
  - Two Nortel switches
  - Network access via head node
- Operating system: Suse 9.3
- Storage: 1.7 Tb of local storage on head node for software and local copies

File Server

- SGI Altix 350: 4p, 8Gb RAM
- SGI TP9100: 6Tb storage, RAID 5 with hot spare
- 10GbE network connection
- Maintains tape backup

Windows Terminal Server - ithaca.unbc.ca

- Dell PowerEdge 6800: 4p (Intel Xeon 2.4Ghz), 8Gb RAM
- Local RAID for system volume: 600Gb volume
- Accessible from anywhere
- Runs Windows applications

Workstations at HPC Lab

Dell Precision 470:
- 2 Intel Xeon processors (3.2Ghz)
- 2Gb RAM
- NVidia Quadro FX3400, 256Mb
- 2 Dell 20" LCD displays

GeoWall Systems

- Two systems; both have a 2-processor server and 1.5Tb RAID5
- GeoWall Room (8-111) has a rear projected display
- Portable unit has a front projected display

How to access the HPC systems

From Windows to Windows:

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Windows:

rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Linux:

ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca

[pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
[pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:

Download the software "Xmanager 2.0" from:
http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the hpc file system:

Under Windows:
- Simply right click on My Computer and select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login

How to access the HPC systems

How to mount the hpc file system:

On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the system will be mounted to

Reminder to HPC users

Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your job via PBS on both Columbia and Andrei.

What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.

If you want to know more about PBS, please contact Dr. Jean Wang.
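A minimal PBS job script might look like the sketch below (illustrative only: the job name, resource requests, and program name mpihello are made-up examples, and queue names or limits on Columbia and Andrei may differ):

```shell
#!/bin/sh
#PBS -N mpihello           # job name
#PBS -l nodes=2:ppn=2      # request 2 nodes, 2 processors per node
#PBS -l walltime=00:10:00  # wall-clock time limit
#PBS -j oe                 # merge stdout and stderr into one file

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Launch the MPI program on the allocated processors
mpirun -np 4 ./mpihello
```

The script would be submitted with qsub and monitored with qstat.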

Parallel programming Basics

What is parallelism?

Less fish vs. more fish!

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular, the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

- Shared Memory: Auto Parallel, OpenMP, MPI
- Distributed Memory: MPI

The Jigsaw Puzzle Analogy

Serial Computing:

Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism:

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 min instead of 30 min.

The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces), and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 min instead of an hour.

Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism

- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 15: Introduction to HPC at UNBC

Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)

(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for

head)head)

GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head

nodenode

Operating SystemOperating Systemndash Suse 93Suse 93

StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on

head node for software and head node for software and local copieslocal copies

File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM

SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot

sparespare

10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup

Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca

Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM

Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume

Accessible from anywhereAccessible from anywhere

Runs windows applicationsRuns windows applications

Workstations at HPC LabWorkstations at HPC Lab

Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPI
[Diagram: Harry Potter Volume 1 goes to one translator and Volume 2 to another; each translator produces both the Spanish and the French versions. The data (the volumes) are divided among the workers.]

OpenMP
[Diagram: one translator takes both Volume 1 and Volume 2 and produces the Spanish versions, while another takes both volumes and produces the French versions. The tasks (the languages) are divided among the workers.]

Compilers
Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

PGI Compiler
For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man ifort (-M /opt/intel/fc/9.0/man)

man icc (-M /opt/intel/cc/9.0/man)

On andrei:

man pgCC (-M /usr/local/pgi/linux86/6.0/man)

man pgf90 (-M /usr/local/pgi/linux86/6.0/man)

Getting started with OpenMP

Key points:
– Shared-memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C:
================================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
================================

Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id:
– CTS server
– Email box
– Login files
– Backed up daily
– Contact the help desk to increase the space

/hpchome/user-id:
– HPC server
– Research area
– Backed up once a week
– Contact Jean Wang to increase the space

File Server
SGI Altix 350
– 4p, 8Gb RAM

SGI TP9100
– 6Tb storage
– RAID 5 with hot spare

10GbE network connection
Maintains tape backup

Windows Terminal Server – ithaca.unbc.ca

Dell PowerEdge 6800
– 4p (Intel Xeon 2.4GHz)
– 8Gb RAM

Local RAID for system volume
– 600Gb volume

Accessible from anywhere

Runs Windows applications

Workstations at HPC Lab

Dell Precision 470
– 2 Intel Xeon processors (3.2GHz)
– 2Gb RAM
– NVidia Quadro FX3400, 256Mb
– 2 Dell 20" LCD displays

GeoWall Systems
Two systems
Both have a 2-processor server, 1.5Tb RAID5
The GeoWall Room (8-111) has a rear-projected display
The portable unit has a front-projected display

How to access the HPC systems

From Windows to Windows:

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Windows:
rdesktop -a15 -g 1280x1024 pg-hpc-ts-01.unbc.ca

Log on to UNI

How to access the HPC systems

From Linux to Linux:

ssh -X yqwang@columbia.unbc.ca

ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:

Download the software "Xmanager 2.0" from:

http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the HPC file system:

Under Windows:
– Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN,
– replacing LOGIN with your UNI login.

How to access the HPC systems

How to mount the HPC file system:

On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of the directory that the system will be mounted to.

Reminder to HPC users

Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications, such as Matlab, IDL, etc.

Submit your jobs via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
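A minimal PBS job script, as a hedged sketch: the job name, node count, and walltime below are hypothetical, so check with the HPC staff for the queues actually configured on columbia and andrei:

```shell
#!/bin/bash
# Illustrative PBS job script -- the resource requests here are
# assumptions, not the actual UNBC queue configuration.
#PBS -N mpihello           # job name
#PBS -l nodes=4            # number of nodes requested
#PBS -l walltime=01:00:00  # maximum wall-clock time
#PBS -j oe                 # merge stdout and stderr into one file

cd $PBS_O_WORKDIR          # run from the directory the job was submitted in
mpirun -np 4 ./mpihello
```

The script is submitted with qsub, and qstat shows its place in the queue.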

Parallel Programming Basics
What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory – MPI

The Jigsaw Puzzle Analogy
Serial Computing:

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 min instead of 30 min.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 min instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.


Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 18: Introduction to HPC at UNBC

Workstations at HPC Lab
Dell Precision 470:
- 2 Intel Xeon processors (3.2 GHz)
- 2 GB RAM
- NVidia Quadro FX3400, 256 MB
- 2 Dell 20" LCD displays

GeoWall Systems
Two systems:
- Both have a 2-processor server and 1.5 TB RAID5 storage
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display

How to access the HPC systems
From Windows to Windows:
- From Start > All Programs > Accessories > Communications > Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI

How to access the HPC systems
From Linux to Windows:
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI

How to access the HPC systems
From Linux to Linux:
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
- [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  [pg-hpc-clnode-63 ~]>

How to access the HPC systems
From Windows to Linux:
- Download the software "Xmanager 2.0" from
  http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems
How to mount the HPC file system under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login

How to access the HPC systems
How to mount the HPC file system on a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of the directory that the system will be mounted to

Reminder to HPC users
Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications, such as Matlab, IDL, etc.
Submit your jobs via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.

Parallel programming basics
What is parallelism? Less fish vs. more fish.

What is Parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism
- Shared Memory: Auto Parallel, OpenMP, MPI
- Distributed Memory: MPI

The Jigsaw Puzzle Analogy
Serial Computing:
Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you might take 35 minutes instead of the ideal 30.

The More the Merrier?
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms
- MPI... parallelizing data
- OpenMP... parallelizing tasks

MPI
- One translator takes Harry Potter Volume 1 and produces the Spanish and French versions.
- A second translator takes Harry Potter Volume 2 and produces the Spanish and French versions.

OpenMP
- One translator produces the Spanish version of Harry Potter Volumes 1 and 2.
- A second translator produces the French version of Harry Potter Volumes 1 and 2.

(The original slides showed this as two diagrams: under MPI the data, the volumes, is split across workers; under OpenMP the tasks, the target languages, are split.)

Compilers
Compilers on ACT cluster (andrei):
- GNU: C/C++, g77
- PGI: C/C++, f77, f90
Compilers on Altix 3000 (columbia):
- Intel: C/C++, Fortran
- GNU: C/C++, g77

PGI Compilers (cluster)

PGI Compiler
For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
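The sources themselves are not shown in the slides; a minimal MPI "hello world" in C, of the kind these compile lines expect, might look like the following. This is a sketch: it needs an MPI installation to compile, and is launched with mpirun rather than run directly.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);               /* start up MPI               */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id (rank)   */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes  */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut down MPI              */
    return 0;
}
```

Compiled with any of the command lines above, it would be run as, e.g., mpirun -np 4 ./mpihello.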

Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...

More than one "mpirun": SGI MPI and MPICH.

Intel Compilers
How to compile a parallel code:
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic Parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man
On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP
Key points:
- Shared memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  {
    printf ("Hello, world!\n");
  }
  return 0;
}

Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello, world!'
c$omp end parallel
      end

Compiling and Running
Intel (-openmp) or SGI (-mp):
- "icc test.cpp -openmp -o test-openmp.exe"
- "ifort test.f -openmp -o test-openmp.exe"
- "OMP_NUM_THREADS=32"
- "export OMP_NUM_THREADS"
- "time ./test-openmp.exe"

Two work directories: /home/user-id & /hpc/home/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk to increase the space

/hpc/home/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang to increase the space

Page 19: Introduction to HPC at UNBC

GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay

How to access the HPC systemsHow to access the HPC systems

From Windows to WindowsFrom Windows to Windows

From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection

Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

Compilers

Compilers on ACT cluster (andrei):
 GNU – C/C++, g77
 PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
 Intel – C/C++, Fortran
 GNU – C/C++, g77

PGI Compiler (cluster)

For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

 Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
 C: pgcc, mpicc
 C++: pgCC, mpicxx

Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers

How to compile a parallel code?

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:
– Shared memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive

Parallel regions in C:
==============================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}
==============================

Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id:
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id:
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space


How to access the HPC systems

From Windows to Windows:

From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection

Computer: pg-hpc-ts-01.unbc.ca

Log on to: UNI

How to access the HPC systems

From Linux to Windows:
 rdesktop -a15 -g 1280x1024 pg-hpc-ts-01.unbc.ca

Log on to: UNI

How to access the HPC systems

From Linux to Linux:

 ssh -X yqwang@columbia.unbc.ca
 ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux:

Download software "Xmanager 2.0" from:
http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the hpc file system?

Under Windows:
– Simply right click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN,
– replacing LOGIN with your UNI login.

How to access the HPC systems

How to mount the hpc file system?

On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of a directory that the system will be mounted to.

Reminder to HPC users

 Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

 Submit your job via PBS on both Columbia and Andrei.

What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.

If you want to know more about PBS, please contact Dr. Jean Wang.
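A minimal PBS job script has roughly this shape (a hedged sketch: the job name, node counts, walltime, and executable path are placeholders for illustration, not UNBC-specific values — check with Dr. Jean Wang for the local queue settings):

```shell
#!/bin/bash
#PBS -N myjob             # job name (placeholder)
#PBS -l nodes=2:ppn=2     # request 2 nodes, 2 processors per node (placeholder)
#PBS -l walltime=01:00:00 # maximum run time
#PBS -j oe                # merge stdout and stderr into one file

cd $PBS_O_WORKDIR         # start in the directory qsub was run from
mpirun -np 4 ./mpihello   # run the MPI example built earlier
```

The script is submitted with `qsub myjob.pbs`, and `qstat` shows its state in the queue.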

Parallel programming Basics

What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

 Shared Memory: Auto Parallel, OpenMP, MPI
 Distributed Memory – MPI

The Jigsaw Puzzle Analogy

Serial Computing:
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 min instead of 30 min.

The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces), and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 min instead of an hour.

Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism

 Processors are independent of each other
 All data are private
 Processes communicate by passing messages
 The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 21: Introduction to HPC at UNBC

How to access the HPC systemsHow to access the HPC systems

From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-

01unbcca01unbcca

Log on to UNILog on to UNI

How to access the HPC systemsHow to access the HPC systems

From Linux to LinuxFrom Linux to Linux

ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca

ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-

6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 22: Introduction to HPC at UNBC

How to access the HPC systems

From Linux to Linux

ssh -X yqwang@columbia.unbc.ca

ssh -X yqwang@andrei.unbc.ca
- [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
- [pg-hpc-clnode-63 ~]>

How to access the HPC systems

From Windows to Linux

Download the software "Xmanager 2.0" from

http://www.download.com/Xmanager/3000-2155_4-10038129.html

How to access the HPC systems

How to mount the hpc file system

Under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login

How to access the HPC systems

How to mount the hpc file system

On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the system will be mounted to

Reminder to HPC users

Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your job via PBS on both Columbia and Andrei.

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
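The slides don't show what a PBS submission looks like, so here is a minimal sketch of a job script. The job name, resource request, and executable name are illustrative only; the actual queue limits and MPI launch command on Columbia and Andrei may differ.

```shell
#!/bin/bash
#PBS -N myjob               # job name (illustrative)
#PBS -l nodes=1:ppn=4       # request 1 node with 4 processors
#PBS -l walltime=01:00:00   # maximum run time of one hour
#PBS -j oe                  # merge stdout and stderr into one file

cd $PBS_O_WORKDIR           # start in the directory qsub was run from
mpirun -np 4 ./mpihello     # launch the MPI program on 4 processes
```

A script like this would be submitted with `qsub myjob.pbs` and monitored with `qstat`.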

Parallel programming Basics
What is parallelism?

Less fish vs. more fish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory: MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: the two of you take about 35 minutes, instead of the 30 a perfect split would give.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it down to 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI… parallelizing data

OpenMP… parallelizing tasks

MPI
Harry Potter Volume 1 → Spanish Translator, French Translator
Harry Potter Volume 2 → Spanish Translator, French Translator
(the data is divided: each translator works on one volume)

OpenMP
Harry Potter Volume 1 + Volume 2 → Spanish Translator
Harry Potter Volume 1 + Volume 2 → French Translator
(the tasks are divided: each translator works through both volumes)

Compilers
Compilers on ACT cluster (andrei):

GNU - C/C++, g77
PGI - C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel - C/C++, Fortran
GNU - C/C++, g77

PGI Compilers (cluster)

For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun": SGI MPI and MPICH

Intel Compilers
How to compile a parallel code
MPI codes:

ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:

ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic Parallelization:

ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man ifort (-M /opt/intel/fc/9.0/man)

man icc (-M /opt/intel/cc/9.0/man)

On andrei:

man pgCC (-M /usr/local/pgi/linux86/6.0/man)

man pgf90 (-M /usr/local/pgi/linux86/6.0/man)

Getting started with OpenMP

Key points:

- Shared memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C …

============================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
============================

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp)
- "icc test.cpp -openmp -o test-openmp.exe"
- "ifort test.f -openmp -o test-openmp.exe"
- "OMP_NUM_THREADS=32"
- "export OMP_NUM_THREADS"
- "time ./test-openmp.exe"

Two work directories:
/home/user-id & /hpchome/user-id

/home/user-id
- CTS server
- Email box
- Login files
- Backup daily
- Contact help desk for increasing the space

/hpchome/user-id
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space

Page 23: Introduction to HPC at UNBC

How to access the HPC systemsHow to access the HPC systems

From Windows to LinuxFrom Windows to Linux

Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom

httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 24: Introduction to HPC at UNBC

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and

select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN

ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory – MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 min instead of the ideal 30 min.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 min instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready

OpenMP and MPI programming paradigms

MPI … parallelizing data

OpenMP … parallelizing tasks

MPI – parallelizing the data:

Translator 1: Harry Potter Volume 1 → Spanish, French

Translator 2: Harry Potter Volume 2 → Spanish, French

OpenMP – parallelizing the tasks:

Translator 1: Harry Potter Volumes 1 and 2 → Spanish

Translator 2: Harry Potter Volumes 1 and 2 → French

Compilers
Compilers on ACT cluster (andrei):

GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)
For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code
MPI codes:

ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:

ifort -options -openmp myOpenMPcode.f
icc -options -openmp myOpenMPcode.c

Automatic Parallelization:

ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man -M /opt/intel/fc/9.0/man ifort

man -M /opt/intel/cc/9.0/man icc

On andrei:

man -M /usr/local/pgi/linux86/6.0/man pgCC

man -M /usr/local/pgi/linux86/6.0/man pgf90

Getting started with OpenMP

Key points:

– Shared memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C …

================================
#include <stdio.h>
int
main (void)
{
  #pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
================================

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time test-openmp.exe"

Two work directories –
/home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 25: Introduction to HPC at UNBC

How to access the HPC systemsHow to access the HPC systems

How to mount hpc file systemHow to mount hpc file system

On a Linux machineOn a Linux machinendash smbmount smbmount

pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN

ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 26: Introduction to HPC at UNBC

Reminder to HPC usersReminder to HPC users

Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc

Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei

What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang

Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism

Less fish vs more Less fish vs more fishfish

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMP
[Diagram: one translator turns both Harry Potter Volume 1 and Volume 2 into Spanish, while another turns both volumes into French. The tasks (the target languages) are divided between the workers.]

Compilers
Compilers on ACT cluster (andrei):

GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

PGI Compiler
For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH.

Intel Compilers
How to compile a parallel code?
MPI codes:

ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:

ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic Parallelization:

ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:

man ifort -M /opt/intel/fc/9.0/man

man icc -M /opt/intel/cc/9.0/man

On andrei:

man pgCC -M /usr/local/pgi/linux86/6.0/man

man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:

– Shared memory, multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C …

================================
#include <stdio.h>
int main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}
================================

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time test-openmp.exe"

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 27: Introduction to HPC at UNBC

What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.
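A PBS job is described by a short script of scheduler directives. The sketch below is hypothetical (the job name, node counts, walltime, and mpirun line are placeholders; the real values depend on the cluster's queues):

```shell
#!/bin/bash
#PBS -N mpihello             # job name (placeholder)
#PBS -l nodes=2:ppn=2        # request 2 nodes, 2 processors per node
#PBS -l walltime=00:10:00    # wall-clock limit
#PBS -j oe                   # merge stdout and stderr into one file

cd $PBS_O_WORKDIR            # run from the submission directory
mpirun -np 4 ./mpihello      # launch the MPI executable
```

The script is submitted with qsub and monitored with qstat, both standard PBS commands.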

Parallel programming Basics
What is parallelism?

Less fish vs. more fish!

What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular, the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.

Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI

Distributed Memory – MPI

The Jigsaw Puzzle Analogy
Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.

Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.

Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: the two of you will take about 35 min instead of the ideal 30 min.

The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 min instead of an hour.

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Page 29: Introduction to HPC at UNBC

What is ParallelismWhat is Parallelism

Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem

The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now the two of you can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism
Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready

OpenMP and MPI Programming Paradigms

MPI… parallelizing data
OpenMP… parallelizing tasks

MPI
Each translator gets one volume of the data: the first translates Harry Potter Volume 1 into Spanish and French; the second translates Harry Potter Volume 2 into Spanish and French.

OpenMP
Each translator gets one task across all of the data: the Spanish translator translates Volumes 1 and 2 into Spanish; the French translator translates Volumes 1 and 2 into French.

Compilers
Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90
Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77

PGI Compilers (cluster)

PGI Compiler
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx

Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f
On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers
How to compile a parallel code:
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers
On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man
On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting Started with OpenMP

Key points:
– Shared-memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running
Intel (-openmp) or SGI (-mp):
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time ./test-openmp.exe"

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 30: Introduction to HPC at UNBC

Kinds of ParallelismKinds of Parallelism

Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI

Distributed Memory ndash MPIDistributed Memory ndash MPI

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 31: Introduction to HPC at UNBC

The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing

Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour

Shared Memory Shared Memory ParallelismParallelism

If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 32: Introduction to HPC at UNBC

Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown

And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours

The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.

Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.

Distributed Parallelism

Processors are independent of each other.

All data are private.

Processes communicate by passing messages.

The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).

Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization (everyone stops until everyone is ready)

OpenMP and MPI programming paradigms:

MPI… parallelizing data

OpenMP… parallelizing tasks

MPI

(Diagram: data decomposition.) One translator takes Harry Potter Volume 1 and produces the Spanish and French versions; a second translator takes Harry Potter Volume 2 and does the same. Each worker owns a piece of the data.

OpenMP

(Diagram: task decomposition.) The Spanish translator takes Harry Potter Volumes 1 and 2 and produces the Spanish versions; the French translator takes both volumes and produces the French versions. Each worker owns a task.

Compilers

Compilers on ACT cluster (andrei):
– GNU: C/C++, g77
– PGI: C/C++, f77, f90

Compilers on Altix 3000 (columbia):
– Intel: C/C++, Fortran
– GNU: C/C++, g77

PGI Compilers (cluster)

PGI Compiler

For 32-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:

export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90

C: pgcc, mpicc

C++: pgCC, mpicxx

Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi

Compilers for MPI codes

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH

Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c

More About Compilers

On columbia:

man ifort -M /opt/intel/fc/9.0/man

man icc -M /opt/intel/cc/9.0/man

On andrei:

man pgCC -M /usr/local/pgi/linux86/6.0/man

man pgf90 -M /usr/local/pgi/linux86/6.0/man

Getting started with OpenMP

Key points:
– Shared memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++

C OpenMP compiler directive

Parallel regions in C …

================================
#include <stdio.h>
int main (void)
{
#pragma omp parallel
  printf ("Hello, world!\n");
  return 0;
}
================================

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time ./test-openmp.exe"

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space

Page 33: Introduction to HPC at UNBC

The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 34: Introduction to HPC at UNBC

Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min

Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMPGetting started with OpenMP

Key pointsKey points

ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler

directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++

C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip

============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel

printf (Hello worldn)printf (Hello worldn)

return 0return 0

================================1048707================================1048707

Fortran Fortran OpenMP compiler OpenMP compiler directivedirective

Parallel regions in Fortran hellip

program helloc$omp parallel

print lsquoHello worldrsquoc$omp end parallel

end

Compiling and RunningCompiling and Running

Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-

openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo

Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id

homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk

for increasing the for increasing the spacespace

hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a

weekweekndash Contact Jean Wang Contact Jean Wang

for increasing the for increasing the spacespace

Page 35: Introduction to HPC at UNBC

Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos

Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly

Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other

All data are privateAll data are private

Processes communicate by passing Processes communicate by passing messagesmessages

The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)

Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead

The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until

everyone is readyeveryone is ready

OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms

MPIhellip parallelizing data MPIhellip parallelizing data

OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks

MPIMPIHarry Potter Volume 1Harry Potter Volume 1

SpanishSpanish

FrenchFrench

TranslatorTranslator

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

FrenchFrench

TranslatorTranslator

OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

SpanishSpanish

TranslatorTranslator

Harry Potter Volume 1Harry Potter Volume 1

Harry Potter Volume 2Harry Potter Volume 2

FrenchFrench

TranslatorTranslator

CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)

GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90

Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx

Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a

Fortran code mpihellofFortran code mpihellof

On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash

lmpichlmpich

On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi

Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich

Which mpirunWhich mpirun

[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun

[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip

More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH

Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes

ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi

Code with OpenMp directivesCode with OpenMp directives

ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec

Automatic ParallelizationAutomatic Parallelization

ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec

More About CompilersMore About CompilersOn columbiaOn columbia

man ifort -M optintelfc90manman ifort -M optintelfc90man

man icc -M optintelcc90manman icc -M optintelcc90man

On andreiOn andrei

man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man

man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man

Getting started with OpenMP

Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++

C OpenMP compiler directive
Parallel regions in C ...

#include <stdio.h>

int main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}

Fortran OpenMP compiler directive
Parallel regions in Fortran ...

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end

Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
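The two environment-variable steps above can also be written as a one-liner. In this sketch `printenv` stands in for the OpenMP executable, which reads OMP_NUM_THREADS from its environment at startup:

```shell
# Two-step form from the slide: set, then export.
OMP_NUM_THREADS=32
export OMP_NUM_THREADS
printenv OMP_NUM_THREADS                     # prints 32

# One-shot form: the variable applies to this command only.
OMP_NUM_THREADS=8 printenv OMP_NUM_THREADS   # prints 8
```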

Two work directories - /home/user-id & /hpchome/user-id

/home/user-id
- CTS server
- Email box
- Login files
- Backup daily
- Contact help desk for increasing the space

/hpchome/user-id
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space

Page 36: Introduction to HPC at UNBC

Distributed Parallelism

- Processors are independent of each other
- All data are private
- Processes communicate by passing messages
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte)
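That latency/bandwidth split can be put into numbers. A toy cost model, where the 50 us latency and 100 MB/s bandwidth figures are illustrative only (not measurements of the UNBC systems):

```shell
# time(message) = latency + bytes / bandwidth
# 50 us latency, 100 MB/s bandwidth -- illustrative numbers only.
awk 'BEGIN {
    lat_us = 50; bw_Bps = 100e6
    for (n = 1; n <= 1048576; n *= 1024) {
        t = lat_us + n / bw_Bps * 1e6     # total time, microseconds
        printf "%8d bytes: %10.2f us (latency share %5.1f%%)\n", \
               n, t, 100 * lat_us / t
    }
}'
```

Small messages are almost all latency, large ones almost all bandwidth, which is why batching many small messages into one large one is usually a win.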

Page 37: Introduction to HPC at UNBC

Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
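A back-of-envelope way to see why this matters: take a 100 s serial job and charge each process a fixed 2 s of overhead (both numbers are made up for illustration):

```shell
# T/p is the ideal parallel time; oh*p models growing coordination cost.
awk 'BEGIN {
    T = 100; oh = 2                          # seconds, illustrative
    for (p = 1; p <= 16; p *= 2) {
        t = T / p + oh * p
        printf "p=%2d  time=%6.2f s  speedup=%5.2f\n", p, t, T / t
    }
}'
```

Past some point adding processors makes the job slower: the overhead grows faster than the compute time shrinks.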

Page 38: Introduction to HPC at UNBC

OpenMP and MPI programming paradigms

MPI ... parallelizing data

OpenMP ... parallelizing tasks

Page 39: Introduction to HPC at UNBC

MPI

Harry Potter Volume 1 -> Spanish Translator, French Translator
Harry Potter Volume 2 -> Spanish Translator, French Translator
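The same picture in shell: two background jobs (the "translators") each take a different chunk of the input (a "volume") and work on it independently, which is the MPI idea of distributing data across processes. The file names are throwaway examples.

```shell
cd "$(mktemp -d)"
printf 'line1\nline2\nline3\nline4\n' > book.txt

head -n 2 book.txt > vol1.txt            # volume 1 goes to worker 1
tail -n 2 book.txt > vol2.txt            # volume 2 goes to worker 2

tr 'a-z' 'A-Z' < vol1.txt > out1.txt &   # both "translations" run at once
tr 'a-z' 'A-Z' < vol2.txt > out2.txt &
wait                                     # synchronize: collect the results

cat out1.txt out2.txt                    # the upper-cased lines, in order
```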

Page 40: Introduction to HPC at UNBC

OpenMP

Harry Potter Volume 1, Harry Potter Volume 2 -> Spanish Translator
Harry Potter Volume 1, Harry Potter Volume 2 -> French Translator

Page 41: Introduction to HPC at UNBC

Compilers

Compilers on ACT cluster (andrei):
GNU – C/C++, g77
PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
Intel – C/C++, Fortran
GNU – C/C++, g77


Page 42: Introduction to HPC at UNBC

PGI Compilers (cluster)PGI Compilers (cluster)

PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as

export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH

For 64-bit compilers set PATH asFor 64-bit compilers set PATH as

export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH

Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90

C pgccmpiccC pgccmpicc

C++ pgCC mpicxxC++ pgCC mpicxx
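The export lines above prepend the PGI tool directory so it is searched before the system compilers. A quick sanity check of the prepend (runnable anywhere, no PGI install needed; the path is the cluster's 32-bit PGI 6.0 directory from the slide):

```shell
# Prepend the 32-bit PGI 6.0 bin directory to PATH
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

# The prepended directory is now the first one searched:
echo "$PATH" | tr ':' '\n' | head -n 1
```

This prints `/usr/local/pgi/linux86/6.0/bin`; `which pgf90` would then resolve into that directory if the compilers are installed there.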


Page 43: Introduction to HPC at UNBC

Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
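The mpihello.f source itself is not shown on the slide; a minimal MPI "hello world" in Fortran 77 along these lines would compile with the ifort command above (the program name and print text are illustrative):

```fortran
      program mpihello
      include 'mpif.h'
      integer ierr, rank, nprocs
c     Start MPI, find this process's rank and the total process count
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      print *, 'Hello from rank', rank, 'of', nprocs
      call MPI_FINALIZE(ierr)
      end
```

Run with one of the mpirun commands discussed below, e.g. `mpirun -np 4 ./mpihello`.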


Page 44: Introduction to HPC at UNBC

Compilers for MPI codes

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun

/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH


Page 45: Introduction to HPC at UNBC

Intel Compilers

How to compile a parallel code?

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c


Page 46: Introduction to HPC at UNBC

More About Compilers

On columbia:

man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:

man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man


Page 47: Introduction to HPC at UNBC

Getting started with OpenMP

Key points:
– Shared-memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++


Page 48: Introduction to HPC at UNBC

C OpenMP compiler directive

Parallel regions in C …

#include <stdio.h>
int main(void)
{
#pragma omp parallel
    printf("Hello world\n");
    return 0;
}


Page 49: Introduction to HPC at UNBC

Fortran OpenMP compiler directive

Parallel regions in Fortran …

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end


Page 50: Introduction to HPC at UNBC

Compiling and Running

Intel (-openmp) or SGI (-mp):
– "icc test.cpp -openmp -o test-openmp.exe"
– "ifort test.f -openmp -o test-openmp.exe"
– "OMP_NUM_THREADS=32"
– "export OMP_NUM_THREADS"
– "time ./test-openmp.exe"
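The two OMP_NUM_THREADS lines above are the Bourne-shell idiom: assign the variable, then export it so child processes (the OpenMP program) inherit it. A quick check that the export works, runnable without any compiler:

```shell
# Set the OpenMP thread count and export it to child processes
OMP_NUM_THREADS=32
export OMP_NUM_THREADS

# The one-line equivalent would be: export OMP_NUM_THREADS=32
# A child process now sees the value:
sh -c 'echo $OMP_NUM_THREADS'
```

This prints `32`; the OpenMP runtime reads the same variable when `./test-openmp.exe` starts.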


Page 51: Introduction to HPC at UNBC

Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space