Introduction to HPC at UNBC
The Enhanced High Performance Computing Center
Dr. You Qin (Jean) Wang
February 13, 2008
Summary of the presentation
- Who needs HPC?
- What kind of software do we have?
- What kind of hardware do we have?
- How to access the HPC systems
- Parallel programming basics
Who needs HPC? HPC domains of application at UNBC
- Atmospheric Science
- Environmental Science
- Geophysics
- Chemistry
- Computer Science
- Forestry
- Physics
- Engineering
Who needs HPC?
We use HPC to solve problems that can't be solved in a reasonable amount of time on a single desktop computer.
Problems solved using HPC:
- need a large amount of RAM
- require a large number of CPUs
HPC Users Summary
As of February 6, 2008:
- Total users: 73
- Professors: 16
- Post-doctoral fellows: 7
- Ph.D. students: 5
- Master's students and others: 45
What kind of software do we have?
- IDL + ENVI
- MATLAB + Toolboxes
- Tecplot
- STATA
- NAG Fortran Library
- FLUENT
- PGI Compilers
- Intel Compilers
What kind of software do we have?
IDL – software for data analysis, visualization, and cross-platform application development.
ENVI – software for quickly, easily, and accurately extracting information from geospatial imagery.
What kind of software do we have?
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.
MATLAB Toolboxes:
- Curve Fitting
- Distributed Computing
- Image Processing
- Mapping
- Neural Network
- Statistics
What kind of software do we have?
[Figure: two images plotted using Tecplot by Dr. Jean Wang – pressure contours around a prolate spheroid.]
What kind of software do we have?
Why use STATA?
STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have?
The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.
Calling the NAG Library:
- Set the environment variable before you run your job:
  LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
  export LM_LICENSE_FILE
- Compile and link:
  /opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have?
FLUENT – flow modeling software.
What kind of hardware do we have?
- SGI Altix 3000 – 64 processors
- Linux cluster – 128 processors (Opteron)
- File server
- Windows terminal server
- 10 workstations in the HPC Lab
- GeoWall systems for visualization
SGI Altix 3000 – columbia.unbc.ca
- 64 processors: Intel Itanium 2 (1.5 GHz), 4 MB cache
- 64 GB RAM (1 GB per processor)
- NUMAlink interconnect: 6.4 GB/s, fat-tree topology
- 10GbE network connection
- SUSE Linux Enterprise Server 9
Linux Cluster – andrei.unbc.ca
- 64 nodes (128 processors) + head node: AMD Opteron (2.1 GHz), 2 per node
- 144 GB RAM (2 GB per node + 16 GB for the head node)
- GigE interconnect: two Nortel switches; network access via the head node
- Operating system: SUSE 9.3
- Storage: 1.7 TB of local storage on the head node for software and local copies
File Server
- SGI Altix 350: 4 processors, 8 GB RAM
- SGI TP9100: 6 TB storage, RAID 5 with hot spare
- 10GbE network connection
- Maintains tape backups
Windows Terminal Server – ithaca.unbc.ca
- Dell PowerEdge 6800: 4 processors (Intel Xeon, 2.4 GHz), 8 GB RAM
- Local RAID for the system volume: 600 GB
- Accessible from anywhere
- Runs Windows applications
Workstations in the HPC Lab
Dell Precision 470:
- 2 Intel Xeon processors (3.2 GHz)
- 2 GB RAM
- NVIDIA Quadro FX3400, 256 MB
- 2 Dell 20" LCD displays
GeoWall Systems
- Two systems; both have a 2-processor server and 1.5 TB RAID 5
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display
How to access the HPC systems
From Windows to Windows:
- Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on with your UNI account
How to access the HPC systems
From Linux to Windows:
  rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on with your UNI account
How to access the HPC systems
From Linux to Linux:
  ssh -X yqwang@columbia.unbc.ca
  ssh -X yqwang@andrei.unbc.ca
- [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
- [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux:
Download the software "Xmanager 2.0" from
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the HPC file system
Under Windows:
- Right-click My Computer, select Map Network Drive, and choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the HPC file system
On a Linux machine:
  smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of the directory that the file system will be mounted to
Reminder to HPC users
- Don't run applications directly on the cluster head node. Always switch to node 63 or 64 first, then run your applications such as MATLAB, IDL, etc.
- Submit your jobs via PBS on both Columbia and Andrei.
What is PBS?
The Portable Batch System (PBS) is software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
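A typical PBS workflow is to put the commands in a script and submit it with qsub. The sketch below is illustrative only; the job name, resource requests, and executable name are assumptions, not the actual Columbia/Andrei configuration:

```shell
#!/bin/bash
#PBS -N myjob                 # job name (hypothetical)
#PBS -l nodes=1:ppn=2         # request 1 node, 2 processors (assumed limits)
#PBS -l walltime=01:00:00     # maximum run time
#PBS -j oe                    # merge stdout and stderr into one file

cd $PBS_O_WORKDIR             # start in the directory qsub was run from
./myprogram                   # the executable to run (hypothetical name)
```

Submit with "qsub myjob.pbs" and check the queue with "qstat".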
Parallel programming basics
What is parallelism?
[Image: less fish vs. more fish]
What is parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of parallelism
- Shared memory: auto-parallelization, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy
Serial computing:
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared memory parallelism:
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared memory parallelism:
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you will take 35 minutes instead of 30.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource eventually gives diminishing returns.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can all work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
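This cost can be written as a simple model (a standard sketch; the symbols α and β are names chosen here, not from the slides):

```latex
T_{\mathrm{msg}}(n) \;=\; \alpha + \frac{n}{\beta}
```

where T_msg(n) is the time to pass a message of n bytes, α is the latency, and β is the bandwidth. For many small messages the latency term dominates, so fewer, larger messages are usually cheaper.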
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
- managing the multiple processes
- communication between processes
- synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
- MPI… parallelizing data
- OpenMP… parallelizing tasks
MPI
[Diagram: Harry Potter Volume 1 -> Translator -> Spanish, French; Harry Potter Volume 2 -> Translator -> Spanish, French. Each translator handles one piece of the data (one volume).]
OpenMP
[Diagram: Harry Potter Volumes 1 and 2 -> Translator -> Spanish; Harry Potter Volumes 1 and 2 -> Translator -> French. Each translator handles one task (one target language).]
Compilers
Compilers on the ACT cluster (andrei):
- GNU – C/C++, g77
- PGI – C/C++, f77, f90
Compilers on the Altix 3000 (columbia):
- Intel – C/C++, Fortran
- GNU – C/C++, g77
PGI Compilers (cluster)
For the 32-bit compilers, set PATH as:
  export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For the 64-bit compilers, set PATH as:
  export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
- Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
- C: pgcc, mpicc
- C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.
On the cluster:
  /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
  pgf77 -o mpihello mpihello.f -lfmpich -lmpich
  mpif77 -o mpihello mpihello.f -lfmpich -lmpich
On columbia:
  /opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
  /opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
  /usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
  /usr/bin/mpirun
  /opt/mpich/ch-p4/bin/mpirun -np 4 …
There is more than one "mpirun" – SGI MPI and MPICH.
Intel Compilers
How to compile a parallel code:
MPI codes:
  ifort -options myMPIcode.f -lmpi
  icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
  ifort -options -openmp myOpenMpcode.f
  icc -options -openmp myOpenMpcode.c
Automatic parallelization:
  ifort -parallel mycode.f
  icc -parallel mycode.c
More About Compilers
On columbia:
  man -M /opt/intel/fc/9.0/man ifort
  man -M /opt/intel/cc/9.0/man icc
On andrei:
  man -M /usr/local/pgi/linux86/6.0/man pgCC
  man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting started with OpenMP
Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:

  #include <stdio.h>

  int main(void)
  {
  #pragma omp parallel
      printf("Hello world\n");
      return 0;
  }
Fortran OpenMP compiler directive
Parallel regions in Fortran:

        program hello
  c$omp parallel
        print *, 'Hello world'
  c$omp end parallel
        end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories – /home/user-id & /hpchome/user-id
/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk to increase the space
/hpchome/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang to increase the space
Summary of the presentationSummary of the presentation
Who needs HPCWho needs HPC
What kind of software do we haveWhat kind of software do we have
What kind of hardware do we haveWhat kind of hardware do we have
How to access the HPC systemsHow to access the HPC systems
Parallel programming basicsParallel programming basics
Who needs HPCWho needs HPCHPC Domains of Applications at UNBCHPC Domains of Applications at UNBC
Atmospheric ScienceAtmospheric ScienceEnvironmental ScienceEnvironmental ScienceGeophysicsGeophysicsChemistryChemistryComputer ScienceComputer ScienceForestForestPhysicsPhysicsEngineering Engineering
Who needs HPCWho needs HPC
We use HPC to solve problems that We use HPC to solve problems that cant be solved in a reasonable cant be solved in a reasonable amount of time using a single amount of time using a single desktop computerdesktop computer
Problems solved using HPCProblems solved using HPC Needs large quantity of RAMNeeds large quantity of RAM Requires large quantity of CPUsRequires large quantity of CPUs
HPC Users SummeryHPC Users Summery
On February 6 2008On February 6 2008
Total Users 73Total Users 73
Professors 16Professors 16
Post-doctoral 7Post-doctoral 7
Ph D students 5Ph D students 5
Master Students and Others 45Master Students and Others 45
What kind of software do we haveWhat kind of software do we have
IDL + ENVIIDL + ENVIMATLAB + ToolboxesMATLAB + ToolboxesTecplotTecplotSTATASTATANAG Fortran LibraryNAG Fortran LibraryFLUENTFLUENTPGI CompilersPGI CompilersIntel CompilersIntel Compilers
What kind of software do we haveWhat kind of software do we have
IDL ndash the ideal software for data IDL ndash the ideal software for data analysis visualization and cross-analysis visualization and cross-platform application developmentplatform application development
ENVI - the premier software solution ENVI - the premier software solution to quickly easily and accurately to quickly easily and accurately extract information from geospatial extract information from geospatial imageryimagery
What kind of software do we haveWhat kind of software do we haveMATLAB is a high-level technical computing MATLAB is a high-level technical computing language and interactive environment for language and interactive environment for algorithm development data visualization data algorithm development data visualization data analysis and numeric computationanalysis and numeric computation
MATLAB ToolboxesMATLAB Toolboxesndash Curve FittingCurve Fittingndash Distributed ComputingDistributed Computingndash Image ProcessingImage Processingndash Mapping Mapping ndash Neural NetworkNeural Networkndash StatisticsStatistics
What kind of software do we haveWhat kind of software do we have
Two images plotted Two images plotted using Tecplot by using Tecplot by Dr Jean WangDr Jean Wang
Pressure Contour Pressure Contour around a Prolate around a Prolate Spheroid Spheroid
What kind of software do we haveWhat kind of software do we have
Why use STATAWhy use STATA
STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics
What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today
Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job
LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE
optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe
What kind of software do we haveWhat kind of software do we have
FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software
What kind of hardware do we haveWhat kind of hardware do we have
SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor
Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)
File Server File Server
Windows Terminal ServerWindows Terminal Server
10 Workstations in HPC Lab10 Workstations in HPC Lab
Geowall systems for visualizationGeowall systems for visualization
SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache
64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor
NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree
10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming basics
What is parallelism?
Less fish vs. more fish!
What is Parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
– Shared Memory – Auto Parallel, OpenMP, MPI
– Distributed Memory – MPI
The Jigsaw Puzzle Analogy: Serial Computing
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you will take 35 minutes instead of the ideal 30.
The More the Merrier?
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
– Processors are independent of each other.
– All data are private.
– Processes communicate by passing messages.
– The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.
The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
– MPI … parallelizing data
– OpenMP … parallelizing tasks
MPI
– Harry Potter Volume 1 → Translator → Spanish, French
– Harry Potter Volume 2 → Translator → Spanish, French
(each process takes its own piece of the data – one volume – and performs all of the tasks on it)
OpenMP
– Harry Potter Volumes 1 and 2 → Translator → Spanish
– Harry Potter Volumes 1 and 2 → Translator → French
(each thread takes one task – one language – and applies it to all of the data)
Compilers
Compilers on ACT cluster (andrei):
– GNU – C/C++, g77
– PGI – C/C++, f77, f90
Compilers on Altix 3000 (columbia):
– Intel – C/C++, Fortran
– GNU – C/C++, g77
PGI Compilers (cluster)
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.
On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …
More than one "mpirun" – SGI MPI and MPICH.
Intel Compilers
How to compile a parallel code?
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers
On columbia:
man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc
On andrei:
man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting started with OpenMP
Key points:
– Shared memory, multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}
Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe
Two work directories – /home/user-id & /hpchome/user-id
/home/user-id:
– CTS server
– Email box
– Login files
– Backed up daily
– Contact the help desk for increasing the space
/hpchome/user-id:
– HPC server
– Research area
– Backed up once a week
– Contact Jean Wang for increasing the space
What kind of software do we have?
IDL – the ideal software for data analysis, visualization, and cross-platform application development.
ENVI – the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.
What kind of software do we have?
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.
MATLAB Toolboxes:
– Curve Fitting
– Distributed Computing
– Image Processing
– Mapping
– Neural Network
– Statistics
What kind of software do we have?
Two images plotted using Tecplot by Dr. Jean Wang: pressure contour around a prolate spheroid.
What kind of software do we have?
Why use STATA?
STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have?
The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.
Calling the NAG Library:
– Set environment variables before you run your job:
LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE
/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have?
FLUENT – flow modeling software.
What kind of hardware do we have?
– SGI Altix 3000 – 64 processors
– Linux Cluster – 128 processors (Opteron)
– File server
– Windows terminal server
– 10 workstations in the HPC Lab
– GeoWall systems for visualization
SGI Altix 3000 – columbia.unbc.ca
64 processors:
– Intel Itanium2 (1.5 GHz)
– 4 MB cache
64 GB RAM:
– 1 GB/processor
NUMAlink interconnect:
– 6.4 GB/s
– Fat tree
10GbE network connection
SUSE Linux Enterprise Server 9
Linux Cluster – andrei.unbc.ca
64 nodes (128 processors) + head node:
– AMD Opteron (2.1 GHz), 2/node
– 144 GB RAM (2 GB/node + 16 GB for the head)
GigE interconnect:
– Two Nortel switches
– Network access via the head node
Operating system:
– SuSE 9.3
Storage:
– 1.7 TB of local storage on the head node for software and local copies
File Server
SGI Altix 350:
– 4p, 8 GB RAM
SGI TP9100:
– 6 TB storage
– RAID 5 with hot spare
10GbE network connection
Maintains tape backup
Windows Terminal Server – ithaca.unbc.ca
Dell PowerEdge 6800:
– 4p (Intel Xeon, 2.4 GHz)
– 8 GB RAM
Local RAID for the system volume:
– 600 GB volume
Accessible from anywhere
Runs Windows applications
Workstations at HPC Lab
Dell Precision 470:
– 2 Intel Xeon processors (3.2 GHz)
– 2 GB RAM
– NVIDIA Quadro FX3400, 256 MB
– 2 Dell 20" LCD displays
GeoWall Systems
– Two systems
– Both have a 2-processor server and 1.5 TB RAID5
– The GeoWall Room (8-111) has a rear-projected display
– The portable unit has a front-projected display
How to access the HPC systems
From Windows to Windows:
– From Start → All Programs → Accessories → Communications → Remote Desktop Connection
– Computer: pg-hpc-ts-01.unbc.ca
– Log on to UNI
How to access the HPC systems
From Linux to Windows:
– rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
– Log on to UNI
How to access the HPC systems
From Linux to Linux:
ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux:
Download the software "Xmanager 2.0" from
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
Who needs HPC?

We use HPC to solve problems that can't be solved in a reasonable amount of time using a single desktop computer.

Problems solved using HPC:
- need a large quantity of RAM
- require a large quantity of CPUs
HPC Users Summary

On February 6, 2008:
Total users: 73
- Professors: 16
- Post-doctoral fellows: 7
- PhD students: 5
- Master's students and others: 45
What kind of software do we have?
- IDL + ENVI
- MATLAB + Toolboxes
- Tecplot
- STATA
- NAG Fortran Library
- FLUENT
- PGI Compilers
- Intel Compilers
What kind of software do we have?

IDL - the ideal software for data analysis, visualization, and cross-platform application development.

ENVI - the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.
What kind of software do we have?

MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.

MATLAB Toolboxes:
- Curve Fitting
- Distributed Computing
- Image Processing
- Mapping
- Neural Network
- Statistics
What kind of software do we have?

Two images plotted using Tecplot by Dr. Jean Wang: pressure contour around a prolate spheroid.
What kind of software do we have?

Why use STATA?

STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have?

The NAG Fortran Library - the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
- Set environment variables before you run your job:
  LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
  export LM_LICENSE_FILE
- /opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have?

FLUENT - flow modeling software.
What kind of hardware do we have?
- SGI Altix 3000 - 64 processors
- Linux cluster - 128 processors (Opteron)
- File server
- Windows terminal server
- 10 workstations in the HPC Lab
- GeoWall systems for visualization
SGI Altix 3000 - columbia.unbc.ca
- 64 processors: Intel Itanium2 (1.5GHz), 4MB cache
- 64GB RAM (1GB/processor)
- NUMAlink interconnect: 6.4GB/s, fat tree
- 10GbE network connection
- SUSE Linux Enterprise Server 9
Linux Cluster - andrei.unbc.ca
- 64 nodes (128 processors) + head node: AMD Opteron (2.1GHz), 2/node
- 144GB RAM (2GB/node + 16GB for the head)
- GigE interconnect: two Nortel switches; network access via the head node
- Operating system: SUSE 9.3
- Storage: 1.7TB of local storage on the head node for software and local copies
File Server
- SGI Altix 350: 4 processors, 8GB RAM
- SGI TP9100: 6TB storage, RAID 5 with hot spare
- 10GbE network connection
- Maintains tape backup
Windows Terminal Server - ithaca.unbc.ca
- Dell PowerEdge 6800: 4 processors (Intel Xeon 2.4GHz), 8GB RAM
- Local RAID for the system volume: 600GB volume
- Accessible from anywhere
- Runs Windows applications
Workstations at the HPC Lab

Dell Precision 470:
- 2 Intel Xeon processors (3.2GHz)
- 2GB RAM
- NVidia Quadro FX3400, 256MB
- 2 Dell 20" LCD displays
GeoWall Systems
- Two systems; both have a 2-processor server and 1.5TB RAID5
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display
How to access the HPC systems

From Windows to Windows:
- Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Windows:
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Linux:
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
  [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  [pg-hpc-clnode-63 ~]>
How to access the HPC systems

From Windows to Linux:
Download the "Xmanager 2.0" software from
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems

How to mount the HPC file system

Under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN,
- replacing LOGIN with your UNI login.
How to access the HPC systems

How to mount the HPC file system

On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the system will be mounted to.
Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as MATLAB, IDL, etc.

Submit your jobs via PBS on both columbia and andrei.
What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources.

If you want to know more about PBS, please contact Dr. Jean Wang.
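The slides don't include a sample job script, so here is a minimal sketch of one, assuming a generic PBS/Torque setup; the job name, the resource requests, and the program name (myjob.exe) are placeholders, not UNBC-specific values:

```
#!/bin/sh
#PBS -N myjob              # job name (placeholder)
#PBS -l nodes=2:ppn=2      # 2 nodes, 2 processors per node (placeholder)
#PBS -l walltime=01:00:00  # wall-clock time limit
#PBS -j oe                 # merge stdout and stderr into one file

cd $PBS_O_WORKDIR          # start in the directory the job was submitted from
mpirun -np 4 ./myjob.exe   # run the MPI program on the 4 requested processors
```

Submit the script with "qsub myscript.pbs" and check on it with "qstat".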
Parallel Programming Basics

What is parallelism?

Less fish vs. more fish!
What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
- Shared memory: auto-parallelization, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy

Serial computing: suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of the ideal 30.
The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI Programming Paradigms
- MPI: parallelizing data
- OpenMP: parallelizing tasks
MPI
- Harry Potter Volume 1 -> Spanish translator, French translator
- Harry Potter Volume 2 -> Spanish translator, French translator

(The data, the volumes, is divided among the workers.)
OpenMP
- Spanish translator -> Harry Potter Volume 1, Harry Potter Volume 2
- French translator -> Harry Potter Volume 1, Harry Potter Volume 2

(Each translator is a task that works through all of the data.)
Compilers

Compilers on the ACT cluster (andrei):
- GNU - C/C++, g77
- PGI - C/C++, f77, f90

Compilers on the Altix 3000 (columbia):
- Intel - C/C++, Fortran
- GNU - C/C++, g77
PGI Compilers (cluster)

For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

- Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
- C: pgcc, mpicc
- C++: pgCC, mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...

There is more than one "mpirun" - SGI MPI and MPICH.
Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting Started with OpenMP

Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP Compiler Directive

Parallel regions in C:

#include <stdio.h>
int main (void)
{
  #pragma omp parallel
  {
    printf ("Hello, world!\n");
  }
  return 0;
}
Fortran OpenMP Compiler Directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello, world!'
c$omp end parallel
      end
Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two Work Directories - /home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk to increase the space

/hpchome/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang to increase the space
HPC Users SummeryHPC Users Summery
On February 6 2008On February 6 2008
Total Users 73Total Users 73
Professors 16Professors 16
Post-doctoral 7Post-doctoral 7
Ph D students 5Ph D students 5
Master Students and Others 45Master Students and Others 45
What kind of software do we haveWhat kind of software do we have
IDL + ENVIIDL + ENVIMATLAB + ToolboxesMATLAB + ToolboxesTecplotTecplotSTATASTATANAG Fortran LibraryNAG Fortran LibraryFLUENTFLUENTPGI CompilersPGI CompilersIntel CompilersIntel Compilers
What kind of software do we haveWhat kind of software do we have
IDL ndash the ideal software for data IDL ndash the ideal software for data analysis visualization and cross-analysis visualization and cross-platform application developmentplatform application development
ENVI - the premier software solution ENVI - the premier software solution to quickly easily and accurately to quickly easily and accurately extract information from geospatial extract information from geospatial imageryimagery
What kind of software do we haveWhat kind of software do we haveMATLAB is a high-level technical computing MATLAB is a high-level technical computing language and interactive environment for language and interactive environment for algorithm development data visualization data algorithm development data visualization data analysis and numeric computationanalysis and numeric computation
MATLAB ToolboxesMATLAB Toolboxesndash Curve FittingCurve Fittingndash Distributed ComputingDistributed Computingndash Image ProcessingImage Processingndash Mapping Mapping ndash Neural NetworkNeural Networkndash StatisticsStatistics
What kind of software do we haveWhat kind of software do we have
Two images plotted Two images plotted using Tecplot by using Tecplot by Dr Jean WangDr Jean Wang
Pressure Contour Pressure Contour around a Prolate around a Prolate Spheroid Spheroid
What kind of software do we haveWhat kind of software do we have
Why use STATAWhy use STATA
STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics
What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today
Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job
LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE
optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe
What kind of software do we haveWhat kind of software do we have
FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software
What kind of hardware do we haveWhat kind of hardware do we have
SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor
Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)
File Server File Server
Windows Terminal ServerWindows Terminal Server
10 Workstations in HPC Lab10 Workstations in HPC Lab
Geowall systems for visualizationGeowall systems for visualization
SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache
64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor
NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree
10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: the two of you will take about 35 minutes instead of 60.
The More the Merrier?
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource eventually gives diminishing returns.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI Programming Paradigms
- MPI: parallelizing data
- OpenMP: parallelizing tasks
MPI
Translator 1: Harry Potter Volume 1 -> Spanish and French
Translator 2: Harry Potter Volume 2 -> Spanish and French
(Each translator works on a different piece of the data.)
OpenMP
Translator 1: Harry Potter Volumes 1 and 2 -> Spanish
Translator 2: Harry Potter Volumes 1 and 2 -> French
(Each translator performs a different task on the same data.)
Compilers
Compilers on the ACT cluster (andrei):
- GNU: C/C++, g77
- PGI: C/C++, f77, f90
Compilers on the Altix 3000 (columbia):
- Intel: C/C++, Fortran
- GNU: C/C++, g77
PGI Compilers (cluster)
For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI Codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.
On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...
There is more than one "mpirun": SGI MPI and MPICH.
Intel Compilers
How to compile a parallel code:
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers
On columbia:
man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc
On andrei:
man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting Started with OpenMP
Key points:
- Shared memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP Compiler Directive
Parallel regions in C:

    #include <stdio.h>

    int main(void)
    {
    #pragma omp parallel
        {
            printf("Hello world\n");
        }
        return 0;
    }
Fortran OpenMP Compiler Directive
Parallel regions in Fortran:

          program hello
    c$omp parallel
          print *, 'Hello world'
    c$omp end parallel
          end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two Work Directories: /home/user-id and /hpchome/user-id
/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk to increase your space
/hpchome/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang to increase your space
What kind of software do we have?
- IDL + ENVI
- MATLAB + Toolboxes
- Tecplot
- STATA
- NAG Fortran Library
- FLUENT
- PGI Compilers
- Intel Compilers
What kind of software do we have?
IDL: the ideal software for data analysis, visualization, and cross-platform application development.
ENVI: the premier software solution to quickly, easily, and accurately extract information from geospatial imagery.
What kind of software do we have?
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation.
MATLAB Toolboxes:
- Curve Fitting
- Distributed Computing
- Image Processing
- Mapping
- Neural Network
- Statistics
What kind of software do we have?
Tecplot. Two images plotted using Tecplot by Dr. Jean Wang: pressure contours around a prolate spheroid.
What kind of software do we have?
Why use STATA? STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have?
The NAG Fortran Library: the largest commercially available collection of numerical algorithms for Fortran today.
Calling the NAG Library: set environment variables before you run your job:
LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE
/opt/intel/fc/9.0/bin/ifort -r8 test.for -L /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have?
FLUENT: flow modeling software.
What kind of hardware do we have?
- SGI Altix 3000: 64 processors
- Linux cluster: 128 processors (Opteron)
- File server
- Windows terminal server
- 10 workstations in the HPC Lab
- GeoWall systems for visualization
SGI Altix 3000: columbia.unbc.ca
- 64 processors: Intel Itanium2 (1.5 GHz), 4 MB cache
- 64 GB RAM (1 GB per processor)
- NUMAlink interconnect: 6.4 GB/s, fat tree
- 10GbE network connection
- SUSE Linux Enterprise Server 9
Linux Cluster: andrei.unbc.ca
- 64 nodes (128 processors) plus a head node: AMD Opteron (2.1 GHz), 2 per node
- 144 GB RAM (2 GB per node, plus 16 GB for the head)
- GigE interconnect: two Nortel switches; network access via the head node
- Operating system: SUSE 9.3
- Storage: 1.7 TB of local storage on the head node for software and local copies
File Server
- SGI Altix 350: 4 processors, 8 GB RAM
- SGI TP9100: 6 TB storage, RAID 5 with a hot spare
- 10GbE network connection
- Maintains tape backups
Windows Terminal Server: ithaca.unbc.ca
- Dell PowerEdge 6800: 4 processors (Intel Xeon 2.4 GHz), 8 GB RAM
- Local RAID for the system volume: 600 GB
- Accessible from anywhere
- Runs Windows applications
Workstations at the HPC Lab
Dell Precision 470:
- 2 Intel Xeon processors (3.2 GHz)
- 2 GB RAM
- NVidia Quadro FX3400, 256 MB
- 2 Dell 20" LCD displays
GeoWall Systems
- Two systems; both have a 2-processor server and 1.5 TB RAID 5
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display
How to access the HPC systems
From Windows to Windows:
From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
Computer: pg-hpc-ts-01.unbc.ca
Log on to UNI.
How to access the HPC systems
From Linux to Windows:
rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
Log on to UNI.
How to access the HPC systems
From Linux to Linux:
ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca
[pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
[pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux:
Download the software "Xmanager 2.0" from:
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the HPC file system under Windows:
Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN, replacing LOGIN with your UNI login.
How to access the HPC systems
How to mount the HPC file system on a Linux machine:
smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
replacing MOUNTPOINT with the name of a directory that the system will be mounted to.
Reminder to HPC users
Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as MATLAB, IDL, etc.
Submit your jobs via PBS on both columbia and andrei.
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
What kind of software do we haveWhat kind of software do we have
IDL ndash the ideal software for data IDL ndash the ideal software for data analysis visualization and cross-analysis visualization and cross-platform application developmentplatform application development
ENVI - the premier software solution ENVI - the premier software solution to quickly easily and accurately to quickly easily and accurately extract information from geospatial extract information from geospatial imageryimagery
What kind of software do we haveWhat kind of software do we haveMATLAB is a high-level technical computing MATLAB is a high-level technical computing language and interactive environment for language and interactive environment for algorithm development data visualization data algorithm development data visualization data analysis and numeric computationanalysis and numeric computation
MATLAB ToolboxesMATLAB Toolboxesndash Curve FittingCurve Fittingndash Distributed ComputingDistributed Computingndash Image ProcessingImage Processingndash Mapping Mapping ndash Neural NetworkNeural Networkndash StatisticsStatistics
What kind of software do we haveWhat kind of software do we have
Two images plotted Two images plotted using Tecplot by using Tecplot by Dr Jean WangDr Jean Wang
Pressure Contour Pressure Contour around a Prolate around a Prolate Spheroid Spheroid
What kind of software do we haveWhat kind of software do we have
Why use STATAWhy use STATA
STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics
What kind of software do we have?

The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
- Set the environment variables before you run your job:

  LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
  export LM_LICENSE_FILE

- Compile and link:

  /opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib /usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have?

FLUENT – Flow Modeling Software
What kind of hardware do we have?

- SGI Altix 3000 – 64 processors
- Linux Cluster – 128 processors (Opteron)
- File Server
- Windows Terminal Server
- 10 workstations in the HPC Lab
- GeoWall systems for visualization
SGI Altix 3000 – columbia.unbc.ca

- 64 processors: Intel Itanium2 (1.5GHz), 4MB cache
- 64GB RAM: 1GB/processor
- NUMAlink interconnect: 6.4GB/s, fat tree topology
- 10GbE network connection
- SUSE Linux Enterprise Server 9
Linux Cluster – andrei.unbc.ca

- 64 nodes (128 processors) + head node: AMD Opteron (2.1GHz), 2 per node
- 144GB RAM (2GB/node + 16GB for the head)
- GigE interconnect: two Nortel switches; network access via the head node
- Operating system: SUSE 9.3
- Storage: 1.7TB of local storage on the head node for software and local copies
File Server

- SGI Altix 350: 4 processors, 8GB RAM
- SGI TP9100: 6TB storage, RAID 5 with hot spare
- 10GbE network connection
- Maintains tape backup
Windows Terminal Server – ithaca.unbc.ca

- Dell PowerEdge 6800: 4 processors (Intel Xeon 2.4GHz), 8GB RAM
- Local RAID for the system volume: 600GB volume
- Accessible from anywhere
- Runs Windows applications
Workstations at the HPC Lab

Dell Precision 470:
- 2 Intel Xeon processors (3.2GHz)
- 2GB RAM
- NVIDIA Quadro FX3400, 256MB
- 2 Dell 20" LCD displays
GeoWall Systems

- Two systems
- Both have a 2-processor server and 1.5TB RAID5
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display
How to access the HPC systems

From Windows to Windows:
- Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Windows:
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Linux:
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca, then hop to a compute node:
  [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  [pg-hpc-clnode-63 ~]>
How to access the HPC systems

From Windows to Linux:
- Download the software "Xmanager 2.0" from
  http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems

How to mount the HPC file system under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN, replacing LOGIN with your UNI login.
How to access the HPC systems

How to mount the HPC file system on a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing LOGIN with your UNI login and MOUNTPOINT with the name of the directory that the file system will be mounted to.
Reminder to HPC users

- Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications, such as MATLAB, IDL, etc.
- Submit your jobs via PBS on both Columbia and Andrei.
What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.
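As a sketch, a minimal PBS job script might look like the following. The job name, queue, resource limits, and program name here are hypothetical examples only; check the site-specific settings with Dr. Jean Wang before submitting.

```shell
#!/bin/sh
# Hypothetical PBS job script -- every name and limit below is an example.
#PBS -N myjob              # job name
#PBS -l nodes=4:ppn=2      # request 4 nodes, 2 processors per node
#PBS -l walltime=01:00:00  # maximum run time of one hour
#PBS -j oe                 # merge stdout and stderr into one output file

cd $PBS_O_WORKDIR          # start in the directory the job was submitted from
mpirun -np 8 ./mpihello    # run the MPI program on the 8 requested processors
```

The script would be submitted with `qsub myjob.pbs`, and `qstat` shows the state of the queue.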
Parallel Programming Basics

What is parallelism? Less fish vs. more fish.
What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism

- Shared memory: Auto Parallel, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy

Serial computing: suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism

Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take about 35 minutes instead of the ideal 30.
The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.
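The times in the jigsaw analogy translate directly into speedup numbers (speedup = serial time / parallel time). A small sketch using the figures from these slides:

```shell
# Speedup = serial time / parallel time, using the jigsaw times above:
# serial: 60 min; 2 workers: 35 min; 4 workers: 20 min; 8 workers: 15 min.
awk 'BEGIN {
  serial = 60
  split("2:35 4:20 8:15", runs, " ")
  for (i = 1; i <= 3; i++) {
    split(runs[i], r, ":")
    printf "%d workers: speedup %.1fx (ideal %dx)\n", r[1], serial / r[2], r[1]
  }
}'
```

The gap between the measured and ideal columns grows with every worker added, which is exactly the diminishing return the slide describes.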
Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism

- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
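That cost model can be sketched numerically. The latency and bandwidth figures below are assumed, GigE-class round numbers, not measurements from our systems:

```shell
# Time to pass a message = latency + message size / bandwidth.
# Assumed figures: 50 us latency, 100 MB/s effective bandwidth.
awk 'BEGIN {
  latency_us = 50
  bw_bytes_per_s = 100e6
  for (kb = 1; kb <= 1024; kb *= 32) {
    bytes = kb * 1024
    t_us = latency_us + bytes / bw_bytes_per_s * 1e6
    printf "%4d KB message: %8.1f us\n", kb, t_us
  }
}'
```

Small messages are dominated by latency (connection time), while large messages are dominated by the bandwidth term, which is why MPI codes try to send a few large messages rather than many small ones.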
Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms

- MPI… parallelizing data
- OpenMP… parallelizing tasks
MPI

Harry Potter Volume 1 -> one translator -> Spanish + French
Harry Potter Volume 2 -> another translator -> Spanish + French
OpenMP

Harry Potter Volume 1 + Volume 2 -> one translator -> Spanish
Harry Potter Volume 1 + Volume 2 -> another translator -> French
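One way to read the two translator diagrams is as two different loop decompositions. Plain sequential shell loops stand in for the workers here; the point is only how the work is split, not real parallelism:

```shell
# MPI-style (data parallel): each worker takes a different volume
# and performs every task on it.
for volume in "Volume 1" "Volume 2"; do
  echo "worker($volume): translate to Spanish and French"
done

# OpenMP-style (task parallel): each worker takes one task
# and applies it to every volume.
for language in Spanish French; do
  echo "worker($language): translate Volume 1 and Volume 2"
done
```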
Compilers

Compilers on the ACT cluster (andrei):
- GNU – C/C++, g77
- PGI – C/C++, f77, f90

Compilers on the Altix 3000 (columbia):
- Intel – C/C++, Fortran
- GNU – C/C++, g77
PGI Compilers (cluster)

For the 32-bit compilers, set PATH as:
  export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For the 64-bit compilers, set PATH as:
  export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

- Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
- C: pgcc, mpicc
- C++: pgCC, mpicxx
Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On the cluster:
  /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
  pgf77 -o mpihello mpihello.f -lfmpich -lmpich
  mpif77 -o mpihello mpihello.f -lfmpich -lmpich

On columbia:
  /opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
  /opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

There is more than one "mpirun" – SGI MPI and MPICH.
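The reason several mpirun commands coexist is simply that each MPI installation ships its own, and the shell runs whichever one appears first on PATH. A self-contained demonstration of that lookup rule, using throwaway stand-in scripts rather than the real installations:

```shell
# Create two fake "mpirun" commands in different directories.
mkdir -p /tmp/pathdemo/sgi /tmp/pathdemo/mpich
printf '#!/bin/sh\necho "SGI MPI"\n' > /tmp/pathdemo/sgi/mpirun
printf '#!/bin/sh\necho "MPICH"\n'   > /tmp/pathdemo/mpich/mpirun
chmod +x /tmp/pathdemo/sgi/mpirun /tmp/pathdemo/mpich/mpirun

# The first match on PATH wins, so the directory order decides which MPI runs.
PATH=/tmp/pathdemo/sgi:/tmp/pathdemo/mpich:$PATH
command -v mpirun   # prints /tmp/pathdemo/sgi/mpirun
mpirun              # prints "SGI MPI"
```

The same rule explains the `which mpirun` output above: on the cluster the PGI directory comes first on PATH, while on columbia /usr/bin does. To run a specific MPI, either reorder PATH or call the binary by its full path.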
Intel Compilers

How to compile a parallel code:

MPI codes:
  ifort -options myMPIcode.f -lmpi
  icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
  ifort -options -openmp myOpenMpcode.f
  icc -options -openmp myOpenMpcode.c

Automatic parallelization:
  ifort -parallel mycode.f
  icc -parallel mycode.c
More About Compilers

On columbia:
  man -M /opt/intel/fc/9.0/man ifort
  man -M /opt/intel/cc/9.0/man icc

On andrei:
  man -M /usr/local/pgi/linux86/6.0/man pgCC
  man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting started with OpenMP

Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive

Parallel regions in C:

#include <stdio.h>

int main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
What kind of software do we haveWhat kind of software do we haveMATLAB is a high-level technical computing MATLAB is a high-level technical computing language and interactive environment for language and interactive environment for algorithm development data visualization data algorithm development data visualization data analysis and numeric computationanalysis and numeric computation
MATLAB ToolboxesMATLAB Toolboxesndash Curve FittingCurve Fittingndash Distributed ComputingDistributed Computingndash Image ProcessingImage Processingndash Mapping Mapping ndash Neural NetworkNeural Networkndash StatisticsStatistics
What kind of software do we haveWhat kind of software do we have
Two images plotted Two images plotted using Tecplot by using Tecplot by Dr Jean WangDr Jean Wang
Pressure Contour Pressure Contour around a Prolate around a Prolate Spheroid Spheroid
What kind of software do we haveWhat kind of software do we have
Why use STATAWhy use STATA
STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics
What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today
Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job
LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE
optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe
What kind of software do we haveWhat kind of software do we have
FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software
What kind of hardware do we haveWhat kind of hardware do we have
SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor
Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)
File Server File Server
Windows Terminal ServerWindows Terminal Server
10 Workstations in HPC Lab10 Workstations in HPC Lab
Geowall systems for visualizationGeowall systems for visualization
SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache
64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor
NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree
10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI… parallelizing data
OpenMP… parallelizing tasks
MPI
[Diagram: data is decomposed. One translator takes Harry Potter Volume 1 and translates it into both Spanish and French; another translator takes Volume 2 and does the same.]
OpenMP
[Diagram: tasks are decomposed. One translator translates both Harry Potter Volume 1 and Volume 2 into Spanish; another translates both volumes into French.]
Compilers
Compilers on the ACT cluster (andrei):
- GNU – C/C++, g77
- PGI – C/C++, f77, f90
Compilers on the Altix 3000 (columbia):
- Intel – C/C++, Fortran
- GNU – C/C++, g77
PGI Compilers (cluster)
For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.
On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …
More than one "mpirun" – SGI MPI and MPICH
Intel Compilers
How to compile a parallel code:
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers
On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man
On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting started with OpenMP
Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories – /home/user-id & /hpchome/user-id
/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk for increasing the space
/hpchome/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang for increasing the space
What kind of software do we have
[Two images plotted using Tecplot by Dr. Jean Wang: pressure contours around a prolate spheroid.]
What kind of software do we have
Why use STATA?
STATA is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
What kind of software do we have
The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.
Calling the NAG Library:
- Set environment variables before you run your job:
LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE
/opt/intel/fc/9.0/bin/ifort -r8 test.for -L/usr/local/fll6420dcl/lib/libnag.a /usr/local/fll6420dcl/lib/libnag.so -o test.exe
What kind of software do we have
FLUENT – flow modeling software
What kind of hardware do we have
- SGI Altix 3000 – 64 processors
- Linux cluster – 128 processors (Opteron)
- File server
- Windows terminal server
- 10 workstations in the HPC Lab
- GeoWall systems for visualization
SGI Altix 3000 – columbia.unbc.ca
64 processors:
- Intel Itanium2 (1.5 GHz)
- 4 MB cache
64 GB RAM:
- 1 GB/processor
NUMAlink interconnect:
- 6.4 GB/s
- fat tree
10GbE network connection
SuSE Linux Enterprise Server 9
Linux Cluster – andrei.unbc.ca
64 nodes (128 processors) + head node:
- AMD Opteron (2.1 GHz), 2/node
- 144 GB RAM (2 GB/node + 16 GB for the head)
GigE interconnect:
- two Nortel switches
- network access via the head node
Operating system:
- SuSE 9.3
Storage:
- 1.7 TB of local storage on the head node for software and local copies
File Server
SGI Altix 350:
- 4p, 8 GB RAM
SGI TP9100:
- 6 TB storage
- RAID 5 with hot spare
10GbE network connection
Maintains tape backup
Windows Terminal Server – ithaca.unbc.ca
Dell PowerEdge 6800:
- 4p (Intel Xeon 2.4 GHz)
- 8 GB RAM
Local RAID for the system volume:
- 600 GB volume
Accessible from anywhere
Runs Windows applications
Workstations at the HPC Lab
Dell Precision 470:
- 2 Intel Xeon processors (3.2 GHz)
- 2 GB RAM
- NVidia Quadro FX3400, 256 MB
- 2 Dell 20" LCD displays
GeoWall Systems
Two systems. Both have a 2-processor server and 1.5 TB RAID5. The GeoWall Room (8-111) has a rear-projected display; the portable unit has a front-projected display.
How to access the HPC systems
From Windows to Windows:
From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
Computer: pg-hpc-ts-01.unbc.ca
Log on to UNI
How to access the HPC systems
From Linux to Windows:
rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
Log on to UNI
How to access the HPC systems
From Linux to Linux:
ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca
- [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
- [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux:
Download the software "Xmanager 2.0" from:
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the HPC file system
Under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the HPC file system
On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the system will be mounted to
Reminder to HPC users
Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.
Submit your job via PBS on both Columbia and Andrei.
What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
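As an illustration, a minimal PBS job script might look like the sketch below. The job name, resource line, and executable are made-up placeholders, and the exact resource syntax and queue limits on Columbia and Andrei may differ, so check with Dr. Jean Wang before relying on them:

```shell
#!/bin/bash
#PBS -N mpihello             # job name (placeholder)
#PBS -l nodes=2:ppn=2        # request 2 nodes, 2 processors each (assumed syntax)
#PBS -l walltime=01:00:00    # maximum run time
#PBS -j oe                   # merge stdout and stderr into one file

# PBS starts the job in your home directory; move to where you ran qsub
cd $PBS_O_WORKDIR

# launch 4 MPI processes (the mpirun path depends on which MPI you compiled with)
mpirun -np 4 ./mpihello
```

Submit the script with `qsub myjob.pbs` and check its status with `qstat`.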
Parallel Programming Basics
What is parallelism?
[Image: less fish vs. more fish]
What is Parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
- Shared memory: Auto Parallel, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy
Serial Computing
Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you will take about 35 minutes instead of the ideal 30.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
What kind of software do we haveWhat kind of software do we have
Why use STATAWhy use STATA
STATA is a complete integrated STATA is a complete integrated statistical package that provides statistical package that provides everything you need for data everything you need for data analysis data management and analysis data management and graphicsgraphics
What kind of software do we haveWhat kind of software do we haveThe NAG Fortran Library - the largest The NAG Fortran Library - the largest commercially available collection of numerical commercially available collection of numerical algorithms for Fortran todayalgorithms for Fortran today
Calling NAG LibraryCalling NAG Libraryndash Set Environmental Variables before you run your jobSet Environmental Variables before you run your job
LM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatLM_LICENSE_FILE=usrlocalfll6420dcllicenselicensedatexport LM_LICENSE_FILEexport LM_LICENSE_FILE
optintelfc90binifort -r8 testfor ndashLoptintelfc90binifort -r8 testfor ndashLusrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnaga usrlocalfll6420dclliblibnagso -o testexe usrlocalfll6420dclliblibnagso -o testexe
What kind of software do we haveWhat kind of software do we have
FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software
What kind of hardware do we haveWhat kind of hardware do we have
SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor
Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)
File Server File Server
Windows Terminal ServerWindows Terminal Server
10 Workstations in HPC Lab10 Workstations in HPC Lab
Geowall systems for visualizationGeowall systems for visualization
SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache
64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor
NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree
10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism

- Shared memory: auto-parallelization, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy

Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of the ideal 30.
The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource eventually brings diminishing returns.
Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism

- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms

- MPI: parallelizing data
- OpenMP: parallelizing tasks
MPI

- Harry Potter Volume 1 → Spanish translator + French translator
- Harry Potter Volume 2 → Spanish translator + French translator

(The data, the volumes, is split up; each volume is handled independently by its own set of translators.)
OpenMP

- Harry Potter Volume 1 + Harry Potter Volume 2 → Spanish translator
- Harry Potter Volume 1 + Harry Potter Volume 2 → French translator

(The tasks, the target languages, are split up; each translator works through all of the data.)
Compilers

Compilers on ACT cluster (andrei):
- GNU – C/C++, g77
- PGI – C/C++, f77, f90

Compilers on Altix 3000 (columbia):
- Intel – C/C++, Fortran
- GNU – C/C++, g77
PGI Compilers (cluster)

For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

- Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
- C: pgcc, mpicc
- C++: pgCC, mpicxx
Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

There is more than one "mpirun" – SGI MPI and MPICH.
Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting started with OpenMP

Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive

Parallel regions in C:

#include <stdio.h>
int main(void)
{
#pragma omp parallel
    printf("Hello, world!\n");
    return 0;
}
Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello, world!'
c$omp end parallel
      end
Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories – /home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk for increasing the space

/hpchome/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang for increasing the space
What kind of software do we have

The NAG Fortran Library – the largest commercially available collection of numerical algorithms for Fortran today.

Calling the NAG Library:
- Set the environment variables before you run your job:

LM_LICENSE_FILE=/usr/local/fll6420dcl/license/license.dat
export LM_LICENSE_FILE

- Then link against the library:

/opt/intel/fc/9.0/bin/ifort -r8 test.for /usr/local/fll6420dcl/lib/libnag.a -o test.exe
(or link the shared library /usr/local/fll6420dcl/lib/libnag.so instead)
What kind of software do we have

FLUENT – Flow Modeling Software
What kind of hardware do we have

- SGI Altix 3000 – 64 processors
- Linux Cluster – 128 processors (Opteron)
- File Server
- Windows Terminal Server
- 10 Workstations in the HPC Lab
- GeoWall systems for visualization
SGI Altix 3000 – columbia.unbc.ca

- 64 processors
  - Intel Itanium2 (1.5 GHz)
  - 4 MB cache
- 64 GB RAM
  - 1 GB/processor
- NUMAlink interconnect
  - 6.4 GB/s
  - Fat tree
- 10 GbE network connection
- SuSE Linux Enterprise Server 9
Linux Cluster – andrei.unbc.ca

- 64 nodes (128 processors) + head node
  - AMD Opteron (2.1 GHz), 2 per node
  - 144 GB RAM (2 GB per node + 16 GB for the head)
- GigE interconnect
  - Two Nortel switches
  - Network access via the head node
- Operating system
  - SuSE 9.3
- Storage
  - 1.7 TB of local storage on the head node for software and local copies
File Server

- SGI Altix 350
  - 4 processors, 8 GB RAM
- SGI TP9100
  - 6 TB storage
  - RAID 5 with hot spare
- 10 GbE network connection
- Maintains tape backup
Windows Terminal Server – ithaca.unbc.ca

- Dell PowerEdge 6800
  - 4 processors (Intel Xeon, 2.4 GHz)
  - 8 GB RAM
- Local RAID for the system volume
  - 600 GB volume
- Accessible from anywhere
- Runs Windows applications
Workstations at the HPC Lab

- Dell Precision 470
  - 2 Intel Xeon processors (3.2 GHz)
  - 2 GB RAM
  - NVIDIA Quadro FX3400, 256 MB
  - 2 Dell 20" LCD displays
GeoWall Systems

- Two systems
- Both have a 2-processor server and 1.5 TB of RAID 5 storage
- The GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display
How to access the HPC systems

From Windows to Windows:
- From Start → All Programs → Accessories → Communications → Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Windows:
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Linux:
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
  - [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  - [pg-hpc-clnode-63 ~]>
How to access the HPC systems

From Windows to Linux:
- Download the software "Xmanager 2.0" from http://www.download.com/Xmanager/3000-2155_4-10038129.html
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
What kind of software do we haveWhat kind of software do we have
FLUENT ndash Flow Modeling SoftwareFLUENT ndash Flow Modeling Software
What kind of hardware do we haveWhat kind of hardware do we have
SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor
Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)
File Server File Server
Windows Terminal ServerWindows Terminal Server
10 Workstations in HPC Lab10 Workstations in HPC Lab
Geowall systems for visualizationGeowall systems for visualization
SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache
64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor
NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree
10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall Systems
Two systems
Both have a 2-processor server, 1.5TB RAID5
GeoWall Room (8-111) has a rear-projected display
The portable unit has a front-projected display
How to access the HPC systems
From Windows to Windows
From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
Computer: pg-hpc-ts-01.unbc.ca
Log on to UNI
How to access the HPC systems
From Linux to Windows
rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
Log on to UNI
How to access the HPC systems
From Linux to Linux
ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux
Download the software "Xmanager 2.0" from
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the HPC file system
Under Windows
– Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
– replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the HPC file system
On a Linux machine
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of the directory that the file system will be mounted to
Reminder to HPC users
Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.
Submit your jobs via PBS on both columbia and andrei.
What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
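A PBS job is just a shell script whose #PBS comment lines carry the scheduling options. The sketch below is illustrative only: the job name, node/CPU counts, walltime, and program name are placeholders, and the resource syntax accepted locally should be checked with qsub(1) on columbia or andrei.

```shell
#!/bin/sh
# Minimal PBS job-script sketch. "myjob", the resource request, and
# "./mpihello" are placeholders -- adjust them to your own run.
#PBS -N myjob
#PBS -l nodes=2:ppn=2,walltime=01:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"        # start in the directory the job was submitted from
mpirun -np 4 ./mpihello    # run the MPI program on the allocated CPUs
```

Submit the script with `qsub myjob.pbs`, watch it with `qstat`, and remove it with `qdel <jobid>`.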
Parallel Programming Basics
What is parallelism?
Less fish vs. more fish
What is Parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
Shared memory: Auto Parallel, OpenMP, MPI
Distributed memory: MPI
The Jigsaw Puzzle Analogy
Serial Computing
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you will take 35 min instead of the ideal 30 min.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces), and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 min instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
Processors are independent of each other.
All data are private.
Processes communicate by passing messages.
The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
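This latency-plus-bandwidth cost model is easy to check with a back-of-the-envelope calculation. The numbers below are made-up illustrative values, not measurements of andrei's GigE network:

```shell
# time = latency + bytes / bandwidth
# Illustrative numbers only: 50 us latency, 100 MB/s bandwidth, 1 MiB message.
awk 'BEGIN {
    latency_us    = 50
    bandwidth_Bps = 100e6
    bytes         = 1048576
    printf "%.1f us\n", latency_us + bytes / bandwidth_Bps * 1e6
}'
# prints: 10535.8 us
```

For a 1 MiB message the bandwidth term (about 10,486 us) dwarfs the latency; for a 1-byte message the latency dominates. This is why sending many small messages costs far more than sending one large one.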
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI… parallelizing data
OpenMP… parallelizing tasks
MPI
(Diagram: one Spanish translator and one French translator work on Harry Potter Volume 1, while a second Spanish translator and a second French translator work on Volume 2 – the data is divided among the workers.)
OpenMP
(Diagram: the Spanish translator translates Harry Potter Volumes 1 and 2, and the French translator translates Volumes 1 and 2 – each worker handles one task, its own language, across all the data.)
Compilers
Compilers on ACT cluster (andrei)
– GNU – C/C++, g77
– PGI – C/C++, f77, f90
Compilers on Altix 3000 (columbia)
– Intel – C/C++, Fortran
– GNU – C/C++, g77
PGI Compilers (cluster)
For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f
On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …
More than one "mpirun" – SGI MPI and MPICH
Intel Compilers
How to compile a parallel code
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers
On columbia:
man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc
On andrei:
man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting started with OpenMP
Key points:
– Shared-memory, multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:
============================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  printf ("Hello, world\n");
  return 0;
}
============================
Fortran OpenMP compiler directive
Parallel regions in Fortran:
      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp)
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe
Two work directories – /home/user-id & /hpc/home/user-id
/home/user-id
– CTS server
– Email box
– Login files
– Backed up daily
– Contact the help desk to increase the space
/hpc/home/user-id
– HPC server
– Research area
– Backed up once a week
– Contact Jean Wang to increase the space
What kind of hardware do we haveWhat kind of hardware do we have
SGI Altix 3000 ndash 64 processorSGI Altix 3000 ndash 64 processor
Linux Cluster ndash 128 processor Linux Cluster ndash 128 processor (Opteron)(Opteron)
File Server File Server
Windows Terminal ServerWindows Terminal Server
10 Workstations in HPC Lab10 Workstations in HPC Lab
Geowall systems for visualizationGeowall systems for visualization
SGI Altix 3000 ndash columbiaunbccaSGI Altix 3000 ndash columbiaunbcca64 Processors64 Processorsndash Intel Itanium2 (15Ghz)Intel Itanium2 (15Ghz)ndash 4Mb Cache4Mb Cache
64 Gb RAM64 Gb RAMndash 1Gbprocessor1Gbprocessor
NumaLink interconnectNumaLink interconnectndash 64Gbs64Gbsndash Fat TreeFat Tree
10GbE network connection10GbE network connectionSuse Linux Enterprise Suse Linux Enterprise Server 9Server 9
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories:
/home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backup daily
- Contact the help desk for increasing the space

/hpchome/user-id:
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space
SGI Altix 3000 - columbia.unbc.ca

- 64 processors
  - Intel Itanium2 (1.5GHz)
  - 4MB cache
- 64GB RAM
  - 1GB/processor
- NUMAlink interconnect
  - 6.4GB/s
  - Fat tree
- 10GbE network connection
- SUSE Linux Enterprise Server 9
Linux Cluster - andrei.unbc.ca

- 64 nodes (128 processors) + head node
  - AMD Opteron (2.1GHz), 2/node
  - 144GB RAM (2GB/node + 16GB for the head)
- GigE interconnect
  - Two Nortel switches
  - Network access via the head node
- Operating system
  - SUSE 9.3
- Storage
  - 1.7TB of local storage on the head node for software and local copies
File Server

- SGI Altix 350
  - 4 processors, 8GB RAM
- SGI TP9100
  - 6TB storage
  - RAID 5 with hot spare
- 10GbE network connection
- Maintains tape backup
Windows Terminal Server - ithaca.unbc.ca

- Dell PowerEdge 6800
  - 4 processors (Intel Xeon 2.4GHz)
  - 8GB RAM
- Local RAID for the system volume
  - 600GB volume
- Accessible from anywhere
- Runs Windows applications
Workstations at HPC Lab

- Dell Precision 470
  - 2 Intel Xeon processors (3.2GHz)
  - 2GB RAM
  - NVIDIA Quadro FX3400, 256MB
  - 2 Dell 20" LCD displays
GeoWall Systems

- Two systems
- Both have a 2-processor server and 1.5TB RAID 5
- GeoWall Room (8-111) has a rear-projected display
- The portable unit has a front-projected display
How to access the HPC systems

From Windows to Windows:
- Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Windows:
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems

From Linux to Linux:
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
  - [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  - [pg-hpc-clnode-63 ~]>
How to access the HPC systems

From Windows to Linux:
- Download the software "Xmanager 2.0" from
  http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems

How to mount the HPC file system under Windows:
- Simply right-click on My Computer, select "Map Network Drive", and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN,
- replacing LOGIN with your UNI login.
How to access the HPC systems

How to mount the HPC file system on a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of the directory that the file system will be mounted to.
Reminder to HPC users

Don't run applications directly on the cluster head node. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.

Submit your jobs via PBS on both Columbia and Andrei.
What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources.

If you want to know more about PBS, please contact Dr. Jean Wang.
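A PBS submission might look like the following sketch. The job name, resource requests, thread count, and executable name (test-openmp.exe, from the OpenMP compiling slide above) are illustrative placeholders, not site defaults; confirm the right settings for columbia or andrei with Dr. Jean Wang.

```shell
#!/bin/bash
# Minimal PBS job script sketch -- resource requests and the
# executable name are placeholders, not site defaults.
#PBS -N openmp-test
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00
#PBS -j oe

cd $PBS_O_WORKDIR            # start where qsub was invoked
export OMP_NUM_THREADS=4     # match the processors requested above
time ./test-openmp.exe
```

Submit the script with "qsub myjob.pbs" and monitor it with "qstat".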
Parallel programming Basics

What is parallelism?

Less fish vs. more fish!
What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism

- Shared memory: Auto Parallel, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy

Serial computing: suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism

Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly, but not quite, 2-to-1: you might take 35 minutes instead of the ideal 30.
The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces), and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource eventually brings diminishing returns.
Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.

Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism

- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms

MPI … parallelizing data

OpenMP … parallelizing tasks
MPI

- Harry Potter Volume 1 -> Spanish & French Translator
- Harry Potter Volume 2 -> Spanish & French Translator

(The data, i.e. the volumes, are divided among the workers; each worker does all the tasks on its own piece.)
OpenMP

- Harry Potter Volumes 1 and 2 -> Spanish Translator
- Harry Potter Volumes 1 and 2 -> French Translator

(Each worker takes one task, Spanish or French, and applies it to all of the data.)
Compilers

Compilers on the ACT cluster (andrei):
- GNU: C/C++, g77
- PGI: C/C++, f77, f90

Compilers on the Altix 3000 (columbia):
- Intel: C/C++, Fortran
- GNU: C/C++, g77
PGI Compilers (cluster)

For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77/pgf90/pgf95, pghpf (High Performance Fortran), mpif77/mpif90
C: pgcc/mpicc
C++: pgCC/mpicxx
Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes

/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun

/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun": SGI MPI and MPICH.
Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Linux Cluster ndash andreiunbccaLinux Cluster ndash andreiunbcca64 Nodes (128 processors) + 64 Nodes (128 processors) + Head NodeHead Nodendash AMD Opteron (21Ghz) AMD Opteron (21Ghz)
(2node)(2node)ndash 144Gb RAM (2node + 16 for 144Gb RAM (2node + 16 for
head)head)
GigE interconnectGigE interconnectndash Two nortel switchesTwo nortel switchesndash Network access via head Network access via head
nodenode
Operating SystemOperating Systemndash Suse 93Suse 93
StorageStoragendash 17 Tb of local storage on 17 Tb of local storage on
head node for software and head node for software and local copieslocal copies
File ServerFile ServerSGI Altix 350SGI Altix 350ndash 4p 8Gb RAM4p 8Gb RAM
SGI TP9100SGI TP9100ndash 6Tb Storage6Tb Storagendash RAID 5 with hot RAID 5 with hot
sparespare
10GbE network 10GbE network connectionconnectionMaintain type Maintain type backupbackup
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMP
Key points:
– Shared-memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>

int main(void)
{
#pragma omp parallel
    printf("Hello world\n");
    return 0;
}
Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe
Two work directories – /home/user-id & /hpc/home/user-id
/home/user-id:
– CTS server
– Email box
– Login files
– Backed up daily
– Contact the help desk to increase your space
/hpc/home/user-id:
– HPC server
– Research area
– Backed up once a week
– Contact Jean Wang to increase your space
File Server
SGI Altix 350
– 4p, 8 GB RAM
SGI TP9100
– 6 TB storage
– RAID 5 with hot spare
10GbE network connection
Maintains tape backups
Windows Terminal Server – ithaca.unbc.ca
Dell PowerEdge 6800
– 4p (Intel Xeon 2.4 GHz)
– 8 GB RAM
Local RAID for the system volume
– 600 GB volume
Accessible from anywhere
Runs Windows applications
Workstations at HPC Lab
Dell Precision 470
– 2 Intel Xeon processors (3.2 GHz)
– 2 GB RAM
– NVIDIA Quadro FX3400, 256 MB
– 2 Dell 20" LCD displays
GeoWall Systems
Two systems
Both have a 2-processor server and 1.5 TB RAID 5
GeoWall Room (8-111) has a rear-projected display
The portable unit has a front-projected display
How to access the HPC systems
From Windows to Windows
Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
Computer: pg-hpc-ts-01.unbc.ca
Log on to UNI
How to access the HPC systems
From Linux to Windows
rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
Log on to UNI
How to access the HPC systems
From Linux to Linux
ssh -X yqwang@columbia.unbc.ca
ssh -X yqwang@andrei.unbc.ca
– [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
– [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux
Download the software "Xmanager 2.0" from
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the HPC file system
Under Windows:
– Right-click My Computer, select Map Network Drive, and choose \\pg-hpc-fs-01.unbc.ca\LOGIN
– replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the HPC file system
On a Linux machine:
– smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
– replacing MOUNTPOINT with the name of the directory that the file system will be mounted to
Reminder to HPC users
Don't run applications directly on the cluster headnode. Always switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.
Submit your jobs via PBS on both Columbia and Andrei.
What is PBS?
Portable Batch System (PBS) is job-scheduling software. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
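A job is handed to PBS as a shell script whose #PBS comment lines carry the resource requests. The script below is only a sketch – the job name, node counts, and walltime are placeholders, not the actual limits configured on Columbia or Andrei:

```shell
#!/bin/bash
# Illustrative PBS job script (myjob.pbs); resource values are placeholders.
#PBS -N mpihello          # job name
#PBS -l nodes=2:ppn=2     # request 2 nodes, 2 processors per node
#PBS -l walltime=00:10:00 # 10-minute run-time limit
#PBS -j oe                # merge stdout and stderr into one file

cd $PBS_O_WORKDIR         # start in the directory qsub was run from
mpirun -np 4 ./mpihello
```

It would be submitted with "qsub myjob.pbs" and monitored with "qstat"; the scheduler runs it when the requested resources become free.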
Parallel programming basics
What is parallelism?
Less fish vs. more fish
What is Parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
Shared memory – auto-parallelization, OpenMP, MPI
Distributed memory – MPI
The Jigsaw Puzzle Analogy
Serial Computing
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: the two of you will take about 35 minutes instead of the ideal 30.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers to a shared resource eventually brings diminishing returns.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scoot the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
Processors are independent of each other.
All data are private.
Processes communicate by passing messages.
The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI … parallelizing data
OpenMP … parallelizing tasks
MPI
– One translator takes Harry Potter Volume 1 and produces the Spanish and French versions.
– Another translator takes Harry Potter Volume 2 and produces the Spanish and French versions.
(The data – the volumes – are split across the workers.)
OpenMP
– The Spanish translator takes Harry Potter Volumes 1 and 2 and produces the Spanish versions.
– The French translator takes Harry Potter Volumes 1 and 2 and produces the French versions.
(The tasks – the languages – are split across the workers.)
Compilers
Compilers on ACT cluster (andrei):
– GNU – C/C++, g77
– PGI – C/C++, f77, f90
Compilers on Altix 3000 (columbia):
– Intel – C/C++, Fortran
– GNU – C/C++, g77
PGI Compilers (cluster)
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f
On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Windows Terminal Server ndash Windows Terminal Server ndash ithacaunbccaithacaunbcca
Dell PowerEdge 6800Dell PowerEdge 6800ndash 4p (Intel Xeon 24Ghz)4p (Intel Xeon 24Ghz)ndash 8Gb RAM8Gb RAM
Local Raid for system volumeLocal Raid for system volumendash 600Gb volume600Gb volume
Accessible from anywhereAccessible from anywhere
Runs windows applicationsRuns windows applications
Workstations at HPC LabWorkstations at HPC Lab
Dell Precision 470Dell Precision 470ndash 2 Intel Xeon Processors (32Ghz)2 Intel Xeon Processors (32Ghz)ndash 2Gb RAM2Gb RAMndash NVidia Quadro FX3400 256MbNVidia Quadro FX3400 256Mbndash 2 Dell 20rdquo LCD displays2 Dell 20rdquo LCD displays
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directive
Parallel regions in C ...

#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}
Fortran OpenMP compiler directive
Parallel regions in Fortran ...

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories -
/home/user-id & /hpchome/user-id

/home/user-id
- CTS server
- Email box
- Login files
- Backup daily
- Contact help desk for increasing the space

/hpchome/user-id
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space
Workstations at HPC Lab
Dell Precision 470
- 2 Intel Xeon processors (3.2GHz)
- 2Gb RAM
- NVidia Quadro FX3400, 256Mb
- 2 Dell 20" LCD displays
GeoWall Systems
Two systems
- Both have a 2-processor server, 1.5Tb RAID5
- GeoWall Room (8-111) has rear-projected display
- Portable unit has front-projected display
How to access the HPC systems
From Windows to Windows
- From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems
From Linux to Windows
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to UNI
How to access the HPC systems
From Linux to Linux
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
  [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux
Download the software "Xmanager 2.0" from
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the hpc file system
Under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the hpc file system
On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of the directory that the system will be mounted to
Reminder to HPC users
Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.
Submit your jobs via PBS on both Columbia and Andrei.
What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
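A minimal PBS job script might look like the sketch below. The job name, node/walltime requests, and executable are illustrative assumptions, not site defaults; check queue names and limits with the HPC staff before submitting.

```shell
#!/bin/sh
# Illustrative PBS job script (values are assumptions, adjust for your job).
#PBS -N mpihello           # job name
#PBS -l nodes=2:ppn=2      # request 2 nodes with 2 processors each
#PBS -l walltime=00:10:00  # wall-clock time limit
cd $PBS_O_WORKDIR          # run from the directory the job was submitted in
mpirun -np 4 ./mpihello    # launch the MPI program on the 4 requested CPUs
```

The script would be submitted with "qsub", e.g. "qsub mpihello.pbs", and monitored with "qstat".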
Parallel Programming Basics
What is parallelism?

Less fish vs. more fish
What is Parallelism?
Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
- Shared memory: Auto Parallel, OpenMP, MPI
- Distributed memory: MPI
The Jigsaw Puzzle Analogy
Serial Computing
Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you will take 35 min instead of the ideal 30 min.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 min instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can both work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
- Processors are independent of each other
- All data are private
- Processes communicate by passing messages
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte)
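The latency/bandwidth split can be made concrete with a back-of-the-envelope estimate: the time to pass a message of n bytes is roughly T(n) = latency + n/bandwidth. The 50-microsecond latency and 100 MB/s bandwidth figures below are illustrative assumptions, not measured numbers for the UNBC machines.

```shell
# Estimate message-passing time as T(n) = latency + n/bandwidth.
latency=0.00005      # 50 microseconds of connection time (assumed)
bandwidth=100000000  # 100 MB/s; time per byte = 1/bandwidth (assumed)
for bytes in 8 1000000; do
  awk -v l="$latency" -v b="$bandwidth" -v n="$bytes" \
    'BEGIN { printf "%d bytes: %.6f s\n", n, l + n/b }'
done
```

The 8-byte message is almost entirely latency, while the 1 MB message is almost entirely bandwidth, which is why MPI programs generally try to send a few large messages rather than many small ones.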
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI ... parallelizing data
OpenMP ... parallelizing tasks
MPI (split the data):
- Translator 1: Harry Potter Volume 1 -> Spanish, French
- Translator 2: Harry Potter Volume 2 -> Spanish, French

OpenMP (split the tasks):
- Translator 1: Harry Potter Volume 1 + Volume 2 -> Spanish
- Translator 2: Harry Potter Volume 1 + Volume 2 -> French
Compilers
Compilers on ACT cluster (andrei):
- GNU - C/C++, g77
- PGI - C/C++, f77, f90

Compilers on Altix 3000 (columbia):
- Intel - C/C++, Fortran
- GNU - C/C++, g77
PGI Compilers (cluster)

PGI Compiler
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f

On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
GeoWall SystemsGeoWall SystemsTwo SystemsTwo SystemsBoth have a 2 Both have a 2 processors server processors server 15Tb RAID515Tb RAID5GeoWall Room (8-GeoWall Room (8-111) has rear 111) has rear projected displayprojected displayPortable unit has Portable unit has front projected front projected displaydisplay
How to access the HPC systemsHow to access the HPC systems
From Windows to WindowsFrom Windows to Windows
From Start-gt All Program -gt Accessories -From Start-gt All Program -gt Accessories -gt Communications -gt Remote Desktop gt Communications -gt Remote Desktop ConnectionConnection
Computer pg-hpc-ts-01unbccaComputer pg-hpc-ts-01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
How to access the HPC systems
From Windows to Windows
- From Start -> All Programs -> Accessories -> Communications -> Remote Desktop Connection
- Computer: pg-hpc-ts-01.unbc.ca
- Log on to: UNI
How to access the HPC systems
From Linux to Windows
- rdesktop -a 15 -g 1280x1024 pg-hpc-ts-01.unbc.ca
- Log on to: UNI
How to access the HPC systems
From Linux to Linux
- ssh -X yqwang@columbia.unbc.ca
- ssh -X yqwang@andrei.unbc.ca
  [pg-hpc-clnode-head ~]> ssh -X pg-hpc-clnode-63
  [pg-hpc-clnode-63 ~]>
How to access the HPC systems
From Windows to Linux
- Download the software "Xmanager 2.0" from
  http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the hpc file system
Under Windows:
- Simply right click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the hpc file system
On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the system will be mounted to
Reminder to HPC users
- Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications such as Matlab, IDL, etc.
- Submit your job via PBS on both Columbia and Andrei.
What is PBS?
Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources.
If you want to know more about PBS, please contact Dr. Jean Wang.
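A PBS submission typically wraps the run command in a small job script. The sketch below uses standard PBS directives, but the job name, resource counts, and walltime are placeholder assumptions, not values from the slides; check the local cluster's limits before using it.

```shell
#!/bin/bash
#PBS -N mpihello           # job name (placeholder)
#PBS -l nodes=2:ppn=2      # request 2 nodes x 2 processors (adjust to the cluster)
#PBS -l walltime=00:10:00  # maximum run time
#PBS -j oe                 # merge stdout and stderr into one file

cd $PBS_O_WORKDIR          # run from the directory qsub was called in
mpirun -np 4 ./mpihello
```

Submit the script with "qsub job.pbs" and check its status with "qstat".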
Parallel programming Basics
What is parallelism?
- Less fish vs. more fish
What is Parallelism?
- Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.
- The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
- Shared Memory: Auto Parallel, OpenMP, MPI
- Distributed Memory: MPI
The Jigsaw Puzzle Analogy
Serial Computing
- Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
- If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
- Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
- And from time to time you will have to work together (communicate) at the interface between his half and yours.
- The speedup will be nearly 2-to-1: you will take 35 min instead of 30 min.
The More the Merrier
- Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces), and a lot more communication at the interfaces.
- So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 min instead of an hour.
Diminishing Returns
- If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 min.
- Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed Parallelism
- Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
- Now you all can work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead
- Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.
- The overhead typically includes:
  - Managing the multiple processes
  - Communication between processes
  - Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
- MPI ... parallelizing data
- OpenMP ... parallelizing tasks
MPI
- Translator 1 takes Harry Potter Volume 1 and produces the Spanish and French versions
- Translator 2 takes Harry Potter Volume 2 and produces the Spanish and French versions
OpenMP
- Translator 1 takes Harry Potter Volumes 1 and 2 and produces the Spanish version
- Translator 2 takes Harry Potter Volumes 1 and 2 and produces the French version
Compilers
Compilers on ACT cluster (andrei):
- GNU: C/C++, g77
- PGI: C/C++, f77, f90
Compilers on Altix 3000 (columbia):
- Intel: C/C++, Fortran
- GNU: C/C++, g77
- PGI Compilers (cluster)
PGI Compiler
For 32-bit compilers, set PATH as:
  export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
  export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
- Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
- C: pgcc, mpicc
- C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f
On cluster:
  /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
On columbia:
  /opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
  /opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes
  /usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
  pgf77 -o mpihello mpihello.f -lfmpich -lmpich
  mpif77 -o mpihello mpihello.f -lfmpich -lmpich
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
  /usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
  /usr/bin/mpirun
  /opt/mpich/ch-p4/bin/mpirun -np 4 ...
More than one "mpirun" - SGI MPI and MPICH
Intel Compilers
How to compile a parallel code:
MPI codes:
  ifort -options myMPIcode.f -lmpi
  icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
  ifort -options -openmp myOpenMpcode.f
  icc -options -openmp myOpenMpcode.c
Automatic Parallelization:
  ifort -parallel mycode.f
  icc -parallel mycode.c
More About Compilers
On columbia:
  man ifort -M /opt/intel/fc/9.0/man
  man icc -M /opt/intel/cc/9.0/man
On andrei:
  man pgCC -M /usr/local/pgi/linux86/6.0/man
  man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
How to access the HPC systemsHow to access the HPC systems
From Linux to WindowsFrom Linux to Windows rdesktop -a15 -g 1280x1024 pg-hpc-ts-rdesktop -a15 -g 1280x1024 pg-hpc-ts-
01unbcca01unbcca
Log on to UNILog on to UNI
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
How to access the HPC systemsHow to access the HPC systems
From Linux to LinuxFrom Linux to Linux
ssh ndashX ssh ndashX yqwangcolumbiaunbccayqwangcolumbiaunbcca
ssh ndashX ssh ndashX yqwangandreiunbccayqwangandreiunbccandash [pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-[pg-hpc-clnode-head ~]gtssh -X pg-hpc-clnode-
6363ndash [pg-hpc-clnode-63 ~]gt[pg-hpc-clnode-63 ~]gt
How to access the HPC systemsHow to access the HPC systems
From Windows to LinuxFrom Windows to Linux
Download software ldquoXmanager 20rdquo Download software ldquoXmanager 20rdquo fromfrom
httpwwwdownloadcomhttpwwwdownloadcomXmanager3000-2155_4-Xmanager3000-2155_4-10038129html10038129html
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of Parallelism
Shared Memory - Auto Parallel, OpenMP, MPI
Distributed Memory - MPI
The Jigsaw Puzzle Analogy
Serial Computing
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.
And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: you will take 35 minutes instead of 30.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.
So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half of the pieces on Tom's.
Now you can all work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
Processors are independent of each other.
All data are private.
Processes communicate by passing messages.
The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen - and this work takes time. This time is called parallel overhead.
The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI ... parallelizing data
OpenMP ... parallelizing tasks
MPI
[Diagram: Harry Potter Volume 1 goes to its own Spanish and French translators, and Volume 2 goes to a second pair of translators - the data (the volumes) is split across the workers.]
OpenMP
[Diagram: one Spanish translator works through Volumes 1 and 2, and one French translator works through Volumes 1 and 2 - the tasks (the languages) are split across the workers.]
Compilers
Compilers on ACT cluster (andrei):
GNU - C/C++, g77
PGI - C/C++, f77, f90
Compilers on Altix 3000 (columbia):
Intel - C/C++, Fortran
GNU - C/C++, g77
PGI Compilers (cluster)
PGI Compiler
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codes
Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f
On cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Compilers for MPI codes
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich
Which mpirun?
[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun
[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 ...
More than one "mpirun" - SGI MPI and MPICH
Intel Compilers
How to compile a parallel code
MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi
Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c
Automatic Parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers
On columbia:
man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc
On andrei:
man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting started with OpenMP
Key points:
- Shared memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:
================================
#include <stdio.h>
int
main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}
================================
Fortran OpenMP compiler directive
Parallel regions in Fortran:
      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- "icc test.cpp -openmp -o test-openmp.exe"
- "ifort test.f -openmp -o test-openmp.exe"
- "OMP_NUM_THREADS=32"
- "export OMP_NUM_THREADS"
- "time ./test-openmp.exe"
Two work directories - /home/user-id & /hpc/home/user-id
/home/user-id
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk for increasing the space
/hpc/home/user-id
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang for increasing the space
How to access the HPC systems
From Windows to Linux
Download the software "Xmanager 2.0" from:
http://www.download.com/Xmanager/3000-2155_4-10038129.html
How to access the HPC systems
How to mount the hpc file system
Under Windows:
- Simply right-click on My Computer, select Map Network Drive, and then choose \\pg-hpc-fs-01.unbc.ca\LOGIN
- replacing LOGIN with your UNI login
How to access the HPC systems
How to mount the hpc file system
On a Linux machine:
- smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN
- replacing MOUNTPOINT with the name of a directory that the system will be mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
Under windowsUnder windowsndash Simply right click on My Computer and Simply right click on My Computer and
select Map Network Drive and then select Map Network Drive and then choose pg-hpc-fs-01unbccaLOGIN choose pg-hpc-fs-01unbccaLOGIN
ndash replacing LOGIN with your UNI loginreplacing LOGIN with your UNI login
How to access the HPC systemsHow to access the HPC systems
How to mount hpc file systemHow to mount hpc file system
On a Linux machineOn a Linux machinendash smbmount smbmount
pg-hpc-fs-01unbccaLOGIN pg-hpc-fs-01unbccaLOGIN MOUNTPOINT -o MOUNTPOINT -o username=LOGINuid=LOGINusername=LOGINuid=LOGIN
ndash replacing MOUNTPOINT with the name replacing MOUNTPOINT with the name of a directory that the system will be of a directory that the system will be mounted to mounted to
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
- MPI: parallelizing the data
- OpenMP: parallelizing the tasks
MPI

[Diagram: the data is divided. One translator takes Harry Potter Volume 1 and translates it into Spanish and French; a second translator takes Harry Potter Volume 2 and translates it into Spanish and French.]

OpenMP

[Diagram: the tasks are divided. One translator takes both Harry Potter Volume 1 and Volume 2 and translates them into Spanish; a second translator takes both volumes and translates them into French.]
Compilers

Compilers on the ACT cluster (andrei):
- GNU: C/C++, g77
- PGI: C/C++, f77, f90

Compilers on the Altix 3000 (columbia):
- Intel: C/C++, Fortran
- GNU: C/C++, g77

PGI Compilers (cluster)

For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

- Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
- C: pgcc, mpicc
- C++: pgCC, mpicxx
Compilers for MPI codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun

/opt/mpich/ch-p4/bin/mpirun -np 4 ...

There is more than one "mpirun": SGI MPI and MPICH.
Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMPcode.f
icc -options -openmp myOpenMPcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting started with OpenMP

Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive

Parallel regions in C:

#include <stdio.h>
int main (void)
{
#pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
Fortran OpenMP compiler directive

Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running

Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories: /home/user-id & /hpc/home/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backed up daily
- Contact the help desk to increase the space

/hpc/home/user-id:
- HPC server
- Research area
- Backed up once a week
- Contact Jean Wang to increase the space
How to access the HPC systems

How to mount the HPC file system on a Linux machine:

smbmount //pg-hpc-fs-01.unbc.ca/LOGIN MOUNTPOINT -o username=LOGIN,uid=LOGIN

replacing MOUNTPOINT with the name of a directory that the system will be mounted to.
Reminder to HPC users

Don't run applications directly on the cluster headnode. Always remember to switch to node 63 or 64 first, then run your applications, such as Matlab, IDL, etc.

Submit your job via PBS on both Columbia and Andrei.
What is PBS?

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e. batch jobs, among the available computing resources. If you want to know more about PBS, please contact Dr. Jean Wang.
Parallel Programming Basics

What is parallelism?

Less fish vs. more fish
What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism
- Shared Memory: Auto Parallel, OpenMP, MPI
- Distributed Memory: MPI
The Jigsaw Puzzle Analogy

Serial Computing:
Suppose you want to do a jigsaw puzzle that has 1000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of 30.
The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource, and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Reminder to HPC usersReminder to HPC users
Donrsquot run applications directly on the Donrsquot run applications directly on the cluster headnode Always remember cluster headnode Always remember to switch to node 63 or 64 first then to switch to node 63 or 64 first then run your applications such as run your applications such as Matlab IDL etcMatlab IDL etc
Submit your job via PBS on both Submit your job via PBS on both Columbia and AndreiColumbia and Andrei
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
What is PBSWhat is PBSPortable Batch System (or simply Portable Batch System (or simply PBS) is the name of computer PBS) is the name of computer software that performs job software that performs job scheduling Its primary task is to scheduling Its primary task is to allocate computational tasks ie allocate computational tasks ie batch jobs among the available batch jobs among the available computing resources computing resources If you want to know more about PBS If you want to know more about PBS please contact Dr Jean Wangplease contact Dr Jean Wang
Parallel Programming Basics

What is parallelism?

(Less fish vs. more fish.)
What is Parallelism?

Parallelism is the use of multiple processors to solve a problem, and in particular the use of multiple processors working concurrently on different parts of a problem.

The different parts could be different tasks, or the same task on different pieces of the problem's data.
Kinds of Parallelism

Shared Memory: Auto Parallel, OpenMP, MPI
Distributed Memory: MPI
The Jigsaw Puzzle Analogy

Serial Computing

Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism

If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism

Once in a while, you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown.

And from time to time you will have to work together (communicate) at the interface between his half and yours.

The speedup will be nearly 2-to-1: you will take 35 minutes instead of an hour.
The More the Merrier

Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces.

So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns

If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource eventually yields diminishing returns.
Distributed Parallelism

Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half on Tom's.

Now you can each work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (you have to scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism

– Processors are independent of each other.
– All data are private.
– Processes communicate by passing messages.
– The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead

Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready
OpenMP and MPI Programming Paradigms

MPI … parallelizing data
OpenMP … parallelizing tasks
MPI

– Harry Potter Volume 1 → Spanish, French (Translator 1)
– Harry Potter Volume 2 → Spanish, French (Translator 2)

OpenMP

– Harry Potter Volumes 1 and 2 → Spanish (Translator 1)
– Harry Potter Volumes 1 and 2 → French (Translator 2)
Compilers

Compilers on the ACT cluster (andrei):
– GNU: C/C++, g77
– PGI: C/C++, f77, f90

Compilers on the Altix 3000 (columbia):
– Intel: C/C++, Fortran
– GNU: C/C++, g77
PGI Compilers (cluster)

For the 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For the 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI Codes

Examples: a C++ code bones.C, a C code bogey.c, and a Fortran code mpihello.f.

On the cluster:
/usr/local/pgi/linux86/6.0/bin/mpicxx bones.C -o bones -lmpich
pgf77 -o mpihello mpihello.f -lfmpich -lmpich
mpif77 -o mpihello mpihello.f -lfmpich -lmpich

On columbia:
/opt/intel/cc/9.0/bin/icc bogey.c -o bogey -lmpi
/opt/intel/fc/9.0/bin/ifort -o mpihello mpihello.f -lmpi
Which mpirun?

[pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

There is more than one "mpirun": SGI MPI and MPICH.
Intel Compilers

How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Codes with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers

On columbia:
man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc

On andrei:
man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting Started with OpenMP

Key points:
– Shared-memory multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++
C OpenMP Compiler Directive

Parallel regions in C …
================================
#include <stdio.h>

int main(void)
{
#pragma omp parallel
  {
    printf("Hello world\n");
  }
  return 0;
}
================================
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Parallel programming BasicsParallel programming BasicsWhat is parallelismWhat is parallelism
Less fish vs more Less fish vs more fishfish
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
What is ParallelismWhat is Parallelism
Parallelism is the use of multiple Parallelism is the use of multiple processors to solve a problem and in processors to solve a problem and in particular the use of multiple particular the use of multiple processors working concurrently on processors working concurrently on different parts of a problemdifferent parts of a problem
The different parts could be different The different parts could be different tasks or the same task on different tasks or the same task on different pieces of the problemrsquos datapieces of the problemrsquos data
Kinds of ParallelismKinds of Parallelism
Shared Memory Auto Parallel Shared Memory Auto Parallel OpenMP MPIOpenMP MPI
Distributed Memory ndash MPIDistributed Memory ndash MPI
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About Compilers
On columbia:
man -M /opt/intel/fc/9.0/man ifort
man -M /opt/intel/cc/9.0/man icc
On andrei:
man -M /usr/local/pgi/linux86/6.0/man pgCC
man -M /usr/local/pgi/linux86/6.0/man pgf90
Getting started with OpenMP
Key points:
– Shared memory, multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:
============================
#include <stdio.h>

int
main (void)
{
  #pragma omp parallel
  {
    printf ("Hello, world!\n");
  }
  return 0;
}
============================
Fortran OpenMP compiler directive
Parallel regions in Fortran:
      program hello
c$omp parallel
      print *, 'Hello, world!'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe
Two work directories – /home/user-id & /hpchome/user-id

/home/user-id:
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id:
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space
Kinds of Parallelism
Shared Memory: Auto Parallel, OpenMP, MPI
Distributed Memory: MPI
The Jigsaw Puzzle Analogy
Serial Computing:
Suppose you want to do a jigsaw puzzle that has 1,000 pieces. Let's say that you can put the puzzle together in an hour.
Shared Memory Parallelism:
If Tom sits across the table from you, then he can work on his half of the puzzle and you can work on yours.
Shared Memory Parallelism
Once in a while you will both reach into the pile of pieces at the same time (you will contend for the same resource), which will cause a little bit of slowdown. And from time to time you will have to work together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: the two of you will take 35 minutes instead of the ideal 30.
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.
Adding too many workers onto a shared resource is eventually going to have a diminishing return.
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen – and this work takes time. This time is called parallel overhead.
The overhead typically includes:
– Managing the multiple processes
– Communication between processes
– Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI … parallelizing data
OpenMP … parallelizing tasks
MPI
– Translator 1: Harry Potter Volume 1 → Spanish, French
– Translator 2: Harry Potter Volume 2 → Spanish, French

OpenMP
– Spanish translator: Harry Potter Volumes 1 and 2 → Spanish
– French translator: Harry Potter Volumes 1 and 2 → French
Compilers
Compilers on ACT cluster (andrei):
– GNU – C/C++, g77
– PGI – C/C++, f77, f90
Compilers on Altix 3000 (columbia):
– Intel – C/C++, Fortran
– GNU – C/C++, g77
PGI Compilers (cluster)
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH
For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
The Jigsaw Puzzle AnalogyThe Jigsaw Puzzle AnalogySerial ComputingSerial Computing
Suppose you want to Suppose you want to do a jigsaw puzzle do a jigsaw puzzle that has 1000 pieces that has 1000 pieces Letrsquos say that you can Letrsquos say that you can put the puzzle put the puzzle together in an hourtogether in an hour
Shared Memory Shared Memory ParallelismParallelism
If Tom sits across the If Tom sits across the table from you then table from you then he can work on his he can work on his half of the puzzle and half of the puzzle and you can work on you can work on yours yours
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Shared Memory ParallelismShared Memory ParallelismOnce in a while you will both reach into the pile Once in a while you will both reach into the pile of pieces at the same time (you will contend for of pieces at the same time (you will contend for the same resource) which will cause a little bit of the same resource) which will cause a little bit of slowdown slowdown
And from time to time you will have to work And from time to time you will have to work together (communicate) at the interface between together (communicate) at the interface between his half and yourshis half and yours
The speedup will be nearly 2-to-1 you will take The speedup will be nearly 2-to-1 you will take 35 min instead of 30 min35 min instead of 30 min
The More the MerrierThe More the MerrierNow letrsquos put Mike and Sam on the other two sides of Now letrsquos put Mike and Sam on the other two sides of the table Each of you can work on a part of the the table Each of you can work on a part of the puzzle but there will be a lot more contention for puzzle but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaceslot more communication at the interfacesSo you will get noticeably less than a 4-to-1 So you will get noticeably less than a 4-to-1 speedup but you will still have an improvement speedup but you will still have an improvement say the four of you can say the four of you can get it done in 20 min instead of an hourget it done in 20 min instead of an hour
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMP
Key points:
- Shared-memory multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:

#include <stdio.h>
int main(void)
{
#pragma omp parallel
    {
        printf("Hello world\n");
    }
    return 0;
}
Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories - /home/user-id & /hpchome/user-id

/home/user-id
- CTS server
- Email box
- Login files
- Backup daily
- Contact the help desk to increase the space

/hpchome/user-id
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang to increase the space
The More the Merrier
Now let's put Mike and Sam on the other two sides of the table. Each of you can work on a part of the puzzle, but there will be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will get noticeably less than a 4-to-1 speedup, but you will still have an improvement: say, the four of you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put four more people on the corners of the table, there is going to be a lot of contention for the shared resource and a lot of communication at the many interfaces. You will be lucky to get it done in 15 minutes.

Adding too many workers onto a shared resource eventually brings diminishing returns.
Distributed Parallelism
Now let's set up two tables, and let's put you at one of them and Tom at the other. Let's put half of the puzzle pieces on your table and the other half on Tom's.

Now you can all work completely independently, without any contention for a shared resource. But the cost of communicating is much higher (scootch the tables together), and you need the ability to split up (decompose) the puzzle pieces reasonably evenly.
Distributed Parallelism
- Processors are independent of each other.
- All data are private.
- Processes communicate by passing messages.
- The cost of passing a message is split into the latency (connection time) and the bandwidth (time per byte).
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen - and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
OpenMP and MPI programming paradigms
MPI ... parallelizing data
OpenMP ... parallelizing tasks

MPI (split the data):
- Translator 1: Harry Potter Volume 1 -> Spanish, French
- Translator 2: Harry Potter Volume 2 -> Spanish, French

OpenMP (split the tasks):
- Translator 1: Harry Potter Volumes 1 and 2 -> Spanish
- Translator 2: Harry Potter Volumes 1 and 2 -> French
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Diminishing ReturnsDiminishing ReturnsIf we now put four more If we now put four more people on the corners of the people on the corners of the table there is going to be a table there is going to be a lot contention for the shared lot contention for the shared resource and a lot of resource and a lot of communication at the many communication at the many interfaces You will be lucky interfaces You will be lucky to get it down in 15 minto get it down in 15 min
Adding too many workers Adding too many workers onto a shared resource is onto a shared resource is eventually going to have a eventually going to have a diminishing return diminishing return
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Distributed ParallelismDistributed ParallelismNow letrsquos set up two Now letrsquos set up two tables and letrsquos put tables and letrsquos put you at one of them you at one of them and Tom at the other and Tom at the other Letrsquos put half of the Letrsquos put half of the puzzle pieces on your puzzle pieces on your table and the other table and the other half of the pieces on half of the pieces on TomrsquosTomrsquos
Now you all can work completely Now you all can work completely independently without any independently without any contention for a shared resource contention for a shared resource But the cost of communicating is But the cost of communicating is much higher (scootch tables much higher (scootch tables together) and you need the together) and you need the ability to split up (decompose) ability to split up (decompose) the puzzle pieces reasonably the puzzle pieces reasonably evenlyevenly
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
Distributed ParallelismDistributed ParallelismProcessors are independent of each otherProcessors are independent of each other
All data are privateAll data are private
Processes communicate by passing Processes communicate by passing messagesmessages
The cost of passing a message is split into The cost of passing a message is split into the latency (connection time) and the the latency (connection time) and the bandwidth (time per byte) bandwidth (time per byte)
Parallel OverheadParallel OverheadParallelism isnrsquot free The compiler and the Parallelism isnrsquot free The compiler and the hardware have to do a lot of work hardware have to do a lot of work parallelism happen ndash and this work takes parallelism happen ndash and this work takes time This time is called parallel overheadtime This time is called parallel overhead
The overhead typically includesThe overhead typically includes Managing the multiple processesManaging the multiple processes Communication between processesCommunication between processes Synchronization everyone stops until Synchronization everyone stops until
everyone is readyeveryone is ready
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About Compilers
On columbia:
man ifort (-M /opt/intel/fc/9.0/man)
man icc (-M /opt/intel/cc/9.0/man)
On andrei:
man pgCC (-M /usr/local/pgi/linux86/6.0/man)
man pgf90 (-M /usr/local/pgi/linux86/6.0/man)
Getting started with OpenMP
Key points:
- Shared memory, multiprocessor nodes
- Parallel programming using compiler directives
- Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C ...
============
#include <stdio.h>

int main (void)
{
  #pragma omp parallel
  {
    printf ("Hello world\n");
  }
  return 0;
}
============
Fortran OpenMP compiler directive
Parallel regions in Fortran ...
      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running
Intel (-openmp) or SGI (-mp):
- icc test.cpp -openmp -o test-openmp.exe
- ifort test.f -openmp -o test-openmp.exe
- OMP_NUM_THREADS=32
- export OMP_NUM_THREADS
- time ./test-openmp.exe
Two work directories - /home/user-id & /hpchome/user-id

/home/user-id:
- CTS server
- Email box
- Login files
- Backup daily
- Contact help desk for increasing the space

/hpchome/user-id:
- HPC server
- Research area
- Backup once a week
- Contact Jean Wang for increasing the space
Parallel Overhead
Parallelism isn't free. The compiler and the hardware have to do a lot of work to make parallelism happen, and this work takes time. This time is called parallel overhead.

The overhead typically includes:
- Managing the multiple processes
- Communication between processes
- Synchronization: everyone stops until everyone is ready
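The overhead can be seen directly by timing the same computation serially and with multiple workers. Below is an illustrative Python sketch (not from the slides; threads stand in for the processes a cluster would use, and the problem size and worker count are arbitrary). For a problem this small, startup and synchronization costs usually make the "parallel" version slower than the serial one:

```python
import threading
import time

def partial_sum(lo, hi, out, idx):
    # Worker task: sum the integers in [lo, hi) into a shared result slot.
    out[idx] = sum(range(lo, hi))

def parallel_sum(n, workers):
    """Split [0, n) across worker threads and time the whole run,
    including worker startup and the final join -- the parallel
    overhead described above."""
    step = n // workers
    results = [0] * workers
    threads = []
    start = time.perf_counter()
    for i in range(workers):
        lo = i * step
        hi = n if i == workers - 1 else lo + step
        t = threading.Thread(target=partial_sum, args=(lo, hi, results, i))
        t.start()          # managing multiple workers: creation cost
        threads.append(t)
    for t in threads:
        t.join()           # synchronization: wait until everyone is done
    return sum(results), time.perf_counter() - start

n = 200_000
start = time.perf_counter()
serial = sum(range(n))
serial_time = time.perf_counter() - start
total, parallel_time = parallel_sum(n, 4)
assert total == serial   # same answer, different cost
print(f"serial: {serial_time:.4f}s  threaded: {parallel_time:.4f}s")
```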
OpenMP and MPI programming paradigms
MPI ... parallelizing data
OpenMP ... parallelizing tasks
MPI:
Harry Potter Volume 1 -> Translator -> Spanish + French
Harry Potter Volume 2 -> Translator -> Spanish + French
(the data is split: each worker gets one volume and does every task on it)
OpenMP:
Harry Potter Volume 1 + Volume 2 -> Translator -> Spanish
Harry Potter Volume 1 + Volume 2 -> Translator -> French
(the tasks are split: each worker does one language across all the data)
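The two decompositions above can be sketched in a few lines. This illustrative Python snippet (not from the slides) uses sequential loops only to show how the work would be divided among workers, not actual parallel execution:

```python
volumes = ["Harry Potter Volume 1", "Harry Potter Volume 2"]
languages = ["Spanish", "French"]

def translate(volume, language):
    # Stand-in for one unit of work.
    return f"{volume} in {language}"

def mpi_style():
    """MPI: partition the *data*. Each worker (translator) owns one
    volume and performs every task (language) on its own share."""
    results = []
    for rank, volume in enumerate(volumes):   # worker `rank` owns volumes[rank]
        for lang in languages:
            results.append(translate(volume, lang))
    return results

def openmp_style():
    """OpenMP: partition the *tasks*. Each worker owns one language
    and applies it to all of the shared data."""
    results = []
    for lang in languages:                    # one worker per language
        for volume in volumes:
            results.append(translate(volume, lang))
    return results

# Same total work either way; only the decomposition differs.
assert sorted(mpi_style()) == sorted(openmp_style())
```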
Compilers
Compilers on ACT cluster (andrei):
GNU - C/C++, g77
PGI - C/C++, f77, f90
Compilers on Altix 3000 (columbia):
Intel - C/C++, Fortran
GNU - C/C++, g77
PGI Compilers (cluster)
For 32-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86/6.0/bin:$PATH

For 64-bit compilers, set PATH as:
export PATH=/usr/local/pgi/linux86-64/6.0/bin:$PATH

Fortran: pgf77, pgf90, pgf95, pghpf (High Performance Fortran), mpif77, mpif90
C: pgcc, mpicc
C++: pgCC, mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
OpenMP and MPI programming paradigms OpenMP and MPI programming paradigms
MPIhellip parallelizing data MPIhellip parallelizing data
OpenMPhellip parallelizing tasks OpenMPhellip parallelizing tasks
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
MPIMPIHarry Potter Volume 1Harry Potter Volume 1
SpanishSpanish
FrenchFrench
TranslatorTranslator
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
FrenchFrench
TranslatorTranslator
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
OpenMPOpenMPHarry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
SpanishSpanish
TranslatorTranslator
Harry Potter Volume 1Harry Potter Volume 1
Harry Potter Volume 2Harry Potter Volume 2
FrenchFrench
TranslatorTranslator
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
CompilersCompilersCompilers on ACT cluster (andrei)Compilers on ACT cluster (andrei)
GNU ndash CC++ g77GNU ndash CC++ g77PGI ndash CC++ f77 f90PGI ndash CC++ f77 f90
Compilers on Altix 3000 (columbia)Compilers on Altix 3000 (columbia)Intel ndash CC++ FortranIntel ndash CC++ FortranGNUndash CC++ g77GNUndash CC++ g77
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirunWhich mpirun
[pg-hpc-clnode-head ~]gt which mpirun[pg-hpc-clnode-head ~]gt which mpirunusrlocalpgilinux86-6460binmpirunusrlocalpgilinux86-6460binmpirun
[pg-hpc-altix-01 ~]gt which mpirun[pg-hpc-altix-01 ~]gt which mpirunusrbinmpirunusrbinmpirunoptmpichch-p4binmpirun ndashnp 4 hellipoptmpichch-p4binmpirun ndashnp 4 hellip
More then one ldquompirunrdquo ndash SGI MPI and MPICHMore then one ldquompirunrdquo ndash SGI MPI and MPICH
Intel CompilersIntel CompilersHow to compile a parallel codeHow to compile a parallel codeMPI codesMPI codes
ifort -options myMPIcodef -lmpiifort -options myMPIcodef -lmpiicc -options myMPIcodec -lmpiicc -options myMPIcodec -lmpi
Code with OpenMp directivesCode with OpenMp directives
ifort -options -openmp myOpenMpcodefifort -options -openmp myOpenMpcodeficc -options -openmp myOpenMpcodecicc -options -openmp myOpenMpcodec
Automatic ParallelizationAutomatic Parallelization
ifort -parallel mycodefifort -parallel mycodeficc -parallel mycodecicc -parallel mycodec
More About CompilersMore About CompilersOn columbiaOn columbia
man ifort -M optintelfc90manman ifort -M optintelfc90man
man icc -M optintelcc90manman icc -M optintelcc90man
On andreiOn andrei
man pgCC -M man pgCC -M usrlocalpgilinux8660manusrlocalpgilinux8660man
man pgf90 -M man pgf90 -M usrlocalpgilinux8660manusrlocalpgilinux8660man
Getting started with OpenMPGetting started with OpenMP
Key pointsKey points
ndash Shared memory multiprocessor nodes Shared memory multiprocessor nodes ndash Parallel programming using compiler Parallel programming using compiler
directives directives ndash Fortran 779095 and CC++ Fortran 779095 and CC++
C OpenMP compiler directiveC OpenMP compiler directiveParallel regions in C hellipParallel regions in C hellip
============================================================include ltstdiohgtinclude ltstdiohgtintintmain (void)main (void)pragma omp parallelpragma omp parallel
printf (Hello worldn)printf (Hello worldn)
return 0return 0
================================1048707================================1048707
Fortran Fortran OpenMP compiler OpenMP compiler directivedirective
Parallel regions in Fortran hellip
program helloc$omp parallel
print lsquoHello worldrsquoc$omp end parallel
end
Compiling and RunningCompiling and Running
Intel (-openmp) or SGI (-mp)Intel (-openmp) or SGI (-mp)ndash ldquoldquoicc testcpp ndashopenmp ndasho test-icc testcpp ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoifort testf ndashopenmp ndasho test-ifort testf ndashopenmp ndasho test-
openmpexerdquoopenmpexerdquondash ldquoldquoOMP_NUM_THREADS=32rdquoOMP_NUM_THREADS=32rdquondash ldquoldquoExport OMP_NUM_THREADSrdquoExport OMP_NUM_THREADSrdquondash ldquoldquotime test-openmpexerdquotime test-openmpexerdquo
Two work directories ndashTwo work directories ndash homeuser-id amp hpchomeuser-id homeuser-id amp hpchomeuser-id
homeuser-idhomeuser-idndash CTS serverCTS serverndash Email boxEmail boxndash Login filesLogin filesndash Backup dailyBackup dailyndash Contact help desk Contact help desk
for increasing the for increasing the spacespace
hpchomeuser-idhpchomeuser-idndash HPC serverHPC serverndash Research areaResearch areandash Backup once a Backup once a
weekweekndash Contact Jean Wang Contact Jean Wang
for increasing the for increasing the spacespace
PGI Compilers (cluster)PGI Compilers (cluster)
PGI CompilerPGI CompilerFor 32-bit compilers set PATH asFor 32-bit compilers set PATH as
export PATH=usrlocalpgilinux8660bin$PATHexport PATH=usrlocalpgilinux8660bin$PATH
For 64-bit compilers set PATH asFor 64-bit compilers set PATH as
export PATH=usrlocalpgilinux86-6460binexport PATH=usrlocalpgilinux86-6460bin$PATH$PATH
Fortran pgf77pgf90pgf95 pghpf(High Fortran pgf77pgf90pgf95 pghpf(High Performance Fortran) mpif77mpif90Performance Fortran) mpif77mpif90
C pgccmpiccC pgccmpicc
C++ pgCC mpicxxC++ pgCC mpicxx
Compilers for MPI codesCompilers for MPI codesExamples a C++ code bonesC a C code bogeyc and a Examples a C++ code bonesC a C code bogeyc and a
Fortran code mpihellofFortran code mpihellof
On clusterOn clusterusrlocalpgilinux8660binmpicxx bonesC -o bones ndashusrlocalpgilinux8660binmpicxx bonesC -o bones ndash
lmpichlmpich
On cloumbiaOn cloumbiaoptintelcc90binicc bogeyc ndasho bogey -lmpioptintelcc90binicc bogeyc ndasho bogey -lmpioptintelfc90binifort -o mpihello mpihellof -lmpioptintelfc90binifort -o mpihello mpihellof -lmpi
Compilers for MPI codesCompilers for MPI codesusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichusrlocalpgilinux8660binmpicxx bonesC -o bones ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichpgif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpichmpif77 -o mpihello mpihellof -lfmpich ndashlmpich
Which mpirun?

[@pg-hpc-clnode-head ~]> which mpirun
/usr/local/pgi/linux86-64/6.0/bin/mpirun

[@pg-hpc-altix-01 ~]> which mpirun
/usr/bin/mpirun
/opt/mpich/ch-p4/bin/mpirun -np 4 …

More than one "mpirun" – SGI MPI and MPICH
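Because two MPI stacks each ship an `mpirun`, `which` only tells you which one comes first in PATH. A small sketch of that behavior, using two stub launchers under `/tmp` to stand in for the SGI MPI and MPICH installs (the stub paths are illustrative, not the cluster's real ones):

```shell
# Create two dummy "mpirun" executables in different directories.
mkdir -p /tmp/sgi/bin /tmp/mpich/bin
printf '#!/bin/sh\necho SGI\n'   > /tmp/sgi/bin/mpirun
printf '#!/bin/sh\necho MPICH\n' > /tmp/mpich/bin/mpirun
chmod +x /tmp/sgi/bin/mpirun /tmp/mpich/bin/mpirun

# PATH order decides which one `which` reports and which one runs bare.
PATH=/tmp/sgi/bin:/tmp/mpich/bin:$PATH
which mpirun            # -> /tmp/sgi/bin/mpirun (first match in PATH)

# Calling by full path bypasses PATH ordering entirely:
/tmp/mpich/bin/mpirun   # prints MPICH
```

When in doubt on the cluster, invoke the launcher by its full path (as on the slide) so the MPI runtime matches the library your code was linked against.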
Intel Compilers
How to compile a parallel code:

MPI codes:
ifort -options myMPIcode.f -lmpi
icc -options myMPIcode.c -lmpi

Code with OpenMP directives:
ifort -options -openmp myOpenMpcode.f
icc -options -openmp myOpenMpcode.c

Automatic parallelization:
ifort -parallel mycode.f
icc -parallel mycode.c
More About Compilers

On columbia:
man ifort -M /opt/intel/fc/9.0/man
man icc -M /opt/intel/cc/9.0/man

On andrei:
man pgCC -M /usr/local/pgi/linux86/6.0/man
man pgf90 -M /usr/local/pgi/linux86/6.0/man
Getting started with OpenMP

Key points:
– Shared-memory, multiprocessor nodes
– Parallel programming using compiler directives
– Fortran 77/90/95 and C/C++
C OpenMP compiler directive
Parallel regions in C:
============================
#include <stdio.h>

int main (void)
{
#pragma omp parallel
  printf ("Hello world\n");
  return 0;
}
============================
Fortran OpenMP compiler directive
Parallel regions in Fortran:

      program hello
c$omp parallel
      print *, 'Hello world'
c$omp end parallel
      end
Compiling and Running

Intel (-openmp) or SGI (-mp):
– icc test.cpp -openmp -o test-openmp.exe
– ifort test.f -openmp -o test-openmp.exe
– OMP_NUM_THREADS=32
– export OMP_NUM_THREADS
– time ./test-openmp.exe
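The run step above hinges on `OMP_NUM_THREADS`: the OpenMP runtime reads it at startup to decide how many threads the parallel regions get. A minimal sketch of the environment mechanics (the compile itself needs icc/ifort on the cluster, so only the variable handling is shown here):

```shell
# Set and export the thread count so it is visible to the executable
# launched from this shell; 32 matches the slide's example.
export OMP_NUM_THREADS=32
echo "$OMP_NUM_THREADS"    # prints 32

# Then run and time the OpenMP binary on an HPC node:
# time ./test-openmp.exe
```

Forgetting the `export` is a common pitfall: a variable assigned without it stays local to the shell and the runtime falls back to its default thread count.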
Two work directories – /home/user-id & /hpchome/user-id

/home/user-id
– CTS server
– Email box
– Login files
– Backup daily
– Contact help desk for increasing the space

/hpchome/user-id
– HPC server
– Research area
– Backup once a week
– Contact Jean Wang for increasing the space