Helen He, NERSC User Engagement Group
New User Training, February 23-24, 2017
Getting Started at NERSC
Outline
• Connecting to NERSC
  – SSH, NX
• Computing Environment
• Compile and Run My First Job
• Common Best Practices
Connecting to NERSC
SSH
• All of the computational systems at NERSC are accessible via SSH
• Each system has a set of load-balanced login nodes which offer SSH service
• Use your NIM username and password
• Addresses for NERSC systems:
  – Large-scale systems
    • edison.nersc.gov
    • cori.nersc.gov
  – Mid-range systems
    • genepool.nersc.gov
    • pdsf.nersc.gov
  – Data transfer nodes
    • dtn[1-4].nersc.gov
Advanced Topic: SSH Keys
• If you choose to set up an SSH key to access NERSC systems, please use a passphrase on the key
  – No unencrypted keys!
• Upload your SSH public keys in NIM
  – Authentication is available only to users who have stored their SSH public keys in NIM
  – Public keys stored in user home directories are not honored
• More details: http://www.nersc.gov/users/network-connections/connecting-to-nersc/
Basic SSH use from Mac/Linux/cygwin
• If you have a UNIX-like computer, you can contact NERSC directly with your built-in SSH client
  1. Open a new terminal
  2. % ssh -l <NIM username> cori.nersc.gov
• Depending on your preferences, you might want additional SSH flags:
  – ssh -Y performs robust X-forwarding over SSH
  – ssh -A forwards ssh-agent information (if you use SSH keys)
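The flags above can be combined in a single command. The following is a minimal sketch: the helper name nersc_ssh_cmd is hypothetical, and it only prints the command instead of opening a connection, so it can be tried on any machine.

```shell
# Hypothetical helper: print the ssh command for a NERSC system.
# Usage: nersc_ssh_cmd <NIM username> [system]
nersc_ssh_cmd() {
    user="$1"
    system="${2:-cori}"    # defaults to Cori when no system is given
    # -Y: X-forwarding, -A: forward ssh-agent information
    echo "ssh -Y -A -l $user $system.nersc.gov"
}

# Print the command for user "elvis" on Edison.
nersc_ssh_cmd elvis edison
```

Drop -Y or -A from the printed command if you do not need X-forwarding or agent forwarding.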
SSH from a Windows System
• Many SSH clients exist for Windows
  – A very popular one is PuTTY
    • http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
  – Advanced users might prefer to use SSH directly within mintty (from the Cygwin distribution)
• Both of these options support all SSH features (that I've ever tried to use)
  – For X-forwarding to work, you'll need to find X-server software
    • Cygwin/X
    • Exceed
  – Consider using NX instead of X-forwarding
X-forwarding
• Allows you to access visualization programs remotely at NERSC
• Example:
    localhost% ssh -l elvis -Y cori.nersc.gov
    …
    e/elvis> module load matlab
    e/elvis> matlab
    <MATLAB starts up>
• NERSC recommends using NX instead of X-forwarding (NX is covered in the next section)
Example Session
Prompt on local system:
    localhost:~elvis> ssh -l <NIM username> cori.nersc.gov
Notification of acceptable use:
    *****************************************************************
    *                                                               *
    *                        NOTICE TO USERS                        *
    *                        ---------------                        *
    *                                                               *
    * Lawrence Berkeley National Laboratory operates this           *
    * computer system under contract to the U.S. Department of      *
    * Energy. This computer system is the property of the United    *
    * States Government and is for authorized use only. *Users      *
    * (authorized or unauthorized) have no explicit or implicit     *
    * expectation of privacy.*                                      *
    *                                                               *
    * Any or all uses of this system and all files on this system   *
    * may be intercepted, monitored, recorded, copied, audited,     *
    * inspected, and disclosed to site, Department of Energy, and   *
    * law enforcement personnel, as well as authorized officials    *
    * of other agencies, both domestic and foreign. *By using       *
    * this system, the user consents to such interception,          *
    * monitoring, recording, copying, auditing, inspection, and     *
    * disclosure at the discretion of authorized site or            *
    * Department of Energy personnel.*                              *
    *                                                               *
    * *Unauthorized or improper use of this system may result in    *
    * administrative disciplinary action and civil and criminal     *
    * penalties. _By continuing to use this system you indicate     *
    * your awareness of and consent to these terms and conditions   *
    * of use. LOG OFF IMMEDIATELY if you do not agree to the        *
    * conditions stated in this warning._*                          *
    *                                                               *
    *****************************************************************
Password prompt:
    Password: <enter your NIM password here>
MOTD (NERSC Message of the Day)
• After you type the password and log in to a system, you will see the NERSC MOTD before your session prompt appears
    Last login: Wed Feb 22 16:07:29 2017 from 198.128.212.1
    ----------------------------- Contact Information ------------------------------
    NERSC Contacts    http://www.nersc.gov/about/contact-us/
    NERSC Status      http://www.nersc.gov/users/live-status/
    NERSC: 800-66-NERSC (USA)  510-486-8600 (outside continental USA)
    ----------------- Current Status as of 2017-02-22 14:35 PST --------------
    Compute Resources:
      Cori:           Available.
      Edison:         Available.
      Genepool:       Available.
      PDSF:           Available.
    Global Filesystems:
      DNA:            Available.
      Global Common:  Available.
      Global Homes:   Available.
      Project:        Available.
      ProjectA:       Available.
      ProjectB:       Available.
    Mass Storage Systems:
      HPSS Backup:    Available.
      HPSS User:      Available.
    ----------------- Service Status as of 2017-02-22 14:35 PST --------------------
    All services are available.
    -------------------------------- Planned Outages -------------------------------
    Cori: 02/28/17 6:00-03/01/17 6:00 PST, Scheduled maintenance.
      Cori will be degraded due to cabinet additions. Datawarp nodes will be reduced during this time.
    Cori: 03/01/17 6:00-03/03/17 17:00 PST, Scheduled maintenance.
      Cori will be down for adding cabinets and HSN (high-speed network) maintenance. Logins will not be available.
    Data Transfer Nodes: 03/01/17 9:00-12:00 PST, Scheduled maintenance.
    --------------------------------- Past Outages ---------------------------------
    Cori: 02/21/17 8:00-21:15 PST, Scheduled maintenance.
      Cori will be unavailable while updates are applied. Logins will be available, however no jobs will run.
    Cori: 02/21/17 21:15-22:15 PST, System in degraded mode.
      The majority of the system's compute nodes are currently unavailable. Engineers are investigating the issue
    For past outages, see: http://my.nersc.gov/outagelog-cs.php
    -------------------------------------------------------------------------------
Login Node Auto-Logout
• Some NERSC systems won't give you unlimited time on the login nodes
  – After 48 hours idle, Cori and Edison login nodes will terminate your session
  – PDSF and Genepool sessions are unlimited
NX – Accelerated X
• Also uses SSH
• Persistent sessions
• Accelerated graphics
  – Really good for remote access
• KDE desktop
• What you need for NX
  – Any desktop/laptop
    • Windows/Linux/Mac
  – NX client software (free)
Reasons for NX
• Slow speeds: X Windows is slow over the network. Remote windows from emacs can take minutes to open
  – Solution: NX buffers and compresses X messages, giving a much better X experience
• Long-lasting desktop: NX gives you a desktop, so you can connect to NERSC resources (such as Edison) and start your GUI applications
• Lost connections: if I lose my internet connection, I might lose all running processes
  – Solution: NX provides sessions. You can suspend the session without terminating the running processes
  – And get back to the same point when reconnected, even from a different location or computer
NERSC NX Service
• 10-Minute Start-up Guide
  1. Download client (one time, 5 min)
  2. Set up connection (one time, 5 min)
  3. Use (connect, suspend)
• Documentation: go to www.nersc.gov and search for “NX”
Figure: map of current users
NX Demo
• Lisa Gerhardt will show a short live NX demo after this talk
Computing Environment
Node Types
• Login nodes
  – Shared with other users
  – Code compilation, job preparation and submission
• Compute nodes
  – Not shared (except in the “shared” partition)
Login Node Configuration
• Edison
  – 12 nodes
    • 16 cores, 2.0 GHz Intel Sandy Bridge, 512 GB
• Cori
  – 12 nodes
    • 32 cores, 2.3 GHz Intel Haswell, 512 GB
  – Extra login nodes for special purposes (not in load balancer)
• Genepool
  – 2 nodes
    • 32 cores, 2.3 GHz Intel Haswell, 128 GB
• PDSF
  – 3 nodes
    • 32 cores, 2.6 GHz Intel Haswell, 128 GB
Login Node Access
• Connect (via SSH) to the load balancer
    % ssh edison.nersc.gov
    % ssh cori.nersc.gov
    % ssh genepool.nersc.gov
    % ssh pdsf.nersc.gov
• The load balancer selects a login node based on:
  – Number of connections
  – Memory of previous connections from the same IP
Login Node Usage
• Login nodes are shared by many users, all the time
• Edit files, compile programs, submit batch jobs
• Some light post-processing/data analysis
  – IDL, MATLAB, NCL, python, etc.
  – All can run on compute nodes (so you have dedicated resources)
• Some file transfers
  – Use data transfer nodes for large/long-running transfers
• Please use discretion
  – All users get frustrated by sluggish interactive response
Login Node Guidelines
• Use no more than 50% of available cores
• Use no more than 25% of available memory
• Limit use of parallel “make”
    % make -j 4 all
• NERSC will kill user processes if login nodes become unacceptably slow or unresponsive
• Terminate idle sessions of licensed software
  – IDL
  – MATLAB
  – Mathematica
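One way to respect the core limit and the parallel make guideline is to derive the -j value from the node's core count. This is a sketch only: the cap of 4 jobs is an assumption taken from the make -j 4 example above, not a stated NERSC limit, and the variable names are illustrative.

```shell
# Count the cores on this node (nproc on Linux, getconf as a fallback).
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Use at most half of the cores, at least 1 job, and never more than 4.
jobs=$(( cores / 2 ))
if [ "$jobs" -lt 1 ]; then jobs=1; fi
if [ "$jobs" -gt 4 ]; then jobs=4; fi

# Print the command instead of running it.
echo "make -j $jobs all"
```

On a 32-core login node this prints make -j 4 all, since the cap applies before the 50% limit is reached.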
Shell Initialization Files
• Standard dotfiles are maintained by NERSC
  – .bashrc, .profile, .cshrc, .login, etc.
  – Symbolic links to read-only files
• Personal dotfiles
  – Aliases, environment variables, modules, etc.
  – Use the .ext suffix (“.ext files”): .bashrc.ext, etc.
• Broken? Use “fixdots” to start over
  – Creates $HOME/KeepDots.<timestamp>
  – Restores all dotfiles to default state
  – If PATH is corrupted:
      /usr/common/software/bin/fixdots
• Use NIM to change your default login shell
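As an illustration of a personal dotfile, a ~/.bashrc.ext might look like the sketch below. The EDITOR setting, the alias, and the choice of the idl module are assumptions made for the example, not NERSC defaults.

```shell
# Sketch of a personal ~/.bashrc.ext; every setting here is illustrative.

# Environment variables
export EDITOR=vi

# Aliases
alias ll='ls -al'

# Load modules only where the module command exists.
if command -v module >/dev/null 2>&1; then
    module load idl    # hypothetical choice; load what you actually need
fi
```

Keeping personal settings in the .ext file leaves the NERSC-maintained .bashrc untouched, so “fixdots” is rarely needed.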
Software is Managed by Modules
• Identify the software you need
  – http://www.nersc.gov/users/software/
  – Use module avail package_name
    • Lots of output
    • All module output goes to stderr, not stdout
• Each system has different modules!
• Load the module
    % which idl
    idl: Command not found.
    % module load idl
    % which idl
    /global/common/cori/software/idl/idl83/bin/idl
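Because module output goes to stderr, piping module avail straight into grep filters nothing. A possible workaround, sketched here with a hypothetical helper named find_module, is to merge stderr into stdout first.

```shell
# Hypothetical helper: search the `module avail` listing for a package name.
# 2>&1 merges stderr into stdout so that grep can see the listing.
find_module() {
    module avail "$1" 2>&1 | grep -i -- "$1"
}

# Run the search only where the module command is present.
if command -v module >/dev/null 2>&1; then
    find_module idl || echo "no idl module found"
fi
```

The same redirection works for paging through long listings, for example module avail 2>&1 | less.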
Other Useful Module Commands
module unload <modulename>
  – Remove the module from your environment
module swap <module1> <module2>
  – Unload one module and replace it with another
      % module swap intel intel/16.0.3.210
    (replaces the current default with a specific version)
module list
  – See what modules you have loaded right now
module show <modulename>
  – See what the module actually does
module help <modulename>
  – Get more information about the software
NERSC Supported Software
• NERSC provides a wide range of software
• http://www.nersc.gov/users/software/
  – Scientific applications
    • VASP, Amber, NAMD, Quantum Espresso, ...
  – Compilers
    • Intel, GCC, Cray
  – Scripting languages
    • perl, python, R, including common packages for each
  – Software libraries (some maintained by Cray)
    • blas/lapack (MKL), boost, hdf5, netcdf, …
  – Development utilities
    • git, mercurial, cmake, shifter, …
  – Debuggers and profilers
    • DDT, TotalView, gdb, Perftools, MAP, Darshan, IPM, VTune
  – Grid software
    • Globus
  – Visualization and analytics packages
    • VisIt, ParaView, Jupyter, RStudio, ...
  – Development environment
    • Shifter
Cray Programming Environment
• Meta-modules: PrgEnv-intel, PrgEnv-cray, PrgEnv-gnu
  – Organize a set of modules
    • Compiler (intel, cray, gnu)
    • Libraries (including MPI) tuned for the compiler
  – Intel is the default on Edison and Cori
• Swapping programming environments
    % module swap PrgEnv-intel PrgEnv-cray
  – Swaps the compiler
  – No need to swap libraries!
Compiler Wrappers
• On Cori/Edison:
  – Defined by the PrgEnv-* modules
  – ftn (Fortran), cc (C), CC (C++)
  – Provide include header and library search paths for MPI, common math libraries (e.g., Cray LibSci), and Cray system software
  – Provide a consistent level of optimization across compilers
• Use compiler wrappers to build applications
• You seldom need the native compilers!
• More details in the Compiling Codes talk later today
CHOS Environment
• Provides different OS environments
  – Often different third-party software
• Some software packages have specific OS requirements
  – Possibly due to validation requirements
• Used on PDSF and Genepool
• Transparent
  – Default configuration for most users
  – Alternate configurations for some users
• More details: http://www.nersc.gov/users/computational-systems/pdsf/software-and-tools/chos/
Compile and Run My First Job (Cori Haswell example)
My Hello World Program
    elvis@cori04> cat mpi-hello.f90
    IMPLICIT NONE
    INCLUDE 'mpif.h'
    INTEGER :: myPE, totPEs, ierr
    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, myPE, ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, totPEs, ierr)
    PRINT *, "myCPU is ", myPE, " of total ", totPEs
    CALL MPI_FINALIZE(ierr)
    STOP
    END
Compile
• Use the compiler wrappers, which link the MPI libraries automatically
    elvis@cori04> ftn -o mpi-hello mpi-hello.f90
    elvis@cori04> ls -al mpi-hello
    -rwxr-x--- 1 elvis elvis 9241160 Feb 22 10:14 mpi-hello
Submit Batch Job
• Prepare a Slurm batch script
    elvis@cori04> cat run-hello.sl
    #!/bin/bash -l
    #SBATCH -N 2           # Use 2 compute nodes
    #SBATCH -t 00:10:00    # Set 10 minute time limit
    #SBATCH -p debug       # Submit to the "debug" partition
    #SBATCH -L SCRATCH     # Job requires $SCRATCH file system
    #SBATCH -C haswell     # Request Haswell nodes
    srun -n 64 ./mpi-hello
• Submit it to the batch queue
    elvis@cori04> sbatch run-hello.sl
    Submitted batch job 3838675
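The -n 64 on the srun line follows from the node count. As a rough check, assuming one MPI task per core and 32 cores per Cori Haswell compute node (an assumption; the batch script itself does not state the per-node core count), the task count can be worked out like this:

```shell
nodes=2              # matches "#SBATCH -N 2"
tasks_per_node=32    # assumption: one MPI task per core, 32 cores per node

# Total MPI tasks to request from srun.
ntasks=$(( nodes * tasks_per_node ))

# Print the resulting launch line instead of running it.
echo "srun -n $ntasks ./mpi-hello"
```

If you change the -N value in the batch script, the -n value on the srun line usually needs to change with it.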
Check Results
• Check the job in the batch queue
    elvis> sqs
    JOBID    ST  REASON    USER   NAME          NODES  USED  REQUESTED  SUBMIT               PARTITION  RANK_P  RANK_BF
    3838675  PD  Priority  elvis  run-hello.sl  2      0:00  10:00      2017-02-22T10:24:32  debug      N/A
• Once it is completed, check the results
    elvis> cat slurm-3838675.out
    myCPU is 0 of total 64
    myCPU is 1 of total 64
    myCPU is 2 of total 64
    …
    myCPU is 61 of total 64
    myCPU is 62 of total 64
    myCPU is 63 of total 64
• More details on running jobs in a later talk today
Common Best Practices
Selected Best Practices (1)
• Check MOTD messages for current system status, past outages, and planned maintenance
  – From the SSH login prompt
  – http://www.nersc.gov/live-status/motd/
• Be nice to others regarding the shared resources
  – Limit CPU and memory usage on login nodes
  – Do production work on compute nodes
• Release licenses
  – Only a limited number of licenses are available for certain software packages, such as MATLAB, IDL, etc.
Selected Best Practices (2)
• Don't use “watch” with its default 2-second interval
  – Check every 10 minutes or more
  – Send emails when a batch job starts and ends
    • #SBATCH --mail-type=<events>
      – Valid events: BEGIN, END, FAIL, etc.
    • #SBATCH --mail-user=<email_address>
• Run applications from Lustre scratch or /project instead of the global homes directory, to get
  – Larger space
  – Optimal I/O performance
• Back up your important files frequently
  – /scratch files are purged
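Instead of “watch”, a slow polling loop is one option. The sketch below assumes the standard Slurm squeue command with its -h (no header) and -j (job ID) options; the helper name wait_for_job is hypothetical.

```shell
# Hypothetical helper: wait until a job has left the queue, polling slowly.
# Usage: wait_for_job <jobid> [interval_seconds]
wait_for_job() {
    jobid="$1"
    interval="${2:-600}"    # default: check every 10 minutes
    # squeue prints a line for the job only while it is pending or running.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}
```

A call such as wait_for_job 3838675 returns once the job has finished. It also returns immediately on a machine without squeue, so it is only meaningful on a system running Slurm. The mail directives above remain the lighter option, since they need no polling at all.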
Further Information
• http://www.nersc.gov/users/connecting-to-nersc/connecting-with-ssh/
• http://www.nersc.gov/users/connecting-to-nersc/using-nx/
• http://www.nersc.gov/users/software/nersc-user-environment/
• http://www.nersc.gov/users/software/nersc-user-environment/modules/
• http://www.nersc.gov/users/getting-started/
• https://www.nersc.gov/users/computational-systems/cori/getting-started/your-first-program-on-cori/
Thank you.