Integrating GPUs into Condor

Timothy Blattner
Marquette University
Milwaukee, WI

April 22, 2009

Outline

Background and Vision
Graphics Cards
Condor Approach
Problems
Conclusions and Future Work

Graphics Cards

Powerful – NVIDIA Tesla C1060
  – 240 massively parallel processing cores
  – 4 GB GDDR3
  – CUDA capable
  – ~933 gigaflops
  – ~$1,300

Cheap – NVIDIA 9800 GT
  – 112 massively parallel processing cores
  – 512 MB GDDR3
  – CUDA capable
  – ~$120

Vision and Focus

Pool of computers containing graphics cards, managed by Condor
Provide users the ability to utilize graphics cards identified by Condor

[Figure: Central Manager and pool machines marked "?"]

Opportunities

Resources may already be there
  – Majority of machines have graphics cards in them
  – GPU resources sit idle while Condor runs on the CPU

Similar work
  – GPUGRID.net
  – Distributed computing project using NVIDIA graphics cards for all-atom molecular simulations of proteins
  – Uses GPU-enabled BOINC client

Prototype Implementation

Linux only
Script queries operating system and graphics card
Hawkeye Cron job manager runs script
Script outputs graphics card information in ClassAd format
Binary for NVIDIA cards provides more specific information

Graphics Card Architecture

[Figure: graphics card architecture]

Graphics Card APIs

Favor general purpose computations
CUDA (NVIDIA)
Brook (ATI)
OpenCL (Khronos Group)

CUDA Programming Model

Kernels are functions run on the device (GPU)
Host (CPU) code invokes kernels and determines
  – Number of threads
  – Thread block structure for organizing threads
Kernel invocations are asynchronous
  – Control returns to the CPU immediately
  – CUDA provides synchronization primitives
  – Some CUDA calls (e.g. memory allocation) are synchronous
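The host-side choice of thread count and block structure comes down to a small piece of arithmetic. The sketch below is only an illustration in Python, not CUDA code from the talk: it assumes one thread per work item and a one-dimensional grid, and the function name `launch_config` is hypothetical. The default of 512 threads per block mirrors the `Gpu0ThreadsPerBlock` value reported later in the slides.

```python
def launch_config(n_items, threads_per_block=512):
    """Return (blocks, threads_per_block) so that blocks * threads >= n_items.

    Assumes one thread per work item and a 1-D grid (an illustrative
    simplification, not the only way to organize CUDA threads).
    """
    if n_items <= 0 or threads_per_block <= 0:
        raise ValueError("n_items and threads_per_block must be positive")
    # Ceiling division: round up so the last, partially filled block is included.
    blocks = (n_items + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block
```

For example, `launch_config(1000)` yields 2 blocks of 512 threads; the kernel would then need a bounds check so the 24 surplus threads do no work.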

Hawkeye Cron Job Manager

Provides mechanism for collecting, storing, and using information about computers
Periodically executes specified program(s)
Program outputs in the form of a ClassAd
Outputs are added to machine's ClassAd
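The last two points can be pictured as a parse-and-merge step. The following is a rough Python sketch of that idea, not Condor's actual implementation: the function names are hypothetical, values are kept as unevaluated strings, and treating a lone `-` line as the end of the output is an assumption.

```python
def parse_classad_lines(text):
    """Parse 'Name = Value' lines into a dict of unevaluated value strings."""
    attrs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "-":
            # Assumed end-of-output marker; anything after it is ignored.
            break
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        attrs[name.strip()] = value.strip()
    return attrs


def merge_into_machine_ad(machine_ad, script_output):
    """Return a copy of machine_ad updated with the script's attributes."""
    merged = dict(machine_ad)
    merged.update(parse_classad_lines(script_output))
    return merged
```

Each periodic run would repeat the merge, so newer values replace older ones under the same attribute name.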

Hawkeye Implementation

Added to local configuration file
Runs script every minute
Condor user must be granted graphics card privileges in order to query the card

    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST), UPDATEGPU
    STARTD_CRON_UPDATEGPU_EXECUTABLE = gpu.sh
    STARTD_CRON_UPDATEGPU_PERIOD = 1m
    STARTD_CRON_UPDATEGPU_MODE = Periodic
    STARTD_CRON_UPDATEGPU_KILL = True

Script Output

    HasGpu = True
    NGpu = 1
    Gpu0 = "Quadro FX 3700"
    Gpu0CudaCapable = True
    Gpu0_Major = 1
    Gpu0_Minor = 1
    Gpu0Mem = 536150016
    Gpu0Procs = 14
    Gpu0Cores = 112
    Gpu0ShareMem = 16384
    Gpu0ThreadsPerBlock = 512
    Gpu0ClockRate = 1.24
    HasCuda = True
    -
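Output of this shape could be produced by a small formatting step once the card has been queried. The sketch below is a Python illustration only; the talk's prototype uses a script (`gpu.sh`) plus an NVIDIA-specific binary, neither of which is shown. The input layout (a list of dicts with `name` and `attrs` keys) and both function names are assumptions made for the example.

```python
def classad_value(value):
    """Render a Python value as a ClassAd literal (booleans, strings, numbers)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "True" if value else "False"
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)


def format_gpu_ad(gpus):
    """Format queried GPU information as 'Name = Value' lines ending in '-'."""
    lines = ["HasGpu = " + classad_value(bool(gpus)),
             "NGpu = " + str(len(gpus))]
    for index, gpu in enumerate(gpus):
        prefix = "Gpu%d" % index
        lines.append(prefix + " = " + classad_value(gpu["name"]))
        for suffix, value in gpu["attrs"].items():
            lines.append(prefix + suffix + " = " + classad_value(value))
    lines.append("-")
    return "\n".join(lines)
```

Numbering the attributes per card (`Gpu0`, `Gpu1`, and so on) keeps the ad flat, which matches the attribute names in the sample output.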

Job Submission

Users can submit jobs with GPU requirements into Condor
Portable across Linux distributions

    Universe = vanilla
    Executable = tests/CudaJob
    Initialdir = gpuJobs
    Requirements = (HasGpu == true) && (Gpu0CudaCapable == true)
    Log = gpu_test.log
    Error = gpu_test.stderr
    Output = gpu_test.stdout
    Queue

    condor_submit gpu_job.submit
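The effect of the Requirements expression can be illustrated with a toy matcher. This is a simplified Python sketch, not Condor's matchmaking: machine ads are modeled as plain dicts, the function names and machine names are hypothetical, and a missing attribute is treated as a non-match.

```python
def meets_gpu_requirements(machine_ad):
    """Mimic (HasGpu == true) && (Gpu0CudaCapable == true) for one machine ad.

    A missing attribute yields None here, which does not satisfy the check.
    """
    return (machine_ad.get("HasGpu") is True
            and machine_ad.get("Gpu0CudaCapable") is True)


def matching_machines(pool):
    """Return the names of machines in the pool whose ads satisfy the requirement."""
    return [name for name, ad in pool.items() if meets_gpu_requirements(ad)]
```

Machines that never ran the GPU script advertise neither attribute, so they are skipped without any extra configuration on the submit side.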

Access Control

/dev/nvidiactl and /dev/nvidia* devices need to be readable and writable by the submitting/running user

Could be
  – Nobody, open access
  – Controlled by a Unix group containing a limited set of users
  – Integrated more directly with Condor user control, slot users
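Whichever policy is chosen, a job wrapper could check device permissions before launching GPU work. The Python sketch below is one hedged way to do that with the standard library; the function names are hypothetical, and it only tests file permissions for the current user, not whether the driver is loaded or the card is usable.

```python
import glob
import os


def gpu_device_access(pattern="/dev/nvidia*"):
    """Map each matching device path to whether this user can read and write it."""
    return {path: os.access(path, os.R_OK | os.W_OK)
            for path in sorted(glob.glob(pattern))}


def can_use_gpu(pattern="/dev/nvidia*"):
    """True only if at least one device matches and all matches are read/write."""
    access = gpu_device_access(pattern)
    return bool(access) and all(access.values())
```

Because `/dev/nvidia*` also matches `/dev/nvidiactl`, a single pattern covers both kinds of device file named above.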

Problems

Preemption
  – Jobs running in a GPU kernel cannot be interrupted reliably by Unix signals

Watchdog timer
  – A job whose GPU kernel runs longer than 5 seconds is killed
  – A solution: use the general purpose graphics card as a secondary display

Memory security
  – Malicious users, interrupting a job between GPU kernel calls, have the opportunity to overwrite or copy GPU memory

Summary

Condor-based approach for advertising GPU resources
Linux-based prototype implementation
Can access available GPUs
  – Works best on dedicated machines, with no need for preemption

Current limitations
  – Doesn't report GPU usage
  – Lack of preemption
  – Limited OS and video card support

Future Work

Create benchmark and testing suite
Handle preemption
  – Investigate how watchdog works
GPU usage reporting
Integrate memory protection
Support more operating systems
  – Windows and Mac OS X
Support alternative architectures and APIs
  – Brook and OpenCL

Questions?

Contact: [email protected]
[email protected]
https://sourceforge.net/projects/condorgpu/
