Mike Wendt
@mike_wendt
github.com/nvidia
github.com/mike-wendt
BUILDING A GPU-FOCUSED CI SOLUTION
AGENDA
Need for GPU CI
Challenges of GPU CI
Methods to Implement GPU CI
Improving GPU CI Today
Demo
Lessons Learned
Next Steps
Getting Started
NEED FOR GPU CI
• The leading open-source software projects from Apache and others rely on CI
• External demand
• Partners are collaborating with us on projects like GPU Open Analytics Initiative (GoAi) and need GPU CI to ensure stable builds
• Internal demand
• Large code-bases internally for all kinds of GPU-accelerated applications require testing across different platforms/hardware
• Performance testing of new drivers and hardware needs repeatable methods to make sure we continue to deliver performance
The number of GPU-accelerated applications is growing
CHALLENGES OF GPU CI
Need GPUs
Cloud or physical
Resource management
Expose GPU configuration to developers
Driver, CUDA, GPU type
Many traditional tools like Travis CI, Circle CI, and others do not support GPUs
For good reasons, dangers of misuse
For tools that offer support, many times it is not native
Still feels “hacky,” but it gets the job done
GPUs bring a different set of problems than traditional CI
METHODS TO IMPLEMENT GPU CI
BARE-METAL + GPU
Benefits
Reduces complexity with minimal setup
Works well for a small set of projects that use the same/similar dependencies
Challenges
Managing dependencies can be tricky for multiple projects
Limits ability to test multiple platforms, limited to installed CUDA/OS
Resource management is difficult
Fastest to get started with the most limitations
BARE-METAL + GPU
Fastest to get started with the most limitations
[Diagram components: Source Code, Tests, CI Environment, Server, GPUs, Test Results]
DOCKER + NVIDIA CONTAINER RUNTIME
Docker runtime that allows GPU passthrough on Linux systems
Works with Debian/Ubuntu, RHEL/CentOS, and Amazon Linux
Allows for testing multiple CUDA/OS environments on one machine
Includes options to set supported driver operations and restrict GPU visibility
github.com/nvidia/nvidia-docker
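A minimal sketch of how a CI job might assemble a `docker run` command for the NVIDIA runtime. It assumes nvidia-docker 2 is installed so that `--runtime=nvidia` is available, and it uses the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables to restrict GPU visibility and driver operations. The helper name `nvidia_docker_command` and the image tag `nvidia/cuda:9.0-base` are illustrative assumptions, not taken from the talk.

```python
import shlex


def nvidia_docker_command(image, test_command, gpus="all",
                          capabilities="compute,utility"):
    """Build a `docker run` argument list that uses the NVIDIA runtime.

    gpus:         value for NVIDIA_VISIBLE_DEVICES (e.g. "all", "0", "0,1")
    capabilities: value for NVIDIA_DRIVER_CAPABILITIES
    """
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",
        "-e", f"NVIDIA_VISIBLE_DEVICES={gpus}",
        "-e", f"NVIDIA_DRIVER_CAPABILITIES={capabilities}",
        image,
        *test_command,
    ]


if __name__ == "__main__":
    # Hypothetical image tag; expose only GPU 0 to the container.
    cmd = nvidia_docker_command("nvidia/cuda:9.0-base", ["nvidia-smi"], gpus="0")
    print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a host that has Docker and the NVIDIA runtime configured.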
DOCKER + GPU
Benefits
Ability to test multiple CUDA/OS combinations
Handles dependency management for all projects
Enables fine-grained resource management
Supports scale needed for larger projects and teams
Challenges
Typically requires pre-built Docker images that provide the test environment, with the code under test injected into the container
Configuration tends to involve many environment variables and is cumbersome to manage
GitLab CI and Jenkins require “runners” for multiple nodes
Easier to use with some hacking still required
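The idea of testing several CUDA/OS combinations on one machine, with the code under test injected into a pre-built image, could be sketched roughly as follows. The image naming scheme (`nvidia/cuda:<cuda>-devel-<os>`), the `/workspace` mount point, and the helper names are assumptions made for illustration; a real setup would use whatever images and paths the project maintains.

```python
from itertools import product


def image_tag(cuda, os_name):
    """Assumed tag scheme for pre-built test images."""
    return f"nvidia/cuda:{cuda}-devel-{os_name}"


def build_matrix(cuda_versions, os_names, source_dir, test_command):
    """Return one docker job per CUDA/OS combination.

    The source tree is bind-mounted into the container at /workspace
    (a hypothetical path) so the same image can test any checkout.
    """
    jobs = []
    for cuda, os_name in product(cuda_versions, os_names):
        jobs.append({
            "name": f"cuda{cuda}-{os_name}",
            "command": [
                "docker", "run", "--rm", "--runtime=nvidia",
                "-v", f"{source_dir}:/workspace",
                "-w", "/workspace",
                image_tag(cuda, os_name),
                *test_command,
            ],
        })
    return jobs


if __name__ == "__main__":
    matrix = build_matrix(["8.0", "9.0"], ["ubuntu16.04", "centos7"],
                          "/ci/checkout", ["make", "test"])
    for job in matrix:
        print(job["name"], " ".join(job["command"]))
```

Generating the commands from a small matrix keeps the per-combination configuration in one place instead of spreading it across many hand-written environment variables.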
DOCKER + GPU
Easier to use with some hacking still required
[Diagram components: Source Code, Tests, Dockerfile or Container, Custom Config, CI Environment, Docker Container, Docker + NVIDIA Runtime, Server, GPUs, Test Results]
KUBERNETES + DOCKER + GPU
Benefits
GPU support in v1.8+ of Kubernetes
Takes care of the “runner” challenge with GitLab/Jenkins
Resource management and scheduling are handled by Kubernetes
Challenges
Can only target GPUs on homogeneous nodes (heterogeneous support coming)
Not all tools support GPU CI out of the box
Docker containers required for testing, but this can be the previous step in a pipeline
Promises to be the easiest to use with minimal hacking
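As a rough sketch of how a test job might request a GPU from Kubernetes, the snippet below builds a Pod manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. It assumes the NVIDIA device plugin is deployed on the cluster, since that is what advertises the `nvidia.com/gpu` resource; the pod name, container name, and image are hypothetical.

```python
import json


def gpu_test_pod(name, image, command, gpus=1):
    """Build a Pod manifest that requests `gpus` NVIDIA GPUs.

    Assumes the NVIDIA device plugin is installed, which exposes
    GPUs to the scheduler as the `nvidia.com/gpu` resource.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run the tests once; do not restart on completion or failure.
            "restartPolicy": "Never",
            "containers": [{
                "name": "gpu-tests",
                "image": image,
                "command": list(command),
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical image in a private registry.
    pod = gpu_test_pod("gpu-ci-job", "registry.example.com/project-tests:latest",
                       ["make", "test"])
    print(json.dumps(pod, indent=2))
```

With a manifest like this, the scheduler decides which GPU node runs the tests, which is what removes the need to manage per-node runners by hand.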
KUBERNETES + DOCKER + GPU
Promises to be the easiest to use with minimal hacking
[Diagram components: Source Code, Tests, Dockerfile or Container, Custom Config, Server, CI Environment, Docker Container Repo, Docker Test Container, Kubernetes Master, Scheduler, Kubernetes Worker, Docker Container, Docker + NVIDIA Runtime, GPUs, Test Results]
HOW CAN WE MAKE THIS BETTER TODAY?
JENKINS PLUGIN FOR NVIDIA + DOCKER
Simplifies the configuration of Docker containers for GPU CI testing
Allows for targeting a Dockerfile within the repo to build and use for testing or a Docker image in a remote hub
Supports side-containers with GPU support
Easy to use and adapt a project for GPU CI
Based on Jenkins docker-slaves plugin
DEMO
JENKINS PLUGIN FOR NVIDIA + DOCKER
Simplifying the configuration for GPU CI
[Diagram components: Source Code, Tests, Dockerfile or Container + Plugin Config, Jenkins CI Environment, Docker Container, Docker + NVIDIA Runtime, Server, GPUs, Test Results]
LESSONS LEARNED
• CI best practices apply to GPU code as well
• Pull request testing is one of the best methods to ensure code quality
• GitLab CI works great if there are only a few GPU-enabled repos to test
• For scale-out, GitLab on Kubernetes is best
• Larger organizations and projects need a centralized CI platform like Jenkins
• Setup of a new repo is easy and with parameterized builds we can make use of existing pipelines
• Advanced uses of Jenkins
• Tagging is key to testing on multiple GPU architectures, and pipelines are key to testing multiple CUDA versions
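The tagging idea can be illustrated with a small sketch: each build agent carries a set of tags, and a job names the tags it needs. The agent names and tags below are invented for illustration; the `&&` syntax matches the form of Jenkins label expressions, but nothing here calls Jenkins itself.

```python
def label_expression(tags):
    """Join tags into a label expression such as 'gpu && pascal'."""
    return " && ".join(tags)


def matching_agents(agents, required_tags):
    """Return the names of agents that carry every required tag."""
    required = set(required_tags)
    return sorted(name for name, tags in agents.items()
                  if required <= set(tags))


# Hypothetical agents and their tags.
AGENTS = {
    "node-a": ["gpu", "pascal", "cuda9"],
    "node-b": ["gpu", "volta", "cuda9"],
    "node-c": ["gpu", "pascal", "cuda8"],
}

if __name__ == "__main__":
    wanted = ["gpu", "pascal"]
    print(label_expression(wanted))
    print(matching_agents(AGENTS, wanted))
```

Keeping architecture and CUDA version as separate tags lets one parameterized pipeline target any combination that some agent provides.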
NEXT STEPS
• Continue plugin development and release as an open source project
• Internal
• Continue deployment of GPU CI and migrate performance testing toward full GPU CI
• Leverage capabilities of Jenkins to go beyond CI with CD and workflow automation
• External
• Expand GPU CI testing by testing pull requests of open source projects using Jenkins and the plugin
• Take advantage of the GPU targeting within Kubernetes and new GPU features in the coming months
• Look at ways to more closely integrate GPU CI with GitLab CI and Jenkins plugins for Kubernetes
GETTING STARTED
github.com/nvidia
NVIDIA Docker Runtime
nvidia-docker
NVIDIA Kubernetes Device Plugin
k8s-device-plugin
github.com/mike-wendt
Jenkins Plugin For NVIDIA
Coming soon
Docker + NVIDIA Runtime on Ubuntu
nvidia-docker-ubuntu
Links to useful repos
THANK YOU