Shifter: Containers in HPC Environments
Second ADAC Workshop, Lugano, June 2016
Lucas Benedicic, CSCS
June 14th 2016
Agenda
Background
Implementation
Use Cases
Security
Background
Docker overview
Build an image capturing all application requirements
Commit the image or use a recipe file
Send the image descriptor to collaborators
Push it to DockerHub or a private registry
Pull the image from DockerHub or a private registry
Launch the image as a container
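
The steps above map onto a handful of standard Docker CLI commands. The following is a minimal sketch, not taken from the talk: the image name `myuser/theapp:1.0` is hypothetical, and the `run` helper only prints each command unless `DRY_RUN=0` is set, so the script can be read or tried without a Docker installation.

```shell
#!/bin/bash
# Sketch of the Docker workflow: build, push, pull, launch.
# IMAGE is a hypothetical name; replace it with your own repository/tag.
IMAGE="${IMAGE:-myuser/theapp:1.0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

docker_workflow() {
    run docker build -t "$IMAGE" .   # build from the recipe file (Dockerfile) in the current directory
    run docker push "$IMAGE"         # publish to DockerHub or a private registry
    run docker pull "$IMAGE"         # collaborators fetch the image
    run docker run --rm "$IMAGE"     # launch the image as a container
}

docker_workflow
```

Setting `DRY_RUN=0` executes the same four commands for real, which assumes a running Docker daemon and push access to the chosen repository.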
Docker drawbacks
Architecture: Docker assumes a local disk
Security: Docker users can easily escalate privileges on the host system
Integration: Docker is not designed to work with batch systems
Complexity: Docker uses a client/daemon architecture
Solution: Shifter
Partnership with NERSC and Cray to design a solution to run containers on HPC platforms
Design goals
  Flexibility: requires no administrator assistance to launch a container
  Integration: shared resource availability (e.g., mounts, devices and network interfaces)
  Compatibility: integrates with public image repositories (e.g., DockerHub)
  Security: a stripped-down version of the original image is deployed in read-only mode
Implementation
Shifter Architecture
Shifter vs Docker: similarities
The user-defined images are under user control
Allows volume mapping
  mount /a/b on the host on /b/a in the container
Containers can be executed
  environment variables, working directory, entry-point scripts, ...
Instantiate multiple containers on the same compute node
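
The volume-mapping example can be written out for both tools. This is a hedged sketch: `volume_examples` is a hypothetical helper that only prints the commands, `-v` is Docker's standard bind-mount flag, and the `--volume` option on the `shifter` command line is assumed to mirror the `#SBATCH --volume` option used in the batch job later in this talk (check `shifter --help` on your system).

```shell
#!/bin/bash
# Sketch: the same host-to-container mapping expressed for Docker and for Shifter.
# Only prints the commands; nothing is executed.
volume_examples() {
    local host="$1" container="$2" image="${3:-ubuntu:14.04}"
    # Docker bind mount: host path on the left, container path on the right
    echo "docker run --rm -v ${host}:${container} ${image} ls ${container}"
    # Assumed Shifter equivalent, using the same host:container ordering
    echo "shifter --image=docker:${image} --volume=${host}:${container} ls ${container}"
}

volume_examples /a/b /b/a
```

In both cases the host directory `/a/b` appears as `/b/a` inside the container.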
Shifter vs Docker: differences
Containers run under the user’s UID inside the container
Images are modified at construction time
  Replaces /etc/passwd, /etc/group, ...
  Generates hostsfiles to identify other nodes in the allocation
Images are read-only on the compute node
Shifter does not use cgroups directly
  Resources are handled by the workload manager (e.g., SLURM)
Use Cases
Creating a Docker image
Dockerfile
FROM ubuntu:14.04
# Update packages and install dependencies
RUN apt-get update -y && \
    apt-get install -y build-essential
# Copy in the application
ADD . /theapp
# Build it
RUN cd /theapp && \
    make && \
    make install
Use the image with Shifter
SLURM batch job
#!/bin/bash
#SBATCH -N 16 -t 20
#SBATCH --image=docker:ubuntu:14.04
#SBATCH --volume=/scratch/user/data:/data
module load shifter
srun -n 16 shifter /theapp/app
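
Before a job like the one above can start, the image has to be available to Shifter on the system. A minimal sketch of that step follows, assuming the `shifterimg` utility that ships with Shifter is installed; `job.sh` is a hypothetical file name for the batch script, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/bash
# Sketch: make the image known to Shifter, then submit the batch job.
# JOB_SCRIPT is a hypothetical file name holding the batch script above.
# In practice IMAGE would be the image built from your Dockerfile and pushed
# to a registry, since the plain ubuntu:14.04 image does not contain /theapp.
IMAGE="${IMAGE:-docker:ubuntu:14.04}"
JOB_SCRIPT="${JOB_SCRIPT:-job.sh}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

submit_with_shifter() {
    run shifterimg pull "$IMAGE"   # ask Shifter to fetch and convert the image
    run shifterimg images          # list the images available on this system
    run sbatch "$JOB_SCRIPT"       # submit the job that uses the image
}

submit_with_shifter
```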
Shifter: Extending the Docker workflow to HPC
Develop an application on your laptop and run it on a supercomputer
Enables the user to define complex software stacks themselves
Runs the Linux flavor of their choice
Improves reproducibility
Improves sharing (e.g., Dockerfile, DockerHub)
ATLAS and LHC
CSCS operates a cluster running jobs for the LHC experiments at CERN
Jobs expect a RHEL-compatible OS and a precompiled software stack
Shifter reproduces the complete software stack on the Cray XC
Job efficiency is comparable on both systems
Apache Spark
Designed around commodity clusters, i.e., Ethernet and local disks
Does not scale well on parallel filesystems, e.g., Lustre
Shifter minimizes the metadata overhead
Tested on NERSC’s Cori up to 1600 nodes
Accessing GPUs
Containers are both hardware-agnostic and platform-agnostic by design
This is not the case when using GPUs
  a GPU application uses specialized hardware, and
  it requires specific software on the host, i.e., the NVIDIA kernel driver
Shifter approach (CSCS + NVIDIA)
  direct access to the GPU character devices
  the required libraries are dynamically discovered at runtime
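
To make the device part of this concrete: on Linux the NVIDIA driver exposes GPUs as character device files such as `/dev/nvidia0` and `/dev/nvidiactl`, and these are what a container needs access to. The sketch below is not Shifter's implementation; `list_gpu_devices` is a hypothetical helper that only shows which such devices exist on the current node.

```shell
#!/bin/bash
# Sketch: list NVIDIA character devices visible on this node.
# This is NOT how Shifter is implemented; it only illustrates what a
# container runtime has to expose for GPU access.
list_gpu_devices() {
    local found=0 dev
    for dev in /dev/nvidia*; do
        # -c is true only for character special files; an unmatched glob
        # stays a literal string and fails this test
        if [ -c "$dev" ]; then
            echo "$dev"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no NVIDIA character devices found" >&2
    fi
    return 0
}

list_gpu_devices
```

On a node without an NVIDIA driver the function prints only the notice on standard error.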
Accessing GPUs
The Stream benchmark within a Shifter container shows native performance!
NVIDIA’s DGX-1 software stack is based on this solution
Shifter and MPI
Challenges
  different versions, implementations (vendors), hardware ...
Embedded in the image
  add the required libraries into the image
  users should maintain their own images
Site-specific base images
  users extend a managed image including the required libraries
  these are upgraded together with the system
Dynamic linking at runtime
  user’s application built with ABI compatibility
  system-specific implementation dynamically mounted at runtime
Security
Container Security
Security contexts don’t provide enough security and are difficult to configure, e.g., SELinux
Docker’s approach is broken by design, e.g., root in the container is still root in the host
Look at what Red Hat did
Shifter Security Model
User accesses the container as their UID, not root or contextual root
Generated site /etc/passwd, /etc/group inside the container
Embedded sshd is statically linked and accessible under the user’s UID
User-provided data are verified and filtered if needed, e.g., sudo
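
The first point of this model can be checked from inside any container. The sketch below is an illustration, not part of Shifter: `check_uid` is a hypothetical helper that rejects UID 0 and accepts any other UID.

```shell
#!/bin/bash
# Sketch: refuse to continue when running as root (UID 0).
# check_uid is a hypothetical helper, not part of Shifter.
check_uid() {
    local uid="$1"
    if [ "$uid" -eq 0 ]; then
        echo "running as root (UID 0)" >&2
        return 1
    fi
    echo "running as unprivileged UID $uid"
}

# Check the current process; "|| true" keeps the script going either way.
check_uid "$(id -u)" || true
```

Under the model described above, running this inside a Shifter container is expected to report the same UID the user has on the host, whereas a default Docker container typically reports UID 0.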
Conclusions
Containers are here to stay
Not a universal solution
They can be secured and isolated to match hypervisors
Still, several important issues arise
  Support, e.g., how should images be maintained and/or troubleshot?
  Adoption, e.g., how are users adopting Docker?
  Training, e.g., can we leverage the Docker community? Is site-specific documentation needed?
We haven’t even scratched the surface of the possibilities
Thank you for your attention.