Presentation on the status of the SAGrid application porting platform based on Jenkins and CVMFS, given at the EGI Community Forum 2014
Jenkins + CVMFS: Distributed Development, Centralised Delivery
Bruce Becker | [email protected] | Coordinator: SAGrid
SANREN, Meraka Institute, CSIR
Stefanus Riekert | [email protected] | Application Engineer
University of the Free State
Bruce Becker: Coordinator, SAGrid | [email protected] | http://www.sagrid.ac.za
Outline
● What users want
● SAGrid VO – a catch-all VO with many applications
● Problem statements:
  ● Problem 1: "the usual problem" – maintaining applications in a distributed computing environment
  ● Problem 2: "another usual problem" – maintaining a complex application inventory
● General solution: CVMFS + Jenkins
● Some specifics of the SAGrid CI platform
● Outlook
SAGrid as a catch-all VO
● The South African National Grid operates a catch-all VO which all South African researchers can use to access computing and data resources.
● The SAGrid VO is not a domain-specific VO, so:
  ● the applications supported by this VO have several widely varying uses
  ● applications are requested by users or communities themselves
What users want
● Amazing infrastructure
● Some users want a highly varied, modular application selection
● Vertically integrated, highly specialised applications
● Highly trained support
What users get sometimes
The problem (1) - "the usual problem"
● Software distribution was done mostly "by hand":
  ● Someone from the ops team develops a script to install the application
  ● Apps are installed via job submission
  ● Tags are applied via script or by the job itself
● Issues:
  ● Major overhead of work
  ● Inconsistent installation procedures between applications and sites
  ● Bottleneck in porting applications (has to be done by someone in the VO)
  ● Duplication of effort, especially in dependencies of applications
  ● Difficult to manage application lifecycles
The problem (2) - what about the community?
● Managing the inventory in a catch-all VO can be complex when there are many applications
● Prioritising porting requests depends on the knowledge of the expert porting the application
  ● Can lead to major delays in porting and deploying applications
● However, a user or community usually has an expert who knows how to tune, port and configure the application and its dependencies properly
  ● Usually, "they" have to conform to "us": learn grid tools and terminology, etc.
Problem (3): Changes to the playing field
● New middleware stacks
● New architectures – GPGPU, ARM
Questions to answer
● How do we lower the barrier to entry to the grid or cloud infrastructure?
● How can the application expert prove to the resource provider that the application will actually run in the execution environment of the site?
● How can we manage the lifecycle of applications across multiple versions, architectures and configurations?
● How can we ensure that once applications are "certified", they are actually available on as many sites as possible?
General Solution: Jenkins + CVMFS
● The issues outlined are "typical" of a large software project
● They are usually solved by judicious use of a Continuous Integration system
● Once applications have been "ported", put them into a trusted repository
● Previously we built RPMs, but installing them required site-admin intervention
● With CVMFS, a site needs only a one-time configuration
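As an illustration of what the one-time site configuration could amount to, here is a minimal sketch. It assumes the repository is called sagrid.ac.za (the name given under the next steps), and it uses a hypothetical helper `write_cvmfs_client_config`, a made-up proxy URL and an arbitrary cache size. `CVMFS_REPOSITORIES`, `CVMFS_HTTP_PROXY` and `CVMFS_QUOTA_LIMIT` are standard CVMFS client parameters; a real site would also need the repository's server URL and public key, which are not shown.

```shell
#!/bin/sh
# Sketch: write a minimal CVMFS client configuration.
# The helper name, proxy URL and cache size are illustrative assumptions.
write_cvmfs_client_config() {
    conf_dir=$1   # normally /etc/cvmfs
    repo=$2       # repository to mount, e.g. sagrid.ac.za
    proxy=$3      # site squid proxy (hypothetical URL in the demo below)
    mkdir -p "$conf_dir"
    cat > "$conf_dir/default.local" <<EOF
CVMFS_REPOSITORIES=$repo
CVMFS_HTTP_PROXY="$proxy"
CVMFS_QUOTA_LIMIT=20000
EOF
}

# Demo: write into a scratch directory rather than /etc/cvmfs.
demo_dir=$(mktemp -d)
write_cvmfs_client_config "$demo_dir" sagrid.ac.za "http://squid.example.org:3128"
cat "$demo_dir/default.local"
```

Once a file like this is in place, later application releases reach the site through the repository, without further action from the site admin.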
First, some changes
● Distribute the effort, centralise the tools
● Move the repository from a "closed" SVN repo
  – https://ops.sagrid.ac.za/trac/svn/repo
● to git
  – https://github.com/SAGridOps/SoftwareInstallation
● Don't have to give write access to a single repo; accept pull requests instead
● Take advantage of all the GitHub infrastructure
● Expand possible contributors to those "outside" the infrastructure
● Recognise individuals' contributions
Recognise individuals...
Decentralise the team
Collaborate with code
Let the robots do the work
● Define what we want to deploy – let the experts take care of how to deploy
● DevOps paradigm – same review/tag/release mechanisms on operations code as we have for scientific applications
  ● Teach a marketable skill
  ● Allow specialisation
  ● Enable remote management of complex services
  ● Ensure that published methodology is adopted methodology
Quality Control and feedback
● Ensure that requested applications are included in the repo
● Provide testing and QA infrastructure
● Self-service for users
The CI environment
● Jenkins is extremely flexible... can do almost anything
● AuthN/AuthZ
  ● Currently using GitHub OAuth
  ● Take advantage of future Identity Federation
● We wanted to simulate different execution environments
  ● Already in production
  ● Planned for future
● Track and re-use dependencies
Matrix-based builds
● Independent builds and build statuses for different configurations:
  ● Application name
  ● Version
  ● OS
  ● Architecture
  ● … can add specific tuning configurations...
● We can see exactly what's broken where – build more resilient integration code.
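To make the matrix idea concrete, the sketch below enumerates every cell of a small build matrix. Each cell would get its own build and its own status. The helper `list_matrix` and all axis values are illustrative assumptions, not the actual SAGrid configuration.

```shell
#!/bin/sh
# Sketch: enumerate the cells of a build matrix.
# list_matrix and the axis values are illustrative assumptions.
list_matrix() {
    # $1: application names, $2: versions, $3: OSes, $4: architectures
    for name in $1; do
        for version in $2; do
            for os in $3; do
                for arch in $4; do
                    echo "$name/$version/$os/$arch"
                done
            done
        done
    done
}

cells=$(list_matrix "fftw" "2.1.5 3.3.4" "centos6 ubuntu14" "x86_64 i686")
echo "$cells"
```

One application with two versions, two OSes and two architectures already gives eight independent builds, which is why a per-cell status view is useful.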
Typical workflow
● Infrastructure expert: defines relevant tests in Jenkins (the testing matrix)
● Application developer: reads the description of the execution environment tests, then writes code to pass the required tests in the dev/stage environment
● Promote a build to CVMFS
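The promotion step at the end of the workflow could be sketched as follows. This is a hypothetical illustration, not the SAGrid implementation: `promote_build`, `run` and `DRY_RUN` are invented names and the artifact path is made up. `cvmfs_server transaction`, `publish` and `abort` are the standard commands used on a CVMFS release manager machine. By default the sketch only prints what it would do.

```shell
#!/bin/sh
# Sketch: promote a tested build artifact into a CVMFS repository.
# promote_build, run and DRY_RUN are hypothetical names.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

promote_build() {
    repo=$1       # e.g. sagrid.ac.za
    artifact=$2   # build.tar.gz produced by the CI job
    run cvmfs_server transaction "$repo" || return 1
    if run tar xzf "$artifact" -C "/cvmfs/$repo"; then
        run cvmfs_server publish "$repo"
    else
        # Roll back the open transaction if unpacking failed.
        run cvmfs_server abort -f "$repo"
        return 1
    fi
}

plan=$(promote_build sagrid.ac.za /repo/build.tar.gz)
echo "$plan"
```

Wrapping the unpacking in a transaction means sites see either the old or the new state of the repository, never a half-copied build.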
Dependency management: simple case
● Common problem with applications: need a specific version of a compiler
● Compiling the compiler can itself be tricky...
● Jenkins tests the full dependency chain necessary
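One way to picture a dependency chain is as a list of "dependency dependent" pairs, from which a valid build order can be derived. The sketch below uses the standard `tsort` utility for this. The GADGET dependencies are the ones named in the build script later in the talk; the gcc edges and the helper name `build_order` are assumptions for illustration, and Jenkins itself handles this through job relationships rather than `tsort`.

```shell
#!/bin/sh
# Sketch: derive a build order from "dependency dependent" pairs.
# The gcc edges and the helper name are illustrative assumptions.
build_order() {
    tsort <<EOF
gcc fftw
gcc hdf5
gcc openmpi
gcc gsl
fftw gadget
hdf5 gadget
openmpi gadget
gsl gadget
EOF
}

order=$(build_order)
echo "$order"
```

The compiler comes out first and the application last; the order of the libraries in between is not fixed, because they do not depend on each other.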
Real-world application
● GADGET – astrophysics hydrodynamic simulations
● Many (levels of) dependencies
Public Application Dashboard
Authenticated view
Generic build script
# Set up the environment
# GADGET requires HDF5, FFTW2, ZLIB and OpenMPI
module add ci
module add fftw/2.1.5
module add hdf5
module add openmpi
module add gsl

# Clean build: remove old copies, then retrieve the dependency artifacts
rm -rf "$FFTW_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/fftw/$FFTW_VERSION/build.tar.gz" -C /
rm -rf "$HDF5_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/hdf5/$HDF5_VERSION/build.tar.gz" -C /
rm -rf "$OPENMPI_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/openmpi/$OPENMPI_VERSION/build.tar.gz" -C /
rm -rf "$GSL_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/gsl/$GSL_VERSION/build.tar.gz" -C /
Generic build script (continued)
# Actually build, then create the artifact
make install DESTDIR="$WORKSPACE/build"
mkdir -p "$REPO_DIR"
# ${REPO_DIR:?} aborts instead of expanding to /* if REPO_DIR is unset
rm -rf "${REPO_DIR:?}"/*
tar cvzf "$REPO_DIR/build.tar.gz" -C "$WORKSPACE/build" apprepo

# Create the modulefile
(
cat <<MODULE_FILE
#%Module1.0
## $NAME modulefile
##
proc ModulesHelp { } {
    puts stderr "    This module does nothing but alert the user"
    puts stderr "    that the [module-info name] module is not available"
}
prereq gsl
prereq fftw/2.1.5
prereq hdf5
module-whatis "$NAME $VERSION."
setenv GSL_VERSION $VERSION
setenv GSL_DIR /apprepo/$::env(SITE)/$::env(OS)/$::env(ARCH)/$NAME/$VERSION
prepend-path LD_LIBRARY_PATH $::env(GSL_DIR)/lib
MODULE_FILE
) > modules/$VERSION
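The dependency-retrieval step of the generic build script repeats the same two commands for every dependency. As a possible tidy-up, the sketch below does the same work in a loop. `fetch_deps`, `REPO_ROOT` and `INSTALL_ROOT` are hypothetical names introduced here (the script on the slide reads from /repo and unpacks into /), and the demo values for `SITE`, `OS`, `ARCH` and `GSL_VERSION` are made up.

```shell
#!/bin/sh
# Sketch: clean and fetch dependency artifacts in a loop.
# fetch_deps, REPO_ROOT and INSTALL_ROOT are hypothetical names.
fetch_deps() {
    for dep in "$@"; do
        upper=$(printf '%s' "$dep" | tr '[:lower:]' '[:upper:]')
        # Look up e.g. GSL_VERSION and GSL_DIR by name.
        eval "version=\${${upper}_VERSION:-}"
        eval "old_dir=\${${upper}_DIR:-}"
        [ -n "$version" ] || { echo "no version set for $dep" >&2; return 1; }
        # Remove a previous copy only if its location is known.
        if [ -n "$old_dir" ]; then rm -rf "$old_dir"; fi
        tar xzf "$REPO_ROOT/$SITE/$OS/$ARCH/$dep/$version/build.tar.gz" \
            -C "$INSTALL_ROOT" || return 1
    done
}

# Demo against a scratch tree with one fake artifact (all values made up).
scratch=$(mktemp -d)
REPO_ROOT=$scratch/repo
INSTALL_ROOT=$scratch/root
SITE=generic
OS=sl6
ARCH=x86_64
GSL_VERSION=1.16
mkdir -p "$REPO_ROOT/$SITE/$OS/$ARCH/gsl/$GSL_VERSION" "$INSTALL_ROOT" "$scratch/stage/apprepo"
echo demo > "$scratch/stage/apprepo/marker"
tar czf "$REPO_ROOT/$SITE/$OS/$ARCH/gsl/$GSL_VERSION/build.tar.gz" -C "$scratch/stage" apprepo
fetch_deps gsl
```

In the real script the call would be something like `fetch_deps fftw hdf5 openmpi gsl`, with the version variables provided by the loaded modules.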
So, it works! … almost
Next steps
● We have an open, collaborative, low-barrier platform for researchers to bring applications to the grid
● Small technical tasks:
  ● Implement a promoted-builds mechanism to populate the sagrid.ac.za CVMFS repo
  ● Implement SAML AuthN, integrate IdF
  ● Probes to check that CVMFS is mounted on sites (?)
● Operating in "stealth mode" at the moment – not advertising, but open to anyone who is interested, so that we can collect feedback
● Addressing specific user communities to test drive the system:
  ● Machine learning astro applications (rapid prototyping)
  ● Bioinformatics application suites (complex ecosystem)
● Present next phase of the project in November in Cape Town – move to production
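The probe mentioned among the small technical tasks, checking that CVMFS is mounted on a site, could be as simple as the sketch below. `check_cvmfs_mounted` and `CVMFS_ROOT` are hypothetical names, and the exit codes 0 and 2 follow the common monitoring-plugin convention as an assumption. On a real worker node, `cvmfs_config probe` performs a similar check.

```shell
#!/bin/sh
# Sketch: probe whether a CVMFS repository is mounted and readable.
# check_cvmfs_mounted and CVMFS_ROOT are hypothetical names.
check_cvmfs_mounted() {
    repo=$1
    root=${CVMFS_ROOT:-/cvmfs}
    # Listing the directory triggers the automount on a real client.
    if [ -n "$(ls -A "$root/$repo" 2>/dev/null)" ]; then
        echo "OK: $repo is available under $root"
        return 0
    fi
    echo "CRITICAL: $repo is not available under $root"
    return 2
}

# Demo: a scratch directory stands in for /cvmfs.
CVMFS_ROOT=$(mktemp -d)
mkdir -p "$CVMFS_ROOT/sagrid.ac.za/apprepo"
check_cvmfs_mounted sagrid.ac.za
```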