SIMPLIFYING COMPLEX SOFTWARE ASSEMBLY
THE COMPONENT RETRIEVAL LANGUAGE AND IMPLEMENTATION
Presenter: Eric Seidel
Dept. of Computer Science
City College of New York
[email protected]
Co-authors: Gabrielle Allen, Steven Brandt, Frank Löffler, and Erik Schnetter
Center for Computation & Technology
Louisiana State University
Wednesday, August 4, 2010
COMPONENT FRAMEWORKS
• Set of individual software modules coordinated by a glue framework
• Each component (module) performs a specific task and encapsulates a set of related functions and data
• Frameworks can range from having a few components to many
• Components communicate via interfaces
• Used for various purposes, HPC examples include
• Cactus Framework
• CCA Frameworks (e.g. Caffeine)
• Domain-specific frameworks (e.g. Earth System Modeling Framework)
CACTUS
• Component Framework
• Over 500 unique components
• Distributed around the world
• Flesh
• Core application
• Thorns
• Independent modules
• Perform actual computation
• High Performance Computing
• Massively parallel
• Runs on high end supercomputer clusters
• Supports many applications
• Numerical Relativity
• Quantum Gravity
• Computational Fluid Dynamics
www.cactuscode.org
CACTUS WORKFLOW
• Managed using “Thornlists”
• Plaintext list of thorns required for a specific configuration
• Used to checkout, update, build, and test the source code
!REPOSITORY_TYPE pserver
!REPOSITORY_LOCATION cvs.cactuscode.org
!REPOSITORY_NAME /cactusdevcvs
!REPOSITORY_USER eric9

CactusBase/Boundary
CactusBase/CartGrid3D
CactusBase/CoordBase
CactusBase/IOASCII
CactusBase/IOBasic
CactusBase/IOUtil
CactusBase/InitBase
CactusBase/LocalInterp
EINSTEIN TOOLKIT
• Toolkit for relativistic astrophysical simulations
• Developed using Cactus
• Comprised of 135 thorns
• Initial Data, Evolution/Analysis methods, Utilities
• First official release 2 months ago
www.einsteintoolkit.org
MOTIVATION
• Distributed software frameworks are hard to assemble and manage
• Einstein Toolkit is comprised of 135 individual components
• Very tedious to check out or update manually
• Large barrier to entry for new users
VERSION CONTROL SYSTEMS
• Used to track revisions in source code
• Concurrent Versions System (cvs)
• Released in 1990
• Uses client-server model
• Server stores full history of repository
• Clients retrieve specific revision
• Subversion (svn)
• Released in 2000
• Successor to cvs
• Also uses client-server model
• Git
• Released in 2005
• Uses distributed model
• Everyone has a copy of the full history
Image: http://en.wikipedia.org/wiki/File:Revision_controlled_project_visualization-2010-24-02.svg
GETCACTUS
• Designed to checkout and update Cactus thorns and flesh
• Specific to Cactus Framework
• Originally designed for CVS
• SVN and git added later
• Still difficult to distribute the framework
• Users must edit the thornlist
!REPOSITORY_TYPE pserver
!REPOSITORY_LOCATION cvs.cactuscode.org
!REPOSITORY_NAME /cactusdevcvs
!REPOSITORY_USER eric9

CactusBase/Boundary
CactusBase/CartGrid3D
CactusBase/CoordBase
CactusBase/IOASCII
CactusBase/IOBasic
CactusBase/IOUtil
CactusBase/InitBase
CactusBase/LocalInterp
# NAME is an alphanumeric or '.' character
DOCUMENT : DIRECTIVES ;

DIRECTIVE : DEFINE NAME '=' PATH EOL
          | CHECKOUT '=' COMPONENTLIST EOL
          | CHECKOUT '=' EOL COMPONENTLIST EOL
          | REPO_LOC '=' LOC EOL
          | AUTH_LOC '=' LOC EOL
          | PATH_DIRECTIVE '=' PATH EOL
            # !REPO_PATH, !CHECKOUT, !TARGET,
            # !ANON_PASS, !NAME
          | NAME_DIRECTIVE '=' NAME EOL
            # !CRL_VERSION, !AUTH_USER,
            # !ANON_USER, !TYPE
          ;

DIRECTIVES : DIRECTIVE
           | DIRECTIVES DIRECTIVE
           ;

LOC : PSERVER PATH              # CVS repository
    | NAME ':' '/' '/' PATH     # Git/SVN repository
    | NAME '@' NAME ':' PATH    # Git repository
    ;

PATH : NAME
     | '/' NAME
     | PATH '/' NAME
     ;

COMPONENTLIST : PATH
              | COMPONENTLIST EOL PATH
              ;

Figure 2: Grammar for the CRL in Bison format
anonymous methods. The auto-update option will bypass
the user prompt and update any components that have
been previously checked out; this allows GetComponents
to be safely called by another program as a background
process.
Authentication and updates are handled by the underlying
version control tools, with GetComponents providing a
uniform layer between the user and those tools. Figure 3
shows the general authentication process used by
GetComponents, which is called once for each component
block, unless anonymous mode has been selected. It first
checks for !AUTH_URL, which specifies authenticated access
to the repository. It then attempts to match the !AUTH_URL
against the GetComponents users file (located by default
in $HOME/.crl/users). If a match is found, GetComponents
will use the associated username and then proceed to
processing the next component block. If no match is found,
GetComponents will prompt the user for their username and
attempt to log in to the repository using the appropriate
command (e.g. cvs login), after which it will save the
username and URL in the users file. This has the security
benefit of keeping passwords visible only to the actual
retrieval tools. The user may also specify a '-' at this
prompt to indicate they wish to perform an anonymous
checkout for all components in the block. GetComponents
will store this choice in the users file as well, so the
user is not forced to specify anonymous access repeatedly.
If the user mistakenly entered the wrong username, or
wishes to change access methods, they may specify the
-reset-authentication option, which will delete the users
file and allow the user to reenter their usernames.
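The lookup described above can be sketched roughly as follows. This is an illustration in Python, not the GetComponents source: the function name resolve_user, the in-memory dictionary standing in for the users file, the exact-match lookup, and the prompt callback are all assumptions, and the login step performed by the version control tool is left out.

```python
from typing import Callable, Dict, Optional

ANONYMOUS = "-"  # the text says '-' at the prompt selects anonymous access


def resolve_user(auth_url: Optional[str],
                 users: Dict[str, str],
                 prompt: Callable[[str], str]) -> Optional[str]:
    """Return the username for a component block, or None for anonymous.

    `users` is a hypothetical stand-in for the users file (URL -> username);
    a real tool would load it from disk and save it after each change.
    """
    if auth_url is None:
        # The block has no !AUTH_URL, so only anonymous access is possible.
        return None
    if auth_url not in users:
        # Unknown repository: ask once and remember the answer, including
        # an explicit choice of anonymous access ('-').
        users[auth_url] = prompt(auth_url).strip()
    name = users[auth_url]
    return None if name == ANONYMOUS else name
```

Passing the prompt as a callback keeps the sketch testable; an interactive caller could pass a small function that wraps `input`.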
If errors occur during the checkout process, GetComponents
stores the name of the component that caused the error,
and prints out a list of all components that had errors
before exiting. In addition, any error will be logged,
including the exact command that was called and the error
that was returned by the checkout tool. GetComponents will
also time the entire checkout/update process and print the
total time elapsed before exiting.
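A minimal sketch of this collect-and-continue behaviour, assuming a hypothetical checkout callable that raises an exception on failure (the names run_all and checkout are illustrative and do not come from GetComponents):

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_all(components: Iterable[str],
            checkout: Callable[[str], None]
            ) -> Tuple[List[Tuple[str, str]], float]:
    """Check out every component, collecting failures instead of stopping.

    Returns the list of (component, error message) pairs and the total
    elapsed time in seconds.
    """
    start = time.monotonic()
    failed: List[Tuple[str, str]] = []
    for name in components:
        try:
            checkout(name)
        except Exception as err:
            # Record the failure and move on to the next component.
            failed.append((name, str(err)))
    return failed, time.monotonic() - start
```

The caller can then print the failed components and the elapsed time once, at the end of the run.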
Multiple component lists may be specified together, in
which case GetComponents will concatenate the lists and
process them as one. The component list may also be
specified as a URL, which GetComponents will download and
then process normally. This further simplifies the code
assembly process, as the user need only download
GetComponents to initiate the assembly. In addition, the
anonymous checkout process is shortened by performing a
shallow checkout of git repositories. Because git is a
distributed version control system, cloning a git
repository copies the entire repository along with its
full history. Over time, this history accumulates and can
consume a large amount of disk space. A shallow checkout
of a git repository clones only the most recent changeset,
thereby reducing (sometimes greatly) the size of the
resulting local copy. For example, the Carpet repository
can be reduced from 115MB to 76MB by performing a shallow
checkout.
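For illustration, git exposes shallow cloning through the `--depth` option of `git clone`. The Python helper below only builds the command line; the helper name and the choice of a depth of 1 are assumptions, and the text does not say which exact invocation GetComponents uses.

```python
from typing import List


def git_clone_command(url: str, dest: str, shallow: bool) -> List[str]:
    """Build a git clone command, optionally limited to the latest commit."""
    cmd = ["git", "clone"]
    if shallow:
        # --depth 1 asks git to fetch only the most recent commit.
        cmd += ["--depth", "1"]
    return cmd + [url, dest]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to perform the clone.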
GetComponents was written to be very modular, and it can
easily be extended to include other versioning tools. Each
tool is handled by its own subroutine, and all of the
subroutines are pointed to by a single hash, which
GetComponents compares with the !TYPE directive in each
component. To add new functionality, one would only have
to write a subroutine for the new tool and add an entry to
the checkout_types hash.
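The same dispatch-table idea can be sketched in Python, with a dictionary playing the role of the checkout_types hash. The handler names and signatures here are assumptions made for the example; only the basic `svn checkout` and `git clone` command forms are taken as given.

```python
from typing import Callable, Dict, List


def svn_command(url: str, dest: str) -> List[str]:
    return ["svn", "checkout", url, dest]


def git_command(url: str, dest: str) -> List[str]:
    return ["git", "clone", url, dest]


# One entry per supported tool, keyed by the value of the !TYPE directive.
CHECKOUT_TYPES: Dict[str, Callable[[str, str], List[str]]] = {
    "svn": svn_command,
    "git": git_command,
}


def command_for(tool_type: str, url: str, dest: str) -> List[str]:
    """Look up the handler for a !TYPE value and build its command."""
    try:
        handler = CHECKOUT_TYPES[tool_type]
    except KeyError:
        raise ValueError(f"unsupported !TYPE: {tool_type}") from None
    return handler(url, dest)
```

Supporting another tool then takes one function and one new dictionary entry, for example `CHECKOUT_TYPES["hg"] = lambda url, dest: ["hg", "clone", url, dest]`.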
7. EXAMPLE: EINSTEIN TOOLKIT
The Einstein Toolkit [7] is a collection of software
components and tools for simulating and analyzing general
relativistic astrophysical systems. Such systems include
gravitational wave space-times, collisions of compact
objects such as black holes or neutron stars, accretion
onto compact objects, supernova core collapse and
gamma-ray bursts. Different research teams typically use
the Einstein Toolkit as the basis of their group codes,
supplementing the toolkit with additional modules for
initial data, evolution, analysis, etc.
The Einstein Toolkit uses a distributed development model
in which its software modules are developed, distributed
and supported either by the core maintainers team or by
individual groups. Where modules are provided by external
groups, the Einstein Toolkit maintainers provide quality
control for modules for inclusion in the toolkit and
coordinate support and releases. While the core of the
toolkit is a set of Cactus thorns (distributed from
different repositories), the toolkit also contains example
parameter files, documentation, and tools for
visualization, debugging, and simulation deployment.
The component list (einsteintoolkit.th, see footnote 2)
for the Einstein Toolkit uses the CRL for distribution of
its currently 130 different software components. All the
components of the Einstein Toolkit are available by anony-
Footnote 2: https://svn.einsteintoolkit.org/manifest/einsteintoolkit.th
COMPONENT RETRIEVAL LANGUAGE
• Designed to fix problems with original GetCactus script
• Provides unified, tool-agnostic syntax
• Abstracts authentication procedures
• General-purpose
• No longer specific to Cactus
SAMPLE CRL FILE
!DEFINE ROOT = Cactus
!DEFINE ARR = $ROOT/arrangements

!TARGET = $ROOT
!TYPE = svn
!AUTH_URL = https://svn.cactuscode.org/flesh/trunk
!URL = http://svn.cactuscode.org/flesh/trunk
!CHECKOUT = Cactus
!NAME = .

!TARGET = $ROOT
!TYPE = svn
!URL = https://svn.cct.lsu.edu/repos/numrel/$1/trunk
!CHECKOUT = simfactory

!TARGET = $ARR
!TYPE = svn
!AUTH_URL = https://svn.cactuscode.org/arrangements/$1/$2/trunk
!URL = http://svn.cactuscode.org/arrangements/$1/$2/trunk
!CHECKOUT =
CactusArchive/ADM
CactusBase/Boundary
CactusBase/CartGrid3D
CactusBase/CoordBase

!TARGET = $ARR
!TYPE = git
!URL = git://github.com/ianhinder/Kranc.git
!AUTH_URL = [email protected]:ianhinder/Kranc.git
!REPO_PATH= Auxiliary/Cactus
!CHECKOUT =
Kranc
NumericalTools/GenericFD

# McLachlan, the spacetime code
!TARGET = $ARR
!TYPE = git
!URL = git://carpetcode.dyndns.org/McLachlan
!AUTH_URL = [email protected]:McLachlan
!REPO_PATH= $2
!CHECKOUT = McLachlan/doc McLachlan/m McLachlan/par
McLachlan/ML_BSSN
McLachlan/ML_BSSN_Helper
McLachlan/ML_BSSN_O2
McLachlan/ML_BSSN_O2_Helper
McLachlan/ML_BSSN_Test
McLachlan/ML_ADMConstraints
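To make the structure of such a file concrete, here is a small Python sketch that reads CRL text into blocks and expands the positional placeholders in a URL. It is not the GetComponents parser. It assumes that a block starts at each !TARGET line, that a blank line ends a !CHECKOUT list, and that $1 and $2 stand for the first and second path segments of the component being checked out, which is how the sample above reads. The names parse_crl and expand_url are illustrative.

```python
import re
from typing import Any, Dict, List

DEFINE = re.compile(r"^!DEFINE\s+(\w+)\s*=\s*(.*)$")
DIRECTIVE = re.compile(r"^!(\w+)\s*=\s*(.*)$")


def parse_crl(text: str) -> List[Dict[str, Any]]:
    """Split CRL text into blocks; each block starts at a !TARGET line."""
    defines: Dict[str, str] = {}
    blocks: List[Dict[str, Any]] = []
    in_checkout = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            in_checkout = False  # a blank line ends a component list
            continue
        # Substitute !DEFINE variables such as $ROOT; leave $1, $2 alone.
        line = re.sub(r"\$(\w+)",
                      lambda m: defines.get(m.group(1), m.group(0)), line)
        m = DEFINE.match(line)
        if m:
            defines[m.group(1)] = m.group(2).strip()
            in_checkout = False
            continue
        m = DIRECTIVE.match(line)
        if m:
            key, value = m.group(1), m.group(2).strip()
            if key == "TARGET" or not blocks:
                blocks.append({"CHECKOUT": []})
            if key == "CHECKOUT":
                blocks[-1]["CHECKOUT"].extend(value.split())
                in_checkout = True
            else:
                blocks[-1][key] = value
                in_checkout = False
        elif in_checkout:
            # Continuation line of a !CHECKOUT list.
            blocks[-1]["CHECKOUT"].extend(line.split())
    return blocks


def expand_url(template: str, component: str) -> str:
    """Replace $1, $2, ... with the path segments of `component`."""
    parts = component.split("/")

    def sub(m: "re.Match[str]") -> str:
        i = int(m.group(1))
        return parts[i - 1] if 1 <= i <= len(parts) else m.group(0)

    return re.sub(r"\$(\d+)", sub, template)
```

Under these assumptions, the component CactusBase/Boundary with the third block's !URL expands to http://svn.cactuscode.org/arrangements/CactusBase/Boundary/trunk.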
GETCOMPONENTS
• Designed to be very modular
• Currently supports 5 version control systems and http/ftp downloads
• Very easy to add more
• Can take input as local file or URL
• Manages all authentication issues
[Figure: GetComponents assembling a simulation from distributed sources. Einstein Toolkit: Cactus Flesh and CCTK (svn.cactuscode.org); Carpet AMR (git.carpetcode.org); Core Einstein Toolkit (svn.einsteintoolkit.org); Einstein Toolkit (svn.partnersite.org). Research Groups: Group Modules (svn.groupthorns.org); Individual Modules (git.mythorns.org); Group Modules (ftp.groupthorns.org). Tools, Parameter Files, & Data (svn.einsteintoolkit.org).]
./GetComponents http://tinyurl.com/einsteintoolkit-2010-06
AUTHENTICATION
• Authentication handled entirely by VCS tools
• GetComponents stores list of authenticated repositories and users
• Also tracks repositories with specified anonymous access
• Very secure
• GetComponents never sees any passwords!
[Figure: flowchart of the authentication process. Decisions (yes/no): Anonymous mode selected? Are components available anonymously? Is username for URL known? Actions: Use anonymous checkout; Print error and ignore component; Prompt for username; Use known username; Verify access; Checkout components.]
CHECKOUT VS. UPDATE SPEED
[Figure: bar chart of time in seconds (axis from 0 to 1300) for Serial Checkout and Parallel Checkout on the TeraGrid resources Abe, Frost, Kraken, Lincoln, LoneStar, Longhorn, Queen Bee, Ranger, Spur, and Steele.]
GETCOMPONENTS
• Generating component lists is still time-consuming and tedious
• A barrier, or outright impossible, for new users
• Don't need all Einstein Toolkit modules to run a simulation
• How to determine which components are needed for a particular simulation?
• e.g. what is needed to model two black holes, or a coastal surge?
COMPONENT DEPENDENCIES
• Dependency tracking could allow custom-built simulations
• Specify one component containing data about the simulation
• Initial values, type of simulation, etc.
• Then recursively check component dependencies
[Figure: dependency graph among the thorns Boundary, SymBase, PUGH, WaveToyC, CartGrid3D, IDScalarWaveC, and CoordBase.]
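The recursive check in the last bullet can be sketched as a depth-first walk over a dependency table. This is a Python illustration with an assumed data layout (a dictionary mapping each component to the components it needs), and required_components is a hypothetical name. The thorn names in the usage note come from the slides, but the edges between them are invented for the example.

```python
from typing import Dict, List, Set


def required_components(root: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return `root` plus everything it transitively depends on."""
    seen: Set[str] = set()
    order: List[str] = []

    def visit(name: str) -> None:
        if name in seen:
            # Already handled; this also stops the walk on cyclic input.
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        # For acyclic input, dependencies are listed before dependents.
        order.append(name)

    visit(root)
    return order
```

For instance, with the made-up table `{"WaveToyC": ["Boundary", "CartGrid3D"], "Boundary": ["SymBase"], "CartGrid3D": ["CoordBase", "SymBase"]}`, calling `required_components("WaveToyC", ...)` yields a five-entry component list that ends with WaveToyC.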
COMPONENT DEPENDENCIES -- WAVETOY EXAMPLE
[Figure: dependency graph for the WaveToy example. Nodes: WaveToyC, Boundary, CartGrid3D, IDScalarWaveC, IsoSurfacer, WaveBinarySource, CoordBase, HTTPDExtra, HTTPD, IOAscii, IOBasic, IOJpeg, IOUtil, jpeg6b, LocalInterp, LocalReduce, PUGHReduce, PUGH, PUGHSlab, Socket, SymBase, Time. Edge types: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]
COMPONENT DEPENDENCIES -- QUANTUM GRAVITY
[Figure: dependency graph for a quantum gravity application. Nodes: Nerve, AntichainEvol, BinaryCauset, CFlatSprinkle, RandomAntichain, Distributions, IOUtil, MonteCarlo, PUGH, RNGs. Edge types: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]
COMPONENT DEPENDENCIES -- EINSTEIN TOOLKIT
[Figure: dependency graph of the Einstein Toolkit thorns (e.g. admbase, ml_bssn, grhydro, twopunctures, carpet, pugh). Edge types: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]
DISTRIBUTION
• GetComponents is freely available with an open-source license
• www.eseidel.org/download/GetComponents
• Full documentation available
• ./GetComponents --man
ACKNOWLEDGEMENTS
• Many thanks to Gabrielle Allen, Steve Brandt, Frank Löffler, and Erik Schnetter