LLNL-PRES-754800
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
How Open Source Supports the Largest Computers on the Planet
Best Practices for HPC Software Developers
Ian Lee
Lawrence Livermore National Laboratory
July 18, 2018
LLNL-PRES-754800 software.llnl.gov
https://upload.wikimedia.org/wikipedia/commons/a/a8/U.S._National_labs_map.jpg
http://www.ex-astris-scientia.org/articles/new_enterprise/enterprise-warpcore.jpg
https://pixabay.com/get/e833b10d2af4083ed1534705fb0938c9bd22ffd41db612439df7c17ba0/silos-1602209_1920.jpg
1960s 1970s 1980s 1990s 2000s 2010s
Pioneering simulations of particle tracking
CDC 3600
CDC 7600
Ozone mixing models
CRAY 1
ASCI Blue-Pacific
Helping the medical community plan radiation treatment
Unprecedented dislocation dynamics simulations
BlueGene
Breakthrough visualizations of mixing fluids
Dynamics in three dimensions
Global climate modeling
Detailed predictions of ecosystems
Petascale and exascale computing
§ 3 out of 16 #1 systems over last 20 years
Top500.org
ASCI White: Nov 2000 – Nov 2001
BlueGene/L: Nov 2004 – Nov 2007
Sequoia: June 2012
https://www.top500.org/resources/top-systems/
Sierra
ZFS on Linux
§ ZFS is an open source filesystem and volume manager designed to address the limitations of existing storage solutions
§ 2011: Available for Linux
§ Ten LLNL filesystems, totaling ~100 PB
§ Ships in Ubuntu 16.04
http://zfsonlinux.org
https://software.llnl.gov
LLNL Open Source Presence
https://software.llnl.gov/explore
LLNL Open Source Engagement
https://software.llnl.gov/explore
LLNL Open Source Activities
https://software.llnl.gov/explore
Science & Technology Review
“Our large collection of software is a precious Laboratory asset, one that benefits both Lawrence Livermore, and in many cases, the public at large.”
- Bruce Hendrickson, Associate Director, Computation
https://str.llnl.gov/2018-01/comjan18
https://www.exascaleproject.org/more-on-the-software-that-underpins-the-exascale-computing-project/
§ “Federal Source Code Policy: Achieving Efficiency, Transparency, and Innovation through Reusable and Open Source Software”
— “Agencies shall make custom-developed code available for Government-wide reuse and make their code inventories discoverable at https://www.code.gov (“Code.gov”) […]”
— “[…] establishes a pilot program that requires agencies, when commissioning new custom software, to release at least 20 percent of new custom-developed code as Open Source Software (OSS) […]”
Federal Source Code Policy
https://sourcecode.cio.gov
https://code.gov & https://sourcecode.cio.gov
https://code.gov
https://osti.gov/doecode
https://government.github.com
US Government Organizations on GitHub
https://government.github.com/community/
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
TOSS – Tri-Lab Operating System Software
§ Built on Red Hat Enterprise Linux
— Not an HPC distribution
§ Adds LLNL-developed components and patches to support HPC
— Low Latency Interconnect: Infiniband
— Parallel File System: Lustre
— Resource Manager: SLURM
§ Work closely with open communities
Components not in TOSS
Supported Linux Commodity Hardware Platform
Kernel, Infiniband, Message Passing Interface
Batch Scheduler (MOAB)
User Environment
Lustre File Systems
Compiler & Development Tools
Resource Manager (SLURM)
TOSS Components
HPSS Hopper
LLNL-PRES-550311
TOSS is a software stack for HPC – large, interconnected clusters!
§ Began as simple resource manager
— Now scalable to 1.6M+ cores (Sequoia)
§ Launch and manage parallel jobs
— Large, parallel jobs, often MPI
§ Queuing and scheduling of jobs
— Much more work than resources
http://slurm.schedmd.com
http://www.ibm.com/developerworks/library/l-slurm-utility/figure3.gif
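The "much more work than resources" point can be made concrete with a toy queue. The following is a minimal sketch of strict first-in, first-out scheduling on a fixed pool of nodes. It is not Slurm's algorithm (production resource managers apply far richer policies), and the function name `fifo_schedule` and the job list are hypothetical.

```python
import heapq


def fifo_schedule(total_nodes, jobs):
    """Toy FIFO scheduler: return the start time of each job.

    jobs is a list of (name, nodes, runtime) tuples in submission order.
    A job starts only when enough nodes are free, and never before a job
    that was submitted ahead of it.
    """
    free = total_nodes
    now = 0
    running = []  # min-heap of (end_time, nodes) for jobs that have started
    starts = {}
    for name, nodes, runtime in jobs:
        if nodes > total_nodes:
            raise ValueError(f"job {name} needs more nodes than exist")
        # Advance time to the next job completion until the job fits.
        while free < nodes:
            end_time, released = heapq.heappop(running)
            now = max(now, end_time)
            free += released
        starts[name] = now
        free -= nodes
        heapq.heappush(running, (now + runtime, nodes))
    return starts


# Hypothetical workload on a 4-node cluster: (name, nodes, runtime).
example = fifo_schedule(4, [("A", 2, 10), ("B", 2, 5), ("C", 4, 3), ("D", 1, 1)])
print(example)  # {'A': 0, 'B': 0, 'C': 10, 'D': 13}
```

In this example job D needs only one node but still waits behind the large job C, which is the kind of idle time that more sophisticated scheduling policies try to reduce.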
§ Family of projects used to build site-customized resource management systems
§ flux-core
— Implements the communication layer and lowest level services and interfaces
§ flux-sched
— Consists of an engine that handles all the functionality common to scheduling
§ capacitor
— A bulk execution manager using flux-core, handles running and monitoring thousands of jobs
http://flux-framework.github.io
§ Handles combinatorial explosion of ABI-incompatible packages
§ All versions coexist, binaries work regardless of user’s environment
§ Familiar syntax, reminiscent of brew, yum, etc.
$ spack install mpileaks                                                unconstrained
$ spack install <package>@<version>                                     @ custom version
$ spack install <package>@<version> %<compiler>@<version>               % custom compiler
$ spack install <package>@<version> %<compiler>@<version> +threads      +/- build option
$ spack install <package>@<version> os=SuSE11                           os=<frontend OS>
$ spack install <package>@<version> os=CNL10                            os=<backend OS>
$ spack install <package>@<version> os=CNL10 target=haswell             target=<cpu target>
SPACK
https://spack.io
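The sigils in the install commands on this slide (@ for version, % for compiler, +/- for build options, key=value for os and target) can be illustrated with a toy parser. This is a minimal sketch and not Spack's own parser. The function name `parse_spec`, the version `1.0`, and the compiler `gcc@9.9` are hypothetical placeholders chosen only for illustration.

```python
def parse_spec(spec):
    """Split a Spack-style spec string into its parts (illustrative only)."""
    result = {
        "name": None,
        "version": None,
        "compiler": None,
        "variants": {},   # build options, e.g. {"threads": True}
        "settings": {},   # key=value pairs such as os and target
    }
    for token in spec.split():
        if token.startswith("%"):
            # % introduces the compiler, optionally with its own @version
            result["compiler"] = token[1:]
        elif token[0] in "+-~":
            # + enables a build option, - or ~ disables it
            result["variants"][token[1:]] = token[0] == "+"
        elif "=" in token:
            key, value = token.split("=", 1)
            result["settings"][key] = value
        else:
            # package name, optionally followed by @version
            name, _, version = token.partition("@")
            result["name"] = name
            result["version"] = version or None
    return result


# Hypothetical spec string; the version numbers are made up.
parsed = parse_spec("mpileaks@1.0 %gcc@9.9 +threads os=CNL10 target=haswell")
print(parsed["name"], parsed["version"], parsed["compiler"])
```

Keeping each constraint as a separate whitespace-delimited token is what lets a single command line express package, version, compiler, options, and platform together.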
§ Manages the first-ever decentralized database for handling climate science data
§ Multiple petabytes of data at dozens of federated sites worldwide
§ International collaboration for the software that powers most global climate change research
https://github.com/ESGF
https://esgf.llnl.gov
VisIt
§ Originally developed to visualize and analyze the results of terascale simulations
§ Interactive, scalable visualization, animation, and analysis tool
§ Powerful, easy to use GUI
§ Distributed and parallel architecture allows handling extremely large data sets interactively
https://visit.llnl.gov
https://computation.llnl.gov/casc
https://code.gov/#/explore-code/agencies/DOE
Public US Government GitHub Data Scrape
§ 252 US Government Orgs
— U.S. Federal (137)
— U.S. Military and Intelligence (12)
— U.S. Research Labs (103)
§ 8716 Open Source Repositories
https://github.com/LLNL/scraper/pull/3
LLNL: 5%
Other US Government: 95%
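A share like the one in this chart can be computed from per-organization repository counts. The sketch below is not the code from the LLNL/scraper project. It assumes the public GitHub REST API endpoint for organizations, which reports a `public_repos` field, and the helper names and the sample counts are hypothetical.

```python
import json
import urllib.request


def fetch_public_repo_count(org):
    """Return the number of public repositories GitHub reports for an org.

    Makes an unauthenticated request, which GitHub rate-limits, so a scrape
    of many organizations would normally need an access token.
    """
    url = f"https://api.github.com/orgs/{org}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["public_repos"]


def repo_share(counts, org):
    """Return org's percentage of all repositories in counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no repositories counted")
    return 100.0 * counts[org] / total


# Made-up counts for illustration only, not the figures behind the slide.
sample_counts = {"LLNL": 5, "example-agency-a": 60, "example-agency-b": 35}
print(repo_share(sample_counts, "LLNL"))  # 5.0
```

Separating the network call from the arithmetic keeps the percentage calculation easy to check against saved data without contacting GitHub again.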