LLNL-PRES-754800
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

How Open Source Supports the Largest Computers on the Planet
Best Practices for HPC Software Developers

Ian Lee
Lawrence Livermore National Laboratory
July 18, 2018



Page 4

Image: U.S. National Labs map (https://upload.wikimedia.org/wikipedia/commons/a/a8/U.S._National_labs_map.jpg)

Page 5

Image: Enterprise warp core (http://www.ex-astris-scientia.org/articles/new_enterprise/enterprise-warpcore.jpg)

Page 6

Image: silos (https://pixabay.com/get/e833b10d2af4083ed1534705fb0938c9bd22ffd41db612439df7c17ba0/silos-1602209_1920.jpg)

Page 7

Timeline, 1960s through 2010s:
§ Pioneering simulations of particle tracking
§ CDC 3600
§ CDC 7600
§ Ozone mixing models
§ CRAY 1
§ ASCI Blue-Pacific
§ Helping the medical community plan radiation treatment
§ Unprecedented dislocation dynamics simulations
§ BlueGene
§ Breakthrough visualizations of mixing fluids
§ Dynamics in three dimensions
§ Global climate modeling
§ Detailed predictions of ecosystems
§ Petascale and exascale computing

Page 8

§ 3 out of 16 #1 systems over the last 20 years (Top500.org)

ASCI White: Nov 2000 – Nov 2001
BlueGene/L: Nov 2004 – Nov 2007
Sequoia: June 2012

https://www.top500.org/resources/top-systems/

Page 9

Sierra

Page 10

ZFS on Linux

§ ZFS is an open source filesystem and volume manager designed to address the limitations of existing storage solutions

§ 2011: Available for Linux

§ Ten LLNL filesystems, totaling ~100 PB

§ Ships in Ubuntu 16.04

http://zfsonlinux.org


Page 19

https://software.llnl.gov

Page 20

LLNL Open Source Presence

https://software.llnl.gov/explore

Page 21

LLNL Open Source Engagement

https://software.llnl.gov/explore

Page 22

LLNL Open Source Activities

https://software.llnl.gov/explore

Page 24

Science & Technology Review

“Our large collection of software is a precious Laboratory asset, one that benefits both Lawrence Livermore, and in many cases, the public at large.”

- Bruce Hendrickson, Associate Director, Computation

https://str.llnl.gov/2018-01/comjan18

Page 25

https://www.exascaleproject.org/more-on-the-software-that-underpins-the-exascale-computing-project/

Page 26

§ “Federal Source Code Policy: Achieving Efficiency, Transparency, and Innovation through Reusable and Open Source Software”

— “Agencies shall make custom-developed code available for Government-wide reuse and make their code inventories discoverable at https://www.code.gov (“Code.gov”) […]”

— “[…] establishes a pilot program that requires agencies, when commissioning new custom software, to release at least 20 percent of new custom-developed code as Open Source Software (OSS) […]”

Federal Source Code Policy

https://sourcecode.cio.gov

https://code.gov & https://sourcecode.cio.gov
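The pilot program's 20 percent requirement can be turned into a quick calculation. The sketch below computes the smallest whole number of new custom-developed projects an agency would need to release. It assumes the requirement is counted in whole projects and rounded up. That is a simplification: the policy speaks of "20 percent of new custom-developed code," and how that code is measured is left open here. The function name `min_open_source_releases` is hypothetical.

```python
def min_open_source_releases(new_projects: int, percent: int = 20) -> int:
    """Smallest whole number of projects that meets `percent` of `new_projects`.

    Hypothetical helper: assumes releases are counted per project, rounded up.
    """
    if new_projects < 0 or not 0 <= percent <= 100:
        raise ValueError("new_projects must be >= 0 and percent in [0, 100]")
    # Integer ceiling of new_projects * percent / 100, avoiding float rounding.
    return -(-new_projects * percent // 100)


if __name__ == "__main__":
    for n in (1, 10, 11, 47):
        print(n, "new projects ->", min_open_source_releases(n), "to release")
```

Under this reading, 11 new projects would require 3 releases. Twenty percent of 11 is 2.2, and a partial project rounds up.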

Page 27

https://code.gov

Page 28

https://osti.gov/doecode

Page 29

https://government.github.com

Page 30

US Government Organizations on GitHub

https://government.github.com/community/

Page 31

Thank you

@IanLee1521 // @LLNL_OpenSource

https://speakerdeck.com/IanLee1521

Page 32

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

Page 33

TOSS – Tri-Lab Operating System Software

§ Built on Red Hat Enterprise Linux
— Not an HPC distribution

§ Adds LLNL-developed additions and patches to support HPC
— Low-latency interconnect: InfiniBand
— Parallel file system: Lustre
— Resource manager: SLURM

§ Work closely with open communities

Figure: TOSS stack diagram (from LLNL-PRES-550311), with a legend distinguishing TOSS components from components not in TOSS. Layers shown: supported Linux commodity hardware platform; kernel, InfiniBand, Message Passing Interface; batch scheduler (MOAB); user environment; Lustre file systems; compiler & development tools; resource manager (SLURM); HPSS; Hopper.

TOSS is a software stack for HPC – large, interconnected clusters!

Page 34

§ Began as a simple resource manager
— Now scalable to ~1.6M cores (Sequoia)

§ Launch and manage parallel jobs
— Large, parallel jobs, often MPI

§ Queuing and scheduling of jobs
— Much more work than resources

http://slurm.schedmd.com

Image: http://www.ibm.com/developerworks/library/l-slurm-utility/figure3.gif
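"Much more work than resources" is the core of the scheduling problem. The sketch below computes when each queued job could start on a fixed-size cluster. It assumes a strict first-in, first-out policy and whole-node allocations. It is not SLURM's algorithm: SLURM adds priorities, backfill and much more on top. The `Job` and `fifo_schedule` names are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    name: str
    nodes: int    # whole nodes requested
    runtime: int  # abstract time units


def fifo_schedule(jobs: List[Job], total_nodes: int) -> Dict[str, int]:
    """Return each job's start time under strict FIFO ordering.

    A job never starts before the job queued ahead of it, and it starts as
    soon as earlier jobs have released enough nodes.
    """
    free = total_nodes
    now = 0
    running: List[Tuple[int, int]] = []  # min-heap of (end_time, nodes)
    starts: Dict[str, int] = {}
    for job in jobs:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than exist")
        # Release the earliest-finishing jobs until this one fits.
        while free < job.nodes:
            end, nodes = heapq.heappop(running)
            now = max(now, end)
            free += nodes
        starts[job.name] = now
        heapq.heappush(running, (now + job.runtime, job.nodes))
        free -= job.nodes
    return starts


if __name__ == "__main__":
    queue = [Job("A", 2, 5), Job("B", 2, 3), Job("C", 4, 1), Job("D", 1, 2)]
    print(fifo_schedule(queue, total_nodes=4))
```

With 4 nodes and jobs needing 2, 2, 4 and 1 nodes:
- The 4-node job C must wait until both earlier jobs finish.
- The 1-node job D then waits behind C, even though two nodes sit idle between times 3 and 5.

Reclaiming idle capacity like that is what backfill scheduling aims to do.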

Page 35

§ Family of projects used to build site-customized resource management systems

§ flux-core
— Implements the communication layer and lowest-level services and interfaces

§ flux-sched
— Consists of an engine that handles all the functionality common to scheduling

§ capacitor
— A bulk execution manager built on flux-core that runs and monitors thousands of jobs

http://flux-framework.github.io
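capacitor's role, running and monitoring thousands of jobs, comes down to bounded concurrency plus per-job status tracking. The sketch below shows that pattern using Python's standard `concurrent.futures`, with local callables standing in for jobs. It does not use flux-core or capacitor's real interfaces, and `run_bulk` is a hypothetical name. A Flux-based tool would presumably submit each task as a job to a Flux instance instead of running it in a local thread.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Tuple


def run_bulk(tasks: Dict[str, Callable[[], Any]],
             max_parallel: int = 8) -> Dict[str, Tuple[str, Any]]:
    """Run many independent tasks, at most `max_parallel` at a time.

    Returns {name: ("ok", result)} or {name: ("failed", error message)}.
    """
    status: Dict[str, Tuple[str, Any]] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        # Monitor: record each outcome as soon as that task finishes.
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                status[name] = ("ok", fut.result())
            except Exception as exc:  # one failure must not stop the batch
                status[name] = ("failed", str(exc))
    return status


if __name__ == "__main__":
    def square(i: int) -> int:
        if i == 13:
            raise RuntimeError("simulated failure")
        return i * i

    work = {f"task-{i}": (lambda i=i: square(i)) for i in range(1000)}
    results = run_bulk(work, max_parallel=16)
    failed = [n for n, (state, _) in results.items() if state == "failed"]
    print(len(results), "tasks,", len(failed), "failed:", failed)
```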

Page 36

§ Handles combinatorial explosion of ABI-incompatible packages

§ All versions coexist, binaries work regardless of user’s environment

§ Familiar syntax, reminiscent of brew, yum, etc.

$ spack install mpileaks                                           # unconstrained
$ spack install mpileaks@<version>                                 # @ custom version
$ spack install mpileaks@<version> %<compiler>@<version>           # % custom compiler
$ spack install mpileaks@<version> %<compiler>@<version> +threads  # +/- build option
$ spack install mpileaks@<version> os=SuSE11                       # os=<frontend OS>
$ spack install mpileaks@<version> os=CNL10                        # os=<backend OS>
$ spack install mpileaks@<version> os=CNL10 target=haswell         # target=<cpu target>

SPACK

https://spack.io
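The spack commands above build a spec from a package name plus optional parts:
- `@version`
- `%compiler@version`
- `+` or `~`/`-` variant flags
- `key=value` settings such as `os=` and `target=`

The Python sketch below illustrates two ideas. First, how such a spec can be taken apart. Second, how giving each fully specified configuration its own hash-derived install directory lets many versions coexist, similar in spirit to Spack's install layout. It is not Spack's parser or hashing scheme; Spack's real grammar also covers dependencies, version ranges and more. The names `parse_spec`, `install_prefix` and `/opt/sketch` are hypothetical, and the version numbers are made up.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Spec:
    """Simplified stand-in for a Spack-style spec (illustration only)."""
    name: str
    version: Optional[str] = None
    compiler: Optional[str] = None
    compiler_version: Optional[str] = None
    variants: Dict[str, bool] = field(default_factory=dict)
    settings: Dict[str, str] = field(default_factory=dict)  # e.g. os=, target=


def parse_spec(text: str) -> Spec:
    """Parse e.g. 'mpileaks@1.0 %gcc@9.4.0 +threads os=CNL10' (made-up versions)."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty spec")
    name, _, version = tokens[0].partition("@")
    spec = Spec(name=name, version=version or None)
    for tok in tokens[1:]:
        if tok.startswith("%"):          # %compiler[@version]
            comp, _, comp_ver = tok[1:].partition("@")
            spec.compiler, spec.compiler_version = comp, comp_ver or None
        elif tok[0] in "+~-":            # +variant enables, ~/- disables
            spec.variants[tok[1:]] = tok[0] == "+"
        elif "=" in tok:                 # key=value, e.g. os=CNL10
            key, _, value = tok.partition("=")
            spec.settings[key] = value
        else:
            raise ValueError(f"unrecognized token: {tok!r}")
    return spec


def install_prefix(spec: Spec, root: str = "/opt/sketch") -> str:
    """Give each distinct configuration its own directory via a short hash."""
    canonical = repr((spec.name, spec.version, spec.compiler,
                      spec.compiler_version, sorted(spec.variants.items()),
                      sorted(spec.settings.items())))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:8]
    return f"{root}/{spec.name}-{spec.version or 'unversioned'}-{digest}"


if __name__ == "__main__":
    a = parse_spec("mpileaks@1.0 %gcc@9.4.0 +threads")
    b = parse_spec("mpileaks@1.0 %gcc@9.4.0 ~threads")
    print(a)
    print(install_prefix(a))
    print(install_prefix(b))  # different variant, so a different prefix
```

Because the prefix depends on every part of the spec, two builds that differ only in one variant land in separate directories. Neither overwrites the other.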

Page 37

§ Manages the first-ever decentralized database for handling climate science data

§ Multiple petabytes of data at dozens of federated sites worldwide

§ International collaboration for the software that powers most global climate change research

https://github.com/ESGF

https://esgf.llnl.gov
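In a federated design no single site holds the whole archive. A query therefore fans out to every participating site, and the answers are merged, with a dataset replicated at several sites grouped under one id. The sketch below illustrates only that general pattern, using in-memory catalogs. It does not use ESGF's actual search services. The site names, dataset ids and `federated_search` function are hypothetical. The variable names `tas` and `pr` follow the CMIP convention for near-surface air temperature and precipitation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical site catalog: dataset id -> metadata.
Catalog = Dict[str, Dict[str, str]]


def federated_search(sites: Dict[str, Catalog], variable: str) -> Dict[str, List[str]]:
    """Query every site in parallel; map each matching dataset id to the sites holding it."""
    def query(item: Tuple[str, Catalog]) -> Tuple[str, List[str]]:
        site, catalog = item
        return site, [ds for ds, meta in catalog.items() if meta.get("variable") == variable]

    merged: Dict[str, List[str]] = {}
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input sites.
        for site, hits in pool.map(query, sites.items()):
            for ds in hits:
                merged.setdefault(ds, []).append(site)  # replicas share an id
    return merged


if __name__ == "__main__":
    sites = {
        "site-a": {"run1.tas": {"variable": "tas"}, "run1.pr": {"variable": "pr"}},
        "site-b": {"run1.tas": {"variable": "tas"}, "run2.tas": {"variable": "tas"}},
    }
    print(federated_search(sites, "tas"))
```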

Page 38

VisIt

§ Originally developed to visualize and analyze the results of terascale simulations

§ Interactive, scalable visualization, animation, and analysis tool

§ Powerful, easy-to-use GUI

§ Distributed and parallel architecture allows extremely large data sets to be handled interactively

https://visit.llnl.gov
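A distributed and parallel architecture typically relies on a common pattern:
1. Split a data set into pieces.
2. Let each worker process its own piece.
3. Combine only the small per-piece results.

The sketch below uses that pattern to compute a global value range, a typical first step before coloring a plot. Worker threads stand in for the separate processes or nodes a real parallel tool would use. `block_range` and `parallel_range` are hypothetical names, not VisIt's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def block_range(block: Sequence[float]) -> Tuple[float, float]:
    """Local step: (min, max) of one block."""
    return min(block), max(block)


def parallel_range(data: Sequence[float], n_blocks: int = 4) -> Tuple[float, float]:
    """Split data into blocks, reduce each independently, then combine."""
    if not data:
        raise ValueError("no data")
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    size = -(-len(data) // n_blocks)  # ceiling division so every item is covered
    blocks: List[Sequence[float]] = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        partials = list(pool.map(block_range, blocks))
    # Global step: combine only the small per-block summaries.
    return min(lo for lo, _ in partials), max(hi for _, hi in partials)


if __name__ == "__main__":
    values = [((i * 37) % 101) - 50.0 for i in range(10_000)]
    print(parallel_range(values, n_blocks=8))
```

Only two numbers per block travel back to be combined, no matter how large each block is. That is why the pattern scales to data sets far larger than any single worker could hold.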

Page 39

https://computation.llnl.gov/casc

Page 40

https://code.gov/#/explore-code/agencies/DOE

Page 41

Public US Government GitHub Data Scrape

§ 252 US Government Orgs
— U.S. Federal (137)
— U.S. Military and Intelligence (12)
— U.S. Research Labs (103)

§ 8716 Open Source Repositories

https://github.com/LLNL/scraper/pull/3

Chart: LLNL 5%, Other US Government 95%
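The scrape's headline numbers can be cross-checked with a few lines of arithmetic, using only the figures on the slide. The sketch assumes the chart's 5% is LLNL's share of the 8,716 repositories. The slide does not say this explicitly, and the 5% is itself rounded, so the derived repository count is only approximate.

```python
# Organization counts from the slide.
orgs = {
    "U.S. Federal": 137,
    "U.S. Military and Intelligence": 12,
    "U.S. Research Labs": 103,
}
TOTAL_ORGS = 252
TOTAL_REPOS = 8716
LLNL_SHARE = 0.05  # assumption: the chart's 5% is a share of repositories

# The three categories should account for every organization.
assert sum(orgs.values()) == TOTAL_ORGS

for category, count in orgs.items():
    print(f"{category:32s} {count:4d} orgs ({100 * count / TOTAL_ORGS:.1f}%)")

# Roughly how many repositories a 5% share would correspond to.
approx_llnl_repos = round(LLNL_SHARE * TOTAL_REPOS)
print("approx. LLNL repositories:", approx_llnl_repos)
```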