Jul 21 2013 XSEDE’13



Managing Large-Scale Data Using ADIOS. Jul 21 2013, XSEDE'13. Yuan Tian (tiany@ornl.gov), Jeremy Logan (jlogans@ornl.gov), University of Tennessee.


<p>Outline: ADIOS Introduction; ADIOS Write API; Hands-on 1, Write data with ADIOS; Hands-on 1, Tools; ADIOS Read API; Hands-on 1, Read data with ADIOS; Hands-on 1++, Spatial aggregation; Hands-on 2, ADIOS Write API (Non-XML version); Hands-on 2, Multi-block writing with non-XML API; Hands-on 3, Staging example; Hands-on 4, Visualization of ADIOS data; Hands-on 5, I/O skeleton generation with Skel; ADIOS + Matlab; Hands-on 8, Python; Hands-on 9, Java; Summary.</p> <p>ADIOS Participants. ORNL: Hasan Abbasi, Jong Youl Choi, Scott Klasky, Qing Liu, Kimmy Mu, Norbert Podhorszki, Dave Pugmire, Roselyne Tchoua; Georgia Tech: Greg Eisenhauer, Jay Lofstead, Karsten Schwan, Matt Wolf, Fang Zhang; UTK: Jeremy Logan, Yuan Tian; Rutgers: C. Docan, Tong Jin, Manish Parashar, Fan Zhang; NCSU: Drew Boyuka, Z. Gong, Nagiza Samatova; LBNL: Arie Shoshani, John Wu; Emory University: Tahsin Kurc, Joel Saltz; Sandia: Jackie Chen, Todd Kordenbock, Ken Moreland; NREL: Ray Grout; PPPL: C. S. 
Chang, Stephane Ethier, Seung Hoe Ku, William Tang; Caltech: Julian Cummings; UCI: Zhihong Lin; Tsinghua University (China): Wei Xue, Lizhi Wang.</p> <p>*Red marks major research/developer for the ADIOS project, blue denotes student, green denotes application scientists.</p> <p>Thanks for our current funding. OLCF: ADIOS, Barb Helland (Buddy Bland); Runtime Staging, Exascale-SDM, ASCR: Lucy Nowell; Scalable Data Management, Analysis and Visualization Institute, ASCR: Lucy Nowell; ASCR, International Collaboration Framework for Extreme Scale Experiments (ICEE): Rich Carlson; NSF, An Application Driven I/O Optimization Approach for PetaScale Systems and Scientific Discoveries: Almadena Chtchelkanova; ASCR, SciDAC, Edge Physics Simulation (EPSI): Randall Laviolette, John Mandrekas; NSF, Remote Data Analysis and Visualization: Barry Schneider; NASA, An Elastic Parallel I/O Framework for Computational Climate Modeling: Tsengdar Lee; OFES, Center for Nonlinear Simulation of Energetic Particles in Burning Plasmas (CSEP): John Mandrekas; OFES, Energetic Particles: John Mandrekas; BES, Network for ab initio many-body methods: development, education and training: Hans M. Christen; NSF-NSFC, High Performance I/O Methods and infrastructure for large-scale geo-science applications on supercomputers: D. Katz.</p> <p>Application Partners. Over 158 publications from 2006 to 2013.</p> <p>VirtualBox install. Install VirtualBox first. Version 4.2.4 is provided, for some Linux distros too; if yours is not included, download it from https://www.virtualbox.org/wiki/Download_Old_Builds_4_2. Windows: install the Extension Pack (Oracle_VM_VirtualBox_Extension_Pack-4.2.4-81684.vbox-extpack) or disable the USB controller; or later, in the virtual machine's USB section, uncheck Enable USB Controller. File/Import Appliance: adios150-tutorial.ova; check "Reinitialize the MAC address of all network cards". Run the virtual machine. Ignore the warning about some shared folder.</p> <p>exFAT file system. Linux may not read the exFAT file system. Install the fuse-exfat (experimental) package, e.g. 
Ubuntu: sudo add-apt-repository ppa:relan/exfat; sudo apt-get update; sudo apt-get install fuse fuse-exfat exfat-utils</p> <p>If automount is disabled, mount/unmount manually. To mount: sudo mkdir /media/exfat; sudo mount -t exfat /dev/sdb1 /media/exfat. To unmount: sudo umount /media/exfat</p> <p>VirtualBox know-how. User / pwd: esimmon / esimmon. Run mpd &amp; in a terminal, only once, to start the MPICH2 daemon. Editors: vi, gedit. Image viewer: eog . (to flip through the images in the current directory).</p> <p>For copy/paste between the VM and your laptop OS (not needed for this tutorial): set Devices/Shared Clipboard/Bidirectional in the VM menu; you may need to install VBoxGuestAdditions, see the VirtualBox User Manual, chapter 4.2.</p> <p>Big data is characterized by: Volume (remote sensing; web: text, images, video; simulations: 1 PB for some simulations; experiments); Velocity (SKA: 10 PB/s); Variety (heterogeneous, spatial and temporal; multi-resolution, multi-sensor); Veracity; Value.</p> <p>Extreme scale computing. Trends: more FLOPS; limited number of users at the extreme scale. Problems: performance; resiliency; debugging; getting science done. Problems will get worse. Need a revolutionary way to store, access, debug to get the science done!</p> <p>From J. 
Dongarra, Impact of Architecture and Technology for Extreme Scale on Software and Algorithm Design, Cross-cutting Technologies for Computing at the Exascale, February 2-5, 2010.</p> <p>File systems. Most people get &lt; 10 GB/s at scale.</p> <p>Lustre consists of four major components: MetaData Server (MDS), Object Storage Servers (OSSs), Object Storage Targets (OSTs), and Clients. Performance: striping, alignment, placement. GPFS is similar, but </p> <p>Our goals for sustainable software development: ease of use; high performance; scalable; portable; easy to master.</p> <p>The SOA philosophy for HPC/ADIOS. The overarching design philosophy of our framework is based on the Service-Oriented Architecture, used to deal with system/application complexity, rapidly changing requirements, evolving target platforms, and diverse teams. Applications are constructed by assembling services based on a universal view of their functionality using a well-defined API. Service implementations can be changed easily. Integrated simulation can be assembled using these services. Manage complexity while maintaining performance/scalability: complexity from the problem (complex physics); complexity from the codes and how they are; complexity of underlying disruptive infrastructure; complexity from coordination across codes and research teams; complexity of the end-to-end workflows.</p> <p>Bunch of challenges after the first experiments; here are the new challenges: challenge of software development, software complexity. Slide of all of the different challenges; bullet list of the challenges.</p> <p>Most cited ADIOS-related publications (citation count, title): 96, Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS); 81, DataStager: scalable data staging services for petascale applications; 76, Adaptable, metadata rich IO methods for portable high performance IO; 53, PreDatA: preparatory data analytics on peta-scale machines; 46, Managing variability in the IO performance of petascale storage 
systems; 43, Grid-based parallel data streaming implemented for the gyrokinetic toroidal code; 36, DataSpaces: An interaction and coordination framework for coupled simulation workflows; 35, High performance threaded data streaming for large scale simulations; 32, Workflow automation for processing plasma fusion simulation data; 26, Plasma edge kinetic-MHD modeling in tokamaks using Kepler workflow for code coupling, data management and visualization; 26, An autonomic service architecture for self-managing grid applications; 23, Extending I/O through high performance data services; 19, EDO: improving read performance for scientific applications through elastic data organization; 19, Six degrees of scientific data: reading patterns for extreme scale science IO; 19, Compressing the incompressible with ISABELA: In-situ reduction of spatio-temporal data.</p> <p>ADIOS: an I/O abstraction framework. Provides portable, fast, scalable, easy-to-use, metadata rich output. Change I/O method on-the-fly. Abstracts the API from the method. http://www.nccs.gov/user-support/center-projects/adios/ Typical 10X performance improvement for synchronous I/O over other solutions.</p> <p>Read performance of a 2D slice of a 3D variable + time. Georgia Tech, Rutgers, NCSU, Emory, Auburn, Sandia, LBL, PPPL. Explanation of numbers:</p> <p>Combustion: S3D code output performance (analysis data = checkpoint/restart data). ADIOS enabled S3D to scale beyond 30k cores (I/O became the bottleneck in larger runs before ADIOS). Enabled S3D to scale to the full size of Jaguar XT5 (ref: http://www.hpcwire.com/hpcwire/2009-10-29/adios_ignites_combustion_simulations.html).</p> <p>Fusion: Simulation (XGC and GTC). Ref: Podhorszki, Norbert, et al. "Plasma fusion code coupling using scalable I/O services and scientific workflows." Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science. 
ACM, 2009.</p> <p>Seismology: SCEC (Southern California Earthquake Center) M8 runs on Jaguar XT5 in 2010 (finalist for the Gordon Bell award), checkpoint/restart data I/O performance. ADIOS made it possible to write out the huge dataset and thus to complete the simulation (with restarts after machine failures).</p> <p>Climate Analysis: NASA's GEOS-5 code. Our experimental results on the Jaguar supercomputer at ORNL have demonstrated a maximum of 11x speedup for the write performance, and 73x speedup for the performance of data post-processing. Published at MSST'13: A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics.</p> <p>Square Kilometer Array: I/O performance for a correlator code. All radio telescope data is streamed through this application on the Fornax supercomputer at iVEC in Western Australia. Ref: personal email from a researcher at iVEC.</p> <p>Industrial Engineering: RAMGEN Ltd, supersonic gas compression turbines, using a CFD code. ADIOS enabled scaling the geometry from 500M cells to 3.5 billion cells in their simulations in 2012 on Jaguar XK6, and thus the study of fine turbulent structures. Ref: </p> <p>ADIOS information. ADIOS is an I/O framework. Similar software philosophy as Linux: there is no single owner. Provides multiple methods to stage data to a staging area (on node, off node, off machine). Data output can be anything one wants. Different methods allow for different types of data movement, aggregation, and arrangement to the storage system or to stream over the local nodes, LAN, WAN. It contains our own file format if you choose to use it (ADIOS-BP). In the next release, it will be able to compress/decompress data in parallel: plugged into the transform layer. In the next release, it will contain mechanisms to index and then query the data.</p> <p>The History of</p> <p>General Relativity: enabled interactive analysis.</p> <p>Z. Lin: allowed fusion code to scale from 100s to 1000s of cores.</p> <p>C. S. 
Chang: Edge Fusion code scales to 100K cores from 1K, and couples multi-physics codes.</p> <p>J. Chen, combustion: allows us to scale beyond the MPI-IO barrier at 30K cores.</p> <p>Workflow streams from KSTAR to ORNL for 10X faster data processing of KSTAR ECEI data.</p> <p>GEOS5 climate code: reading performance improved 78X over parallel HDF5. Ramgen/Numeca: first commercial code to use ADIOS; allowed them to scale from 10K cores to over 200K cores.</p> <p>1. The vision began with the abstraction of data streaming and I/O back in 1985 with the General Relativity Black Hole code. 2. In 1995 we worked on techniques to use the JIT Java compiler to move analysis code to other analysis platforms, to share the data. This concept then went into many papers and then into ADIOS code (COD, C On Demand) some years later. 3. 2000: thread and buffer the data for Z. Lin (GTC, Gyrokinetic Toroidal Code) to allow them to scale to 1K cores, since parallel HDF5 was over 10X too slow compared to our data streaming. 4. C. S. Chang had to couple data, and using our research from PreDatA (2011), along with ADIOS 1.0, we were able to scale his Fusion Edge simulation to over 100K cores, and couple this code to a magnetohydrodynamic code (M3D) (this was using our Service Oriented Architecture vision). 5. 3-phase I/O (aggregate data, move data to the file system, asynchronous metadata movement) along with ADIOS sub-files has allowed many codes (Ramgen/Numeca, combustion, etc.) to scale to the full machine, allowing most codes to get over 75% of the peak I/O performance, compared to before, where most didn't get 1% of the peak I/O performance. 6. PreDatA allowed us to do code coupling, along with DataSpaces, which created a shared-space semantic for coupling codes, allowing a code to use the SOA technique. 7. 
Using both spatial and temporal aggregation allowed us to get over 70X faster reading performance and over</p> <p>ADIOS Self-describing Format for Data Chunks/Streams. All data chunks are from a single producer (MPI process, single diagnostic). Ability to create a separate metadata file when sub-files are generated. Allows code to be integrated to streams. Allows variables to be individually compressed. Format is for data-in-motion and data-at-rest. Metadata for all chunks; statistics for all chunks.</p> <p>ADIOS latest release: 1.5.0, June 2013. http://www.olcf.ornl.gov/center-projects/adios/ Changes to Write API; changes to Read API; new staging methods: DIMES, FLEXPATH; CMake build files (besides Automake files); new write method VAR_MERGE for spatial aggregation of small per-process output into larger chunks, which improves both write and read performance for applications; aggregated file read method. Please give us suggestions for our next release at SC-2013.</p> <p>Staged write C code: uses any staging or file method to read data, writes with any ADIOS method.</p> <p>ADIOS performance for writing.</p> <p>Favorite highlights. "I appreciate the clean yet capable interface and that from my perspective, it just works," Grout. "ADIOS has allowed us to couple fusion codes to solve new physics problems," Chang. "So far I have found that even using ADIOS in the way you said is bad and silly in every aspect, the I/O performance is still improved by about 10X," J. Wang, ICAR. "Thanks to Dr. Podhorszki, the developers at Numeca have had excellent support in integrating ADIOS into FINE/Turbo. We have observed a 100x speedup of I/O, which is difficult to achieve in commercial codes on industrial applications," Grosvenor.</p> <p>Reading performance. Aggregate and place chunks on the file system with an Elastic Data Organization. Ability to optimize for common read patterns (e.g. 
2D slice from 3D variable), space-time optimizations. Achieved a 73x speedup for read performance, and 11x speedup for write performance, in the mission-critical climate simulation GEOS-5 (NASA), on Jaguar.</p> <p>First place, ACM Student Research Competition 2012. Read performance of a 2D slice of a 3D variable + time. GEOS-5 results: common read patterns for GEOS-5 users are reduced from 10 to 0.1 seconds. Allows interactive data exploration for mission-critical visualizations.</p> <p>Introduction to Staging. Initial development as a research effort to minimize I/O overhead. Draws from past work on threaded I/O. Exploits network hardware support for fast data transfer to remote memory.</p> <p>Hasan Abbasi, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, Fang Zheng: DataStager: scalable data staging services for petascale applications. Cluster Computing 13(3): 277-290 (2010). Ciprian Docan, Manish Parashar, Scott Klasky: DataSpaces: an interaction and coordination framework for coupled simulation workf...</p>
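<p>The threaded-I/O idea that staging draws on can be illustrated with a minimal sketch. This is not ADIOS code and does not use the ADIOS API or any remote-memory transfer: it is plain Python under stated assumptions, where a bounded in-memory queue stands in for the staging area and a background thread stands in for the staging service. The names staged_run, compute_step, sink and max_buffered are hypothetical.</p>

```python
import queue
import threading


def staged_run(num_steps, compute_step, sink, max_buffered=4):
    """Run compute steps while a background thread drains output to `sink`.

    Hypothetical illustration only: the queue plays the role of a staging
    buffer, and `sink` plays the role of the slow storage system.
    """
    staging = queue.Queue(maxsize=max_buffered)  # bounded staging buffer
    done = object()  # sentinel telling the writer to stop

    def writer():
        while True:
            item = staging.get()
            if item is done:
                break
            step, data = item
            sink(step, data)  # the slow "I/O" happens off the compute thread

    thread = threading.Thread(target=writer)
    thread.start()
    for step in range(num_steps):
        data = compute_step(step)
        staging.put((step, data))  # blocks only when the buffer is full
    staging.put(done)
    thread.join()  # all output has been handed to the sink at this point


written = {}


def store(step, data):
    # Stand-in for a write to the file system.
    written[step] = data


staged_run(5, lambda s: [s * 10 + i for i in range(3)], store)
```

<p>In this sketch the compute loop waits only when the staging buffer is full, so computation and output overlap; a real staging method would move the data to memory on other nodes instead of a local queue.</p>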