28
Strategic Planning Agenda Limitless Storage Limitless Possibilities https://hps.vi4io.org Department of Computer Science Copyright University of Reading 2019-03-25 LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT Julian M. Kunkel ACES Strategy Meeting S H

Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Strategic Planning Agenda

Limitless StorageLimitless Possibilities

https://hps.vi4io.org

Department of Computer Science

Copyright University of Reading

2019-03-25

LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT

Julian M. Kunkel

ACES Strategy Meeting

SH

)

Page 2: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Outline

1 Ongoing Activities

2 Near-Term Goals

3 Long-Term Aspirations

Julian Kunkel Reading, 2019 2 / 14

Page 3: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Department Activities

Teaching

� CS1PC – Programming in C

� CS3DP – Distributed and Parallel Computing

Administrative

� SID: Strategic infrastructure development for the department

� Museum: C37 (history of computer science, ongoing)

Julian Kunkel Reading, 2019 3 / 14

Page 4: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Research Interest

High-performance storage for HPC

� Efficient I/O

I Performance analysis methods, tools and benchmarksI Optimizing parallel file systems and middlewareI Modeling of performance and costsI Tuning of I/O: Prescribing settingsI Management of workflows

� Data reduction: compression library, algorithms, methods

� Interfaces: towards domain-specific solutions and novel interfaces

Other research interests

� Application of big data analytics (e.g., for transportation)

� Domain-specific languages (for Icosahedral climate models)

� Cost-efficiency for data centers in general and for “produced” science

Julian Kunkel Reading, 2019 4 / 14

Page 5: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Research

Involvement in currently running projects

� AIMES (ends in Q3/2019) – DFG fundedAdvanced Computation and I/O Methods for Earth-System Simulations

� PeCoH (ends in Q1/2020) – DFG fundedPerformance Conscious HPC

� Cooperation with Bull (ends Q4/2020) – industry fundedI/O Analysis at DKRZ

� ESiWACE (ends Q3/2019) – H2020 projectCentre of Excellence in Simulation of Weather and Climate in EuropeMoved some money to Reading for PDRA (to start ASAP)

Projects in the Queue

� Advanced Storage Monitoring (2PM PDRA) / Cooperation with DDN

� ESiWACE2 (3Y PDRA) follow up of ESiWACE

Julian Kunkel Reading, 2019 5 / 14

Page 6: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Co-Supervised PhDs

University of Hamburg

� Anastasiia Novikova: Compression of Climate/Weather Data

� Frank Gadban: Understanding the Convergence of HPC and Cloud Computing

� Nabeeh Jumah: Language Extensibility and Configurability to Support Stencil CodeDevelopment

� Eugen Betke: Machine Learning of I/O Behavior

� ... (focus on relevant ones)

University of Reading

� Ekene Ozioko: Coordination Scheme for Human Driven and Autonomous VehicleTraffic Intersection using traffic light with intersection control unit

� Haifa Alsultan (start 04/2019?): Big Data Processing for Climate Science

Julian Kunkel Reading, 2019 6 / 14

Page 7: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Support Activities

� Regularly attended conferences (others infrequently)

I ISC-HPC (June)I Supercomputing (November)

� Community building

I Bootstrapped: The Virtual Institute for I/Ohttps://www.vi4io.org

I Supporting: European Open File System (EOFS)https://www.eofs.eu/

I Organizing: Various I/O workshops / Birds-of-a-feather sessions

� Awareness: co-created the IO-500 listhttp://io-500.org

� Beyond teaching:

I Online teaching platform for C-Programming (ICP project)I Establishing a HPC certification program

https://hpc-certification.org

Julian Kunkel Reading, 2019 7 / 14

Page 8: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Long-Term Software-Development

� IOR/MDTest: Benchmarks for HPC I/O

� ESDM: Earth-System Data Middleware (ESiWACE)

� SCIL: Scientific Compression Library

� Various small benchmarks/tools

Julian Kunkel Reading, 2019 8 / 14

Page 9: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Outline

1 Ongoing Activities

2 Near-Term Goals

3 Long-Term Aspirations

Julian Kunkel Reading, 2019 9 / 14

Page 10: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Near-Term Goals

� Get the PostDoc (Luciana Pedro) started for ESiWACE and deliver...

� Integrate compression into long-term archives (JASMIN and others)

� Make I/O Middleware ready for "production" runs and enabling machine learning

� In-memory storage and compute (cooperation with KOVE/Argonne)

� First certificates of the HPC Certification Program go live

I Tailor HPC Certification Program for the need of NWP community

� Comparison of DSLs for Weather Climate on the Shallow Water EquationGDDML/GridTools/PsyClone (joint paper)

� (Rework the modules for CS1PR|PC / CS3DP)

� (Opening the museum)

Julian Kunkel Reading, 2019 10 / 14

Page 11: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Outline

1 Ongoing Activities

2 Near-Term Goals

3 Long-Term Aspirations

Julian Kunkel Reading, 2019 11 / 14

Page 12: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Personal Vision: Towards Intelligent Storage Systems and Interfaces

Access paradigmDatabase File system

Local storage

ILM/HSM Self-awarenessSystem characteristics

NoSQL HDF5

Topology awareHierarchical storage

Performance model

Data replication

Semi-structured data

Content aware

Semantical access

Data transformation

Dynamic “on-disk” format

Intelligence Smart

Natural storage accessData exploration

Semantical name space Guided interface

Programmability

Data mining

Application focus User

Storage system

Arbitrary views � Abstract data interfaces

� Enhanced data management

� Integrate compute/storage engine

� Flexible views on data

� Smart storage

I Self-aware systemsI AI optimized placement

� Across sites

Julian Kunkel Reading, 2019 12 / 14

Page 13: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

ESDM is just the Beginning: Next Generation Interfaces

Towards a new I/O stack considering:

� User metadata and workflows as first-class citizens

� Smart hardware and software components

� Liquid-Computing: Smart-placement of computing

I Utilizing arbitrary compute and storage technology!

� Self-aware instead of unconscious

� Improving over time (self-learning, hardware upgrades)

� Enhanced monitoring

NGCommunity Strategy via a Forum / Open Board

Sta

ndar

d 1.

0

Data Model

Interface

Reference Implementation

Standard-Forum

Steering Board

Committee

WorkgroupUse

-cas

es

Mini-apps

Workflows

Industry

Data centers

Pseudo code

Mem

ber

s

Bod

ies

Nex

t Gen

erat

ion

Sta

ndar

d 2.

0

Experts

Julian Kunkel Reading, 2019 13 / 14

Page 14: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Ongoing Activities Near-Term Goals Long-Term Aspirations

Selected Small Long-Term Projects

I/O Modeling and Diagnosing Causes

� Predict likely reason/cause-of-effect of I/O by just analyzing runtime

� Estimate best-case time, if optimizations would work as intended

� Create a tool that automatizes the process...

Personalized Learning

� Use machine learning to personalize learning of the C online course

� Identify good lections, prescribe the order for students

Julian Kunkel Reading, 2019 14 / 14

Page 15: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Appendix

Julian Kunkel Reading, 2019 15 / 14

Page 16: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

The Performance Challenge

� DKRZ file systems offer about 700 GiB/s throughput

I However, I/O operations are typically inefficient: Achieving 10% of peak is good

� Influences on I/O performance

I Application’s access pattern and usage of storage interfacesI Network congestionI Slow storage media (tape, HDD, SSD)I Concurrent activity – shared nature of I/OI Tunable optimizations deal with characteristics of storage mediaI These factors lead to complex interactions and non-linear behavior

Julian Kunkel Reading, 2019 16 / 14

Page 17: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Illustration of Performance Variability� Rerunning the same operation (access size, ...) leads to performance variation� Individual measurements – 256 KiB sequential reads (outliers purged)

Julian Kunkel Reading, 2019 17 / 14

Page 18: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Comparing Density Plot with the Individual Data Points

Duration for sequential reads with 256 KiB accesses (off0 mem layout)

Algorithm for determining classes (color schemes)

� Create density plot with Gaussian kernel density estimator

� Find minima and maxima in the plot

� Assign one class for all points between minima and maxima

� Rightmost hill is followed by cutoff (blue) close to zero ⇒ outliers (unexpected slow)

Julian Kunkel Reading, 2019 18 / 14

Page 19: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Write Operations

Results for one write run with sequential 256 KiB accesses (off0 mem layout).

Known optimizations for write

� Write-behind: cache data first in memory, then write back

� Write back is expected to be much slower

This behavior can be seen in the figure !Julian Kunkel Reading, 2019 19 / 14

Page 20: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Outline

4 Data CompressionAlgorithmsESDMParallel I/OResults

Julian Kunkel Reading, 2019 20 / 14

Page 21: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Compression Research: Involvement

� Development of algorithms for lossless compression

I MAFISC: suite of preconditioners for HDF5, aims to pack data optimallyReduced climate/weather data by additional 10-20%, simple filters are sufficient

� Cost-benefit analysis: e.g., for long-term storage MAFISC pays of

� Analysis of compression characteristics for earth-science related data sets

I Lossless LZMA yields best ratio but is very slow, LZ4fast outperforms BLOSCI Lossy: GRIB+JPEG2000 vs. MAFSISC and proprietary software

� Development of the Scientific Compression Library (SCIL)

I Separates concern of data accuracy and choice of algorithmsI Users specify necessary accuracy and performance parametersI Metacompression library makes the choice of algorithmsI Supports also new algorithmsI Ongoing: standardization of useful compression quantities

� A method for system-wide determination of data characteristics

I Method has been integrated into a script suite to scan data centers

Julian Kunkel Reading, 2019 21 / 14

Page 22: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Ongoing Activity: Earth-Science Data Middleware

� Part of the ESiWACE Center of Excellence in H2020

I Centre of Excellence in Simulation of Weather and Climate in Europe

� ESiWACE2 follow up has been funded!

ESDM provides a transitional approach towards a vision for I/O addressing

� Scalable data management practice

� The inhomogeneous storage stack

� Suboptimal performance & performance portability

� Data conversion/merging

Julian Kunkel Reading, 2019 22 / 14

Page 23: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Earth-System Data Middleware

Design Goals of the Earth-System Data Middleware

1 Relaxed access semantics, tailored to scientific data generation

I Avoid false sharing (of data blocks) in the write-pathI Understand application data structures and scientific metadataI Reduce penalties of shared file access

2 Site-specific (optimized) data layout schemes

I Based on site-configuration and performance modelI Site-admin/project group defines mappingI Flexible mapping of data to multiple storage backendsI Exploiting backends in the storage landscape

3 Ease of use and deployment particularly configuration

4 Enable a configurable namespace based on scientific metadata

Julian Kunkel Reading, 2019 23 / 14

Page 24: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Architecture

Key Concepts

� Middleware utilizes layout component to make placement decisions

� Applications work through existing API (currently: NetCDF library)

� Data is then written/read efficiently; potential for optimization inside library

NetCDFNetCDF ....

Layout component

User-level APIs

File system Object store ...

User-level APIs

Site-specificback-endsand

mapping

Data-type aware

file a file b file c obj a obj b

Site InternetArchival

CanonicalFormat

Julian Kunkel Reading, 2019 24 / 14

Page 25: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Challenges

� Achieving high performance

� Understanding observed behavior (and performance)

� Tuning system settings and configurations

� Enabling performance portability

� Managing files and (data-intense) workflows

� Utilizing heterogeneous storage landscapes

These are opportunities for tools and method development!

� Diagnosing causes, predicting performance, prescribing settings

� Smarter ways of data handling

Julian Kunkel Reading, 2019 25 / 14

Page 26: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Performance Analysis

ProblemAssessing observed time for I/O is difficult.What best-case performance can we expect?

Support for analysis – my involvement

� Models and simulation

I Trivial models: using throughput + latencyI PIOSimHD: MPI application + storage system simulator

� Tools to capture and analyze system statistics and I/O activities

I HDTrace – tracing tool for parallel I/O (+ PVFS2)I SIOX – tool to capture I/O on various levelsI Grafana – Online monitoring for DKRZ (support)

� Benchmarks – on various levels, e.g., Metadata (md-workbench, IOR)

� Statistic model to determine likely cause based on time

Julian Kunkel Reading, 2019 26 / 14

Page 27: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Resulting Performance Models for Read

Read models predicting caching and memory location.

Julian Kunkel Reading, 2019 27 / 14

Page 28: Strategic Planning Agenda - Limitless Storage Limitless …€¦ · Scalable data management practice The inhomogeneous storage stack Suboptimal performance & performance portability

Data Compression

Using the Model to Identify AnomaliesUsing the model, the figure for reverse access shows slow-down (by read-ahead)

Julian Kunkel Reading, 2019 28 / 14