Non-Volatile Memory for Next Generation I/O

Dr Michèle Weiland
m.weiland@epcc.ed.ac.uk


Current trends & approaches

28/03/2017 39th ORAP Forum 2


Burst buffer

[Figure: two system diagrams. The first is labelled compute nodes, high performance network and external filesystem; the second adds a burst filesystem.]


Moving beyond burst buffer

• Non-volatile storage is coming to the node rather than the filesystem
  • Argonne Theta machine has a 128GB SSD in each compute node

[Figure: system diagram labelled compute nodes, high performance network and external filesystem.]


Non-volatile memory

• Non-volatile RAM
  • 3D XPoint technology is one example
• Much larger capacity than DRAM
  • Hosted in the DRAM slots (DIMM form factor), controlled by a standard memory controller
• Slower than DRAM by a small factor, but significantly faster than SSDs

[Figure: two memory hierarchy diagrams. Labels: Cache, Memory, Storage; and Cache, Memory, NVRAM, Fast Storage, Slow Storage.]


The NEXTGenIO approach


NEXTGenIO key facts

• FETHPC Research & Innovation Action
• Active for 18 months so far
• 8 partners, covering
  • Hardware
  • HPC centres and users
  • Software
  • Tools developers


Our objectives

• Hardware platform prototype
  ➢ Demonstrating the prototype’s broad applicability for both HPC and data centric applications
• Exascale I/O investigation
  ➢ Understanding how best to exploit NVRAM
• Systemware development
  ➢ Producing the necessary software to enable (Exascale) application execution on the hardware platform
• Application co-design
  ➢ Understanding individual application I/O profiles and typical I/O workloads on shared systems running multiple different applications


Systemware

• System software must understand extra level present in the memory hierarchy
• Work on adapting job scheduler (SLURM)
• Development of a data scheduler
• Object stores as alternatives to file systems
  • DAOS (Distributed Application Object Storage)
  • dataClay
• Multi-node NVRAM file system
  • echoFS

☛ Key goal: Platform must be usable “as is” for legacy applications


Workloads & I/O

• Try and understand how different I/O behaviour and scheduling policies will impact job throughput
• Three different workloads
  • Generic → EPCC
  • Special purpose → ECMWF
  • Commercial → Arctur
• I/O Workload Simulator
  • Create benchmark of synthetic jobs generated from real workloads → to be deployed on HPC system
  • Create simulator of workload schedule to test impact of policies and I/O performance → to be deployed on laptop
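A minimal sketch of what a laptop-scale simulator of a workload schedule could look like, assuming a strict first-in-first-out policy and a runtime model of compute time plus I/O volume divided by bandwidth. The `Job` fields, the `simulate_fifo` function and the example numbers are hypothetical and are not taken from the NEXTGenIO simulator.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int          # nodes requested
    compute_s: float    # time spent computing, in seconds
    io_gb: float        # data read and written, in GB


def runtime(job, io_gb_per_s):
    """Assumed runtime model: compute time plus I/O volume over bandwidth."""
    return job.compute_s + job.io_gb / io_gb_per_s


def simulate_fifo(jobs, total_nodes, io_gb_per_s):
    """Run jobs in submission order and return the makespan in seconds."""
    now = 0.0
    free = total_nodes
    running = []  # min-heap of (finish_time, nodes_held)
    for job in jobs:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} needs more nodes than the system has")
        # Strict FIFO: wait for earlier jobs to finish until this one fits.
        while free < job.nodes:
            finish, nodes = heapq.heappop(running)
            now = max(now, finish)
            free += nodes
        heapq.heappush(running, (now + runtime(job, io_gb_per_s), job.nodes))
        free -= job.nodes
    return max((finish for finish, _ in running), default=now)


# Hypothetical synthetic workload on a 4-node system.
jobs = [
    Job("a", nodes=2, compute_s=100, io_gb=50),
    Job("b", nodes=2, compute_s=50, io_gb=100),
    Job("c", nodes=4, compute_s=10, io_gb=10),
]
slow_io = simulate_fifo(jobs, total_nodes=4, io_gb_per_s=1.0)
fast_io = simulate_fifo(jobs, total_nodes=4, io_gb_per_s=10.0)
```

Raising `io_gb_per_s` shortens the I/O-heavy jobs and therefore the whole schedule, which is the kind of effect such a simulator is meant to expose when comparing policies.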


ARCHER workload

[Figures: two ARCHER workload plots on consecutive slides; legend in each: metadata, write, read.]

Tools support

• Profiling and debugging tools need to be able to understand implications of an additional (potentially persistent) memory layer


Applications

• Traditional HPC
  • OpenFOAM → CFD
  • CASTEP → Chemistry
  • IFS → weather forecasting
  • MONC → cloud modelling
• Novel uses
  • OSPRay → ray tracing, rendering engine
  • Halvade → genome sequencing
  • Tiramisu → deep learning (based on Caffe)
  • K-means → ML


Usage models


NVRAM usage models

• The “memory” usage model allows for the extension of the main memory
  • The data is volatile like normal DRAM based main memory
• The “storage” usage model supports the use of NVRAM like a classic block device
  • E.g. like a very fast SSD
• The “application direct” usage model maps persistent storage from the NVRAM directly into the main memory address space
  • Direct CPU load/store instructions for persistent main memory regions
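A rough illustration of the “application direct” idea, assuming the persistent region is exposed as a file that can be memory-mapped. An ordinary temporary file stands in for NVRAM here so that the sketch runs anywhere; the path, the region size and the `bump_counter` helper are hypothetical. The point is that the update is made through loads and stores on the mapped region, not through read and write calls.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical stand-in path: on a real system this file would sit on a
# filesystem backed by NVRAM; a temporary file keeps the sketch runnable.
path = os.path.join(tempfile.mkdtemp(), "counter.bin")
SIZE = 4096

# Create the backing file once, zero-filled.
with open(path, "wb") as f:
    f.write(b"\x00" * SIZE)


def bump_counter(path):
    """Map the file, increment a 64-bit counter in place, and flush it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SIZE) as region:
            (value,) = struct.unpack_from("<Q", region, 0)   # load
            struct.pack_into("<Q", region, 0, value + 1)     # store
            region.flush()  # ask the OS to write the update back
            return value + 1


first = bump_counter(path)    # maps the region and stores 1
second = bump_counter(path)   # a fresh mapping sees the stored value
```

The second call opens a new mapping and still finds the value left by the first, which is the persistence property the usage model relies on.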


Exploiting distributed storage

[Figure: three diagrams of six nodes, each with memory, connected by a network to a filesystem: without NVRAM, with NVRAM in every node, and with NVRAM in only some of the nodes.]

Using distributed storage

• Without changing applications
  • Large memory space/in-memory database etc…
  • Local filesystem
    ➢ Users manage data themselves
    ➢ No global data access/namespace, large number of files
    ➢ Still require global filesystem for persistence

[Figure: six nodes, each with memory and a local /tmp, connected by a network to a filesystem.]

Using distributed storage

• Without changing applications
  • Filesystem buffer
    ➢ Pre-load data into NVRAM from filesystem
    ➢ Use NVRAM for I/O and write data back to filesystem at the end
    ➢ Requires systemware to preload and postmove data
    ➢ Uses filesystem as namespace manager
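The preload, run and postmove steps above can be sketched as follows, assuming both the filesystem and the node NVRAM appear as directories. Two temporary directories stand in for them; `run_with_buffer` and `word_count_job` are hypothetical names, and in the deck's model this staging would be done by systemware, not by the application.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_buffer(global_dir, buffer_dir, inputs, job):
    """Stage inputs into the buffer, run the job there, stage results back."""
    global_dir, buffer_dir = Path(global_dir), Path(buffer_dir)
    # Preload: copy the named input files from the filesystem to the buffer.
    for name in inputs:
        shutil.copy2(global_dir / name, buffer_dir / name)
    # The job does all of its I/O against the buffer directory.
    job(buffer_dir)
    # Postmove: copy every file the job created back to the filesystem.
    outputs = sorted(p.name for p in buffer_dir.iterdir() if p.name not in inputs)
    for name in outputs:
        shutil.copy2(buffer_dir / name, global_dir / name)
    return outputs


def word_count_job(workdir):
    """Toy job: read input.txt and write the number of words to count.txt."""
    text = (workdir / "input.txt").read_text()
    (workdir / "count.txt").write_text(str(len(text.split())))


global_fs = Path(tempfile.mkdtemp(prefix="global_"))   # stands in for the filesystem
node_buffer = Path(tempfile.mkdtemp(prefix="nvram_"))  # stands in for node NVRAM
(global_fs / "input.txt").write_text("burst buffers move to the node")
staged_out = run_with_buffer(global_fs, node_buffer, ["input.txt"], word_count_job)
```

The job itself only ever sees `workdir`, which is why this model needs no application changes: file names stay those of the filesystem, and only their location during the run differs.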

[Figure: six nodes, each with memory and a local buffer, connected by a network to a filesystem.]

Using distributed storage

• Without changing applications
  • Global filesystem
    ➢ Requires functionality to create and tear down global filesystems for individual jobs
    ➢ Requires filesystem that works across nodes
    ➢ Requires functionality to preload and postmove filesystems
    ➢ Need to be able to support multiple filesystems across system

[Figure: six nodes with memory, a filesystem spanning the nodes, a network, and the external filesystem.]

Using distributed storage

• With changes to applications
  • Object store
    ➢ Needs same functionality as global filesystem
    ➢ Removes need for POSIX, or POSIX-like functionality
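To make the contrast with POSIX concrete, here is a toy object store in which whole objects are stored and fetched by name, with no open, seek or close. It is an in-memory stand-in only: the `ToyObjectStore` class, its hash-based placement and the example key are assumptions for illustration and do not reflect the DAOS or dataClay interfaces.

```python
import hashlib


class ToyObjectStore:
    """In-memory stand-in for an object store spread over several nodes."""

    def __init__(self, num_nodes):
        # One dictionary per node plays the role of that node's NVRAM.
        self.nodes = [{} for _ in range(num_nodes)]

    def _home(self, key):
        # Stable placement: hash the key and pick a node from the digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        """Store a whole object under its name on the node that owns the key."""
        self.nodes[self._home(key)][key] = bytes(value)

    def get(self, key):
        """Fetch a whole object by name from the node that owns the key."""
        return self.nodes[self._home(key)][key]


store = ToyObjectStore(num_nodes=6)
store.put("run42/temperature/step0", b"\x01\x02\x03")
field = store.get("run42/temperature/step0")
```

An application written against `put` and `get` has to change its I/O calls, which is why this option sits under “with changes to applications”.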

[Figure: six nodes with memory, an object store spanning the nodes, a network, and the external filesystem.]

Using distributed storage

• Without changing applications
  • Automatic check-pointing
    • Resiliency
    • Local check-pointing without hitting the filesystem
  • Pause and restart
    • Just-in-time scheduling/high priority jobs
    • Waiting for something else to happen…
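A small sketch of local check-pointing with pause and restart, assuming node-local NVRAM is reachable as a directory. The slide envisages this happening automatically, without application changes; the sketch does it explicitly in application code only to make the mechanism visible. The checkpoint path, the state layout and the `run` function are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; a temporary directory stands in for
# node-local NVRAM so that the sketch runs anywhere.
CKPT = os.path.join(tempfile.mkdtemp(prefix="nvram_"), "state.json")


def save_checkpoint(state, path=CKPT):
    """Write to a temporary file, then rename, so a crash leaves no partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path=CKPT):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(steps, stop_after=None):
    """Sum the step numbers, checkpointing after each step; may pause early."""
    state = load_checkpoint()
    while state["step"] < steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state  # paused: a later call resumes from the checkpoint
        state["step"] += 1
        state["total"] += state["step"]
        save_checkpoint(state)
    return state


paused = run(steps=10, stop_after=4)   # pause part-way through
resumed = run(steps=10)                # restart from the local checkpoint
```

Because the checkpoint never leaves the node, pausing a job to let a high priority job through does not touch the external filesystem.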


Using distributed storage

• New usage models
  • Resident data sets
    • Sharing preloaded data across a range of jobs
    • Data analytic workflows
    • How to control access/authorisation/security/etc.?
  • Workflows
    • Producer-consumer model
      • Remove filesystem from intermediate stages

[Figure: workflow diagram; labels: Job1, Job2, Job3, Job4, Filesystem.]

Using distributed storage

• Workflows
  • How to enable different sized applications?
  • How to schedule these jobs fairly?
  • How to enable secure access?

[Figure: workflow diagram; labels: Job1, Job2 (several instances), Job3, Job4, Filesystem.]

The Challenge of distributed storage

• Enabling all the use cases in a multi-user, multi-job environment is the real challenge
  • Heterogeneous scheduling mix
  • Different requirements on the NVRAM
  • Scheduling across these resources
  • Enabling sharing of nodes
  • Not impacting on node compute performance
• Enabling applications to do more I/O
  • Large numbers of our applications don’t heavily use I/O at the moment
  • What can we enable if I/O is significantly cheaper?


Questions?
