Portable, Scalable, and High-Performance I/O Forwarding on Massively Parallel Systems Jason Cope [email protected]


Computation and I/O Performance Imbalance

  • Leadership-class computational scale:
    – >100,000 processes
    – Multi-core architectures
    – Lightweight operating systems on compute nodes

  • Leadership-class storage scale:
    – >100 servers
    – Cluster file systems
    – Commercial storage hardware

  • Compute and storage imbalance in current leadership-class systems hinders application I/O performance
    – Performance gap: 1 GB/s of storage throughput for every 10 TF of computation
    – The gap has increased by a factor of 10 in recent years


DOE FastOS2 I/O Forwarding Scalability Layer (IOFSL) Project

  • Goal: Design, build, and distribute a scalable, unified high-end computing I/O forwarding software layer that would be adopted by the DOE Office of Science and NNSA.
    – Reduce the number of file system operations that the parallel file system handles
    – Provide function shipping at the file system interface level
    – Offload file system functions from simple or full OS client processes to a variety of targets
    – Support multiple parallel file system solutions and networks
    – Integrate with MPI-IO and any hardware features designed to support efficient parallel I/O


Outline

  • I/O Forwarding Scalability Layer (IOFSL) Overview
  • IOFSL Deployment on Argonne’s IBM Blue Gene/P Systems
  • IOFSL Deployment on Oak Ridge’s Cray XT Systems
  • Optimizations and Results
    – Pipelining in IOFSL
    – Request Scheduling and Merging in IOFSL
    – IOFSL Request Processing
  • Future Work and Summary


HPC I/O Software Stack

[Figure: HPC I/O software stack diagram; labels not recoverable]

  • We need to make I/O software as efficient as possible


IOFSL Architecture

  • Client
    – MPI-IO using ZoidFS ROMIO interface
    – POSIX using libsysio or FUSE
  • Network
    – Transmit messages using BMI over TCP/IP, MX, IB, Portals, and ZOID
    – Messages encoded using XDR
  • Server
    – Delegates I/O to backend file systems using native drivers or libsysio

[Figure: IOFSL architecture. Client processing node: ROMIO, libsysio, and FUSE on top of the ZOIDFS client and its network API. System network. I/O forwarding server: network API, ZOIDFS server, drivers for PVFS, POSIX, and libsysio, and the Lustre and GPFS file systems.]
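The architecture slide states that messages are encoded using XDR. As a rough illustration of what that implies on the wire, here is a minimal Python sketch that hand-encodes a write request header following the general XDR rules (big-endian 4-byte units, variable-length opaque data padded to a multiple of 4 bytes). The field layout, the `OP_WRITE` code, and all function names are assumptions made for this example; they are not the actual ZOIDFS message format.

```python
import struct

# Hypothetical operation code, chosen only for this illustration.
OP_WRITE = 2


def xdr_uint(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_uhyper(value: int) -> bytes:
    """XDR unsigned hyper: 8 bytes, big-endian."""
    return struct.pack(">Q", value)


def xdr_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque: length, data, zero padding to a 4-byte boundary."""
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def encode_write_header(handle: bytes, offset: int, length: int) -> bytes:
    """Encode an illustrative write request header (the layout is an assumption)."""
    return xdr_uint(OP_WRITE) + xdr_opaque(handle) + xdr_uhyper(offset) + xdr_uhyper(length)


def decode_write_header(message: bytes):
    """Inverse of encode_write_header; returns (op, handle, offset, length)."""
    op, handle_len = struct.unpack_from(">II", message, 0)
    handle = message[8:8 + handle_len]
    # Skip the handle and its padding to reach the two 8-byte fields.
    pos = 8 + handle_len + (4 - handle_len % 4) % 4
    offset, length = struct.unpack_from(">QQ", message, pos)
    return op, handle, offset, length
```

Because every field lands on a 4-byte boundary and uses a fixed byte order, a client on one architecture and a server on another can exchange such a header without any per-platform conversion code, which is one plausible reason to pick XDR for a portable forwarding layer.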


Argonne’s IBM Blue Gene/P Systems

Figure courtesy of Robert Ross, ANL


IOFSL Deployment on Argonne’s IBM Blue Gene/P Systems

[Figure: IOFSL deployment on Blue Gene/P. Compute nodes (IOFSL clients, ZOID clients) connect over tree networks to I/O nodes (IONs), which host the IOFSL servers, ZOID servers, PVFS2 clients, and GPFS clients. The IONs connect over a 10 Gbit Ethernet network to storage servers hosting the PVFS2 servers and GPFS servers.]


Initial IOFSL Results on Argonne’s IBM Blue Gene/P Systems

[Figure: two panels, IOR Read and IOR Write. Average bandwidth (MiB/s) versus message size (KiB, 4 to 65536). Series: CIOD; IOFSL.]


Initial IOFSL Results on Argonne’s IBM Blue Gene/P Systems

[Figure: average bandwidth (MiB/s) versus number of clients (64 to 1024). Series: CIOD, non-collective, t=8M; IOFSL, TASK, t=8M.]


Oak Ridge’s Cray XT Systems

[Figure: Oak Ridge storage architecture, with the annotations below]

  • Enterprise Storage: controllers and large racks of disks are connected via InfiniBand. 48 DataDirect S2A9900 controller pairs with 1 Tbyte drives and 4 InfiniBand connections per pair.
  • Storage Nodes: run parallel file system software and manage incoming FS traffic. 192 dual quad-core Xeon servers with 16 Gbytes of RAM each.
  • SION Network: provides connectivity between OLCF resources and primarily carries storage traffic. 3000+ port 16 Gbit/sec InfiniBand switch complex.
  • Lustre Router Nodes: run parallel file system client software and forward I/O operations from HPC clients. 192 (XT5) and 48 (XT4) nodes, each with one dual-core Opteron and 8 GB of RAM.
  • Systems shown: Jaguar XT5, Jaguar XT4, and other systems (Viz, Clusters).
  • Link labels: XT5 SeaStar2+ 3D Torus, 9.6 Gbytes/sec; InfiniBand, 16 Gbit/sec; Serial ATA, 3 Gbit/sec. Other bandwidth labels in the figure: 384 Gbytes/s, 96 Gbytes/s, 384 Gbytes/s, 384 Gbytes/s, 366 Gbytes/s.

Figure courtesy of Galen Shipman, ORNL


IOFSL Deployment on Oak Ridge’s Cray XT Systems

[Figure: IOFSL deployment on Cray XT. Compute nodes (IOFSL clients) connect over a TCP network to Lustre router nodes (IOFSL servers, Lustre clients), which connect over a 16 Gbit InfiniBand network to storage servers (Lustre servers).]


Initial IOFSL Results on Oak Ridge’s Cray XT Systems

[Figure: average bandwidth (MiB/s) versus number of clients (128 to 4096). Series: IOFSL, TASK, t=8M; XT4, non-collective, t=8M.]


IOFSL Optimization #1: Pipeline Data Transfers

  • Motivation
    – Limits on the amount of memory available on I/O nodes
    – Limits on the number of posted network operations
    – Need to overlap network operations and file system operations for sustained throughput
  • Solution: Pipeline data transfers between the IOFSL client and server
    – Negotiate the pipeline transfer buffer size
    – Data buffers are aggregated or segmented at the negotiated buffer size
    – Issue network transfer requests for each pipeline buffer
    – Reformat pipeline buffers into the original buffer sizes
  • Currently supports serial and parallel pipeline modes
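The pipelining steps on this slide (negotiate a buffer size, aggregate or segment the data at that size, transfer each pipeline buffer, then restore the original buffer sizes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the negotiation rule (take the smaller of the two limits) is a guess at one reasonable policy, and the network transfer itself is left out.

```python
from typing import List


def negotiate(client_max: int, server_max: int) -> int:
    """Hypothetical negotiation rule: both sides agree on the smaller limit."""
    return min(client_max, server_max)


def to_pipeline_buffers(buffers: List[bytes], pipeline_size: int) -> List[bytes]:
    """Aggregate or segment the caller's buffers into pipeline-sized chunks."""
    if pipeline_size <= 0:
        raise ValueError("pipeline_size must be positive")
    stream = b"".join(buffers)
    return [stream[i:i + pipeline_size] for i in range(0, len(stream), pipeline_size)]


def from_pipeline_buffers(chunks: List[bytes], original_sizes: List[int]) -> List[bytes]:
    """Reformat received pipeline buffers back into the original buffer sizes."""
    stream = b"".join(chunks)
    if sum(original_sizes) != len(stream):
        raise ValueError("sizes do not match the received data")
    out, pos = [], 0
    for size in original_sizes:
        out.append(stream[pos:pos + size])
        pos += size
    return out
```

For example, four buffers of 4, 2, 6, and 1 bytes with a negotiated size of 5 become three pipeline buffers of 5, 5, and 3 bytes. The sketch joins everything in memory for brevity; a server that is short on memory, as the motivation bullets describe, would presumably hold only a few pipeline buffers at a time and write each one out while the next is still arriving.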


Pipeline Data Transfer Results for Different IOFSL Server Configurations

[Figure: average bandwidth (MiB/s) versus pipeline buffer size (MiB, 256 to 8192). Series: Server Config #1 (SM Events); Server Config #2 (TASK Events).]


IOFSL Optimization #2: Request Scheduling and Merging

  • Request scheduling aggregates several requests into a bulk IO request
    – Reduces the number of client accesses to the file systems
    – With pipeline transfers, overlaps network and storage IO accesses
  • Two scheduling modes supported
    – FIFO mode aggregates requests as they arrive
    – Handle-Based Round-Robin (HBRR) iterates over all active file handles to aggregate requests
  • Request merging identifies and aggregates noncontiguous requests into contiguous requests
    – Brute Force mode iterates over all pending requests
    – Interval Tree mode compares requests that are on similar ranges
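To make the two ideas on this slide concrete, the following Python sketch shows one possible reading of handle-based round-robin ordering and of brute-force merging. It rests on assumptions: a request is modeled as a `(file handle, offset, length)` tuple, only requests whose byte ranges touch exactly are merged, and the function names are invented for this example. The interval tree mode is not shown.

```python
from collections import deque
from typing import List, Tuple

Request = Tuple[int, int, int]  # (file handle, offset, length); illustrative layout


def hbrr_order(requests: List[Request]) -> List[Request]:
    """Visit active file handles round-robin, taking one pending request per visit."""
    queues = {}  # handle -> pending requests, in first-seen handle order
    for req in requests:
        queues.setdefault(req[0], deque()).append(req)
    ordered = []
    while queues:
        for handle in list(queues):
            ordered.append(queues[handle].popleft())
            if not queues[handle]:
                del queues[handle]  # handle is no longer active
    return ordered


def merge_brute_force(requests: List[Request]) -> List[Request]:
    """Repeatedly scan all pending requests and fuse pairs whose ranges touch."""
    pending = list(requests)
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(pending)):
            for j in range(i + 1, len(pending)):
                (h1, o1, l1), (h2, o2, l2) = pending[i], pending[j]
                if h1 == h2 and (o1 + l1 == o2 or o2 + l2 == o1):
                    pending[i] = (h1, min(o1, o2), l1 + l2)
                    del pending[j]
                    merged_any = True
                    break
            if merged_any:
                break  # restart the scan after every merge
    return pending
```

With the input `[(1, 0, 4), (2, 0, 8), (1, 8, 4), (1, 4, 4), (2, 16, 8)]`, the merge step fuses the three requests on handle 1 into `(1, 0, 12)` and leaves the two requests on handle 2 separate because a gap lies between them. The pairwise scan costs quadratic time per pass, which suggests why a mode that only compares requests on nearby ranges is also offered.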


IOFSL Request Scheduling and Merging Results with the IOFSL GridFTP Driver

[Figure: average bandwidth (MiB/s) versus number of clients (8 to 128). Series: Request Scheduling; No Request Scheduling.]

[Figure: deployment diagram with the labels MPI Application (MPI-IO), Application (FUSE), IOFSL Server, WAN, GridFTP Servers, High-Performance Storage System, and Archival Storage System.]


IOFSL Optimization #3: Request Processing and Event Mode

  • Multi-Threaded Task Mode
    – New thread for executing each IO request
    – Simple implementation
    – Thread contention and scalability issues
  • State Machine Mode
    – Use a fixed number of threads from a thread pool to execute IO requests
    – Divide IO requests into smaller units of work
    – Thread pool schedules IO requests to run non-blocking units of work (data manipulation, pipeline calculations, request merging)
    – Yield execution of IO requests on blocking resource accesses (network communication, timer events, memory allocations)
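The state machine mode described above can be approximated in Python with generators standing in for state machines. In this sketch, which is an illustration under stated assumptions and not the IOFSL implementation, each request is a generator: the code between two `yield` statements is one non-blocking unit of work, and each `yield` marks a point where the request gives up its thread, as it would on a blocking resource access. The names `run_state_machines` and `io_request` are hypothetical.

```python
import queue
import threading
from typing import Generator, Iterable, List


def run_state_machines(requests: Iterable[Generator], num_threads: int = 2) -> None:
    """Drive generator-based requests with a fixed number of worker threads."""
    ready = queue.Queue()
    for req in requests:
        ready.put(req)

    def worker() -> None:
        while True:
            req = ready.get()
            if req is None:          # shutdown sentinel
                ready.task_done()
                return
            try:
                next(req)            # run one unit of work up to the next yield
            except StopIteration:
                pass                 # request finished; do not requeue
            else:
                ready.put(req)       # requeue so other requests get a turn
            ready.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    ready.join()                     # returns once every request has finished
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()


def io_request(name: str, steps: int, log: List[str], lock: threading.Lock):
    """Example request: each loop iteration is one unit of work."""
    for step in range(steps):
        with lock:
            log.append(f"{name}:{step}")
        yield  # hypothetical blocking point, e.g. waiting on the network
```

The contrast with task mode is that the number of threads stays fixed no matter how many requests are in flight. A request is only ever in the queue once, so its own steps still run in order even though different threads may execute them.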


IOFSL Request Processing and Event Mode: Argonne’s IBM Blue Gene/P Results

[Figure: average bandwidth (MiB/s) versus number of clients (64 to 1024). Series: CIOD, non-collective, t=8M; IOFSL, TASK, t=8M; IOFSL, SM, t=8M.]


IOFSL Request Processing and Event Mode: Oak Ridge’s Cray XT4 Results

[Figure: average bandwidth (MiB/s) versus number of clients (128 to 4096). Series: IOFSL, TASK, t=8M; IOFSL, SM, t=8M; XT4, non-collective, t=8M.]


Current and Future Work

  • Scaling and tuning of IOFSL on IBM BG/P and Cray XT systems
  • Collaborative caching layer between IOFSL servers
  • Security infrastructure
  • Integrating IOFSL with end-to-end I/O tracing and visualization tools for the NSF HECURA IOVIS/Jupiter project


Project Participants and Support

  • Argonne National Laboratory: Rob Ross, Pete Beckman, Kamil Iskra, Dries Kimpe, Jason Cope
  • Los Alamos National Laboratory: James Nunez, John Bent, Gary Grider, Sean Blanchard, Latchesar Ionkov, Hugh Greenberg
  • Oak Ridge National Laboratory: Steve Poole, Terry Jones
  • Sandia National Laboratories: Lee Ward
  • University of Tokyo: Kazuki Ohta, Yutaka Ishikawa
  • The IOFSL project is supported by the DOE Office of Science and NNSA.


IOFSL Software Access, Documentation, and Links

  • IOFSL Project Website: http://www.iofsl.org
  • IOFSL Wiki and Developers Website: http://trac.mcs.anl.gov/projects/iofsl/wiki
  • Access to IOFSL Public git Repository: git clone http://www.mcs.anl.gov/research/projects/iofsl/git iofsl
  • Recent publications
    – K. Ohta, D. Kimpe, J. Cope, K. Iskra, R. Ross, and Y. Ishikawa, "Optimization Techniques at the I/O Forwarding Layer," IEEE Cluster 2010 (to appear).
    – D. Kimpe, J. Cope, K. Iskra, and R. Ross, "Grids and HPC: Not as Different as you might think," Para 2010 mini-symposium on Real-time access and Processing of Large Data Sets, April 2010.


Questions?

Jason Cope

[email protected]