IMCa: A High-Performance Caching Front-end for GlusterFS on InfiniBand
Ranjit Noronha and Dhabaleswar K. Panda
Network Based Computing Lab
The Ohio State University
{noronha, panda}@cse.ohio-state.edu
Outline of the Talk
• Background and Motivation
• Architecture and Design of IMCa
• Experimental Evaluation of IMCa
• Conclusions and Future Work
Background
• Large Scale Scientific and Commercial Workloads
• Petascale Computers have arrived
• High-performance access to I/O data is crucial – parallel applications are often limited by I/O
• Clustered/Parallel File Systems have evolved to meet this challenge
• File System performance still dependent on disk performance
• Single Server Bandwidth Drop With Multiple Clients
• Parallel I/O Bandwidth From Multiple Servers
[Figure: Bandwidth (MegaBytes/s) vs. number of clients (1-8) for RDMA, IPoIB, and GigE; one panel with 4GB server memory, one with 8GB server memory]
• Performance for Small Files
– Generally difficult to achieve
– Many environments with a large number of small files
– Storing on the same disk block provides limited benefit
– Striping does not provide benefit
– Store on different servers
• Cache Coherency Problems
– Client-side cache provides good performance
– Non-coherent client cache limited when there is sharing
– Limited scalability of coherent caches
• Server Load Problems
– RDMA reduces overhead from TCP/IP
– RDMA-based transport protocols cannot reduce copying costs within the file system
Problem Statement
• Which file-system operations are potential targets for caching?
• What are the alternatives to the traditional client cache/server cache architecture?
• What are the advantages and disadvantages of alternate cache architectures?
• How do we provide the performance of the non-coherent client cache without the scalability problems of the coherent client cache?
Outline of the Talk
• Background and Motivation
• Architecture and Design of IMCa
• Experimental Evaluation of IMCa
• Conclusions and Future Work
Potential File System Operations That May Be Cached
• Potential Targets For Caching
– Should be something the client reads
– Should be possible to uniquely identify the cache target
– Should be possible to chunk the data element
• Small Operations: Stat, Create, Delete, Open
• Stat
– Read by the client
– Used as a form of update by many applications
– Should be cached (see the sketch after this slide)
– Should be updated on read/write operations on the server
• Create/Delete
– Not read by the client
– Delete should invalidate previous cache entries
• File Open
– Not a target for caching, but may be used for prefetching
• Data Transfer Operations – Reads and Writes
– Blocks needed
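To make the stat discussion concrete, here is a minimal sketch (in Python, with hypothetical names not taken from the paper) of the policy above: stat results are cached when a client reads them, refreshed when the server performs a read/write on the file, and invalidated on delete.

```python
import os

# Hypothetical sketch of the stat-caching policy described above; the
# class and method names are illustrative, not the paper's code.
class StatCache:
    def __init__(self):
        self._entries = {}                 # path -> cached os.stat_result

    def stat(self, path):
        if path in self._entries:
            return self._entries[path]     # cache hit
        result = os.stat(path)             # miss: fall through to the FS
        self._entries[path] = result
        return result

    def on_read_write(self, path):
        # Server-side read/write: refresh the cached attributes.
        self._entries[path] = os.stat(path)

    def on_delete(self, path):
        # Delete invalidates any previous cache entry for the file.
        self._entries.pop(path, None)
```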
Intermediate Cache Architecture (IMCa)
• Easy to maintain coherency
• Extensible
• Can multiple cache nodes provide benefit?
[Figure: IMCa architecture – the FS client runs a CMCache and the file server runs an SMCache above the underlying FS cache and ext3; between them is an array of cache nodes Cache1 ... Cachen, each cache being a node (MCD array); both CMCache and SMCache use a CRC32 hash function to find the cache server for a given item]
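The figure shows CMCache and SMCache both locating data with a CRC32 hash over the MCD array. A minimal sketch of that mapping, assuming keys of the form path:block_index (the key format and function name are assumptions, not from the paper):

```python
import zlib

def cache_server_for(path, block_index, num_servers):
    """Pick the MCD node holding one block via a CRC32 hash of its key."""
    key = f"{path}:{block_index}".encode()
    return zlib.crc32(key) % num_servers

# Every CMCache/SMCache instance computes the same mapping, so with
# 4 cache servers block 7 of /data/a.out always lands on the same node.
server_id = cache_server_for("/data/a.out", 7, num_servers=4)
```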
Need for Blocks In IMCa
• Most file systems store data on disk as blocks
• Parallel file-systems stripe data across multiple servers
• IMCa uses a fixed block size to store data across the cache servers (see the sketch after this slide)
– Block size should provide good performance for most small files
– Should avoid excessive fragmentation
[Figure: requested data overlaid on data block boundaries – the file data is segmented by the IMCa block size, so an unaligned request includes extra data at the block edges]
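A sketch of the block-boundary arithmetic implied by the figure: a byte range is mapped onto fixed-size IMCa blocks, and an unaligned request pulls in extra data at the edges. The 2 KB default matches the block size used later in the evaluation; the function itself is illustrative, not the paper's code.

```python
def blocks_for_request(offset, length, block_size=2048):
    """Map a byte range onto fixed-size IMCa blocks (illustrative sketch).

    Returns the block indices covered plus the extra bytes fetched before
    and after the requested range because the request is not block-aligned.
    """
    first = offset // block_size
    last = (offset + length - 1) // block_size
    extra_before = offset - first * block_size
    extra_after = (last + 1) * block_size - (offset + length)
    return list(range(first, last + 1)), extra_before, extra_after

# A 3000-byte read at offset 1000 touches blocks 0 and 1 and drags in
# 1000 extra bytes before and 96 extra bytes after the requested range.
print(blocks_for_request(1000, 3000))
```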
Design-Read Operations (Hit)
[Figure: read hit – the client and its CMCache, the MCD array (Cache1 ... Cachen), and the server-side SMCache, underlying FS cache, and ext3; on a hit the block is returned from the MCD array]
Design-Read Operations (Miss)
[Figure: read miss – the request misses in the MCD array and is forwarded to the server, where SMCache serves the block from the underlying FS cache/ext3]
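A hypothetical sketch of the read path the two figures imply, reusing cache_server_for from the architecture sketch. The mcd_nodes objects stand in for memcached get/set and server.read_block stands in for the GlusterFS server; none of these names come from the paper.

```python
def read_block(path, block_index, mcd_nodes, server):
    """Illustrative read path: try the MCD array first, then the server."""
    node = mcd_nodes[cache_server_for(path, block_index, len(mcd_nodes))]
    key = f"{path}:{block_index}"
    data = node.get(key)
    if data is not None:
        return data                     # hit: served from the MCD array
    # Miss: the block comes from the server (underlying FS cache / ext3)
    # and is installed into the cache so later reads can hit.
    data = server.read_block(path, block_index)
    node.set(key, data)
    return data
```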
Design-Write Operations
[Figure: write path – the client and its CMCache, the MCD array (Cache1 ... Cachen), and the server-side SMCache, underlying FS cache, and ext3]
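A matching sketch for the write path. The slide does not spell out whether the cached block is rewritten or invalidated on a write, so this shows the write-through update variant; the interfaces are the same hypothetical stand-ins used in the read sketch.

```python
def write_block(path, block_index, data, mcd_nodes, server):
    """Illustrative write path: write through to the server, then refresh
    the cached copy (invalidating it instead would be the other option)."""
    server.write_block(path, block_index, data)      # durability first
    node = mcd_nodes[cache_server_for(path, block_index, len(mcd_nodes))]
    node.set(f"{path}:{block_index}", data)
```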
Advantages/Disadvantages of IMCa
• Fewer requests hit the server
• Latency for requests read from the cache is lower
• MCDs are self-managing
• Failures in MCDs do not impact correctness
• Additional node elements needed especially for caching
• Cold misses are expensive
• Additional blocks/data transfers needed
• Overhead and delayed updates
Outline of the Talk
• Background and Motivation
• Architecture and Design of IMCa
• Experimental Evaluation of IMCa
• Conclusions and Future Work
GlusterFS File System
• Clustered File System
• Client and Server in userspace
• Use FUSE interface to translate FS calls from the kernel to the user daemons
• No striping; data is distributed across servers
• Possible to apply translators at the server and client to perform different functions
• WWW.glusterfs.org
Experimental Setup
• 64-node cluster
– 8-core Intel Clovertown
– 8 GB memory
• InfiniBand DDR is the interconnect
• GlusterFS file-system
• The data servers each have 8 HighPoint RAID disks
• Communication protocol is IPoIB in Reliable Connected (RC) mode
• MCDs run on independent nodes and use up to 6 GB of memory
• CMCache and SMCache use a CRC32 hash function for locating data on the MCDs
• Lustre 1.6.4.3 is used with socklnd for comparison
Experiment-stat
– Consists of two stages
– First stage (untimed)
• 262144 files created by a single node
– Second stage (timed)
• Each node performs a stat on each of the 262144 files sequentially
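The two stages of the stat benchmark, sketched in Python for concreteness (the mount point and file naming are assumptions):

```python
import os
import time

NUM_FILES = 262144
DIR = "/mnt/glusterfs/statbench"        # hypothetical mount point

def create_stage():
    # Untimed first stage: one node creates all the files.
    for i in range(NUM_FILES):
        open(f"{DIR}/f{i}", "w").close()

def stat_stage():
    # Timed second stage: each node stats every file sequentially.
    start = time.time()
    for i in range(NUM_FILES):
        os.stat(f"{DIR}/f{i}")
    return time.time() - start
```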
Stat performance
• Time to stat 262144 different files
• Benchmark has two phases: create (untimed), followed by stat (timed)
• 82% improvement at 64 nodes
[Figure: time (seconds) to complete the stat phase vs. number of nodes (1-8 and 16-64) for No Cache, 1/2/4/6 Cache Servers, and Lustre-4DS]
Experiment-Write Single Client
– One Client
– Writes 1,024 records of size r sequentially to the file
– Measure time for this to complete
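A sketch of the single-client write test (file path and helper name are assumptions): 1,024 records of size r are written sequentially, and the elapsed time is converted to a per-record latency, with r swept from 1 byte to 32 KB as in the plot on the next slide.

```python
import time

def timed_write(path, record_size, num_records=1024):
    """Write num_records records of record_size bytes sequentially."""
    buf = b"x" * record_size
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(num_records):
            f.write(buf)
    return time.time() - start

# Per-record write latency in microseconds for each record size.
for r in [2 ** k for k in range(16)]:          # 1 byte .. 32768 bytes
    usec = timed_write("/mnt/glusterfs/wbench", r) / 1024 * 1e6
```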
Write – Single Client
[Figure: write latency (microseconds) vs. I/O record size (1 byte to 32768 bytes) for No Cache, IMCa (2K), and IMCa (Server Threads)]
• 2KB block size
• Server thread helps performance
Experiments-Read
• Single Client Read
– Follows the Write component of the benchmark
– Move file pointer to the beginning of the file
– Read 1,024 records of size r sequentially from the file
– Measure time for this to complete
• Multiple Client Read
– Each client uses a separate file
• Multiple Client Read Shared
– Same file used by every client
• Lustre configurations
– Cold Client Cache: unmount between Write and Read
– Warm Client Cache: no unmount between Write and Read
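The read component simply re-reads the file written above from the beginning; a sketch of the single-client case (the multi-client variants only change whether each client opens its own file or the shared one):

```python
import time

def timed_read(path, record_size, num_records=1024):
    """Re-read the records written earlier, sequentially from offset 0."""
    start = time.time()
    with open(path, "rb") as f:
        f.seek(0)                  # move the file pointer to the beginning
        for _ in range(num_records):
            f.read(record_size)
    return time.time() - start
```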
Read Latency (Single Client)
[Figure: single-client read latency (microseconds) vs. record size (1 byte to 65536 bytes) for No Cache, Cache (256), Cache (2K), Cache (8K), Lustre-1DS (Cold), Lustre-4DS (Cold), and Lustre-4DS (Warm)]
• Lustre shows best latency
• Cache provides benefit for small message sizes
Read Multiple Client (32 clients)
[Figure: read latency (microseconds) with 32 clients vs. record size (1 byte to 65536 bytes) for NoCache, IMCa (1), IMCa (2), IMCa (4), Lustre (Cold), and Lustre (Warm)]
• 51% improvement in latency at 16K
• Multiple MCDs help reduce capacity misses
Iozone throughput
[Figure: IOzone throughput (MegaBytes/second) with 1, 2, 4, and 8 threads for No Cache, Cache (1), Cache (2), Cache (4), and Lustre-1DS (Cold)]
• 1, 2, 4, 8 IOzone threads, 1GB files, 2KB block size
• 325 MB/s (NoCache) -> 868 MB/s (4 MCDs)
Read-Shared Latency
[Figure: shared-file read latency (microseconds) vs. number of nodes (2-32) for No Cache, Lustre-1DS (Cold), and MCD (1)]
• IMCa helps improve performance over the NoCache case
Outline of the Talk
• Background and Motivation
• Architecture and Design of IMCa
• Experimental Evaluation of IMCa
• Conclusions and Future Work
Conclusions and Future Work
• Proposed, designed, and evaluated an intermediate cache for GlusterFS
• Good improvement in stat performance
• Improvement in latency/throughput of read operations
– Depends on block size
• Would like to evaluate the performance with RDMA
• Would like to evaluate distribution algorithms
Acknowledgements
Our research is supported by the following organizations
Thank you
{noronha, panda}@cse.ohio-state.edu
Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/