23
1 © 2017 IBM Corporation Alluxio - CAPI Flash Integration for Big Data Frameworks Kavana N Bhat & Shajith Chandran Power Systems, IBM

Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI: IBM CAPI User Space Block Library: What is CAPI? For Partners: 15 16 17 Core

Embed Size (px)

Citation preview

Page 1: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

1

© 2017 IBM Corporation

Alluxio - CAPI Flash Integration for Big Data Frameworks

Kavana N Bhat & Shajith Chandran

Power Systems, IBM

Page 2: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

2

© 2017 IBM Corporation

Big Data Ecosystem Today

Page 3: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

3

© 2017 IBM Corporation

Alluxio (formerly Tachyon)

Memory Speed Virtual Distributed Storage

Enables Virtualized Data Across Multiple Types of Storage

Page 4: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

4

© 2017 IBM Corporation

Why use Alluxio?

Spark Job

Spark mem

Hadoop MR Job

YARN

HDFS / Amazon S3

HDFS

disk

block 1

block 3

block 2

block 4

Alluxio

in-memory Data

Data survive in memory

after computation

crashes

Off-heap storage, no GC

Memory-speed data sharing across

jobs and frameworks

Page 5: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

5

© 2017 IBM Corporation

Storage Tier Hierarchy

Faster

Higher

Capacity

Page 6: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

6

© 2017 IBM Corporation

Automatic Data Migration

Data can be evicted to lower layers if it is “cooling down”

Data can be promoted to upper layers if it is “warming up”

Page 7: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

7 © 2017 IBM Corporation 7

CAPI: Coherent Accelerator Processor Interface

Page 8: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

8

© 2017 IBM Corporation

Typical I/O Model Flow:

Flow with a Coherent Model:

Shared Mem.

Notify Accelerator Acceleration

Shared Memory

Completion

DD Call Copy or Pin

Source Data

MMIO Notify

Accelerator Acceleration

Poll / Interrupt

Completion

Copy or Unpin

Result Data

Ret. From DD

Completion

Application

Dependent, but

Equal to below

Application

Dependent, but

Equal to above

300 Instructions 10,000 Instructions 3,000 Instructions 1,000 Instructions

1,000 Instructions

7.9µs 4.9µs

Total ~13µs for data prep

400 Instructions 100 Instructions

0.3µs 0.06µs

Total 0.36µs

Page 9: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

9

© 2017 IBM Corporation

Modify Alluxio to use the new user-level CAPI Flash block APIs

Requires changes to Alluxio

Alluxio-specific Implementation

Non-reusable

Create a User-Space Filesystem for CAPI Flash

Generic Implementation

Provides standard interfaces for file operations - No changes to Alluxio

Page 10: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

10

© 2017 IBM Corporation

The userspace filesystem is primarily contained in a library(libcflsh_usfs), that runs on top of the

CAPI flash block library API.

Provides analogs of all major filesystem APIs (open, close, read, write, aio_read, aio_write etc).

Provides wrapper layer that allows applications to work without modifications with use of the

library preload feature. Wrapper intercepts filesystem calls from libc intended for libcflsh_usfs and routes them internally.

Page 11: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

11

© 2017 IBM Corporation

Sup

er Pip

e I/O

PHYP/Hardware

Kernel Space

User Space

CAPI Flash

Block Library Alluxio

with CAPI

CAPI Flash Adapter

Alluxio with

Traditional IO

VFS

Filesystem

Disk Driver

User

space FS

Alluxio with CAPI USFS

Page 12: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

12

© 2017 IBM Corporation

Alluxio Performance with CAPI

Alluxio with Traditional FS on Flash

Page 13: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

13

© 2017 IBM Corporation

Alluxio Performance with CAPI

Alluxio with CAPI USFS on Flash

Page 14: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

14

© 2017 IBM Corporation

Alluxio Project: www.alluxio.org

IBM CAPI: http://ibm.biz/powercapi

IBM CAPI User Space Block Library: https://github.com/open-power/capiflash

What is CAPI? http://lt.be.ibm.com/stg/ltu48342

For Partners:

http://www.ibm.com/services/weblectures/dlv/partnerworld/ltu48850

SuperVessel Website: https://ptopenlab.com/cloudlabconsole/#/

Contact: [email protected], [email protected]

Page 15: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

15

© 2017 IBM Corporation

Questions?

Page 16: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

16

© 2017 IBM Corporation

Thank You

Page 17: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

17

© 2017 IBM Corporation 17

© 2015 IBM Corporation

Memory Subsystem

Virt Addr

What was done before CAPI?

POWER8

Core

POWER8

Core

POWER8

Core

POWER8

Core

POWER8

Core

POWER8

Core

App

FPGA PCIE

Variables Input

Data

DD

Device Driver

Storage Area

Variables

Input

Data

Variables

Input

Data

Output

Data

Output

Data

Prior to CAPI, an application called a device driver to utilize an

FPGA Accelerator.

The device driver performed a memory mapping operation.

3 versions of the data (not coherent).

1000s of instructions in the device driver.

Page 18: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

18

© 2017 IBM Corporation 18

© 2015 IBM Corporation

Memory Subsystem

Virt Addr

CAPI Coherency

POWER8

Core

POWER8

Core

POWER8

Core

POWER8

Core

POWER8

Core

POWER8

Core App

FPGA PCIE

With CAPI, the FPGA shares memory with the cores

PS

L

Variables Input

Data

Output

Data

1 coherent version of the data.

No device driver call/instructions.

Page 19: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

19

© 2017 IBM Corporation 19

© 2015 IBM Corporation

Typical I/O Model Flow:

Flow with a Coherent Model:

Shared Mem.

Notify Accelerator Acceleration

Shared Memory

Completion

DD Call Copy or Pin

Source Data

MMIO Notify

Accelerator Acceleration

Poll / Interrupt

Completion

Copy or Unpin

Result Data

Ret. From DD

Completion

Application

Dependent, but

Equal to below

Application

Dependent, but

Equal to above

300 Instructions 10,000 Instructions 3,000 Instructions 1,000 Instructions

1,000 Instructions

7.9µs 4.9µs

Total ~13µs for data prep

400 Instructions 100 Instructions

0.3µs 0.06µs

Total 0.36µs

CAPI vs. I/O Device Driver: Data Prep

Page 20: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

20

© 2017 IBM Corporation

PCIe

How CAPI Works

Algorithm Algo m rith

POWER8 Processor

Acceleration Portion:

Data or Compute Intensive,

Storage or External I/O

Application Portion:

Data Set-up, Control

Sharing the same memory space

Accelerator is a peer to POWER8 Core

CAPI Developer Kit Card

Page 21: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

21

© 2017 IBM Corporation

FPGA

POWER8

Core

CA

PP

P

CIe

CAPI technology connections

• Proprietary hardware to enable coherent acceleration

• Operating system enablement

• Ubuntu LE

• Libcxl function calls

• Customer application and accelerator

• Application sets up data and calls the accelerator functional unit (AFU)

• AFU reads and writes coherent data across the PCIe and communicates with the application

• PSL cache holds coherent data for quick AFU access

POWER8 Processor

OS

App

Memory (Coherent)

AFU

IBM Supplied PSL

Page 22: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

22

© 2017 IBM Corporation 22

© 2015 IBM Corporation

2 2 Set Work Element

Descriptor (WED) at

AddrX – may contain

addresses of other data

structures

Understands WED content - and

any other addressed data

structures

AFU reserved for work Open device

cxl_afu_open_dev

1 Connect to

accelerator

App

OS

IBM Supplied

PSL

AFU

If required, App can

read or write AFU

registers

5 MMIO interface

AFU continues to work

using this interface

Reset AFU

PSL_WED_Ax is

set to AddrX

AFU_CNTL_An[E]

is set

jea gets AddrX

jcom gets start

CTL interface Start accelerator 3 Attach device

cxl_afu_attach

6 6 AFU finishes

(Mechanism is user defined)

De-assert RUNNING

Assert DONE

App knows AFU is finished

(Mechanism is user

defined)

App can start again

from top or free AFU

CTL interface

Free device

cxl_afu_free

CAPI solution flow

Resp interface

CMD interface

Buffer interface 4

AFU fetches AddrX (the WED)

starts operation

Page 23: Alluxio - CAPI Flash Integration for Big Data Frameworks CAPI:  IBM CAPI User Space Block Library:  What is CAPI?  For Partners:  15 16 17 Core

23

© 2017 IBM Corporation

IBM Data Engine for NoSQL

Load Balancer

500GB Cache

Node

10Gb Uplink

POWER8 Server

Flash Array w/ up

to 40TB

After: NoSQL POWER8 + CAPI Flash

WWW

10Gb Uplink

WWW

Backup Nodes

500GB Cache

Node 500GB Cache

Node 500GB Cache

Node 500GB Cache

Node

Before: NoSQL in memory (x86)

24U 4U

Less is More

24:1 physical server consolidation =

6x less rack space

Infrastructure Requirements

- Large Distributed (Scale out)

- Large Memory per node

- Networking Bandwidth Needs

- Load Balancing

Acceptable

latency

CAPI

Memory

Conventional PCIe I/O

network

network

network

24:1 infrastructure consolidation

Upto 3x cost savings

6x less rack space 2U server + 2U FlashSystem vs. typical deployment

Infrastructure Consolidation