OPTIMIZING FORESTDB FOR FLASH-BASED SSD

Sang-Won Lee, Professor, Sungkyunkwan University
Sundar Sridharan, Senior Software Engineer, Couchbase Inc.

Optimizing forest db for flash based ssd: Couchbase Connect 2015


©2015 Couchbase Inc.


Contents

▪ Introduction
▪ SHARE Interface in Flash-Based SSD for ForestDB
▪ ForestDB Optimizations at File System Layer
▪ Evaluation Results
▪ Future Work
▪ Summary

Introduction

▪ It is the all-flash storage era!
▪ System software still carries the legacy of the hard disk era
  ▪ Suboptimal on top of flash storage
▪ ForestDB: next-generation KV engine of Couchbase
▪ Opportunities
  ▪ Exploit flash storage characteristics (SHARE interface)
  ▪ Leverage modern CoW-based file systems

SHARE Interface in Flash-Based SSD for ForestDB

Characteristics of Flash Storage (vs. Hard Disk)

▪ No overwrite, and the flash translation layer (FTL)
  ▪ In-place overwrite is not allowed
  ▪ The FTL adds another layer of address mapping inside flash storage
▪ Limited lifetime
▪ Write time in flash storage is roughly proportional to the amount written
  ▪ Write time on a hard disk is dominated by mechanical disk head movement

Copy-on-Write in ForestDB

▪ Document update
  ▪ Copy-on-write instead of in-place update
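The copy-on-write update can be illustrated with a toy model. This is only a sketch, not ForestDB code: the class `AppendOnlyStore` and its in-memory list standing in for the database file are assumptions made for illustration.

```python
class AppendOnlyStore:
    """Toy copy-on-write store: updates append a new version, never overwrite."""

    def __init__(self):
        self.log = []      # stands in for the database file; entries are never modified
        self.index = {}    # key -> position of the latest version in the log

    def put(self, key, value):
        self.log.append((key, value))          # append the new version at the end
        self.index[key] = len(self.log) - 1    # repoint the index; the old version is now stale

    def get(self, key):
        return self.log[self.index[key]][1]


store = AppendOnlyStore()
store.put("doc1", "v1")
store.put("doc1", "v2")    # update: appended, not overwritten
print(store.get("doc1"))   # v2
print(len(store.log))      # 2: the stale "v1" entry still occupies space
```

The stale entry left behind by the second `put` is the kind of garbage that compaction later has to reclaim.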

Copy-On-Write in ForestDB (2)

▪ Why CoW?
  ▪ 1) Write atomicity and 2) multi-version concurrency control
  ▪ A reasonable solution on HDD
▪ Problems with CoW in flash storage
  ▪ Tree-wandering write amplification, hence low performance
  ▪ Reduced flash storage lifetime
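A back-of-the-envelope calculation shows why tree wandering is costly. The sketch below assumes a simple copy-on-write tree in which every update rewrites one index node per level from leaf to root; the function name, the 4096-byte node size, and the example numbers are hypothetical, and a real engine that batches index updates would see different figures.

```python
def cow_write_amplification(doc_bytes, tree_height, node_bytes=4096):
    """Bytes physically appended per logical byte updated, assuming one
    rewritten index node per tree level plus the document itself."""
    written = doc_bytes + tree_height * node_bytes
    return written / doc_bytes


# A 512-byte document under a 3-level index (illustrative numbers only).
print(cow_write_amplification(512, 3))   # 25.0
```

Under these assumptions a 512-byte update causes 12,800 bytes of writes, which costs both time and flash endurance.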

Opportunities in Flash Storage

▪Address mapping inside flash storage (by FTL)

Opportunities in Flash Storage (2)

▪SHARE interface: explicit address remapping
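The idea of explicit remapping can be sketched with a toy FTL. Everything here is an assumption for illustration: `ToyFTL` and its `share` method are hypothetical names, and the real SHARE command is implemented in SSD firmware, not in host-side Python.

```python
class ToyFTL:
    """Toy flash translation layer: logical block address (LBA) -> physical page."""

    def __init__(self):
        self.mapping = {}      # LBA -> physical page number
        self.pages = []        # physical pages, each written once (no overwrite)
        self.page_writes = 0   # number of physical page programs

    def write(self, lba, data):
        self.pages.append(data)                   # always program a fresh page
        self.mapping[lba] = len(self.pages) - 1   # remap the LBA to it
        self.page_writes += 1

    def read(self, lba):
        return self.pages[self.mapping[lba]]

    def share(self, dst_lba, src_lba):
        # Hypothetical SHARE-style command: dst_lba now points at the same
        # physical page as src_lba. Only the mapping table changes.
        self.mapping[dst_lba] = self.mapping[src_lba]


ftl = ToyFTL()
ftl.write(10, b"document A")
ftl.share(99, 10)         # "copy" LBA 10 to LBA 99 without writing any data
print(ftl.read(99))       # b'document A'
print(ftl.page_writes)    # 1
```

In this model the copy costs one mapping-table update and zero page writes.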

Opportunities in Flash Storage (3)

▪ ForestDB compaction with SHARE
  ▪ Valid documents are not rewritten to the new file

SHARE Implementation

▪ Firmware extension for SHARE
  ▪ OpenSSD board (http://www.openssd-project.org/)
  ▪ Atomic and recoverable

Performance Evaluation

▪ Performance during normal operation: YCSB workload F

Performance Evaluation (2)

▪ Compaction performance

                       Elapsed Time (sec)   Written Bytes (MB)
Original ForestDB      227.5                1126.4
ForestDB with SHARE    88.4                 150.6
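The ratios implied by these compaction figures can be checked with a few lines of arithmetic. The variable names are illustrative; the four numbers are the ones reported on this slide.

```python
# Figures from the compaction measurements on this slide.
original = {"elapsed_s": 227.5, "written_mb": 1126.4}
share = {"elapsed_s": 88.4, "written_mb": 150.6}

speedup = original["elapsed_s"] / share["elapsed_s"]
write_reduction = original["written_mb"] / share["written_mb"]

print(f"{speedup:.1f}x faster, {write_reduction:.1f}x fewer bytes written")
# 2.6x faster, 7.5x fewer bytes written
```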

ForestDB Optimizations at File System Layer

Overview

▪Motivation – the catch-22

▪Why B-Tree file system (Btrfs)

▪How ForestDB solves the catch-22 using Btrfs

▪Optimizing with the Linux asynchronous I/O library (libaio)

▪Performance Results

Append-Only Key-Value Stores are Great!

▪ Consistency
  ▪ Stable access to multiple point-in-time snapshots of data
▪ Performance with isolation
  ▪ Multi-Version Concurrency Control (MVCC) means readers and writers do not block each other
▪ Recoverability
  ▪ Can easily roll back the entire database to a stable past state
▪ SSD friendly
  ▪ Avoids in-place updates and the flash translation layer (FTL) remapping they trigger
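Snapshots and rollback fall out of the append-only layout almost for free. The following is a minimal sketch under the assumption that a snapshot is nothing more than a remembered log length; `SnapshotLog` is a hypothetical name and the linear scan in `get` is for clarity, not how a real index works.

```python
class SnapshotLog:
    """Toy append-only log where a snapshot is just a remembered log length."""

    def __init__(self):
        self.log = []   # (key, value) entries, append-only

    def put(self, key, value):
        self.log.append((key, value))

    def snapshot(self):
        return len(self.log)   # a stable point-in-time marker

    def get(self, key, at=None):
        end = len(self.log) if at is None else at
        for k, v in reversed(self.log[:end]):   # newest visible version wins
            if k == key:
                return v
        return None

    def rollback(self, snap):
        del self.log[snap:]    # discard everything appended after the snapshot


db = SnapshotLog()
db.put("k", "old")
snap = db.snapshot()
db.put("k", "new")
print(db.get("k", at=snap))   # old: the snapshot reader is unaffected by the later write
print(db.get("k"))            # new
db.rollback(snap)
print(db.get("k"))            # old
```

Because existing entries are never modified, a reader holding `snap` needs no lock against the writer in this model.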

Append-Only KV Stores are Great!

MVCC: Readers & Writer Run Unblocked!

But...

▪Disk can fill up with stale data

▪Need to do garbage collection - Compaction
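Copy-based compaction can be sketched as follows. This is a simplified model, not ForestDB's algorithm: the function `compact` and the list-of-tuples file representation are assumptions for illustration.

```python
def compact(log):
    """Return a new log holding only the latest version of each key."""
    latest = {}
    for position, (key, _value) in enumerate(log):
        latest[key] = position          # later entries supersede earlier ones
    live = sorted(latest.values())      # keep the original append order
    return [log[position] for position in live]


old_file = [("a", 1), ("b", 1), ("a", 2), ("a", 3), ("b", 2)]
new_file = compact(old_file)
print(new_file)                        # [('a', 3), ('b', 2)]
print(len(old_file) - len(new_file))   # 3 stale entries reclaimed
```

Note that `old_file` and `new_file` both exist until the old one is deleted, so this style of compaction temporarily needs extra space for every live entry.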

Compactions Do Garbage Collection...

Compactions for Garbage Collection

A Fundamental Problem with Disk Space

What if the size of active data exceeds the free space available?

Writer appends too much data

A Fundamental Problem: Catch-22

“My disk is getting full... I want to free up space but don’t have enough free space to free up space!”

Size of active data must be strictly less than the free space available on disk!!
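The constraint can be written as a one-line check. The helper `can_compact` and all sizes below are hypothetical and chosen only to show a disk that is stuck.

```python
def can_compact(active_bytes, free_bytes):
    """Copy-based compaction writes all active data to a new file before the
    old file is deleted, so the active data must fit in the free space."""
    return active_bytes < free_bytes


GB = 1024 ** 3
disk = 20 * GB
file_size = 15 * GB        # old database file: stale plus active data
active = 8 * GB            # data that compaction must copy
free = disk - file_size    # 5 GB

print(can_compact(active, free))   # False: 8 GB of active data, only 5 GB free
```

In this example 7 GB of the file is reclaimable garbage, yet compaction cannot start because the 8 GB of live data has nowhere to go.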

B-Tree File System (Btrfs)

▪Btrfs is a copy-on-write filesystem for Linux

▪Development began at Oracle in 2007; marked as stable since August 2014 (http://goo.gl/upukn4)

▪Industry support from Facebook, Fujitsu, Fusion-io, Intel, Netgear, Novell/SUSE, Oracle, Red Hat, etc.

▪Available as an option in all major Linux distributions

Btrfs Features (Short list)

▪ Max file size up to 16 exbibytes (1 exbibyte in ext4)
▪ Self healing due to copy-on-write nature
▪ Online defragmentation
▪ Online volume growth and shrinking
▪ Online block device addition and removal
▪ Block discards for improved wear levelling on SSDs using TRIM
▪ Transparent compression, configurable per file or volume
▪ Online data scrubbing
▪ Send/receive of diffs
▪ Snapshots and subvolumes

▪File Cloning!

Btrfs Basics - Representation

File P with reference counted extents

Btrfs Feature - Copy File Range

The copy-file-range API lets a new file “Q” share physical disk extents with file “P”

Btrfs Feature - Blocks shared across files

Copy-on-write lets new updates happen on File Q

Btrfs Basics - Deleting File

Deleting file Q

Btrfs Basics - Freeing up space

Freeing up space
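The sequence on the preceding Btrfs slides (reference-counted extents, range cloning, copy-on-write, deletion) can be mimicked with a toy model. This is a sketch of the bookkeeping only, not of Btrfs internals; `ToyExtentFS` and its methods are hypothetical names.

```python
class ToyExtentFS:
    """Toy model of reference-counted extents shared between files."""

    def __init__(self):
        self.refcount = {}   # extent id -> number of files referencing it
        self.files = {}      # file name -> list of extent ids
        self.next_id = 0

    def _new_extent(self):
        self.next_id += 1
        self.refcount[self.next_id] = 1
        return self.next_id

    def _unref(self, ext):
        self.refcount[ext] -= 1
        if self.refcount[ext] == 0:
            del self.refcount[ext]   # extent is free once nothing references it

    def create(self, name, n_extents):
        self.files[name] = [self._new_extent() for _ in range(n_extents)]

    def clone_range(self, src, dst, start, stop):
        # Share extents instead of copying data: only reference counts change.
        shared = self.files[src][start:stop]
        for ext in shared:
            self.refcount[ext] += 1
        self.files.setdefault(dst, []).extend(shared)

    def write(self, name, index):
        # Copy-on-write: the writer gets a fresh extent, other files keep the old one.
        self._unref(self.files[name][index])
        self.files[name][index] = self._new_extent()

    def delete(self, name):
        for ext in self.files.pop(name):
            self._unref(ext)

    def used_extents(self):
        return len(self.refcount)


fs = ToyExtentFS()
fs.create("P", 4)                # P owns extents 1..4
fs.clone_range("P", "Q", 1, 3)   # Q shares extents 2 and 3 with P
print(fs.used_extents())         # 4: the clone consumed no new data extents
fs.write("Q", 0)                 # Q's first extent becomes new extent 5; P still sees extent 2
print(fs.used_extents())         # 5
fs.delete("Q")                   # frees extent 5 only; extent 3 survives because P references it
print(fs.used_extents())         # 4
```

Deleting a file in this model releases only the extents that no other file still references, which is the property the cloning-based compaction relies on.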

ForestDB Compaction Using Btrfs Cloning

Compaction uses Btrfs to clone (copy-on-write) valid block ranges from the old file into the new file...

ForestDB Compaction Using Btrfs Cloning

Deleting the old file.fdb.0 frees only the space belonging to stale blocks. Valid blocks of file.fdb.1 stay intact!

Performance Results

Ubuntu 14.04, Btrfs v3.12, 4 CPU cores, 20GB SSD drive, 8GB DRAM

Performance (1) – ForestDB on Btrfs

~1.25 - 2 X Faster! ½ write amplification!

Performance (2) – ForestDB on Btrfs

~1.5 - 4 X Faster! ½ write amplification!

Performance (3) – ForestDB on Btrfs

~2 X Faster! ½ write amplification!

Speeding up Reads with libaio

▪Modern SSDs have multiple I/O channels

▪Asynchronous I/O maximizes throughput

▪Well suited for ForestDB compaction tasks
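The benefit of keeping several reads in flight can be sketched in Python. This example does not use libaio: as a stand-in it issues positional reads (`os.pread`) from a thread pool, which is a different mechanism with a similar effect of overlapping I/O requests. The function name, block size, and worker count are assumptions for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096


def read_blocks_concurrently(path, block_numbers, workers=8):
    """Read the given blocks with several requests in flight at once."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread takes an explicit offset, so threads do not share a file position.
            futures = [pool.submit(os.pread, fd, BLOCK, n * BLOCK) for n in block_numbers]
            return [f.result() for f in futures]
    finally:
        os.close(fd)


# Build a small test file of 16 blocks, each filled with its own block number.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for n in range(16):
        tmp.write(bytes([n]) * BLOCK)
    path = tmp.name

blocks = read_blocks_concurrently(path, [3, 7, 11])
print([b[0] for b in blocks])   # [3, 7, 11]
os.remove(path)
```

Compaction has to fetch valid documents scattered across the old file, so it naturally produces many independent reads of this kind.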

Performance (4) ForestDB on Btrfs with libaio

13X faster!

7X faster!

4X faster!

Advantages of Btrfs with libaio

▪Efficiently uses disk space avoiding the catch-22

▪ Reduces write amplification by 2X
  ▪ Longer SSD lifespan due to reduced wear

▪Over 13 X faster compaction speeds

▪Generic file system layer solution that applies to SSD as well as spinning disks

Future Work

▪ Optimize Btrfs clone feature for better performance
  ▪ Working with the Linux Btrfs community
▪ Optimize ForestDB to skip reading when cloning during compaction
▪ Adapt the ext4 file system by adding a new system call that allows physical blocks to be shared among multiple files

Summary

▪ ForestDB with SHARE interface in SSD
  ▪ Speeds up compactions by 3X with 10X lower write amplification
▪ ForestDB with Btrfs clone feature in file system layer
  ▪ Speeds up compactions by 2X with 2X lower write amplification
▪ ForestDB with Btrfs clone feature and Linux libaio
  ▪ Speeds up compactions by 13X with 2X lower write amplification

Questions?

Sang-Won Lee

Sundar Sridharan

Initial Load Performance

3x ~ 6x less time

Initial Load Performance

4x less write overhead

Read-Only Performance

[Figure: Throughput (operations per second) vs. # reader threads (1, 2, 4, 8) for ForestDB, LevelDB, and RocksDB. Annotation: 2x ~ 5x]

Write-Only Performance

[Figure: Throughput (operations per second) vs. write batch size in # documents (1, 4, 16, 64, 256) for ForestDB, LevelDB, and RocksDB. Annotation: 3x ~ 5x]

- Small batch sizes (e.g., < 10) are not common

Write-Only Performance

[Figure: Write amplification (normalized to a single doc size) vs. write batch size in # documents (1, 4, 16, 64, 256) for ForestDB, LevelDB, and RocksDB]

ForestDB shows 4x ~ 20x less write amplification

Mixed Workload Performance

[Figure: Mixed (Unrestricted) Performance, operations per second vs. # reader threads (1, 2, 4, 8) for ForestDB, LevelDB, and RocksDB. Annotation: 2x ~ 5x]