69
FlashBlox: Achieving Both Performance Isolation and Uniform Lifetime for Virtualized SSDs Jian Huang Anirudh Badam Laura Caulfield Suman Nath Sudipta Sengupta Bikash Sharma Moinuddin K. Qureshi

FlashBlox: Achieving Both Performance Isolation and …nvmw.ucsd.edu/nvmw18-program/unzip/current/nvmw2018...Anirudh Badam Laura Caulfield Suman Nath † ‡ ‡ FlashBlox: Achieving

  • Upload
    hadiep

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

FlashBlox: Achieving Both Performance Isolation

and Uniform Lifetime for Virtualized SSDs

Jian Huang † Anirudh Badam Laura Caulfield

Suman Nath Sudipta Sengupta Bikash Sharma Moinuddin K. Qureshi

† ‡

Flash Has Changed Over the Last Decade

2

Performance

Improvement

100x lower latency

5,000x higher throughput

Flash Has Changed Over the Last Decade

2

Increased

Parallelism

Dozens of

parallel chips

Performance

Improvement

100x lower latency

5,000x higher throughput

Flash Has Changed Over the Last Decade

2

Increased

Parallelism

Dozens of

parallel chips

Became

Commodity

Less than $0.2/GB

Performance

Improvement

100x lower latency

5,000x higher throughput

Flash Has Changed Over the Last Decade

2

Increased

Parallelism

Dozens of

parallel chips

Became

Commodity

Less than $0.2/GB

Significant improvements on Flash

Performance

Improvement

100x lower latency

5,000x higher throughput

Shared Flash-Based Solid State Disk (SSD) in the Cloud

3

…….

Shared Flash-Based Solid State Disk (SSD) in the Cloud

3

…….

SSDs are virtualized and shared in data centers

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

Channel

Chip Chip

……… …

Flash Translation Layer

Performance Interference in Shared SSD

4

…….

Flash-based SSD: A Black Box

Write Read

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

Channel

Chip Chip

……… …

Flash Translation Layer

Performance Interference in Shared SSD

4

…….

Flash-based SSD: A Black Box

Write ReadRead/write interferences cause long (3x) tail latency!

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

Channel

Chip Chip

……… …

Flash Translation Layer

Performance Interference in Shared SSD

4

…….

Write Read

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

Channel

Chip Chip

……… …

Flash Translation Layer

Performance Interference in Shared SSD

4

…….

Write Read

FlashBlox: Hardware Isolation in Cloud Storage

5

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

FlashBlox: Hardware Isolation in Cloud Storage

5

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

Leveraging parallel chips for hardware isolation

Internal Parallelism Enables Hardware Isolation

6

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Internal Parallelism Enables Hardware Isolation

6

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Channel-Level Parallelism

Internal Parallelism Enables Hardware Isolation

6

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Channel-Level Parallelism

Chip-Level Parallelism

Internal Parallelism Enables Hardware Isolation

6

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Channel-Level Parallelism

Chip-Level Parallelism

Plane-Level

Parallelism

Plane-level parallelism is constrained

as each chip contains only one address buffer

Internal Parallelism Enables Hardware Isolation

6

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Channel-Level Parallelism

Chip-Level Parallelism

Plane-Level

Parallelism

Different parallelism level provides different isolation guarantee

New Abstractions for Hardware Isolation

7

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

New Abstractions for Hardware Isolation

7

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Virtual SSD

(Chip Level)

Virtual SSD

(Channel Level)

Virtual SSD

(Plane Level)

High Medium Low

New Abstractions for Hardware Isolation

7

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

Virtual SSD

(Chip Level)

Virtual SSD

(Channel Level)

Virtual SSD

(Plane Level)

High Medium Low

Software-based

Hardware Isolation Meets the Pay-As-You-Go Model in Cloud

8

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

vSSD (Chip)vSSD (Channel) vSSD (Software)

Hardware Isolation Meets the Pay-As-You-Go Model in Cloud

8

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

vSSD (Chip)vSSD (Channel) vSSD (Software)

Azure

DocumentDB

Azure

SQL DatabaseAmazon

DynamoDB

Hardware Isolation Meets the Pay-As-You-Go Model in Cloud

8

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

vSSD (Chip)vSSD (Channel) vSSD (Software)

Azure

DocumentDB

Azure

SQL DatabaseAmazon

DynamoDB

Throughput

Single Partition Size

Price

Hardware Isolation Meets the Pay-As-You-Go Model in Cloud

8

Channel

Chip

plane plane

Chip

plane plane

Channel

Chip

plane plane

Chip

plane plane

…………… ……… … …

Channel

Chip

plane plane

Chip

plane plane

……… …

vSSD (Chip)vSSD (Channel) vSSD (Software)

Hundreds of vSSDs can be

supported in a single server

Impact of Hardware Isolation on SSD Lifetime

9

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

Impact of Hardware Isolation on SSD Lifetime

9

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Ave

rage

#B

lock

s Era

sed/s

ec

The average rate at which flash blocks are erased

Impact of Hardware Isolation on SSD Lifetime

9

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Ave

rage

#B

lock

s Era

sed/s

ec

The average rate at which flash blocks are erased

Impact of Hardware Isolation on SSD Lifetime

9

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Ave

rage

#B

lock

s Era

sed/s

ec

The average rate at which flash blocks are erased

Flash blocks wear out at different rate with different workload

Impact of Hardware Isolation on SSD Lifetime

9

Channel

Chip Chip

Channel

Chip Chip

…………… ……… … …

…….

Channel

Chip Chip

……… …

Flash Translation Layer

Write Intensive

FlashBloxChallenges

10

Chip

Chip

Chip

AppApp App

SSD Lifetime Performance Isolation

FlashBloxChallenges

10

Chip

Chip

Chip

AppApp App

SSD Lifetime Performance Isolation

Chip

Chip

Chip

AppApp App

SSD Lifetime Performance Isolation

FlashBloxChallenges

10

Chip

Chip

Chip

AppApp App

SSD Lifetime Performance IsolationSSD Lifetime

FlashBlox: Swapping Channels for Wear Balance

11

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

Adjusting the wear imbalance at a more coarse time granularity

can achieve near-ideal SSD lifetime

FlashBlox: Swapping Channels for Wear Balance

11

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

The channel that has incurred the maximum wearout

The channel that has the minimum rate of wearout

FlashBlox: Swapping Channels for Wear Balance

11

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

Channel migration takes 15 minutes, once per 19 days

Overall performance drops only for 0.04% of all the time

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

Imbalance = MaxWear / AvgWear

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M

Imbalance = MaxWear / AvgWear4

App

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M M

Imbalance = MaxWear / AvgWear42

App

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M M M

Imbalance = MaxWear / AvgWear424/3

App

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M M M M

Imbalance = MaxWear / AvgWear424/31

App

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M M M M

Imbalance = MaxWear / AvgWear424/31

M

8/5

M

4/3

M

8/7

M

1

App

How Frequently Should We Swap?

12

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M M M M

Imbalance = MaxWear / AvgWear424/31

M

8/5

M

4/3

M

8/7

M

1

App

How many times should we swap within SSD lifetime?

Quantifying the Swapping Frequency

13

Quantifying the Swapping Frequency

13

after K rounds of cycling:

Quantifying the Swapping Frequency

13

after K rounds of cycling:

Wear Imbalance = (MK + M)/(MK + M/N) = (K + 1)/(K + 1/N) ≤ (1 + x)

Maximum Wearout Average Wearout

Quantifying the Swapping Frequency

13

after K rounds of cycling:

Wear Imbalance = (MK + M)/(MK + M/N) = (K + 1)/(K + 1/N) ≤ (1 + x)

K ≥ (N – 1 – x) / (Nx)

Quantifying the Swapping Frequency

13

after K rounds of cycling:

Wear Imbalance = (MK + M)/(MK + M/N) = (K + 1)/(K + 1/N) ≤ (1 + x)

K ≥ (N – 1 – x) / (Nx)

If N = 16, x = 0.1, then K = 9, which means after swap NK = 148 times,

we can guarantee the wear imbalance is bounded in 1.1

Quantifying the Swapping Frequency

13

after K rounds of cycling:

Wear Imbalance = (MK + M)/(MK + M/N) = (K + 1)/(K + 1/N) ≤ (1 + x)

K ≥ (N – 1 – x) / (Nx)

If N = 16, x = 0.1, then K = 9, which means after swap NK = 148 times,

we can guarantee the wear imbalance is bounded in 1.1

For an SSD with 5 years lifetime, swap once per 12 days

can guarantee the channels are well balanced for worst case

Adaptive Wear Leveling in Practice

14

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M

M/3 M/20

App App App App

Adaptive Wear Leveling in Practice

14

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

M

M/3 M/20

App App

Using erase rate as the trigger condition for swapping

App App

Intra Channel Wear Leveling

15

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

Intra Channel Wear Leveling

15

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

Chips will be swapped along with the channel migration

Chip

Intra Channel Wear Leveling

15

Channel 1

Use

d E

rase

Cyc

les

Channel 2 Channel 3 Channel 4

Chips will be swapped along with the channel migration

Chip

Intra-chip wear leveling mechanisms +

FlashBlox Architecture

16

App

Channel-Level

Wear Leveling

Flash

Resource

Manager

Chip-Level

Wear Leveling

FlashBlox Architecture

16

App

Channel-Level

Wear Leveling

Flash

Resource

Manager

App

Virtual SSD

App

Virtual SSD…

Chip-Level

Wear Leveling

Isolation, Bandwidth & Capacity Requirement

(Virtual SSD to Parallel Chips Mappings)

FlashBlox Architecture

16

App

Channel-Level

Wear Leveling

Flash

Resource

Manager

App

Virtual SSD

App

Virtual SSD…

Chip-Level

Wear Leveling

Isolation, Bandwidth & Capacity Requirement

(Virtual SSD to Parallel Chips Mappings)

FlashBlox Architecture

16

App

Channel-Level

Wear Leveling

Flash

Resource

Manager

App

Virtual SSD

App

Virtual SSD…

Chip-Level

Wear Leveling

Pay-As-You-Go Model in Cloud

Inter Channel

Swapping

Isolation, Bandwidth & Capacity Requirement

(Virtual SSD to Parallel Chips Mappings)

FlashBlox Architecture

16

App

Channel-Level

Wear Leveling

Flash

Resource

Manager

App

Virtual SSD

App

Virtual SSD…

Chip-Level

Wear Leveling

Channel Channel

Intra Channel

Swapping

Intra Channel

Swapping

Other FTL

Algorithms

… … …

Inter Channel

Swapping

Isolation, Bandwidth & Capacity Requirement

(Virtual SSD to Parallel Chips Mappings)

FlashBlox Architecture

16

App

Channel-Level

Wear Leveling

Flash

Resource

Manager

App

Virtual SSD

App

Virtual SSD…

Chip-Level

Wear Leveling

Channel Channel

FlashBlox Experimental Setup

17

14 data center workloads

16 channels

4 chips

4 planes

16 KB page size

Yahoo Cloud Service Benchmark

Bing Search / Index / PageRank

Transactional Database

Azure Storage

Tail Latency Reduction with FlashBlox

18

App1 App2

Tail Latency Reduction with FlashBlox

18

App1 App2 A: Session store recording recent actions

B: Photo tagging

C: User profile cache

D: User status update

E: Threaded conversations

F: User database

Tail Latency Reduction with FlashBlox

18

0

100

200

300

400

500

600

700

A+A A+B A+C A+D A+E A+F

99th

Perc

entile

Lat

ency

(mic

rose

cons)

Yahoo Cloud Service Benchmark (YCSB)

App1-Software Isolation App1-FlashBlox

App2-Software Isolation App2-FlashBlox

Tail latency reduction: 2.6x, average latency reduction: 1.4x

App1 App2

Impact of Channel Migration on Application Performance

19

0

0.1

0.2

0.3

0.4

0.5

0.6

Lat

ency

(m

illis

eco

nds)

Time (Seconds)

Bing Search’s Performance During Channel Migration

Without MigrationWith Migration

Impact of Channel Migration on Application Performance

19

0

0.1

0.2

0.3

0.4

0.5

0.6

Lat

ency

(m

illis

eco

nds)

Time (Seconds)

Bing Search’s Performance During Channel Migration

Without MigrationWith Migration

34%

Impact of Channel Migration on Application Performance

19

Channel migration takes 15 minutes, once per 19 days

Overall performance drops only for 0.04% of all the time

0

0.1

0.2

0.3

0.4

0.5

0.6

Lat

ency

(m

illis

eco

nds)

Time (Seconds)

Bing Search’s Performance During Channel Migration

Without MigrationWith Migration

34%

FlashBlox Summary

20

2.6x reduction on tail latency

Near-ideal SSD lifetime

Swap once per 19 days

21

Thanks!Jian Huang†

[email protected]

Anirudh Badam Laura Caulfield Suman Nath

Sudipta Sengupta Bikash Sharma Moinuddin K. Qureshi

Q&A