37
WalB: A Fast and Low Latency Backup System for Block Devices Cybozu Meetup #8 SRE WalB Kota Uchida September 25, 2017 1

WalB: Real-time and Incremental Backup System for Block Devices

Embed Size (px)

Citation preview

Page 1: WalB: Real-time and Incremental Backup System for Block Devices

WalB: A Fast and Low LatencyBackup System for Block Devices

Cybozu Meetup #8 SRE WalB

Kota Uchida

September 25, 2017

1

Page 2: WalB: Real-time and Incremental Backup System for Block Devices

2

About me

▌Kota Uchida

▌SRE team at Cybozu, Inc.

▌A WalB developer

Page 3: WalB: Real-time and Incremental Backup System for Block Devices

3

About Cybozu

▌A large cloud service vendor in Japan.

▌Largest market shares

in field of collaborative software.

▌We serve web applications on our own cloud platform.

kintone: a low-code business app platform

and more

Page 4: WalB: Real-time and Incremental Backup System for Block Devices

#customer companies:

#accesses / day:

write IOs / day:

20,000+

210 millions

24.5 TiB

4

Page 5: WalB: Real-time and Incremental Backup System for Block Devices

5

Service Level Objective

▌24/7 nonstop service

▌99.99% availability (4 min / month)

▌Daily backup (retention period is 14 days)

▌Disaster recover: copy data to a remote site once a day

Page 6: WalB: Real-time and Incremental Backup System for Block Devices

Architecture of our platform

6

ApplicationServer

L7LB

Storage Server

dm-snap

Storage Server

dm-snap

Backup Server

Remote Site

DatabaseServer

DiffDiff

DiffDiff

The scope of this talk

RAID 1Blob

Server

Page 7: WalB: Real-time and Incremental Backup System for Block Devices

MappingInfo

Snapshot Managementwith dm-snap

7

A B

Original Volume Area

Snapshot Area

Logical Structure

Physical Structure

(1) CoW

Latest Image

Write A’ Write B’

Snapshot Image

(2) Write

B’

B

B’

A

A’

A’

0 1 2 3 4

Page 8: WalB: Real-time and Incremental Backup System for Block Devices

Backup using dm-snap

8

Snapshot1

(2) Full-scan a new snapshot

Logical Structure

Snapshot0

B’A’

(3) Generate a diff imageby comparing two snapshots

B

(1) Full-scan an old snapshot

B’A’

A

Page 9: WalB: Real-time and Incremental Backup System for Block Devices

Full-scan at night

9

Daytime

Backup processing time

o’clock

Page 10: WalB: Real-time and Incremental Backup System for Block Devices

UX degradationduring a full-scan

10Full-scanning

Page 11: WalB: Real-time and Incremental Backup System for Block Devices

11

We have no more “nights”

▌Until now:

Full scan is allowed only when access rate is low, i.e., at night.

▌From now on:

We have to handle accesses from multiple timezones.

▌We must be able to backup any time without UX degradation.

Page 12: WalB: Real-time and Incremental Backup System for Block Devices

12

New Solution

▌We need a new solution with:

No IO spikes

Short backup time

▌We compared dm-thin with WalB

Page 13: WalB: Real-time and Incremental Backup System for Block Devices

13

What is dm-thin?

▌dm-thin provides thin-provisioning volume management to

share same data among volumes

reduce disk usage using snapshots

▌In the mainline Linux kernel

Page 14: WalB: Real-time and Incremental Backup System for Block Devices

Snapshot Managementwith dm-thin

Logical Structure

Physical Structure

A

Latest Tree

Latest Image A

Page 15: WalB: Real-time and Incremental Backup System for Block Devices

Snapshot Managementwith dm-thin

15

Logical Structure

Physical Structure

A

Snapshot Tree Latest Tree

ASnapshot

Latest Image A

Page 16: WalB: Real-time and Incremental Backup System for Block Devices

Snapshot Managementwith dm-thin

16

A A’

Snapshot Tree Latest Tree

(1) CoW

(1) CoW

Write A’

Physical Structure

(2) Write

(2) Update

A’

ASnapshot

Latest Image

Logical Structure

Page 17: WalB: Real-time and Incremental Backup System for Block Devices

17

A B B’

Snapshot0 Snapshot1

A’

A’ B’

A BSnapshot0

Snapshot1

Generate a diff image using dm-thin metadata

Logical Structure

Physical Structure

Backup using dm-thin

Page 18: WalB: Real-time and Incremental Backup System for Block Devices

18

What is WalB?

▌A real-time and incremental backup system

developed at Cybozu Labs

▌Can backup block devices without IO spikes

dm-snapfull scanning

WalBno spikes

Page 19: WalB: Real-time and Incremental Backup System for Block Devices

Special Block Devices for WalB

19

WalB device

Data device Log device

Read Write

Any application (File system, DBMS, etc.)

Linear mapped Ring buffer

Page 20: WalB: Real-time and Incremental Backup System for Block Devices

Write IO Logging and Backup with WalB

20

A B

Data Device Log Device

0 1 2 3 4

Time series of write I/Os

Time

Page 21: WalB: Real-time and Incremental Backup System for Block Devices

Write IO Logging and Backup with WalB

21

B

A B

Write A’

Data Device Log Device

A’

0 1 2 3 4

1 A’

Time series of write I/Os

Time

Scan the log device and generate a diff image

Page 22: WalB: Real-time and Incremental Backup System for Block Devices

Write IO Logging and Backup with WalB

22

B

A B

B’

Write A’

Write B’

Data Device Log Device

A’

A’ 41

0 1 2 3 4

A’

A’ B’

Time series of write I/Os

Scan the log device and generate a diff image

Time

1

Page 23: WalB: Real-time and Incremental Backup System for Block Devices

23

Performance test

▌Compared dm-snap, dm-thin, and WalB

▌Executed a workload during a backup

The workload & the backup will affect each other

▌Measured the following metrics:

Latencies of the workload

Backup time

Page 24: WalB: Real-time and Incremental Backup System for Block Devices

24

Environment & Settings

▌Test environment:

CPU:2.40 GHz x 12 cores

MEM:192 GiB

HDD:4 TB HDD, RAID 6 (8D2P)

NIC:10 Gbps x 2

Kernel:4.11 (latest upstream)

▌Test settings:

100 GiB volumes

Workload: 4 KiB Random writes for a 5 GiB range

Page 25: WalB: Real-time and Incremental Backup System for Block Devices

25

Measuring the Backup Time(dm-snap, dm-thin)

▌dm-snap:take a snapshot & scan full image

▌dm-thin:get a structure of snapshot trees & find modified

blocks & read these blocks

5 GiB 95 GiB (unchanged)

4 KiB Random Writes

dm-snap : scan full image

dm-thin : scan changed chunks (tree traversal)

Page 26: WalB: Real-time and Incremental Backup System for Block Devices

26

Measuring the Backup Time(WalB)

▌WalB:scan logs from a log device & send them to a backup

server continuously

5 GiB 95 GiB (unchanged)

4 KiB Random Writes

WalB : scan logs

Log Device

Write IO logsWalB Device

Backup Server

DiffDiff

Network

Page 27: WalB: Real-time and Incremental Backup System for Block Devices

Write I/O latency

dm-thin

dm-snap

WalB

no-backup

27

IO spikes due to CoW,worse than dm-snap!

Small overhead

large due to CoW

Page 28: WalB: Real-time and Incremental Backup System for Block Devices

Backup time

28

1146

2260

1.2

slower than dm-snap

so fast!

Page 29: WalB: Real-time and Incremental Backup System for Block Devices

29

Conclusion

▌dm-snap & dm-thin

High I/O latency during a backup

Long backup time

▌WalB

Stable and low I/O latency (no spikes)

Short backup time

WalB satisfies our requirements for production use.

Page 30: WalB: Real-time and Incremental Backup System for Block Devices

30

Try WalB!

▌Project page

https://walb-linux.github.io/

▌Tutorial

https://github.com/walb-linux/walb-

tools/tree/master/misc/vagrant/

Vagrantfile for Ubuntu 16.04 and CentOS 7

Page 31: WalB: Real-time and Incremental Backup System for Block Devices

Remote Host

31

Incremental backup

▌Daily backup (retention period is 14 days)

▌Worker daemon of WalB selects diff files older than 14

days and applies them to a base image.

Volume Diff Diff Diff…Base

Diff files for 14 days

Backup Host

Apply everyday

Page 32: WalB: Real-time and Incremental Backup System for Block Devices

Remote Host

32

Restoring a volume

▌To restore the latest state of a volume:

take a snapshot of a base image, and

apply all diff files to it.

Diff Diff Diff…Base

Base'Writablesnapshot

Apply all diffs

Page 33: WalB: Real-time and Incremental Backup System for Block Devices

Remote Host

33

Make restoration faster 1/2

▌Fast restoration

by preparing read-only snapshots for each day

Diff Diff Diff…Base

1421

dm-thin snapshots for each day

Diff

Page 34: WalB: Real-time and Incremental Backup System for Block Devices

Remote Host

34

Make restoration faster 2/2

▌Apply some diffs to the appropriate snapshot.

▌At most 24 hours of diffs are needed to be applied.

Faster!

Diff Diff Diff…Base

1421

Diff

Page 35: WalB: Real-time and Incremental Backup System for Block Devices

35

Worldline: restoring a whole environment

▌"Worldline" means a parallel world.

▌We backup configurations in addition to user data.

Configurations:

definitions for each customer (ID, FQDN, Apps, …),

application version definition,

host definition, etc.

▌It is important to use applications whose versions are

consistent with user data backed up before.

Page 36: WalB: Real-time and Incremental Backup System for Block Devices

36

Worldline: restoring a whole environment

▌A daily script takes a snapshot of a whole environment.

▌An weekly script restores the latest backup, so we can use it

for investigation of failures or development our services.

User data

DiffDiff

Snapshot

ConfigDB

ConfigDB'Backup Backup

Worldline

Spare hosts

Restore

DiffDiff

Restore

Page 37: WalB: Real-time and Incremental Backup System for Block Devices

Q&Aemail: [email protected]

twitter: @uchan_nos

37