ZIL Performance: How I Doubled Sync Write Speed
Prakash Surya | October 24 2017


Agenda

1. What is the ZIL?

2. How is it used? How does it work?

3. The problem to be fixed; the solution.

4. Details on the changes I made.

5. Performance testing and results.


1 – What is the ZIL?

What is the ZIL?

ZIL: Acronym for (Z)FS (I)ntent (L)og

Logs synchronous operations to disk, before spa_sync()

What operations get logged?

zfs_create, zfs_remove, zfs_write, etc.

Doesn't include non-modifying ZPL operations:

zfs_read, zfs_seek, etc.

What gets logged?

The fact that a logical operation is occurring is logged

zfs_remove → directory object ID + name only

Not logging which blocks will change due to logical operation
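To make the "logical operation" idea concrete, here is a minimal sketch of what a log record for a remove might carry: only the directory object ID and the name, with no list of changed blocks. The `RemoveRecord` class, the `apply_remove` helper, and the dictionary standing in for directories are all hypothetical illustrations, not ZFS structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoveRecord:
    """Hypothetical log record: says what happened, not which blocks changed."""
    dir_object_id: int
    name: str


def apply_remove(directories, record):
    """Apply the logical operation to a toy model of directories.

    `directories` maps a directory object ID to a dict of name -> object ID.
    """
    directories[record.dir_object_id].pop(record.name, None)
    return directories


# Toy state: directory object 42 holds two entries.
directories = {42: {"old.txt": 7, "keep.txt": 8}}
record = RemoveRecord(dir_object_id=42, name="old.txt")
apply_remove(directories, record)
```

The record stays small no matter how many blocks the remove would eventually touch, which is the point the slide makes.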

When is the ZIL used?

Always*

ZPL operations (itx's) logged via in-memory lists

lists of in-memory itx's written to disk via zil_commit()

zil_commit() called for:

any sync write**

*Except when dataset configured with: sync=disabled.

**Except when dataset configured with: sync=always, in which case zil_commit() is called for every write.

What is the SLOG?

SLOG: Acronym for (S)eparate (LOG) Device

Conceptually, SLOG is different than the ZIL

ZIL is mechanism for writing, SLOG is device written to

An SLOG is not necessary

By default (no SLOG), ZIL will write to main pool VDEVs

An SLOG can be used to improve latency of ZIL writes

When attached, ZIL writes to SLOG instead of main pool*

*For some operations; see code for details.

Why does the ZIL exist?

Writes in ZFS are "write-back"

Data is first written and stored in-memory, in DMU layer

Later, data for whole pool written to disk via spa_sync()

Without the ZIL, sync operations could wait for spa_sync()

spa_sync() can take tens of seconds (or more) to complete

Further, with the ZIL, write amplification can be mitigated

A single ZPL operation can cause many writes to occur

ZIL allows operation to "complete" with minimal data written

ZIL needed to provide "fast" synchronous semantics to applications

Correctness could be achieved without it, but would be "too slow"

ZIL On-Disk Format

Each dataset has its own unique ZIL on-disk

ZIL stored on-disk as a singly linked list of ZIL blocks (lwb's)

[Diagram: the uberblock leads to the MOS, which leads to two datasets; each dataset has contents and a ZIL header, and each ZIL header points to a chain of two lwb's.]
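The on-disk layout described above, a header pointing at a singly linked list of log blocks, can be sketched as follows. The `Lwb` and `ZilHeader` classes and the `walk` helper are hypothetical stand-ins for illustration; the real on-disk structures carry block pointers and checksums rather than Python references.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lwb:
    """Toy log block: some records plus a pointer to the next block."""
    records: List[str]
    next: Optional["Lwb"] = None


@dataclass
class ZilHeader:
    """Toy per-dataset header: points at the first block of the chain."""
    first: Optional[Lwb] = None


def walk(header):
    """Yield every record by following `next` pointers from the header."""
    block = header.first
    while block is not None:
        yield from block.records
        block = block.next


# One dataset's chain: header -> lwb -> lwb.
header = ZilHeader(first=Lwb(["itx S1"], next=Lwb(["itx S2", "itx B1"])))
chain = list(walk(header))
```

Because the list is singly linked, a reader can only reach a block by walking through every block before it.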


2 – How is the ZIL used?

How is the ZIL used?

ZPL will generally interact with the ZIL in two phases:

1. Log the operation(s) — zil_itx_assign

Tells the ZIL an operation is occurring

2. Commit the operation(s) — zil_commit

Causes the ZIL to write log record of operation to disk
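The two phases can be sketched with a toy class. `ToyZil`, its `itx_assign` and `commit` methods, and the two lists are assumptions made for illustration; they mirror the roles of zil_itx_assign and zil_commit but not their real signatures.

```python
class ToyZil:
    """Toy model of the two-phase interaction; not the real ZIL API."""

    def __init__(self):
        self.pending = []   # operations the ZIL has been told about (memory only)
        self.on_disk = []   # stands in for log records written to stable storage

    def itx_assign(self, itx):
        """Phase 1: tell the ZIL an operation is occurring."""
        self.pending.append(itx)

    def commit(self):
        """Phase 2: write the log records for pending operations."""
        self.on_disk.extend(self.pending)
        self.pending.clear()


zil = ToyZil()
zil.itx_assign("write to object A")
logged_before_commit = list(zil.on_disk)   # nothing is on disk yet
zil.commit()
```

Splitting the phases this way lets operations that never need synchronous semantics skip phase 2 entirely.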

Example: zfs_write

zfs_write → zfs_log_write

zfs_log_write

→ zil_itx_create

→ zil_itx_assign

zfs_write → zil_commit

Example: zfs_fsync

fsync → zil_commit

fsync doesn't create any new modifications

only writes previous itx's to disk

thus, no zfs_log_fsync function

Contract between ZIL and ZPL.

Parameters to zil_commit: ZIL pointer, object number

These uniquely identify an object whose data is to be committed

When zil_commit returns:

Operations relevant to the object specified will be persistent on disk

relevant – all operations that would modify that object

persistent – Log block(s) written (completed) → disk flushed

Interface of zil_commit doesn't specify which operation(s) to commit


2 – How does the ZIL work?

How does the ZIL work?

In memory ZIL contains an itxg_t structure*

Each itxg_t contains:

A single list of sync operations (for all objects)

Object specific lists of async operations

*Actually multiple itxg_t structures, one per-txg.
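A rough sketch of that in-memory layout: one structure per txg, each holding a single sync list and per-object async lists. The class names, the `assign` method, its `sync` flag, and the txg number 100 are hypothetical; how the real code decides whether an itx is sync or async is not covered here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Itxg:
    """Toy itxg: one sync list for all objects, plus per-object async lists."""
    sync_list: list = field(default_factory=list)
    async_lists: dict = field(default_factory=lambda: defaultdict(list))


class InMemoryZil:
    """Toy in-memory ZIL: one Itxg per txg, created on first use."""

    def __init__(self):
        self.itxgs = defaultdict(Itxg)

    def assign(self, txg, itx, object_id=None, sync=True):
        # Hypothetical routing: the caller says which kind of list to use.
        itxg = self.itxgs[txg]
        if sync:
            itxg.sync_list.append(itx)
        else:
            itxg.async_lists[object_id].append(itx)


zil = InMemoryZil()
zil.assign(100, "S1")
zil.assign(100, "S2")
zil.assign(100, "A1", object_id="A", sync=False)
zil.assign(100, "A2", object_id="A", sync=False)
zil.assign(100, "B1", object_id="B", sync=False)
```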


Example: itx lists

sync list: itx S1, itx S2

object A list: itx A1, itx A2

object B list: itx B1

How are itx's written to disk?

zil_commit handles the process of writing itx_t's to disk:

1. Find all relevant itx's, move them to the "commit list"


Example: zil_commit Object B

Before:

sync list: itx S1, itx S2

object A list: itx A1, itx A2

object B list: itx B1

commit list: (empty)

After:

sync list: (empty)

object A list: itx A1, itx A2

object B list: (empty)

commit list: itx S1, itx S2, itx B1
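The example above can be reproduced with a few lines of list manipulation. This is only a sketch: `build_commit_list` is a hypothetical helper, and plain Python lists stand in for the real itx lists.

```python
def build_commit_list(sync_list, async_lists, object_id):
    """Collect the itx's relevant to `object_id` into a commit list.

    Mutates `sync_list` and `async_lists` the way the example does.
    """
    # Move the committed object's async itx's to the tail of the sync list.
    sync_list.extend(async_lists.pop(object_id, []))
    # Move the whole sync list onto the commit list.
    commit_list = list(sync_list)
    sync_list.clear()
    return commit_list


sync_list = ["S1", "S2"]
async_lists = {"A": ["A1", "A2"], "B": ["B1"]}
commit_list = build_commit_list(sync_list, async_lists, "B")
```

Object A's async itx's stay where they are, since committing object B does not require them to be on disk.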

2. Write all commit list itx's to disk

Example: zil_commit Object B

State after each frame:

commit list: itx S1, itx S2, itx B1 | ZIL header → lwb 1: (empty)

commit list: itx S2, itx B1 | ZIL header → lwb 1: itx S1

commit list: itx S2, itx B1 | ZIL header → lwb 1: itx S1, itx S2

commit list: itx S2, itx B1 | ZIL header → lwb 1: itx S1 → lwb 2: (empty)

commit list: itx B1 | ZIL header → lwb 1: itx S1 → lwb 2: itx S2

commit list: (empty) | ZIL header → lwb 1: itx S1 → lwb 2: itx S2, itx B1

commit list: (empty) | ZIL header → lwb 1: itx S1 → lwb 2: itx S2, itx B1 → lwb 3: (empty)
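The write step in the zil_commit Object B example, where itx S1 lands in lwb 1 and itx S2 and itx B1 land in lwb 2, can be sketched as greedy packing into fixed-capacity blocks. The record sizes and the capacity of 4 are made-up numbers chosen to reproduce the example, and `pack_into_lwbs` is a hypothetical helper; real log blocks are not sized this simply.

```python
def pack_into_lwbs(commit_list, lwb_capacity):
    """Place (name, size) records into blocks in order, opening a new block
    whenever the next record does not fit in the current one."""
    lwbs = [[]]
    used = 0
    for name, size in commit_list:
        if size > lwb_capacity:
            raise ValueError(f"{name} is larger than one block")
        if used + size > lwb_capacity:
            lwbs.append([])   # open the next block in the chain
            used = 0
        lwbs[-1].append(name)
        used += size
    return lwbs


# Hypothetical sizes: S2 does not fit next to S1, so it starts lwb 2.
commit_list = [("S1", 3), ("S2", 2), ("B1", 2)]
lwbs = pack_into_lwbs(commit_list, lwb_capacity=4)
```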

How are itx's written to disk?

zil_commit handles the process of writing itx_t's to disk:

1. Move async itx's for object being committed, to the sync list

2. Write all commit list itx's to disk

3. Wait for all ZIL block writes to complete

4. Flush VDEVs

5. Notify waiting threads


3 – Problem

Problem

1. itx's grouped and written in "batches"

The commit list constitutes a batch

Batch size proportional to sync workload on system

2. Waiting threads only notified when all ZIL blocks in batch complete

3. Only a single batch processed at a time
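A small simulation can show how these three properties interact. It is a toy model under stated assumptions: requests arrive at integer times, a batch takes a fixed cost plus a per-request cost (both made-up parameters), and `batched_notify_times` is a hypothetical helper rather than anything from the ZIL code.

```python
def batched_notify_times(arrivals, fixed_cost, per_request_cost):
    """Return (arrival, notified_at) pairs when only one batch runs at a time
    and every waiter is notified when its whole batch completes."""
    pending = sorted(arrivals)
    results = []
    now = 0
    i = 0
    while i < len(pending):
        # A batch starts once the previous batch is done and a request exists.
        start = max(now, pending[i])
        # Every request that has arrived by then joins this batch.
        j = i
        while j < len(pending) and pending[j] <= start:
            j += 1
        # Batch duration grows with the number of requests in it.
        done = start + fixed_cost + per_request_cost * (j - i)
        results.extend((arrival, done) for arrival in pending[i:j])
        now = done
        i = j
    return results


results = batched_notify_times([0, 1, 2, 3, 10], fixed_cost=4, per_request_cost=1)
latencies = [done - arrival for arrival, done in results]
```

In this run the request arriving at time 1 cannot even be issued until the first batch finishes at time 5, and is then notified only when the whole second batch completes at time 12.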

[Figure: timeline of lwb A through lwb Y being serviced on Disk 1 through Disk 4. Time spent servicing lwb's for each disk; color indicates order waiting threads notified.]

3 – Solution

Solution

Remove concept of "batches":

1. Allow zil_commit to issue new ZIL block writes immediately

In contrast to waiting for the current batch to complete

2. Notify threads immediately when the lwb's they depend on are on disk

In contrast to waiting for all lwb's in the batch to be on disk
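The second point can be illustrated by comparing notification times for the same set of block completions. This sketch assumes one waiter per lwb and reads "dependent lwb's" as "this lwb and every earlier lwb in the chain"; both the completion times and that reading are assumptions, and the function names are hypothetical. It does not model the first point (issuing writes immediately).

```python
from itertools import accumulate


def notify_batched(done_times):
    """Old behavior: every waiter wakes when the slowest lwb in the batch is done."""
    last = max(done_times)
    return [last for _ in done_times]


def notify_per_lwb(done_times):
    """New behavior (as assumed here): waiter i wakes once lwb i and all
    earlier lwb's in the chain are on disk."""
    return list(accumulate(done_times, max))


# Hypothetical completion times for four lwb's written to different disks.
done_times = [2, 3, 9, 5]
batched = notify_batched(done_times)
per_lwb = notify_per_lwb(done_times)
```

With these numbers the first two waiters wake at times 2 and 3 instead of 9, while the last two are unchanged because they sit behind the slow block.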

[Figure: the "Problem" timeline repeated, followed by the "Solution" timeline. Both show lwb A through lwb Y on Disk 1 through Disk 4; time spent servicing lwb's for each disk; color indicates order waiting threads notified.]

4 – Details on the Changes I Made

Before

[Diagram: a batch root over lwb 1, lwb 2, and lwb 3; a flush root over VDEV 1 and VDEV 2; a single CV; labeled Step 1, Step 2, Step 3.]

After

[Diagram: each lwb is drawn with its own write and VDEV flush (lwb 1 write and VDEV 1 flush, lwb 2 write and VDEV 2 flush, lwb 3 write and VDEV 1 flush), and with multiple CVs rather than one.]

New Tunable: lwb Timeout

[Diagram: commit list (empty); ZIL header → lwb 1: itx S1 → lwb 2: itx S2, itx B1; shown once without and once with a following lwb 3.]

5 – Performance testing and results

~83% Increase in IOPS on Average – Max Rate – 8 HDDs

~48% Increase in IOPS on Average – Max Rate – 8 SSDs

~27% Decrease in Latency on Average – Fixed Rate – 8 HDDs

*With more than 64 threads, IOPS increased with the new code; those data points are omitted.

~16% Decrease in Latency on Average – Fixed Rate – 8 SSDs

More Details

Two fio workloads were used:

1. each thread submitting sync writes as fast as it could

2. each thread submitting 64 sync writes per second

1, 2, 4, and 8 disk zpools; both SSD and HDD

fio threads ranging from 1 to 1024; increasing in powers of 2

Full details can be found here
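For a sense of scale, the parameters listed above can be enumerated as a test matrix. This is only a sketch: the label strings are made up, and whether every combination was actually run is an assumption.

```python
from itertools import product

# Thread counts from 1 to 1024, increasing in powers of 2.
thread_counts = [2 ** i for i in range(11)]
disk_counts = [1, 2, 4, 8]
media = ["SSD", "HDD"]
# Hypothetical labels for the two fio workloads.
workloads = ["max rate", "fixed rate (64 sync writes/s per thread)"]

# Every combination of workload, media, pool size, and thread count.
matrix = list(product(workloads, media, disk_counts, thread_counts))
print(len(matrix), "configurations")
```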


End
