30
OFVWG: Erasure Coding RDMA Offload Sagi Grimberg

OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

  • Upload
    others

  • View
    33

  • Download
    0

Embed Size (px)

Citation preview

Page 1: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

OFVWG: Erasure Coding RDMA Offload

Sagi Grimberg

Page 2: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Problem Statement

•  Modern storage arrays are usually distributed in a clustered environment.

•  Problem: Disks and/or nodes inevitably tend to fail.

–  How can we survive failures and keep our data intact?

OFVWG 2

Page 3: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RAID 1 (Replication)

•  Instead of storing the data once, we will store more copies of the data on another disk/node.

•  If a disk/node fail, we are able to still recover the data.

•  If we want to survive X failures, we need to replicate X instances of the data.

OFVWG 3

Page 4: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RAID 1 pros/cons

•  Pros: –  Simple to do –  No need for extra computation –  No need for reconstruct logic

•  Cons: –  Requires a high storage space for redundancy –  Inefficient wire utilization

OFVWG 4

Page 5: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RAID 5 (single parity block)

•  We divide our data into X blocks and calculate a single parity block and store it as well.

•  If any of the drives fail we can reconstruct the

original data back from the parity block.

OFVWG 5

Page 6: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RAID 5 pros/cons

•  Pros: –  Efficient storage utilization (small storage space for

redundancy) –  Efficient wire utilization

•  Cons: –  Requires computation to generate the parity block –  Requires computation to reconstruct the original data –  Need multi-level RAID to survive more than a single

failure.

OFVWG 6

Page 7: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RAID 6 (dual parity block)

•  We divide our data into X blocks and calculate two parity block and store them as well.

•  If any two drives/nodes fail we can reconstruct

the original data back from the parity blocks.

OFVWG 7

Page 8: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RAID 6 pros/cons

•  Pros: –  Efficient storage utilization (small storage space for

redundancy) –  Efficient wire utilization

•  Cons: –  Requires computation to generate two parity blocks –  Requires computation to reconstruct the original data –  Need multi-level RAID to survive more than two

failures.

OFVWG 8

Page 9: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Erasure coding (generalize RAID)

•  There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes).

•  The mathematical approach is to use higher

rank polynomials over Galois finite fields GF(2^w) in order to use minimum storage for K number of disk/node failures.

•  Codes can be systematic (raw data is stored) or non-systematic (data projections are stored).

OFVWG 9

Page 10: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Erasure coding (generalize RAID)

•  Erasure codes allows us to survive M failures for any K data blocks where: K+M≤2↑𝑤 

•  For example if we use 𝐺𝐹( 2↑4 ) and we want to survive 4 disk failures we can protect 12 data blocks. –  This means we only spend 33.3% of storage to store

redundancy metadata.

OFVWG 10

Page 11: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Erasure coding Illustration

OFVWG 11

Page 12: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Erasure coding Decode Illustration

OFVWG 12

Page 13: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Erasure coding Decode Illustration

OFVWG 13

1.

2.

Page 14: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Erasure coding pros/cons

•  Pros: –  *Very* Efficient storage utilization (small storage space for

redundancy) –  *Very* Efficient wire utilization –  User can choose his configuration (K,M) – no need for multi-level

RAID.

•  Cons: –  Large computation overhead needed to generate the

redundancy metadata blocks –  Large computation overhead needed to reconstruct the original

data

OFVWG 14

Page 15: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

RDMA Erasure coding offload

•  Erasure codes calculations is CPU intensive.

•  Next generation HCAs can offer a calculation engine.

•  These HCAs can also offer a coherent calculation and networking solutions.

OFVWG 15

Page 16: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Programming model - SW

OFVWG 16

Page 17: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Programming model - Synchronous

OFVWG 17

Page 18: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Programming model - Asynchronous

OFVWG 18

Page 19: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

Programming model – Full striping

OFVWG 19

Page 20: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Erasure coding context •  EC context verbs representation

•  Allocation/Deallocation API

OFVWG 20

Page 21: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – EC init attributes

OFVWG 21

Page 22: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – EC memory layout

OFVWG 22

Page 23: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Synchronous Encode

OFVWG 23

Page 24: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Asynchronous Encode

OFVWG 24

Page 25: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Asynchronous Encode

OFVWG 25

Page 26: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Verbs stripe object

•  In order to perform the full striping operation via a single API call we need to provide our strping layout (who gets what)

OFVWG 26

Page 27: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Encode + Transfer

OFVWG 27

Page 28: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Synchronous Decode

OFVWG 28

Page 29: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

API – Asynchronous Decode

OFVWG 29

•  Pretty much the same idea

Page 30: OFVWG: Erasure Coding RDMA Offload · Erasure coding (generalize RAID) • There are different types of erasure codes (Reed-Solomon, Cauchy and other MDS codes). • The mathematical

OFVWG

Thank You