46
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . How to use CUDA to accelerate . . . . Experiment Q&A Accelerate Reed-solomon coding for Fault-Tolerance in RAID-like system Shuai YUAN NTHU LSA lab 1 / 19

Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

Embed Size (px)

DESCRIPTION

https://github.com/yszheda/GPU-RSCode

Citation preview

Page 1: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate Reed-solomon coding forFault-Tolerance in RAID-like system

Shuai YUAN

NTHULSA lab

1 / 19

Page 2: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Outline

▶ Background

▶ How to use CUDA to accelerate

▶ Experiment

▶ Q & A

2 / 19

Page 3: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Why Reed-Solomon codes?

Why fault-tolerance?

In a RAID-like system, storage is distributed among several devices,the probability of one of these devices failing becomes significant.

3 / 19

Page 4: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Why Reed-Solomon codes?

Why fault-tolerance?

In a RAID-like system, storage is distributed among several devices,the probability of one of these devices failing becomes significant.MTTF (mean time to failure) of one device is P ,

MTTF of a system of n devices isP

n.

3 / 19

Page 5: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Why Reed-Solomon codes?

Why fault-tolerance?

In a RAID-like system, storage is distributed among several devices,the probability of one of these devices failing becomes significant.MTTF (mean time to failure) of one device is P ,

MTTF of a system of n devices isP

n.

Thus in such systems, fault-tolerance must be taken into account.

3 / 19

Page 6: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Why Reed-Solomon codes?

Why erasure codes?

Replication is expensive!e.g. To achieve double-fault tolerance, we need 2 replicas.

4 / 19

Page 7: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Why Reed-Solomon codes?

Erasure Codes

Given a message/file of k equal-size blocks/chunks/fragments,encode it into n blocks/chunks/fragments (n > k)

▶ Optimal erasure codes: any k out of the n fragments aresufficient to recover the original message/file.i.e. can tolerate arbitrary (n− k) failures.We call this (n,k) MDS (maximum distance separable)property.e.g. Reed Solomon Codes

▶ Near-optimal erasure codes: require (1 + ε)k codeblocks/chunks to recover (ε > 0).i.e. cannot tolerate arbitrary (n− k) failures.e.g. Local Reconstruction Codes (used in WAS)

5 / 19

Page 8: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Why Reed-Solomon codes?

Replication vs. Erasure Codes

Given a file of size MUsing Reed Solomon Codes(n=4, k=2)

Achieve double fault tolerance

▶ R-S Codes: storage overhead = M

▶ Relication: storage overhead = 2M (2 replicas)

6 / 19

Page 9: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

Given file F : divide it into k equal-size native chunks:F = [Fi]i=1,2,...,k. Encode them into (n− k) code chunks:C = [Ci]i=1,2,...,n−k.Use Encoding Matrix EM(n−k)×k to produce code chunks:

CT = EM × F T

Ci is the linear combination of F1, F2, . . . , Fk.

7 / 19

Page 10: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

In our case,F =

(A B

)EM =

(1 11 2

)CT =

(1 11 2

(AB

)=

(A+BA+ 2B

)

7 / 19

Page 11: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

Let P = [Pi]i=1,2,...,n = [F1, F2, . . . , Fk, C1, C2, . . . , Cn−k] be the

n chunks in storage, EM ′ =

(I

EM

), Here

I =

1 0 . . . 00 1 . . . 0...

.... . .

...0 0 . . . 1

then

P T = EM ′ × F T

=

(F T

CT

)

7 / 19

Page 12: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

In our case,

EM ′ =

(I

EM

)=

1 00 11 11 2

P T = EM ′ × F T

=

1 00 11 11 2

×(AB

)

=

AB

A+BA+ 2B

7 / 19

Page 13: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

EM ′ is the key of MDS property!

7 / 19

Page 14: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

Theorem (necessary and sufficient condition)

Every possible k × k submatrix obtained by removing (n− k) rowsfrom EM ′ has full rank.equivalent expression of full rank:

▶ rank = k

▶ non-singular...

Alternative view:Consider the linear space ofP = [Pi]i=1,2,...,n = [F1, F2, . . . , Fk, C1, C2, . . . , Cn−k], itsdimension is k, and any k out of n vectors form a basis of thelinear space.

7 / 19

Page 15: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

Reed-Solomon Codes uses Vandermonde matrix V as EM

V =

1 1 1 . . . 11 2 3 . . . k12 22 32 . . . k2

......

.... . .

...

1(n−k) 2(n−k) 3(n−k) . . . k(n−k)

7 / 19

Page 16: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

EM ′ =

1 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0...

......

. . ....

0 0 0 . . . 11 1 1 . . . 11 2 3 . . . k12 22 32 . . . k2

......

.... . .

...

1(n−k) 2(n−k) 3(n−k) . . . k(n−k)

7 / 19

Page 17: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon encoding process

Remark:

▶ All arithmetic operations in Galois Field GF(2x). Then everynumber is less than 2x.

▶ EM ′ satisfies MDS property.

7 / 19

Page 18: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Reed-Solomon coding mechanism

Reed-Solomon decoding process

We randomly select k out of n fragments, sayX = [x1, x2, · · · , xk]T .Then we use the coefficients of each fragments xi to construct ak × k matrix V ′.Original data can be regenerated by multiplying matrix X with theinverse of matrix V ′:

F = V ′−1X

8 / 19

Page 19: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Optimization motivation

Optimization motivation

The reason that discourages using Reed-Solomon coding to replacereplication is its high computational complexity:

▶ Galois Field arithmetic

▶ matrix operations

9 / 19

Page 20: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate operations in Galois Field (GF)

Accelerate operations in Galois Field (GF)

GF(2w) field is constructed by finding a primitive polynomial q(x)of degree w over GF(2), and then enumerating the elements(which are polynomials) with the generator x. Addition in this fieldis performed using polynomial addition, and multiplication isperformed using polynomial multiplication and taking the resultmodulo q(x).

10 / 19

Page 21: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate operations in Galois Field (GF)

Accelerate operations in Galois Field (GF)

GF(2w) field is constructed by finding a primitive polynomial q(x)of degree w over GF(2), and then enumerating the elements(which are polynomials) with the generator x. Addition in this fieldis performed using polynomial addition, and multiplication isperformed using polynomial multiplication and taking the resultmodulo q(x).In implementation, we map the elements of GF(2w) to binarywords of size w, so that computation in GF(2w) becomes bitwiseoperations. Let r(x) be a polynomial in GF(2w). Then we canmap r(x) to a binary word b of size w by setting the ith bit of b tothe coefficient of xi in r(x).

10 / 19

Page 22: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate operations in Galois Field (GF)

Accelerate operations in Galois Field (GF)

GF(2w) field is constructed by finding a primitive polynomial q(x)of degree w over GF(2), and then enumerating the elements(which are polynomials) with the generator x. Addition in this fieldis performed using polynomial addition, and multiplication isperformed using polynomial multiplication and taking the resultmodulo q(x).In implementation, we map the elements of GF(2w) to binarywords of size w, so that computation in GF(2w) becomes bitwiseoperations. Let r(x) be a polynomial in GF(2w). Then we canmap r(x) to a binary word b of size w by setting the ith bit of b tothe coefficient of xi in r(x).After mapping, Addition/subtraction of binary elements of GF(2w)can be performed by bitwise exclusive-or.

10 / 19

Page 23: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate operations in Galois Field (GF)

Accelerate operations in Galois Field (GF)

However, multiplication is still costly.Use bitwise operations to emulate polynomial multiplication inGF(2w): a loop of w iterations.e.g. In GF(28), multiplying 2 bytes takes 8 iterations.

11 / 19

Page 24: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate operations in Galois Field (GF)

Accelerate operations in Galois Field (GF)

CPU approach

store all the result in a multiplication table

e.g. In GF(28), multiplying 2 bytes takes 256× 256 = 64KBconsumes a lot of space ⇝ not suitable for GPU

11 / 19

Page 25: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate operations in Galois Field (GF)

Accelerate operations in Galois Field (GF)

trade-off between computation and memory consumption ([1])We build up two smaller tables:▶ exponential table: maps from a binary element b to power j

such that xj is equivalent to b.▶ logarithm table: maps from a power j to its binary element b

such that xj is equivalent to b.

Multiplication in GF(2w):▶ looking each binary number in the logarithm table for its

logarithm (this is equivalent to finding the polynomial)▶ adding the logarithms modulo 2w − 1 (this is equivalent to

multiplying the polynomial modulo q(x))▶ looking up exponential table to convert the result back to a

binary number

Totally three table lookups and a modular addition.11 / 19

Page 26: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate encoding process

Accelerate encoding process

▶ Generating Vandermonde’s Matrix

▶ Matrix multiplication

12 / 19

Page 27: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate decoding process

▶ Computing inverse matrix

▶ Matrix multiplication

13 / 19

Page 28: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate matrix inversion

Guassian/Guass-Jordon elimination in GF(28)

[V ′I] =

1 0 0 0 | 1 0 0 00 1 0 0 | 0 1 0 01 2 3 4 | 0 0 1 01 1 1 1 | 0 0 0 1

14 / 19

Page 29: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate matrix inversion

Guassian/Guass-Jordon elimination in GF(28)

[V ′I] =

1 0 0 0 | 1 0 0 00 1 0 0 | 0 1 0 00 2 3 4 | 1 0 1 00 1 1 1 | 1 0 0 1

14 / 19

Page 30: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate matrix inversion

Guassian/Guass-Jordon elimination in GF(28)

[V ′I] =

1 0 0 0 | 1 0 0 00 1 0 0 | 0 1 0 00 0 3 4 | 1 2 1 00 0 1 1 | 1 1 0 1

14 / 19

Page 31: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate matrix inversion

Guassian/Guass-Jordon elimination in GF(28)

[V ′I] =

1 0 0 0 | 1 0 0 00 1 0 0 | 0 1 0 00 0 1 247 | 244 245 244 00 0 0 246 | 245 244 244 1

14 / 19

Page 32: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate matrix inversion

Guassian/Guass-Jordon elimination in GF(28)

[V ′I] =

1 0 0 0 | 1 0 0 00 1 0 0 | 0 1 0 00 0 1 0 | 104 187 186 2100 0 0 1 | 105 186 186 211

14 / 19

Page 33: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Accelerate decoding process

Accelerate matrix inversion

Guassian/Guass-Jordon elimination in GF(28)

GPU

▶ normalize row

▶ make column into reduced echelon form

14 / 19

Page 34: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Experiment Settings

Experiment Settings

▶ Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

▶ nVidia Corporation GF100 [Tesla C2050]

15 / 19

Page 35: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

GPU Cost Breakdown

Experiment Result (GPU cost breakdown)

k = 4, n = 6

29.45792

173.697983

218.380676

429.164764

576.414185

0

100

200

300

400

500

600

752KB 47MB 161MB 679MB 1.1GB

time(ms)

file size

GPU R-S encoding

Copy code chunks from GPU memory to hostmemory

Encoding file (matrix multiplication)

Copy encoding matrix from GPU memory tohost memory

Generating encoding matrix

Copy data chunks from host memory to GPUmemory

16 / 19

Page 36: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

GPU Cost Breakdown

Experiment Result (GPU cost breakdown)

k = 4, n = 6

0

50

100

150

200

250

300

350

400

450

0 200000 400000 600000 800000 1000000 1200000

time(ms)

file size(KB)

GPU R-S encoding

Total computation time

Total communication time

16 / 19

Page 37: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

GPU Cost Breakdown

Experiment Result (GPU cost breakdown)

k = 4, n = 6

31.092863

178.489853

262.113556

614.687073

751.384766

0

100

200

300

400

500

600

700

800

752KB 47MB 161MB 679MB 1.1GB

time(ms)

file size

GPU R-S decoding

Copy data chunks from GPU memory to hostmemory

Decoding file (matrix multiplication)

Generating decoding matrix (inverse matrix)

Copy encoding matrix from host memory toGPU memory

Copy code chunks from host memory to GPUmemory

16 / 19

Page 38: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

GPU Cost Breakdown

Experiment Result (GPU cost breakdown)

k = 4, n = 6

0

100

200

300

400

500

600

0 200000 400000 600000 800000 1000000 1200000

time(ms)

file size(KB)

GPU R-S decoding

Total computation time

Total communication time

16 / 19

Page 39: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

GPU vs CPU

Experiment Result (GPU vs CPU)

k = 4, n = 6

9.87 X

25.73 X

58.06 X

64.72 X

0

200

400

600

800

1000

1200

1400

1600

1800

2000

0 200000 400000 600000 800000 1000000 1200000

bandwidth (MB/s)

file size(KB)

RS encode: CPU vs GPU

GPU bandwidth

CPU bandwidth

17 / 19

Page 40: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

GPU vs CPU

Experiment Result (GPU vs CPU)

k = 4, n = 6

1.47 X

18.12 X

39.19 X

75.4 X

89.99 X

0

200

400

600

800

1000

1200

1400

1600

0 200000 400000 600000 800000 1000000 1200000

bandwidth (MB/s)

file size(KB)

RS decode: CPU vs GPU

GPU bandwidth

CPU bandwidth

17 / 19

Page 41: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

tuning k

Experiment Result (tuning k)

File size: 1.1GBdouble fault tolerance

576.414185 665.971008 947.589661

1495.114624

2702.535156

5060.778809

0

1000

2000

3000

4000

5000

6000

4 8 16 32 64 128

time(ms)

k

Total GPU encoding time (n-k=2)

Total GPU encoding time

18 / 19

Page 42: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

tuning k

Experiment Result (tuning k)

File size: 1.1GBdouble fault tolerance

751.384766 1028.636963 1210.730591

1856.644165

3167.548828

5958.092285

0

1000

2000

3000

4000

5000

6000

7000

4 8 16 32 64 128

time(ms)

k

Total GPU decoding time (n-k=2)

Total GPU decoding time

18 / 19

Page 43: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

tuning k

Experiment Result (tuning k)

File size: 1.1GBtriple fault tolerance

766.305176 695.003662 934.1427

1509.532715

2674.739502

5053.129883

0

1000

2000

3000

4000

5000

6000

4 8 16 32 64 128

time(ms)

k

Total GPU encoding time (n-k=3)

Total GPU encoding time

18 / 19

Page 44: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

tuning k

Experiment Result (tuning k)

File size: 1.1GBtriple fault tolerance

873.917542 1026.679443 1345.650269

1875.053711

3167.339355

5960.175293

0

1000

2000

3000

4000

5000

6000

7000

4 8 16 32 64 128

time(ms)

k

Total GPU decoding time (n-k=3)

Total GPU decoding time

18 / 19

Page 45: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

Q & A

Thank You!

19 / 19

Page 46: Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

..........

.....

.....................................................................

.....

......

.....

.....

.

. . . . . . .Background

. . . . .How to use CUDA to accelerate

. . . .Experiment Q & A

J.S. Plank et al.A tutorial on reed-solomon coding for fault-tolerance inraid-like systems.Software Practice and Experience, 27(9):995–1012, 1997.

19 / 19