57
HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee

HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

HEADHardwarE Accelerated Deduplication

Final Report

CS710 Computing Acceleration with FPGA

December 9, 2016

Insu Jang

Seikwon Kim

Seonyoung Lee

Page 2: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Executive Summary

A-Z development of deduplication SW version of deduplication Chunking HW logic Fingerprinting HW logic HW device driver on PetalinuxMerge deduplication SW-HW

Performance BenefitARM+FPGA

8x faster than ARM only 3x faster than x86_64

With low power consumption

2

Page 3: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Background

3

Page 4: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Deduplication Concept

Data compression technique Eliminate duplicate copies

Reduce overall cost of storage

Manage data growth

4

Page 5: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Deduplication Concept

Cloud storage without deduplication

5

Page 6: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Deduplication Concept

Cloud storage with deduplication

6

Page 7: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Deduplication Process

File chunkingFixed size chunking

Variable size chunking

Chunk finger printing

Eliminating duplicates

7

Page 8: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

File Chunking

Divide file into chunks

Chunk Portion of data, presumed to be duplicate

Parts that will reform a file

8

This is a chunk__

This might be a same chunk__

This is CS710__

This is a chunk__

Page 9: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Fixed Size Chunking

9

Page 10: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Fixed Size Chunking

10

FF FF 00 AA BB CC 00 AA 11 AB00 DE AD BE EF CC DE AD 00 11

CC FF FF 00 AA BB CC 00 AA 11 ABCC 00 DE AD BE EF CC DE AD 00 11

5 Byte

5 Byte

Page 11: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Fixed Size Chunking

Pros fast way of chunking

Cons Deduplication ratio

11

FF FF 00 AA BB CC 00 AA 11 AB00 DE AD BE EF CC DE AD 00 11

CC FF FF 00 AA BB CC 00 AA 11 ABCC 00 DE AD BE EF CC DE AD 00 11

≠ ≠≠ ≠

0%

Page 12: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Variable Size Chunking

12

=

Page 13: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Variable Size Chunking

13

FF FF 00 AA BB CC 00 AA 11 AB00 DE AD BE EF CC DE AD 00 11

CC FF FF 00 AA BB CC 00 AA 11 ABCC 00 DE AD BE EF CC DE AD 00 11

Delimiter ‘CC’

Delimiter ‘CC’

Page 14: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Variable Size Chunking

Pros Deduplication ratio

Cons Slower than fixed chunking

14

=

50%

FF FF 00 AA BB CC 00 AA 11 AB00 DE AD BE EF CC DE AD 00 11

CC FF FF 00 AA BB CC 00 AA 11 ABCC 00 DE AD BE EF CC DE AD 00 11

=

Page 15: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Chunk Fingerprinting

Transform chunks into shorter values Faster to compare between chunks

15

13 00 14

13 00 14

13 00 14

00 BB 12

16 86 00

AA BE EF

Page 16: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Eliminating Duplicates

16

00 BB 12

16 86 00

AA BE EF

13 00 14

Page 17: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Restoring the Dedup File

17

Dedup_File.txt FP1 FP2 FP1 FP3 FP4 FP2 ……

FP1FP2FP3FP4……

This isdedup

good presentation

for

This is dedup This is good presentation for dedup

Page 18: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Deduplication Key Design Issues

How to create chunks Variable size chunks

Fast and simple rolling hash algorithm

How to finger print Collision-free hash

Murmur

• SHA-256

• FNV

18

Page 19: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Fast & Simple Rolling Hash

19

1 2 3 4 5 6 7 8 9 0

1 2 3+ + % X == VAL?

2 3 4+ % X == VAL?......

1-

3 4 5+ % X == VAL?2-

Page 20: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Finger Printing Murmur Hash

Non-cryptographic hash function

128 bit hash

Very low collision rate

Fast

20

Page 21: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Deduplication SW

Basic deduplication process in SWDo {

Read data from file

Perform chunking (rollingHash)

Calculate fingerprint (murmurHash)

Save <hash, chunk> to DB

Enqueue hash value into hash list

Eliminate chunk data from buffer

} while (!file.eof());

Save <filename, hash list> pair to DB

21

8KB buffer

ChunkCalculate murmurHash 0xe045f78ac…

Page 22: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Hardware Design

22

Page 23: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

FPGA Key Challenges

Performance disadvantage Computation must be 7 times faster than CPU

PS 667Mhz VS PL 100Mhz

Extra overhead when FPGA is usedMemory copy from PS to PL

Memory copy from PL to PS

23

Page 24: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

FPGA Key Technical Issues

What to parallelize Chunking hash

Finger printing hash

How to parallelize Unrolling

Memory placement

Pipelining

Communication between PS and PL DMA

24

Page 25: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Design in a Nutshell

25

ARM CPU

FIFO Memory

DMAMemory

Petalinux

Application

ChunkingModules

Finger PrintingModules

SD Card

Page 26: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Recall Fast & Simple Rolling Hash

26

1 2 3 4 5 6 7 8 9 0

1 2 3+ + % X == VAL?

2 3 4+ % X == VAL?......

1-

3 4 5+ % X == VAL?2-

SERIAL COMPUTATION1

2

Page 27: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

BRAM Basics

Maximum partition: 128 To parallelize, BRAM must be partitioned

Read per partitioned BRAM

27

0 1 … 64 65 … 128 129 ….

BRAM Part 1 BRAM Part 2 BRAM Part 3

Can be read at onceCannot be read at once

Page 28: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Chunking

28

Hardware Design

Page 29: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Parallelized Chunk Hash

PS sends 8KB data to PL via DMA

Partition 8KB data into multiple windows

Unroll modules to calculate sum Parallelized computation

29

0 … 1023+ + 64 … 1087+ + 128 … 1151+ +

SUM Module1 SUM Module2 SUM Module3

0 1 … 64 65 … 128 129 ….

Page 30: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Hash Module3

Hash Module2

Parallelized Chunk Hash

Compute hash of each added values Parallelized computation

30

0 … 1023+ +

% X == VAL?

% X == VAL?

% X == VAL?Hash Module1

SUM Module1

SUM Module2

SUM Module3

64 … 1087+ +

128 … 1151+ +

Page 31: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Rolling Module3

Rolling Module2

Rolling Module1

Parallelized Chunk Hash

Compute rolling hash Parallelized computation

1 … 1024+ % X == VAL?0-

% X == VAL?

% X == VAL?

Hash Module1

Hash Module2

Hash Module3

65 … 1088+ 64-

129 … 1152+ 129-

Page 32: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Finger Printing

32

Hardware Design

Page 33: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Parallelized Finger Printing

Murmur hash for finger printing

str[0..3] str[4..7] str[8..11]

Multiply c1 Multiply c1 Multiply c1

ROL 15 ROL 15 ROL 15

Multiply c2 Multiply c2 Multiply c2

Seed

ROL 13 ROL 13 ROL 13

Multiply, Add

Multiply, Add

Multiply, Add

Seed Val

Hash

Page 34: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Parallelized Finger Printing

Hash can partly be calculated in parallel

str[0..3] str[4..7] str[8..11]

Multiply c1 Multiply c1 Multiply c1

ROL 15 ROL 15 ROL 15

Multiply c2 Multiply c2 Multiply c2

Seed

ROL 13 ROL 13 ROL 13

Multiply, Add

Multiply, Add

Multiply, Add

Seed Val

Hash

Page 35: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

1. Compute parallelizable part simultaneously

Parallelized Finger Printing

str[0..3] str[4..7] str[8..11]

Multiply c1 Multiply c1 Multiply c1

ROL 15 ROL 15 ROL 15

Multiply c2 Multiply c2 Multiply c2

Key 1 Key 2 Key 3

Total up to 2048 keys (one key for 4 bytes data)in 45 cycles

Page 36: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Key 1 Key 2 Key 3

2. Serially compute non-parallelizable part

Parallelized Finger Printing

ROL 13 ROL 13 ROL 13

Multiply, Add

Multiply, Add

Multiply, Add

Seed Val

Hash

Calculate final hashin 8194 cycles

Page 37: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

DMA

37

Hardware Design

Page 38: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

PS – PL DMA Communication

Device driver for AXI DMAUsed AXI DMA library for Zynq in Github

https://github.com/bperez77/xilinx_axidmaaxi_dma_oneway_transfer(...)

Actual transfer seems to be done when HW module calls hls::stream.read()

38

SWaxi_dma_oneway

_transfer()

axi_dma_oneway

_transfer()

HWhls::stream.read()

hls::stream.write()

Page 39: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Device Driver

39

Hardware Design

Page 40: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Remaining Step

40

ARM CPU

FIFO Memory

DMAMemory

Petalinux

Application

ChunkingModules

Finger PrintingModules

SD Card

Invoke HW execution

Generic-uio doesn’t work..

Page 41: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Device Driver for Custom HW

Need a dedicated device driver for handling our HW module

Why not use generic-uio device driver?It seems not support custom interrupt handling

Our device driver handles interrupt when AP_DONE signal becomes high

4141

Registered interruptfor ‘hello’ HW module

Registered interrupt for AXI DMA

Page 42: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Device Driver for Custom HW

HW module interfaceFor standalone, auto-generated by Vivado

Generated when you export hardware

42

AXI Lite signal interface

Page 43: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Device Driver for Custom HW

Example: How to invoke HW execution?See device driver for standalone*(0x43c00000 + 0x0) |= 1;

In Linux device driver, it is equal to

ioremap()maps physical memory to kernel space virtual memory

43

u32 reg = ioread32(0x43c00000 + 0x0);

iowrite32(reg|0x1, 0x43c00000);

u32 reg = ioread32(0xe09a0000 + 0x0);

iowrite32(reg|0x1, 0xe09a0000);

Page 44: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Merge Deduplication HW/SW

Basic deduplication process in SW/HWDo {

Read data from file to fill 8KB buffer

Perform chunking to get a chunk

Calculate murmurHash for the chunk

Save <hash, chunk> pair to DB

Enqueue hash value into the hash list

Eliminate chunk data from the buffer

} while (!file.eof());

Save <filename, hash list> pair to DB

Chunking & Hashing are accelerated by HW44

Transfer buffer data via DMA to PL

Invoke PL execution and wait to be completed

Transfer result data via DMA from PL

Page 45: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Result

45

Page 46: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Experimentation

Compare 3 environment Intel x86_64 i7-6700

ARM only system (SW based Deduplication) ARM cortex A9

ARM + FPGA (HW based Deduplication)

Workload : 5 image files 88MB ~ 243MB

46

Intel x86_64 ARM cortex PS ARM + FPGA

Clock frequency 3.4GHz 667MHz 100MHz

Ratio (slower) 1 5.14 34.82

Page 47: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Performance : ARM vs PL [1/2]

Even Clock freq. 6 x slower than ARM SW only

ARM + FPGA shows 1.2x better performance47

ARM cortex PS ARM + FPGA

Clock freq. ratio 1 (667MHz) 6.6x (100MHz)

1.203 x1.150 x

1.227 x1.156 x

1.232 x1.193 x

1 x 1 x 1 x 1 x 1 x 1 x

0.7 x

1.0 x

1.3 x

87980600 105528056 95420240 168750056 243000056 GeomeanNo

rmal

ized

exe

cuti

on

tim

e

File size (bytes)

Deduplication time (lower is better)

ARM only ARM+FPGA

Page 48: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Performance : ARM vs PL [2/2]

8x better performance per unit clock

48

8.027 x 7.670 x8.184 x 7.709 x

8.219 x 7.958 x

1 x 1 x 1 x 1 x 1 x 1 x

0.0 x

2.0 x

4.0 x

6.0 x

8.0 x

10.0 x

87980600 105528056 95420240 168750056 243000056 Geomean

No

rmal

ized

exe

cuti

on

tim

e p

er u

nit

clo

ck

File size (bytes)

Deduplication time (per unit clock, lower is better)

ARM only ARM+FPGA

ARM cortex PS ARM + FPGA

Clock freq. ratio 1 (667MHz) 6.6x (100MHz)

Page 49: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Performance : Overview

49

8.027 x7.670 x

8.184 x7.709 x

8.219 x 7.958 x

2.873 x 2.681 x

3.693 x

2.722 x 2.903 x 2.953 x

1 x 1 x 1 x 1 x 1 x 1 x

0.0 x

1.0 x

2.0 x

3.0 x

4.0 x

5.0 x

6.0 x

7.0 x

8.0 x

9.0 x

87980600 105528056 95420240 168750056 243000056 Geomean

No

rmal

ized

exe

cuti

on

ti

me

per

un

it c

lock

File size (bytes)

Deduplication time (per unit clock, lower is better)

ARM only x86_64 ARM+FPGA

Page 50: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

IP Utilization & latency

50

Latency (clock cycle)

Estimated utilization

3%

46%

17%

65%

0% 20% 40% 60% 80%

DSP

BRAM

FF

LUT

Page 51: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Power Consumption

Total on-chip power : 1.925W Dynamic : 1.752W

Static : 0.173W

51

Page 52: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Executive Summary

A-Z development of deduplication SW version of deduplication Chunking HW logic Fingerprinting HW logic HW device driver on PetalinuxMerge deduplication SW-HW

Performance Benefit ARM+FPGA

8x faster than ARM only 3x faster than x86_64

With low power consumption

52

Page 53: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

53

Thank you

Page 54: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

54

Backup

Page 55: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Performance : ARM vs x86_64 [1/2]

55

ARM cortex PS Intel x86_64

Clock frequency 667MHz 3.4GHz

14.242 x 14.584 x

11.297 x

14.436 x 14.432 x

1 x 1 x 1 x 1 x 1 x

0.0 x

5.0 x

10.0 x

15.0 x

87,980,600 105,528,056 95,420,240 168,750,056 243,000,056

No

rmal

ized

exe

cuti

on

tim

e

File size (bytes)

Deduplication Performance (lower is better)

ARM x86_64

Page 56: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Performance : ARM vs x86_64 [2/2]

56

ARM cortex PS Intel x86_64

Clock frequency 667MHz 3.4GHz

2.794 x 2.861 x

2.216 x

2.832 x 2.831 x

1 x 1 x 1 x 1 x 1 x

0.0 x

0.5 x

1.0 x

1.5 x

2.0 x

2.5 x

3.0 x

87,980,600 105,528,056 95,420,240 168,750,056 243,000,056

No

rmal

ized

exe

cuti

on

tim

e p

er u

nit

clo

ck

File size (bytes)

Deduplication Performance (per unit clock, lower is better)

ARM x86_64

Page 57: HEAD - GitHub Pages · HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee. Executive

Implemented Design

57