
Shredder: GPU-Accelerated Incremental Storage and Computation

Pramod Bhatotia§, Rodrigo Rodrigues§, Akshat Verma¶

§MPI-SWS, Germany   ¶IBM Research-India

USENIX FAST 2012

Pramod Bhatotia 2

Handling the data deluge

• Data stored in data centers is growing at a fast pace
• Challenge: How to store and process this data?
• Key technique: Redundancy elimination
• Applications of redundancy elimination
  • Incremental storage: data de-duplication
  • Incremental computation: selective re-execution


Redundancy elimination is expensive

Content-based chunking [SOSP’01]

[Figure: File → Chunking → Chunks → Hashing → Hash → Matching → Duplicate? (Yes). Inset: a fingerprint is computed over the file's bit stream, and a content marker determines where chunks are cut.]

For large-scale data, chunking easily becomes a bottleneck
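The chunking, hashing, and matching stages named on this slide can be sketched in a few lines of Python. This is a minimal CPU-only illustration, not Shredder's GPU kernel: a simple polynomial rolling hash stands in for the fingerprint function, and the window size, marker mask, and the names `chunk_boundaries`, `split_chunks`, and `dedup` are all assumptions made for the example.

```python
import hashlib

WINDOW = 16           # assumed sliding-window size in bytes
MASK = (1 << 6) - 1   # assumed marker: low 6 fingerprint bits are zero
BASE = 257            # assumed polynomial base
MOD = (1 << 31) - 1   # assumed modulus


def chunk_boundaries(data):
    """Return chunk end offsets; a boundary is declared wherever the rolling
    fingerprint of the last WINDOW bytes matches the marker."""
    if not data:
        return []
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    fp = 0
    cuts = []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            fp = (fp - data[i - WINDOW] * drop) % MOD
        fp = (fp * BASE + byte) % MOD
        if i + 1 >= WINDOW and (fp & MASK) == 0:
            cuts.append(i + 1)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))  # the tail forms the final chunk
    return cuts


def split_chunks(data):
    """Chunking stage: cut the data at the content-defined boundaries."""
    chunks, start = [], 0
    for cut in chunk_boundaries(data):
        chunks.append(data[start:cut])
        start = cut
    return chunks


def dedup(data, store):
    """Hashing and matching stages: keep only chunks whose hash is new.
    Returns the number of chunks that were not duplicates."""
    new_chunks = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            new_chunks += 1
    return new_chunks


store = {}
sample = bytes(range(256)) * 8
print(dedup(sample, store))  # chunks stored on first sight
print(dedup(sample, store))  # 0: every chunk is now a duplicate
```

The per-byte loop in `chunk_boundaries` is the part that has to touch every byte of the input, which is consistent with the slide's point that chunking becomes the bottleneck at scale.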


Accelerate chunking using GPUs

GPUs have been successfully applied to compute-intensive tasks

[Figure: chart comparing "Multicore" and "GPU based design", with a 20 Gbps (2.5 GBps) level marked and the label "Storage servers"]

Using GPUs for data-intensive tasks presents new challenges


Rest of the talk

• Shredder design
  • Basic design
  • Background: GPU architecture & programming model
  • Challenges and optimizations
• Evaluation
• Case studies
  • Computation: Incremental MapReduce
  • Storage: Cloud backup


Shredder basic design

[Figure: on the CPU (host), a Reader passes data for chunking to a Transfer stage; the chunking kernel runs on the GPU (device) and returns the chunked data]


GPU architecture

[Figure: the CPU (host) with host memory, next to the GPU (device) with device global memory and multi-processors 1 to N; each multi-processor contains several SPs and shared memory]


GPU programming model

[Figure: the CPU (host) with host memory, next to the GPU (device) with device global memory and multi-processors 1 to N, each containing several SPs and shared memory; annotated with the labels "Threads" and "Output"]


Scalability challenges

1. Host-device communication bottlenecks

2. Device memory conflicts

3. Host bottlenecks (see paper for details)


Challenge #1: Host-device communication bottleneck

[Figure: on the CPU (host), Reader → Transfer out of main memory; a PCI link to device global memory on the GPU (device), where the chunking kernel runs]

Synchronous data transfer and kernel execution
• Cost of data transfer is comparable to kernel execution
• For large-scale data it involves many data transfers


Asynchronous execution

[Figure: asynchronous transfer from main memory on the CPU (host) into Buffer 1 and Buffer 2 in device global memory on the GPU (device). Timeline labels: Copy to Buffer 1, Compute Buffer 1, Copy to Buffer 2, Compute Buffer 2.]

Pros:
+ Overlaps communication with computation
+ Generalizes to multi-buffering

Cons:
- Requires page-pinning of buffers at host side
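The benefit of overlapping copies with kernel execution can be estimated with a small timing model. The sketch below is an idealized calculation under stated assumptions, not a measurement: every batch takes the same copy time and the same kernel time, one copy and one kernel can run concurrently, and the function names and the numbers in the usage lines are hypothetical.

```python
def synchronous_time(num_batches, t_copy, t_kernel):
    """Each batch is copied, then processed, with no overlap."""
    return num_batches * (t_copy + t_kernel)


def double_buffered_time(num_batches, t_copy, t_kernel):
    """With two device buffers, the copy of batch k+1 overlaps the kernel on
    batch k, so the steady-state cost per batch is the slower of the two."""
    if num_batches == 0:
        return 0
    return t_copy + (num_batches - 1) * max(t_copy, t_kernel) + t_kernel


# Hypothetical numbers: 100 batches, copy and kernel each take 1.0 time unit.
print(synchronous_time(100, 1.0, 1.0))      # 200.0
print(double_buffered_time(100, 1.0, 1.0))  # 101.0
```

When the copy cost is comparable to the kernel cost, as the previous slide states, this model predicts close to a 2x improvement; when one stage dominates, the gain shrinks toward the cost of the cheaper stage.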


Circular ring pinned memory buffers

[Figure: on the CPU (host), data is copied (memcpy) from pageable buffers into a circular ring of pinned buffers, then copied asynchronously into device global memory on the GPU (device)]
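One plausible reading of this slide is that a small set of pinned buffers is allocated once and then reused in ring order, so that the pinning cost is not paid per transfer. The sketch below illustrates only the ring bookkeeping: ordinary `bytearray` objects stand in for pinned memory, and the class name `PinnedRing` and its methods are hypothetical, not part of Shredder or CUDA.

```python
class PinnedRing:
    """Fixed set of reusable staging buffers (stand-ins for pinned memory)."""

    def __init__(self, num_slots, slot_size):
        # Allocated once up front and reused for every transfer.
        self.slots = [bytearray(slot_size) for _ in range(num_slots)]
        self.lengths = [0] * num_slots
        self.head = 0     # next slot the reader fills
        self.tail = 0     # next slot handed to the transfer stage
        self.filled = 0   # number of slots waiting to be transferred

    def put(self, data):
        """Copy data into the next free slot; return False if the ring is full."""
        if self.filled == len(self.slots):
            return False
        slot = self.slots[self.head]
        if len(data) > len(slot):
            raise ValueError("data larger than a ring slot")
        slot[:len(data)] = data  # the memcpy from a pageable buffer
        self.lengths[self.head] = len(data)
        self.head = (self.head + 1) % len(self.slots)
        self.filled += 1
        return True

    def get(self):
        """Return the oldest filled slot's contents, or None if the ring is empty."""
        if self.filled == 0:
            return None
        payload = bytes(self.slots[self.tail][:self.lengths[self.tail]])
        self.tail = (self.tail + 1) % len(self.slots)
        self.filled -= 1
        return payload


ring = PinnedRing(num_slots=2, slot_size=4)
ring.put(b"ab")
ring.put(b"cd")
print(ring.put(b"ef"))  # False: both slots are still waiting for transfer
print(ring.get())       # b'ab'
```

In a real host driver the `get` side would issue the asynchronous copy to the device and release the slot only once that copy has completed.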


Challenge #2: Device memory conflicts

[Figure: on the CPU (host), Reader → Transfer out of main memory; a PCI link to device global memory on the GPU (device), where the chunking kernel runs]


Accessing device memory

[Figure: a multi-processor with SP-1 to SP-4 running Thread-1 to Thread-4. Latency labels: 400-600 cycles and a few cycles, the latter next to device shared memory.]


Memory bank conflicts

Uncoordinated accesses to global memory lead to a large number of memory bank conflicts

[Figure: a multi-processor with several SPs accessing memory]


Accessing memory banks

[Figure: interleaved memory with Bank 0 to Bank 3; the memory address is split into MSBs and LSBs, which feed the banks' address and chip-enable inputs; the selected bank drives data out]
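A small model makes the bank-conflict problem concrete. The sketch below assumes low-order interleaving, where the least significant address bits select the bank, and four banks as drawn in the figure; the function names are hypothetical, and real GPU memory controllers are more elaborate.

```python
from collections import Counter

NUM_BANKS = 4  # as in the figure: Bank 0 to Bank 3


def bank_of(word_address, num_banks=NUM_BANKS):
    """Assumed low-order interleaving: the LSBs of the address pick the bank."""
    return word_address % num_banks


def serialization_factor(word_addresses, num_banks=NUM_BANKS):
    """Largest number of simultaneous requests that land on one bank.
    Requests to the same bank must be served one after another."""
    hits = Counter(bank_of(a, num_banks) for a in word_addresses)
    return max(hits.values())


print(serialization_factor([0, 1, 2, 3]))    # 1: each request hits its own bank
print(serialization_factor([0, 4, 8, 12]))   # 4: all requests collide on bank 0
```

Under this model, four threads reading consecutive words are served in parallel, while four threads reading with a stride equal to the number of banks are fully serialized.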


Memory coalescing

[Figure: threads 1 to 4 reading from device global memory with memory coalescing]
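Why coalescing helps can be illustrated by counting how many memory segments a group of threads touches in one step. The sketch below is a simplified model, not Shredder's kernel: the 128-byte segment size, the group of 32 threads, the 4096-byte per-thread region, and the function names are all assumptions chosen for the example.

```python
SEGMENT = 128  # assumed number of bytes served by one global-memory transaction


def transactions(addresses, segment=SEGMENT):
    """Number of distinct segments touched by one step of a thread group."""
    return len({addr // segment for addr in addresses})


def per_thread_region_addresses(num_threads, bytes_per_thread, step):
    """Uncoordinated pattern: each thread walks its own contiguous region,
    so neighbouring threads are bytes_per_thread apart."""
    return [t * bytes_per_thread + step for t in range(num_threads)]


def coalesced_addresses(num_threads, step):
    """Coalesced pattern: neighbouring threads read neighbouring bytes."""
    return [step * num_threads + t for t in range(num_threads)]


print(transactions(per_thread_region_addresses(32, 4096, step=0)))  # 32
print(transactions(coalesced_addresses(32, step=0)))                # 1
```

In this model the coalesced pattern needs one transaction per step instead of one per thread. A kernel can still give each thread its own region to process by first staging the data cooperatively into shared memory, which is a common CUDA pattern and may be what the next slide depicts.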


Processing the data

[Figure: Thread-1, Thread-2, Thread-3, and Thread-4 processing the data]


Outline

• Shredder design
• Evaluation
• Case studies


Evaluating Shredder

• Goal: Determine how Shredder works in practice
  • How effective are the optimizations?
  • How does it compare with multicores?
• Implementation
  • Host driver in C++, GPU code in CUDA
  • GPU: NVIDIA Tesla C2050 cards
  • Host machine: Intel Xeon with 12 cores

(See paper for details)


Shredder vs. Multicores

[Figure: chart comparing four configurations: Multicore, GPU Basic, GPU Async, and GPU Async + Coalescing]


Outline

• Shredder design
• Evaluation
• Case studies
  • Computation: Incremental MapReduce
  • Storage: Cloud backup (see paper for details)


Incremental MapReduce

[Figure: Read input → Map tasks → Reduce tasks → Write output]
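Selective re-execution, as named earlier in the talk, can be sketched by memoizing map-task results under a hash of each input split, so that only splits whose content changed are recomputed. The code below is a minimal single-process illustration under that assumption; `word_count`, `incremental_map`, and the in-memory `memo` dictionary are hypothetical names and do not reflect the actual Hadoop integration.

```python
import hashlib


def word_count(split):
    """A stand-in map task: count the words in one input split."""
    counts = {}
    for word in split.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def incremental_map(splits, memo):
    """Run the map task per split, reusing memoized results for splits whose
    content hash has been seen before.
    Returns (results, number of map tasks actually executed)."""
    results, executed = [], 0
    for split in splits:
        key = hashlib.sha1(split.encode("utf-8")).hexdigest()
        if key not in memo:
            memo[key] = word_count(split)
            executed += 1
        results.append(memo[key])
    return results, executed


memo = {}
_, first_run = incremental_map(["a b", "c d", "e f"], memo)
_, second_run = incremental_map(["a b", "c x", "e f"], memo)
print(first_run, second_run)  # 3 1
```

The scheme only pays off if unchanged data maps to unchanged splits, which is the problem the next slide raises.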


Unstable input partitions

[Figure: Read input → Map tasks → Reduce tasks → Write output]


GPU accelerated Inc-HDFS

[Figure: copyFromLocal hands the input file to the Shredder HDFS client, which applies content-based chunking to produce Split-1, Split-2, and Split-3]
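The reason content-based splits are more stable than fixed-size splits can be demonstrated by inserting a single byte at the front of a file and counting how many old splits reappear. The sketch below is an illustration under assumptions: a polynomial rolling hash stands in for the fingerprint function, the window, mask, and split sizes are arbitrary, the test data is generated from SHA-256 digests, and all function names are hypothetical.

```python
import hashlib

WINDOW, MASK = 16, (1 << 6) - 1      # assumed window size and marker mask
BASE, MOD = 257, (1 << 31) - 1       # assumed rolling-hash parameters


def content_splits(data):
    """Cut wherever the fingerprint of the last WINDOW bytes matches the marker."""
    drop = pow(BASE, WINDOW - 1, MOD)
    fp, start, out = 0, 0, []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            fp = (fp - data[i - WINDOW] * drop) % MOD
        fp = (fp * BASE + byte) % MOD
        if i + 1 >= WINDOW and (fp & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def fixed_splits(data, size=64):
    """Cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reused(old_splits, new_splits):
    """How many of the old splits appear unchanged among the new splits."""
    seen = set(new_splits)
    return sum(1 for s in old_splits if s in seen)


old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
new = b"!" + old  # one byte inserted at the front

print(reused(fixed_splits(old), fixed_splits(new)))      # fixed-size splits reused
print(reused(content_splits(old), content_splits(new)))  # content-based splits reused
```

Because a content-based boundary depends only on the bytes inside the window, every boundary after the insertion point moves with the data, so all old splits except the first one reappear. Fixed-size boundaries stay at the same offsets, so every split after the insertion point changes.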


Related work

• GPU-accelerated systems
  • Storage: Gibraltar [ICPP’10], HashGPU [HPDC’10]
  • SSLShader [NSDI’11], PacketShader [SIGCOMM’10], …
• Incremental computations
  • Incoop [SOCC’11], Nectar [OSDI’10], Percolator [OSDI’10], …


Conclusions

• GPU-accelerated framework for redundancy elimination
  • Exploits massively parallel GPUs in a cost-effective manner
• Shredder design incorporates novel optimizations
  • More data-intensive than previous usage of GPUs
• Shredder can be seamlessly integrated with storage systems
  • To accelerate incremental storage and computation

Thank You!
