51
ELIS – Multimedia Lab Charles Hollemeersch Bart Pieters 28/11/2008 Gastcollege GPU-team Multimedia Lab

Gastcollege GPU-team Multimedia Lab

  • Upload
    edie

  • View
    76

  • Download
    0

Embed Size (px)

DESCRIPTION

Gastcollege GPU-team Multimedia Lab. Charles Hollemeersch Bart Pieters 28/11/2008. Who we are. Charles Hollemeersch PhD student at Multimedia Lab [email protected] Bart Pieters PhD student at Multimedia Lab [email protected] Visit our website - PowerPoint PPT Presentation

Citation preview

Page 1: Gastcollege  GPU-team  Multimedia Lab

ELIS – Multimedia Lab

Charles HollemeerschBart Pieters28/11/2008

Gastcollege GPU-team Multimedia Lab

Page 2: Gastcollege  GPU-team  Multimedia Lab

2/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Who we are• Charles Hollemeersch

– PhD student at Multimedia Lab– [email protected]

• Bart Pieters– PhD student at Multimedia Lab– [email protected]

• Visit our website– http://multimedialab.elis.ugent.be/ and

http://multimedialab.elis.ugent.be/GPU

Page 3: Gastcollege  GPU-team  Multimedia Lab

3/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Our Research Topics• Video acceleration

– accelerate state-of-the-art video codecs using the GPU• Game technology

– texture compression, parallel game actors …• Medical visualization

– reconstruction of medical images …• Multi-GPU applications

– …

Page 4: Gastcollege  GPU-team  Multimedia Lab

4/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Introducing Multimedia Lab’s ‘Supercomputer’

• Quad GPU PC– four GeForce 280GTX video cards– 3732 gigaflops of GPU processing power

Page 5: Gastcollege  GPU-team  Multimedia Lab

5/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Agenda• 8u30 – 9u45

– Bart – GPGPU• 10u00 – 11u15

– Charles – Game Technology

Page 6: Gastcollege  GPU-team  Multimedia Lab

ELIS – Multimedia Lab

Bart PietersMultimedia Lab – UGent

28/11/2008

Page 7: Gastcollege  GPU-team  Multimedia Lab

7/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Overview

• Introduction – GPU– GPGPU

• Programming Concepts and Mappings– Direct3D and OpenGL– NVIDIA CUDA

• Case Study: Decoding H.264/AVC– motion compensation– results

• Conclusions• Q&A

Page 8: Gastcollege  GPU-team  Multimedia Lab

8/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Graphics Processing Unit (GPU)• Programmable chip on graphics cards• Developed in a gaming context

– 3-D scenery by means of rasterization

• Programmable pipeline since DirectX 8.1– vertex, geometry, and pixel shaders– high-level language support

• Modern GPUs support high-precision– 32-bit floating point

• Massive floating-point processing power– 933 gigaflops (NVIDIA GeForce

280GTX)– 141.7 GB/s peak memory bandwidth– fast PCI-Express bus, up to 2GB/sec

transfer speed

Page 9: Gastcollege  GPU-team  Multimedia Lab

9/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

CPU and GPU Comparison

Intel Xeon X5355 NVIDIA G80 (8800 GTX)

Clock Speed 2,66 GHz 575 MHz

#Cores / SPEs 4 128

Max. GFlop/s (float) 85 500

Typical Instr. Duration 1-2 cycles (SSE) min. 4 cycles

Die Size (mm²) 143 480

Typical Memory Speed 8GB/sec (DDR2-1066) 86 GB/sec (GDDR3-1800)

Power Usage (watt) 120 185

Price (€) 800 500

• Today’s GPUs are yesterday’s supercomputers

Page 10: Gastcollege  GPU-team  Multimedia Lab

10/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Why are GPUs so fast?• Parallelism

– massively-parallel/many-core architecture• needs a lot of work to be efficient

– specialized hardware build for parallel tasks– more transistors mean more performance

• Multi-billion dollar gaming industry drives innovation

Control

Cache

ALUALU

DRAM DRAM

GPUCPU

Page 11: Gastcollege  GPU-team  Multimedia Lab

11/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Computational Model: Stream Processing Model

• GPU is practically a stream processor• Applications consist of streams and kernels• Each kernel takes relatively long to process (PCIe, memory

latency)– latency hidden by throughput

Input Stream

Kernel

Output Stream

Page 12: Gastcollege  GPU-team  Multimedia Lab

12/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Inside a modern GPU

L2

FB

SP SP

L1

TF

Thre

ad P

roce

ssor

Vtx Thread Issue

Setup / Rstr / ZCull

Geom Thread Issue Pixel Thread Issue

Data Assembler

Host

SP SP

L1

TF

SP SP

L1

TF

SP SP

L1

TF

SP SP

L1

TF

SP SP

L1

TF

SP SP

L1

TF

SP SP

L1

TF

L2

FB

L2

FB

L2

FB

L2

FB

L2

FB

Page 13: Gastcollege  GPU-team  Multimedia Lab

13/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Introducing GPGPU• The GPU on commodity video cards has evolved into a

processor that is– powerful– flexible– inexpensive– precise

Attractive platform for general-purpose computation

Page 14: Gastcollege  GPU-team  Multimedia Lab

14/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPGPU• General-Purpose GPU

– use the GPU for general-purpose algorithms

• No magical GPU compiler– still no x86 processor (Larrabee?)– explicit mappings required using advanced APIs

• Programming close to the hardware– trend for higher abstraction, i.e. NVIDIA CUDA

• Techniques are suited for future many-core architectures– future CPU/GPU projects, AMD Fusion, Larrabee, …

• Dependency issues– hundreds of independent tasks required for efficient use

Page 15: Gastcollege  GPU-team  Multimedia Lab

15/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Stream Processing Model Revisited• GPU is practically a stream processor• Applications consist of streams and kernels• Read back is not possible

Input Stream

Kernel

Output Stream

Page 16: Gastcollege  GPU-team  Multimedia Lab

16/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPGPU in Practice

Fluid

Dynam

ics

Protei

n Fold

ing

Neural

Network

s

Finan

cial C

alcula

tions

Matrix

Multipl

icatio

n

Ray Tr

acing

Medica

l Imag

ing

Video

Coding

0

20

40

60

80… times faster

Page 17: Gastcollege  GPU-team  Multimedia Lab

17/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPGPU APIs• Classic way

– (mis)use graphics pipeline• render a special ‘scene’

– Direct3D, OpenGL– pixel, geometry, and vertex shaders

• New APIs specifically for GPGPU computations– NVIDIA CUDA, ATI CTM, DirectX11 Compute Shader,

OpenCl

Page 18: Gastcollege  GPU-team  Multimedia Lab

18/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Overview

• Introduction – GPU– GPGPU

• Programming Concepts and Mappings– Direct3D and OpenGL– NVIDIA CUDA

• Case Study: Decoding H.264/AVC– motion compensation– results

• Conclusions• Q&A

Page 19: Gastcollege  GPU-team  Multimedia Lab

19/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

3-D Pipeline• Deep pipeline

GPUCPU

Application Transform Rasterizer Shade VideoMemory

(Textures)Vertices(3D)

Xformed,Lit

Vertices(2D)

Fragments(pre-pixels)

Finalpixels

(Color, Depth)

Graphics State

Render-to-texture

Programmable

Programmable

Page 20: Gastcollege  GPU-team  Multimedia Lab

20/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

3-D Pipeline - Transform• Vertex Shader

– processing geometry data– input is a vertex

• position• texture coordinates• vertex color,...

– output is a vertex

struct Vertex{ float3 position : POSITION; float4 color : COLOR0;};

Vertex wave(Vertex vin){ Vertex vout; vout.x = vin.x; vout.y = vin.y; vout.z = (sin(vin.x) + sin(IN.wave.x)) * 2.5f;

vout.color = float4(1.0f, 1.0f, 1.0f, 1.0f); return vout;}

Vertex Shader

Page 21: Gastcollege  GPU-team  Multimedia Lab

21/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

• Pixel (or fragment) Shader– input is interpolated vertex data

• position• texture coordinates• normals, …

– use texels from a texture– output is a fragment

• pixel color• transparancy• depth

– result is stored in the frame buffer or in a texture

• ‘Render to Texture’

PSOut shade(PSIn pin){

PSOut pout; pout.color = tex(pin.tex, sampler)

return pout;

}

PixelShader

struct PSOut{ float4 color; : COLOR0};

struct PSIn{ float2 tex; : TEXCOORD0};

3-D Pipeline - Shading

Page 22: Gastcollege  GPU-team  Multimedia Lab

22/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPU-CPU Analogies• Explicit mapping on 3-D concepts is necessary• Rewrite an algorithm and find parallelism• Use the GPU in parallel to the CPU

1. upload data to the GPU• very fast PCI-Express bus, up to 2GB/sec transfer speed

2. process the data• meanwhile the CPU is available

3. download result to the CPU• recent GPU models have high download speed

Page 23: Gastcollege  GPU-team  Multimedia Lab

23/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Intermediary Buffer in System Memory

GPU-CPU Pipelined Design

CPUGP

U

GPU DataGPU Data

workPrepare GPU Data workPrepare GPU Data

Process Data, Visualize Process Data, Visualize

Page 24: Gastcollege  GPU-team  Multimedia Lab

24/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPU-CPU Analogies (2)

uv

CPU GPU

Array Texture

Page 25: Gastcollege  GPU-team  Multimedia Lab

25/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPU-CPU Analogies (3) …

fish[] = createfish() …

for all pixels bwfish[i][j]= bw(fish[i][j]); …

CPU GPU

Render

Array Write = Render to Texture

Page 26: Gastcollege  GPU-team  Multimedia Lab

26/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPU-CPU Analogies (4)

Loop body / kernel / algorithm step = Fragment Program

CPU GPUMotion

Compensation

for (int y=0;y<height;++y){ for (int x=0;x<width;++x) { Vec2 mv = mvectors[y/4][x/4];

int ox = Clip(x + mv.x); int oy = Clip(y + mv.y); output[y][x] = input[oy][ox]; }}

PSOut motioncompens(PSIn in){ PSOut out; Float2 mv = in.mv; Float2 texcoords = in.texcoords;

texcoords += mv; out.color = tex2d(texcoords, sampler);}

Vec2 mv = mvectors[y/4][x/4];

int ox = Clip(x + mv.x); int oy = Clip(y + mv.y); output[y][x] = input[oy][ox];

C++ Microsoft HLSL

Page 27: Gastcollege  GPU-team  Multimedia Lab

27/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPU Loop for Each Pixel

Vertex Shader

Rasterizer Pixel Shader

PSOut motioncompens(PSIn in){ PSOut out; Float2 mv = in.mv; Float2 texcoords = in.texcoords;

texcoords += mv; out.color = tex2d(texcoords, sampler);}

PSOut motioncompens(PSIn in){ PSOut out; Float2 mv = in.mv; Float2 texcoords = in.texcoords;

texcoords += mv; out.color = tex2d(texcoords, sampler);}

PSOut motioncompens(PSIn in){ PSOut out; Vec2 mv = in.mv; Vec2 texcoords = in.texcoords;

texcoords += mv; out.color = tex2d(texcoords, sampler);}

PSOut motioncompens(PSIn in){ PSOut out; Float2 mv = in.mv; Float2 texcoords = in.texcoords;

texcoords += mv; out.color = tex2d(texcoords, sampler);}

…PSOut motioncompens(PSIn in){ PSOut out; Float2 mv = in.mv; Float2 texcoords = in.texcoords;

texcoords += mv; out.color = tex2d(texcoords, sampler);}

Render a Quad

Page 28: Gastcollege  GPU-team  Multimedia Lab

28/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Overview

• Introduction – GPU– GPGPU

• Programming Concepts and Mappings– Direct3D and OpenGL– NVIDIA CUDA

• Case Study: Decoding H.264/AVC– motion compensation– results

• Conclusions• Q&A

Page 29: Gastcollege  GPU-team  Multimedia Lab

29/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

GPGPU-specific APIs• NVIDIA CUDA

– Compute Unified Device Architecture– C-code with annotations compiled to executable code

• DirectX 11 Compute Shader– shader execution without rendering– technology preview available in latest DirectX SDK

• OpenCl– Open Computing Language– C++-code with annotations

• ATI CTM– Close to The Metal– GPU assembler– depricated

Page 30: Gastcollege  GPU-team  Multimedia Lab

30/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

NVIDIA CUDA• General-Purpose GPU Computing Platform• GPU is a super-threaded co-processor

– acceleration of massive amounts of GPU threads• Supported on NVIDIA G80 and higher

– 50-500EUR price range• No more (mis)use of 3-D API• C-code with annotations for

– memory location– host or device functions– thread synchronization

• Compilation with CUDA-compiler– split host and device code– linkable object code

Page 31: Gastcollege  GPU-team  Multimedia Lab

31/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

NVIDIA CUDA - Examplevoid runGPUTest() { CUT_DEVICE_INIT();

... float* d_data = NULL;

// allocate gpu memory cudaMalloc( (void**) &d_data, size);

dim3 dimBlock(8, 8, 1); dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);

// run kernel on gpu transformKernel<<< dimGrid, dimBlock, 0 >>>( d_data );

// download cudaMemcpy( h_data, d_data, size, cudaMemcpyDeviceToHost);

...}

Page 32: Gastcollege  GPU-team  Multimedia Lab

32/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

NVIDIA CUDA – Example (2)__ global__ void transformKernel( float* g_odata) { // calculate normalized texture coordinates unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

int2 mv = tex2D(mvtex, x, y);

int mx = x + mv.x; int my = y + mv.y; g_odata[y*width + x] = tex2D(reftex, mx, my);}

Page 33: Gastcollege  GPU-team  Multimedia Lab

33/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Device (GPU)

Grid 1

Programming ModelHost (CPU)

Block(0, 0)

Block(1, 0)

Grid 2

Kernel 1

Kernel 2

Block(0, 1)

Block(1, 1)

Block(2, 0)Block(2, 1)

Block (1, 1)

Thread(0, 0)

Thread(1, 0)

Thread(2, 0)

Thread(0, 1)

Thread(1, 1)

Thread(2, 1)

Thread(0, 2)

Thread(1, 2)

Thread(2, 2)

Thread(0, 3)

Thread(1, 3)

Thread(2, 3)

Page 34: Gastcollege  GPU-team  Multimedia Lab

34/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Hardware Model• Multiprocessor – MP (16)• Streaming Processor (8 per MP)

– handles one thread• Memory

– very fast high-latency– uncached– special memory hardware

for constants & texture (cached)

• Registers– limited amount

Page 35: Gastcollege  GPU-team  Multimedia Lab

35/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

CUDA Threads• Each Streaming Processor handles

one thread– 240 on GeForce 280GTX!

• Smart hardware can schedule thousands of threads on 240 processors

• Extremely lightweight– not like CPU threads

• Threads per Multiprocessor handled in SIMD manner– each thread executes the same

instruction at a given clock cycle– lock-step execution

Block (1, 1)

Thread(0, 0)

Thread(1, 0)

Thread(2, 0)

Thread(0, 1)

Thread(1, 1)

Thread(2, 1)

Thread(0, 2)

Thread(1, 2)

Thread(2, 2)

Thread(0, 3)

Thread(1, 3)

Thread(2, 3)

Page 36: Gastcollege  GPU-team  Multimedia Lab

36/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

x 3 y … z …

Lock-step Execution

x = x * 2;If ( x > 10) z = 0;Else z = y / 2 ++x

Thread 1

x 100 y … z …

x = x * 2;If ( x > 10) z = 0;Else z = y / 2 ++x

Thread 2

x 200 y … z …

x = x * 2;If ( x > 10) z = 0;Else z = y / 2 ++x

Thread 31

x 1 y … z …

x = x * 2;If ( x > 10) z = 0;Else z = y / 2 ++x

Thread 32

Locked Locked Locked Locked

Program Counter

Program Counter

Program CounterProgram Counter

Program Counter

• Heavy branching needs to be avoided

Page 37: Gastcollege  GPU-team  Multimedia Lab

37/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Overview

• Introduction – GPU– GPGPU

• Programming Concepts and Mappings– Direct3D and OpenGL– NVIDIA CUDA

• Case Study: Decoding H.264/AVC– motion compensation– results

• Conclusions• Q&A

Page 38: Gastcollege  GPU-team  Multimedia Lab

38/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Decoding H.264/AVC• Many decoding steps are suitable for parallelization

– quantization– transformation– motion compensation– deblocking– color space conversion

• Others introduce dependencies – entropy coding– intra prediction

Page 39: Gastcollege  GPU-team  Multimedia Lab

39/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Video Coding Hardware• Specialized on-board 2-D video processing chips

– one macroblock at the time– black boxes

• limited support for non-windows systems– limited support for various video codecs

• e.g. H.264/AVC profiles– partly programmable

• GPU– millions of transistors– accessible via 3-D API or General Purpose GPU API

Page 40: Gastcollege  GPU-team  Multimedia Lab

40/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Decoding an H.264/AVC bitstream• H.264/AVC

– recent video coding standard– successor of MPEG-4 Visual

• Computationally intensive– multiple reference frames (up to 16)– B-pictures– sub-pixel interpolations

• Motion compensation, reconstruction, deblocking, and color space conversion– takes up to 80% of total processing time– suitable for execution on the GPU

Page 41: Gastcollege  GPU-team  Multimedia Lab

41/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Intermediary Buffer in System Memory

Pipelined Design for Video Decoding

CPUGP

U

MVs ResidueMVs Residue

VLD, IQ, InverseTransformation

ReadBitstream

VLD, IQ, Inverse Transformation

ReadBitstream

CSC,Visualization

MC, Reconstr.,Deblocking

CSC,Visualization

MC, Reconstr.,Deblocking

QPsQPs

Page 42: Gastcollege  GPU-team  Multimedia Lab

42/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Input SequenceReference Picture Current Picture

(x1,y1)

Motion Vectors Prediction

Time

Motion Compensatio

n

Motion Compensation

(x2,y2)(x3,y3)

Residual Data

=

-

Page 43: Gastcollege  GPU-team  Multimedia Lab

43/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Reference Picture

(x1,y1)

Motion Vectors Prediction

Time

Motion Compensatio

n

Motion Compensation: Decoder

(x2,y2)(x3,y3)

Residual Data

+

Page 44: Gastcollege  GPU-team  Multimedia Lab

44/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Output Array

Motion Compensation in CUDADevice (GPU)

Kernel 1

Block(0, 0)

Block(1, 0)

Block(0, 1)

Block(1, 1)

Block(2, 0)

Block(2, 1)

Block (1, 1)

Thread(0, 0)

Thread(1, 0)

Thread(2, 0)

Thread(0, 1)

Thread(1, 1)

Thread(2, 1)

Thread(0, 2)

Thread(1, 2)

Thread(2, 2)

Page 45: Gastcollege  GPU-team  Multimedia Lab

45/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Output Array

Motion Compensation in CUDADevice (GPU)

Kernel 1

Block(0, 0)

Block(1, 0)

Block(0, 1)

Block(1, 1)

Block(2, 0)

Block(2, 1)

Block (1, 1)

Thread(0, 0)

Thread(1, 0)

Thread(2, 0)

Thread(0, 1)

Thread(1, 1)

Thread(2, 1)

Thread(0, 2)

Thread(1, 2)

Thread(2, 2)

Page 46: Gastcollege  GPU-team  Multimedia Lab

46/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Output Array

Motion Compensation in CUDADevice (GPU)

Kernel 1

Block(0, 0)

Block(1, 0)

Block(0, 1)

Block(1, 1)

Block(2, 0)

Block(2, 1)

Block (1, 1)

Thread(0, 0)

Thread(1, 0)

Thread(2, 0)

Thread(0, 1)

Thread(1, 1)

Thread(2, 1)

Thread(0, 2)

Thread(1, 2)

Thread(2, 2)

Page 47: Gastcollege  GPU-team  Multimedia Lab

47/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Motion Compensation in Direct3D• Put video picture in textures• Use vertices to represent a macroblock• Let texture coordinate point to the texture

• Full-pel motion compensation– manipulate texture coordinates

• Multiple pixel shaders fill macroblocks and interpolate

[0.50,0.30]

[0.60,0.30]

[0.50,0.40]

[0.60,0.40]

[0.50,0.40]

[0.60,0.40]

[0.50,0.50]

[0.60,0.50] Reference texture for

rasterization process

Page 48: Gastcollege  GPU-team  Multimedia Lab

48/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Interpolation Strategies for Sub-pixel MC

Vertex Grid

Viewable area

+

+

VertexShaders Pixel

Shader 1Full-Pel

Half-Pel

Q-Pel

PixelShader 2

PixelShader 3

Page 49: Gastcollege  GPU-team  Multimedia Lab

49/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Experimental Results• GPU algorithm scores faster than CPU algorithm• CPU is offloaded, free for other tasks

Motion

Compe

nsatio

n

Debloc

king

Color S

pace

Conver

sion

0123

720p1080p

… times faster

Page 50: Gastcollege  GPU-team  Multimedia Lab

50/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Conclusions• GPU is an attractive platform for general-purpose

computation– flexible, powerful, inexpensive

• General-purpose APIs– approach the GPU as a super-threaded co-processor

• GPGPU requires lots of parallel jobs– e.g. hundreds to thousands

• GPGPU allow faster execution while offloading the CPU– e.g. decoding of H.264/AVC bitstreams

• GPGPU techniques are suited for future architectures

Page 51: Gastcollege  GPU-team  Multimedia Lab

51/50

ELIS – Multimedia Lab

GPGPUBart Pieters - MMLab

Gastcollege OMMT – 28/11/2008

Questions?• Multimedia Lab

– http://multimedialab.elis.ugent.be/GPU– [email protected]