Mapping Video Codecs to Heterogeneous Architectures
Mauricio Alvarez-Mesa | Technische Universität Berlin - Spin Digital | MULTIPROG 2015


Video Codecs

– 70% of internet traffic will be video in 2018 [CISCO]

– Video compression: significant improvements over the last years

– Growing demand/offer for higher quality: HD to UHD

– Video codecs require more and more computing power

Mapping Video Codecs to Heterogeneous Architectures | M. Alvarez-Mesa | Multiprog 2015


Heterogeneous Architectures

Like them or not: they are available

– CPUs + GPUs + DSPs + HW accelerators + ...


Mapping: the Complete Application

– Video Player: Decoding + Rendering


Mapping: Where to Execute What?

Figure: Qualcomm Snapdragon 810 SoC


Outline

Introduction

Hardware Acceleration of Video Decoding

OpenCL Acceleration of Video Decoding

CPU Decoding + GPU Rendering

Conclusions


Hardware Accelerators

Mapping:

– Decoder in HW: ASIC for video decoding

– Rendering in GPU

Pros

– Low-power: 4K H.265 < 100 mW

Cons

– Flexibility: more than one codec

– Programmability issues

Figure: Tikekar et al., JSSC, Jan. 2014


HW Accelerators: the Software Issue

Multiple APIs: accessing HW modules from SW player

“Supported codecs depend on the chosen API: which is selected at runtime depending on what is available on the system” (GStreamer Manual)

API Vendor MPEG2 MPEG4 H.264 VC1 H.265
VAAPI Intel X X X X ?
VDPAU NVidia X X X X ?
XVBA AMD X X ?
DXVA Microsoft X ?
VDA Apple X ?


OpenCL Video Decoding

Mapping

– Decoder: accelerate decoding on the GPU using OpenCL

– Rendering: in GPU

Pros

– Programmability

– Compatibility: write once, compile and run everywhere (or is it?)

Cons

– Programming model mismatch

– Overheads


H.264 Video Decoding with GPUs

– H.264/AVC: one of the most widely used video codecs

Figure: H.264 decoder block diagram. Blocks: entropy decoding, inverse quantization, inverse DCT, intra prediction, motion compensation, deblocking filter, frame buffer. Input: H.264 stream. Output: YUV video.

Kernels      Entropy Dec.  Inverse DCT  Intra-Pred.  Motion Comp.  De-blocking
Parallelism  Low           High         Low          High          Low
Divergence   High          Low          High         High          High


H.264 Inverse Transform on GPUs

Figure: Kernel only. Speedups (axis 0 to 30) for the 4x4 and 8x8 transforms with CRF37, CRF22 and Intra inputs. Series: SIMD, GT430, GTS450, GTX560Ti.

– Speedups over scalar execution in CPU

– Significant speedups: over 25×


H.264 Inverse Transform on GPUs

Figure: Kernel + memory transfers + OpenCL runtime. Speedups (axis 0.0 to 4.0) at 1080p and 2160p with CRF37, CRF22 and Intra inputs. Series: Non-Overlap, Overlap, SIMD.

– Entire application slower than CPU + SIMD with 1 core

B. Wang et al. An Optimized Parallel IDCT on Graphics Processing Units. LNCS 2012.
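The gap between the kernel-only result and the end-to-end result can be illustrated with a small timing model. This is a minimal sketch, not the measurement method of the cited paper: the function name and every timing value are hypothetical, and the only idea taken from the slide is that memory transfers and runtime cost are added to the kernel time.

```python
def offload_speedup(cpu_ms, kernel_ms, transfer_ms, runtime_ms):
    """Speedup of an offloaded kernel relative to the CPU version.

    Returns (kernel_only, end_to_end). The first value ignores all
    overheads; the second adds memory transfers and runtime cost to
    the time spent on the GPU side.
    """
    kernel_only = cpu_ms / kernel_ms
    end_to_end = cpu_ms / (kernel_ms + transfer_ms + runtime_ms)
    return kernel_only, end_to_end


# Hypothetical per-frame timings in milliseconds (not from the talk).
kernel_only, end_to_end = offload_speedup(
    cpu_ms=10.0, kernel_ms=0.4, transfer_ms=3.0, runtime_ms=0.6)
print(f"kernel only: {kernel_only:.1f}x, end to end: {end_to_end:.1f}x")
```

With overheads several times larger than the kernel itself, a large kernel speedup shrinks to a small end-to-end one, which is the pattern the two figures describe.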


H.264 Motion Compensation on GPUs

– Discrete GPU attains the highest performance: 1.7× speedup

– Significant performance penalty when including the overhead


Figure: MC time breakdown (fraction from 0 to 1) into kernel, Mem Copy and OCL runtime, at 1080p and 2160p, for AMD, Nvidia and Intel GPUs.

B. Wang et al. Parallel H.264/AVC Motion Compensation for GPUs using OpenCL. IEEE TCSVT.


H.264: Complete Application

Figure: frame-level pipeline. Entropy decoding of frames N, N+1, ... with IDCT offloaded; reconstruction of frames N, N+1, ... with MC offloaded; the two stages are synchronized by signals.

Figure: bar chart (axis 0 to 25) for Seq-CPU, Seq-CPU-IDCT, Seq-CPU-MC, Seq-CPU-IDCT-MC, Seq-GPU-IDCT, Seq-GPU-MC, Seq-GPU-IDCT-MC and Pipelined, broken down into Others, Entropy, IDCT, MC, MBREC and Total.

– GPU + frame-level pipelining: faster than single threaded CPU

– Required complete application redesign
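Why frame-level pipelining helps can be sketched with a two-stage timing model in which the entropy stage works on frame N+1 while the reconstruction stage works on frame N. The function names and stage times below are hypothetical, and the model assumes constant per-frame stage times and free buffering between the stages; it is not the decoder described in the talk.

```python
def sequential_time(n_frames, entropy_ms, recon_ms):
    """Each frame runs both stages back to back, with no overlap."""
    return n_frames * (entropy_ms + recon_ms)


def pipelined_time(n_frames, entropy_ms, recon_ms):
    """Two-stage pipeline: after the first frame fills the pipeline,
    one frame completes every max(entropy_ms, recon_ms)."""
    if n_frames == 0:
        return 0.0
    slowest = max(entropy_ms, recon_ms)
    return entropy_ms + recon_ms + (n_frames - 1) * slowest


# Hypothetical stage times in milliseconds per frame.
frames, entropy_ms, recon_ms = 100, 6.0, 4.0
print("sequential:", sequential_time(frames, entropy_ms, recon_ms), "ms")
print("pipelined: ", pipelined_time(frames, entropy_ms, recon_ms), "ms")
```

In this model the slower stage sets the steady-state frame rate, so offloading only pays off when it shortens that stage.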


HEVC/H.265 Video Coding Standard

– HEVC/H.265: 50% better compression compared to H.264/AVC

– Add high-level parallelization tools: wavefront parallel processing (WPP) and tiles → good for CPUs

– Remove dependencies in deblocking filter

– Add another filter: SAO
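Wavefront parallel processing, the first of the parallelization tools listed above, lets each row of coding tree units (CTUs) start once the row above is two CTUs ahead. The sketch below counts the resulting steps for one picture. It is an idealized model under stated assumptions: every CTU takes one step, there are enough threads for every row, and the CTU size is 64x64. The function name is hypothetical.

```python
import math


def wpp_schedule(width, height, ctu_size=64):
    """Return (ctu_cols, ctu_rows, sequential_steps, wavefront_steps)."""
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    sequential_steps = cols * rows
    # With a two-CTU lag per row, CTU (row, col) starts at step
    # col + 2 * row, so the last CTU of the last row finishes after
    # cols + 2 * (rows - 1) steps.
    wavefront_steps = cols + 2 * (rows - 1)
    return cols, rows, sequential_steps, wavefront_steps


cols, rows, seq, wave = wpp_schedule(3840, 2160)
print(f"{cols}x{rows} CTUs: {seq} steps sequential, {wave} with wavefronts")
```

The model ignores the ramp-up and ramp-down phases in which fewer rows are active, and real CTUs vary in cost, so it is an upper bound on the benefit rather than a prediction.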

Figure: HEVC decoder block diagram. Blocks: entropy decoding, inverse quantization, inverse DCT, intra prediction, motion compensation, deblocking filter, SAO filter, reference pictures. Input: HEVC stream. Output: YUV video.

Kernels      Entropy Dec.  Inverse T  Intra-Pred.  Motion Comp.  De-blocking  SAO
Parallelism  Low           High       Low          High          High         High
Divergence   High          Low        High         High          High         Low


H.265 In-loop Filters on Discrete GPUs


H.265 In-loop Filters on Integrated GPUs


H.265 In-loop Filters on Mobile GPUs


OpenCL Decoding: Lessons Learned

– Programming model mismatch:

  – not all kernels can be offloaded
  – for kernels that can be offloaded: loss of data locality

– Acceleration of kernels but not of complete application

– Overheads:

  – CPU-GPU memory transfers
  – OpenCL runtime

– Effects on rendering of using the GPU for decoding not analyzed (yet)
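The point that accelerating kernels does not accelerate the complete application follows from Amdahl's law: if only a fraction of the decoding time is offloaded, the rest bounds the overall speedup. The sketch below is illustrative only. The offloaded fraction is a hypothetical value, not a figure from the talk, and overheads are left out.

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: speedup of the whole application when only
    offloaded_fraction of its run time is accelerated."""
    serial = 1.0 - offloaded_fraction
    return 1.0 / (serial + offloaded_fraction / kernel_speedup)


# Hypothetical: 30% of decoding time is in the offloaded kernel,
# and the kernel itself runs 25 times faster on the GPU.
print(f"{overall_speedup(0.30, 25.0):.2f}x overall")
# Even an infinitely fast kernel could not exceed this bound:
print(f"{1.0 / (1.0 - 0.30):.2f}x upper bound")
```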


CPU Decoding

CPU optimizations

– Multithreading: C++11 threads

– SIMD: compiler intrinsics

Pros

– Programmability

– High performance: on high-performance CPUs

Cons

– Low power efficiency: see Chi et al. TACO Jan 2015.

– Low performance: on low-power CPUs


4K CPU Decoding Performance

             1 Thread         2 Threads        4 Threads
CPU          fps    speedup   fps    speedup   fps    speedup
Haswell      37.8   1.00      73.4   1.94      134.9  3.57
Bay Trail    6.2    1.00      12.2   1.95      23.6   3.79
Cortex-A9    2.8    1.00      5.3    1.90      8.6    3.05
Cortex-A15   6.0    1.00      11.4   1.91      19.9   3.33

– High performance CPUs fast enough for 4Kp50

– Low power CPUs not fast enough for 4Kp25

References

– Chi et al. SIMD Acceleration for HEVC Decoding. IEEE TCSVT. 2015
– Chi et al. Parallel Scalability and Efficiency of HEVC Parallelization Approaches. IEEE TCSVT. Dec 2012
– Chi et al. Parallel HEVC Decoding on Multi- and Many-core Architectures. JSPS 2013
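The two conclusions drawn from the table can be checked directly from its fps values. The sketch below copies those values and computes parallel efficiency (speedup divided by thread count) and whether a frame-rate target is met. The function names are hypothetical, and results computed from the rounded fps values may differ in the last digit from the speedups printed in the table.

```python
# Frames per second from the 4K decoding table, keyed by thread count.
FPS = {
    "Haswell": {1: 37.8, 2: 73.4, 4: 134.9},
    "Bay Trail": {1: 6.2, 2: 12.2, 4: 23.6},
    "Cortex-A9": {1: 2.8, 2: 5.3, 4: 8.6},
    "Cortex-A15": {1: 6.0, 2: 11.4, 4: 19.9},
}


def efficiency(cpu, threads):
    """Parallel efficiency: speedup over one thread per thread used."""
    return FPS[cpu][threads] / (FPS[cpu][1] * threads)


def meets_target(cpu, target_fps, threads=4):
    """True if the CPU decodes at least target_fps with `threads` threads."""
    return FPS[cpu][threads] >= target_fps


for cpu in FPS:
    print(f"{cpu:11s} efficiency@4T={efficiency(cpu, 4):.2f} "
          f"4Kp25={meets_target(cpu, 25)} 4Kp50={meets_target(cpu, 50)}")
```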


GPU Rendering using OpenGL

Pros

– GPU should be efficient for simple 2D rendering

– Color conversion is just matrix-vector multiplication

– GPUs have HW units for color conversion

– OpenGL is a mature API, well supported on many platforms

Cons

– 2D rendering is complex

– Color conversion is not the bottleneck

– OpenGL has extensions, and has no performance guarantees
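The claim that color conversion is a matrix-vector multiplication can be shown in a few lines. The sketch below assumes the BT.709 full-range coefficients and normalized inputs (Y in [0, 1], U and V centered on 0); the talk does not say which matrix or range the player uses, so both are assumptions, and the function name is hypothetical. In a real player this arithmetic would typically run per pixel in a fragment shader rather than on the CPU.

```python
# BT.709 full-range YUV -> RGB matrix (assumed; not stated in the talk).
YUV_TO_RGB = (
    (1.0, 0.0, 1.5748),
    (1.0, -0.1873, -0.4681),
    (1.0, 1.8556, 0.0),
)


def yuv_to_rgba(y, u, v):
    """Convert one pixel. y is in [0, 1]; u and v are in [-0.5, 0.5].

    Returns (r, g, b, a) with each channel clamped to [0, 1] and an
    opaque alpha channel.
    """
    yuv = (y, u, v)
    # One dot product per output channel: the matrix-vector product.
    rgb = [sum(c * s for c, s in zip(row, yuv)) for row in YUV_TO_RGB]
    r, g, b = (min(1.0, max(0.0, x)) for x in rgb)
    return (r, g, b, 1.0)


print(yuv_to_rgba(0.5, 0.0, 0.0))  # mid gray: zero chroma
```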


Video Rendering Stages

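Figure: rendering pipeline. Decoding → Texture Uploading → Color Model Conversion → Displaying. The frame is in YUV after decoding and after texture uploading, and in RGBA after color model conversion.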

– Texture Uploading: upload/transform image from CPU to GPU

– Color Model Conversion: YUV into RGBA conversion

– Displaying: from framebuffer to screen


4K Video Play: Discrete GPU

Figure: 4K Playback on GTX750-Ti Linux. Performance (millisec/frame, axis 0 to 20) per video format. Series: Decoding + Rendering, Decoding, Rendering, Texture Uploading, CC + Displaying, Color Conversion.

Video formats and frame sizes: YUV420 8-bit (11.9MB), YUV420 10-bit (23.7MB), YUV422 10-bit (31.6MB), YUV444 10-bit (47.5MB).
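The frame sizes listed with the video formats can be reproduced from the picture dimensions and the sampling format. The sketch below assumes a 3840x2160 picture, 10-bit samples stored in two bytes each, and MB meaning 2^20 bytes; under those assumptions the results round to the sizes shown in the chart. The function name is hypothetical.

```python
def frame_size_mb(width, height, chroma, bit_depth):
    """Size of one uncompressed frame in MB (2**20 bytes).

    chroma is "420", "422" or "444"; samples deeper than 8 bits are
    assumed to occupy two bytes each.
    """
    # Luma plus two chroma planes, relative to the number of pixels.
    samples_per_pixel = {"420": 1.5, "422": 2.0, "444": 3.0}[chroma]
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    return width * height * samples_per_pixel * bytes_per_sample / 2**20


for chroma, depth in [("420", 8), ("420", 10), ("422", 10), ("444", 10)]:
    size = frame_size_mb(3840, 2160, chroma, depth)
    print(f"YUV{chroma} {depth}-bit: {size:.1f}MB")
```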


4K Video Play: Integrated GPUs

Figure: 4K Playback on HD4600 Linux. Performance (millisec/frame, axis 0 to 60) per video format. Series: Decoding + Rendering, Decoding, Rendering, Texture Uploading, CC + Displaying, Color Conversion.

Video formats and frame sizes: YUV420 8-bit (11.9MB), YUV420 10-bit (23.7MB), YUV422 10-bit (31.6MB), YUV444 10-bit (47.5MB).


Integrated GPUs + Linear Texture

Figure: 4K Texture Uploading on HD4600 Win8.1. Performance (millisec/frame, axis 0 to 30) per video format. Series: glTexSubImage2D, PBO + memcpy, PBOMAP, INTEL_map_texture.

Figure: 4K Playback on HD4600 Win8.1. Performance (millisec/frame, axis 0 to 30) per video format. Series: Decoding + Rendering, Decoding, Rendering, Texture Uploading, CC + Displaying, Color Conversion.

Video formats and frame sizes: YUV420 8-bit (11.9MB), YUV420 10-bit (23.7MB), YUV422 10-bit (31.6MB), YUV444 10-bit (47.5MB).


Decoding + Rendering: lessons learned

Discrete GPUs

– Performance drop: sharing memory for PCIe copy

– Can be solved by using integrated GPUs: remove copies

Integrated GPUs

– Texture uploading takes more time than decoding

– Can be solved with an OpenGL extension

OpenGL extensions

– INTEL_map_texture: remove memory layout conversion

– Performance loss due to memory contention
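To see why texture uploading and memory contention matter, it helps to work out how much data must move per second and how little time each frame has. The sketch below uses the frame sizes from the 4K playback charts and the 4Kp50 target from the CPU decoding results; the derived rates are plain arithmetic on those numbers, not measurements from the talk, and the function names are hypothetical.

```python
# Frame sizes in MB as listed in the 4K playback charts.
FRAME_MB = {
    "YUV420 8-bit": 11.9,
    "YUV420 10-bit": 23.7,
    "YUV422 10-bit": 31.6,
    "YUV444 10-bit": 47.5,
}


def upload_rate_mb_s(video_format, fps):
    """Data to upload per second if every frame is copied once."""
    return FRAME_MB[video_format] * fps


def frame_budget_ms(fps):
    """Time available per frame for decoding plus rendering."""
    return 1000.0 / fps


fps = 50
print(f"budget per frame at {fps} fps: {frame_budget_ms(fps):.1f} ms")
for video_format in FRAME_MB:
    rate = upload_rate_mb_s(video_format, fps)
    print(f"{video_format}: {rate:.0f} MB/s")
```

Every extra copy or layout conversion multiplies this traffic, which is the cost that mapping the texture directly is meant to avoid.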


Conclusions

Video codecs and parallel architectures: conflicting objectives

– Video codecs: remove redundancy → introduce dependencies

– Parallelism: remove dependencies → reduce compression

Solve the system problem

– Complete video decoding: not just kernels

– Rendering: not just matrix multiplication

– Video player is Decoding + Rendering


Questions?

– Acknowledgements: Biao Wang, Haifeng Gao, Chi Ching Chi

– More info:

– http://www.aes.tu-berlin.de/

– http://www.spin-digital.com/
