CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley Lam

CE-4030: OPTIMIZING PHOTO EDITING APPLICATION FOR AMD HETEROGENEOUS SYSTEM ARCHITECTURE

CYBERLINK MARKETING MANAGER

STANLEY LAM

AGENDA

Why Photo Editing Application – PhotoDirector?

Photo Editing Pipelines (RAW processing)

How AMD HSA helps in Photo Editing?

Proof of Concept: HSA Performance Showcase

Key Takeaways

| PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL

Why Photo Editing Software – PhotoDirector?

WHY PHOTO EDITING SOFTWARE?

THE RIGHT APPLICATION FOR HSA

• CyberLink Multimedia Software
‒ Media Playback: PowerDVD
‒ Video Editing: PowerDirector
‒ Photo Editing: PhotoDirector

• Why Photo Editing Software?
‒ Many editing tasks can be parallelized
‒ Processing and decoding RAW files is time-consuming
‒ RAW image editing can be both computationally and memory intensive

• How AMD HSA helps in Photo Editing
‒ Utilize GPU compute units to speed up performance
‒ Eliminate overheads and memory copy bottlenecks between HOST and DEVICE memories

Memory space required per decoded frame (MEM Space = Width × Height × 8 bytes):

Model | Resolution (MP) | Width (px) | Height (px) | MEM Space (bytes)
Nikon D3S | 24 | 6034 | 4012 | 193,667,264
Nikon D4 | 24 | 6048 | 4032 | 195,084,288
Nikon D70S | 24 | 6034 | 4028 | 194,439,616
Nikon D800E | 36 | 7378 | 4924 | 290,634,176
Nikon D90 | 36 | 7360 | 4912 | 289,218,560
Canon Eos 20D | 21 | 5616 | 3744 | 168,210,432
Canon Eos 5D Mark Iii | 21 | 5616 | 3744 | 168,210,432
Canon Eos 600D | 22 | 5760 | 3840 | 176,947,200
Canon Eos 7D | 20 | 5472 | 3648 | 159,694,848
Samsung Nx11 | 20 | 5472 | 3648 | 159,694,848
Samsung Dslr-A700 | 20 | 5472 | 3648 | 159,694,848
Sony Slt-A77V | 24 | 6000 | 4000 | 192,000,000
Sony Dslr-A850 | 24 | 6000 | 4000 | 192,000,000
Sony Dslr-A900 | 24 | 6048 | 4032 | 195,084,288
Sony Nex-5N | 24 | 6048 | 4032 | 195,084,288
Sony Dsc-Rx100 | 24 | 6000 | 4000 | 192,000,000
Sony Dsc-Rx1 | 20 | 5472 | 3648 | 159,694,848
Sony Dsc-F828 | 24 | 6000 | 4000 | 192,000,000
Pentax K-5 Ii | 40 | 7264 | 5440 | 316,129,280
Phase One P 20 | 22 | 4096 | 5456 | 178,782,208
Phase One P 30 | 22 | 4096 | 5456 | 178,782,208
Phase One P40+ | 32 | 6526 | 4904 | 256,028,032
Phase One P 45+ | 39 | 7246 | 5444 | 315,577,792
Phase One P65+ | 39 | 7246 | 5444 | 315,577,792
Phase One Dslr-A100 | 60 | 8984 | 6732 | 483,842,304

Photo Editing Pipeline

PHOTO EDITING PIPELINE

RAW PROCESSING

[Figure: RAW processing pipeline. IMG_0077.CR2 → RAW Decoder → Photo Retouch → JPEG Encoder → NEW.JPG, shown for Photo Retouch at Preview Size and at Full Scale Size.]

• KEY area for potential performance improvement
‒ RAW Decoder
‒ Decoder elapsed time is long for complex RAW formats

• RAW decoding is required at every stage of the editing pipeline
‒ When generating a FULL SCALE preview
‒ When entering the Retouch module for the first time
‒ When resuming a previous editing session
‒ When exporting to JPG/TIFF files

Test Platform
CPU: AMD A10-4655M
RAM: 4GB
OS: Windows 7 32-bit
Test Tool: PhotoDirector 5

Camera Model | RAW Decode Time (single photo)
Canon 1D-X | 7.347 seconds
Canon 1Ds MK3 | 8.400 seconds
Panasonic DMC FZ100 | 7.916 seconds
Phase One P25 | 10.475 seconds
Phase One P30 | 12.495 seconds
Phase One P45 | 13.049 seconds
Samsung NX10 | 6.263 seconds
Samsung NX100 | 5.280 seconds
Sony A700 | 5.522 seconds
Sony F828 | 6.996 seconds

PHOTO EDITING PIPELINE: OPENCL AND MEMORY MANAGEMENT

[Figure: IMG_0077.CR2 → RAW Decoder (GPU) → Photo Retouch (CPU & GPU) → JPEG Encoder (CPU) → NEW.JPG. Frame buffers move between HOST Memory and DEVICE Memory through MAP and UN-MAP operations.]

• Performance can be improved by utilizing GPU compute power (OpenCL 1.x)
‒ Improve RAW decode performance
‒ Improve EDITING (Retouch) performance
‒ OpenCL 1.x is great, however…

MEMORY SPACE AND PERFORMANCE

RELATIVE KERNEL VS. BUFFER PERFORMANCE ANALYSIS

• OpenCL 1.x can speed up performance substantially, yet it creates new challenges
‒ Buffering between HOST and DEVICE creates overheads
‒ Sometimes these overheads take up a large portion of execution time
‒ DEVICE memory space is limited
‒ 512MB can hold only one 36MP photo, or two 24MP photos
‒ This forces more reads and writes between HOST and DEVICE memories

[Figure: a 36MP frame buffer in 512MB of DEVICE memory; tiling the frame requires more reads and more writes.]

How AMD HSA helps in Photo Editing?

OPTIMIZING PERFORMANCE WITH AMD HSA

THE ADVANTAGE OF ADOPTING HSA WITH OPENCL

[Figure: IMG_0077.CR2 → RAW Decoder → Photo Retouch → JPEG Encoder → NEW.JPG, with HOST Memory and DEVICE Memory frame buffers.]

• Using AMD HSA to improve performance over OpenCL 1.x
‒ Shared virtual memory breaks down the border between CPU and GPU
‒ Reduces the overhead of moving data
‒ Uses the AMD APU platform to achieve true heterogeneous computing

3 LEVELS OF SHARED VIRTUAL MEMORY

CHOOSING SHARED VIRTUAL MEMORY

• 3 levels of Shared Virtual Memory (SVM) support (can be configured during initialization)
‒ Coarse Grain Buffer: ability to share virtual pointers between HOST and DEVICE
‒ Fine Grain Buffer: ability to share buffer space between HOST and DEVICE
‒ Fine Grain System Buffer: ability to allow DEVICE to access the entire HOST address space (eliminates the need to specify explicit SVM pointers)

• Coding Complexity
‒ Complexity: Coarse Grain > Fine Grain > Fine Grain System

COARSE GRAIN SHARED BUFFER

OPENCL BUFFER VS. HSA BUFFER

• PhotoDirector's existing code base does not contain excessive pointers, so we are able to choose the buffer type that gives the best performance

[Figure: Standard OCL Buffers keep Buffer 1 and Buffer 2 separately in HOST and DEVICE memory; HSA Coarse Grain Buffers share Buffer 1 and Buffer 2 between HOST and DEVICE.]

Proof of Concept: HSA Performance Showcase

AMD HSA BUFFER TYPES

RELATIVE PERFORMANCE COMPARISON

• Our proof-of-concept code showed a potential performance difference between buffer types
‒ Good potential performance when using Coarse Grain Buffers
‒ Results show roughly a 2x difference between the Coarse Grain and Fine Grain implementations

[Figure: Performance Index of Applying Hue Change to RAW Photo, CoarseGrain vs. FineGrain.]

Test Platform
CPU: AMD KAVERI
RAM: 4GB
OS: Windows 7 64-bit
Test Tool: PhotoDirector 5 Testbed

Key Takeaways

KEY TAKEAWAY

AMD HSA SHOWS GREAT POTENTIAL

• AMD HSA shows great potential for photo editing applications such as CyberLink PhotoDirector
‒ Many more photo editing tasks can leverage the performance advantage of AMD HSA platforms
‒ It's important to experiment and work with the most suitable HSA buffer type
‒ Potential performance improvements for parallelizable and memory-intensive applications

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.