HETEROGENEOUS SYSTEM ARCHITECTURE AND THE HSA FOUNDATION



Heterogeneous System Architecture | June 2012

INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)

HSA is a purpose-designed architecture that enables the software ecosystem to combine and exploit the complementary capabilities of sequential processing elements (CPUs) and parallel processing elements (such as GPUs), delivering new capabilities to users that go beyond traditional usage scenarios.

AMD is making HSA an open standard to jumpstart the ecosystem.


EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA

[Diagram: an Accelerated Processing Unit (APU) runs serial and task-parallel workloads, data-parallel workloads, and graphics workloads side by side, all feeding APP-accelerated software applications.]


AMD HSA FEATURE ROADMAP

Physical Integration:
- Integrate CPU & GPU in silicon
- Unified memory controller
- Common manufacturing technology

Optimized Platforms:
- Bi-directional power management between CPU and GPU
- GPU compute C++ support
- HSA Memory Management Unit

Architectural Integration:
- Unified address space for CPU and GPU
- Fully coherent memory between CPU and GPU
- GPU uses pageable system memory via CPU pointers

System Integration:
- GPU compute context switch
- Quality of service
- GPU graphics pre-emption


HSA COMPLIANT FEATURES

The Optimized Platforms features:

GPU compute C++ support: Supports the OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of the CPU and GPU working together to process parallel workloads, such as computer vision, video encoding/transcoding, etc.

HSA Memory Management Unit: The CPU and GPU can share system memory, so all system memory is accessible by either the CPU or the GPU, depending on need. Today, only a subset of system memory can be used by the GPU.

Bi-directional power management between CPU and GPU: Enables "power sloshing", where the CPU and GPU dynamically lower or raise their power and performance depending on the activity and on which one is better suited to the task at hand.


HSA COMPLIANT FEATURES

The Architectural Integration features:

Unified address space for CPU and GPU: The unified address space makes it easier for developers to create applications. On HSA platforms, a pointer is really a pointer and does not require separate memory pointers for CPU and GPU.

GPU uses pageable system memory via CPU pointers: The GPU can take advantage of the CPU's virtual address space. With pageable system memory, the GPU can reference data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked prior to use.

Fully coherent memory between CPU and GPU: Allows data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command-buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high-speed coherent bus.


FULL HSA FEATURES

The System Integration features:

GPU compute context switch: GPU tasks can be context-switched, making the GPU a multi-tasker. Context switching means faster interoperation between application, graphics, and compute work, and users get a snappier, more interactive experience.

Quality of service: As more applications take advantage of the performance and features of the GPU, system interactivity must remain good. This means low-latency access to the GPU from any process.

GPU graphics pre-emption: With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct hardware access for multiple users or multiple applications is either prioritized or equalized.


UNLEASHING DEVELOPER INNOVATION

PROBLEM

[Chart: developer return (differentiation in performance, power, features, time-to-market) plotted against developer investment (effort, time, new skills).]

- ~10M+ CPU coders and ~4M+ apps deliver good user experiences; developers historically program CPUs.
- ~100K GPU coders and ~200+ apps deliver significant niche value; GPU/HW blocks are hard to program, and not all workloads accelerate.

SOLUTION

HSA + SDKs = productivity and performance with low power: a few million HSA coders and a few thousand apps delivering a wide range of differentiated experiences.


HSA SOLUTION STACK

[Stack diagram: application software sits on domain-specific libraries (Bolt, OpenCV™, … many others) and runtimes (HSA Runtime, OpenCL™ Runtime, DirectX Runtime, other runtimes); the HSA software layer comprises the HSA kernel-mode driver alongside legacy drivers; HSAIL is finalized to the GPU ISA by the HSA Finalizer; everything runs on differentiated hardware: CPU(s), GPU(s), and other accelerators.]

How we deliver the HSA value proposition. Overall vision:

- Make the GPU easily accessible
  - Support mainstream languages
  - Expandable to domain-specific languages
- Make compute offload efficient
  - Direct path to the GPU (avoid graphics overhead)
  - Eliminate memory copies
  - Low-latency dispatch
- Make it ubiquitous
  - Drive HSA as a standard through the HSA Foundation
  - Open-source key components


HSA INTERMEDIATE LAYER - HSAIL

HSAIL is a virtual ISA for parallel programs:
- It is finalized to the native ISA by a JIT compiler, or "Finalizer".
- This allows rapid innovation in native GPU architectures, while HSAIL stays constant across implementations.

HSAIL is explicitly parallel:
- Designed for data-parallel programming
- Supports exceptions, virtual functions, and other high-level language features
- Syscall methods: GPU code can call directly into system services, I/O, printf, etc.
- Debugging support


C++ AMP

C++ AMP is a data-parallel programming model initiated by Microsoft for accelerators:
- First announced at AFDS 2011
- A C++-based, higher-level programming model with advanced C++11 features
- A single-source model that integrates host and device programming well
- An implicit programming model that is "future-proofed" to enable HSA features, e.g. avoiding host-to-device copies
- A C++ AMP implementation is available as a beta release in the Microsoft Visual Studio 11 suite


C++ AMP AND HSA

A compute-focused, efficient HSA implementation replaces the graphics-centric implementation underlying C++ AMP, e.g. with low-latency dispatch and HSAIL support.

Shared virtual memory in HSA eliminates the data copies between host and device in existing C++ AMP programs, without any source changes.

Additional advanced C++ features become available on the GPU, e.g.:
- More data types
- Function calls
- Virtual functions
- Arbitrary control flow
- Exception handling
- Device and platform atomics


OPENCL™ AND HSA

HSA is an optimized platform architecture for OpenCL™, not an alternative to OpenCL™.

OpenCL™ on HSA will benefit from:
- Avoidance of wasteful copies
- Low-latency dispatch
- An improved memory model
- Pointers shared between CPU and GPU

HSA also exposes a lower-level programming interface for those who want the ultimate in control and performance. Optimized libraries may choose the lower-level interface.


HSA TAKING PLATFORM TO PROGRAMMERS

- Balance between CPU and GPU for performance and power efficiency
- Make GPUs accessible to a wider audience of programmers
  - Programming models close to today's CPU programming models
  - Enabling more advanced language features on the GPU
- Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.), and hence more applications, on the GPU
- A kernel can enqueue work to any other device in the system (e.g. GPU to GPU, GPU to CPU), enabling task-graph-style algorithms, ray tracing, etc.
- A clearly defined HSA memory model enables effective reasoning about parallel programs
- HSA provides a compatible architecture across a wide range of programming models and hardware implementations


THE HSA FOUNDATION - BRINGING ABOUT THE NEXT GENERATION PLATFORM

An open standardization body to bring about broad industry support for heterogeneous computing across the full value chain, from silicon IP to ISVs:

- Establish GPU computing as a first-class co-processor to the CPU through architecture definition
- Provide architectural support for special-purpose hardware accelerators (rasterizers, security processors, DSPs, etc.)
- Own and evolve the specifications and conformance suite
- Bring to market strong development solutions to drive innovative advanced content and applications
- Cultivate programming talent via HSA developer training and academic programs


THANK YOU


Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used by permission by Khronos.

© 2012 Advanced Micro Devices, Inc.