
Page 1: Introduction to HSA

INTRODUCTION TO HETEROGENEOUS SYSTEM ARCHITECTURE

Presenter: BingRu Wu

Page 2: Introduction to HSA

Outline

◻ Introduction
◻ Goal
◻ Concept
◻ Memory Model
◻ System Components

Page 3: Introduction to HSA

Introduction

◻ HSA: Heterogeneous System Architecture
◻ Promising future:
  ◻ ARM processor vendors
  ◻ GPU vendors: AMD, Imagination
◻ Fully utilizes the available compute resources
◻ Supporting HSA connects our system to a large application base

Page 4: Introduction to HSA

Goal of HSA

◻ Removes the programmability barriers:
  ◻ Separate memory spaces
  ◻ Access latency among devices
◻ Backward compatible
◻ Utilizes existing programming models

Page 5: Introduction to HSA

Concept of HSA

Page 6: Introduction to HSA

Abstract

◻ Two kinds of compute unit:
  ◻ LCU: Latency Compute Unit (e.g., CPU)
  ◻ TCU: Throughput Compute Unit (e.g., GPU)
◻ Merged memory space

Page 7: Introduction to HSA

Memory Management (1/2)

◻ Shared page table
  ◻ Memory is shared by all devices
  ◻ No more host-to-device copies (or vice versa)
  ◻ Supports pointer-based data structures (e.g., linked lists)
◻ Page faulting
  ◻ Virtual memory space for all devices
  ◻ e.g., the GPU can now use memory as if it owned the whole memory space

Page 8: Introduction to HSA

Memory Management (2/2)

◻ Coherent memory regions
  ◻ The memory is coherent and shared among all devices (CUs)
◻ Unified address space
  ◻ Memory type is separated by address: private / local / global memory is decided by the memory region
  ◻ No special instructions are required

Page 9: Introduction to HSA

User-Level Command Queue

◻ Queues for communication
  ◻ User to device
  ◻ Device to device
◻ The HSA runtime handles the queues
  ◻ Allocation & destruction
  ◻ One per application
  ◻ Vendor-dependent implementation
◻ Direct access to devices
  ◻ No OS syscalls
  ◻ No task-managing overhead

Page 10: Introduction to HSA

Hardware Scheduler (1/3)

◻ No real scheduling on the TCU (GPU):
  ◻ No task scheduling
  ◻ No task preemption
◻ Current implementation, executing without a lock:
  ◻ All threads execute
  ◻ Multiple concurrent tasks produce erroneous results

Page 11: Introduction to HSA

Hardware Scheduler (2/3)

◻ Current implementation, executing with a lock:
  ◻ A code exception may leave the resource locked up
  ◻ Long-running tasks prevent others from executing
  ◻ We may fail to finish critical jobs

Page 12: Introduction to HSA

Hardware Scheduler (3/3)

The HSA runtime guarantees:
◻ Bounded execution time
  ◻ Every process ceases in a reasonable time
◻ Fast switching among applications
  ◻ Uses hardware to save time
◻ Application-level parallelism

Page 13: Introduction to HSA

HSAIL (1/2)

◻ HSA Intermediate Language: the language for TCUs
  ◻ Similar to "PTX" code
  ◻ No graphics-specific instructions
  ◻ Further translated to the hardware ISA (by the Finalizer)
◻ The abstract platform is similar to OpenCL:
  ◻ Work-item (thread)
  ◻ Work-group (block)
  ◻ NDRange (grid)

Page 14: Introduction to HSA

HSAIL (2/2)

Page 15: Introduction to HSA

Memory Model

Page 16: Introduction to HSA

Virtual Memory Address

◻ All types of memory use the same address space
◻ Memory access behavior:
  ◻ Not all regions are accessible by all devices
  ◻ The OS kernel should not be accessible; mapping to a region in the kernel is still possible
  ◻ Accessing the same address may give different values:
    ◻ Work-item private memory
    ◻ Work-group local memory
    ◻ Accessing another item's / group's memory is not valid

Page 17: Introduction to HSA

Memory Region (1/2)

◻ Global
  ◻ The memory shared by all LCUs & TCUs
  ◻ Accessible by any work-item / work-group
◻ Group
  ◻ The memory shared by all work-items in the same work-group
◻ Private
  ◻ The memory visible only to a single work-item

Page 18: Introduction to HSA

Memory Region (2/2)

◻ Kernarg
  ◻ The memory for kernel arguments
  ◻ A kernel is the code fragment we ask a device to run
◻ Readonly
  ◻ A read-only type of global memory
◻ Spill
  ◻ Memory for register spills
◻ Arg
  ◻ Memory for function-call arguments

Page 19: Introduction to HSA

Memory Consistency

◻ LCU
  ◻ Each LCU maintains its own consistency
  ◻ Shares the global memory
◻ Work-item
  ◻ Memory operations to the same address by a single work-item are in order
  ◻ Memory operations to different addresses may be reordered
◻ Other than that, nothing is guaranteed

Page 20: Introduction to HSA

System Components

Page 21: Introduction to HSA

HSA System

Page 22: Introduction to HSA

Compilation

◻ Frontend
  ◻ Produces LLVM IR
  ◻ No data dependency
◻ Backend
  ◻ Converts IR to HSAIL
  ◻ Optimization happens here
◻ Binary format
  ◻ ELF format
  ◻ Embedded container for HSAIL (BRIG)

Page 23: Introduction to HSA

Runtime

◻ HSA runtime
  ◻ Issues tasks to devices (via the queue protocol)
◻ Device
  ◻ Converts HSAIL to the hardware ISA with the Finalizer

Page 24: Introduction to HSA

HSAIL Program Features

◻ Backward compatible
  ◻ A system without HSA support should still run the executable
◻ Function invocation
  ◻ LCU functions may call LCU functions
  ◻ TCU functions may call TCU functions, with Finalizer support
  ◻ LCU-to-TCU / TCU-to-LCU calls are supported by using queues
◻ C++ compatible

Page 25: Introduction to HSA

Conclusion

◻ HSA is an open, standard layer between software and hardware

◻ The cardinal feature of HSA is the unified virtual memory space

◻ It replaces no current programming framework, and no new language is required

Page 26: Introduction to HSA

Reference

◻ Heterogeneous System Architecture: A Technical Review
◻ HSA Programmer's Reference Manual
◻ HSAIL: Write-Once-Run-Everywhere for Heterogeneous Systems