
Page 1: Introduction to HSA

INTRODUCTION TO HETEROGENEOUS SYSTEM ARCHITECTURE

Presenter: BingRu Wu

Page 2: Introduction to HSA

Outline

◻ Introduction
◻ Goal
◻ Concept
◻ Memory Model
◻ System Components

Page 3: Introduction to HSA

Introduction

◻ HSA: Heterogeneous System Architecture
◻ Promising future:
  ◻ ARM processor vendors
  ◻ GPU vendors: AMD, Imagination
◻ Fully utilizes the available compute resources
◻ Supporting HSA connects our system to a large application base

Page 4: Introduction to HSA

Goal of HSA

◻ Removes the programmability barriers:
  ◻ Separate memory spaces
  ◻ Access latency among devices
◻ Backward compatible
◻ Utilizes existing programming models

Page 5: Introduction to HSA

Concept of HSA

Page 6: Introduction to HSA

Abstract

◻ Two kinds of compute unit:
  ◻ LCU: Latency Compute Unit (e.g., CPU)
  ◻ TCU: Throughput Compute Unit (e.g., GPU)
◻ Merged memory space

Page 7: Introduction to HSA

Memory Management (1/2)

◻ Shared page table
  ◻ Memory is shared by all devices
  ◻ No more host-to-device copies (or vice versa)
  ◻ Supports pointer-based data structures (e.g., linked lists)
◻ Page faulting
  ◻ Virtual memory space for all devices
  ◻ e.g., the GPU can now use memory as if it owned the whole memory space

Page 8: Introduction to HSA

Memory Management (2/2)

◻ Coherent memory regions
  ◻ The memory is coherent and shared among all devices (CUs)
◻ Unified address space
  ◻ Memory type is separated by address: private / local / global memory is decided by the memory region
  ◻ No special instructions are required

Page 9: Introduction to HSA

User-Level Command Queue

◻ Queues for communication
  ◻ User to device
  ◻ Device to device
◻ The HSA runtime handles the queues
  ◻ Allocation & destruction
  ◻ One per application
  ◻ Vendor-dependent implementation
◻ Direct access to devices
  ◻ No OS syscalls
  ◻ No task-managing overhead

Page 10: Introduction to HSA

Hardware Scheduler (1/3)

◻ No real scheduling on the TCU (GPU):
  ◻ No task scheduling
  ◻ No task preemption
◻ Current implementation, executing without a lock:
  ◻ All threads execute
  ◻ Multiple concurrent tasks produce erroneous results

Page 11: Introduction to HSA

Hardware Scheduler (2/3)

◻ Current implementation, executing with a lock:
  ◻ A code exception may leave the resource locked up
  ◻ Long-running tasks prevent others from executing
  ◻ We may fail to finish critical jobs

Page 12: Introduction to HSA

Hardware Scheduler (3/3)

The HSA runtime guarantees:
◻ Bounded execution time
  ◻ Every process ceases in a reasonable time
◻ Fast switching among applications
  ◻ Uses hardware to save time
◻ Application-level parallelism

Page 13: Introduction to HSA

HSAIL (1/2)

◻ HSA Intermediate Language: the language for TCUs
  ◻ Similar to "PTX" code
  ◻ No graphics-specific instructions
  ◻ Further translated to the hardware ISA (by the Finalizer)
◻ The abstract platform is similar to OpenCL:
  ◻ Work-item (thread)
  ◻ Work-group (block)
  ◻ NDRange (grid)

Page 14: Introduction to HSA

HSAIL (2/2)

Page 15: Introduction to HSA

Memory Model

Page 16: Introduction to HSA

Virtual Memory Address

◻ All types of memory use the same address space
◻ Memory access behavior:
  ◻ Not all regions are accessible by all devices
  ◻ The OS kernel should not be accessible; mapping to a region in the kernel is still possible
  ◻ Accessing the same address may give different values:
    ◻ Work-item private memory
    ◻ Work-group local memory
    ◻ Accessing another item's / group's memory is not valid

Page 17: Introduction to HSA

Memory Region (1/2)

◻ Global
  ◻ The memory shared by all LCUs & TCUs
  ◻ Accessible by any work-item / work-group
◻ Group
  ◻ The memory shared by all work-items in the same work-group
◻ Private
  ◻ The memory visible only to a single work-item

Page 18: Introduction to HSA

Memory Region (2/2)

◻ Kernarg
  ◻ The memory for kernel arguments
  ◻ A kernel is the code fragment we ask a device to run
◻ Readonly
  ◻ A read-only type of global memory
◻ Spill
  ◻ Memory for register spills
◻ Arg
  ◻ Memory for function-call arguments

Page 19: Introduction to HSA

Memory Consistency

◻ LCU
  ◻ Each LCU maintains its own consistency
  ◻ Shares the global memory
◻ Work-item
  ◻ Memory operations to the same address by a single work-item are in order
  ◻ Memory operations to different addresses may be reordered
◻ Other than that, nothing is guaranteed

Page 20: Introduction to HSA

System Components

Page 21: Introduction to HSA

HSA System

Page 22: Introduction to HSA

Compilation

◻ Frontend
  ◻ Produces LLVM IR
  ◻ No data dependency
◻ Backend
  ◻ Converts IR to HSAIL
  ◻ Optimization happens here
◻ Binary format
  ◻ ELF format
  ◻ Embedded container for HSAIL (BRIG)

Page 23: Introduction to HSA

Runtime

◻ HSA runtime
  ◻ Issues tasks to devices (via the queue protocol)
◻ Device
  ◻ Converts HSAIL to the hardware ISA with the Finalizer

Page 24: Introduction to HSA

HSAIL Program Features

◻ Backward compatible
  ◻ A system without HSA support should still run the executable
◻ Function invocation
  ◻ LCU functions may call LCU functions
  ◻ TCU functions may call TCU functions, with Finalizer support
  ◻ LCU-to-TCU / TCU-to-LCU calls are supported by using queues
◻ C++ compatible

Page 25: Introduction to HSA

Conclusion

◻ HSA is an open, standard layer between software and hardware

◻ The cardinal feature of HSA is the unified virtual memory space

◻ It replaces no current programming framework, and no new language is required

Page 26: Introduction to HSA

Reference

◻ Heterogeneous System Architecture: A Technical Review
◻ HSA Programmer's Reference Manual
◻ HSAIL: Write-Once-Run-Everywhere for Heterogeneous Systems