Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
▣ Naums Mogers
Christophe Dubach
ARM Research Summit, September 2019
Functional Interface forPerformance Portability on Parallel Accelerators
Hardware accelerators
Architectures
Scopes
Applications
2Requirements
CPU GPU
CGRA
FPGA
ASIC
ML
Security
Server Desktop Mobile Embed
Vision
DBGraph
Energy Mem
Hardware accelerators
Architectures
Scopes
Applications
3Requirements
CPU GPU
CGRA
FPGA
ASIC
ML
Security
Server Desktop Mobile Embed
Vision
DBGraph
Energy Mem
Applications
Current landscape
XLA
TPU
BrainWaveISA
BrainWave
ArmNN
ARM ML
MDK
Movidius
HiAI
HiSiliconNPU
OpenCL
GPUsMulticore
CPUs
OpenMP VHDL
FPGAs
XLA
TPU
BrainWaveISA
BrainWave
ArmNN
ARM ML
MDK
Movidius
HiAI
HiSiliconNPU
Applications
Image Processing
Neural Networks
Graph Analytics
Data-ParallelIntermediate Language
Performance PortableCode Generator
OpenCL
GPUsMulticore
CPUs
OpenMP VHDL
FPGAs
Domain-Specific Languages (DSLs)
Language for Parallelism
Compiler Technology
What we need
6
What is the right interface for HW accelerators?
7
What is the right interface for HW accelerators?
Functional approach
Abstract
▣ Expresses algorithm (WHAT), not implementation (HOW)
8
Safe
▣ Easier to use and parallelise
High-level
▣ Captures plenty of algorithmic meta-info for analysis
Expressive
▣ Control flow
▣ Memory management
Pure
▣ Easy to transform
Composable
▣ Easier to maintain, code-reuse
XLA
TPU
BrainWaveISA
BrainWave
ArmNN
ARM ML
MDK
Movidius
HiAI
HiSiliconNPU
Applications
Image Processing
Neural Networks
Graph Analytics
Lift: Data-ParallelIntermediate Language
Lift: Rewrite rule-based compilerto HW-specific languages
OpenCL
GPUsMulticore
CPUs
OpenMP VHDL
FPGAs
Domain-Specific Languages (DSLs)
Language for Parallelism
Compiler Technology
Lift
10
Lift www.lift-project.org
▣ Functional data-parallel language and compiler
Lift IR www.lift-project.org
toGlobal
toLocal
toPrivate
toVector
toScalar
add, mul
dot
tanh
Address space operators
Casters Data operators
Int, Float
Vector
Array
Data types
Map, Reduce
Zip, Split
Scatter, Gather
Slide
Algorithmic patterns
12
Lift IR: Views www.lift-project.org
Reorder(stride(s)) >> Map(f)
▣ Virtual composable data layout transformations□ Reorder, Transpose, Slide, Slice, etc
▣ Expressed with Views▣ Help avoid extra memory writes
NO WRITES TO MEMORY
TRANSFORMED READS
Lift IR www.lift-project.org
toGlobal
toLocal
toPrivate
toVector
toScalar
add, mul
dot
tanh
Address space operators
Casters Data operators
Int, Float
Vector
Array
Data types
Map, Reduce
Zip, Split
Scatter, Gather
Slide
Algorithmic patterns
14
Lift IR www.lift-project.org
Address space operators
Casters Data operators
IR level
toGlobal
toLocal
toPrivate
toVector
toScalar
add, mul
dot
tanh
Generic
DSL
conv, lstm
blur, sharpen
select
toDRAM
toSRAM
toRegistor
MapGlobalMapLocalReduceSeq
toInt8toFloat16
VVMulMVMulVTanh
Platform-specific
Map, Reduce
Zip, Split
Scatter, Gather
Slide
Algorithmic patterns
Int, Float
Vector
Array
Data types
Vector
Matrix
Tensor
Int8Float16Float32
15
How do we achieve performance portability?
Lift: Rewrite Rules
▣ Express algorithmic implementation choices▣ Preserve semantic correctness▣ Leverage algorithmic info
▣ Decouples optimisation from code generation16
Split-join rule Map fusion rule GEMV rule
Rewrite rules
17
Lift: Rewrite Rules
IR level
▣ Split-join rule
▣ Map fusion rule
▣ Reduce rules
▣ ...
Generic
DSL
▣ Algorithm choices for high-level primitives
▣ Precision level
▣ ...
Platform-specific
▣ Using built-ins
▣ Lowering to the platform programming model
▣ ...
18
Lift: rewriting
HOW TO OPTIMISE?
19
Lift: rewriting
HARD STARTING POINT
20
Lift: rewriting Search
21
Lift: rewriting
Exploitation
Search
Built-in primitive
22
Lift: rewriting
Exploitation
Search
Built-in primitive
Parallelisation choice
23
Lift: rewriting
Exploitation
Code generation
Search
Built-in primitive
Parallelisation choice
Lift: Rewrite Rules
▣ Domain-specific and generic▣ Reusable▣ Provably correct▣ Self-contained, extensible
24
Lift: Constraint Inference
▣ Required for valid search space generation when using tuning parameters
▣ Leverages algorithmic meta-info▣ Can express heuristics and HW restrictions
25
Lift: Search Space Exploration
▣ Uniform random sampling▣ Predictor models▣ Genetic algorithms▣ …
26
Lift: Research Directions
▣ Linear algebra▣ Sparse data parallelism▣ Optimising reductions▣ Stencil computations▣ 3D wave modelling▣ High-level synthesis for FPGAs▣ Machine Learning
27
Lift for Machine Learning
▣ Machine Learning□ Convolution inference optimisation□ Platforms: Mali GPUs, BrainWave
28
Lift for Machine Learning
▣ Machine Learning□ Convolution inference optimisation□ Platforms: Mali GPUs, BrainWave
29
Lift for Machine Learning
▣ Machine Learning□ Convolution inference optimisation□ Platforms: Mali GPUs, BrainWave
30
Lift for Machine Learning
Naums Mogers, PhD student, Edinburgh
How to best exploit HW accelerators?
31
Christof Schlaak, PhD student, Edinburgh
How to generate accelerator architectures?
Lu Li, Postdoctoral Researcher, Edinburgh
How to optimise the host code?How to drive the rewriting process?
Christophe Dubach, Reader, Edinburgh
All of the above
32
Lift source code is published
https://github.com/lift-project/lift
References(icons) Noun Project, https://thenounproject.com(icons) Font Awesome, https://fontawesome.com
http://www.lift-project.org