My slides from Haskell Hackers at Hacker Dojo on 10/16/2014.
GPU Programming with Haskell
Steve Severance
Outline
Introduction to GPUs
When to use a GPU instead of a CPU
Using a GPU with accelerate
Building an options pricer
What is a GPU?
Graphics Processing Unit
Hundreds or Thousands of Cores
High Memory Throughput
Fully Programmable
GPU Architecture
Single Instruction Multiple Data (SIMD)
High Throughput Thread Scheduler
Interleaving Operations
GPU Architecture
[Diagram: CPU, Memory, GPU; 16GB/s]
GPU Circa 1999
GeForce 256
Accelerated Graphics Port (AGP)
Hardware Transform and Lighting (TnL)
Fixed Function Pipeline
GPU Circa 2001
GeForce 3/R200/Xbox
First Pixel/Vertex Shaders
Limited C-like Language
GPU Circa 2014
Fully Programmable
Unified Memory
Rich High Level Languages/Tools
GPU Tradeoffs
Limited branching
Limited Memory
High Latency
GPU vs CPU
GPU is about throughput
CPU is about flexibility and latency
Programmability
CUDA
OpenCL
DirectCompute
GPU Problems
Non-branching algorithms
Matrix Operations (cuBLAS)
Deep Learning
Options Pricing
Can I run GPU Programs?
accelerate requires CUDA
The OpenCL package is a low-level OpenCL wrapper
NVidia CUDA Tools (https://developer.nvidia.com/cuda-toolkit)
Introducing Accelerate
DSL for Parallel Code
Primarily CUDA, Also LLVM
Compiler lowers into CUDA code
Accelerate Basics
Acc is our DSL type. Holds the Abstract Syntax Tree (AST) of our computation
Familiar operators replace their Prelude counterparts (fold, map, zip, etc.)
Accelerate Basics
Creating a Computation
Acc (Array DIM1 Float) -> Acc (Array DIM1 Float)
Running a Computation
run :: Arrays a => Acc a -> a
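A minimal sketch of the two steps above: building an Acc computation and then running it. The function name doubleAll and the sample data are made up for illustration. The interpreter backend (Data.Array.Accelerate.Interpreter) is used here so the example runs without a GPU; it assumes a recent accelerate release, and a GPU backend's run would be imported in its place to execute on a device.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM1, Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter (run)  -- reference backend, no GPU required

-- Creating a computation: this only builds the AST, nothing executes yet.
-- 'doubleAll' is a hypothetical name used for this sketch.
doubleAll :: Acc (Array DIM1 Float) -> Acc (Array DIM1 Float)
doubleAll = A.map (* 2)

main :: IO ()
main = do
  -- An ordinary host-side array with five elements.
  let xs = A.fromList (Z :. 5) [1 .. 5] :: Array DIM1 Float
  -- 'use' embeds the host array in the computation; 'run' executes it.
  print (A.toList (run (doubleAll (A.use xs))))
```

Note that doubleAll never touches data itself. It is a description of work, which is what lets a backend compile it for a different device.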
Arrays
data Array sh e
Comprised of both a Shape and an Element (Elt)
Elt instances for common numeric types and tuples
Arrays can be multi-dimensional, but not nested
Array Shapes
Z is the Rank-0 Shape (a Scalar)
:. Operator Increases the Rank by One Dimension
DIM1, DIM2, DIM3, etc…
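The shape constructors above can be sketched with a few host-side arrays. The names answer, vec and mat and their contents are illustrative only; the sketch assumes fromList, arrayShape and indexArray as exported by a recent accelerate release.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Array, DIM0, DIM1, DIM2, Z(..), (:.)(..))

-- Rank 0: the shape is just Z, so the array holds exactly one element.
answer :: Array DIM0 Int
answer = A.fromList Z [42]

-- Rank 1: Z :. 4 is a vector of four elements.
vec :: Array DIM1 Int
vec = A.fromList (Z :. 4) [10, 20, 30, 40]

-- Rank 2: Z :. 2 :. 3 is two rows by three columns, filled in row-major order.
mat :: Array DIM2 Int
mat = A.fromList (Z :. 2 :. 3) [1 .. 6]

main :: IO ()
main = do
  print (A.arrayShape vec)
  print (A.arrayShape mat)
  -- Row 1, column 0 of the matrix.
  print (A.indexArray mat (Z :. 1 :. 0))
```

Each use of :. adds one dimension on the right, so the innermost (fastest-varying) index is always the last one written.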
Computations
Acc is a computation on an array
Exp is a computation on an element
Exp can also be used to pass constants
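To make the Acc/Exp split concrete, here is a hedged sketch with two small functions, saxpy and dotp (names chosen for illustration). The lambdas passed to zipWith work on Exp values, one element at a time, while the functions as a whole are Acc computations over arrays. The scalar a is passed in as an Exp built with constant. The interpreter backend is assumed so the code runs without a GPU.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Exp, Scalar, Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter (run)  -- reference backend, no GPU required

-- Acc level: combines whole arrays. Exp level: the lambda body, per element.
-- The scalar 'a' enters the computation as an Exp.
saxpy :: Exp Float -> Acc (Vector Float) -> Acc (Vector Float) -> Acc (Vector Float)
saxpy a xs ys = A.zipWith (\x y -> a * x + y) xs ys

-- Dot product: element-wise multiply, then reduce to a rank-0 array.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = A.fromList (Z :. 3) [1, 2, 3]    :: Vector Float
      ys = A.fromList (Z :. 3) [10, 20, 30] :: Vector Float
  -- 'constant' wraps the Haskell value 2 as an Exp Float.
  print (A.toList (run (saxpy (A.constant 2) (A.use xs) (A.use ys))))
  print (A.toList (run (dotp (A.use xs) (A.use ys))))
```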
What run is going to do
Compile our Program
Copy Data to GPU
Execute Program
Copy Results Back to Memory
Black-Scholes
Partial Differential Equation to Compute the Price of an Option
Massive Performance Boost on a GPU
Bloomberg Uses GPUs to compute Options Prices
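For reference, the standard textbook form of the equation and its closed-form solution for European options is given below. The symbols are not defined on the slides and are assumptions here: V is the option value, S the spot price, K the strike, r the risk-free rate, \sigma the volatility, T the time to expiry, and N the standard normal CDF.

```latex
\begin{align}
  \frac{\partial V}{\partial t}
    + \tfrac{1}{2}\sigma^{2} S^{2} \frac{\partial^{2} V}{\partial S^{2}}
    + r S \frac{\partial V}{\partial S} - r V &= 0 \\
  d_{1} &= \frac{\ln(S/K) + \left(r + \tfrac{1}{2}\sigma^{2}\right) T}{\sigma\sqrt{T}},
  \qquad d_{2} = d_{1} - \sigma\sqrt{T} \\
  C &= S\,N(d_{1}) - K e^{-rT} N(d_{2}) \\
  P &= K e^{-rT} N(-d_{2}) - S\,N(-d_{1})
\end{align}
```

Each option is priced independently with the same arithmetic and no data-dependent branching, which is why the problem maps well onto a GPU.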
Code/Demo Time
Summary
lift/unlift
use adds an Array to the computation
constant wraps constants
map does what map always does
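The four pieces summarized above can be combined into a small Black-Scholes pricer. This is a sketch under stated assumptions, not the code shown in the demo: the names cnd, price, blackScholes and sampleOptions, the rate and volatility values, and the sample options are all made up for illustration. The normal CDF uses the common Abramowitz and Stegun polynomial approximation, with signum mirroring the tail so that no branch is needed. The interpreter backend is assumed so the code runs without a GPU.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Exp, Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter (run)  -- reference backend, no GPU required

-- Assumed market parameters (illustrative values only).
riskFree, volatility :: Float
riskFree   = 0.02
volatility = 0.30

-- Hypothetical inputs: (spot price, strike price, years to expiry).
sampleOptions :: [(Float, Float, Float)]
sampleOptions = [(100, 100, 1), (50, 60, 0.5)]

-- Polynomial approximation of the standard normal CDF.
-- 'tailArea' approximates the upper tail for |d|; 'signum' mirrors it for d < 0.
cnd :: Exp Float -> Exp Float
cnd d = 0.5 + signum d * (0.5 - tailArea)
  where
    k        = 1 / (1 + 0.2316419 * abs d)
    poly     = k * (0.319381530 + k * (-0.356563782 + k * (1.781477937
                 + k * (-1.821255978 + k * 1.330274429))))
    tailArea = 0.39894228 * exp (-0.5 * d * d) * poly

-- Price one European option, returning (call, put).
price :: Exp (Float, Float, Float) -> Exp (Float, Float)
price opt = A.lift (call, put)            -- lift: tuple of Exps -> Exp of a tuple
  where
    -- unlift: Exp of a tuple -> tuple of Exps
    (s, x, t) = A.unlift opt :: (Exp Float, Exp Float, Exp Float)
    r     = A.constant riskFree           -- constant: Haskell value -> Exp
    v     = A.constant volatility
    sqrtT = sqrt t
    d1    = (log (s / x) + (r + 0.5 * v * v) * t) / (v * sqrtT)
    d2    = d1 - v * sqrtT
    disc  = x * exp (negate r * t)        -- discounted strike
    call  = s * cnd d1 - disc * cnd d2
    put   = disc * cnd (negate d2) - s * cnd (negate d1)

-- map applies the per-element pricer across the whole array.
blackScholes :: Acc (Vector (Float, Float, Float)) -> Acc (Vector (Float, Float))
blackScholes = A.map price

main :: IO ()
main = do
  let options = A.fromList (Z :. length sampleOptions) sampleOptions
                  :: Vector (Float, Float, Float)
  -- use: host array -> Acc array
  print (A.toList (run (blackScholes (A.use options))))
```

A quick sanity check on any such pricer is put-call parity: for each option, call minus put should equal the spot price minus the discounted strike.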
What next?
accelerate has a rich API
Slices
Aggregation
Recursion
Stencils
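As one small taste of the aggregation part of the API, the sketch below sums each row of a matrix. The name rowSums and the data are illustrative. It relies on fold reducing along the innermost dimension, so a DIM2 input yields a DIM1 result. The interpreter backend is assumed so the code runs without a GPU.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM1, DIM2, Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter (run)  -- reference backend, no GPU required

-- fold reduces the innermost dimension: one sum per row.
rowSums :: Acc (Array DIM2 Int) -> Acc (Array DIM1 Int)
rowSums = A.fold (+) 0

main :: IO ()
main = do
  -- Two rows, three columns: [[1,2,3],[4,5,6]].
  let mat = A.fromList (Z :. 2 :. 3) [1 .. 6] :: Array DIM2 Int
  print (A.toList (run (rowSums (A.use mat))))
```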
Thanks
Nathan Howell
The accelerate Team
You for listening
Further Reading
https://speakerdeck.com/tmcdonell/gpgpu-programming-in-haskell-with-accelerate
http://hackage.haskell.org/package/accelerate
http://quantlib-gpu.sourceforge.net/AcceleratingFinancialApplicationsOnTheGPU-paper.pdf