GPU Programming with Haskell Steve Severance [email protected]

Haskell Accelerate

My slides from Haskell Hackers at Hacker Dojo on 10/16/2014.

Page 1: Haskell  Accelerate

GPU Programming with Haskell

Steve Severance [email protected]

Page 2: Haskell  Accelerate

Outline

Introduction to GPUs

When to use a GPU instead of a CPU

Using a GPU with accelerate

Building an options pricer

Page 3: Haskell  Accelerate

What is a GPU?

Graphics Processing Unit

Hundreds or Thousands of Cores

High Memory Throughput

Fully Programmable

Page 4: Haskell  Accelerate

GPU Architecture

Single Instruction Multiple Data (SIMD)

High Throughput Thread Scheduler

Interleaving Operations

Page 5: Haskell  Accelerate

GPU Architecture

Diagram: CPU and Memory connected to the GPU over a 16 GB/s link

Page 6: Haskell  Accelerate

GPU Circa 1999

GeForce 256

Accelerated Graphics Port (AGP)

Hardware Transform and Lighting (TnL)

Fixed Function Pipeline

Page 7: Haskell  Accelerate

GPU Circa 2001

GeForce 3/R200/Xbox

First Pixel/Vertex Shaders

Limited C-like Language

Page 8: Haskell  Accelerate

GPU Circa 2014

Fully Programmable

Unified Memory

Rich High Level Languages/Tools

Page 9: Haskell  Accelerate

GPU Tradeoffs

Limited branching

Limited Memory

High Latency

Page 10: Haskell  Accelerate

GPU vs CPU

GPU is about throughput

CPU is about flexibility and latency

Page 11: Haskell  Accelerate

Programmability

CUDA

OpenCL

DirectCompute

Page 12: Haskell  Accelerate

GPU Problems

Non-branching algorithms

Matrix operations (cuBLAS)

Deep Learning

Options Pricing

Page 13: Haskell  Accelerate

Can I run GPU Programs?

accelerate requires CUDA

The OpenCL package is a low-level OpenCL wrapper

NVIDIA CUDA Toolkit (https://developer.nvidia.com/cuda-toolkit)

Page 14: Haskell  Accelerate

Introducing Accelerate

DSL for Parallel Code

Primarily CUDA, Also LLVM

Compiler lowers into CUDA code

Page 15: Haskell  Accelerate

Accelerate Basics

Acc is our DSL type. It holds the abstract syntax tree (AST) of our computation

Familiar operators replace their Prelude counterparts (fold, map, zip, etc.)

Page 16: Haskell  Accelerate

Accelerate Basics

Creating a Computation

Acc (Array DIM1 Float) -> Acc (Array DIM1 Float)

Running a Computation

run :: Arrays a => Acc a -> a
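A minimal sketch of the two steps above: build a computation of the type shown, then hand it to run. It assumes the reference interpreter backend (Data.Array.Accelerate.Interpreter) so that it works without a GPU; on a CUDA machine you would import run from a GPU backend instead. The names doubleAll, input, and result are illustrative and not from the talk.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM1, Z(..), (:.)(..))
-- Assumption: the interpreter backend stands in for a GPU backend here.
import Data.Array.Accelerate.Interpreter (run)

-- Creating a computation: this only builds an AST, nothing runs yet.
doubleAll :: Acc (Array DIM1 Float) -> Acc (Array DIM1 Float)
doubleAll = A.map (* 2)

-- An ordinary array in host memory with four elements.
input :: Array DIM1 Float
input = A.fromList (Z :. 4) [1, 2, 3, 4]

-- Running a computation: 'use' embeds the host array, 'run' evaluates the AST.
result :: [Float]
result = A.toList (run (doubleAll (A.use input)))
```

Evaluating result in GHCi should give the input with every element doubled.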

Page 17: Haskell  Accelerate

Arrays

data Array sh e

Composed of a shape (sh) and an element type (e, an instance of Elt)

Elt instances for common numeric types and tuples

Arrays can be multi-dimensional, but not nested

Page 18: Haskell  Accelerate

Array Shapes

Z is the rank-0 shape (a scalar)

:. Operator Increases the Rank by One Dimension

DIM1, DIM2, DIM3, etc…
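As an illustration of how shapes are built, here is a small sketch. The names (scalarShape, matrix, and so on) are made up for this example; Z, :., DIM0 to DIM2, fromList, toList, and arrayShape come from the accelerate package.

```haskell
import Data.Array.Accelerate
  ( Array, DIM0, DIM1, DIM2, Z(..), (:.)(..), arrayShape, fromList, toList )

-- Rank 0: a single element.
scalarShape :: DIM0
scalarShape = Z

-- Rank 1: a vector of five elements.
vectorShape :: DIM1
vectorShape = Z :. 5

-- Rank 2: each use of (:.) adds one dimension; here 2 rows by 3 columns.
matrixShape :: DIM2
matrixShape = Z :. 2 :. 3

-- Elements are supplied in row-major order.
matrix :: Array DIM2 Int
matrix = fromList matrixShape [1 .. 6]

-- The shape and the flattened contents can be read back on the host.
matrixDims :: DIM2
matrixDims = arrayShape matrix

matrixElems :: [Int]
matrixElems = toList matrix
```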

Page 19: Haskell  Accelerate

Computations

Acc is a computation on an array

Exp is a computation on an element

Exp can also be used to pass constants
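The split between Acc and Exp can be sketched as follows. The functions scale and scaleAll are hypothetical names; the interpreter backend is assumed so the example runs without a GPU.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Exp, Vector, Z(..), (:.)(..))
-- Assumption: the interpreter backend stands in for a GPU backend here.
import Data.Array.Accelerate.Interpreter (run)

-- Exp: a computation on a single element.
scale :: Exp Float -> Exp Float -> Exp Float
scale factor x = factor * x

-- Acc: a computation on a whole array. The plain Haskell Float is passed
-- into the element-level code as a constant.
scaleAll :: Float -> Acc (Vector Float) -> Acc (Vector Float)
scaleAll factor = A.map (scale (A.constant factor))

-- Halve each element of a three-element vector.
scaled :: [Float]
scaled = A.toList (run (scaleAll 0.5 (A.use (A.fromList (Z :. 3) [2, 4, 6]))))
```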

Page 20: Haskell  Accelerate

What run is going to do

Compile our Program

Copy Data to GPU

Execute Program

Copy Results Back to Memory

Page 21: Haskell  Accelerate

Black-Scholes

Partial Differential Equation to Compute the Price of an Option

Massive Performance Boost on a GPU

Bloomberg Uses GPUs to compute Options Prices

Page 22: Haskell  Accelerate

Black-Scholes Equation

Stolen from investopedia.com
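The equation image is not part of this transcript. For reference, the standard closed-form Black-Scholes price of a European call, which is presumably what the slide showed, is given below. The symbols are the usual ones and are an assumption here: S_t is the spot price, K the strike, r the risk-free rate, sigma the volatility, t the time to expiry, and N the standard normal cumulative distribution function.

```latex
C = S_t \, N(d_1) - K e^{-r t} \, N(d_2)

d_1 = \frac{\ln\left(S_t / K\right) + \left(r + \tfrac{1}{2}\sigma^2\right) t}{\sigma \sqrt{t}},
\qquad
d_2 = d_1 - \sigma \sqrt{t}
```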

Page 23: Haskell  Accelerate

Code/Demo Time
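The demo code itself is not in the slides. What follows is a rough sketch of what a call-option pricer might look like in accelerate, not the code shown in the talk. Everything named here (riskFree, volatility, cnd, callPrice, priceCalls, sampleOptions) is hypothetical, the market parameters are made-up values, and the normal CDF uses the well-known Abramowitz and Stegun polynomial approximation. The interpreter backend is assumed so that the sketch runs without a GPU.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Exp, Vector, Z(..), (:.)(..))
-- Assumption: the interpreter backend stands in for a GPU backend here.
import Data.Array.Accelerate.Interpreter (run)

-- Hypothetical market parameters shared by every option in the batch.
riskFree, volatility :: Float
riskFree   = 0.05
volatility = 0.2

-- Standard normal CDF via a polynomial approximation (Abramowitz and Stegun).
-- It is branch-free: signum picks the correct tail instead of an if/else.
cnd :: Floating a => a -> a
cnd d = 0.5 + signum d * (0.5 - tailArea)
  where
    k        = 1 / (1 + 0.2316419 * abs d)
    poly     = k * (0.31938153 + k * (-0.356563782 + k * (1.781477937
                 + k * (-1.821255978 + k * 1.330274429))))
    tailArea = 0.3989422804 * exp (-0.5 * d * d) * poly

-- Closed-form price of one European call. Because it only needs Floating,
-- the same code works on plain numbers and on Exp values.
callPrice :: Floating a => a -> a -> a -> a -> a -> a
callPrice r v spot strike years =
  spot * cnd d1 - strike * exp (negate r * years) * cnd d2
  where
    d1 = (log (spot / strike) + (r + 0.5 * v * v) * years) / (v * sqrt years)
    d2 = d1 - v * sqrt years

-- Array-level computation: price every (spot, strike, years) triple.
priceCalls :: Acc (Vector (Float, Float, Float)) -> Acc (Vector Float)
priceCalls = A.map price
  where
    price option =
      let (spot, strike, years) =
            A.unlift option :: (Exp Float, Exp Float, Exp Float)
      in callPrice (A.constant riskFree) (A.constant volatility) spot strike years

-- Hypothetical batch of two options.
sampleOptions :: Vector (Float, Float, Float)
sampleOptions = A.fromList (Z :. 2) [(100, 100, 1), (100, 110, 0.5)]

prices :: [Float]
prices = A.toList (run (priceCalls (A.use sampleOptions)))
```

Writing callPrice against Floating rather than Exp directly is a design choice for this sketch: it lets you check the formula on ordinary Doubles in GHCi before running it through a backend.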

Page 24: Haskell  Accelerate

Summary

lift/unlift pack and unpack tuples inside Exp

use adds an Array to the computation

constant wraps constants

map does what map always does
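The four pieces in this summary can be seen together in one small sketch. The data and names (orders, withTotal, orderTotals) are invented for illustration, and the interpreter backend is assumed in place of a GPU backend.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Exp, Vector, Z(..), (:.)(..))
-- Assumption: the interpreter backend stands in for a GPU backend here.
import Data.Array.Accelerate.Interpreter (run)

-- Hypothetical input: (quantity, unit price) pairs in host memory.
orders :: Vector (Int, Float)
orders = A.fromList (Z :. 3) [(2, 1.5), (1, 4.0), (4, 0.25)]

-- unlift unpacks an Exp of a tuple into a tuple of Exps,
-- constant wraps a plain Haskell value,
-- lift packs a tuple of Exps back into an Exp of a tuple.
withTotal :: Exp (Int, Float) -> Exp (Float, Float)
withTotal order =
  let (qty, price) = A.unlift order :: (Exp Int, Exp Float)
      total        = A.fromIntegral qty * price
      discount     = A.constant 0.5
  in A.lift (total, total * discount)

-- use adds the host array to the computation; map applies the element function.
orderTotals :: Acc (Vector (Float, Float))
orderTotals = A.map withTotal (A.use orders)

totals :: [(Float, Float)]
totals = A.toList (run orderTotals)
```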

Page 25: Haskell  Accelerate

What next?

accelerate has a rich API

Slices

Aggregation

Recursion

Stencils
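As a taste of the aggregation part of the API, here is a minimal sketch using fold, which reduces along the innermost dimension of an array. The names rowSums, table, and sums are illustrative, and the interpreter backend is again assumed.

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM2, Vector, Z(..), (:.)(..))
-- Assumption: the interpreter backend stands in for a GPU backend here.
import Data.Array.Accelerate.Interpreter (run)

-- fold collapses the innermost dimension, so a matrix becomes a vector
-- holding one sum per row.
rowSums :: Acc (Array DIM2 Int) -> Acc (Vector Int)
rowSums = A.fold (+) 0

-- A 2 by 3 matrix with rows [1, 2, 3] and [4, 5, 6].
table :: Array DIM2 Int
table = A.fromList (Z :. 2 :. 3) [1 .. 6]

sums :: [Int]
sums = A.toList (run (rowSums (A.use table)))
```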

Page 26: Haskell  Accelerate

Thanks

Nathan Howell

The accelerate Team

You for listening

Page 27: Haskell  Accelerate

Further Reading

https://speakerdeck.com/tmcdonell/gpgpu-programming-in-haskell-with-accelerate

http://hackage.haskell.org/package/accelerate

http://quantlib-gpu.sourceforge.net/AcceleratingFinancialApplicationsOnTheGPU-paper.pdf