26
Generating a Fast GPU Median Filter with Haskell Michal Dobrogost June 5th, 2014 Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 1 / 26

Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Generating a Fast GPU Median Filter with Haskell

Michal Dobrogost

June 5th, 2014

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 1 / 26

Page 2: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Overview

Entire approach is based on a paper by Perrot[1].

Thank you Point Grey Research

OverviewI GPU ProgrammingI Median FilterI Naive SelectionI Forgetful SelectionI DSL Extensions

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 2 / 26

Page 3: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

GPU Programming3 Slide Crash Course

1000’s of concurrant threads

Memory latency hiding

CarefulI Multiple threads share an instruction pointer (SIMT)I Memory hierarchy but no cacheI Static arrays may be far away

C like languages: CUDA, OpenCL

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 3 / 26

Page 4: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

GPU ProgrammingSingle Instruction Multiple Thread (SIMT)

int sumOfProducts(int k, int* data, int size) {

if (k == 0) { return 0; }

int accum = 0;

for (int i = 0; i < size; i++) {

accum = accum + k * data[i];

}

return accum;

}

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 4 / 26

Page 5: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

GPU ProgrammingStatic Array vs Registers

void clean(int* data) {

int cache[3];

for (int i = 0; i < 3; i++) {

cache[i] = data[i];

}

// Access cache many times

}

void fast(int* data) {

int c0, c1, c2;

c0 = data[0];

c1 = data[1];

c2 = data[2];

// Access c0,c1,c2 many times

}

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 5 / 26

Page 6: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Median FilterA Road Median

1http://en.wikipedia.org/wiki/File:N11_dual-carriageway_median_barrier.jpg

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 6 / 26

Page 7: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Median FilterCorrupted Image

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 7 / 26

Page 8: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Median FilterSalt and Pepper Noise

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 8 / 26

Page 9: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Median FilterRestored Image

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 9 / 26

Page 10: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Median FilterHow it works

For each pixel take a 5x5 window

Linearize pixel intensities into a single collection

Find the median value

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 10 / 26

Page 11: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Naive SelectionSorting Networks

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 11 / 26

Page 12: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Naive SelectionDomain Specific Language

[(0,1),(1,2),(2,3),(3,4),(4,5),

(0,1),(1,2),(2,3),(3,4),

(0,1),(1,2),(2,3),

(0,1),(1,2),

(0,1)]

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 12 / 26

Page 13: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Naive SelectionLeverage Parent Language

link :: [Int] -> [(Int, Int)]

link xs = zip xs $ tail xs

bubble :: Int -> [(Int,Int)]

bubble n = concatMap iter [n-1,n-2..1]

where iter k = link [0..k]

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 13 / 26

Page 14: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Naive SelectionC “compiler”

swap :: String -> String -> String

swap x y = "T tmp(" ++ x ++ "); " ++

x ++ " = " ++ y ++ "; " ++

y ++ " = tmp;"

printC :: [(Int,Int)] -> String

printC [] = ""

printC ((x,y):xs) = res ++ printC xs

where x’ = "x" ++ show x

y’ = "x" ++ show y

res = "if (" ++ x’ ++ " > " ++ y’ ++ ") { " ++

swap x’ y’ ++

" }\\n"

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 14 / 26

Page 15: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Naive SelectionGenerated C Code

if (x0 > x1) { T tmp(x0); x0 = x1; x1 = tmp; }

if (x1 > x2) { T tmp(x1); x1 = x2; x2 = tmp; }

if (x2 > x3) { T tmp(x2); x2 = x3; x3 = tmp; }

if (x3 > x4) { T tmp(x3); x3 = x4; x4 = tmp; }

if (x4 > x5) { T tmp(x4); x4 = x5; x5 = tmp; }

if (x0 > x1) { T tmp(x0); x0 = x1; x1 = tmp; }

if (x1 > x2) { T tmp(x1); x1 = x2; x2 = tmp; }

if (x2 > x3) { T tmp(x2); x2 = x3; x3 = tmp; }

if (x3 > x4) { T tmp(x3); x3 = x4; x4 = tmp; }

if (x0 > x1) { T tmp(x0); x0 = x1; x1 = tmp; }

if (x1 > x2) { T tmp(x1); x1 = x2; x2 = tmp; }

if (x2 > x3) { T tmp(x2); x2 = x3; x3 = tmp; }

if (x0 > x1) { T tmp(x0); x0 = x1; x1 = tmp; }

if (x1 > x2) { T tmp(x1); x1 = x2; x2 = tmp; }

if (x0 > x1) { T tmp(x0); x0 = x1; x1 = tmp; }

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 15 / 26

Page 16: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Naive SelectionLATEX “compiler”

printTex :: [(Int,Int)] -> String

printTex comps =

"\\documentclass[tikz]{standalone}\n" ++

"\\usepackage{tikz}\n" ++

"\\begin{document}\n" ++

"\\begin{tikzpicture}\n" ++

verticalWires ++ connections ++

"\\end{tikzpicture}\n" ++

"\\end{document}\n"

where

verticalWires = ...

connections = ...

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 16 / 26

Page 17: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Forgetful SelectionTrace on a 3x3 Window

starting point is 5 = ceil(9/2)

/

Data: 0 1 2 3 4 5 6 7 8

| | | | | | | | |

min x x x x max | | |

| | | | | | |

| | | | /---/ | |

min x x x max | |

| | | | |

| | | /-------/ |

min x x max |

| | |

| | /-----------/

min x max

\

medianMichal Dobrogost Generating a Fast GPU Median Filter with Haskell 17 / 26

Page 18: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Forgetful SelectionRow Step — Basic

step :: [Int] -> [(Int,Int)]

step ixs = (\x -> x ++ reverse x) $ link ixs

-- step [0..5]

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 18 / 26

Page 19: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Forgetful SelectionRow Step — No Duplicates

-- group :: Eq a => [a] -> [[a]]

noDup :: Eq a => [a] -> [a]

noDup xs = map head $ group xs

step :: [Int] -> [(Int,Int)]

step ixs = noDup $ (\x -> x ++ reverse x) $ link ixs

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 19 / 26

Page 20: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Forgetful SelectionRow Step — Minimize Dependancies

interleave :: [a] -> [a] -> [a]

interleave [] ys = ys

interleave (x:xs) ys = x : interleave’ xs ys

where interleave’ xs [] = xs

interleave’ xs (y:ys) = y : interleave xs ys

step :: [Int] -> [(Int,Int)]

step ixs = noDup $ (\x -> interleave x $ reverse x) $ link ixs

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 20 / 26

Page 21: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Forgetful SelectionFull Algorithm

forgetful :: Int -> [(Int,Int)]

forgetful n = concatMap step rows

where (row0, rest) = splitAt ((n+1) ‘div‘ 2) [0..n-1]

rows = zipWith (\r x -> r ++ [x]) (tails row0) rest

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 21 / 26

Page 22: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Forgetful SelectionComparison on a 3x3 Window

Forgetful Select (24) Bubble Sort (36)

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 22 / 26

Page 23: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

DSL ExtensionsLate Loads

data DSL = Comp Int Int | Load Int Int

mkComp :: (Int,Int) -> DSL

mkComp (x,y) = Comp x y

genLoads :: (Int,Int) -> Int -> [DSL] -> [DSL]

genLoads (winX,winY) stride instrs = ...

printC :: [DSL] -> String

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 23 / 26

Page 24: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

Conclusion

Haskell can be extremely effective for throw away scripting

Suitable datatypes go a long way

Leverage Haskell to boost your DSL

Original implementationI ∼ 4 hoursI 28 linesI forgetful was a monster

Presentation codeI Time hard to judge – shared with presentationI 22 linesI link is the key ingredient

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 24 / 26

Page 25: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

References

Gilles Perrot, Stphane Domas, Raphal Couturier Fine-tunedHigh-speed Implementation of a GPU-based Median Filter. Journal ofSignal Processing Systems, 2013-07-05.

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 25 / 26

Page 26: Generating a Fast GPU Median Filter with Haskell · 2014-06-07 · GPU Programming 3 Slide Crash Course 1000’s of concurrant threads Memory latency hiding Careful I Multiple threads

THANK YOU

Michal Dobrogost Generating a Fast GPU Median Filter with Haskell 26 / 26