DHihlPlllICtDttiDamascene: Highly Parallel Image Contour...

Preview:

Citation preview

D Hi hl P ll l I C t D t tiDamascene: Highly Parallel Image Contour DetectionDamascene: Highly Parallel Image Contour Detectiong y gB C t N S d B Yii S Y L M k M h K t K tBryan Catanzaro, Narayanan Sundaram, Bor‐Yiing Su, Yunsup Lee, Mark Murphy, Kurt Keutzery , y , g , p , p y,

I C D i Program Flow Overall PerformanceImage Contour Detection Program Flow Overall PerformanceImage Contour Detection

Image Size: 481 by 321, 154401 pixels in totalI t d t ti i f d t l t iConvert Color Space Textons: K meansImage

Image Size: 481 by 321, 154401 pixels in totalCPU/GPU Runtime: 236 7s/2 081s

Image contour detection is fundamental to image Convert Color Space Textons: K-meansg CPU/GPU Runtime:  236.7s/2.081ssegmentation and many other computer vision problems

Speedup: 114xsegmentation and many other computer vision problems

CPU/GPU: 0.09/0.001 (s) 90xCPU/GPU: 0.09/0.001 (s) 90x CPU/GPU: 8.58/0.169(s) 51xCPU/GPU: 8.58/0.169(s) 51x p p

L A B Convert 

Kmeans

Local CuesLocal Cues

Combine

Nonmax

Intervening

Local CuesLocal CuesEigensolver

Local CuesCPU/GPU: 53 18/0 78(s) 68x

Local CuesCPU/GPU: 53 18/0 78(s) 68x

OE

CombineCPU/GPU: 53.18/0.78(s) 68xCPU/GPU: 53.18/0.78(s) 68x Combine

BG CGA CGB TGImage Human Generated Machine Generated BG CGA CGB TGgContours Contours CPU R i GPU RuntimeContours Contours CPU Runtime GPU Runtime

R=3 R=5 R=5 R=5R=3 R=5 R=5 R=5

Precision‐Recall GraphgPb Algorithm: Current Leader Precision‐Recall GraphgPb Algorithm: Current LeaderR=5 R=10 R=10 R=10

g g

R=10 R=20 R=20 R=20global Probability of  CVPR 2008 damasceneR=10 R=20 R=20 R=20g yboundary [Maire   1boundary [Maire, A b l  F lk  

1

We achieve slightly better Arbelaez, Fowlkes, 

We achieve slightly better     h  B k l  

Malik, CVPR 2008 ]accuracy on the Berkeley 

, ]Currently  the most  0.8y y

Segmentation DatasetCurrently, the most 

t  i   t   Combine Non-max Suppression Intervening ContourSegmentation Datasetaccurate image contour  Combine Non max Suppression Intervening Contour

Comparing to human detector

n

p gsegmented “ground truth”5 8mins per small image 

0.6

onCPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: segmented  ground truthF meas re 0 70 for both

5.8mins per small image (481 b 321) li it  it   is

io

277/0.833(ms)277/0.833(ms) 130/0.31(ms)130/0.31(ms) 6.32/0.034 (s)6.32/0.034 (s) F‐measure 0.70 for both(481 by 321) limits its  eci277/0.833(ms)

332x277/0.833(ms)

332x130/0.31(ms)

419x130/0.31(ms)

419x6.32/0.034 (s)

185x6.32/0.034 (s)

185xapplicability 0.4Pr

e332x332x 419x419x 185x185xpp yToo slow for interactive 

P

Too slow for interactive h t   ditiphoto editing

Too slow even for Image  0.2

Generlaized Eigen SolverToo slow even for Image Retrieval Generlaized Eigen SolverRetrieval

0CPU/GPU: 151.2/0.81 (s) 186xCPU/GPU: 151.2/0.81 (s) 186x 0

0 0.2 0.4 0.6 0.8 1

CPU/GPU: 151.2/0.81 (s) 186xCPU/GPU: 151.2/0.81 (s) 186x

RecallRecall

Platform: Nvidia GTX200 SeriesPlatform: Nvidia GTX200 Series

ConclusionOriented Energy Combination ConclusionOriented Energy CombinationS ifi ti GTX280Specifications GTX280 CPU/GPU: 2300/16.5 (ms) 140xCPU/GPU: 2300/16.5 (ms) 140xProcessors 30 @ 1.3

( )( )

GHzPhysical SIMD Width 8 • Damascene provides highest quality image contour Physical SIMD Width 8SP GFLOPS 933

• Damascene provides highest quality image contour d i       bl  SP GFLOPS 933 detection at user acceptable rates

Memory Bandwidth 141.7 GB/sp

• It demonstrates the transformational speedup yRegister File 1 875 MB

It demonstrates the transformational speedup t ti l  f  hit tRegister File 1.875 MB

L l S 480 kBpotential of manycore architectures

Local Store 480 kB • Damascene was enabled by the collaborative Global PbCombine, NormalizeMemory 1 GB

Damascene was enabled by the collaborative environment at the Berkeley UPCRCGlobal PbCombine, Normalizey environment at the Berkeley UPCRC

• Future work will generalize Damascene into a case CPU/GPU: CPU/GPU: gstudy for application and programming frameworks 

CPU/GPU: 437/0.241 (ms)

CPU/GPU: 437/0.241 (ms) study for application and programming frameworks 

d437/0.241 (ms)

1813x437/0.241 (ms)

1813x (stay tuned)1813x1813x ( y )

Recommended