1
D Hi hl P ll lI C t Dt ti Damascene: Highly Parallel Image Contour Detection Damascene: Highly Parallel Image Contour Detection B Ct N S d B Yii S Y L M kM h K t K t Bryan Catanzaro, Narayanan Sundaram, BorYiing Su, Yunsup Lee, Mark Murphy, Kurt Keutzer I C D i Program Flow Overall Performance Image Contour Detection Program Flow Overall Performance Image Contour Detection Image Size: 481 by 321, 154401 pixels in total I t dt ti if d t lt i Convert Color Space Textons:K means Image Image Size: 481 by 321, 154401 pixels in total CPU/GPU Runtime: 236 7s/2 081s Image contour detection is fundamental to image Convert Color Space Textons: K-means CPU/GPU Runtime: 236.7s/2.081s segmentation and many other computer vision problems Speedup: 114x segmentation and many other computer vision problems CPU/GPU: 0.09/0.001 (s) 90x CPU/GPU: 0.09/0.001 (s) 90x CPU/GPU: 8.58/0.169(s) 51x CPU/GPU: 8.58/0.169(s) 51x L A B Convert Kmeans Local Cues Local Cues Combine Nonmax Intervening Local Cues Local Cues Eigensolver Local Cues CPU/GPU: 53 18/0 78(s) 68x Local Cues CPU/GPU: 53 18/0 78(s) 68x OE Combine CPU/GPU: 53.18/0.78(s) 68x CPU/GPU: 53.18/0.78(s) 68x Combine BG CGA CGB TG Image Human Generated Machine Generated BG CGA CGB TG Contours Contours CPU R i GPU Runtime Contours Contours CPU Runtime GPU Runtime R=3 R=5 R=5 R=5 R=3 R=5 R=5 R=5 PrecisionRecall Graph gPb Algorithm: Current Leader PrecisionRecall Graph gPb Algorithm: Current Leader R=5 R=10 R=10 R=10 R=10 R=20 R=20 R=20 global Probability of CVPR 2008 damascene R=10 R=20 R=20 R=20 boundary [Maire 1 boundary [Maire, Abl F lk 1 We achieve slightly better Arbelaez, Fowlkes, We achieve slightly better h B kl Malik, CVPR 2008 ] accuracy on the Berkeley Currently the most 0.8 Segmentation Dataset Currently, the most t i t Combine Non-max Suppression Intervening Contour Segmentation Dataset accurate image contour Combine Non max Suppression Intervening Contour Comparing to human detector n segmented “ground truth” 58 mins per small image 0.6 on CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: segmented ground truth F meas re 0 70 for both 5.8 mins per small image (481 b 321) li it it isio 277/0.833(ms) 277/0.833(ms) 130/0.31(ms) 130/0.31(ms) 6.32/0.034 (s) 6.32/0.034 (s) Fmeasure 0.70 for both (481 by 321) limits its eci 277/0.833(ms) 332x 277/0.833(ms) 332x 130/0.31(ms) 419x 130/0.31(ms) 419x 6.32/0.034 (s) 185x 6.32/0.034 (s) 185x applicability 0.4 Pre 332x 332x 419x 419x 185x 185x Too slow for interactive P Too slow for interactive ht diti photo editing Too slow even for Image 0.2 Generlaized Eigen Solver Too slow even for Image Retrieval Generlaized Eigen Solver Retrieval 0 CPU/GPU: 151.2/0.81 (s) 186x CPU/GPU: 151.2/0.81 (s) 186x 0 0 0.2 0.4 0.6 0.8 1 CPU/GPU: 151.2/0.81 (s) 186x CPU/GPU: 151.2/0.81 (s) 186x Recall Recall Platform: Nvidia GTX200 Series Platform: Nvidia GTX200 Series Conclusion Oriented Energy Combination Conclusion Oriented Energy Combination S ifi ti GTX280 Specifications GTX280 CPU/GPU: 2300/16.5 (ms) 140x CPU/GPU: 2300/16.5 (ms) 140x Processors 30 @ 1.3 GHz Physical SIMD Width 8 Damascene provides highest quality image contour Physical SIMD Width 8 SP GFLOPS 933 Damascene provides highest quality image contour d i bl SP GFLOPS 933 detection at user acceptable rates Memory Bandwidth 141.7 GB/s It demonstrates the transformational speedup Register File 1 875 MB It demonstrates the transformational speedup t ti l f hit t Register File 1.875 MB L lS 480 kB potential of manycore architectures Local Store 480 kB Damascene was enabled by the collaborative Global Pb Combine, Normalize Memory 1 GB Damascene was enabled by the collaborative environment at the Berkeley UPCRC Global Pb Combine, Normalize environment at the Berkeley UPCRC Future work will generalize Damascene into a case CPU/GPU: CPU/GPU: study for application and programming frameworks 437/0.241 (ms) 437/0.241 (ms) study for application and programming frameworks d 437/0.241 (ms) 1813x 437/0.241 (ms) 1813x (stay tuned) 1813x 1813x

DHihlPlllICtDttiDamascene: Highly Parallel Image Contour …parlab.eecs.berkeley.edu/sites/all/parlab/files/... · 2012. 2. 23. · DHihlPlllICtDttiDamascene: Highly Parallel Image

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DHihlPlllICtDttiDamascene: Highly Parallel Image Contour …parlab.eecs.berkeley.edu/sites/all/parlab/files/... · 2012. 2. 23. · DHihlPlllICtDttiDamascene: Highly Parallel Image

D Hi hl P ll l I C t D t tiDamascene: Highly Parallel Image Contour DetectionDamascene: Highly Parallel Image Contour Detectiong y gB C t N S d B Yii S Y L M k M h K t K tBryan Catanzaro, Narayanan Sundaram, Bor‐Yiing Su, Yunsup Lee, Mark Murphy, Kurt Keutzery , y , g , p , p y,

I C D i Program Flow Overall PerformanceImage Contour Detection Program Flow Overall PerformanceImage Contour Detection

Image Size: 481 by 321, 154401 pixels in totalI t d t ti i f d t l t iConvert Color Space Textons: K meansImage

Image Size: 481 by 321, 154401 pixels in totalCPU/GPU Runtime: 236 7s/2 081s

Image contour detection is fundamental to image Convert Color Space Textons: K-meansg CPU/GPU Runtime:  236.7s/2.081ssegmentation and many other computer vision problems

Speedup: 114xsegmentation and many other computer vision problems

CPU/GPU: 0.09/0.001 (s) 90xCPU/GPU: 0.09/0.001 (s) 90x CPU/GPU: 8.58/0.169(s) 51xCPU/GPU: 8.58/0.169(s) 51x p p

L A B Convert 

Kmeans

Local CuesLocal Cues

Combine

Nonmax

Intervening

Local CuesLocal CuesEigensolver

Local CuesCPU/GPU: 53 18/0 78(s) 68x

Local CuesCPU/GPU: 53 18/0 78(s) 68x

OE

CombineCPU/GPU: 53.18/0.78(s) 68xCPU/GPU: 53.18/0.78(s) 68x Combine

BG CGA CGB TGImage Human Generated Machine Generated BG CGA CGB TGgContours Contours CPU R i GPU RuntimeContours Contours CPU Runtime GPU Runtime

R=3 R=5 R=5 R=5R=3 R=5 R=5 R=5

Precision‐Recall GraphgPb Algorithm: Current Leader Precision‐Recall GraphgPb Algorithm: Current LeaderR=5 R=10 R=10 R=10

g g

R=10 R=20 R=20 R=20global Probability of  CVPR 2008 damasceneR=10 R=20 R=20 R=20g yboundary [Maire   1boundary [Maire, A b l  F lk  

1

We achieve slightly better Arbelaez, Fowlkes, 

We achieve slightly better     h  B k l  

Malik, CVPR 2008 ]accuracy on the Berkeley 

, ]Currently  the most  0.8y y

Segmentation DatasetCurrently, the most 

t  i   t   Combine Non-max Suppression Intervening ContourSegmentation Datasetaccurate image contour  Combine Non max Suppression Intervening Contour

Comparing to human detector

n

p gsegmented “ground truth”5 8mins per small image 

0.6

onCPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: CPU/GPU: segmented  ground truthF meas re 0 70 for both

5.8mins per small image (481 b 321) li it  it   is

io

277/0.833(ms)277/0.833(ms) 130/0.31(ms)130/0.31(ms) 6.32/0.034 (s)6.32/0.034 (s) F‐measure 0.70 for both(481 by 321) limits its  eci277/0.833(ms)

332x277/0.833(ms)

332x130/0.31(ms)

419x130/0.31(ms)

419x6.32/0.034 (s)

185x6.32/0.034 (s)

185xapplicability 0.4Pr

e332x332x 419x419x 185x185xpp yToo slow for interactive 

P

Too slow for interactive h t   ditiphoto editing

Too slow even for Image  0.2

Generlaized Eigen SolverToo slow even for Image Retrieval Generlaized Eigen SolverRetrieval

0CPU/GPU: 151.2/0.81 (s) 186xCPU/GPU: 151.2/0.81 (s) 186x 0

0 0.2 0.4 0.6 0.8 1

CPU/GPU: 151.2/0.81 (s) 186xCPU/GPU: 151.2/0.81 (s) 186x

RecallRecall

Platform: Nvidia GTX200 SeriesPlatform: Nvidia GTX200 Series

ConclusionOriented Energy Combination ConclusionOriented Energy CombinationS ifi ti GTX280Specifications GTX280 CPU/GPU: 2300/16.5 (ms) 140xCPU/GPU: 2300/16.5 (ms) 140xProcessors 30 @ 1.3

( )( )

GHzPhysical SIMD Width 8 • Damascene provides highest quality image contour Physical SIMD Width 8SP GFLOPS 933

• Damascene provides highest quality image contour d i       bl  SP GFLOPS 933 detection at user acceptable rates

Memory Bandwidth 141.7 GB/sp

• It demonstrates the transformational speedup yRegister File 1 875 MB

It demonstrates the transformational speedup t ti l  f  hit tRegister File 1.875 MB

L l S 480 kBpotential of manycore architectures

Local Store 480 kB • Damascene was enabled by the collaborative Global PbCombine, NormalizeMemory 1 GB

Damascene was enabled by the collaborative environment at the Berkeley UPCRCGlobal PbCombine, Normalizey environment at the Berkeley UPCRC

• Future work will generalize Damascene into a case CPU/GPU: CPU/GPU: gstudy for application and programming frameworks 

CPU/GPU: 437/0.241 (ms)

CPU/GPU: 437/0.241 (ms) study for application and programming frameworks 

d437/0.241 (ms)

1813x437/0.241 (ms)

1813x (stay tuned)1813x1813x ( y )