Accelerating 3D Facial Modeling using ArrayFire, OpenCV and CUDA Umar Arshad (@arshad_umar) ArrayFire (@arrayfire)

Accelerating 3D Facial Modeling Using ArrayFire, OpenCV and …on-demand.gputechconf.com/gtc/2014/presentations/S4426... · 2014-05-21 · Umar Arshad Subject: This session will discuss




ArrayFire

● World’s leading GPU experts
  ○ In the industry since 2007
  ○ NVIDIA Partner
● Deep experience working with thousands of customers
  ○ Analysis
  ○ Acceleration
  ○ Algorithm development
● GPU Training
  ○ Hands-on course with a CUDA engineer
  ○ Customized to meet your needs


Problem

● Client came to us with a slow application
  ○ Made use of OpenCV and OpenMP
  ○ 8 threads: 30+ seconds
  ○ One process
  ○ Developed on OS X
● Required a significant hardware investment
  ○ Increased maintenance
  ○ Financially not viable in production
  ○ Had Windows infrastructure


Improvements

● OpenCV - ArrayFire interop
● Rendering using GPUs
  ○ Partial CUDA based estimation
  ○ OpenGL based rendering
● Batching Operations
  ○ Combining data into single operation
● Concurrent Processing
  ○ CPU: small variable length data
  ○ GPU: large fixed length data


Moving to ArrayFire

● OpenCV Mat to ArrayFire array
  ○ Row-major (OpenCV) vs. column-major (ArrayFire)
  ○ http://blog.accelereyes.com/blog/2012/09/19/image-processing-with-arrayfire-and-opencv/
● Similar Interface
  ○ Allowed for quick porting
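The layout mismatch is the main thing to get right when moving pixel data between the two libraries: an OpenCV Mat stores its elements row by row, while an ArrayFire array stores them column by column. The following is a minimal host-side sketch of the reordering, using only the standard library. The function name `row_major_to_col_major` is hypothetical and is not part of either library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder a rows x cols image from row-major storage (the OpenCV Mat layout)
// to column-major storage (the ArrayFire array layout).
std::vector<float> row_major_to_col_major(const std::vector<float>& src,
                                          std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Row-major index is r*cols + c; column-major index is c*rows + r.
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}
```

Because a row-major rows x cols buffer is bit-for-bit the same as a column-major cols x rows buffer, an alternative is to upload the buffer unchanged with the dimensions swapped and transpose it on the GPU.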


Rendering

● Software rasterization
● Analysis of algorithm
  ○ Did not require an exact render
● ArrayFire based estimate
  ○ Plot points
  ○ Dilate
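The plot-then-dilate estimate can be sketched on the CPU as follows. This is only an illustration of the idea under stated assumptions: the slides do not give the structuring element or the point format, so the 3x3 square and the integer pixel coordinates here are assumptions, and the names `plot_points` and `dilate3x3` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Mask = std::vector<unsigned char>;  // row-major, 1 = covered pixel

// Step 1: mark each projected point (x, y) in a width x height mask.
// Points that fall outside the image are skipped.
Mask plot_points(const std::vector<std::pair<int, int>>& pts, int width, int height) {
    Mask mask(static_cast<std::size_t>(width) * height, 0);
    for (const auto& p : pts) {
        if (p.first >= 0 && p.first < width && p.second >= 0 && p.second < height)
            mask[static_cast<std::size_t>(p.second) * width + p.first] = 1;
    }
    return mask;
}

// Step 2: binary dilation with a 3x3 square (an assumed structuring element).
// A pixel is set if it or any of its 8 neighbours is set in the input.
Mask dilate3x3(const Mask& in, int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    Mask out(in.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            unsigned char hit = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < width && ny >= 0 && ny < height)
                        hit |= in[static_cast<std::size_t>(ny) * width + nx];
                }
            }
            out[static_cast<std::size_t>(y) * width + x] = hit;
        }
    }
    return out;
}
```

Dilation closes the gaps between plotted vertices, which gives an approximate silhouette without rasterizing triangles. That is acceptable here only because the algorithm did not need an exact render.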


Rendering

● Moved to OpenGL for some cases
  ○ Makes use of hardware rasterizer
  ○ ArrayFire -> OpenGL interop using CUDA-OpenGL interop
  ○ See ArrayFire GitHub for sample implementation


Batching

● Used OpenMP for parallelism
  ○ One frame per thread
  ○ Optimized for CPU
● One CPU thread + GPU
  ○ Parallelism on GPU vs. Parallelism on CPU
● Combined OpenMP threads


Batching

● Many small operations
  ○ Individually it didn’t make sense to port to the GPU
● Increase dimensionality of the data
  ○ 2D -> 3D
  ○ GFOR and Strided Access
● Moved to single threaded code
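The 2D -> 3D idea can be illustrated without any GPU library: many small, equally sized matrices are copied into one contiguous buffer so that a single operation covers every frame. The sketch below is a host-side stand-in, not ArrayFire code. The `Batch` type and the helper names are hypothetical, and the column-major indexing is chosen to match ArrayFire's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A stack of equally sized 2D frames stored as one contiguous, column-major 3D buffer.
struct Batch {
    std::size_t rows, cols, frames;
    std::vector<float> data;  // element (r, c, f) lives at r + c*rows + f*rows*cols

    float& at(std::size_t r, std::size_t c, std::size_t f) {
        return data[r + c * rows + f * rows * cols];
    }
};

// Copy many small rows x cols matrices into a single 3D buffer, one frame after another.
Batch pack(const std::vector<std::vector<float>>& mats, std::size_t rows, std::size_t cols) {
    Batch b{rows, cols, mats.size(), {}};
    b.data.reserve(rows * cols * mats.size());
    for (const auto& m : mats) {
        assert(m.size() == rows * cols);
        b.data.insert(b.data.end(), m.begin(), m.end());
    }
    return b;
}

// One pass over the whole batch replaces one small operation per frame.
// On a GPU this would correspond to a single launch instead of `frames` launches.
void scale_in_place(Batch& b, float factor) {
    for (float& v : b.data) v *= factor;
}

// Strided access: read the same (r, c) element from every frame,
// stepping through the buffer with a stride of rows*cols.
std::vector<float> element_across_frames(Batch& b, std::size_t r, std::size_t c) {
    std::vector<float> out(b.frames);
    for (std::size_t f = 0; f < b.frames; ++f) out[f] = b.at(r, c, f);
    return out;
}
```

With the frames stacked this way, the per-frame OpenMP threads are no longer needed, since the parallelism moves inside the batched operation.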


Batching

● Call custom CUDA kernels
  ○ Special indexing
● Specialized Matrix Multiply
  ○ ssyrk vs. gemm
  ○ 2x faster
  ○ concurrent execution using streams

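// Get the raw device pointer from the ArrayFire array and hand it to a custom kernel.
float *bound = boundary.device<float>();
// A CUDA launch configuration is ordered <<<blocks (grid size), threads (block size)>>>.
kernel<<<blocks, threads>>>(bound, boundary.elements());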
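The ssyrk vs. gemm comparison rests on symmetry: BLAS ssyrk computes a product of a matrix with its own transpose and writes only one triangle of the result, so it does roughly half the multiply-adds of a general gemm. That is consistent with the 2x figure on the slide, though the slides do not break the speedup down. The CPU sketch below counts the operations for both approaches. The function names are hypothetical, and the mirroring step is added only so the two results can be compared (a real syrk leaves the other triangle untouched).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A is a column-major n x k matrix. Both functions compute the n x n product
// C = A * A^T and return the number of multiply-adds performed.

// gemm-style: treats the product as a general multiply and fills every entry of C.
std::size_t gemm_aat(const std::vector<float>& A, std::size_t n, std::size_t k,
                     std::vector<float>& C) {
    assert(A.size() == n * k);
    C.assign(n * n, 0.0f);
    std::size_t madds = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t p = 0; p < k; ++p) {
                C[i + j * n] += A[i + p * n] * A[j + p * n];
                ++madds;
            }
    return madds;
}

// syrk-style: C is symmetric, so only the lower triangle (i >= j) is computed.
std::size_t syrk_aat(const std::vector<float>& A, std::size_t n, std::size_t k,
                     std::vector<float>& C) {
    assert(A.size() == n * k);
    C.assign(n * n, 0.0f);
    std::size_t madds = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = j; i < n; ++i) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                sum += A[i + p * n] * A[j + p * n];
                ++madds;
            }
            C[i + j * n] = sum;
            C[j + i * n] = sum;  // mirror for comparison only
        }
    return madds;
}
```

For an n x k input the counts are n*n*k for the general version and n*(n+1)/2*k for the symmetric one, which approaches a factor of two as n grows.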


Batching

● Results
  ○ 90 ms -> 28 ms on a GTX 690
● Other Improvements
  ○ Overlapped pinned memory transfers
  ○ Generic to Specialized matrix multiply
  ○ Streams


Concurrent Computation

● Overlap CPU and GPU computation
  ○ CPU handles variable length data sets one frame at a time
  ○ GPU handles fixed length data sets all frames concurrently

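// Run the GPU and CPU work concurrently, one OpenMP section each.
// `parallel` creates the thread team; a bare `sections` outside a parallel region runs serially.
#pragma omp parallel sections
{
    #pragma omp section
    {
        // GPU Code
    }
    #pragma omp section
    {
        // CPU Code
    }
}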
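A fuller version of this pattern might look like the sketch below. Both workloads are CPU stand-ins chosen for illustration: a sum over a fixed-length batch stands in for the GPU section, and a walk over variable-length frames stands in for the CPU section. The `Results` type and the `process` function are hypothetical. The sketch uses `#pragma omp parallel sections` so that a thread team exists to run the two sections; when built without OpenMP support the pragmas are ignored and the sections simply run one after the other.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

struct Results {
    double fixed_total;          // written only by the "GPU" section
    std::size_t variable_total;  // written only by the "CPU" section
};

Results process(const std::vector<float>& fixed_batch,
                const std::vector<std::vector<int>>& variable_frames) {
    Results r{0.0, 0};
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            // Stand-in for the GPU work: one pass over the fixed-length batch.
            r.fixed_total = std::accumulate(fixed_batch.begin(), fixed_batch.end(), 0.0);
        }
        #pragma omp section
        {
            // Stand-in for the CPU work: variable-length frames, one at a time.
            std::size_t n = 0;
            for (const auto& frame : variable_frames) n += frame.size();
            r.variable_total = n;
        }
    }
    return r;
}
```

Each section writes to its own member of `r`, so the two sections need no locking. The implicit barrier at the end of the sections construct ensures both results are ready before the function returns.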


Results

● 1 Process (5 threads): 8 seconds
● 6 Processes (2 threads): 22 seconds


Q & A