
APPLYING OF THE NVIDIA CUDA TO THE VIDEO PROCESSING IN THE TASK OF THE ROUNDWOOD VOLUME ESTIMATION

Artem Kruglov, Andrey Chiryshev, Evgeniya Shishko¹

¹ Ural Federal University named after the first President of Russia B. N. Yeltsin


Preamble

A machine-vision system for measuring roundwood geometry makes it possible to:
• automate roundwood scanning and sorting;
• reduce the measurement error compared with systems based on laser scanning and photocells;
• perform an external assessment of wood quality and type.

Ural-PDC 2016


Problem

• Logs must be detected in real time, which places much tighter restrictions on the image analysis methods that can be used.
• The maximum processing time for each frame is 20 ms (that is, at least 50 frames per second).
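A minimal host-side way to check the 20 ms requirement is to time each frame with `std::chrono::steady_clock`. The helpers below (`timeFrameMs`, `meetsRealTimeBudget`, `maxFrameRate`) are hypothetical names for this sketch, not part of the described system; C++11 or later is assumed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Per-frame budget from the problem statement: 20 ms, i.e. 1000 / 20 = 50 frames per second.
constexpr double kFrameBudgetMs = 20.0;

// Measures the wall-clock time of one call to processFrame, in milliseconds.
inline double timeFrameMs(const std::function<void()> &processFrame) {
    const auto start = std::chrono::steady_clock::now();
    processFrame();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// True if a frame processed in elapsedMs meets the real-time requirement.
inline bool meetsRealTimeBudget(double elapsedMs) {
    return elapsedMs <= kFrameBudgetMs;
}

// Highest frame rate that can be sustained at a given per-frame processing time.
inline double maxFrameRate(double elapsedMs) {
    assert(elapsedMs > 0.0);
    return 1000.0 / elapsedMs;
}
```

Usage, with a hypothetical `processFrame(frame)`: `bool ok = meetsRealTimeBudget(timeFrameMs([&] { processFrame(frame); }));`. When the timed work launches CUDA kernels, the callable should call `cudaDeviceSynchronize()` before returning, because kernel launches are asynchronous and would otherwise be under-measured.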


Roundwood volume estimation algorithm


Possible scenes


Figure panels: Empty scene; Appearance of the log; Log tracking; End of the log.


Possible scenes


Figure panels: Two consecutive logs; Appearance of the parallel log; Two parallel logs.


Results of the computational experiment

Processing time, ms

Image | Image enhancement | Background model refreshing | Object detection
1     | 0                 | 2.1                         | 0
2     | 31.4              | 0                           | 10.6
3     | 32.0              | 0                           | 11.5
4     | 31.9              | 0                           | 12.0
5     | 32.1              | 0                           | 13.0
6     | 31.9              | 0                           | 13.6
7     | 32.2              | 0                           | 13.4
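Summing the three stages gives the total time per image. These totals coincide with the Serial column of the approaches comparison (for example, 31.4 + 0 + 10.6 = 42.0 ms for image 2), and for images 2 to 7 they all exceed the 20 ms budget, with image enhancement taking most of the time. A small C++ sketch of this arithmetic using the values from the table; the struct and function names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Processing time (ms) of each stage for images 1-7, copied from the table above.
struct StageTimes {
    double enhancement;
    double backgroundRefresh;
    double detection;
};

constexpr std::array<StageTimes, 7> kStageTimes{{
    {0.0, 2.1, 0.0},
    {31.4, 0.0, 10.6},
    {32.0, 0.0, 11.5},
    {31.9, 0.0, 12.0},
    {32.1, 0.0, 13.0},
    {31.9, 0.0, 13.6},
    {32.2, 0.0, 13.4},
}};

// Total time for one image is the sum of its three stages.
constexpr double totalMs(const StageTimes &t) {
    return t.enhancement + t.backgroundRefresh + t.detection;
}

// Number of images whose total processing time exceeds the per-frame budget.
inline std::size_t imagesOverBudget(double budgetMs = 20.0) {
    assert(budgetMs > 0.0);
    std::size_t count = 0;
    for (const StageTimes &t : kStageTimes)
        if (totalMs(t) > budgetMs) ++count;
    return count;
}
```

With the default 20 ms budget, `imagesOverBudget()` counts six images (2 to 7); only image 1, which needs just the background model refresh, fits.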


CUDA implementation

1. Select the active GPU;
2. Allocate storage for the image in global memory;
3. Load the input image into global memory;
4. Load the filter-window coefficients into constant memory;
5. Form the structure of the computational grid;
6. Launch the kernel functions that perform image filtering;
7. Copy the output image from GPU memory back to host memory;
8. Free the device memory.


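CUDA implementation

```cuda
#include <cuda_runtime.h>

typedef unsigned char byte;   // 8-bit grayscale pixel

#define BLOCK_DIM 16          // a 16x16 thread block covers a 16x16 part of the image

// Filter-window coefficients in constant memory; 25 entries allow apertures up to 5x5
__constant__ int c_mask[25];

// Filtering kernel; its body, which reads c_mask, is not shown on the slide and must be
// defined in this file. It works on buf in place, so a filter that reads neighbouring
// pixels should write to a separate output buffer to avoid read/write races.
__global__ void kernel(byte *buf, int height, int width, int apert);

// Host function
void FilterGPU(const byte *inbuf, byte *outbuf, int height, int width, int apert, const int *mask)
{
    // Device initialization: select the active GPU
    cudaSetDevice(0);
    // Allocate storage for the image in global memory
    const size_t size = sizeof(byte) * width * height;
    byte *buf;
    cudaMalloc((void **)&buf, size);
    // Load the image into the device
    cudaMemcpy(buf, inbuf, size, cudaMemcpyHostToDevice);
    // Load the filter-window coefficients into constant memory (apert * apert <= 25)
    cudaMemcpyToSymbol(c_mask, mask, sizeof(int) * apert * apert);
    // Form the computational grid, rounding up so that partial edge tiles are covered;
    // the kernel must skip threads with x >= width or y >= height
    dim3 blocks((width + BLOCK_DIM - 1) / BLOCK_DIM, (height + BLOCK_DIM - 1) / BLOCK_DIM);
    dim3 threads(BLOCK_DIM, BLOCK_DIM);
    // Launch the kernel and wait for it to finish
    kernel<<<blocks, threads>>>(buf, height, width, apert);
    cudaDeviceSynchronize();
    // Copy the output image back to host memory
    cudaMemcpy(outbuf, buf, size, cudaMemcpyDeviceToHost);
    // Free device memory
    cudaFree(buf);
}
```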
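The grid in this listing assigns one thread per pixel: along each axis, the thread with indices (blockIdx, threadIdx) handles pixel blockIdx * BLOCK_DIM + threadIdx. A grid of `width/BLOCK_DIM+1` blocks launches one superfluous column of blocks whenever the width is a multiple of 16 (for an assumed 640-pixel-wide frame, 41 blocks instead of 40); ceiling division `(width + BLOCK_DIM - 1) / BLOCK_DIM` avoids this, but partial tiles still leave idle threads that the kernel must skip. A host-side C++ sketch of this arithmetic, with hypothetical helper names:

```cpp
#include <cassert>

// Number of blockDim-wide blocks needed to cover n pixels (ceiling division).
inline int blocksToCover(int n, int blockDim) {
    assert(n > 0 && blockDim > 0);
    return (n + blockDim - 1) / blockDim;
}

// Threads launched beyond a width x height image; the kernel must skip them
// with a check such as if (x < width && y < height).
inline long idleThreads(int width, int height, int blockDim) {
    const long launchedX = static_cast<long>(blocksToCover(width, blockDim)) * blockDim;
    const long launchedY = static_cast<long>(blocksToCover(height, blockDim)) * blockDim;
    return launchedX * launchedY - static_cast<long>(width) * height;
}

// Pixel coordinate handled by a thread, mirroring blockIdx.x * blockDim.x + threadIdx.x.
inline int pixelCoord(int blockIdx, int blockDim, int threadIdx) {
    return blockIdx * blockDim + threadIdx;
}
```

For an assumed 640 x 480 frame, `blocksToCover` gives a 40 x 30 grid with no idle threads; for a 100 x 100 image it gives 7 x 7 blocks, i.e. 112 x 112 threads, of which 2544 fall outside the image.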


CUDA implementation

• Fast shared memory is used to process each part of the image.

• Image enhancement operations performed in the kernel function:
   • area opening (erosion followed by dilation);
   • dilation.

Each 16×16 part of the image is processed by one thread block.

The CUDA architecture provides global, constant, texture, shared and local memory:
• constant memory stores the filter window;
• global memory stores the image, because of its large capacity and read/write access.
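For checking the GPU output, a sequential CPU reference of the listed operations is handy. The sketch below is an assumption, not the authors' kernel: it uses a flat square apert × apert window (every filter-window coefficient treated as 1) and skips window positions that fall outside the image. A typical shared-memory GPU version loads each 16×16 tile together with a halo of apert / 2 pixels on every side, i.e. (16 + apert - 1)² pixels, and then computes the same minimum or maximum from shared memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Grayscale image stored row by row, one byte per pixel.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // width * height values
};

// Square apert x apert window (apert odd): keeps the minimum (erosion) or the maximum
// (dilation) of the pixels under the window. Window positions outside the image are skipped.
inline Image morph(const Image &in, int apert, bool useMax) {
    assert(apert > 0 && apert % 2 == 1);
    assert(in.pixels.size() == static_cast<std::size_t>(in.width) * in.height);
    const int r = apert / 2;
    Image out{in.width, in.height, std::vector<std::uint8_t>(in.pixels.size())};
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x) {
            std::uint8_t best = useMax ? 0 : 255;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= in.height || xx < 0 || xx >= in.width) continue;
                    const std::uint8_t v = in.pixels[yy * in.width + xx];
                    best = useMax ? std::max(best, v) : std::min(best, v);
                }
            out.pixels[y * in.width + x] = best;
        }
    return out;
}

inline Image erode(const Image &in, int apert) { return morph(in, apert, false); }
inline Image dilate(const Image &in, int apert) { return morph(in, apert, true); }

// Opening = erosion followed by dilation: removes bright details smaller than the window.
inline Image opening(const Image &in, int apert) { return dilate(erode(in, apert), apert); }
```

With a 3×3 window, opening removes an isolated bright pixel (noise) but leaves a bright 3×3 region unchanged, which is the kind of clean-up the enhancement step relies on.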


Approaches comparison


The computational experiment was carried out on an IBM PC-compatible computer with the following configuration:
• CPU: Intel Core i7 920, 2.79 GHz (4 cores, 8 threads);
• GPU: NVIDIA GeForce GTS 450 (192 CUDA cores);
• OS: Windows 7, 64-bit.

Compared approaches:
• sequential implementation (CPU only);
• OpenMP multithreading;
• parallel implementation (CUDA).
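The slides do not show the OpenMP variant. As an assumption about how such a stage is commonly parallelized, the sketch below splits the rows of a 3×3 dilation among OpenMP threads with `#pragma omp parallel for`; the function name and the fixed 3×3 window are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 3x3 grayscale dilation with image rows distributed among OpenMP threads.
// Every output row depends only on the read-only input, so rows are independent.
// Compiled without OpenMP support (-fopenmp), the pragma is ignored and the loop runs serially.
inline std::vector<std::uint8_t> dilate3x3(const std::vector<std::uint8_t> &in, int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(in.size());
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            std::uint8_t m = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < height && xx >= 0 && xx < width)
                        m = std::max(m, in[yy * width + xx]);
                }
            out[y * width + x] = m;
        }
    return out;
}
```

With GCC or Clang, building with `-fopenmp` enables the threads; the result is the same as the sequential run because each thread writes only its own rows of `out`.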


Approaches comparison


Average processing time, ms

Image | Serial | OpenMP | CUDA | Speedup (Serial / CUDA)
1     | 2.1    | 2.1    | 2.1  | 1
2     | 42.0   | 19.9   | 12.2 | 3.44
3     | 43.5   | 21.0   | 12.8 | 3.4
4     | 43.9   | 21.8   | 13.9 | 3.16
5     | 45.1   | 22.9   | 14.5 | 3.11
6     | 45.5   | 22.8   | 15.2 | 2.99
7     | 45.6   | 22.8   | 15.4 | 2.96
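The Speedup column is the ratio of the Serial and CUDA times (for example, 42.0 / 12.2 ≈ 3.44); the same ratio for OpenMP is about 2 (42.0 / 19.9 ≈ 2.11, 45.6 / 22.8 = 2.0). A small C++ sketch that recomputes these ratios from the table and checks the CUDA times against the 20 ms budget; the names are illustrative only.

```cpp
#include <array>
#include <cassert>

// Average processing time per image (ms), copied from the table above.
struct Timing {
    double serial;
    double openmp;
    double cuda;
};

constexpr std::array<Timing, 7> kTimings{{
    {2.1, 2.1, 2.1},
    {42.0, 19.9, 12.2},
    {43.5, 21.0, 12.8},
    {43.9, 21.8, 13.9},
    {45.1, 22.9, 14.5},
    {45.5, 22.8, 15.2},
    {45.6, 22.8, 15.4},
}};

// Speedup of a parallel implementation relative to the serial one.
inline double speedup(double serialMs, double parallelMs) {
    assert(parallelMs > 0.0);
    return serialMs / parallelMs;
}

// True if CUDA processes every image within the per-frame budget.
inline bool cudaMeetsBudget(double budgetMs = 20.0) {
    for (const Timing &t : kTimings)
        if (t.cuda > budgetMs) return false;
    return true;
}
```

`cudaMeetsBudget()` holds because the slowest CUDA image takes 15.4 ms; even if the 0.2 ms average transfer time reported in the conclusion were added on top, that would be 15.6 ms.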


Conclusion

• The average data transfer time to and from the GPU is 0.2 ms.
• Using CUDA speeds up the computations by about 3 times (from 2.96 to 3.44 for images 2 to 7).
• The parallel implementation takes less than 20 ms per frame, so it is suitable for a real-time system.


Thank you for your attention!
