APPLYING OF THE NVIDIA CUDA TO THE VIDEO PROCESSING IN THE TASK OF THE ROUNDWOOD VOLUME ESTIMATION
Artem Kruglov, Andrey Chiryshev, Evgeniya Shishko¹
¹Ural Federal University named after the first President of Russia B. N. Yeltsin
Preamble
The machine-vision system for roundwood geometry measurement makes it possible to:
• Automate roundwood scanning and sorting,
• Reduce the measurement error compared with systems based on laser scanning and photocells,
• Perform an external assessment of the quality and type of wood.
Ural-PDC 2016
Problem
• Log detection must be carried out in real time, which imposes much tighter restrictions on the image analysis methods that can be applied:
• the maximum processing time for each frame is 20 ms.
Roundwood volume estimation algorithm
Possible scenes
Figure panels: empty scene; appearance of the log; log tracking; end of the log; two consecutive logs; appearance of the parallel log; two parallel logs.
Results of the computational experiment
Processing time, ms

Image   Image enhancement   Background model refreshing   Object detection
1       0                   2,1                           0
2       31,4                0                             10,6
3       32,0                0                             11,5
4       31,9                0                             12,0
5       32,1                0                             13,0
6       31,9                0                             13,6
7       32,2                0                             13,4
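The stage timings in this table can be checked against the 20 ms per-frame limit from the problem statement with a few lines of arithmetic. The following is a minimal sketch, not part of the original system: the names `StageTimes`, `totalMs`, `fitsBudget`, `measuredFrames` and `countOverBudget` are hypothetical, and the numbers are copied from the table (decimal commas written as points).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-stage processing times of one frame in milliseconds (from the table).
struct StageTimes {
    double enhancement;  // image enhancement
    double background;   // background model refreshing
    double detection;    // object detection
};

// Total processing time of a frame when the stages run one after another.
double totalMs(const StageTimes& t) {
    return t.enhancement + t.background + t.detection;
}

// True when the frame fits the real-time budget (20 ms per frame).
bool fitsBudget(const StageTimes& t, double budgetMs = 20.0) {
    assert(budgetMs > 0.0);
    return totalMs(t) <= budgetMs;
}

// The seven measured frames, in table order.
std::vector<StageTimes> measuredFrames() {
    return {
        {0.0, 2.1, 0.0},
        {31.4, 0.0, 10.6},
        {32.0, 0.0, 11.5},
        {31.9, 0.0, 12.0},
        {32.1, 0.0, 13.0},
        {31.9, 0.0, 13.6},
        {32.2, 0.0, 13.4},
    };
}

// Number of frames whose total time exceeds the budget.
int countOverBudget(const std::vector<StageTimes>& frames, double budgetMs = 20.0) {
    int over = 0;
    for (std::size_t i = 0; i < frames.size(); ++i)
        if (!fitsBudget(frames[i], budgetMs)) ++over;
    return over;
}
```

Under these numbers only the first frame stays within 20 ms; in every other frame the image enhancement stage alone takes more than the whole budget, which is why that stage is the natural candidate for GPU offloading.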
CUDA implementation
1. Selecting the active GPU;
2. Allocating storage for an image in the global memory;
3. Loading the input image into the global memory;
4. Loading the coefficients of the filter window into the constant memory;
5. Forming the structure of the computational grid;
6. Launching the kernel functions that execute the image filtering;
7. Copying the output image from the GPU memory to the host memory;
8. Deallocating the GPU memory.
CUDA implementation
#include <cuda_runtime.h>

#define BLOCK_DIM 16

// Assumed: byte is an 8-bit pixel type (not defined on the slide)
typedef unsigned char byte;

// Coefficients of the filter window (up to 5x5) in the constant memory
__constant__ int c_mask[25];

// Filtering kernel; its body is not shown on the slide
__global__ void kernel(byte *buf, int height, int width, int apert);

// Host function
void FilterGPU(byte *inbuf, byte *outbuf, int height, int width, int apert, int *mask)
{
    // Device initialization
    cudaSetDevice(0);
    // Memory allocation
    byte *buf;
    cudaMalloc((void **)&buf, sizeof(byte) * width * height);
    // Loading the image into the device
    cudaMemcpy(buf, inbuf, sizeof(byte) * width * height, cudaMemcpyHostToDevice);
    // Loading the coefficients of the filter window into the constant memory
    cudaMemcpyToSymbol(c_mask, mask, sizeof(int) * apert * apert);
    // Forming the computational grid
    dim3 blocks = dim3(width / BLOCK_DIM + 1, height / BLOCK_DIM + 1);
    dim3 threads = dim3(BLOCK_DIM, BLOCK_DIM);
    // Kernel launch
    kernel<<<blocks, threads>>>(buf, height, width, apert);
    cudaDeviceSynchronize();  // replaces the deprecated cudaThreadSynchronize()
    // Copying the output image back to the host memory
    cudaMemcpy(outbuf, buf, sizeof(byte) * width * height, cudaMemcpyDeviceToHost);
    // Freeing the device memory
    cudaFree(buf);
}
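To see what the computational grid formed in `FilterGPU` covers, the launch configuration can be emulated on the CPU. This is a plain C++ sketch, not CUDA code: `gridSize`, `launchedThreads` and `countCoveredPixels` are hypothetical helpers. The bounds check in the inner loop is an assumption about what the kernel (which is not shown) has to do, because `width/BLOCK_DIM+1` blocks always reach past the image edge.

```cpp
#include <cassert>

const int BLOCK_DIM = 16;

// Number of blocks along one dimension, computed as in the host function.
int gridSize(int extent) {
    assert(extent > 0);
    return extent / BLOCK_DIM + 1;
}

// Total number of threads started by the launch.
long launchedThreads(int width, int height) {
    return static_cast<long>(gridSize(width)) * gridSize(height)
         * BLOCK_DIM * BLOCK_DIM;
}

// Emulates the 2D launch: visits every (block, thread) pair and counts
// the threads that map onto a real pixel of the image.
long countCoveredPixels(int width, int height) {
    long covered = 0;
    for (int by = 0; by < gridSize(height); ++by)
        for (int bx = 0; bx < gridSize(width); ++bx)
            for (int ty = 0; ty < BLOCK_DIM; ++ty)
                for (int tx = 0; tx < BLOCK_DIM; ++tx) {
                    // Same mapping as blockIdx * blockDim + threadIdx.
                    int x = bx * BLOCK_DIM + tx;
                    int y = by * BLOCK_DIM + ty;
                    // Assumed guard: threads outside the image do nothing.
                    if (x < width && y < height) ++covered;
                }
    return covered;
}
```

For a hypothetical 640x480 frame (the frame size is not given in the slides), `gridSize(640)` is 41 and `gridSize(480)` is 31, so the launch starts more threads than there are pixels, while `countCoveredPixels` still equals 640 * 480. The surplus threads must be filtered out by a guard of this kind.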
CUDA implementation
• Fast shared memory is used to process each part of the image.
• Image enhancement operations in the kernel function:
  • Area opening (erosion followed by dilation)
  • Dilation
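The image enhancement operations named here can be illustrated with a CPU reference version. The sketch below is an assumption about what each GPU thread computes for its pixel, since the kernel body is not shown: it treats the filter window as a flat structuring element (non-zero coefficients select neighbours), takes the minimum for erosion and the maximum for dilation, and ignores neighbours outside the image. The names `Pixel`, `morph`, `erode`, `dilate` and `opening` are hypothetical; `Pixel` plays the role of `byte` in the host function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef unsigned char Pixel;

// Flat morphological operator: for every pixel take the maximum (takeMax) or
// the minimum over the neighbours selected by non-zero mask entries.
std::vector<Pixel> morph(const std::vector<Pixel>& in, int width, int height,
                         const std::vector<int>& mask, int apert, bool takeMax) {
    assert(static_cast<int>(in.size()) == width * height);
    assert(apert % 2 == 1 && static_cast<int>(mask.size()) == apert * apert);
    const int r = apert / 2;
    std::vector<Pixel> out(in.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            Pixel acc = takeMax ? 0 : 255;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    if (mask[(dy + r) * apert + (dx + r)] == 0) continue;
                    int nx = x + dx, ny = y + dy;
                    // Assumed border rule: skip neighbours outside the image.
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    Pixel v = in[ny * width + nx];
                    acc = takeMax ? std::max(acc, v) : std::min(acc, v);
                }
            out[y * width + x] = acc;
        }
    return out;
}

std::vector<Pixel> erode(const std::vector<Pixel>& in, int w, int h,
                         const std::vector<int>& mask, int apert) {
    return morph(in, w, h, mask, apert, false);
}

std::vector<Pixel> dilate(const std::vector<Pixel>& in, int w, int h,
                          const std::vector<int>& mask, int apert) {
    return morph(in, w, h, mask, apert, true);
}

// Opening: erosion followed by dilation with the same window.
std::vector<Pixel> opening(const std::vector<Pixel>& in, int w, int h,
                           const std::vector<int>& mask, int apert) {
    return dilate(erode(in, w, h, mask, apert), w, h, mask, apert);
}
```

With a full 3x3 window, opening removes an isolated bright pixel but leaves a 3x3 bright square unchanged, which is the noise-suppressing behaviour expected from this step.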
Part of the image (16x16) processed by a thread block
The CUDA architecture provides global, constant, texture, shared and local memory.
• The constant memory is used to store the filter window.
• The global memory is used to store the image (because of its capacity and read/write access).
Approaches comparison
The computational experiment was carried out on an IBM PC-compatible computer with the following characteristics:
• CPU: Intel Core i7 920, 2.79 GHz (8 logical cores);
• GPU: NVIDIA GeForce GTS 450 (192 CUDA cores);
• OS: 64-bit Windows 7.
Compared approaches:
• Sequential implementation (CPU only)
• OpenMP multithreading
• Parallel implementation (CUDA)
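The difference between the sequential and the OpenMP approach can be sketched on a simple per-pixel filter. The code below is an illustration only, not the authors' implementation: a 3x3 maximum filter stands in for the real image enhancement step, and `Pixel`, `filterRow`, `filterSerial` and `filterOpenMP` are hypothetical names. Rows are independent, so the outer loop can be split between CPU threads with a single `#pragma omp parallel for`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef unsigned char Pixel;

// 3x3 maximum filter for one image row; a stand-in for the real per-pixel work.
static void filterRow(const std::vector<Pixel>& in, std::vector<Pixel>& out,
                      int width, int height, int y) {
    for (int x = 0; x < width; ++x) {
        Pixel m = 0;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if (nx >= 0 && ny >= 0 && nx < width && ny < height)
                    m = std::max(m, in[ny * width + nx]);
            }
        out[y * width + x] = m;
    }
}

// Sequential version: rows are processed one after another.
std::vector<Pixel> filterSerial(const std::vector<Pixel>& in, int width, int height) {
    assert(static_cast<int>(in.size()) == width * height);
    std::vector<Pixel> out(in.size());
    for (int y = 0; y < height; ++y)
        filterRow(in, out, width, height, y);
    return out;
}

// OpenMP version: rows are distributed between threads. Each thread writes
// only its own rows of `out`, so no synchronization is needed.
std::vector<Pixel> filterOpenMP(const std::vector<Pixel>& in, int width, int height) {
    assert(static_cast<int>(in.size()) == width * height);
    std::vector<Pixel> out(in.size());
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
        filterRow(in, out, width, height, y);
    return out;
}
```

When the code is compiled without OpenMP support the pragma is ignored and both functions run sequentially; with OpenMP enabled (for example `-fopenmp` in GCC) the results stay identical and only the run time changes.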
Approaches comparison
Average processing time, ms

Image   Serial   OpenMP   CUDA   Speedup (CUDA vs. serial)
1       2,1      2,1      2,1    1
2       42,0     19,9     12,2   3,44
3       43,5     21,0     12,8   3,4
4       43,9     21,8     13,9   3,16
5       45,1     22,9     14,5   3,11
6       45,5     22,8     15,2   2,99
7       45,6     22,8     15,4   2,96
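The speedup values in this table are consistent with the ratio of the serial time to the CUDA time for the same image. A minimal sketch of that arithmetic follows; the function name `speedup` is hypothetical, and decimal commas from the table are written as points.

```cpp
#include <cassert>

// Speedup of a parallel implementation relative to the serial one:
// how many times shorter the processing time becomes.
double speedup(double serialMs, double parallelMs) {
    assert(serialMs > 0.0 && parallelMs > 0.0);
    return serialMs / parallelMs;
}
```

For image 2, `speedup(42.0, 12.2)` is about 3.44, matching the last column, while the same formula applied to the OpenMP column gives `speedup(42.0, 19.9)`, about 2.11. For image 1 all three times are equal, so the speedup is 1.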
Conclusion
• The average data transfer time to/from the GPU is 0,2 ms
• CUDA speeds up the calculations by a factor of about 3 (up to 3,44 in the experiment)
• The per-frame time of the parallel implementation is less than 20 ms, which makes it suitable for a real-time system
Thank you for your attention!