Finding Body Parts with Vector Processing
Cynthia Bruyns
Bryan Feldman
CS 252
Introduction
Take an existing algorithm for tracking human motion and speed it up by computing on the GPU.
Demonstrate that many vision algorithms are prime candidates for vector processing.
[Image: results after false candidates have been removed]
Demo
Vision Algorithms
Often computationally expensive: searching over many pixels for objects at many orientations and scales
• E.g. [(1024x768) pixels] x [3 colors] x [12 orientations] x [5 scales]
Very often highly parallelizable
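For a sense of scale, the search-space arithmetic from the example above (a sketch; these are filter-evaluation sites, not FLOPs):

```python
# Search space from the slide's example: every pixel of a 1024x768 image,
# in 3 color channels, at 12 orientations and 5 scales.
pixels = 1024 * 768
evaluations = pixels * 3 * 12 * 5   # filter evaluations per frame
print(evaluations)                  # over 141 million sites to test
```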
Limb Finding
Goal – find candidate limbs. Limbs look like long dark rectangles on light backgrounds, or long light rectangles on dark backgrounds
1. Convolution with filter
• Convolve using the FFT
• Response indicates how sharply pixels go from low to high intensity
• Convolve over all three color channels, so as not to miss, e.g., a red region against blue of the same intensity
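Step 1 can be sketched in NumPy (an illustrative reconstruction, not the authors' code): by the convolution theorem, convolving with the mask amounts to a pointwise product of spectra, which for large images is far cheaper than sliding the mask over every pixel.

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Circular 2D convolution via the convolution theorem:
    conv(image, kernel) = IFFT(FFT(image) * FFT(kernel))."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel            # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

# Tiny edge filter: responds where pixels go from low to high intensity
image = np.array([[0., 0., 1., 1.]] * 4)
edge = np.array([[1., -1.]])
response = fft_convolve2d(image, edge)
# strong response in column 2, where the dark-to-light transition sits
```

Note the convolution is circular (values wrap around the image border); a production version would pad the image to avoid wrap-around artifacts.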
Algorithm specifics
2. For every pixel location, take respconv from the “left” and “right” sides and put the combination into a new matrix, resplimb
Algorithm specifics
[Diagram: resplimb at each pixel x combines respconv sampled on one side of x with −respconv sampled on the other]
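A plausible reading of step 2 (the slide's diagram is garbled, so the half-width offsets and the minus sign here are assumptions): a limb of width w should show a strong positive edge response half a width to its left and a strong negative one half a width to its right.

```python
import numpy as np

def limb_response(respconv, w):
    # Assumed combination: resplimb(x) = respconv(x - w/2) - respconv(x + w/2),
    # i.e. the edge response from the limb's left boundary minus the
    # (negated) one from its right boundary.
    left = np.roll(respconv, w // 2, axis=1)      # respconv sampled w/2 to the left
    right = np.roll(respconv, -(w // 2), axis=1)  # respconv sampled w/2 to the right
    return left - right

respconv = np.zeros((1, 9))
respconv[0, 2] = 1.0    # dark-to-light edge (limb's left side)
respconv[0, 6] = -1.0   # light-to-dark edge (limb's right side)
resplimb = limb_response(respconv, 4)
# strongest response at x = 4, the limb's center
```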
Algorithm specifics
3. Find local maxima –
for every pixel, replace it with the max of its local neighbors; where resplimb = locMax, the pixel is a local maximum
resplimb:
.50 .25 .40 .23
.75 .41 .98 .75
.11 .43 .15 .23
.78 .34 .13 .15

locMax:
.75 .98 .98 .98
.75 .98 .98 .98
.78 .98 .98 .98
.78 .87 .23 .23
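Step 3 on the slide's example matrix can be reproduced with a 3x3 maximum filter (a sketch; border handling here replicates edge pixels, which the slide does not specify):

```python
import numpy as np

resplimb = np.array([[.50, .25, .40, .23],
                     [.75, .41, .98, .75],
                     [.11, .43, .15, .23],
                     [.78, .34, .13, .15]])

def local_max_map(a):
    # Replace each pixel with the max over its 3x3 neighborhood,
    # replicating border pixels at the edges.
    p = np.pad(a, 1, mode='edge')
    h, w = a.shape
    return np.max([p[i:i + h, j:j + w] for i in range(3) for j in range(3)],
                  axis=0)

loc_max = local_max_map(resplimb)
peaks = resplimb == loc_max   # True where a pixel equals its neighborhood max
# the .98 at (1, 2) and the .78 at (3, 0) survive as local maxima
```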
GPU
It’s a good choice because each operation is per pixel – SIMD-like
Data stored in texture buffers, which act as a local cache
Clean instruction set, and a developing interface language to exploit vector operations
Justify your gaming habits
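The “SIMD-like” point above: the same operation runs independently at every pixel. As a CPU-side illustration (not GPU code), a whole-array NumPy operation plays the role of a fragment program applied across all fragments at once:

```python
import numpy as np

# One thresholding "program" applied to every pixel simultaneously, rather
# than in a per-pixel loop - the same data-parallel shape the GPU exploits.
img = np.arange(12.0).reshape(3, 4)       # toy 3x4 "image", values 0..11
mask = np.where(img > 5.0, 1.0, 0.0)      # same op at every pixel
```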
GPU dataflow model
Hardware supports several data types for bandwidth optimization, e.g. 32-bit floating point, half precision, etc.
Data passed to main memory stages via binding
[Diagram: GPU dataflow – Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures feeding the Fragment Processor]
The fragment processor has high resource limits:
1024 instructions
512 constants or uniform parameters
• Each constant counts as one instruction
16 texture units
• Each can be reused as many times as desired
No branching
• But, can do a lot with condition codes
No indexed reads from registers
• Use texture reads instead
No memory writes
The algorithm: Draw invokes the fragment programs. The texture becomes a data structure – use two framebuffers to avoid RAW hazards
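The two-framebuffer trick above is classic ping-pong buffering; a minimal CPU-side sketch of the idea (names and the doubling “program” are illustrative):

```python
# Two buffers alternate roles each pass: one is the read-only source, the
# other the write-only destination, so a pass never reads a value it just
# wrote (no read-after-write hazard). After each pass the roles swap.
src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0] * 4
for _ in range(3):                   # three "fragment program" passes
    for i in range(len(src)):
        dst[i] = src[i] * 2.0        # stand-in for the per-pixel program
    src, dst = dst, src              # ping-pong swap
# src now holds the result: each input doubled three times
```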
[Diagram: for each orientation to search, the Image and the Mask each pass through an FFT fragment program, then a Convolution program, a Cylinder program, and a Find Max program]
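The per-orientation pipeline can be sketched as a host-side driver loop (an illustrative reconstruction: the real programs run on the GPU, only four right-angle rotations are tried here, and a simple left/right edge combination stands in for the Cylinder program):

```python
import numpy as np

def fft_conv(img, ker):
    # Circular convolution via the FFT (the Convolution program's role)
    padded = np.zeros_like(img, dtype=float)
    kh, kw = ker.shape
    padded[:kh, :kw] = ker
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)))

def find_limbs(image, mask, n_orientations=4):
    peaks = []
    for k in range(n_orientations):              # for each orientation to search
        rotated = np.rot90(mask, k)              # toy stand-in for arbitrary angles
        resp = fft_conv(image, rotated)          # Convolution program (via FFT)
        limb = resp - np.roll(resp, -4, axis=1)  # stand-in Cylinder program
        peaks.append(float(limb.max()))          # Find Max program
    return peaks

image = np.random.default_rng(0).random((16, 16))
scores = find_limbs(image, np.ones((2, 4)))      # one peak score per orientation
```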
Results
[Chart: time vs. image size (256, 512, 1024) for CPU orig, CPU FFT, and GPU; y-axis 0–700]
(CPU: 2.53 GHz P4; GPU: Nvidia FX5900)
Mask size fixed (22x13), varying image size
*Additional GPU optimizations possible
Results – log scale
[Chart: time (log scale) vs. image size (256, 512, 1024) for CPU orig, CPU FFT, and GPU; labeled points at 42.7 sec and 252.1 sec]
(CPU: 2.53 GHz P4; GPU: Nvidia FX5900)
Mask size fixed (22x13), varying image size
*Additional GPU optimizations possible
Results
[Chart: time vs. mask size (11x7, 22x13, 44x25) for CPU orig, CPU FFT, and GPU]
Image size fixed (512x512), varying mask size
Varying mask sizes allow for varying limb sizes on the same image
Comments
GPU and image processing are a good match
Time to move data from CPU memory to the GPU is cumbersome – but can be overcome
Installations and products are non-uniform, and exact specifications are often hearsay
Acknowledgements
Kenneth Moreland Deva Ramanan Okan Arikan