17
1 RGBD Occlusion Detection via Deep Convolutional Neural Networks Soumik Sarkar 1,2 , Vivek Venugopalan 1 , Kishore Reddy 1 , Michael Giering 1 , Julian Ryde 3 , Navdeep Jaitly 4,5 1 United Technologies Research Center, East Hartford, CT 2 Currently with Iowa State University, Ames, IA 3 United Technologies Research Center, Berkeley, CA 4 University of Toronto, ON, Canada 5 Currently with Google Inc., Mountain View, CA Subject to the EAR, ECCN: EAR99. This information is subject to the export control laws of the United States, specifically including the Export Administration Regulations (EAR), 15 C.F.R. Part 730 et seq. Transfer, retransfer or disclosure of this data by any means to a non-US person (individual or company), whether in the U.S. or abroad, without any required export license or other approval from the U.S. Govt. is prohibited. UTC Proprietary - This material contains proprietary information of United Technologies Corporation. Any copying, distribution, or dissemination of the contents of this material is strictly prohibited and may be unlawful without the express written permission of UTC. If you have obtained this material in error, please notify UTRC Counsel at (860) 610-7948 immediately.

RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

1

RGBD Occlusion Detection via Deep

Convolutional Neural Networks Soumik Sarkar1,2, Vivek Venugopalan1, Kishore Reddy1,

Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5

1United Technologies Research Center, East Hartford, CT

2Currently with Iowa State University, Ames, IA 3United Technologies Research Center, Berkeley, CA

4University of Toronto, ON, Canada 5Currently with Google Inc., Mountain View, CA

Subject to the EAR, ECCN: EAR99. This information is subject to the export control laws of the United States,

specifically including the Export Administration Regulations (EAR), 15 C.F.R. Part 730 et seq. Transfer, retransfer or

disclosure of this data by any means to a non-US person (individual or company), whether in the U.S. or abroad, without

any required export license or other approval from the U.S. Govt. is prohibited.

UTC Proprietary - This material contains proprietary information of United Technologies Corporation. Any copying, distribution, or

dissemination of the contents of this material is strictly prohibited and may be unlawful without the express written permission of UTC.

If you have obtained this material in error, please notify UTRC Counsel at (860) 610-7948 immediately.

Page 2: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

United Technologies

2

Otis Sikorsky UTC Climate, Controls & Security

UTC Aerospace Systems Pratt & Whitney

UTC Propulsion & Aerospace Systems

This page contains no technical data and is not subject to the EAR or the ITAR.

Page 3: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Occlusion detection

Occlusion edges help image feature selection, once occlusion

boundaries are established – the depth of the region can be determined

This is very useful in Simultaneous localization and mapping (SLAM)

problems in robotics applications for indoor environments, object

recognition, grasping, obstacle avoidance in UAV applications, etc.

3

A voxel map and the corresponding geometric edges for a hallway

This page contains no technical data and is not subject to the EAR or the ITAR.

Page 4: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Range edge

Occlusion detection

Occlusion edges depend on the gradient of the depth image which is

very sensitive to noise in the depth map

The depth map derived from a single image is very noisy and has large

errors.

In our work, we are estimating the occlusion edges directly rather than

estimating depth first and then calculating occlusion edges. Secondly

there are additional cues other than depth which contribute to

establishing occlusion edges that our technique is taking advantage of.

4

Image edge Original image

This page contains no technical data and is not subject to the EAR or the ITAR.

Page 5: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Deep Neural Nets and Convolutional Neural Nets

Convolutional filters to generate feature

maps from data

Subsampling or pooling for dimension

reduction and higher order feature

generation

5

Input to the Deep Neural network

Target layer

Output predicted feature vectors

shared weights

maxmax

max

max

Convolutions - sliding filter max pooling

subsampling

hm+ 1

hm

W hidden layers

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 6: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Occlusion detection from Freiburg dataset

Use readily available dataset for demonstrating occlusion edge

detection from Computer Vision Group at Technische Universität

München (TUM)

Partition the trajectory into training and test datasets for the neural nets

6

Reference: http://vision.in.tum.de/data/datasets/rgbd-dataset/download

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 7: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Problem setup

7

Input: RGB+D information from

consecutive video frames (640x480)

captured by mobile sensor

Deep Convolutional Neural Network

Output: Occlusion edges

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 8: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

8

Training process

No occlusion

patch

Occlusion

patch

Center-pixel

based labels Partition into 32x32 patch examples

Training and Testing processes

Network Training

Testing process Trained Network

32x32 patches generated

with fixed stride

Prediction of patch

(Center-pixel) label

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 9: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

9

Post-processing for Occlusion edge reconstruction

Testing and post-processing Trained Network

32x32 patches generated

with fixed stride

Prediction of patch

(Center-pixel) label

Prediction

confidence from

softmax posterior

Prediction confidence converted

to patch-wide label using a

Gaussian kernel (with Full Width

at Half Maximum - FWHM)

Gaussian labels are fused in

a mixture model to generate

smooth occlusion edges

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 10: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Experimental setup

Nvidia Tesla K40 GPU with 2880 cores and 12 GB device RAM

Initial pre-processing for dividing dataset into training and test and

extracting small images (32x32) from large frames (480x640)

Image size fixed at 32x32 with number of channels depending on the

experiment

4 channels for RGBD

3 channels for RGB

6 channels for RGBD + optical flow (UV)

Ground truth consists of labelled edges by using only the depth sensor

10 UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 11: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Optical flow pre-processing

11 UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 12: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Results

12 UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Data Channels Patch

stride

Training

dataset

Testing

dataset

Test error

(averaged over

80-100 epochs)

Computation

time/epoch

RGBD (1 frame) 4 4 56354 500000 15.35 1m 21s

4 8 14278 316167 18.76 2m 17s

RGB (1 frame) 3 4 56354 500000 16.43 1m 2s

3 8 14278 316167 18.72 1m 42s

RGBDUV 6 4 56354 500000 15.18 1m 22s

Page 13: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Post-processing Results

Input: RGBD image (32x32x4), stride 8

13

Input: RGBD image (32x32x4), stride 4 Performance improves with

higher granularity of fusion

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 14: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Post-processing Results

Input: RGB image (32x32x3), stride 8

14

Input: RGB image (32x32x3), stride 4 Performance improves with

higher granularity of fusion

Overall detection confidence

deteriorates without D channel

UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 15: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

RGBD and optical flow (RGBDUV) Results

15 UTC PROPRIETARY - Export Controlled - ECCN: EAR99

Page 16: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Conclusion

Deep CNN can extract significant occlusion edge features from only

RGB channels (i.e., without the depth sensor information). Occlusion

detection accuracy increases when we introduce optical flow.

Deep Convolutional Neural Nets (Deep CNN) for multi-modal fusion

applied to occlusion detection

The trade-off between high resolution patch analysis and frame-level

computation time is critical for real-time robotics applications

Currently investigating multiple time-frames of RGB input in order to

extract structure from motion

16 This page contains no technical data and is not subject to the EAR or the ITAR.

Page 17: RGBD Occlusion Detection via Deep Convolutional Neural ... · Soumik Sarkar1,2, Vivek Venugopalan 1, Kishore Reddy , Michael Giering1, Julian Ryde3, Navdeep Jaitly4,5 1United Technologies

Questions

17