
Toward Object Discovery and Modeling via 3-D Scene Comparison

Evan Herbst, Peter Henry, Xiaofeng Ren, Dieter Fox
University of Washington; Intel Research Seattle


Overview
• Goal: learn about an environment by tracking changes in it over time
• Detect objects that occur in different places at different times
• Handle textureless objects
• Avoid appearance/shape priors
• Represent a map with static + dynamic parts

Algorithm Outline
• Input: two RGB-D videos
• Mapping & reconstruction of each video
• Interscene alignment
• Change detection
• Spatial regularization
• Outputs: reconstructed static background; segmented movable objects

Scene Reconstruction
• Mapping based on RGB-D Mapping [Henry et al., ISER ’10]
• Visual odometry, loop-closure detection, pose-graph optimization, bundle adjustment
• Surface representation: surfels

Scene Differencing
• Given two scenes, find parts that differ
• Surfaces in two scenes similar iff object doesn’t move
• Comparison at each surface point

Scene Differencing
• Given two scenes, find parts that differ
• Comparison at each surface point
• Start by globally aligning scenes (see the alignment sketch below)
[Figure: global scene alignment, shown in 2-D and in 3-D]
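To illustrate the global-alignment step, here is a minimal point-to-point ICP sketch in Python. The slide does not detail the paper’s actual alignment procedure, so this is a generic illustration; all parameter values are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping points P onto Q (Kabsch)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def icp(source, target, iters=30):
    """Point-to-point ICP: alternate closest-point matching and rigid re-fitting.
    source, target: (N, 3) arrays of points. Returns the aligned copy of source."""
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iters):
        _, idx = tree.query(src)                # nearest neighbor in target
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
    return src
```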

Naïve Scene Differencing
• Easy algorithm: closest point within δ → same (sketched below)
• Ignores color, surface orientation
• Ignores occlusions
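A minimal sketch of this naïve baseline, assuming both scenes are given as N×3 point arrays; the threshold value is an illustrative assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def naive_diff(scene1, scene2, delta=0.02):
    """Label each point of scene1 'same' if scene2 has any point within delta.
    Ignores color, surface orientation, and occlusion, as the slide notes."""
    dist, _ = cKDTree(scene2).query(scene1)
    return dist <= delta   # True = same, False = moved
```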

Scene Differencing
• Model probability that a surface point moved
• Sensor readings z
• Expected measurement z*
• m ∈ {0, 1}
[Figure: expected measurement z* compared against readings z0, z1, z2, z3 from frames 0, 10, 25, and 49]

Sensor Models
• Model probability that a surface point moved
• Sensor readings z; expected measurement z*
• By Bayes (reconstructed below)
• Two sensor measurement models
  • With no expected surface
  • With expected surface
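The Bayes step on the original slide was rendered as an image; a plausible reconstruction, assuming a prior p(m) on whether the point moved:

$$p(m \mid z, z^*) \;=\; \frac{p(z \mid m, z^*)\, p(m)}{\sum_{m' \in \{0,1\}} p(z \mid m', z^*)\, p(m')}$$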

Sensor Models
• Two sensor measurement models
• With expected surface
  • Depth: uniform + exponential + Gaussian¹ (sketched below)
  • Color: uniform + Gaussian
  • Orientation: uniform + Gaussian
• With no expected surface
  • Depth: uniform + exponential
  • Color: uniform
  • Orientation: uniform

¹ Thrun et al., Probabilistic Robotics, 2005
[Figure: depth measurement model, centered on expected depth z_d*]
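A minimal sketch of the “uniform + exponential + Gaussian” depth model, in the spirit of the beam model of Thrun et al.; the mixture weights and parameters below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def depth_likelihood(z, z_star, z_max=10.0, sigma=0.03, lam=0.5,
                     w_hit=0.7, w_short=0.2, w_rand=0.1):
    """Mixture likelihood p(z | expected surface at depth z_star).
    Truncation/renormalization over [0, z_max] omitted for brevity."""
    # Gaussian "hit" component around the expected depth
    p_hit = np.exp(-0.5 * ((z - z_star) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # Exponential "short" component for occluders in front of the surface
    p_short = np.where(z <= z_star, lam * np.exp(-lam * z), 0.0)
    # Uniform "random" component over the sensor range
    p_rand = 1.0 / z_max
    return w_hit * p_hit + w_short * p_short + w_rand * p_rand
```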

Example Result
[Figure: change-detection results for Scene 1 and Scene 2]

Spatial Regularization
• Points treated independently so far
• MRF to label each surfel moved or not moved
• Data term given by pointwise evidence
• Smoothness term: Potts, weighted by curvature (energy sketched below)
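One standard way to write such an MRF energy over surfel labels m_i; the curvature-dependent weight w_ij here is an assumption about its form, not the paper’s exact expression:

$$E(\mathbf{m}) \;=\; \sum_i -\log p(m_i \mid z_i, z_i^*) \;+\; \sum_{(i,j) \in \mathcal{N}} w_{ij}\,\mathbf{1}[m_i \neq m_j]$$

with w_ij decreasing as local curvature increases, so label boundaries prefer high-curvature regions.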

Spatial Regularization
• Points treated independently so far
• MRF to label each surfel moved or not moved
[Figure: Scene 1 and Scene 2, pointwise vs. regularized labelings]

Experiments
• Trained MRF on four scenes (1.4M surfels)
• Tested on twelve scene pairs (8.0M surfels)
• 70% error reduction wrt max-class baseline

                 Baseline         Ours
                 Count    %       Count    %
Total surfels    8.0M     100     8.0M     100
Moved surfels    250k     3       250k     3
Errors           250k     3       55.5k    0.7
False pos        0        0       4.5k     0.06
False neg        250k     3       51.0k    0.64

Experiments
• Results: complex scene

Experiments
• Results: large object

Conclusion
• Segment movable objects in 3-D using scene changes over time
• Represent a map as static + dynamic parts
• Extensible sensor model for RGB-D sensors
• Next steps
  • All scenes in one optimization
  • Model completion from many scenes
  • Train more supervised object segmentation

Using More Than 2 Scenes
• Given our framework, pretty easy to combine evidence from multiple scenes (combination rule sketched below)
• w_scene could be chosen to weight all scenes (rather than frames) equally, or upweight those taken under good lighting
• Other ways to subsample frames: as in keyframe selection in mapping
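The combination equation on the original slide was an image. A plausible form, assuming per-scene evidence is combined as a weighted product of likelihood ratios with per-scene weights w_scene:

$$\frac{p(m=1 \mid \{z_s\})}{p(m=0 \mid \{z_s\})} \;\propto\; \prod_{s} \left( \frac{p(z_s \mid m=1, z_s^*)}{p(z_s \mid m=0, z_s^*)} \right)^{w_{\text{scene}}(s)}$$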

First Sensor Model: Surface Didn’t Move
• Modeling sensor measurements:
  • Depth: uniform + exponential + Gaussian*
  • Color, normal: uniform + Gaussian; mixing controlled by probability that beam hit expected surface

* Fox et al., “Markov Localization…”, JAIR ’99
[Figure: depth measurement model, centered on expected depth z_d*]

Experiments
• Trained MRF on four scenes (2.7M surfels)
• Tested on twelve scene pairs (8.0M surfels)
• 250k moved surfels; we get 4.5k FP, 51k FN
• 65% error reduction wrt max-class baseline
• Extract foreground segments as “objects”

Overview
• Many visits to same area over time
• Find objects by motion

(extra) Related Work
• Probabilistic sensor models
  • Depth only
  • Depth & color, extra independence assumptions
• Static + dynamic maps
  • In 2-D
  • Usually not modeling objects


Depth-Dependent Color/Normal Model
• Modeling sensor measurements:
• Combine depth/color/normal (combined form sketched below)
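The combined-likelihood equation on the original slide was an image. A plausible factorization consistent with the surrounding slides, conditioning the color and normal terms on the depth reading (this form is an assumption):

$$p(z \mid m, z^*) \;=\; p(z_d \mid m, z_d^*)\; p(z_c \mid z_d, m, z_c^*)\; p(z_n \mid z_d, m, z_n^*)$$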
