
Object Scene Flow for Autonomous Vehicles

Moritz Menze¹, Andreas Geiger²
¹Leibniz Universität Hannover, ²MPI for Intelligent Systems, Tübingen

Abstract

We propose a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. As outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane and an index to the corresponding object. This minimal representation increases robustness and is formalized in a discrete-continuous CRF. Our model intrinsically segments the scene into its constituent dynamic components. We demonstrate the performance of our model on existing benchmarks and a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using 3D CAD models for all vehicles in motion. Our experiments reveal novel challenges which cannot be handled by existing methods.

Results and Ground Truth. Estimated segmentation into moving objects (top), optical flow (center) and proposed optical flow ground truth (bottom).

Representation

• Following the idea of piecewise-rigid shape and motion [8], the 3D scene is approximated by planar superpixels from StereoSLIC: $\mathbf{s} = \{s_i \mid i \in \mathcal{S}\}$

• As opposed to existing works, we assume a finite number of rigidly moving objects in the scene: $\mathbf{o} = \{o_i \mid i \in \mathcal{O}\}$

• Superpixels are parameterized by a plane $\mathbf{n}$ and an object index $k$: $s_i = (\mathbf{n}_i, k_i)^T$

• Objects store their rigid motion parameters: $o_j = (\mathbf{R}_j, \mathbf{t}_j)^T$

• The superpixels inherit their motion parameters from the assigned object (see the data-structure sketch below)
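To make this parameterization concrete, the following is a minimal data-structure sketch; the names `RigidObject`, `Superpixel`, and `motion` are illustrative rather than the authors' code, and the plane convention mentioned in the comment is one common choice, not necessarily the paper's exact one.

```python
# Minimal sketch of the scene parameterization (illustrative, not the authors' code).
from dataclasses import dataclass
import numpy as np


@dataclass
class RigidObject:
    """One independently moving object o_j = (R_j, t_j)."""
    R: np.ndarray  # 3x3 rotation matrix
    t: np.ndarray  # 3-vector translation


@dataclass
class Superpixel:
    """One planar superpixel s_i = (n_i, k_i)."""
    n: np.ndarray  # 3-vector of plane parameters (e.g. n^T x = 1 for points x on the plane)
    k: int         # index of the rigid object this superpixel is assigned to

    def motion(self, objects):
        """Superpixels inherit (R, t) from their assigned object."""
        o = objects[self.k]
        return o.R, o.t
```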

Model

$$E(\mathbf{s}, \mathbf{o}) = \underbrace{\sum_{i \in \mathcal{S}} \varphi_i(s_i, \mathbf{o})}_{\text{data}} \;+\; \underbrace{\sum_{i \sim j} \psi_{ij}(s_i, s_j)}_{\text{smoothness}}$$
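As a rough illustration of how this energy could be evaluated, the sketch below sums per-superpixel data terms and pairwise smoothness terms over adjacent superpixels; `neighbor_pairs` and the two term callables are placeholders, and this is not the authors' implementation.

```python
def energy(superpixels, objects, neighbor_pairs, data_term, smoothness_term):
    """Sketch of E(s, o) = sum_i phi_i(s_i, o) + sum_{i~j} psi_ij(s_i, s_j).

    `neighbor_pairs` is assumed to list index pairs (i, j) of adjacent superpixels;
    the two term callables stand in for the potentials defined in the next sections.
    """
    data = sum(data_term(sp, objects) for sp in superpixels)
    smooth = sum(smoothness_term(superpixels[i], superpixels[j])
                 for i, j in neighbor_pairs)
    return data + smooth
```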

Data Term

The data term $\varphi$ consists of pairwise potentials which are evaluated for 3 pairs of images (see figure below):

$$\varphi_i(s_i, \mathbf{o}) = \sum_{j \in \mathcal{O}} [k_i = j] \cdot D_i(\mathbf{n}_i, o_j)$$

where the Iverson bracket restricts $\varphi$ to the selected object (sketched in code below).

To compute $D_i(\mathbf{n}_i, o_j)$ we leverage:
• Dense Census features
• Sparse optical flow from feature matches
• SGM disparity maps for both rectified pairs
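Because exactly one object index is selected per superpixel, the Iverson-bracket sum collapses to a single term. The sketch below mirrors that behavior; the three cost callables are hypothetical stand-ins for the Census, sparse-flow, and SGM-disparity cues and do not reproduce the paper's exact cost functions.

```python
def data_term(sp, objects, census_cost, flow_cost, disparity_cost):
    """Sketch of phi_i(s_i, o): the bracket [k_i = j] picks exactly one object,
    so only the motion of the assigned object enters D_i(n_i, o_j)."""
    o = objects[sp.k]  # [k_i = j] selects the assigned object
    # D_i is assumed here to be a sum of the three matching cues named above.
    return (census_cost(sp.n, o.R, o.t)
            + flow_cost(sp.n, o.R, o.t)
            + disparity_cost(sp.n, o.R, o.t))
```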

Smoothness Term

Our smoothness potential $\psi_{ij}(s_i, s_j)$ decomposes as:

$$\psi_{ij}(s_i, s_j) = \theta_3\,\psi^{\mathrm{depth}}_{ij}(\mathbf{n}_i, \mathbf{n}_j) + \theta_4\,\psi^{\mathrm{orient}}_{ij}(\mathbf{n}_i, \mathbf{n}_j) + \theta_5\,\psi^{\mathrm{motion}}_{ij}(s_i, s_j)$$

with weights $\theta$ controlling the individual terms:
• Regularization of depth is achieved by penalizing differences in disparity at shared boundary pixels ($\psi^{\mathrm{depth}}_{ij}$)
• We encourage the orientation of neighboring planes to be similar by evaluating the difference of plane normals $\mathbf{n}$ ($\psi^{\mathrm{orient}}_{ij}$)
• Coherence of the assigned object indices is enforced by an orientation-sensitive Potts model ($\psi^{\mathrm{motion}}_{ij}$); a code sketch of this decomposition follows below
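A sketch of this weighted decomposition, assuming simple squared-difference penalties and a plain Potts term where the paper uses robust, orientation-sensitive potentials; the `plane_disparity` helper and parameter names are assumptions.

```python
import numpy as np


def smoothness_term(sp_i, sp_j, boundary_pixels, plane_disparity,
                    theta3=1.0, theta4=1.0, theta5=1.0):
    """Sketch of psi_ij = theta3*psi_depth + theta4*psi_orient + theta5*psi_motion."""
    # Depth: disparity differences of the two planes at shared boundary pixels.
    psi_depth = sum((plane_disparity(sp_i.n, p) - plane_disparity(sp_j.n, p)) ** 2
                    for p in boundary_pixels)
    # Orientation: difference of the (normalized) plane normals.
    ni = sp_i.n / np.linalg.norm(sp_i.n)
    nj = sp_j.n / np.linalg.norm(sp_j.n)
    psi_orient = float(np.sum((ni - nj) ** 2))
    # Motion: Potts penalty on differing object indices (the paper additionally
    # makes this term orientation-sensitive).
    psi_motion = 0.0 if sp_i.k == sp_j.k else 1.0
    return theta3 * psi_depth + theta4 * psi_orient + theta5 * psi_motion
```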

Inference

• We use max-product particle belief propagation (MP-PBP) to jointly infer shape and motion parameters, with TRW-S for the inner loop

• Particles are drawn from normal distributions around the current MAP solution and from neighboring superpixels (see the proposal sketch below)
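The proposal step can be pictured as follows: keep the current MAP plane estimate, perturb it with Gaussian noise, and copy hypotheses from neighboring superpixels. The function and argument names are illustrative, and the sketch omits the motion-parameter particles and the TRW-S inner loop.

```python
import numpy as np


def propose_plane_particles(sp, superpixels, neighbor_ids,
                            sigma=0.05, n_samples=10, rng=None):
    """Sketch of MP-PBP particle proposals for one superpixel's plane parameters."""
    rng = rng if rng is not None else np.random.default_rng()
    particles = [sp.n.copy()]                                     # current MAP estimate
    for _ in range(n_samples):
        particles.append(sp.n + rng.normal(0.0, sigma, size=3))  # local Gaussian perturbation
    for j in neighbor_ids:
        particles.append(superpixels[j].n.copy())                 # proposals from neighbors
    return particles
```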

Dataset

Figure panels: 3D points, annotation, and SGM disparity map, shown before and after optimization.

• We propose a novel realistic scene flow dataset which includes dynamic objects
• We annotated 200 training and 200 test scenes based on KITTI raw data
• The static background of the scenes is recovered from laser scanner data by removing all dynamic objects and compensating for the vehicle's ego-motion (see the accumulation sketch below)

• Dynamic objects are captured by fitting detailed CAD models to
  – accumulated 3D laser point clouds in each frame
  – manually annotated 2D control points
  – SGM disparity maps
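A heavily simplified sketch of the background accumulation step described above, under assumed data layouts (per-scan point arrays, 4×4 ego-poses, and boolean masks for points on dynamic objects); it is not the actual annotation pipeline.

```python
import numpy as np


def accumulate_static_background(scans, poses, dynamic_masks):
    """Sketch: merge laser scans into one static map by removing dynamic points
    and compensating the vehicle's ego-motion.

    Assumed layout: scans[i] is an N x 3 point array, poses[i] a 4 x 4 matrix
    mapping scan i into a common reference frame, dynamic_masks[i] a length-N
    boolean array marking points that belong to moving objects.
    """
    static_points = []
    for pts, T, dyn in zip(scans, poses, dynamic_masks):
        pts = pts[~dyn]                                   # drop dynamic objects
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
        static_points.append((pts_h @ T.T)[:, :3])        # compensate ego-motion
    return np.vstack(static_points)
```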

Ground Truth. Disparity at $t_0$ (left) and optical flow (right).

Qualitative Results

Error color legend (bins): 0.00–0.19, 0.19–0.38, 0.38–0.75, 0.75–1.50, 1.50–3.00, 3.00–6.00, 6.00–12.00, 12.00–24.00, 24.00–48.00, 48.00–Inf.

Qualitative Results. Each subfigure shows, from top to bottom: disparity and optical flow ground truth in the reference view, the disparity map (D1) and optical flow map (Fl) estimated by our algorithm, and the respective error images using the color scheme depicted in the legend. The four scenes below the horizontal line are failure cases.

Results on the “Sphere” sequence

Figure panels (top-left to bottom-right): Input | Disparity 1 | Error D1; Segmentation | Disparity 2 | Error D2; Superpixels | Optical Flow | Error Fl.

                   [7]    [5]    [9]    [8]    Ours
RMSE 2D Flow      0.63   0.69   0.77   0.63   0.55
RMSE Disparity    3.8    3.8    10.9   2.84   2.58
RMSE Scene Flow   1.76   2.51   2.55   1.73   0.75

Illustration of our results for the synthetic “Sphere” sequence [5]. Top-left to bottom-right: the left input image of the first frame, our first disparity/error map, the obtained segmentation into different rigid body motions, the second disparity/error map, the superpixels we use, and the recovered optical flow/error map.

Quantitative Results

                            bg      fg     bg+fg
Huguet [5]                 67.69   64.03   67.08
GCSF [2]                   52.92   59.11   53.95
SGM [3] + LDOF [1]         43.99   44.77   44.12
SGM [3] + Sun [6]          38.21   53.03   40.68
SGM [3] + SphereFlow [4]   23.09   37.11   25.42
PRSF [8]                   13.49   33.71   16.85
Our full model              7.01   28.76   10.63

Quantitative Results on the Proposed Scene Flow Dataset. This table shows scene flow errors in %, averaged over all 200 test images. We provide separate numbers for the background region (bg), all foreground objects (fg), and all pixels in the image (bg+fg). Outliers are defined as those values exceeding the ground truth by at least 3 px and 5%.
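To spell out the outlier criterion from the caption, the sketch below flags a pixel as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude and averages over a region mask; the array layout and function name are assumptions, not the official evaluation code.

```python
import numpy as np


def outlier_rate(est, gt, mask, tau_px=3.0, tau_rel=0.05):
    """Sketch of the outlier metric: error > 3 px AND error > 5% of ground truth.

    `est` and `gt` are assumed to be H x W x C arrays of per-pixel values
    (e.g. disparity and flow components), `mask` an H x W boolean selector
    for the evaluated region (bg, fg, or bg+fg). Returns a percentage.
    """
    err = np.linalg.norm(est - gt, axis=-1)   # per-pixel end-point error
    mag = np.linalg.norm(gt, axis=-1)         # ground-truth magnitude
    outliers = (err > tau_px) & (err > tau_rel * mag)
    return 100.0 * float(outliers[mask].mean())
```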

Plots: scene flow error (%) on the y-axis versus the number of objects (1–5, left) and the number of iterations (0–50, right); curves for SF bg+fg occ, SF fg occ, and SF bg occ.

Parameter Variation. This figure shows the scene flow errors of our method on the proposed dataset with respect to the number of object proposals and MP-PBP iterations.

References

[1] T. Brox and J. Malik. Large displacement optical flow: Descriptor matching in variational motion estimation. PAMI, 33:500–513, March 2011.

[2] Jan Cech, Jordi Sanchez-Riera, and Radu P. Horaud. Scene flow estimation by growing correspondence seeds. In CVPR, 2011.

[3] Heiko Hirschmüller. Stereo processing by semiglobal matching and mutual information. PAMI, 30(2):328–341, 2008.

[4] Michael Hornacek, Andrew Fitzgibbon, and Carsten Rother. SphereFlow: 6 DoF scene flow from RGB-D pairs. In CVPR, 2014.

[5] Frédéric Huguet and Frédéric Devernay. A variational method for scene flow estimation from stereo sequences. In ICCV, 2007.

[6] Deqing Sun, Stefan Roth, and Michael J. Black. A quantitative analysis of current practices in optical flow estimation and the principles behind them. IJCV, 106(2):115–137, 2013.

[7] Levi Valgaerts, Andres Bruhn, Henning Zimmer, Joachim Weickert, Carsten Stoll, and Christian Theobalt. Joint estimation of motion, structure and geometry from stereo sequences. In ECCV, 2010.

[8] C. Vogel, K. Schindler, and S. Roth. Piecewise rigid scene flow. In ICCV, 2013.

[9] Andreas Wedel, Clemens Rabe, Tobi Vaudrey, Thomas Brox, Uwe Franke, and Daniel Cremers. Efficient dense scene flow from sparse or dense stereo data. In ECCV, 2008.

Contact Information
Web: http://www.cvlibs.net/projects/objectsceneflow
Email: [email protected], [email protected]