
Efficient Multi-Frequency Phase Unwrapping using Kernel Density Estimation

Felix Järemo Lawin, Per-Erik Forssén, Hannes Ovrén
{felix.jaremo-lawin, per-erik.forssen, hannes.ovren}@liu.se

Introduction

We introduce an efficient method to unwrap multi-frequency phase estimates for time-of-flight ranging. The algorithm generates multiple depth hypotheses and uses a spatial kernel density estimate (KDE) to rank them. We apply the method to the Kinect v2.

• The Kinect v2 is designed for scenes with less than 8m range, but with our method the effective range can be extended.

• When extending the depth range to the maximal value of 18.75m, we get about 52% more valid measurements than the existing drivers, libfreenect2 and the Microsoft Kinect SDK.

• Runs at ≈ 190Hz on an Nvidia GeForce GTX 760 GPU. Code is available at: http://www.cvl.isy.liu.se/research/datasets/kinect2-dataset/

Method

In amplitude-modulated time-of-flight ranging, the depth is obtained by measuring the phase shift φ_m ∈ [0, 2π) of the modulated signal, given the frequencies f_m.
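As a worked example (using the standard amplitude-modulated ToF relations, which the poster assumes rather than states), the depth implied by an unwrapped phase, and the wrap period of each frequency, are

$$t = \frac{c\,(\varphi_m + 2\pi n_m)}{4\pi f_m}, \qquad \text{wrap period} = \frac{c}{2 f_m},$$

giving wrap periods of 1.875m, 9.375m, and 1.25m for 80MHz, 16MHz, and 120MHz. The common wrap-around is governed by gcd(80, 16, 120)MHz = 8MHz, i.e. c/(2 · 8MHz) = 18.75m, matching the dashed line in Figure 1.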

[Figure 1 plot: wrapped phase (rad) versus distance (m) over 0–25m, one panel each for the 80MHz, 16MHz, and 120MHz modulation frequencies.]

Figure 1: Left: the Kinect v2 sensor. Right: wrapped phases for the Kinect v2, in the range 0 to 25 meters. Top to bottom: φ_0, φ_1, φ_2. The dashed line at 18.75 meters indicates the common wrap-around point for all three phases. Just before this line we have n_0 = 9, n_1 = 1, and n_2 = 14.
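The curves in Figure 1 can be reproduced with a minimal matplotlib sketch, assuming the standard wrapped-phase relation φ_m = (4π f_m d / c) mod 2π:

```python
import numpy as np
import matplotlib.pyplot as plt

C = 3e8                                  # speed of light (m/s)
FREQS = [80e6, 16e6, 120e6]              # Kinect v2 modulation frequencies

d = np.linspace(0, 25, 2000)             # distance axis (m), as in Figure 1
fig, axes = plt.subplots(3, 1, sharex=True)
for ax, f in zip(axes, FREQS):
    phase = (4 * np.pi * f * d / C) % (2 * np.pi)   # wrapped phase in [0, 2*pi)
    ax.plot(d, phase)
    ax.axvline(18.75, linestyle="--")    # common wrap-around point
    ax.set_ylabel("Phase (rad)")
    ax.set_title(f"{f/1e6:.0f}MHz")
axes[-1].set_xlabel("Distance (m)")
plt.show()
```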

• The corresponding unwrapped phase measurements are φ_m + 2πn_m.

• Phase unwrapping means finding the unwrapping coefficients n = (n_0, ..., n_{M−1}).

• Each vector n corresponds to a hypothesis of the depth t_i.

• Our method considers several hypotheses for each pixel location and selects the one with the highest kernel density value in a spatial neighbourhood N(x):

$$p(t_i(\mathbf{x})) = \frac{\sum_{j \in \mathcal{I},\, k \in N(\mathbf{x})} w_{jk}\, K\big(t_i(\mathbf{x}) - t_j(\mathbf{x}_k)\big)}{\sum_{j \in \mathcal{I},\, k \in N(\mathbf{x})} w_{jk}},$$

$$w_{ik} = g(\mathbf{x} - \mathbf{x}_k, \sigma)\, p(t_i \mid n_i)\, p(t_i \mid a_i), \qquad K(x) = e^{-x^2 / 2h^2}.$$

The three factors in w_ik are:

• the spatial weight g(x − x_k, σ).

• the unwrapping likelihood p(t_i(x) | n_i(x)).

• the phase likelihood p(t_i(x) | a_i(x)), where a_i = (a_0, ..., a_{M−1}) are the amplitudes.

The final hypothesis selection is then made as:

$$i^* = \underset{i \in \mathcal{I}}{\arg\max}\; p(t_i)\,.$$

• Inliers are classified by p(t_i) > T.
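To make the pipeline concrete, here is a minimal NumPy sketch of hypothesis generation and KDE ranking. It is a sketch under stated assumptions, not the authors' implementation: the likelihoods p(t_i|n_i) and p(t_i|a_i) are replaced by a single consistency proxy (agreement of the per-frequency depths), the neighbourhood N(x) is a square window of radius r, and the names hypothesis_depths and kde_select are hypothetical.

```python
import numpy as np

C = 3e8                                       # speed of light (m/s)
FREQS = np.array([80e6, 16e6, 120e6])         # Kinect v2 modulation frequencies
WRAPS = np.array([10, 2, 15])                 # wraps of each phase within 18.75m

def hypothesis_depths(phi, top=2):
    """Enumerate unwrapping vectors n and score the depths they imply.

    phi: wrapped phases, shape (3,), each in [0, 2*pi).
    Returns the `top` best depth hypotheses t_i and a crude stand-in
    for p(t_i|n_i)p(t_i|a_i): agreement of the per-frequency depths.
    """
    grids = np.meshgrid(*[np.arange(k) for k in WRAPS], indexing="ij")
    n = np.stack([g.ravel() for g in grids], -1)              # all n vectors
    t_mf = C * (phi + 2 * np.pi * n) / (4 * np.pi * FREQS)    # per-frequency depths
    t = t_mf.mean(1)                                          # combined hypothesis
    score = np.exp(-t_mf.std(1) ** 2 / (2 * 0.05 ** 2))       # consistency proxy
    best = np.argsort(-score)[:top]
    return t[best], score[best]

def kde_select(T, W, r=2, h=0.1, sigma=2.0, thresh=0.5):
    """Rank hypotheses by the spatial KDE and keep the best one per pixel.

    T, W: (H, W_img, I) arrays of depth hypotheses and weights w_ik.
    Returns a depth image with 0 where p(t_i*) <= thresh (outlier rejection).
    """
    H, W_img, I = T.shape
    out = np.zeros((H, W_img))
    for y in range(H):
        for x in range(W_img):
            y0, y1 = max(0, y - r), min(H, y + r + 1)
            x0, x1 = max(0, x - r), min(W_img, x + r + 1)
            tn = T[y0:y1, x0:x1].ravel()                      # neighbour hypotheses
            yy, xx = np.mgrid[y0:y1, x0:x1]
            g = np.exp(-((yy - y)**2 + (xx - x)**2) / (2 * sigma**2))
            wn = (W[y0:y1, x0:x1] * g[..., None]).ravel()     # w_jk incl. spatial term
            dens = [(wn * np.exp(-(ti - tn)**2 / (2 * h**2))).sum() / wn.sum()
                    for ti in T[y, x]]                        # p(t_i(x)) per hypothesis
            i_star = int(np.argmax(dens))
            out[y, x] = T[y, x, i_star] if dens[i_star] > thresh else 0.0
    return out
```

Stacking the per-pixel outputs of hypothesis_depths into T and W and calling kde_select then mimics the selection rule i* = argmax p(t_i) with the inlier threshold T.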

Results

We apply the method to depth decoding for the Kinect v2 sensor, and compare it to the Microsoft Kinect SDK and to the open source driver libfreenect2.

• Ground truth is constructed by fusing many depth frames from 9 different camera poses into one very accurate depth image.

• Raw measurements from the ground truth pose are decoded into depth images using Microsoft, libfreenect2, and our method.

• A point is counted as an inlier when a method outputs a depth estimate that corresponds to a correct unwrapping, which we define as being within 30cm of the ground truth (a minimal scoring sketch follows this list).

• We evaluate on the kitchen dataset, with a maximum depth of 6.7m, and the lecture dataset, with a maximum depth of 14.6m.
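A minimal sketch of this inlier test follows. The normalization by the number of valid ground-truth pixels and the convention that rejected pixels carry depth 0 are our assumptions, not stated on the poster.

```python
import numpy as np

def inlier_outlier_rates(depth, gt, gt_valid, tol=0.30):
    """Score a decoded depth image against the fused ground truth.

    depth:    decoded depth (m); 0 where the decoder rejected the pixel.
    gt:       ground-truth depth (m).
    gt_valid: boolean mask of pixels that have ground truth.
    tol:      inlier tolerance, 30cm as on the poster.
    """
    returned = (depth > 0) & gt_valid             # pixels the method kept
    inlier = returned & (np.abs(depth - gt) < tol)
    n = gt_valid.sum()
    return inlier.sum() / n, (returned & ~inlier).sum() / n
```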

[Figure 2 plots: inlier rate (0 to 1) versus outlier rate (log scale, 10^-7 to 10^0), one panel each for the kitchen, kitchen (depth limited), and lecture datasets. Legends: proposed method with r=5,I=2; r=1,I=2; r=5,I=3; I=1; plus libfreenect2 and Microsoft. In the depth-limited panel the KDE variants and libfreenect2 are run with d<8. Ground truth depth images for kitchen and lecture are shown alongside.]

Figure 2: Inlier and outlier rate plots and corresponding ground truth depth images. Each point or curve is the average over 25 frames. In the kitchen (depth limited) curve the algorithms assume a maximum depth of 8m, which simplifies the outlier rejection.

• A clear improvement can be observed in the depth images, especially in scenes with large depths.

Figure 3: Single frame output. Left: libfreenect2. Center: proposed method. Right: corresponding RGB image. Pixels suppressed by outlier rejection are shown in green.

• Recordings of raw Kinect v2 measurements were unwrapped and passed to the Kinect fusion implementation KinFu in the Point Cloud Library.

[Figure 4 panels: for each of the two scenes, the libfreenect2 reconstruction followed by the proposed method's reconstruction.]

Figure 4: KinFu scans of two different scenes using depth images produced by libfreenect2 and the proposed method. The input duration was 200 frames.