Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

R. Maier1,2, K. Kim1, D. Cremers2, J. Kautz1, M. Nießner2,3

International Conference on Computer Vision 2017

1 NVIDIA   2 TU Munich   3 Stanford University

Overview

• Motivation & State-of-the-art

• Approach

• Results

• Conclusion


Motivation

• Recent progress in Augmented Reality (AR) / Virtual Reality (VR)

• Requirement of high-quality 3D content for AR, VR, gaming …

• Usually: manual modelling (e.g. Maya)

• Wide availability of commodity RGB-D sensors enables efficient methods for 3D reconstruction

• Challenge: how to reconstruct high-quality 3D models with best-possible geometry and color from low-cost depth sensors?

[Images: Microsoft HoloLens, HTC Vive, NVIDIA VR Funhouse, Asus Xtion]

State-of-the-art: RGB-D based 3D Reconstruction

• Goal: turn a stream of RGB-D frames of a scene into a 3D shape that maximizes geometric consistency

• Real-time, robust, fairly accurate geometric reconstructions, e.g. KinectFusion (2011), DynamicFusion (2015), BundleFusion (2017)

"KinectFusion: Real-time Dense Surface Mapping and Tracking", Newcombe et al., ISMAR 2011.
"DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-time", Newcombe et al., CVPR 2015.
"BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration", Dai et al., ToG 2017.

State-of-the-art: Voxel Hashing

• Baseline RGB-D based 3D reconstruction framework
  • initial camera poses
  • sparse SDF reconstruction

• Challenges:
  • (Slightly) inaccurate and over-smoothed geometry
  • Bad colors
  • Inaccurate camera pose estimation
  • Input data quality (e.g. motion blur, sensor noise)

• Goal: high-quality reconstruction of geometry and color

"Real-time 3D Reconstruction at Scale using Voxel Hashing", Nießner et al., ToG 2013.

State-of-the-art

High-Quality Colors [Zhou2014]
• Optimize camera poses and image deformations to optimally fit the initial (possibly wrong) reconstruction
• But: HQ images required, no geometry refinement involved
"Color Map Optimization for 3D Reconstruction with Consumer Depth Cameras", Zhou and Koltun, ToG 2014.

High-Quality Geometry [Zollhöfer2015]
• Adjust camera poses in advance (bundle adjustment) to improve color
• Use shading cues (RGB) to refine geometry (shading-based refinement of surface & albedo)
• But: RGB is fixed (no color refinement based on refined geometry)
"Shading-based Refinement on Volumetric Signed Distance Functions", Zollhöfer et al., ToG 2015.

Idea: jointly optimize for geometry, albedo and image formation model to simultaneously obtain high-quality geometry and appearance!

Our Method

• Temporal view sampling & filtering techniques (input frames)

• Joint optimization of
  • surface & albedo (Signed Distance Field)
  • image formation model

• Lighting estimation using Spatially-Varying Spherical Harmonics (SVSH)

• Optimized colors (by-product)

Overview

• Motivation & State-of-the-art

• Approach

• Results

• Conclusion


Approach: Overview

RGB-D → SDF Fusion → Temporal view sampling / filtering → Shading-based Refinement (Shape-from-Shading):
  • Spatially-Varying Lighting Estimation
  • Joint Appearance and Geometry Optimization (surface, albedo, image formation model)
→ High-Quality 3D Reconstruction

RGB-D Data

Example: Fountain dataset (source: http://qianyi.info/scenedata.html)

• 1086 RGB-D frames
• Sensor: Primesense
  • Depth: 640×480 px
  • Color: 1280×1024 px
  • ~10 Hz
• Poses estimated using Voxel Hashing


Signed Distance Fields

Volumetric 3D model representation:
• Voxel grid: dense (e.g. KinectFusion) or sparse (e.g. Voxel Hashing)
• Each voxel stores:
  • Signed Distance Function (SDF): signed distance to the closest surface
  • Color values
  • Weights
"A volumetric method for building complex models from range images", Curless and Levoy, SIGGRAPH 1996.

Fusion of depth maps:
• Integrate depth maps into the SDF with their estimated camera poses
• Voxel updates using a weighted average
• Extract iso-surface with Marching Cubes (triangle mesh)
"Marching cubes: A high resolution 3D surface construction algorithm", Lorensen and Cline, SIGGRAPH 1987.
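The weighted-average voxel update can be sketched as follows; the function name and the simple scalar per-observation weight are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of the running weighted-average voxel update used in
# SDF fusion (Curless & Levoy style). Function name and the scalar
# weighting are illustrative, not the paper's implementation.

def update_voxel(sdf, weight, color, d_new, c_new, w_new=1.0):
    """Fuse one new observation (d_new, c_new) into a voxel."""
    w_total = weight + w_new
    sdf_out = (sdf * weight + d_new * w_new) / w_total
    color_out = tuple((c * weight + cn * w_new) / w_total
                      for c, cn in zip(color, c_new))
    return sdf_out, w_total, color_out
```

Because the update is a running average, the fused value never needs the full observation history, only the accumulated weight.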


Keyframe Selection

• Compute a per-frame blur score for each color image
• Select the frame with the best score within a fixed-size window as keyframe
"The blur effect: perception and estimation with a new no-reference perceptual blur metric", Crete et al., SPIE 2007.

[Images: Frame 81 vs. Frame 84]
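The windowed selection can be sketched like this; the window size and the convention that a lower blur score means a sharper image are assumptions for illustration:

```python
# Hypothetical sketch of windowed keyframe selection: within each
# fixed-size window, pick the frame with the best blur score (here
# assumed: lower score = sharper image).

def select_keyframes(blur_scores, window=10):
    keyframes = []
    for start in range(0, len(blur_scores), window):
        chunk = blur_scores[start:start + window]
        best = min(range(len(chunk)), key=chunk.__getitem__)
        keyframes.append(start + best)
    return keyframes
```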

Sampling / Filtering: Sampling of voxel observations

• Sample from selected keyframes only
• Collect observations for each voxel in the input views: the voxel center is transformed and projected into each input view
• Observation weights: view-dependent on normal and depth
• Filter observations: keep only the best 5 observations by weight
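The best-k filtering step can be sketched as a simple sort-and-truncate; the (weight, value) tuple layout is an assumption:

```python
# Sketch of the best-k observation filter: rank a voxel's observations
# by their view-dependent weight and keep the top k (k = 5 in the
# slides). The (weight, value) tuple layout is an assumption.

def filter_observations(observations, k=5):
    """observations: list of (weight, value) tuples for one voxel."""
    return sorted(observations, key=lambda o: o[0], reverse=True)[:k]
```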

Approach: Shading-based Refinement (Shape-from-Shading)

• Spatially-Varying Lighting Estimation
• Joint Appearance and Geometry Optimization (surface, albedo, image formation model)
• Double-hierarchical (coarse-to-fine: SDF volume / RGB-D)

Shape-from-Shading

• Shading equation: the shading B(v) is computed from the albedo a(v), the lighting coefficients l_m and the surface normal n(v):

  B(v) = a(v) · Σ_m l_m H_m(n(v))

• Shading-based refinement:
  • Intuition: high-frequency changes in surface geometry → shading cues in the input images
  • Estimate lighting given surface and albedo (intrinsic material properties)
  • Estimate surface and albedo given the lighting: minimize the difference between estimated shading and input luminance
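The shading equation can be evaluated numerically; the sketch below uses the standard 9 second-order real spherical-harmonics basis functions with the Ramamoorthi-Hanrahan normalization, which is one common convention (the slide does not show the exact constants used in the paper):

```python
# Numerical sketch of the shading equation B = a * sum_m l_m * H_m(n),
# with the standard 9 second-order real spherical-harmonics basis
# functions (one common normalization convention; assumed here).

def sh_basis(n):
    """Evaluate the 9 SH basis functions at unit normal n = (x, y, z)."""
    x, y, z = n
    return [
        0.282095,                    # l = 0
        0.488603 * y,                # l = 1, m = -1
        0.488603 * z,                # l = 1, m =  0
        0.488603 * x,                # l = 1, m = +1
        1.092548 * x * y,            # l = 2, m = -2
        1.092548 * y * z,            # l = 2, m = -1
        0.315392 * (3 * z * z - 1),  # l = 2, m =  0
        1.092548 * x * z,            # l = 2, m = +1
        0.546274 * (x * x - y * y),  # l = 2, m = +2
    ]

def shade(albedo, sh_coeffs, normal):
    """Shading B(v) = a(v) * sum_m l_m * H_m(n(v))."""
    return albedo * sum(l * h for l, h in zip(sh_coeffs, sh_basis(normal)))
```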


Lighting Estimation: Spherical Harmonics (SH)

• Encode incident lighting for a given surface point
• Smooth for Lambertian surfaces
• SH basis functions H_m parametrized by unit normal n
• Good approximation using only 9 SH basis functions (2nd order)
• Estimate SH coefficients by fitting the estimated shading to the input luminance
• Shortcoming: purely directional → cannot represent scene lighting for all surface points simultaneously

Spatially-Varying Lighting: Subvolume Partitioning

• Partition the SDF volume into subvolumes
• Estimate independent SH coefficients for each subvolume
• Obtain per-voxel SH coefficients through tri-linear interpolation
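The per-voxel interpolation step can be sketched as follows; the corner-dictionary data layout is an assumption for illustration, not the paper's data structure:

```python
# Sketch of the per-voxel tri-linear interpolation of subvolume SH
# coefficients. `corner_coeffs` maps the 8 surrounding subvolume
# corners (ix, iy, iz in {0, 1}) to their 9 SH coefficients; the data
# layout is an assumption for illustration.

def trilerp_sh(corner_coeffs, fx, fy, fz):
    """fx, fy, fz in [0, 1]: fractional position inside the cell."""
    out = [0.0] * 9
    for (ix, iy, iz), coeffs in corner_coeffs.items():
        w = ((fx if ix else 1.0 - fx) *
             (fy if iy else 1.0 - fy) *
             (fz if iz else 1.0 - fz))
        for m in range(9):
            out[m] += w * coeffs[m]
    return out
```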

Spatially-Varying Lighting: Joint Optimization

• Estimate the SVSH coefficients for all subvolumes jointly, minimizing an energy with two terms:
  • Data term: similarity between estimated shading and input luminance
  • Laplacian regularizer: smooth illumination changes between neighboring subvolumes


Joint Optimization: Shading-based SDF optimization

• Joint optimization of geometry, albedo and image formation model (camera poses and camera intrinsics), minimizing a weighted sum of:
  • Gradient-based shading constraint (data term)
  • Volumetric regularizer: smoothness in distance values (Laplacian)
  • Surface stabilization constraint: stay close to the initial distance values
  • Albedo regularizer: constrain albedo changes based on chromaticity (Laplacian)

Joint Optimization: Shading Constraint (data term)

• Idea: maximize consistency between estimated voxel shading and sampled intensities in the input luminance images (gradients for robustness)
• Best views for each voxel and respective view-dependent weights
• Shading: allows for optimization of surface (through the normal) and albedo
• Sampling (voxel center transformed and projected into the input view): allows for optimization of camera poses and camera intrinsics
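The per-voxel data term can be sketched as weighted residuals between the gradient of the estimated shading and the gradients of the sampled luminance in the best views; all names and the scalar-gradient simplification are illustrative assumptions:

```python
# Hedged sketch of the gradient-based shading constraint: for one voxel,
# compare the (here scalar) gradient of the estimated shading against
# the gradients of the sampled input luminance in the best views,
# scaled by the view-dependent weights. All names are illustrative.

def shading_residuals(voxel_shading_grad, views):
    """views: list of (weight, observed_luminance_grad) for one voxel."""
    return [w * (voxel_shading_grad - g) for w, g in views]
```

A least-squares solver would then minimize the sum of squares of these residuals over all voxels.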

Recolorization: Optimal colors

• Recompute voxel colors after optimization at each level
• Sampling:
  • Sample from keyframes only
  • Collect, weight and filter observations
• Compute each voxel color as the weighted average of its observations
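The weighted average over the filtered observations can be sketched as follows; the (weight, color) tuple layout is assumed for illustration:

```python
# Sketch of the recolorization step: the voxel color is the
# weight-normalized average of its filtered color observations
# ((weight, (r, g, b)) layout assumed for illustration).

def recolorize(observations):
    """observations: list of (weight, (r, g, b)) for one voxel."""
    total = sum(w for w, _ in observations)
    return tuple(sum(w * c[i] for w, c in observations) / total
                 for i in range(3))
```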

Overview

• Motivation & State-of-the-art

• Approach

• Results

• Conclusion


Ground Truth: Quantitative Results (Frog, synthetic)

• Generated a synthetic RGB-D dataset (noise on depth and camera poses)
• Quantitative surface accuracy evaluation; color coding: absolute distances to ground truth
• Compared reconstructions: Fusion, Zollhöfer et al. 15, Ours, Ground truth

Mean absolute deviation:
• Ours: 0.222 mm (std. dev. 0.269 mm)
• Zollhöfer et al.: 0.278 mm (std. dev. 0.299 mm)
→ 20.14% more accurate

Qualitative Results

• Relief (geometry): Input Color, Fusion, Zollhöfer et al. 15, Ours
• Fountain (appearance): Input Color, Fusion, Zollhöfer et al. 15, Ours
• Lion: Input Color; Geometry and Appearance (Fusion vs. Ours)
• Tomb Statuary: Input Color; Geometry and Appearance (Fusion vs. Ours)
• Gate: Input Color; Geometry and Appearance (Fusion vs. Ours)
• Hieroglyphics: Input Color; Geometry and Appearance (Fusion vs. Ours)
• Bricks: Input Color; Geometry and Appearance (Fusion vs. Ours)

Shading: Global SH vs. SVSH (Fountain)

[Images: input Luminance and Albedo; estimated Shading and Difference to luminance, for Global SH vs. SVSH]

Overview

• Motivation & State-of-the-art

• Approach

• Results

• Conclusion


Conclusion

• High-quality 3D reconstruction of geometry and appearance
  • Temporal view sampling & filtering techniques
  • Spatially-Varying Lighting estimation
  • Joint optimization of surface & albedo (SDF) and image formation model
  • Optimized texture as by-product

Thank you! Questions?

Robert Maier
Technical University of Munich, Computer Vision Group
[email protected]
http://vision.in.tum.de/members/maierr