Automatic scene inference for 3D object compositing
Kevin Karsch (UIUC), Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Hailin Jin, Rafael Fonte, Michael Sittig, David Forsyth
SIGGRAPH 2014
What is this system
• Image editing system
• Drag-and-drop object insertion
• Place objects in 3D and relight
• Fully automatic recovery of a comprehensive 3D scene model: geometry, illumination, diffuse albedo, and camera parameters
• Works from a single low dynamic range (LDR) image
Existing problems
• It is the artist's job to create photorealistic effects by recognizing the physical space
• Lighting, shadow, perspective
• Needed: camera parameters, scene geometry, surface materials, and sources of illumination
State-of-the-art
• http://www.popularmechanics.com/technology/digital/visual-effects/4218826
• http://en.wikipedia.org/wiki/The_Adventures_of_Seinfeld_%26_Superman
What this system cannot handle
• Works best when scene lighting is diffuse; therefore generally works better indoors than outdoors
• Errors in geometry, illumination, or materials may be prominent
• Does not handle object insertion behind existing scene elements
Contribution
• Illumination inference: recovers a full lighting model, including light sources not directly visible in the photograph
• Depth estimation: combines data-driven depth transfer with geometric reasoning about the scene layout
How to do this
• Needed: geometry, illumination, surface reflectance
• Even though the estimates are coarse, the composites still look realistic, because even large changes in lighting are often not perceivable
Workflow
Indoor/outdoor scene classification
• K-nearest-neighbor matching of GIST features (see the sketch below)
• Indoor dataset: NYUv2
• Outdoor dataset: Make3D
• Different training images and classifiers are chosen depending on the indoor/outdoor label
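A minimal sketch of the nearest-neighbor step in Python. The tiny-image descriptor below is a crude, hypothetical stand-in for a true GIST descriptor (which uses a Gabor filter bank), and the neighbor count k is an assumption; the paper does not specify these details here.

```python
import numpy as np
from PIL import Image
from sklearn.neighbors import NearestNeighbors

def tiny_image_descriptor(image, size=16):
    """Crude stand-in for a GIST descriptor: a downsampled grayscale
    'tiny image', flattened and contrast-normalized. `image` is an
    H x W x 3 uint8 array. Enough to make the KNN pipeline runnable."""
    g = Image.fromarray(image).convert("L").resize((size, size))
    v = np.asarray(g, dtype=float).ravel()
    return (v - v.mean()) / (v.std() + 1e-8)

def classify_scene(image, desc_train, labels, k=7):
    """desc_train: (N, D) descriptors of labeled training images;
    labels: (N,) array, 0 = indoor (NYUv2), 1 = outdoor (Make3D).
    Majority vote over the k nearest training images."""
    nn = NearestNeighbors(n_neighbors=k).fit(desc_train)
    _, idx = nn.kneighbors(tiny_image_descriptor(image)[None, :])
    return "outdoor" if labels[idx[0]].mean() > 0.5 else "indoor"
```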
Single image reconstruction
• Camera parameters and geometry
– Focal length f, camera center (cx, cy), and extrinsic parameters are computed from three orthogonal vanishing points detected in the scene (see the sketch below)
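The focal length can be recovered from two finite vanishing points of orthogonal scene directions via the standard single-view calibration constraint (v1 − c)·(v2 − c) + f² = 0, assuming zero skew and unit aspect ratio. A sketch of that step, not necessarily the paper's exact procedure:

```python
import numpy as np

def focal_from_vps(v1, v2, c):
    """Estimate focal length from two finite vanishing points v1, v2
    (pixel coordinates) of orthogonal scene directions, with principal
    point c = (cx, cy). Orthogonality of the back-projected rays gives
    (v1 - c) . (v2 - c) + f^2 = 0, hence f = sqrt(-(v1 - c) . (v2 - c))."""
    d = -np.dot(np.asarray(v1, float) - c, np.asarray(v2, float) - c)
    if d <= 0:
        raise ValueError("vanishing-point pair inconsistent with orthogonality")
    return np.sqrt(d)
```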
Surface materials
• Per-pixel diffuse albedo and shading are estimated with the Color Retinex method
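A minimal grayscale sketch of the Retinex idea (the paper uses a color variant): gradients of log-luminance above a threshold are treated as albedo edges; the remaining small gradients are reintegrated into a log-shading image by a least-squares (Poisson) solve. The threshold value and the scale anchor are assumptions:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

def retinex_decompose(log_lum, thresh=0.1):
    """Return (log_shading, log_albedo) for a log-luminance image.
    Dense Python loops: intended for small images only."""
    H, W = log_lum.shape
    idx = lambda y, x: y * W + x
    rows, cols, vals, b, eq = [], [], [], [], 0
    gx = np.diff(log_lum, axis=1)  # horizontal gradients
    gy = np.diff(log_lum, axis=0)  # vertical gradients
    for y in range(H):             # shading should match small gradients
        for x in range(W - 1):
            g = gx[y, x] if abs(gx[y, x]) < thresh else 0.0
            rows += [eq, eq]; cols += [idx(y, x + 1), idx(y, x)]
            vals += [1.0, -1.0]; b.append(g); eq += 1
    for y in range(H - 1):
        for x in range(W):
            g = gy[y, x] if abs(gy[y, x]) < thresh else 0.0
            rows += [eq, eq]; cols += [idx(y + 1, x), idx(y, x)]
            vals += [1.0, -1.0]; b.append(g); eq += 1
    # anchor the unknown global scale: log-shading of pixel (0, 0) = 0
    rows.append(eq); cols.append(0); vals.append(1.0); b.append(0.0); eq += 1
    A = csr_matrix((vals, (rows, cols)), shape=(eq, H * W))
    log_shading = lsqr(A, np.asarray(b))[0].reshape(H, W)
    return log_shading, log_lum - log_shading
```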
Data-driven depth estimation
• Database: RGB-D images
• Appearance cues for correspondence: multi-scale SIFT features
• Incorporates geometric information
Data-driven depth estimation (objective terms; a hedged reconstruction of the combined objective follows)
• E_t: depth transfer
• E_m: Manhattan world
• E_o: orientation
• E_3s: spatial smoothness in 3D
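These terms are presumably combined into one weighted objective over the per-pixel depth map D; a hedged reconstruction (the λ weights are assumed, following common practice, not copied from the paper):

```latex
D^{*} = \operatorname*{argmin}_{D}\; E_t(D) + \lambda_m E_m(D) + \lambda_o E_o(D) + \lambda_{3s} E_{3s}(D)
```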
Scene illumination
Visible sources
• Segment the image into superpixels
• Compute features for each superpixel:
– Location in the image
– The 340 features used in Make3D
• Train a binary classifier on annotated data to predict whether or not a superpixel is emitting/reflecting a significant amount of light (see the sketch below)
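A sketch of the superpixel light classifier; the paper does not name the classifier here, so the random forest and the exact feature layout are assumptions standing in for any off-the-shelf binary classifier:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed shapes: X holds one feature vector per superpixel
# (image location + the 340 Make3D-style features, ~342 dims);
# y holds binary annotations (1 = superpixel is a light source).
def train_light_classifier(X, y):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, y)
    return clf

def light_probability(clf, superpixel_features):
    # probability that each superpixel emits/reflects significant light
    return clf.predict_proba(superpixel_features)[:, 1]
```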
Out-of-view sources
• Data-driven: annotated SUN360 panorama dataset
• Assumption: if two photographs are similar, then the illumination environments beyond the photographed regions will be similar as well
Out-of-view sources (matching and ranking)
• Features: geometric context, orientation maps, spatial pyramids, HSV histograms, output of the light classifier
• Measures: histogram intersection score (see the sketch below), per-pixel inner product
• Similarity metric for IBLs: how similar the rendered canonical objects are
• Ranking function: trained with 1-slack, linear SVM-ranking optimization
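The histogram intersection measure is standard; a minimal implementation:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Histogram intersection score between two normalized histograms:
    the sum of element-wise minima. 1.0 = identical, 0.0 = disjoint."""
    return np.minimum(h1, h2).sum()

# usage, e.g. on HSV hue histograms of two images:
# score = histogram_intersection(hist_a / hist_a.sum(),
#                                hist_b / hist_b.sum())
```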
Relative intensities of the light sources
• Intensity estimation through rendering: adjust source intensities until a rendered version of the scene matches the original image (see the sketch below)
• Humans cannot distinguish among a range of illumination configurations, suggesting that a family of lighting conditions produces the same perceptual response
• Therefore, simply choose the lighting configuration that renders fastest
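Because rendering is linear in each source's intensity, one workable scheme (a sketch of the render-and-match idea, not necessarily the paper's exact objective) is to render the scene once per source at unit intensity and solve a nonnegative least-squares problem for the per-source weights:

```python
import numpy as np
from scipy.optimize import nnls

def estimate_light_intensities(unit_renders, photo):
    """unit_renders: list of H x W x 3 float images, one render per
    light source at unit intensity; photo: the original H x W x 3
    image. Solves for nonnegative weights w so that
    sum_i w[i] * unit_renders[i] best matches the photo."""
    B = np.stack([r.ravel() for r in unit_renders], axis=1)  # (P, n_lights)
    w, _residual = nnls(B, photo.ravel())
    return w
```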
Physically grounded image editing
• Drag-and-drop insertion
• Lighting adjustment
• Synthetic depth-of-field (a toy sketch follows)
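A toy sketch of synthetic depth-of-field using the recovered depth map: slice the scene into depth layers and blur each with a Gaussian whose width grows away from the focal depth. The layering and the linear blur model are simplifying assumptions, not the paper's method:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_dof(image, depth, focal_depth, strength=2.0, n_layers=8):
    """image: H x W x 3 float; depth: H x W recovered depth map;
    focal_depth: the depth kept in focus. `strength` scales blur
    radius with |depth - focal_depth| (stand-in for a true
    circle-of-confusion model)."""
    out = np.zeros_like(image, dtype=float)
    weight = np.zeros(depth.shape, dtype=float)
    edges = np.linspace(depth.min(), depth.max(), n_layers + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = ((depth >= lo) & (depth <= hi)).astype(float)
        sigma = strength * abs(0.5 * (lo + hi) - focal_depth)
        blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))
        m = gaussian_filter(mask, sigma=sigma)  # soften layer seams
        out += blurred * m[..., None]
        weight += m
    return out / np.maximum(weight, 1e-6)[..., None]
```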
User study
• Real object in a real scene vs. inserted object in a real scene
• Synthetic object in a synthetic scene vs. inserted object in a synthetic scene
• The system produces perceptually convincing results