3D + Graphics QI ZHU & JUHO KIM

3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel

Outline

• Pose estimation (3D recovery from 2D images)

• Novel Image / View synthesis

• Reconstruction and generation of 3D

Part 1: POSE ESTIMATION (3D RECOVERY FROM 2D IMAGES)

Viewpoint Estimation

INPUT: RGB image

OUTPUT: Camera pose = rotation (yaw, pitch, roll) and translation

Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild, WACV 2014
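The sketch below shows how a camera pose is built from this output representation. It creates a rotation matrix from yaw, pitch and roll, then stacks it with a translation to form a 3x4 pose matrix [R | t]. The axis order is an assumption: Z-Y-X, with yaw about z, pitch about y and roll about x. Datasets and papers differ in axis order and sign conventions.

```python
import math

def rotation_from_ypr(yaw, pitch, roll):
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians (assumed Z-Y-X convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def pose_matrix(yaw, pitch, roll, t):
    """3x4 camera pose [R | t] from three rotation angles and a translation vector."""
    R = rotation_from_ypr(yaw, pitch, roll)
    return [R[i] + [t[i]] for i in range(3)]

def transform(P, X):
    """Apply the pose to a 3D point: R @ X + t."""
    return [sum(P[i][j] * X[j] for j in range(3)) + P[i][3] for i in range(3)]
```

For example, `transform(pose_matrix(math.pi / 2, 0.0, 0.0, [1.0, 2.0, 3.0]), [1.0, 0.0, 0.0])` rotates the x-axis onto the y-axis and then translates it.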

Render for CNN [ICCV15]

Su, Hao, et al. “Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views.” ICCV 2015

Generating synthesized images: structure-preserving deformation

◦ Symmetry-preserving free-form deformation

◦ Embed the object in a uniform grid of control points

◦ Represent every point in space as a weighted combination of the control points

◦ Draw i.i.d. random translation vectors for the control points

◦ Translate each control point by its vector

◦ Set the translations of symmetric control points to be equal

Scott Schaefer, Free-Form Deformation of Solid Geometric Models, TAMU
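The steps above can be sketched as a classic Bernstein-polynomial free-form deformation. This sketch makes several assumptions:

- The object has been normalized into the unit cube.
- The lattice spans that cube.
- The symmetry plane is x = 0.5.

"Equal" translations for a symmetric pair are read here as equal up to reflection across that plane: the x component is negated and the y and z components are copied. That is what keeps the deformed shape mirror-symmetric. The exact scheme used in Render for CNN may differ.

```python
import math
import random

def bernstein(n, i, t):
    """Degree-n Bernstein basis polynomial B_{i,n}(t)."""
    return math.comb(n, i) * t ** i * (1 - t) ** (n - i)

def make_lattice(l, m, n):
    """Control points on a uniform (l+1) x (m+1) x (n+1) grid spanning the unit cube."""
    return {(i, j, k): (i / l, j / m, k / n)
            for i in range(l + 1) for j in range(m + 1) for k in range(n + 1)}

def symmetric_offsets(l, m, n, sigma, rng):
    """i.i.d. Gaussian translations, tied so the lattice stays mirror-symmetric about x = 0.5."""
    offsets = {}
    for i in range(l + 1):
        for j in range(m + 1):
            for k in range(n + 1):
                key, mirror = (i, j, k), (l - i, j, k)
                if mirror in offsets:
                    dx, dy, dz = offsets[mirror]
                    offsets[key] = (-dx, dy, dz)  # reflect x, copy y and z
                else:
                    dx, dy, dz = (rng.gauss(0.0, sigma) for _ in range(3))
                    if mirror == key:
                        dx = 0.0  # control points on the symmetry plane stay on it
                    offsets[key] = (dx, dy, dz)
    return offsets

def deform(point, lattice, offsets, l, m, n):
    """Map a point (s, t, u) in the unit cube through the translated control lattice."""
    s, t, u = point
    out = [0.0, 0.0, 0.0]
    for (i, j, k), p in lattice.items():
        # weight of this control point = product of 1D Bernstein weights
        w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
        d = offsets[(i, j, k)]
        for a in range(3):
            out[a] += w * (p[a] + d[a])
    return tuple(out)
```

With all offsets set to zero, `deform` returns its input unchanged, because Bernstein polynomials reproduce linear functions. A new training shape is produced by drawing one set of offsets and deforming every mesh vertex with it.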

Cont’d: overfit-resistant image synthesis

◦ Variation in rendering: vary lighting conditions and camera pose

◦ Background synthesis: alpha-composite the rendered object onto a background image

◦ Cropping

◦ Adding real annotated images

Image Compositing and Blending, CMU 15-463, Fall 2007
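Background synthesis by alpha compositing comes down to a per-pixel blend, out = alpha * foreground + (1 - alpha) * background. The sketch below assumes the renderer supplies the alpha mask and that colors are in [0, 1].

```python
def alpha_composite(fg, alpha, bg):
    """Blend a rendered foreground over a background image.

    fg, bg: H x W lists of (r, g, b) tuples in [0, 1].
    alpha: H x W list of opacities in [0, 1] (1 = object, 0 = background).
    """
    out = []
    for fg_row, a_row, bg_row in zip(fg, alpha, bg):
        out.append([
            tuple(a * f + (1.0 - a) * b for f, b in zip(fp, bp))
            for fp, a, bp in zip(fg_row, a_row, bg_row)
        ])
    return out
```

Fractional alpha values at the object boundary blend the edges smoothly. This avoids hard cut-out artifacts that a network could learn to exploit.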

Final Training Dataset

Image Compositing and Blending, CMU 15-463, Fall 2007

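Problem formulation

Input: a single RGB image

Viewpoint: a tuple (θ, φ, ψ) of camera rotation parameters

◦ Discretized into 360, 180, and 360 bins, respectively

Rotation reference: a predefined initial pose facing the camera

Output: a probability for each viewpoint bin

Loss function:

$$L_{vp}(\{s\}) = -\sum_{\{s\}} \sum_{v \in V} \exp\left(-\frac{d(v, v_s)}{\sigma}\right) \log P_v(s; c_s)$$

where $P_v(s; c_s)$ is the probability of view $v$ for sample $s$ from the soft-max viewpoint classifier of class $c_s$, $v_s$ is the ground-truth viewpoint of $s$, $d(v, v_s)$ is the distance between the two viewpoints, and $\sigma$ is a bandwidth parameter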

Su, Hao, et al. “Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views.” ICCV 2015
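The loss above can be sketched for one sample and one angle. Each bin is weighted by how close it is to the ground-truth bin, so predictions that land near the true viewpoint are penalized less than distant ones. Two choices here are assumptions:

- The distance d is taken as the circular bin distance, so bin 359 is next to bin 0.
- The full loss is assumed to sum this term over angles and samples.

```python
import math

def circular_distance(a, b, n_bins):
    """Distance between two bins on a circle of n_bins bins (e.g., 360 one-degree bins)."""
    d = abs(a - b) % n_bins
    return min(d, n_bins - d)

def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def geometry_aware_loss(logits, gt_bin, sigma):
    """-sum_v exp(-d(v, v_s) / sigma) * log P_v for one sample and one angle."""
    log_p = log_softmax(logits)
    n = len(logits)
    return -sum(math.exp(-circular_distance(v, gt_bin, n) / sigma) * log_p[v]
                for v in range(n))
```

A sharp peak two bins away from the ground truth costs less than a peak on the opposite side of the circle. Plain one-hot cross-entropy would treat the two errors as equally wrong.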

Class-Dependent Network Architecture

• Based on AlexNet [NIPS 2012]

• Weights shared across classes, but a class-specific output FC layer

• Large number of outputs: (360 + 180 + 360) x N for N object classes

• The claim is that class-specific output layers handle the large variance among different object categories

Su, Hao, et al. “Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views.” ICCV 2015
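The output-size arithmetic can be made concrete. Each class owns a block of 360 + 180 + 360 = 900 outputs, and at inference only the block for the object's class is read. In this sketch the flat layout of the output vector is an assumption for illustration: one block per class, azimuth bins first.

```python
BINS = (360, 180, 360)  # bins for the three angles, as on the slide

def class_heads(outputs, class_id):
    """Slice a flat output vector into the three angle heads of one class."""
    per_class = sum(BINS)  # 900 outputs per class
    block = outputs[class_id * per_class:(class_id + 1) * per_class]
    heads, start = [], 0
    for n_bins in BINS:
        heads.append(block[start:start + n_bins])
        start += n_bins
    return heads
```

With N = 12 classes the layer has 12 x 900 = 10,800 outputs, but each image is scored with only 900 of them.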

Does synthetic training data help?

Su, Hao, et al. “Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views.” ICCV 2015

Several results

Su, Hao, et al. “Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views.” ICCV 2015

Keypoint Estimation: 2D to 3D

Wei, Shih-En, et al. “Convolutional pose machines.” CVPR 2016; IKEA dataset [Lim et al., 2013]

Single Image 3D Interpreter Network [ECCV16]

Simultaneously infers 2D keypoint heatmaps and 3D structure from a single image!

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Challenges

Annotations?

◦ 2D keypoint labels: easy to obtain, e.g. via crowdsourcing

◦ 3D object annotations in real 2D images: hard to acquire

Synthetic training data?

◦ The statistics of real and synthesized images differ

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Using both real and synthetic image

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

3D Object Representation

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

3D-Skeleton Representation

Assumptions:

◦ Objects can only undergo constrained deformations

◦ The first base shape is the mean shape of all objects within the category

◦ 3D keypoint locations are a weighted sum of base shapes

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016
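The third assumption can be written as a weighted sum. The 3D keypoints are Σ_k w_k · B_k, where B_0 is the category mean shape and the other B_k are deformation bases. The sketch below assumes each shape is stored as a list of (x, y, z) keypoints in the same order. Rotation and translation are left out.

```python
def keypoints_from_base_shapes(base_shapes, weights):
    """3D keypoints as a weighted sum of base shapes.

    base_shapes: K shapes, each a list of N (x, y, z) keypoints; base_shapes[0] is the mean shape.
    weights: K coefficients (what the 3D interpreter would predict).
    """
    if len(base_shapes) != len(weights):
        raise ValueError("need one weight per base shape")
    result = []
    for p in range(len(base_shapes[0])):
        result.append(tuple(
            sum(w * shape[p][axis] for w, shape in zip(weights, base_shapes))
            for axis in range(3)
        ))
    return result
```

With weights (1, 0, ..., 0) the result is exactly the mean shape. Nonzero weights on the other bases move the keypoints only along the allowed deformation directions, which is what "constrained deformations" means here.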

Network Overview

Step 1: Estimate 2D keypoint heatmaps (a and b)

Step 2: Train the 3D interpreter on synthetic 3D data (c)

Step 3: Jointly train the network through the projection layer using 2D annotations (d)

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Zoom in: Keypoint Estimator

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Zoom in: 3D Interpreter

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Zoom in: End to End

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Results

Wu, Jiajun, et al. “Single image 3D interpreter network.” ECCV 2016

Part 2: NOVEL IMAGE / VIEW SYNTHESIS

New scene synthesis?

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

Figure: given images and the newly generated image (note the change in viewpoint)

DeepStereo: Learning to Predict New Views from the World’s Imagery

Task: given views V1 and V2, generate the new view C.

Issues:

• Interpreting rotation and image reprojection

• Long-distance pixel correlations

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

DeepStereo: plane-sweep volumes

Solve two other problems before generating a new view! But why?

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

Plane-sweep volumes (Cont’d)

• Sweep a family of planes at different depths with respect to a reference camera

• For each depth, project each input image onto that plane

• This is equivalent to warping each input image into the reference view with a homography

R. Collins. A space-sweep approach to true multi-image matching. CVPR 1996.
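The plane-induced homography behind the sweep can be sketched as follows. The sketch assumes:

- Both cameras share intrinsics K (focal length f, principal point (cx, cy)).
- The source camera sees a reference-frame point X as R X + t.
- Each sweep plane is fronto-parallel, nᵀX = d with n = (0, 0, 1), in the reference frame.

Then H = K (R + t nᵀ / d) K⁻¹ maps a reference pixel to the source pixel that should be sampled. One homography per depth gives one slice of the plane-sweep volume.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def intrinsics(f, cx, cy):
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]

def intrinsics_inv(f, cx, cy):
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def plane_homography(f, cx, cy, R, t, depth, normal=(0.0, 0.0, 1.0)):
    """H = K (R + t n^T / d) K^-1 for the plane n^T X = depth in the reference frame."""
    M = [[R[i][j] + t[i] * normal[j] / depth for j in range(3)] for i in range(3)]
    return matmul(matmul(intrinsics(f, cx, cy), M), intrinsics_inv(f, cx, cy))

def apply_homography(H, u, v):
    """Map pixel (u, v) and dehomogenize."""
    x = [H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3)]
    return x[0] / x[2], x[1] / x[2]

def plane_sweep_homographies(f, cx, cy, R, t, depths):
    """One homography per sweep depth (one slice of the plane-sweep volume each)."""
    return [plane_homography(f, cx, cy, R, t, d) for d in depths]
```

For a sideways baseline b, the principal point shifts by f·b/d pixels. This is the familiar disparity, and it shrinks as the plane moves away.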

DeepStereo: overview

{S}: depth selection tower output

{C}: color tower output

The output is a masked sum over the images selected at different depths.

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

DeepStereo: Selection Tower

INPUT: Plane-sweep volumes

OUTPUT: A probability map (or selection map) for each depth indicating the likelihood of each pixel having that depth.

Weighted-sum image synthesis: can be interpreted as an expectation over depth

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016
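The "expectation over depth" view can be shown for a single output pixel. The selection scores are normalized into a probability for each depth plane, and the final color is the probability-weighted average of the per-depth colors. This sketch assumes the per-pixel normalization is a softmax over depth planes.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def synthesize_pixel(selection_logits, colors):
    """Expected color over depth planes: c = sum_d p_d * c_d.

    selection_logits: one score per depth plane (selection tower, this pixel).
    colors: one (r, g, b) per depth plane (color tower, this pixel).
    """
    probs = softmax(selection_logits)
    return tuple(sum(p * c[ch] for p, c in zip(probs, colors)) for ch in range(3))
```

If the selection tower is confident about one depth, the pixel takes that plane's color. If it is uncertain, the colors blend, which tends to look blurry rather than wrong.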

DeepStereo: Color Tower

Bonus: the D input depth planes reduce the effect of occlusion

OUTPUT: a 3D volume of nodes, giving R, G, B channels for each pixel

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

DeepStereo: Training

Apply multi-resolution patches in the color tower

Feed 26 x 26 patches, produce 8 x 8 patches

Use 96 depth planes

Trained via AdaGrad

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016
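For reference, AdaGrad scales each parameter's step by the inverse square root of that parameter's accumulated squared gradients. The minimal sketch below runs it on a toy quadratic rather than on DeepStereo. The learning rate and epsilon are illustrative values, not the paper's settings.

```python
import math

def adagrad(grad_fn, w0, lr=1.0, eps=1e-8, steps=500):
    """Minimize with AdaGrad: w_i -= lr * g_i / (sqrt(sum of past g_i^2) + eps)."""
    w = list(w0)
    g_sq = [0.0] * len(w)  # running sum of squared gradients, per parameter
    for _ in range(steps):
        g = grad_fn(w)
        for i in range(len(w)):
            g_sq[i] += g[i] * g[i]
            w[i] -= lr * g[i] / (math.sqrt(g_sq[i]) + eps)
    return w

# toy objective (w0 - 3)^2 + 10 (w1 + 1)^2 and its gradient
w = adagrad(lambda w: [2 * (w[0] - 3.0), 20 * (w[1] + 1.0)], [0.0, 0.0])
```

Parameters with consistently large gradients, like w1 here, automatically get smaller effective step sizes. This is helpful when different layers see gradients of very different magnitudes.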

DeepStereo: Results

Flynn, John, et al. “DeepStereo: Learning to predict new views from the world’s imagery.” CVPR 2016

Part 3: RECONSTRUCTION AND GENERATION OF 3D

Part 3

3D structure reconstruction and generation

• DC-IGN model (Kulkarni et al. NIPS 2015)

• Perspective transformer networks (Yan et al. NIPS 2016)

• 3D-GAN (Wu et al. NIPS 2016)

Deep Convolutional Inverse Graphics Network (DC-IGN)

• Motivation: Can a deep network learn to disentangle factors of image generation such as lighting, rotation, etc.?

• Can we learn a renderer?

• Recall: Conditional VAEs and constraints on latent space

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

DC-IGN

• Generate transformed images with respect to rotations, lighting variations, etc.

• Input: 2D image (150 x 150 pixels)

• Output: 2D image that differs in one 3D property (e.g., rotation or light variation)

• Learn latent variables that represent complex transformations

• Use an encoder-decoder structure based on a VAE

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

Link: http://willwhitney.github.io/dc-ign/www/

DC-IGN

• Graphics codes z

Extrinsic properties:

◦ rotation of the object

◦ elevation of the object with respect to the camera

◦ variations of the light source

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

DC-IGN

• Model architecture: deep convolution and deconvolution within a VAE formulation

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

DC-IGN

• Encoder output:

• Model parameter:

• Distribution parameters:

• Variational objective function:

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

DC-IGN

• Train on minibatches in which only one extrinsic property changes, i.e., only one specific latent variable changes

• Key idea: force all other latent variables to be the same across examples, i.e., force all of them to be close to the minibatch mean

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
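The minibatch trick can be sketched on plain lists. In a batch where only latent `active` should vary, every other coordinate is replaced by its minibatch mean in the forward pass. The difference z - mean is kept as the signal that pulls the inactive codes together during training. The exact form of that gradient signal is an assumption here, and the function names are hypothetical.

```python
def clamp_inactive(latents, active):
    """Clamp inactive latents to the minibatch mean.

    latents: list of B latent vectors (equal-length lists).
    active: index of the one latent allowed to vary in this batch.
    Returns (clamped, pull): clamped codes fed to the decoder, and z - mean
    for inactive coordinates (0 for the active one) used to pull codes together.
    """
    batch, dim = len(latents), len(latents[0])
    means = [sum(z[i] for z in latents) / batch for i in range(dim)]
    clamped = [[z[i] if i == active else means[i] for i in range(dim)] for z in latents]
    pull = [[0.0 if i == active else z[i] - means[i] for i in range(dim)] for z in latents]
    return clamped, pull
```

Because the decoder sees identical values for every inactive code, only the active latent can explain the variation within the batch. This is what ties that latent to the one extrinsic property that changes.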

DC-IGN

• Generation by manipulating the pose variables

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

DC-IGN

• Generation with different light directions

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

DC-IGN

• Manipulating rotation on a different dataset (chair dataset)

Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015

Pixel to Voxel

• DC-IGN: 2D image (pixel) → transformed 2D image (pixel)

• Next models: 2D image (pixel) → 3D image (voxel)

Perspective Transformer Nets

• Predict the underlying true 3D shape of an object given a single 2D image

• Learn 3D object reconstruction without 3D ground-truth data

• Use 2D images of the object from multiple viewpoints

• Define two loss functions to generate 3D structures

Yan et al., Perspective Transformer Nets, NIPS 2016

Perspective Transformer Nets

• $I^{(k)}$: 2D image from the k-th viewpoint $\alpha^{(k)}$, obtained by projection

→ $I^{(k)} = P(X; \alpha^{(k)})$

Yan et al., Perspective Transformer Nets, NIPS 2016

Perspective Transformer Nets

Case 1: the ground-truth 3D volume $V$ is known

• Generate a 3D volume $f(I^{(k)}) = g(h(I^{(k)}))$

where $h(\cdot)$ learns a viewpoint-invariant latent representation and $g(\cdot)$ is a volume generator

• Loss function: $\mathcal{L}_{vol}(I^{(k)}) = \| f(I^{(k)}) - V \|_2^2$

• However, the ground-truth volume may not be available in practice

Yan et al., Perspective Transformer Nets, NIPS 2016

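Perspective Transformer Nets

Case 2: the ground-truth 3D volume $V$ is NOT known

• Use 2D silhouette images

• $S^{(j)}$: ground-truth 2D silhouette image for the j-th viewpoint $\alpha^{(j)}$

• $\hat{S}^{(j)} = P(f(I^{(k)}); \alpha^{(j)})$: generated silhouettes

• Loss function: $\mathcal{L}_{proj}(I^{(k)}) = \sum_{j} \| P(f(I^{(k)}); \alpha^{(j)}) - S^{(j)} \|_2^2$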

Perspective Transformer Nets

Perspective Transformer Nets

• Consider a combination of $\mathcal{L}_{vol}$ and $\mathcal{L}_{proj}$

$f(I^{(k)}) = g(h(I^{(k)}))$

Yan et al., Perspective Transformer Nets, NIPS 2016

Reference for encoder: Yang et al., NIPS 2015
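The combination of the two losses can be sketched with plain nested lists. The sketch makes the following assumptions:

- Squared L2 is used for both terms.
- The weights `lambda_vol` and `lambda_proj` are hypothetical names, not values from the paper.
- Passing `gt_volume=None` models Case 2, where only silhouettes supervise the network.
- The predicted silhouettes are taken as given, i.e. already projected.

```python
def squared_l2(a, b):
    """Sum of squared differences between two equally shaped nested lists (or scalars)."""
    if isinstance(a, (int, float)):
        return (a - b) ** 2
    return sum(squared_l2(x, y) for x, y in zip(a, b))

def ptn_loss(pred_volume, gt_volume, pred_silhouettes, gt_silhouettes,
             lambda_vol=1.0, lambda_proj=1.0):
    """lambda_vol * L_vol + lambda_proj * L_proj; L_vol is skipped when gt_volume is None."""
    l_proj = sum(squared_l2(p, s) for p, s in zip(pred_silhouettes, gt_silhouettes))
    l_vol = 0.0 if gt_volume is None else squared_l2(pred_volume, gt_volume)
    return lambda_vol * l_vol + lambda_proj * l_proj
```

Setting `lambda_vol` to zero, or passing no volume, recovers purely 2D-supervised training. Mixing both terms uses 3D supervision where it exists.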

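Perspective Transformer Nets

• How to obtain the 2D silhouette $\hat{S}^{(j)}$: perspective projection

• Transformation matrix: $\Theta_{4 \times 4} = \begin{bmatrix} K & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$

where $K$: camera calibration matrix and $(R, t)$: extrinsic parameters

• Perspective transformation: $(x^s, y^s, z^s, 1)^T \sim \Theta_{4 \times 4} \, (x, y, z, 1)^T$

where $(x, y, z)$: 3D coordinates and $(x^s, y^s, z^s)$: screen coordinates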

Yan et al., Perspective Transformer Nets, NIPS 2016
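The transformation can be checked numerically. The sketch composes Θ from K and (R, t) and maps a 3D point to screen coordinates by dividing by depth. Two details are assumptions: the third screen coordinate is returned as the camera-space depth, and K has the usual pinhole form.

```python
def theta_matrix(K, R, t):
    """Theta = [[K, 0], [0^T, 1]] @ [[R, t], [0^T, 1]] = [[K R, K t], [0^T, 1]]."""
    KR = [[sum(K[i][k] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    Kt = [sum(K[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [KR[i] + [Kt[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def to_screen(theta, point):
    """Apply Theta to (x, y, z, 1), then divide by depth: returns (u, v, depth)."""
    x, y, z = point
    h = [theta[i][0] * x + theta[i][1] * y + theta[i][2] * z + theta[i][3] for i in range(4)]
    depth = h[2]  # third row of K is (0, 0, 1), so this is the camera-space z
    return h[0] / depth, h[1] / depth, depth
```

For example, with f = 100, principal point (32, 32), R = I and t = (0, 0, 2), the point (0.1, -0.2, 0) lands at pixel (37, 22) at depth 2.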

Perspective Transformer Nets

• Use a spatial transformer network (Jaderberg et al., NIPS 2015) to:

(1) densely sample from the input volume in 3D coordinates into an output volume in screen coordinates

(2) flatten the 3D spatial output across the disparity dimension

Yan et al., Perspective Transformer Nets, NIPS 2016
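The two steps can be sketched on a small occupancy grid. For every screen pixel and every depth sample, the input volume is read at a mapped location. Flattening across the depth axis then takes the maximum, so a pixel is occupied if any voxel along its ray is occupied. Two simplifications are assumptions: nearest-neighbor sampling stands in for the trilinear sampling of a spatial transformer, and `screen_to_volume` is a hypothetical caller-supplied inverse mapping.

```python
def sample_nearest(volume, x, y, z):
    """Nearest-neighbor lookup in volume[z][y][x]; points outside the grid count as empty."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    xi, yi, zi = round(x), round(y), round(z)
    if 0 <= zi < D and 0 <= yi < H and 0 <= xi < W:
        return volume[zi][yi][xi]
    return 0.0

def project_silhouette(volume, out_h, out_w, out_d, screen_to_volume):
    """(1) sample the volume at every (u, v, d) screen location; (2) max over d."""
    silhouette = [[0.0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            samples = (sample_nearest(volume, *screen_to_volume(u, v, d)) for d in range(out_d))
            silhouette[v][u] = max(samples)
    return silhouette
```

With an identity mapping, a single occupied voxel shows up as a single silhouette pixel. A perspective `screen_to_volume`, for instance the inverse of Θ, produces the projected silhouette used in the projection loss.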

Perspective Transformer Nets

• Training on a single category

Yan et al., Perspective Transformer Nets, NIPS 2016

Perspective Transformer Nets

Perspective Transformer Nets

• Training on multiple categories

3D-GAN

• Generate an object in 3D voxel space from a randomly sampled vector

• Use a generative adversarial network (GAN)

• Map a randomly sampled vector in a latent space to an object in 3D voxel space

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016

Link: http://3dgan.csail.mit.edu/

3D-GAN

• GAN: generator + discriminator

• Generator $G: z \rightarrow G(z)$

where $z$: latent vector (200 dimensions),

$G(z)$: 3D object in 3D voxel space (64 x 64 x 64 cube)

• Discriminator $D$: outputs a confidence value of whether an input is real or synthetic

• Overall adversarial loss function: $L_{\text{3D-GAN}} = \log D(x) + \log(1 - D(G(z)))$

where $x$: a real object in a 64 x 64 x 64 space,

$z$: randomly sampled noise from $p(z)$

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
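The adversarial loss can be evaluated directly from discriminator outputs. The sketch assumes D outputs probabilities in (0, 1) and averages over a minibatch. D is trained to maximize this value and G to minimize it.

```python
import math

def gan_objective(d_real, d_fake):
    """Minibatch average of log D(x) + log(1 - D(G(z))).

    d_real: D's outputs on real 64^3 voxel grids x.
    d_fake: D's outputs on generated grids G(z).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def discriminator_loss(d_real, d_fake):
    """What D minimizes: the negated objective."""
    return -gan_objective(d_real, d_fake)
```

When D cannot tell real from generated shapes (every output 0.5), the objective equals -2 log 2. A discriminator that separates the two sets scores higher.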

3D-GAN

• Network structure of the generator in 3D-GAN

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
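The spatial arithmetic behind a generator that ends in a 64 x 64 x 64 grid can be checked with a short helper. The layer settings are assumptions made for illustration:

- z is first reshaped to a 4 x 4 x 4 block with 512 channels.
- Each layer is a 3D transposed convolution with kernel 4, stride 2 and padding 1.
- Channel counts halve at each layer, and the last layer outputs one occupancy channel.

```python
def transposed_conv_size(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_shapes(start=4, start_channels=512, layers=4):
    """(channels, depth, height, width) after the initial block and each upsampling layer."""
    shapes = [(start_channels, start, start, start)]
    size, channels = start, start_channels
    for layer in range(layers):
        size = transposed_conv_size(size)
        channels = 1 if layer == layers - 1 else channels // 2
        shapes.append((channels, size, size, size))
    return shapes
```

Under these settings each layer doubles the resolution, 4 → 8 → 16 → 32 → 64. Four upsampling layers therefore reach the 64³ voxel grid.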

3D-VAE-GAN

• Extension of 3D-GAN

• Inspired by the VAE-GAN of Larsen et al. (ICML 2016)

• Takes a 2D image as input and generates a 3D object

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016

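3D-VAE-GAN

• New loss function: $L = L_{\text{3D-GAN}} + \alpha_1 L_{KL} + \alpha_2 L_{recon}$, with $L_{KL} = D_{KL}(q(z \mid y) \,\|\, p(z))$ and $L_{recon} = \| G(E(y)) - x \|_2$

where $x$: a 3D shape from the training set,

$y$: the 2D image corresponding to $x$,

$E$: the image encoder; $\alpha_1, \alpha_2$: loss weights,

$q(z \mid y)$: variational distribution of $z$,

$p(z)$: multivariate Gaussian prior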

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
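The composite loss can be sketched from its three terms. The sketch makes these assumptions:

- q(z | y) is a diagonal Gaussian given by a mean and log-variance.
- The prior p(z) is the standard normal, which gives the KL term a closed form.
- The voxel grids are flattened into lists.
- The weights `alpha1` and `alpha2` default to hypothetical values of 1.

```python
import math

def gaussian_kl(mu, log_var):
    """KL(q(z|y) || p(z)) for diagonal Gaussian q and standard normal p (closed form)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vae_gan_loss(gan_term, mu, log_var, reconstruction, target, alpha1=1.0, alpha2=1.0):
    """L = L_3D-GAN + alpha1 * L_KL + alpha2 * L_recon.

    gan_term: adversarial loss value; reconstruction: flattened G(E(y)); target: flattened x.
    """
    return (gan_term
            + alpha1 * gaussian_kl(mu, log_var)
            + alpha2 * l2_distance(reconstruction, target))
```

The KL term keeps the image encoder's codes close to the prior that the GAN generator samples from. The reconstruction term ties each image to its own 3D shape.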

3D-GAN

• 3D object generation

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016

3D-GAN

• 3D object classification

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016

3D-VAE-GAN

• Single-image 3D reconstruction

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016

3D-GAN

• Shape arithmetic for chairs

Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016

Summary

• 3D vision and graphics based on deep learning

- Pose estimation (3D recovery from 2D images)

- Novel Image / View synthesis

- Reconstruction and generation of 3D

References

• John Flynn, Ivan Neulander, James Philbin, Noah Snavely, DeepStereo: Learning to Predict New Views from the World’s Imagery, CVPR 2016.

• Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Spatial Transformer Networks, NIPS 2015.

• Alex Kendall, Matthew Grimes, Roberto Cipolla, PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV 2015.

• Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, Josh Tenenbaum, Deep Convolutional Inverse Graphics Network, NIPS 2015.

• Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess, Unsupervised Learning of 3D Structure from Images, NIPS 2016.

• Hao Su, Charles R. Qi, Yangyan Li, Leonidas J. Guibas, Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views, ICCV 2015.

• Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman, Single Image 3D Interpreter Network, ECCV 2016.

• Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum, Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016.

• Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee, Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision, NIPS 2016.

• Jimei Yang, Scott Reed, Ming-Hsuan Yang, Honglak Lee, Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis, NIPS 2015.

Backup slides: REZENDE ET AL., NIPS 2016

Rezende et al. (NIPS 2016)

• Construct the underlying 3D structures from 2D observations

• Learn a generative model of 3D structures

• Recover the structure from 2D images via inference

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Consider a conditional latent variable model

$x$: observed image or volume

$c$: observed contextual information (nothing, an object class label, or one or more views of the scene from different cameras)

$z$: low-dimensional codes of the latent manifold of object shapes

$h$: 3D representations (volume or mesh)

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Proposed framework

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Sequential generative process: refinement of hidden representations

• Each step generates an independent set of latents $z_t$

• $f_{read}$: task-dependent context encoder

• $f_{state}$: transition function (fully connected LSTM)

• $f_{write}$: volumetric spatial transformer (Jaderberg et al., NIPS 2015)

• Proj: projection operator from the latent 3D representation $h_T$ to the training data’s domain

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
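The control flow of the sequential process can be sketched with caller-supplied stand-in functions. All callables are hypothetical placeholders for the networks named on the slide. The additive update of the 3D representation (h_t = h_{t-1} + f_write(s_t, h_{t-1})) is a DRAW-style assumption, not a confirmed detail.

```python
def generate(T, f_read, f_state, f_write, proj, sample_z, context, s0, h0):
    """Run T refinement steps and project the final 3D representation h_T."""
    s, h = s0, h0
    for t in range(T):
        e = f_read(context, s)   # encode the context given the previous state
        z = sample_z(t)          # an independent set of latents z_t for this step
        s = f_state(s, z, e)     # recurrent state update (an LSTM in the paper)
        h = [a + b for a, b in zip(h, f_write(s, h))]  # refine the hidden 3D canvas
    return proj(h)               # map h_T into the training data's domain
```

A toy run with scalar "networks" is enough to check the data flow. Real models would replace each callable with a trained module and `proj` with one of the projection operators listed below.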

Rezende et al. (NIPS 2016)

• Volumetric spatial transformer (VST)

The sampling grid is a simple affine transformation of a 3D grid of points that uniformly covers the input image

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
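The sampling grid of a volumetric spatial transformer can be sketched as a uniform 3D grid in normalized coordinates [-1, 1]³ with a 3x4 affine matrix [M | b] applied to every point. The transformed points are the locations where the input volume would then be sampled, typically trilinearly. The normalized-coordinate convention is an assumption borrowed from common spatial-transformer implementations.

```python
def uniform_grid(n):
    """n x n x n points uniformly covering [-1, 1]^3 (n >= 2), x varying fastest."""
    coords = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(x, y, z) for z in coords for y in coords for x in coords]

def affine_grid(A, points):
    """Apply a 3x4 affine matrix A = [M | b] to each point: p' = M p + b."""
    return [tuple(A[i][0] * x + A[i][1] * y + A[i][2] * z + A[i][3] for i in range(3))
            for x, y, z in points]
```

An identity A samples the input unchanged. Scaling M by 0.5 zooms into the central half of the volume, and the last column b shifts the sampled region.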

Rezende et al. (NIPS 2016)

• Projection operator (3 types)

- 3D → 3D (identity)

- 3D → 2D neural network projection (learned)

- 3D → 2D OpenGL projection (fixed)

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Probabilistic volume completion (Necker cube, Primitives, and MNIST3D)

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Class-conditional samples (one-hot encoding of the class as context)

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Recover 3D structure from 2D images

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Rezende et al. (NIPS 2016)

• Unsupervised learning of 3D structure (mesh representations)

Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016

Link: https://www.youtube.com/watch?v=stvDAGQwL5c