Neural Scene De-renderingnsd.csail.mit.edu/talks/nsd_poster_cvpr.pdfNeural Scene De-rendering Jiajun Wu1 Joshua B. Tenenbaum1 PushmeetKohli2,* 1 Massachusetts Institute of Technology

Neural Scene De-rendering Jiajun Wu1 Joshua B. Tenenbaum1 Pushmeet Kohli2,*

1 Massachusetts Institute of Technology 2 DeepMind * Work done when the author was with Microsoft Research

Scene De-renderingGoal: a compact, interpretable scene representation

Applications

Motivation• An object-based, disentangled representation has wide applications• Representations learned by current deep nets are hard to interpret

InpaintingSolution: looping in a forward graphics engine in recognitionAdvantages• Graphics engines bring in symbolic representation naturally• Graphics engines generalize well to a variable number of objects• The learned representation is rich, and has wide applications.

Model

<objects><balloon: right><bench: yellow><tree: right><boy: stand happy><girl: sit sad>

</objects>

<objects><pig: left close><villager: left><tree: tall right>

</objects>

De-render

Render

De-render

Render

(a) Input image

Interpreting proposals

Rendering images

(c) Inference

Proposingsegments

Applications

(d) Rendered image(I) (II) (III)

(b) Segment proposals

Imageediting:

Captioning: The boy is …

Inpainting, analogy-making, …

(prediction loss) (reconstruction loss)

Scene XML<object>

<category>triangle</category><size>1.5</size><color>blue</color><position>1.5,2,1</position><yaw>0</yaw>…

</object><object>…

(c) Original image(a) Corrupted input (b) Reconstruction

: :: :

: :: :

: :: :Reference 𝐴′Reference 𝐴 Query 𝐵 Prediction 𝐵′

: :: :

Analogy-Making

GraphicsEngine

Inference & Reconstruction

If = 0 If = 1

(a) Input image (c) NSD(b) CNN+LSTM (d) NSD (full)

Results

Training Steps• Supervised pre-training with the prediction loss• End-to-end fine-tuning with the reconstruction loss (with REINFORCE)

• Graphics engines as generalized decoders• Visually distinctive images may have similar representations• Solution: Optimizing in both spaces

jenny and mike are having fun in the sandbox unaware of the storm that's coming their way

Caption Retrieval

Documents

Neural Scene De-renderingnsd.csail.mit.edu/talks/nsd_poster_cvpr.pdfNeural Scene De-rendering Jiajun Wu1 Joshua B. Tenenbaum1 PushmeetKohli2,* 1 Massachusetts Institute of Technology