Unsupervised Representations towards Counterfactual Predictions

Animesh Garg


Page 1: Animesh Garg

Unsupervised Representations towards Counterfactual Predictions

Animesh Garg

Page 2:

Compositional Representations

Vacuuming Sweeping/Mopping Cooking Laundry

Page 3:

Compositional Representations

Vacuuming Sweeping/Mopping Cooking Laundry

Diversity: New Scenes, Tools, …

Complexity: Long-term Settings

Page 4:

[Timeline figure, 1956–2020: Dartmouth AI Meeting (1956); UNIMATE, the 1st industrial robot (’61); the Atlas humanoid robot; staged cooking; toward a personal robot assistant]

Unstructured/Unknown New Environment

Compositional Representations

Page 5:

Generalization

New Task Variations in Novel Environments

Task Imitation

Task Performance in a Data-Scarce Setup

Supervision

Input

Compositional Representations

Page 6:

Compositional Representations

Learning Disentanglement

Learning Keypoints

Learning Causal Graphs

Page 7:


Page 8:

Generative Models: Disentanglement

Objectives of Disentanglement:
• Compositional Representations
• Controllable Sample Generation

Existing datasets in unsupervised disentanglement learning: dSprites, 3DShapes

True Factors of Variation

Latent Code

Page 9:

Disentanglement: Challenges

✗ High-Resolution Output

✗ Non-identifiability in the Unsupervised Setting

✗ Metrics focus on learning disentangled representations

Locatello et al. ICML 2019.

Page 10:

Disentanglement: Challenges

✗ High-Resolution Output → StyleGAN-based backbone; new high-resolution synthetic datasets: Falcor3D and Isaac3D

✗ Non-identifiability in the Unsupervised Setting → Limited Supervision (~1% of labels)

✗ Metrics focus on learning disentangled representations → New metric to trade off controllability and disentanglement

Page 11:

Disentanglement: StyleGAN

Karras et al. CVPR 2019.

• Used a style-based generator to replace the traditional generator
• Success at generating high-resolution realistic images

Page 12:

Disentanglement: Semi-StyleGAN

(ours) Li et al. ICML 2020.

The semi-supervised loss combines:
• an unsupervised InfoGAN loss term,
• a supervised label reconstruction term, and
• a smoothness regularization term.

Disentanglement in StyleGAN: the Mapping Network in the generator conditions on the factor code, and the encoder predicts its value.

𝒥: set of all possible factor codes; 𝒳: set of labeled pairs of real image and factor code; ℳ: mixed set of labeled and generated image-code pairs; E, G: encoder and generator neural networks
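Spelled out, the combination above can be sketched as follows; λ_𝒳 and λ_s are illustrative trade-off weights, and the label-reconstruction form is a sketch rather than the paper's exact notation.

```latex
% Illustrative sketch of the semi-supervised objective, not the paper's
% exact formula: InfoGAN term + label reconstruction + smoothness.
\mathcal{L}_{\text{semi}}
  = \mathcal{L}_{\text{InfoGAN}}
  + \lambda_{\mathcal{X}}\,
    \mathbb{E}_{(x,\,c)\,\sim\,\mathcal{M}}
    \big[\lVert E(x) - c \rVert^{2}\big]
  + \lambda_{s}\, \mathcal{R}_{\text{smooth}}(E, G)
```

Here the first term is the unsupervised InfoGAN loss, the second reconstructs factor codes over the mixed set ℳ of labeled and generated image-code pairs, and the third regularizes smoothness of the factor-to-image mapping.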

Page 13:

Disentanglement: Semi-StyleGAN

Li et al. ICML 2020.


Labeled Data 𝒥 ≪ 𝒳 Unpaired Data; ℳ: Artificially Augmented Data

Page 14:

Semi-StyleGAN: CelebA (256x256)

Li et al. ICML 2020.

With very limited supervision, Semi-StyleGAN can achieve good disentanglement on real data

[Figure: factors (g₁, g₂); original image I; I′: <person_id>, long-hair; I″: <person_id>, bangs; one factor is edited (counterfactual) while the other is held fixed (fixed-value)]

Page 15:

Semi-StyleGAN: Isaac3D (512x512)

Li et al. ICML 2020.

Each factor in the interpolated images changes smoothly without affecting other factors

[Figure: factors (g₁, g₂); original image I; I′: <object_id>, x-pos; I″: <object_id>, x-pos_new; one factor is edited (counterfactual) while the other is held fixed (fixed-value)]

Page 16:

Semi-StyleGAN: Falcor3D (512x512)

Li et al. ICML 2020.

Each factor in the interpolated images changes smoothly without affecting other factors

[Figure: factors (g₁, g₂); original image I; I′: <lighting>, camera_pos; I″: <lighting>, camera_new; one factor is edited (counterfactual) while the other is held fixed (fixed-value)]

Page 17:

Semi-StyleGAN: Role of Limited Supervision

Using only 0.25∼2.5% of labeled data is on par with fully supervised disentanglement

(L2 and L2-gen: lower is better, MIG and MIG-gen: higher is better)

Semi-StyleGAN with the default setting
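For reference, MIG (the Mutual Information Gap) is, per ground-truth factor, the gap between the two latent dimensions with the highest mutual information, normalized by the factor's entropy and averaged over factors. The histogram estimator below is a common sketch of the metric; the paper's exact estimator may differ.

```python
import numpy as np

def _mutual_info(x, y, bins=20):
    """Histogram estimate of I(X; Y) in nats for 1-D samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mig(factors, latents, bins=20):
    """Mutual Information Gap over (n, k_factors) and (n, k_latents)
    arrays; factors are assumed discrete non-negative integers."""
    scores = []
    for k in range(factors.shape[1]):
        v = factors[:, k]
        # MI between this factor and every latent dimension, sorted.
        mis = sorted((_mutual_info(v, latents[:, j], bins)
                      for j in range(latents.shape[1])), reverse=True)
        p = np.bincount(v.astype(int))
        p = p[p > 0] / p.sum()
        h = -np.sum(p * np.log(p))  # entropy of the discrete factor
        scores.append((mis[0] - mis[1]) / h)
    return float(np.mean(scores))
```

A well-disentangled code puts each factor's information into a single latent dimension, driving the normalized gap toward 1.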

Page 18:

Semi-StyleGAN: Fine-Grained Tuning

New architecture with the same loss, for semantic fine-grained image editing

Coarse-Grained Fine-Grained

Page 19:

Semi-StyleGAN: CelebA (256x256)

Li et al. ICML 2020.

We randomly choose photos of some deep-learning researchers as test images

[Figure: factors (g₁, g₂); original image I; I′: eyebrow, smile; I″: eyebrow, grin; one factor is edited (counterfactual) while the other is held fixed (fixed-value)]

Page 20:

Semi-StyleGAN: Isaac3D (512x512)

Li et al. ICML 2020.

We shift the robot position to the right side and attach an unseen object to it in the test images

[Figure: factors (g₁, g₂) = (wall_color, obj_color); original image I, counterfactual edit I′, and fixed-value variant I″]

Page 21:

Compositional Representations

Learning Disentanglement

Learning Keypoints

Learning Causal Graphs

Page 22:

[Figure: tabletop manipulation scene with a target object and a goal position]

Representations for multi-step reasoning in Robotics under physical and semantic constraints

Page 23:
Page 24:
Page 25:
Page 26:

choose action sequence a_t, …, a_{t+H}

s₀ → dynamics: s′ ~ f(· | s, a)

[Deisenroth et al, RSS’07], [Guo et al, NeurIPS’14], [Watter et al, NeurIPS’15], [Finn et al, ICRA’17], …

Model-based learning
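The loop above (choose an action sequence a_t, …, a_{t+H} and score it under the learned dynamics s′ ~ f(· | s, a)) can be sketched as random-shooting MPC. This is a minimal sketch with a toy point-mass model; the function names, the uniform action prior, and the deterministic one-step model are illustrative assumptions, not details from the cited papers.

```python
import numpy as np

def plan_random_shooting(s0, dynamics, cost_fn, horizon=10,
                         n_samples=256, action_dim=2, rng=None):
    """Random-shooting MPC: sample candidate action sequences, roll
    each through the learned dynamics, keep the cheapest sequence."""
    rng = np.random.default_rng(rng)
    # Candidate sequences a_t, ..., a_{t+H}, one per row.
    candidates = rng.uniform(-1.0, 1.0,
                             size=(n_samples, horizon, action_dim))
    costs = np.zeros(n_samples)
    for i, seq in enumerate(candidates):
        s = np.asarray(s0, dtype=float)
        for a in seq:
            s = dynamics(s, a)      # one-step prediction s' = f(s, a)
            costs[i] += cost_fn(s)  # accumulate predicted cost
    return candidates[np.argmin(costs)]

# Toy stand-in for a learned model: integrator dynamics, quadratic cost.
goal = np.array([1.0, 1.0])
best = plan_random_shooting(
    s0=np.zeros(2),
    dynamics=lambda s, a: s + 0.1 * a,
    cost_fn=lambda s: float(np.sum((s - goal) ** 2)),
    rng=0,
)
```

Replanning at every step while executing only the first action of `best` turns this open-loop search into closed-loop MPC.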

Page 27:

data ↑learning ↑

?

[Janner et al. ICRA’19]

[Agrawal et al. ICRA’16]

[Ebert et al. CoRL’17]

[Deisenroth et al. RSS’07]

Model-based learning

Page 28:

effect code c

CAVIN: Hierarchical planning in learned latent spaces

s0

CAVIN Planner

motion code z

Leverage Hierarchical Abstraction in Action Space

Without Hierarchical Supervision

Page 29:

effect code c

CAVIN: Hierarchical planning in learned latent spaces

s0

CAVIN Planner

motion code z

subgoals

Page 30:

effect code c

CAVIN: Hierarchical planning in learned latent spaces

s0

CAVIN Planner

motion code z

actions

Page 31:

meta-dynamics

CAVIN: Hierarchical planning in learned latent spaces

s0

choose c ~ p(c)

Page 32:

meta-dynamics

CAVIN: Hierarchical planning in learned latent spaces

s0

choose c ~ p(c)

s_{t+T}, s_{t+2T}, s_{t+3T}

Page 33:

action generator · meta-dynamics

Hierarchical planning in learned latent spaces

choose z ~ p(z)

s0

choose c ~ p(c)

Page 34:

action generator · meta-dynamics · dynamics

CAVIN: Hierarchical planning in learned latent spaces

choose z ~ p(z)

s0

choose c ~ p(c)

a_{t+1}, a_{t+2}, a_{t+3}
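The two-level CAVIN-style procedure, first sampling effect codes c to pick a subgoal via the meta-dynamics, then sampling motion codes z whose generated actions reach it, might look like the following sketch. All networks are toy stand-ins, shapes are illustrative, and the iterative CEM-style refinement a real planner would use is omitted.

```python
import numpy as np

def hierarchical_plan(s0, meta_dynamics, action_generator, dynamics,
                      cost_fn, n_effects=64, n_motions=64, code_dim=2,
                      rng=None):
    """Two-level latent-space planning sketch: pick an effect code c by
    scoring subgoals s'' = h(s, c), then pick a motion code z whose
    actions a = g(s, c, z) best reach that subgoal."""
    rng = np.random.default_rng(rng)
    # High level: sample effect codes c ~ p(c), score predicted subgoals.
    cs = rng.standard_normal((n_effects, code_dim))
    subgoals = [meta_dynamics(s0, c) for c in cs]
    c = cs[np.argmin([cost_fn(sg) for sg in subgoals])]
    subgoal = meta_dynamics(s0, c)
    # Low level: sample motion codes z ~ p(z), simulate action segments.
    zs = rng.standard_normal((n_motions, code_dim))
    def reach_err(z):
        s = np.asarray(s0, dtype=float)
        for a in action_generator(s0, c, z):
            s = dynamics(s, a)          # low-level rollout
        return float(np.sum((s - subgoal) ** 2))
    z = zs[np.argmin([reach_err(z) for z in zs])]
    return c, z, action_generator(s0, c, z)

# Toy stand-ins for the learned modules.
goal = np.array([1.0, 1.0])
c, z, actions = hierarchical_plan(
    s0=np.zeros(2),
    meta_dynamics=lambda s, c: s + np.tanh(c),
    action_generator=lambda s, c, z: [(c + 0.5 * z) / 3.0] * 3,
    dynamics=lambda s, a: s + a,
    cost_fn=lambda s: float(np.sum((s - goal) ** 2)),
    rng=0,
)
```

The task cost only ever touches the high level; the low level is scored purely by how well it realizes the chosen subgoal, which is what lets the hierarchy plan without hierarchical supervision.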

Page 35:

Learning with cascaded variational inference

task-agnostic interaction

Inference networks: q_h(c | s, s″) and q_g(z | s, c, a)

Meta-dynamics: h(s″ | s, c); action generator: g(a | s, c, z)

Latent codes: c ~ N(μ_c, Σ_c), z ~ N(μ_z, Σ_z)
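Training the inference networks q_h and q_g uses standard variational machinery; below is a minimal sketch of the two generic ingredients (reparameterized sampling from N(μ, Σ) and the diagonal-Gaussian KL term), not CAVIN's full cascaded objective.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); writing the
    sample this way lets gradients flow through mu and log_var in the
    variational objective."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dims;
    this regularizes each inference network toward the prior p(c), p(z)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
```

In the cascaded setup, q_h(c | s, s″) is regularized toward p(c) with a reconstruction term through the meta-dynamics h, and q_g(z | s, c, a) toward p(z) with a reconstruction term through the action generator g.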

Page 36:

Visual observation from a Kinect2 sensor → preprocess → state s_t → CAVIN Planner → action [x, y, Δx, Δy]

Page 37:

Tasks

crossing: move the target object to the goal position across grey tiles.

insertion: move the target object to the goal position without traversing red tiles.

clearing: clear all objects within the area of blue tiles.

Page 38:

Quantitative Evaluation

[Bar chart: success rate (%) of MPC, CVAE-MPC, SeCTAR, CAVIN, and CAVIN-Real under dense and sparse rewards; annotated margins of 15% and 12%]

MPC (Guo et al. ’14; Agrawal et al. ’16; Finn et al. ’17); CVAE-MPC (Ichter et al. ’18); SeCTAR (Co-Reyes et al. ’18)

Hierarchical latent-space dynamics → better performance with sparse reward signal

Averaged over 3 tasks with 1000 test instances each

Page 39:

Move 2 obstacles (5×)

Page 40:

Get around (5×)

Page 41:

Open path (5×)

Page 42:

Compositional Representations

Learning Disentanglement

Learning Keypoints

Learning Causal Graphs

Page 43:

Composition through Keypoints

Interpretable: He et al. 2017; Kreiss et al. 2019; Lin et al. 2020; Spurr et al. 2020

Unsupervised: Tang et al. 2019; Christiansen et al. 2019; Bian et al. 2019; He et al. 2019; Chen et al. 2019

Page 44:


Page 45:

Learning Keypoints From Video
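Unsupervised keypoint learners typically read coordinates out of detector heatmaps with a differentiable soft-argmax; below is a minimal sketch (the (k, h, w) layout and the [-1, 1] coordinate convention are illustrative assumptions, not details from the papers above).

```python
import numpy as np

def soft_argmax_keypoints(heatmaps, temperature=1.0):
    """Differentiable keypoint extraction: softmax each channel's
    heatmap into a spatial distribution, then take its expected (x, y)
    coordinate in [-1, 1]. Input shape (k, h, w), output (k, 2)."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1) / temperature
    flat = np.exp(flat - flat.max(1, keepdims=True))  # stable softmax
    prob = (flat / flat.sum(1, keepdims=True)).reshape(k, h, w)
    ys = np.linspace(-1, 1, h)
    xs = np.linspace(-1, 1, w)
    y = (prob.sum(2) * ys).sum(1)    # expected row coordinate
    x = (prob.sum(1) * xs).sum(1)    # expected column coordinate
    return np.stack([x, y], axis=1)
```

Because the expectation is smooth in the heatmap values, the whole detector-to-coordinate pipeline can be trained end to end from a reconstruction loss on video frames.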

Page 46:


Page 47:


Page 48:


Page 49:


Page 50:


Page 51:

Learning Keypoints From Video

[Results on BBC Pose, Human3.6M, and CelebA/MAFL]

Page 52:


Page 53:

Unsupervised Keypoints: Batch RL

Large Set of Task Demonstrations

Policy Learning without Interaction
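In its simplest form, policy learning without interaction from a large, fixed set of task demonstrations is behavior cloning; below is a linear ridge-regression stand-in for the BC baseline (the proportional-controller expert and the toy data are illustrative assumptions, not the experiments in the talk).

```python
import numpy as np

def behavior_clone(states, actions, l2=1e-3):
    """Batch policy learning without interaction: fit a linear policy
    a = W [s; 1] to fixed demonstrations by ridge regression."""
    X = np.hstack([states, np.ones((len(states), 1))])  # add bias term
    W = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]),
                        X.T @ actions)
    return lambda s: np.append(s, 1.0) @ W

# Demonstrations from a proportional controller pushing toward a goal.
rng = np.random.default_rng(0)
goal = np.array([1.0, 1.0])
S = rng.uniform(-1, 1, size=(500, 2))
A = 0.5 * (goal - S)                   # expert actions
policy = behavior_clone(S, A)
```

Methods like BCQ and IRIS extend this batch setting by constraining or structuring the learned policy to stay close to the demonstration distribution, avoiding value overestimation on actions the data never covers.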

Page 54:


Page 55:

Unsupervised Keypoints: Batch RL + Unsupervised Representation Learning

[Bar chart: task accuracy (%) of BC, BCQ, and IRIS]

Page 56:

Unsupervised Keypoints: Video Prediction

Page 57:


Page 58:

Compositional Representations

Learning Disentanglement

Learning Keypoints

Learning Causal Graphs

Page 59:

Learning Causality

Photo from Wu et al., Learning to See Physics via Visual De-animation

• Intuitive Physics
• vs. Reinforcement Learning: generalization, goal specification, sample efficiency
• vs. Analytical Physics Models: underlying dynamics is uncertain or unknown; simulation is too time-consuming; partial observation

Page 60:


Page 61:

Learning Causality

• Intuitive Physics

• State Representation? Keypoints vs. 6-DoF pose

Photo from Jakab et al., Unsupervised Learning of Object Landmarks through Conditional Image Generation

Photo from Tremblay et al., Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects

Page 62:

From left to right: (1) input image, (2) predicted keypoints, (3) overlay, (4) heatmap from the keypoints, (5) reconstructed target image
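Step (4), turning predicted keypoints back into a heatmap, is typically done by rendering an isotropic Gaussian per keypoint; below is a minimal sketch (the grid size and σ are illustrative assumptions).

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, h=64, w=64, sigma=0.05):
    """Render each (x, y) keypoint in [-1, 1]^2 as an isotropic
    Gaussian heatmap of shape (k, h, w): a bottleneck that tells the
    reconstruction decoder 'where' while discarding appearance."""
    ys = np.linspace(-1, 1, h)[None, :, None]   # (1, h, 1)
    xs = np.linspace(-1, 1, w)[None, None, :]   # (1, 1, w)
    kx = keypoints[:, 0][:, None, None]         # (k, 1, 1)
    ky = keypoints[:, 1][:, None, None]
    d2 = (xs - kx) ** 2 + (ys - ky) ** 2        # squared distance grid
    return np.exp(-d2 / (2 * sigma ** 2))
```

Composed with a soft-argmax detector, this rendering closes the loop: image → keypoints → heatmaps → reconstructed image, trainable end to end.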

Page 63:


Page 64:


Page 65:


Page 66:

Learning Causality: Extrapolation

Page 67:

Learning Causality: Counterfactual

Page 68:


Page 69:


Page 70:


Page 71:

Unsupervised Representations towards Counterfactual Predictions

Animesh Garg · [email protected] // pair.toronto.edu // Twitter: @Animesh_Garg