
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

Tianfan Xue*, Jiajun Wu*, Katie Bouman, Bill Freeman

NIPS 2016

VGG Reading Group, 24 Feb 2017, Ankush Gupta

Task: future frame prediction

Given Frame 1, predict Frame 2.

What is the problem?

Frame 1 → Deterministic neural network → Prediction

Deterministic predictions fail to model uncertainty.
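Here is a toy illustration of the problem. It uses an assumed 1-D setup that is not taken from the slides. A bright pixel in Frame 1 moves one step left or one step right with equal probability. Under squared error, the best single deterministic output is the pixel-wise average of the possible futures. That average matches neither future and shows up as blur.

```python
# Toy illustration (assumed setup): a 5-pixel 1-D "frame" whose bright pixel
# moves one step left or one step right with equal probability.
future_left = [0, 1, 0, 0, 0]
future_right = [0, 0, 0, 1, 0]
futures = [future_left, future_right]


def expected_sq_error(pred, futures):
    """Average squared error of one fixed prediction over all possible futures."""
    total = 0.0
    for fut in futures:
        total += sum((p - f) ** 2 for p, f in zip(pred, fut))
    return total / len(futures)


# The pixel-wise mean of the futures minimizes expected squared error.
mean_pred = [(a + b) / 2 for a, b in zip(future_left, future_right)]

print(mean_pred)                                # [0.0, 0.5, 0.0, 0.5, 0.0]: two half-bright ghosts
print(expected_sq_error(mean_pred, futures))    # 0.5
print(expected_sq_error(future_left, futures))  # 1.0
```

The averaged prediction scores better than committing to either real future, even though it is not a plausible frame. This is why a single deterministic output cannot represent the uncertainty.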


Sample different future frames

Input frame → Synthesis network → Sampled future frame

Outline: Main idea, Network structure, What the network learns, Result

Input random motion vector $z \sim p(z)$


Input frame + $z$ → Sampled future frame
Input frame + a different $z$ → Another sampled future frame

Synthesize using different transformations

Input frame → Segments → Transformed segments (the transformation is set by the random motion vector $z \sim p(z)$)


Training

Input frame + Future frame (ground truth) → Encoding network → Motion vector $z$
Input frame + Motion vector $z$ → Synthesis network → Sampled future frame


Synthesis network → Future frame (prediction)

Training samples (label-free): an input frame and its future frame (ground truth)
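Here is a minimal sketch of why no labels are needed. It assumes that consecutive frames of an unlabeled video serve as (input frame, ground-truth future frame) pairs. `make_training_pairs` is a hypothetical helper, and the frame spacing used in the paper may differ.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_training_pairs(video: Sequence[Frame]) -> List[Tuple[Frame, Frame]]:
    """Turn a frame sequence into (input frame, ground-truth future frame) pairs.

    No human annotation is involved: the target for frame t is simply frame t + 1.
    """
    return [(video[t], video[t + 1]) for t in range(len(video) - 1)]


video = ["f0", "f1", "f2", "f3"]  # placeholder frames
print(make_training_pairs(video))
# [('f0', 'f1'), ('f1', 'f2'), ('f2', 'f3')]
```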


Prediction: future frame $I_{\text{pred}}$; ground truth: future frame $I_{\text{gt}}$

Objective function: $\|I_{\text{pred}} - I_{\text{gt}}\| + D_{\mathrm{KL}}(z \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I}))$

First term: reconstruction loss


Second term: KL-divergence loss
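Here is a minimal sketch of the objective. As in a standard variational autoencoder, it assumes that the encoding network outputs the mean `mu` and log-variance `log_var` of a diagonal Gaussian over $z$. With that assumption the KL term has a closed form. The Euclidean norm for the reconstruction term and the equal weighting of the two terms are also assumptions. The function names are illustrative.

```python
import math
from typing import Sequence


def reconstruction_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    """||I_pred - I_gt|| over flattened pixels (Euclidean norm is an assumption)."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)))


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form D_KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m ** 2 - 1.0 - lv for m, lv in zip(mu, log_var))


def objective(pred, gt, mu, log_var) -> float:
    # Reconstruction term plus KL term, equally weighted as in the formula on the slide.
    return reconstruction_loss(pred, gt) + kl_to_standard_normal(mu, log_var)


print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))    # 0.0: the Gaussian equals the prior
print(kl_to_standard_normal([1.0], [0.0]))              # 0.5
print(objective([0.0, 3.0], [4.0, 3.0], [1.0], [0.0]))  # 4.0 + 0.5 = 4.5
```

The KL term pulls the encoded distribution of $z$ toward $\mathcal{N}(\mathbf{0}, \mathbf{I})$. That is the distribution $z$ can later be sampled from when no ground-truth future frame is available.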


Variational Autoencoder [Kingma and Welling, 2014]
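In a variational autoencoder, $z$ is drawn during training with the reparameterization trick, so that gradients can flow back into the encoding network. Below is a minimal sketch in plain Python. It assumes the usual VAE `mu`/`log_var` parameterization, and the function name `sample_z` is hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_z(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    The randomness lives in eps, so z is a deterministic, differentiable
    function of mu and log_var (the encoder outputs).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)
mu, log_var = [2.0, -1.0], [math.log(0.01), math.log(0.01)]  # sigma = 0.1
draws = [sample_z(mu, log_var, rng) for _ in range(2000)]
mean0 = sum(d[0] for d in draws) / len(draws)
print(round(mean0, 1))  # approximately 2.0
```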

Testing

Input frame → Synthesis network → Future frame $I_{\text{pred}}$ (prediction)
The encoding network and the ground-truth future frame $I_{\text{gt}}$ are not used at test time.


Input random motion vector $z \sim p(z)$
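Here is a minimal sketch of test-time sampling. Each draw $z \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ gives a different future for the same input frame. In the sketch, `toy_synthesis` is a hypothetical stand-in that just shifts a 1-D frame by `round(z[0])` pixels. In the paper this role is played by the learned synthesis network. The prior is assumed to be the standard normal that appears in the KL term.

```python
import random
from typing import List, Sequence


def toy_synthesis(frame: Sequence[int], z: Sequence[float]) -> List[int]:
    """Hypothetical stand-in for the synthesis network: circularly shift the
    1-D frame by round(z[0]) pixels (positive = to the right)."""
    n = len(frame)
    shift = round(z[0])
    return [frame[(i - shift) % n] for i in range(n)]


def sample_futures(frame, num_samples, dim_z, seed=0):
    """Test time: no encoder and no ground truth, so draw z ~ N(0, I) instead."""
    rng = random.Random(seed)
    futures = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim_z)]
        futures.append(toy_synthesis(frame, z))
    return futures


frame = [0, 0, 0, 1, 0, 0, 0]
for future in sample_futures(frame, num_samples=5, dim_z=2):
    print(future)  # each line is one sampled future for the same input
```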

Synthesize by transforming segments

Input frame → Find segments → Transform segments (using the input random motion vector $z$) → Future frame
Real output from our network


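Movement can be synthesized through convolution

Image segments are convolved with motion kernels: $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ leaves a segment in place, while $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ shifts it by one pixel diagonally.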
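The two 3×3 kernels on this slide can be checked directly. Below is a minimal pure-Python sketch of 2-D convolution with zero padding; the names are illustrative. The kernel with a centred 1 reproduces the segment. The kernel with a corner 1 moves the segment one pixel diagonally. With the flipped-kernel convention used here, the pixel moves up and to the right. Deep learning libraries usually compute cross-correlation (no flip), which would move it down and to the left instead.

```python
from typing import List

Grid = List[List[float]]


def conv2d_same(image: Grid, kernel: Grid) -> Grid:
    """True 2-D convolution (kernel flipped) with zero padding; output size = input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # kernel centre
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    # Flipped indexing: kernel offset (a - ch, b - cw) reads image[i - offset].
                    y, x = i - (a - ch), j - (b - cw)
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # first kernel on the slide
diagonal = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]  # second kernel on the slide

segment = [[0.0] * 5 for _ in range(5)]
segment[2][2] = 1.0  # a one-pixel "segment" in the middle

assert conv2d_same(segment, identity) == segment  # no motion
moved = conv2d_same(segment, diagonal)
print([(i, j) for i in range(5) for j in range(5) if moved[i][j]])  # [(1, 3)]
```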


Applying motion to each image segment


Motion kernels

The decoding network generates a motion kernel for each corresponding segment.

Motion vector $z$ → Decoding net → Motion kernels [De Brabandere et al. 2016] [Finn et al. 2016]
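Here is a 1-D sketch of the per-segment idea. It uses two hand-made segments and hand-picked kernels in place of the learned ones. Each segment is convolved with its own kernel, and the moved segments are summed. In the network, the segments are feature maps and the kernels come from the decoding net applied to $z$. Summing the results here is a simplification of whatever layers produce the final frame. `conv1d_same` and `cross_convolve` are hypothetical names.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D convolution (kernel flipped) with zero padding; output length = input length."""
    n, k = len(signal), len(kernel)
    c = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for a in range(k):
            src = i - (a - c)  # flipped kernel, as in true convolution
            if 0 <= src < n:
                acc += kernel[a] * signal[src]
        out.append(acc)
    return out


def cross_convolve(segments, kernels):
    """Convolve each segment with its own kernel, then sum the moved segments."""
    assert len(segments) == len(kernels), "one kernel per segment"
    moved = [conv1d_same(s, k) for s, k in zip(segments, kernels)]
    return [sum(vals) for vals in zip(*moved)]


seg_a = [0, 1, 0, 0, 0, 0, 0]  # one object
seg_b = [0, 0, 0, 0, 0, 2, 0]  # another object, different intensity
# Kernels that one z might produce: keep seg_a still, move seg_b one pixel left.
print(cross_convolve([seg_a, seg_b], [[0, 1, 0], [1, 0, 0]]))
# [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
```

A different $z$ would yield different kernels, for example moving both segments right, and hence a different future frame from the same segments.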

Synthesis network

Input frame + Motion vector $z$ → Synthesis network → Future frame


What is encoded in the motion vector?

Encoding network → Motion vector $z$: upward motion when changing this dimension


Each dimension encodes a type of motion

Motion vector $z$: leg motion when changing this dimension
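The "change one dimension" experiment can be set up as a latent traversal. Keep $z$ fixed except for one coordinate, then feed each variant to the synthesis network with the same input frame. Below is a minimal sketch of building those vectors; the helper name and the value range are assumptions.

```python
from typing import List, Sequence


def traverse_dimension(z: Sequence[float], dim: int, values: Sequence[float]) -> List[List[float]]:
    """Copies of z in which only coordinate `dim` is replaced by each value.

    Synthesizing from these (with the same input frame) isolates the motion
    controlled by that single dimension.
    """
    out = []
    for v in values:
        z_new = list(z)  # copy so the original z is left unchanged
        z_new[dim] = v
        out.append(z_new)
    return out


z = [0.0, 0.0, 0.0]
for row in traverse_dimension(z, dim=1, values=[-2.0, 0.0, 2.0]):
    print(row)
# [0.0, -2.0, 0.0]
# [0.0, 0.0, 0.0]
# [0.0, 2.0, 0.0]
```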


Results: toy example

• Simulated shapes

• Training samples


Network automatically detects segments

Input → Learned segments (triangles, circles)


Network learns the correlation between appearance and motion

Input → Sampled next frame; the sample distribution is compared with the ground-truth distribution


Results: real-world images

Input → Sampled future frames


Challenge: large motion


Input → Two sampled future frames: artifacts appear when motion is large

Mechanical Turk study to assess synthesis quality

Labeled as real:
Baseline (Transfer flow): 25.5%
Our method: 31.3%

An ideal synthesis algorithm achieves 50%.
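Here is a quick reading of these numbers, taking the slide's figures at face value. The 50% ideal corresponds to raters being unable to tell synthesized frames from real ones. This reading assumes each judgment is a two-way choice, so chance is 50%. The sketch below only computes each method's gap to that ideal.

```python
# Percentages labeled as "real" in the Mechanical Turk study (values from the slide).
results = {"Transfer flow (baseline)": 25.5, "Our method": 31.3}
IDEAL = 50.0  # indistinguishable from real frames: raters at chance (assumed two-way choice)

for name, pct in results.items():
    gap = IDEAL - pct
    print(f"{name}: {pct:.1f}% labeled real, {gap:.1f} points below ideal")
```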


Contributions

• Sample multiple future frames that are consistent with the input

• Synthesize frames by transforming segments

• Learn a motion representation without supervision
