DEEP LEARNING FOR SLOW MOTION VIDEO · 2019-05-24 · DEEP LEARNING CONTAINERS Took learnings from...

Preview:

Citation preview

Thomas True, Senior Applied Engineer for Professional Video

Andrew Page, Director Advanced Technologies

DEEP LEARNING FOR SLOW MOTION VIDEO

2

Deep Learning

3

WHAT IS DEEP LEARNING?

Artificial Intelligence (AI): general coverall for machines doing interesting things

Machine Learning (ML):computers complete tasks without explicit programming

Neural Networks (NN):one technique to achieve ML

Deep Learning (DL):adds “hidden layers” to Neural Networks to solve complex problems

Great explanation: https://goo.gl/hkayWG

The differences between AI, ML & DL

4

DEEP LEARNING APPLICATION DEVELOPMENT

UntrainedNeural Network

Model

Deep Learning

Framework

TRAININGLearning a new capability

from existing data

Trained ModelNew Capability

App or ServiceFeaturing Capability

INFERENCEApplying this capability

to new data

Trained ModelOptimized for Performance

5

WHAT PROBLEM ARE YOU SOLVING?Defining the AL/DL Task

QUESTION AI/DL TASK

Is “it” present

or not?Detection

What type of thing

is “it”?Classification

To what extent

is “it” present?Segmentation

What is the likely

outcome? Prediction

What will likely

satisfy the objective?Recommendation

What would be a

new variant?Generation

INPUTSEXAMPLE

OUTPUTS

Text Data Images

AudioVideo

Object Identification(Labeling)

Object Detection

Feature Tracking

Denoised Pixel Values

Animation Pose Selection

Texture Creation

6

WHAT IS “SUCCESSFUL” DEEP LEARNING?

What characteristics do successful deep learning applications share?

1. Define the problem; Make sure DL is appropriate

2. Collect a lot of the right data (labelled data is even better)

3. Ensure ROI; choose a large task

Many things but not “all the things!”

7

Slow Motion Video

8

Defining the Problem

9

MULTI-FRAME VARIABLE LENGTH INTERPOLATIONInterpolate an Arbitrary Number of Frames Between Two Input Frames

?

T = 0 T = 1

? ? ? ?

[Mahajan et al. SIGGRAPH’09]

10

SINGLE-FRAME INTERPOLATIONInterpolate Only the Single Frame Inbetween

T = 0 T = 1T = 0.5

[Long et al. ECCV 2016, Niklaus et al. CVPR 2017, Liu et al. ICCV 2017]

11

RECURSIVE SINGLE-FRAME INTERPOLATIONNo Time Order for Real-Time Applications

T = 0 T = 1Step 1Step 2 Step 2

12

RECURSIVE SINGLE-FRAME INTERPOLATIONInefficient for Arbitrary Number of Frames

? ?

T = 0 T = 1T = 1/3 T = 2/3

13

MULTI-FRAME VARIABLE LENGTH INTERPOLATIONCan Interpolate in Correct Time Order

?

T = 0 T = 1

? ? ? ?

14

MULTI-FRAME VARIABLE LENGTH INTERPOLATIONCan Interpolate in Reverse Time Order

?

T = 0 T = 1

? ? ? ?

15

MULTI-FRAME VARIABLE LENGTH INTERPOLATIONCan Interpolate All at Once

?

T = 0 T = 1

? ? ? ?

16

MULTI-FRAME VARIABLE LENGTH INTERPOLATIONCan Interpolate Only Selected Frames

?

T = 0 T = 1

? ? ? ?

T = 1/3 T = 2/3

17

Super SloMo

18

SUPER SLOMONVIDIA Research Published at IEEE CVPR 2018

https://arxiv.org/pdf/1712.00080.pdf

Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan

Kautz. Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. IEEE

CVPR 2018

19

SUPER SLOMOFrame Synthesis Using Optical Flow

T = 0 T = 1T = t ∈(0,1)

20

SUPER SLOMOVisibility Information

T = 0 T = 1T = t

𝜶 = 1 𝜶 = 0

?

21

OPTICAL FLOWImage Motion Correspondence

Color key

Baker et al. IJCV’11

Input Optical flow

Data credit: Liu et al. CVPR’08

22

OPTICAL FLOWCNNs Estimates Flow Between Input Frames

inputimages

bi-directflow

optical flow computation between inputs

FlowNet, FlowNet2, SpyNet, PWC-Net…

23

?

SUPER SLOMO OPTICAL FLOWChallenge: Flow from Unknown to Input

23

T = 0 T = 1T = t

24

SUPER SLOMO OPTICAL FLOWSimplifed 1D Setting

𝑇 = 0 𝑇 = 1

𝐹1→0

𝐹0→1

25

SUPER SLOMO OPTICAL FLOWGoal: Compute Motion Vector that Combines Forward and Backward Motion

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹𝑡→1

𝐹𝑡→0

26

SUPER SLOMO OPTICAL FLOWSolution: Borrow from Bi-Directional Flow

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹𝑡→0

𝐹1→0

𝑡𝐹1→0

Same spatial location

27

SUPER SLOWMO OPTICAL FLOWSolution: Borrow from the Bi-Directional Optical Flow

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹0→1

(1 − 𝑡)𝐹0→1

𝐹𝑡→0

28

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹𝑡→0

𝐹0→1

(𝑡 − 1)𝐹0→1

SUPER SLOWMO OPTICAL FLOWSolution: Borrow from the Bi-Directional Optical Flow

29

SUPER SLOWMO OPTICAL FLOWBilinear Combination of Two Approximations

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1(𝑡 − 1)𝐹0→1

𝑡𝐹1→0

𝐹𝑡→0 ≈ 𝑡 𝑡𝐹1→0 + 1 − 𝑡 (𝑡 − 1)𝐹0→1

𝐹𝑡→0

30

SUPER SLOMO NETWORK ARCHITECTUREAdd Flow Approximation

inputimages

bi-directflow

optical flow computation between input

flowapprox

Deterministic, parameter-free

𝐹𝑡→0 ≈ 𝑡 𝑡𝐹1→0 + 1 − 𝑡 (𝑡 − 1)𝐹0→1

31

ASSUMPTION: SMOOTH MOTION FIELD

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹1→0

𝐹0→1

32

OPTICAL FLOWMotion Fields are Piecewise Smooth

Input

Ground truth

optical flow

MPI-Sintel [Butler et al. ECCV’12] HAMA [Liu et al. CVPR’08] Middlebury [Baker et al. ICCV’07]

33

PROBLEM: MOTION BOUNDARIES

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹0→1

Motion boundary

34

PROBLEM: MOTION BOUNDARIES

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹1→0

𝐹0→1

Motion boundary

35

PROBLEM: MOTION BOUNDARIES

𝑇 = 0 𝑇 = 𝑡 𝑇 = 1

𝐹1→0

𝐹0→1

𝐹𝑡→0

(𝑡 − 1)𝐹0→1

𝑡𝐹1→0

?

Motion boundary

36

FLOW REFINEMENT

flowapprox

Flow refinement module

refinedflow

input images

warped images

37

SUPER SLOMO OPTICAL FLOWRefinement Near Motion Boundaries

intermediate

flow approximations

intermediate

refined flowdifferenceinput

38

FLOW REFINEMENT AND VISIBILITY REASONING

flowapprox

Flow refinement module

refinedflow

visibilitymaps

input images

warped images

prediction

39

intermediate optical flow

inputimages

warped input images

visibilitymaps

predictionw/ visibility maps

predictionw/o visibility maps

መ𝐼𝑡 = 𝛼𝑔 𝐼0, 𝐹𝑡→0 + 1 − 𝛼 𝑔(𝐼1, 𝐹𝑡→1)

𝐼0

𝐼1

𝐹𝑡→0

𝐹𝑡→1

𝑔 𝐼0, 𝐹𝑡→0

𝑔(𝐼1, 𝐹𝑡→1)

𝛼

1 − 𝛼

መ𝐼𝑡

g: warping function

PREDICTION USE FLOW AND VISIBILITY MAP

40

PREDICTION USING FLOW AND VISIBILITY MAP

intermediate optical flow

inputimages

warped input images

visibilitymaps

መ𝐼𝑡 = 𝛼𝑔 𝐼0, 𝐹𝑡→0 + 1 − 𝛼 𝑔(𝐼1, 𝐹𝑡→1)

𝐼0

𝐼1

𝐹𝑡→0

𝐹𝑡→1

𝑔 𝐼0, 𝐹𝑡→0

𝑔(𝐼1, 𝐹𝑡→1)

𝛼

1 − 𝛼

predictionw/ visibility maps

predictionw/o visibility maps

g: warping function

41

OVERALL NETWORK ARCHITECTURECombination of Two CNNs

flowapprox

refinedflow

visibilitymaps

inputimages

bi-directflow

prediction

at each time step t

optical flow computation between input arbitrary-time image synthesis

input images

warped images

42

TRAINING DATASET

YouTube 240-fps Adobe 240-fps*

*provided by Oliver Wang

43

GENERATIVE ADVERSARIAL NETWORK (GAN)G tries to fool D

The Art CriticThe Art Forger

44

PERFORMANCERunning Time (PyTorch)

44

Training (6 V100) Inference (1 V100)

~1000 videos, 0.3M frames:

10 days

7 1280*720 images:

0.36s

•@Thomas True•Have we done any perf analysis of the existing GPUD4V implementations on Matrox at 8k to see where the bottlenecks actually are?

45

EVS

46

EVS AI ENGINE

AI generated

AI generated

AI generated

AI generated

AI generated

AI generated

Regular camera

Artificial Intelligence hallucinated intermediate images

Regular

feed

Slow Motion

feed

AI Engine

47

EVS AI ENGINECreating On-Demand Slow Motion

REPLAY

SERVER

LSM

AI ENGINE

IN = X VIDEO

FRAMES

OUT = NX VIDEO

FRAMES

Commands

Instructions are sent

to the AI engine

Normal speed

baseband video

or file

Slow motion video

baseband feed

or file

PreviewINSERTION

OF TC, …

LTC, UMD, …

information

48

EVS AI ENGINEMimicking a Supermotion Camera

CAMERA

AI ENGINE

IN = X VIDEO

FRAMES

OUT = NX VIDEO

FRAMES

Normal speed

baseband

video

Slow motion

video in

multiple

phases

REPLAY

SERVER

LSM

Commands

49

NVIDIA NGX

50

NVIDIA NGX: DL FOR CREATIVE APPLICATIONS

Delivering Deep Learning Research Into Creative Applications

• The framework to make NV DL research available in end user applications through a common service interface

• DL Features: SloMo, Up-Res, InFilling, DLAA …

• Created, trained and updated by NVIDIA, will be available to any application running on an NGX GPU

• NGX SDK - ISVs can add these DL features to their application without going through the expense of developing and training https://developer.nvidia.com/rtx/ngx

51

Results

52

QUALITY ISSUESMotion Blur

Objects

move too

fast

Motion blur

EXPOSURE TIME

SHARPENING TECHNIQUES

53

QUALITY ISSUESMissing Information

Objects

move too

fast

Missing

information

SIGNIFICANTLY REDUCED

BY DOMAIN SPECIFIC TRAINING

54

QUALITY ISSUESMissing Information

55

HOW DO WE IMPROVE THE QUALITY?

More training data with variety of frame rates from 30 fps to 240 fps

Application Specific Training (Sports, Drama, etc.)

Post process filtering

More Training

56

Getting Started in DL

57

DEEP LEARNING CONTAINERS

Took learnings from scores of teams across NVIDIA and industry

Built containers to speed and ease deployment

Goals:

• Get engineers working on Deep Learning in minutes

• Ensure full GPU acceleration and latest

optimizations on all frameworks

• Isolate frameworks (some don’t play nice)

• Share, collaborate, and test applications across

different environments

57

58

VIRTUAL MACHINES VS. CONTAINERSMotivations

Packaging mechanism for applications

► Consistent and reproducible deployment

► Lightweight with faster startup than VMs

► Logical isolation from other applications (at the OS level)

No first-class support for GPUs was available in runtimes

Image credits

59

GPU SUPPORT IN DOCKER: 1.0

Open-source Project on GitHub since 2016>2MM downloads

>7.5K stars

>13 MM pulls of CUDA images from DockerHub

Enabled Various Use-cases► NGC optimized containers from NVIDIA

► Adopted by major deep learning frameworks

60

NVIDIA GPU CLOUDDeep Learning Across Platforms

NVIDIA TITAN (powered by NVIDIA

Volta or Pascal)

NVIDIA DGX-1, DGX-2 and

DGX Station

Amazon EC2 P3 instances with NVIDIA Volta

Use across workstation, local servers and cloud

To learn more about NVIDIA GPU Cloud, visit:https://nvidia.com/cloud

To sign up, go to:https://nvidia.com/ngcsignup

61

NVIDIA GPU CLOUDDownloadable Containers

Recommended