DEEP LEARNING FOR SLOW MOTION VIDEO
Thomas True, Senior Applied Engineer for Professional Video
Andrew Page, Director of Advanced Technologies
Deep Learning
WHAT IS DEEP LEARNING?
Artificial Intelligence (AI): general coverall for machines doing interesting things
Machine Learning (ML): computers complete tasks without explicit programming
Neural Networks (NN): one technique to achieve ML
Deep Learning (DL): adds “hidden layers” to Neural Networks to solve complex problems
Great explanation: https://goo.gl/hkayWG
The differences between AI, ML & DL
DEEP LEARNING APPLICATION DEVELOPMENT
TRAINING: learning a new capability from existing data
Untrained Neural Network Model + Deep Learning Framework → Trained Model (new capability)
INFERENCE: applying this capability to new data
Trained Model (optimized for performance) → App or Service featuring the capability
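The training-then-inference split above can be sketched end to end. This toy NumPy example is illustrative only (the linear model, synthetic data, and learning rate are stand-ins, not the deck's framework): it "trains" a capability from existing data, then applies the trained model to new input.

```python
import numpy as np

rng = np.random.default_rng(0)

# TRAINING: learn a capability (here, a toy linear map) from existing data.
X = rng.normal(size=(256, 3))          # existing data
true_w = np.array([2.0, -1.0, 0.5])    # the capability to be learned
y = X @ true_w
w = np.zeros(3)
for _ in range(500):                   # gradient descent on squared error
    grad = X.T @ (X @ w - y) / len(X)
    w -= 0.1 * grad

# INFERENCE: apply the trained model to new data.
x_new = np.array([1.0, 1.0, 1.0])
pred = x_new @ w                       # ≈ 2.0 - 1.0 + 0.5 = 1.5
```

In a real deployment the trained weights would additionally be optimized for performance (quantization, fused kernels) before inference, as the slide's pipeline indicates.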
WHAT PROBLEM ARE YOU SOLVING?
Defining the AI/DL Task
QUESTION → AI/DL TASK
Is “it” present or not? → Detection
What type of thing is “it”? → Classification
To what extent is “it” present? → Segmentation
What is the likely outcome? → Prediction
What will likely satisfy the objective? → Recommendation
What would be a new variant? → Generation
EXAMPLE INPUTS: Text, Data, Images, Audio, Video
EXAMPLE OUTPUTS: Object Identification (Labeling), Object Detection, Feature Tracking, Denoised Pixel Values, Animation Pose Selection, Texture Creation
WHAT IS “SUCCESSFUL” DEEP LEARNING?
What characteristics do successful deep learning applications share?
1. Define the problem; Make sure DL is appropriate
2. Collect a lot of the right data (labelled data is even better)
3. Ensure ROI; choose a large task
Many things but not “all the things!”
Slow Motion Video
Defining the Problem
MULTI-FRAME VARIABLE LENGTH INTERPOLATION
Interpolate an Arbitrary Number of Frames Between Two Input Frames
T = 0 … ? ? ? ? … T = 1
[Mahajan et al. SIGGRAPH’09]
SINGLE-FRAME INTERPOLATION
Interpolate Only the Single Frame In Between
T = 0 … ? (T = 0.5) … T = 1
[Long et al. ECCV 2016, Niklaus et al. CVPR 2017, Liu et al. ICCV 2017]
RECURSIVE SINGLE-FRAME INTERPOLATION
No Time Order for Real-Time Applications
T = 0 … T = 1: Step 1 yields the midpoint; Step 2 then runs on each half
RECURSIVE SINGLE-FRAME INTERPOLATION
Inefficient for an Arbitrary Number of Frames
T = 0 … ? (T = 1/3) … ? (T = 2/3) … T = 1
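The inefficiency is easy to demonstrate: recursively interpolating midpoints can only ever produce dyadic timestamps k/2^d, so frames at exactly T = 1/3 and T = 2/3 are unreachable. A small illustrative sketch (the function name and structure are mine, not the deck's algorithm):

```python
def recursive_times(depth):
    """Timestamps produced by recursively interpolating the midpoint
    between every adjacent pair of frames, `depth` times."""
    times = {0.0, 1.0}
    for _ in range(depth):
        snapshot = sorted(times)
        for a, b in zip(snapshot, snapshot[1:]):
            times.add((a + b) / 2.0)   # one single-frame interpolation step
    return sorted(t for t in times if 0.0 < t < 1.0)

print(recursive_times(2))   # [0.25, 0.5, 0.75]: 1/3 and 2/3 never appear
```

However deep the recursion runs, the set only densifies around dyadic fractions, which is why a multi-frame variable-length method is needed for arbitrary frame counts.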
MULTI-FRAME VARIABLE LENGTH INTERPOLATION
Can Interpolate in Correct Time Order
T = 0 … ? ? ? ? … T = 1
MULTI-FRAME VARIABLE LENGTH INTERPOLATION
Can Interpolate in Reverse Time Order
T = 0 … ? ? ? ? … T = 1
MULTI-FRAME VARIABLE LENGTH INTERPOLATION
Can Interpolate All at Once
T = 0 … ? ? ? ? … T = 1
MULTI-FRAME VARIABLE LENGTH INTERPOLATION
Can Interpolate Only Selected Frames
T = 0 … ? (T = 1/3) … ? (T = 2/3) … T = 1
Super SloMo
SUPER SLOMO
NVIDIA Research, Published at IEEE CVPR 2018
https://arxiv.org/pdf/1712.00080.pdf
Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz. Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. IEEE CVPR 2018.
SUPER SLOMO
Frame Synthesis Using Optical Flow
T = 0 … T = t ∈ (0, 1) … T = 1
SUPER SLOMO
Visibility Information
T = 0 (α = 1) … T = t (?) … T = 1 (α = 0)
OPTICAL FLOW
Image Motion Correspondence
Color key
Baker et al. IJCV’11
Input Optical flow
Data credit: Liu et al. CVPR’08
OPTICAL FLOW
CNNs Estimate Flow Between Input Frames
input images → optical flow computation between inputs → bi-directional flow
FlowNet, FlowNet2, SPyNet, PWC-Net, …
SUPER SLOMO OPTICAL FLOW
Challenge: Flow from the Unknown Frame to the Inputs
T = 0 … ? (T = t) … T = 1
SUPER SLOMO OPTICAL FLOW
Simplified 1D Setting
T = 0 … T = 1: forward flow F_0→1 and backward flow F_1→0
SUPER SLOMO OPTICAL FLOW
Goal: Compute the Motion Vectors that Combine Forward and Backward Motion
𝑇 = 0 𝑇 = 𝑡 𝑇 = 1
𝐹𝑡→1
𝐹𝑡→0
SUPER SLOMO OPTICAL FLOW
Solution: Borrow from the Bi-Directional Flow
T = 0 … T = t … T = 1: approximate F_t→0 with t·F_1→0, borrowing the flow F_1→0 at the same spatial location
SUPER SLOMO OPTICAL FLOW
Solution: Borrow from the Bi-Directional Optical Flow
T = 0 … T = t … T = 1: scale the forward flow to (1 − t)·F_0→1; the target F_t→0
SUPER SLOMO OPTICAL FLOW
Solution: Borrow from the Bi-Directional Optical Flow
T = 0 … T = t … T = 1: approximate F_t→0 with (t − 1)·F_0→1, borrowing the forward flow F_0→1
SUPER SLOMO OPTICAL FLOW
Bilinear Combination of the Two Approximations
T = 0 … T = t … T = 1: combine t·F_1→0 and (t − 1)·F_0→1
F_t→0 ≈ t·(t·F_1→0) + (1 − t)·((t − 1)·F_0→1)
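As a sanity check on the combination, a minimal NumPy sketch (array shapes and names are illustrative) applies the slide's formula to a constant motion field of one pixel per frame:

```python
import numpy as np

def approx_flow_t0(f01, f10, t):
    """The slide's bilinear combination:
    F_t→0 ≈ t·(t·F_1→0) + (1 − t)·((t − 1)·F_0→1)."""
    return t * (t * f10) + (1.0 - t) * ((t - 1.0) * f01)

# Constant motion of +1 px per frame: F_0→1 = +1 everywhere, F_1→0 = −1.
f01 = np.full((4, 4), 1.0)
f10 = np.full((4, 4), -1.0)
flow = approx_flow_t0(f01, f10, 0.5)
print(flow[0, 0])   # -0.5: the frame at t = 0.5 looks back half the motion
```

For perfectly linear motion the approximation recovers exactly the expected intermediate flow; the refinement network discussed next is needed where this linearity breaks down.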
SUPER SLOMO NETWORK ARCHITECTURE
Add Flow Approximation
input images → optical flow computation between inputs → bi-directional flow → flow approximation
Deterministic, parameter-free:
F_t→0 ≈ t·(t·F_1→0) + (1 − t)·((t − 1)·F_0→1)
ASSUMPTION: SMOOTH MOTION FIELD
T = 0 … T = t … T = 1: bi-directional flows F_0→1 and F_1→0 vary smoothly
OPTICAL FLOW
Motion Fields Are Piecewise Smooth
Input
Ground truth
optical flow
MPI-Sintel [Butler et al. ECCV’12] HAMA [Liu et al. CVPR’08] Middlebury [Baker et al. ICCV’07]
PROBLEM: MOTION BOUNDARIES
T = 0 … T = t … T = 1: F_0→1, with a motion boundary
PROBLEM: MOTION BOUNDARIES
T = 0 … T = t … T = 1: near a motion boundary the two approximations t·F_1→0 and (t − 1)·F_0→1 disagree, so the combined estimate of F_t→0 is unreliable (?)
FLOW REFINEMENT
flow approximation + input images + warped images → flow refinement module → refined flow
SUPER SLOMO OPTICAL FLOW
Refinement Near Motion Boundaries
(Figure: input, intermediate flow approximations, intermediate refined flow, and their difference)
FLOW REFINEMENT AND VISIBILITY REASONING
flow approximation + input images + warped images → flow refinement module → refined flow + visibility maps → prediction
PREDICTION USING FLOW AND VISIBILITY MAPS
Inputs: images I_0, I_1; intermediate optical flow F_t→0, F_t→1; warped input images g(I_0, F_t→0), g(I_1, F_t→1), where g is the warping function; visibility maps α and 1 − α
Î_t = α·g(I_0, F_t→0) + (1 − α)·g(I_1, F_t→1)
(Figure: prediction with vs. without visibility maps)
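The prediction step can be sketched in NumPy. Both pieces below are simplified stand-ins: the warp uses nearest-neighbor sampling rather than the network's bilinear warping function g, and a scalar α replaces the per-pixel visibility maps.

```python
import numpy as np

def warp(img, flow):
    """g(I, F): backward-warp img by flow (nearest-neighbor for simplicity)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[sy, sx]

def predict(i0, i1, f_t0, f_t1, alpha):
    """Î_t = α·g(I0, F_t→0) + (1 − α)·g(I1, F_t→1)."""
    return alpha * warp(i0, f_t0) + (1.0 - alpha) * warp(i1, f_t1)

# With zero flow and α = 0.5 the prediction is simply the average of the inputs.
i0 = np.zeros((4, 4))
i1 = np.ones((4, 4))
zero_flow = np.zeros((4, 4, 2))
i_t = predict(i0, i1, zero_flow, zero_flow, 0.5)
```

In the real network, α comes from the predicted visibility maps, so pixels occluded in one input are reconstructed mostly from the other.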
OVERALL NETWORK ARCHITECTURE
Combination of Two CNNs
CNN 1, optical flow computation between inputs: input images → bi-directional flow
CNN 2, arbitrary-time image synthesis, run at each time step t: flow approximation + input images + warped images → refined flow + visibility maps → prediction
TRAINING DATASET
YouTube 240-fps Adobe 240-fps*
*provided by Oliver Wang
GENERATIVE ADVERSARIAL NETWORK (GAN)
G Tries to Fool D
The Art Forger (G) vs. The Art Critic (D)
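The forger/critic game corresponds to the standard GAN objectives. A minimal sketch (function names are illustrative, and the generator term is the common non-saturating form), where d_real and d_fake are the discriminator's probability outputs:

```python
import numpy as np

def critic_loss(d_real, d_fake):
    """The art critic (D): score real images high, forgeries low."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def forger_loss(d_fake):
    """The art forger (G): make the critic believe forgeries are real
    (non-saturating generator loss)."""
    return -np.log(d_fake)

# A more convincing forgery (critic fooled, d_fake near 1) lowers G's loss.
print(forger_loss(0.9) < forger_loss(0.1))   # True
```

Training alternates the two: D learns to separate real from generated frames while G learns to produce frames D cannot reject.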
PERFORMANCE
Running Time (PyTorch)
Training (6× V100): ~1,000 videos, 0.3M frames: 10 days
Inference (1× V100): 7 images at 1280×720: 0.36 s
EVS
EVS AI ENGINE
Regular feed → AI Engine → Slow Motion feed
(Figure: regular camera frames interleaved with AI-hallucinated intermediate images)
EVS AI ENGINE
Creating On-Demand Slow Motion
The REPLAY SERVER (LSM) sends commands and instructions to the AI Engine
IN: X video frames (normal-speed baseband video or file) → AI ENGINE → OUT: NX video frames (slow-motion baseband feed or file)
Preview; insertion of TC, LTC, UMD, … information
EVS AI ENGINE
Mimicking a Supermotion Camera
CAMERA: IN = X video frames (normal-speed baseband video) → AI ENGINE → OUT: NX video frames (slow-motion video in multiple phases) → REPLAY SERVER (LSM), driven by commands
NVIDIA NGX
NVIDIA NGX: DL FOR CREATIVE APPLICATIONS
Delivering Deep Learning Research Into Creative Applications
• The framework to make NVIDIA DL research available in end-user applications through a common service interface
• DL features: SloMo, Up-Res, InFilling, DLAA, …
• Created, trained and updated by NVIDIA; available to any application running on an NGX GPU
• NGX SDK: ISVs can add these DL features to their applications without the expense of developing and training
https://developer.nvidia.com/rtx/ngx
Results
QUALITY ISSUES
Motion Blur
Objects move too fast → motion blur (exposure time)
Mitigated with sharpening techniques
QUALITY ISSUES
Missing Information
Objects move too fast → missing information
Significantly reduced by domain-specific training
HOW DO WE IMPROVE THE QUALITY?
More training data with a variety of frame rates, from 30 fps to 240 fps
Application Specific Training (Sports, Drama, etc.)
Post process filtering
More Training
Getting Started in DL
DEEP LEARNING CONTAINERS
Gathered learnings from scores of teams across NVIDIA and the industry
Built containers to speed and ease deployment
Goals:
• Get engineers working on Deep Learning in minutes
• Ensure full GPU acceleration and the latest optimizations on all frameworks
• Isolate frameworks (some don’t play nice)
• Share, collaborate, and test applications across different environments
VIRTUAL MACHINES VS. CONTAINERS
Motivations
Packaging mechanism for applications
► Consistent and reproducible deployment
► Lightweight with faster startup than VMs
► Logical isolation from other applications (at the OS level)
No first-class support for GPUs was available in container runtimes
GPU SUPPORT IN DOCKER: 1.0
Open-source project on GitHub since 2016
► >2M downloads
► >7.5K stars
► >13M pulls of CUDA images from DockerHub
Enabled various use cases:
► NGC optimized containers from NVIDIA
► Adopted by major deep learning frameworks
NVIDIA GPU CLOUD
Deep Learning Across Platforms
• NVIDIA TITAN (powered by NVIDIA Volta or Pascal)
• NVIDIA DGX-1, DGX-2 and DGX Station
• Amazon EC2 P3 instances with NVIDIA Volta
Use across workstations, local servers and the cloud
To learn more about NVIDIA GPU Cloud, visit: https://nvidia.com/cloud
To sign up, go to: https://nvidia.com/ngcsignup
NVIDIA GPU CLOUD
Downloadable Containers