Describing Videos by Exploiting Temporal Structure

Slides by Alberto Montes, Computer Vision Group, April 12th, 2016

[arXiv] [GitXiv] [video] [code]

Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

Introduction

Goal: Generate captions from videos.

Video Description Generation Framework

Encoder-Decoder Framework
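Concretely, the decoder generates the caption one word at a time under the standard chain-rule factorization used by encoder-decoder captioning models (with $V$ the encoded video and $y_{<t}$ the words generated so far):

```latex
p(y_1, \dots, y_T \mid V) = \prod_{t=1}^{T} p\big(y_t \mid y_{<t},\, V\big)
```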

Encoder: Convolutional Neural Network

Basic approach:

Run a deep 2D CNN (GoogLeNet) over sampled frames and feed the resulting per-frame features to the decoder.
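A minimal sketch of this basic encoder, with a hypothetical random-projection stub standing in for the pretrained GoogLeNet: one feature vector per sampled frame, mean-pooled into a fixed-length video summary.

```python
import numpy as np

FRAME_SHAPE = (64, 64, 3)   # toy size; real frames would be e.g. 224x224x3
FEAT_DIM = 64               # GoogLeNet's pooled features would be 1024-d

# Stand-in for a pretrained 2D CNN: a fixed random projection of the
# flattened pixels. Hypothetical stub, not the paper's GoogLeNet.
rng = np.random.default_rng(0)
W_cnn = rng.standard_normal((FEAT_DIM, int(np.prod(FRAME_SHAPE))))

def cnn_features(frame):
    return W_cnn @ frame.ravel()

def encode_video(frames):
    """frames: list of FRAME_SHAPE arrays sampled from the video.
    Returns per-frame features V (kept for the attention decoder)
    and a mean-pooled vector (the 'basic' fixed-length summary)."""
    V = np.stack([cnn_features(f) for f in frames])   # (n_frames, FEAT_DIM)
    return V, V.mean(axis=0)

# Example: 26 sampled frames
V, v_summary = encode_video([np.zeros(FRAME_SHAPE) for _ in range(26)])
print(V.shape, v_summary.shape)                       # (26, 64) (64,)
```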

Decoder: Long Short-Term Memory Network

Long Short-Term Memory

At each step $t$, the decoder LSTM combines the previous word's embedding $E[y_{t-1}]$, the previous hidden state $h_{t-1}$, and the context $\varphi_t(V)$ from the encoder.

Forget gate:
$f_t = \sigma(W_f E[y_{t-1}] + U_f h_{t-1} + A_f \varphi_t(V) + b_f)$

Input gate layer:
$i_t = \sigma(W_i E[y_{t-1}] + U_i h_{t-1} + A_i \varphi_t(V) + b_i)$

New candidates for the cell state:
$\tilde{c}_t = \tanh(W_c E[y_{t-1}] + U_c h_{t-1} + A_c \varphi_t(V) + b_c)$

Update of the memory content:
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$h_t = o_t \odot \tanh(c_t)$, with output gate $o_t = \sigma(W_o E[y_{t-1}] + U_o h_{t-1} + A_o \varphi_t(V) + b_o)$

$E$: word embedding matrix; $E[y_{t-1}]$: input (previous word); $h_{t-1}$: previous hidden state; $\varphi_t(V)$: context from the encoder; $W$, $U$, $A$: weight matrices; $b$: biases.
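A minimal numpy sketch of one such decoder step, with toy dimensions and randomly initialized parameters (a sketch of the standard LSTM update conditioned on the encoder context, not the paper's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hid, ctx = 16, 32, 64                    # toy dimensions

# One W/U/A weight matrix and bias per gate: input, forget, output, candidate.
p = {}
for g in "ifoc":
    p["W_" + g] = rng.standard_normal((hid, emb)) * 0.1
    p["U_" + g] = rng.standard_normal((hid, hid)) * 0.1
    p["A_" + g] = rng.standard_normal((hid, ctx)) * 0.1
    p["b_" + g] = np.zeros(hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(y_emb, h_prev, c_prev, phi):
    """One decoder step: previous word embedding y_emb, previous state
    (h_prev, c_prev), and encoder context phi = phi_t(V)."""
    z = {g: p["W_" + g] @ y_emb + p["U_" + g] @ h_prev
            + p["A_" + g] @ phi + p["b_" + g] for g in "ifoc"}
    i, f, o = sigmoid(z["i"]), sigmoid(z["f"]), sigmoid(z["o"])
    c = f * c_prev + i * np.tanh(z["c"])      # update memory content
    h = o * np.tanh(c)
    return h, c

h, c = lstm_step(np.zeros(emb), np.zeros(hid), np.zeros(hid), np.zeros(ctx))
```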

Exploiting Temporal Structure

Exploiting Local Features

● Trained for activity recognition.
● Only the convolutional layers are used.

The 3D CNN input stacks three hand-crafted motion descriptors over time:

Histograms of Oriented Gradients (HOG)

Histograms of Optical Flow (HOF)

Motion Boundary Histogram (MBH)

A Spatio-Temporal Convolutional Neural Network (3D CNN)
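A toy sketch of a spatio-temporal convolution in this spirit, assuming PyTorch (the paper's actual 3D CNN is pretrained for activity recognition on the stacked HOG/HOF/MBH maps, and only its convolutional layers are reused):

```python
import torch
import torch.nn as nn

class Local3DEncoder(nn.Module):
    """Toy 3D CNN: convolves jointly over time and space.
    Input: (batch, channels, time, H, W), where the channels would hold
    the stacked HOG/HOF/MBH descriptor maps."""
    def __init__(self, in_channels=3, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),                   # halve the time and space axes
            nn.Conv3d(16, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, clip):
        h = self.conv(clip)                    # (B, feat_dim, T', H', W')
        return h.mean(dim=(2, 3, 4))           # pooled motion feature (B, feat_dim)

clip = torch.zeros(1, 3, 16, 32, 32)           # 16 frames of 32x32 descriptor maps
feat = Local3DEncoder()(clip)                  # (1, 64)
```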

Exploiting Global Structure

Attention Mechanism

Update of attention weights:
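The update shown here is the paper's soft temporal attention: a relevance score $e_i^{(t)} = w^\top \tanh(W_a h_{t-1} + U_a v_i + b_a)$ is computed for each frame feature $v_i$, normalized with a softmax into weights $\alpha_i^{(t)}$, and the context is the weighted average $\varphi_t(V) = \sum_i \alpha_i^{(t)} v_i$. A minimal numpy sketch with toy, randomly initialized parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hid_dim, att_dim, n_frames = 64, 32, 32, 26

# Attention parameters (randomly initialized for the sketch)
W_a = rng.standard_normal((att_dim, hid_dim)) * 0.1
U_a = rng.standard_normal((att_dim, feat_dim)) * 0.1
w   = rng.standard_normal(att_dim) * 0.1
b_a = np.zeros(att_dim)

def attention_context(h_prev, V):
    """h_prev: (hid_dim,) previous decoder state.
    V: (n_frames, feat_dim) per-frame features from the encoder.
    Returns the context phi_t(V) and the attention weights alpha."""
    # Relevance score e_i = w^T tanh(W_a h_{t-1} + U_a v_i + b_a)
    e = np.tanh(W_a @ h_prev + V @ U_a.T + b_a) @ w     # (n_frames,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                                # softmax over frames
    return alpha @ V, alpha                             # phi_t(V), weights

phi, alpha = attention_context(np.zeros(hid_dim),
                               rng.standard_normal((n_frames, feat_dim)))
```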

Experiments

Datasets

YouTube2Text

1,970 video clips, each with multiple descriptions

Training set: 1,200 video clips

Validation set: 100 video clips

Test set: 670 video clips

DVS (Descriptive Video Service)

Videos taken from DVDs

49,000 video clips

Training set: 39,000 video clips

Validation set: 5,000 video clips

Test set: 5,000 video clips

Setup and Training

4 setups:

◉ Basic (2D GoogLeNet CNN)
◉ Local (+ 3D CNN features)
◉ Global (+ temporal attention mechanism)
◉ Local + Global

Training

- Adadelta optimizer
- Loss function: negative log-likelihood of the reference caption, $\mathcal{L} = -\sum_{t=1}^{T} \log p(y_t \mid y_{<t}, V)$
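A minimal sketch of that loss, assuming the decoder's per-step softmax outputs have already been computed:

```python
import numpy as np

def caption_nll(probs, target_ids):
    """Negative log-likelihood of the reference caption.
    probs: (T, vocab_size) softmax output of the decoder at each step.
    target_ids: (T,) indices of the ground-truth words y_1..y_T."""
    step_probs = probs[np.arange(len(target_ids)), target_ids]
    return -np.log(step_probs).sum()

# Example: a 5-word caption over a 3-word toy vocabulary
probs = np.full((5, 3), 1.0 / 3.0)
print(caption_nll(probs, np.array([0, 1, 2, 1, 0])))   # 5 * log(3) ~ 5.49
```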

Results

Evaluation

Conclusions

● Proposed a 3D CNN to capture local, fine-grained motion information.

● Proposed a temporal attention mechanism to capture global temporal structure.

● The combination of both approaches achieves state-of-the-art results on YouTube2Text.

Thank you! Questions?
