Deep Learning for videoxgwang/video.pdf · 2014. 7. 12. · Reference • Babenko, Boris,...

Deep Learning for video understanding

Wanli Ouyang
Department of Electronic Engineering, The Chinese University of Hong Kong

Outline

• Deep learning for image window

Tracking

Outline

• Deep learning for image window
• Deep learning for multiple images

Action recognition (example labels: Track cycling, Heptathlon, Longboarding)

Deep learning for image window

Deep learning tracker

• Tracking by classification
  Foreground: positive
  Background: negative
• Classification with deep model
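The tracking-by-classification idea above can be sketched as follows. This is a minimal illustration, not the method of either cited tracker: the helper names (`iou`, `sample_training_boxes`), the overlap thresholds, and the shift range are all assumptions chosen for the example. Boxes that overlap the current target strongly are labeled foreground (positive); boxes that barely overlap are labeled background (negative).

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def sample_training_boxes(target, n=200, max_shift=40,
                          pos_iou=0.7, neg_iou=0.3, seed=0):
    """Sample shifted copies of the target box and label them by overlap.

    The thresholds pos_iou and neg_iou are hypothetical values; boxes that
    fall between them are ambiguous and are discarded.
    """
    rng = random.Random(seed)
    x, y, w, h = target
    positives, negatives = [], []
    for _ in range(n):
        box = (x + rng.uniform(-max_shift, max_shift),
               y + rng.uniform(-max_shift, max_shift), w, h)
        overlap = iou(box, target)
        if overlap >= pos_iou:
            positives.append(box)   # foreground: positive
        elif overlap <= neg_iou:
            negatives.append(box)   # background: negative
    return positives, negatives


target_box = (100, 80, 32, 32)
positives, negatives = sample_training_boxes(target_box)
```

The image patches cropped at these boxes would then serve as training samples for the classifier.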

Deep classifier

[Babenko et al. TPAMI11]

Deep classifier

[Wang & Yeung NIPS13]

Deep learning tracker

• Pretrain by stacked auto encoder
• Use a deep model with 4 fully connected layers to learn the classifier from a 32x32 input patch.
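A forward pass through such a classifier might look like the sketch below. It assumes NumPy, sigmoid activations, and hypothetical layer widths; none of these details are given on the slide. The weights here are random placeholders; with stacked auto encoder pretraining, the hidden-layer weights would instead be initialized from the pretrained encoder before fine tuning.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layers(sizes, seed=0):
    """Create (weight, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def foreground_probability(patches, hidden_layers, out_w, out_b):
    """Score each 32x32 patch with fully connected layers plus a sigmoid output."""
    h = patches.reshape(len(patches), -1)      # flatten 32x32 -> 1024
    for w, b in hidden_layers:
        h = sigmoid(h @ w + b)
    return sigmoid(h @ out_w + out_b).ravel()  # one score per patch


# Hypothetical widths: 1024 inputs followed by 4 fully connected layers.
sizes = [32 * 32, 256, 128, 64, 32]
hidden_layers = init_layers(sizes)

rng = np.random.default_rng(1)
out_w = rng.normal(0.0, 0.01, size=(sizes[-1], 1))
out_b = np.zeros(1)

patches = rng.random((5, 32, 32))              # 5 candidate patches
scores = foreground_probability(patches, hidden_layers, out_w, out_b)
```

During tracking, the candidate patch with the highest score would be taken as the new target location.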

classification

Deep classifier

Deep learning for multiple images

Deep learning for multiple images

• Consider K images as 3K channels in the input data.

• Apply 3D CNN for extracting features

1 image: 3 channels; K images: 3K channels
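Treating K images as 3K input channels amounts to concatenating the frames along the channel axis. A minimal sketch, assuming NumPy and a channels-last layout (the function name `stack_frames_as_channels` is made up for this example):

```python
import numpy as np


def stack_frames_as_channels(frames):
    """Turn K RGB frames of shape (K, H, W, 3) into one (H, W, 3K) input."""
    frames = np.asarray(frames)
    k, h, w, c = frames.shape
    # Move the frame axis next to the channel axis, then merge the two,
    # so channel index k * 3 + c holds channel c of frame k.
    return frames.transpose(1, 2, 0, 3).reshape(h, w, k * c)


K = 5
clip = np.random.default_rng(0).random((K, 60, 40, 3))
stacked = stack_frames_as_channels(clip)   # shape (60, 40, 15)
```

Channels 3k to 3k+2 of the result hold frame k, so an ordinary image network can consume the stack once its first layer accepts 3K channels.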

3D CNN for action recognition [Ji et al. TPAMI13]

• CNN channels can be hard wired, e.g. gray pixel values, gradient-x/y, optical flow-x/y.
• Learned weights at other layers
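As an illustration of hard-wired channels, the sketch below computes gray, gradient-x, and gradient-y channels for one frame with NumPy. The gray conversion (a plain mean over RGB) and the finite-difference gradient are assumptions, not the exact operators of Ji et al. The optical flow channels are omitted because they need a pair of frames and a flow estimator.

```python
import numpy as np


def hardwired_channels(rgb_frame):
    """Return fixed (not learned) channels of shape (3, H, W): gray, grad-x, grad-y."""
    gray = rgb_frame.astype(np.float64).mean(axis=2)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    grad_y, grad_x = np.gradient(gray)
    return np.stack([gray, grad_x, grad_y], axis=0)


# A horizontal intensity ramp: the value equals the column index.
ramp = np.tile(np.arange(8, dtype=np.float64), (6, 1))
frame = np.repeat(ramp[:, :, None], 3, axis=2)   # (6, 8, 3)
channels = hardwired_channels(frame)
```

For this ramp the gradient-x channel is constant and the gradient-y channel is zero, which makes the example easy to check by hand.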

3D CNN for action recognition

• Encourage the output to be close to high-level features (bag-of-words, motion edge history image).

Diagram labels: Auxiliary feature extractors, Auxiliary motion features, 3D CNN, Action class

Action recognition results

Cell to ear

Object put

Pointing

Large-scale Video Classification with CNN [Karpathy et al. CVPR 2014]

• Multi‐resolution

Temporal Fusion

Experimental results
• Randomly sample 20 clips of a video and average the output of these clip predictions.
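The video-level prediction described above can be sketched as follows, assuming NumPy. The clip length, the function names, and `toy_clip_model` (a stand-in for a trained CNN that returns class probabilities) are hypothetical.

```python
import numpy as np


def predict_video(frames, clip_model, clip_len=10, n_clips=20, seed=0):
    """Average class probabilities over randomly sampled clips of one video."""
    rng = np.random.default_rng(seed)
    # Random start indices such that every clip fits inside the video.
    starts = rng.integers(0, len(frames) - clip_len + 1, size=n_clips)
    clip_probs = [clip_model(frames[s:s + clip_len]) for s in starts]
    return np.mean(clip_probs, axis=0)


def toy_clip_model(clip):
    """Hypothetical stand-in for a CNN: softmax over three made-up class scores."""
    score = clip.mean()
    logits = np.array([score, 1.0 - score, 0.5])
    e = np.exp(logits - logits.max())
    return e / e.sum()


video = np.random.default_rng(1).random((100, 8, 8, 3))   # 100 small frames
video_probs = predict_video(video, toy_clip_model)
predicted_class = int(np.argmax(video_probs))
```

Because each clip prediction sums to one, the average is again a probability distribution over classes.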

Figure labels: Track cycling, Longboarding; Cycling, Longboarding; Track cycling, Aggressive inline skating

2 “How to”s

• How to effectively train a deep model
  Data augmentation
  Label more data
  Pre-train on large-scale related data (RCNN)
  Layerwise pre-training + fine tuning (Multi-stage)

• How to formulate a vision problem with deep learning?
  Tune hyper-parameters, e.g. number of hidden nodes, number of layers, activation function, dropout.
  Make use of experience and insights obtained in CV research
  Sequential design/learning vs joint learning
  Contextual information (Multi-stage, face, human pose)
  Background clutter removal (SDN)
  Short and long range temporal relationship (Action recognition)

Conclusion - 2 “How to”s

• How to effectively train a deep model
  Data augmentation
  Label more data
  Pre-train on large-scale related data (RCNN)
  Layerwise pre-training + fine tuning (Multi-stage)

• How to formulate a vision problem with deep learning?
  Tune hyper-parameters, e.g. number of hidden nodes, number of layers, activation function, dropout.
  Make use of experience and insights obtained in CV research
  Sequential design/learning vs joint learning
  Contextual information (Multi-stage, face, human pose)
  Background clutter removal (SDN)
  Short and long range temporal relationship (Action recognition)

Reference

• Babenko, Boris, Ming-Hsuan Yang, and Serge Belongie. "Robust object tracking with online multiple instance learning." Pattern Analysis and Machine Intelligence, IEEE Transactions on 33.8 (2011): 1619-1632.

• Wang, N., & Yeung, D. Y. "Learning a deep compact image representation for visual tracking." NIPS, 2013.

• Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." CVPR, 2014.

• Ji, Shuiwang, et al. "3D convolutional neural networks for human action recognition." Pattern Analysis and Machine Intelligence, IEEE Transactions on 35.1 (2013): 221-231.

Thank you!

mmlab.ie.cuhk.edu.hk/ www.ee.cuhk.edu.hk/~xgwang/ www.ee.cuhk.edu.hk/~wlouyang/
