Deep Learning for video understanding

Wanli Ouyang
Department of Electronic Engineering, The Chinese University of Hong Kong

Deep Learning for videoxgwang/video.pdf · 2014. 7. 12. · Reference • Babenko, Boris, Ming‐Hsuan Yang, and Serge Belongie. "Robust object tracking with online multiple instance

Outline

• Deep learning for image window: tracking
• Deep learning for multiple images: action recognition (e.g. track cycling, heptathlon, longboarding)

Deep learning for image window

Deep learning tracker

• Tracking by classification: foreground positive, background negative
• Classification with deep model

Figure labels: Deep classifier. Sources: [Babenko et al. TPAMI11], [Wang & Yeung NIPS13]
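The "foreground positive, background negative" labeling can be sketched as follows. This is a minimal illustration, not the procedure of either cited tracker: the function name `label_patches` and the two radii are hypothetical, and labeling by distance to the target center is an assumption made for the example.

```python
import numpy as np

def label_patches(target_center, patch_centers, pos_radius=4.0, neg_radius=16.0):
    """Label patches for tracking by classification.

    Returns 1 (foreground) for patches close to the target, 0 (background)
    for patches far from it, and -1 (ignored) for the ambiguous band between.
    The radii are illustrative values, not taken from the slides.
    """
    centers = np.asarray(patch_centers, dtype=float)
    dist = np.linalg.norm(centers - np.asarray(target_center, dtype=float), axis=1)
    labels = np.full(len(dist), -1, dtype=int)
    labels[dist <= pos_radius] = 1   # foreground: positive
    labels[dist >= neg_radius] = 0   # background: negative
    return labels

patch_centers = [(50, 50), (52, 51), (60, 50), (90, 20)]
labels = label_patches((50, 50), patch_centers)
```

Patches in the band between the two radii are left out of training here so that the classifier does not see ambiguous examples; that is a design choice of this sketch.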

Deep learning tracker

• Pretrain by stacked autoencoder
• Use a deep model with 4 fully connected layers to learn the classifier from a 32x32 input patch

[Figure: deep classifier producing the classification output]
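A fully connected classifier over 32x32 patches might look like the sketch below. The layer widths, the sigmoid activations, and the helper names are assumptions; the random weights only stand in for weights that would come from stacked autoencoder pretraining, which is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layers(sizes, rng):
    """Create (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, 0.01, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def foreground_score(patches, layers):
    """Score each 32x32 patch; higher means more likely foreground."""
    h = patches.reshape(len(patches), -1)   # flatten to 1024 inputs
    for W, b in layers:
        h = sigmoid(h @ W + b)
    return h[:, 0]

rng = np.random.default_rng(0)
# 4 fully connected layers; the hidden widths are hypothetical.
layers = init_layers([32 * 32, 256, 128, 64, 1], rng)
candidates = rng.random((10, 32, 32))       # stand-in candidate patches
scores = foreground_score(candidates, layers)
best = int(np.argmax(scores))               # candidate chosen as the new target location
```

In a tracker, `candidates` would be patches cropped around the previous target position, and the highest-scoring one would give the new position.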

Deep learning for multiple images

Deep learning for multiple images

• Consider K images as 3K channels in the input data.
• Apply 3D CNN for extracting features

[Figure: 1 image, 3 channels; K images, 3K channels]
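Treating K images as 3K channels amounts to a reshape of the clip tensor. The sketch below assumes frames are stored as a (K, H, W, 3) array; the function name `stack_frames` is hypothetical.

```python
import numpy as np

def stack_frames(frames):
    """Turn K RGB frames of shape (K, H, W, 3) into one (H, W, 3K) input.

    Channels 3k .. 3k+2 of the result hold the RGB channels of frame k.
    """
    frames = np.asarray(frames)
    K, H, W, C = frames.shape
    return frames.transpose(1, 2, 0, 3).reshape(H, W, K * C)

K, H, W = 4, 6, 5
clip = np.arange(K * H * W * 3).reshape(K, H, W, 3)
stacked = stack_frames(clip)   # shape (6, 5, 12)
```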

3D CNN for action recognition [Ji et al. TPAMI13]

• CNN channels can be hard-wired, e.g. gray pixel values, gradient-x/y, optical flow-x/y.
• Learned weights at other layers
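Hard-wired channels are computed by fixed operations rather than learned. The sketch below covers only the gray and gradient-x/y channels, using central differences from `np.gradient` as an assumed gradient operator; optical flow needs a separate estimator and is omitted. The function name is hypothetical.

```python
import numpy as np

def hardwired_channels(frames_gray):
    """Compute fixed (non-learned) channels from gray frames of shape (K, H, W)."""
    frames = np.asarray(frames_gray, dtype=float)
    # np.gradient returns one array per requested axis: rows (y) first, then columns (x).
    grad_y, grad_x = np.gradient(frames, axis=(1, 2))
    return {"gray": frames, "gradient_x": grad_x, "gradient_y": grad_y}

# Two frames whose intensity increases by 1 per pixel from left to right.
ramp = np.tile(np.arange(8, dtype=float), (2, 6, 1))   # shape (2, 6, 8)
channels = hardwired_channels(ramp)
```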

3D CNN for action recognition

• Encourage the output to be close to high-level features (bag-of-words, motion edge history image).

[Figure: 3D CNN with auxiliary feature extractors; outputs: action class and auxiliary motion features]
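One way to encourage the network output to stay close to auxiliary high-level features is to add a regression term to the classification loss. The sketch below is an assumed formulation (cross-entropy plus weighted squared error); the weight `aux_weight` and the function name are hypothetical and not taken from the cited paper.

```python
import numpy as np

def combined_loss(class_logits, label, aux_pred, aux_target, aux_weight=0.5):
    """Cross-entropy on the action class plus a penalty on auxiliary outputs."""
    logits = np.asarray(class_logits, dtype=float)
    shifted = logits - logits.max()                       # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    classification = -log_probs[label]
    diff = np.asarray(aux_pred, dtype=float) - np.asarray(aux_target, dtype=float)
    auxiliary = np.mean(diff ** 2)                        # distance to the auxiliary features
    return classification + aux_weight * auxiliary

aux_target = np.array([0.2, 0.8, 0.0])                    # stand-in for e.g. a bag-of-words vector
loss_match = combined_loss(np.zeros(3), 0, aux_target, aux_target)
loss_off = combined_loss(np.zeros(3), 0, aux_target + 1.0, aux_target)
```

When the auxiliary prediction matches its target, only the classification term remains; a mismatch raises the loss in proportion to `aux_weight`.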

Action recognition results

[Figure: example results for the actions Cell to ear, Object put, Pointing]

Large-scale Video Classification with CNN [Karpathy et al. CVPR 2014]

• Multi-resolution

[Figure: multi-resolution architecture; inputs labeled 89]
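A multi-resolution input can be sketched as two streams of equal size: a downsampled view of the whole frame and a full-resolution center crop. The 178x178 frame size, the stream names `context` and `fovea`, and the plain pixel subsampling are assumptions of this example; with them, each stream is 89x89, which matches the figure labels.

```python
import numpy as np

def multires_inputs(frame):
    """Split a frame of shape (H, W, C) into two half-size streams."""
    frame = np.asarray(frame)
    H, W = frame.shape[:2]
    context = frame[::2, ::2]                      # whole frame at half resolution
    top, left = H // 4, W // 4
    fovea = frame[top:top + H // 2, left:left + W // 2]   # center crop at full resolution
    return context, fovea

frame = np.zeros((178, 178, 3))
context, fovea = multires_inputs(frame)
```

Both streams cost a quarter of the pixels of the full frame, so processing the pair is cheaper than processing the original frame at full size.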

Temporal Fusion

Experimental results

• Randomly sample 20 clips of a video and average the outputs of these clip predictions.

[Figure: example predictions; class labels shown include Track cycling, Cycling, Longboarding, Aggressive inline skating]
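Video-level prediction by clip averaging can be sketched as below. The clip length, the frame count, and the Dirichlet-distributed stand-in for per-clip class probabilities are assumptions; in practice the probabilities would come from the trained network.

```python
import numpy as np

def sample_clip_starts(n_frames, clip_len, n_clips, rng):
    """Pick random start frames so that every clip fits inside the video."""
    return rng.integers(0, n_frames - clip_len + 1, size=n_clips)

def average_clip_predictions(clip_probs):
    """Average per-clip class probabilities and return (best class, mean probabilities)."""
    mean_probs = np.asarray(clip_probs, dtype=float).mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs

rng = np.random.default_rng(0)
starts = sample_clip_starts(n_frames=300, clip_len=10, n_clips=20, rng=rng)
clip_probs = rng.dirichlet(np.ones(3), size=len(starts))   # stand-in for network outputs
video_class, video_probs = average_clip_predictions(clip_probs)
```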

Conclusion: 2 “How to”s

• How to effectively train a deep model
  Data augmentation
  Label more data
  Pre-train on large-scale related data (RCNN)
  Layerwise pre-training + fine tuning (Multi-stage)

• How to formulate a vision problem with deep learning?
  Tune hyper-parameters, e.g. number of hidden nodes, number of layers, activation function, dropout
  Make use of experience and insights obtained in CV research
  Sequential design/learning vs joint learning
  Contextual information (Multi-stage, face, human pose)
  Background clutter removal (SDN)
  Short and long range temporal relationship (Action recognition)
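As an illustration of the data augmentation item, the sketch below applies a random crop and a random horizontal flip. These two transforms, the crop size, and the function name are assumptions chosen for the example; the slides do not say which augmentations were used.

```python
import numpy as np

def augment(image, crop_size, rng):
    """Return a random square crop of an (H, W, C) image, flipped half of the time."""
    image = np.asarray(image)
    H, W = image.shape[:2]
    top = rng.integers(0, H - crop_size + 1)
    left = rng.integers(0, W - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]        # mirror left-right
    return crop

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
samples = [augment(image, 28, rng) for _ in range(8)]   # 8 augmented views of one image
```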

Reference

• Babenko, Boris, Ming-Hsuan Yang, and Serge Belongie. "Robust object tracking with online multiple instance learning." Pattern Analysis and Machine Intelligence, IEEE Transactions on 33.8 (2011): 1619-1632.

• Wang, N., & Yeung, D. Y. "Learning a deep compact image representation for visual tracking." NIPS, 2013.

• Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." CVPR, 2014.

• Ji, Shuiwang, et al. "3D convolutional neural networks for human action recognition." Pattern Analysis and Machine Intelligence, IEEE Transactions on 35.1 (2013): 221-231.

Thank you!

mmlab.ie.cuhk.edu.hk/ www.ee.cuhk.edu.hk/~xgwang/ www.ee.cuhk.edu.hk/~wlouyang/
