SUPER: Towards Real-time Event Recognition in Internet Videos. Yu-Gang Jiang, School of Computer Science, Fudan University, Shanghai, China. [email protected]

  • Slide 1
  • SUPER: Towards Real-time Event Recognition in Internet Videos (SUPER: Speeded Up Event Recognition). Yu-Gang Jiang, School of Computer Science, Fudan University, Shanghai, China. [email protected]. ACM International Conference on Multimedia Retrieval (ICMR), Hong Kong, China, June 2012.
  • Slide 2
  • The Problem: Recognize high-level events in videos. We're particularly interested in Internet consumer videos. Applications: video search, personal video collection management, smart advertising, and intelligence analysis.
  • Slide 3
  • Our Objective: improve efficiency, maintain accuracy.
  • Slide 4
  • The Baseline Recognition Framework: feature extraction (SIFT, spatial-temporal interest points, MFCC audio feature), followed by a χ² kernel SVM classifier and late average fusion. This was the best-performing approach in the TRECVID-2010 Multimedia Event Detection (MED) task. Reference: Yu-Gang Jiang, Xiaohong Zeng, Guangnan Ye, Subh Bhattacharya, Dan Ellis, Mubarak Shah, Shih-Fu Chang, "Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching," NIST TRECVID Workshop, 2010.
  • Slide 5
  • Three Audio-Visual Features: SIFT (visual), D. Lowe, IJCV 2004; STIP (visual), I. Laptev, IJCV 2005; MFCC (audio), 16 ms.
  • Slide 6
  • Bag-of-words Representation: SIFT / STIP / MFCC words with soft weighting (Jiang, Ngo and Yang, ACM CIVR 2007), e.g. Bag-of-SIFT. Bag of audio words / bag of frames: K. Lee and D. Ellis, "Audio-Based Semantic Concept Classification for Consumer Video," IEEE Trans. on Audio, Speech, and Language Processing, 2010.
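As a rough illustration of the soft-weighting idea on this slide, the sketch below lets each descriptor vote for its few nearest words, halving the vote at each successive rank. The function name `soft_weighted_bow`, the `top_n` default, and the similarity measure `1 / (1 + distance)` are assumptions made for this example; the slides do not specify them.

```python
import math

def soft_weighted_bow(descriptors, vocabulary, top_n=4):
    """Build a soft-weighted bag-of-words histogram.

    Each descriptor votes for its top_n nearest words. The vote for the
    word at rank r (r = 0 for the nearest) is scaled by 1 / 2**r.
    """
    hist = [0.0] * len(vocabulary)
    n = min(top_n, len(vocabulary))
    for d in descriptors:
        # Euclidean distance from this descriptor to every word.
        ranked = sorted((math.dist(d, w), k) for k, w in enumerate(vocabulary))
        for rank, (dist, k) in enumerate(ranked[:n]):
            similarity = 1.0 / (1.0 + dist)  # assumed similarity measure
            hist[k] += similarity / (2 ** rank)
    return hist

# Toy 2-D example with a three-word vocabulary (hypothetical data).
vocab = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
hist = soft_weighted_bow([(0.0, 0.0)], vocab, top_n=2)
```

With hard assignment only the nearest word would receive a vote; here the second-nearest word also receives a smaller, distance-dependent share.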
  • Slide 7
  • Baseline Speed: feature extraction (SIFT, spatial-temporal interest points, MFCC audio feature), χ² kernel SVM classifier, late average fusion. Four factors affect speed: feature, classifier, fusion, and frame sampling. Timing values shown on the slide: 82.0, 916.8, 2.36, ~2.00.
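The classifier and fusion stages named in the baseline pipeline can be sketched as follows. This is a minimal illustration, assuming the common exponential form of the χ² kernel, exp(-gamma * d), with d = 0.5 * Σ (x_i - y_i)² / (x_i + y_i), and assuming that late average fusion means averaging the per-feature classifier scores for each video. The function names, the `gamma` parameter, and the toy scores are hypothetical.

```python
import math

def chi2_distance(x, y):
    """Chi-square distance between two non-negative histograms."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel (assumed form)."""
    return math.exp(-gamma * chi2_distance(x, y))

def late_average_fusion(scores_per_feature):
    """Average per-video scores across features.

    scores_per_feature maps a feature name to a list of classifier
    scores, one per video, all lists in the same video order.
    """
    score_lists = list(scores_per_feature.values())
    num_videos = len(score_lists[0])
    return [
        sum(scores[i] for scores in score_lists) / len(score_lists)
        for i in range(num_videos)
    ]

# Hypothetical scores for two videos from three per-feature classifiers.
fused = late_average_fusion({
    "sift": [0.9, 0.2],
    "stip": [0.6, 0.4],
    "mfcc": [0.3, 0.6],
})
```

Because fusion operates on scores only, each feature's classifier can be trained and evaluated independently, which is why the feature and the classifier can be treated as separate factors when measuring speed.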