Video Activity Recognition

Team: Andrii Didkivskyi, Nickolay Mykhalych, Oleksii Moskalenko, Olena Shevchenko, Orest Kupyn, Pylyp Shurpyk, Sviatoslav Sheipak

Mentor: Andrii Lyubonko

LVDS 2017

UCF101

13,000 videos - 101 classes - 10 GB of data

Real life example

Sky Diving - 47%

Handstand Walking - 18%

Rope Climbing - 18%

State of the Art (July 20th, 2017)

https://arxiv.org/pdf/1705.07750.pdf

Our solution

Pipeline

Input: video
Video preprocessing: 50 frames
Visual features: CNN (ResNet-18) applied to each frame
Sequence learning: LSTM
Output: predicted activity class
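The sequence-learning stage of the pipeline can be illustrated with a minimal pure-Python LSTM run over a list of per-frame feature vectors. This is only a sketch of the standard LSTM update, not the team's implementation: the names `lstm_step`, `run_lstm` and `random_params` are hypothetical, the weights are random, and the toy feature size of 8 stands in for the much larger vectors a ResNet-18 would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One standard LSTM update.

    W[gate], U[gate], b[gate] hold the input weights, recurrent weights
    and bias for each gate in {"i", "f", "o", "g"}.
    """
    def affine(gate):
        return [
            sum(w * xi for w, xi in zip(W[gate][k], x))
            + sum(u * hi for u, hi in zip(U[gate][k], h))
            + b[gate][k]
            for k in range(len(h))
        ]

    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(frame_features, hidden_size, W, U, b):
    """Feed frame features in order; return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frame_features:
        h, c = lstm_step(x, h, c, W, U, b)
    return h


def random_params(input_size, hidden_size, seed=0):
    """Small random weights, for illustration only (no training)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    gates = ("i", "f", "o", "g")
    W = {gate: mat(hidden_size, input_size) for gate in gates}
    U = {gate: mat(hidden_size, hidden_size) for gate in gates}
    b = {gate: [0.0] * hidden_size for gate in gates}
    return W, U, b


# Toy usage: 50 frames, each described by an 8-dimensional feature vector.
rng = random.Random(1)
features = [[rng.random() for _ in range(8)] for _ in range(50)]
W, U, b = random_params(input_size=8, hidden_size=4)
last_hidden = run_lstm(features, 4, W, U, b)
```

In a full model the last hidden state (or an average over all steps) would be passed to a linear layer with 101 outputs, one per UCF101 class.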

Results

Models                                        Validation score, %
LTRCN 50 frames                               68.5
LTRCN 16 frames with stride 8                 ~60*
LTRCN 50 frames with batch normalized LSTM    73.2

LTRCN - Long-term recurrent convolutional network

* - to be improved...
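The two frame-selection settings in the results ("50 frames" and "16 frames with stride 8") can be sketched as index computations. Both interpretations are assumptions: the first takes 50 frames spread evenly over the video, the second cuts the video into overlapping 16-frame clips whose start positions are 8 frames apart. The function names are hypothetical.

```python
def uniform_frame_indices(num_frames, num_samples=50):
    """Pick num_samples frame indices spread evenly over the video.

    Videos shorter than num_samples repeat some frames.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    return [(i * num_frames) // num_samples for i in range(num_samples)]


def clip_windows(num_frames, clip_len=16, stride=8):
    """Frame indices of overlapping clips: clip_len frames, starts stride apart."""
    return [
        list(range(start, start + clip_len))
        for start in range(0, num_frames - clip_len + 1, stride)
    ]


# A 40-frame video yields clips starting at frames 0, 8, 16 and 24.
clips = clip_windows(40)
indices = uniform_frame_indices(100)
```

With overlapping clips, per-clip predictions would typically be averaged to obtain one prediction per video; whether the team did this is not stated in the slides.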

Results

Top Winners       Top Losers
Diving            Yoyo
Playing Piano     Jump rope
Sumo wrestling    Nunchucks
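A ranking like the one above could be produced from per-class accuracy on the validation set. The sketch below assumes that "winners" and "losers" mean the classes with the highest and lowest accuracy; the slides do not state the exact criterion, and the function names and toy labels are illustrative.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified samples for each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


def winners_and_losers(y_true, y_pred, k=3):
    """Return (best k classes, worst k classes), each listed most extreme first."""
    acc = per_class_accuracy(y_true, y_pred)
    ranked = sorted(acc, key=lambda label: (-acc[label], label))
    return ranked[:k], ranked[-k:][::-1]


# Toy labels, not real validation data.
y_true = ["Diving", "Diving", "Yoyo", "Yoyo", "Typing", "Typing"]
y_pred = ["Diving", "Diving", "Typing", "Diving", "Typing", "Yoyo"]
winners, losers = winners_and_losers(y_true, y_pred, k=1)
```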

Testing

Brushing Teeth - 73%
Apply Lipstick - 20%
Haircut - 4%

Apply Eye Makeup - 97%
Apply Lipstick - 1%
Brushing Teeth - 0.5%

Typing - 85%
Playing Guitar - 4%
Blowing Candles - 4%

Writing On Board - 69%
Brushing Teeth - 11%
Apply Lipstick - 8%
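Top-3 lists like these can be obtained by applying a softmax to the model's class scores and keeping the three most probable labels. The following is a minimal sketch; it assumes the percentages are softmax probabilities, and the scores and helper names are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(scores, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return ranked[:k]


def format_prediction(label, prob):
    """Format as 'Label - NN%', keeping one decimal below 1%."""
    pct = prob * 100
    text = f"{pct:.0f}" if pct >= 1 else f"{pct:.1f}"
    return f"{label} - {text}%"


# Made-up scores for four of the 101 classes.
labels = ["Typing", "Playing Guitar", "Blowing Candles", "Haircut"]
scores = [4.0, 1.0, 0.9, -2.0]
top3 = top_k_predictions(scores, labels)
lines = [format_prediction(label, prob) for label, prob in top3]
```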

References

1. Cooijmans, Tim, et al. "Recurrent batch normalization." arXiv preprint arXiv:1603.09025 (2016). https://arxiv.org/pdf/1603.09025.pdf

2. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf

3. https://arxiv.org/pdf/1411.4389.pdf

4. https://arxiv.org/pdf/1705.07750.pdf

5. http://colah.github.io/posts/2015-08-Understanding-LSTMs/



https://github.com/lyubonko/ldsss17_project

