Going Deeper into First-Person Activity Recognition

Slides by Alejandro Cartas
Minghuang Ma, Haoqi Fan and Kris M. Kitani
All diagrams and images are originally from Minghuang Ma et al., unless otherwise stated.
What is this paper about?
• Proposes a two-stream CNN to recognize activities (object + action) in short egocentric videos.
[Figure: "Preparing a hotdog" sequence. Pictures taken from the GTEA dataset.]
Object: BREAD. Action: Take. Activity: Take bread.
Object: HOTDOG (SAUSAGE). Action: Take. Activity: Take hotdog.
Objects: BREAD, HOTDOG (SAUSAGE). Action: Put. Activity: Put hotdog on bread.
Proposed approach
[Figure: pipeline overview. The input video feeds two streams. Appearance stream: hand segmentation and object localization, giving the object 'milk container'. Motion stream: optical flow, giving the action 'take'. Resulting activity: 'take milk container'.]
[Figure: "Take bread" sequence, with frames labeled ARM+HAND and BREAD. Pictures taken from the GTEA dataset.]
CNN Architecture
• Fully convolutional networks
• Late fusion
• Binary softmax layer
• Per-pixel Euclidean loss
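The last two items can be illustrated with a minimal NumPy sketch. The slides do not say which network uses which output or how the loss is normalized, so the function names, the array shapes and the 1/2-mean normalization below are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def binary_softmax(logits):
    """Per-pixel softmax over two classes.

    logits: array of shape (2, H, W), one score map per class
    (assumed layout). Returns probabilities of the same shape.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def per_pixel_euclidean_loss(pred, target):
    """Half the mean squared difference between two maps.

    The 1/2 factor and the mean over pixels are assumed conventions.
    """
    diff = pred - target
    return 0.5 * np.sum(diff ** 2) / diff.size

# Usage: equal logits give probability 0.5 for both classes at every pixel.
probs = binary_softmax(np.zeros((2, 4, 4)))
loss = per_pixel_euclidean_loss(np.ones((2, 2)), np.zeros((2, 2)))
```

A softmax over two classes suits a per-pixel yes/no decision such as a mask, while a Euclidean loss suits regression onto a real-valued map such as a heatmap.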
Appearance Stream
Appearance training
[Figure: Segmentation CNN producing a hand mask; Localization CNN producing a location heatmap. Arrow label: fine-tune.]
Training data
• Images
• Ground-truth hand masks
• Heatmaps (Gaussian bumps)
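A "Gaussian bump" heatmap can be sketched as a 2D Gaussian centered on an annotated location. The function name, the (row, column) center convention and the value of sigma below are assumptions for illustration; the slides do not state how the heatmaps were generated.

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma=10.0):
    """Return an (height, width) map with a Gaussian bump at `center`.

    center: (row, col) of the annotated location (assumed convention).
    sigma:  spread of the bump in pixels (hypothetical default).
    The peak value is 1.0 at the center pixel.
    """
    cy, cx = center
    ys, xs = np.mgrid[0:height, 0:width]
    sq_dist = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Usage: a 64x96 target map with the bump at row 20, column 50.
heatmap = gaussian_heatmap(64, 96, center=(20, 50), sigma=8.0)
```

A smooth target of this kind gives a regression network partial credit for predictions near the annotated point, unlike a single-pixel label.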
Appearance stream
[Figure: the input video clip goes through the Segmentation CNN (hand segmentation) and the Localization CNN (interest region); the result is passed to ObjectNet (appearance-based), which outputs the results.]
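One way to turn a location heatmap into an interest region is to crop a fixed-size window around the heatmap peak. The slides do not describe the cropping rule, so the helper name, the square window and the border clamping below are all assumptions.

```python
import numpy as np

def crop_interest_region(frame, heatmap, size):
    """Crop a size x size window of `frame` around the heatmap peak.

    frame:   (H, W, C) image; heatmap: (H, W) map on the same grid.
    The window is shifted inward at the borders so that it always
    lies inside the frame (assumes size <= H and size <= W).
    """
    h, w = heatmap.shape
    cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    half = size // 2
    top = int(np.clip(cy - half, 0, h - size))
    left = int(np.clip(cx - half, 0, w - size))
    return frame[top:top + size, left:left + size]

# Usage: peak near the right border, so the window is clamped to the frame.
frame = np.arange(100 * 120 * 3).reshape(100, 120, 3)
heatmap = np.zeros((100, 120))
heatmap[5, 110] = 1.0
region = crop_interest_region(frame, heatmap, size=32)
```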
Motion Stream
Motion stream
[Figure: the input video clip is converted to optical flow and passed to ActionNet (motion-based), which outputs the results. Other diagram labels: Start-End image frames; Start-End optical flow frames; Average.]
• A fixed set of L frames
• Pair of vertical and horizontal frames
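The two points above suggest a network input built by stacking the horizontal and vertical flow components of L frames along the channel axis. A minimal sketch follows; the input layout (L, H, W, 2) and the interleaved channel order are assumptions, since the slides only name the ingredients.

```python
import numpy as np

def stack_flow(flows):
    """Stack L optical-flow fields into a single (2L, H, W) array.

    flows: array of shape (L, H, W, 2); the last axis holds the
    (horizontal, vertical) components (assumed layout).
    Output channels are interleaved: x_0, y_0, x_1, y_1, ...
    """
    num_frames, height, width, _ = flows.shape
    # (L, H, W, 2) -> (L, 2, H, W) -> (2L, H, W)
    return flows.transpose(0, 3, 1, 2).reshape(2 * num_frames, height, width)

# Usage: L = 10 flow fields of size 8x8 become a 20-channel input.
flows = np.random.default_rng(0).normal(size=(10, 8, 8, 2))
stacked = stack_flow(flows)
```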
Full architecture
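Late fusion of the two streams can be sketched as concatenating one feature vector per stream and applying a joint classifier on top. The slides do not specify the fusion layer, so the linear-plus-softmax head, the function name and all shapes below are assumptions, not the architecture from the paper.

```python
import numpy as np

def late_fusion(motion_feat, appearance_feat, weights, bias):
    """Fuse per-stream features and return class probabilities.

    motion_feat:     (Dm,) feature vector from the motion stream.
    appearance_feat: (Da,) feature vector from the appearance stream.
    weights, bias:   hypothetical joint classifier of shape
                     (num_classes, Dm + Da) and (num_classes,).
    """
    fused = np.concatenate([motion_feat, appearance_feat])
    scores = weights @ fused + bias
    scores = scores - scores.max()  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

# Usage with toy sizes: 3 motion features, 2 appearance features, 4 classes.
rng = np.random.default_rng(0)
probs = late_fusion(rng.normal(size=3), rng.normal(size=2),
                    rng.normal(size=(4, 5)), np.zeros(4))
```

Fusing late keeps each stream's network independent up to its final features, so the streams can be trained separately before the joint layer is learned.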
Results
[Results shown for: GTEA, GAZE, GAZE+ 44]
Closer look at the GAZE confusion matrix
[Figure: example frames for the activities Take peanut, Open peanut, Close peanut and Scoop peanut. Pictures taken from the GAZE dataset.]