
Human Action Recognition Based on Spatio-temporal Features (Poster)




Nikhil Sawant, K. K. Biswas
Department of Computer Science and Engineering

Indian Institute of Technology, Delhi

Human Action Recognition Based on Spatio-temporal Features

Target Localization
The possible search space is the full x-y-t cube, which is reduced using target localization: the action and the actor are localized in both space and time. Background subtraction helps localize the actor, and a region of interest (ROI) is marked around the actor. Only the ROI is processed; the rest of the frame is ignored.
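As a rough sketch of this localization step, the following assumes a static background model and simple frame differencing; the poster does not specify the subtraction method, so the threshold and background model here are placeholder assumptions:

```python
import numpy as np

# Background subtraction and ROI extraction sketch: a simple static-background
# difference stands in for whichever subtraction method is actually used.
def actor_roi(frame, background, thresh=25):
    # foreground mask: pixels that differ enough from the background model
    fg = np.abs(frame.astype(int) - background.astype(int)) > thresh
    ys, xs = np.nonzero(fg)
    if len(xs) == 0:
        return None                       # no actor found in this frame
    # ROI = tight bounding box (x0, y0, x1, y1) around the foreground pixels
    return xs.min(), ys.min(), xs.max(), ys.max()
```

Everything outside the returned box can then be skipped by later stages.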

Motion Features
A fixed-size grid is overlaid on the region of interest. The grid has dimensions Xdiv x Ydiv, dividing the ROI into blocks bij with centres at cij, respectively.
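The grid construction above can be sketched as follows; the ROI corners and the Xdiv, Ydiv counts are the only inputs:

```python
import numpy as np

# Divide the ROI into an Xdiv x Ydiv grid of blocks b_ij and compute
# the block centres c_ij.
def grid_blocks(x0, y0, x1, y1, xdiv, ydiv):
    xs = np.linspace(x0, x1, xdiv + 1)    # vertical block boundaries
    ys = np.linspace(y0, y1, ydiv + 1)    # horizontal block boundaries
    blocks, centres = [], []
    for j in range(ydiv):
        for i in range(xdiv):
            blocks.append((xs[i], ys[j], xs[i + 1], ys[j + 1]))
            centres.append(((xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2))
    return blocks, centres
```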

We make use of optical flow: the pattern of relative motion between object/feature points and the viewer/camera. We use the Lucas-Kanade two-frame differential method, which yields comparatively robust and dense optical flow.
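A minimal single-window Lucas-Kanade step looks like this; it assumes brightness constancy and small, constant motion within the window (a full implementation would add per-block windows and pyramids):

```python
import numpy as np

# Single-window Lucas-Kanade: solve the least-squares system
#   [sum Ix^2   sum IxIy] [vx]   [-sum IxIt]
#   [sum IxIy   sum Iy^2] [vy] = [-sum IyIt]
# for the flow (vx, vy) inside the window.
def lucas_kanade_window(frame1, frame2):
    Iy, Ix = np.gradient(frame1)          # spatial derivatives (axis 0 = y)
    It = frame2 - frame1                  # temporal derivative
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    vx, vy = np.linalg.solve(A, b)
    return vx, vy
```

For a bright blob translated by a sub-pixel amount between the two frames, the recovered (vx, vy) closely matches the true shift.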

Organizing Optical Flows
The optical flows within each grid block are organized by averaging (simple or weighted).
Noise Reduction
Noise is removed during averaging: optical flows with magnitude greater than C*Omean are ignored, where C is a constant in the range [1.5, 2] and Omean is the mean optical-flow magnitude within the ROI.
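The magnitude test above amounts to a one-line filter over the flow vectors in the ROI:

```python
import numpy as np

# Discard optical-flow vectors whose magnitude exceeds C * Omean,
# treating them as noise (C chosen in [1.5, 2]).
def filter_flows(flow, C=1.5):
    # flow: (N, 2) array of (u, v) vectors within the ROI
    mags = np.linalg.norm(flow, axis=1)
    o_mean = mags.mean()
    return flow[mags <= C * o_mean]
```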

Shape Feature
The shape of the person gives information about the action being performed. Viola-Jones box features are used to obtain shape features; we make use of 2-rectangle and 4-rectangle features. Foreground pixels in the white region are subtracted from foreground pixels in the grey region. These features are applied at all possible locations on the rectangular grid.
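Box features are typically evaluated with an integral image, so that any rectangle sum costs four lookups. The sketch below shows a 2-rectangle feature on a binary foreground mask; the vertical white/grey split is illustrative, since the exact rectangle layouts are not specified here:

```python
import numpy as np

# Integral image: ii[y, x] = sum of mask[0:y+1, 0:x+1]
def integral_image(mask):
    return mask.cumsum(axis=0).cumsum(axis=1)

# Sum over the inclusive rectangle [y0:y1, x0:x1] in four lookups
def rect_sum(ii, x0, y0, x1, y1):
    total = ii[y1, x1]
    if x0 > 0: total -= ii[y1, x0 - 1]
    if y0 > 0: total -= ii[y0 - 1, x1]
    if x0 > 0 and y0 > 0: total += ii[y0 - 1, x0 - 1]
    return int(total)

# 2-rectangle feature: left ("white") half minus right ("grey") half
def two_rect_feature(ii, x0, y0, x1, y1):
    xm = (x0 + x1) // 2
    return rect_sum(ii, x0, y0, xm, y1) - rect_sum(ii, xm + 1, y0, x1, y1)
```

A 4-rectangle feature is built the same way from four quadrant sums.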

Spatio-temporal Descriptor
Shape and motion features are combined over a span of time to form spatio-temporal features. TSPAN is the offset between consecutive video frames used; TLEN is the number of video frames used. Together, TLEN and TSPAN allow us to capture a large change in a possibly small number of frames.
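Concretely, the descriptor concatenates per-frame feature vectors sampled TSPAN frames apart:

```python
import numpy as np

# Stack TLEN per-frame feature vectors, taken TSPAN frames apart, into one
# spatio-temporal descriptor; a long motion is captured without processing
# every intermediate frame.
def st_descriptor(frame_features, start, tlen, tspan):
    # frame_features: (num_frames, feature_dim) array
    idx = start + tspan * np.arange(tlen)
    return frame_features[idx].ravel()    # concatenated descriptor
```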

(Figure: unorganized optical flows vs. organized optical flows)

Learning with AdaBoost
We use the standard AdaBoost algorithm to learn from the data. AdaBoost is a state-of-the-art learning algorithm in which the strong hypothesis is built from weak hypotheses; in fact, a weighted sum of weak hypotheses forms the strong hypothesis. We use linear decision stumps as the weak classifiers.
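A minimal version of AdaBoost with decision stumps (a threshold on a single feature, with labels in {+1, -1}) can be sketched as follows; this is the textbook algorithm, not the poster's exact implementation:

```python
import numpy as np

# Weak learner: exhaustive search for the best single-feature threshold stump
def train_stump(X, y, w):
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= t, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best  # (weighted error, feature, threshold, polarity)

def stump_predict(X, j, t, sign):
    return sign * np.where(X[:, j] <= t, 1, -1)

# AdaBoost: the strong hypothesis is a weighted sum of weak hypotheses
def adaboost(X, y, rounds=10):
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, t, sign)
        w *= np.exp(-alpha * y * pred)    # upweight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.sign(score)
```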

We prepare mutually exclusive training and testing datasets. The system is first trained on the set of actions; each given video is then classified into one of the action classes on which the system was trained.

Dataset, Results and Conclusion
On our own dataset we observe only 10% error on the waving, stand-up and bending actions; all other actions show 0% error. On the Weizmann dataset, error is observed only on the run and wave1 actions; all other actions are classified unambiguously. We report an overall error rate of 2.17%. We conclude that spatio-temporal features combining motion and shape can be used effectively for action recognition, and that AdaBoost successfully classifies the descriptors formed using spatio-temporal features.


We constructed our own dataset with 7 actions performed by 8 actors; the videos are shot in daylight against a stable background. The recorded actions are walk, run, wave1, wave2, bend, sit-down and stand-up. We also benchmark our method on the standard Weizmann dataset, which contains 10 actions performed by 9 actors: bend, jack, jump, pjump, run, side, skip, walk, wave1 and wave2.


Learning
Spatio-temporal features are formed from the shape and motion features. The features extracted from the training videos are provided to the learning system so that the patterns produced by the action classes are learned. We prepare mutually exclusive training and testing datasets. Once the system has been trained on a variety of samples from each class, it is ready for action detection: each given video is classified into one of the action classes on which the system was trained.





Confusion matrix (Weizmann dataset)

AdaBoost


Classification with AdaBoost can be binary or multiclass; we make use of multiclass classification. We give n action classes to the AdaBoost system, which trains itself to detect the patterns produced by the different actions.
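One common way to obtain an n-class decision from binary strong classifiers is one-vs-rest with an argmax over per-class confidences; the poster does not detail its multiclass scheme, so the structure below is an assumption, and the nearest-centroid confidence is a deliberately simple stand-in for a trained AdaBoost ensemble's weighted vote:

```python
import numpy as np

# One-vs-rest multiclass sketch: one scorer per action class; a video is
# assigned to the class whose scorer is most confident. The nearest-centroid
# score here is a placeholder for a per-class strong classifier.
def fit_one_vs_rest(X, y):
    classes = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids

def predict_multiclass(X, classes, centroids):
    # confidence of class k = closeness to its centroid; pick the argmax
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```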