Mathematical Model for Vision-Based Recognition of Human Gestures and Applications

Setiawan Hadi, Universitas Padjadjaran, Indonesia
International Congress of Mathematicians (ICM), August 13 - 21, 2014, Korea

Objective

Develop a mathematical model, based on the Hidden Markov Model, for recognizing gestures in temporal streaming data. This model can be applied in a vision-based gesture recognition system. Such a system is useful for human-computer interaction, helping people with disabilities, security surveillance, touchless retrieval systems, event analysis, communication and translation systems, and many more.

Overview

Gestures are expressive, meaningful atomic body-part motions, i.e., physical movements of the fingers, arms, head, face, or body with the intent to convey information or interact with the environment.

Figure 1: Sample of Human Gesture [1]

Types of Gesture [2]

Gesticulation. Spontaneous movements of the hands and arms that accompany speech.

Language-like gestures. Gesticulation that is integrated into a spoken utterance, replacing a particular spoken word or phrase.

Pantomimes. Gestures that depict objects or actions, with or without accompanying speech.

Emblems. Familiar gestures such as “V” for victory, thumbs up, and assorted rude gestures.

Sign languages. Linguistic systems, such as American/British/Indonesian Sign Language.

Figure 2: General System for Activity Recognition

Materials and Methods

1. Acquisition System, using Kinect XBOX360

2. Image preprocessing Methods

3. Hidden Markov Model (HMM)

4. Microsoft Kinect Software Development Kit

5. Human gesture dataset (ChaLearn, Indonesian cultural/anthropological signs)

Figure 3: RGBD data from Kinect [3]

Figure 4: Example of an HMM for the action “stretching an arm” [4]

Mathematical Model

Discrete Markov Model

• N distinct states.

• Begins (at time t = 1) in some initial state(s).

• At each time step (t = 1, 2, ...) the system moves from the current state to the next state (possibly the same as the current state) according to the transition probabilities associated with the current state.
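The three properties above can be illustrated with a minimal sketch that samples a state path from a discrete Markov chain. The 3-state left-to-right chain, its probabilities, and the function name `simulate_markov_chain` are assumptions chosen for illustration; they are not taken from the poster.

```python
import random

def simulate_markov_chain(pi, A, steps, seed=0):
    """Sample a state path of length `steps` from a discrete Markov chain.

    pi: initial state probabilities (length N)
    A:  N-by-N transition matrix, A[i][j] = P(next = j | current = i)
    """
    rng = random.Random(seed)
    states = list(range(len(pi)))
    # t = 1: draw the initial state from pi.
    current = rng.choices(states, weights=pi, k=1)[0]
    path = [current]
    # t = 2, 3, ...: draw the next state from the row of A
    # that belongs to the current state.
    for _ in range(steps - 1):
        current = rng.choices(states, weights=A[current], k=1)[0]
        path.append(current)
    return path

# Hypothetical left-to-right chain: a state either repeats or advances by one.
pi = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]

path = simulate_markov_chain(pi, A, steps=10)
print(path)
```

Because the zero entries of A forbid backward jumps, every sampled path starts in state 0 and never decreases, which is the behaviour expected of a left-to-right topology.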

Figure 5: HMM Topology of Straight-line Segment [5]

A Hidden Markov Model can be symbolized by λ = (A, B, π) and is characterized by the following elements:

• The set of states S = {s_1, s_2, ..., s_N}, where N is the number of states in the model.

• An initial probability distribution π over the states, such that π_i = P(s_i) is the probability of starting in state s_i, 1 ≤ i ≤ N.

• An N-by-N transition matrix A = {a_ij}, given by a_ij = P(s_j | s_i), 1 ≤ i, j ≤ N,

where a_ij is the probability of the transition from state s_i at time t to state s_j at time t + 1. The entries in each row of A must sum to 1, because they are the probabilities of making a transition from a given state to each of the N states (including itself):

∑_j a_ij = 1, 1 ≤ i ≤ N.

• The observation (emission) sequence O = o_1, o_2, ..., o_T, where T is the length of the gesture path.

• The set of discrete symbols V = {v_1, v_2, ..., v_M}, where M is the number of distinct observation symbols per state.

• An N-by-M observation matrix B = {b_j(m)}, where b_j(m) = P(v_m | s_j), 1 ≤ j ≤ N, 1 ≤ m ≤ M, and ∑_m b_j(m) = 1. Here b_j(m) is the probability of emitting symbol v_m in state s_j. The entries in each row of B must sum to 1 for the same reason as the rows of A: they are the probabilities of all possible emissions from a given state.
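The constraints listed above (π sums to 1, every row of A and of B sums to 1, A is N-by-N, B is N-by-M) can be checked mechanically. The following is a minimal sketch in plain Python; the helper names `is_distribution` and `validate_hmm` and the example values (N = 3, M = 4) are illustrative assumptions, not part of the original model description.

```python
import math

def is_distribution(row, tol=1e-9):
    """True if all entries lie in [0, 1] and they sum to 1 (within tol)."""
    return (all(0.0 <= p <= 1.0 for p in row)
            and math.isclose(sum(row), 1.0, abs_tol=tol))

def validate_hmm(pi, A, B):
    """Check shapes and stochastic constraints of lambda = (A, B, pi).

    Returns (N, M) on success and raises ValueError otherwise.
    """
    N = len(pi)
    if N == 0:
        raise ValueError("the model needs at least one state")
    if len(A) != N or any(len(row) != N for row in A):
        raise ValueError("A must be N-by-N")
    if len(B) != N or len(B[0]) == 0 or any(len(row) != len(B[0]) for row in B):
        raise ValueError("B must be N-by-M")
    if not is_distribution(pi):
        raise ValueError("pi must be a probability distribution")
    for name, matrix in (("A", A), ("B", B)):
        for i, row in enumerate(matrix):
            if not is_distribution(row):
                raise ValueError(f"row {i} of {name} must sum to 1")
    return N, len(B[0])

# Hypothetical model with N = 3 states and M = 4 observation symbols.
pi = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.1, 0.1, 0.1],
     [0.1, 0.7, 0.1, 0.1],
     [0.1, 0.1, 0.4, 0.4]]

print(validate_hmm(pi, A, B))  # (3, 4)
```

A small tolerance is used in the row-sum test because floating-point sums such as 0.7 + 0.1 + 0.1 + 0.1 are not always exactly 1.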

A complete specification of an HMM requires the two model parameters N and M, the observation symbols, and the three probabilistic parameters A, B and π. Thus, the compact notation for an HMM is λ = (A, B, π), where λ denotes the parameter set of the model.
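Given a parameter set λ, a common way to use it for recognition is to compute the likelihood P(O | λ) of an observed symbol sequence with the standard forward algorithm and to pick the gesture model with the highest likelihood. The poster does not state which evaluation or decision procedure is used, so the sketch below is an assumption; the two 2-state models, the gesture names, and the functions `forward_likelihood` and `classify` are hypothetical.

```python
def forward_likelihood(obs, pi, A, B):
    """Return P(O | lambda) computed with the forward algorithm.

    obs: list of symbol indices o_1..o_T (each in 0..M-1)
    pi:  initial distribution, A: N-by-N transitions, B: N-by-M emissions
    """
    N = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Two hypothetical gesture models sharing pi and A but with swapped emissions.
pi = [1.0, 0.0]
A = [[0.5, 0.5],
     [0.0, 1.0]]
models = {
    "gesture_a": (pi, A, [[0.9, 0.1], [0.1, 0.9]]),  # symbol 0 first, then 1
    "gesture_b": (pi, A, [[0.1, 0.9], [0.9, 0.1]]),  # symbol 1 first, then 0
}

obs = [0, 0, 1, 1]
print(classify(obs, models))  # gesture_a
```

This unscaled version is adequate for short sequences only; for long gesture paths the alpha values underflow, and a scaled or log-space variant would be needed.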

Figure 6: Space-time Representation of Action [4]

Applications

Figure 7: Gesture Recognition System

Hand Gesture for Ethnic Musical Instrument Playing: In this application, the hand movements of a player of an Indonesian traditional musical instrument (the Kendang) are detected.

Gesture-based Calculator: In this application, finger movements that hit virtual numbers on the screen are detected and submitted to a simple calculation engine.

Simple Object Transportation via Head Movement: In this application, head movement is used to “click” selections that move a robot arm in a given direction, in order to pick up and store a simple object.

References

[1] Kaggle I. ChaLearn Looking at People 2014. ChaLearn and University of Barcelona, 2014.

[2] Kendon A. Some relationships between body motion and speech. Studies in Dyadic Comm., pages 177–210, 1972.

[3] Sung et al. Unstructured human activity detection from RGBD images. IEEE Int. Conf. on Robotics and Automation, 2012.

[4] Aggarwal J.K. and Ryoo M.S. Human activity analysis: A review. ACM Computing Surveys, 43(3), 2011.

[5] Elmezain M. and S. El-shinaway. Vision based hand gesture recognition using generative and discriminative stochastic models. Int. J. of Computer, Inf. Sys. and Cntrl Eng., 7(11), 2013.

[6] Dymarski P. Hidden Markov Models, Theory and Applications. InTech, 2011.

Acknowledgements

This work is supported by (1) the ICM 2014 Invitation Program NANUM 2014, (2) the Indonesian government through the Directorate General of Higher Education, and (3) Universitas Padjadjaran through the Faculty of Mathematics and Natural Sciences (FMIPA) and the Informatics Department.

Contact

Dr. Setiawan Hadi
Robotics, AI and Digital Image Laboratory
Computer Vision Laboratory
Universitas Padjadjaran, Bandung, Indonesia
URL: http://informatika.unpad.ac.id/visilab
E-Mail: [email protected]