
Gaussian Process Dynamical Models for hand gesture interpretation in Sign Language

Pattern Recognition Letters 32 (2011) 2009–2014

Nuwan Gamage a, Ye Chow Kuang a,*, Rini Akmeliawati b, Serge Demidenko c

a Monash University Sunway Campus, Jalan Lagoon Selatan, Bandar Sunway, 46150 Selangor D.E., Malaysia
b International Islamic University Malaysia, Jalan Gombak, 53100 Kuala Lumpur, Malaysia
c RMIT International University Saigon South Campus, 702 Nguyen Van Linh Blvd., District 7, HCMC, Viet Nam

Article info

Article history: Received 6 January 2011. Available online 6 September 2011. Communicated by G. Borgefors.

Keywords: Gesture interpretation; Gaussian Process Dynamical Model; Gaussian Process; Sign Language; Hidden Markov Model

0167-8655/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2011.08.015

* Corresponding author. Tel.: +60 3 5514 6239; fax: +60 3 5514 6207. E-mail addresses: [email protected] (N. Gamage), [email protected], [email protected] (Y.C. Kuang), [email protected] (R. Akmeliawati), [email protected] (S. Demidenko).

Abstract

Classifying human hand gestures in the context of a Sign Language has historically been dominated by Artificial Neural Networks and Hidden Markov Models, with varying degrees of success. The main objective of this paper is to introduce the Gaussian Process Dynamical Model as an alternative machine learning method for hand gesture interpretation in Sign Language. In support of this proposition, the paper presents experimental results for the Gaussian Process Dynamical Model against a database of 66 hand gestures from the Malaysian Sign Language. Furthermore, the Gaussian Process Dynamical Model is tested against the established Hidden Markov Model for a comparative evaluation. A discussion of why the Gaussian Process Dynamical Model is superior to existing methods in the Sign Language interpretation task is then presented.


1. Introduction

Machine-based automatic Sign Language (SL) hand gesture interpretation has long been a popular research topic in Human Computer Interaction (HCI). Gesture interpretation accuracy depends on many factors, including the adopted learning method, hand gesture features, employed device characteristics, etc. Hardware devices such as single and stereo cameras (Brand and Oliver, 1997), depth-aware cameras (Cappé et al., 2005) and wired gloves (Eddy, 1996) have been used for gesture data acquisition. Features such as hand movements, hand position, hand pose, finger configuration and hand silhouettes are commonly extracted from the input data for gesture interpretation (Freeman and Weissman, 1995). Statistical machine learning methods such as Artificial Neural Networks (ANN) (Juang and Rabiner, 1991) and the Hidden Markov Model (HMM) (Rabiner, 1989; Ong and Ranganath, 2005; Microsoft Corporation, 2010) have been applied to hand gesture learning and interpretation.

The suitability of ANN and HMM for gesture recognition tasks has not been questioned in the past. This paper proposes the Gaussian Process Dynamical Model (GPDM) as an alternative to HMM and ANN for hand gesture recognition in the context of SL translation. GPDM is more transparent, and hence more amenable to human interpretation, than black-box approaches such as HMM and ANN. Furthermore, the Gaussian regularisation inherent in the GPDM framework enables better generalisation performance in the gesture recognition task, as shown in the sections below. The results are validated on a set of gestures from the Malaysian Sign Language (MSL), and the performance is compared against the popular HMM approach.

Section 2 briefly reviews the HMM and GPDM machine learning methods. Section 3 explains the proposed methodology and the rationale behind it. Section 4 provides extensive details on the experimental validation of the proposed methodology, the results obtained, and a related discussion. Section 5 discusses possible future investigations. Finally, Section 6 summarises the findings and contributions of this study.

2. Machine learning

Gesture interpretation for Sign Languages can be regarded as a supervised learning problem, as the labels of the training and test samples are known prior to the training process. The two supervised learning schemes employed in this study are HMM and GPDM.

2.1. Hidden Markov Model (HMM)

Apart from ANN, HMM is the most popular machine learning method in SL translation. Thus, HMM is used as the benchmark (or control) method in this study.


A formal definition of HMM is given by Ong and Ranganath (2005):

"An HMM is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols."

HMM is an extension of the Markov process model. In a regular Markov process, states are inter-connected with state transition probabilities. In HMM, an additional layer of stochastic process (the so-called observations) is introduced. Observations are derived based on state transition patterns, and there are probabilities of occurrence associated with them. Observations are the only visible part of an HMM, while the actual Markov process is hidden underneath. As the current state of the Markov process is unobservable at any given moment, the model is given the name 'Hidden Markov Model' (Rabiner and Juang, 1986; Pavlovic et al., 1997; Ong and Ranganath, 2005). Fig. 1 illustrates the operation of HMM graphically.

Fig. 1. Hidden Markov Model (x – states, y – observations, a – state transition probabilities, b – output probabilities).
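For reference, this structure can be written in the standard notation of Rabiner (1989); the equations below are not reproduced in the paper and are added here only for completeness. An HMM is specified by

$\lambda = (A, B, \pi), \qquad a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i), \qquad b_j(k) = P(o_t = v_k \mid q_t = S_j), \qquad \pi_i = P(q_1 = S_i),$

where $N$ is the number of hidden states, $M$ the number of observation symbols, $A$ the state transition matrix, $B$ the output probabilities and $\pi$ the initial state distribution. The likelihood of an observation sequence $O = o_1 \ldots o_T$ follows from the forward recursion

$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \Big] b_j(o_{t+1}), \qquad P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i).$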

The challenge in training an HMM is to find a set of suitable model parameters (N, M, A, B, π) that describe the system dynamics satisfactorily while avoiding over-learning. Many variations of the basic HMM and advanced algorithms have been developed to achieve this goal. Interested readers can find the details in Ong and Ranganath (2005) and the references therein. Historically, variants of HMM have dominated the gesture interpretation domain. An extensive list of studies in this field is available in (Cappé, 2001; Ong and Ranganath, 2005).

HMMs with mixture-of-Gaussians outputs (MHMM), as reported in (Bilmes, 1998; Murphy, 1998; Fraser, 2008; Resch, 2010), are used in the experimental studies below. MHMM is one of the latest generalisations of HMM, with significant learning capability and flexibility.

2.2. Gaussian Process Dynamical Models (GPDM)

Gaussian Processes (GP) were initially employed for regression of static data (MacKay, 2003). GP alone could not handle tracking, gesture recognition or other time-series data problems effectively. Furthermore, performing machine learning with high-dimensional data often leads to poor outcomes (Urtasun, 2006). Hence, the Gaussian Process Latent Variable Model (GPLVM) was introduced to circumvent these issues (Lawrence, 2003). GPLVM essentially learns the most relevant low-dimensional embedding (or latent variables) from the high-dimensional training data (Urtasun, 2006), while discarding statistically less significant variations. The disadvantage of GPLVM is that it provides no description of the relationship between the latent variables. The GPDM was therefore proposed by Wang and Fleet (2005, 2008) to address this deficiency of GPLVM.

GPDM can be represented graphically as in Fig. 2(a). The subscript t indicates the time stamp of the time-series data; xt is the latent variable and yt is the high-dimensional data; φ are basis functions that encode the transitions between latent variables, while ψ are basis functions that map the latent variables to the training data. The mappings are parameterised by A and B. These parameters are not of interest from the Bayesian perspective of function approximation and may not be unique (Urtasun, 2006). It is thus possible to marginalise them out, as demonstrated in MacKay (2003) (this is similar to the kernel trick). Combining all the priors, the latent mapping and the dynamics provides the final GPDM model, as presented in Fig. 2(b). A complete discussion of the GPDM mathematical framework is available in Wang and Fleet (2005).

Fig. 2. Graphical representation of GPDM. (a) Nonlinear latent-variable model for time series. (b) After marginalising parameters A and B.
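To make the marginalisation step concrete, the two mappings in Fig. 2(a) can be written in the notation of Wang and Fleet (2005) as

$x_t = \sum_i a_i\, \phi_i(x_{t-1}) + n_{x,t}, \qquad y_t = \sum_j b_j\, \psi_j(x_t) + n_{y,t},$

where $n_{x,t}$ and $n_{y,t}$ are zero-mean white Gaussian noise processes and $A = [a_1, a_2, \ldots]$, $B = [b_1, b_2, \ldots]$ collect the basis-function weights. With isotropic Gaussian priors on the columns of $A$ and $B$, integrating them out yields the kernel-based model of Fig. 2(b).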

GPDM is a fairly recent development in the machine learning space. Its initial applications were mostly concentrated around human motion tracking (Wang and Fleet, 2005; Urtasun, 2006). Hand gesture interpretation of Sign Language is an inherently complex problem, combining multidimensional time-series data with fluctuations across different instances (of the same gesture). Moreover, GPDM is a parameter-less model, making it a convenient and computationally lighter method in practice. Overall, GPDM brings many advantages in solving certain machine learning problems, mostly in the domain of human motion modelling.

3. Gesture interpretation process

The gesture interpretation process can be divided into three distinct stages, namely: (1) raw data normalisation; (2) training gesture models; and (3) classification.

3.1. Raw data normalisation

The raw data is captured using a colour digital camera in video format. A SwisTrack (Correll and Sempo, 2006; Lochmatter and Roduit, 2008) implementation of the colour separation and tracking method explained in Gamage and Akmeliawati (2009) is used to extract the 2D trajectories of the head and hands of the signer. In the first stage, the extracted trajectories are converted from Cartesian to polar coordinates. Stationary head or hand components are not informative, hence they are removed in the second stage. At stage three, it then becomes possible to distinguish single-hand from both-hand gestures, assuming the head remains relatively stationary. In stage four, the mean coordinate of the gesture is removed to reposition it at the origin. In stage five, the gesture is rotated and scaled down to a neutral position using Singular Value Decomposition (SVD) (Anderson et al., 1999; Anton, 2010). Finally, in stage six, the gesture is re-sampled using Bezier curve techniques (Farin and Hansford, 2000) in order to obtain trajectories that are: (i) jitter-free, clean and smooth; and (ii) sampled at a standard rate (e.g., 30 sampling points per trajectory). An optional chain-code based vectorisation (i.e., reshaping multi-dimensional data into a single dimension) is performed for the HMM-related experiments.
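Purely as an illustration, the six stages could be sketched in Python roughly as follows. All function and variable names are hypothetical, numpy is assumed, stage 3 (hand labelling) is omitted, and plain linear interpolation stands in for the Bezier resampling used in the paper:

import numpy as np

def normalise_gesture(traj, n_points=30, eps=1e-3):
    # Sketch of the six-stage pipeline of Section 3.1 (hypothetical names).
    # traj: (T, 2) array of tracked 2D coordinates for one hand.

    # Stage 1: Cartesian -> polar conversion (the polar form supports the
    # single- vs. both-hand reasoning of stage 3, omitted in this sketch).
    _r = np.hypot(traj[:, 0], traj[:, 1])
    _theta = np.arctan2(traj[:, 1], traj[:, 0])

    # Stage 2: discard stationary samples, which carry no information.
    step = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    traj = traj[1:][step > eps]

    # Stage 4: subtract the mean so the gesture is centred at the origin.
    traj = traj - traj.mean(axis=0)

    # Stage 5: SVD-based rotation and scaling to a neutral pose.
    _, s, Vt = np.linalg.svd(traj, full_matrices=False)
    traj = (traj @ Vt.T) / s[0]

    # Stage 6: resample to a fixed number of points (the paper fits Bezier
    # curves; linear interpolation stands in for that here).
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(t_new, t_old, traj[:, k]) for k in (0, 1)])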

3.2. Training gesture models

Training HMMs (with mixture-of-Gaussians outputs) for gestures requires training data and initial versions of the prior, mean vector and covariance matrix as inputs. The number of hidden states (Q) and the number of Gaussian peaks (M) are two parameters that can be varied manually to maximise the effectiveness of the training process. [Note: HMM gesture models for certain combinations of Q and M are unattainable due to the limited training data size, especially for single-hand gestures.] The prior, mean vector and covariance are optimised during training, and the optimised parameters constitute the HMM model for a particular gesture class.
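The experiments in Section 4 use the MATLAB HMM Toolbox cited there. Purely as an illustrative analogue (an assumption, not the code the paper used), a comparable mixture-of-Gaussians HMM could be set up with the Python hmmlearn package:

import numpy as np
from hmmlearn.hmm import GMMHMM

# Stand-in data: six normalised 30-point 2D trajectories of one gesture
# class (in the paper these come from the Section 3.1 pipeline).
train_seqs = [np.random.randn(30, 2) for _ in range(6)]

Q, M = 5, 3  # number of hidden states and Gaussian peaks, as in the paper
model = GMMHMM(n_components=Q, n_mix=M, covariance_type="diag", n_iter=100)

X = np.concatenate(train_seqs)          # hmmlearn expects stacked sequences
lengths = [len(s) for s in train_seqs]  # ...plus the length of each sequence
model.fit(X, lengths)                   # EM optimises prior, means, covariances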

As for training the GPDM gesture models, this requires training data, dimension weights, and initial versions of the hyperparameters and latent variables. A Principal Component Analysis (PCA) of the training data is usually taken as the initial latent variable. During the training process, the hyperparameters and latent variables are optimised for a particular gesture, and the mean gesture trajectory is estimated. The mean gesture trajectory, together with the variance at various points along the trajectory, defines the GPDM model for a particular gesture.
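As a small illustration of the initialisation step only (scikit-learn assumed; the three-dimensional latent space and the data shape are arbitrary stand-ins):

import numpy as np
from sklearn.decomposition import PCA

Y = np.random.randn(180, 60)               # stand-in for stacked, vectorised gesture data
X0 = PCA(n_components=3).fit_transform(Y)  # initial latent coordinates for the GPDM optimiser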

3.3. Classification

Classification with HMM is a rather straightforward process. Using the forward-backward algorithm (Baum et al., 1970; Marsland, 2009), the log-likelihood between a new gesture and an existing gesture class is calculated. This is repeated for all the gesture classes in the existing database. The gesture class that provides the maximum log-likelihood is considered to be the most probable gesture class for the new gesture.
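In sketch form, with hypothetical names and a score() method returning the log-likelihood (as in the hmmlearn example above), the decision rule reduces to:

def classify(gesture, class_models):
    # gesture: one normalised trajectory; class_models: dict mapping each
    # gesture label to its trained model (e.g. the GMMHMM objects above).
    # The class whose model assigns the highest log-likelihood wins.
    scores = {label: m.score(gesture) for label, m in class_models.items()}
    return max(scores, key=scores.get)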

As for classification in GPDM, a kernel can be generated for a particular gesture class from its hyperparameters and latent variables. A Radial Basis Function (RBF) kernel can then be generated from a new gesture sample after subtracting the mean gesture trajectory. Using both kernels, the Gaussian mean and variance can be calculated, and the log-likelihood estimated. Similar to HMM classification, this process is repeated for all available gesture classes in the database, and the gesture class that provides the maximum log-likelihood is the most probable gesture class for the new gesture.
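A condensed numpy sketch of this scoring step follows; all names are hypothetical, and a single fixed-hyperparameter RBF kernel stands in for the learned GPDM kernels:

import numpy as np

def rbf(A, B, gamma=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * gamma * d2)

def gp_log_likelihood(y_star, x_star, X, Y, noise=1e-2, gamma=1.0):
    # Predictive Gaussian at latent point x_star given training pairs (X, Y),
    # then the log-density of the mean-subtracted observation y_star under it.
    K = rbf(X, X, gamma) + noise * np.eye(len(X))
    k = rbf(x_star[None, :], X, gamma)          # (1, N) cross-kernel
    K_inv = np.linalg.inv(K)
    mu = (k @ K_inv @ Y).ravel()                # predictive mean
    var = float(1.0 + noise - k @ K_inv @ k.T)  # predictive variance
    r = y_star - mu
    return -0.5 * (r @ r / var + r.size * np.log(2.0 * np.pi * var))

As with the HMM rule, the class that maximises the accumulated log-likelihood over the trajectory is selected.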

4. Experiments and results

4.1. Gesture database

A database of 66 gesture classes from MSL has been used. Of these, 36 are single-hand gestures while the other 30 are both-hand gestures. A total of 11 instances per gesture were collected from a single signer. The experiment has also been repeated on an independent American Sign Language database (Kosmopoulos and Maglogiannis, 2006; Askaroglou et al., 2007) with the same conclusion; only the MSL results are presented below due to space limitations.

4.2. Experiments

The MHMM with Expectation Maximisation (EM) from the HMM Toolbox written by Murphy (1998) was used for the HMM-related experiments. The GPDM code written by Wang et al. (2008) was used for the GPDM experiments. Both code bases were customised to accommodate the obtained gesture data.

As GPDM is a parameter-less model, manual parameter tuning is not required. However, the input data and their characteristics can be altered so that the GPDM model can better learn the gesture data and related patterns. In the reported experiments, a fixed sampling rate of 30 points per gesture was used (an optimal value based on the performed parametric experiments). For the reference HMM system, two parameters (M and Q) were varied.

Experiment 1. The classification accuracy and trends for both GPDM and HMM were evaluated. The popular 'hold-out' testing method (Dziuda, 2010) was employed. Out of the 11 available instances (per gesture), 50%, 80% and 90% of the samples were randomly chosen for training (which translates to 6, 9 and 10 samples from the total of 11; the 90% evaluation is equivalent to a 'hold-one-out' test). The remaining samples were used as independent test samples to evaluate the models' classification performance.


Experiment 2. The variance in classification accuracy (i.e., the robustness of the learning process) was evaluated. In the first set of experiments, the database was randomly divided into training and test samples. To verify that the trends observed above do not deviate significantly with the choice of training samples, the following experiment was devised: if 5 samples out of 11 are to be chosen for training, a total of 11C5 = 462 combinations can be obtained. Both GPDM and HMM models were trained and evaluated on all of these combinations to observe the variance trends.

Fig. 3. Classification bias: training vs. test samples. Configuration: HMM at M = 3; Q = 5, single- and both-hand gestures.

Experiment 3. The training times for a gesture (one instance of each case, i.e., single- and both-hand) were measured on a Hewlett Packard xw6400 Workstation (hardware specification: Intel Xeon E5345 Quad-Core 2.33 GHz, FSB 1333 MHz, 4 GB DDR2 RAM). All tests were conducted in a 'single-threaded' configuration to avoid possible biases.

4.3. Experimental results and discussion

Gesture classification results for GPDM and HMM are tabulated in Table 1(a) and (b), respectively.

Effect of parameters Q and M on MHMM-based training. Prior to looking into other trends, it is worthwhile to observe the behaviour of the dominant parameters in MHMM and how they affect classification results. Based on the analysis of the experimental data in Table 1(b), the parameter Q plays a significant role in MHMM: the greater the number of states, the more the MHMM is able to learn, and the better the classification results. By contrast, parameter M has only a marginal effect on the classification accuracy. This implies that a single Gaussian peak is enough to cover most of the gesture data, with additional peaks assisting in covering the leftover data points. The conclusion of this analysis is that using larger values for both M and Q helps to enhance the gesture classification results for MHMM.

Table 1. Hand gesture classification (in percent). Legend: NTS – number of training samples; M – number of Gaussian peaks; Q – number of states. HMM entries are reported as Q = 5 | Q = 10 (N/A where the model could not be trained).

(a) GPDM
                Single-hand (36 gestures)      Both-hand (30 gestures)        Single + both-hand (66 gestures)
Sample set      NTS=6   NTS=9   NTS=10         NTS=6   NTS=9   NTS=10         NTS=6   NTS=9   NTS=10
Training        75      77      74             87      85      87             81      81      80
Test            70      67      72             85      83      90             77      74      80
Combined        73      75      74             86      85      88             79      79      80

(b) HMM (entries: Q = 5 | Q = 10)
M = 1
Training        99|N/A  96|N/A  95|N/A         79|81   73|80   71|79          97|N/A  92|N/A  90|N/A
Test            67|N/A  65|N/A  78|N/A         51|56   58|65   64|67          65|N/A  67|N/A  77|N/A
Combined        85|N/A  91|N/A  93|N/A         66|70   70|77   70|78          82|N/A  88|N/A  89|N/A
M = 2
Training        100|N/A 98|N/A  98|N/A         81|81   76|80   73|79          98|N/A  95|N/A  93|N/A
Test            70|N/A  74|N/A  72|N/A         49|52   58|61   64|67          65|N/A  72|N/A  74|N/A
Combined        86|N/A  94|N/A  96|N/A         66|68   72|77   72|78          83|N/A  91|N/A  92|N/A
M = 3
Training        100|N/A 98|N/A  99|N/A         80|82   77|80   74|79          98|N/A  95|N/A  94|N/A
Test            65|N/A  71|N/A  69|N/A         44|44   51|54   64|64          60|N/A  67|N/A  73|N/A
Combined        84|N/A  93|N/A  96|N/A         64|65   72|75   73|78          81|N/A  90|N/A  92|N/A
M = 4
Training        100|N/A 99|N/A  98|N/A         80|81   76|81   76|79          98|N/A  96|N/A  95|N/A
Test            66|N/A  69|N/A  69|N/A         43|40   51|47   61|58          59|N/A  66|N/A  71|N/A
Combined        84|N/A  94|N/A  96|N/A         63|63   72|74   74|78          80|N/A  90|N/A  93|N/A

Classification bias between training and test samples. The difference in classification accuracy between training and test samples is around 0–7% for GPDM and around 22–39% for HMM (see Fig. 3). This confirms that HMM is more susceptible to over-learning, which can lead to misclassification of untrained samples. This could be circumvented by using a large number of carefully selected training samples that comprehensively cover all probable variations. However, collecting large quantities of gesture data is generally impractical, while the training times and computational costs could be prohibitive. GPDM's capability of learning a gesture model from a small representative data set while maintaining good generalisation performance is highly desirable in SL gesture interpretation applications.

Classification bias between single- and both-hand gestures. Generally, GPDM has better recognition accuracy for both-hand gestures, as shown in Fig. 4. This is mainly because both-hand gestures carry twice as much independent data as single-hand gestures. A both-hand gesture describes the positional relationship between the hands, whereas in single-hand gestures one of the hands is completely at rest. Both-hand gestures are thus much more robust against linear translation, making them easier to classify correctly than single-hand gestures. HMM, in contrast, classifies single-hand gestures better than both-hand ones for any given number of states (Q), owing to the fixed learning capacity of a model with a pre-determined number of states.

Fig. 4. Classification bias: single vs. both-hand gestures. Configuration: HMM at M = 3; Q = 5, combined samples.

Effect of number of training samples. The observed general trend is that using more training samples improves accuracy. However, it should be noted that GPDM is able to classify untrained new gestures with reasonable accuracy even when only limited training samples (e.g., 6 out of 11) are available in the training phase. This is advantageous because: (a) it significantly reduces the training times and hence the computational complexity involved; and (b) in practice, it is difficult to obtain large training datasets.

Variance in classification due to training samples. The obtained classification accuracies are shown in Fig. 5. For HMM, the modest configuration of M = 3 and Q = 5 was used. The standard deviations of the classification accuracies are similar for both learning models (single-hand: HMM = 1.95%, GPDM = 1.27%; both-hand: HMM = 3.89%, GPDM = 3.37%). However, the difference between the best and the worst classification performance in the single-hand case is HMM = 18% vs. GPDM = 7%, while in the both-hand case it is HMM = 22% vs. GPDM = 18%. In other words, GPDM shows a smaller performance variation than HMM.

Fig. 5. Classification variance for GPDM and HMM. Configuration: HMM at M = 3; Q = 5, combined samples.

Training time. In machine learning, training times are as important as classification results. Fig. 6 summarises the training times for both GPDM and HMM. The main trends are: (1) HMM training times are much longer than those of GPDM (by approximately 300% or more); and (2) the training time grows in direct proportion to the number of training samples (for both GPDM and HMM). As the number of training samples increases, the size of the covariance matrices in GPDM grows as the square of the training set size, so inverting these matrices (an operation whose cost grows roughly as the cube of the matrix size) takes more time and processing resources. Long training times are a particularly significant issue for HMM because the optimal M and Q cannot be pre-determined, so exhaustive parameter sweeps and validation have to be conducted. Note that the reported HMM results account only for parameter sweeps over M = 1–4 and Q = 1–10. A full-vocabulary gesture recognition system is likely to require parametric sweeps over a wider range to estimate the optimal M and Q, which could lead to even lengthier training times. GPDM does not face such problems, as it is a parameter-less model. Overall, GPDM has better training-time properties than HMM.

Fig. 6. Training times for GPDM and HMM. NTS – number of training samples; parameter sweeps for HMM: M = 1–4 and Q = 1–10.

The experimental results presented in this section suggest that GPDM has a more consistent classification performance than HMM in the task of hand gesture classification. GPDM outperforms HMM by a large margin in cases where the number of training samples is low (which is almost always true for real-life gesture recognition applications), the feature set is limited, and the gestures are macro-gestures. Finally, GPDM has shorter training times and lower computational complexity.

5. Future work

One of the advantages of GPDM is its ability to classify with a high rate of accuracy even with a small set of features. In the reported experiments, the only feature utilised was the gesture's 2D trajectory. Even with such an abstract yet highly relevant feature, the recognition rate for GPDM was exceptionally good for dual-hand gestures and reasonable for single-hand gestures. This difference is largely down to the fact that dual-hand gestures quantitatively carry more descriptive data than single-hand gestures. At the same time, it should be noted that simply using many features has no direct positive influence on classification accuracy, regardless of the statistical model used (Alpaydin, 2004; Marsland, 2009). Certain features may not be as descriptive or distinctive as others, and irrelevant features contribute significant noise that degrades the overall recognition accuracy. Overall, as long as features that are highly pertinent to a gesture are presented, GPDM is able to provide modest results. Features such as hand motion relative to the other hand, stationary components (e.g., the face), and correlated hand motion points (e.g., the shoulder, elbow and wrist joints) are potentially useful. This is an area to be investigated in future research.

6. Conclusion

This study has put forward the hypothesis that GPDM offers better performance in human hand gesture classification than HMM. In support of this, GPDM was tested against a database of 66 hand gestures from MSL, achieving a successful gesture classification rate of 79%. Furthermore, it was established that GPDM is stronger in areas such as training requirements (number of samples, parameters, and training time), over-learning tendencies, and variance in classification.

The contributions of this study include: (1) introducing GPDM for gesture classification in the Sign Language context (to the best of the authors' knowledge, this is the first attempt at such an application); (2) applying GPDM outside its traditional motion-tracking problem domain; (3) establishing GPDM's superiority in the gesture classification task through extensive parametric experimentation with a fairly large gesture database; and (4) showing that GPDM is able to outperform the established HMM in many respects in the context of gesture interpretation for sign languages.

Acknowledgements

This research was funded by Monash University Sunway Campus under the Internal Seeding Grant (E-04-07), the Royal Society of New Zealand (ISATA08-13) and the International Islamic University Malaysia Research Matching Grant Scheme RMGS 09-03. The authors would like to thank the Malaysian Federation of the Deaf (MFD) and the signers who participated in the data collection stage of this research, and Tan Siao Lip and Melanie Ooi for administering this exercise. The support extended by Ms. Vineetha Menon in using the MATLAB Parallel Computing Toolbox, and the insightful suggestions and feedback provided by Prof. Christopher H. Messom, are acknowledged and greatly appreciated.

References

Alpaydin, E., 2004. Introduction to Machine Learning. MIT Press.
Anderson, E., Bai, Z., et al., 1999. LAPACK Users' Guide. Society for Industrial and Applied Mathematics.
Anton, H., 2010. Elementary Linear Algebra. John Wiley & Sons.
Askaroglou, I., Tzikopoulos, S., et al., 2007. Extraction of mid-level semantics from gesture videos using a Bayesian network. In: 1st Panhellenic Students Conference on Informatics, Patras, Greece.
Baum, L.E., Petrie, T., et al., 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41 (1), 164–171.
Bilmes, J.A., 1998. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and Hidden Markov Models. International Computer Science Institute, Berkeley, CA.
Brand, M., Oliver, N., et al., 1997. Coupled hidden Markov models for complex action recognition. In: Proc. IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition.
Cappé, O., 2001. Ten years of HMMs. <http://www.tsi.enst.fr/~cappe/docs/hmmbib.html>.
Cappé, O., Moulines, E., et al., 2005. Inference in Hidden Markov Models. Springer.
Correll, N., Sempo, G., et al., 2006. SwisTrack: A tracking tool for multi-unit robotic and biological systems. In: IEEE/RSJ Internat. Conf. on Intelligent Robots and Systems.
Dziuda, D., 2010. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons.
Eddy, S.R., 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6 (3), 361–365.
Farin, G.E., Hansford, D., 2000. The Essentials of CAGD. AK Peters, Ltd.
Fraser, A.M., 2008. Hidden Markov Models and Dynamical Systems. SIAM.
Freeman, W.T., Weissman, C.D., 1995. Television control by hand gestures. In: Internat. Workshop on Automatic Face and Gesture Recognition, pp. 179–183.
Gamage, N., Akmeliawati, R., et al., 2009. Towards robust skin colour detection and tracking. In: IEEE Instrumentation and Measurement Technology Conference (I2MTC '09).
Juang, B.H., Rabiner, L.R., 1991. Hidden Markov models for speech recognition. Technometrics 33 (3), 251–272.
Kosmopoulos, D., Maglogiannis, I., 2006. Hand tracking for gesture recognition tasks using dynamic Bayesian network. Internat. J. Intelligent Systems and Applications 1 (3–4), 359–375.
Lawrence, N.D., 2003. Gaussian process latent variable models for visualisation of high dimensional data. In: Adv. Neural Inform. Process. Systems (NIPS 2003), Vancouver, Canada.
Lochmatter, T., Roduit, P., et al., 2008. SwisTrack – a flexible open source tracking software for multi-agent systems. In: IEEE/RSJ Internat. Conf. on Intelligent Robots and Systems (IROS 2008).
MacKay, D.J.C., 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
Marsland, S., 2009. Machine Learning: An Algorithmic Perspective. CRC Press.
Microsoft Corporation, 2010. Introducing Kinect for Xbox 360. <http://www.xbox.com/en-US/kinect> (retrieved 24.10.10).
Murphy, K., 1998. Hidden Markov Model (HMM) Toolbox for Matlab. <http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html> (retrieved 25.05.09).
Ong, S.C.W., Ranganath, S., 2005. Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Trans. Pattern Anal. Machine Intell. 27 (6), 873–891.
Pavlovic, V.I., Sharma, R., et al., 1997. Visual interpretation of hand gestures for human–computer interaction: A review. IEEE Trans. Pattern Anal. Machine Intell. 19 (7), 677–695.
Rabiner, L., Juang, B., 1986. An introduction to hidden Markov models. IEEE ASSP Mag. 3 (1), 4–16.
Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77 (2), 257–286.
Resch, B., 2010. Mixtures of Gaussians: A Tutorial for the Course Computational Intelligence. <http://www.igi.tugraz.at/lehre/CI/tutorials/MixtGaussian/index.html> (retrieved 27.09.10).
Urtasun, R., 2006. Motion Models for Robust 3D Human Body Tracking. Doctoral thesis, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne.
Wang, J., Fleet, D., et al., 2008. Gaussian process dynamical models for human motion. IEEE Trans. Pattern Anal. Machine Intell. 30 (2), 283–298.
Wang, J.M., Fleet, D.J., et al., 2005. Gaussian Process Dynamical Models. In: Adv. Neural Inform. Process. Systems (NIPS 2005), Vancouver, Canada.
Wang, J.M., Fleet, D.J., et al., 2008. GPDM code. <http://www.dgp.toronto.edu/~jmwang/gpdm/> (retrieved 30.10.09).