
    Gesture Recognition for Virtual Reality Applications

    Using Data Gloves and Neural Networks

    John Weissmann, Department of Computer Science, University of Zurich, [email protected],

    Ralf Salomon, Department of Computer Science, University of Zurich, [email protected]

Abstract

This paper explores the use of hand gestures as a means of human-computer interaction for virtual reality applications. For the application, specific hand gestures, such as fist, index finger, and victory sign, have been defined. Most existing approaches use various camera-based recognition systems, which are rather costly and very sensitive to environmental changes.

    In contrast, this paper explores a data glove as the input

    device, which provides 18 measurement values for the

    angles of different finger joints. This paper compares the

    performance of different neural network models, such as

    back-propagation and radial-basis functions, which are

    used by the recognition system to recognize the actual

    gesture.

    Some network models achieve a recognition rate (training

    as well as generalization) of up to 100% over a number of

    test subjects. Due to its good performance, this

recognition system is the first step towards virtual reality

    applications in which program execution is controlled by

    a sign language.

Introduction

Currently, interactions with virtual reality (VR) applications are done in a simple way. Even when sophisticated devices such as space balls, 3D mice or data gloves are present, they are mainly used as a means for pointing and grabbing, i.e. the same I/O paradigm as is used with 2D mice. However, it has been shown [1], for example, that experienced users work more efficiently with word processors when using keyboard shortcuts than with the mouse. Generalising this observation to 3 dimensions, our aim was to move away from the simple point-and-click paradigm to a more compact way of interaction. Therefore, we explore how hand gestures could be used to interact with VR applications in the form of a simple sign language.

In gesture recognition, it is more common to use a camera in combination with an image recognition system [2]. These systems have the disadvantage that the image/gesture recognition is very sensitive to illumination, hand position, hand orientation, etc. In order to circumvent these problems, we decided to use a data glove as the input device.

Problem Description

The problem we faced was to find a way to map a set of

    angular measurements as delivered by the data glove to a

    set of pre-defined hand gestures. Furthermore, it would be

    advantageous to have a system with a certain amount of

    flexibility, so that the same system could be used by

    different people.

Methods

In our experiments, we used the CyberGlove, distributed

    by Virtual Technologies Inc. [3], which measures the

    angles of 18 joints of the hand: two for each finger, one

    each for the angles between neighbouring fingers, as well

    as one each for thumb rotation, palm arch, wrist pitch,

    and wrist yaw. To design and train the neural networks

    we used the Stuttgart Neural Network Simulator [4], a

    free software package. SNNS also provides a tool which

    can convert a trained network to a C-code module which

    can subsequently be included in an application.

For our experiments we chose a set of 20 static hand gestures such as fist, index finger, gun, and

    victory sign. Accordingly, each neural network model

    had 18 input and 20 output nodes. The experiments were

    performed with three standard three-layered back-

    propagation networks using the logistic function

$f(\mathrm{net}_i) = 1 / (1 + \exp(-\mathrm{net}_i))$

at each layer $l$, with

$\mathrm{net}_i = \sum_j w_{ij}\, o_j^{l-1},$

where $o_j^{l-1}$ denotes the output of the units of the previous layer.

Learning was performed with a constant learning rate of 0.2. For more information on back-propagation, see [5] or [6].
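For illustration, the following Python sketch (a simplification; the actual system used networks trained in SNNS and exported as C modules) shows the forward pass of such a fully connected three-layer network with the logistic activation, mapping the 18 glove measurements to 20 gesture outputs. The NumPy implementation, layer sizes and random weights are only illustrative.

```python
import numpy as np

def logistic(net):
    """Logistic activation f(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(angles, w_hidden, w_output):
    """Forward pass of a three-layer back-propagation network.

    angles:   the 18 joint-angle measurements from the data glove
    w_hidden: hidden-layer weights, shape (30, 18) as in BPfull
    w_output: output-layer weights, shape (20, 30), one row per gesture
    """
    hidden = logistic(w_hidden @ angles)   # net_i = sum_j w_ij * o_j of the previous layer
    return logistic(w_output @ hidden)     # 20 output activations, one per gesture

# Untrained example with random weights; a real run would use the learned weights.
rng = np.random.default_rng(0)
outputs = forward(rng.uniform(0.0, 90.0, size=18),
                  rng.normal(scale=0.1, size=(30, 18)),
                  rng.normal(scale=0.1, size=(20, 30)))
print(outputs.shape)  # (20,)
```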

    We collected a pattern set of 200 hand gestures from one

    person which we divided into a training set of 140

    patterns and a test set of 60 patterns.

    The structure of these networks can be described as

    follows:

    (i) Network BPfull : all hidden units (30 units) are fully

    connected to all input units

    (ii) Network BPpair : each hidden unit is connected to the

    input units corresponding to the measurements of two

    fingers ("finger pairs"). Since we treat the measurements

    of thumb rotation, palm arch, wrist pitch and wrist yaw as

measurements of a sixth finger, this amounts to 15 units in the hidden layer.

(iii) Network BPtriple: each hidden unit is connected to the

    input units corresponding to the measurements of finger

    triples, which again leads to 15 hidden units.

    The idea behind the architectures of BPpair and BPtriple is

    to exploit a (tentative) correlation between gestures and

    finger combinations.

    In all networks, all hidden units are fully connected to all

    output units, each of which is responsible for recognising

    a particular gesture (see Fig. 1).

Fig. 1: Structure of the finger pair network (input groups such as Hand and Wrist, Middle finger, and Thumb; outputs such as Fist, Index, Gun, and OK). The nodes on the input layer are grouped by fingers. Each node of the hidden layer receives its input from exactly two finger node groups. Each output node receives its input from all nodes of the hidden layer. For clarity, not all nodes and connections are shown.
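To make the sparse connectivity of BPpair concrete, the following sketch builds a binary connection mask with one hidden unit per pair of finger groups. The assignment of the 18 sensor indices to the six groups below is an assumption for illustration (the paper does not list which sensor belongs to which group); only the overall scheme of 6 groups and 15 pairwise hidden units follows the description above.

```python
import numpy as np
from itertools import combinations

# Assumed grouping of the 18 sensor indices into six "fingers"; the hand/wrist
# group stands for thumb rotation, palm arch, wrist pitch and wrist yaw.
FINGER_GROUPS = {
    "thumb":  [0, 1],
    "index":  [2, 3, 10],
    "middle": [4, 5, 11],
    "ring":   [6, 7, 12],
    "pinky":  [8, 9, 13],
    "hand":   [14, 15, 16, 17],
}

def pair_connectivity_mask(groups, n_inputs=18):
    """One hidden unit per pair of finger groups (6 choose 2 = 15),
    connected only to the sensors of those two groups."""
    pairs = list(combinations(groups, 2))
    mask = np.zeros((len(pairs), n_inputs))
    for k, (a, b) in enumerate(pairs):
        mask[k, groups[a] + groups[b]] = 1.0
    return mask

mask = pair_connectivity_mask(FINGER_GROUPS)
print(mask.shape)  # (15, 18): multiply the hidden-layer weights by this mask
                   # so that absent connections stay at zero during training.
```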

The recognised gesture is determined in a winner-takes-all fashion if at least one output unit exceeds the (experimentally determined) threshold value of 0.8; otherwise the pattern is classified as unknown.
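A minimal sketch of this decision rule (the gesture names and the four-element example vectors are placeholders; the real network has 20 outputs):

```python
import numpy as np

GESTURES = ["fist", "index finger", "gun", "victory sign"]  # ... 20 gestures in total
THRESHOLD = 0.8  # experimentally determined threshold

def classify(outputs, names=GESTURES, threshold=THRESHOLD):
    """Winner-takes-all: the most active output wins, but only if it
    exceeds the threshold; otherwise the pattern is 'unknown'."""
    winner = int(np.argmax(outputs))
    return names[winner] if outputs[winner] >= threshold else "unknown"

print(classify(np.array([0.05, 0.93, 0.10, 0.02])))  # -> index finger
print(classify(np.array([0.40, 0.55, 0.30, 0.20])))  # -> unknown
```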

First Results

The first network, BPfull, performed quite poorly (< 10%), whereas BPpair and BPtriple yielded high recognition rates of 99.5% and 92.0%, respectively, on the test set.

    If a gesture recognition system is to be used in a

    productive way, it must be flexible enough so that

    different people can use it without having to go through a

    tedious data collection and training session. Obviously,

    the particular recognition rate depends significantly on the

    test person's hand geometry.

    To get a better idea of the generalisation capabilities of

    such networks, we took training and test sets from 5

    different persons. Again, all of the training sets consisted

    of 140 patterns. In a first experiment we trained 5

    networks (based on the finger pair structure) with the 5

    training sets and checked the recognition rate of each

    network on each of the 5 test sets. For this and the

    following experiments we restricted ourselves to the

    network architecture BPpair. The results are shown in the

    following table:

Table 1:

             Net A   Net B   Net C   Net D   Net E
Test Set A    1.00    0.82    0.98    0.95    0.67
Test Set B    0.92    1.00    0.90    0.88    0.80
Test Set C    0.87    0.93    0.98    0.97    0.77
Test Set D    0.85    0.90    0.88    1.00    0.67
Test Set E    0.78    0.77    0.75    0.77    0.98

As can be seen, the recognition rate of each net on its own test set is practically 100%, the exceptions being Net C and Net D. The recognition rate on the other persons' test sets varies strongly between 67% and 98%; in most cases it is higher than 85%. These results seemed to indicate the possibility of training a net in such a way that the gestures of any person will be recognised with an acceptable accuracy.

Combined Training Sets

In the next experiment we merged several combinations of the original 5 training sets into new training sets. The following table shows the recognition rates of five networks trained with combinations of 4 training sets each. In this table, Net A denotes a net which has been trained with a combination of the training sets from persons B, C, D, and E, but not A.

Table 2:

             Net A   Net B   Net C   Net D   Net E
Test Set A    1.00    1.00    1.00    1.00    1.00
Test Set B    0.98    0.98    1.00    0.98    1.00
Test Set C    1.00    1.00    1.00    1.00    1.00
Test Set D    1.00    0.98    0.98    0.97    1.00
Test Set E    1.00    1.00    1.00    1.00    0.88

    A further net, trained with a combination of all five

    training sets, scored extremely well on the test sets. With

    the exception of test set B, for which the recognition rate

    was 98.3%, it showed a 100% recognition rate.

Of course, we are aware that the data set we used is too small to permit significant statements about such a net's performance for all possible hand geometries. However, we believe the results achieved so far are encouraging. Nevertheless, it is conceivable that the combined net cannot cope with the gestures of a user whose hand geometry differs radically from those used to create the training sets. Therefore, it would be interesting to look at systems whose parameters could be changed at runtime.

Radial Basis Functions

Radial-basis function (RBF) networks consist of an input and an output layer in which each output unit is fully connected to all input units. Each output unit $o_j$ maintains an $N$-dimensional vector $\vec{c}_j$ (with $N$ representing the number of input units), which represents the centre of a Gaussian bump. Each output unit first calculates the distance

$d_j = \sum_{i=1}^{N} (c_i^j - u_i)^2$

of its centre to the current point of the input activation, denoted by the $u_i$'s. It then determines its activation

$\mathrm{act}(o_j) = \exp(-d_j / \sigma)$

with $\sigma$ denoting a scaling factor. Further details on RBF networks can be found in [6].
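A minimal Python sketch of this output computation, assuming one Gaussian centre per gesture class and the reconstructed activation formula above (the function name and array layout are illustrative):

```python
import numpy as np

def rbf_activations(u, centres, sigma=1.0):
    """Activations of the RBF output layer.

    u:       current input vector (the 18 glove measurements)
    centres: array of shape (num_gestures, 18), one Gaussian centre per output unit
    sigma:   scaling factor (1.0 in the experiments reported below)
    """
    d = np.sum((centres - u) ** 2, axis=1)  # squared distance of each centre to the input
    return np.exp(-d / sigma)               # Gaussian bump around each centre
```

The recognised gesture could then be selected from these activations in the same winner-takes-all fashion as for the back-propagation networks.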

    For our experiments with radial basis function systems we

    employed the same training and test sets as were used for

the back-propagation networks. As the scaling factor we used the value 1.0. In the next table the recognition rates

    of 5 simple RBFs (i.e. RBFs trained with the training set

    of one person each) are shown:

Table 3:

             RBF A   RBF B   RBF C   RBF D   RBF E
Test Set A    0.98    0.35    0.53    0.65    0.33
Test Set B    0.53    0.95    0.51    0.60    0.35
Test Set C    0.70    0.48    0.98    0.55    0.40
Test Set D    0.66    0.53    0.53    1.00    0.46
Test Set E    0.45    0.31    0.43    0.53    0.98

It can be seen that the generalisation capabilities of the simple RBFs are somewhat inferior to those of the back-propagation networks trained on single-person training sets.

    However, by training RBFs with combinations of training

    sets, we can achieve generalisation capabilities similar to

    those of the back-propagation networks trained with

combined training sets. As was the case in Table 2, RBF A denotes an RBF whose training set is a combination of the training sets B, C, D, and E, but not A.

Table 4:

             RBF A   RBF B   RBF C   RBF D   RBF E
Test Set A    0.91    0.99    1.00    0.99    1.00
Test Set B    0.98    0.86    0.99    0.99    0.99
Test Set C    0.99    1.00    0.94    0.99    0.99
Test Set D    0.99    0.99    0.99    0.97    0.98
Test Set E    0.96    0.93    0.96    0.96    0.72

    The advantage of employing RBFs lies in the fact that

    RBFs can be easily retrained at run time due to their

    linear character. This means a gesture recognition system

    based on RBFs could be adaptively retrained if it

    encounters a user whose hand geometry differs strongly

    from those in the training sets.
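The paper does not specify an update rule, but one simple possibility for such run-time adaptation is to nudge the centre of the intended gesture's output unit towards freshly collected glove samples, as in the following hypothetical sketch:

```python
import numpy as np

def adapt_centre(centres, gesture_index, new_sample, rate=0.1):
    """Move the centre of one output unit a small step towards a newly
    collected measurement of the gesture it is supposed to recognise."""
    centres[gesture_index] += rate * (np.asarray(new_sample) - centres[gesture_index])
    return centres
```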

Applications and Future Work

In order to demonstrate the usability of a sign language as a means of controlling a program, we incorporated our gesture recognition system into a simple virtual reality application. This application consists of some objects in a 3-dimensional space and a robot hand (see Fig. 2). We assigned simple commands such as move robot hand forward, rotate robot hand about the x-axis, or grab object to some of the gestures. With a small learning effort, it is possible to effectively navigate in the virtual world and manipulate objects therein.


Fig. 2: The Test Application. The gesture-controlled

    robot hand is just about to grab an object in virtual

    space.

    We are currently working on the integration of our system

    as a means of interaction in a number of virtual reality

    applications developed at the University of Zurich, such as

    a virtual endoscopy application and a geographical

    information system.

    In the future, we are planning to continue our work in the

    following directions:

- Exploiting the adaptive possibilities of RBF-based systems: This would enable run-time changes to the system, and thus retraining the system to the gestures of a new user whose gestures are not yet recognised.

- Recognition of dynamic gestures: Gestures such as waving or wagging a finger can make a sign language much more intuitive. In order to correctly recognize dynamic gestures, the data glove must be equipped with a tracking device such as the Ascension Flock of Birds [7] or the Polhemus Fastrak [8], in order to provide the system with positional and orientational information.

- Use of both hands: In VR applications where a particular gesture of the right hand, such as extended index finger, is assigned the command move forward, gestures of the left hand could be used as modifiers to regulate the speed.

- Recognition of gesture sequences: Here the problem lies in detecting and eliminating unwanted intermediate gestures. If, for instance, the gesture thumbs up is followed by the gesture extended index finger, the gesture gun (extended index finger plus thumb) might unintentionally be formed during the transition.

The application of a gesture recognition system as described in this paper need not be restricted to VR programs; once the points mentioned above have been solved, it would, for example, also open up the possibility of building a system for the translation of ASL (American Sign Language) into spoken English.

Conclusion

This paper demonstrates that the chosen combination of data glove and neural networks achieves high recognition rates on a set of predefined gestures. Therefore, it can be considered a first step towards VR applications or other types of applications in which program execution is controlled by means of a sign language.

Acknowledgements

This work is supported in part by the Swiss National Science Foundation, grant #21-50684.97.

References

[1] G. d'Ydewalle et al., Graphical versus Character-Based Word Processors: An Analysis of User Performance, Behaviour and Information Technology, 1995, v.14, n.4, p.208-214.

[2] R. Kjeldsen, J. Kender, Toward the Use of Gesture in Traditional User Interfaces, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996, p.151-156.

[3] Virtual Technologies Inc., Palo Alto, CA 94306. Production and distribution of data gloves and related devices. www.virtex.com

[4] Web site for the Stuttgart Neural Network Simulator: www-ra.informatik.uni-tuebingen.de/SNNS/

[5] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Santa Fe Institute Studies in the Sciences of Complexity, Lecture Notes v.1, Addison-Wesley.

[6] R. Rojas, Neural Networks: A Systematic Introduction, Springer-Verlag, Berlin (1996).

[7] Ascension Technology Corporation, www.ascension-tech.com

[8] Polhemus Incorporated, www.polhemus.com