8/13/2019 10.1.1.86.2333
1/4
Gesture Recognition for Virtual Reality Applications
Using Data Gloves and Neural Networks
John Weissmann, Department of Computer Science, University of Zurich, [email protected],
Ralf Salomon, Department of Computer Science, University of Zurich, [email protected]
Abstract

This paper explores the use of hand gestures as a means
of human-computer interactions for virtual reality
applications. For the application, specific hand gestures, such as "fist", "index finger", and "victory sign", have been defined. Most existing approaches use various camera-based recognition systems, which are rather costly and very sensitive to environmental changes.
In contrast, this paper explores a data glove as the input
device, which provides 18 measurement values for the
angles of different finger joints. This paper compares the
performance of different neural network models, such as
back-propagation and radial-basis functions, which are
used by the recognition system to recognize the actual
gesture.
Some network models achieve a recognition rate (training
as well as generalization) of up to 100% over a number of
test subjects. Due to its good performance, this recognition system is a first step towards virtual reality applications in which program execution is controlled by a sign language.
Introduction

Currently, interactions with virtual reality (VR)
applications are done in a simple way. Even when
sophisticated devices such as space balls, 3D mice or data
gloves are present, they are mainly used as a means for
pointing and grabbing, i.e. the same I/O-paradigm as is
used with 2D mice. However, it has been shown [1], for example, that experienced users work more efficiently with word processors when using keyboard shortcuts than with the mouse. Generalising this
observation to 3 dimensions, our aim was to move away
from the simple point&click paradigm to a more compact
way of interaction. Therefore, we explore how hand
gestures could be used to interact with VR applications in
the form of a simple sign language.
In gesture recognition, it is more common to use a camera
in combination with an image recognition system [2].
These systems have the disadvantage that the
image/gesture recognition is very sensitive to
illumination, hand position, hand orientation, etc. In order to circumvent these problems we decided to use a data glove as the input device.
Problem Description

The problem we faced was to find a way to map a set of
angular measurements as delivered by the data glove to a
set of pre-defined hand gestures. Furthermore, it would be
advantageous to have a system with a certain amount of
flexibility, so that the same system could be used by
different people.
Methods

In our experiments, we used the CyberGlove, distributed
by Virtual Technologies Inc. [3], which measures the
angles of 18 joints of the hand: two for each finger, one
each for the angles between neighbouring fingers, as well
as one each for thumb rotation, palm arch, wrist pitch,
and wrist yaw. To design and train the neural networks
we used the Stuttgart Neural Network Simulator [4], a
free software package. SNNS also provides a tool which
can convert a trained network to a C-code module which
can subsequently be included in an application.
For our experiments we chose a set of 20 static hand gestures, such as "fist", "index finger", "gun", and "victory sign". Accordingly, each neural network model
had 18 input and 20 output nodes. The experiments were
performed with three standard three-layered back-
propagation networks using the logistic function
f(net_i) = 1 / (1 + exp(-net_i))

at each layer l, with

net_i = Σ_j w_ij · o_j^(l-1)

and with o_j^(l-1) denoting the output of the units of the previous layer.
Learning was performed with a constant learning rate of η = 0.2. For more information on back-propagation, see [5] or [6].
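The forward pass just described can be sketched in a few lines of NumPy. The layer sizes match the paper (18 inputs, 30 hidden units, 20 outputs), but the random weights are placeholders, not the trained networks:

```python
import numpy as np

def logistic(net):
    """Logistic activation f(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, weights):
    """Propagate an input vector through successive fully connected
    layers; weights is a list of (fan_out, fan_in) matrices."""
    o = x
    for W in weights:
        o = logistic(W @ o)  # net_i = sum_j w_ij * o_j^(l-1)
    return o

# Shapes as in the paper: 18 glove angles -> 30 hidden -> 20 gesture outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(30, 18)),
           rng.normal(scale=0.1, size=(20, 30))]
out = forward(rng.uniform(size=18), weights)
print(out.shape)  # (20,)
```

Because the logistic function maps any net input into (0, 1), each of the 20 output activations can be interpreted directly against the recognition threshold used below.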
We collected a pattern set of 200 hand gestures from one
person which we divided into a training set of 140
patterns and a test set of 60 patterns.
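A single pattern of this kind can be sketched as an 18-dimensional input vector paired with a one-hot target over the 20 gestures. The one-hot encoding is an assumption for illustration; the paper does not specify how targets are encoded in the SNNS pattern files:

```python
import numpy as np

N_SENSORS, N_GESTURES = 18, 20

def make_pattern(angles, gesture_index):
    """Build one training pattern: the 18 joint-angle readings as the
    input vector and a one-hot target over the 20 gestures.  (The
    one-hot target encoding is assumed here for illustration.)"""
    x = np.asarray(angles, dtype=float)
    assert x.shape == (N_SENSORS,)
    t = np.zeros(N_GESTURES)
    t[gesture_index] = 1.0
    return x, t

x, t = make_pattern(np.linspace(0.0, 1.0, N_SENSORS), gesture_index=3)
print(int(t.sum()), int(np.argmax(t)))  # 1 3
```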
The structure of these networks can be described as
follows:
(i) Network BPfull: all hidden units (30 units) are fully connected to all input units.
(ii) Network BPpair: each hidden unit is connected to the input units corresponding to the measurements of two fingers ("finger pairs"). Since we treat the measurements of thumb rotation, palm arch, wrist pitch, and wrist yaw as measurements of a sixth finger, this amounts to 15 units in the hidden layer.
(iii) Network BPtriple: each hidden unit is connected to the input units corresponding to the measurements of finger triples, which again leads to 15 hidden units.
The idea behind the architectures of BPpair and BPtriple is
to exploit a (tentative) correlation between gestures and
finger combinations.
In all networks, all hidden units are fully connected to all
output units, each of which is responsible for recognising
a particular gesture (see Fig. 1).
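The partial connectivity of BPpair can be sketched as a boolean mask over the hidden-layer weight matrix. The grouping of the 18 sensors into six "fingers" below is an illustrative assumption, not the sensor assignment used by the authors:

```python
from itertools import combinations
import numpy as np

# Hypothetical grouping of the 18 glove sensors into six "fingers"
# (five real fingers plus the thumb-rotation/palm/wrist group treated
# as a sixth finger); the exact sensor-to-group assignment is an
# illustrative assumption, not taken from the paper.
groups = {
    "thumb": [0, 1],      "index": [2, 3, 4],    "middle": [5, 6, 7],
    "ring":  [8, 9, 10],  "pinky": [11, 12, 13],
    "wrist": [14, 15, 16, 17],
}

def pair_mask(groups):
    """One hidden unit per finger pair, connected only to the sensors
    of those two fingers (the connectivity pattern of network BPpair)."""
    pairs = list(combinations(groups, 2))   # C(6, 2) = 15 pairs
    mask = np.zeros((len(pairs), 18), dtype=bool)
    for h, (a, b) in enumerate(pairs):
        mask[h, groups[a]] = True
        mask[h, groups[b]] = True
    return mask

mask = pair_mask(groups)
print(mask.shape)  # (15, 18): 15 hidden units over 18 inputs
```

Multiplying the hidden-layer weight matrix elementwise by such a mask after each update is one simple way to keep the disallowed connections at zero during training.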
Fig. 1 : Structure of the finger pair network. The nodes
on the input layer are grouped by fingers. Each node of
the hidden layer receives its input from exactly two finger
node groups. Each output node receives its input from all
nodes of the hidden layer. For clarity not all nodes and
connections are shown.
The recognised gesture is determined in a winner-takes-all fashion if at least one output unit exceeds the (experimentally determined) threshold value θ = 0.8; otherwise the pattern is classified as "unknown".
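This decision rule can be sketched as follows (using a shortened gesture list for brevity):

```python
import numpy as np

GESTURES = ["fist", "index finger", "gun", "victory sign"]  # excerpt of the 20

def classify(outputs, theta=0.8):
    """Winner-takes-all with rejection: return the gesture of the
    strongest output unit if it exceeds the threshold theta,
    otherwise classify the pattern as "unknown"."""
    winner = int(np.argmax(outputs))
    return GESTURES[winner] if outputs[winner] > theta else "unknown"

print(classify(np.array([0.05, 0.93, 0.10, 0.02])))  # index finger
print(classify(np.array([0.40, 0.45, 0.30, 0.20])))  # unknown
```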
First Results

The first network, BPfull, performed quite poorly (< 10%), whereas BPpair and BPtriple yielded high recognition rates of 99.5% and 92.0%, respectively, on the test set.
If a gesture recognition system is to be used in a
productive way, it must be flexible enough so that
different people can use it without having to go through a
tedious data collection and training session. Obviously,
the particular recognition rate depends significantly on the
test person's hand geometry.
To get a better idea of the generalisation capabilities of
such networks, we took training and test sets from 5
different persons. Again, all of the training sets consisted
of 140 patterns. In a first experiment we trained 5
networks (based on the finger pair structure) with the 5
training sets and checked the recognition rate of each
network on each of the 5 test sets. For this and the
following experiments we restricted ourselves to the
network architecture BPpair. The results are shown in the
following table:
Table 1:
            Net A   Net B   Net C   Net D   Net E
Test Set A  1.00    0.82    0.98    0.95    0.67
Test Set B  0.92    1.00    0.90    0.88    0.80
Test Set C  0.87    0.93    0.98    0.97    0.77
Test Set D  0.85    0.90    0.88    1.00    0.67
Test Set E  0.78    0.77    0.75    0.77    0.98
As can be seen, the recognition rate for the "own" test set is practically 100%, the exceptions being Net C and Net E. The recognition rate for "alien" test sets varies strongly between 67% and 98%; in most cases it is higher than 85%. These results seemed to indicate the possibility of training a net in such a way that the gestures of any person would be recognised with acceptable accuracy.
Combined Training Sets

In the next experiment we merged several combinations of the original 5 training sets into new training sets. The following table shows the recognition rates of five networks trained with combinations of 4 training sets each. In this table, Net A denotes a net which has been trained with a combination of the training sets from persons B, C, D, and E, but not A.
Table 2:
            Net A   Net B   Net C   Net D   Net E
Test Set A  1.00    1.00    1.00    1.00    1.00
Test Set B  0.98    0.98    1.00    0.98    1.00
Test Set C  1.00    1.00    1.00    1.00    1.00
Test Set D  1.00    0.98    0.98    0.97    1.00
Test Set E  1.00    1.00    1.00    1.00    0.88
A further net, trained with a combination of all five
training sets, scored extremely well on the test sets. With
the exception of test set B, for which the recognition rate
was 98.3%, it showed a 100% recognition rate.
Of course we are aware that the data set we used is too small to permit significant statements about such a net's performance for all possible hand geometries. However, we believe the results achieved so far are encouraging. Nevertheless, it is conceivable that the combined net cannot cope with the gestures of a user whose hand geometry differs radically from those used to create the training sets. Therefore, it would be interesting to look at systems whose parameters can be changed at runtime.
Radial Basis Functions

Radial-basis function (RBF) networks consist of an input and an output layer in which each output unit is fully connected to all input units. Each output unit o_j maintains an N-dimensional vector c_j (with N representing the number of input units), which represents the centre of a Gaussian bump. Each output unit first calculates the distance

d_j = Σ_{i=1..N} (c_i^j − u_i)²

of its centre to the current point of the input activation, denoted by the u_i's. It then determines its activation

act(o_j) = exp(−d_j / σ)

with σ denoting a scaling factor. Further details on RBF networks can be found in [6].
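These two equations can be sketched directly in NumPy; the centres below are toy values, not the trained ones:

```python
import numpy as np

def rbf_activations(u, centres, sigma=1.0):
    """Activations of the RBF output units:
       d_j = sum_i (c_i^j - u_i)^2,   act(o_j) = exp(-d_j / sigma)."""
    d = np.sum((centres - u) ** 2, axis=1)  # squared distance to each centre
    return np.exp(-d / sigma)

# Toy example: 3 gesture centres in the 18-dimensional input space.
rng = np.random.default_rng(1)
centres = rng.uniform(size=(3, 18))
acts = rbf_activations(centres[0], centres)
print(acts[0])  # 1.0 -- zero distance to its own centre
```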
For our experiments with radial-basis function systems we employed the same training and test sets as were used for the back-propagation networks. As scaling factor σ we used the value 1.0. The next table shows the recognition rates of 5 simple RBFs (i.e. RBFs trained with the training set of one person each):
Table 3:
            RBF A   RBF B   RBF C   RBF D   RBF E
Test Set A  0.98    0.35    0.53    0.65    0.33
Test Set B  0.53    0.95    0.51    0.60    0.35
Test Set C  0.70    0.48    0.98    0.55    0.40
Test Set D  0.66    0.53    0.53    1.00    0.46
Test Set E  0.45    0.31    0.43    0.53    0.98
It can be seen that the generalisation capabilities of the simple RBFs are somewhat inferior to those of the back-propagation networks trained on single training sets.
However, by training RBFs with combinations of training
sets, we can achieve generalisation capabilities similar to
those of the back-propagation networks trained with
combined training sets. As was the case in Table 2, RBF A denotes an RBF whose training set is a combination of the training sets B, C, D, and E, but not A.
Table 4:
            RBF A   RBF B   RBF C   RBF D   RBF E
Test Set A  0.91    0.99    1.00    0.99    1.00
Test Set B  0.98    0.86    0.99    0.99    0.99
Test Set C  0.99    1.00    0.94    0.99    0.99
Test Set D  0.99    0.99    0.99    0.97    0.98
Test Set E  0.96    0.93    0.96    0.96    0.72
The advantage of employing RBFs lies in the fact that
RBFs can be easily retrained at run time due to their
linear character. This means a gesture recognition system
based on RBFs could be adaptively retrained if it
encounters a user whose hand geometry differs strongly
from those in the training sets.
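Such adaptive retraining could, for instance, look like the following sketch, which simply moves a misclassified unit's centre towards the new user's sample. This is a hypothetical illustration of a run-time update, not the scheme used by the authors:

```python
import numpy as np

def adapt_centre(centres, u, true_class, lr=0.5):
    """One hypothetical online adaptation step (an illustration of
    run-time retraining, not the authors' scheme): move the centre of
    the correct gesture's unit towards the new user's input vector."""
    centres = centres.copy()
    centres[true_class] += lr * (u - centres[true_class])
    return centres

rng = np.random.default_rng(2)
centres = rng.uniform(size=(3, 18))
u = rng.uniform(size=18)
before = np.sum((centres[1] - u) ** 2)
adapted = adapt_centre(centres, u, true_class=1)
after = np.sum((adapted[1] - u) ** 2)
print(after < before)  # True -- the adapted centre is closer to the sample
```

Because only the centre of one unit changes, the other gestures' responses are left untouched, which is exactly what makes per-user adaptation cheap.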
Applications and Future Work

In order to demonstrate the usability of a sign language as a means of controlling a program, we incorporated our gesture recognition system into a simple virtual reality application. This application consists of some objects in a 3-dimensional space and a robot hand (see Fig. 2). We assigned simple commands such as "move robot hand forward", "rotate robot hand about x-axis", or "grab object" to some of the gestures. With a small learning effort, it is possible to effectively navigate in the virtual world and manipulate objects therein.
Fig. 2 : The Test Application. The gesture-controlled
robot hand is just about to grab an object in virtual
space.
We are currently working on the integration of our system
as a means of interaction in a number of virtual reality
applications developed at the University of Zurich, such as
a virtual endoscopy application and a geographical
information system.
In the future, we are planning to continue our work in the
following directions:
- Exploiting the adaptive possibilities of RBF-based systems: This would enable run-time changes to the system, thus enabling retraining of the system to the gestures of a new user whose gestures are not recognised.
- Recognition of dynamic gestures: Gestures such as waving or wagging a finger can make a sign language much more intuitive. In order to correctly recognize dynamic gestures, the data glove must be equipped with a tracking device such as the Ascension Flock of Birds [7] or the Polhemus Fastrak [8], in order to provide the system with positional and orientational information.
- Use of both hands: In VR applications where a particular gesture of the right hand, such as "extended index finger", is assigned the command "move forward", gestures of the left hand could be used as modifiers to regulate the speed.
- Recognition of gesture sequences: Here the problem lies in detecting and eliminating unwanted intermediate gestures. If, for instance, the gesture "thumbs up" is followed by the gesture "extended index finger", the gesture "gun" ("extended index finger" plus thumb) might unintentionally be formed during the transition.

The application of a gesture recognition system as described in this paper need not be restricted to VR programs; once the points mentioned above have been solved, it would, for example, also open up the possibility of building a system for the translation of ASL (American Sign Language) to spoken English.
Conclusion

This paper demonstrates that the chosen combination of data glove and neural networks achieves high recognition rates on a set of predefined gestures. Therefore it can be considered a first step towards VR applications, or other types of applications, in which program execution is controlled by means of a sign language.
Acknowledgements

This work is supported in part by the Swiss National Science Foundation, grant #21-50684.97.
References

[1] G. d'Ydewalle et al., "Graphical versus Character-Based Word Processors: An Analysis of User Performance", Behaviour and Information Technology, vol. 14, no. 4, pp. 208-214, 1995.
[2] R. Kjeldsen, J. Kender, "Toward the Use of Gesture in Traditional User Interfaces", Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996, pp. 151-156.
[3] Virtual Technologies Inc., Palo Alto, CA 94306. Production and distribution of data gloves and related devices. www.virtex.com
[4] Web site for the Stuttgart Neural Network Simulator: www-ra.informatik.uni-tuebingen.de/SNNS/
[5] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Santa Fe Institute Studies in the Sciences of Complexity, Lecture Notes v. 1, Addison-Wesley.
[6] R. Rojas, Neural Networks: A Systematic Introduction, Springer-Verlag, Berlin, 1996.
[7] Ascension Technology Corporation. www.ascension-tech.com
[8] Polhemus Incorporated. www.polhemus.com