Hidden Markov Model based Automatic Arabic Sign Language Translator using Kinect
Omar Amin†, Hazem Said‡, Ahmed Samy†, Hoda El Korashy¥
†Teaching Assistant, Computer Engineering Department, Ain Shams University.
†Software Developer at Robovics.
‡Assistant Professor, Computer Engineering Department, Ain Shams University.
¥Professor, Computer Engineering Department, Ain Shams University.
Outline
• Introduction
• Problem Statement
• Related Work
• Proposed System
• System Description
• Experimental Work
• Conclusion
Problem Introduction
• There are about 70 million deaf people who use sign language as their first language or mother tongue.
Source: http://wfdeaf.org/human-rights/crpd/sign-language
Research Effort
• Data Source
  • Sensor-Based Systems
  • Camera-Based Systems
• Research Focus
  • Isolated SLR (Sign Language Recognition)
  • Continuous SLR
  • Scalable SLR
  • Signer Independence
  • Posture Recognition
Sensor Based Systems
• Using electromyography-based sensors to measure the electrical activity of muscles at rest and during contraction; these measurements are then used to detect the sign being performed.
• Using data gloves (e.g. the CyberGlove) to capture finger positions and orientations, which are then used to recognize hand shapes and signs.
Camera Based Systems
• Normal RGB camera (usually with colored gloves)
• Stereo system (2 RGB cameras)
• Kinect sensor
• Algorithms used
  • Hidden Markov Models
  • Conditional Random Fields
  • Dynamic Time Warping
  • Recurrent Neural Networks
Kinect
A Kinect sensor (also called a Kinect) is a physical device that contains cameras, a microphone array, and an accelerometer as well as a software pipeline that processes color, depth, and skeleton data.
Kinect Skeleton Tracking
Kinect provides data for 20 different skeleton joints, including:
• An accurate 3D position for each joint.
• Joint orientations.
Go-Stop Detector
• Detects the start and end of each sign, using a threshold to differentiate between the signing and non-signing space.
[Figure: the signing space in front of the signer]
• A threshold on the hands' 3D position differentiates the signing space from the non-signing space.
• Three consecutive frames in the signing (or non-signing) space are required to flag the start (or end) of a sign.
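The two rules above can be sketched as a small state machine. The threshold value, the per-frame height measure, and all names below are illustrative assumptions rather than the authors' implementation:

```python
def detect_segments(hand_heights, threshold=0.1, needed=3):
    """Return (start, end) frame-index pairs for each detected sign.

    hand_heights: per-frame hand height relative to the hip center;
    a frame is 'in the signing space' when its value exceeds threshold.
    A state change (start or end of a sign) is confirmed only after
    `needed` consecutive frames agree with it.
    """
    segments = []
    signing = False   # current confirmed state
    run = 0           # consecutive frames disagreeing with the state
    candidate = 0     # frame index where the current run began
    for i, h in enumerate(hand_heights):
        in_space = h > threshold
        if in_space != signing:
            if run == 0:
                candidate = i
            run += 1
            if run >= needed:        # transition confirmed
                signing = in_space
                run = 0
                if signing:
                    segments.append([candidate, None])   # sign starts
                elif segments:
                    segments[-1][1] = candidate          # sign ends
        else:
            run = 0
    return [tuple(s) for s in segments if s[1] is not None]
```

At 30 frames/second, requiring three consecutive frames corresponds to roughly 100 ms of agreement before a transition is confirmed.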
Feature Extraction
• Features captured from the skeleton stream:
1. Right hand joint x, y, and depth.
2. Left hand joint x, y, and depth.
3. Hip Center joint x, y, and depth.
Feature Vector
• The feature vector consists of 6 values per skeleton frame:

Feature   Value
1         Right Hand x – Hip Center x
2         Right Hand y – Hip Center y
3         Right Hand depth – Hip Center depth
4         Left Hand x – Hip Center x
5         Left Hand y – Hip Center y
6         Left Hand depth – Hip Center depth

The Hip Center joint is needed to compute the hands' positions relative to a static point, compensating for the signer's position in front of the Kinect.
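A minimal sketch of assembling this feature vector from one skeleton frame. The dict-based frame layout is an assumption for illustration; the Kinect SDK exposes joints through its own skeleton types:

```python
def feature_vector(frame):
    """Build the 6-value feature vector from one skeleton frame.

    frame: dict mapping joint name -> (x, y, depth), an assumed layout.
    """
    rx, ry, rd = frame["right_hand"]
    lx, ly, ld = frame["left_hand"]
    hx, hy, hd = frame["hip_center"]
    # Hand positions relative to the hip center, so the features are
    # invariant to where the signer stands in front of the sensor.
    return [rx - hx, ry - hy, rd - hd,
            lx - hx, ly - hy, ld - hd]
```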
Linear Resampling
The Kinect camera records the skeleton at a rate of 30 frames/second. However, this is only an average rate: in practice, the time measured between two consecutive samples varies from 30 ms to 100 ms.
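Linear resampling maps these irregularly spaced samples onto a uniform grid by interpolating each feature dimension independently. A minimal sketch with NumPy, assuming timestamps in seconds and one feature per column:

```python
import numpy as np

def resample(timestamps, values, rate_hz=30.0):
    """Linearly resample irregular samples onto a uniform rate_hz grid.

    timestamps: (T,) sample times in seconds (irregularly spaced).
    values: (T, D) feature samples, one feature per column.
    """
    t = np.asarray(timestamps, dtype=float)
    v = np.asarray(values, dtype=float)
    uniform_t = np.arange(t[0], t[-1], 1.0 / rate_hz)
    # Interpolate each feature dimension independently.
    return np.column_stack(
        [np.interp(uniform_t, t, v[:, d]) for d in range(v.shape[1])]
    )
```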
Trajectory Smoothing
• Smoothing decreases the effect of noisy sensor measurements (spikes).
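A simple way to realize this smoothing is a centered moving average over each feature component. The window size here is an assumption, as the presentation does not specify the filter used:

```python
import numpy as np

def smooth(trajectory, window=5):
    """Centered moving-average smoothing of one feature component.

    Suppresses spikes from noisy sensor measurements. mode="same"
    keeps the trajectory length; edges are implicitly zero-padded,
    so the first and last few samples are attenuated.
    """
    kernel = np.ones(window) / window
    traj = np.asarray(trajectory, dtype=float)
    return np.convolve(traj, kernel, mode="same")
```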
Hidden Markov Model
• The emission probability distribution of each hidden state is a 6-D Gaussian distribution.
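The per-state emission score is then the 6-D Gaussian (log-)density evaluated at the frame's feature vector. A sketch assuming a diagonal covariance, since the covariance structure is not stated in the presentation:

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance multivariate Gaussian.

    x, mean, var: (D,) arrays; var holds the per-dimension variances.
    Working in log-space avoids underflow for long sequences.
    """
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    d = x.size
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(var))
                   + np.sum((x - mean) ** 2 / var))
```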
Training Set Generation
• For each of the 40 signs, a long video containing 60 samples was recorded and segmented by the go-stop detector into 60 annotated samples per sign, to generate the training and test sets.
• These annotated samples are used as observation sequences from which the HMMs are trained using the Baum-Welch algorithm.
Hidden Markov Model
• In the sign-language context:
  • Observation: a single skeleton frame (the hands' positions in 3D space).
  • Hidden state: emits observations through a 6-D Gaussian distribution.
Experimental Results
• Go-Stop Detector
  • Reliable segmentation of long videos.
  • Minimum transition time: 300 ms.
• Hidden Markov Model classifier performance (online mode):

Person            Test set size per sign   Classification accuracy
Original signer   20                       95.125%
Different signer  20                       92.5%

• Hidden Markov Model classifier performance (offline mode):

Person            Test set size per sign   Classification accuracy
Original signer   20                       99.25%
Experimental Results
• Hidden Markov Model classifier timing:
  • Algorithm used for classification: the Forward-Backward algorithm.

Timing    Time needed to classify
Average   12.68 ms
Maximum   20.2 ms
Minimum   8.75 ms
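Classification scores a segmented sign against each of the 40 per-sign HMMs and picks the model with the highest likelihood; only the forward pass is needed for that score. A minimal scaled-forward sketch in NumPy, where the (pi, A, per-frame emission probability) representation is an illustrative assumption, not the authors' code:

```python
import numpy as np

def forward_loglik(pi, A, emis):
    """Log-likelihood of an observation sequence under one HMM.

    pi:   (S,)   initial state probabilities.
    A:    (S, S) transition probabilities.
    emis: (T, S) emission probability of each frame under each state
          (e.g. from the per-state 6-D Gaussians).
    Rescaling alpha at every step avoids numerical underflow.
    """
    pi, A, emis = (np.asarray(a, dtype=float) for a in (pi, A, emis))
    alpha = pi * emis[0]
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c
    for t in range(1, emis.shape[0]):
        alpha = (alpha @ A) * emis[t]
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik
```

In practice emis[t, s] would be the 6-D Gaussian density of frame t under state s, and the recognized sign is the argmax of forward_loglik over the per-sign models.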
Conclusion
• A system has been developed to automatically segment a live video stream into isolated signs using the Kinect and translate these signs into text.
• Recognition accuracy is 95.125% for the signer-dependent case and 92.5% for the signer-independent case.