Hidden Markov Model based Automatic Arabic Sign Language Translator using Kinect
Omar Amin†, Hazem Said‡, Ahmed Samy†, Hoda El Korashy¥
†Teaching Assistant, Computer Engineering Department, Ain Shams University.
†Software Developer at Robovics.
‡Assistant Professor, Computer Engineering Department, Ain Shams University.
¥Professor, Computer Engineering Department, Ain Shams University.
Outline
• Introduction
• Problem Statement
• Related Work
• Proposed System
• System Description
• Experimental Work
• Conclusion
Problem Introduction
• There are about 70 million deaf people who use sign language as their first language or mother tongue.
Source: http://wfdeaf.org/human-rights/crpd/sign-language
Research Effort
• Data Source
  • Sensor-Based Systems
  • Camera-Based Systems
• Research Focus
  • Isolated SLR (Sign Language Recognition)
  • Continuous SLR
  • Scalable SLR
  • Signer Independence
  • Posture Recognition
Sensor Based Systems
• Using electromyography-based sensors to measure the electrical activity of muscles at rest and during contraction; these measurements are then used to detect the sign being performed.
• Using data gloves (e.g. the CyberGlove) to capture finger positions and orientations, which are then used to recognize hand shapes and signs.
Camera Based Systems
• Normal RGB camera (usually with colored gloves)
• Stereo system (2 RGB cameras)
• Kinect sensor
• Algorithms used
  • Hidden Markov Models
  • Conditional Random Fields
  • Dynamic Time Warping
  • Recurrent Neural Networks
Kinect
A Kinect sensor (also called a Kinect) is a physical device that contains cameras, a microphone array, and an accelerometer as well as a software pipeline that processes color, depth, and skeleton data.
Kinect Skeleton Tracking
Kinect provides data for 20 different skeleton joints, including:
• An accurate 3D position for each joint.
• Joint orientations.
Go-Stop Detector
• Detects the start and end of each sign, using a threshold to differentiate between the signing and non-signing space.
[Figure: the signing space in front of the signer]
• A threshold on the hands' 3D position differentiates the signing space from the non-signing space.
• Three consecutive frames in the signing (or non-signing) space are required to flag the start (or end) of a sign.
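The two rules above can be sketched as a small state machine. The threshold value, the per-frame height measure, and all names below are illustrative assumptions rather than the authors' implementation:

```python
def detect_segments(hand_heights, threshold=0.1, needed=3):
    """Return (start, end) frame-index pairs for each detected sign.

    hand_heights: per-frame hand height relative to the hip center;
    a frame is 'in the signing space' when its value exceeds threshold.
    A state change (start or end of a sign) is confirmed only after
    `needed` consecutive frames agree with it.
    """
    segments = []
    signing = False   # current confirmed state
    run = 0           # consecutive frames disagreeing with the state
    candidate = 0     # frame index where the current run began
    for i, h in enumerate(hand_heights):
        in_space = h > threshold
        if in_space != signing:
            if run == 0:
                candidate = i
            run += 1
            if run >= needed:        # transition confirmed
                signing = in_space
                run = 0
                if signing:
                    segments.append([candidate, None])   # sign starts
                elif segments:
                    segments[-1][1] = candidate          # sign ends
        else:
            run = 0
    return [tuple(s) for s in segments if s[1] is not None]
```

At 30 frames/second, requiring three consecutive frames corresponds to roughly 100 ms of agreement before a transition is confirmed.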
Feature Extraction
• Features captured from the skeleton stream:
1. Right hand joint x, y, and depth.
2. Left hand joint x, y, and depth.
3. Hip Center joint x, y, and depth.
Feature Vector
• The feature vector consists of 6 values per skeleton frame:

Feature   Value
1         Right Hand x – Hip Center x
2         Right Hand y – Hip Center y
3         Right Hand depth – Hip Center depth
4         Left Hand x – Hip Center x
5         Left Hand y – Hip Center y
6         Left Hand depth – Hip Center depth

The Hip Center joint is needed to compute the hands' positions relative to a static point, compensating for the signer's position in front of the Kinect.
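A minimal sketch of assembling this feature vector from one skeleton frame. The dict-based frame layout is an assumption for illustration; the Kinect SDK exposes joints through its own skeleton types:

```python
def feature_vector(frame):
    """Build the 6-value feature vector from one skeleton frame.

    frame: dict mapping joint name -> (x, y, depth), an assumed layout.
    """
    rx, ry, rd = frame["right_hand"]
    lx, ly, ld = frame["left_hand"]
    hx, hy, hd = frame["hip_center"]
    # Hand positions relative to the hip center, so the features are
    # invariant to where the signer stands in front of the sensor.
    return [rx - hx, ry - hy, rd - hd,
            lx - hx, ly - hy, ld - hd]
```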
Linear Resampling
The Kinect camera records the skeleton at a rate of 30 frames/second. However, this is only an average rate: in practice, the time measured between two consecutive samples varies from 30 ms to 100 ms.
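Linear resampling maps these irregularly spaced samples onto a uniform grid by interpolating each feature dimension independently. A minimal sketch with NumPy, assuming timestamps in seconds and one feature per column:

```python
import numpy as np

def resample(timestamps, values, rate_hz=30.0):
    """Linearly resample irregular samples onto a uniform rate_hz grid.

    timestamps: (T,) sample times in seconds (irregularly spaced).
    values: (T, D) feature samples, one feature per column.
    """
    t = np.asarray(timestamps, dtype=float)
    v = np.asarray(values, dtype=float)
    uniform_t = np.arange(t[0], t[-1], 1.0 / rate_hz)
    # Interpolate each feature dimension independently.
    return np.column_stack(
        [np.interp(uniform_t, t, v[:, d]) for d in range(v.shape[1])]
    )
```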
Trajectory Smoothing
• Smoothing decreases the effect of noisy sensor measurements (spikes).
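A simple way to realize this smoothing is a centered moving average over each feature component. The window size here is an assumption, as the presentation does not specify the filter used:

```python
import numpy as np

def smooth(trajectory, window=5):
    """Centered moving-average smoothing of one feature component.

    Suppresses spikes from noisy sensor measurements. mode="same"
    keeps the trajectory length; edges are implicitly zero-padded,
    so the first and last few samples are attenuated.
    """
    kernel = np.ones(window) / window
    traj = np.asarray(trajectory, dtype=float)
    return np.convolve(traj, kernel, mode="same")
```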
Hidden Markov Model
• The emission probability distribution of each hidden state is a 6-D Gaussian distribution.
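The per-state emission score is then the 6-D Gaussian (log-)density evaluated at the frame's feature vector. A sketch assuming a diagonal covariance, since the covariance structure is not stated in the presentation:

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance multivariate Gaussian.

    x, mean, var: (D,) arrays; var holds the per-dimension variances.
    Working in log-space avoids underflow for long sequences.
    """
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    d = x.size
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(var))
                   + np.sum((x - mean) ** 2 / var))
```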
Training Set Generation
• For each of the 40 signs, a long video containing 60 samples was recorded and segmented by the go-stop detector into 60 annotated samples per sign, to generate the training and test sets.
• These annotated samples are used as observation sequences from which the HMMs are trained using the Baum-Welch algorithm.
Hidden Markov Model
• In the sign-language context:
  • Observation: a single skeleton frame (the hands' positions in 3D space).
  • Hidden state: emits observations through a 6-D Gaussian distribution.
Experimental Results
• Go-Stop Detector
  • Reliable segmentation of long videos.
  • Minimum transition time: 300 ms.
• Hidden Markov Model classifier performance (online mode):

Person            Test set size per sign   Classification accuracy
Original signer   20                       95.125%
Different signer  20                       92.5%

• Hidden Markov Model classifier performance (offline mode):

Person            Test set size per sign   Classification accuracy
Original signer   20                       99.25%
Experimental Results
• Hidden Markov Model classifier timing:
  • Algorithm used for classification: the Forward-Backward algorithm.

Timing    Time needed to classify
Average   12.68 ms
Maximum   20.2 ms
Minimum   8.75 ms
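Classification scores a segmented sign against each of the 40 per-sign HMMs and picks the model with the highest likelihood; only the forward pass is needed for that score. A minimal scaled-forward sketch in NumPy, where the (pi, A, per-frame emission probability) representation is an illustrative assumption, not the authors' code:

```python
import numpy as np

def forward_loglik(pi, A, emis):
    """Log-likelihood of an observation sequence under one HMM.

    pi:   (S,)   initial state probabilities.
    A:    (S, S) transition probabilities.
    emis: (T, S) emission probability of each frame under each state
          (e.g. from the per-state 6-D Gaussians).
    Rescaling alpha at every step avoids numerical underflow.
    """
    pi, A, emis = (np.asarray(a, dtype=float) for a in (pi, A, emis))
    alpha = pi * emis[0]
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c
    for t in range(1, emis.shape[0]):
        alpha = (alpha @ A) * emis[t]
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik
```

In practice emis[t, s] would be the 6-D Gaussian density of frame t under state s, and the recognized sign is the argmax of forward_loglik over the per-sign models.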
Conclusion
• A system has been developed to automatically segment a live video stream into isolated signs using the Kinect and translate these signs into text.
• Recognition accuracy is 95.125% for the signer-dependent case and 92.5% for the signer-independent case.