HaiXiu: Emotion Recognition from Movements


Emotion Recognition

Felicitous Computing Institute

Debmalya Sinha

Means of Emotion Recognition

• Speech
• Gestures
• Facial Feature
• Heartbeat
• Skin Conductance
• Brain Imagery
• Movement Features
• Hand Writing
• Dance
• Social Presence
• Bio Chemical
• Eye & pupil

[Figure: valence-arousal circumplex with High/Low Arousal and High/Low Valence axes and the quadrant emotions Anger, Joy, Love, Sadness]

Active User Participation


• Users usually have to consciously take part in it
• Chances of suppressing emotive cues
• Chances of showing an inverted affective state
• Results tend to be biased

Passive User Participation


• Users do not need to take part actively
• Users have much less control over the results
• Less attention and control means less bias

Active-Passive Transition

Other two key factors for the transition:
1. Device invisibility
2. Sensor distance

Over time users tend to get familiarized:
• The actions slowly become more passive
• The controlling tendency decreases
• Faking of emotion decreases

Device invisibility

Instead of a device requiring direct contact, an invisible tracker/sensor device sits in the background and does the work for the users.

Problems with devices requiring direct contact:

• In various situations users will not have the mindset to actively engage (trauma, sadness, forlornness)
• Sensor distance is important in these situations

MoodScope

Sensor Distance

• Facial Feature

• Heartbeat

• Skin Conductance

• Brain Imagery

• Movement and Gestures

• Bio Chemical

• Eye & pupil

Some of these modalities require external attachments; others work without any attachment.

Without attachments:
• No body attachments
• Large sensor distance
• Can be operated more passively
• Much more unconscious participation
• Bias can be minimized further

Modes of Passive Recognizers

Passive, Ambient Sensors

• Facial Feature

• Eye & pupil

• Movement and Gestures

These need:
• Focus
• Facing the camera
• A degree of attachment to sensors

In many cases where:
1. the face is not visible,
2. there is no provision for attaching sensors to the body, and
3. there is no speech input,

movement and gesture detection is a much more feasible way to detect affect.

Movements and Gestures: A scenario

Situations where body movements and gestures are crucial:

1. A post-traumatic stress disorder (PTSD) patient pacing in the room.
2. A schizophrenic patient at an asylum is growing impatient and angry and making frivolous, jerky movements.
3. A patient with chronic depression is seen pacing slowly, hands in pockets, head drooping.

An automated system that detects emotive states in such situations can even save lives.

HaiXiu - 害羞

Records gestures and movement

Comes up with unique feature set

Trains a Neural Net for later detection

Continuous Emotion Detection
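For orientation, a minimal sketch of how these stages could be wired together; the helper names (record_walk, extract_features, train_net) are hypothetical placeholders, not HaiXiu's actual code, and the feature set and network are detailed on later slides.

```python
# Hypothetical sketch of the HaiXiu pipeline; the callables passed in are
# placeholders for the stages listed above, not the real implementation.

def haixiu_pipeline(record_walk, extract_features, train_net, labelled_walks):
    """1) record movement, 2) build features, 3) train, 4) detect continuously."""
    features = [extract_features(frames) for frames, _ in labelled_walks]
    targets = [arousal for _, arousal in labelled_walks]
    net = train_net(features, targets)            # fit once on recorded walks

    def detect():                                 # later: continuous detection
        return net(extract_features(record_walk()))

    return detect
```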

HaiXiu - 害羞

• Microsoft Kinect™ for movement detection

• Rather than discrete affective states, our target is to detect arousal and valence levels in a continuous space.

• This model of continuous affective-level detection can be implemented with other continuous affective spaces, e.g. Plutchik's Emotion Wheel or the PAD model.

• Presently HaiXiu detects only arousal levels. Work is ongoing to include the valence level.
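For illustration, a minimal sketch of how a continuous (valence, arousal) estimate could be mapped back onto the four quadrant emotions of the circumplex. The function name and the quadrant assignment are assumptions based on the conventional valence-arousal wheel; they are not part of HaiXiu, which currently outputs only arousal.

```python
# Illustrative only: projecting a continuous (valence, arousal) estimate,
# each in [-1, +1], onto the circumplex quadrants (Anger, Joy, Love,
# Sadness).  The quadrant placement below is the conventional one and is
# an assumption, not HaiXiu output.

def circumplex_quadrant(valence: float, arousal: float) -> str:
    if arousal >= 0.0:
        return "Joy" if valence >= 0.0 else "Anger"
    return "Love" if valence >= 0.0 else "Sadness"

# Example: a high-arousal, negative-valence walk falls in the Anger quadrant.
print(circumplex_quadrant(valence=-0.7, arousal=0.8))   # -> Anger
```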


Feature Set for Arousal level detection

Kinect gives us position data for 20 different joints.

We Calculate:

1. Minimum coordinates for the X, Y and Z axes (relative to spine)
2. Maximum coordinates for the X, Y and Z axes (relative to spine)
3. Speed = Δs/Δt
4. Peak acceleration = Δu/Δt
5. Peak deceleration = -Δu/Δt
6. Average acceleration = (Σ(Δu/Δt))/f
7. Average deceleration = -(Σ(Δu/Δt))/f
8. Jerk index = (Σ(Δa/Δt))/f

Δt = 0.2 seconds; f = total time / Δt
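A minimal sketch of this feature computation per joint, assuming joint positions arrive as numpy arrays of shape (frames, 3) already resampled so that consecutive samples are Δt = 0.2 s apart, and that "relative to spine" means subtracting the spine joint's position frame by frame. Reading average acceleration/deceleration as the positive/negative speed changes summed and divided by f is one plausible interpretation of the formulas above, not a statement of HaiXiu's exact code.

```python
import numpy as np

DT = 0.2  # seconds between samples, as stated above


def movement_features(joint: np.ndarray, spine: np.ndarray) -> dict:
    """Compute the arousal feature set for one joint (both inputs: (frames, 3))."""
    rel = joint - spine                              # coordinates relative to spine
    disp = np.linalg.norm(np.diff(joint, axis=0), axis=1)  # per-step path length
    speed = disp / DT                                # 3. speed = ds/dt
    accel = np.diff(speed) / DT                      # du/dt between consecutive steps
    jerk = np.diff(accel) / DT                       # da/dt
    f = len(speed)                                   # number of samples ~ total time / dt
    return {
        "min_xyz": rel.min(axis=0),                  # 1. minimum x, y, z (rel. to spine)
        "max_xyz": rel.max(axis=0),                  # 2. maximum x, y, z (rel. to spine)
        "mean_speed": speed.mean(),                  # 3. speed
        "peak_accel": accel.max(initial=0.0),        # 4. peak acceleration
        "peak_decel": -accel.min(initial=0.0),       # 5. peak deceleration (as positive)
        "avg_accel": accel[accel > 0].sum() / f,     # 6. average acceleration
        "avg_decel": -accel[accel < 0].sum() / f,    # 7. average deceleration
        "jerk_index": np.abs(jerk).sum() / f,        # 8. jerk index
    }
```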

Training the Neural Net

Initially we took 20 movement features (without the position features) and asked 2 subjects to walk at various arousal levels. We measured speed, acceleration, deceleration and jerk index for the upper-body joints.

Type: Bipolar feedforward ANN
Layers: 3 (20 : 6 : 1)
Learning: Backpropagation
Sample size: 34 walks (at different arousal levels) by 2 subjects
Error limit of the learned net: 0.0956

Detection

The ANN outputs one variable for the arousal level.

The output range is from -1 (totally relaxed) to +1 (Very Aroused)
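A toy sketch of a 20:6:1 bipolar (tanh) feedforward network trained with plain backpropagation, matching the layer sizes, output range and error limit quoted above. The learning rate, epoch count and the random stand-in data are illustrative assumptions, not HaiXiu's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(34, 20))      # 34 walks x 20 movement features (stand-in data)
y = rng.uniform(-1, 1, size=(34, 1))       # target arousal level in [-1, +1]

W1 = rng.normal(scale=0.3, size=(20, 6))   # input -> hidden weights
W2 = rng.normal(scale=0.3, size=(6, 1))    # hidden -> output weights
lr = 0.05                                  # assumed learning rate

for epoch in range(2000):
    # forward pass with bipolar (tanh) activations
    h = np.tanh(X @ W1)                    # hidden layer, shape (34, 6)
    out = np.tanh(h @ W2)                  # arousal estimate in (-1, +1)

    err = out - y
    mse = float((err ** 2).mean())

    # backpropagation of the squared error (tanh' = 1 - tanh^2)
    d_out = err * (1 - out ** 2)
    d_hid = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_hid / len(X)

    if mse < 0.0956:                       # stop at the error limit quoted above
        break


def detect_arousal(features: np.ndarray) -> float:
    """Map a 20-element feature vector to an arousal level in [-1, +1]."""
    return np.tanh(np.tanh(features @ W1) @ W2).item()
```

After training, detect_arousal returns a single value mirroring the -1 (totally relaxed) to +1 (very aroused) range described above.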

Challenges

1. Short working range of the Kinect: 0.8 m to 4.0 m
2. Shorter than the range needed in practical scenarios
3. Data not consistent enough for precise movement-feature calculation
4. Fault tolerance in recording and detection is needed
5. The Kinect does not follow the BVH format, so available BVH gesture databases cannot be used natively without a converter module (less efficiency)

Next Step

1. Introducing the position coordinates
2. Fine-tuning the arousal-level recognizer
3. A robust gesture-recognition module
4. Building a valence-recognizer module
5. Getting more test data from more subjects
6. Multiple-Kinect integration for better recognition
7. A slightly better user interface


Integrated Emotion detection

1. Every one of the modes of recognition has its merits
2. There is a plethora of existing facial-expression detectors, such as "Affectiva"
3. Speech-based emotion recognition has also been done extensively
4. MoodScope has changed smartphone-based affect detection
5. Powerful tools like AmbientDynamix make it easy to integrate various sensor inputs for processing and use on small devices like a smartphone


Thank You
