31
USC Multimodal Communication and Machine Learning Lab Modeling Human Communication Dynamics [MultiComp Lab] PhD students: Derya Ozkan and Sunghyun Park Master student: Moitreya Chatterjee Post-doctoral researcher: Stefan Scherer Research programmer: Giota Stratou Project manager: Alesia Egan Louis-Philippe Morency

Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

USC Multimodal Communication and

Machine Learning Lab

Modeling Human Communication Dynamics

[MultiComp Lab]

PhD students: Derya Ozkan and Sunghyun Park

Master student: Moitreya Chatterjee

Post-doctoral researcher: Stefan Scherer

Research programmer: Giota Stratou

Project manager: Alesia Egan

Louis-Philippe Morency

Page 2: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multimodal Communication and Machine Learning Lab

© Keith Schaffer

Page 3: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multimodal Perception of User State: Applications

Engagement Dominance Empathy

Social

Disorders

Depression PTSD Distress

Medical

Distress Indicators Suicide prevention

Education

Group learning analytics Virtual Learning Peer

Business

with MIT and CogitoHealth with Cincinnati Hospital

with Stanford and UCSD

YouTube: Opinion mining with UNT and UT Dallas

with CMU

Affect

Frustration Agreement Sentiment

Negotiation outcomes

with USC business school

Page 4: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multimodal Perception of Distress Indicators

Page 5: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multimodal Communicative Behaviors

Gestures Head gestures Eye gestures Arm gestures

Body language Body posture Proxemics

Eye contact Head gaze Eye gaze

Facial expressions FACS action units Smile, frowning

Verbal Visual

Prosody Intonation Voice quality

Auditory

Vocal expressions Laughter, moans

Lexicon Words

Syntax Part-of-speech Dependencies

Pragmatics Discourse acts

Page 6: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

From Audio-Visual Signals to Perceived User State

Gestures Head gestures Eye gestures Arm gestures

Body language Body posture Proxemics

Eye contact Head gaze Eye gaze

Facial expressions FACS action units Smile, frowning

Verbal Visual

Prosody Intonation Voice quality

Auditory

Vocal expressions Laughter, moans

Lexicon Words

Syntax Part-of-speech Dependencies

Pragmatics Discourse acts

Engagement Dominance Empathy

Social

Disorders

Depression PTSD Distress

Affect

Frustration Agreement Sentiment

Page 7: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

From Audio-Visual Signals to Perceived User State

Audio signals

Head pose

Perceived User State

Distress Engagement Sentiment

Visual signals

Voice pitch

Page 8: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

• Low-cost depth sensor • Articulated body tracking

• 3D head pose estimation • Real-time facial feature tracker

[FG 2008, best paper award]

[CVPR 2012]

From Audio-Visual Signals to Perceived User State

Audio signals

Head pose

Visual signals

Voice pitch

• Pitch, energy and speaking rate • Automatic voice quality analysis

Page 10: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Computational Behavior Indicators

Human in the loop:

Identify behaviors useful to human task

Behavior quantification

Quantify changes in human behaviors

Page 11: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Visualization of Computational Behavior Indicators

Transparent comparisons of observed behavioral indicators

Analogous to medical lab result sheets

Low Normal High

Page 12: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Detection and Computational Analysis of

Psychological Signals (DCAPS)

Albert (Skip) Rizzo Louis-Philippe Morency

Jonathan Gratch Arno Hartholt David Traum Stacy Marsella

Multimodal Perception

Animation Evaluation Integration Dialogue

Visualization & Audio Analysis

Clinical Expert

+ 27 researchers, programmers, artists

and clinicians

Page 13: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Psychological Distress: Datasets

Aims: Study behaviors associated to psychological distress

Depression (PHQ-9)

Trait Anxiety (STAI)

PTSD (PCL-C/PCL-M)

Examine how these cues may differ

Across different interaction settings

Face-to-face: expected to evoke strongest indicators

Computer-mediated (TeleCoach): intended use case

Human-computer (SimSensei): intended use case

Across different populations

General Los Angeles population (Craigslist)

Recent veterans (US Vets)

Page 14: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multimodal Perception of Distress Indicators

Page 15: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Psychological Distress Indicators

Vertical eye gaze Smile intensity

Distress

Hand self-adaptor Legs fidgeting

Distress No-distress

Distress No-distress Distress No-distress

No-distress

[IEEE FG 2013 – Best paper award]

Distress Distress No-distress No-distress

Voice energy std. Voice quality (NAQ)

Distress Distress No-distress No-distress

Joy – Facial expr. Sad – Facial expr.

• Distress

• Anxiety

• Depression

• PTSD

Page 16: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Effect of Gender on Distress Indicators [ACII 2013]

• Distress

• Anxiety

• Depression

• PTSD

Page 17: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Effect of Gender on Distress Indicators [ACII 2013]

• Distress

• Anxiety

• Depression

• PTSD

Page 18: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Suicide Prevention

Nonverbal indicators of suicidal ideations

Dataset: 30 suicidal adolescents/30 non-suicidal adolescents

Suicidal teenagers use more breathy tones

Sourc

e: C

DC

0.1

0.3

0.5

0.7

0.9

Non-Suicidal Suicidal

Open Quotient (OQ)

0

0.1

0.2

0.3

0.4

Non-Suicidal Suicidal

Std. Open Quotient (OQ std.)

Non-Suicidal Suicidal

Std. Norm. Amplitude Quotient (NAQ std.)

0

0.04

0.08

0.12

0.16

****

**

[ICASSP 2013]

Page 19: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

MultiSense: Multimodal Perception Library

TOOLS\MODULES

TRANSFORMERS:

Facetrackers

•Gavam

•CLM/CLMZ

•Okao

•Shore

Real-time hCRF

ActiveMQ –VHMessenger

EmoVoice

CONSUMER

*Each one exported as .dll

PROVIDERS:

Audio Capture

Webcam Capture

Mouse

Kinect( Depth/Intensity/IR

image/Skeleton)

CONSUMERS:

Image Painter

Signal Painter

Sensing Layer

Behavior Layer

<person id=“subjectA”>

<sensingLayer>

<headPose>

<position z="223" y="345" x="193" />

<rotation rotZ="15" rotY="35" rotX="10" />

<confidence>0.34<confidence/>

</headPose>

...

</sensingLayer> </person>

<person id=“subjectB”>

<behaviorLayer>

<behavior>

<type>attention</type>

<level>high</level>

<value>0.6</value>

<confidence>0.46<confidence/>

</behavior>

...

</behaviorLayer>

</person>

PML:

Perception

Markup

language

Page 20: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Temporal Probabilistic Learning

x1

y1

x2

y2

x3

y3

xn

yn

Naïve Bayes Classifier

x1

y1

x2

y2

x3

y3

xn

yn

Maximum Entropy Model &

Support Vector Machine

x1 x2 x3 xn

h1 h2 h3 hn

Hidden Markov Model

x1 x2 x3 xn

y1 y2 y3 yn

Conditional Random Field

Observations (e.g., yaw, roll pitch)

Labels (e.g., head-nod,other-gesture)

Caption

Generative models

yi

xi

Discriminative models

• Audio

• Visual

• Verbal

I. Temporal dynamic

Page 21: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Temporal Probabilistic Learning

x1 x2 x3 xn

h1 h2 h3 hn

y

Hidden

Conditional Random Field

x1 x2 x3 xn

h1 h2 h3 hn

y1 y2 y3 yn

Latent-Dynamic

Conditional Random Field

[CVPR 2007] [CVPR 2006,PAMI 2007]

Model Accuracy

HMM 84.2%

CRF 86.0%

HCRF (w=0) 91.6%

HCRF (w=1) 93.9% 0 0.2 0.4 0.6 0.8 1

False positive rate

HMM

SVM

CRF

LDCRF

Tru

e p

osi

tiv

e ra

te

• Audio

• Visual

• Verbal

I. Temporal dynamic

II. Hidden substructure

Page 22: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Latent-Dynamic Conditional Neural Field

h1 h3 h3 hn

y1 y2 y3 yn

g3

x3 x3

x3

gn

xn xn

xn

g2

x2 x2

x2 x1

g1

x1 x1

x1

70

80

90

100

Rapportdataset

Taskardataset

LDCRF

LDCNF

60

65

70

75

80

Audioonly

Visualonly

Earlyfusion

Latefusion

Acc

ura

cy (

%)

Audio-visual sub-challenge

LDCNFAVEC dataset (95 videos)

[FG 2013, submitted]

• Audio

• Visual

• Verbal

I. Temporal dynamic

II. Hidden substructure

III. Nonlinear input fusion

Page 23: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multi-View Hidden Conditional Random Field

y

x1 x2 x3 xn

h1 h2 h3 hn

x1 x2 x3 xn

h1 h2 h3 hn

Multi-View HCRF

y1

x1 x2 x3 xn

h1 h2 h3 hn

x1 x2 x3 xn

h1 h2 h3 hn

y2 y3 yn

Multi-View LDCRF

[CVPR 2012]

• Audio

• Visual

• Verbal

I. Temporal dynamic

II. Hidden substructure

III. Nonlinear input fusion

IV. Multi-stream models

Canal 9 debate dataset 50

55

60

65

70

HMM CRF HCRF MV-HCRF

Page 24: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multi-View Hidden Conditional Random Field

y

x1 x2 x3 xn

h1 h2 h3 hn

x1 x2 x3 xn

h1 h2 h3 hn

Multi-View HCRF

y1

x1 x2 x3 xn

h1 h2 h3 hn

x1 x2 x3 xn

h1 h2 h3 hn

y2 y3 yn

Multi-View LDCRF

[ICMI 2012]

Canal 9 debate dataset 50

55

60

65

70

HMM CRF HCRF MV-HCRF MV-HCRF+KCCA

• Audio

• Visual

• Verbal

I. Temporal dynamic

II. Hidden substructure

III. Nonlinear input fusion

IV. Multi-stream models

V. Multimodal synchrony

Page 25: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Multimodal Machine Learning

• Audio

• Visual

• Verbal

I. Temporal dynamic in label sequence

II. Hidden substructure in label sequence

III. Nonlinear modeling of instantaneous input features

IV. Multi-stream modeling of hidden substructure

V. Synchrony in multimodal input streams

VI. Multi-label structure and correlation

VII. Symbol and signal integration

VIII.Uncertainty in behavior labels

HCRF Library: ICT open-source machine learning library

http://hcrf.sf.net

Page 26: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Cognition Un

de

rsta

nd

ing

Ge

ne

rati

on

Decision-making

Dialog

Emotion

SimSensei Virtual Human Interaction Loop

Perception

Social Cues

Affective Cues

Physical Cues

Action

Social Cues

Affective Cues

Physical Cues Virtual World

Physical World

Page 27: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Cognition Un

de

rsta

nd

ing

Ge

ne

rati

on

Action

Virtual World

Decision-making

Dialog

Emotion

SimSensei Virtual Human Interaction Loop

Physical World Perception

Social Cues

Affective Cues

Physical Cues

Social Cues

Affective Cues

Physical Cues

MultiSense

Cerebella + Smartbody

Flores Dialogue manager

Page 28: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

MultiSense + SimSensei: Video Demonstration

Page 29: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

MultiSense + SimSensei: Video Demonstration

Page 30: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Collaborations and Available Technologies

MultiSense: standardized perception framework Real-time facial tracking, articulated body tracking and auditory analysis

Modular and multi-threaded architecture for easy extension

http://multicomp.ict.usc.edu/

hCRF library, machine learning library Matlab and C++ implementations of LDCRF, HCRF and CRF.

2559 downloads during the last year

http://hcrf.sf.net/

GAVAM + CLM-Z, real-time nonverbal behavior recognition Real-time head position and orientation estimation

66 facial feature tracking with automatic initialization

http://multicomp.ict.usc.edu/

Page 31: Modeling Human Communication Dynamics - Allen School · Albert (Skip) Rizzo Louis-Philippe Morency ... II. Hidden substructure Nonlinear input fusion. Multi-View Hidden Conditional

Thank you!