Facial Expression Recognition and Generation
Deepali Aneja, Ph.D. student, Computer Science and Engineering, University of Washington
Source: homes.cs.washington.edu/~shapiro/EE562/notes/Expressions.pdf

Page 2:

Motivation

• Accurate facial expression depiction is critical for storytelling.
• And difficult!

[Figure: three bar charts of recognition rate over the labels Joy, Sadness, Anger, Surprise, Fear, Disgust and Neutral; axes run from 0% to 50% (63% in the first chart).]

We asked three professional animators to make the character appear as surprised as possible. None of the expressions achieved above 50% recognition on Mechanical Turk testing.

Page 3:

Use human anatomy (FACS) to generate expressions

[Figure: example faces labeled MPEG-4 (Anger), HapFACS (Anger), HapFACS (Fear) and FACSGen (Fear).]

Page 4:

Adobe Character Animator (Geometry + Audio input)

Page 5:

Problem Statement

Given that simple geometric mappings are not sufficient:

• How can we transfer human expressions to stylized characters without losing perceptual information?
• How can we use human expressions to quickly and automatically create expressions for a wide range of characters?

Generate characters from human expressions

Page 6:

Our Approach

• Use deep learning to learn mappings between
  • human expressions and human expressions
  • character expressions and character expressions
  • human expressions and character expressions
• Seven classes of expressions: Joy, Sadness, Anger, Disgust, Surprise, Fear and Neutral
• This isn’t just geometry mapping
• It is perceptual modelling of expressions

Page 7:

Part 1: Expression Retrieval

• Step 1: Use deep learning to create a perceptual model of human expressions, f( ), which defines the human feature space
• Step 2: Learn an analogous character model, f’( ), which defines the character feature space
• Step 3: Learn a mapping between f’( ) and f( )
• Step 4: Retrieve characters using a perceptual model and geometry

Page 8:

Steps

• Data Collection
• Data Pre-processing
• Network Training using Deep Learning
• Transfer expressions

Page 9:

Data Collection - Human Database

• CK+: The Extended Cohn-Kanade [REF], 309 images
• DISFA: Denver Intensity of Spontaneous Facial Action [REF], 60,000 images
• KDEF: The Karolinska Directed Emotional Faces [REF], 4,900 images
• MMI: 10,000 images
• Total of 75K images. We balanced the final number of samples used to train our network to avoid bias towards any particular expression.

Page 10:

Data Collection - Character Database

• Eight stylized characters
• The animator creates the key poses for each expression
• The key poses are labeled via Mechanical Turk (MT) to populate the database initially
• We only used the expression key poses having 70% MT test agreement among 50 Turkers for the same pose. Interpolating between the key poses resulted in 60,000 images (around 8,000 images per character).
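The agreement filter can be sketched as below. This is a minimal illustration, assuming that "agreement" means the fraction of raters who chose the most common label; the function names, pose ids and votes are hypothetical and not from the slides.

```python
from collections import Counter


def majority_agreement(labels):
    """Return (most common label, fraction of raters who chose it)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def filter_key_poses(votes_by_pose, threshold=0.70):
    """Keep only the poses whose top label reaches the agreement threshold."""
    kept = {}
    for pose_id, labels in votes_by_pose.items():
        label, agreement = majority_agreement(labels)
        if agreement >= threshold:
            kept[pose_id] = label
    return kept


# Hypothetical votes from 50 raters per key pose.
votes = {
    "pose_a": ["surprise"] * 40 + ["fear"] * 10,  # 80% agreement: kept
    "pose_b": ["fear"] * 30 + ["surprise"] * 20,  # 60% agreement: dropped
}
kept_poses = filter_key_poses(votes)
```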

Page 11:

Data Pre-processing

• Extract 49 facial landmarks (IntraFace)
• Register faces to an average frontal face via an affine transformation
• Select the face bounding box
• Resize to 256x256 pixels for analysis
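The registration step can be illustrated with a least-squares affine fit between detected landmarks and the landmarks of an average frontal face. This is only a sketch under assumptions: the slides do not say how the affine transformation was estimated, and the helper names and the four toy landmarks are hypothetical (the real pipeline uses 49 landmarks).

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ x = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    base = det(m)
    if abs(base) < 1e-12:
        raise ValueError("landmarks are degenerate (for example collinear)")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det(replaced) / base)
    return solution


def fit_affine(src, dst):
    """Least-squares affine map taking src landmarks onto dst landmarks.

    Returns rows (a, b, tx) and (c, d, ty) such that
    x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        atb = [sum(r[i] * t[k] for r, t in zip(rows, dst)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params


def apply_affine(params, point):
    """Map one (x, y) point through the fitted transform."""
    x, y = point
    return tuple(p[0] * x + p[1] * y + p[2] for p in params)


# Toy example: the "average face" landmarks are the detected ones scaled by 2
# and shifted by (10, 20).
detected = [(0, 0), (1, 0), (0, 1), (1, 1)]
average_face = [(10, 20), (12, 20), (10, 22), (12, 22)]
transform = fit_affine(detected, average_face)
registered_point = apply_affine(transform, (0.5, 0.5))
```

The same transform would then be applied to the image pixels before cropping the face bounding box and resizing.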

Page 12:

Registered faces

[Figure: Disgust (CK+), Joy (DISFA), Anger (KDEF), Surprise (MMI)]

Page 13:

Training networks

[Diagram: a stylized character neural network and a human neural network, each with one output per expression class (abbreviated A, D, F, J, N, Sa in the extracted labels). A mapping step finds the correlation between the corresponding expressions.]

Page 14:

Network Training using Deep Learning

Data Augmentation
• 5 crops of 227x227: four corners and the center crop
• Horizontal flip

Training Human model
• 4 CONV layers
• 4 POOL layers
• 2 Fully Connected layers

Training character model
• 3 CONV layers
• 3 POOL layers
• 2 Fully Connected layers

Fine-tuning character model
• N-1 layer features
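The crop-and-flip augmentation can be sketched on plain nested lists. This is an illustration only: it assumes every crop is also flipped (ten views per image), which the slide does not state, and the function names are hypothetical.

```python
def five_crop_offsets(size, crop_size):
    """Top-left (row, col) offsets of the four corner crops and the center crop."""
    far = size - crop_size
    mid = far // 2
    return [(0, 0), (0, far), (far, 0), (far, far), (mid, mid)]


def crop(image, top, left, crop_size):
    """Cut a crop_size x crop_size window out of a row-major image."""
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image, crop_size):
    """Return the five crops followed by their horizontal flips."""
    offsets = five_crop_offsets(len(image), crop_size)
    crops = [crop(image, top, left, crop_size) for top, left in offsets]
    return crops + [hflip(c) for c in crops]


# Offsets for the sizes on the slides: 227x227 crops from a 256x256 face.
offsets_256 = five_crop_offsets(256, 227)

# Tiny 4x4 stand-in image so the result is easy to inspect.
tiny = [[r * 4 + c for c in range(4)] for r in range(4)]
views = augment(tiny, 2)
```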

Page 15:

Network Architecture

[Figure: architectures of the Human CNN (HCNN), Character CNN (CCNN) and Shared CNN (SCNN).]

Page 16:

When and How to Fine-tune?

• New dataset is small and similar to the original dataset.
  • Not a good idea to fine-tune the ConvNet (overfitting)
  • Train a linear classifier on the CNN codes.
• New dataset is medium/large and similar to the original dataset.
  • Fine-tune through the full network (our shared CNN)
• New dataset is small but very different from the original dataset.
  • Train the SVM classifier from activations (somewhere earlier in the network)
• New dataset is large and very different from the original dataset.
  • Train from scratch
  • Initialize with weights from a pre-trained model.

Page 17:

Transfer Learning

• FC6 features extracted from HCNN
• FC6 features extracted from SCNN
• Shared human-character feature space

Page 18:

Distance Metrics

• We extracted features from the last fully connected layer of both models (the human expression model and the fine-tuned character expression model) and normalized the feature vectors.
• To retrieve the stylized character expression that most closely matches the human expression, we use:
  • Jensen-Shannon divergence distance for expression clarity
  • Geometric feature distance for expression refinement

[Diagram labels: expression feature vectors ((N-1) layer features); geometry feature vectors]

Page 19:

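Jensen-Shannon divergence

• JS divergence is symmetric and gives a finite value (writing H and C for the normalized human and character feature vectors):

$$\mathrm{JSD}(H \,\|\, C) = \tfrac{1}{2}\, D_{\mathrm{KL}}(H \,\|\, M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(C \,\|\, M), \quad \text{where } M = \tfrac{1}{2}(H + C)$$

• Kullback-Leibler divergence is given as

$$D_{\mathrm{KL}}(X \,\|\, M) = \sum_i X(i) \ln \frac{X(i)}{M(i)}$$

where X and M are discrete probability distributions.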
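A small sketch of the divergence computation follows. It assumes the feature vectors are non-negative (for example ReLU outputs) so that dividing by their sum turns them into discrete distributions; the slides say only that the vectors were normalized, so this choice and the function names are illustrative.

```python
import math


def normalize(features):
    """Scale a non-negative feature vector so that it sums to 1."""
    total = sum(features)
    return [x / total for x in features]


def kl_divergence(x, m):
    """KL(X || M) = sum_i x_i * ln(x_i / m_i); terms with x_i = 0 contribute 0."""
    return sum(xi * math.log(xi / mi) for xi, mi in zip(x, m) if xi > 0)


def js_divergence(p, q):
    """JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical 3-dimensional feature vectors for a human and a character image.
human = normalize([2.0, 1.0, 1.0])
character = normalize([1.0, 1.0, 2.0])
distance = js_divergence(human, character)
```

With the natural logarithm the value lies between 0 (identical distributions) and ln 2 (distributions with disjoint support), which is what makes it usable as a bounded retrieval distance.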

Page 20:

Multiple correct label results

Page 21:

Geometric distance refinement

• Since expressions are mainly controlled by muscles around the mouth, eyes and eyebrows, we focus on features that characterize the shape and location of these parts of the face.
• We use the facial landmarks to extract our geometric features, including the following measurements:
  • left/right eyebrow height
  • left/right eyelid height
  • nose width
  • left mouth corner to mouth center distance
  • mouth corner to mouth center distance
• We normalize these feature vectors and compute the L2 norm distance between the human geometry vector and the character geometry vectors with the correct expression label. Finally, we re-order the retrieved images within the matched label based on matched geometry.
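The re-ordering step can be sketched as follows. The slides do not say which normalization was used, so scaling each vector to unit L2 length is an assumption here, and the image ids and two-dimensional geometry vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank_by_geometry(human_geometry, candidates):
    """Order candidates that share the matched label by geometric distance.

    candidates is a list of (image_id, geometry_vector) pairs.
    """
    query = l2_normalize(human_geometry)
    scored = [(l2_distance(query, l2_normalize(geometry)), image_id)
              for image_id, geometry in candidates]
    return [image_id for _, image_id in sorted(scored)]


# Hypothetical geometry vectors for three retrieved character images.
retrieved = [("char_a", [4.0, 3.0]), ("char_b", [3.0, 4.0]), ("char_c", [0.0, 1.0])]
order = rerank_by_geometry([3.0, 4.0], retrieved)
```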

Page 22:

Layers Visualization

[Figure: input image, conv1 filters, and feature maps from conv1, conv2 and conv3. Prediction label: Surprise]

Page 23:

Top match results (Surprise and Joy)

[Figure columns: Query, Character Retrievals]

Page 24:

Expression based Retrieval

Using CCNN

Using HCNN

Page 25:

Evaluation

How close is the retrieved character expression label to the human query expression label?

Retrieval Score

Spearman rank correlation coefficient

Kendall τ test

Expert Comparison

Page 26:

Retrieval Score

• We measured the retrieval performance of our method by calculating the average normalized rank of relevant results (0 is the best score).
• The evaluation score for a query human expression image q was calculated as follows:

$$\mathrm{score}(q) = \frac{1}{N \cdot N_{rel}} \left( \sum_{k=1}^{N_{rel}} R_k - \frac{N_{rel}(N_{rel}+1)}{2} \right)$$

where N is the number of images in the database, N_rel is the number of database images whose expression label is relevant to q, and R_k is the rank assigned to the k-th relevant image.
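A short sketch of the score computation, assuming the standard definition of the average normalized rank with 1-based ranks (the definition is restated in the docstring); the function name and the example ranks are hypothetical.

```python
def retrieval_score(relevant_ranks, database_size):
    """Average normalized rank of the relevant results for one query.

    score = (sum of R_k - N_rel * (N_rel + 1) / 2) / (N * N_rel)

    relevant_ranks holds the 1-based ranks R_k of the relevant images and
    database_size is N. The subtracted term is the smallest possible rank
    sum, so a perfect retrieval scores exactly 0.
    """
    n_rel = len(relevant_ranks)
    best_rank_sum = n_rel * (n_rel + 1) / 2
    return (sum(relevant_ranks) - best_rank_sum) / (database_size * n_rel)


# Database of 10 images, 3 of them relevant to the query.
perfect = retrieval_score([1, 2, 3], 10)   # relevant images ranked first
worst = retrieval_score([8, 9, 10], 10)    # relevant images ranked last
```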

Page 27:

Average retrieval score for each expression across all characters

Page 28:

Sample expert comparison

[Figure: for one query image, five retrieved character images (test1 to test5) shown at Rank 1 to Rank 5 in three rows: Expression, Expert, and Expression + Geometry.]

Page 29:

Rank correlation coefficient

• Spearman rank correlation coefficient (the Pearson correlation coefficient computed on the ranks)
  • The closer the value is to 1, the better the two rankings are correlated.
  • The average Spearman correlation coefficient for the 30 validation rank orderings is 0.773 ± 0.336.
  • Rank 1 correlation is 0.934. Most relevant match!
• Kendall τ test
  • Pairwise error that represents how many pairs are ranked discordantly. The best matching ranks get a τ value of 1.
  • The average Kendall correlation coefficient for the 30 validation rank orderings is 0.706 ± 0.355.
  • Rank 1 correlation is 0.910. Most relevant match!
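Both coefficients can be computed directly for two rankings of the same five retrievals. The sketch assumes rankings without ties and uses the tau-a form of Kendall's coefficient; the example rankings are hypothetical, not the study's data.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman coefficient for two tie-free rankings of the same items."""
    n = len(rank_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a: (concordant - discordant) pairs divided by all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks given to the same five images by an expert and by the method.
expert_ranks = [1, 2, 3, 4, 5]
method_ranks = [1, 2, 3, 5, 4]  # the last two images are swapped
rho = spearman_rho(expert_ranks, method_ranks)
tau = kendall_tau(expert_ranks, method_ranks)
```

One swapped pair out of ten lowers τ more than ρ, which is consistent with the Kendall averages on the slide being below the Spearman averages.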

Page 30:

Correlation metrics with expert

[Figure: Spearman and Kendall correlation coefficients (vertical axis, -0.4 to 1.2) plotted against the number of validation sets (horizontal axis, 1 to 30).]

Page 31:

Part 2: Generating Character Expressions

[Diagram: Fully Connected Convolutional Neural Network with output label Surprise. Legend: convolutional layer, max pooling layer, fully connected layer.]

Page 32:

Generating Character Expressions

[Diagram: network producing the N-1 feature vector. Legend: convolutional layer, max pooling layer, fully connected layer.]

Page 33:

Generating Character Expressions

[Diagram: network with the N-1 feature vector and Maya parameters. Legend: convolutional layer, max pooling layer, fully connected layer.]

Page 34:

Learn character model parameters

[Diagram: network with the N-1 feature vector and Maya parameters. Legend: convolutional, max pooling, fully connected, softmax.]

Page 35:

Preliminary Result:

Disgust expression query

Disgust expression Parameter rendering

Page 36:

Applications

• Improve visual storytelling applications:
  • Animated films
  • Gaming
  • Online marketing
  • VR/AR experiences
  • Robotics
• Medically-motivated application: teaching children with autism spectrum disorder (ASD) to both recognize and convey expressions using cartoon characters in an interactive environment.

Page 37:

Expression retrieval work to be presented at Asian Conference on Computer Vision (Nov 2016).

Project webpage http://grail.cs.washington.edu/projects/deepexpr/

Page 38:

Questions?