Relational Learning based Happiness Intensity Analysis in a Group

Tuoerhongjiang Yusufu, Naifan Zhuang, Kai Li, Kien A. Hua

Department of Computer Science, University of Central Florida

Orlando, Florida
Email: {yusufu, kaili, kienhua}@cs.ucf.edu, [email protected]

Abstract—Pictures and videos from social events and gatherings usually contain multiple people. Physiological and behavioral science studies indicate that there are strong emotional connections among group members. These emotional relations are indispensable to better analyzing individual emotions in a group. However, most existing affective computing methods estimate the emotion of a single subject only. In this work, we concentrate on estimating the happiness intensities of group members while considering the reciprocities among them. We propose a novel facial descriptor that effectively captures happiness-related facial action units. We also introduce two structured regression models, Continuous Conditional Random Fields (CCRF) and Continuous Conditional Neural Fields (CCNF), for estimating the emotions of group members. Our experimental results on the HAPPEI dataset demonstrate the viability of the proposed features and the two frameworks.

Keywords-Action Units, Happiness Intensity, Group, Probabilistic Graphical Model

I. INTRODUCTION

Millions of images and videos from different social events and gatherings are uploaded and shared each day. At a social event, such as a party, wedding, or graduation ceremony, many pictures and videos are taken, and they usually contain multiple people. Techniques for analyzing and understanding group images and videos therefore have many applications.

Recently, the study of groups of people in images and videos has received much attention in the computer vision community for different research purposes. Gallagher and Chen [22] proposed contextual features based on group structure for estimating the age and gender of individuals. Eichner et al. [23] presented a multi-person pose estimation framework.

In this paper we are also interested in group pictures, but our topic is the emotions within a group.

Human affect analysis is a long-studied problem because of its importance in human-computer interaction and affective computing. Most of the automatic affect analysis and recognition algorithms in existing works, however, focus on analyzing the expressions and emotions of an individual only [3][4]. Although there are some works on analyzing group affect [5][6][7], they are interested in inferring the emotional intensity of the group as a whole. Analyzing an individual's emotion in a group context is still an unexplored problem.

Figure 1: Group images from different social gatherings.

According to human cognitive and behavioral research [1][2], group members bring their individual-level emotional experiences, such as dispositional affect, moods, emotions, emotional intelligence, and sentiments, with them to a group interaction. Through a variety of explicit and implicit processes, individual-level moods and emotions are then spread and shared among group members. In other words, in a group, the emotions of the members are connected to each other. Assessing the reciprocity among group members is therefore indispensable to better understanding their individual-level emotions. In this paper, we focus on modeling the relations among individual emotions in a group.

After extensive research, we find that the HAPPEI [8] dataset is the only suitable dataset for our study, as it consists of group images in which each face is annotated with a happiness intensity level. Figure 1 shows some group images from the HAPPEI dataset; all pictures in this dataset are taken at different social gatherings. Since we use the HAPPEI dataset, in this paper we study only two basic human expressions: happiness and neutral. Interestingly, as people tend to present themselves in a favorable way [30], most of the uploaded and shared pictures on websites are positive.

Studying happiness in a group has many real-world applications, such as emotion ranking, event and highlight summarization, and image search and retrieval.

The key contributions of this paper are as follows:

1) We propose a novel, compact facial descriptor that refers to happiness-related action units (AUs). This feature effectively represents happiness intensities.
2) We introduce a Continuous Conditional Random Fields (CCRF) based emotion prediction model. This model combines Support Vector Regression (SVR) and CCRF to model the relations between the emotions of different individuals in a group image.
3) We also introduce a Continuous Conditional Neural Fields (CCNF) model that directly estimates the emotion intensities of all group members together while considering the relations among them.

This paper is organized as follows. In Section 2, we discuss previous related works. In Section 3, we introduce the proposed feature extraction and emotion estimation frameworks. In Section 4, we present the results of evaluating the proposed feature and structured regression models for happiness level estimation in a group. Finally, we draw our conclusions in Section 5.

II. RELATED WORKS

Facial image descriptors can be classified as appearance features and geometric features.

Appearance features describe the skin texture of faces. Because appearance features are usually extracted from small regions, they are robust to illumination variations. Moreover, most appearance features are obtained by concatenating local histograms that are also normalized, which increases the robustness of the overall representation. They are also robust to registration errors because they involve pooling over histograms. However, as appearance features favor identity-related cues rather than expressions, they are affected by identity bias. The most popular appearance representations are local binary patterns (LBP) [17] and local phase quantization (LPQ) [18]. Other features, such as histograms of oriented gradients (HOG) [19], the pyramid of histograms of oriented gradients (PHOG) [29], quantized local Zernike moments (QLZM) [20], and Gabor wavelets [21], are also frequently used as facial descriptors.

Geometric features represent the facial geometry, such as the shape of the face and the locations of facial landmarks [9][10][11]. Since this kind of feature is based on coordinate values instead of pixel values, it is more robust to illumination variations than appearance features. More importantly, geometric features are less affected by identity bias, which makes them more suitable for expression analysis. Their disadvantage is that they are vulnerable to registration errors.

We want to model affect continuously. Because discretization may lead to a loss of information and of the relationships between neighboring classes, regression techniques are the natural choice for our problem.

The most popular regression techniques are linear and logistic regression, support vector regression, neural networks, and the relevance vector machine (RVM) [26]. However, they are all designed to model input-output dependencies while disregarding output-output relations.

Recently, Conditional Random Fields (CRF) based structured regression models have received much attention from researchers. The CRF technique is a powerful tool for relational learning because it allows modeling both the relations between objects and the contents of the objects. As an extension of the classic CRF to the continuous case, Continuous Conditional Random Fields (CCRF) [31] have been successfully applied to global ranking problems [31], emotion tracking in music [32], and dimensional affect recognition in temporal data [33]. Continuous Conditional Neural Fields (CCNF) [25] are an extension of Conditional Neural Fields (CNF), and have been applied to emotion prediction in music [24], facial action unit recognition, and facial landmark detection [35]. Both CCRF and CCNF perform structured regression, and both can easily encode temporal and spatial relationships.

III. PROPOSED FRAMEWORK

A. Facial Feature Extraction

In the HAPPEI dataset, each face in a group image is annotated with one of six happiness intensity levels: Neutral, Small Smile, Large Smile, Small Laugh, Large Laugh, and Thrilled. Since we are dealing with only two basic human expressions, neutral and happiness, we propose a problem-specific and more efficient facial feature for happiness intensity estimation.

Previous works in psychology and computer vision have shown the value of using Action Units (AUs) for analyzing facial expressions [11][12][13]. In the Facial Action Coding System (FACS) [27], AUs correspond to the contractions of specific facial muscles. Among the 30 AUs, 12 are for the upper face and 18 for the lower face. Any facial expression can be explained as the occurrence of a single AU or of a combination of several AUs.

To clearly show different happiness levels, Figure 2 presents pictures of the same subject from the CK database [34] at four levels of happiness intensity, together with the corresponding AUs. In a neutral face, the eyes, brows, and cheeks are relaxed, and the lips are relaxed and closed. When a person expresses happiness, the cheeks and the upper and lower eyelids are raised. At the same time, the lip corners are pulled obliquely, the lips are relaxed and parted, and the mandible may be lowered. Any level of happiness can be expressed as a combination of AU5, AU6, AU7, AU12, AU25, and AU26.

Figure 2: Happiness expressions and corresponding AUs: (a) Neutral, (b) AU6+12, (c) AU6+12+25, (d) AU6+7+12+25.
Figure 3: Facial landmarks.


Inspired by previous works [11][14][15], we extract geometric facial features that refer to happiness-related AUs. We call the new feature the Happiness Related Facial Feature (HRFF). The facial feature extraction steps are as follows:
1) Face detection: we use the Viola-Jones [28] face detection algorithm.
2) Facial landmark detection and non-face elimination: IntraFace [16] is applied to detect 49 facial landmarks on each detected face. Using the landmark detection results, we can also eliminate most falsely detected faces, since the expected landmarks cannot be extracted from non-face objects. Figure 3 shows the locations and indices of the corresponding 2D facial landmarks.
3) Face resizing and alignment: each face is resized to 128 × 128 pixels, and the IntraFace results are used to perform face alignment.
4) Geometric features are calculated from the aligned landmarks. Table I presents the descriptions and measurements of the 6-dimensional facial feature, whose components correspond to the happiness-related AUs; a minimal code sketch follows the table.

Table I: Happiness Related Facial Feature (HRFF)

Feature | Implication      | Measurement                                                                     | AUs
f1, f2  | Eyelid movement  | Sum of distances between corresponding landmarks on the upper and lower eyelids | AU5, AU7
f3      | Lip tightener    | Sum of distances of corresponding points on the upper and lower outer mouth contour | AU25, AU26
f4      | Lips parted      | Sum of distances of corresponding points on the upper and lower inner mouth contour | AU25, AU26
f5      | Lip depressor    | Angle between the mouth corners and the upper lip center                        | AU12
f6      | Cheek raiser     | Angle between the nose wing and the nose center                                 | AU6
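As a concrete illustration of step 4, the sketch below computes HRFF-style distance and angle features from an array of 49 aligned landmark coordinates. This is a minimal sketch, not the authors' implementation: the landmark index groups, helper names, and pairings are assumptions for illustration and do not come from the paper or the IntraFace documentation.

```python
import numpy as np

# Hypothetical index groups for a 49-point landmark layout; the real
# IntraFace indexing may differ, so treat these as placeholders.
UPPER_EYELID = [20, 21, 26, 27]
LOWER_EYELID = [23, 24, 29, 30]
OUTER_UPPER_LIP = [32, 33, 34]
OUTER_LOWER_LIP = [38, 39, 40]
INNER_UPPER_LIP = [44, 45]
INNER_LOWER_LIP = [47, 48]
LEFT_MOUTH_CORNER, RIGHT_MOUTH_CORNER, UPPER_LIP_CENTER = 31, 37, 33
LEFT_NOSE_WING, RIGHT_NOSE_WING, NOSE_CENTER = 13, 17, 15

def paired_distance_sum(pts, idx_a, idx_b):
    """Sum of Euclidean distances between corresponding landmark pairs."""
    return np.linalg.norm(pts[idx_a] - pts[idx_b], axis=1).sum()

def angle_at(vertex, a, b):
    """Angle (radians) at `vertex` formed by points a and b."""
    u, v = a - vertex, b - vertex
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def extract_hrff(pts):
    """pts: (49, 2) aligned landmark coordinates -> 6-D HRFF-style vector."""
    f1 = paired_distance_sum(pts, UPPER_EYELID[:2], LOWER_EYELID[:2])  # left eye opening
    f2 = paired_distance_sum(pts, UPPER_EYELID[2:], LOWER_EYELID[2:])  # right eye opening
    f3 = paired_distance_sum(pts, OUTER_UPPER_LIP, OUTER_LOWER_LIP)    # outer mouth contour
    f4 = paired_distance_sum(pts, INNER_UPPER_LIP, INNER_LOWER_LIP)    # inner contour (lips parted)
    f5 = angle_at(pts[UPPER_LIP_CENTER], pts[LEFT_MOUTH_CORNER], pts[RIGHT_MOUTH_CORNER])
    f6 = angle_at(pts[NOSE_CENTER], pts[LEFT_NOSE_WING], pts[RIGHT_NOSE_WING])
    return np.array([f1, f2, f3, f4, f5, f6])
```

Each component is either a summed point-to-point distance or an angle, so the whole descriptor costs only a handful of arithmetic operations per face.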

B. Group Happiness Intensity Estimation

We select CCRF and CCNF as the happiness intensity estimation models in a group, as they have shown promising results for continuous variable modeling when extra context is required.

Both CCRF and CCNF are undirected graphical models that learn the conditional probability of a continuous valued vector y given continuous input X. They are discriminative approaches, where the conditional probability P(y|X) is modeled explicitly. The graphical models representing CCRF and CCNF for emotion prediction in a group are presented in Figure 4.

The probability density function for CCRF and CCNF can be written as:

P(y|X) = \frac{\exp(\Psi)}{\int_{-\infty}^{\infty} \exp(\Psi)\, dy}    (1)

In the CCRF model, Ψ is defined as:

\Psi = \sum_i \sum_{k=1}^{K_1} \alpha_k f_k(y_i, X_i) + \sum_{i,j} \sum_{k=1}^{K_2} \beta_k g_k(y_i, y_j, X)    (2)

Above, X = {X_1, X_2, ..., X_n} is the set of facial feature vectors, which can be represented as a matrix in which each row corresponds to the feature vector of one detected face. y = {y_1, y_2, ..., y_n} are the output variables that we want to predict; in our case, the happiness intensity of each individual in a group image.

Figure 4: Proposed frameworks: (a) CCRF model, (b) CCNF model.

In CCRF, two types of features are defined: vertex features f_k and edge features g_k.

f_k(y_i, X_i) = -(y_i - X_{i,k})^2    (3)

g_k(y_i, y_j, X) = -\frac{1}{2} S^{(k)}_{i,j} (y_i - y_j)^2    (4)

Vertex features f_k represent the dependency between X_{i,k} and y_i; in our case, the dependency between the happiness intensity prediction of a regressor and the actual happiness intensity level. The parameter α_k controls the reliability of a particular signal for a particular emotion.

Edge features g_k represent the dependencies between outputs y_i and y_j, for example, how related the happiness intensities of person A and person B in a group are. This is also affected by the similarity measure S^{(k)}. The parameters β_k and the similarities S^{(k)} allow us to control the effect of such connections between emotions. Both α_k and β_k are constrained to be positive. We selected our similarity function as:

S_{i,j} = \exp\left(-\frac{\|X_i - X_j\|}{\delta}\right)    (5)
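To show how Eqs. (2)-(5) fit together, the sketch below evaluates the CCRF potential Ψ for a candidate assignment of happiness intensities, assuming a single edge feature (K_2 = 1). The function and variable names are ours, not the paper's.

```python
import numpy as np

def similarity_matrix(X, delta=1.0):
    """S_{i,j} = exp(-||X_i - X_j|| / delta), Eq. (5)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.exp(-dist / delta)

def ccrf_potential(y, X, alpha, beta, S):
    """CCRF potential Psi of Eq. (2) with the features of Eqs. (3)-(4).

    y: (n,) candidate intensities, X: (n, K1) per-face regressor outputs,
    alpha: (K1,) vertex weights, beta: scalar edge weight, S: (n, n) similarities.
    """
    vertex = -np.sum(alpha * (y[:, None] - X) ** 2)                   # Eq. (3), summed over i and k
    edge = -0.5 * beta * np.sum(S * (y[:, None] - y[None, :]) ** 2)   # Eq. (4), summed over i and j
    return vertex + edge
```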

In the CCNF model, Ψ is defined as:

\Psi = \sum_i \sum_{k=1}^{K_1} \alpha_k f_k(y_i, X_i, \theta_k) + \sum_{i,j} \sum_{k=1}^{K_2} \beta_k g_k(y_i, y_j, X)    (6)

Here again, α_k and β_k are positive, while Θ is unconstrained. Similar to CCRF, CCNF has the same edge feature and uses the same similarity function to enforce smoothness between neighboring nodes. The vertex feature f_k in CCNF, however, represents the mapping from X_i to y_i through a one-layer neural network, and the new parameter θ_k represents the weight vector of a particular neuron k. The number of vertex features K_1 is determined experimentally during cross-validation. The vertex feature in CCNF can be written as:

f_k(y_i, X_i, \theta_k) = -(y_i - h(\theta_k, X_i))^2    (7)

where

h(\theta, X_i) = \frac{1}{1 + e^{-\theta^T X_i}}    (8)
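A small sketch of the CCNF vertex potential in Eqs. (7)-(8); the function names are ours, and the weight vector θ_k is assumed to already account for any bias term.

```python
import numpy as np

def neuron_response(theta, x):
    """h(theta, X_i) of Eq. (8): logistic response of one neuron."""
    return 1.0 / (1.0 + np.exp(-theta @ x))

def ccnf_vertex_feature(y_i, x_i, theta_k):
    """f_k(y_i, X_i, theta_k) of Eq. (7): negated squared error to the neuron output."""
    return -(y_i - neuron_response(theta_k, x_i)) ** 2
```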

In the learning stage, we learn the α and β values for the CCRF model. For CCNF, we learn the α, β, and Θ parameters (with the number of neurons chosen by cross-validation) so as to optimize the conditional log-likelihood of the model on the training images. All of the parameters are optimized jointly:

L(\alpha, \beta, \Theta) = \sum_{q=1}^{n} \log P(y^{(q)} | x^{(q)})    (9)

(\alpha^*, \beta^*, \Theta^*) = \arg\max L(\alpha, \beta, \Theta)    (10)

Because both Eq. (2) and Eq. (6) are convex, the optimal parameter values can be determined using standard techniques such as stochastic gradient ascent or other general optimization methods. Since both the CCRF and CCNF models can be viewed as multivariate Gaussians [33][36], inferring the output values that maximize Ψ is straightforward and efficient.
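To make the Gaussian view concrete, the following sketch performs CCRF inference in closed form by collecting the quadratic terms of Ψ into a precision matrix and solving for the Gaussian mean. It assumes a single edge feature and the similarity function of Eq. (5); the precision/linear-term derivation and all names are ours, offered as an illustration rather than the authors' implementation.

```python
import numpy as np

def ccrf_infer(X, alpha, beta, delta=1.0):
    """Closed-form CCRF inference: y* = argmax_y Psi.

    Psi is quadratic in y, so exp(Psi) is a multivariate Gaussian whose
    mean (and mode) is found by solving one linear system.
    X: (n, K1) per-face regressor outputs, alpha: (K1,) > 0, beta: scalar > 0.
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    S = np.exp(-dist / delta)                        # similarities, Eq. (5)
    np.fill_diagonal(S, 0.0)
    L = np.diag(S.sum(axis=1)) - S                   # graph Laplacian of the edge term
    P = 2.0 * (alpha.sum() * np.eye(n) + beta * L)   # precision matrix: -1/2 y^T P y part of Psi
    b = 2.0 * X @ alpha                              # linear term b^T y of Psi
    return np.linalg.solve(P, b)                     # Gaussian mean = maximizer of Psi

# Toy usage: four faces, two base regressors (e.g., SVR outputs).
X = np.array([[2.1, 1.9], [3.0, 3.2], [2.8, 2.7], [0.5, 0.8]])
print(ccrf_infer(X, alpha=np.array([0.6, 0.4]), beta=0.3))
```

The edge weight beta pulls the predictions of similar faces toward each other, which is exactly the smoothing effect the relational model is meant to provide.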

IV. EXPERIMENTAL ANALYSIS

Because the HAPPEI database is the only dataset that provides both group images and happiness intensity levels, we evaluate the performance of our new facial feature and the introduced emotion estimation frameworks on it. All experiments are conducted in MATLAB 2015a on a machine with a 3.16 GHz CPU and 4 GB of RAM.

2000 group images, containing 7248 faces, are used in our experiments. We conduct 4-fold cross-validation, where 1500 images are selected for training and 500 for testing, and report the average result over the 4 folds.

First, we extracted LBP, LPQ, and PHOG features to evaluate the computational complexity of HRFF against them.

Table II: Average Feature Extraction Time

Feature | Dimension | Execution time (seconds)
LBP     | 256       | 0.0025
LPQ     | 256       | 0.5250
PHOG    | 680       | 0.0286
HRFF    | 6         | 0.0004

As we can see from Table II, the LPQ feature takes the longest time to extract. Although PHOG has the highest dimension (680), its extraction time is much shorter than LPQ's. LBP is faster than PHOG and LPQ because calculating LBP does not require any transform, whereas LPQ computes a short-term Fourier transform (STFT) on each local image patch and PHOG, as an extension of HOG, relies on simple gradient operations; that is why LBP is faster than PHOG and PHOG is faster than LPQ. HRFF outperforms all of these features in terms of extraction and processing speed, because it involves only a few calculations on coordinate values. Its compactness and fast extraction are highly desirable in real-time emotion analysis systems, such as real-time event satisfaction analysis and tracking.

We then use the extracted features to train and test the introduced emotion estimation models, which allows us to evaluate each descriptor and each structured regression model at the same time. We compare the performance of CCRF and CCNF with the most popular regression model, Support Vector Regression (SVR), to show how relational learning models can improve performance over single-face analysis methods.

For the SVR-based experiments, we used 2-fold cross-validation on each fold of training data to pick the hyper-parameters. The chosen hyper-parameters were then used to train on the whole training data.

For the CCRF-based experiments, each fold of training data is split into two parts: one part is used for training the SVR and the other for training the CCRF. We then perform 2-fold cross-validation on both the SVR and CCRF training data to choose the hyper-parameters, which are then used for training on the whole training data.

For the CCNF-based experiments, we also used 2-fold cross-validation on each fold of training data to pick the hyper-parameters and, as with CCRF, used the chosen hyper-parameters for training on the whole training data. The BFGS quasi-Newton method is used for both the cross-validation and training stages.

We use two evaluation metrics. For prediction accuracy, we select the mean squared error (MSE); for prediction structure, we select the average correlation coefficient. These are the most common evaluation metrics for regression models. Note that smaller MSE values correspond to better performance, while the opposite is true for correlation coefficients.
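For reference, a minimal sketch of the two metrics as we understand them; the per-image predictions are simply concatenated into one vector here, since the paper does not spell out the exact averaging, so that detail is an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: lower is better."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def correlation(y_true, y_pred):
    """Pearson correlation coefficient: higher is better."""
    return np.corrcoef(y_true, y_pred)[0, 1]
```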

Table III shows the average mean squared error (MSE) of happiness intensity estimation for the different models and facial features, and Table IV presents the corresponding average correlation coefficients.

Table III: Mean Squared Error

           | LBP   | LPQ   | PHOG  | HRFF
SVR        | 1.549 | 1.441 | 0.811 | 0.588
SVR + CCRF | 1.531 | 1.425 | 0.796 | 0.575
CCNF       | 1.514 | 1.410 | 0.783 | 0.561

Table IV: Correlation Coefficient

           | LBP   | LPQ   | PHOG  | HRFF
SVR        | 0.039 | 0.097 | 0.486 | 0.632
SVR + CCRF | 0.043 | 0.104 | 0.491 | 0.635
CCNF       | 0.041 | 0.107 | 0.496 | 0.640

As we can see from Tables III and IV, the best result is achieved when CCNF and HRFF are combined. LBP and LPQ obtain the highest MSE and the lowest correlation coefficients, while the performance of PHOG lies between HRFF and the other appearance features. The LBP and LPQ features are strongly affected by identity bias, which makes them poor options for facial expression analysis. PHOG performs better than LBP and LPQ because it takes both gradient orientations and spatial layout into consideration. Our geometric feature outperforms all other face descriptors on these images collected in the wild, because HRFF is directly related to the happiness-related facial AUs.

We can also see from the results in Tables III and IV that the combination of SVR and CCRF obtains consistently better results than SVR alone on both evaluation metrics. This indicates that considering the relations and reciprocities among group members improves emotion estimation. Between the two structured regression models we introduced, CCNF achieves the best results because of its learning capacity and the nonlinearity of its neural network. Compared to CCRF, the training process of CCNF is also simpler, because it does not have to be combined with another regression model: it takes the facial features as direct input and trains the model while considering the emotional relations from the beginning.

V. CONCLUSION

In this paper, we proposed a novel facial descriptor and introduced two models for the problem of happiness intensity estimation in a group context. We extracted compact geometric features from facial landmarks that refer to facial action units (AUs). For emotion estimation, we used two structured regression frameworks, Continuous Conditional Random Fields (CCRF) and Continuous Conditional Neural Fields (CCNF). The combination of the feature descriptor and an emotion estimation model is used to infer the happiness intensities of a group of people.

We conducted experiments on the HAPPEI database to show how the proposed facial feature considerably improves the performance of happiness intensity estimation. We also tested the performance of the two structured regression models and compared them with the most popular regression model, Support Vector Regression (SVR). Experimental results indicate that, compared to traditional single-face analysis methods, considering the relations between faces in a group improves emotion estimation accuracy significantly. The results also show that CCNF performs better than CCRF.

In the future, we will extend our method to real-time emotion tracking of multiple people in video sequences. We also expect to use deep learning methods to improve the accuracy of emotion estimation and prediction.

VI. ACKNOWLEDGMENT

This material is based upon work partially supported by NASA under Grant Number NNX15AV40A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] J. R. Kelly and S. G. Barsade, Mood and Emotions in Small Groups and Work Teams, 3rd ed. Harlow, England: Addison-Wesley, 1999.

[2] S. Barsade and D. Gibson, Group Emotion: A View from Top and Bottom, 3rd ed. Harlow, England: Addison-Wesley, 1999.

[3] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, A survey of affect recognition methods: Audio, visual, and spontaneous expressions, IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39-58, Jan. 2009.

[4] E. Sariyanidi, H. Gunes, and A. Cavallaro, Automatic analysis of facial affect: A survey of registration, representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell., pp. 1-22, 2014.

[5] A. Dhall, R. Goecke, and T. Gedeon, Automatic Group Happiness Intensity Analysis, IEEE Transactions on Affective Computing, vol. 6, no. 1, 2015.

[6] W. Mou, O. Celiktutan, and H. Gunes, Group-level Arousal and Valence Recognition in Static Images: Face, Body and Context, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015.

[7] A. Dhall, J. Joshi, K. Sikka, R. Goecke, and N. Sebe, The More the Merrier: Analysing the Affect of a Group of People in Images, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015.

[8] A. Dhall, J. Joshi, I. Radwan, and R. Goecke, Finding Happiest Moments in a Social Context, ACCV, 2012.

[9] S. Lucey, A. B. Ashraf, and J. Cohn, Investigating spontaneous facial action recognition through AAM representations of the face, Face Recognition Book. Mamendorf, Germany: Pro Literatur Verlag, 2007.

[10] M. Valstar, H. Gunes, and M. Pantic, How to distinguish posed from spontaneous smiles using geometric features, Proc. ACM Int. Conf. Multimodal Interfaces, 2007.

[11] Y. L. Tian, T. Kanade, and J. Cohn, Recognizing action units for facial expression analysis, IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 97-115, Feb. 2001.

[12] G. Littlewort, M. S. Bartlett, I. Fasel, J. Susskind, and J. Movellan, Dynamics of facial expression extracted automatically from video, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2004.

[13] D. McDuff, R. El Kaliouby, K. Kassam, and R. Picard, Affect valence inference from facial action unit spectrograms, IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2004.

[14] F. Zhou, F. De la Torre, and J. F. Cohn, Unsupervised discovery of facial events, IEEE Conf. Comput. Vis. Pattern Recognit., 2010.

[15] M. X. Huang, G. Ngai, and K. A. Hua, Identifying User-specific Facial Affects from Spontaneous Expressions with Minimal Annotation, IEEE Transactions on Affective Computing, 2015.

[16] X. Xiong and F. De la Torre, Supervised descent method and its applications to face alignment, IEEE CVPR, 2013.

[17] T. Ahonen, A. Hadid, and M. Pietikainen, Face Description with Local Binary Patterns: Application to Face Recognition, IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, 2006.

[18] V. Ojansivu and J. Heikkila, Blur Insensitive Texture Classification Using Local Phase Quantization, Proc. Int. Conf. Image Signal Process., pp. 236-243, 2008.

[19] N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, IEEE Conf. Comput. Vis. Pattern Recognit., vol. 1, pp. 886-893, 2005.

[20] E. Sariyanidi, H. Gunes, M. Gokmen, and A. Cavallaro, Local Zernike moment representations for facial affect recognition, British Machine Vision Conference, 2013.

[21] C. Liu and H. Wechsler, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Transactions on Image Processing, 2002.

[22] A. C. Gallagher and T. Chen, Understanding Images of Groups of People, IEEE CVPR, 2009.

[23] M. Eichner and V. Ferrari, We are Family: Joint Pose Estimation of Multiple Persons, European Conference on Computer Vision, 2010.

[24] V. Imbrasaite, T. Baltrusaitis, and P. Robinson, CCNF for Continuous Emotion Tracking in Music: Comparison with CCRF and relative feature representation, IEEE International Conference on Multimedia and Expo, Multimedia Affective Computing, 2014.

[25] T. Baltrusaitis, P. Robinson, and L.-P. Morency, Continuous Conditional Neural Fields for Structured Regression, ECCV, 2014.

[26] C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag New York, Inc., 2006.

[27] P. Ekman and W. V. Friesen, The Facial Action Coding System: A Technique for the Measurement of Facial Movement, San Francisco: Consulting Psychologists Press, 1978.

[28] P. Viola and M. Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, IEEE CVPR, 2001.

[29] A. Bosch, A. Zisserman, and X. Munoz, Representing shape with a spatial pyramid kernel, ACM International Conference on Image and Video Retrieval (CIVR), 2007.

[30] H. G. Chou and N. Edge, "They are Happier and Having Better Lives than I Am": The Impact of Using Facebook on Perceptions of Others' Lives, Cyberpsychology, Behavior, and Social Networking, vol. 15, no. 2, 2012.

[31] T. Qin, T. Liu, X. Zhang, D. Wang, and H. Li, Global Ranking Using Continuous Conditional Random Fields, Conference on Neural Information Processing Systems (NIPS), 2008.

[32] V. Imbrasaite, T. Baltrusaitis, and P. Robinson, Emotion Tracking in Music Using Continuous Conditional Random Fields and Relative Feature Representation, IEEE International Conference on Multimedia and Expo Workshops, 2013.

[33] T. Baltrusaitis, N. Banda, and P. Robinson, Dimensional Affect Recognition using Continuous Conditional Random Fields, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2013.

[34] T. Kanade, J. F. Cohn, and Y. Tian, Comprehensive database for facial expression analysis, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2000.
