Affective Storytelling
Automatic Measurement of Story Effectiveness from Emotional Responses Collected over the Internet

Daniel McDuff
PhD Proposal in Media Arts & Sciences
Affective Computing Group, MIT Media Lab
June 6, 2012

Executive Summary

Emotion is key to the effectiveness of narratives and storytelling, whether in influencing memory, likability or persuasion. Stories, even if fictional, have the ability to induce a genuine emotional response. However, the understanding of the role of emotions in storytelling and advertising effectiveness has been limited due to the difficulty of measuring emotions in real-life contexts. Video advertising is a ubiquitous form of short story, usually 30-60 seconds long, designed to influence, persuade and engage, in which media with emotional content is frequently used; it will be one of the focuses of this thesis. The lack of understanding of the effects of emotion in advertising results in large amounts of wasted time, money and other resources.

Facial expressions, head gestures, heart rate, respiration rate and heart rate variability can inform us about the emotional valence, arousal and attention of a person. In this thesis I propose to demonstrate how automatically detected naturalistic and spontaneous facial responses and physiological responses can be used to predict the effectiveness of stories.

I propose a framework for automatically measuring facial and physiological responses, in addition to self-report and behavioral measures, to content (e.g. video advertisements) over the Internet in order to understand the role of emotions in story effectiveness. Specifically, I will present analysis of the first large scale data of facial, physiological, behavioral and self-report responses to video content collected “in-the-wild” using the cloud. I will develop models for evaluating the effectiveness of stories (e.g. likability, persuasion and memory) based on the automatically extracted features. This work will be evaluated on its success in predicting measures of story effectiveness that are useful in the creation of content, whether that be in copy-testing or content development.




Thesis Committee

Rosalind Picard
Professor of Media Arts and Sciences, MIT Media Lab
Thesis Supervisor

Jeffrey Cohn
Professor of Psychology
University of Pittsburgh

Ashish Kapoor
Senior Research Scientist
Microsoft Research, Redmond

Thales Teixeira
Assistant Professor of Business Administration
Harvard Business School


Abstract

Emotion is key to the effectiveness of narratives and storytelling, whether in influencing memory, likability or persuasion. Stories, even if fictional, have the ability to induce a genuine emotional response. However, the understanding of the role of emotions in storytelling and advertising effectiveness has been limited due to the difficulty of measuring emotions in real-life contexts. Video advertising is a ubiquitous form of short story, usually 30-60 seconds long, designed to influence, persuade and engage, in which media with emotional content is frequently used; it will be one of the focuses of this thesis.

Facial expressions, head gestures, heart rate, respiration rate and heart rate variability can inform us about emotional valence and arousal and attention. In this thesis I propose to demonstrate how automatically detected naturalistic and spontaneous facial responses and physiological responses can be used to predict the effectiveness of stories. The results will be used to inform the creation and evaluation of new content.

I propose a framework for automatically measuring facial and physiological responses, in addition to self-report and behavioral measures, to content (e.g. video advertisements) over the Internet in order to understand the role of emotions in story effectiveness. Specifically, I will present analysis of the first large scale data of facial, physiological, behavioral and self-report responses to video content collected “in-the-wild” using the cloud. I will develop models for evaluating the effectiveness of stories (e.g. likability, persuasion and memory) based on the automatically extracted features.

1 Introduction

There remains truth in Ray and Batra’s [28] statement: “an inadequate understanding of the role of affect in advertising has probably been the cause of more wasted advertising money than any other single reason.” This statement applies beyond advertising to many other forms of media and is due in part to the lack of understanding about how to measure emotion. This thesis proposal deals with evaluating the effectiveness of emotional content in storytelling and advertising beyond the laboratory environment using remotely measured facial and physiological responses. I will analyze challenging ecologically valid data collected over the Internet in the same contexts in which the media would normally be consumed and build a framework and set of models for automatic evaluation of effectiveness based on affective responses.

The face is one of the richest sources of communicating affective and cognitive information [11]. In addition, physiological reactions, such as changes in heart rate and other vital signs, are partially controlled by the autonomic nervous system and as such are manifestations of emotional processes [36]. Recent work has demonstrated that both facial behavior and physiological information can be measured directly from videos of the human face and as such emotion valence and arousal can be measured remotely.

Previous work has shown that many people are willing to engage and share visual images from their webcam over the Internet and these images and videos can be used for training automatic algorithms for learning [32, 34, 22]. Moreover, webcams are now ubiquitous and have become a standard component on many media devices, laptops and tablets. In 2010, the number of camera phones in use totaled 1.8 billion, which accounted for a third of all mobile phones (footnote 1). In addition, about half of the videos shared on Facebook every day are personal videos recorded from a desktop or phone camera (footnote 2).

Footnote 1: http://www.economist.com/node/15865270

Traditionally, consumer testing of video advertising, whether by self-report, facial response or physiology, has been conducted in laboratory settings. Lab-based studies, while controlled, are subject to bias from the presence of an experimenter and other factors (e.g. comfort with the context) unrelated to advertising interest that may impact the participants’ emotional experience [35]. Conducting experiments outside a lab-based context can help avoid such problems.

Self-report is the current standard measure of affect, where people are typically interviewed, asked to rate their feeling on a Likert scale or turn a dial to quantify their state (affect dial approaches). While convenient and inexpensive, self-report is problematic because it is also subject to biasing from the context, increased cognitive load and other factors of little relevance to the stimulus being tested [30]. Self-report has a number of drawbacks, including the difficulty for people to access information about their emotional experiences and their willingness to report feelings even if they didn’t have them [8]. For many the act of introspection is challenging to perform in conjunction with another task and may in itself alter that state [21]. Although affect dial approaches provide a higher resolution report of a subject’s response compared to a post-hoc survey, subjects are often required to view the stimuli twice in order to help them introspect on their emotional state.

Unlike self-report, facial expressions and physiological responses are implicit, non-intrusive and do not interrupt a person’s experience. In addition, as with affect dial ratings, facial and physiological responses allow for continuous and dynamic representation of how affect changes over time. This represents much richer data than can be obtained via a post-hoc survey. A small number of marketing studies consider the measurement of emotions via physiological [6], facial [18] or brain responses [3]. However, these are invariably performed in laboratory settings and are restricted to a limited demographic.

Advertising and online media are global: movie trailers, advertisements and other content can now be viewed the world over via the Internet and not just on selected television networks. It is important that marketers understand the nuances in responses across a diverse demographic and a broad set of geographic locations. For instance, advertising that works in certain cultural contexts may not be effective in others. A majority of the studies of emotion in advertising have only considered a homogeneous subject pool, such as university undergraduates or a group from one location. There is evidence to suggest that emotions can be universally expressed on the face [10] and our framework allows for the evaluation of advertising effectiveness across a large and diverse demographic much more efficiently than is possible via lab-based experiments.

The aim of the proposed research is to utilize a framework for measuring facial, physiological, self-report and behavioral responses to commercials over the Internet in order to understand the role of emotions in advertising effectiveness (e.g. likability, persuasion and sales) and to design an automated system for predicting success based on these signals. This incorporates first-in-the-world studies of measurement of these parameters via the cloud and allows the robust exploration of phenomena across a diverse demographic and a broad set of geographic locations.

Footnote 2: http://gigaom.com/video/facebook-40-of-videos-are-webcam-uploads/


2 Contributions

The main contributions of this thesis are described below:

1. To use a custom cloud based framework for collecting a large corpus of response videos to online media content (advertisements, movie trailers, etc.) with ground truth success (sharing, likability, persuasion and sales). To collect data from a diverse population to a broad range of content.

2. To automatically analyze facial responses, gestures and physiological reactions using computer vision algorithms.

3. To design, train and evaluate a set of models for predicting key measures of story/advertisement effectiveness based on facial responses, gestures and physiological features automatically extracted from the videos.

4. To propose generalizable emotional profiles that describe an effective story/advertisement in order to practically inform the development of new content.

5. To implement a system (demo) that incorporates the findings into a fully automated classification of a response to a story/advertisement. The predicted label will be the effect of the story in changing likability/persuasion.

3 Background and Related Work

3.1 Storytelling, Marketing and Emotion

Emotion is key to the effectiveness of narratives and storytelling [15]. Stories, even if fictional, have the ability to induce a genuine emotional response [14]. However, there are nuances in the emotional response to narrative representations compared to everyday social dialogue [25] and therefore context specific models need to be designed.

Marketing, and more specifically advertising, makes much use of narratives and stories. The role of emotion in marketing and advertising has been considered extensively since early work by Zajonc [37] that argued that emotions function independently of cognition and can indeed override it. It is widely held that emotions play a significant part in the decision-making process of purchasing and advertising is often seen as an effective source of enhancement of these emotional associations [24]. In advertising the states of amusement, surprise and confusion are of particular interest and measurement of valence and arousal should be useful in distinguishing between these states.

In a study of TV commercials, Hazlett and Hazlett [18] found that facial responses, measured using facial electromyography (EMG), were a stronger discriminator between commercials and were more strongly related to recall than self-report information. Lang [20] found that phasic changes in heart rate could act as an indication of attention and tonic changes could act as an indication of arousal. The combination of physiology and facial responses is likely to improve recognition of emotions further still.


Sales is arguably the key measure of success of advertising and predicting behavioral measures of success from responses will be our main focus. However, the success of an advertisement varies from person to person and sales figures at this level are often not available, therefore I will also consider other measures of success, in particular liking, memory (recall and recognition) and persuasion. “Ad liking” was found to be the best predictor of sales success in the Advertising Research Foundation Copy validation Research Project [17]. Biel [5] and Gordon [13] state that likability is the best predictor of sales effectiveness. Explicit memory of advertising (recall and recognition) is one of the most frequently used metrics for measuring advertising success. Independent studies have demonstrated the sales validity of recall [17, 24]. Indeed, recall was found to be the second best predictor of advertising effectiveness (after ad liking) as measured by increased sales in the Advertising Research Foundation Copy validation Research Project [17].

Behavioral measures such as ad zapping or banner click-through rates are frequently used to measure success. Teixeira et al. [33] show that inducing affect is important in engaging viewers in online video adverts and in reducing the frequency of “zapping” (skipping the advertisement). They demonstrated that joy was one of the states that stimulated viewer retention in the commercial. With our web based framework I can test behavioral measures (such as sharing or click-through) outside the laboratory in natural consumption contexts.

3.2 Facial Actions, Physiology, and Emotions

Charles Darwin was one of the first to demonstrate universality in facial expressions in his book, “The Expression of the Emotions in Man and Animals” [9]. Since then a number of other studies have demonstrated that facial actions communicate underlying emotional information and that some of these expressions are consistent across cultures [10].

There are two main approaches for coding of facial displays, “sign judgment” and “message judgment.” “Sign judgment” involves the labeling of facial muscle movements or actions, such as those defined in the FACS [12] taxonomy, whereas “message judgments” are labels of human perceptual judgment of the underlying state. In this proposal I focus on “sign judgments”, specifically action unit intensities, as they are objective and not open to contextual variation.

The Facial Action Coding System (FACS) [12] is the most comprehensive labeling system. FACS 2002 defines 27 action units (AU) - 9 upper face and 18 lower face, 14 head positions and movements, 9 eye positions and movements and 28 other descriptors, behaviors and visibility codes [7]. The action units can be further defined using five intensity ratings from A (minimum) to E (maximum). More than 7000 AU combinations have been observed [29].

Physiological changes, such as heart rate (HR), respiration rate (RR) and heart rate variability (HRV), are partially controlled by the autonomic nervous system and are important in describing emotional responses in the real world [16]. Physiological changes can contain information about both the emotional arousal and valence of a person.

By measuring facial responses, gestures, HR, RR and HRV we are able to capture elements of both the valence and arousal dimensions of emotion. In addition, we can capture levels of viewer attention. These three dimensions are likely to be important in predicting effectiveness from responses.


3.3 Remote Measurement of Facial Actions and Physiology

The first example of automated facial expression recognition was presented by Suwa et al. [31]. Over the past 20 years there have been significant advances in the state of the art in action unit recognition [38]. Our preliminary work has shown that certain actions, such as smiles, can be accurately detected in low resolution, unconstrained videos collected via the Internet [23].

We have shown that heart rate (HR), respiration rate (RR) and heart rate variability (HRV) can be measured remotely using camera based technology [26, 27]. This method has been validated on webcam videos with a resolution of 640x480 pixels and a frame rate of 15 fps (correlation with contact sensor measurements for HR: r=1.00; for RR: r=0.94; for HRV HF and LF: r=0.94; all correlations p<0.001). Video of this quality should be obtainable over the Internet using our framework.

3.4 Machine Learning for Affective Computing

The interpretation of facial and physiological responses is a challenging pattern recognition problem. The data are ecologically valid but noisy and require state-of-the-art techniques in order to achieve strong performance predicting measures of likability, persuasion or sales. The aim is to take advantage of the huge quantities of data (thousands of video responses) that can be collected using our web based framework to design models that generalize across a range of content, gender, age and cultural demographics and a broad set of locations. In hierarchical Bayesian models prior information can be used in a tiered approach to make context specific predictions. I plan to implement state-of-the-art models, the first examples to be trained on ecologically valid data collected via the Internet.

Increasingly, the importance of considering temporal information and dynamics of facial expressions has been highlighted. Dynamics can be important in distinguishing between the underlying meanings behind an expression [2, 19]. I will implement a method that considers temporal responses to commercials, taking advantage of the rich moment-to-moment data that can be collected using automated facial and physiological analysis. Hidden Markov Models and Conditional Random Fields have been shown to be effective at modeling affective information. With multimodal information the coupling of multiple models may improve the predictions. Hierarchical Bayesian models have been used to model the interplay of emotions and attention on behavior in advertising [33]. These techniques provide the ability to describe the data temporally and in terms of multiple modalities.

4 Proposed Research

4.1 Aim

I propose to analyze story effectiveness based on the emotional responses of viewers, using facial and physiological responses measured over the Internet. The technology allows for the remote measurement of affect via a webcam and I will design a custom framework and set of models for automatic evaluation of advertising effectiveness based on this research. The dependent measures will be based on established metrics for story and advertising success, including: sales, persuasion, sharing and likability. Achieving this aim will involve the identification of generalizable facial action and physiological features and models that are adaptable to contexts. This work is the first large scale study to consider physiological and facial responses measured “in-the-wild” via the cloud to understand the impact of emotional content in storytelling and advertising and how to use it to maximum effect. Figure 4 summarizes the proposed framework, which is based on Barrett et al.’s dual-process model of emotion [4]. The valence, arousal and attention of the user may be represented by latent variables within the models that are trained and not predicted explicitly.

4.2 Methodology

I will use a web based framework for collecting responses over the Internet. The first iteration of this framework was presented in [22] and is shown in Figure 1. This framework allows the efficient collection of thousands of naturalistic and spontaneous responses to online videos. Figure 2(a) shows example frames from data collected via this framework. Recruitment of participants has initially been performed by creating a social interface that allows people to share a graph of their automatically analyzed smile response with others but recruitment can also be performed via Mechanical Turk, or another crowd marketplace, with financial incentives. The latter will be used for more in depth studies in which voluntary participation is difficult to obtain.

The facial response videos, an example of which is shown in Figure 2(b), will be analyzed using automated facial action unit detection algorithms developed by Affectiva or MIT. As an example, Affectiva’s AU12 algorithm is based on Local Binary Pattern (LBP) features with the resulting features being classified using decision tree classifiers. This outputs a frame-by-frame measurement of smile probability. An example of the smile probability output is also shown in Figure 2(b). Although the algorithms will be trained with binary examples (e.g. AU12 vs. non-AU12) the probability outputs tend to be positively correlated with the intensity of the action, as shown in Figure 2(b). However, we must acknowledge that this interpretation may not always be accurate. Classifiers for AU1+2 (Frontalis/eyebrow raise), AU4 (Corrugator/brow furrow) and AU12 (Zygomatic Major/smile) will be used in addition to any others that are available by the time that analysis is performed. AU1+2, AU4 and AU12 should capture the main components of surprise, confusion and amusement responses. Head turning, tilting and general motion will be calculated through the use of a head pose detector and facial feature tracker. The intention is to capture information about the attention of the viewers.
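To make the feature step above concrete, the following is a minimal sketch of a basic 3x3 Local Binary Pattern histogram for a grayscale face crop. It is a generic textbook LBP variant, not Affectiva's implementation; the function name `lbp_histogram`, the 256-bin layout and the neighbour ordering are assumptions made for illustration, and the decision tree classifier that would consume these features is not shown.

```python
import numpy as np


def lbp_histogram(gray):
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    Illustrative only: a generic LBP variant, not the detector used in the proposal.
    """
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Eight neighbours, clockwise from top-left; each one contributes one bit.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[dy:h - 2 + dy, dx:w - 2 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

In practice such histograms are often computed per image cell and concatenated before classification; that choice, like the rest of this sketch, is an assumption rather than a detail given in the proposal.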

Heart rate, respiration rate and heart rate variability features are calculated using a non-contact method described in [26, 27]. Figure 3 shows graphically how our algorithm can be used to extract the blood volume pulse (BVP) and subsequently HR, RR and HRV information from the RGB channels in a video containing a face. Specifically, the facial region within each video frame is segmented automatically and a spatial average of the RGB color values is calculated for the region of interest (ROI). For a given time window (typically 20-30s) the raw RGB signals are normalized and detrended. A blind source separation technique (Independent Component Analysis (ICA)) is then used to calculate a set of source signals. The source signal with the strongest BVP signal is filtered and used to calculate the HR, RR and HRV. This method has been validated against contact sensors and proven to be accurate.
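The steps in the paragraph above (normalize and detrend the RGB traces, separate sources with ICA, select the source with the strongest pulse, read off the heart rate) can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm of [26, 27]: the ICA is a minimal NumPy FastICA written for this example, the "strongest BVP" source is assumed to be the one with the largest spectral peak between 0.75 and 4 Hz, the input is synthetic, and RR and HRV estimation are omitted.

```python
import numpy as np


def detrend_and_normalize(x):
    """Remove a linear trend from each channel, then scale to zero mean, unit variance."""
    t = np.arange(x.shape[0])
    out = np.empty(x.shape, dtype=float)
    for k in range(x.shape[1]):
        slope, intercept = np.polyfit(t, x[:, k], 1)
        resid = x[:, k] - (slope * t + intercept)
        out[:, k] = (resid - resid.mean()) / resid.std()
    return out


def fast_ica(x, n_iter=200, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity); x has shape (samples, channels)."""
    rng = np.random.default_rng(seed)
    xc = x - x.mean(axis=0)
    # Whiten so the channels are uncorrelated with unit variance.
    eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
    white = xc @ eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    n = x.shape[1]
    w = np.linalg.svd(rng.standard_normal((n, n)))[0]  # random orthogonal start
    for _ in range(n_iter):
        g = np.tanh(white @ w.T)
        w_new = (g.T @ white) / len(white) - np.diag((1.0 - g ** 2).mean(axis=0)) @ w
        u, _, vt = np.linalg.svd(w_new)  # symmetric decorrelation
        w = u @ vt
    return white @ w.T


def band_peak(signal, fs, lo=0.75, hi=4.0):
    """Return (frequency, power) of the largest spectral peak between lo and hi Hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    idx = np.argmax(power[band])
    return freqs[band][idx], power[band][idx]


def estimate_heart_rate(rgb, fs):
    """Estimate heart rate (beats per minute) from ROI-averaged RGB traces."""
    sources = fast_ica(detrend_and_normalize(rgb))
    peaks = [band_peak(sources[:, k], fs) for k in range(sources.shape[1])]
    freq, _ = max(peaks, key=lambda p: p[1])  # assumed selection rule
    return 60.0 * freq


def synthetic_rgb(fs=15.0, seconds=30, pulse_hz=1.2, seed=1):
    """Made-up test input: a pulse, a slow lighting change and noise mixed into 3 channels."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * seconds)) / fs
    sources = np.column_stack([
        np.sin(2 * np.pi * pulse_hz * t),
        np.sin(2 * np.pi * 0.1 * t),
        rng.standard_normal(t.size),
    ])
    mixing = np.array([[0.2, 1.0, 0.5], [0.6, 1.0, 0.3], [0.3, 1.0, 0.8]])
    return 100.0 + sources @ mixing.T


if __name__ == "__main__":
    print(estimate_heart_rate(synthetic_rgb(), fs=15.0))
```

The synthetic traces use a 30 s window at 15 fps to match the recording conditions mentioned in the text; with real video the ROI averaging and face tracking would precede these steps.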

[Figure 1 diagram. Panels recoverable from the original: Homepage/Introduction (participant visits site and is introduced to the study); Consent (participant asked if they will allow access to their webcam stream); Media, client side (Flash capture of webcam footage, frames sent to server, media clip played simultaneously); Server (video of webcam footage stored, video processed to calculate facial and physiological response); Self-report (user can answer self-report questions); behavioral measures (sharing/click through) recorded.]

Figure 1: Overview of the user experience and the web-based framework that is used to crowdsource the facial videos. The video from the webcam is streamed in real-time to a server where automated facial expression analysis is performed. All the video processing can be performed on the server side.

There will be limitations involved in collecting data over the Internet; the uncontrolled nature of this research presents several challenges. Firstly, “clean” data is not always available: motion and context of the users will vary considerably and result in greater noise within our measurements than if the data were collected in a laboratory. In addition, the video recordings are likely to have a lower frame rate and resolution compared to those that could be collected in a laboratory, in which case some more subtle and faster micro-expressions may be missed and the physiological measurements will be noisier. Secondly, detailed and reliable profiles of the participants may be difficult to ensure in all cases. In order to address these weaknesses we will compare the results obtained against those from analyses of datasets collected within controlled laboratory settings. The computer vision methods for extracting facial and physiological response features will be validated in controlled studies with ground truth measures and against videos of differing qualities in order to ensure reliability on data collected over the Internet. Specifically, I intend to recruit a number of subjects (10-20) and record video that matches those collected over the Internet with ground truth measures of physiology. The accuracy of the system can be characterized under these conditions. The AU detection algorithms will be tested against hand labeled examples of frames collected over the Internet as shown in [23].
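As one way to characterize accuracy in the validation study described above, camera-based estimates can be compared with contact-sensor ground truth using a correlation coefficient and error statistics. The sketch below is illustrative only; the function name `agreement` and the choice of Pearson r, RMSE and mean bias are assumptions, and the numbers in the example are made up.

```python
import numpy as np


def agreement(camera, contact):
    """Compare camera-based estimates with contact-sensor measurements."""
    cam = np.asarray(camera, dtype=float)
    ref = np.asarray(contact, dtype=float)
    return {
        "pearson_r": np.corrcoef(cam, ref)[0, 1],     # linear agreement
        "rmse": np.sqrt(np.mean((cam - ref) ** 2)),   # typical error size
        "bias": np.mean(cam - ref),                   # systematic offset
    }


if __name__ == "__main__":
    # Made-up heart rates (bpm) for five hypothetical participants.
    print(agreement([61, 66, 71, 76, 81], [60, 65, 70, 75, 80]))
```

Reporting the bias and RMSE alongside the correlation is useful because a high correlation alone does not rule out a constant offset between the two measurements.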

By performing analysis online we can collect data from large populations with considerable representation from diverse subgroups (gender/age/cultural background). We will recruit 150 participants for the second study proposed below and a similar number for the subsequent studies. In these cases recruitment will be possible through existing market research participant pools. However, recruitment can also occur through a variety of other mechanisms (such as voluntary means and paid crowd marketplaces), and subgroups can be identified using self-report measures of age, gender and cultural background.

The extracted features will be collected alongside self-report responses, as these are the currentstandard, and behavioral metrics. In order to minimize effects due to primacy and recency theorder in which advertisements are presented will be randomized. I plan to collaborate with MIT

7

Page 10: Affective Storytelling Automatic Measurement of …...Affective Storytelling Automatic Measurement of Story Effectiveness from Emotional Responses Collected over the Internet Daniel

(a)

(b)

0 5 10 15 20 25 300

0.2

0.4

0.6

0.8

1

Sm

ile P

rob

ab

ility

Time (s)

Ad

Response

Smile

Track

Figure 2: a) Example frames of data collected using a web-based framework similar to thatdescribed in Figure 1. b) A series of frames from one particular video, showing an AU12(smile/amusement) response. The smile track demonstrates how greater smile intensity is posi-tively correlated with the probability output from the classifier.

Red Channel

t1

t2

tn

Green Channel

t1

t2

tn

Blue Channel

t1

t2

tn

Red Signal

Green Signal

Blue Signal

Separated Source 1

Separated Source 2

Separated Source 3

t1

t2

tn

(a) Automated face tracking (b) Channel separation

Signal

Separation

(c) Raw traces (d) Signal components (e) Analysis of BVP

Heart rate

Respiration rate

Heart rate variability

HF/LF

Figure 3: Graphical illustration of our algorithm for extracting heart rate, respiration rate and heartrate variability from video images of a human face as described in [27].

8

Page 11: Affective Storytelling Automatic Measurement of …...Affective Storytelling Automatic Measurement of Story Effectiveness from Emotional Responses Collected over the Internet Daniel

Physiology

HR, RR, HRV

Facial Behavior

Head Gestures

Valence

Arousal

Attention

E�ectEmotion

Story/Narrative

Stimuli Measured Response

Likeability

Memory

Persuasion

Purchase

Sharing

Controlled Processing

Figure 4: Schematic of the proposed research model. Inspired by Barrett et al.’s dual-process viewof emotion [4]. The measured responses will capture information about the valence, arousal andattention of the viewer and will be used to predict the effects of the story/narrative.

Media Lab member companies in order to obtain sales data related to the advertisements.
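The randomized presentation order mentioned above could be generated per viewer along the following lines. This is a minimal sketch, not the actual data collection framework: the helper name `presentation_order`, the placeholder ad identifiers and the idea of seeding the generator with a viewer id are all assumptions made for illustration.

```python
import random


def presentation_order(ad_ids, viewer_id, n_shown=None):
    """Return a randomly ordered list of ads for one viewer (hypothetical helper).

    Seeding with the viewer id makes the order reproducible for later
    analysis while still varying it across viewers.
    """
    rng = random.Random(viewer_id)
    n = len(ad_ids) if n_shown is None else n_shown
    # sample() draws without replacement, giving a random subset in random order
    return rng.sample(list(ad_ids), n)


ads = ["ad_A", "ad_B", "ad_C", "ad_D"]  # placeholder identifiers
order = presentation_order(ads, viewer_id=17)
subset = presentation_order(ads, viewer_id=3, n_shown=2)
```

Averaged over many viewers, each advertisement then appears about equally often in each position, which is what limits primacy and recency effects. The `n_shown` argument covers designs in which each viewer watches only a subset of the advertisements.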

4.3 Studies

I propose to carry out a series of studies in this research. A preliminary study has already been performed and was the first attempt in the world to collect facial responses to videos on a large scale over the Internet. This involved testing three commercials which appeared during the 2011 Super Bowl. The website was live for over a year and can be found at [1]. Visitors to the website were asked to opt in to watch short videos and have their facial expressions recorded and analyzed. Immediately following each video, visitors completed a short self-report questionnaire. The videos from the webcam were streamed in real time at 15 frames per second, at a resolution of 320x240, to a server where automated facial expression analysis was performed. Approximately 7,000 videos were collected in this study. This data will be used to build models for predicting advertising liking purely from automatically measured behavior. In addition, I will investigate whether advertising liking can be predicted effectively from only a subset of the response (e.g. the first 25% or 50%).
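The idea of predicting liking from only the first part of a response can be made concrete by truncating each response track before features are computed. The sketch below is illustrative only: the helper names, the three summary features and the placeholder smile-probability values are assumptions, not the features or data used in this work.

```python
def leading_fraction(track, fraction):
    """Keep only the first `fraction` (0 < fraction <= 1) of a response track."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, int(len(track) * fraction))
    return track[:n_keep]


def summary_features(track):
    """Simple illustrative features: mean, peak and end-minus-start trend."""
    return {
        "mean": sum(track) / len(track),
        "peak": max(track),
        "trend": track[-1] - track[0],
    }


# Placeholder smile-probability values, not real data.
smile_track = [0.0, 0.1, 0.1, 0.3, 0.6, 0.8, 0.9, 0.7]
features_by_fraction = {
    frac: summary_features(leading_fraction(smile_track, frac))
    for frac in (0.25, 0.5, 1.0)
}
```

Training the same model on the features for each fraction, and comparing prediction accuracy across fractions, would show how much of the response is needed before liking becomes predictable.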

The second study will extend the framework and methodology used in the first study to a much greater number of commercials, and I will extend the self-report questioning to cover more in-depth questions. Specifically, I will be collecting and analyzing data for 150 viewers and 16 commercials (with each viewer watching a subset of the commercials). Video recordings of the participants’ responses to the content will be collected and analyzed as described in the Methodology section. Self-report measures of persuasion, likability and familiarity will be recorded (post-viewing Likert scale reports). Pre- and post-launch sales data for the products will be available. The videos collected in this study will be of a similar quality to those above (resolution: 320x240, frame rate: 15 fps). This dataset will allow me to extend the modeling carried out in the preliminary study to build and evaluate models for predicting likability, persuasion and sales.


The third study will collect and analyze data for a set of advertisement concepts around different product ranges. This will involve approximately 100 viewers watching multiple (2 or 3) advertisement concepts. Self-report measures of persuasion, likability and familiarity will be recorded. This study will compare similar but distinct advertising concepts for the same product. I will investigate the ability of measured emotional responses to distinguish between the efficacy of subtly different concepts for the same product.

The structure of the latter two studies will allow for richer data to be collected and a more controlled experimental design whilst still allowing us to collect naturalistic and spontaneous data “in-the-wild”. I will investigate the role of facial behavior and head gestures, HR, RR and HRV in predicting the variables of persuasion, likability and sales. The dimensions of valence, arousal and attention will be modeled as latent variables within the model.

As described above, I will be carrying out small-scale lab-based studies to evaluate the accuracy of the physiological measurement under a greater range of conditions. This will involve a smaller number of participants (10-20) viewing content on a computer or laptop whilst a video is recorded of their face. The method will be evaluated by its correlation with, and accuracy when compared to, measurements from contact sensors. Data for 16 participants has been collected already; if necessary, further data collection can be performed. For these experiments, recruitment can be from the local community.
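The comparison against contact sensors described above amounts to computing agreement statistics over paired measurements. The following is a minimal sketch of one way to do so, assuming one webcam-derived and one contact-sensor value per recording; the function names and the heart rate values are placeholders, and the choice of statistics (Pearson correlation, mean error, mean absolute error, RMSE) is an assumption rather than the evaluation protocol of this proposal.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def error_stats(estimate, reference):
    """Mean error (bias), mean absolute error and root-mean-square error."""
    errors = [e - r for e, r in zip(estimate, reference)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Placeholder heart rates in beats per minute, not measured data.
webcam_hr = [62.0, 71.0, 80.0, 66.0, 90.0]
contact_hr = [60.0, 70.0, 82.0, 65.0, 91.0]
r = pearson_r(webcam_hr, contact_hr)
stats = error_stats(webcam_hr, contact_hr)
```

Correlation alone can hide a constant offset between the two measurements, which is why the error statistics are reported alongside it.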

4.4 Plan for Completion of the Research

Table 1 shows my tentative plan for completion of the research described in this proposal.

Timeline                   Work                                       Progress
January-March 2011         Analysis of data from preliminary study    completed
April-June 2012            Design of studies                          ongoing
September-November 2012    Implementation of studies                  planned
November-March 2013        Analysis of data collected                 planned
March 2013                 First thesis outline                       planned
April-June 2013            Complete analysis of study data            planned
July 2013                  Second thesis outline                      planned
August-December 2013       Thesis writing                             planned
January-February 2014      Thesis defense                             planned

Table 1: Plan for completion of my doctoral thesis research.

4.5 Human Subjects Approval

The protocol for all studies will be approved by the Massachusetts Institute of Technology Committee On Use of Humans as Experimental Subjects (COUHES).


4.6 Collaborations

I will be collaborating with Thales Teixeira at Harvard Business School on the modeling of effectiveness based on emotional responses. I will be working at Affectiva for one semester in order to complete parts of the data collection described. I will be building on the data collection framework and using the facial action unit detection algorithms.

5 Biography

Daniel McDuff is a PhD candidate in the Affective Computing group at the MIT Media Lab. McDuff received his bachelor's degree, with first-class honors, and master's degree in engineering from Cambridge University. Prior to joining the Media Lab, he worked for the Defense Science and Technology Laboratory (DSTL) in the UK. He is interested in using computer vision and machine learning to enable the automated recognition of affect, particularly in the domain of storytelling and advertising.

Email: [email protected]
media.mit.edu/~djmcduff

References

[1] Web address of data collection site: http://www.forbes.com/2011/02/28/detect-smile-webcam-affectiva-mit-media-lab.html.

[2] Z. Ambadar, J.F. Cohn, and L.I. Reed. All smiles are not created equal: Morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. Journal of Nonverbal Behavior, 33(1):17–34, 2009.

[3] T. Ambler, A. Ioannides, and S. Rose. Brands on the brain: Neuro-images of advertising. Business Strategy Review, 11(3):17–30, 2000.

[4] L.F. Barrett, K.N. Ochsner, and J.J. Gross. On the automaticity of emotion. Social psychology and the unconscious: The automaticity of higher mental processes, pages 173–217, 2007.

[5] A.L. Biel. Love the ad. Buy the product? Admap, September, 1990.

[6] P.D. Bolls, A. Lang, and R.F. Potter. The effects of message valence and listener arousal on attention, memory, and facial muscular responses to radio advertisements. Communication Research, 28(5):627–651, 2001.

[7] J.F. Cohn, Z. Ambadar, and P. Ekman. Observer-based measurement of facial expression with the Facial Action Coding System. Oxford: NY, 2005.

[8] R.R. Cornelius. The science of emotion: Research and tradition in the psychology of emotions. Prentice-Hall, Inc, 1996.

[9] C. Darwin, P. Ekman, and P. Prodger. The expression of the emotions in man and animals. Oxford University Press, USA, 2002.

[10] P. Ekman. Facial expression and emotion. American Psychologist, 48(4):384, 1993.

[11] P. Ekman, W.V. Friesen, and S. Ancoli. Facial signs of emotional experience. Journal of Personality and Social Psychology, 39(6):1125, 1980.


[12] P. Ekman and W.V. Friesen. Facial action coding system. 1977.

[13] W. Gordon. What do consumers do emotionally with advertising? Journal of Advertising Research, 46(1), 2006.

[14] M.C. Green. Transportation into narrative worlds: The role of prior knowledge and perceived realism. Discourse Processes, 38(2):247–266, 2004.

[15] M.C. Green, J.J. Strange, and T.C. Brock. Narrative impact: Social and cognitive foundations. Lawrence Erlbaum, 2002.

[16] H. Gunes, M. Piccardi, and M. Pantic. From the lab to the real world: Affect recognition using multiple cues and modalities. Affective computing: focus on emotion expression, synthesis, and recognition, pages 185–218, 2008.

[17] R.I. Haley. The ARF copy research validity project: Final report. In Transcript Proceedings of the Seventh Annual ARF Copy Research Workshop, 1990.

[18] R.L. Hazlett and S.Y. Hazlett. Emotional response to television commercials: Facial EMG vs. self-report. Journal of Advertising Research, 39:7–24, 1999.

[19] M. E. Hoque and R.W. Picard. Acted vs. natural frustration and delight: many people smile in natural frustration. In Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on. IEEE, 2011.

[20] A. Lang. Involuntary attention and physiological arousal evoked by structural features and emotional content in TV commercials. Communication Research, 17(3):275–299, 1990.

[21] M.D. Lieberman, N.I. Eisenberger, M.J. Crockett, S.M. Tom, J.H. Pfeifer, and B.M. Way. Putting feelings into words. Psychological Science, 18(5):421, 2007.

[22] D. McDuff, R. El Kaliouby, and R. Picard. Crowdsourced data collection of facial responses. In Proceedings of the 13th international conference on Multimodal Interaction. ACM, 2011.

[23] D. J. McDuff, R. E. Kaliouby, and R. W. Picard. Crowdsourcing Facial Responses to Online Videos. IEEE Transactions on Affective Computing, 2012.

[24] A. Mehta and S.C. Purvis. Reconsidering recall and emotion in advertising. Journal of Advertising Research, 46(1):49, 2006.

[25] B. Parkinson and A.S.R. Manstead. Making sense of emotion in stories and social life. Cognition & Emotion, 7(3-4):295–323, 1993.

[26] M.Z. Poh, D.J. McDuff, and R.W. Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762–10774, 2010.

[27] M.Z. Poh, D.J. McDuff, and R.W. Picard. Advancements in noncontact, multiparameter physiological measurements using a webcam. Biomedical Engineering, IEEE Transactions on, 58(1):7–11, 2011.

[28] M.L. Ray and R. Batra. Emotion and persuasion in advertising: What we do and don’t know about affect. Graduate School of Business, Stanford University, 1982.

[29] K.R. Scherer and P. Ekman. Methodological issues in studying nonverbal behavior. Handbook of methods in nonverbal behavior research, pages 1–44, 1982.

[30] N. Schwarz and F. Strack. Reports of subjective well-being: Judgmental processes and their methodological implications. Well-being: The foundations of hedonic psychology, pages 61–84, 1999.

[31] M. Suwa, N. Sugie, and K. Fujimora. A preliminary note on pattern recognition of human emotional expression. In International Joint Conference on Pattern Recognition, pages 408–410, 1978.

[32] G.W. Taylor, I. Spiro, C. Bregler, and R. Fergus. Learning Invariance through Imitation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2011.

[33] T. Teixeira, M. Wedel, and R. Pieters. Emotion-induced engagement in internet video ads. Journal of Marketing Research, (ja):1–51, 2010.


[34] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan. Toward practical smile detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(11):2106–2111, 2009.

[35] F.H. Wilhelm and P. Grossman. Emotions beyond the laboratory: Theoretical fundaments, study design, and analytic strategies for advanced ambulatory assessment. Biological Psychology, 84(3):552–569, 2010.

[36] P. Winkielman, G.G. Berntson, and J.T. Cacioppo. The psychophysiological perspective on the social mind. Blackwell handbook of social psychology: Intraindividual processes, pages 89–108, 2001.

[37] R.B. Zajonc. Feeling and thinking: Preferences need no inferences. American Psychologist, 35(2):151, 1980.

[38] Z. Zeng, M. Pantic, G.I. Roisman, and T.S. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(1):39–58, 2009.


Committee Biographies

Jeffrey Cohn
Professor of Psychology
University of Pittsburgh

Jeffrey Cohn is Professor of Psychology at the University of Pittsburgh and Adjunct Faculty at the Robotics Institute at Carnegie Mellon University. He has led interdisciplinary and inter-institutional efforts to develop advanced methods of automatic analysis of facial expression and prosody, and has applied those tools to research in human emotion, social development, non-verbal communication, psychopathology, and biomedicine. He co-chaired the 2008 IEEE International Conference on Automatic Face and Gesture Recognition (FG2008) and the 2009 International Conference on Affective Computing and Intelligent Interaction (ACII2009). He has co-edited two recent special issues of the Journal of Image and Vision Computing. His research has been supported by grants from the National Institutes of Health, National Science Foundation, Autism Foundation, Office of Naval Research, Defense Advanced Research Projects Agency, and the Technical Support Working Group.

Ashish Kapoor
Senior Research Scientist
Microsoft Research, Redmond

Ashish Kapoor is a researcher with the Adaptive Systems and Interaction Group at Microsoft Research, Redmond. He is focusing on Machine Learning and Computer Vision with applications in User Modelling, Affective Computing and Computer-Human interaction scenarios. Ashish did a PhD at the MIT Media Lab and his Doctoral thesis looked at building Discriminative Models for Pattern Recognition with incomplete information (semi-supervised learning, imputation, noisy data etc.). Most of the earlier work focused on building new machine learning models for affect recognition. A significant part of that work involved automatic analysis of non-verbal behavior and physiological responses.

Thales Teixeira
Assistant Professor of Business Administration
Harvard Business School

Thales Teixeira is Assistant Professor in the Marketing Department of the Harvard Business School. His research focuses on the economics of attention. He explores the rules of (implicit) transaction of attention in a marketplace in which consumer attention is a scarce resource, arguably even scarcer than money or time. His work has also appeared in Marketing Science. He received his PhD in Business from University of Michigan and holds a Master of Arts in Statistics (University of Sao Paulo, Brazil) and a Bachelor of Arts in Administration (University of Sao Paulo, Brazil). Before entering academia, he consulted for companies such as Microsoft and Hewlett-Packard. At Harvard, he teaches an MBA course in Marketing.
