
M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 536
To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence

Toward Machine Emotional Intelligence: Analysis of Affective Physiological State

Rosalind W. Picard, Elias Vyzas, and Jennifer Healey
MIT Media Laboratory, 20 Ames St., Cambridge, MA 02139

Abstract

The ability to recognize emotion is one of the hallmarks of emotional intelligence, an aspect of human intelligence that has been argued to be even more important than mathematical and verbal intelligences. This paper proposes that machine intelligence needs to include emotional intelligence and demonstrates results toward this goal: developing a machine's ability to recognize human affective state given four physiological signals. We describe difficult issues unique to obtaining reliable affective data, and collect a large set of data from a subject trying to elicit and experience each of eight emotional states, daily, over multiple weeks. This paper presents and compares multiple algorithms for feature-based recognition of emotional state from this data. We analyze four physiological signals that exhibit problematic day-to-day variations: the features of different emotions on the same day tend to cluster more tightly than do the features of the same emotion on different days. To handle the daily variations, we propose new features and algorithms and compare their performance. We find that the technique of seeding a Fisher Projection with the results of Sequential Floating Forward Search improves the performance of the Fisher Projection, and provides the highest recognition rates reported to date for classification of affect from physiology: 81% recognition accuracy on eight classes of emotion, including neutral.

1 Introduction

It is easy to think of emotion as a luxury, something that is unnecessary for basic intelligent functioning and difficult to encode in a computer program; therefore, why bother giving emotional abilities to machines?

Recently, a constellation of findings from neuroscience, psychology, and cognitive science suggests that emotion plays surprisingly critical roles in rational and intelligent behavior. Most people already know that too much emotion is bad for rational thinking; much less well known is that neuroscience studies of patients who essentially have their emotions disconnected reveal that those patients have strong impairments in intelligent day-to-day functioning, suggesting that too little emotion can also impair rational thinking and behavior [1]. Apparently, emotion interacts with thinking in ways that are non-obvious but important for intelligent functioning. Emotion-processing brain regions have also been found to perform pattern recognition before the incoming signals arrive at the cortex: a rat can be taught to fear a tone even when its auditory cortex is removed, and similar emotion-oriented processing is believed to take place in human vision and audition [2].

Scientists have amassed evidence that emotional skills are a basic component of intelligence, especially for learning preferences and adapting to what is important [3], [4]. With the increasing deployment of adaptive computer systems, e.g., software agents and video retrieval systems that learn from users, the ability to sense and respond appropriately to user affective feedback is of growing importance. Emotional intelligence consists of the ability to recognize, express, and have emotions, coupled with the ability to regulate these emotions, harness them for constructive purposes, and skillfully handle the emotions of others.


The skills of emotional intelligence have been argued to be a better predictor than IQ for measuring aspects of success in life [4].

Machines may never need all of the emotional skills that people need; however, there is evidence that machines will require at least some of these skills to appear intelligent when interacting with people. A relevant theory is that of Reeves and Nass at Stanford: human-computer interaction is inherently natural and social, following the basics of human-human interaction [5]. For example, if a piece of technology talks to you but never listens to you, it is likely to annoy you, analogous to the situation where a human talks to you but never listens to you. Nass and Reeves have conducted dozens of experiments of classical human-human interaction, taking out one of the humans and putting in a computer, and finding that the basic human-human results still hold.

Recognizing affective feedback is important for intelligent human-computer interaction. Consider a machine learning algorithm that has to decide when to interrupt the user. A human learns this by watching how you respond when you are interrupted in different situations: did you receive the interruption with a neutral, positive, or negative response? Without such regard for your response, the human may be seen as disrespectful, irritating, and unintelligent. One can predict a similar response toward computers that interrupt users while oblivious to their positive or negative expressions. The computer could potentially appear more intelligent by recognizing and appropriately adapting to the user's emotional response.

Although not all computers will need emotional skills, those that interact with and adapt to humans in real time are likely to be perceived as more intelligent if given such skills. The rest of this paper focuses on giving a machine one of the key skills of emotional intelligence: the ability to recognize emotional information expressed by a person.

1.1 Human emotion recognition

Human newborns show signs of recognizing affective expressions such as approval and disapproval long before they acquire language. Affect recognition is believed to play an important role in learning and in developing the ability to attend to what is important, and is likely a key part of the difference between normal child development and the development of autistic children, who typically have impaired affect recognition [6].

For example, instead of attending to the parent's speech with exaggerated inflection, the autistic child might tune in to an unimportant sound, missing the guidance provided by the affective cues.

Emotion modulates almost all modes of human communication: word choice, tone of voice, facial expression, gestural behaviors, posture, skin temperature and clamminess, respiration, muscle tension, and more. Emotions can significantly change the message: sometimes it is not what was said that was most important, but how it was said. Faces tend to be the most visible form of emotion communication, but they are also the most easily controlled in response to different social situations when compared to the voice and other modes of expression. Affect recognition is most likely to be accurate when it combines (1) multiple kinds of signals from the user with (2) information about the user's context, situation, goals, and preferences. A combination of low-level pattern recognition, high-level reasoning, and natural language processing is likely to provide the best emotion inference [7].

How well will a computer have to recognize human emotional state to appear intelligent? Note that no human can perfectly recognize your innermost emotions, and sometimes people cannot even recognize their own emotions. No known mode of affect communication is lossless; some aspects of internal feelings remain private, especially if you wish them to be that way or if you sufficiently disguise them. What is available to an external recognizer is what can be observed and reasoned about, and always with some uncertainty. Nonetheless, people recognize each other's emotions well enough to communicate useful feedback. Our aim is to give computers recognition abilities similar to those that people have.

1.2 Importance of physiological emotion recognition

When designing intelligent machine interfaces, why not focus on facial and vocal communication; aren't these the modes that people rely upon?


There are cases where such modes will be preferable, as well as other behavior-based modes, such as gestural activity or time to complete a task. However, it is a mistake to think of physiology as something that people do not naturally recognize. A stranger shaking your hand can feel its clamminess (related to skin conductivity); a friend leaning next to you may sense your heart pounding; students can hear changes in a professor's respiration that give clues to stress; ultimately it is muscle tension in the face that gives rise to facial expressions. People read many physiological signals of emotion.

Physiological pattern recognition of emotion has important applications in medicine, entertainment, and human-computer interaction. Affective states of depression, anxiety, and chronic anger have been shown to impede the work of the immune system, making people more vulnerable to viral infections and slowing healing from surgery or disease (see Chap. 11 of reference [4]). Physiological pattern recognition can potentially aid in assessing and quantifying stress, anger, and other emotions that influence health. Certain human physiological patterns show characteristic responses to music and other forms of entertainment: e.g., skin conductivity tends to climb as a piece of music "peps you up" and fall as it "calms you down." This principle was utilized in a wearable "affective DJ" to allow more personalized music selections than the one-size-fits-all approach of a disc jockey [8]. Changes in physiological signals can also be examined for signs of stress arising while users interact with technology, helping detect where the product causes unnecessary irritation or frustration, without having to interrupt the user or record her appearance. This is a new area for pattern recognition research: detecting when products cause user stress or aggravation, to help developers target areas for re-design and improvement.

Physiological sensing is sometimes considered invasive because it involves physical contact with the person. However, not only is technology improving with conductive rubber and fabric electrodes that are wearable, washable, and able to be incorporated in clothes and accessories [9], but there are also new forms of non-contact physiological sensing on the horizon.

In some cases, physiological sensors are perceived as less invasive than alternatives such as video. Video almost always communicates identity, appearance, and behavior on top of emotional information. Students engaged in distance learning may wish to communicate to the lecturer that they are furrowing their brows in confusion or puzzlement, but not have the lecturer know their identity. They might not object to having a small amount of muscle tension anonymously transmitted, whereas they may object to having their appearance communicated.

New wearable computers facilitate different forms of sensing than traditional computers. Wearables often afford natural contact with the surface of the skin; however, they do not easily afford having a camera pointed at the user's face. (It can be done with a snugly fitted hat and stiff brim to mount the camera for viewing the wearer's face, a form factor that can be awkward both physically and socially.) In wearable systems, physiological sensing may be set up so that it involves no visible or heavy, awkward supporting mechanisms; in this case, the physiological sensors may be less cumbersome than video sensors.

1.3 Related research

The affect recognition problem is a hard one when you look at the few benchmarks which exist. In general, people can recognize an emotional expression in neutral-content speech with about 60% accuracy, choosing from among about six different affective labels [10]. Computer algorithms match or slightly beat this accuracy, e.g., [11], [12]. Note that computer speech recognition that works at about 90% accuracy on neutrally spoken speech tends to drop to 50-60% accuracy on emotional speech [13]. Improved handling of emotion in speech is important for recognizing what is said as well as how it was said.

Facial expression recognition is easier for people, e.g., 70-98% accurate on six categories of facial expressions exhibited by actors [14], and the rates computers obtain range from 80-98% accuracy when recognizing 5-7 classes of emotional expression on groups of 8-32 people [15], [16]. Other research has focused not so much on recognizing a few categories of emotional expressions but on recognizing specific facial actions, the fundamental muscle movements that comprise Paul Ekman's Facial Action Coding System, which can be combined to describe all facial expressions.


Recognizers have already been built for a handful of the facial actions [17], [18], [19], [20], and the automated recognizers have been shown to perform comparably to humans trained in recognizing facial actions [18]. These facial actions are essentially facial phonemes, which can be assembled to form facial expressions. There are also recent efforts indicating that combining audio and video signals for emotion recognition can give improved results [21], [22], [23].

Although the progress in facial, vocal, and combined facial/vocal expression recognition is promising, all of the results above are on pre-segmented data of a small set of sometimes exaggerated expressions, or on a small subset of hand-marked, singly occurring facial actions. The state of the art in affect recognition is similar to that of speech recognition several decades ago, when the computer could classify the carefully articulated digits "0, 1, 2, ..., 9" spoken with pauses in between, but could not accurately detect these digits in the many ways they are spoken in larger continuous conversations.

Emotion recognition research is also hard because understanding emotion is hard; after over a century of research, emotion theorists still do not agree upon what emotions are and how they are communicated. One of the big questions in emotion theory is whether distinct physiological patterns accompany each emotion [24]. The physiological muscle movements comprising what looks to an outsider to be a facial expression may not always correspond to a real underlying emotional state. Emotion consists of more than its outward physical expression; it also consists of internal feelings and thoughts, as well as other internal processes of which the person having the emotion may not be aware.

The relation between internal bodily feelings and externally observable expression is still an open research area, with a history of controversy. Historically, William James was the major proponent of emotion as an experience of bodily changes, such as your heart pounding or your hands perspiring [25]. This view was challenged by Cannon [26] and again by Schachter and Singer, who argued that the experience of physiological changes was not sufficient to discriminate emotions.

Schachter and Singer's experiments showed that if a bodily arousal state was induced, then subjects could be put into two distinct moods simply by being put in two different situations. They argued that physiological responses such as sweaty palms and a rapid heart beat inform our brain that we are aroused, and that the brain must then appraise the situation we are in before it can label the state with an emotion such as fear or love [27].

Since the classic work of Schachter and Singer, there has been a debate about whether or not emotions are accompanied by specific physiological changes other than simply arousal level. Ekman et al. [28] and Winton et al. [29] provided some of the first findings showing significant differences in autonomic nervous system signals according to a small number of emotional categories or dimensions, but there was no exploration of automated classification. Fridlund and Izard [30] appear to have been the first to apply pattern recognition (linear discriminants) to classification of emotion from physiological features, attaining rates of 38%-51% accuracy (via cross-validation) on subject-dependent classification of four different facial expressions (happy, sad, anger, fear) given four facial electromyogram signals. Although there are over a dozen published efforts aimed at finding physiological correlates when examining small sets of emotions (from 2-7 emotions according to a recent overview [31]), most have focused on t-test or analysis-of-variance comparisons, combining data over many subjects, where each was measured for a relatively small amount of time (seconds or minutes). Relatively few of the studies have included neutral control states where the subject relaxed and passed time feeling no specific emotion, and none to our knowledge have collected data from a person repeatedly, over many weeks, where disparate sources of noise enter the data. Few efforts beyond Fridlund's have employed linear discriminants, and we know of none that have applied more sophisticated pattern recognition to physiological features.


The work in this paper is novel in trying to classify physiological patterns for a set of eight emotions (including neutral), in applying pattern recognition techniques beyond simple discriminants to the problem (we use new features, feature selection, spatial transformations of features, and combinations of these methods), and in focusing on "felt" emotions of a single subject gathered over sessions spanning many weeks. The results we obtain are also independent of psychological debates on the universality of emotion categories [32], focusing instead on user-defined emotion categories.

The contributions of this paper include not only a new means for pattern analysis of affective states from physiology, but also the finding of significant classification rates from physiological patterns corresponding to eight affective states measured from a subject over many weeks of data. Our results also reveal significant discrimination along both of the most commonly described dimensions of emotion: valence and arousal. We show that the day-to-day variations in physiological signals are large, even when the same emotion is expressed, and that this effect undermines recognition accuracy if it is not appropriately handled. This paper proposes and compares techniques for handling day-to-day variation and presents new results in affect recognition based on physiology. The results lie between the rates obtained for expression recognition from vocal and facial features, and are the highest reported to date for classifying eight emotional states given physiological patterns.

2 Gathering Good Affective Data

In computer vision or speech recognition, it has become easy to gather meaningful data; frame grabbers, microphones, cameras, and digitizers are reliable and easy to use, and the integrity of the data can be seen or heard by non-specialists. However, non-specialists do not usually know what comprises a good physiological signal. Although people recognize emotional information from physiology, it is not natural to do so by looking at 1-D signal waveforms. Not only does it take effort to learn what a good signal is, but the sensing systems (A/D converters and buffering data-capture systems) for physiology do not seem to be as reliable as those for video and audio. Factors such as whether or not the subject just washed her hands, how much gel she applied under an electrode, motion artifacts, and precisely where the sensor was placed all affect the readings.

These are some of the technical factors that contribute to the difficulty of gathering accurate physiological data.

Although dealing with the recording devices can be tricky, a much harder problem is that of obtaining the ground truth of the data, or getting data that genuinely corresponds to a particular emotional state. In vision or speech research the subject matter is often objective: scene depth, words spoken, etc. In those cases, the ground-truth labels for the data are easily obtained. Easy-to-label data is sometimes obtained in emotion research when a singular strong emotion is captured, such as an episode of rage. More often, however, the ground truth (which emotion was present) is difficult to establish.

Consider a task where a person uses the computer to retrieve images. Suppose our job is to analyze physiological signals of the user as he or she encounters pleasing or irritating features or content within the system. We wish to label the data according to the emotional state of the user (ground truth). Here, the problem is complicated because there is little way of knowing whether the person was truly pleased or irritated when encountering the stimuli intended to induce these states. We may have tried to please, but the user was irritated because of something she remembered unrelated to the task. We may have tried to irritate, and not succeeded. If we ask the person how she felt, her answer can vary according to her awareness of her feelings, her comfort in talking about feelings, her rapport with the administrator(s) of the experiment, and more. When you ask is also important: asking soon and often is likely to be more accurate, but also more irritating, thereby changing the emotional state. Thus, measurement of ground truth disturbs the state of that truth. When it comes to faces or voices, we can see if the person was smiling or hear if her voice sounded cheerful, but that still does not mean that she was happy. With physiology, little is known about how emotions make their impact, but the signals are also potentially more sincere expressions of the user's state, since they tend to be less mediated by cognitive and social influences.


2.1 Five factors in eliciting emotion

In beginning this research, we thought it would be simple to ask somebody to feel and express an emotion, and to record data during such episodes. Indeed, this is the approach taken in most facial and vocal expression studies to date: turn the camera and microphone on, ask the subject to express joy, anger, etc., record it, and label it by what was requested. However, obtaining high-quality physiological data for affect analysis requires attention to experimental design issues not traditionally required in pattern recognition research. Here we outline five (not necessarily independent) factors that influence data collection, to serve as a useful guide for researchers trying to obtain affect data. We summarize the factors by listing their extreme conditions, but there are also in-between conditions:

1. subject-elicited vs. event-elicited: Does the subject purposefully elicit the emotion, or is it elicited by a stimulus or situation outside the subject's efforts?

2. lab setting vs. real-world: Is the subject in a lab or in a special room that is not their usual environment?

3. expression vs. feeling: Is the emphasis on external expression or on internal feeling?

4. open-recording vs. hidden-recording: Does the subject know that anything is being recorded? (Footnote: When the presence of the camera or other recording device is hidden, it is ethically necessary, via the guidelines of the MIT Committee on the Use of Humans as Experimental Subjects, to debrief the subject, let them know why the secrecy was necessary, tell them what signals have been recorded, and obtain their permission for data analysis, destroying the data if the subject withholds permission. Unlike with video, it is not yet possible to record a collection of physiological signals without the subject being aware of the sensors in some form, although this is changing with new technology. However, subjects can be led to believe that their physiology is being sensed for some reason other than emotions, which is often an important deception, since a subject who knows you are trying to make them frustrated may therefore not get frustrated. We have found that such deception is acceptable to almost all subjects when properly conducted. Nonetheless, we believe deception should not be used unless it is necessary, and then only in accord with ethical guidelines.)

5. emotion-purpose vs. other-purpose: Does the subject know that the experiment is about emotion?

The most natural set-up for gathering genuine emotions is opportunistic: the subject's emotion occurs as a consequence of personally significant circumstances (event-elicited); it occurs while they are in some location natural for them (real-world); the subject feels the emotion internally (feeling); and subject behavior, including expression, is not influenced by knowledge of being in an experiment or being recorded (hidden-recording, other-purpose). Such data sets are usually impossible to get because of privacy and ethics concerns, but as recording devices become increasingly prevalent, people may cease to be aware of them, and the data captured by these devices can be as if the devices were hidden. Researchers may try to create such opportunistic situations; for example, showing an emotion-eliciting film in a theater without telling subjects the true purpose of the showing, and without telling them that they have been videotaped until afterward. Even so, an emotion-inducing movie scene will affect some viewers more than others, and some maybe not at all.

The opportunistic situation contrasts with the experiments typically used to gather data for facial or vocal expression recognition systems. The common set-up is one in which data are gathered of a subject-elicited external expression in a lab setting, in front of a visible, open-recording camera or microphone, with knowledge that the data will be used for analysis of emotion (emotion-purpose). Such expressions, made with or without corresponding internal feelings, are "real" and important to recognize, even if not accompanied by true feelings. Social communication involves both unfelt emotions (emphasizing expression) and more genuine ones (emphasizing feeling). A protocol for gathering data may emphasize external expression or internal feeling; both are important. Moreover, there is considerable overlap, since physical expression can help induce the internal feeling, and vice versa.

In this paper, we gathered real data following a subject-elicited, close to real-world (the subject's comfortable usual workplace), feeling, open-recording, emotion-purpose methodology.


The key factor among these that makes our data unique is that the subject tried to elicit an internal feeling of each emotion.

2.2 Single-subject multiple-day data collection

The data we gather is from a single subject over many weeks of time, standing in contrast to efforts that examine many subjects over a short recording interval (usually a single session on only one day). Although the scope is limited to one subject, the amount of data for this subject encompasses a larger set than has traditionally been used in affect recognition studies involving multiple subjects. The data are potentially useful for many kinds of analysis, and will be made available for research purposes.

There are many reasons to focus on person-dependent recognition in the early stages of affect recognition, even though some forms of emotion communication are not only person-independent, but have been argued, namely by Paul Ekman, to be basic across different cultures. Ekman and colleagues acknowledge that even simply labeled emotions like "joy" and "anger" can have different interpretations across individuals within the same culture; this complicates the quest to see if subjects elicit similar physiological patterns for the same emotion. When many subjects have been examined over a short amount of time, researchers have had difficulty finding significant physiological patterns, which may be in part because physiology can vary subtly with how each individual interprets each emotion. By using one subject, who tried to focus on the same personal interpretation of the emotion in each session, we hoped to maximize the chance of getting consistent interpretations for each emotion. This also means that the expressive data can be expected to differ for another subject, since the way another subject interprets and reacts to the emotions may differ. Hence, a weakness of this approach is that the precise features and recognition results we obtain with this data may not be the same for other subjects. However, the methodology for gathering and analyzing the data in this paper is not dependent on the subject; the approach described in this paper is general.

Emotion recognition is in an early stage of research, similar to early research on speech recognition, where it is valuable to develop person-dependent methods. For personal computing applications, we desire the machine to learn an individual's patterns, and not just some average response formed across a group, which may not apply to the individual.

The subject in our experiments was a healthy graduate student with two years of acting experience plus training in visualization, who was willing to devote over six weeks to data collection. The subject sat in her quiet workspace early each day, at roughly the same time of day, and tried to experience eight affective states with the aid of a computer-controlled prompting system, the "Sentograph," developed by Manfred Clynes [33], and a set of personally significant imagery she developed to help elicit each emotional state.

The Clynes protocol for eliciting emotion has three features that contribute to helping the subject feel the emotions and that make it appropriate for physiological data collection: (1) it sequences eight emotions in a way that supposedly makes it easier for many people to transition from emotion to emotion; (2) it engages physical expression, asking the subject to push a finger against a button with a dual-axis pressure sensor in an expressive way, but in a way that limits motion artifacts being introduced into the physiological signals (physical expression gives somatosensory feedback to the subject, a process that can help focus and strengthen the feeling of the emotion [34]); and (3) it prompts the subject to repeatedly express the same emotion during an approximately 3-minute interval, at a rate dependent on the emotion, in order to try to intensify the emotional experience [33].

The order of expression of the eight states (no emotion, anger, hate, grief, platonic love, romantic love, joy, and reverence) was found by Clynes to help subjects reliably feel each emotion; for example, it would be harder on most subjects to have to transition back and forth between positive and negative states, e.g., joy, hate, platonic love, grief, etc. However, because interpretations of each of the emotions may vary with each individual, we do not expect this order to be optimal for everyone.


Descriptive guidelines on the meaning of each emotion were developed by the subject before the experiment. The subject reported the images she used to induce each state, the degree to which she found each experience arousing (exciting, distressing, disturbing, tranquil), and the degree to which she felt the emotion was positive or negative (valence); see Table 1.

Emotion            Imagery                          Description           Arousal     Valence
(N)o emotion       blank paper, typewriter          boredom, vacancy      low         neutral
(A)nger            people who arouse rage           desire to fight       very high   very negative
(H)ate             injustice, cruelty               passive anger         low         negative
(G)rief            deformed child, loss of mother   loss, sadness         high        negative
(P)latonic love    family, summer                   happiness, peace      low         positive
Romantic (L)ove    romantic encounters              excitement, lust      very high   positive
(J)oy              the music "Ode to Joy"           uplifting happiness   med. high   positive
(R)everence        church, prayer                   calm, peace           very low    neutral

Table 1: The subject's descriptions of imagery and emotional character used for each of the eight emotions.

Daily ratings varied in intensity, both up and down, with no particular trend, but the overall character of each state was consistent over the weeks of data collection.

The eight emotions in this study differ from those typically explored. Although there is no widespread agreement on the definition and existence of "basic" emotions and on which names would comprise such a list, researchers on facial expressions tend to focus on anger, fear, joy, and sadness, with disgust and surprise often examined, as well as contempt, acceptance, and anticipation (e.g., [32]). Theorists still do not agree on what an emotion is, and many of them do not consider love and surprise to be emotions. Our work is less concerned with finding a set of "basic" emotions, and more concerned with giving computers the ability to recognize whatever affective states might be relevant in a personalized human-computer interaction. The ideal states for a computer to recognize will depend on the application. For example, in a learning-tutor application, detecting expressions of curiosity, boredom, and frustration may be more relevant than detecting emotions on the theorists' "basic" lists.
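For readers who want to work with these categories programmatically, the labels in Table 1 map naturally onto a small lookup structure. The Python sketch below simply restates the table; the dictionary layout and names are ours, not part of the original study.

```python
# The eight target states of Table 1, keyed by the single-letter codes used in
# the paper, with the subject's reported arousal and valence for each state.
EMOTIONS = {
    "N": ("no emotion",    "low",       "neutral"),
    "A": ("anger",         "very high", "very negative"),
    "H": ("hate",          "low",       "negative"),
    "G": ("grief",         "high",      "negative"),
    "P": ("platonic love", "low",       "positive"),
    "L": ("romantic love", "very high", "positive"),
    "J": ("joy",           "med. high", "positive"),
    "R": ("reverence",     "very low",  "neutral"),
}
```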

Clynes' set of eight was motivated by considering emotions that have been communicated through centuries of musical performance on several continents. We started with his set not because we think it is the best for computer-human interaction (such a set is likely to vary with the computer application: entertainment, business, socializing, etc.), but rather because this set, together with its method of elicitation, had shown an ability to help subjects reliably feel the emotions, and had shown repeatable signs of physical differentiation in how subjects' finger pressure, applied to a finger rest, differs with each emotion [33], [35], a measurable outcome suggesting that different states were being achieved. It was important to our investment in long-term data collection that we have a reliable method of helping the user repeatedly generate distinct emotional states.

For the purposes of this research, the specific emotions and their definitions are not as important as the facts that (1) the subject could relate to the named emotion in a consistent, specific, and personal way, and (2) the emotion categories span a range of high and low arousal and positive and negative valence. These two dimensions are believed to be the most important dimensions for categorizing emotions [36], and continue to be used for describing emotions that arise in many contexts, including recent efforts to categorize emotions arising when people look at imagery [37]. The arousal axis ranges from calm and peaceful to active and excited, while the valence axis ranges from negative (displeasing) to positive (pleasing).

2.3 Experimental method and construction of data sets

Data were gathered from four sensors: a triode electromyogram (E) measuring facial muscle tension along the masseter (with Ag-AgCl electrodes of size 11 mm each, and 10-20 high-conductivity gel), a photoplethysmograph measuring blood volume pressure (B) placed on the tip of the ring finger of the left hand, a skin conductance (S) sensor measuring electrodermal activity from the middle of the three segments of the index and middle fingers on the palm side of the left hand (with 11 mm Ag-AgCl electrodes and K-Y Jelly used as a low-conductivity gel), and a Hall effect respiration sensor (R) placed around the diaphragm.



Figure 1: Examples of physiological signals measured from a user while she intentionally expressed anger (left) and grief (right). From top to bottom: electromyogram (microvolts), blood volume pressure (percent reflectance), skin conductivity (microSiemens), and respiration (percent maximum expansion). Each box shows 100 seconds of response. The segments shown here are visibly different for the two emotions, which was not true in general.

The left hand was held still throughout data collection, and the subject was seated and relatively motionless except for small pressure changes she applied with her right hand to the finger rest. Sensors and sampling were provided by the Thought Technologies ProComp unit, chosen because the unit is small enough to attach to a wearable computer and offers eight optically isolated channels for recording. Signals were sampled at 20 Hz. (The electromyogram is the only signal for which this sampling rate should have caused aliasing; however, our investigation of the signal showed that it registered a clear response when the jaw was clenched vs. relaxed, so it was satisfactory for gathering coarse muscle tension information.) The ProComp automatically computed the heart rate (H) as a function of the inter-beat intervals of the blood volume pressure, B. More details on this system and on our methodology are available [38].

Each day's session lasted around 25 minutes, resulting in around 28 to 33 thousand samples per physiological signal, with each emotion segment being around 2 to 5 thousand samples long, due to the variation built into the Clynes method of eliciting the emotional states [33]. Eight signal segments of the raw data (2000 samples each) from Data Set I are shown in Figure 1. On roughly a third of the thirty days for which we collected data, either one or more sensors failed during some portion of the 25-minute experiment because an electrode came loose, or one or more channels failed to sample and save some of the data properly. From the complete or nearly complete sessions, we constructed two overlapping data sets.

Data Set I was assembled before the thirty days were over, and was formed as follows: segments of length 2000 samples (100 seconds) were taken from each of the signals E, B, G, and R for each of the eight emotions, on each of nineteen days where there were no failures in these segments of data collection. The 2000 samples were taken from the end of each emotion segment to avoid the transitional onset where the subject was prompted to move to the next emotion. A twentieth day's data set was created out of a combination of partial records in which some of the sensors had failed.

Data Set II is a larger data set comprised of 20 days in which the sensors did not fail during any part of the experiment for the five signals E, B, G, R, and H. These 20 days included 16 of the original days from Data Set I, but for all 20 days we used all of the samples available for each emotion, thereby including the transitional regions. Because different emotions lasted for different lengths of time, the 2000 samples in Data Set I were at times closer to or farther away from the beginning of an emotion segment. To avoid this bias, and to maximize the data available for training and testing, Data Set II includes all the samples for all the emotions and signals over twenty days, roughly 2000 to 5000 samples per emotion per signal per day.
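To make the Data Set I construction concrete, the sketch below pulls the final 2000 samples (100 s) of each emotion segment and skips days with sensor failures. The data layout, variable names, and helper function are assumptions for illustration; they are not the authors' code.

```python
import numpy as np

SEGMENT_LEN = 2000   # 100 seconds at the 20 Hz sampling rate

def dataset_I_segments(day_sessions):
    """Last 2000 samples of each emotion segment, for days without sensor failures.

    `day_sessions` is assumed to be a list (one entry per day) of dicts mapping an
    emotion label to a dict of signal arrays, e.g. {"anger": {"E": ..., "B": ...}}.
    """
    segments = []
    for day, session in enumerate(day_sessions):
        usable = all(sig is not None and len(sig) >= SEGMENT_LEN
                     for signals in session.values() for sig in signals.values())
        if not usable:
            continue  # a sensor or channel failed at some point that day
        for emotion, signals in session.items():
            # Keep only the final 100 s to avoid the transitional onset of the emotion.
            seg = {name: np.asarray(sig)[-SEGMENT_LEN:] for name, sig in signals.items()}
            segments.append((day, emotion, seg))
    return segments
```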


With an average of twice the number of samples (or minutes of data) as Data Set I, Data Set II resulted in an average 10% gain in performance when we compared it to Data Set I across all the methods.

3 Feature Extraction, Selection, and Transformation

Because the signals involved have different and complex sources, because there are not yet good models to describe them, and because of an interest in seeing how some classical methods perform before an extensive modeling effort is launched, we choose in this paper to explore a feature-based approach to classification.

3.1 Proposed feature sets

The psychophysiology and emotion literature contains several efforts to identify features of bodily changes (facial muscle movements, heart-rate variations, etc.) that might correlate with having an emotion (e.g., [31]). We gather a variety of features, some from the literature and some that we propose. Several that we propose are physically motivated, intended to capture the underlying nature of specific signals, such as the way respiration is quasi-periodic, while others are simple statistics and nonlinear combinations thereof. We do not expect the best classifier to require all the features proposed below, or even to require such a large number of features. Our effort is to advance the state of the art in pattern recognition of affect from physiology by proposing a large space of reasonable features and systematically evaluating subsets of it and transformations thereof. Below we present six statistical features, followed by ten more physically motivated features aimed at compensating for day-to-day variations. Which features were found to be most useful (as a function of each of the classifiers) will be summarized later in Table 9.

The six statistical features can be computed for each of the signals as follows. Let X designate any one of the signals {E, B, G, R, H} from any one of the eight emotion segments. The signal is gathered for 8 different emotions each day, for 20 days. Let X_n represent the value of the nth sample of the raw signal, where n = 1, ..., N, with N = 2000 for Data Set I, and with N in the range of 2000 to 5000 for Data Set II.

Let $\tilde{X}_n$ refer to the normalized signal (zero mean, unit variance):

$$\tilde{X}_n = \frac{X_n - \mu_X}{\sigma_X}$$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X$ as defined below. Following are the six statistical features we investigated:

1. the means of the raw signals
$$\mu_X = \frac{1}{N}\sum_{n=1}^{N} X_n \qquad (1)$$

2. the standard deviations of the raw signals
$$\sigma_X = \left(\frac{1}{N-1}\sum_{n=1}^{N} (X_n - \mu_X)^2\right)^{1/2} \qquad (2)$$

3. the means of the absolute values of the first differences of the raw signals
$$\delta_X = \frac{1}{N-1}\sum_{n=1}^{N-1} |X_{n+1} - X_n| \qquad (3)$$

4. the means of the absolute values of the first differences of the normalized signals
$$\tilde{\delta}_X = \frac{1}{N-1}\sum_{n=1}^{N-1} \left|\tilde{X}_{n+1} - \tilde{X}_n\right| = \frac{\delta_X}{\sigma_X} \qquad (4)$$

5. the means of the absolute values of the second differences of the raw signals
$$\gamma_X = \frac{1}{N-2}\sum_{n=1}^{N-2} |X_{n+2} - X_n| \qquad (5)$$

6. the means of the absolute values of the second differences of the normalized signals
$$\tilde{\gamma}_X = \frac{1}{N-2}\sum_{n=1}^{N-2} \left|\tilde{X}_{n+2} - \tilde{X}_n\right| = \frac{\gamma_X}{\sigma_X} \qquad (6)$$
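Equations (1)-(6) translate directly into code. The sketch below is a plain NumPy rendering for a single signal segment; the function and variable names are ours.

```python
import numpy as np

def statistical_features(x):
    """Features (1)-(6) for one raw signal segment x (1-D array of N samples)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = x.mean()                                      # (1) mean
    sigma = np.sqrt(((x - mu) ** 2).sum() / (n - 1))   # (2) standard deviation
    delta = np.abs(np.diff(x)).sum() / (n - 1)         # (3) mean |first difference|
    delta_norm = delta / sigma                         # (4) same, for the normalized signal
    gamma = np.abs(x[2:] - x[:-2]).sum() / (n - 2)     # (5) mean |second difference|
    gamma_norm = gamma / sigma                         # (6) same, for the normalized signal
    return np.array([mu, sigma, delta, delta_norm, gamma, gamma_norm])
```

Applying this to each of the four raw signals E, B, G, and R gives the 24-dimensional feature vectors used in Section 4.1.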


The features (1)-(6) were chosen to cover and extend a range of statistics typically measured in the emotion physiology literature [39]. (Means and variances are already commonly computed; the first difference approximates a gradient.) Note that not all the features are independent; in particular, $\tilde{\delta}_X$ and $\tilde{\gamma}_X$ are nonlinear combinations of other features. Also, the heart rate signal, H, is derived from the blood volume pressure signal, B, by a nonlinear transformation performed automatically by the ProComp sensing system. It did not require an additional sensor, and so was dependent upon B. The dependencies are not linear; consequently, they are not obtainable by the linear combination methods used later to reduce the dimensionality of the feature space, and they can potentially be of use in finding a good transformation for separating the classes. The comparisons below will verify this.

One advantage of the features in (1)-(6) is that they can easily be computed in an on-line way [40], which makes them advantageous for real-time recognition systems. However, the statistical features do not exploit knowledge we may have about the physical sources of the signals, and provide no special normalization for day-to-day variations in the signals. Factors such as hand washing, gel application, and sensor placement can easily affect the statistics. These influences combine with the subject's daily mood and with other cognitive and bodily influences in presently unknown ways, making them hard to model.

In an effort to compensate for some of the non-emotion-related variations of the signals, and to include more physically motivated knowledge about the signals (such as underlying periodic excitations), we also compute and evaluate another set of ten physiology-dependent features, $f_1$-$f_{10}$, described below.

From the inter-beat intervals of the blood volume pressure waveform, the ProComp computes the heart rate, H, which is approximately the reciprocal of the inter-beat interval. We applied a 500-point (25 sec) Hanning window, $h$, [41] to form a smoothed heart-beat rate, $b = H * h$, then took the mean:
$$f_1 = \frac{1}{N}\sum_{n=1}^{N} b_n. \qquad (7)$$

The average acceleration or deceleration of the heart-beat rate was calculated by taking the mean of the first difference:
$$f_2 = \frac{1}{N-1}\sum_{n=1}^{N-1} (b_{n+1} - b_n) = \frac{1}{N-1}(b_N - b_1). \qquad (8)$$

The skin conductivity signal, S, contains high-frequency fluctuations that may be noise; these fluctuations are reduced by convolving with $h$, a 25 sec Hanning window, to form $s = S * h$. We also use a form of contrast normalization to account for baseline fluctuations; this measure was proposed by Rose in 1966 [42] and found to be valuable over years of psychophysiology [43], [44]. Here $\max(s)$ and $\min(s)$ are taken with respect to the whole day's data (for all emotions that day):
$$f_3 = \frac{s - \min(s)}{\max(s) - \min(s)}. \qquad (9)$$

The mean of the first difference of the smoothed skin conductivity is also proposed:
$$f_4 = \mu_{(s_{n+1} - s_n)} = \frac{1}{N-1}(s_N - s_1). \qquad (10)$$

The respiration sensor measured expansion and contraction of the chest cavity using a Hall effect sensor attached around the chest with a velcro band. Let $N_d$ be the number of samples collected that day. To account for variations in the initial tightness of the sensor placement from day to day, we formed the mean of the whole day's respiration data,
$$\mu_{R,\mathrm{day}} = \frac{1}{N_d}\sum_{n=1}^{N_d} R_n, \qquad (11)$$
and then subtracted this to get $r = R - \mu_{R,\mathrm{day}}$. Two respiration features were then formed as
$$f_5 = \frac{1}{N}\sum_{n=1}^{N} r_n \qquad (12)$$
and
$$f_6 = \frac{1}{N-1}\sum_{n=1}^{N} \left(r_n - \mu_{R,\mathrm{day}}\right)^2. \qquad (13)$$

The four features $f_7$-$f_{10}$ represent frequency information in the respiration signal. Each was computed using the power spectral density function (PSD command) of Matlab, which uses Welch's averaged periodogram. The features, illustrated in Figure 2, represent the average energy in each of the first four 0.1 Hz bands of the power spectral density range 0.0-0.4 Hz.
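The time-domain features $f_1$-$f_6$ can be sketched in the same style ($f_7$-$f_{10}$ are sketched after Figure 2). The Hanning smoothing and day-baseline handling follow the descriptions above, but the convolution edge handling, the reduction of Eq. (9) to a scalar by taking its mean, and the argument layout are our assumptions.

```python
import numpy as np

def smooth(x, width=500):
    """Convolve with a width-point Hanning window (500 points = 25 s at 20 Hz)."""
    h = np.hanning(width)
    return np.convolve(x, h / h.sum(), mode="same")

def physio_features_f1_f6(H, S, R, S_day, R_day):
    """f1-f6 for one emotion segment.

    H, S, R : heart rate, skin conductance, and respiration samples of the segment.
    S_day, R_day : the whole day's skin conductance and respiration records,
                   used for the day-level normalizations in Eqs. (9) and (11).
    """
    N = len(H)
    b = smooth(H)                               # smoothed heart-beat rate
    f1 = b.mean()                               # (7) mean heart rate
    f2 = (b[-1] - b[0]) / (N - 1)               # (8) mean first difference (telescoping sum)

    s, s_day = smooth(S), smooth(S_day)
    lo, hi = s_day.min(), s_day.max()           # min/max over the whole day's data
    f3 = ((s - lo) / (hi - lo)).mean()          # (9) contrast-normalized conductivity (mean taken)
    f4 = (s[-1] - s[0]) / (N - 1)               # (10) mean first difference of s

    mu_R_day = np.mean(R_day)                   # (11) day baseline for respiration
    r = np.asarray(R) - mu_R_day
    f5 = r.mean()                               # (12)
    f6 = np.sum((r - mu_R_day) ** 2) / (N - 1)  # (13), following the equation as printed
    return np.array([f1, f2, f3, f4, f5, f6])
```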



Figure 2: Power spectral density from 0.0-0.4 Hz in the respiration signal for eight emotions. The heights of the four bins shown here were used as features $f_7$-$f_{10}$.
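The paper computes $f_7$-$f_{10}$ with Matlab's PSD command (Welch's averaged periodogram). A rough SciPy equivalent is sketched below; the Welch segment length is not specified in the paper and is an assumption here.

```python
import numpy as np
from scipy.signal import welch

def respiration_psd_features(r, fs=20.0):
    """f7-f10: average power in the four 0.1 Hz bands covering 0.0-0.4 Hz.

    r is the (day-baseline-subtracted) respiration segment; fs is the 20 Hz
    sampling rate.
    """
    freqs, psd = welch(r, fs=fs, nperseg=min(len(r), 512))  # segment length is an assumption
    feats = []
    for lo in (0.0, 0.1, 0.2, 0.3):
        band = (freqs >= lo) & (freqs < lo + 0.1)
        feats.append(psd[band].mean())        # average energy in this 0.1 Hz band
    return np.array(feats)
```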

3.2 Selecting and transforming features

We first compare two techniques that have appeared in the literature, to establish a benchmark: Sequential Floating Forward Search (SFFS) and Fisher Projection (FP). Next, we propose and compare a new combination of these, which we label SFFS-FP. This combination is motivated by the dual roles of the two methods: SFFS selects from a set of features, while Fisher Projection linearly transforms a set of features. Since feature selection is nonlinear, the cascade of these two methods should provide a more powerful combination than either alone. This was confirmed in our experiments (below).

The Sequential Floating Forward Search (SFFS) method [45] is chosen because of its consistent success in previous evaluations of feature selection algorithms, where it has been shown to outperform methods such as Sequential Forward Search (SFS), Sequential Backward Search (SBS), Generalized SFS and SBS, and Max-Min in comparison studies [46]. Of course, the performance of SFFS is data dependent, and the data set here is new and different; SFFS may not be the best method to use. Nonetheless, because of its well-documented success in other pattern recognition problems, it helps establish a benchmark for this new application area. The SFFS method takes as input the values of n features. It then does a non-exhaustive search of the feature space by iteratively including and omitting features. It outputs one subset of m features for each m, 2 ≤ m ≤ n, together with its classification rate. The algorithm is described in detail in [47].

Fisher projection (FP) [48] is a well-known method of reducing dimensionality by finding a linear projection of the data to a space of fewer dimensions where the classes are well separated. Due to the nature of the Fisher projection method, the data can only be projected down to c − 1 (or fewer) dimensions, assuming that originally there are more than c − 1 dimensions, where c is the number of classes. If the amount of training data is inadequate, or if the quality of some of the features is questionable, then some of the dimensions of the Fisher projection may be a result of noise rather than of differences among the classes. In this case, Fisher might find a meaningless projection which reduces the error on the training data but performs poorly on the testing data. For this reason, we not only separate training and testing data, but we also evaluate projections down to fewer than c − 1 dimensions. Note that if the number of features n is smaller than the number of classes c, the Fisher projection is meaningful only up to at most n − 1 dimensions. Therefore the number of Fisher projection dimensions d satisfies 1 ≤ d ≤ min(n, c) − 1; e.g., when 24 features are used on all 8 classes, all d = [1, 7] are tried, and when 4 features are used on 8 classes, all d = [1, 3] are tried.
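For concreteness, the standard multi-class Fisher construction referred to here can be written as follows: form within- and between-class scatter matrices and keep the top d generalized eigenvectors. This is a textbook sketch, not the authors' implementation; the small ridge term added for numerical stability is our assumption.

```python
import numpy as np

def fisher_projection(X, y, d):
    """Return a (features x d) projection matrix, d <= min(n, c) - 1."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    n_feat = X.shape[1]
    Sw = np.zeros((n_feat, n_feat))          # within-class scatter
    Sb = np.zeros((n_feat, n_feat))          # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(n_feat)              # ridge term for a near-singular Sw (assumption)
    # Solve the generalized eigenproblem Sb w = lambda Sw w and keep the top d directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:d]].real

# Usage sketch: W = fisher_projection(X_train, y_train, d); Z_train = X_train @ W
```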


A hybrid SFFS with Fisher Projection (SFFS-FP) method is proposed, implemented, and evaluated here for comparison. As mentioned above, the SFFS algorithm proposes one subset of m features for each m, 2 ≤ m ≤ n. It selects, but does not transform, the features. Instead of feeding the Fisher algorithm all possible features, we use the subsets that the SFFS algorithm proposes as input to the Fisher algorithm. Note that the SFFS method is used here as a preprocessor for reducing the number of features fed into the Fisher algorithm, and not as a classification method.

4 Classification

This section describes and compares the results of a set of classification experiments leading up to the best results (81%) shown in Tables 6 and 7.

The SFFS software employs a k-nearest-neighbor (k-NN) classifier [49], so it not only outputs the best set of features according to this classifier, but their classification accuracy as well. We used the k-NN classifier for benchmarking the SFFS method, following the methodology of Jain and Zongker [46]. For FP and SFFS-FP we used a MAP classifier, with details below.

In all three comparisons (SFFS, FP, and SFFS-FP) we used the leave-one-out method for cross validation, because of the relatively small amount of data available and the high-dimensional feature spaces. In each case, the data point (the vector of features for one day's data for one emotion) to be classified was excluded from the data set before the SFFS, FP, or SFFS-FP was run. The best set of features or best transform (determined from the training set) was then applied to the test data point to determine classification accuracy.

For each k, where we varied 1 ≤ k ≤ 20, the SFFS algorithm output one set of m features for each 2 ≤ m ≤ n. For SFFS-FP, we computed all possible Fisher projections for each of these feature sets. For both the FP and SFFS-FP methods we then fit Gaussians to the data in the reduced d-dimensional feature space and applied maximum a posteriori (MAP) classification to the features in the Fisher-transformed space. The specific algorithm was:

1. The data point to be classified (the testing set only includes one point) is excluded from the data set. The remaining data set is used as the training set.

2. The Fisher projection matrix (used in either FP or SFFS-FP) is calculated from only the training set. Then both the training and testing sets are projected down to the d dimensions found by Fisher.

3. The data in the d-dimensional space are assumed to be Gaussian. The respective means and covariance matrices of the classes are estimated from the training data.

4. The posterior probability of the data point is calculated for each class.

5. The data point is then classified as coming from the class with the highest posterior probability.
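Steps 1-5 can be condensed into the following sketch: leave-one-out cross validation around a Fisher projection and a Gaussian MAP classifier in the projected space. It reuses the `fisher_projection` sketch above; class priors are taken as training-set frequencies, which the paper does not state and is therefore an assumption.

```python
import numpy as np

def map_classify(z, models, priors):
    """Pick the class with the highest Gaussian log-posterior (shared constants dropped)."""
    best, best_score = None, -np.inf
    for c, (mean, cov) in models.items():
        diff = z - mean
        score = (-0.5 * diff @ np.linalg.solve(cov, diff)
                 - 0.5 * np.linalg.slogdet(cov)[1]
                 + np.log(priors[c]))
        if score > best_score:
            best, best_score = c, score
    return best

def leave_one_out_accuracy(X, y, d):
    correct = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i                    # step 1: hold out one point
        W = fisher_projection(X[train], y[train], d)      # step 2: Fisher from training data only
        Z, z_test = X[train] @ W, X[i] @ W
        models, priors = {}, {}
        for c in np.unique(y[train]):                     # step 3: Gaussian per class
            Zc = Z[y[train] == c]
            models[c] = (Zc.mean(axis=0), np.atleast_2d(np.cov(Zc, rowvar=False)))
            priors[c] = len(Zc) / len(Z)
        if map_classify(z_test, models, priors) == y[i]:  # steps 4-5: MAP decision
            correct += 1
    return correct / len(X)
```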

4.1 Initial results - Data Set I

Our first comparison was made on Data Set I using (1)-(6) computed on the four raw physiological signals, yielding 24 features: $\mu_X$, $\sigma_X$, $\delta_X$, $\tilde{\delta}_X$, $\gamma_X$, $\tilde{\gamma}_X$, for $X \in \{E, B, G, R\}$. The results of SFFS, Fisher, and SFFS-FP are shown in Table 2, which gives the results for all eight emotions as well as for sets of C emotions, where C = 3, 4, 5. The best-recognized emotions were (N)eutral, (A)nger, (G)rief, (J)oy, and (R)everence. The states of (H)ate and the states of love, (P)latonic Love and Romantic (L)ove, were not well discriminated by any of the classifiers.

How good are these results, and are the differences between the classifiers significant? For each pair of results here and throughout the rest of the paper, we computed the probability that the lower error rate really is lower, treating the error rate over the 160 (or fewer) trials as a Bernoulli random variable. For example, in the last row of Table 2 we compared the 20 errors made by Fisher to the 10 errors made by SFFS-FP over the 60 trials for the three emotions (AJR); the confidence that the performance was improved was 98%. Although the performance improvements of SFFS-FP over Fisher in the other rows range from 5 to over 8 percentage points, the confidences range from 77% to 87%; thus the performance improvement of SFFS-FP cannot be said to be statistically significant in these cases, given that 90-95% confidence is usually required for that statement. Nonetheless, all the classifiers in this table provide a statistically significant improvement over a random classifier (confidence > 99.99%).


Number of     Random        SFFS   Fisher  SFFS-FP
Emotions      Guessing (%)  (%)    (%)     (%)
8             12.5          40.6   40.0    46.3
5 (NAGJR)     20.0          64.0   60.0    65.0
4 (NAGR)      25.0          70.0   61.3    68.7
4 (AGJR)      25.0          72.5   60.0    67.5
3 (AGR)       33.3          83.3   71.7    80.0
3 (AJR)       33.3          88.3   66.7    83.3

Table 2: Classification accuracy for three methods applied to Data Set I, starting with the 24 features $\mu_X$, $\sigma_X$, $\delta_X$, $\tilde{\delta}_X$, $\gamma_X$, $\tilde{\gamma}_X$, $X \in \{E, B, G, R\}$. The best-recognized emotions are denoted by their first initial: (N)eutral, (A)nger, (G)rief, (J)oy, (R)everence.

Table 3 shows the number of dimensions and the number of features, respectively, that gave the best results. From these numbers we can conclude that performance was sometimes improved by projecting down to fewer than c − 1 dimensions for a c-class problem. We can also see that a broad range of numbers of features led to the best results, but in no case were 20 or more features useful for constructing the best classifier.

4.2 The problem of day-dependence

In visually examining several projections into 2-D, we noticed that the features of different emotions from the same day often clustered more closely than did features of the same emotions on different days. To try to quantify this "day dependence" we ran a side experiment: how hard would it be to classify what day each emotion came from, as opposed to which emotion was being expressed? Because classifying days is not the main interest, we only ran one comparison: Fisher Projection applied to the 24 features $\mu_X$, $\sigma_X$, $\delta_X$, $\tilde{\delta}_X$, $\gamma_X$, $\tilde{\gamma}_X$, $X \in \{E, B, G, R\}$, computed on Data Set I. Leave-one-out cross validation was applied to every point, and MAP classification was used on the classes (days or emotions), as described above. When the classes were the c = 20 days (vs. the c = 8 emotions), the recognition accuracy jumped to 83%, significantly better than random guessing (5%) and significantly better than the 40% found for emotion classification using these features.

The day dependence is likely due to three factors: (1) skin-sensor interface influences, including hand washing, application of slightly different amounts of gel, and slight changes in positioning of the sensors; (2) variations in physiology that may be caused by caffeine, sugar, sleep, hormones, and other non-emotional factors; (3) variations in physiology that are mood and emotion dependent, such as an inability to build up an intense experience of joy if the subject felt a strong baseline mood of sadness that day. The first factor is slightly more controllable by using disposable electrodes that come from the manufacturer containing a pre-measured gel, and by always having the subject wash her hands in the same way at the same time before each session, steps we did not impose. However, we made an effort to place the sensors as similarly from day to day as manually possible and to manually apply the same amount of gel each day. Nonetheless, many of these sources of variation are natural and cannot be controlled in realistic long-term measuring applications. Algorithms and features that can compensate for day-to-day variations are needed.

4.3 Day matrix for handling day-dependence

The 24 statistical features extracted from the signals are dependent on the day the experiment was held. We now augment the 20 x 24 matrix of the 20 days' 24 features with a 20 x 19 day matrix, which appends a vector of length 19 to each vector of length 24. The vector is the same for all emotions recorded the same day, and differs among days. (The vectors were constructed by generating 20 equidistant points in a 19-dimensional space.)
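The day codes just described are the vertices of a regular simplex: 20 mutually equidistant points in a 19-dimensional space. Below is a minimal sketch of one way to construct them, by centering the standard basis vectors and dropping the redundant dimension; the construction and the names are ours, not necessarily how the authors generated their vectors.

import numpy as np

def day_codes(n_days):
    # n_days mutually equidistant points in (n_days - 1) dimensions: center the
    # standard basis vectors (already pairwise equidistant) and use the SVD to
    # drop the single redundant dimension.
    centered = np.eye(n_days) - 1.0 / n_days
    u, s, _ = np.linalg.svd(centered)
    return u[:, :n_days - 1] * s[:n_days - 1]

codes = day_codes(20)                  # one 19-dimensional code per recording day
# Each 24-feature vector is then augmented with the code of its recording day:
# augmented = np.hstack([stat_features, codes[day_index]])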


Number of     No. of Dimensions        No. of Features
Emotions      Fisher-24   SFFS-FP      m (SFFS)    m (SFFS-FP)
8             6/7         4,5/7        13          17
5 (NAGJR)     3/4         3/4          12-17       15
4 (NAGR)      3/3         3/3          9-15,18     19
4 (AGJR)      3/3         2/3          7-8         12
3 (AGR)       2/2         2/2          2-16        12
3 (AJR)       1/2         2/2          6-14        7

Table 3: Number of dimensions and number of features m used in the Fisher projections and feature selection algorithms that gave the results of Table 2. The denominator in the dimensions columns is the maximum number of dimensions that could have been used; half of the time the results were better using a number less than this maximum (the numerator). When a range is shown for the number of features, m, the performance was the same for the whole range.

Figure 3: Illustration of a highly day-dependent feature for 2 emotions from 2 different days. (a) The feature values for (A)nger and (J)oy from 2 different days. (b) Addition of an extra dimension allows for a line b to separate Anger from Joy. The data can be projected down to line a, so the addition of the new dimension did not increase the final number of features. (c) In the case of data from 3 different days, addition of 2 extra dimensions allows for a plane p to separate Anger from Joy. The data can again be projected down to line a, not increasing the final number of features.


Let us briefly explain the principle with an illustration in Fig. 3. Consider when the data come from 2 different days and only 1 feature is extracted. (This is the simplest case to visualize, but it extends trivially to more features.) Although the feature values of one class are always related to the values of the other classes in the same way, e.g., the mean electromyogram signal for anger may always be higher than the mean electromyogram for joy, the actual values may be highly day-dependent (Fig. 3a). To alleviate this problem an extra dimension can be added before the features are input into the Fisher algorithm (Fig. 3b). If the data came from 3 different days, 2 extra dimensions are added rather than one (Fig. 3c), etc. In the general case, D - 1 extra dimensions are needed for data coming from D different days; hence, we use 19 extra dimensions. The above can also be seen as using the minimum number of dimensions so that each of D points can be at equal distance from all the others. Therefore the (D - 1)-dimensional vector contains the coordinates of one such point for each day.

The effect of the day matrix on classification can be seen in Table 4, where we run Fisher and SFFS-FP with and without the day matrix, and compare them to SFFS running with the same 24 features as above. Note that it is meaningless to apply SFFS to the day matrix, so that comparison is omitted. The use of the day matrix improves the classification by 3.1 to 9.4 percentage points, although only the highest two of these improvements are significant at > 90% and > 95% (the other confidences are 71%-78%).

Table 4 also reveals that all the methods perform better on DataSet II than on DataSet I. The improvements range from 5 to 13 percentage points, with confidences 81%, 94%, 97%, 98%, and 99%.

4.4 Baseline matrix for handling day-dependence

We propose and evaluate another approach: use of a baseline matrix, where the Neutral (no emotion) features of each day are used as a baseline for (subtracted from) the respective features of the remaining 7 emotions of the same day. This gives an additional 20 x 24 matrix for each of the 7 non-neutral emotions.
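In code, this baselining is simply a per-day subtraction of the Neutral feature vector from the other seven emotions' feature vectors. The sketch below assumes a (days x emotions x features) array layout with the Neutral session stored first; the layout and names are illustrative only.

import numpy as np

def baseline_features(features):
    # features: (n_days, n_emotions, n_feats) array with the Neutral session at
    # emotion index 0. Returns the non-neutral emotions with that day's
    # Neutral features subtracted.
    neutral = features[:, :1, :]             # (n_days, 1, n_feats)
    return features[:, 1:, :] - neutral      # broadcast over the 7 emotions

# e.g., features.shape == (20, 8, 24)  ->  baseline_features(features).shape == (20, 7, 24)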

The resulting classification results on 7 emotions, run on DataSet I, are in Table 5, together with an additional trial of the day matrix for this case. All the results are significantly higher than random guessing (14.3%) with confidence > 99.99%. Comparing among the various ways of handling day-dependence, we find the most significant improvement to be SFFS-FP using the baseline matrix, which at 54.3% is an improvement over 45% (confidence 94%). The performance improvements of SFFS-FP over those of Fisher in the last two rows of the table are also statistically significant (confidence 99%). Combining all the features (original 24 + baseline + day) appears to result in a decrease in performance, suggesting the limitations of Fisher with too many bad dimensions, where it cannot select out features, but only transform them. However, the decreases (from 54.3 to 49.3 and from 40.7 to 35.0) have 80% and 84% confidence, so they are not considered significant.

4.5 Better features for handling day-dependence

The comparisons above were all conducted with the original six statistical features (1)-(6); we would now like to see how the features f1-f10 influence classification. We compare four spaces of features from which the algorithms can select and transform subsets: the original six, μ_X, σ_X, δ_X, δ̃_X, γ_X, γ̃_X, for X ∈ {E, B, G, R}, for a total of 24 features; these same 24 features plus the same six statistics for X = H, for a total of 30 features; features f1-f10 plus σ_E, which were shown to be useful in our earlier investigation [50]; and the combination of all 40 of these.

The results are in Table 6. First, we consider the case where all 40 features are available (the last row). Comparing each entry in the last row of this table with the same column entry in the first row, we find that all the results are significantly improved (confidence > 99.7%).

Next, we look at the results with the day matrix. It appears to improve upon the performance of the statistical features in the first two rows; however, the improvements are only a few percentage points and are not significant (confidences 63%-89.5%).


Data Set with     Without Day Matrix                    With Day Matrix
24 Features       SFFS (%)   Fisher (%)   SFFS-FP (%)   Fisher (%)   SFFS-FP (%)
DataSet I         40.6       40.0         46.3          49.4         50.6
DataSet II        49.4       51.3         56.9          54.4         63.8

Table 4: Classification accuracy for all 8 emotions for DataSet I and DataSet II improves with the use of the day matrix. These results were obtained with the 24 statistical features μ_X, σ_X, δ_X, δ̃_X, γ_X, γ̃_X, X ∈ {E, B, G, R}, for both data sets. The day matrix adds 19 dimensions to each feature vector that is input into the Fisher algorithm.

Feature Space (Dimensions)     SFFS (%)   Fisher (%)   SFFS-FP (%)
Original (24)                  42.9       39.3         45.0
Orig. + Day (43)               N/A        39.3         45.7
Orig. + Base. (48)             49.3       40.7         54.3
Orig. + Base. + Day (67)       N/A        35.0         49.3

Table 5: Data Set I, 7-emotion (all but the Neutral state) classification results, comparing three methods for incorporating the day information.

The additional features of the day matrix can become a liability in the presence of the f1-f10 features, which compensate much better for the day-to-day variations (the decrease from 70% to 61.3% is significant at 95% confidence). Without the day matrix, the f1-f10 features alone perform significantly better than the statistical features (confidence > 93% in all six comparisons).

The best performance occurs when the methods have all forty features at their disposal, and the individual best of these occurs with SFFS-FP. The overall best rate of 81.25% is significantly higher (> 95% confidence) than sixteen of the cases shown in Table 6. The three exceptions are in the last row: compared to rates of 78.8% and 77.5%, the confidences are only 71% and 79% that 81.25% is a genuine improvement.

Table 7 affords a closer look at breaking points of the best-performing case: the SFFS-FP algorithm operating on the full set of 40 features. Summing columns, one sees that the greatest number of false classifications lie in the categories of (P)latonic love and (J)oy. It is possible that this reflects an underlying predilection on the part of this subject toward being in these two categories, even when asked to experience a different emotion; however, one should be careful not to read more into these numbers than is justified.

Because of the longstanding debate regarding whether physiological patterns reveal valence differences (pleasing-displeasing aspects) of emotion, or only arousal differences, we consider here an alternate view of the results in Table 7. Table 8 rearranges the rows and columns into groups based on similar arousal rating or similar valence rating. From this we see that both arousal and valence are discriminated at rates significantly higher than random (confidence > 99.99%), and that the rate of discrimination based on valence (87%) and the rate obtained based on arousal (84%) do not differ in a statistically significant way.
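The regrouping in Table 8 is a mechanical collapse of the Table 7 confusion matrix. The sketch below hard-codes the groupings given in Table 8 and assumes a variable holding the 8 x 8 counts in the order N, A, H, G, P, L, J, R; the function and variable names are ours.

import numpy as np

def collapse_confusion(confusion, groups):
    # Sum the rows and columns of a confusion matrix over groups of class
    # indices; return the grouped matrix and its overall accuracy.
    grouped = np.array([[confusion[np.ix_(r, c)].sum() for c in groups] for r in groups])
    return grouped, np.trace(grouped) / grouped.sum()

# Emotion order as in Table 7: N, A, H, G, P, L, J, R (indices 0..7).
valence_groups = [[1], [2, 3], [0, 7], [4, 5, 6]]    # very neg., neg., neutral, pos.
arousal_groups = [[1, 5], [6], [3], [0, 2, 4], [7]]  # very high, med. high, high, low, very low
# e.g., collapse_confusion(table7_counts, valence_groups)[1]  # -> 139/160, about 0.87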


Number of Features: Which Formed the Initial Space     Without Day Matrix                  With Day Matrix
                                                       SFFS (%)  Fisher (%)  SFFS-FP (%)   Fisher (%)  SFFS-FP (%)
24: μ_X, σ_X, δ_X, δ̃_X, γ_X, γ̃_X, X ∈ {E,B,G,R}       49.4      51.3        56.9          54.4        63.8
30: μ_X, σ_X, δ_X, δ̃_X, γ_X, γ̃_X, X ∈ {E,B,G,R,H}     52.5      56.9        60.0          58.8        63.8
11: f1, f2, ..., f10, σ_E                              60.6      70.0        70.6          61.3        63.1
40: all of the above                                   65.0      77.5        81.25         77.5        78.8

Table 6: Comparative classification rates for 8 emotions for Data Set II. The day matrix adds 19 features to the data input to the Fisher algorithm, offering no improvement when f1-f10 are available.

                    N    A    H    G    P    L    J    R    Total
(N)eutral          17    0    0    0    3    0    0    0    20
(A)nger             0   17    0    0    2    1    0    0    20
(H)atred            0    0   14    1    0    0    3    2    20
(G)rief             0    0    1   15    0    0    4    0    20
(P)latonic Love     0    0    0    0   17    2    1    0    20
Romantic (L)ove     1    1    0    0    3   14    1    0    20
(J)oy               0    0    1    2    0    0   17    0    20
(R)everence         0    0    0    1    0    0    0   19    20
Total              18   18   16   19   25   17   26   21   160

Table 7: Confusion matrix for the method that gave the best performance classifying 8 emotions with Data Set II (81.25%). An entry's row label is the true class; the column label is what it was classified as.

4.6 Finding Robust Features

A feature-based approach presents not only a virtually unlimited space of possible features that researchers can propose, but an intractable search of all possible subsets of these to find the best features. When people hand-select features, they may do so with an intuitive feel for which are most important; however, hand-selection is rarely as thorough as machine selection and tends to overlook non-intuitive combinations that may outperform intuitive ones. Machine selection is therefore preferable, even when it is sub-optimal (non-exhaustive). Here we analyze which features the machine searches selected repeatedly. The results of automatic feature selection from twelve experiments involving SFFS (either SFFS as in Jain and Zongker's code, or SFFS-FP, or SFFS-FP with the Day Matrix) are summarized in Table 9.

From Table 9 we see that features such as the means of the heart rate, skin conductivity, and respiration were never selected by any of the classifiers running SFFS, so that they need not have even been computed for these classifiers. At the same time, features such as the mean absolute normalized first difference of the heart rate (δ̃_H), the first difference of the smoothed skin conductivity (f4), and the three higher frequency bands of the respiration signal (f8-f10) were always found to contribute to the best results. Features that were repeatedly selected by classifiers yielding good classification rates can be considered more robust than those that were rarely selected, but only within the context of these experiments.

There were surprises in that some features physiologists have suggested to be important, such as f3, were not found to contribute to good discrimination. Although the results here will be of interest to psychophysiologists, they must clearly be interpreted in the context of the recognition experiment here, and not as some definitive statement about a physical correlate of emotion.
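Since SFFS-FP simply seeds the Fisher projection with the subset returned by the floating search, the remaining ingredient in these experiments is the search itself. The condensed sketch below follows the add-one/conditionally-drop-one scheme of Pudil et al. [45]; the experiments in this paper used Jain and Zongker's implementation [46], so this is only an illustrative reconstruction, with a caller-supplied scoring function (for instance, the leave-one-out accuracy sketched earlier) and names of our own choosing.

import numpy as np

def sffs(X, y, score_fn, max_size):
    # Sequential Floating Forward Selection: greedily add the best feature, then
    # keep dropping features while doing so beats the best subset previously
    # recorded at that smaller size.
    best = {}                                  # subset size -> (score, subset)
    subset = []
    while len(subset) < max_size:
        # Forward step: add the single feature giving the highest score.
        candidates = [j for j in range(X.shape[1]) if j not in subset]
        j_add = max(candidates, key=lambda j: score_fn(X[:, subset + [j]], y))
        subset = subset + [j_add]
        score = score_fn(X[:, subset], y)
        if score > best.get(len(subset), (-np.inf, None))[0]:
            best[len(subset)] = (score, list(subset))
        # Floating backward steps.
        while len(subset) > 2:
            drops = [(score_fn(X[:, [j for j in subset if j != k]], y), k) for k in subset]
            drop_score, k_drop = max(drops)
            if drop_score > best.get(len(subset) - 1, (-np.inf, None))[0]:
                subset = [j for j in subset if j != k_drop]
                best[len(subset)] = (drop_score, list(subset))
            else:
                break
    return max(best.values())[1]               # subset with the overall best score

SFFS-FP then restricts the data to the returned subset and applies the Fisher projection and MAP classification to it exactly as before.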


Valence Rating     Very Neg.   Neg.   Neutral   Pos.   Total
Very Neg. (A)         17         0       0        3      20
Neg. (H,G)             0        31       2        7      40
Neutral (N,R)          0         1      36        3      40
Pos. (P,L,J)           1         3       1       55      60

Arousal Rating     Very High   Med. High   High   Low   Very Low   Total
Very High (A,L)       33           1         0      6       0        40
Med. High (J)          0          17         2      1       0        20
High (G)               0           4        15      1       0        20
Low (N,H,P)            2           4         1     51       2        60
Very Low (R)           0           0         1      0      19        20

Table 8: Rearrangements of Table 7, showing confusion matrices for emotions having similar valence ratings (139/160 = 87%) and similar arousal ratings (135/160 = 84%).

5 Conclusions

This paper has suggested that machine intelligence should include skills of emotional intelligence, based on recent scientific findings about the role of emotional abilities in human intelligence, and on the way human-machine interaction largely imitates human-human interaction. This is a shift in thinking from machine intelligence as one of primarily mathematical, verbal, and perceptual abilities. Emotion is believed to interact with all of these aspects of intelligence in the human brain. Emotions, largely overlooked in early efforts to develop machine intelligence, are increasingly regarded as an area for important research.

One of the key skills of emotional intelligence for adaptive learning systems is the ability to recognize the emotional communication of others. Even dogs can recognize their owner's affective expressions of pleasure or displeasure, an important piece of feedback. One of the difficulties researchers face in this area is the sheer difficulty of getting data corresponding to real emotional states; we have found that efforts in this area are more demanding than traditional efforts to get pattern recognition data. We presented five factors to aid researchers trying to gather good affect data.

The data gathered in this paper contrasts with that gathered for most other efforts at affect pattern recognition not so much in the use of physiology vs. video or audio, but in its focus on having the subject try to generate a feeling vs. having the subject try to generate an outward expression.

One weakness of both data-gathering methods is that the subject elicited the emotion, vs. a situation or stimulus outside the subject eliciting it. Our group at MIT has recently designed and built environments that focus on emotions not generated deliberately by the subject, e.g., collecting affective data from users driving automobiles in city and highway conditions [38] and from users placed in frustrating computer situations [51]; both of these areas aim at data generation in an event-elicited, close-to-real-world, feeling, open-recording, other-purpose experiment, with effort to make the open recording so comfortable that it is effectively ignored. Nonetheless, much work remains to be done in gathering and analyzing affect data; attempts to create situations that induce emotion in subjects remain subject to uncertainty.

One facet of emotion recognition is developed here for the first time: classification of emotional state from physiological data gathered from one subject over many weeks of data collection. The corpus of person-dependent affect data is larger than any previously reported, and the methodology we developed for its analysis is subject-independent. In future work, the patterns of many individuals may be clustered into groups, according to similarity of the physiological patterns, and these results leveraged to potentially provide person-independent recognition.


Feature   24 Stat. Features     30 Stat. Features     f1-f10 and σ_E        All 40 Features       Grand Total
          S  SF SFD  Sum        S  SF SFD  Sum        S  SF SFD  Sum        S  SF SFD  Sum
μ_E       0  1  1    2/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        6/9   0.67
σ_E       0  1  1    2/3        0  1  1    2/3        1  1  1    3/3        0  0  1    1/3        8/12  0.67
δ_E       1  1  1    3/3        1  1  1    3/3        -  -  -    -          1  0  1    2/3        8/9   0.89
δ̃_E       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
γ_E       1  1  1    3/3        1  1  1    3/3        -  -  -    -          0  0  1    1/3        7/9   0.78
γ̃_E       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
μ_B       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
σ_B       0  0  1    1/3        0  0  0    0/3        -  -  -    -          0  0  0    0/3        1/9   0.11
δ_B       0  0  0    0/3        0  1  1    2/3        -  -  -    -          0  0  0    0/3        2/9   0.22
δ̃_B       1  0  1    2/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        6/9   0.67
γ_B       0  0  0    0/3        0  0  0    0/3        -  -  -    -          0  0  0    0/3        0/9   0.00
γ̃_B       0  1  1    2/3        0  1  1    2/3        -  -  -    -          0  0  1    1/3        5/9   0.56
μ_H       -  -  -    -          0  0  0    0/3        -  -  -    -          0  0  0    0/3        0/6   0.00
σ_H       -  -  -    -          0  0  0    0/3        -  -  -    -          0  0  1    1/3        1/6   0.17
δ_H       -  -  -    -          0  1  1    2/3        -  -  -    -          0  1  1    2/3        4/6   0.67
δ̃_H       -  -  -    -          1  1  1    3/3        -  -  -    -          1  1  1    3/3        6/6   1.00
γ_H       -  -  -    -          0  0  0    0/3        -  -  -    -          0  1  1    2/3        2/6   0.33
γ̃_H       -  -  -    -          1  1  1    3/3        -  -  -    -          0  1  1    2/3        5/6   0.83
f1        -  -  -    -          -  -  -    -          1  0  0    1/3        0  0  1    1/3        2/6   0.33
f2        -  -  -    -          -  -  -    -          1  1  1    3/3        0  1  1    2/3        5/6   0.83
μ_S       0  0  0    0/3        0  0  0    0/3        -  -  -    -          0  0  0    0/3        0/9   0.00
σ_S       0  0  0    0/3        0  0  1    1/3        -  -  -    -          0  0  1    1/3        2/9   0.22
δ_S       1  1  1    3/3        0  1  1    2/3        -  -  -    -          1  1  1    3/3        8/9   0.89
δ̃_S       1  1  1    3/3        0  1  1    2/3        -  -  -    -          1  1  1    3/3        8/9   0.89
γ_S       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
γ̃_S       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
f3        -  -  -    -          -  -  -    -          1  0  0    1/3        0  0  0    0/3        1/6   0.17
f4        -  -  -    -          -  -  -    -          1  1  1    3/3        1  1  1    3/3        6/6   1.00
μ_R       0  0  0    0/3        0  0  0    0/3        -  -  -    -          0  0  0    0/3        0/9   0.00
σ_R       0  0  1    1/3        0  0  0    0/3        -  -  -    -          0  0  0    0/3        1/9   0.11
δ_R       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
δ̃_R       1  1  1    3/3        1  1  1    3/3        -  -  -    -          0  1  1    2/3        8/9   0.89
γ_R       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
γ̃_R       1  1  1    3/3        0  1  1    2/3        -  -  -    -          0  1  1    2/3        7/9   0.78
f5        -  -  -    -          -  -  -    -          1  0  0    1/3        0  1  1    2/3        3/6   0.50
f6        -  -  -    -          -  -  -    -          1  0  0    1/3        0  1  1    2/3        3/6   0.50
f7        -  -  -    -          -  -  -    -          1  1  1    3/3        0  1  1    2/3        5/6   0.83
f8        -  -  -    -          -  -  -    -          1  1  1    3/3        1  1  1    3/3        6/6   1.00
f9        -  -  -    -          -  -  -    -          1  1  1    3/3        1  1  1    3/3        6/6   1.00
f10       -  -  -    -          -  -  -    -          1  1  1    3/3        1  1  1    3/3        6/6   1.00

Table 9: Summary (along rows) of when each feature was chosen in the SFFS-based methods of Table 6: the standard SFFS (S), SFFS followed by Fisher Projection (SF), and SFFS followed by Fisher Projection using the Day Matrix (SFD). The features listed in the left-most column are grouped by signal: electromyogram (E), blood-volume pressure (B), heart rate (H), skin conductivity (S), and respiration (R), for easier physiological interpretation. The totals at the right, when high, suggest that the feature may be a robust one regardless of the classification method.


Prior to our efforts, researchers had not reported the way in which physiological features for different emotions from the same day tend to be more similar than features for the same emotions on different days. This side-finding of our work may help explain why so many conflicting results exist from prior attempts to identify emotion-dependent physiological correlates. The day-dependent variations that we found, and the ways we developed of handling them, may potentially be useful in handling across-subject variations. We proposed methods of normalizing features and baselining, showing that the normalizing features gave the best results, but have the drawback of not being easily implemented in an online way because of the requirement of having session-long summarizing information.

We found that the Fisher Projection applied to a subset of features pre-selected by SFFS always outperformed the Fisher Projection applied to the full set of features when assessing the percentage of errors made by the classifiers. However, only a few of the improvements of SFFS-FP over FP were significant at > 95%; the rest had confidences ranging from 58% to 87%. From the significant improvements, we surmise that the Fisher Projection's ability to transform the feature space may work better when poorer-performing features are omitted up front. The combination is synergistic and may well apply in other domains: feature selection followed by feature transformation.

Forty features were proposed and systematically evaluated by multiple algorithms. The best and worst features have been identified for this subject. These are of interest in the ongoing search for good features for affect recognition, and may aid in trying to understand the differential effects of emotions on physiology.

Although the precise rates found here can only be claimed to apply to one subject, the methodology developed in this paper can be used for any subject. The method is general for finding the features that work best for a given subject and for assessing classification accuracies for that subject.

The results of 81% recognition accuracy on eight categories of emotion are the only results we know of for such a classification, and are better than machine recognition rates of a similar number of categories of affect from speech (around 60%-70%) and almost as good as automated recognition of facial expressions (around 80%-98%). Because there were doubts in the literature that physiological information shows any differentiation other than arousal level, these results are exciting. However, these results do not imply that a computer can detect and recognize your emotions with 81% accuracy: the set of emotions examined was limited, the leave-one-out method gives optimistic estimates of error, this experiment, like others in the literature, has only looked at forced choice among pre-segmented data, only one subject's long-term data were used, and true emotion consists of more than externally measurable signals. We expect that joint pattern analysis of signals from face, voice, body, and the surrounding situation is likely to give the most interesting emotion recognition results, especially since people read all these signals jointly.

The greater than six-times-chance recognition accuracy achieved in this pattern recognition research is nonetheless a significant finding; it informs long-debated questions by emotion theorists as to whether there is any physiological differentiation among emotions. Additionally, the finding of significant classification rates when the emotion labels are grouped either by arousal or by valence lends evidence against the belief that physiological signals only differentiate with respect to arousal; both valence and arousal were differentiated essentially equally by the method we developed.

A new question might be stated: "What is the precise bodily nature of the differentiation that exists among different emotions, and upon what other factors does this differentiation depend?" We expect that physiological patterning, in combination with facial, vocal, and other behavioral cues, will lead to significant improvements in machine recognition of user emotion over the coming decade, and that this recognition will be critical for giving machines the intelligence to adapt their behavior to interact more smoothly and respectfully with people.


However, this "recognition" will be on measurable signals, which we do not expect to include one's innermost thoughts. Much work remains before emotion interpretation can occur at the level of human abilities.

6 Acknowledgments

The authors would like to thank Rob and Dan Gruhl for help with hardware for data collection, the MIT Shakespeare Ensemble for acting support, Anil Jain and Doug Zongker for providing us their SFFS code, Tom Minka for help with its use, and the anonymous reviewers for helping improve the paper.

7 References

[1] A. R. Damasio, Descartes' Error: Emotion, Reason, and the Human Brain. New York, NY: Gosset/Putnam Press, 1994.
[2] J. LeDoux, The Emotional Brain. New York: Simon & Schuster, 1996.
[3] P. Salovey and J. D. Mayer, "Emotional intelligence," Imagination, Cognition and Personality, vol. 9, no. 3, pp. 185-211, 1990.
[4] D. Goleman, Emotional Intelligence. New York: Bantam Books, 1995.
[5] B. Reeves and C. Nass, The Media Equation. Cambridge University Press; Center for the Study of Language and Information, 1996.
[6] M. Sigman and L. Capps, Children with Autism: A Developmental Perspective. The Developing Child, Cambridge, Mass.: Harvard University Press, 1997.
[7] R. W. Picard, Affective Computing. Cambridge, MA: The MIT Press, 1997.
[8] J. Healey, R. W. Picard, and F. Dabek, "A new affect-perceiving interface and its application to personalized music selection," in Proc. Wkshp on Perceptual User Interfaces, (San Francisco), Nov. 1998.
[9] R. W. Picard and J. Healey, "Affective wearables," Personal Technologies, vol. 1, no. 4, pp. 231-240, 1997.

[10] K. R. Scherer, "Ch. 10: Speech and emotional states," in Speech Evaluation in Psychiatry (J. K. Darby, ed.), pp. 189-220, Grune and Stratton, Inc., 1981.
[11] R. Banse and K. R. Scherer, "Acoustic profiles in vocal emotion expression," Journal of Personality and Social Psychology, vol. 70, no. 3, pp. 614-636, 1996.
[12] T. Polzin, Detecting Verbal and Non-verbal Cues in the Communication of Emotions. PhD thesis, School of Computer Science, June 2000.
[13] J. Hansen, 1999. Communication during ICASSP '99 panel on Speech Under Stress.
[14] J. N. Bassili, "Emotion recognition: The role of facial movement and the relative importance of upper and lower areas of the face," Journal of Personality and Social Psychology, vol. 37, pp. 2049-2058, 1979.
[15] Y. Yacoob and L. S. Davis, "Recognizing human facial expressions from long image sequences using optical flow," IEEE T. Patt. Analy. and Mach. Intell., vol. 18, pp. 636-642, June 1996.
[16] I. Essa and A. Gardner, "Prosody analysis for speaker affect determination," in Workshop on Perceptual User Interfaces '97, (Banff, Alberta, Canada), pp. 45-46, October 19-21, 1997.
[17] I. A. Essa, Analysis, Interpretation and Synthesis of Facial Expressions. PhD thesis, MIT Media Lab, Cambridge, MA, Feb. 1995.
[18] J. F. Cohn, A. J. Zlochower, J. Lien, and T. Kanade, "Automated face analysis by feature point tracking has high concurrent validity with manual FACS coding," Psychophysiology, vol. 36, pp. 35-43, 1999.
[19] M. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Measuring facial expressions by computer image analysis," Psychophysiology, vol. 36, pp. 253-263, 1999.
[20] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE T. Patt. Analy. and Mach. Intell., vol. 21, pp. 974-989, October 1999.


[21] L. C. De Silva, T. Miyasato, and R. Nakatsu, "Facial emotion recognition using multi-modal information," in Proc. IEEE Int. Conf. on Info., Comm. and Sig. Proc., (Singapore), pp. 397-401, Sept. 1997.
[22] T. S. Huang, L. S. Chen, and H. Tao, "Bimodal emotion recognition by man and machine," in ATR Workshop on Virtual Communication Environments, (Kyoto, Japan), April 1998.
[23] L. S. Chen, T. S. Huang, T. Miyasato, and R. Nakatsu, "Multimodal human emotion/expression recognition," in Proc. 3rd Int. Conf. on Automatic Face and Gesture Recognition, (Nara, Japan), pp. 366-371, IEEE Computer Soc., April 1998.
[24] J. T. Cacioppo and L. G. Tassinary, "Inferring psychological significance from physiological signals," American Psychologist, vol. 45, pp. 16-28, Jan. 1990.
[25] W. James, William James: Writings 1878-1899, ch. on Emotion, pp. 350-365. The Library of America, 1992. Originally published in 1890.
[26] W. B. Cannon, "The James-Lange theory of emotions: A critical examination and an alternative theory," American Journal of Psychology, vol. 39, pp. 106-124, 1927.
[27] S. Schachter, "The interaction of cognitive and physiological determinants of emotional state," in Advances in Experimental Psychology (L. Berkowitz, ed.), vol. 1, pp. 49-80, New York: Academic Press, 1964.
[28] P. Ekman, R. W. Levenson, and W. V. Friesen, "Autonomic nervous system activity distinguishes among emotions," Science, vol. 221, pp. 1208-1210, Sep. 1983.
[29] W. M. Winton, L. Putnam, and R. Krauss, "Facial and autonomic manifestations of the dimensional structure of emotion," Journal of Experimental Social Psychology, vol. 20, pp. 195-216, 1984.

[30] A. J. Fridlund and C. E. Izard, "Electromyographic studies of facial expressions of emotions and patterns of emotions," in Social Psychophysiology: A Sourcebook (J. T. Cacioppo and R. E. Petty, eds.), pp. 243-286, New York: Guilford Press, 1983.
[31] J. T. Cacioppo, G. G. Berntson, J. T. Larsen, K. M. Poehlmann, and T. A. Ito, "The psychophysiology of emotion," in Handbook of Emotions (M. Lewis and J. M. Haviland-Jones, eds.), pp. 173-191, New York: The Guilford Press, 2000.
[32] D. Keltner and P. Ekman, "The psychophysiology of emotion," in Handbook of Emotions (M. Lewis and J. M. Haviland-Jones, eds.), pp. 236-249, New York: The Guilford Press, 2000.
[33] D. M. Clynes, Sentics: The Touch of the Emotions. Anchor Press/Doubleday, 1977.
[34] C. E. Izard, "Four systems for emotion activation: Cognitive and noncognitive processes," Psychological Review, vol. 100, no. 1, pp. 68-90, 1993.
[35] H. Hama and K. Tsuda, "Finger-pressure waveforms measured on Clynes' sentograph distinguished among emotions," Perceptual and Motor Skills, vol. 70, pp. 371-376, 1990.
[36] H. Schlosberg, "Three dimensions of emotion," Psychological Review, vol. 61, pp. 81-88, Mar. 1954.
[37] P. J. Lang, "The emotion probe: Studies of motivation and attention," American Psychologist, vol. 50, no. 5, pp. 372-385, 1995.
[38] J. A. Healey, Wearable and Automotive Systems for Affect Recognition from Physiology. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, May 2000. TR 526.
[39] E. Vyzas and R. W. Picard, "Affective pattern classification," in AAAI 1998 Fall Symposium, Emotional and Intelligent: The Tangled Knot of Cognition, (Orlando, FL), Oct. 1998.


[40] E. Vyzas and R. W. Picard, "Offline and online recognition of emotion expression from physiological data," in Workshop on Emotion-Based Agent Architectures, Third International Conf. on Autonomous Agents, (Seattle, WA), pp. 135-142, May 1999.
[41] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1989.
[42] D. T. Lykken, R. Rose, B. Luther, and M. Maley, "Correcting psychophysiological measures for individual differences in range," Psychophysiological Bulletin, vol. 66, pp. 481-484, 1966.
[43] D. T. Lykken and P. H. Venables, "Direct measurement of skin conductance: A proposal for standardization," Psychophysiology, vol. 8, no. 5, pp. 656-672, 1971.
[44] M. E. Dawson, A. M. Schell, and D. L. Filion, "The electrodermal system," in Principles of Psychophysiology: Physical, Social, and Inferential Elements (J. T. Cacioppo and L. G. Tassinary, eds.), pp. 295-324, Cambridge: Cambridge Press, 1990.
[45] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, pp. 1119-1125, Nov. 1994.
[46] A. Jain and D. Zongker, "Feature selection: Evaluation, application, and small sample performance," IEEE T. Patt. Analy. and Mach. Intell., vol. 19, pp. 153-163, February 1997.
[47] P. Pudil, J. Novovicova, and J. Kittler, "Simultaneous learning of decision rules and important attributes for classification problems in image analysis," Image and Vision Comp., vol. 12, pp. 193-198, Apr. 1994.
[48] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. Wiley-Interscience, 1973.

[49] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE T. Info. Theory, vol. IT-13, pp. 21-27, Jan. 1967.
[50] J. Healey and R. W. Picard, "Digital processing of affective signals," in IEEE Int. Conf. on Acoust., Sp. and Sig. Proc., (Seattle), 1998.
[51] J. Scheirer, J. Klein, R. Fernandez, and R. W. Picard, "Frustrating the user on purpose: A step toward building an affective computer," Interaction with Computers, 2001. To appear.
