Thesis Proposal
Design of avatars as conversational partners

Jennifer Hyde

April 2013

School of Computer Science
Carnegie Mellon University

Pittsburgh, PA 15213

Thesis Committee:
Jessica K. Hodgins (Co-Chair), Computer Science Department and Robotics Institute
Sara Kiesler (Co-Chair), Human-Computer Interaction Institute
Nancy Pollard (Member), Computer Science Department and Robotics Institute

Carol O’Sullivan (External Member), Trinity College of Dublin

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.

Copyright © 2013 Jennifer Hyde


Keywords: Avatars, Communication, Interaction, Facial Motion, Visual Realism, Graphics, Perception, Cooperation, Trust, Engagement, Adults, Children


Abstract

Animated human characters, conversational agents, and avatars are used in applications for counseling, therapy, entertainment, and education. This dissertation examines the social effects of avatars, which are controlled in real time by people and can, therefore, be used in unplanned and unscripted interactions. Avatars can help people converse remotely and anonymously, and they can be used to help people imagine alternative realities. Therapists have used avatars to practice different social situations with patients who have social and communicative disorders. This type of therapy allows patients to safely practice conversing with their therapist, who may be represented by different avatars such as an employer or a sales clerk.

The literature on computer agents suggests that people will have better conversations with avatars displaying realistic facial cues, such as a realistic smile. However, the research so far has not examined how an avatar system might modify the avatar's facial cues in real time to improve conversation. For example, a person may intend that his or her avatar smiles, and the avatar system might decide how wide the smile should be on the avatar. Although previous research provides strong support for the value of realistic avatar facial motion, a systematic investigation of modifications to realistic avatar facial motion has not been pursued. In this work, we argue that an avatar that adjusts its facial motion to make an intended emotion more (or less) evident could improve virtual conversations with the avatar and, perhaps, the effectiveness of the therapy, education, or entertainment.

The literature on facial motion indicates that slight differences in how the face moves can change how people interpret the resulting facial expressions. For example, facial expressions that have been exaggerated are more recognizable and perceived as more intense. Because facial expressions influence people's impressions of and behavior towards other people, slight changes in avatar facial motion might significantly influence people's impressions of and behaviors towards the avatar. An avatar that could adjust its facial motion in real time might, therefore, significantly influence an interaction. A therapist might find useful an avatar that could adjust its facial motion to make the therapist appear more trustworthy and respectful, because the avatar might encourage a patient to disclose more information to the therapist. A child speaking with an unfamiliar social worker may have an easier time conversing if the social worker appears as a friendly and engaging avatar. To investigate these possibilities, we will evaluate people's thoughts and behaviors as they watch animated characters and converse with avatars whose facial motion has been exaggerated or damped. We have designed three projects to investigate the influence and possible benefits of exaggerating and damping avatar facial motion. To pursue this question, we used a custom-built audiovisual telecommunications system that is capable of tracking a user's facial motion and retargeting his or her motion to an avatar in real time. This system allows two people to converse with one another using natural eye gaze, facial gestures, and speech; however, one user may appear as an avatar instead of as him- or herself.

In the first project, we explored how animated characters' facial expressiveness influenced viewers' thoughts by asking participants for their impressions of the characters' personalities. For this project we conducted two user studies in which we created animations of characters telling stories with different amounts of facial motion. We animated the characters using the same technique that is used by the audiovisual telecommunications system, applying natural human facial motion to our human characters. The results of these studies show that participants' perceptual judgments were influenced by facial motion, especially with regard to how extroverted a character seemed. More facial motion led to stronger impressions of extroversion, intelligence, trust, and likeability. Damped motion led to stronger impressions of respectfulness, calmness, and positivity.

In a second project, we propose to investigate how avatar facial motion influences people's responses to the avatar by having participants engage in a cooperative ranking task with an avatar. Initially, participants will rank a list of items independently; then they will cooperate with an avatar animated with exaggerated, unaltered, or damped motion to decide on a mutual ranking. Finally, participants will have the opportunity to change their initial ranking. We will measure how well participants cooperate with the avatars and their trust in the avatar by analyzing their conversations, changes in their rankings, and questionnaire responses. This study will allow us to establish how avatar facial motion affects avatar trustworthiness and people's cooperative behavior.

In a third project, we propose to investigate how avatars might improve conversations between children and adults. It can be difficult and uncomfortable for children to converse with adults who are unfamiliar to them, such as social workers, counselors, therapists, doctors, and lawyers. Avatars might be more engaging and more comfortable for children to talk to than adults. We propose a study in which children ages 4-10 will converse with a confederate, represented as an avatar, using the audiovisual telecommunications system. The confederate will appear either as herself with video, as an avatar with unaltered motion, or as an avatar with exaggerated motion. To measure engagement and attention, we will analyze the conversations and ask the children for their opinions. This study will help guide the design of avatars to make them more engaging in conversation with children.

This dissertation will show that simple manipulations, such as exaggerating facial motion, can encourage people to have better interactive experiences with avatars. More generally, we will provide scientifically based guidelines for creating avatar systems that encourage cooperation, engagement, and attention. This dissertation will also contribute to the design of an audiovisual telecommunications system capable of animating avatars with human motion in real time.


Contents

1 Introduction
  1.1 Modifying Facial Motion to Influence Perceptions of Animated Characters
  1.2 Influencing Trust and Cooperation between Adults
  1.3 Engaging and Sustaining Children's Attention

2 Background
  2.1 Faces in human-human interaction
    2.1.1 Emotion
    2.1.2 Intention and mental state
    2.1.3 Identity and Impression Formation
    2.1.4 Communicative mechanism
    2.1.5 Children vs. adults
  2.2 Faces in human-computer interaction
    2.2.1 Emotion
    2.2.2 Intention and mental state
    2.2.3 Identity and Impression Formation
    2.2.4 Communicative mechanism

3 Experimental Setup
  3.1 Active Appearance Models
  3.2 Custom Audiovisual Telecommunications System
  3.3 Evaluating the Impact of Delay on Dyadic Conversations
    3.3.1 Related Work
    3.3.2 Hypothesis
    3.3.3 Materials and Method
    3.3.4 Results
    3.3.5 Conclusion

4 Modifying Facial Motion to Influence Perceptions of Animated Characters
  4.1 Related work
  4.2 Determining facial motion sensitivity
    4.2.1 Participants
    4.2.2 Materials
    4.2.3 Procedure
    4.2.4 Results
    4.2.5 Discussion
  4.3 Perceptual effects of facial motion and appearance
    4.3.1 Study One
    4.3.2 Study Two
  4.4 Limitations
  4.5 Conclusion

5 Influencing Trust and Cooperation
  5.1 Related Work
  5.2 Research Questions
  5.3 Hypotheses
  5.4 Measures
  5.5 Procedure
  5.6 Confederate Behavior
  5.7 Experimental Design
  5.8 Analytic Methods
  5.9 Conclusion

6 Increasing Children's Engagement and Attention During Interaction
  6.1 Related Work
  6.2 Research Questions
  6.3 Hypotheses
  6.4 Measures
  6.5 Procedure
  6.6 Experimental Design
  6.7 Analytic Methods
  6.8 Conclusion

7 Contributions and Schedule

Bibliography


List of Figures

1.1 Examples of customized avatars. (a) A Second Life avatar customized to look like a user (http://secondlife.com/whatis/avatar/?lang=en-US). Accessed March 16, 2013. (b) A Microsoft avatar customized to look like a user (http://www.xbox.com/en-IN/Kinect/KinectAvatars). Accessed March 16, 2013.
1.2 Example of how the Microsoft Kinect uses natural motion to control an avatar. (a) Advertisement from http://www.industrygamers.com/news/star-wars-lightsaber-game-track–field-and-more-at-kinect-premiere/. Accessed March 16, 2013. (b,c) Photos of the Kinect tracking in real time.
3.1 Example of two people using the first version of our audiovisual telecommunications system.
3.2 Example of the shape and appearance models, which can be warped to create new facial postures. We simplified this figure from the article Active Appearance Models Revisited (Matthews and Baker, 2004).
3.3 A colleague sits in front of the audiovisual telecommunications setup that records her, tracks her facial motion (lower right window of the monitor), and retargets her motion to an avatar (lower left window of the monitor).
3.4 Diagram illustrating how the audiovisual telecommunications system works.
3.5 3D models of the second version of our audiovisual telecommunications system. (a) A side view of the box containing the camera, beam splitter, microphone, and monitor. (b) A view from the back of the box.
3.6 Effects of delay on conversation perceptions. The main effect of delay on perceptions across all scales except for pace is statistically significant, p < .05.
3.7 Effects of delay and channel on perceived conversation naturalness. The interaction effect of delay and channel on the perception of naturalness is statistically significant, p < .05.
4.1 Examples of the (a) toon and (b) CG characters side by side, with the characters on the right displaying normal motion and the characters on the left displaying damped motion.
4.2 Motion level pretest results.
4.3 The eight characters in cartoon and realistic rendering styles.
4.4 Sample still frames of the eight different motion levels.
4.5 Interaction effects on likeability and intelligence.
4.6 Rendering style trends for likeability and intelligence.
4.7 Effect of motion level on extroversion.
4.8 Influence of story valence on ratings of character respectfulness, calmness, attentiveness, and positivity.
4.9 Influence of motion level on ratings of character respectfulness, calmness, extroversion, and positivity. Ratings at specified motion levels (*) were significantly different (p < 0.05).
7.1 Proposed timeline for completion of dissertation work.


List of Tables

3.1 Audio delay thresholds found in prior work.
3.2 Topics used in study.
3.3 Questionnaire items administered after each conversation. *Cronbach's Alpha is a measure of the reliability of the scale as a whole. Alpha ranges from 0.0 to 1.0.


Chapter 1

Introduction

Animated human characters are used in a wide range of situations, including movies, games, and applications for therapy and counseling. People encounter these characters in such roles as customer service representatives on shopping websites, inhabitants of virtual worlds like Second Life and World of Warcraft, virtual peers and teachers in educational settings, and mock interviewers that allow us to practice presenting ourselves. When these characters have realistic behaviors and expressive faces, people tend to be more engaged with them (for a review, see Beale and Creed, 2009). We believe that if these characters are to be successful in their roles, they need to converse with people using free speech and natural gesture. Conversation is important for building trust because when people converse with one another, they not only exchange information, but they also influence each other's thoughts and behaviors (Batson, 1990). The human face provides many social cues that people rely on to understand and communicate with one another. Facial motion and expressions are particularly important because they communicate identity, emotions, mental states, personality, and intentions (Back et al., 2009; Ekman et al., 1980). People form impressions of one another very quickly (Borkenau et al., 2004), and they often use the face to make quick judgments (Willis and Todorov, 2006).

In some situations, driven by necessity or preference, technology can provide an attractive and feasible alternative to in-person conversation. There are many reasons why people may choose to use technology to converse with one another. People may not be in the same physical space, in which case technology makes the conversation possible. If people wish to remain anonymous during conversation, technology can mask their identity; if people wish to imagine alternate realities via role playing, technology can facilitate that as well.

There are several different types of technology that support conversation. Video conferencing is easy to use, widely available, and allows people to remotely see and hear one another, thereby enabling users to converse with free speech and natural gesture. Unfortunately, with most video conferencing systems, users cannot make direct eye contact and must deal with low frame rates and asynchronous audio and video. Video conferencing merely represents each participant to the other using video and audio; therefore, it does not provide users anonymity, nor does it help them imagine alternate realities. At the other end of the spectrum, conversational agents provide anonymity and allow unrestricted character design. Because they are completely computer controlled, the appearance, behavior, and responses of an agent must be pre-programmed or algorithmically generated. Agents can be especially useful for tutoring and counseling (Oviatt, 2000; Tartaro and Cassell, 2008; Hoque, 2012). For example, children with autism spectrum disorder interacted more with a conversational agent than with other children, because the agent was more patient and more encouraging than the real children (Tartaro and Cassell, 2008). For career counselors who are too busy to give mock interviews multiple times to the same job candidates, Hoque (2012) is developing My Automated Conversation Helper (MACH), a system that uses virtual interviewing agents to conduct mock interviews, analyzes interviewees' behavior, and provides feedback to interviewees. Although there have been many advances in machine learning and artificial intelligence that make agents respond more appropriately in real situations, agents are still not a perfect replacement for real people.

Figure 1.1: Examples of customized avatars. (a) A Second Life avatar customized to look like a user (http://secondlife.com/whatis/avatar/?lang=en-US). Accessed March 16, 2013. (b) A Microsoft avatar customized to look like a user (http://www.xbox.com/en-IN/Kinect/KinectAvatars). Accessed March 16, 2013.

In between these two solutions, we have avatar-mediated conversation, in which avatars provide a virtual representation of the person who controls the avatar in real time. Avatars can be designed to allow their users to communicate remotely, change their appearance, change their voice, emote, and speak freely. In the virtual world Second Life, users design and control avatars to converse with one another. One of the attractions of Second Life is that it allows users to create new personas and experiment with alternate realities. Second Life has also been used to help people with autism spectrum disorders practice different social skills including "meeting new people, dealing with a roommate conflict, negotiating financial or social decisions, and interviewing for a job" (Kandalaft et al., 2013). The therapists in the study were represented as avatars in custom-made spaces (office building, coffee shop, restaurant) so that they could coach participants, who were also represented as avatars. Using avatars allowed participants to practice socializing in different virtual settings. Second Life allows users to completely customize their avatars (see Figure 1.1), but customization is complex and time consuming (Linden, 2012). Microsoft's Xbox and Kinect provide avatars that are easy to control because they mimic user motion (see Figure 1.2), but the avatars' appearances are not as customizable (see Figure 1.1).

Avatars are useful in unscripted conversation because they can be completely customized and controlled by users in real time. For this dissertation, we are interested in designing avatars as conversation partners. We believe that many of the existing domains in which avatars are used, such as education, counseling, and therapy, could benefit from avatars that are driven directly by the users' actions and can, therefore, converse with free speech and natural gestures.

Figure 1.2: Example of how the Microsoft Kinect uses natural motion to control an avatar. (a) Advertisement from http://www.industrygamers.com/news/star-wars-lightsaber-game-track–field-and-more-at-kinect-premiere/. Accessed March 16, 2013. (b,c) Photos of the Kinect tracking in real time.

We know from existing research on agents that displaying facial expression and emotion can benefit interaction in several domains (for a review, see Beale and Creed, 2009). In education, cooperative agents that expressed support verbally and with their facial expressions were perceived as more appealing and competent than cooperative agents that only expressed support verbally (Maldonado et al., 2005). Participants also performed better on tests with the emotive agent than with the unemotional agent. In the context of exercise counseling, Bickmore and Picard (2005) found that participants developed closer bonds with agents that displayed relational behaviors, including nods, gaze, and behaviors modeled after real physical trainers, than with agents that displayed fewer relational behaviors. When users want complete control over what they communicate and how they are represented, avatars are a better solution than agents. The research on agents provides strong support for the value of realistic avatar facial motion, but a systematic investigation of modifications to realistic avatar facial motion in real-time interactions has not been pursued. In this work, I argue that an avatar that adjusts its facial motion to make an intended emotion more (or less) evident could improve virtual conversations with the avatar.

Facial expressions can influence how much people trust and engage with one another. An avatar that adjusts its facial motion might, therefore, be able to make itself appear more trustworthy or engaging. There are many ways an avatar could adjust its facial motion, but this dissertation will focus on how exaggerating and damping avatar facial motion could improve conversation. Exaggeration could be useful because natural facial motion may not be sufficient to convey the user's emotion and intent when portrayed on a non-photorealistic avatar. We know from research on human faces that exaggerating and damping facial motion affects the clarity and intensity of the expression (Pollick et al., 2003; Ambadar et al., 2005). Researchers have found that exaggerating images of real facial expressions makes emotion recognition easier, in addition to making the expressions seem more intense (Benson et al., 1999; Calder et al., 1997; Ambadar et al., 2005). Hill and colleagues (2005) found similar results with an animated, computer-generated, realistic-looking face. Using dynamic point-light displays of real faces, Pollick and colleagues (2003) also found improved recognition and heightened intensity for exaggerated expressions. These results suggest that exaggerating the facial expressions of non-photorealistic avatars should make their expressions clearer during interaction.

In movies and television, characters are often animated using the basic principles of traditional animation to give the characters the "illusion of life" (Thomas and Johnston, 1981). These animation principles are paramount for creating believable, lovable, inspirational, and influential characters such as Wall-E and Bugs Bunny. Exaggeration is one of these principles, defined as "accentuating the essence of an idea via the design and the action" (Lasseter, 1987; Thomas and Johnston, 1981). It follows, then, that exaggeration might be an important technique for animating characters in other, non-passive viewing experiences, such as avatar-mediated conversations.

This dissertation will empirically investigate how to design avatars that can adjust their motion in real time to improve conversation. Because avatars are used in so many domains, we will focus on improving avatars in the personal service domains. We believe that avatars should be designed to increase trust, cooperation, engagement, and attention. If a counselor or therapist uses an avatar to converse with a client, then the client will, we hope, trust and act on the avatar's advice. If a tutor is using an avatar to teach a child, then it will be important for the avatar to engage the child so that the child does not become bored and lose focus. Avatars are used with adults and children, and these different populations may not be influenced by avatars in the same ways. An avatar designed to converse with adults may not be appealing or engaging to children. Children may find conversing with an unfamiliar adult difficult and uncomfortable; however, they may feel at ease conversing with the same adult if that adult appears as an avatar. For therapists and counselors practicing social situations with children who have social and communicative disorders like Autism Spectrum Disorder, exaggerating avatar facial motion could help clarify facial cues during conversation. In an experiment comparing children's conversations with a conversational agent to their conversations with an adult, Oviatt found that children spoke more clearly with the agent (Oviatt, 2000). For counselors, therapists, social workers, and other adults working in personal service domains, avatar-mediated conversation could be an attractive method for conversing with children.

Children, like adults, are influenced by facial cues; however, children are not as adept as adults at recognizing and interpreting facial cues (Widen and Russell, 2008). Children may have better conversations with exaggerated avatars than with unmodified avatars because the exaggerated avatars' facial cues may be clearer. Children may also respond more to avatars that appear friendly and extroverted, whereas adults may respond more to avatars that appear trustworthy. For this dissertation we will focus on how avatars can adjust their motion to: (1) increase trust during conversation with adults; and (2) increase engagement and attention during conversation with children.

When avatars are created, there are many design questions that must be answered, including: What should the avatar look like? How should the avatar move? How should the avatar communicate? How will users control the avatar? What aspects of the avatar will users be able to control? Answering all of these questions would require many experiments; therefore we have restricted the dissertation to the following questions: (1) How will facial motion be perceived on non-photorealistic avatar faces? (2) How will exaggerating and damping facial motion influence users' impressions? (3) How will the amount of facial motion influence trust and cooperation during conversation? (4) How will children's engagement and attention differ when speaking to adults by video conference or by avatar-mediated communication? This dissertation will provide insight into how avatar rendering style and amount of facial motion affect conversation. We have decided to focus specifically on manipulating avatar faces, because people tend to focus on faces when they converse (Argyle and Dean, 1965). Conversational agents have been improved by the addition of believable and realistic facial motion; therefore, we expect that believable and realistic facial motion will improve avatars as conversation partners too. There has been little research investigating the influence of avatar rendering style on users' opinions, and the research that has been conducted supports the idea that people dislike creepy rendering styles (Inkpen and Sedlins, 2011; McDonnell et al., 2012). Inkpen and Sedlins (2011) found that in a professional setting, users preferred to be represented by avatars that were proportioned realistically; however, the users did not require that the textures be similarly realistic. Because we are interested in designing avatars for the personal service domain, we will use avatars with realistic proportions. We will also investigate how rendering style may affect user opinion.

This proposal describes the empirical approach we intend to use to answer our questions. The remaining sections of this introduction summarize the completed and proposed experiments for this dissertation. We have organized the experiments into three projects. Chapter 2 reviews related work on the importance and influence of facial expressions and motion during human-human and human-computer interaction. Chapter 3 describes our experimental setup, including the design of an audiovisual telecommunications system that can track a person's face and retarget the tracked motion to an avatar in real time. Chapters 4-6 describe the three projects. Our first project involved measuring the perceptual effects of exaggerating and damping the facial motion of animated characters. Our manipulations successfully influenced viewer attitudes towards non-interactive characters (Chapter 4). For our next project, we propose to investigate how manipulating avatar facial motion can influence human behavior. Specifically, we are interested in examining whether we can increase participant cooperation with and trust in our avatars by simply exaggerating and damping the avatar facial motion (Chapter 5). In Chapter 6, we describe our final proposed project involving avatar interaction with children. We are particularly interested in the potential for avatars to be used for children's therapy; therefore we investigate whether or not avatars can be more engaging and attention-holding than real people. Finally, Chapter 7 discusses the contributions and limitations of our work. A schedule of the work necessary to complete this dissertation is also included.
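To make the track-and-retarget loop concrete, here is a minimal sketch of how such a system could be organized. It is illustrative only: the linear parameter mapping and the function names (fit_aam, retarget) are our assumptions rather than the actual implementation described in Chapter 3, and the tracker is replaced by a stub so the sketch runs on its own.

    import numpy as np

    K_TRACKED, M_AVATAR = 20, 20   # illustrative parameter counts

    def fit_aam(frame):
        # Stand-in for the AAM tracker (Section 3.1): the real system
        # fits shape and appearance models to the camera frame. Here we
        # return synthetic parameters so the sketch is self-contained.
        rng = np.random.default_rng(0)
        return rng.normal(size=K_TRACKED)

    def retarget(shape_params, mapping):
        # Linear retargeting (an assumption): avatar rig parameters are
        # a linear function of the tracked shape parameters.
        return mapping @ shape_params

    mapping = np.eye(M_AVATAR, K_TRACKED)   # placeholder user-to-avatar map
    avatar_params = retarget(fit_aam(frame=None), mapping)
    # avatar_params would then drive the avatar rendered on the remote
    # conversant's display, once per captured camera frame.

The key property such a loop must preserve is low latency; Section 3.3 evaluates how much end-to-end delay a dyadic conversation tolerates.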

1.1 Modifying Facial Motion to Influence Perceptions of Animated Characters

Animated characters need to embody certain attributes to be effective. For example, a virtual peer working with a child with Autism Spectrum Disorder should be engaging and friendly (Tartaro and Cassell, 2008). A virtual assistant or therapist should be intelligent and trustworthy. Characters in games, movies, and television shows need to embody many other personality traits to be believable. Currently, many of the design decisions regarding character appearance and motion are made by artists. Artists follow heuristic guidelines as well as their own intuition to create believable characters (Lasseter, 1987; Thomas and Johnston, 1981).

We are interested in scientifically exploring some of these guidelines and intuitions to better inform animation and design decisions. Specifically, we are interested in understanding the relationship between the amount of character facial motion and the character rendering style. We conducted two controlled laboratory studies in which participants watched animations of cartoon and more realistic-looking characters reading unique stories, each with a different amount of facial motion. The spatial motion was exaggerated and damped in 10% increments up to a 40% difference from the original motion. In study one, characters told positive stories, and participants rated each character on questions concerning likeability, trustworthiness, intelligence, and extroversion. In study two, characters told positive and negative stories, and participants rated each character on questions concerning respectfulness, calmness, attentiveness, extroversion, and positivity.
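The manipulation itself is compact. The following sketch shows one way the spatial exaggeration and damping could be implemented, assuming facial motion is represented as landmark displacements from a neutral pose; that representation, and the helper name scale_motion, are our illustrative assumptions rather than the exact pipeline used in the studies.

    import numpy as np

    def scale_motion(frames, neutral, scale):
        # Exaggerate (scale > 1) or damp (scale < 1) facial motion by
        # scaling each frame's displacement from the neutral pose.
        # frames: (T, N, 2) array of N 2D landmarks over T frames.
        return neutral + scale * (frames - neutral)

    # Scale factors spanning the +/-40% range in 10% increments,
    # matching the manipulation described above.
    motion_scales = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]

    neutral = np.zeros((68, 2))                        # toy neutral pose
    rng = np.random.default_rng(1)
    frames = rng.normal(scale=0.5, size=(120, 68, 2))  # toy motion data
    exaggerated = scale_motion(frames, neutral, 1.4)   # 40% exaggeration
    damped = scale_motion(frames, neutral, 0.6)        # 40% damping

Because the scaling is applied per frame around the same neutral pose, the timing of the motion is preserved; only its spatial extent changes.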

Our results indicate that participants were influenced by the exaggerated and damped facial motion. Damped characters, regardless of story valence, were considered more respectful and calm than exaggerated characters; however, damped characters were also considered less extroverted than exaggerated characters. In general, positive characters were perceived as more respectful, calm, and attentive than negative characters. Positive, realistic-looking characters were better liked than the cartoon characters. The positive, realistic characters were also considered more likeable, trustworthy, and intelligent when their motion was exaggerated than when they were animated with damped motion. We did not find this effect with the cartoon characters. We believe the differences found between the realistic-looking and cartoon characters arose because participants were more sensitive to motion changes in the realistic-looking characters than in the cartoon characters.

1.2 Influencing Trust and Cooperation between Adults

Our completed experiments demonstrated that our facial motion manipulations can influence people's attitudes towards non-interactive animated characters. We next explore whether this influence still exists when people interact with an avatar controlled in real time by a confederate. In particular, we will assess how avatar motion influences how much people trust and cooperate with their avatar partner on a modified version of the Desert Survival Problem (Lafferty and Pond, 1974). To examine how facial motion impacts cooperation and trust, we will pair participants with a confederate who will appear as an avatar whose motion will be damped, unaltered, or exaggerated. To complete the task, participants will first rank a list of items in terms of their importance for survival. The participants will then be introduced to the confederate-controlled avatar, who will have a permutation of the participant's rankings. Together, the confederate and participant must discuss and cooperate to reach a mutual agreement on the ranking of the items. After submitting their agreed-upon list, the participant will have the opportunity to change his or her initial rankings. The participant will also complete a questionnaire asking for his or her impressions of the confederate and the interaction. The questionnaires will provide us with self-reported perceptual results. We will analyze how participants changed their rankings across the three lists they submit to us. If the avatar successfully persuades a participant to change his or her rankings, this indicates that the participant finds the avatar trustworthy. We will compare participants' impressions of trustworthiness to the number of items re-ranked. The conversation between confederate and participant will also be analyzed for communicative properties such as the type of response (agreement, argument, explanation, question), the number of words used, and the time needed for task completion. We expect that participants will cooperate more and take less time to complete the task when they interact with exaggerated avatars. Exaggerated avatars should appear the most extroverted compared to the unaltered and damped avatars. Prior research has shown that people perceive extroverts as having greater influence on group outcomes (Barry and Stewart, 1997). Participants should also find exaggerated avatars more credible (Burgoon et al., 2000); therefore they should agree more with the exaggerated avatars. There is also the possibility that participants will cooperate with and trust the avatar that complements their personality. Isbister and colleagues (2000) had participants engage with extroverted or introverted computer agents in a desert survival task and found that introverted participants preferred working with extroverted agents, whereas extroverted participants preferred working with introverted agents.
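As one concrete way to quantify these ranking changes, the sketch below counts re-ranked items and measures, via a Spearman footrule distance, how far a participant's final list moved toward the avatar's list. The footrule distance is our assumption about a reasonable analysis, not a measure specified above, and the item lists are hypothetical.

    def rank_positions(ranking):
        # Map each item to its position (0 = most important).
        return {item: pos for pos, item in enumerate(ranking)}

    def items_reranked(before, after):
        # Number of items whose rank changed between two lists.
        b, a = rank_positions(before), rank_positions(after)
        return sum(1 for item in b if b[item] != a[item])

    def footrule(r1, r2):
        # Spearman footrule distance: total displacement of items.
        p1, p2 = rank_positions(r1), rank_positions(r2)
        return sum(abs(p1[item] - p2[item]) for item in p1)

    # Hypothetical rankings of five survival items.
    initial = ["water", "mirror", "map", "knife", "compass"]
    avatar  = ["mirror", "water", "knife", "map", "compass"]
    final   = ["mirror", "water", "map", "knife", "compass"]

    print(items_reranked(initial, final))                      # 2
    # A drop in distance to the avatar's list suggests persuasion:
    print(footrule(initial, avatar), footrule(final, avatar))  # 4 2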

1.3 Engaging and Sustaining Children's Attention

Children frequently interact with animated characters for educational and entertainment purposes. In fact, there is research showing that children as young as two years old can learn from interactive media as well as they learn from real people (Troseth et al., 2006). Many popular children's television shows, such as Dora the Explorer and Blue's Clues, feature animated characters that ask their viewers questions and pause for a response before revealing the answer. This pseudo-interactive style of television has been very popular and successful with children. For educational purposes, interactive characters have proven engaging and memorable (Oviatt, 2000; Darves and Oviatt, 2004). Most of these animated characters are completely pre-programmed, and their interactive abilities are severely limited. Avatars, like the ones we use, offer greater potential for interaction because the person controlling the avatar can freely respond to children, who often say unpredictable things. Not only can children be difficult to understand, but when they get bored or frustrated their attention can easily shift. Druin and colleagues (1998; 1999) interviewed many interaction designers who found that, when play-testing their systems with children, the children often became frustrated or bored when the systems were not as interactive as they expected.

We see great potential for adults to use avatars to improve their conversations with children. Adults converse with children for many different reasons. Some of the more serious and difficult conversations occur with counselors, police officers, lawyers, social workers, clinicians, and researchers. A quick search on PsycARTICLES and PsycINFO using the keywords "interviewing children" reveals 2,176 articles, many of them guides and best practices for successfully conversing with children. Our system may help ease conversations between children and unfamiliar adults. We hope that exaggerating the avatar facial motion will make facial cues more salient and therefore easier for children to read. Not only should talking to an avatar be fun and novel for children, but we can also investigate whether children's perceptions of avatar personality are influenced by exaggerated facial motion in the same way that adults' perceptions are. We also believe that adult-controlled avatars could be more engaging and attention-holding during conversation with children than the adults themselves.


We will conduct a controlled laboratory study in which children, 4-10 years old, will converse with a confederate using our custom audiovisual telecommunications system. The confederate will engage children in three conversations. One conversation will display a video of the confederate, another will display the confederate as an avatar with unaltered motion, and the third will display the confederate as an avatar with exaggerated motion. We will use the same confederate for all conversations; however, we will change the confederate's voice so that the children are unaware that they are talking to the same person each time. The confederate will be blind to how she appears and sounds to the children. The confederate will attempt to engage the children in simple conversation, and we will ask the children in post-study interviews what they thought about the avatars and their conversations. We will also examine the conversational data for behaviors including, but not limited to, the number of interruptions, the number of words, and the length of the conversation. We expect that children will prefer speaking to avatars over real people. Children should have longer conversations and require less prompting when they enjoy the conversation. The results from this study should characterize the differences in how children engage and converse with adults over video conference versus avatar-mediated communication. This study will also investigate whether or not exaggerated facial motion helps in conversations between children and adults unknown to them.
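For illustration, the sketch below computes the kinds of conversational measures mentioned above (interruption counts, word counts, and conversation length) from a simple turn-level transcript; the transcript format is our assumption, chosen only to make the measures concrete.

    def conversation_metrics(turns):
        # turns: list of (speaker, start_sec, end_sec, text) tuples.
        child_turns = [t for t in turns if t[0] == "child"]
        child_words = sum(len(t[3].split()) for t in child_turns)
        duration = max(t[2] for t in turns) - min(t[1] for t in turns)
        # Count an interruption when a turn starts before the previous
        # speaker's turn has ended.
        interruptions = sum(
            1 for prev, cur in zip(turns, turns[1:])
            if cur[1] < prev[2] and cur[0] != prev[0]
        )
        return {"child_words": child_words,
                "duration_sec": duration,
                "interruptions": interruptions}

    turns = [("adult", 0.0, 4.0, "What did you do at school today?"),
             ("child", 3.5, 9.0, "We painted and then we played outside"),
             ("adult", 9.5, 12.0, "That sounds fun!")]
    print(conversation_metrics(turns))
    # {'child_words': 7, 'duration_sec': 12.0, 'interruptions': 1}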


Chapter 2

Background

People are naturally social creatures, an idea illustrated by the fact that we spend so much time being with and thinking about other people (Batson, 1990). When people communicate face-to-face they use both verbal and nonverbal information to understand one another. People recognize and interpret these social cues to judge ability, personality, emotion, and mental states. Based on these judgments, people respond in socially appropriate and expected ways to one another. These social responses are not limited to human-human interaction, and, as many researchers have discovered, people will respond socially to computers that exhibit human-like or social qualities (e.g., Kiesler et al., 1996; Parise et al., 1996; Sproull et al., 1996; Parise et al., 1999; for reviews, see Nass et al., 1994; Reeves and Nass, 1996; Nass and Moon, 2000). In several experiments, Nass and colleagues demonstrated that people respond to computers as social actors. They reran traditional social science experiments, replacing at least one of the people in each experiment with a computer displaying some social cues. The researchers then examined whether or not the remaining person or people in the experiments would be influenced by the computer in ways similar to those seen in the traditional human-human experiments. Over several decades, researchers have shown that people will apply gender and racial stereotypes, act polite, and respond socially to human-like computers (for reviews, see Nass et al., 1994; Reeves and Nass, 1996; Nass and Moon, 2000). Sproull and colleagues (1996) found that human-like faces on computers socially influenced people to express themselves more positively (social desirability effect) and to attend more to tasks (social facilitation effect).

There have been several proposed theories addressing the circumstances in which people treat computers as social actors. The Threshold Model of Social Influence (Blascovich, 2002; Blascovich et al., 2002) suggests that humans respond socially to each other, including to avatars, because avatars are controlled by humans; however, humans will only respond socially to agents if the agents' behavior is indistinguishable from that of an avatar. In other words, an agent must meet a threshold of believable behavior before a human will interact socially with it. The Ethopoeia concept suggests that humans automatically and unconsciously treat computers as social actors when the computers display social cues (Nass and Moon, 2000). The Revised Ethopoeia Concept argues that the behavioral realism of agents and avatars is correlated with the social response from people:

“...it does not matter whether participants are interacting with an agent or an avatar, but rather how many human-like characteristics the systems provide. Although every system elicits social reactions as long as the system provides social cues (Ethopoeia Concept), a system will elicit more or stronger social reactions when it provides more social cues (Revised Ethopoeia Concept). Thus, higher behavioral realism should lead to more social reactions by the user” (von der Putten et al., 2010).

Although these theories differ in the subtleties, they agree that people will respond to computers as if they were human if the computers behave socially to some degree. This dissertation strives to make avatars more successful conversational partners. To achieve this goal, we need to ensure that our avatars exhibit believable, human-like, social behavior during interaction. In particular, we focus on facial behavior because the face is so important during social interaction. Human faces not only communicate different types of information, but they can also influence the attitudes and behavior of others. We review the research on the importance of facial motion and expression during human-human interaction, and we consider how virtual faces have proven equally important during human-computer interaction. In particular, we focus our review on how human and synthetic faces express emotion, encode intention, depict identity, influence our impressions, and coordinate conversation.

2.1 Faces in human-human interaction

The human face provides a wealth of information that is important for social interaction. People are remarkably good at decoding the face for this information and are incredibly sensitive to slight differences in facial motion and expression. People depend on faces to recognize emotion, mental state, and identity (for a review, see Donath, 2001). We quickly use this information to form impressions of other people. During social interaction, the face also aids communication. As Little and colleagues (2011) state in their review, "humans readily draw a number of conclusions about the personality attributes, appearance, emotional states and preferences of complete strangers solely on the basis of facial cues". We review the literature regarding how the human face conveys these different types of information and how people perceive and react to this information. We conclude this section with a review of how children process facial expressions differently from adults.

2.1.1 Emotion

People make facial expressions to communicate emotion (Ekman et al., 1987). Although there are several basic emotions that are recognized universally (for reviews, see Ekman, 1973; Ekman et al., 1980), not all facial expressions of emotion are recognized accurately (Feleky, 1914). Several researchers have investigated whether or not exaggerated facial expressions can improve human recognition of emotion from still images of faces (Calder et al., 1997; Hess et al., 1997; Benson et al., 1999). While some researchers found that exaggeration improved emotion recognition and increased perceptions of emotional intensity (Calder et al., 1997; Hess et al., 1997), other researchers found only an increase in perceived emotional intensity. During social interactions people generally need to interpret moving faces; therefore researchers have also investigated the effect of facial motion on emotion recognition (Bassili, 1978; Ambadar et al., 2005; Krumhuber et al., 2007). Bassili (1978) found that participants could accurately identify basic emotions from facial motion when viewing point-light displays of facial expression. People are very good at recognizing intense displays of emotion without motion, but Ambadar and colleagues (2005) found that motion improved recognition accuracy for subtle facial expressions of emotion. Exaggeration of dynamic facial expressions also improves emotion recognition accuracy, at least for point-light displays (Pollick et al., 2003). These experiments imply that emotional facial expressions should be easier to recognize with the addition of exaggerated motion; however, the exaggeration will also cause the expressions to be perceived as more intense.

When people see emotional facial expressions, they naturally and unconsciously mirror the expression (Dimberg, 1982). Mirroring facilitates social interaction (Lakin et al., 2003; McIntosh, 2006). When an observer mirrors the emotional expression of another person, the mirroring helps to initiate and modulate that emotion in the observer (McIntosh et al., 1994; McIntosh, 1996). The observer is also better able to perceive and interpret the emotional facial expression (Niedenthal et al., 2005). This facial mimicry also helps spread the emotion in a social process known as emotional contagion (McIntosh et al., 1994; Lundqvist and Dimberg, 1995). Emotional contagion is a quick and efficient communicative mechanism, as exemplified by Frith (2009): "The sight of a fearful face is likely to be a cue that there is something to be afraid of and that the observer should therefore be vigilant". Facial mimicry can even make the observer find the other person more appealing (Cappella, 1993). Emotional facial expression mirroring is very important for developing rapport between people.

2.1.2 Intention and mental state

Facial expressions not only communicate emotional state, but they can also communicate mental state and intentions (Keltner and Haidt, 1999). Our faces can communicate sympathy and embarrassment. We can voluntarily choose which facial expressions to display, thus sending deliberate communicative signals (Parkinson, 2005). For example, when people make eye contact with somebody in pain, the observers tend to show more intense expressions of reciprocal pain than when they do not make eye contact, thus signaling sympathy to the person in pain (Bavelas et al., 1986). If a person smiles at someone else, that smile could mean that the person is happy and approachable; however, not all smiles indicate happiness and approachability. People will fake expressions to hide how they really feel (Ekman, 1985). They may do so to appease others or to appear more trustworthy. Krumhuber and colleagues (2007) investigated the effects of facial dynamics on indicators of trustworthiness and found that participants were extremely sensitive to the facial motion information. Participants who saw "fake" smiles, created by manipulating the velocity and timing of the smile, cooperated less than those who saw "authentic" smiles. Clearly, facial expression and motion affect people's behavior. Detecting and making sense of these expressions are therefore important for successful social exchange (Scharlemann et al., 2001).

2.1.3 Identity and Impression Formation

We recognize and identify people by the physical appearance of their faces as well as by their facial motion. We are experts at recognizing faces at different distances, with different expressions, at different angles, and at different ages (Zebrowitz, 1997). As Judith Donath (2001) points out, "Indeed our notion of personal identity is based on our recognition of people by their face. To be faceless is to be, according to the Oxford English Dictionary, 'anonymous, characterless, without identity'". Our identity includes our gender, age, and ethnicity. These pieces of information are easily recognizable on the face, and they are used to form impressions regarding our competence and personality. Many times these impressions are based on stereotypes. A wide-eyed, small-nosed, "baby-like" visage results in impressions that the person will be trusting, naive, kind, and weak (Zebrowitz, 1997). Motion, too, can contain identifying information. Using point-light displays of faces, participants could determine age and gender from facial motion alone (Zebrowitz, 1997; Hill and Johnston, 2001). Facial motion also makes recognition of faces easier, especially for unfamiliar faces (O'Toole et al., 2002; Hill and Johnston, 2001).

2.1.4 Communicative mechanism

Facial motions, including eye gaze, nodding, and eyebrow movement, are particularly important during conversation because these motions help regulate and clarify the conversation (Kendon, 1970; Condon and Osgton, 1971; Duncan, 1972; Bruce and Young, 1998; Cassell and Thorisson, 1999). Occasionally these motions replace words, as when wrinkling the nose indicates disgust, nodding indicates agreement, and looking upwards indicates thinking. During conversation, people adjust their behavior to one another. For example, if a speaker sees that a listener is thinking, the speaker may wait for the listener before moving on, or the speaker may repeat what he said to clarify his statement. These behaviors do not occur randomly and are synchronized to the semantic contents of conversation and to the conversants' behaviors (Condon and Osgton, 1971; Kendon, 1970). In other words, people influence each other with their behavior during conversation. Until recently, it was thought that nodding patterns were attributable to gender: women nod more when speaking to women, men nod less when speaking to men, and women nod more than men when speaking to each other. In a recent experiment, Boker and colleagues (2011) swapped confederates' apparent genders and found that male participants adjusted their nodding behavior to the actual confederate gender and behavior (men nodded more when speaking to women disguised as men), thus providing evidence that nodding patterns are based on reactions to others' behavior. Gaze is also an important conversational cue, and people are quite adept at interpreting gaze. People can indicate what they are talking about just by looking at it ("That is pretty"). Looking at or away from others indicates how socially accessible one wants to be. When students do not want to be called on in class, they avert their gaze to avoid making eye contact with their teacher. When people take turns speaking, the current speaker will often look at the person who speaks next to ensure a smooth transition of speakers (Kendon, 1967).

2.1.5 Children vs. adults

Children understand from a very young age that the face can communicate emotion (for a review, see Widen and Russell, 2008). Children differ from adults in their recognition of facial expressions, but their ability to correctly identify and generate labels for emotional facial expressions improves over time. In several studies, Widen and colleagues discovered that children’s errors in emotion recognition were systematic (Widen and Russell, 2008, 2010; Widen and Naab, 2012).


These systematic errors indicate that children first develop broad categories for emotions, such as positive and negative, and over time create subcategories with narrower criteria. For example, a young child may at first classify anger and fear together as negative emotions, but eventually learns the difference between these two emotional expressions. The age at which children are capable of identifying the basic emotions (happiness, sadness, fear, anger, surprise, and disgust) is still debated. In general, children become more accurate as they age. Two- and three-year-old children have not quite learned the basic emotion categories, although they are very good at identifying positive and negative emotions (Widen and Russell, 2008). By the time children are four and five years old, they are much better at identifying happiness, sadness, anger, and fear (Widen and Russell, 2008). By the time they are ten years old, they are mostly accurate at identifying basic facial expressions, with girls slightly outperforming boys (Knowles and Nixon, 1989). These results imply that children’s ability to recognize emotion from facial expressions changes greatly between four and ten years of age. One study found significant differences between the abilities of six-, eight-, and ten-year-olds, and also found that eight- and ten-year-old girls were significantly better at emotion recognition than age-matched boys (Knowles and Nixon, 1989). The stimulus for this study was a cartoon dog, indicating that children are capable of recognizing emotional expressions on unrealistic faces (Knowles and Nixon, 1989).

2.2 Faces in human-computer interaction

Giving computers human faces provides a powerful social cue to the people who interact with those computers. We must be careful in designing those faces, “for the face is replete with social cues and subtle signals; a poorly designed facial interface sends unintended, inaccurate messages, doing more harm than good” (Donath, 2001). Synthetic faces are not yet as realistic in appearance and behavior as real faces. Many researchers have investigated how people perceive virtual characters with facial expression and motion. We review the literature on how people perceive synthetic faces that convey emotion, intent, and identity.

2.2.1 Emotion

Emotional facial expressions are extremely influential during human-human interaction. It follows that emotional facial expressions on interactive characters could have similar influence during human-computer interaction. First, we must know whether emotional facial expressions are recognizable on synthetic characters, as their appearance is generally simpler than a real human’s. Researchers have found that basic emotions are recognizable on synthetic faces, especially when the facial expressions are intense. Baldi, a 3D head animated with visual speech, could recognizably produce the six basic emotions at intense levels (Massaro et al., 2000). Likewise, Kätsyri and Sams (2008) found that basic emotions were recognizable on static and dynamic synthetic faces if the expressions were intense; for subtler expressions, however, motion was necessary for correct emotion recognition. This result is similar to what Ambadar and colleagues (2005) found with human faces.


Knowing that emotions are recognizable on synthetic faces, researchers next looked at the influence emotionally expressive characters could have on people during human-computer interaction. In a review of recent work investigating the effects that emotional agents have on human users, Beale and Creed (2009) found that emotive characters had a positive effect on human learners in educational and fitness domains, reduced participant frustration while playing games, and encouraged cooperation in collaborative situations. In an experiment using emotive co-learner agents represented by static photographs, participants not only rated the emotive agents as more appealing and competent than the non-emotive agents, but also performed better on tests when working with the emotive agents (Maldonado et al., 2005). Basic empathetic agents can positively affect gaming situations. In one experiment, participants saw a still photo of a happy, neutral, or sad face as feedback, and these simple cues were enough to reduce participant frustration while they played blackjack (Brave et al., 2005). People develop closer relationships with emotive characters used to influence behavioral change than with non-emotive characters. Laura, a virtual exercise advisor, was developed to encourage people to exercise. Participants given the emotive version of Laura reported developing closer bonds with her than participants given the non-emotive version (Bickmore and Picard, 2005). Just as mirroring is important in human-human interaction, characters that displayed mirroring behavior were considered more emotionally intelligent than agents that did not mirror users (Burleson and Picard, 2007). In an experiment where participants used avatars, Fabri and colleagues (2005) found that participants given emotive avatars were more cooperative than those using non-emotive avatars. The emotive avatar was a 3D head with pre-animated facial expressions of basic emotions that could be selected by the user; the non-emotive avatar was also a 3D head, but it did not include the animated facial expressions. As evidenced by these experiments, emotionally expressive interactive characters, even extremely basic ones, can positively influence human-computer interaction.

2.2.2 Intention and mental state

People interpret the facial expressions of animated characters much as they interpret the facial expressions of real people, especially when the characters are realistic looking. When people interact with realistic characters, they expect the characters to exhibit realistic social cues. To develop trust and cooperation, characters must exhibit competence (Parise et al., 1996). Human users can correctly interpret characters’ emotional displays as indicators of whether the characters will cooperate or betray them in an iterated Prisoner’s Dilemma game (de Melo et al., 2011; Choi et al., 2012). Even the differences between genuine and fake smiles are recognizable on synthetic faces (Krumhuber and Kappas, 2005).

2.2.3 Identity and Impression Formation

Stereotyping and overgeneralization are problems in human-human interaction. A virtual character masks the physical appearance of its user, making it difficult to identify the user’s ethnicity, age, gender, and physical features. Characters animated with real human motion may still divulge some of this information, and communicative behavior will certainly add identifying information (Hill and Johnston, 2001). Users must be aware that avatars still have


appearance; therefore, impressions based on appearance will still be formed. Researchers investigated whether a character’s race and gender would affect users’ impressions of the character’s personality (Cloud-Buckner et al., 2009). They discovered that even when behavior was identical between characters that differed in gender and coloring, participants’ ratings of personality traits were still influenced by the character’s coloring and gender. Other researchers found that characters with large eyes and slow blink rates were considered intelligent, attractive, and sociable, whereas average blink rates indicated friendliness (Takashima et al., 2008; Weibel et al., 2010). We must ensure that characters representing real people send the right social signals, as mismatched or missing social cues are often misinterpreted (Donath, 2001).

2.2.4 Communicative mechanism

Unsurprisingly, humans respond to nonverbal behavior from animated characters; gaze, nods, blinks, and smiles remain important for developing rapport and ensuring smooth interaction (Cassell and Thorisson, 1999). Garau and colleagues (2001) discovered that people perceived avatars with realistic eye gaze to be friendlier and more trustworthy than avatars with random eye gaze. The more realistic eye gaze also helped participants coordinate turn taking. Interestingly, virtual characters may elicit more self-disclosure from people than real people do (Gratch et al., 2007). Gratch and colleagues (2007) designed an agent with positive listening feedback, including nods, gaze, head tilts, and posture. Compared to face-to-face communication, participants engaged in conversation longer with the agent, because strangers interacting with one another tend not to exhibit as many positive listening feedback behaviors as the agent displayed. In other words, agents and avatars can be designed to be better conversational partners if they exhibit appropriate and recognizable social cues.


Chapter 3

Experimental Setup

In this chapter, we describe the experimental setup used in this thesis. All three projects in the thesis require animated characters or avatars controlled by confederates. We are concerned with creating believable avatars and need their motion to mimic the pacing, style, and facial gestures of humans; therefore, we chose to have confederates “puppet” our avatars to capture and reproduce those behaviors in real time. There are several methods for retargeting human motion to an avatar. One approach is motion capture, in which markers, strategically placed on a person’s face, are tracked, and the virtual markers on the character’s face are analogously moved. Unfortunately, motion capture is not very good at capturing blinks and is unable to capture gaze direction. Wearing the markers is also quite unnatural, and this strangeness could lead to unnatural behavior. We intend to have our confederates control the avatars while they are having a conversation with our subjects. If we compared video to avatar conditions, we would want our confederates to take the markers off in the video condition, but then they would not be blind to the condition. For this thesis, the characters and avatars will be animated using tracked data from active appearance models, or AAMs (Cootes et al., 2001, 2002; Matthews and Baker, 2004). This technique tracks a confederate’s face without external markers and maps the confederate’s facial motion to the avatar, allowing the confederate to speak and move freely. Section 3.1 describes this technique in more detail.

We have designed a desktop-like audiovisual telecommunications system that enables two people to converse with one another with a shoulders-and-up view. Not only do the two users see and hear one another, but, unlike most desktop conferencing systems, the users can also make eye contact with one another. We use a beam splitter made of reflective material so that we can hide the camera directly in front of our users and so that our users can make eye contact (see Figure 3.1). Users also appear life size with our system, and they can be shown as is or as avatars, enabling us to investigate interactions between people and avatars. When the system displays an avatar instead of video, audio and eye gaze are still enabled. Because we use AAMs to map motion, the person controlling the avatar, our confederate, can still move and speak freely. The AAMs also allow us to modify our confederate’s facial motion before it is mapped to the avatar. In real time, we can spatially exaggerate and damp our confederate’s facial motion, thereby manipulating the intensity of the avatar’s expression. In the sections that follow, we describe the AAMs, our audiovisual telecommunications system, and a study we conducted to validate that the intrinsic delay our system adds to a conversation does not negatively affect the interaction.

Figure 3.1: Example of two people using the first version of our audiovisual telecommunications system.

3.1 Active Appearance Models

For a non-invasive method capable of capturing facial motion, including eye gaze and blinks, we turned to Active Appearance Models (AAMs) (Cootes et al., 2001, 2002; Matthews and Baker, 2004). This computer vision method requires creating a virtual model of the shape and appearance of a person and an avatar. Once the model of the person is learned, the person’s face can be tracked and the corresponding points in the avatar model can be moved.

We used 2D Active Appearance Models (AAMs) to track the confederates and synthesize the animated faces (Cootes et al., 2001, 2002; Matthews and Baker, 2004). An AAM consists of two independent models that describe shape and appearance variation (see Figure 3.2). We used these models to define all possible face shapes and appearances for our confederates and characters. Our face shapes were vectors of 79 landmark coordinates, $s = (x_1, y_1, \ldots, x_{79}, y_{79})^T$. We created the shape model with hand-labeled training videos. The shape model is defined in Equation 3.1, where $s$ is a new shape, $s_0$ is the mean shape, and the vectors $s_1$ through $s_m$ are the largest basis vectors that span the shape space. The shape parameters $p_i$ indicate how much each corresponding basis vector contributes to the overall face shape. A new shape can then be expressed as the mean shape plus a linear combination of the $m$ shape bases:

$$s = s_0 + \sum_{i=1}^{m} s_i p_i \qquad (3.1)$$
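As a concrete illustration, the following sketch (Python with NumPy; the array layout is our assumption, not part of the thesis) synthesizes a new face shape from the mean shape and basis vectors of Equation 3.1:

```python
import numpy as np

def synthesize_shape(s0, shape_bases, p):
    """Equation 3.1: s = s0 + sum_i s_i * p_i.

    s0          : (158,) mean shape, 79 (x, y) landmarks flattened
    shape_bases : (m, 158) basis vectors s_1..s_m, one per row
    p           : (m,) shape parameters
    """
    return s0 + shape_bases.T @ p

# Toy example with a random five-vector basis.
rng = np.random.default_rng(0)
s0 = rng.standard_normal(158)
bases = rng.standard_normal((5, 158))
new_shape = synthesize_shape(s0, bases, np.array([0.5, 0.0, -0.2, 0.0, 0.1]))
```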

Figure 3.2: Example of the shape and appearance models, which can be warped to create new facial postures. We simplified this figure from the article Active Appearance Models Revisited (Matthews and Baker, 2004).

The appearance model is defined similarly in Equation 3.2, with appearance defined over the pixels $\mathbf{x} = (x, y)^T$ that lie within the mean face shape. $A(\mathbf{x})$ is the new appearance, $A_0(\mathbf{x})$ is the mean appearance, $A_1(\mathbf{x})$ through $A_l(\mathbf{x})$ are the largest bases spanning the appearance space, and the appearance parameters $\lambda_i$ indicate the amount that each appearance base contributes to the new appearance:

$$A(\mathbf{x}) = A_0(\mathbf{x}) + \sum_{i=1}^{l} \lambda_i A_i(\mathbf{x}) \qquad \forall \mathbf{x} \in s_0 \qquad (3.2)$$

We followed the procedure of Boker, Theobald, and colleagues (Boker et al., 2009; Theobald et al., 2009) to exaggerate and damp the facial motion of the characters. Multiplying the face shape variation by values greater than 1 exaggerates the motion; multiplying it by values less than 1 damps the motion. This method of exaggeration and damping affects all facial features. We did not track body motion, so the torsos of our animated characters move rigidly about a pivot located at their mouths. Our characters always face forward, as they are created from 2D data. We added rigid points around the tops of the characters’ heads to prevent warping, and we damped the face border and nose points by 50% to ensure that the characters’ faces and noses did not appear squished or stretched when the confederates turned their heads slightly.
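A minimal sketch of this gain-based manipulation, reusing the shape model of Equation 3.1 (the index layout and the per-point handling of the 50% border/nose damping are our assumptions):

```python
import numpy as np

def retarget_with_gain(s0_avatar, avatar_bases, p_tracked, gain, damped_idx):
    """Scale tracked shape variation before synthesis: gain > 1
    exaggerates the motion, gain < 1 damps it.

    s0_avatar    : (158,) avatar mean shape
    avatar_bases : (m, 158) avatar shape basis
    p_tracked    : (m,) parameters tracked from the confederate
    gain         : e.g. 1.4 for 140% motion, 0.6 for 60%
    damped_idx   : coordinate indices of face-border and nose points
    """
    s = s0_avatar + avatar_bases.T @ (gain * p_tracked)
    # Pull border/nose points halfway back toward the mean shape so
    # the face does not appear squished or stretched on head turns.
    s[damped_idx] = s0_avatar[damped_idx] + 0.5 * (s[damped_idx] - s0_avatar[damped_idx])
    return s
```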

To transform users into avatars, the system needs the users’ AAM models and images of the avatars to be animated. For this dissertation, we will only transform confederates into avatars; therefore, we only need our confederates’ AAM models and images of the avatars they will be animating. Each confederate’s motion was retargeted to characters of the same gender. We did not modify the duration of motion even though our manipulations did change spatial and temporal motion. In other words, the time it took a confederate to start and end a motion, such as opening and closing his mouth, was the same regardless of manipulation; however, the confederate’s smile would be bigger and his lips would move faster in the case of exaggeration. Because the duration of motion was unchanged, the confederate’s audio remained synchronized with the modified motion. This procedure has been used successfully to manipulate appearance and motion during live video-conference interactions (Boker et al., 2009; Theobald et al., 2009). See Figure 3.3 for a still image of the tracking and retargeting in real time.

Figure 3.3: A colleague sits in front of the audiovisual telecommunications setup, which records her, tracks her facial motion (lower right window of the monitor), and retargets her motion to an avatar (lower left window of the monitor).

AAMs have some limitations, and unfortunately these limitations affect all of our projects. We cannot transfer our confederates’ motions to our characters perfectly, because 2D AAMs cannot measure motion accurately when the confederates turn their heads or close their eyes and mouths completely. We have asked our confederates to limit their head motion. When the confederates do turn their heads slightly, this motion translates into torso motion in the animated characters. The lack of head turns and the additional torso motion may be perceived as unnatural. Many other animation techniques could have been used; however, we chose 2D AAMs because they are relatively quick and inexpensive to use. They are also customizable to individual people, so subtle facial expressions are transferred to the animated characters.

Figure 3.4: Diagram illustrating how the audiovisual telecommunications system works.

To better understand how the limitations of 2D AAMs may bias our results, we will examine the AAMs’ tracking error. Once we have constructed the 2D AAM model of a confederate, we can record new video of the confederate and test the model by comparing the tracked points to manually labeled points. This comparison should give us a quantitative measure of the 2D AAM’s tracking error. If this error is large and is biasing our results, we may need to explore the possibility of using 3D AAMs, which can be derived from 2D AAMs (Xiao et al., 2004). 3D AAMs use a 3D shape model that constrains the shape parameters, forcing the AAM to move consistently with the 3D shape model.
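One straightforward way to compute such a measure, sketched here under the assumption that tracked and hand-labeled landmarks are stored as per-frame 79 × 2 arrays:

```python
import numpy as np

def mean_tracking_error(tracked, labeled):
    """Mean Euclidean distance (pixels) between AAM-tracked and
    hand-labeled landmarks, averaged over points and frames.

    tracked, labeled : (n_frames, 79, 2) arrays
    """
    per_point = np.linalg.norm(tracked - labeled, axis=2)  # (n_frames, 79)
    return per_point.mean()
```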

3.2 Custom Audiovisual Telecommunications System

Our custom audiovisual telecommunications system was designed so that participants could converse using free speech and facial expression. Users of the system should feel like they are conversing with someone seated across from them. People on screen appear life size and make eye contact as they would if physically in the same space. Besides enabling eye contact, our system is unique because it can transform users into avatars that are animated with the users’ own facial motion. The system consists of two setups (A and B), each in a separate room (Experiment Rooms A and B), with a control room in between. From the control room, researchers can monitor activity in both experiment rooms. The rooms are arranged such that Setup A (B) sends all video and audio data through the control room, where it is recorded and then passed to Setup B (A). Each setup captures the audio and video of the person in front of it, sends that data to the control room, processes video data received from the control room, displays the processed video on its screen, and plays the audio received from the control room. Any audio processing is done in the control room via a mixing board and voice changer before the audio is sent to its destination. For a diagram of the system, refer to Figure 3.4.

Each setup consists of a chair, a computer, and a black box. The box houses a monitor, camera, speaker, microphone, and beam splitter. The black box sits on top of a height-adjustable table so that we can ensure that the camera is in line with the person’s face. We have taken great care to seal all the edges of the box so that no light can enter. The absence of light hides the innards of the box from view, allowing the beam splitter to work properly. Users sit comfortably in front of the box. The monitor sits on top of the box, and the monitor’s image is reflected down onto the beam splitter. The beam splitter is a partially reflective and partially transparent mirror made of glass or mylar. Participants can make eye contact with each other because the camera is behind the beam splitter (Figure 3.5). A shotgun microphone, below the beam splitter, captures the user’s voice. A speaker is mounted under the camera, directly behind the beam splitter, so that audio originates directly in front of the user. We also have the option of using headphones instead of the speaker. If we are using the system to display video, then the video from the control room is displayed as is on the monitor. If we are using the system to display an avatar, then we use the computer to track the speaker in the video, retarget this motion to the avatar, and display the animated avatar on the monitor. We add delay to the audio with the mixer in the control room to ensure that the audio and video are in sync. Our measurements indicate that there is 67 ms of delay inherent to our system. We have validated that this delay has a negligible effect on impressions of users of our system, the interaction between users, and our system itself. The details of the study we conducted are in Section 3.3.

Figure 3.5: 3D models of the second version of our audiovisual telecommunications system. (a) A side view of the box containing the camera, beam splitter, microphone, and monitor. (b) A view from the back of the box.

We created two versions of our system. Both use Audio Technica shotgun microphones, Sony PMW-EX3 cameras, AJA Kona capture cards, Apple 6-core 2×2.93 GHz Mac Pros, and a Yamaha 01v96 digital audio mixer. In the first version, the boxes and beam splitters in each setup were smaller (12 in. × 14 in.), the beam splitters were made of glass, and we only used headphones. We used the first version to conduct our study investigating the effects of delay on conversation, described in Section 3.3. The second version of our system has speakers, larger boxes, and larger beam splitters made of mylar (25.75 in. × 27.25 in.). The mylar is thinner than glass and does not block the sound from the speaker placed behind it.

3.3 Evaluating the Impact of Delay on Dyadic Conversations

Audiovisual telecommunication systems support natural interaction by allowing users to remotely interact with one another using natural speech and movement. Network connections and computation cause delays that may result in interactions that feel unnatural or belabored. In an experiment using our custom audiovisual telecommunications device, synchronized audio and video delays were added to participants’ conversations to determine how delay would affect conversation. To examine the effects of visual information on conversation, we also compared the audiovisual trials to trials in which participants were presented only audio information.


Authors                     Delay (ms)   Pair Types
Riesz and Klemmer, 1963     600          co-workers
Klemmer, 1967               600          co-workers
Krauss and Bricker, 1967    900          strangers
Kitawaki and Itoh, 1991     560          co-workers and strangers
Kurita et al., 1994         300          co-workers
Holub et al., 2007          500          strangers

Table 3.1: Audio delay thresholds found in prior work.

We present self-report data indicating that delay had a weaker impact when both audio and video channels were available, for delays up to 500 ms, than when only the audio channel was available.

3.3.1 Related Work

Visual telecommunication systems are popular because they support more natural forms of interaction than telephones and text-based chat rooms. Non-verbal behaviors, such as head nods, facial expressions, eye blinks, eye gaze, and lip movements, are available during interaction, and these behaviors are important cues that improve the ability to express understanding, agreement, and attitude, enhance verbal descriptions, interpret pauses, and take turns (Duncan, 1972; Isaacs and Tang, 1993). Unfortunately, visual telecommunication systems are subject to network and computational delays, which can negatively impact users’ interactions. Audio delays cause people to interrupt each other more frequently and to spend more time gaining control of or clarifying the conversation (Kurita et al., 1994; Vartabedian, 1966).

Industry experts suggest that audio delays be kept below 200 ms (Percy, 1999; Polycom, 2006). Many researchers have identified thresholds at which delay becomes noticeable or interferes with audio conversation, but these thresholds are inconsistent (see Table 3.1). Although all of the delay thresholds were determined with free conversation tasks, they range from 300 to 900 ms. The studies conducted in English (Klemmer, 1967; Krauss and Bricker, 1967; Riesz and Klemmer, 1963) suggest that the delay threshold is within 600-900 ms, but these are also the oldest studies. The newer studies (Holub et al., 2007; Kitawaki and Itoh, 1991; Kurita et al., 1994), which also happen to be in non-English languages, suggest that the delay threshold is within 300-560 ms. Besides the language difference, these lower thresholds may be due to the fact that participants were directly asked about delay and interference.

Prior research has investigated the differences between audio and audiovisual platforms with regard to communicative efficiency and the noticeability of delay. Isaacs and Tang (1993) evaluated the differences in interaction between individuals collaborating on a task using both audio and audiovisual telecommunications systems. They found that the addition of video allowed participants to better understand each other and express themselves. Turn taking within the conversations was easier with video than without, and overall, the interactions were considered easier than the audio-only interactions. Kurita and colleagues (1994) examined the noticeability of delay with participants who used both audio and audiovisual systems. They found no differences in participants’ perceptions of delay regardless of which system they used. We investigated the differences between audio and audiovisual platforms in terms of the quality of interaction.

Trial   Topic
1       favorite food
2       favorite vacation
3       hobby
4       dream vacation home
5       event to plan
6       favorite restaurant
7       activity to try

Table 3.2: Topics used in the study.

3.3.2 Hypothesis

Prior research suggested that delays would become noticeable between 200 and 600 ms. To discover exactly how much delay it would take to negatively impact conversations, each participant was exposed to seven different delay conditions between 67 and 900 ms. We expected that long delays would cause conversations to feel unnatural and uncomfortable.

Because prior research had not reached a clear conclusion regarding the possible benefits of visual information in the presence of delay, we also investigated the differences in conversational attribute ratings between audio and audiovisual conversations. We expected that participants using the audiovisual system would experience more natural and more comfortable conversations than those using the audio system, because nonverbal information is so important in normal conversation. We hypothesized that delay would have less of a negative effect on the naturalness and comfortableness of the conversation if video were available. Prior research also suggested that audio delay increases the number of interruptions in a conversation, but because non-verbal information is so important to turn taking (Duncan, 1972; Isaacs and Tang, 1993), we expected that delay would have less of an effect on the number of perceived interruptions when participants could see one another.

3.3.3 Materials and Method

We examined conversational attribute ratings in a controlled laboratory experiment with adult, native English speakers. Each pair of participants had seven conversations about selected topics, each of which was followed by a short survey asking participants to rate the conversation on various attributes. Half of the pairs conversed with both audio and video, while the other half conversed with audio only. A different amount of delay was inserted into each of the seven conversations.


Participants

We advertised our study on a university experiment scheduling site. Fifty-six adults participated in this study (age range: 18-59 years; median age: 24 years; 28 females). Participants, who were strangers to one another, were run in same-gender pairs. All participants completed informed consent forms approved by the Institutional Review Board. Participants were paid $15 for the hour-long study.

Experimental design

We used a repeated measures experimental design. Delay was a within-subjects factor with seven conditions (67, 200, 300, 400, 500, 600, and 900 ms), and communication channel (CC) was a between-subjects factor with two conditions (audio only, or audio and video). Delay conditions were chosen based on previous research and extensive pilot testing. In the audio-only CC condition, participants saw a static desktop image of a purple sky on the screen. The delay conditions were assigned to participant pairs in a 7 × 7 Latin square design. Participants were randomly assigned to a CC condition. Finally, based on their CC condition, a pair was assigned to one of four Latin squares, resulting in one square for each CC condition and gender combination.
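The proposal does not record which particular square was used; as an illustrative sketch, a simple cyclic construction gives every pair all seven delays, with each delay appearing once in every serial position across pairs:

```python
def cyclic_latin_square(conditions):
    """n x n Latin square by rotation: each condition appears once
    per row (participant pair) and once per column (serial position)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

delays_ms = [67, 200, 300, 400, 500, 600, 900]
square = cyclic_latin_square(delays_ms)  # square[r]: delay order for pair r
```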

The topic ordering was kept the same across all participant pairs (see Table 3.2). Before each conversation, participants were given topic sheets that included sample questions and basic prompts that could be used to keep the conversation alive.

Procedure

Each pair of participants completed consent forms at the study location. They were then taken to separate study rooms containing the audiovisual telecommunications stations. The experimenters informed the participants that they would have seven 4-minute conversations using the stations and that they would be given topic sheets for inspiration. Participants could use a small timer to keep track of their conversation, and they were informed that the experimenter would interrupt the conversation once four minutes had passed. Once seated, participants were given headphones and the first topic sheet. Participants were told to start whenever they both were ready. The experimenters then left the study rooms to monitor the conversations from a nearby location.

After four minutes, the experimenters interrupted the conversations, gave the participants short surveys to complete, and presented the next topic sheet. This process was repeated for each of the seven trials. After all seven trials, participants completed a questionnaire asking about their favorite conversations and any difficulties with understanding the other participant.

Measures

Immediately following each conversation, participants rated the conversation on nine five-point scale items. We chose the questions to reflect our main interest in the flow of conversation and how delay might disrupt it. We combined some items after exploratory factor analysis suggested they loaded on the same factor. Table 3.3 lists the questions.


Scale                     Questionnaire Items                                      Alpha*
Topic likeability         Did you like or dislike the topic? Do you think          0.8646
                          your partner liked or disliked the topic?
Comfortableness           How comfortable or uncomfortable did you feel?           0.8875
                          Did you find your partner comfortable or
                          uncomfortable?
Naturalness               How was the flow of this conversation? How natural       0.8585
                          or unnatural did you find this conversation? Was
                          this conversation like or unlike an in-person
                          conversation?
Perceived interruptions   How many times did you and your partner interrupt        NA
                          one another?
Perceived pace            How quick or slow was your partner to respond?           NA

Table 3.3: Questionnaire items administered after each conversation. *Cronbach’s alpha is a measure of the reliability of the scale as a whole; it ranges from 0.0 to 1.0.
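Cronbach’s alpha can be computed directly from the item-score matrix; a minimal sketch, assuming one row per respondent and one column per scale item:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    for an (n_respondents, k_items) matrix of scale-item scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)
```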

3.3.4 Results

The self-report data indicated that video weakened the negative impact of delay on naturalness for delays up to 500 ms, whereas in conversations with no video, delays at or above 400 ms negatively impacted naturalness. Once delays were at or above 600 ms, conversations in both CC conditions were perceived as significantly less likeable, comfortable, and natural. Interruptions increased with delay and were not mitigated by the addition of video.

Effects of delay

We found delay to have a significant impact on all scale items except pace, which was only marginally affected (see Figure 3.6). We discovered that as delays increased, likeability of the topic decreased, F(6, 318) = 2.43, p = .03. Participants especially disliked topics presented with delays at or above 600 ms compared to those presented with shorter delays, F(1, 318) = 6.15, p = .01. Delay also had a significant effect on comfortableness, F(6, 318) = 2.29, p = .04, and naturalness, F(6, 318) = 3.29, p = .004, with both qualities decreasing as delay increased. We expected long delays to make conversations feel unnatural and uncomfortable. When conversations were presented with delays at or above 600 ms, they were rated significantly more unnatural, F(1, 318) = 16.95, p < .0001, and uncomfortable, F(1, 318) = 6.95, p = .009, than conversations with delays between 67 and 500 ms.

Interruptions significantly increased with delay, F(6, 318) = 6.56, p < .0001, and as depicted in Figure 3.6, the amount of delay was significantly correlated with the number of interruptions, r(392) = 0.1891, p < .001. Overall, conversation pace was only marginally affected by delay, F(6, 318) = 1.64, p = .14, but when delays were at or above 600 ms, the pace of the conversation was considered significantly slower than in conversations with delays between 67 and 500 ms, F(1, 318) = 5.82, p = .02.
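The reported correlation is an ordinary Pearson r; a sketch with hypothetical per-rating vectors (the variable names and dummy data are ours, not the study’s):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: the delay in effect for each post-conversation
# rating, and the corresponding perceived-interruption rating.
delay_ms = np.tile([67, 200, 300, 400, 500, 600, 900], 56)
interruptions = np.random.default_rng(1).integers(1, 6, delay_ms.size)

r, p = pearsonr(delay_ms, interruptions)  # df = n - 2
```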


[Five-panel plot: Likeability, Comfortableness, Naturalness, Lack of Interruption, and Pace ratings (3.5-4.5) versus delay (67-900 ms).]

Figure 3.6: Effects of delay on conversation perceptions. The main effect of delay on perceptions across all scales except for pace is statistically significant, p < .05.

We found an interaction effect of communication channel and delay on naturalness, F(6, 318) = 2.27, p = .04. Contrast tests revealed that conversations with short delays of 67-300 ms did not significantly differ in naturalness between the two CC conditions. Conversations with mid-length delays of 400 and 500 ms maintained their naturalness in the audio-and-video CC condition but decreased in naturalness in the audio-only CC condition, F(1, 77) = 3.84, p = .05. Conversations with delays at or above 600 ms were the most unnatural, F(1, 318) = 16.95, p < .0001. In other words, video weakened the negative impact of delay on conversation naturalness for delays up to 500 ms, while audio-only conversations were negatively impacted by delays at or above 400 ms.

Prior research indicated that delay would negatively impact conversation or become noticeable at some point between 300 and 900 ms. We consistently found, across all of our conversation attributes, that conversations with delays above 500 ms were negatively impacted. All participants were given the opportunity to comment on any technological or communicative difficulties during the study. Participants were also told the study’s purpose after their conversations and were asked if they had noticed any delays. Only 16 of the 56 participants (28.6%) indicated that they were aware of any delay, suggesting that most strangers conversing with one another will not consciously notice even delays above 500 ms. This difference from previous studies may indicate that people today are more accustomed to delay due to the popularity and widespread use of internet telephony and video chat.


[Line plot: perceived naturalness (3.5-4.5) versus delay (67-900 ms) for the Audio and Audio & Video conditions.]

Figure 3.7: Effects of delay and channel on perceived conversation naturalness. The interaction effect of delay and channel on the perception of naturalness is statistically significant, p < .05.

Additional analyses

The topic of conversation had a significant main effect on topic likeability, F(6, 318) = 11.69, p < .0001, with “favorite food,” “event to plan,” and “dream vacation home” being the least favorite topics. We considered that participants might need the first trial, “favorite food,” to become acquainted with the equipment and each other. If this were true, the first trial should have scored significantly lower than the other trials, including the other trials with disliked topics. Because it did not, we kept the first trial in the rest of our analysis.

3.3.5 Conclusion

Prior research suggested that audio delays between 300 and 900 ms would not only be noticeable but would also negatively impact remote interactions. In our experiment, strangers conversing with one another indicated that delays negatively impacted likeability of the conversation topic, comfortableness, naturalness, pace, and interruptions. In particular, delays at or above 600 ms had a significantly stronger impact than delays between 67 and 500 ms. Video was found to weaken the negative impact of delay on naturalness for delays up to 500 ms, whereas audio-only conversation naturalness suffered from delays at or above 400 ms. This difference may stem from the fact that audiovisual interaction allows participants to see nonverbal information. These findings are promising for our future work with this audiovisual telecommunications system, because we can easily stay under 500 ms of latency, even when we track a confederate and animate an avatar during an interaction.


Chapter 4

Modifying Facial Motion to Influence Perceptions of Animated Characters

For this project, we conducted a pretest and two controlled laboratory studies. In the pretest, we evaluated people’s sensitivity to motion changes in cartoon and more realistic-looking human characters. This pretest helped us determine the amounts of exaggeration and damping to apply to our characters. The first study evaluated the influence of exaggeration and damping on perceptions of the animated characters. For this experiment, participants watched animations of cartoon and more realistic-looking characters reading positive stories, each with a different amount of facial motion. Participants then rated each character on questions concerning likeability, trustworthiness, intelligence, and extroversion. The second study was similar in format to the first, except that participants watched animations of the characters telling either positive or negative stories. Participants were then asked to rate each character on respectfulness, attentiveness, calmness, extroversion, and positivity. In the sections that follow, we provide the method and results for the pretest and both experiments, as well as a discussion of limitations.

4.1 Related work

Exaggerated movement is a signature of cartoon-style motion, and it has been widely accepted as a guiding principle by animators who try to create the “illusion of life” (Lasseter, 1987; Thomas and Johnston, 1981; Hodgkinson, 2009). Exaggeration is used to make movement, intention, and emotion more salient. This animation principle has even been borrowed by the robotics community to make human-robot interaction more intuitive and engaging (Gielniak and Thomaz, 2012; Ribeiro and Paiva, 2012; Takayama et al., 2011). As characters have become more visually realistic, there has been more demand that their motion also be more realistic (Hodgkinson, 2009). In a corporate setting, Inkpen and Sedlins (2011) found that users were more particular about their own avatar’s appearance than about the appearance of the avatars with which they interacted. They also found that 65% of participants believed it was important for avatars to convey their user’s personality.

McDonnell and colleagues (2012) investigated the effect of rendering style and motion on perceptions of a virtual human’s appeal, friendliness, and trustworthiness. Participants saw still images of a character in different rendering styles, or short clips (6-10 seconds) of the animated character in different rendering styles. Motion did not have a significant effect on appeal, friendliness, or trustworthiness; however, rendering style, as rated by participants, had a significant effect on all three traits. The characters rendered in the most appealing styles were also rated as the friendliest and most trustworthy. Realistic and cartoon characters were equally well liked, but characters in the middle of the cartoon-to-real spectrum were not.

4.2 Determining facial motion sensitivity

We conducted a perceptual experiment to determine sensitivity to motion changes in cartoon and more realistic-looking characters. We wanted to ensure that we were changing character behavior in such a way that it would be noticeable and would influence participants in future studies. To determine when exaggerated and damped motion would be perceived as different from original motion, we had participants compare animations that differed only in the amount of motion. We ran two small studies to investigate motion sensitivity in cartoon and more realistic-looking characters.

4.2.1 Participants

We advertised our studies on a university experiment scheduling site. Twenty adults participated in each of the studies (forty adults total). Unfortunately, we did not begin collecting age and gender information until after the studies were underway, because we did not hypothesize that participants’ sensitivity to motion would be gender or age dependent. We did collect gender and age information for 28 participants (age range: 18-61; median age: 24; 15 females). All participants read and signed informed consent forms approved by the Institutional Review Board. Participants were compensated for their time.

4.2.2 Materials

We used a ten-second clip of an actress reading a story to create our animations. An artist created a cartoon (toon) female character and a more realistic-looking (CG) female character for us (see Figure 4.1), which we animated by retargeting the tracked motion of our actress. Additional stimuli were created by exaggerating and damping the actress’s motion in 20% increments from original motion (100%), with a maximum motion level of 180% and a minimum motion level of 20%. We presented all study stimuli and collected all participant responses using Apple 27-inch flat panel LED cinema displays connected to machines running OS X 10.6, Matlab, and the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007). Participants input their responses with a keyboard.


Figure 4.1: Examples of the (a) toon and (b) CG characters side by side, with the characters on the right displaying normal motion and the characters on the left displaying damped motion.

4.2.3 Procedure

Participants arrived at the study location and completed consent forms with the experimenter present to answer any questions. Participants were then led to the study room and seated in front of the experiment monitor and keyboard. Participants were given verbal and written instructions explaining that they would compare two animated characters side by side and would need to decide which character was moving more, or whether the characters were moving the same amount. Participants were told to press 1, 2, or 3 on the keyboard, where 1 indicated that the left-most animated character was moving more, 2 indicated that the right-most character was moving more, and 3 indicated that the characters were moving the same amount. These instructions were displayed for each trial of the experiment. Participants always saw two animations at a time; one of the animations was always presented with original motion, while the other was damped, exaggerated, or the same. Participants were not told that one of the animations was always the same across trials. In the first study, participants only saw the cartoon character, and we tested eight motion levels ranging between 20% and 180% of original motion in 20% increments. In the second study, participants only saw the more realistic-looking character. Because we wanted better granularity of the sensitivity threshold, we tested 28 motion levels ranging from 30% to 170% of original motion in 5% increments.


[Line plot: accuracy (20-100%) versus percent of original motion (20-180%) for the Toon and CG characters.]

Figure 4.2: Motion level pretest results.

4.2.4 Results

For the cartoon character study, we ran a one-way analysis of variance (ANOVA) on participant accuracy and found a significant effect of motion level, F(7, 152) = 7.70, p < .0001. Participants performed above chance at all motion levels except the 120% level, the least exaggeration tested. Using a Student’s t-test to compare all means, we found that accuracy at the 140% and higher levels was significantly higher than accuracy at the 120% level. Similarly, accuracy at the 60% and lower levels was significantly better than at the 80% level. These results indicate that motion at or below 60% and at or above 140% is noticeably damped or exaggerated, as depicted in Figure 4.2.

For the realistic-looking character study, we conducted a similar analysis of participant accuracy. We again found a significant effect of motion level, F(27, 907) = 22.72, p < .0001. Participants performed above chance at all motion levels except those between and including 85% and 110%. A Student’s t-test comparing all means revealed that participants performed well at motion levels at or below 75% and at or above 130%. Participants performed slightly above chance at the 80% motion level as well as at the motion levels between and including 115% and 125%. These results indicate that, on the more realistic-looking character, motion at or below 75% and at or above 130% is noticeably damped or exaggerated, as depicted in Figure 4.2.
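The proposal does not state how the above-chance comparisons were tested; one simple check for a single motion level is an exact binomial test against the 1/3 guessing rate (the counts below are hypothetical):

```python
from scipy.stats import binomtest

# Three response options (left more / right more / same): chance = 1/3.
# Hypothetical counts for one motion level: 18 correct of 40 trials.
result = binomtest(k=18, n=40, p=1/3, alternative='greater')
print(result.pvalue)  # small p => accuracy reliably above chance
```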

4.2.5 Discussion

These results indicate that people have slightly different sensitivity levels to facial motion in cartoon and more realistic-looking animated characters only when the motion is slightly modified. Motion differences may have been easier to notice on the more realistic character because its skin texture is more detailed, making the motion more apparent. Because the cartoon character’s texture is flat, areas of the cheeks and forehead appear static when the character moves its mouth or eyebrows, making it more difficult to notice motion differences. From our results, participants seem to be more sensitive to slight damping than to slight exaggeration. This sensitivity difference is illustrated by the asymmetrical curve centered at 100% in Figure 4.2. Accuracy for the slightly damped characters (< 100%) was higher than for the slightly exaggerated characters (> 100%); however, once the motion was 100% ± 40%, participants had nearly perfect accuracy. Participants also seem to be more sensitive to damping in the more realistic-looking character than in the cartoon character, as illustrated by the higher accuracy for that character. These results informed our motion level conditions for the following studies.

4.3 Perceptual effects of facial motion and appearance

We examined participants’ impressions of animated characters in two controlled laboratory studies. Participants viewed animated realistic and cartoon characters with differing amounts of motion. Participants viewed each character once as he or she told a unique story, and each participant saw each motion level only once. The two studies differed on whether the characters told emotionally positive or negative stories and on which personality traits we asked participants to rate.

4.3.1 Study One

For this study, participants watched characters tell positive stories. After watching each character, participants rated the character on items related to likeability, trustworthiness, intelligence, and extroversion. We discovered that perceptions of likeability and intelligence differ depending on character rendering style and motion level. We also confirmed that perceptions of likeability and trustworthiness are correlated and that the amount of motion is positively correlated with perceptions of extroversion.

Hypotheses

It is believed that character rendering style and motion level should “match”; therefore, our first hypothesis (H1) is that participant ratings of character likeability and intelligence will be higher for characters with matching rendering style and motion level than for characters with mismatched rendering style and motion level:

(H1a) With exaggerated motion, cartoon characters will be rated higher than realistic-looking characters;
(H1b) With damped motion, realistic-looking characters will be rated higher than cartoon characters;
(H1c) For realistic-looking characters, damped motion will be rated higher than exaggerated motion;
(H1d) For cartoon characters, exaggerated motion will be rated higher than damped motion.

Prior research found that extroverts had faster body movement times than introverts (Wickett and Vernon, 2000; Doucet and Stelmack, 1997; Stelmack et al., 1993), suggesting that we should find a positive correlation between motion level and ratings of extroversion (H2).

The relationship between extroversion and intelligence is unclear, as there is work supporting a positive correlation (Wickett and Vernon, 2000; Roberts, 1997) and more recent work supporting a negative correlation (Wolf and Ackerman, 2005). We expect our results to support a negative correlation between extroversion and intelligence (H3).

Finally, research on expert witnesses in court found that likeability and trustworthiness were positively correlated (Brodsky et al., 2009); therefore, we expect to find that well-liked animated characters will also be rated as trustworthy (H4).

Participants

We advertised our study on a university experiment scheduling site. Thirty-four adults participated in this study (age range: 18-62 years; median age: 22.5 years; 18 females). We eliminated data from two participants due to equipment problems. All participants read and signed informed consent forms approved by the Institutional Review Board. Participants were compensated for their time.

Surveys and Questionnaires

Participants completed the Ten Item Personality Inventory (TIPI) prior to seeing any animations (Gosling et al., 2003). After each animation, participants completed a questionnaire asking for their impressions of the character they had just seen. We asked twelve questions on these topics in the form of five-point rating scales. Principal component analysis followed by a factor rotation indicated that the twelve questions loaded onto three factors with acceptable reliability (Cronbach’s α):

• Likeability: Five questions on perceived likeability, trustworthiness, sincerity, reliability, and warmth loaded onto this measure. (Cronbach’s α = 0.81)
• Intelligence: Three questions on perceived intelligence, competence, and how well informed the character was loaded onto this measure. (Cronbach’s α = 0.76)
• Extroversion: Four questions on perceived extroversion, inhibition (reverse scored), sociability, and dramaticism loaded onto this measure. (Cronbach’s α = 0.71)

We presented all questionnaires and study stimuli on Apple 27-inch flat panel LED cinema displays connected to machines running OS X 10.6, Matlab, and the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007). Participants wore headphones and input their responses with a keyboard.

Characters and Animations

We created eight different characters, four in a cartoon style (toon) and four in a more realistic style (CG). There were an equal number of male and female characters per rendering style. Characters of the same gender differed by skin tone, hair color, eye color, and shirt color (see Figure 4.3). Animations were 1086 × 639 pixels and presented at 60 frames per second.

We recorded an actor and an actress, from the shoulders up, reading eight short stories using a Sony PMW-EX3 camera and an Audio Technica shotgun microphone. The stories ranged in length from 1:24 to 1:42 (min:s). We used the recordings to create 2D AAMs of the actors’ faces. These models were used to track the actors’ motion, which was then retargeted to our characters. We used the same audio across all animations. The audio tracks were created from a single actor’s recording of a specific story. For a description of the AAMs and retargeting, please refer to Section 3.1.

Figure 4.3: The eight characters in cartoon and realistic rendering styles. Panels: (a) CG Female I, (b) Toon Female I, (c) CG Female II, (d) Toon Female II, (e) CG Male I, (f) Toon Male I, (g) CG Male II, (h) Toon Male II.

Story Calibration

A single author wrote 24 short stories from the same point of view. We calibrated the stories on interest and emotional complexity, positivity, and intensity through Amazon’s Mechanical Turk. We collected 16 to 21 independent ratings for each story. We used a one-way analysis of variance (ANOVA) on emotional complexity followed by pairwise t-tests to find 18 statistically similar stories. Five stories were eliminated for exemplifying negative emotion. We then plotted the thirteen stories that were left and selected the eight stories that were most similar with respect to emotional intensity and interest. To avoid repeating stories, we used an equal number of stories and characters.
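As a deliberately simple illustration of this screening step, the sketch below runs a one-way ANOVA over per-story complexity ratings and then flags stories involved in significantly different pairs; the data structure, threshold, and flagging heuristic are assumptions, and the real selection also weighed valence, intensity, and interest.

from itertools import combinations
from scipy import stats

def similar_stories(ratings_by_story, alpha=0.05):
    """ratings_by_story: dict mapping story id -> list of Mechanical Turk
    complexity ratings. Returns the ids never flagged as different."""
    groups = list(ratings_by_story.values())
    f_stat, p_overall = stats.f_oneway(*groups)  # any overall differences?
    if p_overall >= alpha:
        return list(ratings_by_story)  # nothing distinguishable; keep all
    flagged = set()
    for (a, ra), (b, rb) in combinations(ratings_by_story.items(), 2):
        _, p_pair = stats.ttest_ind(ra, rb)
        if p_pair < alpha:  # this pair differs; flag both stories
            flagged.update([a, b])
    return [s for s in ratings_by_story if s not in flagged]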

Motion Levels

From the studies we described in Section 4.2, we determined that motion was clearly exaggerated or damped when it differed by ±40% from the original motion for both our cartoon and more realistic-looking characters. Based on these results, we chose to use the following eight motion levels: 60%, 70%, 80%, 90%, 110%, 120%, 130%, and 140%. We provide a sample frame at each of these motion levels in Figure 4.4.
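A motion level can be read as a gain applied to deviations from a neutral expression. The sketch below shows this scaling on tracked facial parameters; the parameter layout is an assumption on our part, since the actual tracking and retargeting pipeline is the one described in Section 3.1.

import numpy as np

def apply_motion_level(frames, neutral, level):
    """frames: (n_frames, n_params) tracked facial parameters per frame.
    neutral: (n_params,) parameters of the neutral expression.
    level: fraction of original motion, e.g. 0.6 damps and 1.4 exaggerates."""
    frames = np.asarray(frames, dtype=float)
    neutral = np.asarray(neutral, dtype=float)
    # Scale each frame's offset from neutral; level = 1.0 reproduces the input.
    return neutral + level * (frames - neutral)

# e.g., damp a clip to 60% of its original motion:
# damped_clip = apply_motion_level(tracked_clip, neutral_pose, 0.60)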

Figure 4.4: Sample still frames of the eight different motion levels: (a) 60%, (b) 70%, (c) 80%, (d) 90%, (e) 110%, (f) 120%, (g) 130%, (h) 140%.

Experimental Design

We used a repeated-measures experimental design. Motion level and rendering style were within-subjects factors with eight and two conditions, respectively. Because we wanted participants to form impressions of the characters’ personalities, we did not allow participants to see any character, motion level, or story more than once. Therefore, participants viewed eight different characters, each displaying a different level of motion, and each telling a different story. We constructed paired orthogonal Latin squares, which simultaneously counterbalanced immediate sequential effects and pairing of conditions (Lewis, 1989). This pair of Latin squares was used to select participants’ trial conditions and trial order. Across all participants, every (character, motion) pair occurred twice, and the order was counterbalanced to control for possible order effects.
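Our design used paired orthogonal Latin squares (Lewis, 1989). As a simpler, related illustration, the sketch below generates a Williams-style balanced Latin square, which counterbalances immediate sequential effects for an even number of conditions such as our eight motion levels; it does not reproduce the additional pairing constraints of the Lewis construction.

def balanced_latin_square(n):
    """Each row is one presentation order of conditions 0..n-1. For even n,
    every condition immediately precedes every other condition equally often."""
    square = [[((j // 2 + 1 if j % 2 else n - j // 2) + i) % n
               for j in range(n)]
              for i in range(n)]
    if n % 2:  # odd n needs the reversed rows appended for balance
        square += [row[::-1] for row in square]
    return square

# e.g., candidate presentation orders for the eight motion levels:
for order in balanced_latin_square(8):
    print(order)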

Procedure

Participants arrived at the study location and completed consent forms. An experimenter was present to answer questions and explain the study tasks to participants. The experimenter led participants to the study setup, and participants completed the TIPI. After completion, participants advanced to another instruction screen explaining that they would see a series of animations, each of which would be followed by a questionnaire asking about their impressions of the character in the animation. The participants then viewed each animation and answered the subsequent questions. A screen before each animation reminded participants that they could take a break. At the end of the study, the participants were shown a thank-you screen, and then the experimenter debriefed and paid them. The experiment took no longer than 40 minutes. The experimenter stayed near the participant during the study so that participants could ask questions or take breaks at any time.

Figure 4.5: Interaction effects of motion level and rendering style on (a) likeability and (b) intelligence. Both panels plot mean ratings against percent of original motion (60% to 140%) for toon and CG characters.

Results

We conducted a repeated-measures ANOVA to analyze the possible effects of motion level and rendering style on our measures of likeability, intelligence, and extroversion. We found that motion level and rendering style influence participant impressions, although not always in ways that we expected. We also found a significant main effect of rendering style on impressions of character likeability and intelligence. As expected, we found a relationship between motion level and extroversion.
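For readers who want the shape of this analysis, the sketch below runs a two-factor repeated-measures ANOVA with statsmodels on synthetic, fully crossed data; the column names and values are fabricated for illustration, and because our actual design was incomplete (each participant saw only eight of the sixteen style-by-level combinations), the real analysis handled the missing cells differently.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
levels = [60, 70, 80, 90, 110, 120, 130, 140]
rows = [(s, m, style, rng.normal(3.5, 0.5))
        for s in range(30) for m in levels for style in ("toon", "cg")]
df = pd.DataFrame(rows, columns=["participant", "motion", "style", "likeability"])

# Two within-subjects factors: motion level (8) and rendering style (2).
result = AnovaRM(df, depvar="likeability", subject="participant",
                 within=["motion", "style"]).fit()
print(result)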

Hypothesis H1 stated that character rendering style and motion level should match based on animation principles. This hypothesis was only partially supported. The interaction between motion level and rendering style was marginal for likeability and significant for intelligence, F(7, 211) = 1.78, p = 0.092 and F(7, 207) = 2.91, p = 0.006, respectively (see Figure 4.5). We investigated these interactions in more detail with post hoc contrast tests.

While investigating H1a, we found that exaggerated cartoon characters were rated equal to or lower than exaggerated realistic-looking characters. The difference in likeability ratings was not significant when examining all levels of exaggeration (110% to 140%) together, F(1, 223) = 3.49, p = 0.06; however, ratings of intelligence were significantly higher for exaggerated realistic characters than for exaggerated cartoon characters, F(1, 218) = 14.47, p = 0.0002. With slight exaggeration, 120%, the realistic characters were significantly more likeable and intelligent than the cartoon characters, F(1, 217) = 4.99, p = 0.03 and F(1, 212) = 16.05, p < 0.0001, respectively.

To investigate H1b, we compared the likeability and intelligence ratings of damped (60% to 90%) cartoon characters to the ratings of damped realistic-looking characters, and we found that the ratings did not differ significantly, F(1, 223) = 0.24, p = 0.62 and F(1, 218) = 1.58, p = 0.21, respectively. When motion was noticeably damped (60% and 70%), the cartoon characters were significantly less likeable than the realistic characters, F(1, 221) = 4.89, p = 0.0281, confirming H1b for likeability when motion was extremely damped.

Figure 4.6: Rendering style trends for likeability and intelligence (mean perceived ratings for toon and CG characters).

We expected that damped realistic characters would be more likeable and intelligent than exaggerated realistic characters (H1c); however, we found that slightly exaggerated (110% and 120%) realistic characters were preferred to slightly damped (80% and 90%) realistic characters, F(1, 215) = 4.95, p = 0.03 for likeability and F(1, 210) = 6.67, p = 0.01 for intelligence. We found similarly surprising results when investigating H1d: slightly damped cartoon characters were considered more intelligent than slightly exaggerated cartoon characters, F(1, 210) = 7.20, p = 0.008. Our unexpected results can partially be explained by the main effects of rendering style on likeability, F(1, 195) = 4.44, p = 0.037, and (marginally) on intelligence, F(1, 196) = 3.69, p = 0.056 (see Figure 4.6).

As hypothesized in H2, we found a significant main effect of motion level on perceptions of extroversion, as depicted in Figure 4.7, F(7, 197) = 2.14, p = 0.04. A post hoc contrast verified that extroversion ratings did not differ significantly at the 60% and 70% levels, F(1, 198) = 2.56, p = 0.11. There was a significant positive correlation between motion level and extroversion, r = 0.14, p = 0.02, suggesting that as motion level increases, perceptions of extroversion increase.

Although we suspected (H3) that there would be a correlation between extroversion and intelligence, as previous research had suggested (Wickett and Vernon, 2000; Roberts, 1997; Wolf and Ackerman, 2005), we found no such correlation. However, our data support hypothesis H4, that likeability and trustworthiness are positively correlated. The questions regarding likeability and trustworthiness all loaded onto the same factor during principal component analysis.

We were concerned that perceived character age could be confounded with rendering style, and specifically that realistic characters might be perceived as older than cartoon characters. We asked participants to estimate the age of the characters after they saw each one. We found no significant effect of rendering style on perceived character age, F(1, 25) = 1.26, p = 0.26.

Figure 4.7: Effect of motion level on extroversion (mean extroversion ratings plotted against percent of original motion, 60% to 140%).

Discussion

We have examined the effects of rendering style and motion level on impressions of character personality. We wanted to know whether the character rendering style should match the amount of facial motion. We found evidence to support this idea when characters were extremely damped (60% and 70%) because the cartoon characters were then significantly less likeable than the realistic-looking characters.

Surprisingly, we also found support for a preference for characters with mismatched rendering style and motion level. Slightly exaggerated realistic-looking characters were more likeable and intelligent than slightly exaggerated cartoon characters. Because the cartoon characters are flatter, the slightly exaggerated motion may not have been perceived as well on the cartoon characters as on the realistic-looking characters. The cartoon characters may have appeared as if they were “missing” some facial motion, and this flaw may have decreased their appeal. In contrast, the more realistic-looking characters may have appeared slightly more expressive, which improved their appeal. Independent of motion level, our participants perceived the more realistic characters as more likeable and intelligent than the cartoon characters.

Previously, body motion speed and extroversion had been correlated; however, no previous studies investigated the relationship between the amount of facial motion and extroversion. We identified a positive relationship between these variables. Prior research was inconclusive with regard to the relationship between extroversion and intelligence, and we found no relationship; however, we did find a positive correlation between likeability and trustworthiness, as expected.

We used rendering styles similar to the Toon Flat and Toon CG styles used by McDonnell and colleagues (McDonnell et al., 2012). In contrast with their results, we found a significant difference between ratings of likeability and trustworthiness across the two styles. Our realistic characters were perceived as significantly more likeable and trustworthy than our cartoon characters. This difference may have been due to our animation technique, as McDonnell and colleagues used full motion capture data to animate their characters.

We chose to use positive stories about ordinary situations. Context may have an important impact on perceptions of character personality. It is quite possible that exaggerating motion during expressions of negative emotion may have a different effect on participants’ perceptions of personality, and we explored this question in Study Two.


4.3.2 Study Two

For this study, participants watched characters tell either positive or negative stories. After watching each character, participants rated the character on items related to respectfulness, attentiveness, calmness, extroversion, and positivity. We discovered that damped characters were perceived as more respectful and calm than exaggerated characters. As in Study One, the exaggerated characters were still perceived as more extroverted than damped characters. Characters telling positive stories were considered more respectful, calm, and attentive than characters telling negative stories.

Hypotheses

We hypothesized that in negative situations, more motion would translate into characters that looked jumpy and fidgety, thus making them appear less respectful, calm, and attentive; however, in positive situations, we hypothesized that more motion would create characters that looked excited and enthused, thus making them appear more respectful and attentive. Based on results from Study One, we also hypothesized that motion would again be correlated with ratings of extroversion.

Participants

Sixty-six adult participants who had not completed any of our other studies took part in this study (age range: 18-58; median age: 23; 31 females). We eliminated data from two participants because they were personally familiar with the actors we used to animate our characters. Participants were recruited from the same university experiment scheduling site used before. All participants read and signed informed consent forms approved by the Institutional Review Board. Participants were compensated for their time.

Surveys and Questionnaires

Because we did not find any interesting correlations between participant personality and their perceptions of the animated characters in Study One, we did not ask participants to complete the TIPI for this study. We are interested in seeing avatars used for therapy and counseling; therefore, we focused our survey questions on qualities that are important for therapists and counselors. We asked participants about extroversion again because we wanted to see if the correlation would hold for characters telling negative stories. As a check of our story valence, we asked participants to rate the characters’ positivity. In total, we asked 16 questions in the form of five-point rating scales. Principal component analysis followed by a factor rotation indicated that 15 questions loaded onto four factors with acceptable reliability (Cronbach’s α). The question on positivity was not included in this analysis.

• Respectfulness: Seven questions on perceived respectfulness, patience, considerateness, acceptance, understanding, humbleness, and sensitivity. (Cronbach’s α = 0.91)

• Attentiveness: Two questions on perceived attentiveness and carefulness. (Cronbach’s α = 0.54)

• Calmness: Three questions on perceived calmness, contentedness, and untroubledness. (Cronbach’s α = 0.79)

• Extroversion: Three questions on perceived extroversion, dramaticism, and inhibition (reverse scored). (Cronbach’s α = 0.51)

Method and Procedure

Study Two was similar in method and procedure to Study One. We used the same characters, actors, motion levels, and equipment. In this study, participants either saw eight characters tell positive stories or saw eight characters tell negative stories; no participant saw characters tell both positive and negative stories. The positive stories were the same as those used in Study One. From our original story calibration data, we selected the eight most emotionally negative stories as well. Our experimental design was also similar to the design of Study One, except that participants were randomly split into two groups to determine whether they would see positive or negative characters. We followed the same procedure as in Study One.

Results

Because our actors’ performances varied in range of movement, we first calculated normalized values for their motion from their original recordings. From these values, we could determine damped, normal, and exaggerated ranges of motion. When we created our animations with all the different motion levels, we organized them into these ranges of motion. The animations with the most motion were dominated by a single actor, and likewise the animations with the least amount of motion were also dominated by a single actor. We excluded these ten extreme animations (26 of 512 trials) from our analyses so that a single actor would not become associated with a particular type of motion. We investigated possible actor influence but found that our results were not significantly different; therefore, in our analysis we do not differentiate the actors.
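A minimal sketch of this normalization step, assuming each clip is summarized by a single motion-magnitude value per actor; the z-score thresholds used to bin clips are illustrative, not the ones from our analysis.

import numpy as np

def motion_bins(clip_motion, lo=-0.5, hi=0.5):
    """clip_motion: per-clip motion magnitudes for one actor.
    Returns z-scores and a damped/normal/exaggerated label per clip."""
    clip_motion = np.asarray(clip_motion, dtype=float)
    z = (clip_motion - clip_motion.mean()) / clip_motion.std(ddof=1)
    labels = np.where(z < lo, "damped",
                      np.where(z > hi, "exaggerated", "normal"))
    return z, labels

# Normalizing per actor keeps one actor's larger range from dominating a bin.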

We asked participants to rate how positive or negative they found the characters to check that our stories were truly based on positive or negative emotion. We found a significant effect of story valence on ratings of character positivity, F(1, 76) = 147.73, p < 0.0001 (see Figure 4.8). Interestingly, we also found that damped characters were considered more positive than normal and exaggerated characters, F(1, 442) = 9.54, p = 0.002 (see Figure 4.9). We found no significant interaction between story valence and motion; therefore, damped characters were perceived as more positive than normal and exaggerated characters regardless of story valence.

In general, we found a halo effect with the positive characters (see Figure 4.8). When participants saw positive characters, they rated the characters as more respectful (F(1, 78) = 88.70, p < 0.0001), calmer (F(1, 75) = 135.42, p < 0.0001), and more attentive (F(1, 76) = 20.09, p < 0.0001). We found no interactions between story valence and motion; however, we did find significant main effects of motion on ratings of respectfulness and calmness, F(2, 446) = 7.95, p = 0.0004 and F(2, 442) = 6.87, p = 0.0012 (see Figure 4.9). Post hoc contrasts indicate that participants perceived damped characters to be more respectful than normal characters, F(1, 448) = 8.48, p = 0.0038. Similarly, participants perceived damped characters to be calmer than normal and exaggerated characters, F(1, 444) = 5.45, p = 0.020. Participants also found cartoon characters to be calmer than the more realistic characters, F(1, 427) = 4.82, p = 0.03. As expected, we found a significant main effect of motion on ratings of extroversion, F(2, 438) = 7.02, p = 0.001.

Figure 4.8: Influence of story valence on ratings of character respectfulness, calmness, attentiveness, and positivity.

Figure 4.9: Influence of motion level (damped, normal, exaggerated) on ratings of character respectfulness, calmness, extroversion, and positivity. Ratings at marked motion levels (*) were significantly different (p < 0.05).

Discussion

We found that damped characters were considered significantly more respectful and calmer than normal and exaggerated characters. This difference may have been because the damped characters looked less fidgety than the normal and exaggerated characters, regardless of the situation. Cartoon characters were also considered calmer than the more realistic-looking characters, perhaps because the toon characters appeared to be moving less due to their lack of textural detail. Unsurprisingly, we found a halo effect for the characters telling positive stories: they were considered, overall, more respectful, attentive, and calmer. We also discovered that damped characters were considered more positive than normal and exaggerated characters. As expected, we found that motion was still correlated with extroversion, regardless of the story valence.

4.4 Limitations

We conducted controlled laboratory experiments to examine the effects of rendering style and facial motion on viewers’ perceptions of character personality. Unfortunately, we only had one actor and one actress, and therefore we could not investigate the effects of character gender on perceptions of character personality because actor was perfectly confounded with character gender.

From the studies described in this chapter, we have learned that facial expressiveness, independent of audio, influences viewers’ perceptions of animated characters. The strongest influence occurred during moderate motion changes. We also noticed that participants were more sensitive to motion changes in the more realistic-looking characters than in the cartoon characters. Exaggerated characters did not strongly affect participants’ perceptions of traits besides extroversion, perhaps because the motion was perceived as unnatural, or perhaps because the audio overpowered the visual information. We know from prior research that audiovisual information cannot be completely separated, and the more emotional channel exerts the most influence (Massaro and Egan, 1996; de Gelder and Vroomen, 2000). Our exaggerated characters may have appeared emotionally disingenuous, and, therefore, participants may have relied more heavily on the audio information.

4.5 Conclusion

Our experiments revealed that viewer attitudes can be influenced by the facial motion intensity of animated characters. We made characters seem more likeable, trustworthy, intelligent, respectful, calm, extroverted, and positive simply by changing the amount of facial motion. This influence exists regardless of whether characters are discussing positive or negative situations. We also found that participants were more sensitive to motion changes in the more realistic-looking characters than in the cartoon characters, most likely due to the textural detail of the more realistic-looking characters. Our next steps include investigating whether facial motion intensity can also influence people’s behavior and not just their attitudes.


Chapter 5

Influencing Trust and Cooperation

In the last chapter, we described a project that examined how animated characters’ facial motion influences people’s impressions. In this chapter, we propose a project to investigate how avatar facial motion influences people’s impressions and behavior. Specifically, we will explore how avatar facial motion could encourage cooperative behavior and increase trust. Cooperation and trust are important for effective collaboration and conversation. Collaboration involves working together (cooperation), but also requires people to trust that their partners are credible and reliable. If we can influence how people judge avatars on qualities that are related to trustworthiness, then we can potentially influence how people cooperate with the avatars.

Several personality traits are correlated with influence and trustworthiness. We know from research on group dynamics that people trust and are influenced more by other people who present themselves as confident, composed, and competent (Burgoon et al., 1990). We also know that human facial expressiveness has been correlated with perceptions of greater competence, composure, and persuasiveness (Burgoon et al., 1990). From our own experiments (Hyde et al., 2013), we know that the exaggeration of animated characters’ facial motion is correlated with perceptions of extroversion and competence. A person who appears to be confident and extroverted may convince other people that he or she is more knowledgeable (Barry and Stewart, 1997). These findings lead us to believe that we could animate avatars to make them more influential and trustworthy during conversation by exaggerating their facial expressiveness.

For our study task, we will ask participants to independently rank a list of items. We will then have the participants work with a confederate-controlled avatar to come up with a mutual ranking. The confederate will argue for a permutation of the participants’ initial rankings. After the discussion, participants will rank the items again. Confederates, who are blind to the motion condition, will control avatars animated with exaggerated, unaltered, or damped motion. It is possible that confederates’ motion may be influenced by participant mirroring behavior, and, therefore, confederates may move more or less depending on motion condition; we will evaluate this possibility. Participants will only see one type of motion. This task will allow us to measure the avatar’s influence on participants’ beliefs in their rankings and participants’ cooperation with the avatars. If a participant changes his or her rankings so that they are closer to the confederate’s rankings, then these changes would be a measure of the avatar’s trustworthiness and influence over the participant. This experiment will demonstrate whether avatar facial motion is a strong enough cue to influence people’s behavior.


5.1 Related Work

The Desert Survival Problem (Lafferty and Pond, 1974) is a classic team-building exercise that has been used in many different research experiments to explore group and social dynamics. The problem has many variants, but in general a group or pair must work together to rank a given set of items in order of their importance for survival. The group is given a scenario such as, “You are driving in the middle of the desert, several days’ travel away from the nearest village, when your car breaks down. You manage to salvage several items.” The group is then tasked to rank the items based on importance for survival. To research social dynamics and cooperation, individuals in the group work separately at ranking the items first and then work together as a group. Afterwards, the group is separated and members have the opportunity to change their initial rankings. By having participants rank the items three times, researchers can compare how participants were influenced by the interaction with the other members of the group.

While investigating the effect of computer agent anthropomorphism on influence during the Desert Survival Problem, researchers discovered that computer agents were actually more influential than human partners (Burgoon et al., 2000). Participants changed their rankings more when they interacted with a computer than when they interacted with a human confederate. In the experiment, participants interacted face to face with a human confederate or one of five computer setups, which the authors describe as, “(1) text-only; (2) text and synthesized voice; (3) text, voice, and still image; (4) voice and animation (with the same image as in the previous condition but with facial features and a mouth that ‘moved’ in sync with the synthesized voice); and (5) text, voice, and animation”. Confederates followed the same script as the computer, but they could also respond to participant remarks. The researchers found minimal differences between the computers and human confederates in terms of credibility, but participants did prefer the computers as partners over the human confederates.

In a different experiment (Isbister and Nass, 2000), participants completed the Desert Survival Problem with a computer agent that exhibited extroverted or introverted verbal and nonverbal cues. The agent was a wooden mannequin, faceless and ungendered, but it was capable of making poses that were open (extroverted) or closed (introverted). An example of an open pose would be arms stretched out and head tilted up, whereas a closed pose would have arms behind the body and head tilted down. Agents had no voice and communicated with participants by text box. Extroverted agents asserted themselves with language that was confident and friendly, whereas introverted agents asked questions and made suggestions with weaker language. Because the researchers were interested in the effects of mismatched cues (extroverted verbal with introverted nonverbal), they did not report results for the influence of consistently extroverted and consistently introverted agents separately. Instead, the researchers noted that consistent agents were more influential than inconsistent agents. The researchers also discovered that participants preferred working with agents that complemented their personality; for example, extroverted participants preferred working with introverted computer agents. Because participant personality may interact with the avatar’s perceived personality, our participants will take a personality survey, which we will use to examine possible correlations between participant personality, avatar motion level, participants’ impressions of the avatar, cooperative behavior, and participant preferences.


5.2 Research Questions

We believe that the amount of avatar facial motion is a powerful enough cue to influence people’s decisions. We will evaluate the influence of exaggerated and damped avatars by comparing participant rankings before and after they interact with an avatar. We will also time participant interactions with the avatar as a measure of participants’ cooperation.

RQ1: Is the amount of facial motion influential enough to affect participants’ behavior during a cooperative task?

RQ2a: Will participants cooperate with exaggerated avatars more than damped avatars, just as people cooperate with extroverted people more than with introverted people?

RQ2b: Will introverted people prefer to cooperate with exaggerated avatars that appear extroverted, and, vice versa, will extroverted people prefer to cooperate with damped avatars that appear introverted?

RQ3: Will the confederate’s behavior be influenced by the participants mirroring the avatar motion?

RQ4: Will questionnaire responses reflect that participants perceive exaggerated avatars as more extroverted than damped avatars during interaction?

5.3 Hypotheses

We expect that exaggeration will have a similar effect on participants’ impressions of avatar extroversion as it did in our prior work (Hyde et al., 2013) and that of others (Burgoon et al., 2000). We hypothesize that participants will cooperate with exaggerated characters more than with unaltered or damped characters, and we expect that participants will cooperate the least with damped characters. Because other researchers (Isbister and Nass, 2000) found that extroverted people preferred working with introverted computer agents, and introverted people preferred working with extroverted computer agents, we also hypothesize that participants may prefer and compromise more with avatars that display a complementary personality.

H1a. Participants will compromise soonest with avatars exhibiting exaggerated motion, followed by avatars exhibiting unaltered motion, and then damped motion.

H1b. Interactions will be shortest with avatars exhibiting exaggerated motion, followed by avatars exhibiting unaltered motion, and then damped motion.

H2. Participants will find avatars exhibiting exaggerated motion the most extroverted, sociable, confident, credible, knowledgeable, and friendly, followed by avatars exhibiting unaltered motion, and then damped motion.

H3a. Introverted participants will prefer interacting with avatars that exhibit exaggerated motion to avatars that exhibit damped motion.

H3b. Extroverted participants will prefer interacting with avatars that exhibit damped motion to avatars that exhibit exaggerated motion.

H4. Participant preference will be positively correlated with length of interaction.


5.4 Measures

We will have participants complete the Ten Item Personality Inventory (Gosling et al., 2003) to categorize them as extroverted or introverted. Participants will complete this short survey before their interaction with the confederate avatar. We will also have participants complete a questionnaire to get their impressions of the confederate avatar and their interactions. We will ask participants to rate the avatar on qualities such as trustworthiness, credibility, extroversion, persuasiveness, knowledge, and friendliness. We will ask participants how enjoyable, relaxing, and natural they found the interaction.

We will measure cooperation in several different ways. We consider compromise one indicator of cooperation. Participants’ initial rankings will be compared to their rankings after interaction (pre-post values). The pre-post values will give us a quantitative measure of how influential the avatar was because the avatar will be arguing for a permutation of the participant’s rankings. In particular, we will be able to see how and to what degree our participants were influenced. For example, if a participant were to change his/her final rankings to match the confederate’s rankings, that would indicate that the confederate had a very strong influence. On the other hand, if the participant’s final rankings are unchanged from his/her initial rankings, then this would indicate that the confederate had very little influence on the participant. We will count the number of ranks changed, and we will also quantify how much the ranks changed. We will also compare participants’ initial rankings to the rankings they submit with the confederate (pre-confederate values). The pre-confederate values are a measure of how much the participant compromised with the confederate, and they may differ from the pre-post values. A participant may compromise more with the confederate during interaction, but then, when given the chance to submit final rankings, the participant may revert to what he/she originally believed. This scenario would indicate that the confederate encouraged cooperation with the participant but did not have much influence on the participant. We will analyze the participant-confederate interaction for other indicators of compromise and cooperation. For example, because our confederate will try twice to get a participant to compromise before agreeing with the participant, we can count the confederate’s attempts at compromise as a measure of the lack of participant cooperation. The length of the interaction indicates how difficult it was for the confederate and participant to reach consensus.
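As a concrete sketch of the planned pre-post comparison (the helper names are ours, not a committed analysis), the code below counts how many items changed rank and sums the absolute rank displacement, a Spearman footrule distance:

def ranks_changed(pre, post):
    """Number of items whose position differs between two orderings."""
    return sum(1 for i, item in enumerate(pre) if post[i] != item)

def rank_displacement(pre, post):
    """Total absolute change in positions (Spearman footrule distance)."""
    position = {item: i for i, item in enumerate(post)}
    return sum(abs(i - position[item]) for i, item in enumerate(pre))

# e.g., a participant swaps their top two items after the discussion:
print(ranks_changed(["map", "water", "knife"], ["water", "map", "knife"]))      # 2
print(rank_displacement(["map", "water", "knife"], ["water", "map", "knife"]))  # 2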

If necessary, we will label each participant-confederate exchange as an agreement, a question, an argument, a clarifying statement, or other. A large number of argument exchanges would indicate less compromise. We can also measure the confederate’s behavior during each condition. This measure may be interesting because, although the confederate will be blind to the motion level condition, mirroring behavior from the participant may encourage more or less motion from the confederate. It could be useful to compare actual confederate motion to avatar motion, and to see if the avatar facial motion levels affected confederate performance in any way.

5.5 Procedure

Participants will arrive individually at the study location, and they will be given time to read and ask questions about the consent form. After signing the consent form, an experimenter will escort the participant to the study room. The experimenter will sit with the participant and explain the Desert Survival Problem. The participant will receive a form with written instructions, a list of items with pictures and brief descriptions, and space to write his/her rankings. The experimenter will also inform the participant that he/she will be discussing the rankings together with a partner, and that together, the participant and partner must agree on a mutual set of rankings. The experimenter will also explicitly state that the partner will appear as an avatar. Once the participant has finished selecting the initial rankings for the items, the experimenter will have him/her sit in front of our audiovisual telecommunications setup. The confederate avatar will already be on screen. The experimenter will ask the confederate to introduce herself to get a short dialogue started between the confederate and participant. After the introductions, the experimenter will tell the confederate and participant to work together on the Desert Survival Problem, and, upon completion of the task, the participant should notify the experimenter, who will be sitting just outside the study room. The experimenter will then leave the study room and wait until the confederate and participant have finished. During this time, the confederate will follow a script to guide the interaction. This script is described in the next section. When the interaction is done, the experimenter will ask the participant to once again rank the items. The experimenter will explain that the rankings can be the same as or different from what was previously submitted. Once the rankings are submitted, the participant will be given a questionnaire asking for his/her impressions of the confederate. The participant will also complete a personality survey at this time so as not to bias the interaction. The participant will then be debriefed and introduced to the confederate.

5.6 Confederate Behavior

Participants and our confederate will work on a task inspired by the Desert Survival Problem (Lafferty and Pond, 1974). To complete the task, our confederate and participants must agree on how to rank the listed items. Our confederate will start the interaction by asking, “What did you list as most important?” Our confederate will then argue for a permutation of the participant’s ranking. We will use the same algorithm to permute all of the participants’ rankings. For example, if our algorithm swaps items one and two, then our confederate would always argue to swap items one and two regardless of what those items are. For each item, the confederate will have two reasons why the item should be ranked higher and two reasons why the item should be ranked lower. This strategy is similar to that of Morkes and colleagues (1999). When the confederate disagrees with the participant, she will try twice to get the participant to compromise. If the participant still disagrees with the confederate, then the confederate will concede and agree with the participant. When the participant and confederate have agreed on all the rankings, the confederate will say, “Great, I think we’re all done. What do you think?” Once the participant answers affirmatively, the interaction will end. The length of the interaction will be measured from when the confederate asks, “What did you list as most important?” to when the participant agrees with the confederate’s closing question.
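To illustrate the fixed-permutation strategy (the permutation shown is hypothetical; we have not yet chosen the actual one), the sketch below applies one positional permutation to any participant's initial ranking:

def confederate_target(ranking, perm):
    """ranking: the participant's initial ordering of items.
    perm: perm[i] is the position in the participant's ranking whose item
    the confederate argues should sit at position i. The same perm is
    applied to every participant's ranking."""
    return [ranking[p] for p in perm]

# e.g., always argue to swap the top two items, keeping the rest:
print(confederate_target(["water", "map", "knife", "rope"], [1, 0, 2, 3]))
# -> ['map', 'water', 'knife', 'rope']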


5.7 Experimental Design

We will use a between-subjects experimental design. Motion level will be our between-subjects factor with three conditions: exaggerated, unaltered, or damped facial motion. Each participant will only be exposed to one motion level, and our confederate will be blind to the condition. We will randomly select a motion level to show participants. We will select equally perceptible exaggeration and damping levels based on the results of the motion sensitivity pilot study described in Section 4.2.

5.8 Analytic Methods

We will conduct a Multivariate Analysis of Variance (MANOVA) to examine how motion level affects our multiple dependent measures of influence and cooperative behavior. Because we believe that exaggerated motion will have the greatest effect, followed by unaltered and damped motion, we have planned contrasts between exaggerated motion and everything else, exaggerated motion and unaltered motion, unaltered motion and damped motion, and exaggerated motion and damped motion. We will also investigate whether or not participant personality had an effect on our results. We will look for significant interactions between participant personality, motion level, and ratings of confederate personality. It is possible that some items may be consistently considered more important than other items. To control for this possibility, we will include participants’ original rankings as a covariate in our analysis.

5.9 Conclusion

From this study, we hope to learn whether or not we can encourage people to work with and trust avatars by exaggerating or damping the avatar’s facial motion. Because avatars are used for group work, counseling, and education, it is important to understand how avatar facial motion could influence cooperation and trust. This study will also provide evidence about whether avatars can be designed to influence people’s behavior.


Chapter 6

Increasing Children’s Engagement and Attention During Interaction

Children interact with animated characters on a frequent basis. Often these characters are on television shows or in educational games, in which case the interactions are staged and the characters cannot respond to unexpected behaviors. We envision that avatars like ours could be used to engage children in more responsive and dynamic interactions, creating a fun, novel, and exciting experience for the children. This research has practical applications because it can be difficult for children to converse with unfamiliar adults, but sometimes this is necessary, especially when the adults are authority figures such as lawyers, social workers, clinicians, educators, and researchers. Avatars may also ease conversations between adults and children with Autism Spectrum Disorders (ASD), who often have difficulty conversing with people. There are many competing theories for why children with ASD have difficulty conversing; however, some researchers have observed that children with ASD have done well conversing with other socially engaging technologies such as virtual peers and avatars (Tartaro and Cassell, 2008; Kandalaft et al., 2013). Although we will include only typically developing children in our study, we hope that in the future, avatars like ours could be used to help engage children with ASD in social exchanges.

We propose a controlled laboratory study to investigate the benefits of expressive avatars during interaction with children (4-10 years old). We expect that avatars and exaggerated facial motion will increase engagement and attention. For our study, children will converse with the same confederate three times using our audiovisual telecommunications system. The confederate will appear as herself by video, as an avatar with unmodified motion, and as an avatar with exaggerated motion. We will use a voice changer to pitch-shift the confederate’s voice so that the children think they are speaking to three different people. We will record these conversations and measure the length of the conversations, the children’s preferences, and the contents of the conversations. We will investigate whether children converse differently based on whom they appear to be talking to and whether the children have a preference for conversation partner. We expect that the children will prefer to talk to the avatars, and that the avatars will make the children feel more comfortable, thus allowing them to communicate better. We also believe that the exaggerated avatar will be perceived as more extroverted and that the children will talk more with it. This study will provide insight into how expressive avatars can be used to promote better conversations between adults and children.


6.1 Related Work

Children learn from a young age to pay attention to socially relevant information. By the time they are two years old, they are using social information that includes eye gaze, gesture, and emotional displays (for a review, see Baldwin and Tomasello, 2001). Although young children pay attention to socially relevant information from people who are physically present, there has been a question as to how children view people on screens. As adults, we understand that people who are not physically present may still provide useful information to us. For example, we listen to and learn from newscasters about what traffic to avoid or how to prepare for the weather. Children, on the other hand, do not necessarily listen to and learn from people on television. Troseth and colleagues (2006) conducted several studies in which two-year-old children watched people on monitors or in the same room give useful hints for a game. Only the physically present people and the people on monitors who interacted with the children were able to get the children to use the hints. These results emphasize the importance of interaction if young children are to learn from media. As avatars like ours are responsive and interactive, we expect that children will be able to have meaningful conversations with them.

When comparing children’s interactions with adults to their interactions with animated characters, researchers found that children had a much higher number of disfluencies in their speech when speaking with adults (Oviatt, 2000). Disfluencies, or broken speech, can make understanding what a child says difficult. If children are better able to express themselves when speaking with animated characters, then avatars like ours could be useful for interviewing children. We will compare our participants’ abilities to converse with our confederate when she is presented as herself through video or as an avatar.

In an experiment with older children (7-10 years old), Darves and colleagues (2004) investigated whether or not the personality of animated characters could enhance children’s experiences with educational software. The animated characters were assigned extroverted voices or introverted voices, but the content of their speech was identical. Children were more engaged with the extroverted characters, as evidenced by their increased number of questions. For our experiment, we will manipulate the confederate’s extrovertedness by exaggerating her motion. It will be interesting to discover whether or not motion, and not voice quality, is enough to increase children’s engagement with our avatars.

6.2 Research Questions

When children are bored or uncomfortable, they can be difficult to talk to (Druin et al., 1998; Druin, 1999). Avatars may be useful in this situation because children may find avatars more fun and less threatening than adults who are unfamiliar to them. We are interested in how adults can use avatars like ours to better communicate with children. For our study, we will characterize better communication as more engagement, more attention, and clearer language.

RQ1: How do children’s engagement and attention differ when conversing with a person shown on video versus avatars?

RQ2: How do children’s engagement and attention differ when conversing with unaltered and exaggerated avatars?

RQ3: Will children prefer to speak with unfamiliar avatars or unfamiliar adults?

RQ4: Do children use fewer disfluencies when conversing with avatars than with a person shown on video?

6.3 Hypotheses

We believe children will find the experience of speaking with an avatar fun, engaging, and novel. We expect that they will be more engaged with, and prefer to speak to, an avatar than with our confederate through video. We also expect that exaggerated avatars will appear more extroverted and friendlier to children, thereby encouraging engagement. Engagement will be measured by timing conversations. After the study, we will ask children who they preferred speaking to and determine whether their behavior matches their preference. We will also use recordings of the children’s conversations to characterize their speech during each conversation. We expect that children will use fewer disfluencies and will be easier to understand when they speak with avatars because similar results were described in prior research (Oviatt, 2000).

H1: Children will have longer conversations when our confederate is presented as an avatar than when she is shown on video.

H2: Children will have more disfluencies with our confederate when she is shown on video than when she is presented as an avatar.

H3: Children will have longer conversations when our confederate is presented with exaggerated motion than with unaltered motion.

H4: Children will prefer to speak to our confederate when she is presented as an avatar rather than when she is shown on video.

6.4 Measures

We will ask the children which conversation they enjoyed the most, who they enjoyed speaking with the most, and who they would want to speak with again if given the chance. The children’s responses will give some insight into how much they enjoyed conversing with each character, but it is unclear how reliable these self-reported measures will be; therefore, we will also use behavioral measures. We will time each conversation, and this behavioral measure will also give us insight into how engaging each conversation was. Because we will be recording the children, we will also be able to analyze their conversations. We are interested in whether or not children are more understandable when conversing with avatars than when conversing by video. To measure understandability, we will count the number of disfluencies during each conversation. Disfluencies during speech include self-corrections, filled pauses, repetitions, and false starts (Oviatt, 1995). As length of utterance is linearly related to rate of disfluencies (Oviatt, 2000), we will also count and measure the length of utterances. If necessary, we can also code each exchange as on topic or off topic as a measure of how well the confederate was able to keep the children’s attention. We can also calculate how long each child’s gaze was directed towards the screen if we need another measure of attention. Together, these measures will provide us with a better understanding of how children converse with unfamiliar adults through video, avatar, and exaggerated avatar.
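As a sketch of how a simple disfluency measure could be computed from transcripts (the filled-pause token list is illustrative; self-corrections, repetitions, and false starts would be hand-coded rather than detected this way):

FILLED_PAUSES = {"um", "uh", "er", "hmm"}

def disfluency_rate(transcript):
    """Filled pauses per 100 words of a transcribed utterance."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    hits = sum(w.strip(",.!?") in FILLED_PAUSES for w in words)
    return 100.0 * hits / len(words)

print(disfluency_rate("um I like the uh the dragon movie"))  # 25.0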

6.5 Procedure

Children ages 4-10 and their parents will arrive at the study location. The children will rotate through several research experiments, including ours. A researcher will present the consent form to parents and children, and the researcher will answer any questions they may have. When a child is ready for our study, the experimenter will take the child to our study room and seat the child in front of our audiovisual telecommunications system. Our confederate will already be on screen as herself or as an avatar. The experimenter will ask the confederate to introduce herself to the child, at which point the confederate will take over the interaction while the experimenter sits quietly to the side timing the interaction. The confederate and child will have two minutes to converse before the experimenter interrupts. If the child wishes to end the conversation early, he/she will be allowed to do so. Once the conversation is done, the experimenter will hide the confederate from view. The child will then be given a distraction task such as coloring a picture. Once the first distraction task is complete, the child will converse with our confederate for a second time. This time the confederate will appear and sound different. The voice and appearance conditions of the confederate will be randomly selected. After the conversation, the child will be given the second distraction task, and then converse for a third time with our confederate. Again, the confederate will appear and sound different than she did in the first two conversations. After the third conversation, the experimenter will ask the child several questions to evaluate his or her conversations. The three conversations, distraction tasks, and final evaluation should take no longer than 15 minutes.

6.6 Experimental Design

We will use a repeated-measures experimental design. Confederate appearance will be our repeated measure with three conditions: video, avatar, and exaggerated avatar. We will also give the confederate three topics to discuss, one topic per conversation. The topics will be favorite television show, favorite movie, and favorite book. The confederate will attempt to get children to describe the plot and the characters for each topic. We will also use a voice changer to give the confederate three different voices so that the children think they are speaking with three different people. We will use a Latin square to determine the combination and order of conditions. Each child will randomly be assigned to a row of the Latin square to determine their trial order and conditions. Our confederate will be instructed to keep her behavior as consistent as possible, but there is no good way to keep the confederate completely blind to the conditions. The confederate will definitely know the topic, and will most likely know whether or not she appears as an avatar, because the children will probably say something. We will keep the confederate from knowing which voice she has and whether or not her avatar is exaggerated. To control for the confederate’s beliefs, we will ask the confederate how she thinks she was presented in each trial.


6.7 Analytic Methods

We expect to have 70-100 participants, with roughly the following composition: twenty 4-5 year olds, thirty to sixty 6-8 year olds, and twenty 9-10 year olds. We will conduct a repeated-measures analysis of variance (ANOVA) to see how confederate appearance, voice, and topic influenced children’s perceptions and behaviors. We will also investigate whether any of the self-report measures correlate with the behavioral measures. Because the children will be of different ages and genders, we will also investigate whether or not age and gender influenced perceptions and behavior. We are mainly interested in whether or not children engage more with adults presented as themselves or as avatars; therefore, we will contrast children’s video trials with both avatar trials. We are also interested in whether or not the exaggerated avatar influenced conversations differently than the unmodified avatar, so we will also contrast the two avatar conditions.

6.8 Conclusion

From this study we hope to learn whether or not we can increase children's engagement and attention during conversation by using avatars. It can be difficult and uncomfortable for children to converse with unfamiliar adults, and avatars may be a useful tool to ease these conversations. Exaggerated avatars may also be useful because they may appear friendlier and more extroverted than unmodified avatars. Because avatars may be easier for children to talk to, the children may also use fewer disfluencies during conversation and therefore be easier to understand. This study will provide insight into how avatars benefit children's conversations, and it will inform design guidelines for how avatar faces should be animated.


Chapter 7

Contributions and Schedule

This dissertation will provide evidence that avatars can be designed to be better conversational partners by self-adjusting their facial motion. We have used, and will continue to use, human-controlled avatars that display realistic facial motion and converse freely to demonstrate that modifying facial cues can affect people's attitudes and behaviors. We have chosen to modify facial cues by exaggerating and damping the avatar's facial motion.

This dissertation will directly influence designers and researchers in the fields of computer science and human-computer interaction, and in the subfields of animation, embodied conversational agents, and avatar-mediated communication. For animators and researchers, we will provide insight into how facial motion can influence adult perceptions of human characters. From our studies, we now know that people's sensitivity to motion depends on the rendering style of a character (cartoon vs. realistic); artists and designers will need to consider a character's appearance when they decide how to animate it. Agent and avatar designers should consider how motion influences viewers' impressions so that they can design characters that exhibit specific personality traits. For example, a virtual counselor should give an impression of competence, respect, and trust.

We have proposed a study to investigate how avatar facial motion influences adults during a cooperative task. We will use the results of this study to provide guidelines for how to animate avatars to encourage cooperation and trust between adults and avatars. We have also proposed a study to investigate how avatar-mediated communication may be useful and engaging for children; in it, we will explore the impact of avatar facial motion on children's interactions with an avatar. We hope that these results can aid in making future technologies more effective for children.

This work will also impact researchers in behavioral and social psychology, child development, cognitive science, and communications. The system we have built, and its predecessors, allow scientists to manipulate interactions in real time in ways that were previously impossible. An audiovisual telecommunications system that can track a human user's facial motion and retarget that motion onto an avatar in real time essentially allows people to wear masks that completely change their appearance yet still display their own motion. Users can swap genders, ethnicities, and even species. Our system also allows us to manipulate facial motion. For this dissertation, we have focused on exaggerating and damping spatial motion for the entire human face; however, in the future this system could be used to manipulate regions of the face in different ways. For example, researchers interested in examining the influence of motion for specific regions of the face could use our system to damp motion in some regions and exaggerate motion in others. We have validated the system to ensure that any additional latency due to computation has a negligible effect on users' impressions of their conversational partner and the system itself. This system allows researchers to study the importance and influence of facial motion and appearance during live interaction, and it provides a new tool for understanding human-avatar interaction.
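To make the manipulation concrete, here is a minimal sketch, under the assumption that exaggeration and damping amount to scaling tracked facial parameters' deviations from a neutral expression by a gain factor. This is an illustration rather than the system's actual implementation; the per-region variant corresponds to the future direction described above, and the parameter indexing is hypothetical.

```python
import numpy as np

def modify_motion(params: np.ndarray, neutral: np.ndarray,
                  gain: float) -> np.ndarray:
    """Scale the deviation from neutral: gain > 1 exaggerates the motion,
    0 < gain < 1 damps it, and gain == 1 leaves it unmodified."""
    return neutral + gain * (params - neutral)

def modify_by_region(params: np.ndarray, neutral: np.ndarray,
                     region_gains: dict[int, float]) -> np.ndarray:
    """Apply a different gain to each (hypothetical) parameter index,
    e.g. damping brow parameters while exaggerating mouth parameters."""
    out = params.astype(float).copy()
    for idx, gain in region_gains.items():
        out[idx] = neutral[idx] + gain * (params[idx] - neutral[idx])
    return out

if __name__ == "__main__":
    neutral = np.zeros(4)                      # neutral-expression parameters
    frame = np.array([0.2, -0.1, 0.5, 0.0])    # tracked parameters, one frame
    print(modify_motion(frame, neutral, gain=1.5))       # exaggerated motion
    print(modify_by_region(frame, neutral, {2: 0.5}))    # damp one region only
```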

Proposed Schedule

Please see Figure 7.1 for a visualization of my proposed schedule. This schedule does not include any extra time for implementing 3D AAMs. In the event that I discover that the 2D AAMs are biasing study results, I will need to explore the possibility of switching to 3D AAMs; I would require an additional three or four months to implement the code, get new artwork, and create new models for confederates. If I continue to use 2D AAMs, then in April and May I plan to prepare and pilot the adult and child studies. The child study will be run in June and July as part of Disney Research Pittsburgh's Summer Games for children. The data will be coded and analyzed as they are collected. Concurrently, I will prepare a journal submission based on the noninteractive studies from Chapter 4 that I have already completed. I hope to submit the child study to the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI) or the IEEE International Conference on Automatic Face and Gesture Recognition (FG); both conferences should have deadlines around September 20, 2013. At the end of September, after the CHI and FG deadlines, I will run the adult study. In November, I will code and analyze the adult study, and in December I will write up the results, possibly submitting this work to the ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques (deadline around January 17, 2014). I will then begin writing the dissertation document. I plan to defend in late April or early May. I must allow six weeks for my committee to review my thesis document; therefore, I will have until March to work on writing my thesis. I have allowed roughly one month for revisions to the document after my defense.


[Figure 7.1 is a Gantt chart spanning April 2013 through June 2014. Tasks shown: prep and pilot child studies; run child studies; child data analysis; write for CHI; pilot adult study; run adult study; adult study data analysis; write study; write/revise journal submission on noninteractive work; write dissertation; committee reviews (6 wks); revisions; defense.]

Figure 7.1: Proposed timeline for completion of dissertation work.

