SSPNet Social Signal Processing Network
http://www.sspnet.eu/

Funded under the 7th FP (Seventh Framework Programme), Theme ICT-2007.2.2 [Network of Excellence]

D9.2: Progress report, compendium of work done during months 1-12

Due date: 31/01/2010
Submission date: 29/01/2010
Project start date: 01/02/2009
Duration: 60 months
WP Manager: C. Pelachaud
Revision: 1

Author(s): Marcela Charfuelan, Ellen Douglas-Cowie, Roderick Cowie, Dirk Heylen, Gregor Hofer, Magalie Ochs, Catherine Pelachaud, Hiroshi Shimodaira, Marc Schröder, Oytun Türk

Project funded by the European Commission in the 7th Framework Programme (2009-2013)

Dissemination Level
PU Public: No
RE Restricted to a group specified by the consortium (includes Commission Services): No
CO Confidential, only for members of the consortium (includes Commission Services): Yes




D9.2: Progress report, compendium of work done during months 1-12

Abstract: Within WP9, the collaborative effort is to agree on a “map” of the concepts relevant to social signalling in general and of politeness in particular. In this deliverable we report what has been achieved during the first year of the project. We also provide a description of future work.


Contents

1 Introduction

2 Integration Efforts

3 Achievements

4 Future Achievements

Bibliography


SSPNet [231287] D9.2: Modelling politeness

1 Introduction

The objective of WP9 is to agree on a “map” of the concepts relevant to social signalling in general and of politeness in particular, in order to 1) engage social scientists and technologists in a dialogue challenging to both groups, 2) provide a broad conceptual framework for SSPNet as a whole, and 3) instantiate the framework in a concrete case study, modelling the relevant aspects of politeness in data analysis and synthesis.

To this aim, three main goals have been identified:

1. build a shared conceptual model of social signals and behavioural cues related to politeness

2. define a representation language and analysis of politeness

3. simulate a model of politeness through a Conversational Agent

The methodology we have chosen consists of two major steps. We will first look at concepts and theories that are pertinent to social interactions, including politeness. We will look for inspiration in the literature; a large bibliography has been gathered. We will also build a general framework of social interaction, of which politeness is one instantiation. The general framework will go beyond the theory of Brown and Levinson: it will cover many more factors and will deal with multimodality.

We will also look at data, in particular data from natural settings. We have looked for relevant data in existing corpora and have decided to use the AMI data, as it involves several participants in a meeting scenario. An annotation scheme is being developed. It encodes cues (including polite cues) as well as the temporal relationships between the cues of different participants. This will provide us with insights into synchronicity between participants as well as the rhythmicity of cues.

Another important aspect of WP9 is the elaboration of computational models. We are working toward a conceptual model that integrates various phenomena linked to politeness:

• Cognitive model

• Synthesis model

• Representation Language encompassing politeness cues and strategies

While the general framework and the representation language specification have a very broad scope, other work has focused more specifically on politeness. A thorough definition of politeness has been drawn from a literature review. Two main categories have emerged: conventional politeness and interpersonal politeness. Through a careful analysis of the AMI corpus, a study of polite vs. impolite floor management is being conducted. The correlations between prosody and voice quality cues and social factors in scenario meetings are also being investigated, as are smiles and head movements. Embodied Conversational Agent technology serves as a test-bed for perceptual studies.


2 Integration Efforts

A number of meetings were organized over the last year, either in person or by phone. These meetings are an occasion for partners to present their work and progress to each other and to exchange ideas, especially as they come from different backgrounds. In addition, several specific meetings involving a few partners have been organized. In general these meetings were held at the participants' sites. Exchanges of PhD students have also taken place. All these meetings ensure cohesion within WP9.

Participants of WP9 have also participated actively in other WPs' meetings: in Amsterdam (Sept 09), London (Apr 09) and Rome (Dec 09).

Here is the list of physical meetings that took place:

• WP9 meeting at the kick-off meeting in London (Imperial College), Feb 09

• meeting on an evaluation study for a polite ECA during the eNTERFACE project (QUB & CNRS), Genova, July 09

• WP9 meeting during the ACII’09 conference, Amsterdam, Sept 09

• WP9 meeting in Paris (CNRS), Nov 09

Several phone meetings were conducted:

• WP9 meeting: April 09

• meeting on statistical methods for synthesis of social signals (DFKI & UoE): Oct 09

Gregor Hofer, a PhD student at Edinburgh, spent 2 weeks at CNRS-Paris.

3 Achievements

In this section we report the achievements of WP9 obtained by the partners in the first year of the project. Each achievement starts with a short description and then lists the tangible outcome, dissemination activities, partners involved, related publications and the relationship with the other thematic WPs.

1. Progressive development of a framework for describing social signals.

Description: We have developed a framework that sets out, in a reasonably compact way, the main issues that the relevant literatures in the human sciences (mainly psychology, anthropology, and linguistics) indicate are relevant to understanding the processes of generating and interpreting indicators that may carry information about social states, relationships, intentions, etc.

Tangible outcome: N/A

Partners involved: QUB, CNRS-LTCI, University of Twente, DFKI


Dissemination: Successive versions of the talk have been presented at ACII09 (see publication), at the WP meeting in Paris (Nov 2009), and at the Workshop on Foundations of Social Signals (Rome, Dec 2009). The ACII and Rome presentations are available on video.

Related Publication(s): Paul M. Brunet, Gary McKeown, Roddy Cowie, Hastings Donnan, Ellen Douglas-Cowie (2009). Social Signal Processing: What are the relevant variables? And in what ways do they relate? Proceedings of ACII 2009: Affective Computing and Intelligent Interaction, September 2009, Amsterdam, The Netherlands, vol. II, pp. 77-82.

Links to other thematic WPs: The issues affect all WPs, and most partners have been involved in the discussions that have shaped the current form.

2. Applying the general framework for describing social signals to the particular case of politeness.

Description: A framework has been developed which integrates diverse (and in some respects contrasting) analyses of politeness in the literature, distinguishing different types of politeness and the different levels at which it may be signalled.

Tangible outcome: N/A

Partners involved: QUB, CNRS-LTCI, University of Twente, DFKI

Dissemination: P Brunet presented a paper on politeness in the WP meeting in Paris (Nov 2009), and the material was included in his paper to the Workshop on Foundations of Social Signals (Rome, Dec 2009). The Rome presentation is available on video.

Related Publication(s): A version is being prepared for submission early in 2010.

Links to other thematic WPs: The work on politeness is deliberately being cast in a format that is relevant to all the thematic workpackages.

3. Bibliography of literature on social signal processing in the Human Sciences.

Description: A bibliography of journal articles, books, book chapters, and conference proceedings relevant to social signal processing. The list includes sources from human sciences and communication technologies.

Tangible outcome: The list has been put together, and uploaded onto the website.

Partners involved: QUB, University of Edinburgh


Dissemination: It has recently been made available on the SSPNet website.

Related Publication(s): N/A

Links to other thematic WPs: N/A

4. A paradigm for studying polite and impolite interactions.

Description: We have developed a scenario in which users can be recorded giving instructions in a deliberately polite or impolite manner, and the effects on collaborators monitored. It involves a computer game in which players use QUB’s teleprompter recording arrangement to interact, allowing high quality audiovisual recordings to be made. Recording has just begun.

Tangible outcome: A database will be generated.

Partners involved: QUB

Dissemination: N/A

Related Publication(s): N/A

Links to other thematic WPs: N/A

5. Bibliography collection on politeness (mostly speech related).

Description: We have made a preliminary literature survey (mostly speech related) targeting the following aspects: voice quality and politeness, intonation and politeness, non-verbal vocalisation and politeness, context-dependence in polite behaviour, cross-cultural and cross-language differences on politeness. We have identified measures as well as procedures to analyse prosody and voice quality on controlled data and natural speech recordings that can be useful for us to relate features with social signals of concern.

Tangible outcome: Bibliography containing 60 references.

Partners involved: DFKI

Dissemination: The bibliography is available on the SSPNet portal at:

https://wcms.inf.ed.ac.uk/sspnet/wp-spaces/wp9-webspace/dfki-ideas-and-goals-for-wp9-preliminary


Related Publication(s): N/A

Links to other thematic WPs: WP10

6. Discussion of issues regarding a markup language for social signal processing.

Description: Preliminary considerations regarding an XML-based markup language for social signal processing were presented for discussion at the WP9 meeting in Paris in November 2009. This is a markup language view of QUB’s proposal of relevant variables. The aim is to represent variables relevant for social signal processing in such a way that they can be used in technology dealing with social signals. Useful comments were received during the discussion.
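To give a concrete flavour of what a markup view of such variables could look like, the following is a purely hypothetical sketch in Python. The element and attribute names (`socialsignal`, `cue`, `speaker`, `start`, `end`, `type`, `modality`) and the helper `build_annotation` are assumptions made for illustration only; they are not the vocabulary discussed at the meeting, and no schema has been fixed.

```python
import xml.etree.ElementTree as ET


def build_annotation(speaker, start, end, signal, cues):
    """Build one hypothetical social-signal annotation as an XML element.

    All tag and attribute names are illustrative assumptions,
    not part of any agreed SSPNet markup.
    """
    root = ET.Element(
        "socialsignal",
        {
            "speaker": speaker,
            "start": f"{start:.2f}",  # segment start time in seconds
            "end": f"{end:.2f}",      # segment end time in seconds
            "type": signal,
        },
    )
    # One child element per behavioural cue, tagged with its modality.
    for modality, value in cues:
        cue = ET.SubElement(root, "cue", {"modality": modality})
        cue.text = value
    return root


doc = build_annotation(
    "A", 12.4, 14.1, "politeness",
    [("prosody", "rising-intonation"), ("face", "smile")],
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)

# Round trip: the serialised annotation can be parsed back by any XML tool.
parsed = ET.fromstring(xml_text)
```

The point of the sketch is only that variables attached to a time span and a speaker can be serialised and read back by standard XML tooling, which is the property that makes such a representation usable by analysis and synthesis components.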

Tangible outcome: Presentation slides discussing concepts underlying a possible XML markup language for SSP

Partners involved: DFKI, Queen’s University Belfast

Dissemination: Presentation at the WP9 meeting in Paris in November 2009.

Related Publication(s): N/A

Links to other thematic WPs: N/A

7. Harmonics+Noise Model (HNM) baseline implementation

Description: We have completed a baseline implementation of a harmonics+noise model (HNM) within the open source MARY TTS system. This parametric model supports more robust modification of prosody, enabling larger prosody modifications with less loss of naturalness and quality in the synthetic speech. The HNM model, together with speech signal modification algorithms, will allow us to generate target prosody and voice quality patterns during synthesis and will be integrated with the speech synthesis engine in MARY TTS. Target prosody and voice quality patterns will be analyzed/extracted from real data (see related achievements in the WP10 report) in the form of expert rules that relate features to the social signals of concern.

Tangible outcome: A package for HNM speech analysis and synthesis

Partners involved: DFKI

Dissemination: Freely available in MARY TTS version 4.0: http://mary.dfki.de/Download


Related Publications: N/A.

Links to other thematic WPs: WP10

8. Implementation of voice quality and prosody measures relevant to social signals.

Description: We have implemented 30 measures reported in the literature as robust and effective for discriminating voice quality and prosody. We have validated these measures on controlled data: the NECA database [Schroder and Grice, 2003], a diphone database that contains a full diphone set for each of three levels of vocal effort, referred to for simplicity as “soft”, “modal” and “loud”; and the Berlin database of emotional speech [Burkhardt et al., 2005], which contains recordings of 10 speakers (actors) speaking in a neutral style and in 6 different emotions: angry, bored, disgust, fearful, happy and sad. Using the NECA DB, the implemented measures and a principal component analysis, we were able to confirm the perception tests reported in [Schroder and Grice, 2003]. As shown in Figure 1, the first two principal components give a clear separation of the three effort ratings, for the female speaker (lower part) and the male speaker (upper part). Regarding the emotions in the Berlin database, we were able to replicate, to some extent (similar means but larger standard deviations), some of the voice quality measures reported in the experiment of [Lugger et al., 2006].

We have also developed a framework to analyze prosody and voice quality of social signals in the AMI Meeting corpus. The framework includes:

• Use of available AMI corpus annotations

• Use of the NXT tool for querying and extracting dialog acts and speech segments from the corpus.

• Extraction of 30 voice quality and prosody measures, frame based and utterance based.

• Performing principal component analysis (PCA) to search for patterns and the best discriminators among the measures.
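The PCA step of the framework can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the project: the data are synthetic stand-ins for the 30 measures (three artificial "vocal effort" classes), the function name `pca_project` is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto their first principal components.

    Returns the projected scores and the fraction of variance explained
    by each retained component.
    """
    X = np.asarray(features, dtype=float)
    # Standardise each measure so that differing units do not dominate.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant columns
    Z = (X - mean) / std
    # SVD of the standardised data: the rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in data: 3 classes x 20 utterances, 30 measures each.
rng = np.random.default_rng(0)
labels = np.repeat(["soft", "modal", "loud"], 20)
centres = {"soft": -2.0, "modal": 0.0, "loud": 2.0}
data = np.vstack([rng.normal(centres[name], 1.0, size=30) for name in labels])

scores, explained = pca_project(data)
for name in ("soft", "modal", "loud"):
    # Mean position of each class along the first principal component.
    print(name, float(scores[labels == name, 0].mean()))
```

Plotting the two columns of `scores` coloured by class gives the kind of scatter plot summarised in Figure 1. The sign of each principal axis is arbitrary, so only the relative positions of the classes are meaningful.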

Tangible outcome: A framework for analysis of voice quality and prosody of speech based on high quality, freely available software and tools.

Partners involved: DFKI, University of Edinburgh

Dissemination: Presentation at Rome SSPNet Workshop on Foundations of Social Signals

Related Publications: N/A.


Figure 1: Principal component analysis results for three voice qualities in the NECA DB.


Links to other thematic WPs: WP10

9. Implementation of a dialog module for the embodied conversational agent GRETA.

Description: We have developed a dialog module based on AIML (Artificial Intelligence Markup Language). The dialog module has been connected to the agent GRETA, which performs gestures (through automatic generation of FML scripts) during the interaction. This enables a user to have a small-talk conversation with GRETA. This platform will be used to evaluate the polite smiling behavior model in the context of small-talk conversation between the user and the virtual agent GRETA.

Tangible outcome: Dialog module of Greta

Partners involved: CNRS-LTCI

Dissemination: The dialog module of Greta is available on the Greta website: http://www.tsi.enst.fr/pelachau/Greta/mediawiki/index.php/Dialog Modules

Related Publication(s): N/A

Links to other thematic WPs: N/A

10. Bibliography of literature on smiles in Human and Computer Sciences.

Description: We have compiled a state-of-the-art review of smiles, both in interpersonal interactions (the different smiles, the effects of a smiling human on the interaction) and in human-machine interaction (existing smiling virtual agents, perception by the user, effect on the human-machine interaction).

Tangible outcome: Bibliography containing 30 references

Partners involved: CNRS-LTCI

Dissemination: Presentation at Rome SSPNet Workshop on Foundations of Social Signals

Related Publication(s): N/A

Links to other thematic WPs: N/A

11. Short overview of the modeling of “social” agents.


Description: We have provided a short description of some social dimensions that have been represented and implemented in conversational agents and the ways in which this has been done. We used the agents that were developed at the University of Twente as an example: some show behaviours that can yield an effect on the impression of sociality, whereas others have explicit cognitive models related to social variables.

Tangible outcome: Paper and presentation at the ACII Workshop on Social Signal Processing

Partners involved: University of Twente

Dissemination: Paper and Presentation

Related Publication(s): Dirk Heylen, Marit Theune, Rieks op den Akker, Anton Nijholt: “Social Agents, the first generations” in: Proceedings of the ACII Workshop on Social Signal Processing, Amsterdam, 2009, pp. 114-120

Links to other thematic WPs: N/A

12. Identifying social factors in floor management.

Description: We try to disentangle the social factors that are involved in turn and floor management in conversations. This sheds light on the different aspects that go into politeness (rules of etiquette, personality, role, dominance, rudeness, rapport, etc.). We studied the literature on floor with this perspective in mind. The analysis will be used in the synthesis of behaviours for conversational agents with different politeness strategies.

Tangible outcome: Presentation at workshop (see Virtual Learning Center). The tangible outcome will include a review paper and an annotated bibliography on the portal.

Partners involved: University of Twente

Dissemination: The work was presented at the Foundations of Social Signal Processing in Rome.

Related Publication(s): N/A.

Links to other thematic WPs: The work is related to the definition of a floor annotation schema in WP10.

13. Integration of UEDIN’s head motion generation engine into CNRS’s GRETA (embodied conversational agent) system.


Description: In order to achieve more natural and non-deterministic behaviours for the GRETA system, which originally employed a rule-based approach, UEDIN’s head-motion generation engine was incorporated into the GRETA system.

Tangible outcome: N/A

Partners involved: UEDIN, CNRS-LTCI

Dissemination: N/A

Related Publication(s): N/A

Links to other thematic WPs: N/A

14. Speech-driven eye blink synthesis.

Description: We have developed a novel technique to automatically synthesise eye blinks from a speech signal based on statistical models. Controlling eye blinks based on speech is crucial for developing realistic avatars with social signals, because humans perceive a relationship and synchrony between eye blinks and speech. The technique predicts eye blinks from the speech signal and generates animation trajectories automatically using trajectory hidden Markov models. The evaluation of recognition performance showed that the timing of blinking can be predicted from speech with an F-score upwards of 52%, which is well above chance. Additionally, a preliminary perceptual evaluation confirmed that adding eye blinking significantly improves the perception of the character. Finally, it showed that speech-synchronised synthesised blinks outperform random blinking in naturalness ratings.
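For reference, the F-score combines the precision and recall of the predicted blink events. The sketch below shows the standard definition only; the counts are made up for illustration, the function name `f_score` is hypothetical, and this report does not state how predicted and reference blinks were matched in time.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """Standard F-beta score computed from event counts (beta=1 gives F1)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # share of predicted blinks that are real
    recall = true_positives / actual        # share of real blinks that were predicted
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not taken from the evaluation):
# 52 correctly predicted blinks, 48 spurious predictions, 48 missed blinks.
example = f_score(52, 48, 48)
print(f"{example:.2f}")
```

With these illustrative counts, precision and recall are both 0.52, so the F1 value is also 0.52.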

Tangible outcome: N/A.

Partners involved: N/A.

Dissemination: N/A.

Related Publication(s): Michal Dziemianko, Gregor Hofer and Hiroshi Shimodaira, HMM-Based Automatic Eye-Blink Synthesis from Speech, Interspeech, pp. 1799-1802, 2009.

Links to other thematic WPs: N/A

15. Evaluation of the UEDIN’s speech-driven head motion system.


Description: To evaluate the feasibility of UEDIN’s talking head for social signal synthesis, a large-scale perceptual evaluation was conducted using a web-based user interface.

Tangible outcome: N/A

Partners involved: N/A

Dissemination: N/A

Related Publication(s): Gregor Hofer, Hiroshi Shimodaira and Junichi Yamagishi, Speech-Driven Head-Motion Synthesis, IEEE Magazine of Computer Graphics and Applications, 2010, submitted.

Links to other thematic WPs: N/A

4 Future Achievements

In the future we will continue working on the various research areas described above. We will also perform evaluation studies on various aspects of politeness, such as the specificity of the smile behaviors of an ECA and politeness strategies. Research will proceed in two broad areas:

1. From concept to model

2. From data and perception study to model

Regarding the first area, from concept to model, we will continue our work on the definition of the general framework. We will also extend the description of politeness within this general framework. We will focus on the notion of units of analysis. Units of analysis matter because people will in practice analyse relationships within a particular type of unit, and be less aware of patterns that hold in much bigger or much smaller units. They may want the unit to be physically instantiated in the form of ’clips’. There are several types of grouping to consider.

Linguistic : There is a familiar hierarchy of linguistic groupings: phoneme, syllable, word, phrase, turn, conversation, etc. Note that some units (sentence, paragraph) are primarily appropriate to text; others (turn, conversation) to face-to-face interaction; others (agenda item, report) to special, quasi-formal types of social structure.

Non-Linguistic : The options for non-linguistic grouping are less clear. There are small units like facial action units, gestures, etc. Emotions impose a larger type of unit, marked by a particular type of emotion appearing, peaking, and fading: the term “emotional episode” has been used.


Social Groupings : Psychology has tended to consider the individual as the primary unit of analysis. Much of linguistics considers the dyad. Beyond that, there are various types of communicative group. Larger units (also, confusingly, called groups), such as ethnic or religious groups, may play extremely important parts in individual interactions. So may assemblies that are not groups in the social science sense, such as audiences or ‘bystanders’.

This research question would benefit other WPs, in particular WP8 for the analysis work to recognize cues, and WP10 for the consideration of different types of communicative groups.

Within the second research area, several issues will be tackled. Some work will target nonverbal behaviors such as head movements, smiles and other facial expressions:

smile :

• Creation of a context-free lexicon of smiles (signal - meaning): We are performing evaluations to identify the different types of ECA smile perceived by the user and to determine the meaning that users associate with the smiles depending on their morphological and dynamic characteristics. This work aims at creating a context-free lexicon of smiles that relates smiles to their meaning (independently of the context).

• Computational model of an ECA’s polite smiling behaviors: We aim at creating a computational model that determines the smiling behavior of an ECA depending on the dialog context (communicative act, communicative intention, topic of the discussion) and on the global socio-emotional context of the interaction (social roles, affective state, ...).

listening agent : Building on the GRETA system as enhanced this year with the stochastic head-motion generation engine, we will develop next year an agent that automatically detects and generates backchannel head motion at appropriate moments in the user’s speech. The AMI corpus and its Stanford head-motion annotations will first be analysed to find relationships between the speaker’s speech/head motions and the listener’s head motions. Then, stochastic models (i.e. HMMs) will be used to learn the relationships automatically from data.

Integration of rule-based and statistical approaches to generate realistic non-verbal signals for embodied conversational agents : Since the two approaches are complementary, it is hoped that integrating them will lead to more realistic agents whose movements are non-deterministic and highly varied. To that end, GRETA’s rule-based engine is employed to drive UEDIN’s statistical models of head motion. A prototype system will be developed next year (collaboration between CNRS and UEDIN).

Development of speech and video-rate 3D facial image database : This data collection aims to improve the quality of the current agents in terms not only of image and dynamics but also of the variety of motions specific to politeness and personality. Recordings of 4 people are planned for next year, subject to the availability of recording facilities.

Facial animation workshop : UEDIN will host a facial animation workshop next year.


Other work will focus on prosody and voice quality:

analysis : Analyse the prosody and voice quality of politeness in the database that is going to be recorded in Belfast. For this analysis we will use the framework developed in the first year.

generation : Investigate the best approach to realising prosody and voice quality patterns found in real data (Belfast database) in synthesised speech. We will consider combining two levels:

• High level: using expert rules to represent the prosody and voice quality patterns that we find in real data.

• Low level: using parametric/statistical synthesis: Harmonics plus Noise Model (HNM) based unit selection synthesis (see achievement 7; it will be integrated in the MARY system) and/or Hidden Markov Model (HMM) based synthesis (available in the MARY system). We will develop speech signal modification algorithms appropriate for these speech synthesis technologies, to realise the prosody and voice quality modifications derived from the expert rules.

This work on verbal and nonverbal behaviors will be integrated:

multimodality : Integrate the speech synthesiser capable of realising politeness behaviour into the ECA politeness demonstrator.

floor management : Propose a model that will be implemented and evaluated in a conversational agent displaying different politeness strategies.

All of this work will be used to develop a markup language for social signal processing, in order to describe, for example, the high-level rules for realising politeness in automatic speech synthesis. It will continue the initial discussion on this issue held during the first year.

References

[Burkhardt et al., 2005] Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., and Weiss, B. (2005). Database of German emotional speech. In Interspeech.

[Lugger et al., 2006] Lugger, M., Yang, B., and Wokurek, W. (2006). Robust estimation of voice quality parameters under realworld disturbances. In ICASSP.

[Schroder and Grice, 2003] Schroder, M. and Grice, M. (2003). Expressing vocal effort in concatenative synthesis. In 5th International Conference of Phonetic Sciences.
