
Designing Sonification of User Data in Affective Interaction

ANNA DE WITT

Supervisor: Roberto Bresin
Examiner: Sten Ternström

Examensarbete i Musikakustik (Master's Thesis in Music Acoustics)

KTH - School of Computer Science and Communication (CSC)
Department of Speech, Music and Hearing

S-100 44 Stockholm


To my dad


Designing Sonification of User Data in Affective Interaction

Abstract

Different design approaches have contributed to what we see today as the prevalent design paradigm for Human-Computer Interaction, though they have been mostly applied to the visual aspect of interaction. This master's thesis presents a proposal for sound design strategies that can be used in applications involving affective interaction. The sonification of the Affective Diary, a digital diary with a focus on the emotions, affects, and bodily experience of the user, is proposed for testing this approach. Results from studies in music and emotion were applied to sonic interaction design. This is one of the first attempts at introducing different physics-based models for the complete real-time sonification of an interactive user interface on portable devices.


Design av användarens data ljudsättning i affektiv interaktion

Abstract

Different design approaches have contributed to what we today regard as the prevailing design paradigm within Human-Computer Interaction (in Swedish: Människa Dator Interaktion, MDI), although they have mostly been applied to the visual aspect of the interaction. This master's thesis presents a proposal for sound design strategies that can be used in applications involving affective interaction. The sonification of Affective Diary is proposed as a way to test this approach. Affective Diary is a digital diary that focuses on emotions, affect, and the user's bodily reactions. Results from studies of music and emotion and of sound interaction were applied to this design. This is one of the first attempts to apply physics-based models to the real-time sonification of an interactive user interface for portable devices.


Acknowledgements

I would like to thank Roberto Bresin for accepting me as a master's degree student and for being the ideal supervisor. He has been a good listener and a precious guide, letting me work on my own ideas. Thank you! I would also like to thank Göran for being patient with me during this work.


Contents

List of Tables
List of Figures
List of Abbreviations

1 Introduction

Part I: Background and design proposal

2 Literature Review
  2.1 Sonification
  2.2 Emotion
  2.3 Embodiment

3 Design
  3.1 Affective Diary
  3.2 Sonic interaction in the Affective Diary
  3.3 Sonification of the Affective Diary events
    3.3.1 Sonification of SMS messages
    3.3.2 Sonification of Bluetooth presence
    3.3.3 Sonification of scribble events
    3.3.4 Sonification of photo events
    3.3.5 Sonification of abstract body shapes
  3.4 The tools
    3.4.1 Pure Data
    3.4.2 Sounding Objects

Part II: Conceptual models, implementations, tests and results

4 The SMS event
  4.1 Conceptual model and implementation
  4.2 Test
    4.2.1 Subjects and test settings
    4.2.2 Stimuli and procedure
    4.2.3 Results and conclusions

5 The Bluetooth event
  5.1 Conceptual model and implementation
  5.2 Test
    5.2.1 Subjects and test settings
    5.2.2 Stimuli and procedure
    5.2.3 Results and conclusions

6 The Scribble event
  6.1 Conceptual model and implementation
  6.2 Test
    6.2.1 Subjects and test settings
    6.2.2 Stimuli and procedure
    6.2.3 Results and conclusions

7 General discussion and future works

8 Appendix

Bibliography


List of Tables

4.1 Rank and percentile table for the association of an emotion to a marble sound texture
6.1 The phenomenological guide to the friction model parameters (SOb book [21], p. 162)
6.2 Rank and percentile table for the recognition of a material vs. a pen sound texture
6.3 Rank and percentile table for the association of an emotion to a sound texture
6.4 Rank and percentile table for the association of an emotion to a pen thickness


List of Figures

2.1 With the modified Brunswikian lens model, Juslin aims to show how performers and composers communicate emotions and how listeners perceive them. Even if the process is successful, it is very much influenced by the individual's personality, and it is difficult to reach perfect communication accuracy.
2.2 Hevner Adjective Circle
3.1 Affective Diary: abstract colourful body shapes represent the values of collected sensor data. High energy corresponds to red colour, and high activity to the standing shape.
3.2 Affective Diary: representation of SMS messages. The timeline is represented in the lower part of the figure by a gradually changing hue containing a miniature version of SMS messages, abstract shapes, Bluetooth presence, and photos.
3.3 Marble Answering Machine
3.4 Affective Diary: representation of Bluetooth presence. The identification of the Bluetooth device is also visualized.
3.5 Pure Data interface for the control of footstep sounds
3.6 Pure Data interface for the control of writing sounds
3.7 Affective Diary: representation of scribble events
3.8 Affective Diary: representation of a digital photo
3.9 Affective Diary: the body shape representation
4.1 Pure Data interface for the control of marble sounds
4.2 The high level conceptual model of the marble impact sounds
4.3 The test settings: the Kroonde and the boxes with a bidirectional acceleration sensor applied
4.4 The stimuli settings for the marble quantity test
4.5 The test settings: the PD application for the marble sound test
4.6 Frequency estimation over the quantity of marble impact sounds
4.7 Variance of the quantity-of-marbles estimate for marble impact sounds representing groups of 1, 3, 5, and 10 marbles
4.8 Association of emotion to material and frequency
4.9 Association of emotion to gravity force and impact force
5.1 Pure Data interface for the control of footstep sounds
5.2 The actual curve that the footstep model produces
5.3 The high level conceptual model of the walking pattern
5.4 The interface for the emotional walking pattern
5.5 The footstep test application: age definition
5.6 The footstep test application: gender definition
5.7 The footstep test application: emotion definition
5.8 Associations of frequency [Hz] or foot size vs. footstep duration: age definition
5.9 Associations of frequency [Hz] or foot size vs. footstep duration: gender definition
5.10 Gender vs. material association
5.11 Emotion positioning and naturalness
6.1 The conceptual model for the friction sound of a pen
6.2 Test application for the sound of pen friction
6.3 The test session for the pen sounds
6.4 Sound texture recognition and realism
6.5 Better with or without sound, and sound response to pen movement
6.6 Effort perception
6.7 Effort perception
6.8 Emotion association to pen thickness
6.9 Emotion association to sonic feedback
.1 Affective Diary marble sound
.2 Affective Diary footstep sound
.3 Affective Diary footstep sound, material bank
.4 Affective Diary pen friction sound
.5 Affective Diary pen friction sound


List of Abbreviations

HMI   Human Machine Interaction
MDI   Människa Dator Interaktion (Human-Computer Interaction, in Swedish)
AD    Affective Diary
SICS  Swedish Institute of Computer Science
SOb   Sounding Object project
PD    Pure Data
GEM   Graphics Environment for Multimedia


Chapter 1

Introduction

Being in the world and acting in everyday settings puts us in a special perspective regarding sounds; we move around the world without realizing that sounds make up a great part of the reality we inhabit. Sounds influence us, some in a positive and enriching way, while others are almost disturbing or even exhausting. Humans live their everyday lives in a sound continuum, a soundscape [24],1 but their attention may not always be engaged with the sound per se. Our attention turns to sounds when they communicate something important for our interaction with the environment, such as alarm sounds, sounds of moving objects, or sounds produced by human actions. In everyday listening [10] sounds are embedded in the world and constitute an important part of humans' interaction with it.

Sound is not tangible, not visible, but still very much real. Sound in its natural expression is a phenomenon situated in the world and in the moment in which it is originated. It originates from a source which we may perceive as a tangible and probably visible reality, it expands in a medium, the air, to reach our perceptual apparatus, and it exists until it disappears from our perceptual domain. Sound is a multichannel information carrier. It informs about the event that caused it and therefore provides cues for the listener's interpretation of the environment in which it manifests. As an example, the footstep sounds of a person walking on a wooden floor tell us about the gender, age, size, and emotional intention of the person, and about the hardness and material of both shoes and floor [12].

Sonification is a recent field of research that explores the use of sound as an information display, in parallel with or complementary to visualization. In particular, sonic interaction is the field in which sonification is applied to human-machine interaction. The aim of the present work is to propose the use of sound models based on sonic interaction in applications focusing on affective interaction. The purpose is also to put sonification design into an ecological perspective and to try to design for the embodied interaction paradigm [4].

1 The word soundscape is a counterpart of landscape. R. Murray Schafer started soundscape studies at Simon Fraser University in the 1960s. He puts the study of sound in an environmental context, opposing the hi-fi soundscape of nature to the lo-fi soundscape of today's world. An important issue for him is the design of a soundscape for our daily life that takes into account factors such as comfort and quality of life.


Part I

Background and design proposal


Chapter 2

Literature Review

2.1 Sonification

Different design approaches have contributed to what we see today as the prevalent design paradigm for Human-Computer Interaction, though they have been mostly applied to the visual aspect of interaction. The field of sound and music computing has achieved significant results and developments in terms of new theories, techniques, and models during the last fifteen years [26]1. The increasing computing power and its constantly diminishing cost allow for sophisticated real-time sound synthesis models which were unthinkable only ten years ago. All this opens up new possibilities for including sound in the human-machine interaction loop.

The urgent need for interpreting and displaying the increasing amount of information, and for looking at new ways of expression, has led in the last ten years to the use of sound as a new dimension to be considered. The development of the sonification discipline has been influenced by the ecological approach [11]. In psychoacoustics, Bill Gaver [10] took the ecological approach to everyday listening. His studies in sound perception, sound analysis, and design contributed considerably to extending the boundaries of human-computer interaction, especially by offering possible alternatives to the dominant visual displays.

The concept of everyday listening according to Gaver is the experience of a sound together with its producing event, e.g. when we hear the sound of steps behind us and perceive that the sound is produced by a person walking by. Gaver suggests everyday listening as a field of study and develops a framework for describing everyday sounds via physical analysis and algorithm studies. He describes everyday listening as the experience of "hearing events in the world rather than sounds per se" [9], the latter being what happens in musical listening. It is possible to apply everyday listening to any sound or piece of music, simply by listening in terms of its source.

Musical listening, on the other hand, is listening in terms of the sensory qualities of the sound or piece of music [10].

1 The Sound and Music Computing Roadmap: http://www.soundandmusiccomputing.org/filebrowser/roadmap/pdf


Thus the distinction between everyday listening and musical listening is a distinction between how we experience the sound, not a distinction between sounds.

Gaver also investigates the features of events that human beings hear and recognize. From the ecological perspective, an important step towards understanding how listeners can hear events in the world is to investigate how sounds can specify their source. A researcher could, for instance, specify sound sources in terms of material or size and the physical event that generated the sound. After stating that each sound source involves an interaction of materials at a location in an environment, Gaver divides sound events into three basic sources of sound, vibrating solids, liquids, and gases, thus developing a framework of perceptual attributes that describe the perception of events [10] [4].

From the combination of basic level events it is possible to understand more complex events, for instance a series of footsteps on cobblestones. An interesting consideration about complex events is that the complexity lies in the physical event and not in the sound perception. In fact, the perception of complex sound events is almost intuitive, and it even brings with it a broader spectrum of auditory cues that makes it possible to infer a complex set of information. In his study, Gaver [9] explores the acoustic and physical basis of everyday listening. The result is a number of algorithms which enable the synthesis of sounds made by basic-level events such as impacts, scraping, and dripping, and by more complex events such as bouncing, breaking, spilling, and machinery. As an example application of his ideas on everyday listening, Gaver proposes the creation of auditory icons that provide an intuitive mapping between sound and computer events. Gaver invites other researchers to further develop his algorithms and hopes that his study will become a starting point for future research.

With Gaver having set the basis for a new interaction paradigm, HMI designers started to consider sound in a new way. Auditory icons became a means to convey information, affordance, and coupling through the exploitation of the potential of sound, supporting the same design ideas that visual icons are meant to support.

Following Gaver's studies, other researchers contributed considerations about the context in which sounds are produced. Sound results from actions over a certain time interval and is often connected to human gestures. An important aspect is the coupling between action and sound. A further step in Gaver's direction has been the Sounding Object project [21]. This project provided new sound models, running in real time and based on physics-based modeling, which can be controlled for generating sounds in a natural and dynamic way in human-machine interaction (HMI) applications. Because of their physical nature, these sound models can communicate properties such as the size and material of an object, and information on the user's manipulation effort on it. These sounding objects can therefore give users feedback about their actions on different objects, such as virtual objects on the screen, sensor-enabled devices, etc. These new sound models are used in the present work, as explained in the following sections.
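To make the idea of physics-based impact models concrete, the following is a minimal sketch of modal synthesis, the general technique underlying models of this kind: a struck object is rendered as a bank of damped sinusoids whose frequencies and decay times follow from assumed "material" and "size" parameters. The presets, mode ratios, and decay values below are illustrative assumptions, not the actual SOb model data.

    import numpy as np

    SR = 44100  # sample rate [Hz]

    # Hypothetical material presets: (decay time of the lowest mode [s], brightness).
    MATERIALS = {
        "wood":  (0.08, 0.4),
        "glass": (0.60, 0.8),
        "steel": (1.50, 1.0),
    }

    def impact(material="wood", size=1.0, force=1.0, dur=2.0):
        """One impact sound: larger size lowers the mode frequencies,
        larger force raises the overall amplitude."""
        t = np.arange(int(SR * dur)) / SR
        t60, brightness = MATERIALS[material]
        base = 800.0 / size                   # lowest mode frequency [Hz]
        ratios = [1.0, 2.32, 4.25, 6.63]      # inharmonic mode ratios (illustrative)
        out = np.zeros_like(t)
        for k, ratio in enumerate(ratios):
            decay = t60 / (1.0 + k * (1.0 - brightness))  # higher modes die out faster
            out += (force / (k + 1)) * np.exp(-t / decay) * np.sin(2 * np.pi * base * ratio * t)
        return out / len(ratios)

Varying only these few parameters already conveys the perceptual impressions of material (decay and brightness), size (pitch), and manipulation effort (amplitude) discussed above.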


2.2 Emotion

Emotions are part of our experience in the world: they are necessary for physically preparing the individual for action, they are determined by real life conditions, and they occur in social interactions that have some implication for the individual's life [15]. Sonification design can address the idea of using sound as an interface that provides for emotion as interaction.

When including emotions in HMI applications, the designer should consider the cultural context and the dynamic nature of experience. The user interface should help people understand and experience their own emotions.

This can be achieved by extending the boundaries of the cognitive model traditionally adopted by the HMI community toward affective interaction. Emotion and cognition should not be construed as private phenomena restricted to the boundaries of the body [2].

Emotions are tightly connected to the social context in which they manifest themselves; they are qualities related to an experience, like narrative qualities that develop in time and during the action.

It has been demonstrated that, to a certain extent, emotions can be observed and modeled. Active fields of research in this sense have been those of speech and music.

Over the years, many studies have investigated how composers and performers work with musical structure in order to communicate emotions. It became evident that there are combinations of cues that describe a specific emotion. The cues can be related to composition features and to performance features.

The first important study was conducted by K. Hevner in 1936. She presented a list of adjectives to address different states of mind. The adjectives were clustered in a circular configuration (see Figure 2.2). Every group of adjectives represented a possible emotional interpretation of a piece of music. She tested different pieces of music and analyzed the structural factors that delineated the character of the pieces as described by her adjective list. Her studies still influence modern research focused on finding the factors that affect emotional expression in music.

Gabrielsson and Lindström [8] gave a summary of the results from more recent studies. The researchers investigated how various composition structures (tempo, mode, pitch variation, tonality, rhythm, musical form, and so on) affect emotional expression in musical composition. They tried to associate emotions with separate musical structures by testing the listeners' perception of emotion. Two examples from these studies showed that major mode can be associated with happiness and minor mode with sadness. Slow tempo is associated with calmness and serenity; on the contrary, fast tempo represents agitation and excitement. Other factors that have been investigated are rhythm, tonality, intervals, pitch level, melodic direction, melodic motion, and many others.

Patrik N. Juslin's studies [14] confirmed that music performers are able to communicate emotions to the listeners, and that they also individually characterize the performance. Questions arising in this research are how accurate the communication of emotions is and which codes are used in performance and in the listeners' perception. The accuracy of the perception during the tests was found to be good for basic emotions such as sadness or happiness. Nuances of emotions were more difficult to recognize because they are often related to individual differences in encoding and decoding them. The codes used by the performers to interpret the pieces of music were mainly sets of cues that they used to affect their performance. Those cues regard performance features such as articulation, sound level, and timing variations. An interpretational model was proposed where the basic emotions (happiness, sadness, anger, fear, tenderness) were arranged in a two-dimensional emotion space. This emotion space also offered a model that describes how the emotional expression can be gradually modulated by altering the cues during the performance.

Juslin's study also proposed a theory about the origin of this expressive code. He hypothesized that emotional expression is influenced by two main factors: brain programs and social learning. The existence of innate capabilities in humans, a sort of brain program for conveying emotion in vocal expression, might also explain the performer's capability of expressing emotions in music. The innate capability of interpreting and decoding an emotion is tightly related to the innate skill of vocal expression. These brain programs, together with learned social codes, create the framework of expressive features that govern emotional expression in speech as well as in musical performance. The process of communication, by coding and decoding the expressive cues, was interpreted by introducing a modified version of the Brunswikian lens model2. Juslin illustrated how the performers' coding process and the listeners' decoding process work. The cue codes are described as probabilistic. This means that the performers' encoding process uses an abundance of cues to convey an intended emotion. The listeners, for their part, have to combine these cues to make a judgment of the expressed emotion. The lens model implies that it is extremely difficult to reach perfect communication accuracy, and that if we want to investigate the success of emotion communication there must be a common code that performers and listeners share to describe the emotions (see Figure 2.1). The test results of Juslin's study showed greater audience agreement for some of the basic emotions (happiness and sadness) than for more complex emotions. Only some of the listeners that took part in the tests were musically trained, but this did not affect the perception of emotion, which proved to be as accurate for the trained audience as for the non-trained listeners.

Bresin and Friberg [3] conducted a study using a program for automatic music performance. They analyzed the results from a test in which a panel of listeners was asked to recognize the intended emotion in two different pieces of music. The two short compositions were computer generated and controlled by sets of macro rules that were set up in order to reflect different emotions. The macro rules are sets of acoustic parameters that were selected by a panel of experts taking into account previous studies on emotional cues in music performance.

2 The lens model was created in 1952 by Egon Brunswik (1903-1955), who formalized Helmholtz's ideas with it. The lens model presents different patterns and is well accepted in studies that treat social judgment.


The conclusions were that it is possible to group performance rules into macro rules that generate different emotional colouring during performances. It is also possible to extend these macro rules to different musical styles.
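As an illustration of what such macro rules look like in practice, here is a minimal sketch in which each emotion is a set of deviations applied to a nominal performance. The parameter names and numbers are assumptions for illustration only; the actual rule values used by Bresin and Friberg [3] differ.

    # Hypothetical macro rules: (tempo factor, sound level offset [dB],
    # articulation, where 1.0 = legato and smaller values = more staccato).
    MACRO_RULES = {
        "happiness":  (1.15, +3.0, 0.75),
        "sadness":    (0.80, -4.0, 1.00),
        "anger":      (1.25, +6.0, 0.60),
        "tenderness": (0.90, -3.0, 0.95),
    }

    def render_note(nominal_duration_s, nominal_level_db, emotion):
        """Apply an emotion's macro rule to one nominal note of a score."""
        tempo, gain, articulation = MACRO_RULES[emotion]
        ioi = nominal_duration_s / tempo         # inter-onset interval after tempo change
        return {
            "sounding_duration": ioi * articulation,  # shortened when staccato
            "ioi": ioi,
            "level_db": nominal_level_db + gain,
        }

    print(render_note(0.5, -6.0, "sadness"))  # slower, softer, fully legato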

Music listening can differ from other emotion-producing conditions, in which the emotional situation usually involves a social context and the use of verbal interaction. Emotional reactions are reported for music listening done in isolation, for example at home or in an experimental session [15]. Scherer and Zentner [25] propose an alternative point of view, considering that music can provoke emotion in a similar way to other emotion-producing events. They propose routes, i.e. mechanisms, whereby emotions may be caused by music. They propose that memory, empathy, and contagion are mechanisms of interest because they suggest a link between music and other kinds of performance, for instance theatre or movies, which are considered to have the potential to induce emotion in the viewer.

Besides the different ways of considering emotion induction in music, psychophysiological studies have proved that music influences physiological parameters; for example, tests have been conducted measuring vascular, cardiac, electrodermal, and respiratory functions [15] [25] [19] [16]. Results also appear in clinical and therapeutic contexts. The physiological responses to emotion elicitation by listening to music consistently support the listeners' verbal reports [15].

The interesting implication for sonification design is the possibility of exploiting the potential of music by using the information and conclusions that these studies arrived at. As written above, humans listen to sound in both everyday listening and musical listening modes [10]. This implies that perceptual attention shifts between the need to hear properties of the sound source and the need to interpret the situation that generated the sound, the "mood of the environment". In the present work, as we will see in the following, we applied results from studies in music and emotion to sonic interaction design. We also tried to model the emotional understanding of sounds, starting from studies on the perception of expressive sounds [12].

2.3 Embodiment

In a virtual experience, users are mainly observers and do not inhabit the environment. They try to make sense of the interaction interface by constantly checking whether the results of their interaction were the expected ones. This is a disconnected way of interacting with the virtual environment. In the real world, instead, users inhabit the interaction, are bodily present in the world, and are directly connected with the action. This is what is called embodied interaction.

According to Paul Dourish [4], embodied phenomena occur "in real time and real place", and therefore "embodied interaction is the creation, manipulation and sharing through engaged interactions with artifacts".


Figure 2.1. With the modified Brunswikian lens model, Juslin aims to show how performers and composers communicate emotions and how listeners perceive them. Even if the process is successful, it is very much influenced by the individual's personality, and it is difficult to reach perfect communication accuracy.

Figure 2.2. Hevner Adjective Circle


We argue that the perceptual discrepancy that exists between the real world and the virtual one is wider in the visual and haptic domains, while the auditory domain has properties in itself that fit the definition of embodied interaction better. As an example, consider the difference between writing with a pen and writing with a word processor. Writing with pen and paper implies the use of real world objects, i.e. the execution of an action that is regulated by physical events over time, with results perceivable in real time while the action evolves. The pen and paper afford the user's action in a direct way. Writing with a word processor forces the user to understand the metaphor given by the system.

The pen-pressing action leaves ink on the paper as a direct translation of the user's intention. The key-pressing action is instead mediated by the system, which shows the result of the action on the screen. The user's experience of the system is one of disembodiment: the action and its result are disconnected from each other.

We argue that the sonic environment, instead, has properties that constrain the representational models to provide an unmediated connection between the sound (whether real or virtual) and the user (i.e. the listener). If sound events are not synchronized with the events that generated them, the listener will perceive them as unnatural. Sound perception mechanisms also constrain sound models to a tightly coupled connection between the event, the sound, and the listener. It is therefore necessary to use sound models which promptly respond to the user's actions.


Chapter 3

Design

3.1 Affective Diary

As a test case of interactive sonification of data, we had the possibility to apply our ideas to Affective Diary (AD), a digital diary with a focus on the emotions, affects, and bodily experience of the user [17].

AD is a system developed by researchers at SICS1, Stockholm, in collaboration with Microsoft Research, Cambridge. AD runs on tablet PCs and smartphones. The latter in particular are characterized by the small size of the graphical display. This is the ideal situation for introducing the auditory channel as an information channel in the design of AD: when it is difficult to see, it is still possible to hear! In the present work we propose a design for an interactive sonification of AD.

AD2 is a software tool for organizing the digital information that a typical teenager collects during her everyday experience via portable devices such as mobile phones, MP3 players, and digital cameras. Digital information can be SMS and MMS messages, photographs, and Bluetooth IDs of other people's devices. This information is complemented with data from sensors that the user wears during the day. The sensors collect body temperature, number of steps, heart beat, and also environmental sounds. All these data are automatically organized in the form of a "story of the day" along a timeline from early morning to late night.

The main idea behind AD is that of inviting the user to focus on and re-experience her emotional, social, and bodily experience. Data collected by portable devices are presented as they are, while sensor data are plotted using an abstract colourful body shape (see Figure 3.1).

The AD interface is under continuous development. So far the system presents the following features: SMS messages, Bluetooth presence, Scribbles, Photos, Movie (or timeline), and Edit. These are all features which can be empowered with interactive sonification, and they are described in the following sections.

1 Swedish Institute of Computer Science. http://www.sics.se/
2 http://www.sics.se/interaction/projects/ad/


Figure 3.1. Affective Diary: abstract colourful body shapes represent the values of collected sensor data. High energy corresponds to red colour, and high activity to the standing shape.

3.2 Sonic interaction in the Affective Diary

In the present work we decided to use sound models developed in the Sounding Object project (SOb) [21]. The researchers of the SOb project aimed to select sounds that communicate information and give users feedback about their actions on the system. Sounds were generated by physics-based models in order to directly control a representation of the dimensions that sound communicates, for example size, material, or the user's manipulation effort. The results were models for sound synthesis. Such sound models tend to become very complex if the aim is a perfect reproduction of a sound.

The SOb sound models emphasize only the main features of the sound that they model, thus creating a cartoon of the sound. Cartoon sound models give the sense of a living sound while at the same time differing from the real sound. Since these sound models are physically informed, it is possible to control them in real time through their parameters, thus allowing for their direct manipulation.

These features make these sound models the ideal candidates for the design of interactive sonification in HMI applications. They sustain the design paradigm of the user's continuous flow of action, i.e. the embodied interaction paradigm that puts the user in control of the entire course of events.

In the present work, the role of the interaction designer is to provide the user with sounding objects, modeled by their physical and perceptual parameters, with a priori parameter settings depending on the real world rules of interaction.

In the next section we present some sound models for the interactive sonification of AD. All the sound models presented in the following work in real time and are reactive to user gestures and to AD events. The sound models are developed in Pure Data (PD), a multi-platform system for sound synthesis and analysis3. PD communicates with AD via a network protocol.
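As a concrete sketch of this communication, PD patches can receive plain-text messages over the network with the netreceive object, which accepts space-separated atoms terminated by a semicolon. The message names, parameter order, and port number below are hypothetical, chosen only to illustrate how an application such as AD could drive the sound models; running the example requires a patch actually listening on that port.

    import socket

    class PdLink:
        """Send control messages to a PD patch listening with [netreceive 3000]."""

        def __init__(self, host="127.0.0.1", port=3000):
            self.sock = socket.create_connection((host, port))

        def send(self, *atoms):
            # PD's netreceive expects space-separated atoms ending with ";".
            message = " ".join(str(a) for a in atoms) + ";\n"
            self.sock.sendall(message.encode("ascii"))

    pd = PdLink()
    pd.send("sms", "steel", 0.8, 1.2)   # hypothetical: material, size, impact force
    pd.send("scribble", 0.35, 0.9)      # hypothetical: pen velocity, pressure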

3 Pure Data: http://www.puredata.info


3.3 Sonification of the Affective Diary events

3.3.1 Sonification of SMS messages

In AD, SMS messages are visualized as concentric circles and appear on the timeline at the moment they were received. The SMS text appears when tapping or pointing on its graphical representation (i.e. the concentric circles) (see Figure 3.2).

In order to attribute a sonic metaphor to the event of an "incoming" SMS, many ideas were considered, but the idea of a container and its content was the one that seemed most apt to provide coupling and affordance. It finds counterparts in other studies, for example in Tangible Bits [13] and the Marble Answering Machine, where small marbles represent incoming messages on a telephone answering machine (see Figure 3.3).

An SMS event can be illustrated as a little marble falling into a container. The container is the metaphor for the hand-held device. When an SMS is received, it sounds like the impact of a marble in the container. We used a sound model for impact sounds that can be configured to convey the idea of material, weight, and impact force [23] [20]. In that way it is possible to personalize the sender of the message by associating sound attributes that distinguish her, such as a different material and/or size. The sender could also set up her own attributes. The users are free to construct the affective implications of the sound they produce by interacting with the sonic interface, almost in the same way they would if they were using real marbles in a real box. The affective component of the synthesized impact sounds is given by the simulation of different materials, sizes, and frequencies (spectral centroid). For instance, an important message could be represented by the sound of a heavy and large steel marble, and a happy message by a light, bouncing, wooden marble.
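A minimal sketch of this per-sender personalization, reusing the PdLink class sketched earlier: each phone number is mapped to marble attributes that are forwarded to the impact model when an SMS arrives. All names and values here are illustrative assumptions.

    SENDER_MARBLES = {
        "+46701234567": {"material": "steel", "size": 1.4, "force": 1.0},  # "important"
        "+46709876543": {"material": "wood",  "size": 0.7, "force": 0.5},  # "light, happy"
    }
    DEFAULT_MARBLE = {"material": "glass", "size": 1.0, "force": 0.7}

    def on_sms_received(pd, sender):
        """Trigger one marble impact configured for this sender."""
        m = SENDER_MARBLES.get(sender, DEFAULT_MARBLE)
        pd.send("sms", m["material"], m["size"], m["force"])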

An interesting aspect of this idea is that the metaphor continues to live during the interaction with the device. To determine how many SMS messages are in the hardware device running AD, the user could shake it and get sonic feedback representing the marbles shaking in the "virtual" container. In this way it would be possible to get an idea of the number of messages in the device and of who sent them (if different sizes and materials have been used in the modeling). The aim of this design is to make it possible for the user to ascribe a meaning to the sound that she produces by interacting with the device, and consequently to promote the use of this information in the user's behaviour.

In the case that SMS messages are accompanied by a picture, as in MMS messages, it has been shown that sound and music can enhance the emotional communication embedded in the text message [18].

3.3.2 Sonification of Bluetooth presence

In AD, if the user moves in an environment where there are Bluetooth devices, their presence is graphically represented by abstract shapes on the screen of the portable device. The shapes become highlighted at the timestamps when the user approached a Bluetooth device during the day.


Figure 3.2. Affective Diary: representation of SMS messages. The timeline is represented in the lower part of the figure by a gradually changing hue containing a miniature version of SMS messages, abstract shapes, Bluetooth presence, and photos.

Figure 3.3. Marble Answering Machine

The identifier of a Bluetooth device (i.e. the user name or the device name) appears on the screen when tapping on its abstract shape (see Figure 3.4).

For the sonic representation of Bluetooth presence we thought of using the sound of footsteps of people passing by. The main reason for this choice is that the identifiers of Bluetooth devices represent people that the AD user encountered during the day while moving/walking in different environments, such as public or working spaces.

Previous studies on the modeling of footstep sounds [5] and on the analysis of expressive walking sounds [12] inspired the idea of designing a model for the synthesis of emotionally expressive human footsteps.

Footstep sounds can be characterized by modifying the control parameters of their sound model, such as impact force, distance, step velocity, and the ground texture, which can be either soft or hard (see Figure 3.5). We based the cues on the analysis of emotionally influenced walking samples that were collected from persons without any kind of musical training. During the tests, the participants were asked to walk as if they felt fear, anger, happiness, or sadness, or in a neutral way. The acoustic analysis provided the parameters necessary to create an interpolation model of the emotional cues.


Figure 3.4. Affective Diary: representation of Bluetooth presence. The identification of the Bluetooth device is also visualized.

Figure 3.5. Pure Data interface for the control of footstep sounds

Special combinations of these control parameters allow for the direct manipulation of walker parameters such as age, gender, weight, size, ground texture, pace, and emotion. The synthesis of the footsteps thus becomes malleable to the emotional influence of the user who manipulates the interface. A Bluetooth presence could then be represented as a human presence passing by, with walking sounds for a longer presence or running sounds for a shorter one.
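The interpolation model can be sketched as follows: each emotion is an anchor point in a two-dimensional emotion space carrying a set of footstep cues, and any position in the plane is rendered by distance-weighted averaging of the anchors. The anchor coordinates and cue values below are illustrative assumptions, not the measured data.

    import math

    # Hypothetical anchors: (valence, arousal) -> (step interval [s],
    # impact force, step regularity).
    ANCHORS = {
        "happiness": ((+0.8, +0.6), (0.45, 0.8, 0.9)),
        "anger":     ((-0.7, +0.8), (0.40, 1.0, 0.7)),
        "sadness":   ((-0.8, -0.6), (0.80, 0.4, 0.8)),
        "fear":      ((-0.5, +0.5), (0.35, 0.5, 0.4)),
        "neutral":   (( 0.0,  0.0), (0.55, 0.6, 1.0)),
    }

    def footstep_cues(x, y):
        """Inverse-distance interpolation of cue values at (x, y)."""
        total, acc = 0.0, [0.0, 0.0, 0.0]
        for (px, py), cues in ANCHORS.values():
            w = 1.0 / (math.hypot(x - px, y - py) + 1e-6)
            total += w
            for i, c in enumerate(cues):
                acc[i] += w * c
        return [a / total for a in acc]

    print(footstep_cues(0.5, 0.5))  # a position between happiness and fear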

3.3.3 Sonification of scribble events

The user of AD can write comments using a pen on a tablet PC or on a smartphone. This produces freehand scribbles that appear on the AD display. The generated scribbles are saved automatically and reappear when that particular moment of the day is revisited by the user (see Figure 3.7).

When writing, the user acts on the display and consequently gets haptic and auditory feedback in a natural way.


Figure 3.6. Pure Data interface for the control of writing sounds

Because of this, the direct sonification of the event of writing on the AD display could be disregarded; however, sometimes the auditory feedback could be too soft, or the user could be wearing headphones that prevent her from hearing it.

The sonic representation of the scribble is created as the friction sound of a pen on a surface. In this case too the sound is produced in real time by a physics-based model, a model of friction sounds [1], by modulating the parameters that define the pen sound, such as point thickness and point material.

As the timeline scrolls, it could be useful to hear the sound of the pen friction on the "virtual paper", providing feedback on the presence of a scribble, since it is not visualized on the timeline and the screen is small. One could even think of scrolling AD without watching it, just listening to the sound feedback to identify the scribble position.

The user interaction is coupled to the real-time synthesis model. The parameters which control the sound model can be retrieved and re-synthesized when the scribble reappears on the screen, which happens when the user browses AD. The sound textures that have been created simulate the sounds of pencil, chalk, and felt-tip pen (see Figure 3.6). These textures are meant to express different efforts and to map different graphical properties of the pen (i.e. a wide line for a felt-tip pen, a thin line for a pencil). The writing gesture of the user is embedded into the sound, providing feedback on the user's gesture. The user can write faster or slower, with regular or irregular movements, and with a thicker or thinner point, thus sounding nervous or gentle.
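A minimal sketch of this gesture-to-sound coupling, again reusing the PdLink class from above: the chosen tool selects a texture preset, while pen speed and pressure drive the friction model on every pen-move event. Preset names and values are hypothetical.

    import math

    TOOL_PRESETS = {
        "pencil":   {"texture": "pencil", "thickness": 0.3},
        "chalk":    {"texture": "chalk",  "thickness": 0.6},
        "felt_tip": {"texture": "felt",   "thickness": 1.0},
    }

    def on_pen_move(pd, tool, dx, dy, pressure, dt):
        """Map one pen-move event onto friction-model controls."""
        speed = math.hypot(dx, dy) / dt          # pen velocity on the display
        preset = TOOL_PRESETS[tool]
        pd.send("scribble", preset["texture"], preset["thickness"], speed, pressure)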

In this way, affective gestures of the user are directly reflected in the synthesized sound.


Figure 3.7. Affective Diary: representation of scribble events.

3.3.4 Sonification of photo events

The AD user receives and/or takes photos during the day, and they are embedded in the diary at their corresponding timestamps.

As in the case of scribbles, we want to attract the user's attention when she is browsing AD and a photo appears at a specific moment of the day. The event of taking a picture already has its own well-established sonification standard: the click sound of a mechanical camera has indeed been replicated in digital cameras for providing feedback to users, for example in the digital cameras of mobile phones. The sound is always the same, since it is meant to be generated by a fixed mechanical action. Therefore, in this case a sampled sound is the natural choice for sonifying photo events.

Another possibility would be to use ambient sound recorded at the time when the photo was taken. This would help the user remember the situation, if she took the photo herself, or enhance her sense of presence in the environment where the photo was taken.

3.3.5 Sonification of abstract body shapes

The movie event is an automatic scroll through the day stored in AD. The timeline scrolls on as the screen shows the daily events in the order they happened. It is possible to stop and pause the movie and to modify the shapes on the screen according to the actual affective and bodily experience of the user (see Figure 3.9).

The position registered by the physical input devices is still represented as a shadow, and in this way it is possible to make a comparison between the different representations.

One of the functions of the timeline is to display, while the movie plays, the data collected by the wearable sensors. These are visualized as abstract shapes with different body postures and colours, representing movement and arousal.


Figure 3.8. Affective Diary: representation of a digital photo.

Figure 3.9. Affective Diary: the body shape representation.

As those characters follow each other in a sort of narrative path, it would be natural to propose a musical comment on the development of the story. What we felt would be a supporting sonification for the narrative of the timeline is the computer-controlled expressive performance of music files stored in AD (e.g. ringtones, MIDI, or mp3 files) [3].

The abstract shapes are used for controlling the real-time performance of the chosen score. Energy in the body is mapped onto energy in the music performance. For example, high energy of the body, corresponding to a red colour of the silhouette, is mapped onto high energy in the music performance (i.e. loud sound level, more legato articulation, etc.). Movement of the body is mapped onto kinematics of the music performance. For example, if the abstract shape represents a high quantity of body motion (a raised body), this will produce a music performance with fast tempo and more staccato articulation [6].
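This two-way mapping can be sketched as a small function: sensor-derived energy pushes the sound level up and the articulation towards legato, while quantity of motion pushes the tempo up and the articulation towards staccato. The ranges and coefficients are illustrative assumptions.

    def shape_to_performance(energy, motion):
        """Map body shape parameters (both in [0, 1]) to performance controls."""
        return {
            "level_db": -12.0 + 12.0 * energy,       # louder with higher energy
            "tempo_factor": 0.8 + 0.5 * motion,      # faster with more body motion
            # articulation: values above 1.0 overlap notes (more legato),
            # values below 1.0 detach them (more staccato)
            "articulation": 1.0 + 0.2 * energy - 0.5 * motion,
        }

    print(shape_to_performance(energy=0.9, motion=0.1))  # loud, slowish, legato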


3.4 The tools

3.4.1 Pure Data

The software tool we used to implement our sound models was Pure Data4 (or PD) and its external GEM (Graphics Environment for Multimedia)5. PD is a graphical programming language developed by Miller Puckette in the 1990s at the University of California, San Diego, for the creation of interactive, real-time computer music and multimedia works.

The modular and nested nature of PD makes it possible to reuse patches, with independent changes inside the modules, and qualifies patches for use as blocks inside higher-level patches. For this reason we could integrate the specific rules for producing impact and friction sounds, provided by the SOb project models, into a framework of higher abstraction level patches that describe and produce the sounds of a walking person, of the friction of a pen on a surface, and finally of the impacts of marbles.

3.4.2 Sounding Objects

The SOb physically-based models make it possible to synthesize naturally behaving sounds. In this work we refer to the SOb models as low-level models, meaning that they constitute the basic layer on which we built our high level models. They describe the basic physical mechanisms involved in sound generation, such as impact and friction, the properties of the resonating structures, and the interaction modalities. The sound control is achieved by the modulation of a limited number of parameters, which are related to the two resonating objects and to the interactor. The definition of a limited number of parameters gives the models a cartoonified character that enables effective sound computation and good sound synthesis. The sound models are activated by the user interaction and respond to physical input parameters. By means of those parameters we can implement a higher level of abstraction where we can describe more complex sound events.

The SOb sound models are collected in the interaction-modal package, which is available on the SOb web site6. The interaction-modal folder contains the implementation of resonators, described as a bank of modal oscillators, and the interactors for impact and friction interactions. It also contains sound-modules, a subdirectory for the structures required by PD and the plugins [22].

4 http://www.puredata.org
5 http://gem.iem.at/
6 http://www.soundobject.org


Part II

Conceptual models, implementations, tests and results


Chapter 4

The SMS event

4.1 Conceptual model and implementation

SMS events are represented by the metaphor of a marble falling into a virtual container. We propose a sound model for the sonification of the SMS event. MarblesInterface (see Figure 4.1) is the PD patch for the high level sound model implementing the sonic metaphor of a container with marbles. The requirements established during the design phase were that the recipient could personalize her messages by associating telephone numbers with sound attributes that distinguish the senders, such as different marble material, impact force, and size, and that the sender could also set up her own attributes.

The main idea is that by shaking the hardware device, which in AD's case would be the mobile phone, the user could hear whether there are SMS messages and recognize how many there are. Provided that the hardware device has sensors that detect the user's manipulation, the model would respond to the user's action by producing the real-time sound of the impacts of the marbles. The sound will be characterized by the marble settings that the user has chosen.

In order to satisfy the requirements, we designed a conceptual model that provides AD with a sound synthesis determined by the parameters that describe the action of shaking. An accelerometer provides the values for the activation of the real-time sound synthesis, and is also needed for testing purposes. The activation of the high-level model, achieved by shaking the hardware device, simultaneously controls the retrieval of the stored parameters (material, size or frequency [Hz], impact force, and number of marbles), together with the values of elasticity, strike velocity, and bouncing of the impact models.

The high-level PD model is built upon two main structures: an SMS event controller, which activates the real-time sound synthesis, works as an impact counter, and takes care of parameter storage; and a group of four impact models that enables the synthesis of multiple simultaneous impact sounds (see Figure 4.2).

The storage of parameters is done with the help of PD's data structures. PD's graphical data structure resembles a data structure from the C programming language and allows the storage of different data types: scalar floats and symbols, arrays, and lists. The data structure can be saved to a text file. Accessing or changing data is done via pointers to scalars.




Figure 4.1. Pure Data interface for the control of marble sounds


The activation of the model and the production of multiple impact sounds are provided by a counter structure that administrates the distribution of the retrieved parameters to a set of four impact models, a sort of electronic drum machine. In this way it is possible to synthesize many simultaneous impact sounds, each one characterized by the user-defined parameters. The model can easily be augmented by adding more impact models to the set of four.
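As a sketch of this architecture, the Python fragment below mimics what the PD patch does: the stored per-sender settings are retrieved on a shake event and dispatched round-robin to a fixed set of four impact voices. This is a sketch only; the actual implementation is a PD patch, and names such as ImpactVoice are hypothetical stand-ins.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class MarbleSettings:
    """Per-sender marble attributes, as stored in PD's data structures."""
    material: str       # e.g. "glass", "steel", "rubber", "wood"
    frequency: float    # marble size expressed as frequency [Hz]
    impact_force: float
    count: int          # number of stored SMS messages = number of marbles

class ImpactVoice:
    """Hypothetical stand-in for one SOb impact model instance."""
    def trigger(self, s: MarbleSettings, strike_velocity: float):
        print(f"impact: {s.material} at {s.frequency} Hz, "
              f"force {s.impact_force}, velocity {strike_velocity}")

voices = cycle([ImpactVoice() for _ in range(4)])   # the set of four models

def on_shake(settings: MarbleSettings, strike_velocity: float):
    """A shake triggers one impact per stored message, distributed
    round-robin over the four impact models like a small drum machine."""
    for _ in range(settings.count):
        next(voices).trigger(settings, strike_velocity)

on_shake(MarbleSettings("glass", 1800.0, 0.8, count=3), strike_velocity=1.0)
```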

This high-level design was mainly influenced by the nature of the impact models and of the PD language. A discussion of the design solution follows in the chapter on future works.

A PD patch collection from the MarblesInterface model package can be found in the appendix (see Appendix figure .1).



Figure 4.2. The high level conceptual model of the marble impact sounds



4.2 Test

4.2.1 Subjects and test settings

To test our model and hypotheses we set up a listening test. The participants in the marble sound test were a group of 16 subjects of different nationalities, 8 males and 8 females, chosen to be representative of the average AD user. Their ages ranged from 17 to 67 years, with an average of 37. All the subjects listened to music and six of them played an instrument. Each test session lasted on average 25 minutes.

We held the tests in a silent computer room where the set-up was a computer, a pair of headphones (Peltor Workstyle Type HT7A), an accelerometer, and five cardboard boxes. We gave written instructions to the subjects; they could also ask for clarifications before and during the test. The sound level was kept constant across the tests for all the subjects.

4.2.2 Stimuli and procedure

We divided the marble sound test into two parts, a marble quantity test and an emotion association test. In the marble quantity test we aimed to investigate whether the user is able to infer the quantity of SMS messages stored in the handheld device. Our model provides the acoustic feedback for an SMS message by synthesizing the sound of a marble impact in a container. Each impact is characterized by properties such as size, material, and impact force. Provided that the handheld device has a gesture sensor, the user can activate our model and retrieve the sonic metaphor for the SMS by shaking the device. In order to prepare the proper setting for this test we used a La Kitchen Kroonde,¹ a wireless, high-speed, high-precision data-acquisition system dedicated to real-time applications. It can be used for live music performance and other interactive applications using embedded sensors such as flexion, pressure, light, magnetic field, and others. For our testing purposes we applied the two-dimensional acceleration sensor with a piece of tape to a cardboard box containing some marbles. During the test the subject was asked to wear a pair of headphones, shake the box, listen to the synthesized sound of marble impacts, and guess the quantity of objects contained in it. She would then write down her answer on a questionnaire. As the subject was shaking the box, the sensor would send a message to the model and activate it.

In order to investigate the perception of quantity, we set up the test session by arranging two series of stimuli, one with the synthesis of impact sounds of rubber marbles, and the other with a random choice of materials. We also prepared a set of 5 boxes; one was empty, the others contained 1, 3, 5, and 10 marbles. The boxes were prepared so as to minimize the natural sound of the shaking marbles by padding their walls.

¹ La Kitchen, 78 avenue de la République, 75011 Paris, France, www.la-kitchen.fr



Figure 4.3. The test settings, the Kroonde and the boxes with a bidirectional acceleration sensor applied

The headphones also isolated the subject from environmental sounds to an acceptable degree (see Figure 4.3).

The two series of stimuli were then presented by randomly changing the boxes applied to the sensor. We wanted to test, within the same stimuli series, whether the subject could better recognize the quantity of marbles with or without the haptic feedback. For that purpose we arranged each series into three groups of stimuli: in the first, the subject shook an empty box and listened to the sonic feedback only; in the second, the sensor was applied to a box containing the same number of marbles as the number of impact sounds; and in the third, the subject guessed the number of marbles in the box without the sonic feedback (see Figure 4.4).

The emotion association test was computer aided. We implemented a test application where subjects were presented with questions about the association of the marble sounds with the emotions of happiness, calmness, sadness, and anger. Subjects had to choose the sound settings that felt most like the emotion we were investigating by moving a cursor on a slider or clicking on a button (see Figure 4.5). We wanted to investigate the influence of different material, frequency [Hz], impact force, and gravity on the emotion perception of the subject. After the test we saved the results to a text file.



Figure 4.4. The stimuli settings for the marbles quantity test



Figure 4.5. The test settings. The PD application for the marble sound test



4.2.3 Results and conclusions

The test session on the marble impact sounds was conducted in two parts. During the first one we asked the subjects to estimate the perceived quantity of marbles they could hear in the box. The results show that the subjects could recognize 1 marble and 3 marbles reasonably well, while it became more difficult to distinguish 5 and 10 marbles. The same tendency can be seen both for sound stimuli of the same material, rubber, and for sound stimuli of mixed materials: wood, steel, rubber, and glass (see Figure 4.6 a. and b.).

The variance charts display the dispersion of the results for the two sets of stimuli, one with sounds of equal material and the second with sounds of different materials. They show that the deviation from the mean is very low when the subjects evaluated the stimuli of 1 and 3 marbles, while a larger deviation occurs when the stimuli present 5 and 10 marble impact sounds. When the variance is calculated for impact sounds of equal material, the result shows evident differences for the set of stimuli with only sonic feedback compared to the chart that displays the variance calculated for impact sounds of different materials. The most evident difference is the variance of the quantity perception for the sounds of 10 marbles, which was 2.95 in the case of stimuli of equal material compared to 9.31 in the case of different materials (see Figure 4.7 a. and b.).

From these results it seems that the subjects could better identify the number of marbles when the sound texture described different materials and the stimuli were presented with both the sonic and the haptic feedback. When the stimuli were presented with the sonic feedback only, the results showed a higher recognition rate with the sound texture of equal material. Finally, when we tested the perception of quantity with only the haptic feedback, where the marbles contained in the boxes were of the same material and size, we found that the results could be considered equal in both series of stimuli. The sonic feedback seems to be enhanced by its combination with a haptic feedback, and different materials in the sound textures seem to improve the ability of the subjects to distinguish the quantity of marbles.

In the second part of the test we asked the subjects to associate the sound textures with an emotion. For the emotion happy, the material steel rates at 57%, glass at 31%, and rubber and wood both at 6%. For the emotion angry, steel rates at 47%, glass at 33%, wood at 20%, and rubber at 0%. For the emotion sad, wood rates at 57%, glass at 29%, rubber at 14%, and steel at 0%. For the emotion calm, the materials wood and rubber both rate at 44%, and glass and steel both at 6%. The results show a clear tendency to associate steel and glass with the emotions of happiness and anger, while wood and rubber are mainly associated with sadness and calm (see Figure 4.8 a.). The percentile table also displays these results (cf. Table 4.1).

The association of sound fundamental frequency [Hz] with emotion shows that sounds with higher frequencies are perceived as having high emotional valence, while lower frequencies are associated with lower emotional valence (see Figure 4.8 b.). In this case the mean value for happiness is 1829.67 and for anger 1576.94; the mean value for sadness is 1096.16 and for calm 1135.36.



The association of gravity force with emotion shows higher values for sounds with high emotional valence, happiness and anger, while the values decrease for sadness and calm (see Figure 4.9 a.). The mean value for happiness is 726.25 and for anger 380.21; the mean value for calm is 311.55 and for sadness 200.89.

The association of impact force with emotion shows an even more evident separation between sounds with high and low emotional valence: happiness and anger rated considerably higher than sadness and calm (see Figure 4.9 b.). The mean value for happiness is 4683.83 and for anger 4408.24; the mean value for sadness is 554.67 and for calm 562.19.

Emotion   Sound texture    Subjects   Rank   Percentile
Angry     steel                   8      1         100%
          wood                    5      2          66%
          glass                   3      3          33%
          rubber                  0      4           0%
Sad       wood                   10      1         100%
          glass                   4      2          66%
          rubber                  2      3          33%
          steel                   0      4           0%
Happy     steel                   9      1         100%
          glass                   5      2          66%
          wood                    1      3           0%
          rubber                  1      3           0%
Calm      steel                   7      1          66%
          wood                    7      1          66%
          glass                   1      3           0%
          rubber                  1      3           0%

Table 4.1. Rank and percentile table for the association of an emotion with a marble sound texture



(a) The histogram of the frequencies of the estimated number of marbles for stimuli representing marbles of equal material sound texture. It shows what proportion of cases fall into each of the four categories: 1 marble, 3 marbles, 5 marbles, and 10 marbles.

(b) The same tendency as for the previous histogram can be seen when the stimuli presented sound textures of different materials.

Figure 4.6. Frequency estimation over the quantity of marble impact sounds



(a) The variance chart for stimuli of equal materials.

(b) The variance chart for stimuli of different materials.

Figure 4.7. Variance over the quantity-of-marbles estimates for marble impact sounds representing groups of 1, 3, 5, and 10 marbles



(a) Association of emotion to a sound texture

(b) Association of emotion to frequency

Figure 4.8. Association of emotion to material and frequency



(a) Association of emotion to gravity force

(b) Association of emotion to impact force

Figure 4.9. Association of emotion to gravity force and impact force


Chapter 5

The Bluetooth event

5.1 Conceptual model and implementation

The Bluetooth event is represented by the metaphor of the footsteps of a person passing by, calling attention to an encounter with another user's Bluetooth device. FootstepsInterface (see Figure 5.1) is the interface file of the high-level model that implements the sonic representation of Bluetooth presence.

The model was developed in a "cartoonified" way, starting from an idea of a walking pattern that can be found in the Obiwannabe tutorial site.¹

As in the tutorial, the typical footstep sound pattern can be described by three consecutive events. First the heel produces an impact on the ground. Then the body weight is shifted slowly from the heel towards the toes. Finally the heel completely leaves the ground so that only the toes support the body weight. Thus for the footstep sound model we need to consider three stages: heel only, toes only, and an intermediate stage where the out step of the foot rolls along the ground, transferring the weight between the two. Furthermore, there is an important phase relationship between the two feet during walking, which changes when running. During walking, the heel of each step overlaps with the toe phase of the previous step; there is never a time when no part of either foot touches the ground. During running, both feet completely leave the ground and there is a short time when no part of either foot touches the ground until the next step.
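A compact way to see this structure is to build the amplitude envelope of a single step from three half-sine segments and then overlap-add successive steps: a step period shorter than the envelope reproduces the walking overlap, a longer period the flight phase of running. The Python sketch below only illustrates this timing logic; the durations and weights are invented values, and the real model is a PD patch driving the sound textures described next.

```python
import numpy as np

def half_sine(n):
    """One half cycle of a sine wave, the envelope of one gait phase."""
    return np.sin(np.pi * np.arange(n) / n)

def step_envelope(fs=44100, heel=0.04, outstep=0.12, toe=0.05):
    """Heel impact, weight roll along the out step, toe push-off
    (durations in seconds; illustrative values)."""
    return np.concatenate([1.0 * half_sine(int(fs * heel)),
                           0.3 * half_sine(int(fs * outstep)),
                           0.8 * half_sine(int(fs * toe))])

def gait(envelope, period, n_steps, fs=44100):
    """Overlap-add successive steps. A period shorter than the envelope
    gives the walking overlap; a longer one leaves the running gap."""
    hop = int(fs * period)
    out = np.zeros(hop * n_steps + len(envelope))
    for k in range(n_steps):
        out[k * hop:k * hop + len(envelope)] += envelope
    return out

walk = gait(step_envelope(), period=0.15, n_steps=8)   # steps overlap
run  = gait(step_envelope(), period=0.30, n_steps=8)   # silent flight phase
```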

The implementation of the model described above consists of a metronome structure that sets the velocity of the walking/running pattern. A second structure describes the phases of the heel, the out step, and the toe, each implemented as one half cycle of a sine wave. The actual curve of the three phases, captured during the development of the model, shows two peaks for the heel and toe impacts, with the out step in between them (see Figure 5.2). Finally, we created a bank of sound textures for the different grounds, containing three low-level sound models. The first sound texture group is based on the modulation of brown and pink noise to create soft materials, as in the Obiwannabe tutorial.

¹ http://www.obiwannabe.co.uk/html/music/music.html




The second group implements the SOb impact model and represents the hard materials. The third group implements the crumpling sound model developed by Fontana and Bresin [5]. For a graphic description of the implementation see the conceptual model in Figure 5.3.

The footstep sound model can be controlled by tuning the velocity of walking or running, the size of the foot, the material of the ground, the distance of the sound, the reverb, and the impact force (see Figure 5.1). The low-level parameters that the footstep model controls on the impact model are: the amplitude, the reverb, the fundamental frequency [Hz], the impact force, and the material. In addition, while for the sound textures based on the impact model we designed the foot size as described by the variation of frequency, for the noise-based sound textures the foot size is described by the variation of the time interval between the heel impact and the toe impact.

The FootstepsInterface also implements a model for tempo, impact force, and amplitude manipulation, for varying the activity and emotional valence of footstep sounds (see Figure 5.4). We integrated the walking pattern with an activity-valence model similar to the one found in the expressive sequencer pDM [6]. pDM is a system for the real-time control of music performance synthesis; it implements the KTH performance rule system [7]. The integration with the walking pattern is made by interpolating the results from tests on the perceived emotion in different walking patterns [28]. The acoustic analysis of emotionally influenced walking samples provided the parameters necessary to create an interpolation model of the emotional cues. During those tests the participants were asked to walk as if they felt the emotions fear, anger, happiness, sadness, and neutral. We then used the results from a perception test that rated the different emotions portrayed in the recorded walking sounds. The test samples that rated highest in emotional intention for the emotions of fear, anger, happiness, and sadness provided the values for the interpolation used in the control model of footstep sounds implemented in this thesis. The linear interpolation is calculated in two steps, first between sad-angry and then tender-happy. The interpolated values correspond to the toe amplitude, the heel amplitude, the out step duration (heel-to-toe time interval), and the time interval between the two feet (toe-to-heel time interval). A discussion of the design solution follows in the further developments chapter.
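The two-step interpolation can be summarized as a bilinear blend over the activity-valence plane, as in the Python sketch below. The corner vectors are placeholders, not the measured cues from [28], and the mapping of the plane's axes is our reading of the pDM-style control space.

```python
import numpy as np

# Cue vectors: (toe amplitude, heel amplitude, out step duration [s],
# toe-to-heel interval [s]). Placeholder values for illustration only.
SAD    = np.array([0.3, 0.4, 0.20, 0.40])
ANGRY  = np.array([1.0, 1.0, 0.08, 0.15])
TENDER = np.array([0.4, 0.5, 0.18, 0.35])
HAPPY  = np.array([0.9, 0.8, 0.10, 0.20])

def walking_cues(x, y):
    """Two-step linear interpolation: first along the sad-angry edge,
    then along the tender-happy edge, then between the two edges.
    x and y are positions in [0, 1] on the control plane."""
    low  = (1.0 - x) * SAD + x * ANGRY        # step 1: sad-angry edge
    high = (1.0 - x) * TENDER + x * HAPPY     # step 2: tender-happy edge
    return (1.0 - y) * low + y * high         # blend of the two edges

toe_amp, heel_amp, outstep_dur, feet_interval = walking_cues(0.7, 0.8)
```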

A PD patch collection from the Footstep model package can be found in the appendix (see Appendix figures .2 and .3).




Figure 5.1. Pure Data interface for the control of footstep sounds

Figure 5.2. The actual curve that the footstep model produces



Figure 5.3. The high level conceptual model of the walking pattern



Figure 5.4. The interface for the emotional walking pattern



5.2 Test

5.2.1 Subjects and test settings

After the marble sound test and a pause of some minutes, we conducted the footstep sound test. The subjects were the same group of 8 males and 8 females. Each test session lasted on average 20 minutes. The test settings were the same as in the previous test. As the test was entirely computer aided, subjects had to answer a set of questions presented to them by a test application. We gave written instructions to the subjects and they could also ask for clarifications before and during the test. The sound level was kept constant across the tests for all the subjects.

5.2.2 Stimuli and procedure

The test session for the FootstepsInterface was entirely computer aided. We implemented a test application in PD where the subject was asked to evaluate her perception of age, gender, and emotion by listening to and manipulating the synthesized footstep sound of a walking person.

We arranged the questionnaire into three sections. In the first part the subject was asked to drag the cursors of two sliders and leave them when the model produced a sound that she perceived as representative of the age of a person. We asked for the representation of a person of 3 years, 25 years, and 60 years of age, with no consideration of gender. The sliders controlled the settings of fundamental frequency [Hz] and footstep duration for the impact-model-based sound textures, while for the noise-based sound textures they controlled the settings of foot size (implemented as the time interval between the toe and heel impacts) and footstep duration (see Figure 5.5).

In the second group of questions we wanted to investigate the gender characteristics of the walking pattern. The subject was asked to choose a sound texture by clicking on buttons that corresponded to the available materials, and to drag the cursors of two sliders to where the sound was perceived as masculine or feminine. The two sliders controlled the model in the same way as in the previous group of questions (see Figure 5.6). The proposed sounds were chosen in order to test both the soft (noise-based) and hard (impact-model-based) sound textures: they were the gravel (soft) and rubber (hard) textures. In the third part of the test session, the PD application showed a two-dimensional field (a grid) with a red dot cursor that could be dragged in the four directions. As the cursor moves around, the model synthesizes a footstep sound with an emotional valence, as described in the conceptual model section (see section 5.1). In order to prevent a learning effect from the association of the four corners of the field with an emotional valence, we rotated the grid by 90 degrees counterclockwise at each new question, thus changing the positioning of the valences. The subjects were asked to choose the footstep sounds that were most representative of the four emotions happiness, fear, sadness, and anger (see Figure 5.7). The answers were saved to a text file to be analyzed later.



Figure 5.5. The footstep test application. Age de�nition

Figure 5.6. The footstep test application. Gender de�nition



Figure 5.7. The footstep test application. Emotion de�nition

5.2.3 Results and conclusions

We aimed to investigate whether it is possible to design footstep sounds that reflect some information about the person who is walking by. We tested whether the subjects could manipulate our model to create a sound that resembled the footstep sound of persons of different age and gender. We also asked the subjects to evaluate the naturalness of our sounds. Furthermore, we were interested to know whether the affective modulation of our model could produce an emotionally influenced walking pattern, and how the intended emotion was interpreted. To investigate this we asked the subjects to evaluate the sound produced by the model with a specific emotion in mind. The subjects had to position a cursor in a two-dimensional field; the position they chose reflected their opinion and gave us information about the recognition of our intended emotion.

The subjects tended to apply faster steps and higher frequencies [Hz], or smaller foot sizes, to represent younger age, while slower steps and lower frequencies [Hz] represent older people (see Figure 5.8 a. and b.).

As displayed in the chart in Figure 5.8 a., it is possible to localize three areas where the different ages are represented. Values corresponding to 3-year-old children were identified by a higher frequency and a lower footstep duration. The 25-year values lie in the middle of the chart, representing footsteps characterized by medium frequency and footstep duration, while the 60-year values show a lower frequency and a higher footstep duration. The chart in Figure 5.8 b. shows the relation foot size vs footstep duration, as our model implements the sound textures in different ways.



As in the previous chart, it is possible to distinguish different areas where the three ages are represented.

In defining the gender of a person in the walking pattern, the subjects tended to choose sounds with lower frequency and slower duration for the male pattern, and higher frequency and faster duration for the female. The most definite distinction was achieved in creating sounds based on the impact model (see Figure 5.9 a. and b.). The chart in Figure 5.9 a. shows the relation between frequency and footstep duration in defining the gender of the walking person for the impact-based model. The distinction between the two walking patterns is quite definite: it ascribes longer footstep durations and lower frequencies to the male walking pattern, while the female pattern is characterized by shorter durations with some variation in the frequency values. The characterization of the footstep pattern for the noise-based textures does not show the same sharp distinction as the previous results. Still, it is possible to localize the male pattern with somewhat longer footstep durations and the female pattern with shorter durations (see Figure 5.9 b.).

The choice of materials to define the gender shows that the impact model represents male and female in a more distinct way, while the subjects could not sharply define the gender by choosing the textures based on noise modulation (see Figure 5.10 a. and b.). Subjects chose sound textures of wood and rubber when they associated a footstep sound with a male person: they chose rubber in 89% of cases and wood in 11% of cases. Their answers were less defined when they associated a material sound texture with the footsteps of a woman: the results rated 45% glass, 33% rubber, 11% wood, and 11% steel. The results regarding the noise-based sound textures rated as follows: 37% gravel, 27% dirt, 18% snow, 9% woodsoft and 9% for the footstep sound of a man, and 62% gravel, 25% snow, and 13% grass for the footsteps of a woman.

When the subjects had to recognize an emotional intent in the walking pattern, the emotions of sadness and happiness were easier to determine than the others; most difficult to recognize were fear and anger. The chart in Figure 5.11 a. represents how subjects positioned their emotion evaluations in the two-dimensional field created for the test. The different symbols represent the emotion investigated, and their coordinates give us information about the sonic feedback, as it varies along with the coordinates in the field and depends on the interpolation function we used. In the four corners of the field, the valence parameters for each emotion reach their highest values. The chart shows how subjects answered when they recognized an emotion in the walking pattern. The subjects explained after the test that they could really hear the differences in intent, but they interpreted the emotions in their own subjective way. One subject said that she had some difficulty in deciding how to consider the emotion of fear: fear could be described in two ways, as a fast footstep sound, as if the person wanted to run away scared, or as a very slow footstep, as if the person was trying to hide.

Subjects were asked to evaluate the naturalness of the sounds. The naturalness of the sound textures was evaluated higher for the sounds based on noise modulation, while the sound textures based on the impact model were considered less natural (see Figure 5.11 b.).



On an arbitrary scale from 1 to 10, the soft material sound textures (noise based) reach a mean value of 5.02, while the hard material sound textures (impact model based) get a lower evaluation of 2.99.



(a) This chart displays the distribution of the estimates for the age representation in the walking pattern in the frequency vs footstep duration relation.

(b) This chart displays the relation foot size vs footstep duration, as our model implements the sound textures in different ways.

Figure 5.8. Associations of frequency [Hz] or foot size vs footstep duration. Age definition.



(a) This chart shows the relation between frequency and footstep duration in defining the gender of the walking person for the impact-based model.

(b) Relation between footstep duration and foot size for the noise-based sound textures.

Figure 5.9. Associations of frequency [Hz] or foot size vs footstep duration. Gender definition.



(a) Distribution of materials over the gender for the hard textures (impact model based)

(b) Distribution of materials over the gender for the soft textures (noise based model)

Figure 5.10. Gender vs material association



(a) This chart displays how subjects positioned their emotion evaluations in the two-dimensional field

(b) Naturalness of sound textures evaluation

Figure 5.11. Emotion positioning and naturalness


Chapter 6

The Scribble event

6.1 Conceptual model and implementation

The sonic representation of the scribble event is created as the friction sound of a pen on a surface. PenInterface (see Figure 3.6) is the high-level model file implementing the sound of the writing gesture, which is proposed for the sonification of the scribble event. As the low-level model we used the friction model contained in the interaction-modal package from the SOb project [21].

The main idea was to connect the AD scribble event to our model. By receiving the coordinates and the pressure values of the pen, the model produces the continuous sonic feedback of the writing gesture (see Figure 6.1).

In order to develop this idea, and to be able to test the pen movement against the sound of the pen friction, it was necessary to create a visual interface for the writing gesture. We used a Wacom Graphire4 pen tablet¹ to send the parameters of the pen movement to the model. The visual interface was developed using the OpenGL-based GEM (Graphics Environment for Multimedia) externals for Pure Data. It shows the pen movement as a white line on a black background in the GEM window. The thickness of the pen line can be set as thin, medium, or large. As the pen moves within the GEM window, a synchronized feedback sound is produced by the model. Once the sound model is integrated with AD's scribble event, the visual interface will be discarded.

For the sonic representation of the scribble we wanted to estimate and control the parameters of the friction model in order to create the friction sound of three different pen points: chalk, pencil, and felt-point pen. The friction model is controlled by the high-level model through a number of parameters which are related to the interaction force between the two resonating objects, and a third parameter that is the normal force between the two objects [21]. These are the parameters that the user controls in a direct way by interacting with the tablet. The user interaction sets the velocity of the pen and the acceleration values, which are calculated in the high-level model.

¹ http://www.wacom-europe.com




Those values are sent to the friction model, together with the pen pressure, to control the modal resonator 1 values of external_force_on_bow, bow_initial_velocity, and bow_pressure.
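The mapping from tablet input to these control values can be sketched as follows. This is an assumption-laden outline: the real patch computes the values inside PD and sends them as messages to the friction external, and the scaling here is illustrative.

```python
import math
import time

class PenTracker:
    """Derives pen velocity and acceleration from tablet events and packs
    the values sent to the SOb friction model (parameter names as in the
    interaction-modal package; the scaling is illustrative)."""

    def __init__(self):
        self.x = self.y = 0.0
        self.velocity = 0.0
        self.t = time.monotonic()

    def update(self, x, y, pressure):
        now = time.monotonic()
        dt = max(now - self.t, 1e-4)
        v = math.hypot(x - self.x, y - self.y) / dt   # pen speed
        a = (v - self.velocity) / dt                  # pen acceleration
        self.x, self.y, self.velocity, self.t = x, y, v, now
        return {
            "external_force_on_bow": pressure,  # pressure drives contact force
            "bow_initial_velocity": v,
            "bow_pressure": pressure,
            "acceleration": a,                  # used by the high-level model
        }
```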

The other parameters of the friction model pertain to the design of the sound textures; they are not directly influenced by the user interaction, so we tuned them to obtain the proper sound textures. For the tuning of the parameters we referred to the phenomenological guide to the friction model parameters, in order to choose the sound texture we considered most suitable for representing the different pen points (cf. Table 6.1).

Symbol   Physical description        Phenomenological description
σ0       bristle stiffness           affects the evolution of mode lock-in
σ1       bristle dissipation         affects the sound bandwidth
σ2       viscous friction            affects the speed of timbre evolution and pitch
σ3       noise coefficient           affects the perceived surface roughness
µd       dynamic friction coeff.     high values reduce the sound bandwidth
µs       static friction coeff.      affects the smoothness of the sound attack
υs       Stribeck velocity           affects the smoothness of the sound attack
fN       normal force                high values give rougher and louder sounds

Table 6.1. The phenomenological guide to the friction model parameters (SOb book [21], p. 162)

The chalk sound texture is created by tuning the properties of modal resonator 2 in the friction_2modalb_example patch: t_e-fact, freq1, and freq2. Freq1 and freq2 get their values from the changes in the pen pressure values that the tablet sends to the model. In this way the sound of the pen friction becomes more realistic, as the writing gesture is mapped into sound frequencies that change in relation to the variations of the hand pressure. This feature is shared by all the sound textures. For the pencil sound texture we modulated interactor properties such as lin-viscosity and brist-viscosity, and for the felt-point pen sound texture the freq-fact of modal resonator 2.

Other variables are initialized at model start: the interactor noise level and the modal resonator 2 properties gain-fact, level1, and level2. This model shows the flexibility and the potential of the friction model, as it is possible to improve or augment the number of sound textures by simply manipulating the control parameters of the friction model.
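Seen from the outside, each pen point is then just a small preset of friction model parameters, plus the shared pressure-to-frequency mapping described above. The sketch below expresses that idea; the parameter names follow the text, but every numeric value is a placeholder, not the tuning used in this work.

```python
# Placeholder presets: which parameters each texture tunes follows the text,
# but the numbers are invented for illustration.
PEN_TEXTURES = {
    "chalk":          {"t_e-fact": 0.9, "freq1": 2500.0, "freq2": 4100.0},
    "pencil":         {"lin-viscosity": 0.4, "brist-viscosity": 0.6},
    "felt-point pen": {"freq-fact": 0.5},
}

def with_pressure(texture, pressure):
    """For the chalk texture, freq1/freq2 follow the pen pressure so the
    writing gesture is mapped into the sound (assumed linear scaling)."""
    params = dict(PEN_TEXTURES[texture])
    if texture == "chalk":
        scale = 0.8 + 0.4 * pressure     # pressure assumed in [0, 1]
        params["freq1"] *= scale
        params["freq2"] *= scale
    return params
```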

A PD patch collection from the PenInterface package can be found in the appendix (see Appendix figures .4 and .5).



Figure 6.1. The conceptual model for the friction sound of a pen



6.2 Test

6.2.1 Subjects and test settings

At the end of the experiment on the Bluetooth event, after a 10-minute pause, the same subjects as in the previous tests took the scribble test. This session lasted on average 20 minutes. We held the tests in a silent computer room where the set-up was a computer, a pair of headphones (Peltor Workstyle Type HT7A), and a Wacom tablet. We gave written instructions to the subjects and they could also ask for clarifications before and during the test. The sound level was kept constant across the tests for all the subjects.

6.2.2 Stimuli and procedure

In order to test the model of the pen friction sound, which is activated by the user's writing action, we had to develop a visual interface that made it possible to connect the Wacom tablet to the sound model; this work was described previously in the design section (see section 6.1).

The test application for the pen sound test was presented to the subject in two windows on the computer screen (see Figure 6.2). In the upper left part of the screen, a black GEM window representing a blackboard surface makes it possible to write or draw with the digital Wacom pen. In the upper right part, another window presents the questions to the subject (see Figure 6.3). The questionnaire is aimed at investigating the quality of the pen friction sound, its relation to the writing gesture, and the possibility of assigning some emotional meaning to the action of writing. Thus we asked the subjects to relate a combination of pen friction sound and pen thickness to a specific emotion. The emotions we asked about were happiness, calmness, sadness, and anger. To choose the combinations, the subject was asked to draw or write something in the GEM window. An interesting aspect of this part of the test is that in some way the subject's gesture is influenced by the mood of the question: we expected that the subject would scribble in a way that reflects the investigated emotion while trying to find her answer. Our idea was then to save to file the coordinates of the pen movement in the GEM window, together with the values of the hand pressure. By saving these data we are able to reconstruct the scribbles, or drawings, of the subject and possibly, in further work, analyze the gestures in relation to the mood of the question. These data could add more information to our investigation of the possibility of assigning an emotional meaning to the action of writing. The answers to the questionnaire were then saved to a text file for further analysis.



Figure 6.2. Test application for the sound of a pen friction

Figure 6.3. The test session for the pen sounds



6.2.3 Results and conclusions

The first questions in the pen sound test aimed at understanding the degree of recognition, realism, and acceptance the user felt toward the sound textures. We asked the subjects to identify the sounds by guessing among a choice of possible solutions. The subjects recognized the felt pen point sound texture best, while chalk and pencil were recognized to a lower degree.

The chalk sound texture was recognized 50% of the time, ranking at the 66.6% percentile together with the option that the sound represented something else. The pencil sound texture was recognized 43% of the time, ranking at the 100% percentile among the other options. The felt pen point sound texture was recognized 56% of the time, also ranking at the 100% percentile among the other options (see Figure 6.4 a.). See also the percentile table (cf. Table 6.2).

The subjects also considered the realism of the sound textures to be better for chalk, while felt pen point and pencil ranked lower. On the arbitrary scale of 1 to 7 we used, 1 means very realistic and 7 not realistic at all. The mean value for the chalk sound texture was 4.47, the mean for the pencil sound texture 2.65, and the mean for the felt pen point sound texture 3.53 (see Figure 6.4 b.).

The pencil sound texture was better accepted than the others, while the chalk texture was less agreeable to the subjects. Subjects stated their percentage of preference for writing with or without the sonic feedback: the chalk sound texture acceptance rated 53%, the pencil sound texture 87%, and the felt pen point sound texture 80% (see Figure 6.5 a.).

When rating the sound response to the pen movement, the subjects seemed to feel a better response from the chalk and pencil sounds, but they still felt that the sound did not respond completely to the pen movements. On an arbitrary scale from 1 to 7, where 1 is "no response at all" and 7 is "complete response", the sound response to the pen movements perceived by the subjects rated 3.31 for the chalk texture, 3.23 for the pencil texture, and 2.81 for the felt pen point texture (see Figure 6.5 b.).

The second set of questions regarded the effort felt when writing with the sonic feedback. The haptic feedback was equal across the test, yet the subjects could feel some differences in effort while writing with different combinations of sonic and visual feedback. They felt more effort while writing with the felt pen point sound texture across the three different pen thicknesses. The test session presented a random combination of stimuli of different feedback sounds and different pen thicknesses. On the arbitrary scale, 1 is very easy and 7 is very difficult. While writing with a thin pen point, the chalk sound texture rated 3.96, the pencil sound texture 4.34, and the felt pen point sound texture 4.58. While writing with a large pen point, the chalk sound texture rated 3.30, the pencil sound texture 4.00, and the felt pen point sound texture 4.41.



While writing with a medium pen point, the chalk sound texture rated 4.03, the pencil sound texture 4.06, and the felt pen point sound texture 4.79 (see Figures 6.6 a. and b., and 6.7 a.). As the haptic sensation is actually the same across all the stimuli, the differences in perceived effort, though small, should be ascribed to the different sonic and visual feedback.

Finally, the subjects answered questions regarding the association of emotions with the action of writing.

The relation between emotion and pen point thickness rated as follows: Angry vs Large 37%, Angry vs Thin 38%, and Angry vs Medium 37%; Calm vs Large 31%, Calm vs Thin 25%, and Calm vs Medium 44%; Happy vs Large 13%, Happy vs Thin 44%, and Happy vs Medium 43%; Sad vs Large 25%, Sad vs Thin 44%, and Sad vs Medium 31%. See also the percentile tables, where the association Angry vs Large ranked at 100.00%, the associations of Medium with Happy and with Calm both ranked at 66.6%, and the associations of Thin with Happy and with Sad both ranked at 66.6%.

The relation between sonic feedback and emotion rated Angry vs Chalk at 81%, Angry vs Pencil at 13%, and Angry vs Felt pen point at 6%; Calm vs Chalk at 0%, Calm vs Pencil at 69%, and Calm vs Felt pen point at 31%; Happy vs Chalk at 6%, Happy vs Pencil at 25%, and Happy vs Felt pen point at 69%; Sad vs Chalk at 31%, Sad vs Pencil at 6%, and Sad vs Felt pen point at 63%. The percentile tables show that Angry vs Chalk ranked at 100.00%, Calm vs Pencil at 100.00%, and Sad vs Felt pen point at 100.00%.

The results show that the subjects related the emotional intent mainly to the sonic feedback. The emotion of anger was related to the chalk sound texture, calmness to pencil, and happiness and sadness mainly to the felt pen point sound (see Figures 6.8 and 6.9). The results are also illustrated by the percentile tables (cf. Tables 6.3 and 6.4).



Sound texture    Identified as     Subjects   Rank   Percentile
Pencil           Pencil                   7      1         100%
                 Felt pen point           5      2          66%
                 Else                     3      3          33%
                 Chalk                    1      4           0%
Chalk            Chalk                    8      1          66%
                 Else                     8      1          66%
                 Pencil                   0      3           0%
                 Felt pen point           0      3           0%
Felt pen point   Felt pen point           9      1         100%
                 Else                     4      2          66%
                 Pencil                   2      3          33%
                 Chalk                    1      4           0%

Table 6.2. Rank and percentile table for the recognition of a material vs. a pen sound texture

Emotion   Sound texture    Subjects   Rank   Percentile
Angry     Chalk                  13      1         100%
          Pencil                  2      2          50%
          Felt pen point          1      3           0%
Sad       Felt pen point         10      1         100%
          Chalk                   5      2          50%
          Pencil                  1      3           0%
Happy     Felt pen point         11      1         100%
          Pencil                  4      2          50%
          Chalk                   1      3           0%
Calm      Pencil                 11      1         100%
          Felt pen point          5      2          50%
          Chalk                   0      3           0%

Table 6.3. Rank and percentile table for the association of an emotion with a sound texture



Emotion   Pen thickness   Subjects   Rank   Percentile
Angry     Large                  6      1          50%
          Thin                   6      1          50%
          Medium                 4      3           0%
Sad       Thin                   7      1         100%
          Medium                 5      2          50%
          Large                  4      3           0%
Happy     Medium                 7      1          50%
          Thin                   7      1          50%
          Large                  2      3           0%
Calm      Medium                 7      1         100%
          Large                  5      2          50%
          Thin                   4      3           0%

Table 6.4. Rank and percentile table for the association of an emotion to a pen thickness



(a) Chart of the pen sound texture recognition

(b) Chart of the pen sound texture realism. On the arbitrary scale of 1 to 7 we used, 1 means very realistic and 7 not realistic at all.

Figure 6.4. Sound texture recognition and realism.



(a) Chart of the percentage of preference for writing with or without the sonic feedback

(b) Chart of the sound response to the pen movements

Figure 6.5. Better with or without sound and sound response to pen movement



(a) Chart of the evaluation of the perceived effort with different pen sounds, in this case with a thin pen point

(b) Chart of the evaluation of the perceived effort with different pen sounds, in this case with a large pen point

Figure 6.6. Effort perception



(a) Chart of the evaluation of the perceived effort with different pen sounds, in this case with a medium pen point

Figure 6.7. Effort perception

(a) The relation between emotion and pen point thickness

Figure 6.8. Emotion association to pen thickness



(a) The relation between emotion and sonic feedback

Figure 6.9. Emotion association to sonic feedback


Chapter 7

General discussion and future works

The results of the tests presented in the previous chapters provide new ideas which we believe could be proposed as further developments. Each of our three models presented particular problems that could be solved in order to improve them.

The choice of the sound of a marble falling into a container seems to us a good metaphor for the SMS message. We could see from the test results how the subjects related the sound to the quantity of marbles. As they were able to recognize the sounds, we believe that the idea of inferring information from a marble sound is a feasible design purpose. Furthermore, other studies propose a similar design, though applied in different ways and contexts [27] [13]. This makes us believe that our idea could be well accepted and understood. Still, during the development we realized that if the SOb project's models could be revisited to make it possible to instantiate single impact objects, we would have the opportunity to program our high-level models in a more effective and object-oriented way. For this purpose we would need new PD externals that make it possible to instantiate PD objects. An instantiated sounding object could store the parameter values in its variables, it could have methods for the implementation of its behavior, and it would make the data storage easier and more economical. We could also implement appropriate AI for each instance of a sounding object. Of course this would give us the advantage of more robust and time-effective programming, especially if we intend to create applications for handheld devices. As far as we know there are no externals that permit this at present, though we believe there will be in the future. For now, the architecture we chose for our model depends on the features that the impact model offers us and on PD's programming interface.

If we had the chance to run new tests, we would probably use this experience to make some changes. We regret that, for reasons of time economy, we decided to hold all three test sessions at the same time. It would have been better to divide the different parts into more test sessions; the subjects would have concentrated better on each of the models' tests.





Still another issue with the test settings for the marble sound is that the haptic feedback, which consisted of cardboard boxes containing marbles, was of course not consistent and synchronized with the sonic feedback during the quantity test. Creating such a test setting with synchronized haptic feedback was beyond our possibilities. Nevertheless, in their study Williamson and Murray-Smith [27] came up with a simulated system for mobile phones: they designed a vibrotactile model that describes a number of balls bouncing around within a virtual container. Although they have a real-time response to the user manipulation for the haptic feedback, they do not have real-time synthesis of the sonic feedback; the auditory display of their model is mainly based on a number of pre-recorded impact sounds (wood on glass, for example). It would be interesting to cooperate with this study and join forces, developing a model that produces both haptic and sonic feedback in real time, to get new results and a better understanding of our design.

If we had the opportunity to take this study further, we could program a mobile telephone to vibrate in synchrony with the sound produced by our model. We believe that this would improve our results.

The sonic feedback to the digital pen works well. We nevertheless had some problems with the hardware, namely the tablet's response to the user's movements. Our Wacom pen is trimmed to send out pressure and coordinate values some millimeters before the point of the pen reaches the tablet surface. This feature is useful when the pen is used in the usual context without sonic feedback, but we really need perfect synchronization with the movements. The subjects were expressly troubled by this problem, and we could not change the pen settings. There are more advanced tablets on the market that make it possible to trim the outputs for specific purposes. We believe that if we had the possibility to use those tablets, we would get better test results regarding the sonic feedback response to the pen movements.

During the pen sound test we collected data on the users' gestures in relation to the pen's sound feedback; this material still has to be analyzed and could open up a new study. The interesting aspect is that we could correlate studies of human gesture versus emotion with studies of emotion perception and sound design.

The SOb team is still working on the low-level models; they are defining better versions of the impact model together with a more stable version of the crumpling sound model. We could not test the latter because it did not sustain the computational load we needed. Since the footstep model can easily be enhanced by adding new sound textures, we believe that the naturalness of the sound will improve with new and better low-level models.

Moreover, we believe that the emotionally expressive control of the walking pattern has the potential to be tuned to be more sensitive to the emotional intent. Of course the interpretation of emotion has a subjective factor, and different subjects perceive the intended emotion in different ways; still, we could change the values of the interpolation function in order to increase the response to the walking pattern's intended emotions, as sketched below. New values could be derived from new studies on emotion and walking patterns.
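
The retuning we have in mind can be expressed as a linear interpolation between a neutral parameter set and an emotion-specific target, with a gain that amplifies the deviation: raising the gain above 1.0 exaggerates the expressive cues and should make the intended emotion easier to perceive. All parameter names and values in this sketch are illustrative, not those of our actual control model.

#include <stdio.h>

typedef struct {
    float step_duration;  /* s; longer steps tend to read as "sad"   */
    float sound_level;    /* dB offset; louder tends to read "angry" */
} walk_params;

/* out = neutral + g * intent * (target - neutral)
 * intent in [0,1] is the strength of the intended emotion;
 * g > 1 increases the response to it. */
static walk_params interpolate(walk_params neutral, walk_params target,
                               float intent, float g)
{
    walk_params out;
    out.step_duration = neutral.step_duration
        + g * intent * (target.step_duration - neutral.step_duration);
    out.sound_level = neutral.sound_level
        + g * intent * (target.sound_level - neutral.sound_level);
    return out;
}

int main(void)
{
    walk_params neutral = { 0.55f,  0.0f };
    walk_params sad     = { 0.90f, -6.0f };  /* illustrative target */
    walk_params p = interpolate(neutral, sad, 0.8f, 1.5f);
    printf("step=%.2f s, level=%+.1f dB\n", p.step_duration, p.sound_level);
    return 0;
}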


The test of the marble impact sounds model shows that the subjects could correctly identify quantity, and that emotional content can be ascribed to sound textures. The results for the footstep sounds model show that the subjects identified age and gender mainly by footstep duration, while sound texture, foot size and frequency [Hz] were not as indicative. The test results also show that the sonic feedback of the pen can communicate emotion, and that this is mainly determined by the sound feedback rather than the visual one.

We think that high-level sound models can be used as sonic interaction tools to express a multiplicity of meanings and emotions, and to interact in an unmediated way. Visual interfaces give us the possibility to make abstractions and create meanings, shifting the place where the action takes place into the system's representation of the world. In the sonic domain, the only way to achieve this has been to use sound samples, which is what has been done so far. With real-time sound synthesis, instead, sound interfaces cannot afford to arrange the representation of sounds in another conceptual place; they would not be credible. Designing for embodied interaction therefore becomes, in the auditory domain, a mandatory premise for good-quality design: users should interact with sounding objects in the same way they interact with real-world sounds.

The sonification of Affective Diary gave us the possibility to apply a new interaction design paradigm: embodied interaction. The sonic metaphors enrich the possibilities of interacting with the system without visual attention, by exploiting familiarity with sound events and by adding emotional understanding of expressive sounds.

It is a new way to think about sound and interaction. We believe that sound responding to user gestures leads to interfaces that are more intuitive and rich in significance and implications.


Chapter 8

Appendix

PD Patches

The following PD patches describe how we implemented our models.

Figure 8.1 Marble impact sound, control model.

Figure 8.2 Footstep sound, control model.

Figure 8.3 Footstep sound, material bank.

Figure 8.4 Pen friction, control model.

Figure 8.5 Pen friction, pen movements and animation control model.


[Screenshot of the marble impact control patch. The visible objects include material selectors (rubber, wood, steel, glass), number boxes for marble size, gravity force and impact force, a strike counter, delay objects that sequence the impacts (if more than 12 impacts are in memory, the first four are banged together), a loop that goes to the beginning of the scalars list and triggers the impact models until the end of file, and a writesf~ object for recording the result to pebbles.wav.]

Figure 8.1. Affective Diary marble sound


Figure 8.2. Affective Diary footstep sound


Figure 8.3. Affective Diary footstep sound, material bank


Figure 8.4. Affective Diary pen friction sound


Figure 8.5. Affective Diary pen friction sound


Bibliography

[1] F. Avanzini, S. Serafin, and D. Rocchesso, Interactive simulation of rigid body interaction with friction-induced sound generation, IEEE Transactions on Speech and Audio Processing 13 (2005), no. 5, 1073–1081.

[2] K. Boehner, R. DePaula, P. Dourish, and P. Sengers, Affect: from information to interaction, CC '05: Proceedings of the 4th decennial conference on Critical computing (New York, NY, USA), ACM Press, 2005, pp. 59–68.

[3] R. Bresin and A. Friberg, Emotional coloring of computer-controlled music performances, Computer Music Journal 24 (2000), no. 4, 44–63.

[4] P. Dourish, Where the action is: The foundations of embodied interaction, The MIT Press, October 2001.

[5] F. Fontana and R. Bresin, Physics-based sound synthesis and control: crushing, walking and running by crumpling sounds, XIV Colloquium on Musical Informatics, XIV CIM 2003 (Florence, Italy), May 2003, pp. 109–114.

[6] A. Friberg, pDM: an expressive sequencer with real-time control of the KTH music performance rules, Computer Music Journal 30 (2006), no. 1, 37–48.

[7] A. Friberg, R. Bresin, and J. Sundberg, Overview of the KTH rule system for musical performance, Advances in Cognitive Psychology 2 (2006), no. 2-3, 145–161. Special Issue on Music Performance.

[8] A. Gabrielsson and E. Lindström, Music and emotion: theory and research, ch. The Influence of Musical Structure on Emotional Expression, New York: Oxford University Press, 2001.

[9] W.W. Gaver, How do we hear in the world? Explorations in ecological acoustics, Ecological Psychology 5 (1993), no. 4, 285–313.

[10] W.W. Gaver, What in the world do we hear? An ecological approach to auditory event perception, Ecological Psychology 5 (1993), no. 1, 1–29.

[11] J. J. Gibson, The ecological approach to visual perception, Lawrence Erlbaum Associates.


[12] B. Giordano and R. Bresin, Walking and playing: What's the origin of emotional expressiveness in music?, ICMPC9 – 9th International Conference on Music Perception & Cognition (Bologna) (M. Baroni, A. R. Addessi, R. Caterina, and M. Costa, eds.), Bononia University Press (abstract), August 2006, p. 149.

[13] H. Ishii and B. Ullmer, Tangible bits: towards seamless interfaces between people, bits and atoms, CHI '97: Proceedings of the SIGCHI conference on Human factors in computing systems (New York, NY, USA), ACM Press, 1997, pp. 234–241.

[14] P. N. Juslin, Communicating emotion in music performance: A review and theoretical framework, in Music and Emotion: Theory and Research, Oxford University Press, 2001.

[15] C.L. Krumhansl, An exploratory study of musical emotions and psychophysiology, Can J Exp Psychol 51 (1997), no. 4, 336–353.

[16] C.L. Krumhansl, Music: A link between cognition and emotion, Current Directions in Psychological Science 11 (April 2002), 45–50.

[17] A. Lindström, A. Ståhl, K. Höök, P. Sundström, J. Laaksolahti, M. Combetto, A. Taylor, and R. Bresin, Affective diary: designing for bodily expressiveness and self-reflection, CHI '06: CHI '06 extended abstracts on Human factors in computing systems (New York, NY, USA), ACM Press, 2006, pp. 1037–1042.

[18] I. Flores Luis and R. Bresin, Influence of expressive music on the perception of short text messages, ICMPC9 – 9th International Conference on Music Perception & Cognition (M. Baroni, A. R. Addessi, R. Caterina, and M. Costa, eds.), 2006, abstract only, p. 244.

[19] I. Peretz and R. Zatorre, The cognitive neuroscience of music, Oxford University Press, USA, October 2003.

[20] M. Rath and F. Fontana, High-level models: bouncing, breaking, rolling, crumpling, pouring, The Sounding Object (D. Rocchesso and F. Fontana, eds.), Mondo Estremo, Florence, Italy, 2003, pp. 173–204.

[21] D. Rocchesso, R. Bresin, and M. Fernström, Sounding objects, IEEE MultiMedia 10 (2003), no. 2, 42–52.

[22] D. Rocchesso, L. Ottaviani, and F. Fontana, Low-level models: resonators, interactions, surface textures, The Sounding Object (D. Rocchesso and F. Fontana, eds.), Mondo Estremo, Florence, Italy, 2003, pp. 137–172.

[23] D. Rocchesso, L. Ottaviani, and F. Fontana, Size, shape, and material properties of sound models, The Sounding Object (D. Rocchesso and F. Fontana, eds.), Mondo Estremo, Florence, Italy, 2003, pp. 95–110.

[24] R. Murray Schafer, The soundscape: Our sonic environment and the tuning of the world, Destiny Books, Rochester, Vermont, 1994.


[25] K.R. Scherer and M.R. Zentner, Emotional effects of music: Production rules, Music and Emotion: Theory and Research (P.N. Juslin and J.A. Sloboda, eds.), Oxford Univ. Press, Oxford, UK, 2001, pp. 361–392.

[26] VVAA, A roadmap for sound and music computing, The S2S2 Consortium, http://www.soundandmusiccomputing.org/roadmap, 2007.

[27] J. Williamson and R. Murray-Smith, Multimodal excitatory interfaces with automatic content classification, ACM SIGCHI Conference, 2007.

[28] Å. Wrange, From expressive walking to expressive music performance, Master's thesis, KTH - School of Computer Science and Communication (CSC), Department of Speech, Music and Hearing, S-100 44 Stockholm, 2007.