
J Multimodal User Interfaces (2012) 5:157–173. DOI 10.1007/s12193-011-0079-z

ORIGINAL PAPER

Interactive sonification of synchronisation of motoric behaviour in social active listening to music with mobile devices

Giovanna Varni · Gaël Dubus · Sami Oksanen · Gualtiero Volpe · Marco Fabiani · Roberto Bresin · Jari Kleimola · Vesa Välimäki · Antonio Camurri

Received: 31 January 2011 / Accepted: 12 November 2011 / Published online: 13 December 2011. © OpenInterface Association 2011

Abstract This paper evaluates three different interactive sonifications of dyadic coordinated human rhythmic activity. An index of phase synchronisation of gestures was chosen as coordination metric. The sonifications are implemented as three prototype applications exploiting mobile devices: Sync’n’Moog, Sync’n’Move, and Sync’n’Mood.

G. Varni · G. Volpe (✉) · A. Camurri
InfoMus, DIST, University of Genova, Viale Causa 13, 16145 Genova, Italy
e-mail: [email protected]

G. Varni
e-mail: [email protected]

A. Camurri
e-mail: [email protected]

S. Oksanen · J. Kleimola · V. Välimäki
Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, P.O. Box 13000, 00076 Aalto, Finland

S. Oksanen
e-mail: [email protected]

J. Kleimola
e-mail: [email protected]

V. Välimäki
e-mail: [email protected]

G. Dubus · M. Fabiani · R. Bresin
Department of Speech, Music and Hearing, School of Computer Science and Communication, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden

G. Dubus
e-mail: [email protected]

M. Fabiani
e-mail: [email protected]

R. Bresin
e-mail: [email protected]

Sync’n’Moog sonifies the phase synchronisation index by acting directly on the audio signal and applying a nonlinear time-varying filtering technique. Sync’n’Move intervenes on the multi-track music content by making the single instruments emerge and hide. Sync’n’Mood manipulates the affective features of the music performance. The three sonifications were also tested against a condition without sonification.

Keywords Interactive sonification · Interactive systems · Audio systems · Sound and music computing · Active music listening · Synchronisation

1 Introduction

This paper evaluates three different interactive sonifications of dyadic coordinated human rhythmic activity. The sonifications are implemented as three prototype applications exploiting mobile devices. The study concerns a pair of users interacting by means of their movements and gestures (e.g., shaking gestures), each performed while holding and moving a mobile phone. Kinematic features of the movement of the two users, such as acceleration and velocity, are measured by means of the on-board 3D accelerometer the mobile phones are equipped with. Following the state of the art on coordination (e.g., [16, 17]), which focuses mainly on the timing or phasing of joint activity, phase synchronisation of gestures is chosen as coordination metric. An index of phase synchronisation of the movement of the two users is extracted by applying Recurrence Quantification Analysis [14] to the kinematic features. This phase synchronisation index is sonified using three different algorithms, implemented in three application prototypes, operating at different conceptual levels. Sync’n’Moog sonifies the index by


applying a time-varying filtering technique to an audio signal. Sync’n’Move works on the polyphonic structure of a multitrack audio signal, by making instruments emerge and hide. Sync’n’Mood manipulates the affective content of the music performance.

These three prototypes were compared in experimental conditions where the users were provided with auditory feedback only and with auditory and visual feedback. A no-sonification condition, where users are only provided with visual feedback, was also tested as a baseline. The purpose is to find out whether there is a difference between: (i) a dyadic coordinated rhythmic activity performed with and without sonification, (ii) the three sonifications, and (iii) the measured and the perceived synchronisation.

The remainder of the paper is organised as follows: Sect. 2 presents the research framework and related work; Sect. 3 shows how the present work is more in keeping with the frame of interactive sonification than with data-driven music performance, considering successive definitions of sonification; Sect. 4 briefly introduces the architectural framework; Sect. 5 discusses the sonification design and presents the three algorithms; Sects. 6 and 7 describe the experimental methodology and discuss the analysis carried out and the results obtained.

2 Research framework and related work

Research is carried out in the framework of social active listening to music [12]. In an active music listening experience, listeners are enabled to interactively operate on music content, modifying and moulding it in real time while listening. Active music listening is performed in a user-centric perspective [13], that is, the experience does not depend so much on specific features of the content (e.g., the music genre or the score). Rather, it depends on the behaviour of the user, generally measured by means of a multimodal interface.

Historically, one of the first active music listening systems, the Radio Baton, was developed by Mathews [1]. The Radio Baton is a gesture control device, which detects the locations of the tips of two sticks held by the user. An early application of the device was to conduct a computer, which would play music using a MIDI synthesiser, while the user could communicate the basic musical parameters, such as tempo and level. Later, various other technologies have been used for the conductor following task, such as magnetic trackers [2]. Gesture control has been applied widely in musical sound synthesis, as Paradiso [3] and Wanderley and Depalle [4] show in their tutorial reviews. Virtual air guitars [5, 6] are recent related applications, which use accelerometers and computer vision (i.e., camera-based

tracking) to locate the users’ hands and apply this information to control a guitar sound synthesiser. Playing a virtual air guitar can be much easier than playing a real musical instrument, because the player’s gestures can be coarsely classified into a few classes, e.g., into only four different chords. It thus becomes easy to play correctly, and the experience can be classified as active music listening, because more interaction is allowed than when simply listening to a piece of music, although it can be much more limited than in real playing.

Recently, active music listening prototypes were developed for specific use with mobile devices. Mobile phones can operate on music content available over a network. Moreover, they are broadly available, they are pervasive, they influence everyday life activities, and they are increasingly used as a device for social networking. Applications were developed for providing auditory feedback about physical activities, expressive gestures [31] and outdoor sports training [30]. Further applications concern sound manipulation and sharing in networked media scenarios (e.g., Sonic City [19], Sound Pryer [20], Sonic Pulse [21]) and the use of the phones as musical instruments, e.g., CaMus [22].

In this work, active listening is addressed in a networked media scenario, where the listening experience is the result of the collaborative non-verbal social behaviour of two or more users. Whereas this paper deals with two users, applications involving more than two users are also being developed. For example, one of the prototypes presented here (Sync’n’Move) was extended to an arbitrary number of users and was tested with four users. As another example, Leman and colleagues [23] developed a system analysing the full-body rhythmic movements of a group of users and comparing them with the beat of a song to produce auditory feedback.

Examples of interactive sonification of movement exist in the literature. For example, it constitutes a method for providing biofeedback as part of elite sports training [27–29]. The use of auditory display in general is particularly well suited to the realisation of time-related tasks, and consequently fits well with fields where timing is important, such as high-performance sport practice. It also offers promising opportunities in other contexts, e.g., stroke recovery [25] and sports rehabilitation [26]. In the work presented here, on the one hand, the motoric interaction between the two users affects sonification. On the other hand, sonification is the auditory output of the system, and an increasing synchronisation of the motoric behaviour contributes to a better experience of the music content. Thus, interactive sonification can be deemed a special kind of social active experience of music.


3 Using music material in sonification

Considering early definitions of sonification, such as the use of non-verbal sounds to convey information, as introduced by Barrass and Kramer [35], the present work can be described rather safely as sonification. However, it does not seem to meet the requirements proposed in Hermann’s taxonomy of sonification [36], as the systematic nature of the auditory display is open to question. Hermann states that only random generation (e.g., noise) and differences in electroacoustic equipment and conditions should be allowed as a concession to the fact that the sound wave would not be identical for the same input data. In other words, periodic input data should give a periodic sound output in real-time interactive sonification, which is not the case for the three prototypes described here, since the basis of the display is a piece of music unfolding in time. Nevertheless, the authors believe they fit within a broadened definition of sonification where the same input data would yield the same consequences not directly on the sound wave but on properties of the sound at a higher level: tempo, articulation, metric variations, and timbre. Being aware that this definition is on the edge of sonification, it seems necessary to make a clear distinction with other potential uses of music material constrained by input data, such as data-driven music performance.

The border between data-driven music and sonification can be fuzzy, as pointed out in [34]. The major point of difference lies in the intention of the work: sonification is not music because the two domains clearly do not have the same purpose. Whereas data-driven performance mainly targets aesthetic qualities, the main goal of sonification is to optimise the efficiency of information communication. When aesthetic considerations are taken into account in sonification, it is rather for usability than to achieve improved efficiency. Similarly, portions of data can be made more perceivable for artistic purposes in data-driven music, yet what motivates the design decisions is rather the intrinsic aesthetic value of the piece of art than what the listeners will grasp from the underlying data. Although the three prototypes described in Sect. 5 use music as sound material, the main purpose of the setup is not for the auditory display to be pleasant to hear or to arouse particular feelings in the participants, but to help them perform a physical task correctly.

4 Architectural framework

Figure 1 summarises the general architecture of the developed prototypes. All of them share the architectural framework developed in the EU-ICT project SAME (www.sameproject.eu) and differ only in the adopted sonification algorithm.

Nokia N85/N95/N97 mobile devices running the S60 operating system were used in this work. These devices are equipped with 3D accelerometers able to measure acceleration in the range [−2g, 2g] at 30 Hz, and they communicate via a network (a wireless LAN) with a server where the prototypes are running. The server performs low-level signal processing, feature extraction, and sonification. This was implemented in real time with the EyesWeb XMI platform (www.eyesweb.org).

Fig. 1 Block diagram of the adopted general architecture: mobile phones send data to a server through the network. The server performs low-level signal processing, feature extraction, and sonification

4.1 Low-level signal processing

A normalisation is performed so that the magnitude of the acceleration varies approximately between 0 and 1. This is obtained through a calibration performed the first time a specific mobile device is connected. Calibration is carried out by finding the maximum and minimum values of the magnitude of the acceleration the on-board accelerometer can measure. Calibration also includes the measurement of the g component the accelerometer detects on each of its three axes.

Besides acceleration, speed and energy are also computed. These are used to stop and start audio rendering when the users stop and start moving.
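As an illustration, the normalisation step can be sketched as follows (Python; the array shapes, function names, and the clipping to the calibrated range are assumptions of this sketch, not details of the actual EyesWeb implementation):

    import numpy as np

    def normalise_acceleration(acc_xyz, mag_min, mag_max):
        """Map the magnitude of a 3D acceleration stream to approximately [0, 1].

        acc_xyz          : array of shape (N, 3) with raw accelerometer samples
        mag_min, mag_max : extreme magnitudes found during calibration
        """
        magnitude = np.linalg.norm(acc_xyz, axis=1)
        # Clip to the calibrated range, then rescale to [0, 1].
        magnitude = np.clip(magnitude, mag_min, mag_max)
        return (magnitude - mag_min) / (mag_max - mag_min)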

4.2 Computation of phase synchronisation index

The phase synchronisation index, the measure to be sonified, is computed on the magnitudes of the accelerations of the two mobile devices with an approach based on Recurrence Quantification Analysis (RQA) [14]. The dynamics of each user is described by the time series of the magnitude of acceleration. For each time series a1 and a2, the probabilities p(a1, τ) and p(a2, τ) of recurring (within a given tolerance) after a time lag τ are computed. Finally, the phase synchronisation index is computed as the correlation of the normalised probabilities of recurrence (CPR):

CPR = 〈p̂(a1, τ ), p̂(a2, τ )〉, (1)

where 〈 , 〉 indicates the correlation operation. For this reason, the phase synchronisation index is usually referred to as the CPR Index. Intuitively, the more the two accelerations recur with the same time lags, the closer the phase synchronisation index is to 1.
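A minimal sketch of this computation is given below (Python). The recurrence tolerance, the range of lags, and the absence of a state-space embedding are assumptions of the sketch; Eq. (1) itself is implemented as the correlation of the two normalised recurrence-probability curves.

    import numpy as np

    def recurrence_probability(a, max_lag, eps):
        """p(a, tau): fraction of samples that recur (within eps) after lag tau."""
        a = np.asarray(a, dtype=float)
        p = np.empty(max_lag)
        for tau in range(1, max_lag + 1):
            p[tau - 1] = np.mean(np.abs(a[tau:] - a[:-tau]) < eps)
        return p

    def cpr_index(a1, a2, max_lag=45, eps=0.05):
        """Correlation of the normalised probabilities of recurrence, Eq. (1)."""
        p1 = recurrence_probability(a1, max_lag, eps)
        p2 = recurrence_probability(a2, max_lag, eps)
        # Normalise to zero mean and unit variance, then correlate.
        p1 = (p1 - p1.mean()) / p1.std()
        p2 = (p2 - p2.mean()) / p2.std()
        return float(np.mean(p1 * p2))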

5 Sonification design

The sonification algorithms were chosen so that they intervene at different conceptual levels and on different qualities of the auditory feedback. The first one (Sync’n’Moog) operates at the physical level, applying a filtering transformation to the audio signal, independently of any musical concept or structure. The second one (Sync’n’Move) works at an intermediate level, intervening on the polyphonic structure of the piece (i.e., adding or removing instrumental sections depending on the phase synchronisation index). The third one (Sync’n’Mood) exploits high-level music features related to the expressive content music conveys. Sync’n’Moog can process any audio file. Sync’n’Move requires a multitrack recording as input. The instances of Sync’n’Move and Sync’n’Moog evaluated here used the same music piece. Sync’n’Mood works on MIDI files. Some video excerpts showing Sync’n’Moog and Sync’n’Move are available at the following address: http://www.sameproject.eu/demos/ (October 2011).

The prototypes work on sliding time-windows, whose duration was set to 3 s. Such a duration proved to be, on the one hand, long enough to capture the phenomenon under observation (i.e., synchronisation) and, on the other hand, short enough to provide meaningful feedback within an acceptable latency. Note that, given the overlap of the time-windows (they are shifted by one sample at a time), after an initial period of 3 s during which no output is produced, audio buffers are then produced at the same rate as the input stream (e.g., 30 Hz in the case of the on-board accelerometers used).

5.1 Sync’n’Moog

The first algorithm, Sync’n’Moog, maps the CPR Index onto the control parameters of a time-varying digital filter. Sonification in Sync’n’Moog is based on the digital implementation of a nonlinear musical filter, the Moog ladder filter, which processes the input audio file. The characteristic sound of the Moog filter comes from its time-varying, resonating, and distorting lowpass nature. The desired effect cannot be realised by using a simple bandpass filter, and therefore a more complex design is needed. More details on the Moog filter are provided in Appendix A.

The Sync’n’Moog sonification is controlled by mapping the current value of the CPR Index to the filter cutoff frequency. The desired mapping range is determined by the lower threshold fmin = 0.2 kHz and the upper threshold fmax = 3.0 kHz. The cutoff frequency mapping can be done with a linear mapping between the CPR Index and fc (Fig. 2), but this leads to an unnatural control sensation. A logarithmic frequency mapping produces a more natural sonification control, but its implementation is computationally complex, requiring the computation of an exponential for each output sample. Therefore, a computationally less expensive polynomial approximation of the cutoff frequency mapping is used:

fc = 2800β³ − 1000β² + 1000β + 200 Hz,  (2)

where β is the current CPR Index value. Figure 2 presents a comparison of the frequency mapping methods.
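For illustration, the three mappings compared in Fig. 2 can be written as follows (Python; the function names are illustrative):

    def cutoff_linear(beta, fmin=200.0, fmax=3000.0):
        """Linear mapping from the CPR Index (0..1) to the cutoff frequency in Hz."""
        return fmin + beta * (fmax - fmin)

    def cutoff_logarithmic(beta, fmin=200.0, fmax=3000.0):
        """Logarithmic mapping: perceptually more natural, but needs an exponential."""
        return fmin * (fmax / fmin) ** beta

    def cutoff_polynomial(beta):
        """Polynomial approximation used in Sync'n'Moog, Eq. (2)."""
        return 2800.0 * beta**3 - 1000.0 * beta**2 + 1000.0 * beta + 200.0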

An example spectrogram of the Sync’n’Moog sonification output for this work is presented in Fig. 3.

Fig. 2 Three approaches to filter cutoff-frequency mapping from the CPR Index, when fmin = 200 Hz and fmax = 3.0 kHz


The filter cutoff frequency (white line) is determined from the value of the CPR Index by using (2). The effect of the time-varying filtering can be seen in the output spectrogram. When the synchronisation between the users is low, the cutoff frequency takes small values and the resulting spectrum is narrow. Conversely, when the synchronisation between the users is high, the cutoff frequency takes larger values and the resulting output spectrum is broad.

Fig. 3 Spectrogram of the audio output for Sync’n’Moog using a piece of music as input signal. The filter cutoff frequency is marked with a white line

5.2 Sync’n’Move

In Sync’n’Move [11, 15] the CPR Index is sonified by making it control the polyphonic structure of the music piece. When the mobile devices do not move, no sound is produced. As soon as a user moves one of them, a rhythmic pattern can be heard. This is usually a mix of the percussion section of the piece (e.g., bass drum, cymbal, snare drum, hi-hat, and so on). These choices have a twofold motivation: the transition from silence to the rhythmic pattern makes the users aware that the mobile device can act as a controller for the music piece; the reproduction of a rhythmic pattern encourages the users to move rhythmically, helping them to reach synchronisation at approximately the tempo the rhythmic pattern suggests. The initial rhythmic pattern provides the baseline for the active experience. On top of it, the more the users get synchronised, the more their experience is enriched with the addition of other instruments. As soon as the CPR Index crosses a threshold and stays above it for a given duration, a new instrument or a group of new instruments (up to a whole section) is added to the music output. By suitably combining the values of the thresholds and the durations, the effect of gradually enriching the music content is obtained. The music piece can thus be fully experienced only when the two users reach the highest level of synchronisation and are able to keep it for long enough. In this way, a motivation for collaborating and a challenge are introduced at the same time: the two users have to collaborate to reach a high level of synchronisation, and the prize for doing so is the complete experience of the piece. Figure 4 summarises the sonification algorithm.

Fig. 4 Block diagram of the Sync’n’Move sonification algorithm

The instruments were divided into four groups. The rhythmic baseline included various drums. A second section was composed of bass and synth. The next section included the guitars. The final section introduced flute and voice. The motivation for this choice is that the flute plays the same melody as the voice when there is no voice. The thresholds were all set to 0.4. The duration for which the users had to stay synchronised was set to 3.5 s for all the groups of instruments. This means that, in this implementation, the progression of the sections depends on the time during which the users stay synchronised, i.e., the longer they stay synchronised with CPR values higher than 0.4, the more the auditory feedback is enriched with new sections. The value of 0.4 is the minimum value of the index above which synchronisation begins to be remarkable. In this way, the task becomes easier for the users, who are encouraged to continue their movement to reach very high levels of synchronisation. Nevertheless, Sync’n’Move can also work with different values for the thresholds, even though this setting was not selected for this particular experiment.
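A minimal sketch of this layering logic is shown below (Python). The behaviour when the CPR Index drops back below the threshold is not fully specified above; resetting the timer in that case is an assumption of the sketch, as are the class and variable names.

    THRESHOLD = 0.4        # CPR value above which synchronisation is considered relevant
    HOLD_TIME = 3.5        # seconds the index must stay above the threshold
    FRAME_DT = 1.0 / 30.0  # one CPR value per 30 Hz accelerometer frame

    GROUPS = ["drums", "bass and synth", "guitars", "flute and voice"]

    class SyncNMoveLayers:
        """Gradually unlock instrument groups while the users stay synchronised."""

        def __init__(self):
            self.active = 1          # the rhythmic baseline (drums) is always audible
            self.time_above = 0.0

        def update(self, cpr):
            if cpr > THRESHOLD:
                self.time_above += FRAME_DT
                if self.time_above >= HOLD_TIME and self.active < len(GROUPS):
                    self.active += 1     # add the next section to the mix
                    self.time_above = 0.0
            else:
                self.time_above = 0.0    # assumption: losing synchronisation resets the timer
            return GROUPS[:self.active]  # groups currently audible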

5.3 Sync’n’Mood

The Sync’n’Mood sonification uses the program Permorfer to play back a MIDI file. In this experiment, the piece “Solen glimmar blank och trind” by the Swedish composer Carl Michael Bellman was used. The way the MIDI file is performed, i.e., the sound level and articulation of the notes, and the tempo, depends on the gestures of the participants. Permorfer uses the KTH rule system for music performance [32] to change the musical parameters of a piece of music in real time. In the program, 19 rules in total can be changed independently or combined to create more intuitive control parameters, such as activity and valence, as described by Friberg [33]. An extra parameter introduced for this test was a time offset Δ between the main melody and the accompaniment in the score. When Δ = 0, the melody and accompaniment play synchronously; when Δ > 0, the melody leads (i.e., plays before the accompaniment); when Δ < 0, the melody lags (i.e., the accompaniment plays before the melody).

For the experiment, acceleration data from the two mobile phones were collected and analysed to extract two parameters: energy (the integral of the acceleration magnitude over a time window, 0 < E < 1) and the CPR Index, which were then mapped onto the parameters controlling the performance in Permorfer. The CPR Index and the energy E were both mapped onto the melody lead parameter Δ, through the relation shown in Fig. 5. The absolute value of Δ is mainly coupled to the CPR Index. When CPR > CPRmax, Δ is equal to 0, indicating that the two users are synchronised. When CPR < CPRmin, Δ is equal to Δmax, indicating a very noticeable lag/lead, the sign of which depends on the total energy: if E > Ethr, then Δ > 0 and the melody leads; on the contrary, if E < Ethr, then Δ < 0 and the melody lags. Finally, when CPRmin < CPR < CPRmax, the absolute value of Δ is computed using (3). The set of possible values of Δ as E varies is represented by the hatched domain in Fig. 5.

|Δ| = Δmin + (Δmax − Δmin) · (1 − (CPR − CPRmin)/(CPRmax − CPRmin))^(2(1−|E−Ethr|))  (3)

Fig. 5 The delay between melody and accompaniment depends on both energy and synchronisation, following (3). Here Δmax = 500 ms, Δmin = 30 ms, CPRmin = 0.2, CPRmax = 0.5 and Ethr = 0.5

To summarise, the value of Δ can be seen as resulting from two simultaneous mappings: a clipped linear mapping with respect to CPR, and a non-linear mapping with respect to E. While the former provides natural feedback on the performance of the users in the synchronisation task, the role of the latter is to determine the sign of Δ, with an additional minor influence on the value itself. The motivation for this lies in the phenomenon of the velocity artifact in piano performance revealed by Goebl [37]: the piano action induces a larger time offset (melody lead) between the high-pitched melody and the bass accompaniment at louder dynamics, which obviously corresponds to higher energy transmitted by the fingers to the keys.
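The combined mapping of Eq. (3) and its clipped branches can be sketched as follows (Python, using the parameter values of Fig. 5; the function name is illustrative):

    D_MIN, D_MAX = 30.0, 500.0        # ms (Fig. 5)
    CPR_MIN, CPR_MAX = 0.2, 0.5
    E_THR = 0.5

    def melody_lead(cpr, energy):
        """Melody lead/lag Delta in ms from the CPR Index and the energy E."""
        if cpr >= CPR_MAX:
            magnitude = 0.0                     # users synchronised: no offset
        elif cpr <= CPR_MIN:
            magnitude = D_MAX                   # very noticeable lead/lag
        else:
            x = 1.0 - (cpr - CPR_MIN) / (CPR_MAX - CPR_MIN)
            exponent = 2.0 * (1.0 - abs(energy - E_THR))
            magnitude = D_MIN + (D_MAX - D_MIN) * x ** exponent   # Eq. (3)
        # Sign: the melody leads at high energy and lags at low energy.
        return magnitude if energy > E_THR else -magnitude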

In addition to the melody lead parameter, the overall activity of the performance is also coupled to energy. A change of activity corresponds to a change of performance parameters such as tempo, sound level, and articulation. For example, high activity implies fast tempo, high sound level and staccato articulation [38].

6 Method

6.1 Participants

Forty-two individuals (14 female and 28 male, from European and extra-European countries) participated as volunteers in this study. Most of them were researchers, graduate students, and PhD candidates in Information and Communication Technologies (ICT), music, and acoustics from local universities. The mean age was 30 years (range 20 to 47 years). All owned a mobile phone and stated that they used it every day, mainly to make calls, send SMS messages, and take pictures. About 50% of the participants had long experience (14 years on average) in playing music, using computer music tools (e.g., sequencers, software synthesisers, and notation editors) or musical instruments such as guitar, drums, or piano. All participants were naive inasmuch as they were not previously informed about the hypotheses being tested.

6.2 Procedure

The three sonifications were tested on three groups of seven pairs, each performing the same trials in three different


laboratories, located in three different EU countries. Participants were given mobile phones and were asked to move them. The task for the users was to synchronise their gestures with each other. Two single hints were given to them: they were aware that the synchronisation was directly linked to the acceleration measured by the mobile phones, and also that not moving the phones at all would stop the auditory display. Otherwise, they were totally free to move the devices as they wanted, for example by shaking them. Shaking the mobile phone is a kind of rhythmic movement that can be effortlessly sustained. Such kinds of gestures (e.g., swinging a hand-held pendulum-like stick with a weight at the end, rocking in a rocking chair, walking on a treadmill) have often been used for examining the dynamics of interpersonal coordination [24].

For each sonification, two trials were carried out: a trial in which the two participants could see each other and a trial in which they could not (they turned their backs). Further, participants were asked to perform a trial without any sonification; in this trial they could see each other. Each trial lasted two minutes and, for each pair, the order of trials was randomised using a 7×7 Latin square approach. Before testing each sonification, the participants were shown a slide describing how the sonification works, that is, how the CPR Index is mapped onto sound changes. Further, they were allowed a short training session, consisting of one minute during which they could explore the sonification and try different gestures.

Before the test, the participants were asked to fill in an anonymous questionnaire collecting general information about them, such as age, gender, and occupation. Further, after each trial they were asked to rate, on 11-point Likert items, the ease of synchronising and of using the mobile phone, their understanding of the sonification, the amount of information (with respect to synchronisation) conveyed by the sound, the naturalness of the interaction between gestures and sound, the feeling of interaction with the other participant, and how much they perceived their synchronisation during the trial. These Likert items were conceived to measure the attitude of the participants towards each sonification and the extent to which the participants perceived synchronisation between them. The questionnaire also included open questions and a blank section to collect comments and suggestions. The questionnaire is available in Appendix B. During each trial, the audio signals and the time series of the acceleration and of the CPR Index were also recorded.

7 Data analysis and results

The following subsections describe the analysis carried out on the questionnaire and on the time series.

7.1 Questionnaire analysis

The questionnaire analysis consisted of an aggregate analysis of the Likert items to evaluate (i) the attitude of the participants towards sonification versus no sonification (items 4.1, 4.2, and 4.6) and, more specifically, (ii) the attitude towards each sonification (items 4.1 to 4.6). The former investigation was carried out on the data from the trials in which the participants could see each other, whereas the latter was carried out on the data from both the trials in which they could see each other and the trials in which they could not. The scores of each participant were summed over the items to obtain their global attitude towards sonification and towards each sonification.

The questionnaire has high internal consistency, measured using Cronbach’s α: its average over the sonifications was 0.85 (SD = 0.03).

To understand whether the attitude of the participants changes when performing the trials with and without sonification, a Friedman’s test was run on the Likert items concerning the ease of (i) reaching synchronisation (item 4.1), (ii) using the device (item 4.2), and (iii) interacting with the other person (item 4.6), with or without sonification. This test revealed significant differences among the three sonifications and the no-sonification condition (χ²(3) = 14.8, p < 0.05). Post-hoc tests showed that this is due to the differences between: No sonification and Sync’n’Moog (p = 0.01), No sonification and Sync’n’Move (p = 0.02), and No sonification and Sync’n’Mood (p = 0.003). Figure 6 depicts the pairwise differences between the scores the participants gave to the three sonifications and to the no-sonification condition. No sonification is preferred with respect to the three sonifications. A possible explanation for this result is that sonification was more effective for non-expert participants, who let themselves be driven by the sound, whereas expert participants tended to listen to sound and music with a professional approach. The whole experiment involved 27 expert and 15 non-expert participants; expert participants are those who play musical instruments or use computer music tools. The medians for the expert group and for the non-expert group on the three Likert items taken into account were 61 and 63, respectively. A Mann-Whitney test was run to evaluate the differences in the answers of these two groups. By testing the null hypothesis that these differences are not random, a significant effect of the group was found (p = 0.7).

Fig. 6 The pairwise differences between the scores given over items 4.1, 4.2, and 4.6. The significant differences are light-grey tinted. The inner horizontal line and the edges of the boxes are the median and the medians of the upper and lower halves of the distribution, respectively. The inner fences are depicted by the lower and upper whiskers from the boxes. Possible outliers are marked with an empty dot. The y-axis shows the difference between the scores. The three sonifications and the no sonification condition are identified as follows: Mg: Sync’n’Moog, Mv: Sync’n’Move, Md: Sync’n’Mood, Nsn: No sonification

Fig. 7 Boxplot of the attitude to each sonification (items 4.1 to 4.6). The y-axis shows the score computed as the sum over items 4.1 to 4.6

With regard to the attitude towards each sonification, Fig. 7 depicts the box plot of this attitude without taking into account the kind of feedback the participants were provided with (visual or non-visual). As a Mauchly’s test revealed no violation of sphericity against sonification (W(3) = 0.95, p = 0.38), a one-way RM ANOVA was performed, showing that the sonifications have no significant effect on the scores the participants provided (F(2,82) = 0.77, p = 0.46). Further, a comparison of the attitude towards each sonification taking into account the kind of feedback the participants were provided with was carried out. With a paired t-test, a significant effect of the feedback was found for each sonification, with visual feedback outperforming sonification without visual feedback. A measure of the effect size d was also computed (Sync’n’Moog: t(41) = 3.35, p < 0.05, d = 0.52; Sync’n’Move: t(41) = 3.91, p < 0.05, d = 0.60; Sync’n’Mood: t(41) = 5.81, p < 0.05, d = 0.90).
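As an illustration only, the aggregate Likert analysis above could be reproduced along the following lines (Python with SciPy, on placeholder data; the actual scores, the post-hoc procedure, and the software used by the authors are not specified in the text):

    import numpy as np
    from scipy import stats

    # Placeholder data: one row per participant, one column per condition
    # (Sync'n'Moog, Sync'n'Move, Sync'n'Mood, No sonification); each entry is
    # the sum of the scores on items 4.1, 4.2 and 4.6 for that condition.
    rng = np.random.default_rng(0)
    scores = rng.integers(0, 31, size=(42, 4))

    # Friedman test across the four repeated-measures conditions.
    chi2, p = stats.friedmanchisquare(*(scores[:, k] for k in range(4)))
    print(f"chi2(3) = {chi2:.2f}, p = {p:.3f}")

    # Pairwise post-hoc comparisons against the no-sonification condition
    # (Wilcoxon signed-rank tests; an assumed choice of post-hoc test).
    for k in range(3):
        _, p_pair = stats.wilcoxon(scores[:, k], scores[:, 3])
        print(f"condition {k} vs. no sonification: p = {p_pair:.3f}")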

7.2 Time series analysis

In order to provide objective results on the differences between the rhythmic activity performed with and without sonification and among the sonifications, and to try to understand the relationship between the physical measure of synchronisation and how it is perceived by the participants, an analysis of the time series of the acceleration and of the CPR Index was carried out. The first 10 s of the recorded time series of the CPR Index were excluded from this analysis to avoid possible artifacts due to the start-up phase of each trial. The remainder of these time series was split into three equal and non-overlapping parts (Part I, Part II, and Part III): this was done to enable a comparison between the perceived and the actual synchronisation (see details in the following). Figure 8 shows an example of the CPR Index over the three parts of a trial (Sync’n’Moog with visual feedback).

Fig. 8 The CPR Index over a trial (Sync’n’Moog with visual feedback): the panels from top to bottom show the values of the CPR Index over the first, the second, and the third part of the trial, respectively

For each Part, the percentage of time during which the CPR Index was greater than 0.4 was computed. The time percentage spent in coordination was chosen because this metric has already been used to measure the efficiency of coordination. For example, Marsh and colleagues, discussing an experiment investigating the rhythmic coordination of individuals rocking in a chair, note that “If oscillators are coupled more weakly (e.g., two people have only peripheral visual information about the other’s movement), they will spend less time in these stable states than when there is stronger coupling (e.g., when the two people have focal visual information about the other’s movement)” [24].
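This metric, together with the split into three Parts, can be sketched as follows (Python; the 30 Hz rate of the CPR series follows from the accelerometer rate, while the function name is illustrative):

    import numpy as np

    def time_in_sync_percentages(cpr_series, threshold=0.4, fs=30, skip_s=10):
        """Percentage of time the CPR Index exceeds the threshold in each Part.

        The first skip_s seconds are discarded (start-up artifacts) and the
        remainder is split into three equal, non-overlapping parts.
        """
        cpr = np.asarray(cpr_series, dtype=float)[int(skip_s * fs):]
        parts = np.array_split(cpr, 3)               # Part I, Part II, Part III
        return [100.0 * float(np.mean(p > threshold)) for p in parts]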

Almost 20% of the data set was discarded due to technical problems such as the temporary disconnection of the mobiles from the network and the loss of UDP packets.

Since the coupling strength of a pair could not be systematically changed, a surrogate-based hypothesis test was needed to verify the statistical significance of the results. The basic idea was to compute, for each pair, the index of synchronisation between the original data of the first participant and data generated starting from the original data of the second participant but applying some randomness criteria. In such a way, if the resulting index of synchronisation is not significantly different from the original index, there is not sufficient evidence to claim synchronisation. The null hypothesis was the dynamical equivalence between the pair of original data and the pairs including only one original data series. Under this hypothesis, the most suitable algorithm for generating surrogates is Thiel’s algorithm of twin surrogates, described in [18]. Setting the significance level to 95% and carrying out a one-sided test, the minimum number of surrogates was 19. The durations for which synchronisation is relevant were extracted both when the CPR Index was computed on the original data and when it was computed replacing the second participant with a surrogate.

Table 1 Statistical significance of the measured CPR Index for each Part, for each kind of sonification, and for the no sonification condition. Percentages refer to the number of trials for which the measured CPR Index proved to be significant, with respect to the total number of analysed trials after removal of the corrupted ones. The conditions are identified as follows: Mv-V: Sync’n’Move with visual feedback, Mv-NV: Sync’n’Move without visual feedback, Mg-V: Sync’n’Moog with visual feedback, Mg-NV: Sync’n’Moog without visual feedback, Md-V: Sync’n’Mood with visual feedback, Md-NV: Sync’n’Mood without visual feedback, Nsn: No sonification

Sonif. Part I Part II Part III

Mv-V 94.44% 94.44% 94.44%

Mv-NV 84.21% 68.42% 73.68%

Mg-V 89.47% 89.47% 89.47%

Mg-NV 76.19% 71.43% 85.71%

Md-V 91.66% 91.66% 91.66%

Md-NV 75.00% 83.33% 83.33%

Nsn 100% 100% 100%

The application of the hypothesis test to these durations allowed the null hypothesis to be rejected in 86.83% of the cases. More specifically, the null hypothesis was rejected, thus resulting in a statistically significant CPR Index, with the following percentages: 84.68% of the trials for Sync’n’Move (94.44% when visual feedback was allowed), 83.33% of the trials for Sync’n’Moog (89.47% when visual feedback was allowed), 86.11% of the trials for Sync’n’Mood (91.66% when visual feedback was allowed), and 100% of the no-sonification trials. Percentages are computed by taking as reference the number of analysed trials, after removal of the corrupted ones. Table 1 reports the percentages of rejection of the null hypothesis in each of the three parts the performance was divided into.
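The logic of this test can be sketched as follows (Python). The synchronisation metric and the twin-surrogate generator are passed in as callables and are treated as given; in the experiment the metric is the duration of relevant synchronisation and the surrogates follow Thiel’s algorithm [18].

    def surrogate_test(a1, a2, metric, make_surrogate, n_surrogates=19):
        """One-sided surrogate-based hypothesis test at the 95% level.

        a1, a2         : acceleration-magnitude time series of the two participants
        metric         : callable returning the synchronisation measure of a pair
        make_surrogate : callable producing a twin surrogate of a time series
        Returns True when the null hypothesis of no synchronisation is rejected,
        i.e. when the original value exceeds all surrogate values.
        """
        original = metric(a1, a2)
        surrogates = [metric(a1, make_surrogate(a2)) for _ in range(n_surrogates)]
        return original > max(surrogates)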


Fig. 9 The boxplots of the pairwise differences between the significant time percentages for Part I. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The three sonifications and the no sonification condition are identified as follows: Mg: Sync’n’Moog, Mv: Sync’n’Move, Md: Sync’n’Mood, Nsn: No sonification

Fig. 10 The boxplots of the pairwise differences between the significant time percentages for Part II. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The three sonifications and the no sonification condition are identified as follows: Mg: Sync’n’Moog, Mv: Sync’n’Move, Md: Sync’n’Mood, Nsn: No sonification

Three Friedman’s tests (one for each Part) were run on the significant time percentages for the three sonifications and for the no-sonification trial. Significant differences were found in Part I (p < 0.05) and Part II (p < 0.05). As for Part I, post-hoc tests showed that such differences are due to the differences between: Sync’n’Moog and Sync’n’Mood (p = 0.02), No sonification and Sync’n’Moog (p = 0.0004), and No sonification and Sync’n’Move (p = 0.02). Concerning Part II, post-hoc tests reveal the following significant differences: Sync’n’Moog and Sync’n’Mood (p = 0.005), Sync’n’Move and Sync’n’Mood (p = 0.04), No sonification and Sync’n’Moog (p = 0.04). Figures 9 and 10 depict the boxplots of the pairwise differences between the time percentages.

This shows how the use of sonification affects the time during which the participants can keep synchronised. Further, to take into account how much sonification can affect the amount of synchronisation, three more Friedman’s tests


(one for each Part) were run for the three sonifications and for the no-sonification trial on the integral of the CPR Index over the time during which the CPR was greater than 0.4. This test was carried out on the trials showing significant time percentages. No significant differences were found. Such a result suggests that high amounts of synchronisation are associated with short synchronisation times and vice versa.

7.3 Perceived vs measured synchronisation

The comparison between the scores of the perceived synchronisation and the measured values of the CPR Index in the three Parts was also performed.

As for the perceived values, the participants were asked to rate on 11-step Likert scales how synchronised they felt at the beginning, in the middle, and at the end of each trial, respectively (item 4.7). In order to evaluate whether significant differences occurred over the three parts of each trial, a Friedman’s test with post-hoc tests was carried out for all the sonifications (both with and without visual feedback) and for the trial without sonification. The significance level was set to 95%. All cases but Sync’n’Mood without visual feedback show significant differences between Part II and Part I, and between Part III and Part I. The trial performed without sonification shows only a difference between Parts I and III. Figures 11, 12, 13, 14, 15, and 16 depict the boxplots of the pairwise differences between the scores given by the participants over the Parts and show the direction of the results. Perceived synchronisation is lower in Part I than in Part II. This effect could be due to the learning and understanding of how the sonification works. This hypothesis is further supported by the observation that this effect does not occur between Parts I and II in the no-sonification trial. No significant differences occurred between the scores of Parts II and III.

As for the measured values of the CPR Index, for each part of the trials the distribution of the CPR was computed and the median was taken as its descriptive value, to be compared with the scores the participants gave. To this aim, this median value was linearly remapped onto 11 points. Before proceeding with the comparison, a test for significant differences among the measured CPR values over the three parts of each trial was carried out. A one-way RM ANOVA did not reveal any effect of the Part on the CPR values.

Wilcoxon-Mann-Whitney tests were run to compare the perceived and the measured synchronisation values for each of the three sonifications, for the no-sonification condition, and Part by Part. Since three tests were carried out, the comparison-wise level of significance was corrected by a Bonferroni correction to 0.016 to avoid Type I error rate inflation. Differences between medians were found in the following cases: Sync’n’Moog Part II (medians of perceived and measured synchronisation were 7 and 4 respectively, p < 10⁻⁶, r = 0.67), Part III (medians of perceived and measured synchronisation were 6.5 and 5 respectively, p = 0.002, r = 0.44); Sync’n’Moog NV Part II (medians of perceived and measured synchronisation were 6 and 6 respectively, p = 0.007, r = 0.37), Part III (medians of perceived and measured synchronisation were 5 and 5 respectively, p < 10⁻⁴, r = 0.68); Sync’n’Move Part II (medians of perceived and measured synchronisation were 6.5 and 4.5 respectively, p = 0.003, r = 0.41); Sync’n’Mood Part II (medians of perceived and measured synchronisation were 6 and 4 respectively, p = 0.0007, r = 0.60); Sync’n’Mood NV Part II (medians of perceived and measured synchronisation were 4.5 and 1 respectively, p = 0.015, r = 0.64); trial without sonification Part I (medians of perceived and measured synchronisation were 8 and 5 respectively, p < 10⁻⁵, r = 0.62), Part II (medians of perceived and measured synchronisation were 8 and 4 respectively, p < 10⁻⁷, r = 0.82), Part III (medians of perceived and measured synchronisation were 8 and 4.5 respectively, p < 10⁻⁷, r = 0.82). In these Parts, the participants perceived a synchronisation level higher than the level they actually reached.

Fig. 11 The boxplots of the pairwise differences between the scores given by the participants for Sync’n’Move. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The y-axis shows the difference between the scores

Fig. 12 The boxplots of the pairwise differences between the scores given by the participants for Sync’n’Move without visual feedback. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The y-axis shows the difference between the scores

Fig. 13 The boxplots of the pairwise differences between the scores given by the participants for Sync’n’Moog. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The y-axis shows the difference between the scores

7.4 Open questions, comments, and suggestions

The last part of the questionnaire included some open questions about the experience the participants had in testing the sonifications and the future perspectives of the prototypes exploiting them. Generally, the participants rated the experience with Sync’n’Move and Sync’n’Moog more positively than the experience they had with Sync’n’Mood.


Fig. 14 The boxplots of the pairwise differences between the scores given by the participants for Sync’n’Moog without visual feedback. The significant differences are light-grey-tinted. The y-axis shows the difference between the scores

Fig. 15 The boxplots of the pairwise differences between the scores given by the participants for Sync’n’Mood. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The y-axis shows the difference between the scores

More specifically, many participants stated sentences like: “Sync’n’Move was the easiest and most interesting among the three applications [. . .] it is easy to detect if I am not in synchronisation”, and “Sync’n’Moog was the most entertaining [. . .] it was easily understood and gave good feedback”. All the participants ascribed the ease of reaching synchronisation to the fixed tempo of the music. The following quotes exemplify this: “It was easier to be synchronised than in the other two cases. You can follow the rhythm to get synchronised other than the movements of the other person”, “I tried to follow the tempo of the drums”, “I used the device like a baton, keeping the tempo”. However, some participants reported that the experience was frustrating and difficult.

Sync’n’Mood was commonly perceived as the most challenging and least understandable application: “I did not understand the effect of my movements”, “Hard to know when you were synchronised and when you were not”. These difficulties were ascribed by most of the participants to the lack of an evident rhythmic reference: “It was hard, there is no rhythm to follow”, “The perception of the beat is more difficult when the melody is lagging/leading”.


Fig. 16 The boxplots of the pairwise differences between the scores given by the participants when they performed the trial without sonification. The significant differences are light-grey-tinted. Possible outliers are marked with an empty dot. The y-axis shows the difference between the scores

Participants proposed useful improvements. The three improvements suggested most often are: to speed up the response time, to match the tempo of the music to the tempo the people chose, and to introduce role playing so that people know who is making a mistake. In addition, specific requirements were suggested for each sonification, e.g., smoother transitions when synchronisation is lost in Sync’n’Move, and a more predominant harmony in Sync’n’Mood.

All the participants agreed in identifying as possible future uses the development of cooperative games and of educational systems for training and teaching tempo and rhythm. Some participants reported that they felt fatigued at the end of the test; however, the Likert item concerning the ease of using the mobile phone during the trials did not show any trend from which it could be inferred that fatigue affected the performances. The score that each participant gave on this item is not constant trial by trial (except for a few of them); rather, it changes with the kind of sonification, and is higher when they performed the trials in which they could see each other.

8 Conclusion

This paper studied three different interactive sonifications of dyadic coordinated human rhythmic activity. The sonifications were implemented as three prototype applications, exploiting mobile devices, and operating on different parameters of the sound and music content. Evaluation was performed on both the questionnaires filled in by the users and the data obtained by logging the measures performed by the system (acceleration of the mobile devices, CPR Index).

Concerning the purposes of this work, i.e., to find out whether there is a difference between (i) a dyadic coordinated rhythmic activity performed with and without sonification of the phase synchronisation index, (ii) the three sonification prototypes, and (iii) the measured and the perceived synchronisation, the results can be summarised as follows. The analysis of the questionnaires revealed that the performance without any sonification is preferred by the expert participants and that no statistically significant difference emerges among the three sonifications. However, this result is not fully confirmed by the analysis of the objectively measured data, where significant differences emerge among the three sonifications, and the performance without any sonification neither systematically outperforms the three sonifications nor leads to longer synchronisation times. This suggests that the sonification actually helps the users to keep synchronisation for longer times, even if they do not perceive it as a relevant help. A further consideration confirming this result is that learning between Parts I and II only occurs when the sound feedback is provided; it does not occur in the performance without sonification (where it occurs between Parts I and III). Thus, the sound feedback facilitates learning or makes it faster. As for the three sonifications, in Part II Sync’n’Move and Sync’n’Moog outperform Sync’n’Mood. This result may be representative, since it concerns the central part of the performance, when the learning effect has already happened and participants are not yet too tired. The result may indicate that simpler sonification paradigms lead to longer synchronisation times. As regards the relationship between the measured and the perceived synchronisation, a medium (0.3 ≤ r ≤ 0.5) or large (r > 0.5) significant effect was found between measured and


perceived synchronisation for the three sonifications, both when visual feedback was allowed and when it was not. However, this difference does not concern all three parts into which the performance was divided, but only one or two of them (the first and second Parts). In all the cases where the difference was significant, the participants perceived a synchronisation level higher than the measured one.

Several aspects may have affected the results and need to be considered in future work. One issue is the duration of each single trial and the overall duration of the experiment, which turned out to be quite long. A further issue concerning Sync’n’Moog and Sync’n’Move is that the tempo of the music was fixed and did not change depending on the movement of the users. This is related to the fact that sonification was the means to reach synchronisation, rather than the goal (i.e., the task was to get synchronised, rather than to reconstruct or reshape the music material). Another important issue concerns the gestures the participants performed. Even though participants were not constrained to perform a specific gesture, most of them chose simply to shake the mobile device rhythmically. Novel interaction paradigms enabling a wider range of gestures have to be defined and developed. These have to be considered in a broader framework, where interactive sonification and active listening to sound and music content provide novel technology-mediated solutions for enhancing human perception and experience.

Acknowledgements This work was partially supported by the FP7 EU-ICT Project 215749 SAME (www.sameproject.eu). SAME was co-funded by the EU Research Framework Programme 7 under DG INFSO Networked Media Systems. Part of this work was funded by the Academy of Finland (Project no. 122815). Bresin, Dubus, and Fabiani were partially supported by the Swedish Research Council, Grant Nr. 2010-4654. The authors thank Norbert Marwan for discussion about recurrence and the anonymous reviewers for their useful comments.

Appendix A

In this appendix, the time-varying nonlinear filter used in the Sync’n’Moog application (see Sect. 5.1) is described.

The original analog Moog filter [8] consists of four identical one-pole lowpass sections in series, with a global negative feedback to produce a resonant peak near the cutoff frequency. The first digital implementation of the Moog filter was presented by Stilson and Smith [9]. Huovilainen [7] presented a nonlinear digital implementation which follows its analogue counterpart more closely. This implementation is computationally expensive, because five hyperbolic tangent (tanh) evaluations are required for each output sample and an increased sample rate is needed to avoid aliasing. Välimäki and Huovilainen [10] presented a computationally more efficient implementation, in which performance is improved by reducing the number of costly hyperbolic tangent computations to one per output sample.

The Sync’n’Moog sonification is based on the digital Moog filter presented by Välimäki and Huovilainen [10], which is embedded in a new structure shown in Fig. 17. The sound output is a mix between the original audio input and the Moog-filtered version of it. The mixing coefficient a is determined by the desired cutoff frequency fc and the upper threshold frequency fmax:

a = fc/fmax if 0 < fc ≤ fmax, and a = 1 otherwise.  (4)

The filter part consists of a feedforward section where the saturating nonlinearity (tanh) is in cascade with four lowpass filter sections. Each lowpass section is realised with the following difference equation:

y(n) = h0x(n) + h1x(n − 1) + (1 − g)y(n − 1), (5)

where h0 = 1.0/1.3 and h1 = 0.3/1.3. The g parameter is used to determine the filter cutoff frequency fc in relation to the system sample rate fs. For low cutoff frequencies, g can be approximated by:

g ≈ 2πfc/fs. (6)

The feedback section consists of a single delay, a resonance control Gres = 0.9, and a passband gain control Gcomp = 0.5. The resonance control parameter Gres affects the resonance peak amplitude and can take values from 0 (no resonance) to 1 (full resonance). The passband gain control is used to keep the output amplitude at an approximately constant level despite changes in the resonance value.
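A sketch of the complete output stage of Fig. 17 is given below (Python). The feedback term 4·Gres·(y(n−1) − Gcomp·x(n)) and the scaling of the one-pole input terms by g follow the Välimäki–Huovilainen design cited above and are assumptions with respect to the exact notation of Eq. (5); the mix a·x + (1 − a)·y is likewise an assumption that matches the behaviour described for a = 1.

    import numpy as np

    def sync_n_moog(x, fc, fs=44100.0, fmax=3000.0, g_res=0.9, g_comp=0.5):
        """Mix of the dry input and a nonlinear Moog ladder filter, Eqs. (4)-(6)."""
        x = np.asarray(x, dtype=float)
        g = 2.0 * np.pi * fc / fs                   # Eq. (6), low-frequency approximation
        h0, h1 = 1.0 / 1.3, 0.3 / 1.3               # Eq. (5) coefficients
        a = fc / fmax if 0.0 < fc <= fmax else 1.0  # Eq. (4), mixing coefficient

        y = np.zeros_like(x)
        prev_in = np.zeros(4)                       # x(n-1) of each lowpass section
        prev_out = np.zeros(4)                      # y(n-1) of each lowpass section
        fb = 0.0                                    # delayed output of the last section
        for n, xn in enumerate(x):
            # Feedback path: resonance and passband-gain compensation, then saturation.
            u = np.tanh(xn - 4.0 * g_res * (fb - g_comp * xn))
            for k in range(4):                      # four identical one-pole sections
                out = g * (h0 * u + h1 * prev_in[k]) + (1.0 - g) * prev_out[k]
                prev_in[k], prev_out[k] = u, out
                u = out
            fb = u                                  # one-sample delay in the feedback loop
            y[n] = a * xn + (1.0 - a) * u           # output mix of dry and filtered signal
        return y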

Frequency responses of the Sync’n’Moog sonification for various cutoff frequencies are presented in Fig. 18. When fc = 200 Hz, the response has a lowpass nature with a sharp resonance at the cutoff frequency. Increasing fc shifts the lowpass response and the resonance peak location. The effect of the mixing coefficient is visible at the frequencies above the cutoff point when fc = 1.0 kHz. When fc = 3.0 kHz, a is set to 1.0 and the resulting response is flat, because the Moog filter output is attenuated to the −∞ dB level and the output is the unmodified input signal.

Fig. 17 Block diagram of the Sync’n’Moog sonification algorithm. LP refers to a lowpass filter section and z⁻¹ is a delay of one sampling interval


Fig. 18 Frequency responses of the Sync’n’Moog sonification for different filter cutoff frequencies, when fmax = 3.0 kHz

Appendix B: Questionnaire

Part 1 concerned some general information on the participants:

1.1 Gender (F/M)
1.2 Age
1.3 Nationality
1.4 Occupation

Part 2 aimed at gathering information about the usage of mobile phones by the participants:

2.1 Do you own a mobile phone? (No/Yes)
2.2 If yes, which functions do you use the most? (several answers are possible among: Making calls; Sending SMS messages; Taking pictures; Listening to music; Recording videos; Playing games; Others)

Part 3 was about the music background and skills of the participants:

3.1 Do you play/create music? (No/Yes)
3.2 Do you make use of computer music technology tools? (No/Yes)
3.2.1 If yes, which ones? (for example sequencers, DJ tools, software synthesisers, audio effects, notation editors, etc.)
3.3 Do you play a musical instrument? (No/Yes)
3.3.1 If yes, which instrument do you play?
3.3.2 How long (years) have you played your instrument for?
3.3.3 You play your musical instrument(s) for about ... hours/day
3.4 The music genre that I prefer is ...
3.5 I listen to music (only one answer among: Everyday; Several times a week; Once a week; Several times a month; Several times a year)

Part 4 included the following 11-step Likert items. It was filled in after each trial.

4.1 How easy was your task?
4.2 How easy was the use of the device?
4.3 How easy was it to understand the sonification?
4.4 How much information (with respect to synchronisation) could you extract from the sound feedback?
4.5 How natural was the interaction between your gestures and the sound feedback?
4.6 You were involved in the experiment with another person. Did you feel that you interacted with him? (No/Yes)
4.6.1 If yes, how much?
4.7 How much did you feel you were synchronised?
4.7.1 At the beginning of the experiment
4.7.2 During the experiment
4.7.3 At the end of the experiment

Finally, Part 5 aimed at collecting comments and suggestions (for each of the three sonifications):

5.1 Can you describe your experience with the program?
5.2 Can you imagine how it could be used?
5.3 What suggestions for improvement would you give?
5.4 Free comments

References

1. Mathews MV (1991) The Radio Baton and conductor program, or: pitch, the most important and least expressive part of music. Comput Music J 15(4):37–46
2. Ilmonen T, Takala T (1999) Conductor following with artificial neural networks. In: Proceedings of the 1999 international computer music conference, Beijing, China, pp 367–370
3. Paradiso JA (1997) Electronic music: new ways to play. IEEE Spectr 34(12):18–30
4. Wanderley M, Depalle P (2004) Gestural control of sound synthesis. Proc IEEE 92(4):632–644
5. Karjalainen M, Mäki-Patola T, Kanerva A, Huovilainen A (2006) Virtual air guitar. J Audio Eng Soc 54(10):964–980
6. Pakarinen J, Puputti T, Välimäki V (2008) Virtual slide guitar. Comput Music J 32(3):42–54
7. Huovilainen A (2004) Non-linear digital implementation of the Moog ladder filter. In: Proceedings of the 7th international conference on digital audio effects (DAFx’04), Naples, Italy, pp 61–64
8. Moog RA (1965) A voltage-controlled low-pass high-pass filter for audio signal processing. In: Proceedings of the 17th audio engineering society convention
9. Stilson T, Smith JO (1996) Analyzing the Moog VCF with considerations for digital implementation. In: Proceedings of the 1996 international computer music conference, Hong Kong, pp 398–401
10. Välimäki V, Huovilainen A (2006) Oscillator and filter algorithms for virtual analog synthesis. Comput Music J 30(2):19–31
11. Varni G, Volpe G, Camurri A (2010) A system for real-time multimodal analysis of nonverbal affective social interaction in user-centric media. IEEE Trans Multimed 12(6):576–590
12. Volpe G, Camurri A (2011) A system for embodied social active listening to sound and music content. J Comput Cult Heritage 4:2:1–2:23

13. Laso-Ballesteros I, Daras P (2008) User centric future media Internet. EU Commission, Brussels
14. Marwan N, Romano MC, Thiel M, Kurths J (2007) Recurrence plots for the analysis of complex systems. Phys Rep 438:237–329
15. Varni G, Mancini M, Volpe G, Camurri A (2010) A system for mobile active music listening based on social interaction and embodiment. Mob Netw Appl 16(3):375–384
16. Bingham GP, Schmidt RC, Turvey MT, Rosenblum LD (1991) Task dynamics and resource dynamics in the assembly of coordinated rhythmic activity. J Exp Psychol Hum Percept Perform 17(2):359–381
17. Richardson MJ, Marsh KL, Schmidt RC (2005) Effects of visual and verbal interaction on unintentional interpersonal coordination. J Exp Psychol Hum Percept Perform 31(1):62–79
18. Thiel M, Romano MC, Kurths J, Rolfs M, Kliegl R (2006) Twin surrogates to test for complex synchronisation. Europhys Lett 75:535–541
19. Gaye L, Mazé R, Holmquist LE (2003) Sonic city: the urban environment as a musical interface. In: Proceedings of the 2003 international conference on new interfaces for musical expression
20. Östergren M, Juhlin O (2004) Sound pryer: truly mobile joint listening. In: Proceedings of the 1st international workshop on mobile music technology
21. Anttila A (2006) SonicPulse: exploring a shared music space. In: Proceedings of the 3rd international workshop on mobile music technology
22. Rohs M, Essl G, Roth M (2006) CaMus: live music performance using camera phones and visual grid tracking. In: Proceedings of the 2006 international conference on new interfaces for musical expression, pp 31–36
23. Leman M, Demey M, Lesaffre M, van Noorden L, Moelants D (2009) Concepts, technology and assessment of the social music game Sync-in Team. In: Proceedings of the 12th IEEE international conference on computational science and engineering
24. Marsh K, Richardson M, Schmidt R (2009) Social connection through joint action and interpersonal coordination. Top Cogn Sci 1(2):320–339
25. Wallis I, Ingalls T, Rikakis T, Olsen L, Chen Y, Xu W, Sundaram H (2007) Real-time sonification of movement for an immersive stroke rehabilitation environment. In: Proceedings of the 13th international conference on auditory display
26. Godbout A, Boyd JE (2010) Corrective sonic feedback for speed skating: a case study. In: Proceedings of the 16th international conference on auditory display
27. Effenberg AE (2005) Movement sonification: effects on perception and action. IEEE Multimed 12(2):53–59
28. Schaffert N, Mattes K, Effenberg AE (2010) Listen to the boat motion: acoustic information for elite rowers. In: Proceedings of the 3rd interactive sonification workshop
29. Dubus G, Bresin R (2010) Sonification of sculler movements, development of preliminary methods. In: Proceedings of the 3rd interactive sonification workshop
30. Barrass S, Schaffert N, Barrass T (2010) Probing preferences between six designs of interactive sonifications for recreational sports, health and fitness. In: Proceedings of the 3rd interactive sonification workshop
31. Fabiani M, Dubus G, Bresin R (2010) Interactive sonification of emotionally expressive gestures by means of music performance. In: Proceedings of the 3rd interactive sonification workshop
32. Friberg A, Bresin R, Sundberg J (2006) Overview of the KTH rule system for musical performance. Adv Cogn Psychol 2(2):145–161
33. Friberg A (2006) pDM: an expressive sequencer with real-time control of the KTH music-performance rules. Comput Music J 30(1):37–48
34. Kessous L, Jacquemin C, Filatriau JJ (2008) Real-time sonification of physiological data in an artistic performance context. In: Proceedings of the 14th international conference on auditory display
35. Barrass S, Kramer G (1999) Using sonification. Multimed Syst 7:23–31
36. Hermann T (2008) Taxonomy and definitions for sonification and auditory display. In: Proceedings of the 14th international conference on auditory display
37. Goebl W (2001) Melody lead in piano performance: expressive device or artifact? J Acoust Soc Am 110(1):563–572
38. Bresin R, Friberg A (2011) Emotion rendering in music: range and characteristic values of seven musical variables. Cortex 47(9):1068–1081