J Multimodal User Interfaces (2012) 5:157–173
DOI 10.1007/s12193-011-0079-z
ORIGINAL PAPER
Interactive sonification of synchronisation of motoric behaviour in social active listening to music with mobile devices
Giovanna Varni · Gaël Dubus · Sami Oksanen · Gualtiero Volpe · Marco Fabiani · Roberto Bresin · Jari Kleimola · Vesa Välimäki · Antonio Camurri
Received: 31 January 2011 / Accepted: 12 November 2011 / Published online: 13 December 2011
© OpenInterface Association 2011
Abstract This paper evaluates three different interactive sonifications of dyadic coordinated human rhythmic activity. An index of phase synchronisation of gestures was chosen as the coordination metric. The sonifications are implemented as three prototype applications exploiting mobile devices: SyncnMoog, SyncnMove, and SyncnMood.
G. Varni · G. Volpe (✉) · A. Camurri
InfoMus, DIST, University of Genova, Viale Causa 13, 16145 Genova, Italy
S. Oksanen · J. Kleimola · V. Välimäki
Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, P.O. Box 13000, 00076 Aalto, Finland
S. Oksanen e-mail: Sami.Oksanen@aalto.fi
J. Kleimola e-mail: Jari.Kleimola@aalto.fi
V. Välimäki e-mail: Vesa.Valimaki@tkk.fi
G. Dubus · M. Fabiani · R. Bresin
Department of Speech, Music and Hearing, School of Computer Science and Communication, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
SyncnMoog sonifies the phase synchronisation index by acting directly on the audio signal and applying a nonlinear time-varying filtering technique. SyncnMove acts on the multitrack music content by making the single instruments emerge and hide. SyncnMood manipulates the affective features of the music performance. The three sonifications were also tested against a condition without sonification.
Keywords Interactive sonification · Interactive systems · Audio systems · Sound and music computing · Active music listening · Synchronisation
1 Introduction

This paper evaluates three different interactive sonifications of dyadic coordinated human rhythmic activity. The sonifications are implemented as three prototype applications exploiting mobile devices. The study concerns a pair of users interacting by means of their movements and gestures, e.g., shaking gestures, each performed while holding and moving a mobile phone. Kinematic features of the two users' movements, such as acceleration and velocity, are measured by means of the on-board 3D accelerometer with which the mobile phones are equipped. Following the state of the art on coordination (e.g., [16, 17]), which focuses mainly on the timing or phasing of joint activity, phase synchronisation of gestures is chosen as the coordination metric. An index of phase synchronisation of the movement of the two users is extracted by applying Recurrence Quantification Analysis to the kinematic features. This phase synchronisation index is sonified using three different algorithms, implemented in three application prototypes that operate at different conceptual levels. SyncnMoog sonifies the index by applying a time-varying filtering technique to an audio signal. SyncnMove works on the polyphonic structure of a multitrack audio signal, by making instruments emerge and hide. SyncnMood manipulates the affective content of the music performance.
These three prototypes were compared in experimental conditions where the users were provided either with auditory feedback only or with both auditory and visual feedback. A no-sonification condition, in which users are provided only with visual feedback, was also tested as a baseline. The purpose is to find out whether there is a difference (i) between a dyadic coordinated rhythmic activity performed with and without sonification, (ii) among the three sonifications, and (iii) between the measured and the perceived synchronisation.
The remainder of the paper is organised as follows: Sect. 2 presents the research framework and related work; Sect. 3 shows how the present work fits the frame of interactive sonification better than that of data-driven music performance, considering successive definitions of sonification; Sect. 4 briefly introduces the architectural framework; Sect. 5 discusses the sonification design and presents the three algorithms; Sects. 6 and 7 describe the experimental methodology and discuss the analysis carried out and the results obtained.
2 Research framework and related work
Research is carried out in the framework of social active listening to music. In an active music listening experience, listeners are enabled to interactively operate on music content, modifying and moulding it in real time while listening. Active music listening is performed in a user-centric perspective, that is, the experience depends not so much on specific features of the content (e.g., the music genre or the score) as on the behaviour of the user, generally measured by means of a multimodal interface.
Historically, one of the first active music listening systems, the Radio Baton, was developed by Mathews. The Radio Baton is a gesture control device that detects the locations of the tips of two sticks held by the user. An early application of the device was to conduct a computer, which would play music using a MIDI synthesiser while the user communicated basic musical parameters such as tempo and level. Later, various other technologies, such as magnetic trackers, were used for the conductor-following task. Gesture control has been applied widely in musical sound synthesis, as Paradiso and Wanderley and Depalle show in their tutorial reviews. Virtual air guitars [5, 6] are recent related applications, which use accelerometers and computer vision (i.e., camera-based tracking) to locate the user's hands and apply this information to control a guitar sound synthesiser. Playing a virtual air guitar can be much easier than playing a real musical instrument, because the player's gestures can be coarsely classified into a few classes, e.g., into only four different chords. It is then easy to play correctly, and the experience can be classified as active music listening: more interaction is allowed than when simply listening to a piece of music, but much less than in real playing.
Recently, active music listening prototypes have been developed specifically for mobile devices. Mobile phones can operate on music content available over a network. Moreover, they are broadly available and pervasive, they influence everyday life activities, and they are increasingly used for social networking. Applications have been developed to provide auditory feedback about physical activities, expressive gestures, and outdoor sports training. Further applications concern sound manipulation and sharing in networked media scenarios (e.g., Sonic City, Sound Pryer, Sonic Pulse) and the use of phones as musical instruments, e.g., CaMus.
In this work, active listening is addressed in a networked media scenario, where the listening experience results from the collaborative non-verbal social behaviour of two or more users. Although this paper deals with two users, applications involving more than two users are also being developed. For example, one of the prototypes presented here (SyncnMove) was extended to an arbitrary number of users and was tested with four users. As another example, Leman and colleagues developed a system that analyses the full-body rhythmic movements of a group of users and compares them with the beat of a song to produce auditory feedback.
Examples of interactive sonification of movement exist in the literature. For example, it constitutes a method for providing biofeedback as part of elite sports training. Auditory display in general is particularly well suited to time-related tasks, and consequently fits well with fields where timing is important, such as high-performance sport practice. It also offers promising opportunities in other contexts, e.g., stroke recovery and sports rehabilitation. In the work presented here, on the one hand, the motoric interaction between the two users affects the sonification. On the other hand, the sonification is the auditory output of the system, and increasing synchronisation of the motoric behaviour contributes to a better experience of the music content. Thus, interactive sonification can be regarded as a special kind of social active experience of music.
3 Using music material in sonification
Considering early definitions of sonification, such as the use of non-verbal sounds to convey information, as introduced by Barrass and Kramer, the present work can rather safely be described as sonification. However, it does not seem to meet the requirements proposed in Hermann's taxonomy of sonification, as the systematic nature of the auditory display is open to question. Hermann states that only random generation (e.g., noise) and differences in electroacoustic equipment and conditions should be allowed, as a concession to the fact that the sound wave would not be identical for the same input data. In other words, in real-time interactive sonification, periodic input data should yield periodic sound output. This is not the case for the three prototypes described here, since the basis of the display is a piece of music unfolding in time. Nevertheless, the authors regard them as fitting a broadened definition of sonification, in which the same input data yield the same consequences not directly on the sound wave but on higher-level properties of the sound: tempo, articulation, metric variations, and timbre. Since this definition lies on the edge of sonification, a clear distinction must be made from other potential uses of music material constrained by input data, such as data-driven music performance.
The border between data-driven music and sonification can be fuzzy, as has been pointed out. The major difference lies in the intention of the work: sonification is not music, because the two domains clearly do not have the same purpose. Whereas data-driven performance mainly targets aesthetic qualities, the main goal of sonification is to optimise the efficiency of information communication. When aesthetic considerations are taken into account in sonification, as they often are, it is for usability reasons rather than to improve efficiency. Similarly, portions of data can be made more perceivable for artistic purposes in data-driven music, yet the design decisions are motivated by the intrinsic aesthetic value of the piece rather than by what listeners will grasp of the underlying data. Although the three prototypes described in Sect. 5 use music as sound material, the main purpose of the setup is not for the auditory display to be pleasant to hear or to arouse particular feelings in the participants, but to help them perform a physical task correctly.
4 Architectural framework
Figure 1 summarises the general architecture of the developed prototypes. All of them share the architectural framework developed in the EU-ICT project SAME (www.sameproject.eu) and differ only in the adopted sonification algorithm.
The mobile devices used in this work were Nokia N85/N95/N97 phones running the S60 operating system. These devices are equipped with 3D accelerometers able to measure acceleration in the range [−2g, 2g] at 30 Hz, and they communicate via a wireless LAN with a server on which the prototypes run. The server performs low-level signal processing, feature extraction, and sonification, implemented in real time with the EyesWeb XMI platform (www.eyesweb.org).
4.1 Low-level signal processing
A normalisation is performed so that the magnitude of the acceleration varies approximately between 0 and 1. This is obtained through a calibration, performed once, the first time a specific mobile phone is connected. Calibration consists of finding the maximum and minimum values of the magnitude of the acceleration that the on-board accelerometer can measure. Calibration also includes measuring the gravity component g that the accelerometer detects on each of its three axes.

Fig. 1 Block diagram of the adopted general architecture: mobile phones send data to a server through the network. The server performs low-level signal processing, feature extraction, and sonification
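The calibration step can be pictured as a per-device min-max normalisation of the acceleration magnitude. The following is a minimal sketch under stated assumptions, not the SAME implementation: readings are assumed to arrive as (x, y, z) tuples in units of g, the function names `calibrate` and `normalise` are hypothetical, and the per-axis gravity measurement is not modelled.

```python
import math


def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer reading, in g."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def calibrate(samples):
    """Hypothetical calibration: return (min, max) magnitude seen in a recording.

    The recording is assumed to span both rest and vigorous shaking, so that
    its extremes approximate the range the accelerometer can measure.
    """
    mags = [magnitude(s) for s in samples]
    return min(mags), max(mags)


def normalise(sample, mag_min, mag_max):
    """Map one reading to a magnitude approximately in [0, 1]."""
    if mag_max <= mag_min:
        return 0.0
    value = (magnitude(sample) - mag_min) / (mag_max - mag_min)
    # Clamp, since live readings may fall slightly outside the calibrated range.
    return min(1.0, max(0.0, value))
```

Clamping keeps the downstream features within the nominal range even when a live reading exceeds the extremes observed during calibration.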
Besides acceleration, speed and energy are also computed. These are used to stop audio rendering when the users stop moving and to restart it when they start moving again.
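The paper does not specify how energy is computed or which thresholds trigger rendering. The sketch below illustrates only the energy part and rests on stated assumptions: energy is taken as the mean squared value of a motion signal whose resting level is near zero (e.g., acceleration magnitude with gravity removed), computed over a short sliding window, and rendering is toggled with two hypothetical thresholds (hysteresis). The class name `MotionGate` and all parameter values are illustrative.

```python
from collections import deque


class MotionGate:
    """Hypothetical start/stop gate for audio rendering based on motion energy."""

    def __init__(self, window=15, on_threshold=0.05, off_threshold=0.02):
        # window=15 samples corresponds to 0.5 s at the 30 Hz accelerometer rate.
        self.buf = deque(maxlen=window)
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.playing = False

    def update(self, value):
        """Feed one motion sample; return whether audio should be rendered."""
        self.buf.append(value)
        energy = sum(v * v for v in self.buf) / len(self.buf)
        # Separate on/off thresholds prevent rapid toggling when the energy
        # hovers around a single threshold.
        if not self.playing and energy > self.on_threshold:
            self.playing = True
        elif self.playing and energy < self.off_threshold:
            self.playing = False
        return self.playing
```

Rendering starts a few samples after the users begin moving and stops once the window has drained of motion, so short pauses inside a gesture do not interrupt the music.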
4.2 Computation of phase synchronisation index
The phase synchronisation index, i.e., the measure to be sonified, is computed on the magnitudes of the accelerations of the two mobile devices with an approach based on Recurrence Quantification Analysis (RQA). The dynamics of each user are described by the time series of the magnitude of acceleration. For each time series $a_1$ and $a_2$, the probabilities $p(a_1, \tau)$ and $p(a_2, \tau)$ of recurring (within a given tolerance) after a time lag $\tau$ are computed. Finally, the phase synchronisation index is computed as the correlation of the normalised probabilities of recurrence (CPR):

$$\mathrm{CPR} = \langle \bar{p}(a_1, \tau), \bar{p}(a_2, \tau) \rangle \qquad (1)$$

where $\langle \cdot, \cdot \rangle$ denotes the correlation operation and $\bar{p}$ the normalised probability of recurrence. For this reason, the phase synchronisation index is usually referred to as the CPR index. Intuitively, the more the two accelerations recur with the same time lags, the closer the phase synchronisation index gets to 1.
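Equation (1) can be sketched in NumPy as follows. This is a simplified sketch, not the EyesWeb implementation: it works directly on the scalar magnitude series without phase-space embedding, estimates $p(a, \tau)$ as the fraction of time indices $i$ with $|a_i - a_{i+\tau}| < \varepsilon$, and normalises each curve to zero mean and unit standard deviation, as in the usual RQA formulation of CPR. The tolerance `eps` and the lag range `max_lag` are hypothetical defaults; a 3 s window at the 30 Hz sampling rate gives 90 samples per series.

```python
import numpy as np


def recurrence_probability(a, eps, max_lag):
    """Estimate p(a, tau) for tau = 1..max_lag.

    p(a, tau) is the fraction of indices i for which the series returns to
    within eps of its value at i after tau samples.
    """
    a = np.asarray(a, dtype=float)
    n = len(a)
    if max_lag >= n:
        raise ValueError("max_lag must be shorter than the series")
    return np.array([
        np.mean(np.abs(a[: n - tau] - a[tau:]) < eps)
        for tau in range(1, max_lag + 1)
    ])


def cpr_index(a1, a2, eps=0.05, max_lag=30):
    """Correlation of the normalised recurrence probabilities of two series."""
    p1 = recurrence_probability(a1, eps, max_lag)
    p2 = recurrence_probability(a2, eps, max_lag)
    s1, s2 = p1.std(), p2.std()
    if s1 == 0.0 or s2 == 0.0:
        # A flat curve carries no phase information; report no synchronisation.
        return 0.0
    # Normalise each curve to zero mean and unit standard deviation.
    z1 = (p1 - p1.mean()) / s1
    z2 = (p2 - p2.mean()) / s2
    # The mean product of z-scores is the Pearson correlation, in [-1, 1].
    return float(np.mean(z1 * z2))
```

Two users shaking with the same period produce recurrence peaks at the same lags, which drives the index towards 1 even when their gestures are phase-shifted.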
5 Sonification design
The sonification algorithms were chosen so that they intervene at different conceptual levels and on different qualities of the auditory feedback. The first one (SyncnMoog) operates at the physical level, applying a filtering transformation to the audio signal independently of any musical concept or structure. The second one (SyncnMove) works at an intermediate level, intervening on the polyphonic structure of the piece (i.e., adding or removing instrumental sections depending on the phase synchronisation index). The third one (SyncnMood) exploits high-level music features related to the expressive content that music conveys. SyncnMoog can process any audio file, whereas SyncnMove requires a multitrack recording as input. The instances of SyncnMove and SyncnMoog evaluated here used the same music piece. SyncnMood works on MIDI files. Some video excerpts showing SyncnMoog and SyncnMove are available at http://www.sameproject.eu/demos/ (October 2011).
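To make the difference between the physical and the intermediate level concrete, the following sketch maps a synchronisation index in [0, 1] to one parameter at each of these two levels. The mappings, ranges, and function names (`moog_cutoff_hz`, `move_active_tracks`) are hypothetical illustrations, not the mappings used by SyncnMoog or SyncnMove; the high-level, MIDI-based SyncnMood is not modelled.

```python
def clamp01(x):
    """Restrict a value to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def moog_cutoff_hz(index, f_min=200.0, f_max=8000.0):
    """Physical level (hypothetical): low-pass cutoff on a logarithmic scale.

    A logarithmic mapping makes equal steps in the index sound like roughly
    equal steps in brightness.
    """
    return f_min * (f_max / f_min) ** clamp01(index)


def move_active_tracks(index, n_tracks):
    """Intermediate level (hypothetical): number of instrument tracks unmuted."""
    # At least one track stays audible so that the music never falls silent.
    return max(1, round(clamp01(index) * n_tracks))
```

The first function ignores musical structure entirely and acts on the signal, whereas the second acts on the arrangement of the piece, mirroring the distinction between the two prototypes.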
The prototypes work on sliding time windows, whose duration was set to 3 s. Such a duration proved to be, on the one hand, long enough to capture the phenomenon under observation...