
Entertainment Computing 4 (2013) 205–212


A system for mobile music authoring and active listening

Maurizio Mancini*, Antonio Camurri, Gualtiero Volpe
InfoMus Lab, DIST, University of Genova, Italy

* Corresponding author. Tel.: +39 010 2758252. E-mail address: [email protected] (M. Mancini).
This paper has been recommended for acceptance by Matthias Rauterberg.
1875-9521/$ - see front matter © 2013 International Federation for Information Processing. Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.entcom.2013.08.001


Article history: Received 14 April 2012; Revised 15 February 2013; Accepted 12 August 2013; Available online 11 September 2013.

Keywords: Mobile; Orchestra; Paradigm; Authoring; Explore; Active listening

Abstract

We propose a paradigm and an end-to-end system to support authoring and real-time active listening experiences of prerecorded music. Users are prosumers who navigate and express themselves in a shared (physical or virtual) orchestra space, populated by the sections of an orchestra playing a prerecorded music piece. The system consists of two components: an authoring system enabling the user to edit and shape personalized versions of active listening experiences, and a run-time environment implementing the mechanisms of interaction with the active music content. Both components exploit metadata defining the mapping rules of users' expressive gesture and context information onto sound and music processing. A user interacts with the content by holding her smartphone in her hand and by using environmental sensors and context-aware information to shape the experience, possibly in cooperation with other users.

© 2013 International Federation for Information Processing. Published by Elsevier B.V. All rights reserved.

1. Introduction

In Human-Computer Interaction, active listening is emerging as a new concept on which novel paradigms for expressive multimodal interfaces have been grounded [1,2]. These paradigms aim at empowering users to interact with and shape audiovisual content by intervening actively in the experience. Active listening exploits non-intrusive technology and sensors, and is based on natural gesture interaction [3].

Music making and listening are clear examples of a human activity that is, above all, interactive and social, two major challenges for new communication devices and applications. To date, mediated music making and listening is usually a passive, non-interactive, and non-context-sensitive experience. Current electronic technologies, even if they allow, for example, electronic musicians to cooperate remotely in music composition and creation (e.g., MusicianLink [4], Skype [5]), have not yet been able to support and promote these essential aspects of music making and listening for music consumers. This can be considered a significant degradation of the traditional live listening and music making experience, in which the public was (and still is) able to interact in many ways with performers to modify the expressive features of a music piece.

Mobile devices are commonly used for music consumption, and increasingly often as the sole music playback device. Recent advances in networked media and in sound and music computing enable experiencing and interactively molding music on mobile devices in countless ways, both for musicians (e.g., iMASCHINE [6], DM1 [7]) and consumers (e.g., DJay [8]). Music can be reused, distributed, and shared with practically no extra effort. As a consequence, new behavioral patterns are emerging as the modern way for people to capture, share, and re-live their experiences and personal histories. The need for systems allowing new creative forms of interactive music experience in context-aware mobile scenarios is increasingly evident (see for example the work reported in [9–11]). Such systems also enable individuals to become both producers and consumers of multimedia content (i.e., prosumers), and they handle the whole media chain in both directions, that is, they are end-to-end systems [12].

An end-to-end system for active listening needs integrated processing of music, gesture, and context. The EU-ICT project SAME (http://www.sameproject.eu) [2,13] started to address active music listening. This work is a continuation of that research, focusing on novel search and processing paradigms for multimedia content based on non-verbal multimodal interaction.

This paper presents the Mobile Orchestra Explorer, an end-to-end system supporting authoring and real-time active listening experiences of prerecorded music. The system exploits the interaction paradigm of exploration/navigation of a physical or virtual space. The Mobile Orchestra Explorer enables the user to fill the space with the instruments/sections of a virtual orchestra and then to explore the resulting virtual ensemble by moving in the space. Moreover, the user can mold the music being played by means of her expressive gesture. An instance of this system was evaluated during the Festival of Science 2010, a public event held annually in Genoa, Italy [37].

2. Related work

The Mobile Orchestra Explorer is conceived as an end-to-end generalization of a previous system, the Orchestra Explorer [1]. The Orchestra Explorer was an early active music listening system enabling users to physically navigate inside a virtual orchestra and to modify and mold the music performance in real time through expressive full-body movement and gesture. By walking and moving in a sensitive space, the user discovered each single music instrument or section and could use her expressive gestures to modify in real time the music that instrument or section was playing. However, the Orchestra Explorer did not explicitly include any authoring component, nor any possibility of sharing the experience with other users. The configuration of the virtual orchestra was prepared beforehand by an artist, and the possibility of modifying the music with gesture was limited to the mechanisms that the artist made available while preparing the experience. Previous work addressing the conducting of a virtual orchestra (not the authoring or the exploration of a virtual space populated by audio sources) is reported in [14–17].

A more sophisticated system, Mappe per Affetti Erranti [18], was still grounded on the exploration/navigation paradigm, but navigation was performed by multiple users on both a physical and an affective map. Each user "embodied" a different section of a polyphonic music piece and could modify its interpretation through her expressive movement. For example, consider a vocal polyphonic music piece, where each user embodies a different vocal part: a "shy and hesitant" style of movement of a user might correspond to a change of the musical interpretation of her embodied music part to "whispering and soft". A group of users could reconstruct a music piece in one of its specific expressive interpretations (e.g., shy, solemn, happy) only if all of them moved with the same expressive intentions, i.e., if they were located in the same area of the affective map. In this way, the system supports the social participation of users in creating a group experience of music.

Further research on active music listening includes, for example, the works by Goto and Pachet. Goto proposed a GUI-based system for intervening on prerecorded music with original real-time signal processing techniques to select, skip, and navigate sections of the recording [19]. Pachet [20,21] at Sony CSL proposed a similar approach to active music listening based on predefined mixing configurations for the spatialization of sound sources. Further, he proposed another system, the Continuator, an auto-reflexive system capable of building music improvisation experiences, in which the system plays the role of a companion to the user in music sessions.

Some research has specifically addressed mobile music performances: Sonic City uses multimodal sensors to allow a single user to create music and manipulate sounds by using the physical urban environment as an interface; information about the environment and the user's actions is captured and mapped onto real-time processing of urban sounds [22]. SoundPryer is a peer-to-peer application of mobile wireless ad hoc networking for PDAs, enabling users to share and listen to the music of people in vehicles in the immediate surroundings [23]. The SonicPulse system is an application designed to discover other mobile music users in a physical environment and engage with them by sharing and co-listening to music [24].

Fig. 1. The Mobile Orchestra Explorer: multitrack audio is retrieved from a database; users are allowed to create a personalized version of the music piece via a mobile authoring interface; users' profiles are stored in a database; the personalized music piece is retrieved from the database and rendered during the active listening experience (Orchestra Exploration).

3. Use scenario

A user stands in a room equipped with a 3D spatial sound speaker system, holding her mobile phone in her hand. She connects to a server and downloads a playlist of music pieces she can actively experience. She selects one of them. On the display of her mobile she can see a graphical representation of a virtual orchestra or ensemble playing the selected piece, with icons showing the position of sections/instruments in this space. Now, she can enable or disable the sections, change their position in space, and apply audio effects. The configuration of the personalized user experience is stored on the server. The user can then explore the space as she configured it: when she walks in the room getting close to one section, she listens to that section with a fade-in volume effect. Depending on the user's quality of motion (e.g., jerky vs. smooth), the music is processed by applying audio effects (e.g., low-pass filters). The resulting sound, depending on the user's choice, is rendered either on the 3D spatial sound speaker system or on the user's mobile phone. In case the user does not have enough room for physical exploration, she can explore a virtual space displayed on her mobile device, with tilts and rotations of the device being connected to navigation in the space. If instead the user is in a public space (e.g., a pub or a disco) endowed with an active listening station, including environmental sensors (e.g., cameras) and professional audiovisual rendering systems, she can take her turn at it to show her active listening interpretation of the music piece or let her friends try it. Moreover, the user can share the configuration she made with her friends, who can reuse it.

The Mobile Orchestra Explorer is an end-to-end system supporting such a use scenario. As such, it needs an authoring component for the authoring phase and an exploration component for the run-time phase (see Fig. 1). Both access archives of user profiles and audio content, and both exploit metadata.

4. Architecture

The Mobile Orchestra Explorer is conceived as a client-server system, as illustrated in Fig. 2. Clients are mobile devices such as smartphones or tablets.

Client applications perform computationally inexpensive operations, such as user input acquisition, real-time low-level audio processing, and output data rendering. Low-level audio processing is strictly related to the particular cases in which a very low-latency real-time reaction to user movements is needed, to keep the user "connected", aware, and in control of the application (reactive processing, low-level mapping). This reactive feedback requires a constant real-time response within 40–100 milliseconds [25], and is therefore performed at the client level.
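As an illustration of this client/server split, the following is a minimal sketch (in Python, not the system's actual client code) of a client loop that keeps a cheap reactive mapping local while forwarding raw sensor data to the server for heavier analysis. The accelerometer read, the audio gain hook, and the server address are hypothetical placeholders.

import json
import socket
import time

SERVER_ADDR = ("192.168.1.10", 9000)   # hypothetical server address
TICK = 0.05                            # 50 ms loop, within the 40-100 ms budget

def read_accelerometer():
    # Placeholder for the platform accelerometer API; returns (ax, ay, az) in g.
    return (0.0, 0.0, 1.0)

def set_local_gain(gain):
    # Placeholder for a client-side audio hook (reactive processing, low-level mapping).
    pass

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        t0 = time.time()
        ax, ay, az = read_accelerometer()
        # Reactive feedback computed locally: a crude "activity" value drives the gain.
        activity = min(1.0, abs(ax) + abs(ay) + abs(az - 1.0))
        set_local_gain(1.0 - 0.5 * activity)
        # Heavier expressive gesture processing is delegated to the server.
        sock.sendto(json.dumps({"t": t0, "acc": [ax, ay, az]}).encode(), SERVER_ADDR)
        time.sleep(max(0.0, TICK - (time.time() - t0)))

if __name__ == "__main__":
    main()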


Fig. 2. Conceptual framework: several mobile devices, performing lightweight computation such as acceleration measurement, communicate with a server that performs heavier computation such as analysis of the user's movement. The server then sends audio/video back to the mobile devices to provide feedback to the user.


Further user information (e.g., position in space, configuration of the user's body) may also be acquired by environmental sensors, in case the space the user currently inhabits is endowed with them. A web server receives the input from the mobile devices and the environmental sensors, performs the most intensive computations (e.g., expressive gesture processing), and sends the output data (e.g., audiovisual streaming of music) back to the mobile devices and possibly to video projectors and sound equipment, in case they are available. Expressive gesture processing is performed by extracting gesture features at different levels, from movement kinematics up to gesture descriptors such as energy, impulsivity, and directness.

In the following, the Mobile Orchestra Explorer is presented by providing a detailed description of its major components, with particular reference to the authoring component and the exploration component.

An instance of the Mobile Orchestra Explorer has been implemented using EyesWeb XMI (http://eyesweb.infomus.org) [26]. The system was developed both on clients (mobile phones) running Python scripts and on a server running a Linux version of the EyesWeb XMI kernel.

4.1. Authoring component

The authoring component consists of a mobile authoring interface and several motion analysis and audio processing plugins running on a server. The server stores audiovisual material and its related metadata, such as: (i) playback options, (ii) parameters of specific digital audio effects, (iii) interaction rules, and (iv) strategies for mapping the detected behavior of the user onto the processing of the audiovisual content.

By operating with their mobile device, users can select the audio content they want to interact with. They can look at the different sections or instruments the selected audio content includes and can locate them in the virtual space. Users are also allowed to apply audio plugins running on the server (e.g., from audio filters to sound separation plugins). Plugin parameters can be set either by means of the mobile keyboard/touch screen or by directly moving (e.g., shaking, twisting) the mobile device; that is, the user can mimic the kind of movement she would like to perform with her mobile and can connect it with a particular way of processing the audio content.

On the server side, a user-defined archive of music audiovisual content is managed, including metadata describing how the music content is processed according to the mapping strategies and rules defined in the authoring process. The authoring component may further include techniques for source separation: this enables the user to select a music piece and to retrieve tracks in different audio channels, each approximately containing a single music instrument or music section.

Finally, the authoring component manages saving and uploading the personalized configurations to the server. Besides the mobile interface, the authoring component may also run on standard PCs.

Fig. 3 shows the authoring component architecture in detail.

4.1.1. Authoring interface

The authoring interface, running on the user's mobile device, graphically displays the current audio content setup, acquires the user's interaction data, and exchanges data with the server.

Two client authoring interfaces have been implemented, allowing a user to view and edit the definition of the explorations available on the server: a web interface (Fig. 4, on the left) running on any mobile device with a web browser, and a Python interface (Fig. 4, on the right) running on Symbian phones. While the Python interface can be run only on dedicated devices (e.g., Nokia phones), the web interface can easily be integrated with other web applications, such as those running on social networks.

The server embeds a Python plugin, allowing the authoring interface on the client to send commands in the following format; standard HTTP REST commands are used. Query commands:

• audio_list: returns the list of music pieces that can be explored.
• background: returns the image to be used as the background while exploring the piece.


Fig. 3. The authoring process components. The authoring interface displayed on the user's mobile device screen interacts with a server, allowing the user to retrieve multitrack audio with metadata (specifying, for example, which personalizations can be performed by the user) from a database; audio plugins (e.g., VST effects) can then be applied by the user, either via the authoring interface or by directly performing expressive gestures (e.g., smooth movements); audio sections can be positioned in the space; finally, the user-defined configurations are stored as metadata in the user's profile for later use.

Fig. 4. The Python (left) and web (right) interface of the authoring component.


Editing commands:

• add: adds a new music piece to be explored.
• background_add: specifies the background to associate with a music piece.
• audio_tracks: returns the list of audio files that can be added to an exploration.
• audio_add: adds a new audio file to a music piece, together with the 2D position where it should be played.

Play commands:

• play: the server starts a new session in the EyesWeb server, configures it to play the requested music piece, then returns to the client the URL where the streaming audio can be retrieved.
• stop: closes the session in the EyesWeb server, terminating the audio streaming.
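The exact endpoint paths and parameter names are not specified here, so the following is only a minimal sketch of how a client might issue these commands over HTTP; the base URL, the query parameters, and the return handling are assumptions.

from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://authoring.example:8080"   # hypothetical authoring server

def command(name, **params):
    # Issue a command (e.g., audio_list, audio_add, play, stop) as an HTTP GET request.
    query = ("?" + urlencode(params)) if params else ""
    with urlopen(BASE + "/" + name + query) as resp:
        return resp.read().decode()

pieces = command("audio_list")                                             # query the available pieces
command("audio_add", piece="demo", track="lead_guitar.wav", x=0.8, y=0.2)  # hypothetical parameters
stream_url = command("play", piece="demo")                                 # server returns the streaming URL
command("stop", piece="demo")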

4.1.2. Audio and metadata

The Mobile Orchestra Explorer addresses both professional users (e.g., recording labels) and prosumers. Each music piece can be stored in one or more formats and in different versions for different users:

1. MP3: lower use of storage space, but poor interaction and personalization due to the loss of information caused by compression.

2. High-quality audio (e.g., uncompressed stereo): higher use of storage space, but some personalization is possible for users (e.g., FFT-based source separation, filtering, etc.).

3. Multitrack high-quality audio: highest use of storage space, but a fully personalized active experience. In principle, any processing available in the final stages of the typical audio post-processing phase may be available here. Further, different kinds of audio processing can be mapped onto and controlled by different interaction modalities (e.g., expressive gestures, individual or group dance movements, context and environmental variables). Constraints on the degrees of freedom of audio and music processing may be imposed, e.g., by the composer or by the recording label for artistic reasons.

These also constitute different levels of personalization, which may open novel business models for the delivery of digital music. The second and third levels are of particular interest in this perspective: each music piece includes a metadata schema, which defines the range for metadata, the rules, and the constraints for applying gestural mapping strategies onto the audio and music features of this specific piece. For example, a commercial piece may not allow modification of the leading voice of the artist, or may define the allowed degrees of freedom for intervention on a specific music instrument or section: the user may personalize music sections or single instruments by selecting from a predefined subset of filters, digital audio effects, and 3D sound spatialization, specific to the piece. An instance of a metadata schema for three instruments (audio tracks) in a specific user profile is shown below:

<section name="lead guitar" position="right">
  <plugin name="moog-filter"/>
  <plugin name="sound-source"/>
  <gesture-mapping>
    <rule feature="smoothness" parameter="moog-filter.gain"/>
    <rule feature="user-position" parameter="sound-source.position"/>
  </gesture-mapping>
</section>
<section name="rhythmic guitar" position="left">
  <plugin name="sound-source"/>
  <gesture-mapping>
    <rule feature="user-position" parameter="sound-source.position"/>
  </gesture-mapping>
</section>
<section name="voice" position="center">
  <plugin name="sound-source"/>
  <gesture-mapping>
    <rule feature="user-position" parameter="sound-source.position"/>
  </gesture-mapping>
  <social-interaction>
    <rule feature="syncAll" parameter="volume.gain"/>
  </social-interaction>
</section>

In the above example, the section "lead guitar" can be personalized by the user by applying a Moog filter associated with the smoothness of her movement and by positioning the sound source at the user's position in the listening space. The section "rhythmic guitar" can only be positioned in the listening space depending on the user's position. Finally, the volume of the section "voice" is determined by the level of synchronization of this user's movement with that of all the other users sharing the experience, so that the voice "emerges" only if all users are synchronized.

Multitrack audio with metadata coding includes:

a) Associations of music sections to space regions and to users' gestures: e.g., how to activate and mold each section and the whole piece depending on the users' position and gesture.

b) Allowed real-time audio processing for each music section and for the whole active listening result (e.g., delays, filters, voice morphing).

c) The rules defining how to process context and social signals(shared active experience).
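As a concrete illustration, the sketch below reads gesture-mapping rules out of metadata like the schema above using only the Python standard library. The wrapping <sections> root element is an assumption, since the excerpt shows the <section> elements only.

import xml.etree.ElementTree as ET

EXAMPLE = """<sections>
  <section name="lead guitar" position="right">
    <plugin name="moog-filter"/>
    <gesture-mapping>
      <rule feature="smoothness" parameter="moog-filter.gain"/>
    </gesture-mapping>
  </section>
</sections>"""

def load_mappings(xml_text):
    # Return a dictionary: section name -> list of (gesture feature, plugin parameter) pairs.
    mappings = {}
    for section in ET.fromstring(xml_text).findall("section"):
        rules = section.findall("./gesture-mapping/rule")
        mappings[section.get("name")] = [(r.get("feature"), r.get("parameter")) for r in rules]
    return mappings

print(load_mappings(EXAMPLE))   # {'lead guitar': [('smoothness', 'moog-filter.gain')]}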

4.1.3. Audio plugins

The server includes an open library of audio plugins, as shown in Fig. 3. Audio plugins can include a broad range of digital audio effects, the control of the 3D position of sections or instruments, and, in general, filters and processing of the audio content, including, for example, standard VST plugins (http://www.steinberg.net). Audio plugins can also include tools for source separation in the case of stereo audio content (see for example [27]).


Parameter settings of audio plugins may be controlled in a natural way by users through expressive gesture. Expressive gesture analysis to control such plugins is described in Section 4.2.1.

4.2. Exploration component

After setting up the orchestra space by defining the positions of sections and audio effects, users are allowed to explore the space as active agents of the audio experience. The architecture of the exploration component is illustrated in Fig. 5. The exploration experience starts with the selection of the music piece the user aims to explore: the audio metadata, previously stored in the database shown in Fig. 5, defines a mapping between the available audio and the exploration features (as explained in Section 4.1.2).

Exploration can take place either in a virtual space displayed on the mobile device or in the physical space. A simple example of exploration, available in the current implementation of our system, is the following: on their mobile devices, users can see a graphical representation of the audio sections (e.g., the silhouettes of the instruments), with icons showing their positions in space. A red dot represents the user's position. The user's task is to move around, discovering and listening to the various audio sections. As the exploration starts, the instruments start playing a piece of music, but the user does not hear them all at the same time. Instead, by tilting the phone in the left/right or forward/backward direction, the user moves the red dot horizontally and vertically on the mobile screen. As the user moves near one or more audio sections, she hears only these sections playing (see the audio generation module in Fig. 5). Sensors embedded in the user's mobile device detect the user's quality of movement (e.g., smoothness vs. impulsiveness), which the audio effects module maps onto audio effects (e.g., pitch bend). The sound produced by the sections is either rendered on the user's mobile device or spatialized, that is, mapped into the physical environment of the user in such a way that the physical source of each section seems to correspond to the point of the virtual space in which the section is located.
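The fade law used for this proximity-based mixing is not detailed here, so the following sketch is only illustrative: each section's gain decays smoothly with the distance between the user's (virtual) position and the section, with made-up section coordinates and fade radius.

import math

SECTIONS = {                      # section name -> (x, y) position in the orchestra space
    "lead guitar": (0.8, 0.5),
    "rhythmic guitar": (0.2, 0.5),
    "voice": (0.5, 0.8),
}
FADE_RADIUS = 0.3                 # distance at which a section becomes barely audible

def section_gains(user_pos):
    # Return a gain in [0, 1] for each section, decaying with distance from the user.
    gains = {}
    for name, (sx, sy) in SECTIONS.items():
        d = math.hypot(user_pos[0] - sx, user_pos[1] - sy)
        gains[name] = math.exp(-(d / FADE_RADIUS) ** 2)   # smooth fade-in/fade-out
    return gains

print(section_gains((0.75, 0.55)))   # the lead guitar dominates near its position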

4.2.1. Expressive gesture processing

Several studies in psychology have investigated "expressive" body movements. Many have found evidence for features of body movement related to different personalities [28,29] and emotional states [30]. These features refer to the way a gesture is performed, for example its speed, amplitude, and smoothness. In the Mobile Orchestra Explorer, expressive gesture processing is carried out according to the layered conceptual framework for expressive gesture developed by Camurri and colleagues and further refined in [2] (see also Fig. 2). This framework, originally conceived for the analysis of full-body movement, has been adapted and extended in order to be applied to data referring to the movement of handheld devices. The extracted features range from low-level physical measures (e.g., acceleration, velocity, and their statistics) to movement and overall gesture features (e.g., motion energy, smoothness, impulsivity, directness [31]). Based either on the acceleration captured by the accelerometer embedded in the user's mobile device or on the movement captured by possible environmental sensors (e.g., cameras), several movement features are computed on the server using EyesWeb XMI and the EyesWeb Expressive Gesture Processing Library [31].

As an example, we describe a technique for extracting one such feature, which is particularly relevant for the Mobile Orchestra Explorer: the Impulsivity Index. This index can be used, e.g., for distorting the audio signal when a sudden movement is detected. Other movement features are available on the server; details about them can be found in [18,32,33].

Fig. 5. The exploration component: mobile clients exchange data with the server, which performs computationally expensive operations (such as audio modulation and mixing). The result is rendered on the mobile devices.

In human motion analysis, impulsivity can be defined as a temporal perturbation of a regime motion [34]. This statement refers to the physical concept of impulse as a variation of momentum, and contributes to defining a reference measure for impulsivity. According to psychological studies [35,36], an impulsive gesture lacks premeditation, that is, it is performed "without a significant preparation phase". We developed an algorithm for impulsivity detection, previously presented in [32,33], in which a gesture is considered an impulse if it is characterized by, for example, short duration and high magnitude.

Fig. 6 depicts how our algorithm for impulsivity computation works. First, the 3-axis acceleration coming from the mobile device accelerometer is calibrated (see the acceleration calibration block in Fig. 6): an absolute acceleration value is computed, and the influence of gravity is subtracted:

$A_{cal} = \frac{\|\vec{A} - \vec{g}\|}{A_{MAX}}$   (1)

Then, we compute two movement features:

• Kinetic Energy (KE), an approximation of the user's movement energy. We integrate the acceleration $A_{cal}$ over time, obtaining an approximation $S$ of the user's hand speed, and then compute the corresponding kinetic energy:

$KE = \frac{1}{2} m S^2$   (2)

where $m$ is the mass of the user's hand, which we set to 1.
• Gesture Duration (GestDur): by comparing the user's movement speed $S$ with a fixed threshold, we determine $GS$, the gesture starting time (the time at which the speed exceeds the threshold), and $GE$, the gesture ending time (the time at which the speed falls below the threshold). Gesture duration is then computed as $GestDur = GE - GS$.

Fig. 6. Impulsivity Index flow chart: the input accelerometer data coming from mobile devices is processed to determine whether there is an impulsive movement.

Given KE and GestDur, the algorithm for computing the Impulsivity Index is illustrated by the flow chart in Fig. 6: KE is compared with the threshold ET and GestDur with the threshold DT; if both are greater than the corresponding thresholds, an impulse is detected.
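A minimal sketch of this computation is given below, following Eqs. (1) and (2) and the threshold test just described. The sampling rate, the thresholds, the gravity estimate, and the leaky integration used to approximate the hand speed are assumptions, since those details are not specified here.

import math

FS = 50.0           # accelerometer sampling rate in Hz (assumed)
A_MAX = 2.0         # normalization constant of Eq. (1), in g (assumed)
SPEED_T = 0.05      # speed threshold defining gesture start/end (assumed)
ET, DT = 0.02, 0.1  # energy and duration thresholds of the flow chart (assumed)

def calibrate(acc, gravity=(0.0, 0.0, 1.0)):
    # Eq. (1): absolute acceleration with gravity removed, normalized by A_MAX.
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(acc, gravity))) / A_MAX

def detect_impulse(samples, decay=0.9):
    # Return True if the stream of 3-axis acceleration samples contains an impulsive gesture.
    speed, peak, gesture_start = 0.0, 0.0, None
    for i, acc in enumerate(samples):
        # Leaky integration of A_cal approximates the hand speed S (integration scheme assumed).
        speed = decay * speed + calibrate(acc) / FS
        if gesture_start is None:
            if speed > SPEED_T:
                gesture_start, peak = i / FS, speed      # GS: speed rises above the threshold
        else:
            peak = max(peak, speed)
            if speed <= SPEED_T:                         # GE: speed falls back below the threshold
                gest_dur = i / FS - gesture_start        # GestDur = GE - GS
                ke = 0.5 * 1.0 * peak ** 2               # Eq. (2) with the hand mass m set to 1
                if ke > ET and gest_dur > DT:            # flow-chart test against ET and DT
                    return True
                gesture_start = None
    return False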

5. Implementing the use scenario

We now provide details of the implementation of the use scenario described in Section 3.

5.1. Authoring phase

Fig. 7 demonstrates an example of the authoring phase. The user enters a room and her position is tracked by a camera. The user holds her mobile phone in her hand and on its screen she sees a red dot moving on an empty background (see the top screen capture of block (a) in the Figure). The red dot represents the user's position in the Orchestra space. By pressing the menu key on her mobile phone, the user enters a menu (middle screen capture of block (a)) where she can choose which instrument should be moved to her current position. The instruments listed in this menu are those defined in the XML metadata described in Section 4.1.2. The user selects the rhythmic guitar, so a silhouette of the instrument is added to the Orchestra space (bottom screen capture of block (a)).

The user walks to another point of the room and the red dot on her mobile's screen moves accordingly. When she reaches the new point (top screen capture of block (b) in the Figure), she repeats the above steps to position the lead guitar. This operation terminates the authoring phase: the personalization of the Orchestra space performed by the user is stored in an XML file for later use by the user herself or by other users.



Fig. 7. An example of the authoring phase of the Mobile Orchestra Explorer: the user is personalizing the Orchestra space configuration by positioning the music sections.



5.2. Exploration phase

The output of the exploration phase consists of the processed music content, possibly rendered in 3D. It also includes possible visual feedback, either on the mobile phone or in the physical space.

Fig. 8 shows an example of the visual feedback that is displayed while the music is playing. The user can move in front of a large display equipped with loudspeakers. When he moves close to the right part of the display, he causes the emergence of the lead guitar music section; when he moves to the left, the rhythmic guitar emerges. The sections might remain and fade off slowly, so that if the user does not give attention to some music section, it slowly disappears from the audio and the videoclip.

Fig. 8. An instance of the Mobile Orchestra Explorer: a user can mix and control the music instruments of one of his favourite artist's videoclips.

6. Discussion

The Mobile Orchestra Explorer allows users to navigate and express themselves in a virtual or physical space, populated by the sections of a prerecorded music piece.

An evaluation study was carried out during the Festival of Science 2010 in Genoa, Italy [37]. Forty participants experienced the Mobile Orchestra Explorer and filled in a questionnaire about their active music listening experience. The first part of the questionnaire determined the mobile user profile: how the visitor uses her mobile in daily life (to make calls, to take pictures, to send and receive SMS, etc.), her musical background (expert, music lover), and the type and frequency of her physical activities (sport, dance). The second part focused on the evaluation of the Mobile Orchestra Explorer by asking the user to rate: (i) the application's usability (e.g., level of understanding and control), (ii) the user's satisfaction, and (iii) music embodiment and the active listening experience.

The results confirmed that the Mobile Orchestra Explorer, considered as a proof of concept of active music listening, increases the user's interest in music and has potential educational value. Future work includes tests specifically designed to investigate how the real-time temporal and spatial processing of music content may facilitate the acquisition of musical skills. As a whole, the evaluation study confirmed the suitability of active listening for creative entertainment. In the future, applications to education and to therapy and rehabilitation will be the subject of further investigation.

Finally, the Mobile Orchestra Explorer enables novel and very promising business models for the delivery of digital music, as preliminary experiments with music labels confirm.

Acknowledgements

Some of the expressive gesture features mentioned in this paper are currently being further investigated and refined with the partial support of the EU-ICT Project ASC-Inclusion, funded by the European Commission (grant no. 289021, ASC-Inclusion). We thank Alberto Massari for the implementation of the EyesWeb Linux version and Donald Glowinski for the evaluation study. We also thank Simone Ghisio for the system output example, and Corrado Canepa and Paolo Coletta for their contribution to this research and for useful discussions.

References

[1] A. Camurri, C. Canepa, G. Volpe, Active listening to a virtual orchestra through an expressive gestural interface: the orchestra explorer, in: Proceedings of the 7th International Conference on New Interfaces for Musical Expression (NIME '07).

[2] G. Volpe, A. Camurri, A system for embodied social active listening to soundand music content, ACM Journal on Computing and Cultural Heritage 4 (2011)2–23.

[3] A. Camurri, G. Volpe, H. Vinet, R. Bresin, M. Fabiani, E. Maestre, J. Llop, J.K.S. Oksanen, V. Valimaki, J. Seppanen, User-centric context-aware mobile applications for embodied music listening, in: ICST Conference on User Centric Media, volume LNCST 40, Springer, 2009.

[4] MusicianLink, http://www.musicianlink.com, 2012.
[5] Skype, http://www.skype.com, 2012.
[6] iMASCHINE, http://www.native-instruments.com/#/products/producer/imaschine, 2012.
[7] DM1, http://www.fingerlab.net/website/Fingerlab/DM1.html, 2012.
[8] DJay, http://www.algoriddim.com/djay, 2012.
[9] R. Etter, M. Specht, Melodious walkabout: implicit navigation with contextualized personal audio contents, in: Adjunct Proceedings of the Third International Conference on Pervasive Computing, p. 43.

[10] C. Stahl, The roaring navigator: a group guide for the zoo with shared auditorylandmark display, in: Proceedings of the 9th international conference onHuman computer interaction with mobile devices and services, MobileHCI ’07,ACM, New York, NY, USA, 2007, pp. 383–386.

[11] F. Heller, T. Knott, M. Weiss, J. Borchers, Multi-user interaction in virtual audiospaces, in: Proceedings of the 27th International Conference ExtendedAbstracts on Human Factors in Computing Systems, CHI EA ’09, ACM, NewYork, NY, USA, 2009, pp. 4489–4494.

[12] I. Laso-Ballesteros, P. Daras, User Centric Future Media Internet, EUCommission, September 2008.

[13] G. Varni, M. Mancini, G. Volpe, A. Camurri, A system for mobile active musiclistening based on social interaction and embodiment 16 (2010) 375–384.

[14] J.O. Borchers, W. Samminger, M. Mühlhäuser, Conducting a realistic electronic orchestra, in: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, UIST '01, ACM, New York, NY, USA, 2001, pp. 161–162.

[15] E. Lee, I. Grüll, H. Kiel, J. Borchers, conga: a framework for adaptive conducting gesture analysis, in: Proceedings of the 2006 Conference on New Interfaces for Musical Expression, NIME '06, IRCAM - Centre Pompidou, Paris, France, 2006, pp. 260–265.

[16] B. Bruegge, C. Teschner, P. Lachenmaier, E. Fenzl, D. Schmidt, S. Bierbaum,Pinocchio: conducting a virtual symphony orchestra, in: Proceedings of theinternational conference on Advances in computer entertainment technology,ACE ’07, ACM, New York, NY, USA, 2007, pp. 294–295.

[17] E. Lee, U. Enke, J. Borchers, L. de Jong, Towards rhythmic analysis of humanmotion using acceleration-onset times, in: Proceedings of the 7th InternationalConference on New Interfaces for Musical Expression, NIME ’07, ACM, NewYork, NY, USA, 2007, pp. 136–141.

[18] A. Camurri, C. Canepa, P. Coletta, B. Mazzarino, G. Volpe, Mappe per affettierranti: a multimodal system for social active listening and expressiveperformance, in: G. Volpe, A. Camurri (Eds.), Proceedings of the InternationalConference NIME 2008 - New Interfaces for Musical Expression, Casa Paganini,University of Genoa, 2008.

[19] M. Goto, Active music listening interfaces based on signal processing, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 4, IEEE, 2007, pp. IV-1441.

[20] F. Pachet, O. Delerue, On-the-fly multi-track mixing, in: Audio Engineering Society Convention 109.

[21] F. Pachet, Creativity studies and musical interaction, in: I. Deliège, G. Wiggins (Eds.), Musical Creativity: Current Research in Theory and Practice, Psychology Press, 2004.

[22] L. Gaye, R. Mazé, L. Holmquist, Sonic city: the urban environment as a musicalinterface, in: Proceedings of the 2003 Conference on New Interfaces forMusical Expression, National University of Singapore, pp. 109–115.

[23] M. Östergren, O. Juhlin, Sound pryer: truly mobile joint music listening, in: The First International Workshop on Mobile Music Technology, 2004.

[24] A. Anttila, N. Design, Sonicpulse: exploring a shared music space, in: 3rdInternational Workshop on Mobile Music Technology.

[25] S. Card, T. Moran, A. Newell, The Psychology of Human-Computer Interaction,Lawrence Erlbaum Associates, 1983.

[26] A. Camurri, P. Coletta, G. Varni, S. Ghisio, Developing multimodal interactivesystems with eyesweb xmi, in: Proceedings of the 2007 Conference on NewInterfaces for Musical Expression (NIME07), ACM, 2007, pp. 302–305.

[27] J. Bonada, A. Loscos, M. Vinyes, Demixing commercial music productions viahuman-assisted time-frequency masking, in: Audio Engineering SocietyConvention 120.

[28] J. Allwood, Bodily communication - dimensions of expression and content, in:B. Granstrom, D. House, I. Karlsson (Eds.), Multimodality in Language andSpeech Systems, Kluwer Academic, 2002, pp. 7–26.

[29] P.E. Gallaher, Individual differences in nonverbal behavior: Dimensions ofstyle, Journal of Personality and Social Psychology 63 (1992) 133–145.

[30] H. Wallbott, Bodily expression of emotion, European Journal of SocialPsychology 28 (1998) 879–896.

[31] A. Camurri, B. Mazzarino, G. Volpe, Analysis of expressive gestures: Theeyesweb expressive gesture processing library, in: Gesture-BasedCommunication in Human–Computer Interaction, volume LNAI 2915,Springer Verlag, 2004, 460–467.

[32] M. Mancini, B. Mazzarino, Motion analysis to improve virtual motion plausibility, in: H.v.W.A. Nijholt, A. Egges, G.H.W. Hondorp (Eds.), Proceedings of the 22nd Annual Conference on Computer Animation and Social Agents, Amsterdam, The Netherlands.

[33] B. Mazzarino, M. Mancini, The need for impulsivity & smoothness: improving HCI by qualitatively measuring new high-level human motion features, in: Proceedings of the International Conference on Signal Processing and Multimedia Applications (IEEE sponsored), SIGMAP, part of ICETE - The International Joint Conference on e-Business and Telecommunications, INSTICC Press, 2009.

[34] P. Heiser, J. Frey, J. Smidt, C. Sommerlad, P.M. Wehmeier, J. Hebebrand, H.Remschmidt, Objective measurement of hyperactivity, impulsivity, andinattention in children with hyperkinetic disorders before and aftertreatment with methylphenidate, European Child & Adolescent Psychiatry13 (2004) 100–104.

[35] J.L. Evenden, Varieties of impulsivity, Psychopharmacology 146 (1999) 348–361.

[36] A. Wilson, A. Bobick, J. Cassell, Recovering the temporal structure of naturalgesture, in: Proc. of the Second Intern. Conf. on Automatic Face and GestureRecognition.

[37] D. Glowinski, M. Mancini, A. Massari, Evaluation of the mobile orchestraexplorer paradigm, in: Proceedings of the 4th International ICST Conference onInteractive Entertainment, LNICST, Springer, 2012.