
Multilevel Auditory Displays for Mobile Eyes-Free Location-Based Interaction

Yolanda Vazquez-Alvarez, GIST, School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK. [email protected]

Rocio von Jungenfeld, Edinburgh College of Art, University of Edinburgh, Edinburgh EH8 9DF, UK. [email protected]

Matthew P. Aylett, CereProc Ltd., University of Edinburgh, Edinburgh EH8 9LE, UK. [email protected]

Antti Virolainen, Nokia Research Centre, Helsinki, Finland. [email protected]

Stephen A. Brewster, GIST, School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK. [email protected]

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). CHI 2014, Apr 26 - May 01 2014, Toronto, ON, Canada. ACM 978-1-4503-2474-8/14/04. http://dx.doi.org/10.1145/2559206.2581254

Abstract

This paper explores the use of multilevel auditory displays to enable eyes-free mobile interaction with location-based information in a conceptual art exhibition space. Multilevel auditory displays enable user interaction with concentrated areas of information. However, it is necessary to consider how to present the auditory streams without overloading the user. We present an initial study in which a top-level exocentric sonification layer was used to advertise information present in a gallery-like space. Then, in a secondary interactive layer, three different conditions were evaluated that varied in the presentation (sequential versus simultaneous) and spatialisation (non-spatialised versus egocentric spatialisation) of multiple auditory sources. Results show that 1) participants spent significantly more time interacting with spatialised displays, 2) there was no evidence that a switch from an exocentric to an egocentric display increased workload or lowered satisfaction, and 3) there was no evidence that simultaneous presentation of spatialised Earcons in the secondary display increased workload.

Author Keywords

Eyes-free interaction; auditory displays; spatial audio

ACM Classification Keywords

H.5.2 [User Interfaces]: Interaction styles, evaluation.


Introduction

Audio-Augmented Reality (AAR) enables users to interact with location-based information purely through sound while on the move. This is particularly useful when the users' visual attention is already being compromised by real visual objects in the surrounding environment. Consider the following scenario: There is a conceptual art exhibition in London and art lover, David, has arranged a visit with his friend Rocio. Before they enter the gallery, they download an application onto their mobile phones that will enable them to listen to information about the art pieces using their headphones while walking around the exhibition. As they get close to an audio-augmented location, different sounds allow them to browse the audio information available. This varies between comments left by visitors, the artist herself and an art critic. At one artifact, Rocio selects a comment left by a previous visitor that says the piece reminds him of a circulatory system. David selects a comment left by the artist, which describes how the frame squeezes wool of different colours to contrast the 2D nature of the photo frame with the 3D element of materials. David and Rocio have a lively discussion based on these comments. They agree that comments provided by the artist helped them appreciate the ideas in the work, while the opinions left by other visitors mentioned things they would never have thought of themselves. Overall, the result is a personalised museum experience, which has responded to the individual users' interests and encouraged them to appreciate and enjoy the art work in more depth in their own way.

As illustrated in our example, location-based information can be presented using AAR. When using such an eyes-free auditory interface, each location being augmented requires the use of an audio stream, which means it may be necessary to discriminate between them, especially

when they could overlap if locations are close to each other. Spatial audio has been used in previous research [8, 12] as a successful technique to segregate multiple audio streams by placing each audio stream at a different location around the user's head, mirroring how humans perceive sounds in real life. When designing auditory displays for mobile audio-augmented environments, choices have to be made on both the audio presentation and the spatial arrangement of the audio streams. A multilevel auditory display allows concentrated areas of information to be structured in a location-based system. However, given a top-level spatial auditory display, should a secondary display also be spatialised and, if so, how? Should information be provided sequentially or simultaneously? While simultaneous presentation is important to create a rich immersive audio environment, high levels of workload may affect exploration and selection between different locations and also the exploration and selection of the various amounts of information provided at each location.

Background Work

In 1993 AAR was proposed as the action of superimposing virtual sound sources upon real-world objects [4]. The key idea is that users can explore an acoustic virtual environment augmenting a physical space solely by listening as they walk [6, 12]. The majority of indoor AAR systems have been developed for museums, exhibitions or historic sites in order to replace linear keypad-based audio tour guides that constrained users to linear access to information and could pull the visitor's attention away from the actual exhibits and disturb the overall user experience. Bederson's automated tour guide [1] was an early example of an exploratory non-linear playback system. Since this early prototype, indoor AAR applications have grown in complexity, providing a much greater amount of information about the audio-augmented locations [13, 6].


User interaction with location-based information in an exploratory indoor mobile AAR application usually takes place within a proximity and an activation zone [9].

Figure 1: Experimental setup. Top: 1) IR tag, 2) JAKE sensor (both mounted on headphones), 3) SHAKE SK6 sensor pack and 4) mobile device. Bottom: Detail of SHAKE SK6 navigation switch interface.

Figure 2: Schematic of the system architecture.

The proximity zone advertises an audio-augmented location and the activation zone presents more detailed information. The amount of information presented within the activation zone in previous systems has varied greatly, from one [1] to multiple [13, 6] audio streams. As the amount of information presented increases, more complex auditory displays are required. Spatial audio techniques enable user interaction with multiple audio streams by providing orientational information that aids segregation and attention switching between the audio streams [10]. However, how should a spatial auditory display be designed in order to support increasing amounts of information? Previous work has used spatial audio to design mobile auditory displays presenting information sequentially in an egocentric [5] or exocentric [11] display, or simultaneously in an egocentric [8] or exocentric display [3].

However, the usability of these designs has not been compared against each other as part of an interactive audio-only environment. In an egocentric display, elements are always in a fixed position relative to the user, which can be particularly useful for mobile users as changes in orientation when moving are inevitable. In an exocentric display, on the other hand, display element positions have to be updated in real time according to the user's orientation, as they appear to be fixed to the world. This can aid user navigation but can be computationally intensive for a mobile device. Previous research has shown that interacting with an egocentric display when mobile is faster but more error prone than an exocentric display [7], and that simultaneous presentation in an exocentric display allows for faster user interactions [3]. However, to our knowledge, no previous research has compared the use of combined

ego- and exocentric designs within the same spatial hierarchical auditory display. This paper presents an initial evaluation of a complex multilevel spatial auditory display including egocentric and exocentric designs and sequential and simultaneous presentation, tested within a mobile AAR environment.
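To make the distinction concrete, the following is a minimal sketch (our illustration, not the system described in this paper) of the rendering difference: an exocentric, world-fixed source must be re-projected into a head-relative azimuth every time the tracked orientation changes, whereas an egocentric element simply keeps a constant azimuth. The coordinate conventions and example values are assumptions.

```python
import math

def head_relative_azimuth(source_xy, user_xy, heading_deg):
    """Re-project a world-fixed (exocentric) source into a head-relative
    azimuth in degrees: 0 = straight ahead, +90 = right, -90 = left."""
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    world_bearing = math.degrees(math.atan2(dx, dy))        # bearing from user to source
    return (world_bearing - heading_deg + 180) % 360 - 180  # wrap into [-180, 180)

# Exocentric: recomputed whenever the head tracker reports a new heading.
print(head_relative_azimuth(source_xy=(2.0, 3.0), user_xy=(1.0, 1.0), heading_deg=30.0))

# Egocentric: azimuths are fixed relative to the head, so no re-projection is needed.
EGOCENTRIC_AZIMUTHS = {"front": 0.0, "left": -90.0, "right": +90.0}
```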

Audio-augmented Art Exhibition

A conceptual art exhibition was used as the setting for all conditions in this study. The virtual audio environment was run on a Nokia N95 8GB and the built-in head-related transfer functions (HRTFs) were used to position the audio sources. User position was determined using an infrared (IR) camera tracking an IR tag powered by a 9V battery and mounted on top of a pair of headphones. Coordinate information was fed to the mobile phone over an Internet connection and was used to activate the zones associated with the art pieces.
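As an illustration of this data flow, the phone only needs the tracked position and a simple distance test to decide which zone, if any, the user occupies. The zone radii below come from Figure 3; the art-piece coordinates and function names are our assumptions, not values from the paper.

```python
import math

PROXIMITY_RADIUS = 1.25   # metres, from Figure 3
ACTIVATION_RADIUS = 0.75  # metres, from Figure 3

# Illustrative art-piece centres inside the 3 m x 3.85 m exhibition space.
ART_PIECES = {0: (0.8, 0.8), 1: (0.8, 3.0), 2: (2.2, 3.0), 3: (2.2, 0.8)}

def current_zone(user_xy):
    """Return (piece_id, 'activation') or (piece_id, 'proximity') for the art
    piece whose zone the user is inside, or None when outside all zones."""
    for piece_id, centre in ART_PIECES.items():
        distance = math.dist(user_xy, centre)
        if distance <= ACTIVATION_RADIUS:
            return piece_id, "activation"
        if distance <= PROXIMITY_RADIUS:
            return piece_id, "proximity"
    return None

# Coordinates from the IR tracker arrive over the network and are fed in here.
print(current_zone((1.0, 1.0)))   # -> (0, 'activation') with the centres above
```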

User orientation was determined using a JAKE¹ sensor pack connected to the mobile phone via Bluetooth. No visual aids were provided on the mobile device and, to ensure a full eyes-free experience, the phone was placed on a lanyard around the user's neck (see Figure 1 - top). A SHAKE SK6 sensor pack² was held by the user. It was also connected via Bluetooth and its navigation switch provided input. This navigation switch allowed users to activate, browse, select and deactivate audio content (see Figure 1 - bottom). The audio was played over a pair of DT431 Beyerdynamic open headphones with the aim of reducing the isolation of the listener from the surrounding environment. The IR tag and JAKE sensor were placed in the middle of the headphones' headband. Figure 2 shows the final system setup. The augmented exhibits

¹ http://code.google.com/p/jake-drivers
² http://code.google.com/p/shake-drivers


consisted of four different art pieces from the Weaving the City project (www.weavingthecity.eu) situated in an exhibition space measuring 3m by 3.85m (see Figure 3).

Figure 3: Illustration of the top-level sonification layer showing the location of the circular proximity (radius 1.25m) and activation zones (radius 0.75m). Small squares with a dot at their centre identify art pieces placed on a table.

Top-level sonification layer

This layer used an exocentric design to present a chattering voices sound within the proximity zone advertising content about each art piece. The sound increased in volume when the user reached the activation zone, in which a secondary interactive layer could then be activated/deactivated (see Figure 3 for an illustration of the setup) by pressing down (long press, > 2 secs) on the SHAKE navigation switch (see Figure 1 - bottom).
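A rough sketch of this layer's behaviour follows. It is a hypothetical reconstruction for illustration only; the gain values and function names are not from the paper.

```python
LONG_PRESS_SECS = 2.0

def chattering_voices_gain(zone):
    """Gain for the exocentric 'chattering voices' advertisement sound."""
    if zone == "activation":
        return 1.0   # louder once the user is close enough to activate content
    if zone == "proximity":
        return 0.5   # quieter advertisement at the edge of the augmented location
    return 0.0       # silent outside both zones

def on_switch_release(press_secs, state):
    """A long press (> 2 s) on the SHAKE navigation switch toggles the
    secondary interactive layer while the user stands in an activation zone."""
    if press_secs > LONG_PRESS_SECS and state["zone"] == "activation":
        state["secondary_layer_active"] = not state["secondary_layer_active"]
    return state

state = {"zone": "activation", "secondary_layer_active": False}
state = on_switch_release(2.4, state)    # long press -> secondary layer activated
```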

Secondary interactive layer

For each art piece, different Earcons were used in an audio menu. Earcons provide an abstract and symbolic relationship between the sounds and the information they are representing [2]. The Earcons identified 1-3 different audio menu items as follows: "water waves" for the artist's comments, "open crackling fire" for positive non-expert reviews, and "stormy wind" for negative (cold) non-expert reviews. To select one of the menu items, the SHAKE navigation switch was pressed down (short press, < 2 secs). Once a menu item was selected, an approx. 25 secs long audio clip was played containing the comment or review. User interaction with the audio menu items varied for the different experimental conditions (see the code sketch after the list):

Baseline: Each Earcon was played sequentially at each push of the navigation switch, either right or left. There was no spatialisation of the audio items, so they seemed to originate from within the user's head. The aim was to recreate a traditional audio-guide-style interaction in which users triggered the audio content by the press of a button in sequential order (see Figure 4a).

Egocentric Sequential: Each Earcon was presented in a radial menu (virtually located around the user's head: to the right, to the left, or in front of the user's nose) and, for the sequential presentation group, played one at a time when selected. Selection was performed by pushing the navigation switch either right or left, and the Earcons were located at 0°, -90° and +90° azimuth (see Figure 4b).

Egocentric Simultaneous: Similar to the egocentric sequential condition, except that all of the Earcons were played simultaneously. When a menu item was selected, the volume increased for that selected item to bring it into focus and decreased for the rest (see Figure 4c).
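The three interaction styles can be summarised in code. The sketch below is purely illustrative: the audio-engine calls (`play`, `set_gain`) and the gain values are assumptions, and the mapping of individual Earcons to angles follows the 0°, -90°, +90° layout described above.

```python
EARCONS = ["water", "fire", "wind"]                 # artist / positive / negative comments
AZIMUTHS = {"water": 0, "fire": -90, "wind": +90}   # assumed assignment to the three angles

class StubEngine:
    """Stand-in for the phone's HRTF renderer; it just records the calls."""
    def play(self, name, azimuth=None):
        print(f"play {name} at {azimuth}")
    def set_gain(self, name, gain):
        print(f"gain {name} = {gain}")

def present(condition, engine, selected):
    if condition == "baseline":
        # Non-spatialised and sequential: only the selected Earcon plays,
        # appearing to originate from within the head (azimuth None).
        engine.play(selected, azimuth=None)
    elif condition == "egocentric_sequential":
        # Spatialised but one at a time: the selected Earcon plays from its azimuth.
        engine.play(selected, azimuth=AZIMUTHS[selected])
    elif condition == "egocentric_simultaneous":
        # All Earcons play continuously; the selected one is boosted into focus.
        for earcon in EARCONS:
            engine.play(earcon, azimuth=AZIMUTHS[earcon])
            engine.set_gain(earcon, 1.0 if earcon == selected else 0.3)

def on_short_press(direction, index):
    """A short press (< 2 s) right or left steps through the menu items."""
    return (index + (1 if direction == "right" else -1)) % len(EARCONS)

present("egocentric_simultaneous", StubEngine(), selected="water")
```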

Study

Thirty-two participants (21 males, 11 females, aged 18 to 39 years) were recruited; all were studying or working at the University. They all reported normal hearing, were right-handed and were paid £6 for participation, which lasted just over an hour.

Experimental Design and Measures

Participants were split equally into two groups, sequential and simultaneous presentation, in a between-subjects design. The baseline condition was used as a control in both groups and the order of conditions was randomised. Three dependent variables were analysed: perceived workload, overall user satisfaction and time spent interacting with the secondary display. In addition, user location coordinates and head orientation data were also collected to investigate user behaviour. The following hypotheses were tested:

H1: A spatialised secondary display will increase exploration. We define increased exploration as users spending significantly more time without a significant drop in user satisfaction or a significant increase in perceived workload.


H2: Changing between the exocentric and the egocentric layers will increase perceived workload.

H3: Simultaneous presentation of Earcons in the egocentric secondary interactive layer will increase perceived workload.

Figure 4: Schematic of the interactive auditory displays tested: a) Baseline, b) Egocentric sequential, c) Egocentric simultaneous (Earcons water, fire and wind at 0°, -90° and +90° azimuth).

Figure 5: Mean time taken interacting with the secondary interactive layer per condition and presentation group. Baseline was used as a control for both egocentric conditions. Total audio content: 9 audio clips x 25 secs = 225 secs. Error bars show Standard Error of Mean ± 2.0.

Procedure

The experiment started with a training session before the test conditions to familiarise the participant with the multilevel auditory displays around one of the art pieces in the exhibition space. For each test condition, participants were asked to explore the exhibition space and find as much information as possible about the art pieces by interacting with the different auditory displays. Participants were given a maximum of 10 minutes of exploration time for each test condition. There was no minimum time and participants could choose to stop whenever they wanted. All the participants had time to explore the art pieces in the allocated time. After each test condition, participants were asked to complete a NASA-TLX subjective workload assessment and a satisfaction questionnaire modified from that used in Wakkary and Hatala [13], and also to provide some informal feedback.

Results

To test hypotheses 2 and 3, a two-way mixed-design ANOVA was performed on the overall perceived workload and overall user satisfaction mean scores, with condition type as a within-subjects factor and presentation group as a between-subjects factor. Overall workload was calculated across all six subscales to a maximum mean score of 120, and overall user satisfaction was calculated as an average over 62 satisfaction questions rated on a five-point scale where 5 was 'best'. No significant differences were found between the conditions (Baseline versus Egocentric display), between Simultaneous and Sequential presentation, or for interactions between these variables,

    either for perceived workload or overall satisfaction. Thus,hypotheses 2 and 3 can be rejected. Across all conditionsand presentations mean overall user satisfaction was highat 3.9/5 and mean overall workload was low at 32/120.

A two-way mixed-design ANOVA showed that significantly more time was spent in the spatialised Egocentric conditions than in the Baseline condition (F(1,30) = 8.21, p = 0.008); see Figure 5. As there was no evidence that spatialisation increased perceived workload, the extra time spent can be attributed to an increase in user exploration, and therefore hypothesis 1 can be accepted.
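For readers who want to run the same kind of analysis on their own data, the sketch below shows how a two-way mixed-design ANOVA of this form could be set up in Python with the pingouin package. The data frame is filled with synthetic placeholder values only so the example executes; the paper does not state which analysis software was used.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Synthetic placeholder data: 32 participants in two presentation groups
# (between-subjects), each measured in the baseline and egocentric conditions
# (within-subjects). Replace with real interaction times.
rng = np.random.default_rng(0)
rows = []
for p in range(32):
    group = "sequential" if p < 16 else "simultaneous"
    for condition in ("baseline", "egocentric"):
        rows.append({"participant": p, "group": group, "condition": condition,
                     "time_secs": float(rng.normal(120, 30))})
df = pd.DataFrame(rows)

# Two-way mixed-design ANOVA: condition (within) x presentation group (between).
aov = pg.mixed_anova(data=df, dv="time_secs", within="condition",
                     subject="participant", between="group")
print(aov[["Source", "F", "p-unc"]])
```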

The logged user behaviour data showed a much simpler pattern of exploration for participants in the Baseline condition than in the Egocentric conditions. Figure 6 (top) shows an example of one of the participants in the Baseline condition walking in a straighter trajectory between the art pieces and then staying mainly stationary once the secondary interactive layer was activated. Figure 6 (bottom) shows the same participant taking more time to explore around the art pieces once the secondary interactive layer was activated in the Egocentric simultaneous condition.

Conclusions and Future Work

This paper investigated the usability of complex spatial auditory displays designed to enable user interactions with concentrated areas of information in an exploratory mobile audio-augmented reality environment. Both egocentric and exocentric designs were combined in the multilevel auditory display to test whether these 3D audio techniques would encourage an exploratory behaviour. Informal feedback suggests that the egocentric secondary interactive layer allowed for a more exploratory experience. Although overall the baseline condition was reported as "easy to use", it was also found to be "less immersive".



Figure 6: Route taken around the art pieces (0-3) by one participant from the simultaneous presentation group in the Baseline (top) and Egocentric (bottom) conditions. Solid red and blue lines show the path of exploration in the sonification layer and the secondary interactive layer respectively. Short splines illustrate the participant's head direction every 0.5 seconds.

Users liked the control over the interaction with the location-based information provided by the egocentric design and remarked on how this "spatialised interface was more fun than simply scrolling through sounds". Performance results supported user feedback and helped characterise the egocentric secondary display as an exploratory experience, where interaction times increased without an increase in workload or a decrease in user satisfaction. The egocentric design performed well across presentation types, with no evidence that the transition from the top-level exocentric layer into the egocentric secondary interactive layer had a negative impact.

These results show that spatial audio can encourage an immersive experience and exploratory behaviour. Future work will investigate more closely how this sense of immersion could be further encouraged, for example with the use of an exocentric secondary display, and how such an immersive experience might be shared across multiple users interacting with the same display. We believe that a deeper understanding of the extent to which novel and more complex auditory displays can impact the user experience will allow designers to make more informed decisions when designing eyes-free auditory interfaces for mobile audio-augmented reality environments.

References

[1] Bederson, B. B. Audio augmented reality: A prototype automated tour guide. In CHI'95, vol. 2 (1995), 210–211.

[2] Blattner, M. M., Sumikawa, D. A., and Greenberg, R. M. Earcons and icons: Their structure and common design principles. Human-Computer Interaction 4, 1 (1989), 11–44.

[3] Brewster, S. A., Lumsden, J., Bell, M., Hall, M., and Tasker, S. Multimodal 'eyes-free' interaction techniques for wearable devices. In CHI'03 (2003), 463–480.

[4] Cohen, M., Aoki, S., and Koizumi, N. Augmented audio reality: Telepresence/VR hybrid acoustic environments. In 2nd IEEE Intern. Wrkshp. on Robot and Human Comm. (1993), 361–364.

[5] Dicke, C., Wolf, K., and Tal, Y. Foogue: Eyes-free interaction for smartphones. In MobileHCI'10 (2010), 455–458.

[6] Eckel, G. Immersive audio-augmented environments: The LISTEN project. In IV'01, IEEE Computer Society Press (2001), 571–573.

[7] Marentakis, G., and Brewster, S. Effects of feedback, mobility and index of difficulty on deictic spatial audio target acquisition in the horizontal plane. In CHI'06 (2006), 359–368.

[8] Sawhney, N., and Schmandt, C. Nomadic radio: Speech and audio interaction for contextual messaging in nomadic environments. Trans. on Computer-Human Interaction 7, 3 (2000), 353–383.

[9] Stahl, C. The roaring navigator: A group guide for the zoo with a shared auditory landmark display. In MobileHCI'07 (2007), 282–386.

[10] Stifelman, L. The cocktail party effect in auditory interfaces: A study of simultaneous presentation. Tech. rep., MIT Media Laboratory, 1994.

[11] Terrenghi, L., and Zimmermann, A. Tailored audio augmented environments for museums. In IUI'04 (2004), 334–336.

[12] Vazquez-Alvarez, Y., Oakley, I., and Brewster, S. A. Auditory display design for exploration in mobile audio-augmented reality. Personal and Ubiquitous Computing 16, 8 (2012), 987–999.

[13] Wakkary, R., and Hatala, M. Situated play in a tangible interface and adaptive audio museum guide. Personal and Ubiquitous Computing 11, 3 (2007), 171–191.

