

New Music Interfaces for Rhythm-Based Retrieval

Kapur, Ajay; McWalter, Richard Ian; Tzanetakis, George

Published in: Proceedings of ISMIR 2005

Publication date: 2005

Document Version: Publisher's PDF, also known as Version of record

Citation (APA): Kapur, A., McWalter, R. I., & Tzanetakis, G. (2005). New Music Interfaces for Rhythm-Based Retrieval. In Proceedings of ISMIR 2005: 6th International Conference on Music Information Retrieval (pp. 130-136).


NEW MUSIC INTERFACES FOR RHYTHM-BASED RETRIEVAL

Ajay Kapur, University of Victoria, 3800 Finnerty Rd., Victoria BC, Canada

ajay@ece.uvic.ca

Richard I. McWalter, University of Victoria, 3800 Finnerty Rd., Victoria BC, Canada

[email protected]

George Tzanetakis, University of Victoria, 3800 Finnerty Rd., Victoria BC, Canada

[email protected]

ABSTRACT

In the majority of existing work in music information retrieval (MIR) the user interacts with the system using standard desktop components such as the keyboard, mouse or sometimes microphone input. It is our belief that moving away from the desktop to more physically tangible ways of interacting can lead to novel ways of thinking about MIR. In this paper, we report on our work in utilizing new non-standard interfaces for MIR purposes. One of the most important but frequently neglected ways of characterizing and retrieving music is through rhythmic information. We concentrate on rhythmic information both as user input and as a means for retrieval. Algorithms and experiments for rhythm-based information retrieval of music, drum loops and Indian tabla thekas are described. This work targets expert users such as DJs and musicians, who tend to be more curious about new technologies and can therefore serve as catalysts for accelerating the adoption of MIR techniques. In addition, we describe how the proposed rhythm-based interfaces can assist in the annotation and preservation of performance practice.

Keywords: user interfaces, rhythm analysis, controllers, live performance

1 INTRODUCTION

Musical instruments are fascinating artifacts. For thousands of years, humans have used all sorts of different materials including wood, horse hair, animal hides, and bones to manufacture a wide variety of musical instruments played in many different ways. The strong coupling of the musicians' gestures with their instruments was taken for granted for most of music history until the invention of recording technology, which made it possible to listen to music without the presence of performers and their instruments.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.

© 2005 Queen Mary, University of London

Music Information Retrieval (MIR) has the potential of revolutionizing the way music is produced, archived and consumed. However, most existing MIR systems are prototypes developed in academia or research labs that have not yet been embraced by the public. One of the possible reasons is that so far most existing systems have focused on two types of users: 1) users that have knowledge of western common music notation, such as musicologists and music librarians, or 2) “average users” who are not necessarily musically trained. This bias is also reflected in the selection of problems, typically involving collections of either western classical music or popular music.

In this paper, we explore the use of non-standard interaction devices for MIR. These devices are inspired by existing musical instruments and are used for both retrieval and browsing. They attempt to mimic and possibly leverage the tangible interaction of performers with their instruments. The target users are therefore musicians and DJs, who in our experience tend to be more curious about new technologies and can therefore serve as catalysts for accelerating the adoption of MIR techniques.

Although the main ideas we propose are applicable to any type of instrument-inspired interaction and music retrieval task, the focus of this paper is the use of rhythmic information both as user input and as a means for retrieval. Rhythm is fundamental to understanding music of any type and provides a common thread behind this work. The described interfaces are inspired by existing musical instruments and have their origins in computer music performance. They utilize a variety of sensors to extract information from the user. This information is subsequently used for browsing and retrieval purposes. In addition, these sensor-enhanced instruments can be used to archive performance-related information which is typically lost in audio and symbolic representations. The integration of the interfaces with MIR algorithms into a prototype system is described, and experimental results using collections of drum loops, tabla thekas and music pieces of different genres are presented. Some representative scenarios are provided to illustrate how the proposed interfaces can be used. It is our hope that these interfaces will extend MIR beyond the standard desktop/keyboard/mouse interaction into new contexts such as practicing musical instruments, live performance and the dance hall.


2 RELATED WORK

There are two main areas of related work: 1) non-standard tangible user interfaces for information retrieval and 2) rhythm analysis and retrieval systems.

Using sensor-based user interfaces for information retrieval is a new and emerging field of study. The Bricks project (Fitzmaurice et al., 1995) at MIT is an early example of a graspable user interface being used to control virtual objects, such as objects in a drawing program.

One of the most inspiring interfaces for our work is musicBottles, a tangible interface designed by Hiroshi Ishii and his team at the MIT Media Lab. In this work, bottles can be opened and closed to explore a music database of classical, jazz, and techno music (Ishii, 2004). Ishii elegantly describes in his paper his mother's expertise in “everyday interaction with her familiar physical environment - opening a bottle of soy sauce in the kitchen.” His team thus built a system that took advantage of this expertise, so that his mother could open a bottle and hear birds singing to know that tomorrow would be a sunny, beautiful day, rather than having to use a mouse and keyboard to check the forecast online. This work combines a tangible interface with ideas from information retrieval. Similarly, in our work we try to use existing interaction metaphors for music information retrieval tasks.

The use of sensors to gather gestural data from a musician has also served as an aid in the creation of real-time computer music performance. Examples of such systems include the Hypercello (Machover, 1992) and the digitized Japanese drum Aobachi (Young and Fujinaga, 2004).

Also, there has been some initial research on using interfaces with music information retrieval for live performance on stage. AudioPad (Patten et al., 2002) is an interface designed at the MIT Media Lab which combines the expressive character of multidimensional tracking with the modularity of a knob-based interface. This is accomplished by using embedded LC tags inside a puck-like interface which is tracked in two dimensions on a tabletop. It is used to control parameters of audio playback, acting as a new interface for the modern disk jockey. Block Jam (Newton-Dunn et al., 2003) is an interface designed by Sony Research which controls audio playback with the use of 25 blocks. Each block has a visual display and a button-like input for event-driven control of functionality. Sensors within the blocks allow for gesture-based manipulation of the audio files. Researchers from Sony CSL Paris proposed SongSampler (Aucouturier et al., 2004), a system which samples a song and then uses a MIDI instrument to perform the samples from the original sound file.

There has also been some work in using rhythmic information for MIR. The use of beatboxing, the art of vocal percussion, as a query mechanism for music information retrieval was proposed by Kapur et al. (2004a). A system was developed which classified and automatically identified individual beatboxing sounds, mapping them to corresponding drum samples. A similar concept was proposed by Nakano et al. (2004), in which the team created a system for voice percussion recognition for drum pattern retrieval. Their approach used onomatopoeia as the internal representation of drum sounds, which allowed for a larger variation of vocal input with an impressive identification rate. Gillet and Richard (2003) explore the use of the voice as a query mechanism in the different context of Indian tabla music. A system for Query-by-Rhythm was introduced by Chen and Chen (1998). Rhythm is stored as strings, turning song retrieval into a string matching problem. The authors propose an L-tree data structure for efficient matching. The similarity of rhythmic patterns using a dynamic programming approach is explored by Paulus and Klapuri (2002). A system for the automatic description of drum sounds using a template adaptation method is described by Yoshii et al. (2004).

3 INTERFACES

The musical interfaces described in this paper are tangible devices used to communicate musical information through gestural interaction. They enable a richer and more musical interaction than the standard keyboard and mouse. These interfaces use modern sensor technology such as force sensing resistors, accelerometers, infrared detectors, and piezoelectric sensors to measure various aspects of the human-instrument interaction. Data is collected by an onboard microprocessor which converts the sensor data into a digital protocol for communicating with the computer used for MIR. Currently, the MIDI message protocol is utilized.
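
To make this data path concrete, the sketch below shows how stroke events arriving as MIDI note-on messages could be timestamped on the retrieval computer. It is a minimal illustration assuming the Python mido library; it is not the MARSYAS-based code used in our prototypes.

```python
# Minimal sketch (not our actual implementation): timestamp incoming MIDI
# events from a sensor-based controller, assuming the Python "mido" library
# and that each stroke arrives as a note_on message.
import time
import mido

def collect_taps(port_name=None, max_events=32):
    """Collect (seconds_since_start, note, velocity) tuples from a MIDI input."""
    taps = []
    with mido.open_input(port_name) as port:   # opens the default port if None
        start = time.time()
        for msg in port:                       # blocks, yielding messages as they arrive
            if msg.type == "note_on" and msg.velocity > 0:
                taps.append((time.time() - start, msg.note, msg.velocity))
                if len(taps) >= max_events:
                    break
    return taps
```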

3.1 Mercurial STC1000

The Mercurial STC1000 (shown in Figure 1 (A)) 1 uses a network of fiberoptic sensors to detect pressure and position on a two-dimensional plane. It has been designed by the Mercurial Innovations Group. This device is a single-touch controller that directly outputs MIDI messages. The mapping to MIDI can be controlled by the user. The active pad area is 125 mm x 100 mm (5 x 4 inches).

3.2 Radio Drum

The Radio Drum/Baton, shown in Figure 1 (B), is one of the oldest electronic music controllers (Mathews and Schloss, 1989). Built by Bob Boie and improved by Max Mathews, it has undergone a great deal of improvement in accuracy of tracking, while the user interaction has remained relatively constant. The drum generates 6 separate analog signals that represent the current x, y, z position of each stick. The radio tracking is based on measuring the electrical capacitance between the coil at the end of each stick and the array of receiving antennas on the drum (one for each corner). The analog signals are converted to MIDI messages by a microprocessor. The sensing surface measures approximately 375 mm x 600 mm (15 x 24 inches).

3.3 ETabla

The traditional tabla are a pair of hand drums used to accompany North Indian vocal and instrumental music.

1 http://www.thinkmig.com/stc1000.html


Figure 1: (A) Mercurial STC 1000 and (B) Radio Drum

Figure 2: (A) ETabla (B) ESitar (C) EDholak

The Electronic Tabla Controller (ETabla) (Kapur et al., 2004b), shown in Figure 2 (A), is a custom-built controller that uses force sensing resistors to detect different strokes, strike velocity and position. Though an acoustically quiet instrument, it adheres to traditional technique and form. The ETabla is designed to allow performers to leverage their skill to control digital information.

3.4 EDholak

The Electronic Dholak Controller (EDholak) (Kapur et al., 2004b) is another custom-built controller, using force sensing resistors and piezoelectric sensors to capture rhythmic gestures. The Dholak is an Indian folk drum performed by two players. Inspired by the collaborative nature of the traditional drum, the EDholak (shown in Figure 2 (C)) is a two-player electronic drum, where one musician provides the rhythmic impulses while the other provides the tempo and controls the sound production of the instrument using a sensor-enhanced spoon. This interface, in addition to sending sensor data, produces actual sound in the same way as its traditional counterpart.

3.5 ESitar

The Electronic Sitar Controller (ESitar) (Kapur et al., 2004b) is a hyperinstrument designed using a variety of sensor techniques. The Sitar is a 19-stringed, pumpkin-shelled, traditional North Indian instrument. The ESitar (shown in Figure 2 (B)) modifies the acoustic sound of the performance using the gesture data deduced from the sensors, as well as serving as a real-time transcription tool that can be used as a pedagogical device. The ESitar obtains rhythmic information from the performer by a force sensing resistor placed under the right thumb, deducing stroke direction and frequency.

Figure 3: “Boom-chick” onset detection

4 RHYTHM ANALYSIS AND RETRIEVAL

The proposed interfaces generate essentially symbolic data. However, our goal is to utilize this information to browse and retrieve from collections of audio signals. Therefore, a rhythmic analysis front-end is used to convert audio signals to more structured symbolic representations. This front-end is based on decomposing the signal into different frequency bands using a Discrete Wavelet Transform (DWT), similarly to the method described by Tzanetakis and Cook (2002) for the calculation of Beat Histograms. The envelope of each band is then calculated using full wave rectification, low pass filtering and normalization. In order to detect tempo and beat strength the Beat Histogram approach is utilized.
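
A minimal sketch of this front-end is given below, assuming NumPy, SciPy and PyWavelets in place of the MARSYAS implementation actually used; the wavelet, decomposition depth and filter settings are illustrative choices only.

```python
# Illustrative sketch of the wavelet-based analysis front-end: decompose the
# signal into bands and compute each band's amplitude envelope by full-wave
# rectification, low-pass filtering and normalization.
import numpy as np
import pywt
from scipy.signal import butter, lfilter

def band_envelopes(signal, wavelet="db4", levels=7, cutoff=0.05):
    """Return one normalized amplitude envelope per wavelet band."""
    bands = pywt.wavedec(signal, wavelet, level=levels)
    b, a = butter(2, cutoff)                  # low-pass filter (normalized cutoff)
    envelopes = []
    for band in bands:
        env = np.abs(band)                    # full-wave rectification
        env = lfilter(b, a, env)              # low-pass filtering (smoothing)
        env = env / (np.max(env) + 1e-12)     # normalization
        envelopes.append(env)
    return envelopes
```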

In order to extract more detailed information we perform what we term “boom-chick” analysis. The idea is to detect the onset of low frequency events, typically corresponding to bass drum hits, and high frequency events, typically corresponding to snare drum hits. This is accomplished by onset detection using adaptive thresholding and peak picking on the amplitude envelope of two of the frequency bands of the wavelet transform (approximately 300 Hz-600 Hz for the “boom” and 2.7 kHz-5.5 kHz for the “chick”). Figure 3 shows how a drum loop can be decomposed into “boom” and “chick” bands and corresponding detected onsets. Even though more sophisticated algorithms for tempo extraction and drum pattern detection, such as the ones mentioned in the related work section, have been proposed, the above approach worked quite well and provides us with the necessary infrastructure for experimenting with the new interfaces.
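
The onset picker itself can be sketched as follows; the adaptive threshold (a scaled moving average of the envelope) and the minimum inter-onset gap are illustrative assumptions rather than the exact parameters we used.

```python
# Sketch of "boom-chick" onset picking on a band envelope: a frame is kept as
# an onset if it is a local maximum, exceeds an adaptive threshold (a scaled
# moving average of the envelope) and is not too close to the previous onset.
import numpy as np

def pick_onsets(env, frame_rate, scale=1.5, window=0.5, min_gap=0.08):
    """Return onset times (seconds) detected in an amplitude envelope."""
    win = max(1, int(window * frame_rate))
    local_mean = np.convolve(env, np.ones(win) / win, mode="same")
    onsets = []
    for i in range(1, len(env) - 1):
        is_peak = env[i] > env[i - 1] and env[i] >= env[i + 1]
        above = env[i] > scale * local_mean[i]
        far_enough = not onsets or (i - onsets[-1]) / frame_rate >= min_gap
        if is_peak and above and far_enough:
            onsets.append(i)
    return [i / frame_rate for i in onsets]
```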

The onset sequences of “boom-chick” events can be converted into a string representation for retrieval purposes. Once onset times are found, the following two representations are created:

Chick array: --C---C-----C-C----
Boom array:  B---B-B---B---B---B


The next step is to combine the two representations into one. If a Bass and Snare occur at the same time, then T represents B+C:

Combined Array: B-C-BCB---T-C-B---B

This composite representation array is then used to create a string combining the type of onset with durations between each event. There are six types of transitions that are labeled with durations: BB, BC, CB, TC, CT, BT. Durations are relative (similar to common music notation). They are calculated by picking the most common inter-onset interval (IOI) using clustering and expressing all other IOIs, after quantization, as ratios to the most common one. Typically the most common IOI corresponds to eighth notes or quarter notes. This representation is invariant to tempo. The quantized IOIs essentially form a dictionary of possible rhythmic durations (a hierarchy of 5-6 durations is typically adequate). In order to represent the “boom-chick” events these durations are combined with the six possible transitions to form an alphabet. For example, a full beat string can be represented as:

{BC2,CB2,BC1,CB1,BT4,TC2,CB2,BB4}
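
A rough sketch of how such a token string could be assembled from the two onset lists is given below; merging near-simultaneous events with a fixed tolerance and quantizing IOIs by rounding (instead of clustering) are simplifying assumptions.

```python
# Sketch: build the transition/duration tokens from "boom" and "chick" onset
# times. Events within `tol` seconds are merged into a T event; IOIs are
# expressed relative to the most common IOI (rounding stands in for the
# clustering step described above).
from collections import Counter

def rhythm_tokens(boom_times, chick_times, tol=0.03):
    events = sorted([(t, "B") for t in boom_times] + [(t, "C") for t in chick_times])
    merged = []
    for t, label in events:
        if merged and abs(t - merged[-1][0]) <= tol:
            merged[-1] = (merged[-1][0], "T")          # simultaneous B and C -> T
        else:
            merged.append((t, label))
    iois = [round(b[0] - a[0], 2) for a, b in zip(merged, merged[1:])]
    if not iois:
        return []
    base = Counter(iois).most_common(1)[0][0]          # most common inter-onset interval
    labels = [label for _, label in merged]
    tokens = []
    for l0, l1, ioi in zip(labels, labels[1:], iois):
        duration = max(1, round(ioi / base))           # quantized relative duration
        tokens.append(f"{l0}{l1}{duration}")
    return tokens
```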

In this string representation each triplet essentially corresponds to one character in the alphabet. These strings can then be compared using standard approximate string matching algorithms such as dynamic programming. Although straightforward, this approach works quite well for music with strong beats, which is the focus of this work.
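
For illustration, a standard dynamic-programming edit distance over these tokens might look as follows; the unit insert/delete/substitute costs are an assumption, since the exact cost scheme is not essential to the approach.

```python
# Sketch of approximate matching between two rhythm token sequences using the
# classic dynamic-programming edit distance (unit costs are an assumption).
def token_edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]
```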

5 SCENARIOS

In this section, we use scenarios to illustrate some of the ways the proposed interfaces can be used for MIR. These scenarios have been implemented as proof-of-concept prototypes and are representative of what is possible using our system. They also demonstrate the interplay between annotation, retrieval and browsing that is made possible by non-standard MIR interfaces. The initial reception of our prototypes by musicians and DJs has been encouraging.

5.1 Tapping Tempo-based Retrieval

One of the most basic reactions to music is tapping. Any of the proposed drum interfaces can be used to generate a tempo-based query. Even in this simple, fundamental interaction, being able to tap the rhythm using a stick or a finger on a surface is preferable to clicking a mouse button or keyboard key. The query tempo can directly be measured and compared to a database of tempo-annotated music pieces or drum loops. The annotation can be performed manually or automatically using audio analysis algorithms. Moreover, tempo annotation can also be done using the same process as the query specification. Tempo-based retrieval is also useful for the DJ, saving time by not having to thumb through boxes of vinyl or scroll through hundreds of mp3s for a song that is at a particular tempo. Tapping the tempo into an interface is more convenient because the DJ can be listening to a particular song and just tap to find all files that match, rather than having to manually beat match.
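
A minimal sketch of this scenario, assuming a tempo-annotated collection (the annotations below are made up) and a simple median inter-tap interval estimate of the query tempo:

```python
# Sketch of tapping-based tempo retrieval: estimate BPM from tap times and
# rank a tempo-annotated collection by tempo distance. The example collection
# and tap times are purely illustrative.
import statistics

def tempo_from_taps(tap_times):
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    return 60.0 / statistics.median(intervals)     # beats per minute

def rank_by_tempo(query_bpm, annotated):
    """annotated: dict mapping track name -> annotated BPM."""
    return sorted(annotated, key=lambda name: abs(annotated[name] - query_bpm))

taps = [0.0, 0.52, 1.01, 1.53, 2.04]               # tap times in seconds
collection = {"loop_a.wav": 90, "loop_b.wav": 118, "loop_c.wav": 128}
print(rank_by_tempo(tempo_from_taps(taps), collection))
```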

5.2 “Boom-Chick” Rhythmic Retrieval

A slightly more complex way of using rhythm-based information and the proposed interfaces is what we term “boom-chick” retrieval. In this approach rhythmic information is abstracted into a sequence of low frequency (bass-drum like) events and medium-high frequency (snare-drum like) events. Although simple, this representation captures the basic nature of many rhythmic patterns, especially in dance music.

The audio signals (drum loops, music pieces) in the database are analyzed into “boom-chick” events using the rhythm-based analysis algorithms described in section 4. The symbolic sensor data is easier to convert into a “boom-chick” representation. The EDholak interface is ideal for this application. The first musician taps out a query beat: one piezo sensor represents a “boom” (low) event, while a separate piezo sensor represents a “chick” (high) event. The query is matched with the patterns in the database and the most similar results are returned. The second musician can then use the digital spoon to browse and select the desired rhythm at a particular tempo. Time stretching techniques such as resampling and phase vocoding can also be used to change the tempo of the returned result using the spoon.
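
As a sketch, the query tokens derived from the two piezo sensors can simply be ranked against the stored patterns with any rhythm distance, for example the edit distance outlined in section 4; the database contents and the stand-in distance below are hypothetical.

```python
# Sketch: rank stored "boom-chick" token sequences by distance to a query.
# `distance` can be any token-sequence metric (e.g. the dynamic-programming
# edit distance sketched in section 4); the contents here are made up.
def rank_by_rhythm(query_tokens, database, distance):
    """database: dict mapping loop name -> token sequence."""
    return sorted(database, key=lambda name: distance(query_tokens, database[name]))

db = {"house_1": ["BC2", "CB2", "BC2", "CB2"],
      "dnb_3":   ["BC1", "CB1", "BT4", "TC2"]}
query = ["BC2", "CB2", "BC2", "CB1"]
# Simple positional mismatch count as a stand-in distance for the demo:
demo_distance = lambda a, b: sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
print(rank_by_rhythm(query, db, demo_distance))    # -> ['house_1', 'dnb_3']
```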

5.3 Rhythm-based Browsing

This scenario focuses on browsing a collection of drum samples or musical pieces rather than retrieval. The Radio Drum or the STC-1000 can be used. One axis of the surface is mapped to tempo and the other axis can be mapped to some other attribute. We have experimented with automatically extracted beat strength, genre and dance style for the second axis. The pressure (STC-1000) or the stick height (Radio Drum) is used for volume control. With this system a DJ can find songs at a particular tempo and style just by placing the finger or stick at the appropriate location. If there are no files or drum loops at the appropriate tempo, the system looks for the closest match and then uses time stretching techniques such as overlap-add or phase vocoding to adjust the piece to the desired tempo. Sound is constantly playing, providing direct feedback to the user. This method provides a tangible, exploratory way of listening to collections of music rather than the tedious playlist-and-play-button model of existing music players.
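
The mapping can be sketched as a nearest-neighbour lookup in (tempo, attribute) space, as below; the tempo range, the use of beat strength as the second axis, and the example annotations are illustrative assumptions.

```python
# Sketch of the 2D browsing mapping: the x axis selects tempo, the y axis a
# second attribute (here beat strength); the nearest annotated item is chosen.
def browse(x, y, collection, bpm_range=(60, 180)):
    """x, y in [0, 1]; collection maps name -> (bpm, beat_strength in [0, 1])."""
    lo, hi = bpm_range
    target_bpm = lo + x * (hi - lo)
    def dist(name):
        bpm, strength = collection[name]
        return ((bpm - target_bpm) / (hi - lo)) ** 2 + (strength - y) ** 2
    return min(collection, key=dist)

songs = {"funk_7": (104, 0.8), "dub_2": (70, 0.5), "house_9": (128, 0.9)}
print(browse(0.8, 0.9, songs))    # pad position near fast/strong -> "house_9"
```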

5.4 Sensor-based Tabla Theka Retrieval

A more advanced scenario is to use the ETabla controller to play a tabla pattern and retrieve a recorded sample, potentially played by a professional musician. This can be used during live performance and for pedagogical applications. There are 8 distinct tabla basic strokes detected by the ETabla. The database is populated either by symbolic patterns using the ETabla or by audio recordings analyzed similarly to Tindale et al. (2005). This approach can also be utilized to form queries if an ETabla is not available. One of our goals is to be able to automatically determine the type of theka being played. Theka literally means cycle, and we consider 4 types: TinTaal (16 beats/cycle), JhapTaal (10 b/c), Rupak (7 b/c), and Dadra (6 b/c).


5.5 Performance Annotation

One of the frequent challenges in developing audio analysis algorithms is ground-truth annotation. In tasks such as beat tracking, annotation is typically performed by listening to the recorded audio signal. An interesting possibility enabled by sensor-enhanced interfaces is to provide the annotation directly while the music is being recorded. This is also important for preserving performance-related information which is typically lost in symbolic and audio representations. Finally, even when listening to music after the fact, these interfaces can facilitate annotation. For example, it is much easier for a tabla player to annotate a particular theka by simply playing along with it on the ETabla rather than having to use the mouse or keyboard.

5.6 Automatic Tabla Accompaniment Generation for the ESitar

The closest scenario to an interactive performance system is based on the ESitar controller. In sitar playing, rhythmic information is conveyed by the direction and frequency of the stroke. The thumb force sensor on the ESitar controller is used to capture this rhythmic information, creating a query. The query is then matched against a database in order to provide an automatically-generated tabla accompaniment. The accompaniment is generated by matching the rhythmic information against a database containing variations of different thekas. We hope to use this prototype system in the future as a key component of live human-computer music performances.

6 EXPERIMENTS

In order for the rhythm-based matching using dynamic programming to work well, the detected “boom-chick” onsets must be accurate. A number of experiments were conducted to determine the accuracy of the “boom-chick” detection algorithm described in section 4. The data consists of audio tracks with strong beats, which is the focus of this work.

6.1 Data Collection

Three sample data sets were collected and utilized. They consist of techno beats, tabla thekas and music clips. The techno beats and tabla thekas were recorded using DigiDesign Digi 002 ProTools at a sampling rate of 44100 Hz. The techno beats were gathered from Dr. Rex in Propellerheads Reason. Four styles (Dub, House, Rhythm & Blues, Drum & Bass) were recorded (10 each) at a tempo of 120 BPM. The tabla beats were recorded with a pair of AKG C1000s to obtain stereo separation of the different drums. Ten of each of four thekas were recorded: Tin Taal (16 beats per cycle), Jhaap Taal (10), Rupak (7), and Dadra (6). The music clips were downsampled to 22050 Hz and consist of jazz, funk, pop/rock and dance music with strong rhythm. A large collection of similar composition was used for developing the prototype systems used in the scenarios.

Figure 4: Tabla theka experimental results

6.2 Experimental results

The evaluation of the system was performed by comparative testing between the actual and detected beats by two drummers. After listening to each track, false positive and false negative drum hits were identified separately for each type (“boom” and “chick”). False positives are the set of instances in which a drum hit was detected but did not actually occur in the original recording. False negatives are the set of instances where a drum hit occurs in the original recording but is not detected automatically by the system. In order to determine consistency in annotation, five random samples from each dataset were analyzed by both drummers. The results were found to be consistent.

The results are summarized using the standard precision and recall measures. Precision measures the effectiveness of the algorithm by dividing the number of correctly detected hits (true positives) by the total number of detected hits (true positives + false positives). Recall represents the accuracy of the algorithm by dividing the number of correctly detected hits (true positives) by the total number of actual hits in the original recording (true positives + false negatives). Recall can be improved by lowering precision and vice versa. A common way to combine these two measures is the so-called F-measure, defined as follows (P is precision, R is recall, and higher values of the F-measure indicate better retrieval performance):

F = 2PR / (P + R)    (1)
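
From the annotated counts these measures can be computed directly, as in the following short sketch (the counts are hypothetical):

```python
# Sketch: precision, recall and F-measure from annotated hit counts.
def prf(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# e.g. 42 correctly detected hits, 6 spurious detections, 8 missed hits
print(prf(42, 6, 8))   # -> (0.875, 0.84, 0.857...)
```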

In our first experiment, the accuracy of our algorithm on the Reason drum loops was tested. As seen in figure 5, House beats have almost 99% F-measure accuracy. This is explained by the fact that house beats generally have a simple bass pattern of one hit on each downbeat. For bass drum detection the hardest style was Rhythm & Blues. This can be explained by its having the largest number of bass hits, which were often located close to each other. The snare drum detection worked well independently of style. One problem we noticed was that some bass hits would be detected as snare hits as well.


Figure 5: Beat loop experimental results

Figure 6: Overall results

The second experiment was conducted on the tabla recordings. This time, instead of detecting bass and snare hits, the “ga” stroke (the lowest frequency hit on the bayan drum) and the “na” and “ta” strokes (high frequency hits on the dahina drum) (Kapur et al., 2004b) are detected. From figure 4 it can be seen that the “ga” stroke was detected with high accuracy compared to the dahina strokes.

The final experiment consisted of the analysis of 15 music tracks separated into 3 subgenres: Dance, Funk, and Other. The Dance music results were fairly inconsistent, with a range from 40 to 100% for recall and precision. The bass drum hits were overall more accurate than those found for the snare hits. The Funk music was more consistent, though it held the same overall accuracy as the Dance music. The final category, Other, which consisted of Rock, Pop and Jazz tracks, was more dependent on the individual track than on the genre. If a pronounced bass and snare drum was present, the algorithm was quite successful in detection. The accuracy of these results was significantly less than those found in Funk and Jazz. As seen in figure 6, the algorithm did not work as well on these music files as it did on the beats and tabla datasets. This is due to the interference of voices, guitars, saxophones, etc.

Table 1: “Chick” hit detection results

Category    Recall   Precision   F-measure
Rnb         0.844    0.878       0.861
Dnb         0.843    0.891       0.866
Dub         0.865    0.799       0.831
Hse         0.975    0.811       0.886
Average     0.882    0.845       0.861
Dadra       0.567    1.000       0.723
Rupak       0.662    1.000       0.797
Jhaptaal    0.713    1.000       0.833
Tintaal     0.671    0.981       0.727
Average     0.653    0.995       0.787
Various     0.699    0.554       0.618
Dance       0.833    0.650       0.730
Funk        0.804    0.621       0.701
Average     0.779    0.609       0.683

Table 2: “Boom” hit detection results

Category    Recall   Precision   F-measure
Rnb         0.791    0.956       0.866
Dnb         0.910    0.914       0.912
Dub         0.846    0.964       0.901
Hse         0.967    0.994       0.980
Average     0.879    0.957       0.915
Dadra       0.933    0.972       0.952
Rupak       1.000    0.763       0.865
Jhaptaal    0.947    0.981       0.963
Tintaal     0.843    0.965       0.900
Average     0.931    0.920       0.920
Various     0.745    0.803       0.773
Dance       0.823    0.864       0.743
Funk        0.863    0.820       0.841
Average     0.810    0.829       0.819

7 SYSTEM INTEGRATION

In order to integrate the interfaces with the music retrieval algorithms and tasks we developed IntelliTrance, an application written using MARSYAS 2, a software framework for prototyping audio analysis and synthesis applications. The graphical user interface is written using the Qt toolkit 3. IntelliTrance is based on the interface metaphor of a DJ console, as shown in Figure 7. It introduces a new level of DJ control and functionality for the digital musician. Based on the standard 2-turntable design, this software-driven system operates on the principles of music information retrieval. The user has the ability to analyze a library of audio files and retrieve any sound via an array of preset MIDI interfaces, including the ones described in this paper. The functionality of IntelliTrance can be found in the main console window. The console offers 2 decks, each consisting of 4 independently controlled channels with load function, volume, mute and solo.

2 http://marsyas.sourceforge.net
3 http://www.trolltech.com/products/qt


Figure 7: IntelliTrance Graphical User Interface

Each deck has a master fader for volume control, a cross fader to control their amplitude relationship, and a main fader for the audio output. The MIR portion of each track allows for the retrieval of audio files for preview. Once the desired audio file is found from the retrieval, it can be sent to the track for looped playback. IntelliTrance offers save functionality to store the audio tracks and levels of the current session and load the same settings at a later date.

8 CONCLUSIONS AND FUTURE WORK

The experimental results show an overall high accuracy for the analysis of audio samples for selected tracks using the “boom-chick” algorithm. IntelliTrance has a strong focus on music with a pronounced beat, and these experimental results therefore demonstrate the potential of our approach. The proposed interfaces enable new ways of interaction with music retrieval systems that leverage existing music experience. It is our hope that such MIR interfaces will be commonplace in the future.

There are various directions for future work. Integrating more interfaces into our system, such as beatboxing (Kapur et al., 2004a), is an immediate goal. In addition, we are building a custom controller for user interaction with the IntelliTrance GUI. Another interesting possibility is the integration of a library of digital audio effects and synthesis tools into the system, to allow more expressive control for musicians. We are also working on expanding our system to allow it to be used in media art installations where sensor-based environmental data can inform retrieval of audio and video.

ACKNOWLEDGEMENTS

We would like to thank the students of the CSC 484 Music Information Retrieval course at the University of Victoria for their ideas, support and criticism during the development of this project. A special thanks to Stuart Bray for his help with the Qt GUI design. Thanks to Peter Driessen and Andrew Schloss for their support.

REFERENCES

J. J. Aucouturier, F. Pachet, and P. Hanappe. From sound sampling to song sampling. In Proc. Int. Conf. on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004.

J. Chen and A. Chen. Query by rhythm: An approach for song retrieval in music databases. In Proc. Int. Workshop on Research Issues in Data Engineering, 1998.

G. W. Fitzmaurice, H. Ishii, and W. Buxton. Bricks: Laying the foundations for graspable user interfaces. In Proc. Human Factors in Computer Systems, 1995.

O. Gillet and G. Richard. Automatic labelling of tabla symbols. In Proc. Int. Conf. on Music Information Retrieval (ISMIR), Baltimore, USA, 2003.

H. Ishii. Bottles: A transparent interface as a tribute to Mark Weiser. IEICE Transactions on Information and Systems, E87-D(6), 2004.

A. Kapur, M. Benning, and G. Tzanetakis. Query-by-beatboxing: Music information retrieval for the DJ. In Proc. Int. Conf. on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004a.

A. Kapur, P. Davidson, P. Cook, P. Driessen, and A. Schloss. Digitizing North Indian performance. In Proc. Int. Computer Music Conf. (ICMC), Miami, Florida, 2004b.

T. Machover. Hyperinstruments: A progress report. Technical report, MIT, 1992.

M. Mathews and W. A. Schloss. The radio drum as a synthesizer controller. In Proc. Int. Computer Music Conf. (ICMC), 1989.

T. Nakano, J. Ogata, M. Goto, and Y. Hiraga. A drum pattern retrieval method by voice percussion. In Proc. Int. Conf. on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004.

H. Newton-Dunn, H. Nakono, and J. Gibson. Block Jam: A tangible interface for interactive music. In Proc. New Interfaces for Musical Expression (NIME), 2003.

J. Patten, B. Recht, and H. Ishii. Audiopad: A tag-based interface for musical performance. In Proc. New Interfaces for Musical Expression (NIME), 2002.

J. Paulus and A. Klapuri. Measuring the similarity of rhythmic patterns. In Proc. Int. Conf. on Music Information Retrieval (ISMIR), Paris, France, 2002.

A. R. Tindale, A. Kapur, W. A. Schloss, and G. Tzanetakis. Indirect acquisition of percussion gestures using timbre recognition. In Proc. Conf. on Interdisciplinary Musicology (CIM), Montreal, Canada, 2005.

G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Trans. on Speech and Audio Processing, 10(5), July 2002.

K. Yoshii, M. Goto, and H. Okuno. Automatic drum sound description for real world music using template adaption and matching methods. In Proc. Int. Conf. on Music Information Retrieval (ISMIR), 2004.

D. Young and I. Fujinaga. Aobachi: A new interface for Japanese drumming. In Proc. New Interfaces for Musical Expression (NIME), Hamamatsu, Japan, 2004.
