
Int J Digit Libr (2012) 12:149–158
DOI 10.1007/s00799-012-0085-0

Exact and approximate rhythm matching algorithms

Joseph Wun-Tat Chan · Costas S. Iliopoulos · Spiros Michalakopoulos · M. Sohel Rahman

Published online: 22 March 2012
© Springer-Verlag 2012

Abstract An interesting problem in music information retrieval is to classify songs according to rhythms. A rhythm is represented by a sequence of “Quick” (Q) and “Slow” (S) symbols, which correspond to the (relative) duration of notes, such that S = 2Q. Christodoulakis et al. presented an efficient algorithm that can be used to classify musical sequences according to rhythms. In this article, the above algorithm is implemented, along with a naive brute force algorithm to solve the same problem. The theoretical time complexity bounds are analyzed with the actual running times achieved by the experiments, and the results of the two algorithms are compared. Furthermore, new efficient algorithms are presented that take temporal errors into account. This, the approximate pattern matching version, could not be handled by the algorithms previously presented. The running times of two algorithmic variants are analyzed and compared and examples of their implementation are shown.

Keywords Music information retrieval · Pattern matching · Quick–slow · Rhythm

J. W.-T. Chan
Department of Computer Science, The University of Hong Kong, Hong Kong, Hong Kong
e-mail: [email protected]

C. S. Iliopoulos
Algorithm Design Group, Department of Computer Science, King’s College London, London, UK
URL: www.dcs.kcl.ac.uk/research/groups/adg
e-mail: [email protected]

S. Michalakopoulos (✉)
Media Net Software, Avenida del Partenón 10, 28042 Madrid, Spain
URL: www.medianet.es
e-mail: [email protected]

M. S. Rahman
AℓEDA Group, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka, 1000, Bangladesh
e-mail: [email protected]

1 Introduction

In [4], the problem of classification of a music sequence by rhythms was considered. The authors proposed a new framework for the identification of rhythms in a musical sequence and devised an efficient algorithm (the CIRS algorithm) for that task. In this article, the interest lies in two areas:

1. The practical performance of the above algorithm.
2. Extending it to take errors into account.

The CIRS algorithm efficiently locates the maximum length substring of a music sequence $t$ that can be “exactly” covered by a given rhythm $r$, i.e., with no temporal errors.

First, the algorithm is implemented alongside a naive brute force algorithm that solves the same problem. The theoretical time complexity bounds are then compared with the actual running times achieved in the experiments, and the results of the two algorithms are compared.

Second, the model is extended to take errors into account. The definitions are broadened, allowing for temporal errors, while the focus continues to be on music of the ballroom dance genre. Here, the goal is to apply the work of [4] in a more practical setting by defining an approximate paradigm on top of that model. The motivation of this study follows from the fact that temporal errors often occur, either in the playing of the rhythm by a drummer or percussionist, or in the extraction of the data, for example when using a beat tracking system on audio data [6,8]. The problem handled here can be seen as a μ-approximation version of the original problem, where μ is the degree of error allowed.

The algorithms presented can be used to automatically classify and retrieve songs within a digital library of ballroom dance tracks. A collection of MIDI tracks can be classified into predefined rhythms using efficient implementations of the algorithms. This is taken into consideration and included in the runtime analysis of the algorithms (see Sect. 3).

This article is organized as follows. In Sect. 2, the framework presented in [4] is briefly reviewed and extended to take errors into account. In Sect. 3, the CIRS algorithm is succinctly described, along with implementation-specific issues. Two algorithmic variants are presented, which solve the problem within the approximate paradigm. The experimental setting is presented in Sect. 4 along with the test results. Finally, the article is concluded in Sect. 5.

2 Preliminaries

In the first part of this section (Sect. 2.1), the notations and definitions used in [4] are briefly reviewed. Then, in Sect. 2.2, new definitions and notations necessary for the approximate model are presented.

2.1 Exact rhythm matching definitions

A musical sequence, e.g., a song, can be considered to consist of a series of onsets (or events) that correspond to music signals, such as drum beats, guitar picks, horn hits, etc. It is the inter-onset intervals (IOIs) between those events that characterize the song. Formally, a musical sequence $t$ is a string $t = t_1 t_2 \ldots t_n$, where $t_i \in \mathbb{N}^+$ for all $1 \le i \le n$. Here, each $t_i$, $1 \le i \le n$, represents the duration between consecutive musical events.

Consider for example a music signal having five musical events occurring at the 0th, 50th, 100th, 200th, and 240th ms. Then $t = [50, 50, 100, 40]$ is the musical sequence representing the above musical signal.
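The conversion from onset times to a musical sequence can be sketched as follows. This is only an illustration of the definition above; the function name `inter_onset_intervals` is hypothetical and does not come from the original implementation.

```python
def inter_onset_intervals(onsets_ms):
    """Return the musical sequence t (list of IOIs) for sorted onset times in ms."""
    # Each IOI is the difference between two consecutive onsets.
    return [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]


# The five events of the example above.
t = inter_onset_intervals([0, 50, 100, 200, 240])
print(t)  # [50, 50, 100, 40]
```

Five onsets give four IOIs, which matches the sequence stated in the example.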

In this particular setting, (ballroom dance) rhythms are assumed to consist of a number of IOIs. In particular, there are two types of IOIs in the rhythm of a song: quick (Q) and slow (S). Quick means that the duration between two (not necessarily successive) onsets is $q$ ms, while the slow IOI is equal to $2q$. For example, the dancing rhythms cha-cha, foxtrot, and jive are represented as shown in Table 1.

Formally, a rhythm $r$ is a string $r = r_1 r_2 \ldots r_m$, where $r_j \in \{Q, S\}$ for all $1 \le j \le m$. For example, $r = QSS$. Here, Q and S correspond to durations of activities (IOIs between the start of consecutive events), such that the length of the IOI represented by an S is double the length of that represented by Q. However, the exact length of Q or S is not known a priori.

Table 1 Ballroom dance rhythms

Name      Representation
Cha-cha   SSQQSSSQQS
Foxtrot   SSQQSSQQ
Jive      SSQQSQQS

Note that the alphabet of the musical sequence differs from that of the rhythm: the alphabet of the musical sequence is $\Sigma = \mathbb{N}^+$, whereas that of the rhythm is $\Sigma_r = \{Q, S\}$. The notions of match and cover in this framework extend those of classical string matching in the following way.

Definition 1 (IOI q-match) Let Q represent IOIs of size $q \in \mathbb{N}^+$ ms, and S represent IOIs of size $2q$. Then Q is said to q-match with the substring $t_{i..i'}$ of the musical sequence $t$ if and only if

$$q = t_i + t_{i+1} + \cdots + t_{i'},$$

where $1 \le i \le i' \le n$. If $i = i'$, i.e., Q codes a single IOI, then the match is said to be solid. Similarly, S is said to q-match with $t_{i..i'}$ if and only if either of the following is true:

– $i = i'$ and $t_i = 2q$, or
– $i \ne i'$ and there exists $i \le i_1 < i'$ such that $q = t_i + t_{i+1} + \cdots + t_{i_1} = t_{i_1+1} + t_{i_1+2} + \cdots + t_{i'}$.

As with Q, the match of S is said to be solid if $i = i'$.
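Definition 1 can be checked directly with two small predicates. The sketch below is illustrative only: the function names and the sample sequence are assumptions made for this example (the sequence of Fig. 1 is not reproduced here), and positions are 1-based and inclusive, as in the text.

```python
def q_matches_Q(t, i, i_end, q):
    """True if Q q-matches t[i..i_end]: the merged IOIs sum to exactly q."""
    return sum(t[i - 1:i_end]) == q


def q_matches_S(t, i, i_end, q):
    """True if S q-matches t[i..i_end] according to Definition 1."""
    if i == i_end:
        return t[i - 1] == 2 * q          # solid match: a single IOI of size 2q
    running = 0
    for split in range(i, i_end):         # split plays the role of i_1, i <= i_1 < i_end
        running += t[split - 1]
        # Both halves must sum to q, i.e. the S must divide into two Q's.
        if running == q and sum(t[split:i_end]) == q:
            return True
    return False


# Hypothetical sequence, chosen only to exercise both cases.
t = [50, 50, 100, 70, 60, 70]
print(q_matches_Q(t, 1, 2, 100))  # True: 50 + 50
print(q_matches_S(t, 1, 3, 100))  # True: (50 + 50) and (100)
print(q_matches_S(t, 4, 6, 100))  # False: sums to 200 but has no split into 100 + 100
```

The last call illustrates the point made for Fig. 1: a substring whose total equals $2q$ is not necessarily a match for S, because the two halves must each q-match a Q.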

Consider the musical sequence shown in Fig. 1. For $q = 150$, Q matches with $t_{2..3}$ and S matches with $t_{5..9}$. For $q = 100$, Q matches with $t_{1..2}$, $t_3$, etc., and S matches with $t_{6..8}$. However, for $q = 100$, S does not match with $t_{7..9}$ despite the fact that $\sum_{i=7}^{9} t_i = 2q$.

Definition 2 (Rhythm q-match) A rhythm $r = r_1 \ldots r_m$ is said to q-match with the substring $t_{i..i'}$ of the musical sequence $t$ if and only if there exist an integer $q \in \mathbb{N}^+$ and integers $i_1 < i_2 < \cdots < i_m < i_{m+1}$ such that

1. $i_1 = i$, $i_{m+1} = i' + 1$, and
2. $r_j$ q-matches $t_{i_j..i_{j+1}-1}$, for all $1 \le j \le m$.

Fig. 1 Q and S-matching in musical sequences


Fig. 2 q-matches of r = QSS in t , for q = 50

For instance, the rhythm $r = QSS$ q-matches with $t_{1..5}$ as well as with $t_{5..8}$ in Fig. 2, for $q = 50$.

One important fact is that reporting only the start (or end) position of a q-match of a rhythm may not convey the complete information. This can be easily understood from the difference in length of the portions of $t$ that q-match with $r$ in the above two instances. Therefore, both the start and end positions must be reported to denote the q-occurrences against the q-matches. The q-occurrence list for the above case is thus $Occ_q = \{(1, 5), (5, 8)\}$.

Definition 3 (Rhythm q-cover) A rhythm $r$ is said to q-cover the substring $t_{i..i'}$ of the musical sequence $t$ if and only if there exist integers $i_1, i'_1, i_2, i'_2, \ldots, i_k, i'_k$, for some $k \ge 1$, such that

1. $r$ q-matches $t_{i_\ell..i'_\ell}$, for all $1 \le \ell \le k$, and
2. $i'_{\ell-1} \ge i_\ell - 1$, for all $2 \le \ell \le k$.

In the example of Fig. 2, the rhythm $r = QSS$ q-covers $t_{1..8}$ for $q = 50$.
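Given a q-occurrence list, the q-covered substrings of Definition 3 can be obtained by chaining occurrences that overlap or are adjacent. The following is a minimal sketch under that reading of condition 2; the function name `maximal_covers` is hypothetical, and extending a span to the furthest end seen so far is a small generalization chosen for this example.

```python
def maximal_covers(occurrences):
    """Merge (start, end) occurrences into maximal covered spans (1-based, inclusive)."""
    covers = []
    for start, end in sorted(occurrences):
        if covers and covers[-1][1] >= start - 1:
            # Condition 2 of Definition 3: the previous span reaches at least
            # position start - 1, so the two occurrences chain together.
            covers[-1][1] = max(covers[-1][1], end)
        else:
            covers.append([start, end])
    return [tuple(span) for span in covers]


print(maximal_covers([(1, 5), (5, 8)]))   # [(1, 8)]
print(maximal_covers([(1, 3), (6, 8)]))   # [(1, 3), (6, 8)]
```

For the occurrence list $\{(1, 5), (5, 8)\}$ the sketch returns the single span $(1, 8)$, in agreement with the statement that $r = QSS$ q-covers $t_{1..8}$.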

2.2 Inexact rhythm matching definitions

An error $\mu$ is now introduced into this paradigm. Allowing for errors is significant because of

– human errors, e.g., a percussionist being slightly off in her drumming, intentionally or otherwise
– technological errors, e.g., when using a beat tracking system on audio data [6,8], or a music recognition or music transcription algorithm [5,10,12]
– other types of errors

The value of $\mu$ should be expected to be small. For the purposes of the algorithm, the upper bound placed on $\mu$ is nevertheless relatively large, and this is justified for a very simple reason. First a λ-match is defined and then the possible range of values for $\mu$ is discussed.

Definition 4 (IOI λ-match) Let Q represent IOIs of size $q \in \mathbb{N}^+$ ms, and S IOIs of size $\sigma$. Then, Q is said to λ-match with the substring $t_{i..i'}$ of the musical sequence $t$ if and only if

$$q = t_i + t_{i+1} + \cdots + t_{i'} + \lambda_q,$$

where $1 \le i \le i' \le n$ and $\lambda_q$ is an integer such that $|\lambda_q| \le \mu$. If $i = i'$, then the match is said to be solid. Similarly, S is said to λ-match with $t_{i..i'}$ if and only if

$$\sigma = t_i + t_{i+1} + \cdots + t_{i'} + \lambda_\sigma,$$

where $\lambda_\sigma$ is an integer such that $|\lambda_\sigma| \le 2\mu$. As with Q, the match of S is said to be solid if $i = i'$.

Here, λ represents the local error. The global error is defined later on in this section. In what follows, the term $\lambda_q$-match is used when referring to the matching of a Q, and $\lambda_\sigma$-match when matching an S. However, when it is clear from the context or there is no need to differentiate between the two types, the term “match” or “λ-match” is used. Next, the permissible values for $\mu$ are discussed.

Errors are allowed for both quick (Q) and slow (S) beats, so the allowable error margin must be unambiguous as to whether a λ-match is a $\lambda_q$-match or a $\lambda_\sigma$-match. In particular, possible values for a $q$ should not overlap with possible values for a $\sigma$, i.e., the upper bound of a $q$ should not be larger than or equal to the lower bound of a $\sigma$. The following inequality is solved, using $\sigma = 2q$:

$$q + \mu \le \sigma - 2\mu \;\Rightarrow\; q + \mu \le 2q - 2\mu \;\Rightarrow\; \mu \le q/3 \qquad (1)$$

Hence the maximum error $\mu$ is equal to $q/3$.

In the case that the maximum value for $\mu$ is used, valid values for $q$ are in $[2q/3..4q/3]$ and valid values for $\sigma$ are in $(4q/3 + 1..8q/3]$. Notice that “the advantage” is given to a $q$ for the “border” condition of $4q/3$. This is justified by the fact that, according to Definition 1, two $q$’s make a $\sigma$, but a $\sigma$ cannot be split into two $q$’s.
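The bound of inequality (1) and the resulting acceptance ranges can be computed as in the sketch below. The helper names `max_error` and `match_ranges` are hypothetical; exact fractions are used here only to avoid rounding when $q$ is not a multiple of 3.

```python
from fractions import Fraction


def max_error(q):
    """Largest mu permitted by inequality (1): mu <= q/3."""
    return Fraction(q, 3)


def match_ranges(q, mu):
    """Return the closed intervals of merged durations accepted as a Q and as an S."""
    if mu > max_error(q):
        raise ValueError("mu must not exceed q/3")
    return (q - mu, q + mu), (2 * q - 2 * mu, 2 * q + 2 * mu)


print(match_ranges(60, 10))  # ((50, 70), (100, 140))
print(match_ranges(60, 20))  # ((40, 80), (80, 160))
```

With $q = 60$ and the maximum $\mu = 20$, the two intervals meet at $80 = 4q/3$; as stated above, this border value is assigned to Q.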

For example, consider the musical sequence shown in Fig. 3. For $q = 60$ and $\mu = 10$, Q λ-matches with $t_{1..2}$, $t_{4..5}$, $t_{4..6}$, etc., and S λ-matches with $t_3$ (solid), with $t_{4..8}$ with $\lambda_\sigma = 20$, with $t_{4..9}$ with $\lambda_\sigma = 10$, etc.

A constant $c$ is defined to be the maximum number of consecutive elements in $t$ that can be merged to give a Q or S q-match. A valid sequence is one in which each element of the sequence does not have an error greater than $\mu$. The symbol $\kappa$ is defined to be the sum of all the local errors in a valid sequence and it is named the global error of this sequence. Finally, the best sequence $R_\sigma$ over $t$ for a given $\sigma$ is a valid sequence which minimizes the overall error, i.e., the global error $\kappa$.

Definition 5 (Rhythm (κ–λ)-match) A rhythm $r = r_1 \ldots r_m$ is said to (κ–λ)-match with the substring $t_{i..i'}$ of the musical sequence $t$ if and only if there exist an integer $q \in \mathbb{N}^+$ and integers $i_1 < i_2 < \cdots < i_m < i_{m+1}$ such that

1. $i_1 = i$, $i_{m+1} = i' + 1$, and
2. $r_j$ λ-matches $t_{i_j..i_{j+1}-1}$, for all $1 \le j \le m$.

Fig. 3 λ-matches in $t$, for $q = 60$ and $\mu = 10$

Fig. 4 q-matches of $r = QSQ$ in $t$, for $q = 50$ and $\mu = 5$

For instance, the rhythm $r = QSQ$ (κ–λ)-matches with $t_{1..4}$ as well as with $t_{5..7}$ and $t_{7..9}$ in Fig. 4, for $q = 50$ and $\mu = 5$. Similarly to the exact rhythm-matching case, both the start and end positions need to be reported to denote the μ-approximate q-occurrences against the (κ–λ)-matches. Thus, the μ-approximate q-occurrence list is $Occ^{\mu}_q = \{(1, 4), (5, 7), (7, 9)\}$.

Definition 6 (Rhythm (κ–λ)-cover) A rhythm $r$ is said to (κ–λ)-cover the substring $t_{i..i'}$ of the musical sequence $t$ if and only if there exist integers $i_1, i'_1, i_2, i'_2, \ldots, i_k, i'_k$, for some $k \ge 1$, such that:

1. $r$ (κ–λ)-matches $t_{i_\ell..i'_\ell}$, for all $1 \le \ell \le k$, and
2. $i'_{\ell-1} \ge i_\ell - 1$, for all $2 \le \ell \le k$.

In Fig. 4, the rhythm $r = QSQ$ q-covers $t_{1..9}$ for $q = 50$ and $\mu = 5$.

3 Maximal coverability algorithm

The maximal coverability problem is extended from [4] to conform to the approximate paradigm:

Problem 1 (Maximal coverability) Given a musical sequence $t = t_1 t_2 \ldots t_n$, $t_i \in \mathbb{N}^+$, a rhythm $r = r_1 r_2 \ldots r_m$, $r_j \in \{Q, S\}$, an integer $\mu \in \mathbb{N}^+$, and a constant $c$, find the longest substring $t_{i..i'}$ of $t$ that is μ-approximate q-covered by $r$ among all possible values of $q$.

Thus, the maximal coverability problem for exact matching is simply the above definition with $\mu = 0$ and $c = n$. The restriction applied to the above problem in CIRS is now:

For each match of $r$ with a substring $t_{i..i'}$, there must exist at least one S in $r$ whose match in $t_{i..i'}$ is solid; that is, there exists at least one $1 \le j \le m$ such that $r_j = S$ $\lambda_\sigma$-matches $t_k = \sigma + \lambda_\sigma$, where $|\lambda_\sigma| \le 2\mu$, for some value of $\sigma$.

As was argued in [4] to justify the above restriction, from a musical point of view, it is meaningful to have at least one event that is solid to avoid the situation where any rhythm would match a metronomic tone.

The CIRS algorithm works in the following main stages.

Stage 1 Find all occurrences of (solid) $S = \sigma$ in $t$ for each possible value of $\sigma$.
Stage 2 Transform the areas around all the S’s found in Stage 1 into sequences of Q’s and S’s. A sequence in this stage is identified by $\sigma = S$ as follows: a sequence is said to be a q-sequence if the solid S is assumed to be of value $2q$, i.e., $\sigma = 2q$.
Stage 3 Find the q-matches of $r$ in the corresponding q-sequences from Stage 2.
Stage 4 Find the maximal area q-covered by $r$ for all possible values of $q$ and then report a maximum one.

The above stages are discussed in the rest of this section, along with implementation details of the algorithm. Particular focus is placed on Stage 2, which is the only stage affected by the introduction of errors.

3.1 Stage 1—finding all occurrences of S

In this stage, all occurrences of $S = \sigma$ must be found for the chosen $\sigma$, so that in Stage 2, the areas around each of those occurrences may be transformed into sequences of Q’s and (possibly) S’s. The above must be repeated for every possible value of $\sigma$.

A single scan through the input string suffices to find all occurrences of $\sigma$. As the stage is repeated for every distinct $\sigma \in \Sigma$, overall the algorithm would need $O(|\Sigma| n)$ time during this stage alone.

It is possible, however, to speed this stage up by collectively computing occurrences of all the symbols and storing them in appropriate data structures. This can be done in $O(n \log |\Sigma|)$ time and $O(n + |\Sigma|)$ space in the following manner. Consider a balanced binary search tree first, of size $|\Sigma|$ and height $\log |\Sigma|$, and a vector next, of size $n$, such that

– $(\sigma, i)$ is an item in tree first, with key = $\sigma$ and data = $i$, if and only if the leftmost occurrence of the symbol $\sigma$ appears at position $i$ of $t_{1..n}$
– next[i] = j if and only if $t_i = t_j$ and for all $k$, $i < k < j$, $t_k \ne t_i$; if no such $j$ exists, then next[i] = 0

A single scan through $t$ suffices to compute first and next. Insertions into first require $O(\log |\Sigma|)$ time each, hence the total runtime of this stage is $O(n \log |\Sigma|)$.

The data structure first was implemented using the STL associative container map, which in turn is normally implemented as a balanced search tree. This ensures an $O(\log |\Sigma|)$ running time for lookups of particular $\sigma$’s and has the added advantage that it keeps the elements sorted, a fact which is useful in the next stage. In particular, this helps avoid using the complex range maxima query data structure and associated overhead.
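The construction of first and next in a single scan can be sketched as follows. This is not the authors' C++ code: it is a Python illustration in which a hash-based `dict` stands in for the balanced tree (so the keys are not kept sorted), and the names `build_first_next` and `occurrences` are hypothetical.

```python
def build_first_next(t):
    """Single scan over t (positions are 1-based).

    first[sigma] is the leftmost position of sigma in t;
    nxt[i] is the next position holding the same value as t_i, or 0 if none.
    """
    first = {}
    last_seen = {}
    nxt = [0] * (len(t) + 1)          # index 0 is unused
    for i, value in enumerate(t, start=1):
        if value in last_seen:
            nxt[last_seen[value]] = i  # link the previous occurrence to this one
        else:
            first[value] = i           # first time this value appears
        last_seen[value] = i
    return first, nxt


def occurrences(first, nxt, sigma):
    """Walk the chain first[sigma] -> nxt[...] and list all positions of sigma."""
    positions = []
    i = first.get(sigma, 0)
    while i > 0:
        positions.append(i)
        i = nxt[i]
    return positions


first, nxt = build_first_next([50, 50, 100, 40, 100])
print(occurrences(first, nxt, 100))  # [3, 5]
```

Following the chain from first visits each occurrence of a given $\sigma$ exactly once, which is what Stage 2 needs when it processes the areas around every solid S.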


Fig. 5 Transforming the area around t5 = S = 100 and then around t14 = S = 100

Note that the size of the alphabet $\Sigma$ is much smaller on average than the size of the piece of music, i.e., $|\Sigma| \ll n$. In the actual social dance music pieces used for experimental purposes, $|\Sigma| \approx 10$, while a 5-min song can have thousands of rhythmic events. Thus, $\log |\Sigma|$ may be taken as constant, $O(1)$, with respect to $n$.

3.2 Stage 2—transformation

The task of this stage is to transform $t$, which is a sequence of integers, into a number of sets $R_\sigma$ of sequences for all possible values of $\sigma$. Each sequence belonging to $R_\sigma$ is a best μ-approximate q-sequence over $\{Q, S\}$ for the chosen $q = \sigma/2$. The aim is to identify all the μ-approximate q-matches of $r$ in $t' \in R_\sigma$.

3.2.1 Exact matching transform

For each occurrence of the current symbol $\sigma = 2q = S$, an attempt is made to convert the area surrounding that S into sequences or a tile of Q’s. When it is not possible to continue making Q’s, a check is made whether S’s can be made instead. Note that a Q is attempted first, and only in case of a failure is an S attempted. Figure 5 shows an example of the transformation process.

It is easy to observe that in this way, an S is found only if it is solid because, by definition (see Definition 1), every non-solid S can be divided into two consecutive Q’s, and Q’s are attempted first. If neither a Q nor an S can be made, then the end of the sequence is marked. So each sequence $t' \in R_\sigma$ consists of one, or possibly more, solid S’s, surrounded by and separated from each other by zero or more Q’s.
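The "Q first, then S" rule can be illustrated for the rightward direction only. The sketch below is an assumption-laden simplification and not the CIRS transform itself: it ignores the leftward extension and any auxiliary data structures, uses 0-based indices, and the name `exact_transform_right` is hypothetical.

```python
def exact_transform_right(t, start, q):
    """Label t[start:] with Q's and solid S's for a fixed q, stopping at the first failure."""
    symbols = []
    pos = start
    while pos < len(t):
        # Try to make a Q first: merge IOIs until the running sum reaches q.
        total, j = 0, pos
        while j < len(t) and total < q:
            total += t[j]
            j += 1
        if total == q:
            symbols.append("Q")
            pos = j
        elif t[pos] == 2 * q:
            # Only when no Q can be made is a (necessarily solid) S attempted.
            symbols.append("S")
            pos += 1
        else:
            break                      # neither a Q nor an S: end of the sequence
    return "".join(symbols)


# Hypothetical data: a solid S = 100 at index 0, so q = 50; extend to its right.
print(exact_transform_right([100, 50, 50, 30, 20, 100, 70], 1, 50))  # QQQS
```

Because a Q is always attempted before an S, every S emitted by the sketch is a single IOI of size $2q$, which mirrors the observation that S is found only if it is solid.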

The running time of Stage 2 depends on the total length of all the sequences produced in this stage. The following lemma from [11] improved the bound on the number of sequences produced in this stage from [4].

Lemma 1 The total length of rhythms computed by the algorithm transform, for all possible values of $q$, is not greater than $5n$.

Nevertheless, the running time of Stage 2 remains the same and is given by the following lemma from [4].

Lemma 2 The running time of Stage 2 is $O(n \log H)$, where $H$ is the maximum value in $t$.

To ensure the theoretical running time of $O(n \log H)$, the CIRS algorithm uses the complex range maxima query data structure of [7,1]. However, in the implementation it was preferred to avoid the associated data structural overhead and complex coding. Thus, the following approach was taken.

As in Stage 1, the STL map container is used, which is a balanced search tree. The tree is of the form (key, data) = (⟨startPos, endPos⟩, sequence), e.g., (⟨1, 5⟩, “QQSQQ”), (⟨7, 9⟩, “QSQ”), etc. This ensures that the sequences are sorted by start position, though it adds $O(\log |R_\Sigma|)$ to the asymptotic run time. However, as it turns out from the experiments, the resulting increase in running time is nearly negligible.

3.2.2 Inexact matching transform

This subproblem is solved using two alternative approaches:

(a) Depth-first Search (DFS)
(b) Dynamic programming (DP)

The λ() subroutine is used by both algorithms for finding the local error. The local error λ simply gives the error from the exact $q$ or $\sigma$ of a value $r$, regardless of how many merges of elements of $t$ are performed to obtain this $r$:

$$\lambda(r) = \begin{cases} |r - q|, & \text{if } r \in [q - \mu..q + \mu], \text{ a } \lambda_q\text{-match} \\ |r - 2q|, & \text{if } r \in [2q - 2\mu..2q + 2\mu], \text{ a } \lambda_\sigma\text{-match} \\ +\infty, & \text{otherwise} \end{cases} \qquad (2)$$

In Algorithm 1, the call to subroutine λ() also returns the appropriate character (a Q or an S), from which the sequence is built.

Algorithm 1 Calculate λ

function λ(r, q, μ)
    local_error ← ∞
    beat_char ← “-”
    if r < q − μ then
        local_error ← ∞
    else if r ≤ q + μ then
        local_error ← |r − q|
        beat_char ← “Q”
    else if r < 2q − 2μ then
        local_error ← ∞
    else if r ≤ 2q + 2μ then
        local_error ← |r − 2q|
        beat_char ← “S”
    return local_error, beat_char
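Algorithm 1 translates almost line by line into executable form. The following Python rendering is a sketch for illustration (the paper's implementation is in C++); the function name `local_error` is chosen here, and the branch order reproduces the rule that the border value $4q/3$ goes to Q when $\mu = q/3$.

```python
import math


def local_error(r, q, mu):
    """Return (local error, beat character) for a merged duration r, as in Algorithm 1."""
    if q - mu <= r <= q + mu:
        return abs(r - q), "Q"            # a lambda_q-match
    if 2 * q - 2 * mu <= r <= 2 * q + 2 * mu:
        return abs(r - 2 * q), "S"        # a lambda_sigma-match
    return math.inf, "-"                  # r matches neither a Q nor an S


print(local_error(55, 60, 10))   # (5, 'Q')
print(local_error(130, 60, 10))  # (10, 'S')
print(local_error(85, 60, 10))   # (inf, '-')
print(local_error(80, 60, 20))   # (20, 'Q'): the border case is resolved in favour of Q
```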

In the algorithms presented next, only the best sequences are produced, i.e., not all possible sequences. Thus, appropriate tie-breaking rules are needed to deal with two different merges with the same local error λ.

1. In the case of a tie between two Q’s or two S’s, the one with fewer merges is chosen. This is justified by the simple observation that the ideal case is for solid Q’s and S’s, the second best case is for merges of only two IOIs, and so on.
2. In the case of a tie between a Q and an S, the Q is chosen, i.e., if there are two possible merges, one that gives a Q and one that gives an S, and they have the same local error λ, the Q is chosen. This is justified by the property that, according to the definitions, two Q’s make an S, but an S cannot be split into two Q’s.

Fig. 6 Inexact rhythm matching on musical sequence using the DFS approach

3.2.3 Depth-first search (DFS)

The DFS algorithm takes a greedy approach. For each value of $\sigma$ identified in Stage 1, a set $R_\sigma$ of sequences is produced. Starting from the position of $\sigma$, first go to the left as far as possible, producing valid Q’s and S’s, and then do the same to its right. Given constant $c$, an attempt is made to merge up to $c$ IOIs and choose the best fit, i.e., the merged IOIs that minimize the local error λ. If neither a Q nor an S can be produced within $c$ merges, this part of the algorithm terminates.

Note that $\mu$, by definition, cannot exceed $q/3$; thus, the algorithm reduces $\mu$ when necessary to adhere to this, and floating point arithmetic is used instead of integers.

The execution of the DFS algorithm is shown in the partial example of Fig. 6, for $c = 3$. Note that sequences which are shorter than the rhythm can be immediately discarded. Pseudo-code for the full DFS algorithm, as well as further details, can be found in [2].

The searches to the left and to the right together iterate over the whole of the piece of music $t$, requiring $O(c)$ time at each position, for a total running time of $O(nc)$ for DFS-Transform(). Recall that DFS-Transform() is called for each $\sigma$, i.e., for each value in $t$; therefore, the worst case running time of this stage using the DFS approach is $O(n^2 c)$.
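The greedy step can be sketched for the rightward direction as follows. This is not the pseudo-code of [2], which is not reproduced here; it is a minimal Python illustration under the description above, with hypothetical names (`greedy_to_right`, `local_error`), 0-based indices, and a tie-breaking order that prefers Q over S and then fewer merges.

```python
import math


def local_error(r, q, mu):
    """Local error and beat character of a merged duration r (Eq. 2)."""
    if q - mu <= r <= q + mu:
        return abs(r - q), "Q"
    if 2 * q - 2 * mu <= r <= 2 * q + 2 * mu:
        return abs(r - 2 * q), "S"
    return math.inf, "-"


def greedy_to_right(t, start, q, mu, c):
    """Greedily label t[start:], merging at most c IOIs per symbol."""
    symbols = []
    pos = start
    while pos < len(t):
        best = None                      # (error, is_S, merges, char)
        total = 0
        for merges in range(1, c + 1):
            if pos + merges > len(t):
                break
            total += t[pos + merges - 1]
            err, ch = local_error(total, q, mu)
            if err == math.inf:
                continue
            candidate = (err, ch == "S", merges, ch)
            # Tuple order: smallest error, then Q before S, then fewer merges.
            if best is None or candidate < best:
                best = candidate
        if best is None:
            break                        # no Q or S within c merges: stop
        symbols.append(best[3])
        pos += best[2]
    return "".join(symbols)


# Hypothetical data with a solid S = 12 at index 0 (q = 6), mu = 1, c = 3.
print(greedy_to_right([12, 6, 3, 3, 13, 5, 7, 20], 1, 6, 1, 3))  # QQSS
```

Each position inspects at most $c$ merges, so the sketch performs $O(nc)$ work per call, consistent with the bound stated for DFS-Transform().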

Fig. 7 DFS problem example

A problem can occur at boundaries in the DFS approach, as shown in Fig. 7. This occurs when reading from left to right for $\sigma = t_1 = 12$, $\mu = 1$ and $c = 3$, in this example. A few possible workarounds are available, including merging the left-to-right and right-to-left produced sequences into one, or backtracking when the algorithm terminates and certain conditions are met. The following section provides an elegant solution that overcomes this problem altogether.

3.2.4 Dynamic programming (DP)

The second approach (see Algorithm 2) solves the problem by employing dynamic programming. The recursion is given in relation to $\kappa(i)$. The recursion for the search to the right is:

$$\kappa(i) = \min \begin{cases} \kappa(i-1) + \lambda(t_i) \\ \kappa(i-2) + \lambda(t_i + t_{i-1}) \\ \quad \vdots \\ \kappa(i-c) + \lambda(t_i + t_{i-1} + \cdots + t_{i-c+1}) \end{cases} \qquad (3)$$

$\kappa(i)$ signifies the optimal solution, i.e., the lowest global error from the starting point up until and including position $i$ of $t$.

Algorithm 2 Dynamic Programming

function DP-Transform(t1..n, i, σ, μ, c)
    Rσ ← {}
    q ← σ/2
    if μ > q/3 then
        μ ← q/3
    i ← data in first at key = σ
    while i > 0 do
        x ← “S”
        j ← i
        iLeftPos ← j
        if j > 1 then
            x ← DP-TransformToLeft(t, j, q, μ, c)
        j ← i + 1
        iRightPos ← j
        if j < n − 1 then
            x ← DP-TransformToRight(t, j, q, μ, c)
        Rσ ← Rσ ∪ {x}
        i ← next[i]
    return Rσ

The algorithm terminates its directional (left, then right) search when $c$ consecutive $\kappa(i)$’s are $\infty$. After termination, the sequence is reproduced from the optimal solution back to the starting point, as is common in DP. This is achieved by retrieving the information stored in the three vectors: κ, index, and beat_char.

Algorithm 3 DP subroutine: to the right

function DP-TransformToRight(t1..n, i, q, μ, c)
    pos ← i, r ← 0
    κ[i..n] ← {∞, ∞, ..., ∞}
    κ[i] ← 0.0
    index[i..n] ← {∞, ∞, ..., ∞}
    beat_char[i..n] ← {‘-’, ‘-’, ..., ‘-’}
    for j ← i to n do
        for k ← 1 to c do
            if j − k ≤ i then
                r ← t_j − t_{j−k}
                (κ′, local_char) ← κ[j − k − 1] + λ(r, q, μ)
                if κ′ < κ[j − 1] then
                    κ[j − 1] ← κ′
                    index[j − 1] ← k
                    beat_char[j − 1] ← local_char
                    pos ← j − 1
        if κ[j..j − c] = ∞ then
            break    ▷ no more Q’s to the right
    for m ← pos downto i do
        m ← m − index[m]
        push beat_char to front of y
    push y to back of x
    return x
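Recursion (3) can also be written as a short, self-contained routine. The sketch below is a simplified Python reading of the recursion and of the backtracking step, not a transcription of Algorithm 3: it labels a whole list from its first element, keeps the number of merged IOIs as the back-pointer, and the names `dp_to_right` and `local_error` are hypothetical.

```python
import math


def local_error(r, q, mu):
    """Local error and beat character of a merged duration r (Eq. 2)."""
    if q - mu <= r <= q + mu:
        return abs(r - q), "Q"
    if 2 * q - 2 * mu <= r <= 2 * q + 2 * mu:
        return abs(r - 2 * q), "S"
    return math.inf, "-"


def dp_to_right(t, q, mu, c):
    """Return (sequence, global error) for the best labelling of the longest reachable prefix of t."""
    n = len(t)
    kappa = [math.inf] * (n + 1)   # kappa[i]: least global error for t[0:i]
    merged = [0] * (n + 1)         # number of IOIs merged into the last symbol
    char = ["-"] * (n + 1)
    kappa[0] = 0
    for i in range(1, n + 1):
        total = 0
        for k in range(1, min(c, i) + 1):
            total += t[i - k]                      # t_i + t_{i-1} + ... + t_{i-k+1}
            err, ch = local_error(total, q, mu)
            candidate = kappa[i - k] + err
            if candidate < kappa[i]:               # strict: ties keep fewer merges
                kappa[i], merged[i], char[i] = candidate, k, ch
    # Furthest position with a finite global error; nothing beyond c
    # consecutive infinities can be reached.
    end = max(i for i in range(n + 1) if kappa[i] < math.inf)
    symbols = []
    i = end
    while i > 0:                                   # backtrack to the starting point
        symbols.append(char[i])
        i -= merged[i]
    return "".join(reversed(symbols)), kappa[end]


print(dp_to_right([7, 4, 12], 6, 1, 3))  # ('SS', 1)
```

On the small input above, a greedy left-to-right labelling would accept $t_1 = 7$ as a Q and then fail on the following 4, whereas the recursion merges $7 + 4 = 11$ into an S and continues, which is the kind of boundary problem the DP approach is meant to avoid.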

The recursion is applied to the same example as was used for the DFS approach, and κ, index, and beat_char are shown in Table 2. Note that in this simple example, only an application of DP-TransformToRight (Algorithm 3) is required and not DP-TransformToLeft (not shown here for space considerations, but detailed in [2]). From these vectors, the sequence “SQQQQSQSS” is produced as indicated by the asterisks.

It should be clear that finding each entry in κ, index, and beat_char takes $O(c)$ time and there are $n$ such entries, which gives the same running time for DP-Transform as for DFS-Transform, i.e., $O(cn)$. Again, DP-Transform is called for each element, thus giving a total runtime of $O(n^2 c)$.

The problematic example presented in the DFS section is solved using DP and the results are shown in Table 3. The resultant sequence, i.e., “SSS”, is the correct one.

The two different approaches have the same theoretical running times, but the DP’s hidden constant is larger than that of the DFS’s. So, in practice, the implementation of the DFS algorithm is faster. It is believed that the boundary problems discussed above are rarely encountered in real pieces of music; the authors intend to show this in experimental results in future studies.

Table 2 The DP tables computed by DP-TransformToRight() for σ = 12

position    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
κ           0  ∞  2  1  2  3  3  4  5  4  5  7  4  8  4  6  8  5  5
index       –  1  2  3  2  2  2  2  2  2  1  1  3  2  2  1  1  3  1
beat_char   S  –  Q  Q  Q  Q  Q  Q  Q  Q  Q  Q  S  Q  Q  Q  Q  S  S
Sequence    ∗        ∗     ∗     ∗     ∗        ∗     ∗        ∗  ∗

Table 3 The problematic example, correctly solved by DP

position    0  1  2  3  4  5  6  7  8
κ           0  0  2  0  2  0  1  ∞  1
index       –  1  1  2  2  3  5  –  6
beat_char   –  S  S  Q  S  S  Q  –  S
Sequence ∗ ∗ ∗

3.3 Stage 3—find the matchings

In this stage, each $t' \in R_\sigma$ is considered, for all valid values of $\sigma$, and all the q-matches of $r$ in $t'$ are identified. To do that efficiently, a bit-masking technique is exploited, as described below. First, some notation used for convenience is defined.

$S_{t'}$ and $S_r$ are defined to indicate an S in $t'$ and $r$, respectively. $Q_{t'}$ and $Q_r$ are defined analogously. First, some preprocessing is performed as follows: $t''$ is constructed from $t'$, where each $S_{t'}$ is replaced by 01 and each $Q_{t'}$ is replaced by 1. Note that the corresponding positions of $t'$ in $t''$ are tracked. Second, the invalid set $I$ for $t''$ is constructed, where $I$ includes the position of the “1” of each $S_{t'}$ in $t''$. For example, if $t' = QQSQS$ then $t'' = 1101101$ and $I = \{4, 7\}$. It is easy to see that no occurrence of $r$ can start at $i \in I$. Finally, to complete the preprocessing, $r'$ is constructed from $r$, where each $S_r$ is replaced by 10 and each $Q_r$ is replaced by 0.

After the preprocessing is done, at each position $i \notin I$ of $t''$ a bitwise OR operation is performed between $t''_{i..i+|r'|-1}$ and $r'$. If the result of the operation is all 1’s, i.e., $1^{|r'|}$, then a match has been found at position $i$ of $t''$. However, it must be ensured that there is a solid S in the match. To achieve this, a bitwise XOR operation is performed between $t''_{i..i+|r'|-1}$ and $1^{|r'|}$, and only if this returns a nonzero value is the OR operation stated above performed.
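The encoding and the OR/XOR test can be tried out on small strings. The sketch below is an illustration in Python using plain integers as bit masks rather than the C++ bitset of the actual implementation; the helper names (`encode_text`, `encode_rhythm`, `find_matches`) are hypothetical, and bit positions in $t''$ are 1-based as in the text.

```python
def encode_text(t_prime):
    """Encode t' as t'': S -> '01', Q -> '1'. Also return the invalid set I."""
    bits = ""
    invalid = set()
    for ch in t_prime:
        if ch == "S":
            bits += "01"
            invalid.add(len(bits))     # position of the '1' belonging to this S
        else:
            bits += "1"
    return bits, invalid


def encode_rhythm(r):
    """Encode r as r': S -> '10', Q -> '0'."""
    return "".join("10" if ch == "S" else "0" for ch in r)


def find_matches(t_prime, r):
    """Return the start positions in t'' at which r matches with at least one solid S."""
    bits, invalid = encode_text(t_prime)
    r_bits = encode_rhythm(r)
    width = len(r_bits)
    r_mask = int(r_bits, 2)
    all_ones = (1 << width) - 1
    starts = []
    for i in range(1, len(bits) - width + 2):
        if i in invalid:
            continue
        window = int(bits[i - 1:i - 1 + width], 2)
        if window ^ all_ones == 0:
            continue                   # window is all 1's: it contains no solid S
        if window | r_mask == all_ones:
            starts.append(i)
    return starts


print(encode_text("QQSQS"))            # ('1101101', {4, 7})
print(find_matches("QQSQS", "QSQ"))    # [2]
```

For $t' = QQSQS$ and the hypothetical pattern $r = QSQ$, the only match starts at position 2 of $t''$, i.e., at the second Q of $t'$, and it contains the solid S at positions 3 and 4.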

In the implementation of the bitwise operations, the C++ bitset data structure was used. This ensures minimal storage (each bit is stored as one bit) and speed, as all operations used in the algorithm are constant time, $O(1)$ (AND, OR, and left and right bit shifts).

The total runtime of this stage, as deduced in [4] and implemented in [3], is $O(|t''| \times |r'| / w) = O(n \log H \cdot m / w)$, where $w$ is the word size of the target machine. This follows from the fact that the above procedure is done for every sequence produced in Stage 2. This was improved on in [11] and reduced to $O(n \log m)$ as a direct result of Lemma 1.

The version of the algorithm that was implemented for this article, though, is the same as in [4] and [3].

3.4 Stage 4—find the cover

It can be assumed from the above that the input to Stage 4 is a collection of q-occurrence lists, corresponding to the q-matches of $r$ in $t$. Assume that $O = \{Occ_\sigma\}$, where $Occ_\sigma$ is the set of occurrences corresponding to q-matches with $q = \sigma/2$. Recall that the occurrences are in sorted order. For each $Occ_\sigma \in O$, the corresponding q-covers need to be found. This can easily be done by checking, respectively, the end and start positions of consecutive occurrences. Furthermore, a global variable is maintained to keep track of the longest cover so far. Thus, the running time of this stage is $O(n \log H)$. In [11], Kubica and Walen improved the runtime for this stage and reduced it to $O(n)$.
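The scan over consecutive occurrences can be illustrated as follows. This is a minimal Python sketch under stated assumptions: each occurrence is given as an inclusive (start, end) pair, the list is sorted by start position, and an occurrence is taken to extend the current cover when it starts no later than one position past the end of that cover. The function name `longest_cover` is hypothetical, and a local variable plays the role of the global one that tracks the longest cover.

```python
def longest_cover(occurrences):
    """Return (start, end) of the longest run of chained occurrences.

    occurrences: list of inclusive (start, end) pairs sorted by start.
    Assumption: an occurrence extends the current run if it overlaps it
    or begins immediately after it.
    """
    best = None
    run_start = run_end = None
    for start, end in occurrences:
        if run_end is not None and start <= run_end + 1:
            run_end = max(run_end, end)      # extend the current cover
        else:
            run_start, run_end = start, end  # gap found: start a new cover
        if best is None or run_end - run_start > best[1] - best[0]:
            best = (run_start, run_end)
    return best
```

Each list is traversed once, so the cost is linear in the number of occurrences in that list. Running the function on every $Occ_\sigma$ and keeping the overall longest result mirrors the procedure described above.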

Stages 3 and 4 are the same for the exact and inexact cases in the CIRS version of the algorithms. Thus, the total running time of the inexact rhythm matching algorithms is O(n2c), because of Stage 2. The runtime in the exact case, as was shown in [4], is $O(n \log H)$, reduced to $O(n \cdot (\log H + \log m))$ in [11].

For a digital library of ballroom music with $k$ tracks of length $n$, a classification on the 9 rhythms of Table 4, and an additional one for "unclassifiable", could be done in the exact case in $O(kn \cdot (\log H + \log m))$ time. In the inexact case, which with regard to digital libraries is considered more realistic, the classification can be done in O(kn2c) time.

4 Experimental results

CIRS and the naive brute force algorithm (Algorithm 4) were implemented and tested on pieces of music, converted from musical instrument digital interface (MIDI) files to plain text files. The algorithms were compared for speed of execution and not for accuracy of results.

Table 4 Ballroom dance rhythms and their representations, on which the algorithms were tested

Name          Representation
Bolero        SQQSQQ
Cha-cha       SSQQSSSQQS
Foxtrot       SSQQSSQQ
Jive          SSQQSQQS
Mambo/salsa   QQSQQS
Quickstep     SQQSSQQS
Rumba         SQSSQ
Tango         SSQQS
Waltz         SSS

Algorithm 4 Brute Force Algorithm

function Naive Search(r, t)
    pos ← 1, m ← |r|, n ← |t|
    while more elements in piece of music t do
        for v′ ← 1 to c do
            i ← pos + v′
            if i < n − m then
                set start value for q
                j ← 1
                while more chars in rhythm do
                    for v ← 1 to c do
                        if t_{i+v} − t_i q-matches r_j then
                            increment j                ▷ next char in rhythm r
                        else if v < c then
                            next v                     ▷ same char in r, next diff in t
                        else if v = c then
                            set new start value for the beat
                if j = m then                          ▷ report occurrence
                    report occurrence of r in t at pos with start value q
        pos ← pos + 1                                  ▷ move to next element in music t

In Algorithm 4, a naive search for rhythm r in musical sequence t is performed. Constant c is the number of consecutive elements in t that may be merged to form a Q. Although this is not necessarily a requirement of the brute force algorithm, it was introduced so as to keep the search time reasonable and the number of merges pragmatic.

At every position pos of t, the algorithm attempts to match the rhythm r by setting q initially to $t_{pos+1} - t_{pos}$, then to $t_{pos+2} - t_{pos}$, and so on up to $t_{pos+c} - t_{pos}$. If all the chars in r match for the given starting value of q while merging at most c consecutive elements in t, then an occurrence is reported.
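The search loop can be made concrete with a simplified Python sketch. It is written in the spirit of Algorithm 4 but is not the implementation that was tested: it assumes that t is a strictly increasing list of integer onset times, that a Q must last exactly q and an S exactly 2q (no temporal tolerance), and that at most c consecutive intervals may be merged for either symbol. The name `naive_search`, the default `c=2`, and the 0-based positions are choices made for this illustration.

```python
def naive_search(r, t, c=2):
    """Report (pos, q) pairs where rhythm r starts at onset index pos.

    r: string over {'Q', 'S'}
    t: strictly increasing integer onset times
    c: maximum number of consecutive intervals merged per symbol
    Assumption: durations must match exactly (Q = q, S = 2q).
    """
    n = len(t)
    found = []
    for pos in range(n - 1):
        for v0 in range(1, c + 1):        # candidate start values for q
            if pos + v0 >= n:
                break
            q = t[pos + v0] - t[pos]
            i = pos                       # onset index where the next symbol starts
            matched = True
            for symbol in r:
                need = q if symbol == "Q" else 2 * q
                step = None
                for v in range(1, c + 1): # try merging 1..c intervals
                    if i + v < n and t[i + v] - t[i] == need:
                        step = v
                        break
                if step is None:
                    matched = False
                    break
                i += step
            if matched:
                found.append((pos, q))
    return found
```

With onsets `[0, 1, 2, 4, 5, 6, 8]` (intervals 1, 1, 2, 1, 1, 2) and rhythm `"QQS"`, the sketch reports `[(0, 1), (3, 1)]`. With onsets `[0, 2, 3, 6]` and rhythm `"QS"`, it reports `[(0, 2)]`, where the S is formed by merging the intervals 1 and 3.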

4.1 The data

File formats such as mp3 and wav use audio signals, whereas in MIDI every musical event is described by simple attributes such as pitch, volume and duration. The implemented algorithms take as input a musical piece t, which is by definition expressed as an array of the timings of rhythmic events. MIDI is thus a suitable format for the purposes of the implementations.

Music recognition is a difficult task, though music transcription algorithms [5,10–12], techniques, and software that convert audio to MIDI are available. Thus, in a preprocessing step, audio music files could be converted to MIDI files.

The determination of what constitutes the rhythm of a piece of music is inherently ambiguous [9]. Thus, choosing the appropriate tracks from the MIDI files as representing the rhythm of the musical piece is also a difficult task. For reasons of simplicity, only one track per song was chosen to test the algorithms on, which in some cases was the drum track and in others the bass track.

Fig. 8 Performance of CIRS and the brute force algorithm for rhythm bolero (SQQSQQ), on pieces of music of length n

4.2 The tests

The tests were run in a Windows environment with a 1.60 GHz Intel Celeron M processor and 512 MB RAM. The implementation was done in C++ using the STL.

The tests were carried out on 9 popular rhythms that can be categorized as belonging to the ballroom dance variety, listed in Table 4. The rhythms were run against 26 unique pieces of ballroom dance music, 17 of which were doubled and quadrupled in length for test purposes. The aim of the experiments was to measure the execution speed of the algorithms and establish whether the CIRS algorithm was indeed faster than a brute force implementation. Note that 1,000 rhythmic events in the songs used correspond approximately to 2 min, and that ballroom songs are usually 3 to 5 min long. The 17 arbitrarily chosen songs were lengthened by concatenating the whole musical sequence, in the form of IOIs, to the end of each song. It is believed that this does not diminish the value of the following observations.
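One way to carry out such a lengthening is sketched below in Python. This is an assumption about the procedure rather than a description of the tooling actually used: onset times are converted to inter-onset intervals (IOIs), the IOI sequence is repeated, and onset times are rebuilt by cumulative summation. The helper names `onsets_to_iois`, `iois_to_onsets` and `lengthen` are hypothetical.

```python
from itertools import accumulate


def onsets_to_iois(onsets):
    """Differences between consecutive onset times."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def iois_to_onsets(iois, start=0):
    """Rebuild onset times from IOIs, beginning at `start`."""
    return list(accumulate(iois, initial=start))


def lengthen(onsets, copies=2):
    """Repeat the IOI sequence `copies` times and return the new onsets."""
    if not onsets:
        return []
    iois = onsets_to_iois(onsets)
    return iois_to_onsets(iois * copies, start=onsets[0])
```

For example, `lengthen([0, 1, 3], copies=2)` gives `[0, 1, 3, 4, 6]`, so the rhythmic content of the piece is repeated without introducing a gap or a duplicate onset at the join.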

The actual runtime results of the implemented algorithms on the rhythm bolero (SQQSQQ) are plotted in Fig. 8. The graph plots the search for bolero on various pieces of music. In the figure, the pieces of music are represented on the X axis, where n = 1,000 means that there were 1,000 rhythmic events in that song, which corresponds approximately to a 2-min song. The Y axis shows the running time in milliseconds. The following observations are made.

1. The behavior of both CIRS and the brute force implementation is linear. This observation is backed by the theoretical running time, along with the fact that in practical cases the size of the dance rhythms is quite small.

2. For very small values of n, the brute force algorithm outperforms CIRS. This behavior is expected and easy to understand. The brute force algorithm in effect compares the rhythm against the music sequence at every position, with no other significant overheads, whereas CIRS sets up its data structures and runs four stages regardless of the size and nature of the music sequence and rhythms. To better understand this situation, experiments were also performed on short and long made-up rhythms, and an attempt was made to pinpoint the value of n up to which the brute force algorithm outperforms CIRS. It was found that for larger rhythms this value is around 300, whereas for shorter rhythms it is near 500.

3. A further observation concerns the near-vertical lines on the graphs at certain points. These result from the tests being run on various pieces of music of similar length: although the rhythm, and thus the length m of the string, remains the same, the nature of the pieces of music is such that the algorithms' speed varies according to the number of direct comparisons and occurrences of the rhythm in the musical sequence. This, combined with the doubling and quadrupling of certain songs, produces the effects seen in Fig. 8 at around n = 850, 1700, and 3400.

4. Finally, the tests reveal that although the CIRS algorithm is elegant from a mathematical point of view, its implementation is not justified because of its complexity. It could, however, have a part to play in a more sophisticated genre classification system, for other types of music and rhythms of longer length. Furthermore, it is interesting from an algorithmic point of view, in particular in the field of string pattern matching.

5 Conclusions and further studies

The task of identifying the rhythmic structure of a piece of music and categorizing it according to a predefined set of rhythms is, within the realm of music information retrieval, a non-trivial task.

In this article, the problem of automated classification of songs according to rhythms was considered from a practical point of view. The CIRS algorithm [4] for identifying musical sequences according to rhythms was implemented, along with a naive brute force algorithm that solves the same problem. The theoretical time complexity bounds were analyzed alongside the actual running times achieved in the experiments, and the results of the two algorithms were compared.

One of the inherent difficulties in extracting rhythmic structure from music is the presence of errors, intentional or otherwise. Two efficient algorithms were presented that can be used to categorize songs of the ballroom dance genre, taking temporal errors in the transcription into account.

An assumption made throughout this article is that the rhythm remains constant throughout the duration of the piece of music. This could be relaxed to allow rhythm changes in songs. The resulting algorithm could have applications across a wider spectrum of musical forms. Another interesting research direction could be to investigate the case where the assumption of S being double the duration of Q is relaxed.

From an experimental viewpoint, it would be interesting to measure the accuracy of the implemented algorithms in truly identifying the rhythms of selected pieces of music. In addition, an implementation of the asymptotically faster algorithm presented in [11] could be compared against CIRS for speed on a common test base.

Acknowledgments The authors would like to thank the anonymous reviewers for the time and effort spent on improving the contents of the article.

References

1. Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Panario, D., Viola, A. (eds.) Latin American Theoretical Informatics (LATIN), Lecture Notes in Computer Science, vol. 1776, pp. 88–94. Springer, Berlin (2000)

2. Chan, J.W.T., Iliopoulos, C.S., Michalakopoulos, S., Rahman, M.S.: Erratic dancing. In: Jensen, K. (ed.) Proceedings of the 5th International Symposium on Computer Music Modeling and Retrieval (CMMR 2008), Re:New—Digital Arts Forum, pp. 117–129, Copenhagen, Denmark (2008)

3. Chen, A.L.P., Iliopoulos, C.S., Michalakopoulos, S., Rahman, M.S.: Implementation of algorithms to classify musical texts according to rhythms. In: Spyridis, C., Georgaki, A., Kouroupetroglou, G., Anagnostopoulou, C. (eds.) Proceedings of the 4th Sound and Music Computing Conference, National and Kapodistrian University of Athens, pp. 134–141, Lefkada, Greece (2007)

4. Christodoulakis, M., Iliopoulos, C.S., Rahman, M.S., Smyth, W.F.: Identifying rhythms in musical texts. Int. J. Foundat. Comput. Sci. 19(1), 37–51 (2008)

5. Collins, N.: Beat induction and rhythm analysis for live audio processing: 1st year Ph.D. report (2004)

6. Dixon, S.: Automatic extraction of tempo and beat from expressive performances. J. New Music Res. 30(1), 39–58 (2001)

7. Gabow, H., Bentley, J., Tarjan, R.: Scaling and related techniques for geometry problems. In: Symposium on the Theory of Computing (STOC), pp. 135–143. ACM Press, New York (1984) (Chairman-Richard DeMillo)

8. Goto, M., Muraoka, Y.: A real-time beat tracking system for audio signals. In: Proceedings of the International Computer Music Conference, pp. 171–174. International Computer Music Association (1995)

9. Gouyon, F., Dixon, S.: A review of automatic rhythm description systems. Comput. Music J. 29(1), 34–54 (2005)

10. Hainsworth, S., Macleod, M.: Automatic bass line transcription from polyphonic music. In: Proceedings of the International Computer Music Conference (2001)

11. Kubica, M., Walen, T.: Improved algorithm for rhythm recognition in musical sequences. In: Chan, J., Daykin, J.W., Rahman, M.S. (eds.) London Algorithmics 2008: Theory and Practice, Texts in Algorithmics, vol. 11. College Publications (2009)

12. Orife, I.F.O.: Riddim: A rhythm analysis and decomposition tool based on independent subspace analysis. Master's thesis, Dartmouth College (2001)
