Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Hindawi Publishing CorporationThe Scientific World JournalVolume 2013 Article ID 752464 11 pageshttpdxdoiorg1011552013752464
Research ArticleMusic Identification System Using MPEG-7 AudioSignature Descriptors
Shingchern D You1 Wei-Hwa Chen2 and Woei-Kae Chen1
1 Department of Computer Science and Information Engineering National Taipei University of Technology Taipei 104 Taiwan2Hon-Hai Precision Industry Co Ltd Tucheng District New Taipei City 236 Taiwan
Correspondence should be addressed to Shingchern You youcsientutedutw
Received 3 January 2013 Accepted 24 January 2013
Academic Editors P Melin J Pavon and K Polat
Copyright copy 2013 Shingchern D You et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited
This paper describes a multiresolution system based on MPEG-7 audio signature descriptors for music identification Such anidentification system may be used to detect illegally copied music circulated over the Internet In the proposed system low-resolution descriptors are used to search likely candidates and then full-resolution descriptors are used to identify the unknown(query) audio With this arrangement the proposed system achieves both high speed and high accuracy To deal with the problemthat a piece of query audio may not be inside the systemrsquos database we suggest two different methods to find the decision thresholdSimulation results show that the proposed method II can achieve an accuracy of 994 for query inputs both inside and outsidethe database Overall it is highly possible to use the proposed system for copyright control
1 Introduction
With the SOPA (stop online piracy act) bill [1] proposed in2011 the protection of copyrighted intellectual property suchas digital content once again brought to public attentionDespite the controversial issues of the SOPA bill it iscommonly agreed that copyrighted digital content should beprotected However the first step toward the protection ofcopyrighted content is to identify whether a piece of digitalcontent is copyrighted and if so who owns it In this regardit is important to identify (detect) whether a digital work iscopyrighted or not
Among digital content soundtracks (usually in the formof audio files) are one type of content that is easily to be ille-gally reproduced Owing to the advanced techniques in audiocompression music soundtracks are usually distributed overthe Internet in compressed form rather than in uncompressedform Therefore any approach for copyright detection mustbe able to deal with both compressed and uncompressedaudio files
A typical method to attach the copyright information toa piece of music is by embedding watermarks [2] Thougheffective this method has some limitations such as thewatermarks must be embedded into the source soundtracks
before release Therefore it is not possible to identify therights owner of a piece ofmusicwithoutwatermarks Anotherconcern is that the embedding process usually introducesdistortion Thus the quality of the embedded audio may bedegraded
In addition to the watermarking technique it is alsopossible to identify rights owner by comparison For exampleif an unknown soundtrack is very similar to a soundtrackowned by a company then the unknown soundtrack is highlylikely copyrighted in that companyrsquos name This type ofapproach is especially suitable for audio files because in prac-tice tremendous amount of records currently available do notembed watermarks or any kind of copyright information
When comparing a piece of music with a music databasethe comparison may be accomplished based on the melody(ie musical notes) of the music [3] For this type ofcomparison however if two persons sing the same songthese two works will be recognized as the same one Sincedifferent artists may perform the same song known as thecover version a comparison based on melody cannot solvethis problem
Another type of comparison is based on the waveform ofthe music This technique is also known as music identifica-tion In this case the same song performed by different artists
2 The Scientific World Journal
generally does not have the same waveforms and thereforethey can be correctly identifiedThough conceptually simpleit is not plausible to directly compare PCM samples of twopieces of music because it would take too much time forthe comparison For example a typical compact disc (CD)has about 600M bytes of PCM samples to store about tensongs If a database contains 10000 different songs then thePCM samples occupy about 600G bytes of space A piece ofunknownmusic with duration of ten seconds has about 880 kbytes of data It is obvious that it requires a huge amountof computation to sequentially compare the 880 k bytes ofdata with the 600G bytes of data in the database Thereforedimension-reduced representations of the PCM waveformsknown as fingerprints are used in comparison Amongthe fingerprints most of them are defined by individualcompanies or groups Some of them are briefly explained inthe following
Researchers in Google develop a fingerprinting schemefor audio calledWaveprint [4] based onwaveletsWith the aidof wavelets the fingerprint is invariant to timescale changeIn other words whether the audio piece is played faster orslower than the normal speed the fingerprint is unchangedThe fingerprint of a piece of 4-minute music is around 64 kbytes equivalently 2133 bits per second
Shazam [5] is a company (and service) dedicated formusic identification Its database contains around elevenmillion soundtracks As described in [6] the fingerprintsused are sets of triplets based on spectrogram peaks Forexample if (119905
1 1198911) and (119905
2 1198912) are two peaks at time 119905
1and
1199052and frequency 119891
1and 119891
2 then the triplet ((119905
2minus 1199051) 1198911
(1198912minus 1198911)) is a feature Based on the realization of [7] the
fingerprint in this scheme uses 400 bits per secondResearchers in Philips also propose a fingerprinting
scheme [8] The computation of the fingerprints includesframing windowing (von Hann window) FFT (fast fouriertransform) band decision energy computation and thenquantization (into binary) In the typical setting one secondof audio has around 2730 bits of fingerprint
Microsoftrsquos Robust Audio Recognition Engine (RARE)[9] divides the incoming audio into overlapping frames Eachframe is converted to spectral domain by MCLT (modulatedcomplex lapped transform)The spectral values are applied totwo layers of OPCA (oriented principle component analysis)to reduce the dimensionality of the spectral data For thismethod 344 features (11008 bits if one feature is stored in4 bytes in a floating point) are obtained per second
In addition to the above methods there are actuallymany more different types of audio fingerprinting schemesavailable such as Music Brainz [10] Audible Magic [11] andGracenotersquos MusicID [12] According to [13] there are morethan ten different audio fingerprinting schemes available
Since there are vast amount of different fingerprintingschemes available some researchers then conducted exper-iments to compare the relative performance among someof them The results show that if the schemes use thesame number of bits to represent fingerprints they havecomparable performance [14] Therefore the selection of thefingerprinting schemes should also consider other factors
(such as interoperability to be addressed below) rather thanmerely the minor performance difference
With the ever-increasing amount of multimedia contentover the Internet and in the multimedia databases it is animportant task to exchange multimedia content To respondthe public demands ISOrsquos (International StandardizationOrganization) working group developed MPEG-7 standard[15 16] In the audio part of the standard [17] a high-level tool is developed for audio identification called audiosignature description scheme The fingerprints used in thescheme are called audio signature descriptors and theyhave good identification accuracy [18 19] In the followingwe interchangeably use descriptors and fingerprints withoutdistinction
Although proprietary audio fingerprints have excellentidentification performance the MPEG-7 audio descriptorsoffer some advantages First being an international standardensures the open and fair use (subject to license fee) of thetechnology Second such an international standard makesthe interoperability possible For example if a mobile phoneinstalls an application program to convert a piece of recordedaudio to MPEG-7 descriptors the descriptors can be sent toany website accepting the descriptors On the other handit is not possible to send proprietary fingerprints used inone company to database systems owned by competitorsThird different companies may share or exchange theiraudio descriptors (fingerprints) in their databases withoutany difficulties In the current situation each company has tocompute fingerprints for newly released albumsWith the useof MPEG-7 descriptors the redundant efforts of computingfingerprints can be minimized
Although a music identification system based on audiofingerprints has several applications [8] we concentrate onthe issue of detecting if a piece of circulated music is highlysimilar to a copyrighted work or not In a typical case thesimilarity is measured by a distance metric If the distanceis shorter than a threshold the two pieces of music areconsidered as similar Although it is not trivial to determine asuitable threshold [20] this problem is not fully studied Forexample [20] does not indicate any approach to determinethe threshold In addition the audio files to be comparedmay be very large therefore it is very important to reducethe comparison time while maintaining high identificationaccuracy Since not many papers address these two issuesbased on MPEG-7 descriptors we report in this paper ourapproaches and experimental results
This paper is organized as follows Section 2 is anoverview of the MPEG-7 audio signature descriptorsSection 3 is the system model for music identificationSection 4 describes the dimensionality reduction methodused in the paper Section 5 is the proposed strategy todetermine the threshold Section 6 covers the experimentsand results Section 7 is the conclusion
2 Overview of MPEG-7 AudioSignature Descriptors
Part 4 [17] of the MPEG-7 standard includes low-leveland high-level descriptors for various applications Low-level descriptors are derived from the temporal and spec-tral characteristics of the waveform whereas the high-level
The Scientific World Journal 3
descriptors are constructed based on low-level descriptors Inthis section we will briefly describe the descriptors related tomusic identification
21 Low-Level and High-Level Descriptors of the MPEG-7Audio There are 17 low-level audio descriptors defined inthe standard All of these low-level descriptors are derivedfrom the waveform of the music They can be dividedinto six different categories basic basic spectral signalparameter temporal timbral spectral timbral and spectralbasis representationThese low-level features may be directlyused or may serve as the basics for constructing high-leveldescriptors
Based on low-level descriptors MPEG-7 audio standardalso defines high-level description schemes for various appli-cations These include audio signature description instru-ment timbre description general sound recognition andindexing description and spoken content description Theaudio signature descriptors one type of fingerprints are usedfor music identification
22 MPEG-7 Audio Signature Descriptors Although low-level descriptors in MPEG-7 audio may also be used toidentifymusic it is shown that the audio signature descriptorsprovide better identification performance [18 19] Thereforethe audio signature descriptors are adopted in the proposedsystem
TheMPEG-7 audio signature descriptors are computed asfollows
(1) Time-to-frequency conversion this step is based onaudio spectrum envelope descriptors including thefollowing substeps
(i) Determine the hop length between two con-secutive windows in the unit of samples Thedefault value is equivalent to 30ms
(ii) Define the window length 119897119908 This value is set to
three times the hop length The chosen windowis Hamming window
(iii) Determine the length of FFT denoted as 119873FFTTo reduce the computational complexity 119873FFTis the smallest power-of-2 number equal to orgreater than 119897
119908 For example if the sample rate of
the audio is 44100 ss then 119897119908= 44100sdot003sdot3 =
3969Therefore119873FFT =4096The extra samplesafter 119897
119908are padded with zeros
(iv) Perform the FFT on the windowed samples
(2) Divide the FFT coefficients into subbands with eachsubband having a bandwidth of one-fourth of anoctave In addition the bandwidths of two consecu-tive subbands should overlap each other by 10Thatis the computed bandwidth for each subbandmust bemultiplied by 11 The beginning frequency of the firstsubband denoted as loEdge is fixed at 250Hz Table 1lists the frequency range of the first three subbandsbefore and after (spectral) overlapping
Subband
middot middot middot middot middot middot
119860 =
11988612 119886124
11988621 11988622 119886224
119886301 119886302 1198863024
119886311 119886312 1198863124
middot middot middot middot middot middot
middot middot middot middot middot middot
middot middot middot middot middot middot
11988611
Time
Figure 1 MPEG-7 audio signature descriptors in a matrix
(3) Find the flatness measure F(b) for subband b by
119865 (119887) =
ℎ(119887)minus119897(119887)+1radicprodℎ(119887)
119894=119897(119887)119888 (119894)
(1 (ℎ (119887) minus 119897 (119887) + 1))sumℎ(119887)
119894=119897(119887)119888 (119894)
(1)
where c(i) is the power spectrum computed by FFT(in step 1) and h(b) and l(b) are the lower and upperindices of c(i) within subband b
(4) Find themean and variance of F(b) for subband b overa certain number of successive FFT windows calledscaling ratio The default value of the scaling ratio is16
(5) The series of mean and variance values are the audiosignature descriptors
With the computational steps of the audio signaturedescriptor we may compute the number of descriptors ina piece of 15-second music In the time domain about(15minus009)003 = 497windows are used to cover the 15-secondsignal because the hop size is 30ms Since the scaling ratio isset to 16 there are 49716 asymp 31 values per subband In thespectral domain since there are four subbands per octave andwe use six octaves (250Hz sim 16 kHz) there are totally 24subbands in the spectral domain Thus the 15-second signalproduces 24 times 31 = 744 mean values and other 744 variancevalues The descriptors are arranged in a matrix form asshown in Figure 1
Since the mean values are sufficient for the identificationpurpose [21] we will not consider the variance descriptorsin the following The obtained descriptors are referred to ashigh-resolution (or full-dimensional) descriptors Accordingto [21] if a piece of query music is to be compared with areference piece it should be done with a sliding comparisonas shown in Figure 2 In the figure one segment of linerepresents descriptors in a piece of music arranged in onedimensional structure Using the representation in Figure 1the query segment is arranged as
119876 = [11990211 sdot sdot sdot 119902124 sdot sdot sdot 119902311 sdot sdot sdot 1199023124] (2)
Since only descriptors from the same subband are to becompared the hop size between two query segments inFigure 2 is 24 descriptors or 003 sdot 16 = 048 second Inother words one descriptor (in a subband) represents 048second of audio samples The smallest Euclidean distance
4 The Scientific World Journal
middot middot middot
middot middot middot
119886124 11988621 119886224 1198863124middot middot middot
middot middot middot
11988611 11988631 middot middot middot 119886324 middot middot middot 1198863224 1198866224
11990211 119902124 11990222411990221 1199023124
middot middot middot11990211 119902124 11990222411990221 1199023124middot middot middot
middot middot middot11990211 119902124 1199023124middot middot middot
middot middot middot
middot middot middot
middot middot middot
Figure 2 The sliding operation to compare the query input Q (15seconds) and the reference music A (30 seconds)
obtained during the sliding operation is recorded as thedistance between the query input and the reference musicFor example if the query Q is to be compared with referenceA then we may obtain the Euclidean distance 119864
119860119861(119896) for the
k-th sliding comparison as
119864119876119860(119896) =
31
sum
119894=1
24
sum
119895=1
10038161003816100381610038161003816119902119894119895minus 119886(119894+119896)119895
10038161003816100381610038161003816 (3)
The recorded distance between these two pieces of music is
119889119876119860
= min119896
119864119876119860(119896) (4)
Although using audio signature descriptors greatlyreduces the computational burden for comparison therequired computation using (3) is still very large Accordingto [21] suppose that a database contains 1000 audio files witheach one having duration of 30 seconds and the music tobe identified has a duration of 15 seconds then 47 616 000arithmetic operations are required Therefore it is beneficialto further reduce the number of comparison which can beaccomplished by employing a multiresolution strategy
3 Music Identification Based onMultiresolution Strategy
As discussed previously the computational cost is still veryhigh even if we use the MPEG-7 descriptors Thereforewe will use the multiresolution strategy to reduce thecomputational complexity To do so in addition to theabove-mentioned high-resolution descriptors we also needto generate low-resolution descriptors (see also Section 4)Therefore the music database contains high-resolution andlow-resolution descriptors for each soundtrack to be identi-fied This step can be accomplished during the setup of thedatabase In addition a training process is conducted to finda distance threshold to determine whether the query inputis in the database or not Section 5 has a detailed descriptionabout the training process
Once the database is constructed as shown in Figure 3the music identification procedure consists of the followingsteps
(1) When a query input is sent to the system it com-putes the MPEG-7 audio signature descriptors andlow-resolution descriptors for the query music If amobile device is used to record the music usually the
Table 1 Bandwidth of the first three subbands
Bandwidth of subband(nonoverlapped)
Bandwidth of subband(overlapped)
250ndash2973Hz 2375ndash3122Hz2973ndash3536Hz 2824ndash3712Hz3535ndash4204Hz 3359ndash4415Hz
descriptors (fingerprints) are sent instead of the PCMsamples to reduce the size of the transmitted data
(2) The computed low-resolution descriptors are com-pared with those in the database Based on thedistance metric a list of candidates is obtainedSince the query music may start at any position ofthe soundtrack a sliding comparison as shown inFigure 2 is necessary
(3) After obtaining the candidate list high-resolutiondescriptors are used to find the distances between thequery input and the candidates The shortest distanceand its associated soundtrack are recorded
(4) If the shortest distance is less than the threshold thequery input is considered as highly similar to therecorded soundtrack in the database Otherwise thequery input is not in the database
During the low-resolution comparison it is also possibleto use an existing algorithm to reduce the comparison timeFor example we may arrange the descriptors in k-d tree (k-dimensional binary search tree) [22 23] structure to reducethe search time However using the k-d tree structure impliesthat every segment of music in the database has the samenumber of descriptors or same duration Note that the unitto be compared is one segment and one soundtrack canbe divided into many overlapped segments as shown inFigure 2 Therefore if the duration of the segment is set to 15seconds then the querymusic must also have a duration of 15seconds to use the system In a practical situation if the queryinput is longer than 15 seconds then only 15 seconds of themusic is used for computing the descriptors and comparison
4 Dimensionality Reduction Method forMPEG-7 Audio Signature Descriptors
This section briefly explains the dimensionality reductiontechnique used in the experiments Although a block-averagemethod is proposed in [21] for this purpose we use a differenttechnique in this paper
41 Problems of Reducing Dimensionality Using Scaling RatioConceptually we may increase the scaling ratio (given inSection 2) to reduce the number of descriptors duringcomputing them For example by increasing the scalingratio from 16 to 256 the number of descriptors is reducedby 16 times Unfortunately this approach does not yieldsatisfactory results because of insufficient time resolutionRecall that the descriptors are derived based on thewindowedwaveform Therefore if two segments of the soundtrack are
The Scientific World Journal 5
Query
Database
Dim reduction
In database
Reject Result
Threshold
Candidate
MPEG-7descriptors
Fullresolution
search
Lowresolution
search
YesNo
Figure 3 Procedure for multiresolution music identification
Descriptors
Descriptors Descriptors1198861198991 11988611989924 119886119899+11 119886119899+124
1199021198991 11990211989924
Figure 4 The audio samples and the descriptors In a real applica-tion 119886
119899119896is stored in the database whereas 119902
119899119896is computed from
the query input Usually 119886119899119896
is not equal to 119902119899119896
due to differentscopes of the windows even though they are all derived from thesame soundtrack
not highly similar their corresponding descriptors usuallyhave large differences With a scaling ratio of 256 eachdescriptor represents around 77 seconds of audio samplesTherefore unless the query input also starts at a point veryclose to the segment boundary of a soundtrack a comparisonbased on these descriptors may fail As illustrated in Figure 4descriptors of 119886
119899119896(or 119886119899+1119896
) and 119902119899119896
are quite different andtherefore the descriptor-based identification scheme cannotidentify the query inputTherefore a suitable time resolutionfor example 048 second should be maintained
42 Proposed Dimensionality Reduction Method In contrastto reduce the number of descriptors by increasing the scalingratio we may reduce them in each temporal-spectral block(ie the matrix in Figure 1) representing a segment of (15-second) music However to maintain a high time resolutionsuccessive temporal-spectral block should have a small timedifference as shown in Figure 5 With this arrangement weachieve both high identification rate and low comparisoncomplexity
For each temporal-spectral block we use PCA (princi-pal component analysis) [21 24] to reduce the number ofdescriptors Since it is difficult to directly use PCA to obtaingood low-resolution descriptors [21] we use an alternativeapproach Its basic idea is to partition the entire block into
119860 =
middot middot middot
119886124
11988621 119886224
119886311
119886321
1198863124
middot middot middot
11988611
11988631 119886324middot middot middot
middot middot middot
middot middot middot
middot middot middot
1198863224
119902331 1199023324
middot middot middot
middot middot middot
middot middot middot 1198866224119886621
11989111 11989118
11989121 11989128
11989131 11989138
Figure 5 Low-resolution descriptors In the figure each temporal-spectral block corresponds to descriptors from 15-second music
119860 = =11986011 1198601211986021 11986022
middot middot middot middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
11988611 11988618
119886151
119886161
119886291
119886158
119886168
119886298
11988619
119886159
119886169
119886299
119886116
1198861516
1198861616
1198862916 1198862917 1198862924
119886124119886117
Figure 6 Partition of a temporal-spectral block into four subblocksin the experiments
four subblocks as shown in Figure 6 and then use PCA toreduce the number of descriptors of each subblock into twovalues (descriptors) called low-resolution descriptors Alsohigh-frequency descriptors in the original temporal-spectralblock are all discarded because they are susceptible to noise[21] With this arrangement totally eight descriptors are usedto represent a segment of 15-second music Note that theactual duration of a segment used in the experiments is 1404seconds (though we still say 15 seconds) because some audiosamples are lost after MP-3 compression and decompressionin the experiments
6 The Scientific World Journal
Table 2 Identification accuracy using high-resolution features
Uncompressed 192 k MP-3 96 k MP-3High-frequencydescriptors Used Not used Used Not used Used Not used
Match in firstone 100 100 100 100 94 100
Match in first 15 100 100 100 100 98 100
Wenowdescribe how to calculate low-resolution descrip-tors Since using PCA for dimensionality reduction is a well-known approach we only describe how to construct thecovariance matrix for PCA computation and omit the com-putation of finding principal components Suppose that thereare N segments of music pieces in the database with theirdescriptor matrices denoted as 119860(1) to 119860(119873) To simplify theargument we consider the subblock matrix119860(119896)
11(referring to
Figure 6) in the following Other subblocks can be computedby the same procedure First collect all subblocks from 119860
(119896)
11
to form a big matrix as follows
119861 =
[[[
[
119886(1)
11sdot sdot sdot 119886
(1)
18119886(2)
11sdot sdot sdot 119886
(2)
18sdot sdot sdot 119886(119873)
11sdot sdot sdot 119886(119873)
18
sdot sdot sdot
119886(1)
151sdot sdot sdot 119886(1)
158119886(2)
151sdot sdot sdot 119886(2)
158sdot sdot sdot 119886(119873)
151sdot sdot sdot 119886(119873)
158
]]]
]
= [
[
uarr sdot sdot sdot uarr
b1sdot sdot sdot b8119873
darr sdot sdot sdot darr
]
]
(5)
Then we may consider column vectors of matrix B as b119894
vectors Having the data vectors b119894 the covariance matrix
and subsequently the principal components can be foundBy keeping only two principal components we can obtainc119894= [11988811989411198881198942] from b
119894 Since there are eight column vectors in
a subblock descriptors in119860(119896)11
are reduced to eight c119894(8119896minus7 le
119894 le 8119896) vectors Next we can rearrange the obtained c119894vectors
as
119862 =[[
[
11988811
11988812
11988891
11988892
sdot sdot sdot 1198888119873minus71
1198888119873minus72
sdot sdot sdot
11988881
11988882
119888161
119888162
sdot sdot sdot 11988881198731
11988881198732
]]
]
(6)
Again by treating each row of matrix C as the data to beprocessed by PCA we can compute the covariance matrixand finally reduce each column vector to one value Sincethere are two columns originally from one subblock thereare totally two (low-resolution) descriptors per subblockEquivalently a segment of 15-second music is representedby eight descriptors During the computation the principalcomponents obtained in the first and the second steps shouldbe stored in the database When a query input (in the formof high-resolution descriptors) is supplied these componentsare used to obtain the low-resolution descriptors for theinput
Table 3 Comparison of identification accuracies using low-resolution descriptors obtained by the proposed approach and bythe averaging (avg) method [21]
Uncompressed 192 k MP-3 96 k MP-3Method Proposed Avg Proposed Avg Proposed AvgMatch in firstone 993 995 991 993 901 898
Match in first15 100 100 100 100 996 995
5 Threshold to Determine the Membership ofthe Query Input
Since in practice the database cannot collect all musicsoundtracks in the world we have to have a strategy todetermine if the query input is actually in the database ornot In our case the decision is accomplished with the aid ofa threshold If the shortest distances between the input andthe candidates are greater than a threshold then the queryinput is not in the database otherwise it is in the databaseThough the concept is simple it is not trivial to determinethe threshold [20]
Recall that when a query input is applied to the proposedsystem the system computes the Euclidean distances betweenthe low-resolution descriptors of the query input and thosein the database Accordingly m (say m = 20) segments ofsoundtracks from distinct titles are selected based on thecomputed distances Next the Euclidean distances betweenthe query and the selected segments are computed againusing full-resolution descriptors By sorting the distancesfrom small to large the system creates a candidate list for thequery input
To compute the previously mentioned threshold weexamine two different methods In the first method (methodI) we define the ldquofirstrdquo distance 119889
1as the (full-resolution)
distance associated with the first candidate in the list Simi-larly the ldquosecondrdquo distance 119889
2is the distance associated with
the second candidate as shown in Figure 7 The methodto compute the first and second distances is used both intraining and in identification For the training phase assumethat there are 119873
119879soundtracks used For a segment from a
soundtrack indexed n let the first distance be 1198891(119899) and the
second distance be 1198892(119899) The threshold 119879
119860is then computed
as
119879119860= 05 sdot (
119873119879
sum
119899=1
1198891(119899) +
119873119879
sum
119899=1
1198892(119899)) (7)
During the identification phase the first distance forthe query input is computed If the first distance is smallerthan the threshold 119879
119860 the query input is determined to be
highly similar to the top soundtrack title in the candidate listOtherwise the query input is not inside the database
In addition to method I we also propose an alternativemethod (method II) to compute the first and seconddistances
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
2 The Scientific World Journal
generally does not have the same waveforms and thereforethey can be correctly identifiedThough conceptually simpleit is not plausible to directly compare PCM samples of twopieces of music because it would take too much time forthe comparison For example a typical compact disc (CD)has about 600M bytes of PCM samples to store about tensongs If a database contains 10000 different songs then thePCM samples occupy about 600G bytes of space A piece ofunknownmusic with duration of ten seconds has about 880 kbytes of data It is obvious that it requires a huge amountof computation to sequentially compare the 880 k bytes ofdata with the 600G bytes of data in the database Thereforedimension-reduced representations of the PCM waveformsknown as fingerprints are used in comparison Amongthe fingerprints most of them are defined by individualcompanies or groups Some of them are briefly explained inthe following
Researchers in Google develop a fingerprinting schemefor audio calledWaveprint [4] based onwaveletsWith the aidof wavelets the fingerprint is invariant to timescale changeIn other words whether the audio piece is played faster orslower than the normal speed the fingerprint is unchangedThe fingerprint of a piece of 4-minute music is around 64 kbytes equivalently 2133 bits per second
Shazam [5] is a company (and service) dedicated formusic identification Its database contains around elevenmillion soundtracks As described in [6] the fingerprintsused are sets of triplets based on spectrogram peaks Forexample if (119905
1 1198911) and (119905
2 1198912) are two peaks at time 119905
1and
1199052and frequency 119891
1and 119891
2 then the triplet ((119905
2minus 1199051) 1198911
(1198912minus 1198911)) is a feature Based on the realization of [7] the
fingerprint in this scheme uses 400 bits per secondResearchers in Philips also propose a fingerprinting
scheme [8] The computation of the fingerprints includesframing windowing (von Hann window) FFT (fast fouriertransform) band decision energy computation and thenquantization (into binary) In the typical setting one secondof audio has around 2730 bits of fingerprint
Microsoftrsquos Robust Audio Recognition Engine (RARE)[9] divides the incoming audio into overlapping frames Eachframe is converted to spectral domain by MCLT (modulatedcomplex lapped transform)The spectral values are applied totwo layers of OPCA (oriented principle component analysis)to reduce the dimensionality of the spectral data For thismethod 344 features (11008 bits if one feature is stored in4 bytes in a floating point) are obtained per second
In addition to the above methods there are actuallymany more different types of audio fingerprinting schemesavailable such as Music Brainz [10] Audible Magic [11] andGracenotersquos MusicID [12] According to [13] there are morethan ten different audio fingerprinting schemes available
Since there are vast amount of different fingerprintingschemes available some researchers then conducted exper-iments to compare the relative performance among someof them The results show that if the schemes use thesame number of bits to represent fingerprints they havecomparable performance [14] Therefore the selection of thefingerprinting schemes should also consider other factors
(such as interoperability to be addressed below) rather thanmerely the minor performance difference
With the ever-increasing amount of multimedia contentover the Internet and in the multimedia databases it is animportant task to exchange multimedia content To respondthe public demands ISOrsquos (International StandardizationOrganization) working group developed MPEG-7 standard[15 16] In the audio part of the standard [17] a high-level tool is developed for audio identification called audiosignature description scheme The fingerprints used in thescheme are called audio signature descriptors and theyhave good identification accuracy [18 19] In the followingwe interchangeably use descriptors and fingerprints withoutdistinction
Although proprietary audio fingerprints have excellentidentification performance the MPEG-7 audio descriptorsoffer some advantages First being an international standardensures the open and fair use (subject to license fee) of thetechnology Second such an international standard makesthe interoperability possible For example if a mobile phoneinstalls an application program to convert a piece of recordedaudio to MPEG-7 descriptors the descriptors can be sent toany website accepting the descriptors On the other handit is not possible to send proprietary fingerprints used inone company to database systems owned by competitorsThird different companies may share or exchange theiraudio descriptors (fingerprints) in their databases withoutany difficulties In the current situation each company has tocompute fingerprints for newly released albumsWith the useof MPEG-7 descriptors the redundant efforts of computingfingerprints can be minimized
Although a music identification system based on audiofingerprints has several applications [8] we concentrate onthe issue of detecting if a piece of circulated music is highlysimilar to a copyrighted work or not In a typical case thesimilarity is measured by a distance metric If the distanceis shorter than a threshold the two pieces of music areconsidered as similar Although it is not trivial to determine asuitable threshold [20] this problem is not fully studied Forexample [20] does not indicate any approach to determinethe threshold In addition the audio files to be comparedmay be very large therefore it is very important to reducethe comparison time while maintaining high identificationaccuracy Since not many papers address these two issuesbased on MPEG-7 descriptors we report in this paper ourapproaches and experimental results
This paper is organized as follows Section 2 is anoverview of the MPEG-7 audio signature descriptorsSection 3 is the system model for music identificationSection 4 describes the dimensionality reduction methodused in the paper Section 5 is the proposed strategy todetermine the threshold Section 6 covers the experimentsand results Section 7 is the conclusion
2 Overview of MPEG-7 AudioSignature Descriptors
Part 4 [17] of the MPEG-7 standard includes low-leveland high-level descriptors for various applications Low-level descriptors are derived from the temporal and spec-tral characteristics of the waveform whereas the high-level
The Scientific World Journal 3
descriptors are constructed based on low-level descriptors Inthis section we will briefly describe the descriptors related tomusic identification
21 Low-Level and High-Level Descriptors of the MPEG-7Audio There are 17 low-level audio descriptors defined inthe standard All of these low-level descriptors are derivedfrom the waveform of the music They can be dividedinto six different categories basic basic spectral signalparameter temporal timbral spectral timbral and spectralbasis representationThese low-level features may be directlyused or may serve as the basics for constructing high-leveldescriptors
Based on low-level descriptors MPEG-7 audio standardalso defines high-level description schemes for various appli-cations These include audio signature description instru-ment timbre description general sound recognition andindexing description and spoken content description Theaudio signature descriptors one type of fingerprints are usedfor music identification
22 MPEG-7 Audio Signature Descriptors Although low-level descriptors in MPEG-7 audio may also be used toidentifymusic it is shown that the audio signature descriptorsprovide better identification performance [18 19] Thereforethe audio signature descriptors are adopted in the proposedsystem
TheMPEG-7 audio signature descriptors are computed asfollows
(1) Time-to-frequency conversion this step is based onaudio spectrum envelope descriptors including thefollowing substeps
(i) Determine the hop length between two con-secutive windows in the unit of samples Thedefault value is equivalent to 30ms
(ii) Define the window length 119897119908 This value is set to
three times the hop length The chosen windowis Hamming window
(iii) Determine the length of FFT denoted as 119873FFTTo reduce the computational complexity 119873FFTis the smallest power-of-2 number equal to orgreater than 119897
119908 For example if the sample rate of
the audio is 44100 ss then 119897119908= 44100sdot003sdot3 =
3969Therefore119873FFT =4096The extra samplesafter 119897
119908are padded with zeros
(iv) Perform the FFT on the windowed samples
(2) Divide the FFT coefficients into subbands with eachsubband having a bandwidth of one-fourth of anoctave In addition the bandwidths of two consecu-tive subbands should overlap each other by 10Thatis the computed bandwidth for each subbandmust bemultiplied by 11 The beginning frequency of the firstsubband denoted as loEdge is fixed at 250Hz Table 1lists the frequency range of the first three subbandsbefore and after (spectral) overlapping
Subband
middot middot middot middot middot middot
119860 =
11988612 119886124
11988621 11988622 119886224
119886301 119886302 1198863024
119886311 119886312 1198863124
middot middot middot middot middot middot
middot middot middot middot middot middot
middot middot middot middot middot middot
11988611
Time
Figure 1 MPEG-7 audio signature descriptors in a matrix
(3) Find the flatness measure F(b) for subband b by
119865 (119887) =
ℎ(119887)minus119897(119887)+1radicprodℎ(119887)
119894=119897(119887)119888 (119894)
(1 (ℎ (119887) minus 119897 (119887) + 1))sumℎ(119887)
119894=119897(119887)119888 (119894)
(1)
where c(i) is the power spectrum computed by FFT(in step 1) and h(b) and l(b) are the lower and upperindices of c(i) within subband b
(4) Find themean and variance of F(b) for subband b overa certain number of successive FFT windows calledscaling ratio The default value of the scaling ratio is16
(5) The series of mean and variance values are the audiosignature descriptors
With the computational steps of the audio signaturedescriptor we may compute the number of descriptors ina piece of 15-second music In the time domain about(15minus009)003 = 497windows are used to cover the 15-secondsignal because the hop size is 30ms Since the scaling ratio isset to 16 there are 49716 asymp 31 values per subband In thespectral domain since there are four subbands per octave andwe use six octaves (250Hz sim 16 kHz) there are totally 24subbands in the spectral domain Thus the 15-second signalproduces 24 times 31 = 744 mean values and other 744 variancevalues The descriptors are arranged in a matrix form asshown in Figure 1
Since the mean values are sufficient for the identificationpurpose [21] we will not consider the variance descriptorsin the following The obtained descriptors are referred to ashigh-resolution (or full-dimensional) descriptors Accordingto [21] if a piece of query music is to be compared with areference piece it should be done with a sliding comparisonas shown in Figure 2 In the figure one segment of linerepresents descriptors in a piece of music arranged in onedimensional structure Using the representation in Figure 1the query segment is arranged as
119876 = [11990211 sdot sdot sdot 119902124 sdot sdot sdot 119902311 sdot sdot sdot 1199023124] (2)
Since only descriptors from the same subband are to becompared the hop size between two query segments inFigure 2 is 24 descriptors or 003 sdot 16 = 048 second Inother words one descriptor (in a subband) represents 048second of audio samples The smallest Euclidean distance
4 The Scientific World Journal
middot middot middot
middot middot middot
119886124 11988621 119886224 1198863124middot middot middot
middot middot middot
11988611 11988631 middot middot middot 119886324 middot middot middot 1198863224 1198866224
11990211 119902124 11990222411990221 1199023124
middot middot middot11990211 119902124 11990222411990221 1199023124middot middot middot
middot middot middot11990211 119902124 1199023124middot middot middot
middot middot middot
middot middot middot
middot middot middot
Figure 2 The sliding operation to compare the query input Q (15seconds) and the reference music A (30 seconds)
obtained during the sliding operation is recorded as thedistance between the query input and the reference musicFor example if the query Q is to be compared with referenceA then we may obtain the Euclidean distance 119864
119860119861(119896) for the
k-th sliding comparison as
119864119876119860(119896) =
31
sum
119894=1
24
sum
119895=1
10038161003816100381610038161003816119902119894119895minus 119886(119894+119896)119895
10038161003816100381610038161003816 (3)
The recorded distance between these two pieces of music is
119889119876119860
= min119896
119864119876119860(119896) (4)
Although using audio signature descriptors greatlyreduces the computational burden for comparison therequired computation using (3) is still very large Accordingto [21] suppose that a database contains 1000 audio files witheach one having duration of 30 seconds and the music tobe identified has a duration of 15 seconds then 47 616 000arithmetic operations are required Therefore it is beneficialto further reduce the number of comparison which can beaccomplished by employing a multiresolution strategy
3 Music Identification Based onMultiresolution Strategy
As discussed previously the computational cost is still veryhigh even if we use the MPEG-7 descriptors Thereforewe will use the multiresolution strategy to reduce thecomputational complexity To do so in addition to theabove-mentioned high-resolution descriptors we also needto generate low-resolution descriptors (see also Section 4)Therefore the music database contains high-resolution andlow-resolution descriptors for each soundtrack to be identi-fied This step can be accomplished during the setup of thedatabase In addition a training process is conducted to finda distance threshold to determine whether the query inputis in the database or not Section 5 has a detailed descriptionabout the training process
Once the database is constructed as shown in Figure 3the music identification procedure consists of the followingsteps
(1) When a query input is sent to the system it com-putes the MPEG-7 audio signature descriptors andlow-resolution descriptors for the query music If amobile device is used to record the music usually the
Table 1 Bandwidth of the first three subbands
Bandwidth of subband(nonoverlapped)
Bandwidth of subband(overlapped)
250ndash2973Hz 2375ndash3122Hz2973ndash3536Hz 2824ndash3712Hz3535ndash4204Hz 3359ndash4415Hz
descriptors (fingerprints) are sent instead of the PCMsamples to reduce the size of the transmitted data
(2) The computed low-resolution descriptors are com-pared with those in the database Based on thedistance metric a list of candidates is obtainedSince the query music may start at any position ofthe soundtrack a sliding comparison as shown inFigure 2 is necessary
(3) After obtaining the candidate list high-resolutiondescriptors are used to find the distances between thequery input and the candidates The shortest distanceand its associated soundtrack are recorded
(4) If the shortest distance is less than the threshold thequery input is considered as highly similar to therecorded soundtrack in the database Otherwise thequery input is not in the database
During the low-resolution comparison it is also possibleto use an existing algorithm to reduce the comparison timeFor example we may arrange the descriptors in k-d tree (k-dimensional binary search tree) [22 23] structure to reducethe search time However using the k-d tree structure impliesthat every segment of music in the database has the samenumber of descriptors or same duration Note that the unitto be compared is one segment and one soundtrack canbe divided into many overlapped segments as shown inFigure 2 Therefore if the duration of the segment is set to 15seconds then the querymusic must also have a duration of 15seconds to use the system In a practical situation if the queryinput is longer than 15 seconds then only 15 seconds of themusic is used for computing the descriptors and comparison
4 Dimensionality Reduction Method forMPEG-7 Audio Signature Descriptors
This section briefly explains the dimensionality reductiontechnique used in the experiments Although a block-averagemethod is proposed in [21] for this purpose we use a differenttechnique in this paper
41 Problems of Reducing Dimensionality Using Scaling RatioConceptually we may increase the scaling ratio (given inSection 2) to reduce the number of descriptors duringcomputing them For example by increasing the scalingratio from 16 to 256 the number of descriptors is reducedby 16 times Unfortunately this approach does not yieldsatisfactory results because of insufficient time resolutionRecall that the descriptors are derived based on thewindowedwaveform Therefore if two segments of the soundtrack are
The Scientific World Journal 5
Query
Database
Dim reduction
In database
Reject Result
Threshold
Candidate
MPEG-7descriptors
Fullresolution
search
Lowresolution
search
YesNo
Figure 3 Procedure for multiresolution music identification
Descriptors
Descriptors Descriptors1198861198991 11988611989924 119886119899+11 119886119899+124
1199021198991 11990211989924
Figure 4 The audio samples and the descriptors In a real applica-tion 119886
119899119896is stored in the database whereas 119902
119899119896is computed from
the query input Usually 119886119899119896
is not equal to 119902119899119896
due to differentscopes of the windows even though they are all derived from thesame soundtrack
not highly similar their corresponding descriptors usuallyhave large differences With a scaling ratio of 256 eachdescriptor represents around 77 seconds of audio samplesTherefore unless the query input also starts at a point veryclose to the segment boundary of a soundtrack a comparisonbased on these descriptors may fail As illustrated in Figure 4descriptors of 119886
119899119896(or 119886119899+1119896
) and 119902119899119896
are quite different andtherefore the descriptor-based identification scheme cannotidentify the query inputTherefore a suitable time resolutionfor example 048 second should be maintained
42 Proposed Dimensionality Reduction Method In contrastto reduce the number of descriptors by increasing the scalingratio we may reduce them in each temporal-spectral block(ie the matrix in Figure 1) representing a segment of (15-second) music However to maintain a high time resolutionsuccessive temporal-spectral block should have a small timedifference as shown in Figure 5 With this arrangement weachieve both high identification rate and low comparisoncomplexity
For each temporal-spectral block we use PCA (princi-pal component analysis) [21 24] to reduce the number ofdescriptors Since it is difficult to directly use PCA to obtaingood low-resolution descriptors [21] we use an alternativeapproach Its basic idea is to partition the entire block into
119860 =
middot middot middot
119886124
11988621 119886224
119886311
119886321
1198863124
middot middot middot
11988611
11988631 119886324middot middot middot
middot middot middot
middot middot middot
middot middot middot
1198863224
119902331 1199023324
middot middot middot
middot middot middot
middot middot middot 1198866224119886621
11989111 11989118
11989121 11989128
11989131 11989138
Figure 5 Low-resolution descriptors In the figure each temporal-spectral block corresponds to descriptors from 15-second music
119860 = =11986011 1198601211986021 11986022
middot middot middot middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
11988611 11988618
119886151
119886161
119886291
119886158
119886168
119886298
11988619
119886159
119886169
119886299
119886116
1198861516
1198861616
1198862916 1198862917 1198862924
119886124119886117
Figure 6 Partition of a temporal-spectral block into four subblocksin the experiments
four subblocks as shown in Figure 6 and then use PCA toreduce the number of descriptors of each subblock into twovalues (descriptors) called low-resolution descriptors Alsohigh-frequency descriptors in the original temporal-spectralblock are all discarded because they are susceptible to noise[21] With this arrangement totally eight descriptors are usedto represent a segment of 15-second music Note that theactual duration of a segment used in the experiments is 1404seconds (though we still say 15 seconds) because some audiosamples are lost after MP-3 compression and decompressionin the experiments
6 The Scientific World Journal
Table 2 Identification accuracy using high-resolution features
Uncompressed 192 k MP-3 96 k MP-3High-frequencydescriptors Used Not used Used Not used Used Not used
Match in firstone 100 100 100 100 94 100
Match in first 15 100 100 100 100 98 100
Wenowdescribe how to calculate low-resolution descrip-tors Since using PCA for dimensionality reduction is a well-known approach we only describe how to construct thecovariance matrix for PCA computation and omit the com-putation of finding principal components Suppose that thereare N segments of music pieces in the database with theirdescriptor matrices denoted as 119860(1) to 119860(119873) To simplify theargument we consider the subblock matrix119860(119896)
11(referring to
Figure 6) in the following Other subblocks can be computedby the same procedure First collect all subblocks from 119860
(119896)
11
to form a big matrix as follows
119861 =
[[[
[
119886(1)
11sdot sdot sdot 119886
(1)
18119886(2)
11sdot sdot sdot 119886
(2)
18sdot sdot sdot 119886(119873)
11sdot sdot sdot 119886(119873)
18
sdot sdot sdot
119886(1)
151sdot sdot sdot 119886(1)
158119886(2)
151sdot sdot sdot 119886(2)
158sdot sdot sdot 119886(119873)
151sdot sdot sdot 119886(119873)
158
]]]
]
= [
[
uarr sdot sdot sdot uarr
b1sdot sdot sdot b8119873
darr sdot sdot sdot darr
]
]
(5)
Then we may consider column vectors of matrix B as b119894
vectors Having the data vectors b119894 the covariance matrix
and subsequently the principal components can be foundBy keeping only two principal components we can obtainc119894= [11988811989411198881198942] from b
119894 Since there are eight column vectors in
a subblock descriptors in119860(119896)11
are reduced to eight c119894(8119896minus7 le
119894 le 8119896) vectors Next we can rearrange the obtained c119894vectors
as
119862 =[[
[
11988811
11988812
11988891
11988892
sdot sdot sdot 1198888119873minus71
1198888119873minus72
sdot sdot sdot
11988881
11988882
119888161
119888162
sdot sdot sdot 11988881198731
11988881198732
]]
]
(6)
Again by treating each row of matrix C as the data to beprocessed by PCA we can compute the covariance matrixand finally reduce each column vector to one value Sincethere are two columns originally from one subblock thereare totally two (low-resolution) descriptors per subblockEquivalently a segment of 15-second music is representedby eight descriptors During the computation the principalcomponents obtained in the first and the second steps shouldbe stored in the database When a query input (in the formof high-resolution descriptors) is supplied these componentsare used to obtain the low-resolution descriptors for theinput
Table 3 Comparison of identification accuracies using low-resolution descriptors obtained by the proposed approach and bythe averaging (avg) method [21]
Uncompressed 192 k MP-3 96 k MP-3Method Proposed Avg Proposed Avg Proposed AvgMatch in firstone 993 995 991 993 901 898
Match in first15 100 100 100 100 996 995
5 Threshold to Determine the Membership ofthe Query Input
Since in practice the database cannot collect all musicsoundtracks in the world we have to have a strategy todetermine if the query input is actually in the database ornot In our case the decision is accomplished with the aid ofa threshold If the shortest distances between the input andthe candidates are greater than a threshold then the queryinput is not in the database otherwise it is in the databaseThough the concept is simple it is not trivial to determinethe threshold [20]
Recall that when a query input is applied to the proposedsystem the system computes the Euclidean distances betweenthe low-resolution descriptors of the query input and thosein the database Accordingly m (say m = 20) segments ofsoundtracks from distinct titles are selected based on thecomputed distances Next the Euclidean distances betweenthe query and the selected segments are computed againusing full-resolution descriptors By sorting the distancesfrom small to large the system creates a candidate list for thequery input
To compute the previously mentioned threshold weexamine two different methods In the first method (methodI) we define the ldquofirstrdquo distance 119889
1as the (full-resolution)
distance associated with the first candidate in the list Simi-larly the ldquosecondrdquo distance 119889
2is the distance associated with
the second candidate as shown in Figure 7 The methodto compute the first and second distances is used both intraining and in identification For the training phase assumethat there are 119873
119879soundtracks used For a segment from a
soundtrack indexed n let the first distance be 1198891(119899) and the
second distance be 1198892(119899) The threshold 119879
119860is then computed
as
119879119860= 05 sdot (
119873119879
sum
119899=1
1198891(119899) +
119873119879
sum
119899=1
1198892(119899)) (7)
During the identification phase the first distance forthe query input is computed If the first distance is smallerthan the threshold 119879
119860 the query input is determined to be
highly similar to the top soundtrack title in the candidate listOtherwise the query input is not inside the database
In addition to method I we also propose an alternativemethod (method II) to compute the first and seconddistances
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 3
descriptors are constructed based on low-level descriptors Inthis section we will briefly describe the descriptors related tomusic identification
21 Low-Level and High-Level Descriptors of the MPEG-7Audio There are 17 low-level audio descriptors defined inthe standard All of these low-level descriptors are derivedfrom the waveform of the music They can be dividedinto six different categories basic basic spectral signalparameter temporal timbral spectral timbral and spectralbasis representationThese low-level features may be directlyused or may serve as the basics for constructing high-leveldescriptors
Based on low-level descriptors MPEG-7 audio standardalso defines high-level description schemes for various appli-cations These include audio signature description instru-ment timbre description general sound recognition andindexing description and spoken content description Theaudio signature descriptors one type of fingerprints are usedfor music identification
22 MPEG-7 Audio Signature Descriptors Although low-level descriptors in MPEG-7 audio may also be used toidentifymusic it is shown that the audio signature descriptorsprovide better identification performance [18 19] Thereforethe audio signature descriptors are adopted in the proposedsystem
TheMPEG-7 audio signature descriptors are computed asfollows
(1) Time-to-frequency conversion this step is based onaudio spectrum envelope descriptors including thefollowing substeps
(i) Determine the hop length between two con-secutive windows in the unit of samples Thedefault value is equivalent to 30ms
(ii) Define the window length 119897119908 This value is set to
three times the hop length The chosen windowis Hamming window
(iii) Determine the length of FFT denoted as 119873FFTTo reduce the computational complexity 119873FFTis the smallest power-of-2 number equal to orgreater than 119897
119908 For example if the sample rate of
the audio is 44100 ss then 119897119908= 44100sdot003sdot3 =
3969Therefore119873FFT =4096The extra samplesafter 119897
119908are padded with zeros
(iv) Perform the FFT on the windowed samples
(2) Divide the FFT coefficients into subbands with eachsubband having a bandwidth of one-fourth of anoctave In addition the bandwidths of two consecu-tive subbands should overlap each other by 10Thatis the computed bandwidth for each subbandmust bemultiplied by 11 The beginning frequency of the firstsubband denoted as loEdge is fixed at 250Hz Table 1lists the frequency range of the first three subbandsbefore and after (spectral) overlapping
Subband
middot middot middot middot middot middot
119860 =
11988612 119886124
11988621 11988622 119886224
119886301 119886302 1198863024
119886311 119886312 1198863124
middot middot middot middot middot middot
middot middot middot middot middot middot
middot middot middot middot middot middot
11988611
Time
Figure 1 MPEG-7 audio signature descriptors in a matrix
(3) Find the flatness measure F(b) for subband b by
119865 (119887) =
ℎ(119887)minus119897(119887)+1radicprodℎ(119887)
119894=119897(119887)119888 (119894)
(1 (ℎ (119887) minus 119897 (119887) + 1))sumℎ(119887)
119894=119897(119887)119888 (119894)
(1)
where c(i) is the power spectrum computed by FFT(in step 1) and h(b) and l(b) are the lower and upperindices of c(i) within subband b
(4) Find themean and variance of F(b) for subband b overa certain number of successive FFT windows calledscaling ratio The default value of the scaling ratio is16
(5) The series of mean and variance values are the audiosignature descriptors
With the computational steps of the audio signaturedescriptor we may compute the number of descriptors ina piece of 15-second music In the time domain about(15minus009)003 = 497windows are used to cover the 15-secondsignal because the hop size is 30ms Since the scaling ratio isset to 16 there are 49716 asymp 31 values per subband In thespectral domain since there are four subbands per octave andwe use six octaves (250Hz sim 16 kHz) there are totally 24subbands in the spectral domain Thus the 15-second signalproduces 24 times 31 = 744 mean values and other 744 variancevalues The descriptors are arranged in a matrix form asshown in Figure 1
Since the mean values are sufficient for the identificationpurpose [21] we will not consider the variance descriptorsin the following The obtained descriptors are referred to ashigh-resolution (or full-dimensional) descriptors Accordingto [21] if a piece of query music is to be compared with areference piece it should be done with a sliding comparisonas shown in Figure 2 In the figure one segment of linerepresents descriptors in a piece of music arranged in onedimensional structure Using the representation in Figure 1the query segment is arranged as
119876 = [11990211 sdot sdot sdot 119902124 sdot sdot sdot 119902311 sdot sdot sdot 1199023124] (2)
Since only descriptors from the same subband are to becompared the hop size between two query segments inFigure 2 is 24 descriptors or 003 sdot 16 = 048 second Inother words one descriptor (in a subband) represents 048second of audio samples The smallest Euclidean distance
4 The Scientific World Journal
middot middot middot
middot middot middot
119886124 11988621 119886224 1198863124middot middot middot
middot middot middot
11988611 11988631 middot middot middot 119886324 middot middot middot 1198863224 1198866224
11990211 119902124 11990222411990221 1199023124
middot middot middot11990211 119902124 11990222411990221 1199023124middot middot middot
middot middot middot11990211 119902124 1199023124middot middot middot
middot middot middot
middot middot middot
middot middot middot
Figure 2 The sliding operation to compare the query input Q (15seconds) and the reference music A (30 seconds)
obtained during the sliding operation is recorded as thedistance between the query input and the reference musicFor example if the query Q is to be compared with referenceA then we may obtain the Euclidean distance 119864
119860119861(119896) for the
k-th sliding comparison as
119864119876119860(119896) =
31
sum
119894=1
24
sum
119895=1
10038161003816100381610038161003816119902119894119895minus 119886(119894+119896)119895
10038161003816100381610038161003816 (3)
The recorded distance between these two pieces of music is
119889119876119860
= min119896
119864119876119860(119896) (4)
Although using audio signature descriptors greatlyreduces the computational burden for comparison therequired computation using (3) is still very large Accordingto [21] suppose that a database contains 1000 audio files witheach one having duration of 30 seconds and the music tobe identified has a duration of 15 seconds then 47 616 000arithmetic operations are required Therefore it is beneficialto further reduce the number of comparison which can beaccomplished by employing a multiresolution strategy
3 Music Identification Based onMultiresolution Strategy
As discussed previously the computational cost is still veryhigh even if we use the MPEG-7 descriptors Thereforewe will use the multiresolution strategy to reduce thecomputational complexity To do so in addition to theabove-mentioned high-resolution descriptors we also needto generate low-resolution descriptors (see also Section 4)Therefore the music database contains high-resolution andlow-resolution descriptors for each soundtrack to be identi-fied This step can be accomplished during the setup of thedatabase In addition a training process is conducted to finda distance threshold to determine whether the query inputis in the database or not Section 5 has a detailed descriptionabout the training process
Once the database is constructed as shown in Figure 3the music identification procedure consists of the followingsteps
(1) When a query input is sent to the system it com-putes the MPEG-7 audio signature descriptors andlow-resolution descriptors for the query music If amobile device is used to record the music usually the
Table 1 Bandwidth of the first three subbands
Bandwidth of subband(nonoverlapped)
Bandwidth of subband(overlapped)
250ndash2973Hz 2375ndash3122Hz2973ndash3536Hz 2824ndash3712Hz3535ndash4204Hz 3359ndash4415Hz
descriptors (fingerprints) are sent instead of the PCMsamples to reduce the size of the transmitted data
(2) The computed low-resolution descriptors are com-pared with those in the database Based on thedistance metric a list of candidates is obtainedSince the query music may start at any position ofthe soundtrack a sliding comparison as shown inFigure 2 is necessary
(3) After obtaining the candidate list high-resolutiondescriptors are used to find the distances between thequery input and the candidates The shortest distanceand its associated soundtrack are recorded
(4) If the shortest distance is less than the threshold thequery input is considered as highly similar to therecorded soundtrack in the database Otherwise thequery input is not in the database
During the low-resolution comparison it is also possibleto use an existing algorithm to reduce the comparison timeFor example we may arrange the descriptors in k-d tree (k-dimensional binary search tree) [22 23] structure to reducethe search time However using the k-d tree structure impliesthat every segment of music in the database has the samenumber of descriptors or same duration Note that the unitto be compared is one segment and one soundtrack canbe divided into many overlapped segments as shown inFigure 2 Therefore if the duration of the segment is set to 15seconds then the querymusic must also have a duration of 15seconds to use the system In a practical situation if the queryinput is longer than 15 seconds then only 15 seconds of themusic is used for computing the descriptors and comparison
4 Dimensionality Reduction Method forMPEG-7 Audio Signature Descriptors
This section briefly explains the dimensionality reductiontechnique used in the experiments Although a block-averagemethod is proposed in [21] for this purpose we use a differenttechnique in this paper
41 Problems of Reducing Dimensionality Using Scaling RatioConceptually we may increase the scaling ratio (given inSection 2) to reduce the number of descriptors duringcomputing them For example by increasing the scalingratio from 16 to 256 the number of descriptors is reducedby 16 times Unfortunately this approach does not yieldsatisfactory results because of insufficient time resolutionRecall that the descriptors are derived based on thewindowedwaveform Therefore if two segments of the soundtrack are
The Scientific World Journal 5
Query
Database
Dim reduction
In database
Reject Result
Threshold
Candidate
MPEG-7descriptors
Fullresolution
search
Lowresolution
search
YesNo
Figure 3 Procedure for multiresolution music identification
Descriptors
Descriptors Descriptors1198861198991 11988611989924 119886119899+11 119886119899+124
1199021198991 11990211989924
Figure 4 The audio samples and the descriptors In a real applica-tion 119886
119899119896is stored in the database whereas 119902
119899119896is computed from
the query input Usually 119886119899119896
is not equal to 119902119899119896
due to differentscopes of the windows even though they are all derived from thesame soundtrack
not highly similar their corresponding descriptors usuallyhave large differences With a scaling ratio of 256 eachdescriptor represents around 77 seconds of audio samplesTherefore unless the query input also starts at a point veryclose to the segment boundary of a soundtrack a comparisonbased on these descriptors may fail As illustrated in Figure 4descriptors of 119886
119899119896(or 119886119899+1119896
) and 119902119899119896
are quite different andtherefore the descriptor-based identification scheme cannotidentify the query inputTherefore a suitable time resolutionfor example 048 second should be maintained
42 Proposed Dimensionality Reduction Method In contrastto reduce the number of descriptors by increasing the scalingratio we may reduce them in each temporal-spectral block(ie the matrix in Figure 1) representing a segment of (15-second) music However to maintain a high time resolutionsuccessive temporal-spectral block should have a small timedifference as shown in Figure 5 With this arrangement weachieve both high identification rate and low comparisoncomplexity
For each temporal-spectral block we use PCA (princi-pal component analysis) [21 24] to reduce the number ofdescriptors Since it is difficult to directly use PCA to obtaingood low-resolution descriptors [21] we use an alternativeapproach Its basic idea is to partition the entire block into
119860 =
middot middot middot
119886124
11988621 119886224
119886311
119886321
1198863124
middot middot middot
11988611
11988631 119886324middot middot middot
middot middot middot
middot middot middot
middot middot middot
1198863224
119902331 1199023324
middot middot middot
middot middot middot
middot middot middot 1198866224119886621
11989111 11989118
11989121 11989128
11989131 11989138
Figure 5 Low-resolution descriptors In the figure each temporal-spectral block corresponds to descriptors from 15-second music
119860 = =11986011 1198601211986021 11986022
middot middot middot middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
11988611 11988618
119886151
119886161
119886291
119886158
119886168
119886298
11988619
119886159
119886169
119886299
119886116
1198861516
1198861616
1198862916 1198862917 1198862924
119886124119886117
Figure 6 Partition of a temporal-spectral block into four subblocksin the experiments
four subblocks as shown in Figure 6 and then use PCA toreduce the number of descriptors of each subblock into twovalues (descriptors) called low-resolution descriptors Alsohigh-frequency descriptors in the original temporal-spectralblock are all discarded because they are susceptible to noise[21] With this arrangement totally eight descriptors are usedto represent a segment of 15-second music Note that theactual duration of a segment used in the experiments is 1404seconds (though we still say 15 seconds) because some audiosamples are lost after MP-3 compression and decompressionin the experiments
6 The Scientific World Journal
Table 2 Identification accuracy using high-resolution features
Uncompressed 192 k MP-3 96 k MP-3High-frequencydescriptors Used Not used Used Not used Used Not used
Match in firstone 100 100 100 100 94 100
Match in first 15 100 100 100 100 98 100
Wenowdescribe how to calculate low-resolution descrip-tors Since using PCA for dimensionality reduction is a well-known approach we only describe how to construct thecovariance matrix for PCA computation and omit the com-putation of finding principal components Suppose that thereare N segments of music pieces in the database with theirdescriptor matrices denoted as 119860(1) to 119860(119873) To simplify theargument we consider the subblock matrix119860(119896)
11(referring to
Figure 6) in the following Other subblocks can be computedby the same procedure First collect all subblocks from 119860
(119896)
11
to form a big matrix as follows
119861 =
[[[
[
119886(1)
11sdot sdot sdot 119886
(1)
18119886(2)
11sdot sdot sdot 119886
(2)
18sdot sdot sdot 119886(119873)
11sdot sdot sdot 119886(119873)
18
sdot sdot sdot
119886(1)
151sdot sdot sdot 119886(1)
158119886(2)
151sdot sdot sdot 119886(2)
158sdot sdot sdot 119886(119873)
151sdot sdot sdot 119886(119873)
158
]]]
]
= [
[
uarr sdot sdot sdot uarr
b1sdot sdot sdot b8119873
darr sdot sdot sdot darr
]
]
(5)
Then we may consider column vectors of matrix B as b119894
vectors Having the data vectors b119894 the covariance matrix
and subsequently the principal components can be foundBy keeping only two principal components we can obtainc119894= [11988811989411198881198942] from b
119894 Since there are eight column vectors in
a subblock descriptors in119860(119896)11
are reduced to eight c119894(8119896minus7 le
119894 le 8119896) vectors Next we can rearrange the obtained c119894vectors
as
119862 =[[
[
11988811
11988812
11988891
11988892
sdot sdot sdot 1198888119873minus71
1198888119873minus72
sdot sdot sdot
11988881
11988882
119888161
119888162
sdot sdot sdot 11988881198731
11988881198732
]]
]
(6)
Again by treating each row of matrix C as the data to beprocessed by PCA we can compute the covariance matrixand finally reduce each column vector to one value Sincethere are two columns originally from one subblock thereare totally two (low-resolution) descriptors per subblockEquivalently a segment of 15-second music is representedby eight descriptors During the computation the principalcomponents obtained in the first and the second steps shouldbe stored in the database When a query input (in the formof high-resolution descriptors) is supplied these componentsare used to obtain the low-resolution descriptors for theinput
Table 3 Comparison of identification accuracies using low-resolution descriptors obtained by the proposed approach and bythe averaging (avg) method [21]
Uncompressed 192 k MP-3 96 k MP-3Method Proposed Avg Proposed Avg Proposed AvgMatch in firstone 993 995 991 993 901 898
Match in first15 100 100 100 100 996 995
5 Threshold to Determine the Membership ofthe Query Input
Since in practice the database cannot collect all musicsoundtracks in the world we have to have a strategy todetermine if the query input is actually in the database ornot In our case the decision is accomplished with the aid ofa threshold If the shortest distances between the input andthe candidates are greater than a threshold then the queryinput is not in the database otherwise it is in the databaseThough the concept is simple it is not trivial to determinethe threshold [20]
Recall that when a query input is applied to the proposedsystem the system computes the Euclidean distances betweenthe low-resolution descriptors of the query input and thosein the database Accordingly m (say m = 20) segments ofsoundtracks from distinct titles are selected based on thecomputed distances Next the Euclidean distances betweenthe query and the selected segments are computed againusing full-resolution descriptors By sorting the distancesfrom small to large the system creates a candidate list for thequery input
To compute the previously mentioned threshold weexamine two different methods In the first method (methodI) we define the ldquofirstrdquo distance 119889
1as the (full-resolution)
distance associated with the first candidate in the list Simi-larly the ldquosecondrdquo distance 119889
2is the distance associated with
the second candidate as shown in Figure 7 The methodto compute the first and second distances is used both intraining and in identification For the training phase assumethat there are 119873
119879soundtracks used For a segment from a
soundtrack indexed n let the first distance be 1198891(119899) and the
second distance be 1198892(119899) The threshold 119879
119860is then computed
as
119879119860= 05 sdot (
119873119879
sum
119899=1
1198891(119899) +
119873119879
sum
119899=1
1198892(119899)) (7)
During the identification phase the first distance forthe query input is computed If the first distance is smallerthan the threshold 119879
119860 the query input is determined to be
highly similar to the top soundtrack title in the candidate listOtherwise the query input is not inside the database
In addition to method I we also propose an alternativemethod (method II) to compute the first and seconddistances
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
4 The Scientific World Journal
middot middot middot
middot middot middot
119886124 11988621 119886224 1198863124middot middot middot
middot middot middot
11988611 11988631 middot middot middot 119886324 middot middot middot 1198863224 1198866224
11990211 119902124 11990222411990221 1199023124
middot middot middot11990211 119902124 11990222411990221 1199023124middot middot middot
middot middot middot11990211 119902124 1199023124middot middot middot
middot middot middot
middot middot middot
middot middot middot
Figure 2 The sliding operation to compare the query input Q (15seconds) and the reference music A (30 seconds)
obtained during the sliding operation is recorded as thedistance between the query input and the reference musicFor example if the query Q is to be compared with referenceA then we may obtain the Euclidean distance 119864
119860119861(119896) for the
k-th sliding comparison as
119864119876119860(119896) =
31
sum
119894=1
24
sum
119895=1
10038161003816100381610038161003816119902119894119895minus 119886(119894+119896)119895
10038161003816100381610038161003816 (3)
The recorded distance between these two pieces of music is
119889119876119860
= min119896
119864119876119860(119896) (4)
Although using audio signature descriptors greatlyreduces the computational burden for comparison therequired computation using (3) is still very large Accordingto [21] suppose that a database contains 1000 audio files witheach one having duration of 30 seconds and the music tobe identified has a duration of 15 seconds then 47 616 000arithmetic operations are required Therefore it is beneficialto further reduce the number of comparison which can beaccomplished by employing a multiresolution strategy
3 Music Identification Based onMultiresolution Strategy
As discussed previously the computational cost is still veryhigh even if we use the MPEG-7 descriptors Thereforewe will use the multiresolution strategy to reduce thecomputational complexity To do so in addition to theabove-mentioned high-resolution descriptors we also needto generate low-resolution descriptors (see also Section 4)Therefore the music database contains high-resolution andlow-resolution descriptors for each soundtrack to be identi-fied This step can be accomplished during the setup of thedatabase In addition a training process is conducted to finda distance threshold to determine whether the query inputis in the database or not Section 5 has a detailed descriptionabout the training process
Once the database is constructed as shown in Figure 3the music identification procedure consists of the followingsteps
(1) When a query input is sent to the system it com-putes the MPEG-7 audio signature descriptors andlow-resolution descriptors for the query music If amobile device is used to record the music usually the
Table 1 Bandwidth of the first three subbands
Bandwidth of subband(nonoverlapped)
Bandwidth of subband(overlapped)
250ndash2973Hz 2375ndash3122Hz2973ndash3536Hz 2824ndash3712Hz3535ndash4204Hz 3359ndash4415Hz
descriptors (fingerprints) are sent instead of the PCMsamples to reduce the size of the transmitted data
(2) The computed low-resolution descriptors are com-pared with those in the database Based on thedistance metric a list of candidates is obtainedSince the query music may start at any position ofthe soundtrack a sliding comparison as shown inFigure 2 is necessary
(3) After obtaining the candidate list high-resolutiondescriptors are used to find the distances between thequery input and the candidates The shortest distanceand its associated soundtrack are recorded
(4) If the shortest distance is less than the threshold thequery input is considered as highly similar to therecorded soundtrack in the database Otherwise thequery input is not in the database
During the low-resolution comparison it is also possibleto use an existing algorithm to reduce the comparison timeFor example we may arrange the descriptors in k-d tree (k-dimensional binary search tree) [22 23] structure to reducethe search time However using the k-d tree structure impliesthat every segment of music in the database has the samenumber of descriptors or same duration Note that the unitto be compared is one segment and one soundtrack canbe divided into many overlapped segments as shown inFigure 2 Therefore if the duration of the segment is set to 15seconds then the querymusic must also have a duration of 15seconds to use the system In a practical situation if the queryinput is longer than 15 seconds then only 15 seconds of themusic is used for computing the descriptors and comparison
4 Dimensionality Reduction Method forMPEG-7 Audio Signature Descriptors
This section briefly explains the dimensionality reductiontechnique used in the experiments Although a block-averagemethod is proposed in [21] for this purpose we use a differenttechnique in this paper
41 Problems of Reducing Dimensionality Using Scaling RatioConceptually we may increase the scaling ratio (given inSection 2) to reduce the number of descriptors duringcomputing them For example by increasing the scalingratio from 16 to 256 the number of descriptors is reducedby 16 times Unfortunately this approach does not yieldsatisfactory results because of insufficient time resolutionRecall that the descriptors are derived based on thewindowedwaveform Therefore if two segments of the soundtrack are
The Scientific World Journal 5
Query
Database
Dim reduction
In database
Reject Result
Threshold
Candidate
MPEG-7descriptors
Fullresolution
search
Lowresolution
search
YesNo
Figure 3 Procedure for multiresolution music identification
Descriptors
Descriptors Descriptors1198861198991 11988611989924 119886119899+11 119886119899+124
1199021198991 11990211989924
Figure 4 The audio samples and the descriptors In a real applica-tion 119886
119899119896is stored in the database whereas 119902
119899119896is computed from
the query input Usually 119886119899119896
is not equal to 119902119899119896
due to differentscopes of the windows even though they are all derived from thesame soundtrack
not highly similar their corresponding descriptors usuallyhave large differences With a scaling ratio of 256 eachdescriptor represents around 77 seconds of audio samplesTherefore unless the query input also starts at a point veryclose to the segment boundary of a soundtrack a comparisonbased on these descriptors may fail As illustrated in Figure 4descriptors of 119886
119899119896(or 119886119899+1119896
) and 119902119899119896
are quite different andtherefore the descriptor-based identification scheme cannotidentify the query inputTherefore a suitable time resolutionfor example 048 second should be maintained
42 Proposed Dimensionality Reduction Method In contrastto reduce the number of descriptors by increasing the scalingratio we may reduce them in each temporal-spectral block(ie the matrix in Figure 1) representing a segment of (15-second) music However to maintain a high time resolutionsuccessive temporal-spectral block should have a small timedifference as shown in Figure 5 With this arrangement weachieve both high identification rate and low comparisoncomplexity
For each temporal-spectral block we use PCA (princi-pal component analysis) [21 24] to reduce the number ofdescriptors Since it is difficult to directly use PCA to obtaingood low-resolution descriptors [21] we use an alternativeapproach Its basic idea is to partition the entire block into
119860 =
middot middot middot
119886124
11988621 119886224
119886311
119886321
1198863124
middot middot middot
11988611
11988631 119886324middot middot middot
middot middot middot
middot middot middot
middot middot middot
1198863224
119902331 1199023324
middot middot middot
middot middot middot
middot middot middot 1198866224119886621
11989111 11989118
11989121 11989128
11989131 11989138
Figure 5 Low-resolution descriptors In the figure each temporal-spectral block corresponds to descriptors from 15-second music
119860 = =11986011 1198601211986021 11986022
middot middot middot middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
11988611 11988618
119886151
119886161
119886291
119886158
119886168
119886298
11988619
119886159
119886169
119886299
119886116
1198861516
1198861616
1198862916 1198862917 1198862924
119886124119886117
Figure 6 Partition of a temporal-spectral block into four subblocksin the experiments
four subblocks as shown in Figure 6 and then use PCA toreduce the number of descriptors of each subblock into twovalues (descriptors) called low-resolution descriptors Alsohigh-frequency descriptors in the original temporal-spectralblock are all discarded because they are susceptible to noise[21] With this arrangement totally eight descriptors are usedto represent a segment of 15-second music Note that theactual duration of a segment used in the experiments is 1404seconds (though we still say 15 seconds) because some audiosamples are lost after MP-3 compression and decompressionin the experiments
6 The Scientific World Journal
Table 2 Identification accuracy using high-resolution features
Uncompressed 192 k MP-3 96 k MP-3High-frequencydescriptors Used Not used Used Not used Used Not used
Match in firstone 100 100 100 100 94 100
Match in first 15 100 100 100 100 98 100
Wenowdescribe how to calculate low-resolution descrip-tors Since using PCA for dimensionality reduction is a well-known approach we only describe how to construct thecovariance matrix for PCA computation and omit the com-putation of finding principal components Suppose that thereare N segments of music pieces in the database with theirdescriptor matrices denoted as 119860(1) to 119860(119873) To simplify theargument we consider the subblock matrix119860(119896)
11(referring to
Figure 6) in the following Other subblocks can be computedby the same procedure First collect all subblocks from 119860
(119896)
11
to form a big matrix as follows
119861 =
[[[
[
119886(1)
11sdot sdot sdot 119886
(1)
18119886(2)
11sdot sdot sdot 119886
(2)
18sdot sdot sdot 119886(119873)
11sdot sdot sdot 119886(119873)
18
sdot sdot sdot
119886(1)
151sdot sdot sdot 119886(1)
158119886(2)
151sdot sdot sdot 119886(2)
158sdot sdot sdot 119886(119873)
151sdot sdot sdot 119886(119873)
158
]]]
]
= [
[
uarr sdot sdot sdot uarr
b1sdot sdot sdot b8119873
darr sdot sdot sdot darr
]
]
(5)
Then we may consider column vectors of matrix B as b119894
vectors Having the data vectors b119894 the covariance matrix
and subsequently the principal components can be foundBy keeping only two principal components we can obtainc119894= [11988811989411198881198942] from b
119894 Since there are eight column vectors in
a subblock descriptors in119860(119896)11
are reduced to eight c119894(8119896minus7 le
119894 le 8119896) vectors Next we can rearrange the obtained c119894vectors
as
119862 =[[
[
11988811
11988812
11988891
11988892
sdot sdot sdot 1198888119873minus71
1198888119873minus72
sdot sdot sdot
11988881
11988882
119888161
119888162
sdot sdot sdot 11988881198731
11988881198732
]]
]
(6)
Again by treating each row of matrix C as the data to beprocessed by PCA we can compute the covariance matrixand finally reduce each column vector to one value Sincethere are two columns originally from one subblock thereare totally two (low-resolution) descriptors per subblockEquivalently a segment of 15-second music is representedby eight descriptors During the computation the principalcomponents obtained in the first and the second steps shouldbe stored in the database When a query input (in the formof high-resolution descriptors) is supplied these componentsare used to obtain the low-resolution descriptors for theinput
Table 3 Comparison of identification accuracies using low-resolution descriptors obtained by the proposed approach and bythe averaging (avg) method [21]
Uncompressed 192 k MP-3 96 k MP-3Method Proposed Avg Proposed Avg Proposed AvgMatch in firstone 993 995 991 993 901 898
Match in first15 100 100 100 100 996 995
5 Threshold to Determine the Membership ofthe Query Input
Since in practice the database cannot collect all musicsoundtracks in the world we have to have a strategy todetermine if the query input is actually in the database ornot In our case the decision is accomplished with the aid ofa threshold If the shortest distances between the input andthe candidates are greater than a threshold then the queryinput is not in the database otherwise it is in the databaseThough the concept is simple it is not trivial to determinethe threshold [20]
Recall that when a query input is applied to the proposedsystem the system computes the Euclidean distances betweenthe low-resolution descriptors of the query input and thosein the database Accordingly m (say m = 20) segments ofsoundtracks from distinct titles are selected based on thecomputed distances Next the Euclidean distances betweenthe query and the selected segments are computed againusing full-resolution descriptors By sorting the distancesfrom small to large the system creates a candidate list for thequery input
To compute the previously mentioned threshold weexamine two different methods In the first method (methodI) we define the ldquofirstrdquo distance 119889
1as the (full-resolution)
distance associated with the first candidate in the list Simi-larly the ldquosecondrdquo distance 119889
2is the distance associated with
the second candidate as shown in Figure 7 The methodto compute the first and second distances is used both intraining and in identification For the training phase assumethat there are 119873
119879soundtracks used For a segment from a
soundtrack indexed n let the first distance be 1198891(119899) and the
second distance be 1198892(119899) The threshold 119879
119860is then computed
as
119879119860= 05 sdot (
119873119879
sum
119899=1
1198891(119899) +
119873119879
sum
119899=1
1198892(119899)) (7)
During the identification phase the first distance forthe query input is computed If the first distance is smallerthan the threshold 119879
119860 the query input is determined to be
highly similar to the top soundtrack title in the candidate listOtherwise the query input is not inside the database
In addition to method I we also propose an alternativemethod (method II) to compute the first and seconddistances
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 5
Query
Database
Dim reduction
In database
Reject Result
Threshold
Candidate
MPEG-7descriptors
Fullresolution
search
Lowresolution
search
YesNo
Figure 3 Procedure for multiresolution music identification
Descriptors
Descriptors Descriptors1198861198991 11988611989924 119886119899+11 119886119899+124
1199021198991 11990211989924
Figure 4 The audio samples and the descriptors In a real applica-tion 119886
119899119896is stored in the database whereas 119902
119899119896is computed from
the query input Usually 119886119899119896
is not equal to 119902119899119896
due to differentscopes of the windows even though they are all derived from thesame soundtrack
not highly similar their corresponding descriptors usuallyhave large differences With a scaling ratio of 256 eachdescriptor represents around 77 seconds of audio samplesTherefore unless the query input also starts at a point veryclose to the segment boundary of a soundtrack a comparisonbased on these descriptors may fail As illustrated in Figure 4descriptors of 119886
119899119896(or 119886119899+1119896
) and 119902119899119896
are quite different andtherefore the descriptor-based identification scheme cannotidentify the query inputTherefore a suitable time resolutionfor example 048 second should be maintained
42 Proposed Dimensionality Reduction Method In contrastto reduce the number of descriptors by increasing the scalingratio we may reduce them in each temporal-spectral block(ie the matrix in Figure 1) representing a segment of (15-second) music However to maintain a high time resolutionsuccessive temporal-spectral block should have a small timedifference as shown in Figure 5 With this arrangement weachieve both high identification rate and low comparisoncomplexity
For each temporal-spectral block we use PCA (princi-pal component analysis) [21 24] to reduce the number ofdescriptors Since it is difficult to directly use PCA to obtaingood low-resolution descriptors [21] we use an alternativeapproach Its basic idea is to partition the entire block into
119860 =
middot middot middot
119886124
11988621 119886224
119886311
119886321
1198863124
middot middot middot
11988611
11988631 119886324middot middot middot
middot middot middot
middot middot middot
middot middot middot
1198863224
119902331 1199023324
middot middot middot
middot middot middot
middot middot middot 1198866224119886621
11989111 11989118
11989121 11989128
11989131 11989138
Figure 5 Low-resolution descriptors In the figure each temporal-spectral block corresponds to descriptors from 15-second music
119860 = =11986011 1198601211986021 11986022
middot middot middot middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
middot middot middot
11988611 11988618
119886151
119886161
119886291
119886158
119886168
119886298
11988619
119886159
119886169
119886299
119886116
1198861516
1198861616
1198862916 1198862917 1198862924
119886124119886117
Figure 6 Partition of a temporal-spectral block into four subblocksin the experiments
four subblocks as shown in Figure 6 and then use PCA toreduce the number of descriptors of each subblock into twovalues (descriptors) called low-resolution descriptors Alsohigh-frequency descriptors in the original temporal-spectralblock are all discarded because they are susceptible to noise[21] With this arrangement totally eight descriptors are usedto represent a segment of 15-second music Note that theactual duration of a segment used in the experiments is 1404seconds (though we still say 15 seconds) because some audiosamples are lost after MP-3 compression and decompressionin the experiments
6 The Scientific World Journal
Table 2 Identification accuracy using high-resolution features
Uncompressed 192 k MP-3 96 k MP-3High-frequencydescriptors Used Not used Used Not used Used Not used
Match in firstone 100 100 100 100 94 100
Match in first 15 100 100 100 100 98 100
Wenowdescribe how to calculate low-resolution descrip-tors Since using PCA for dimensionality reduction is a well-known approach we only describe how to construct thecovariance matrix for PCA computation and omit the com-putation of finding principal components Suppose that thereare N segments of music pieces in the database with theirdescriptor matrices denoted as 119860(1) to 119860(119873) To simplify theargument we consider the subblock matrix119860(119896)
11(referring to
Figure 6) in the following Other subblocks can be computedby the same procedure First collect all subblocks from 119860
(119896)
11
to form a big matrix as follows
119861 =
[[[
[
119886(1)
11sdot sdot sdot 119886
(1)
18119886(2)
11sdot sdot sdot 119886
(2)
18sdot sdot sdot 119886(119873)
11sdot sdot sdot 119886(119873)
18
sdot sdot sdot
119886(1)
151sdot sdot sdot 119886(1)
158119886(2)
151sdot sdot sdot 119886(2)
158sdot sdot sdot 119886(119873)
151sdot sdot sdot 119886(119873)
158
]]]
]
= [
[
uarr sdot sdot sdot uarr
b1sdot sdot sdot b8119873
darr sdot sdot sdot darr
]
]
(5)
Then we may consider column vectors of matrix B as b119894
vectors Having the data vectors b119894 the covariance matrix
and subsequently the principal components can be foundBy keeping only two principal components we can obtainc119894= [11988811989411198881198942] from b
119894 Since there are eight column vectors in
a subblock descriptors in119860(119896)11
are reduced to eight c119894(8119896minus7 le
119894 le 8119896) vectors Next we can rearrange the obtained c119894vectors
as
119862 =[[
[
11988811
11988812
11988891
11988892
sdot sdot sdot 1198888119873minus71
1198888119873minus72
sdot sdot sdot
11988881
11988882
119888161
119888162
sdot sdot sdot 11988881198731
11988881198732
]]
]
(6)
Again by treating each row of matrix C as the data to beprocessed by PCA we can compute the covariance matrixand finally reduce each column vector to one value Sincethere are two columns originally from one subblock thereare totally two (low-resolution) descriptors per subblockEquivalently a segment of 15-second music is representedby eight descriptors During the computation the principalcomponents obtained in the first and the second steps shouldbe stored in the database When a query input (in the formof high-resolution descriptors) is supplied these componentsare used to obtain the low-resolution descriptors for theinput
Table 3 Comparison of identification accuracies using low-resolution descriptors obtained by the proposed approach and bythe averaging (avg) method [21]
Uncompressed 192 k MP-3 96 k MP-3Method Proposed Avg Proposed Avg Proposed AvgMatch in firstone 993 995 991 993 901 898
Match in first15 100 100 100 100 996 995
5 Threshold to Determine the Membership ofthe Query Input
Since in practice the database cannot collect all musicsoundtracks in the world we have to have a strategy todetermine if the query input is actually in the database ornot In our case the decision is accomplished with the aid ofa threshold If the shortest distances between the input andthe candidates are greater than a threshold then the queryinput is not in the database otherwise it is in the databaseThough the concept is simple it is not trivial to determinethe threshold [20]
Recall that when a query input is applied to the proposedsystem the system computes the Euclidean distances betweenthe low-resolution descriptors of the query input and thosein the database Accordingly m (say m = 20) segments ofsoundtracks from distinct titles are selected based on thecomputed distances Next the Euclidean distances betweenthe query and the selected segments are computed againusing full-resolution descriptors By sorting the distancesfrom small to large the system creates a candidate list for thequery input
To compute the previously mentioned threshold weexamine two different methods In the first method (methodI) we define the ldquofirstrdquo distance 119889
1as the (full-resolution)
distance associated with the first candidate in the list Simi-larly the ldquosecondrdquo distance 119889
2is the distance associated with
the second candidate as shown in Figure 7 The methodto compute the first and second distances is used both intraining and in identification For the training phase assumethat there are 119873
119879soundtracks used For a segment from a
soundtrack indexed n let the first distance be 1198891(119899) and the
second distance be 1198892(119899) The threshold 119879
119860is then computed
as
119879119860= 05 sdot (
119873119879
sum
119899=1
1198891(119899) +
119873119879
sum
119899=1
1198892(119899)) (7)
During the identification phase the first distance forthe query input is computed If the first distance is smallerthan the threshold 119879
119860 the query input is determined to be
highly similar to the top soundtrack title in the candidate listOtherwise the query input is not inside the database
In addition to method I we also propose an alternativemethod (method II) to compute the first and seconddistances
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 The Scientific World Journal
Table 2 Identification accuracy using high-resolution features
Uncompressed 192 k MP-3 96 k MP-3High-frequencydescriptors Used Not used Used Not used Used Not used
Match in firstone 100 100 100 100 94 100
Match in first 15 100 100 100 100 98 100
Wenowdescribe how to calculate low-resolution descrip-tors Since using PCA for dimensionality reduction is a well-known approach we only describe how to construct thecovariance matrix for PCA computation and omit the com-putation of finding principal components Suppose that thereare N segments of music pieces in the database with theirdescriptor matrices denoted as 119860(1) to 119860(119873) To simplify theargument we consider the subblock matrix119860(119896)
11(referring to
Figure 6) in the following Other subblocks can be computedby the same procedure First collect all subblocks from 119860
(119896)
11
to form a big matrix as follows
119861 =
[[[
[
119886(1)
11sdot sdot sdot 119886
(1)
18119886(2)
11sdot sdot sdot 119886
(2)
18sdot sdot sdot 119886(119873)
11sdot sdot sdot 119886(119873)
18
sdot sdot sdot
119886(1)
151sdot sdot sdot 119886(1)
158119886(2)
151sdot sdot sdot 119886(2)
158sdot sdot sdot 119886(119873)
151sdot sdot sdot 119886(119873)
158
]]]
]
= [
[
uarr sdot sdot sdot uarr
b1sdot sdot sdot b8119873
darr sdot sdot sdot darr
]
]
(5)
Then we may consider column vectors of matrix B as b119894
vectors Having the data vectors b119894 the covariance matrix
and subsequently the principal components can be foundBy keeping only two principal components we can obtainc119894= [11988811989411198881198942] from b
119894 Since there are eight column vectors in
a subblock descriptors in119860(119896)11
are reduced to eight c119894(8119896minus7 le
119894 le 8119896) vectors Next we can rearrange the obtained c119894vectors
as
119862 =[[
[
11988811
11988812
11988891
11988892
sdot sdot sdot 1198888119873minus71
1198888119873minus72
sdot sdot sdot
11988881
11988882
119888161
119888162
sdot sdot sdot 11988881198731
11988881198732
]]
]
(6)
Again by treating each row of matrix C as the data to beprocessed by PCA we can compute the covariance matrixand finally reduce each column vector to one value Sincethere are two columns originally from one subblock thereare totally two (low-resolution) descriptors per subblockEquivalently a segment of 15-second music is representedby eight descriptors During the computation the principalcomponents obtained in the first and the second steps shouldbe stored in the database When a query input (in the formof high-resolution descriptors) is supplied these componentsare used to obtain the low-resolution descriptors for theinput
Table 3 Comparison of identification accuracies using low-resolution descriptors obtained by the proposed approach and bythe averaging (avg) method [21]
Uncompressed 192 k MP-3 96 k MP-3Method Proposed Avg Proposed Avg Proposed AvgMatch in firstone 993 995 991 993 901 898
Match in first15 100 100 100 100 996 995
5 Threshold to Determine the Membership ofthe Query Input
Since in practice the database cannot collect all musicsoundtracks in the world we have to have a strategy todetermine if the query input is actually in the database ornot In our case the decision is accomplished with the aid ofa threshold If the shortest distances between the input andthe candidates are greater than a threshold then the queryinput is not in the database otherwise it is in the databaseThough the concept is simple it is not trivial to determinethe threshold [20]
Recall that when a query input is applied to the proposedsystem the system computes the Euclidean distances betweenthe low-resolution descriptors of the query input and thosein the database Accordingly m (say m = 20) segments ofsoundtracks from distinct titles are selected based on thecomputed distances Next the Euclidean distances betweenthe query and the selected segments are computed againusing full-resolution descriptors By sorting the distancesfrom small to large the system creates a candidate list for thequery input
To compute the previously mentioned threshold weexamine two different methods In the first method (methodI) we define the ldquofirstrdquo distance 119889
1as the (full-resolution)
distance associated with the first candidate in the list Simi-larly the ldquosecondrdquo distance 119889
2is the distance associated with
the second candidate as shown in Figure 7 The methodto compute the first and second distances is used both intraining and in identification For the training phase assumethat there are 119873
119879soundtracks used For a segment from a
soundtrack indexed n let the first distance be 1198891(119899) and the
second distance be 1198892(119899) The threshold 119879
119860is then computed
as
119879119860= 05 sdot (
119873119879
sum
119899=1
1198891(119899) +
119873119879
sum
119899=1
1198892(119899)) (7)
During the identification phase the first distance forthe query input is computed If the first distance is smallerthan the threshold 119879
119860 the query input is determined to be
highly similar to the top soundtrack title in the candidate listOtherwise the query input is not inside the database
In addition to method I we also propose an alternativemethod (method II) to compute the first and seconddistances
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 7
Firs
t dist
ance
Seco
nd d
istan
ce
Figure 7 The first and second distances in method I
and the threshold Referring to Figure 8 the first distance1198891015840
1(119899) of method II for a training segment in soundtrack n is
1198891015840
1(119899) = 119889
1(119899) divide
sum119872
119898=2119889119898(119899)
119872 minus 1 (8)
where 119889119898(119899) is the distance associatedwith themth candidate
in the list andM is a constant Based on our experimentM =10 is sufficient By the same arrangement the second distance1198891015840
2(119899) is calculated as
1198891015840
2(119899) = 119889
2(119899) divide
sum119872+1
119898=3119889119898(119899)
119872 minus 1 (9)
Similar to (7) the threshold 1198791015840119860is determined as
1198791015840
119860= 05 sdot (
119873119879
sum
119899=1
1198891015840
1(119899) +
119873119879
sum
119899=1
1198891015840
2(119899)) (10)
During identification phase if the computed distance 11988910158401
is greater than1198791015840119860 the query input is determined as not in the
database Otherwise the query input is highly similar to thetop soundtrack title in the candidate list
6 Experiments and Results
Before conducting the experiments we collect 750 sound-tracks frommany CD titles for constructing the database andfor identification In the experiments 30 seconds of musicis excerpted from the soundtracks as the reference items tobe identified Then low- and full-resolution descriptors ofthe reference items are calculated and stored in the databaseWe also randomly excerpt 15-second query items from thereference items Note that a query item may start fromany sample on the first half of the reference To test theidentification accuracy for compressed audio the 15-secondquery inputs are also encoded and then decoded with anMP-3 coder with bitrates of 192 k and 96 k respectivelyHowever the database only contains descriptors from theuncompressed items In addition the principal componentsare also obtained using uncompressed items
Table 4 Comparison of search time between 119896-119889 tree search andlinear search
119870-119889 tree Linear searchAverage time 0255 sec 323 sec
Several experiments are conducted to examine the per-formance of the proposed systemThefirst experiment checksthe identification accuracy using high-resolution descriptorsThe second one compares the relative identification accuracybetween the proposed method (in Section 42) and themethod given in [21] The third experiment compares thesearch speed by using the linear search and k-d tree searchThe next experiment intends to determine a suitable valueof M used in (8) and (9) Having the value of M we reportthe identification accuracy using a multiresolution strategyin experiment five In this experiment both method I andmethod II are examined To have a complete comparisonbetween method I and II we also report the results in termsof the ROC (receiver operating characteristics) [25] curveand the DET-like (detection error tradeoff) [26] curve butwithout logarithm
61 Experiment One Identification Accuracy Using High-Resolution Descriptors The first experiment is to examinethe identification accuracy using the high-resolutionMPEG-7 audio signature descriptors (without reduction) To evaluatethe influences of high-frequency descriptors we also exam-ine the accuracies with and without using high-frequency(greater than 25 kHz) descriptors The results are given inTable 2 In the table ldquomatch in first onerdquo is the rate that thequery input is correctly identified with the shortest distanceamong all reference items Similarly ldquomatch in first 15rdquo is therate that the query is correctly identified within a list of 15 ref-erence items sorted by distanceThe results indicate that high-resolution descriptors have very good identification accuracyAlso for uncompressed or 192 k MP-3 items whether usinghigh-frequency descriptors does not affect the identificationaccuracy However for 96 k MP-3 items discarding high-frequency descriptors greatly improves the accuracy
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
8 The Scientific World Journal
Seco
nd d
istan
ce
Firs
t dist
ance
AvgAvg
dividedivide
Figure 8 The first and second distances in method II
62 Experiment Two Identification Accuracy Using Low-Resolution Descriptors The second experiment compares therelative identification accuracies of the proposed dimension-ality reduction technique and the averaging method pro-posed in [21] In this experiment eight descriptors are used torepresent one segment of music (15 seconds) The results areshown in Table 3 For this experiment the accuracy rate inldquomatch in first 15rdquo ismore important than the figure in ldquomatchin first onerdquo because the low-resolution descriptors are usedto produce a candidate list Therefore it is acceptable as longas the correct item is within the list as the high-resolutiondescriptors are to be used in the second stage In this regardthe proposed approach is slightly better than the averagingmethod proposed in [21] for 96 k MP-3 query inputs
63 Experiment Three Time for k-d Tree Search and LinearSearch The third experiment compares the time to generatea list of 20 candidates by using k-d tree search and linear(exhausted) search for low-resolution descriptors The com-puting time is obtained by averaging the required time in tentrials For each trial 500 query items are randomly selectedamong the 750 available items The experimental platform isa personal computer with a 16GHz PentiumCPU and 15 GBmemory The program is written in C with a Borland C++compiler The results are shown in Table 4 The results showthat we can increase the search time by 10-fold if the low-resolution descriptors are embedded in the k-d tree structure
64 Experiment Four Determination of M in Method II Thefourth experiment is to determine the numberM in methodII Recall that the thresholds in both methods are computedbased on high-resolution descriptors To determine the opti-mal value of M we vary this value from one to eighteen in(8) and (9) to compute the threshold 1198791015840
119860and then to perform
identification Note that method II is degenerated to methodI if M = 1 In this experiment 96 k MP-3-coded items areused as the query In addition we deliberately keep the high-frequency descriptors in computing the distances becausekeeping them makes it easier to examine the influence of Mversus identification rate (defined in experiment five ) Theexperimental results are given in Figure 9 The results showthat whenM is equal or greater than 10 the identification rate
085408560858
0860862086408660868
087087208740876
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Iden
tifica
tion
rate
119872 value
Figure 9The identification rate versus the numberM for 96 kMP-3query inputs
remains almost constant Therefore M = 10 is sufficient formethod II This value is then used in experiment five
65 Experiment Five Comparison between Method I andMethod II The fifth experiment compares the identificationrates between method I and method II The thresholds arecalculated using (7) and (10) respectively In this experiment375 of the 750 15-second items are randomly chosen fortraining (ie to obtain the thresholds) The rest of 375 itemsare for query Note that all of the items are excerpted from thereference items in the database In addition we test other 500query items (also with duration of 15 sec) not excerpted fromdatabase references
Since some query inputs are not in the database wealso need to consider the situations of falsely identifying anoutside database item as one inside the database and viceversa Suppose that 119879
119889query items are actually inside the
database and 119879119900query items are not inside the database
Assume that the identification system correctly identifies119873119889
inside-database query item erroneously rejects 119873119903inside-
database query items and erroneously accepts 119873119886outside
database query items Then the IDR (identification rate)
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 9
Table 5 Comparison of identification performance using method I and method II with the use of high-frequency descriptors
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 988 999 99 997 841 874FAR 21 02 18 02 04 02FRR 00 00 00 05 366 296ACC 984 999 989 995 718 768
Table 6 Comparison of method I and method II for IDR FAR and FRR with high-frequency descriptors removed
Uncompressed 192 k MP-3 96 k MP-3Method I Method II Method I Method II Method I Method II
IDR 991 997 989 997 990 997FAR 15 02 19 02 16 02FRR 03 04 02 04 03 06ACC 987 995 985 995 986 994
FAR (false-accept rate) and FRR (false-rejection rate) arecomputed as follows
IDR =119873119889
119879119889minus 119873119903
(11)
FAR =119873119886
119879119900
(12)
FRR =119873119903
119879119889
(13)
In addition the accuracy of the system is computed as
ACC =119873119889+ 119873119900
119879119889+ 119879119900
(14)
where119873119900is the number of query items correctly identified as
outside-database items and can be computed as119873119900= 119879119900minus119873119886
In this experiment the low-resolution descriptors areused to find 20 candidates Next high-resolution descriptorsare used to find the best-matched one in the list If thedistance associated with the best-match is greater than thethreshold the query input is determined as not in thedatabase Otherwise the query input is identified as the best-matched one In this experiment high-frequency descriptorsare kept in the comparison With the use of (11) to (14) theaverage values after four trials are given in Table 5 Fromthe results we know that the system does not perform wellfor identifying 96 k MP-3 query items This is mainly dueto high distortion in high-frequency descriptors A similarphenomenon can also be observed in Table 2 Later onwe will repeat this experiment but discard high-frequencydescriptors
To further examine the performance differences betweenmethod I and method II we again use 96 k MP-3 items asthe query inputs But this time we vary the threshold values(for both methods) and then record the IDR FAR and FRRvalues The results are presented as ROC curves [25] andDET-like curves [26] (but without logarithm) as shown in
Figures 10 and 11 respectively The ROC curve plots FARversus TPR (true positive rate) which is defined as
TPR =119873119889
119879119889
(15)
Conceptually a better classifier should have a higher TPRfor a fixed FAR Therefore a better classifier should havean ROC curve closer to the left-upper corner With thisinterpretation from Figure 10 we know that method II is abetter classifier
In this experiment we also show DET-like curves Aldquotruerdquo DET curve plots FAR versus FRR using logarithmscales In our case however we do not use logarithm scales toexhibit the similarity between ROC and DET curves Againconceptually we know that a better classifier should have alower FRR over a fixed FAR Thus a better classifier shouldhave a curve closer to the left-bottom corner By examiningFigure 11 we again confirm thatmethod II is a better method
Since the high-frequency descriptors actually affect theidentification rate at low bitrates we again repeat this exper-iment without using high-frequency descriptors The resultsare given in Table 6 By comparing the results in Tables 5 and6 we know that removing high-frequency descriptors slightlyreduces the accuracy of uncompressed items However itgreatly improves the accuracy for 96 k MP-3 items Sincemethod II has an accuracy of at least 994 in all cases it ishighly plausible to identify copyrighted audiomaterials usingthis method
7 Conclusion
In this paper we propose a system using a multiresolutionstrategy to identify whether a piece of unknown music isidentical to one of the pieces in the database In the systemhigh-resolution descriptors are MPEG-7 audio signaturedescriptors and the low-resolution descriptors are obtainedfrom high-resolution descriptors with the aid of PCA toreduce their dimensionality Experimental results show that
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
10 The Scientific World Journal
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
TPR
()
ROC curve (96 k MP-3)
Method IMethod II
Figure 10 The ROC curves for both methods
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
FAR ()
FRR
()
Method IMethod II
DET curve (96 k MP-3)
Figure 11 The DET-like curves without logarithm for both meth-ods
low-resolution descriptors still have high identification accu-racies To reduce the time to generate the candidate list weuse the k-d tree structure to store low-resolution descriptorsExperimental results show that using k-d tree structureincreases the search time by ten-folds Since not every pieceof query input is within the database we also proposed twomethods to determine the distance thresholds Experimentalresults show that the proposed method II provides an accu-racy of 994 Therefore the proposed system can be usedin real applications such as identifying copyrighted audiofiles circulated over the Internet As it can be easily extendedto operate with multiple computers the proposed system
is a plausible starting point to construct a large operabledatabase
Acknowledgment
This work was supported in part by National Science Councilof Taiwan through Grants NSC 94-2213-E-027-042 and 99-2221-E-027-097
References
[1] For a brief description of the bill please refer to httpenwikipediaorgwikiSOPA
[2] R Eklund ldquoAudio watermarking techniquesrdquo httpwwwmusemagiccompaperswatermarkhtml
[3] J S R Jang and H R Lee ldquoA general framework of progressivefiltering and its application to query by singinghummingrdquoIEEE Transactions on Audio Speech and Language Processingvol 16 no 2 pp 350ndash358 2008
[4] S Baluja and M Covell ldquoAudio fingerprinting combiningcomputer visionamp data streamprocessingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo07) pp II213ndashII216 Honolulu HawaiiUSA April 2007
[5] Shazam httpwwwshazamcommusicwebhomehtml[6] A Wang ldquoThe Shazam music recognition servicerdquo Communi-
cations of the ACM vol 49 no 8 pp 44ndash48 2006[7] V Chandrasekha M Sharifi and D A Ross ldquoSurvey and eval-
uation of audio fingerprinting schemes for mobile query-by-example applicationsrdquo in Proceedinjgs of the 12th InternationalConference onMusic Information Retrieval pp 801ndash806MiamiFla USA October 2011
[8] J A Haitsma and T Kalker ldquoA highly robust audio fingerprint-ing systemrdquo in Proceedinjgs of the 12th International Conferenceon Music Information Retrieval pp 107ndash115 Paris FranceOctober 2002
[9] C J C Burges J C Platt and S Jana ldquoDistortion discriminantanalysis for audio fingerprintingrdquo IEEE Transactions on Speechand Audio Processing vol 11 no 3 pp 165ndash174 2003
[10] Official website httpmusicbrainzorg[11] Official website httpwwwaudiblemagiccom[12] Official website httpwwwgracenotecom[13] httpenwikipediaorgwikiAcoustic fingerprint[14] P J O Doets M M Gisbert and R L Lagendijk ldquoOn
the comparison of audio fingerprints for extracting qualityparameters of compressed audiordquo in Security Steganographyand Watermarking of Multimedia Contents VIII vol 6072 ofProceedings of SPIE pp 60720L-1ndash60720L-12 San Jose CalifUSA January 2006
[15] F Nack and A T Lindsay ldquoEverything you wanted to knowaboutMPEG-7 part 1rdquo IEEEMultimedia vol 6 no 3 pp 65ndash771999
[16] F Nack and A Lindsay ldquoEverything you wanted to know aboutMPEG-7 part 2rdquo IEEEMultimedia vol 6 no 4 pp 64ndash73 1999
[17] ISOIEC Information TechnologymdashMultimedia ContentDescription Interface -Part 4 Audio IS 15938-4 2002
[18] H Crysandt ldquoMusic identification with MPEG-7rdquo in Proceed-ings of the 115th AES Convention October 2003 Paper 5967
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 11
[19] OHellmuth E AllamanceMCremerHGrossmann J Herreand T Kastner ldquoUsing MPEG-7 audio fingerprinting in real-world applicationrdquo in Proceedings of the 115th AES ConventionOctober 2003 Paper 5961
[20] H-G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond John Wiley amp Sons 2005
[21] J-Y Lee and S D You ldquoDimension-reduction technique forMPEG-7 Audio Descriptorsrdquo in Proceedings of the 6th PacificRim Conference on Multimedia (PCM rsquo05) vol 3768 of LectureNotes in Computer Science pp 526ndash537 2005
[22] J L Bentley ldquoMultidimensional binary search trees used forassociative searchingrdquo Communications of the ACM vol 18 no9 pp 509ndash517 1975
[23] Open source program for k-d tree httpwwwcsumdedusimmountANN
[24] J Shlens ldquoA tutorial on principal component analysisrdquohttpwwwsnlsalkedusimshlenspcapdf
[25] T Fawcell ldquoROC graphs notes and practical considerations forresearchersrdquo Tech Rep HP Lab 2004
[26] A Martin G Doddington T Kamm M Ordowski and MPrzybocki ldquoThe DET curve in assessment of detection taskperformancerdquo in Proceedings of the Eurospeech vol 4 pp 1899ndash1903 Rhodes Greece September 1997
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014