HAL Id: hal-01161319
https://hal.archives-ouvertes.fr/hal-01161319
Submitted on 8 Jun 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version: Geoffroy Peeters, Stephen McAdams, Perfecto Herrera. Instrument Sound Description in the Context of MPEG-7. ICMC: International Computer Music Conference, Sep 2000, Berlin, Germany. pp. 166-169. hal-01161319

Instrument Sound Description in the Context of MPEG-7



Proceedings of ICMC 2000 (International Computer Music Conference), Berlin, Germany, August 27th - September 1st, 2000

Geoffroy Peeters
IRCAM (Analysis/Synthesis Team)
1, pl. Igor Stravinsky, F-75004 Paris, France

Stephen McAdams
IRCAM / CNRS (Musical Perception and Cognition)
1, pl. Igor Stravinsky, F-75004 Paris, France

Perfecto Herrera
IUA / UPF
31, Rambla, E-08002 Barcelona, Spain

ABSTRACT

We review a proposal made in the framework of MPEG-7 for the description of instrument sounds based on perceptual features. Results derived from experiments dealing with the perception of musical timbre allow us to approximate the main underlying perceptual dimensions from signal descriptors. A combination of these signal descriptors allows a comparison between sounds in terms of an estimated perceived timbral (dis)similarity. Since the proposed model is based on experiments using mainly synthesized sounds, a validation of the model was performed on a database composed of natural musical sounds.

INTRODUCTION

The increasing amount of multimedia data over networks, and the need for quick and reliable searches, exchanges and distant manipulations of multimedia databases, require an efficient and universally shared description of these media. MPEG-7 [MPEG-7, 2000] addresses this problem by defining a new ISO standard, to be approved in 2001, for the description of multimedia data.

In the sound field, the description of the media consists of a description of the format (encoding format, sampling rate, etc.), meta-information (author name, copyright owners, etc.), semantic information (type of musical event, name of the instrument recorded), but also of a description of the audio content itself. This description of the sound itself should allow a specification of each sound in the database independently of its taxonomy. Although many variables can be proposed in order to achieve this [Wold et al., 1999] [Martin and Kim, 1998], relying on human perception of sounds allows us to define the most important ones.

In this paper, we present a proposal made in the framework of MPEG-7 for the description of sounds based on perceptual features.

Derived from the results of IRCAM's Musical Perception and Cognition and Analysis/Synthesis teams, we define for each class of sounds a set of variables allowing a description of their most essential perceptual features.

We then discuss the representation of this description and its evaluation in the context of a real database. This evaluation was made in a collaboration between IRCAM and IUA/UPF.

1 INSTRUMENT SOUND DESCRIPTION BASED ON PERCEPTUAL FEATURES

Perception of sounds has been studied systematically since Helmholtz. It is now well accepted that sounds can be described in terms of their pitch, loudness, subjective duration, and something called "timbre". "Timbre" refers to the features that allow one to distinguish two sounds that are equal in pitch, loudness, and subjective duration. The underlying perceptual mechanisms are rather complex. They involve taking into account several perceptual dimensions at the same time, in a possibly complex way. "Timbre" is thus a multi-dimensional feature which includes, among others, spectral envelope, temporal envelope, and variations of each of them. In order to better understand what the "timbre" feature refers to, numerous experiments ([Plomp, 1970, 1976], [Wedin & Goude, 1972], [Wessel, 1979], [Miller & Carterette, 1975], [Grey, 1977], [Krumhansl, 1989], [McAdams et al., 1995], [Lakatos, 2000]) have been performed.

For the purposes of the description of sounds based on perceptual features in MPEG-7, we rely on three of these experiments: 1) Krumhansl's [Krumhansl, 1989] and 2) McAdams, Winsberg, Donnadieu, De Soete & Krimphoff's [McAdams et al., 1995] experiments, which used the 21 FM-synthetic sounds from [Wessel et al., 1987], mainly sustained harmonic sounds, and 3) Lakatos [Lakatos, 2000] (re-analyzed by [McAdams and Winsberg, 2000]), who used 36 sounds from the McGill University sound library, both harmonic (18) and percussive (18) sounds, although we used only the results of the percussive part.

1.1 Signal descriptors for the perceptual features

In all of these experiments, people were asked for a (dis)similarity judgment on pairs of sounds. These judgments were then used, through a Multidimensional Scaling (MDS) analysis, to represent the stimuli in a low-dimensional space revealing the underlying attributes used by listeners when making the judgments. People often refer to this low-dimensional representation as a "Timbre Space" (see Figure 1).
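The MDS step described above can be sketched in a few lines. The following is only an illustration of the principle (classical Torgerson MDS on a made-up 4-sound dissimilarity matrix), not the CLASCAL/EXSCAL analyses actually used in the cited experiments:

```python
import numpy as np

def classical_mds(dissim, k=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean
    distances approximate the given pairwise dissimilarities."""
    n = dissim.shape[0]
    d2 = dissim ** 2
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ d2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]         # keep the top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Illustrative dissimilarity judgments for 4 sounds (not experimental
# data): sounds 0/1 are judged similar, as are sounds 2/3.
dissim = np.array([
    [0.0, 1.0, 3.0, 3.2],
    [1.0, 0.0, 2.8, 3.0],
    [3.0, 2.8, 0.0, 0.9],
    [3.2, 3.0, 0.9, 0.0],
])
coords = classical_mds(dissim, k=2)   # one 2-D "timbre space" point per sound
print(coords.shape)                   # → (4, 2)
```

In the resulting space, sounds judged similar end up close together, which is what makes the axes interpretable as perceptual dimensions.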

For each of these experiments, people have tried to qualify the dimensions of these timbre spaces, the perceptual axes, in terms of "brightness", "attack", etc. Only recently [Grey and Gordon, 1978] [Krimphoff et al., 1994] [Misdariis et al., 1998] have attempts been made to describe these perceptual axes quantitatively, i.e. to relate the perceptual axes to variables derived directly from the signal: signal descriptors.

1.1.1 Quantitative description of the perceptual axes

In the quantitative description of the perceptual axes, the position of each sound in the timbre space is explained using signal descriptor values.

For each axis, the descriptor(s) that best explain(s) the variance of the position of the sounds on this axis is (are) chosen. A mathematical relation between the position on the axis and the value of this (these) descriptor(s) is obtained by a linear regression (multiple regression) method. The position of a sound in the Timbre Space can then be approximated by a linear combination of the descriptor values of each axis. Assuming the orthogonality of the dimensions of the space, perceptual (dis)similarity can be approximated by use of a Euclidean distance model.
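A minimal sketch of this regression step, with synthetic data standing in for the real axis positions and descriptor values (the descriptors, weights and noise level here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 sounds, 2 signal descriptors per sound
# (think log-attack time and spectral centroid; values are synthetic).
descriptors = rng.normal(size=(20, 2))

# Axis positions generated from a known linear rule plus a little noise,
# so the regression has something to recover.
true_w = np.array([1.5, -0.7])
axis_pos = descriptors @ true_w + 0.01 * rng.normal(size=20)

# Multiple regression: axis position ~ linear combination of descriptors.
X = np.column_stack([descriptors, np.ones(20)])     # add an intercept column
w, *_ = np.linalg.lstsq(X, axis_pos, rcond=None)
print(np.round(w[:2], 1))                           # ≈ [ 1.5 -0.7]
```

The fraction of axis variance explained by such a fit (the r values reported later in the paper) indicates how well the descriptor stands in for the perceptual dimension.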

1.1.2 Quantitative description of the perceptual distance

"Timbre", as it is currently understood, is a relative feature. In this sense, what we are interested in is more the description of the relative positions of the sounds than their absolute position in the Timbre Space.


[Figure 1 here: three scatter plots, "Timbre Space (Krumhansl)", "Timbre Space (McAdams)" and "Timbre Space (Lakatos)", showing instrument sounds plotted along dimensions dim1, dim2 and dim3.]
Figure 1: Krumhansl, McAdams et al. and Lakatos (re-analyzed by [McAdams and Winsberg, 2000]) Timbre Spaces

In the quantitative description of the perceptual distance, the perceived distance between sounds is explained by a linear combination of descriptor values. For this, the following system of equations is solved using a least-squares resolution:

d = M c

where

d is the vector composed of the perceived distances between all possible pairs of sounds (for I sounds, the length of d is I(I-1)/2),

M is a matrix whose element (i, j) is the square of the difference of the value of descriptor number j for the pair of sounds i,

c is the unknown vector of descriptor coefficients (the length of c is equal to the number of descriptors used).

For each choice of a set of descriptors composing M, the resulting approximation of the vector of perceived distances is compared to d. The best set of descriptors, i.e. the one with the largest correlation or the smallest modeling error, is chosen.
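This least-squares fit can be sketched directly. The data below are simulated (descriptor values and the "true" coefficients are invented), so the point is only the shape of M and the solve:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sounds, n_desc = 10, 3
desc = rng.normal(size=(n_sounds, n_desc))   # descriptor values per sound
true_c = np.array([2.0, 1.0, 0.5])           # hypothetical weighting coefficients

# Build M: one row per pair of sounds, one column per descriptor,
# each entry the squared descriptor difference for that pair.
pairs = [(i, j) for i in range(n_sounds) for j in range(i + 1, n_sounds)]
M = np.array([(desc[i] - desc[j]) ** 2 for i, j in pairs])   # shape (45, 3)

# Simulated perceived (squared) distances, consistent with the model.
d = M @ true_c

# Least-squares estimate of the descriptor coefficients.
c_hat, *_ = np.linalg.lstsq(M, d, rcond=None)
print(np.round(c_hat, 3))                    # recovers [2.0, 1.0, 0.5]
```

With real judgment data, d would come from listeners, and the correlation between M c_hat and d measures how well a given descriptor set explains the perceived distances.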

1.1.3 Selection of the audio descriptors

For the selection of the signal descriptors that best explain the perceptual features, both position and distance methods were used. The first one was used in order to check the independence of the descriptors and to justify the approximation of the perceived distance by a Euclidean model. Then the second one was used in order to estimate the descriptor coefficients.

During the selection, independence of the descriptor from the variation of the analysis parameters (size and shape of the analysis window, hop size) and robustness to the presence of noise in the signal were taken into account. Finally, among the various formulations of each descriptor (mean of the instantaneous values, rms value, maximum value, etc.), preference was given to the formulations allowing derivability of descriptor values from one temporal scale to another temporal scale (mean).

Harmonic Timbre Spaces (see Table): From the Krumhansl and McAdams et al. experiments, we found results similar to those of Krimphoff et al. and Misdariis et al., except for the third dimension of the McAdams et al. space which, due to the introduction of a new descriptor, the Harmonic Spectral Spread, is now explained at 75% (r=0.87).

After rotation ([dim2, -dim1, dim3]) and normalization (0.43) of the Krumhansl space, in order to make it comparable with the McAdams et al. space, the first dimension is best explained in both cases by Log-Attack-Time, and the second dimension in both cases by Harmonic Spectral Centroid. Harmonic Spectral Deviation best explains the third dimension of the Krumhansl space, while a combination of Harmonic Spectral Spread and Harmonic Spectral Variation best explains the third dimension of the McAdams et al. space. Orthogonality between the third dimensions of the Krumhansl and McAdams et al. spaces was therefore considered.

Percussive Timbre Space (see Table): For the Lakatos experiment (percussive part), the first dimension is best explained by a combination of the Log-Attack-Time and the Temporal Centroid, while the second dimension is best explained by the Spectral Centroid.
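Two of the harmonic descriptors named here are simple statistics of the harmonic peak amplitudes. As a single-frame illustration (the standard's definitions average these over the sound duration; the peak values below are toy numbers, not from any measured sound):

```python
import numpy as np

def harmonic_spectral_centroid(freqs, amps):
    """Amplitude-weighted mean frequency (linear scale) of the
    harmonic peaks of one analysis frame."""
    return float(np.sum(freqs * amps) / np.sum(amps))

def harmonic_spectral_spread(freqs, amps):
    """Amplitude-weighted standard deviation of the harmonic peak
    frequencies, normalized by the centroid."""
    c = harmonic_spectral_centroid(freqs, amps)
    var = np.sum(((freqs - c) ** 2) * amps) / np.sum(amps)
    return float(np.sqrt(var) / c)

# Toy frame: four harmonics of a 220 Hz tone with decaying amplitudes.
freqs = np.array([220.0, 440.0, 660.0, 880.0])
amps = np.array([1.0, 0.5, 0.25, 0.125])
hsc = harmonic_spectral_centroid(freqs, amps)
hss = harmonic_spectral_spread(freqs, amps)
print(round(hsc, 1))   # → 381.3
```

A brighter sound (more energy in high harmonics) pushes hsc up, which is why it tracks the "brightness" axis.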

1.1.4 Distance measure

Using these signal descriptors, each instrument sound can be described in terms of perceptual features and compared to other sounds according to an approximation of the perceived (dis)similarity using the following Euclidean distances:

Harmonic Timbre Space:

d(i,j) = sqrt( a_lat (Δlat)² + a_hsc (Δhsc)² + a_hsd (Δhsd)² + a_hss (Δhss)² + a_hsv (Δhsv)² )   (1)

Percussive Timbre Space:

d(i,j) = sqrt( b_lat (Δlat)² + b_tc (Δtc)² + b_sc (Δsc)² )   (2)

where Δlat, Δhsc, ... are the differences of values for the same descriptor computed on sound i and sound j, and a, b are the vectors of weighting coefficients given in 1.1.2.
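The distances above amount to a weighted Euclidean metric over descriptor differences. A small sketch (the weights and descriptor values below are placeholders, not the coefficients estimated in the paper):

```python
import numpy as np

# Placeholder coefficients for the harmonic-space distance of Eq. (1);
# the real values come from the least-squares fit of section 1.1.2.
HARMONIC_WEIGHTS = {"lat": 1.0, "hsc": 1.0, "hsd": 1.0, "hss": 1.0, "hsv": 1.0}

def timbre_distance(sound_a, sound_b, weights):
    """Weighted Euclidean distance between two sounds described as
    dicts mapping descriptor name -> value."""
    return float(np.sqrt(sum(w * (sound_a[k] - sound_b[k]) ** 2
                             for k, w in weights.items())))

# Two hypothetical sounds (descriptor values invented for illustration).
a = {"lat": -1.2, "hsc": 1200.0, "hsd": 0.1, "hss": 0.4, "hsv": 0.02}
b = {"lat": -0.8, "hsc": 900.0,  "hsd": 0.2, "hss": 0.5, "hsv": 0.05}
print(round(timbre_distance(a, b, HARMONIC_WEIGHTS), 2))   # → 300.0
```

Note that with unit weights the raw hsc difference (in Hz) dominates; the estimated coefficients serve precisely to balance descriptors of very different scales.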

1.1.5 Extraction of the signal descriptors

Extraction of the signal descriptors is depicted in Figure 2. For the Harmonic Timbre Space, it relies on the previous estimation of an energy function of the signal, the estimation of the fundamental frequency f0, and the detection of the harmonic components of the signal. For the Percussive Timbre Space, it relies on the previous estimation of an energy function of the signal and the estimation of the power spectrum of the signal. In practice, the global spectral envelope was computed as an average over the adjacent harmonic peaks, and the Log-Attack Time was computed as the logarithm of the time taken by the energy to go from a threshold of 2% of its maximum amplitude to 80% of its maximum amplitude.
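The two temporal descriptors can be sketched directly from an energy envelope. This follows the 2%/80% thresholds stated above, but the envelope itself is synthetic and the envelope-estimation step is assumed given:

```python
import numpy as np

def log_attack_time(energy, sr, low=0.02, high=0.8):
    """Decimal log of the time the energy envelope takes to rise
    from `low` to `high` of its maximum (2% and 80% here)."""
    env = energy / energy.max()
    t_low = np.argmax(env >= low) / sr       # first frame above 2%
    t_high = np.argmax(env >= high) / sr     # first frame above 80%
    return float(np.log10(max(t_high - t_low, 1.0 / sr)))

def temporal_centroid(energy, sr):
    """Energy-weighted mean time of the signal."""
    t = np.arange(len(energy)) / sr
    return float(np.sum(t * energy) / np.sum(energy))

# Synthetic percussive-like envelope: fast linear rise, exponential decay.
sr = 100                                     # envelope frames per second
t = np.arange(200) / sr
energy = np.minimum(t / 0.05, 1.0) * np.exp(-2.0 * t)
print(log_attack_time(energy, sr), temporal_centroid(energy, sr))
```

A short attack gives a strongly negative log-attack time, and a long decay pulls the temporal centroid later, which is why the two together separate struck from bowed or blown sounds.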

For validating the proposal, two pilot programs were created: one based on IRCAM's Studio OnLine tools ([Doval and Rodet, 1993], [Depalle et al., 1993]) and one on an extension of the SMS software ([Serra, 1997]).

Krumhansl and McAdams et al. experiment signal descriptors:

lat: Log-Attack Time. Defined as the logarithm (decimal base) of the duration from the time when the signal starts to the minimum between [the time when it reaches its maximum value, the time when it reaches its sustained part].

hsc: Harmonic spectral centroid. Defined as the average over the sound duration of the amplitude-weighted mean (linear scale) of the harmonic peaks of the spectrum.

hss: Harmonic spectral spread. Defined as the average over the sound duration of the amplitude-weighted standard deviation (linear scale) of the harmonic peaks of the spectrum, normalized by the hsc.

hsv: Harmonic spectral variation. Defined as the average over the sound duration of the one's complement of the normalized correlation between the amplitudes (linear scale) of the harmonic peaks of the spectrum of two adjacent frames.

hsd: Harmonic spectral deviation. Defined as the average over the sound duration of the deviation of the amplitudes (linear scale) of the harmonic peaks of the spectrum from a global spectral envelope.

Lakatos experiment (percussive part) signal descriptors:

lat: Log-Attack Time. Defined as the logarithm (decimal base) of the duration from the time when the signal starts to the time when it reaches its maximum value.

tc: Temporal centroid. Defined as the energy-weighted mean of the time of the signal.

sc: Spectral centroid. Defined as the amplitude-weighted mean of the power spectrum components.

[Figure 2 here: block diagrams of the extraction chains. Harmonic Timbre Space: signal → analysis window → energy (→ lat) and STFT → f0 → harmonics → hsc, hss, hsv, hsd. Percussive Timbre Space: signal → energy (→ lat, tc) and power spectrum (→ sc).]

Figure 2: Extraction of the signal descriptors for the Harmonic [TOP] and Percussive [BOTTOM] Timbre Spaces

2 DATA REPRESENTATIONS

In MPEG-7, relationships between descriptors, linking between descriptors and the media, and structuring of the document are done through one or several "Description Scheme(s)". As part of the MPEG-7 descriptors, Timbre features take place as a set of variables attached to temporal entities called segments, which are part of the Description Scheme. Due to the nature of the experiments this proposal is based on (monophonic steady sounds for the Krumhansl and McAdams et al. experiments, non-mixed, non-sustained sounds for the Lakatos experiment), a confidence coefficient has been added to the set of Timbre descriptors. This coefficient gives us the relevance of the description for the specific segment (blind encoding without any a priori knowledge of the media content may occur). The reliability coefficient takes into account the measure of the periodicity of the signal and the quality of the peak estimation.

3 VALIDATION OF THE DESCRIPTION ON A REAL DATABASE

Given that the main purpose of the descriptors is the facilitation of search-by-similarity operations in databases of musical sounds, the validation of those descriptors was done through a simulated scenario that was conceptually very close to the real one we envision for them (instead of performing another truly psychoacoustical experiment using (dis)similarity judgments from subjects). Another real-world feature we included is the usage of a large database of non-synthetic sounds. The scenario simulated two different results that had been obtained with two different procedures in a similarity-based search task. The simulated results were supposed to have been generated using the same target sound for both procedures. One of the procedures consisted of a simple random selection of sounds from the database, whereas the other used a given number of the above-discussed descriptors for selecting the sounds. One set of sounds was presented grouped in one half of the screen, and the other set was presented in the other half, as shown in Figure 3. Subjects, who were not aware of the two different selection procedures, had to decide which set of sounds could be considered as being more similar to the target. We expected to find that the sets retrieved with the help of certain combinations of descriptors would be clearly preferred over randomly selected sounds.

Responses, gathered along the continuous grey scale in the figure, were measures of the preference for one set or for the other in terms of similarity to the target. Subjects listened to the sounds by clicking on the note icons, and were allowed to listen to them in any order and as often as they wished. The scores given by the subjects were coded to indicate the degree of preference for the set using the proposed descriptors. Subjects were members of the IUA or IRCAM. The number of participants varied between 18 and 25, depending on the experiment. The sounds used in the experiment were selected using a space-sampling algorithm that ensures a uniform sampling of the feature space defined by the sound collection. The origin of the sounds was diverse: the Studio OnLine selection specifically compiled for MPEG-7 experiments, some public ftp servers, and some copyright-free collections. The number of screens to be judged ranged from 25 to 27, depending on the experiment. Screen allocation of sets was randomized, as was the order of presentation of screens. In the experimental conditions where several descriptors were involved, we used the weighting scheme presented in section 1.1.4.
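The paper does not specify its space-sampling algorithm; one common strategy that achieves the stated goal of uniform coverage of a feature space is farthest-point sampling, sketched here purely as an illustration:

```python
import numpy as np

def farthest_point_sample(points, k, start=0):
    """Greedy farthest-point sampling: repeatedly pick the point
    farthest from everything chosen so far, spreading the selection
    roughly uniformly over the feature space."""
    chosen = [start]
    dists = np.linalg.norm(points - points[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dists))          # point farthest from the chosen set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

# Hypothetical 2-D descriptor space for 100 sounds (random for the demo).
rng = np.random.default_rng(0)
feats = rng.uniform(size=(100, 2))
subset = farthest_point_sample(feats, k=8)
print(len(subset))                           # → 8
```

Any scheme with this property avoids over-representing dense clusters of similar sounds when building the experimental stimulus sets.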

In the first two experiments of the series, we studied the sustained sounds. Subjects had to compare a random set versus a set selected using one descriptor in the first one, whereas in the second one they compared a random set versus a set selected using the five proposed descriptors. Combining data from both experiments, an ANOVA using the number of descriptors as categorical independent variable showed a significant effect on the choice score [F(1,116)=46.188, p < 0.001]. Figure 4 [LEFT] shows that when we used the five descriptors, a better selection of sounds could be achieved. Although using only one descriptor globally yields indifferent preferences, there are a couple of them (lat and hsc) that generate sets with higher preference scores.

In the third experiment, devoted to percussive sounds, an ANOVA performed with the number of descriptors as independent variable showed a significant effect on the mean scores [F(2,51)=65.493, p < 0.001]. Post-hoc pairwise comparison tests revealed that all differences between the possible combinations are significant. The most relevant result for us is that subjects clearly considered as the most similar to the target those sets of sounds that had been selected using all three descriptors. Figure 4 [RIGHT] shows a summary plot of the results.

From this series of experiments we can reasonably conclude that combinations of the proposed descriptors can be reliably used for performing searches by similarity in databases consisting of musical sounds. Some of the descriptors by themselves yield acceptable although non-optimal retrievals, but a carefully devised weighting scheme can provide results accepted as valid by a large number of people.


Figure 3: A screen dump of the interface for the experiment

Figure 4: Mean score by number of descriptors for the retrieval of sustained [LEFT] and percussive [RIGHT] sounds

4 APPLICATIONS

The main applications of instrument sound description based on perceptual features include authoring tools for sound designers, musicians or database management, and retrieval tools for producers or sound design software. Sample databases available today are becoming larger and larger, so much so that the usual taxonomical description is becoming poor in comparison to the variety of sounds. These databases would benefit from a description based on perceptual features (e.g. the Studio OnLine database [Studio-On-Line, 2000], with 160 GB of sounds, already benefits from such a perceptual description [Misdariis et al., 1998]; see Figure 5).


Figure 5: A screen dump of the interface of the search by timbre similarity in the Studio OnLine database

CONCLUSION

For the description of instrument sounds in the framework of MPEG-7, we propose a set of perceptual features derived from experimental studies. Two families of sounds have been considered: sustained harmonic sounds and non-sustained/percussive sounds. This proposal has been validated using a database composed of natural instrument sounds. The validation supports the adequacy of the proposal in the context of real database management. Although other variables could be added to the proposed set in order to improve the description, the methodology used here is to rely on experimental studies. Future work will therefore include descriptors derived from other experimental studies dealing with other classes of sounds.

ACKNOWLEDGEMENT

Part of this work was conducted in the context of the European Esprit project 28793 CUIDAD [CUIDAD, 2000].

REFERENCES

[CUIDAD, 2000] CUIDAD (2000). CUIDAD working group home page. http://www.ircam.fr/cuidad.

[Depalle et al., 1993] Depalle, P., Garcia, G., and Rodet, X. (1993). Tracking of Partials for Additive Sound Synthesis using Hidden Markov Models. In Proc. Int. Conf. on Audio, Speech and Signal Proc.

[Doval and Rodet, 1993] Doval, B. and Rodet, X. (1993). Fundamental Frequency Estimation and Tracking Using Maximum Likelihood Harmonic Matching and HMMs. In IEEE.

[Grey and Gordon, 1978] Grey, J. M. and Gordon, J. W. (1978). Perceptual effects of spectral modifications on musical timbres. J. Acoust. Soc. Am., 63(5):1493-1500.

[Krimphoff et al., 1994] Krimphoff, J., McAdams, S., and Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II. Analyses acoustiques et quantification psychophysique. Journal de Physique.

[Krumhansl, 1989] Krumhansl, C. L. (1989). Why is musical timbre so hard to understand? In S. Nielzen and O. Olsson (eds.), Structure and perception of electroacoustic sound and music, pages 43-53. Elsevier, Amsterdam (Excerpta Medica 846).

[Lakatos, 2000] Lakatos, S. (2000). A common perceptual space for harmonic and percussive timbres. Perception and Psychophysics, in press.

[Martin and Kim, 1998] Martin, K. and Kim, Y. (1998). 2pMU9. Musical Instrument Identification: A pattern-recognition approach. In Proc. of 136th meeting of ASA.

[McAdams and Winsberg, 2000] McAdams, S. and Winsberg, S. (2000). A meta-analysis of timbre space. I: Multidimensional scaling of group data with common dimensions, specificities, and latent subject classes. In preparation.

[McAdams et al., 1995] McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: common dimensions, specificities and latent subject classes. Psychological Research.

[Misdariis et al., 1998] Misdariis, N., Smith, B., Pressnitzer, D., Susini, P., and McAdams, S. (1998). Validation and Multidimensional Distance Model for Perceptual Dissimilarities among Musical Timbres. In Proc. of Joint meeting of the 16th congress on ICA, 135th meeting of ASA.

[MPEG-7, 2000] MPEG-7 (2000). Overview of the MPEG-7 Standard. http://www.cselt.it/mpeg/standards/mpeg-7/mpeg7.html.

[Serra, 1997] Serra, X. (1997). Musical Sound Modeling with Sinusoids plus Noise. In De Poli, G., Picialli, A., Pope, S. T., and Roads, C. (eds.), Musical Signal Processing. Swets and Zeitlinger Publishers.

[Studio-On-Line, 2000] Studio-On-Line (2000). http://www.ircam.fr/studio-online, http://sol.ircam.fr.

[Wold et al., 1999] Wold, E., Blum, T., Keislar, D., and Wheaton, J. (1999). In Furht, B. (ed.), Handbook of Multimedia Computing, pages 207-227. Boca Raton, FL: CRC Press.