Transcript
Page 1: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

AN AUDIO CODEC FOR MULTIPLE GENERATIONS COMPRESSIONWITHOUT LOSSOF PERCEPTUAL QUALITY

FRANK KURTH

Departmentof ComputerScienceV, Universityof Bonn,Bonn,[email protected]

We describea genericaudiocodecallowing for multiple, i.e., cascaded,lossycompressionwithout lossof perceptualquality ascomparedto the first generationof compressedaudio. For this sakewe transferencodinginformationto allsubsequentcodecsin a cascade.Thesupplementalinformationis embeddedin thedecodedaudiosignalwithout causingdegradations.The new methodis applicableto a wide rangeof currentaudiocodecsasdocumentedby our MPEG-1implementation.

INTR ODUCTION

Low bit ratehighqualitycodingis usedin awiderangeofnowadaysaudioapplicationssuchasdigital audiobroad-castingor networkconferencing.Although the decodedversionsof thecompresseddatamaintainveryhighsoundquality, multiple- or tandem-codingmay result in accu-mulatedcodingerrorsresultingfrom lossydatareductionschemes.Suchmultipleor tandemcoding,leadingto thenotionof a signal’s generations, maybedecribedasfol-lows.Assumingacoderoperation

�andacorresponding

decoderoperation� we shall call, for a givensignal � ,� � � thefirst generationand,for an integer � , � � ��� � �the � -th generationof � .In this paperwe proposea methodto overcomeageingeffects. More precisely, our ultimategoal is to preservethefirst generation’sperceptualquality. For this sakeweuseanembeddingtechnique,conceptuallysimilar to theaudiomole[1], to transportcodinginformationfrom onecodecin a cascadeto subsequentcodecs.As comparedto [1], ourembeddingtechniqueis performedin thetrans-form domain. Moreover it is basedon psychoacousticprincipleswhich allows the embeddingto be performedwithoutcausingaudibledistortions.The paperis organizedas follows. In the first sectionwe perform a detailedanalysisof ageingeffects. Forthis sakewe look at a generalmodelof a psychoacous-tic codecand investigatewhich of its componentsmayinduceartifactsin casacdedcoding. Ageingeffectsin areal-worldcodingapplicationaredocumentedby there-sultsof listening testsaswell asmeasurementsof rele-vantencodingparameters.The secondsectiondevelopsa genericaudiocodecfor cascadedcodingwithout per-ceptuallossof quality. In thethird section,animplemen-tationbasedon anMPEG-1Layer II codecis described.Thefifth sectiongivesacodecevaluationbasedonexten-sive listening testsaswell asobjective signalsimilaritymeasurements.Concludingwe point out someapplica-tionsandfuturework in thisarea.

1. AGEING EFFECTS IN CASCADED CODING

Block- or Subband-Transform

Spectral Analysis/PsychoacousticModel

Quantization

Encoder

Requantization

Decoder

InverseTransform

Input Output

Figure1: Simplifiedschemeof psychoacousticcodec.

A generalschemeof a psychoacousticaudio codecisgiven in Fig. 1. In this figure we have omittedoptionalcomponentssuchasentropycodingor scalefactorcalcu-lation, althoughthosemay indirectly influencedegener-ation effects. Sucheffectswill have to be dealtwith ineachparticularcase.Theencoderunit consistsof a subbandtransform� op-eratingin a blockby blockmode.For a signalblock � , asynchronous(w.r.t. thesubbandtransformof thatblock)spectralanalysis� is usedto obtainparametersof apsy-choacousticmodel ��� � � . A bit allocation � ��� � � � ��� � � �is performedusing the psychoacousticparametersandleadingto achoiceof quantizersfor lossydatareduction.Following quantization,codewordsandsideinformation,e.g.,quantizers,scalefactorsetc., are transmittedto thedecoder. The decoderreconstructssubbandsamplesus-ing the side information,e.g., by dequantization,code-book look-up, or inversescaling,andthencarriesout areconstructionsubbandtransform �� .Within this framework, sourcesfor signal degenerationare

1. thelossyquantizationstep,

2. round-off andarithmeticerrorsdueto � , �� , , aswell aspossiblyfurthercalculations,

3. aliasingdueto � , cancelledby �� only in the ab-senceof lossyquantization,

AES17� � InternationalConferenceonHigh QualityAudio Coding 1

Page 2: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

4. NPR(nearperfectreconstruction)errors,i.e., if ��is only a pseudoinverseof

�,

5. inaccuracy of thepsychoacousticmodel � ,

6. non-suitabletime-frequency resolutionin spectralanalysis,

7. missingtranslationinvarianceof�

and �� (in viewof multiplecoding).

Thequantizationerror1. dominatesandis likely to bere-sponsiblefor thegreatestpartof thedegenerations.Thiserror may differ in several magnitudesfrom all of theother errors. As an extremeexample,considerMPEGcodingwhere,dependingonthecodecconfiguration,sev-eralof thehighestsubbandsof thetransformedsignalaresimplysetto zero.Round-off andNPR-errors(2. and4.)aresmallascomparedto 1., althoughthey mustbecon-trolled from codecto codecin a cascade.Thoseerrorswill beespeciallyimportantin conjunctionwith our em-beddingtechnique.As, e.g.,LayerIII of MPEG-1audiocodingshows, aliasingerrors(3.) have to be dealtwithcarefully. Theerrors5. and6. indirectlycontributeto thequantizationerror.

−3 −2 −1 0 1 2 3

1

2

3

4

5

6

7

8

Pop (Abba)

Castanets

Speech, female

Speech, male

Wind Instruments

Orchestra

Pop (Jackson)

Percussion

equalworse better

Figure2: Comparisonof first vs third generations.Thesolidbarsgivetheaveragerating,thesmallbarsshow thevariance.

Listeningtestsareusedto investigatesignaldegenerationwith increasinggenerations.Fig. 2 showstheresultsof alisteningtestwhere26 listenerscomparedfirst andthirdgenerationscodedwith anMPEG-1LayerII codecat128kbps. Negative valuesindicatethat the third generationswere rated worsethan first generations,whereasposi-tive valuesgive thembetterratings. It is obvious fromFig. 2 that the third generationscouldalmostall be dis-tinguishedfrom thefirst onesandweregenerallyjudgedto beof worsequality.

Themostimportantresults[2] maybesummarizedasfol-lows:� Almostall testpiecesalreadyshow noticeableper-

ceptualchangesin the secondand third genera-tions.� Fourth to sixth generationsshow a clear loss insoundquality. The inducednoisein many of thepiecesis annoying. Almost all piecesabove theeighthgenerationshow a severeperceptualdistor-tion.� Quietandvery harmonicsignalpartsare(almost)not distorteddueto MPEGsextensive useof scalefactorsanddueto higherSMR ratios inducedbytonalsignalcomponents.

It becomesclear that lossy coding changesthe signalsspectralcontentin a way thatdoesnot allow subsequentencodersto performa suitablepsychoacousticanalysis.This leadsto adegenerationof codingparametersand,fi-nally, of overall soundquality. We further illustratethis

0 5 10 15 20 25 30 35−40

−20

0

20

40

60

80

Original

Generation 5

Generation 10

Generation 15

Subband

Ene

rgy

[dB

]

Figure3: Degenerationof subbandenergiesin MPEG-1coding. For a fixed frame, the figure shows the energyin eachof the subbands.The four lines correspondtotheenergiesof theoriginalsignalandthefifth, tenth,andfifteenthgenerationrespectively.

by an exampledocumentingthe changeof the spectralcontentof a signal for increasinggenerations. Fig. 3shows thesubbandenergiesfor a fixedframein thecaseof anMPEG-1encodedguitarpiece.Thedifferentlinesrepresenttheenergiesfor differentgenerations.Oneob-servesthedecreaseof energy in subbands6-9and11-17.The higher subbandsare attenuatedas an effect of theMPEGencodernot allocatingany bits to them.In Fig. 4the energy of the scalefactorbandsis given for severalframesof a piececontaininga strummedelectricguitar

AES17� � InternationalConferenceonHigh QualityAudio Coding 2

Page 3: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

Frame

Sca

lefa

ctor

ban

d

20 40 60

10

20

30

40

50

60

70

80

Frame

Sca

lefa

ctor

ban

d

20 40 60

10

20

30

40

50

60

70

80

Frame

Sca

lefa

ctor

ban

d

20 40 60

10

20

30

40

50

60

70

80

Sca

lefa

ctor

ban

d

Frame20 40 60

10

20

30

40

50

60

70

80

Figure4: Degenerationof subbandenergies in MPEG-1 coding. For successive frames,the figure shows theenergy in eachof the scalefactorbandsas an intensityplot. Thefour subplotscorrespondto theenergiesof thefirst, fifth, tenth,andfifteenthgenerationrespectively.

(the rhythmof strummingmaybeobservedasthe verti-cal structurein eachof theplots). Increasinggenerationsareplottedin a zig-zagscheme.The signal’s degenera-tion is obviousfrom the lossof structure.Black regionsindicatethat scalefactorbandsareset to zeroduring bitallocation.We briefly mentionthe effect of missingtranslationin-varianceof the codingoperation. As alreadyobservedin [3], acrucialpoint in cascadedcodingis synchronicity.Taking the 32-bandmultirateMPEG-1filter bankasanexample,a subsequentencoderwill not obtainthe samesubbandsamplesasthepreceedingdecoderunlessthein-putsignalis “in phase”with the32-bandfilter bank.Thisis, thefilter bankis only translationinvariantw.r.t. signalshiftsof 32 samples.On a frameby framebasis,thingsareeven worse,sincein order to obtain the samecon-tentin a framewisemanner, theinput to thesecondcoderhasto beframe-aligned(e.g.within 1152samplesin theMPEG-1Layer II case).Hence,we may alsohave var-ious typesof generationeffects,dependingon eventualsignalshiftsprior to subsequentencoding.Listeningtestsin ourMPEG-1LayerII framework show remarkabledif-ferecesin thetypeof degenerationdependingonwhetherthesignalwas

1. fed into subsequentcodecswithout any synchro-nization,or

2. fully synchronizedto the original input and thenfed into thecodec.

Whereastheunsynchronizedsignalstendto produceau-dible artifactsand noise-likecomponents,the synchro-nizedversionstendto attenuateseveral frequency bands.

2040

6080

510

15

−10

−5

0

SubbandGeneration

Ene

rgy

diffe

renc

e [d

B]

2040

6080

510

15

−6−4−2

024

SubbandGeneration

Ene

rgy

diffe

renc

e [d

B]

510152025

510

15

−0.5

0

0.5

SubbandGeneration

Diff

eren

ce in

allo

catio

n

510152025

510

15

−0.4

−0.2

0

0.2

SubbandGeneration

Diff

eren

ce in

allo

catio

n

Figure 5: Developmentof meansubbandenergy (left)andmeanbit allocation(right) asa differencesignalof(increasing)generationsandthefirst generation.Thetopplots show thosedifferencesignalsfor the caseof syn-chronizationbetweeneachof thecodeccascades,thebot-tom plots for thecaseof absenceof sucha synchroniza-tion.

It dependson thesignal,which of the degenerationsarepercievedto bemoresevere.Fig. 5 showsfour differenceplots (first generationversushighergenerations)for thescalefactorbandsmeanenergiesandmeandifferenceinbit allocation,wherethemeanrangesover 200framesofa relatively stationarysignal.Clearly, in thecaseof syn-chronizationbetweensubsequentcoding(top left plot),thesignalenergy is attenuated,whereasabsenceof syn-chronization(bottomleft plot) shows no suchtendencyresp.distributestheenergy acrossthesubbands.It is remarkablethat all formerly proposedmethodsforcascadedaudiocoding[1, 3] takeinto accountasynchro-nizationmechanismto overcomethe mentioneddistor-tion effects.Yet it shouldbeclearthatsucha mechanismaloneis not sufficient to overcomedegeneration.

2. A CODEC PREVENTING AGEING EFFECTS

The ideabehindour new codingmethodis to reusethefirst generation’s encodingparametersin all of the sub-sequentcodecsin a cascade.This allows usto indirectlyusethepsychoacousticanalysisof thefirst encoderin allof the following encoders.Sincethe encodingparame-tersarenot presentat the point of decoding,we usethedecodingparameters,i.e., sideinfos. This is reasonablefor in mostcaseswemayderive theencodingparametersfrom them.While it first seemstrivial thatonemayreusetheencod-ing informationof a certainencodingoperationfor sub-sequentencodings,it is notat all clear� thatonemaydeduceall encodinginformationfrom

AES17� � InternationalConferenceonHigh QualityAudio Coding 3

Page 4: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

thedecodinginformationonly,� that it is possible,using this information, to per-fectly reproducethe compressedbitstreamof thefirst generation(requantizationand scalingcouldnotbeinjective),� how to convey this informationfrom onecodectoanother.

While thefirst two issuesdependontheparticularcodec,the last one posesa generalproblem, for it is in gen-eral undesirableto createa secondarybitstreamor fileformat. Sucha secondarybitstreamwould causeaddi-tional dataoverhead,would requirea new format def-inition, and, most important,would makeit very diffi-cult to usethe proposedtechniquein conjunctionwithstandardmedianot supportingthe new format. For thissake,thesecondaryinformationfor subsequentencodingshouldbeembeddedinto thedecodedPCMdata.Theuseof suchasteganographictechnique[4] in thisframeworkwasalsoindependentlydevelopedby Fletcher[1]. How-ever, Fletcher’s embeddingmethodsignificantly differsfrom themethodpresentedin thispaperleadingto ratherdifferentcodecstructures.

2.1. PsychoacousticEmbedding

The embeddingof secondarydatainto sometarget datastreamis known assteganographyandhasbeenstudiedto a considerableextent [4]. Our demandson a stegano-graphicprocessaretheabsenceof any perceptualdegra-dationin thetargetsignal,a high embeddingcapacityona frameby framebasis,anda computationallyefficientembeddingprocedure. Furthermore,efficient detectionandextractionof theembeddeddatamustbepossible.For astraightforwardembeddingapproachconsiderasam-ple ���! #" $&%' (*) � ' + ' alsowritten in its binaryrepresenta-tion �,�.- � " $&%&/ / / � ) 0 , � '2143 5 6 7 8 . For 9;:=< andanembeddingword >;�?- > @ $&%&/ / / > ) 0 we definethe 9 -bit(direct)embeddingbyA %�B - � " $&%&/ / / � ) 0 6 - > @ $&%&/ / / > ) 0CD - � " $&% / / � @ > @ $&% / / > ) 0 /(1)

This kind of embeddingdestroysthe 9 leastsignificantbits of thedataword � . Applicationof this techniquetoa time signalalreadycausesa noticeablenoiselevel forsmall 9 . In theaudiomoleproposal[1], this embeddingtechniqueis usedwith the leastsignificantbit(s). De-pendingon thebit resolution,thepossibilityof smallau-dible distortionsare reported[5]. As an alternative tooverwriting the leastsignificantbit, the authorsproposeto changeit accordingto a parity/non parity decision,whichcausesa morerandomsignalchange.A refinementof the time-domainembeddingtechniqueusesaninvertiblelineartransformE prior to embedding.

Embeddingnow becomesa map

- F 6 > 0CD E $&% A % - E�F 6 > 0 / (2)

Fromasteganographicpointof view, asignificantadvan-tageof this methodis that theembeddingis not aseasyto detectas the above without knowledgeof E . Yet itis not guaranteedthat the reconstructedsignalis still ofperceptuallytransparentquality.Froma psychoacousticcodingpoint of view it seemstobeadviseableto performa kind of selective embedding.Embeddingisperformedprior to applyingthereconstruc-tion transform GE . Theembeddingpositionsand-widths(e.g., 9 above) aregiven by the psychoacousticparam-etersor, in view of embeddingin the decoderunit, bythedecodingparameters.Intuitively, thepossiblechoiceof reconstructionlevels for a givenquantizerH with re-quantizer GH is increasedwith theamountof datareduc-tion, i.e.,decreasingnumberof reconstructionlevels.Theembeddingthusbecomesa map

- F 6 > 0CD E $&% GA % -IGH,- H�E�F 0 6 > 6 H 0 6 (3)

wherethe embeddingwidth of GA % dependson H . Thistechniqueis usedin theproposedembeddingcodecpre-sentedin thenext paragraph.

2.2. CodecModel

We first give an overview of the proposedcodecsfunc-tionality. Fig. 6 shows the genericcodecscheme. To

Input

Output

Encoder

Decoder

Decoder

b)

a)

Detectordetected

not detected

Psychoacoustic Bit Allocation

Subband- or

Inverse Transform

Quantization

Mux

Channel

Demux

DequantizationEmbedding

Block Transform

Embedding Bit Allocation

Analysis and Model

Figure6: Codecfor directembedding.

describethefunctionality, westartwith thedecoder. Thedecoderextractscodewordsandsideinformationfor onedatablock from the bitstreamandproceedsby dequan-tization. All parametersneededby the next encodertoreproducethe compressedbitstreamareassembledin abit buffer. Then,usingthequantizationinformation,abitallocationalgorithmdecideswhichsubbandswill beusedfor embedding,andhow many bits will beembeddedin

AES17J K InternationalConferenceonHigh QualityAudio Coding 4

Page 5: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

eachof them. If enoughbits couldbeallocated,the de-coderproceedsby embeddingall datafrom thebit buffer.To allow extractionof theembeddeddatain subsequentencodingsteps,certainmarkers, i.e., bit combinations,areusedto indicateembedding.Afterwards,the inversetransformreconstructsthetimesignalblock.Theencoderproceedsby transformingthesignalblocksproducingsubbandsignals.For thissake,weassumethatencodingis sychronousto the decodingoperationw.r.t.theprocessedsignalblocks. The detectortries to detectthemarkerscreatedby thedecoder. If markersarefound,a decoderextracts the embeddedinformation from thesubbands.Usingtheembeddedinformation,thebit allo-cationandfurtherencoderparameterscanbesetaccord-ingly. If, however, nomarkersarefoundor theembeddedinformationis detectedto becorrupt,theencoderfollowsthestandardencodingprocedureusingthepsychoacous-tic model. Quantizationandchannelmultiplexing con-cludetheencoding.The single modulesdeserve a more detailedtreatment,whereweshallagainstartwith thedecoder:L To obtain the embeddingcapacity, i.e., the num-

berof bits per samplewhich we may useto storeour secondarydatastream,we divide thesubbandsamples(or transformblocksin transformcoding)in groupsof samplessharingthesamequantizers.Thosegroupsare called embeddingblocks. Foreachembeddingblock we calculatethenumberofbitsavailablefor embedding,whichdependsonthequantizer’s granularity. If, e.g., the quantizerre-ducesan M -bit sampletoa NPO#M bit representation,wemayuseatmost MRQ�N bitsfor embeddingin thissample.Notethatthisdirectcorrespondenceis notvalid for all kindsof quantizersandmoreelaborateconversionruleshave to be usedby non-uniformquantizers.It is alsoimportantto notethatlosslesscodingoperationsmight eventually limit the em-beddingcapacity, asit will show up in thecaseofMPEG’sscalefactors.L All informationto beembeddedis collectedin anembeddingbitstreamor bit buffer. Thetypeof in-formationdependson thechosencodec,examplesare bit allocation, quantizers,scalefactors,code-books,bitratesetc. Sometimesit will beusefultoexploit thecompressedbitstream’sstructure,sincemostof the latter informationis alreadystoredinthisbitstreamin avery compactform.L If thetotal amountof embeddingcapacitysufficesto transmitthedesiredcodinginformation,wehaveto selectthe embeddingblockswherewe want tostorethe embeddingbit stream. Furthermore,wehave to determineanembeddingbit width perem-beddingblock. The decisionaboutthoseparame-

tersis termedembeddingbit allocation. The em-beddingbit allocation,althoughconsumingonly afractionof computationalresourcesascomparedtothe encoder’s bit allocation,is the mostcomplexpartof theproposedcodecextension.

Sincetheproposedembeddingtechniquetheoreti-cally mayyield a biggerreconstructionerror thannormallyinducedby theutilizedquantizers,acon-servative allocationmethodshouldbe used. Weproposea greedybit allocationalgorithmwith em-beddingblocks sortedin descendingorder w.r.t.theirembeddingcapacities.To eachof theembed-ding blockswe assigna certainbit budget,obey-ing a safetymargin accountingfor thepossiblein-creasedreconstructionerror. If thegreedybit allo-cationloop doesnot yield enoughembeddingca-pacity, thebit budgetis increasedaslongasnofur-ther increaseis possible.Embeddingis only per-formedif theallocationproceduresucceeds.L To facilitate the detectionof the embeddedinfor-mation, the kind and position of the embeddinghave to besignaledto theencoderresp. its detec-tor unit. This amountsto conveying the embed-ding blocksandembeddingbit widths chosenbythe embeddingbit allocation. Several techniquesarepossible,e.g.,theuseof adescriptorembeddingblock containingall of this “logistic” information.We proposea block-by-blocksolutionwhereeachusedembeddingblock containsa separatemarkerindicating the correspondingbit width. We onlyhave to ensurethatno “wrong detections”(i.e., theembeddingbit width is not detectedcorrectly)oc-cur.L Sincefloating point operationscommonlycausesarithmeticerrors,a forwarderrorcorrection(FEC)haseventuallyto beappliedto theembeddedinfor-mation. Thoseerrorsmay accumulatewith NPRerrorsasdiscussedabove. A CRCchecksumcanbeusedtodecidewhethertheinformationextractedfrom a certainembeddingblock is valid or not.L Thefirst taskof the encoderis to synchronizethePCMbitstreamw.r.t. theframeboundariesusedbya previous codec. For this purpose,againseveraltechniquesmay be utilized. In the caseof a fil-ter banktransform,a (fast) searchfor markersoncertainpredetermined(frequentlyused)subbandsmaybeused.Oncethesynchronizationis done,theencoderproceedsonaframeby framebasisandnofurthersynchronizationis necessary.L The detectortries to find the embeddedmarkers,deducethe embeddingblocks embeddingwidthsandchecktheembeddingblocksintegrity.

AES17S T InternationalConferenceonHigh QualityAudio Coding 5

Page 6: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

U Hybrid codingprovidesa mechanismagainstcor-rupt embeddeddata or frameswhereno embed-dingwaspossible.In ahybrid framework, for eachframeit is possibleto choosebetween

– usageof all embeddedcodinginformationifthis informationcouldbeextracted,

– partial usageof embeddedinformation,e.g.bit allocationonly,

– discardingof all of theembeddedinformationanduseof thestandardencodingprocedure.

Input: Codewords V W X Y , decodingparametersZ ,embeddingfunction [�\

1. RequantizeV W X YR]^`_X�a�bIW _X \ c d d d c _X e Y .2. Determineembeddingpositionsandwidths,W _X c Z,YR]^fW g c h Ya�bIW W g \ c d d d c g i�Y c W h \ c d d d c h i�Y Y c

as well as a suitablemarker j , such that k!lg monpg m q&\rlps , and a suitablebinary repre-sentation W Ztc j4Y4]^ bin W Ztc j4Y , consistingofubin W Ztc j4Y u awv ix y \ h x bits.

Here,h x denotestheembeddingwidth in bitsof po-sition z .

3. Createa partitionbin W Ztc j4Y�a�b&{ \| | | { i , where{ x�}2~ � c k � � � .4. For k�lwz�l!� embedinto _X using h x -bit embed-

ding:

[�\�b W _X � � c { x Y]^ _ _X � � d5. Let _ _X m�b a=_X m for all � , where ���arg x for k�lwztn� .

6. Reconstruct_ _X�]^.��� \ _ _X .Output: Signalblock ��� \ _ _X .

Figure7: Pseudocodeversionof thedecodingalgorithm.

A pseudocodeversionof thedecodingalgorithmis givenin Fig. 7. We assumethatcodewords V W X Y aswell asde-codingparametersZ aregiven asan input. In this ex-ample, V denotesthe encodersquantizerfunction. Forsakeof simplicity we only usefixed quantizersaswellasa global marker j . Note that we separateencodinganddecodingparametersto clarify thecodingsteps.Theembeddingbit allocationis summarizedin 2. In 3., theembeddingbit streamis partitionedaccordingto the bitallocation.Finally, embeddingis performedin 4. prior tothereconstructiontransform.

Input: Signalblock � }P� e , transform�psychoacousticmodel � , bit allocation� ,detectorfunction � on � e , ����� �tW �,Ydenotesthesetof all valid markers

1. CalculateX�b a���� .2. Detect��W X YRa�b j .

3. If j��} �(a) Calculate��W �IY . Let X �b a#X .(b) Calculate� W ��W �IY c X c �IY andfrom this

encodingparameters[4a#[,W �IY .4. else

(a) DecodeX�]^�_V W X Y�a�b W X � c Z,Y .(b) CalculateencodingparametersZr]^�[ .

5. QuantizeX � using [ : X �*]^�V W X � Y .6. Determinedecodingparameters[4]^�Z .

Output: codewords V W X � Y , decodingparametersZ .

Figure8: Pseudocodeversionof theencodingalgorithm.

We summarizethe encodingalgorithmin Fig. 8. In thepseudocode,the functionsymbolsfor thebit allocationandpsychoacousticmodelarechosenasin theinitial dis-cussions.Thesetof admissibleor valid markersusedtoindicateembeddingis denotedby � . Thedetectorfunc-tion is named� , quantizeranddequantizerare V and _V ,respectively. Theheartof theencodingalgorithmlies inthedecisionwhetherto usetheembeddedinformation4.or not3. andusethestandardprocedureinstead.It is importantto note that several designdecisionsin-cluding the choiceof embeddingblocksandembeddingbit allocationheavily dependon the underlyingcodec.Suchdecisionswill betreatedin thenext section.

3. MPEG-1 LAYER II IMPLEMENT ATION

To implementacodecpreventingageingeffectswechoseMPEG-1Layer II [6] asa basis.The well-know codingschemeis depictedin Fig. 9. Although,strictly speaking,only thebitstreamsyntaxanddecodingarestandardized,in whatfollowsweshallalsotalk of theencoder. RelatingtheLayer II codecto our genericscheme,we have a 32-bandmultiratefilterbankasa transformanda psychoa-cousticmodel basedon a windowed Fourier spectrum.The psychoacousticmodelyields a signal-to-maskratiousedin thebit allocationprocess.To reduceamplitudere-dundancy, blocksof subbandsamplesareassignedcom-monscalefactorsandscaledaccordingly. Thescaledval-uesarelinearly quantizedusinga variablequantizerstepsizeaccordingto thebit allocation.Codewordsandsideinformationarestoredin theMPEGbitstreamandtrans-

AES17� � InternationalConferenceonHigh QualityAudio Coding 6

Page 7: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

Encoder

Decoder

Output

Input 32-band multiratefilterbank (analysis)

Scalefactor calculation

transform modelPsychoacoustic

Signal to maskratio

Bit- and Scalefactordecoding

32-band multirate filterbank (synthesis)

allocationBit and Scalefactor

Scaling andQuantization

Mux

Channel

Demux

Requantization

scalingand inverse

Windowed Fourier-

Figure9: MPEG-1LayerII codec.

portedto thedecoder.To extendtheMPEGLayerII codecto theproposednovelcodecscheme,weconsider� thechoiceof suitableembeddingblocks,� the side informationto be embeddedandthe em-

beddingbit buffer,� the determinationof the available embeddingbitwidths,� theembeddingbit allocation,� errorcorrectionmechanisms,and� thechoiceof suitablemarkers.

3.1. Embeddingblocks

MPEG-codingworksonaframeby framebasis.In LayerII, blocksof 1152samplesperchannelarefiltered,yield-ing 36 samplesin eachof the32 subbands.Scalefactorsare assignedto blocks of 12, 24 or 36 sampleswithinonesubband,dependingon the magnitudesof the sam-ples. Sincescalefactorsimplicitly changethe quantiza-tion resolution,the embeddingcapacitydependson theparticularscalefactorassignment/combination.For sim-plicity, we assumetheworst-caseof 12-samplescalefac-tor blocksandchoosethoseblocksasembeddingblocks.Hencetherearepotentially threeembeddingblockspersubbandandframe.Thefollowing discussionassumesafixedgivenMPEGframe.

3.2. Embeddedinformation and bit buffer

Besidesglobal informationsuchasthenumberof chan-nels, bitrate or joint stereocoding modes,we have totransmit/embedthefollowing frame-relatedinformation:

� Bit allocation: dependingon the target bit rateavariablenumberof subbandsis allowedfor bit al-location.In MPEG,2-4bits areusedfor eachsub-band’sallocationinformation.� Scalefactorselection: eachsubbandwith a posi-tive numberof allocatedbits is assigned1-3 scalefactors,dependingon the subbandsamplesmag-nitudes. The select-informationconsistsof 2 bitsconveying theutilizedscalefactorpattern.� Scalefactors:for eachof the1-3scalefactors,6 bitsareused.

First, onemight think that the transmissionof scalefac-tors could beobsoletefor they only representa losslesscodingstep.Yet sincedequantizationeventuallysignifi-cantlyaltersa sample’smagnitude,asubsequentencodermightcalculatedifferentscalefactors,againimplicitly re-sultingin adifferentquantizerstepsize.Thuswetransmitboth scalefactorsandselectinformation. Listeningtestsfor thecasewherewe only transmittedbit allocationin-formationshow a significantamountof degenerationdueto theaforementionedeffectsof scalefactorchanges.To storethoseparameters,we implementeda bit buffer-ing mechanismconsistingof a buffer with read/writeac-cessto �w�#� bit blocks. The coding parametersaresequentiallyfed into the buffer. During embedding,thebuffer is readin �  ,�2� bit blocks,correspondingto theembeddingblock size � andtheembeddingbit width � .In theextractor, theprocedureis reversed.

3.3. Embeddingbit width

For eachscaling/embeddingblock ¡ �*¢ £ ¤ (where � de-notesthesubbandnumberand ��¥#£&¥#¦ oneof thethreescalefactors)we determinethemaximumembeddingbitwidth § ¨ © ª from the correspondingsubbandsquantizerresolution«�¨ (in bits) andthe assignedscalefactor¬ ¨ © ª ,��¥#£¥#¦ . Wehave to accountfor thescalefactors,sincethey implicitly increasethe bit resolution: scalingby afactor   ­ ¨ prior to quantizationallows for anincreaseinbit resolutionby � bitsascomparedto quantizationwith-out scaling.We do not considersubbandswith zerobitsallocated( «�¨#®�¯ ). In this casewe define § ¨ © ªt° ®�¯for all £ . In the casethat lessthanthreescalingfactorsareassignedto a specificsubband,we determinethe ¬ ¨ © ªaccordingto thescalefactorpattern.If the scalefactor¬ ¨ © ª of scalefactorband ¡ �*¢ £ ¤ is given

by ¬ ¨ © ª&®w¬ ±t° ®f²³   ´ µ ­ ± ¶ for �t·¹¸ ¦�° º   » , we let§ ¨ © ª° ®!� º�¼2½P¾ ¿I¡ � º ¢ À ¼tÁ  à Ä*¬ ¨ © ª ÅRÆ;«�¨ ¤ ÇSince À ¼tÁ Â Ã Ä ¬ ¨ © ª Å�®=À ¼tÁ Â Ã Ä ²³   ´ µ ­ ± ¶ Å�®oÈ ± µ É ¼#� weobtain § ¨ © ª&®4� º�¼¹½P¾ ¿I¡ � º ¢IÊ � ¦Ë ¼��RÆ;«�¨ ¤ Ç

AES17Ì Í InternationalConferenceonHigh QualityAudio Coding 7

Page 8: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

For Î�Ï;Ð Ñ,Ò Ó Ô , scalingcausesthelossof at mostonebitof precision.In this caseÕ Ö × ØRÒ Ù!Ú ÛÝܹÞPß àIá Ú Û â ã�Ö�Ü�Ú ä .Thecalculationof Õ Ö × Ø maybemotivatedasfollows:start-ing from an initial precisionof 16 bits, we obtain theMPEGreconstructionprecisionasthesumof thealloca-tion precisionã�Ö andtheprecisiongainedby usingscale-factors.Thedifferencebetweenthosetwo quantitiesandthe initial resolutiongivesthe maximumembeddingbitwidth. In caseof a precisionexceeding16bits (which ispossiblein MPEG),we let Õ Ö × ØÒ ÙwÑ .3.4. Embeddingbit allocation

For the embeddingbit allocationwe usea greedyalgo-rithm asalreadysketchedabove. We shall only give anoverview of thealgorithm:å Sort the embeddingblocks in orderof decreasing

capacities.å For eachblock á æ*â ç ä , assignan initial embeddingbit width è Ö × ØéwÕ Ö × Ø .å Main allocationloop: allocateembeddingblocksin the given descendingorder. Add capacitydueto è Ö × Ø to counterfor embeddedbit size.Note thatat this point, we have to takecareof FECbits andmarkerstoo.å End the allocation if enoughbits could be allo-cated.å Endtheallocationif no changein theallocatedbitsizeoccurredascomparedto thelastloop.å Increaseeachè Ö × Ø by onebit provided è Ö × ØRé4Õ Ö × Ø ,thenrestartthemainallocationprocedure.

If theembeddingbit allocationfails, thedecoderdoesnotembedanything for thatparticularframe.

3.5. Err or correction and markers

SincetheMPEGfilterbankdoesnot yield perfectrecon-struction,we cannotrely on theencoder’s subbandsam-plesbeingidenticalto thoseproducedby ourembeddingprocedure.Thereis a certainreconstructionerror inher-ent in thefilter bankwhich accumulateswith arithmeticerrorspossiblyintroducedby floating point operations.Thus,prior to embeddingweperformanarithmeticFEC,mappingthewordto beembedded,ê , to ã�êRëPì4Ù�Ò í á êIäfor suitablepositive integersã and ì . Then, í á êIä is theword to beembedded.As anexample,considerê to bean integer, then,choosingãîÒ Ùðï , ì.Ò Ùðñ , we may re-cover ê from í á êIäÝÙ4ï ê�ë�ñÝë#è for è�Ï�Ð Ü�ñ â ò Ô . In thiscase,thecodingoverheadfor thearithmeticcodewouldbe3 bits.Finally, we considerthe markersindicating the embed-ding positionsandbit widths. In our implementationwe

óóôôõõöö÷÷øøùùúúûûüüýýýþþþÿÿÿ��������������������������� ������ ����������������������������������������������� !!"" # ## #$ $$ $% %% %& && &' '' '( (( () )) )* ** *+ ++ +, ,, ,- -- -- -. .. .. ./ // // /0 00 00 01 11 11 12 22 22 23 33 33 34 44 44 45 55 55 56 66 66 67 77 77 78 88 88 8

9 99 99 9: :: :: :

;;<<

Bits

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Sample

6

7

8

9

10

11

5

4

3

2

1

0

Bits for FEC

Marker bits

Embedded data bits

Bits of subband sample

Figure10: Schemeof anembeddingblock andthe em-beddingpositions.Theleastsignificantbits aredepictedon thel.h.s.

chosevariablewidthmarkers,embeddedatthebeginningof ourembeddingblocks.Themarker’swidthdependsonthe actualembeddingbit width. To prevent falsedetec-tionsin theencoder, weperformananalysis-by-synthesisexaminationof all possibleembeddingblocksand,if nec-essary, adapttheircontentaccordingly. Fig. 10 illustratestheconceptof anembeddingblockandgivesaroughim-pressionof theembeddingpositions.For sakeof simplic-ity, weuseda 16bit PCMrepresentationasareference.

4. CODEC EVALUATION

4.1. Testsettings

Theproposedcodecwasevaluatedon a broadvarietyofaudiopieces,mostlychoosenfrom thewidely usedEBUtestmaterial[7]. We testedour codecon musicaswellaswith maleandfemalespeakers.In our test settings,thesourcematerialwasgivenas16 bit PCM stereowitha samplingrateof 44.1kHz. We usedstereobit ratesof128to 192kbpsfor compression.

4.2. Listening tests

In whatfollows,weshallconcentrateonthelisteningtestresultsfor a bitrate of 128 kbps. For our testswe re-cruited26 testlistenerschosenfrom amongthemembersof theaudiosignalprocessinggroupandothercomputersciencestudentsatBonnUniversity.In this section,the termstandard codecrefersto a non-modified MPEG-1 Layer II codecimplementation. Insummaryweconductedthefollowing tests:

1. Comparisonof 5thgenerationsof thestandardcodecand 5th generationsof the proposedcodec(rela-

AES17= >

InternationalConferenceonHigh QualityAudio Coding 8

Page 9: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

tive).

2. Comparisonof first generationsand25th genera-tionsof theproposedcodec(relative).

3. Comparisonof 5th generationsusingthe standardcodecand25thgenerationsof theproposedcodec(relative).

4. Comparisonof 3rd generationsusingthestandardcodecand5th generationsof the proposedcodec(relative).

5. Comparisonof 10thgenerationsof thestandardco-dec and 25th generationsof the proposedcodec(relative).

6. Absoluteratingsof

? First generations,? 5th generationsof the standardcodec(with-outsynchronization),? 5thgenerationsof thestandardcodec(includ-ing synchronization),? 5thgenerationsof theproposedcodec,and? 15thgenerationsof theproposedcodec.

In thosetestsabsoluteratingswere given accordingtothefive point MOS impairmentscale,while relative rat-ings weregiven w.r.t. an integer scaleof

@ ACB D EFB G, in-

dicating if a secondstimulus is judgedto be of better( H EJI D EFK D EFB L ),worse( H ACB D ACK D AMI L ) or same( N ) qual-ity as comparedto a first stimulus. We shall give anoverview of thetestresults.

−3 −2 −1 0 1 2 3

1

2

3

4

5

6

7

8

Pop (Abba)

Castanets

Speech, female

Speech, male

Wind Instruments

Orchestra

Pop (Jackson)

Percussion

betterworse equal

Figure11: Comparisonof first generationsand25thgen-erationsusingtheproposedcodec.For mostof thepieces,bothversionscouldnotbedistinguished.

−3 −2 −1 0 1 2 3

1

2

3

4

5

6

7

8

Pop (Abba)

Castanets

Speech, female

Speech, male

Wind Instruments

Orchestra

Pop (Jackson)

Percussion

betterequalworse

Figure 12: Comparisonof 10th generationsusing thestandardcodecvs 25th generationsusing the proposedcodec. Positive valuesgive the proposedcodeca betterrating.

In Fig. 11, the resultsof thecomparisonof first genera-tionsand25thgenerationsusingtheproposedcodecaregiven. It is obviousthatfor almostall of thepieces,bothversionscouldnotbedistinguished.Thedecreasein qualitycausedby thestandardcodecwasalreadyreportedin thefirst section.Fig. 12 givesthere-sultsfor the“extreme”situationof a comparisonof 10thgenerationsusing the standardcodecand 25th genera-tions using the proposedcodec. Clearly, the proposedcodecis consideredto yield muchbetterresults.There-sults of the other testsare consistentwith the onesre-portedhere. Oneproblemprevalent in our testsettingswasthelossof quality in thefirst generationsintroducedby the Layer II codecin caseof oneor two pieces. Inthosecases,someof thelistenerswereunsureaboutwhichof a piece’sversionsshouldberatedworse.We summarizethetrendsof our listeningtests:

? For six of thetestpieces,first generationscouldnotbe distinguishedfrom 25th generationsgeneratedby theproposedcodec.

? For the other pieces,the quality of higher (about25th)generationsaregenerallyjudgedto becom-parableto or betterthanthe 3rd generationspro-ducedby thestandardcodec.

? Testswith highgenerations(about50th)show thatthesignalqualitystaysstable.

It is importantto commentonsegmentswherenoembed-dingis possible.As will beshown in thenext subsection,embeddingis indeednot possiblefor eachframe. Yet it

AES17O P InternationalConferenceonHigh QualityAudio Coding 9

Page 10: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

turnsout thatthis is notcrucialfor ourMPEGimplemen-tation. More precisely, the frameswith no possibilityofembeddingturnoutto correspondtoquietsignalpassageswith tonalcontent.Since

1. the useof scalefactorsimplies an increasedaccu-racy within thosesegmentsand

2. tonal componentsarequantizedin a conservativeway(highersignal-to-maskratio assumed),

thereis a natural“workaround”to this problem.Our lis-teningtestsconfirmthis reasoning.

4.3. Objective measurements

To supporttheaboveresults,wegivesomeobjectivemea-surementsconcerningthesignalchangeswith increasinggenerations.Furthermore,weconsiderpossible(or max-imum) embeddingcapacitiesbeingof interestfor othersteganographicapplications.

0 10 20 30 40 50 60 70 80 900

50

100

Seg

SN

R [d

B]

0 10 20 30 40 50 60 70 80 900

50

100

Seg

SN

R [d

B]

0 10 20 30 40 50 60 70 80 900

50

100

Seg

SN

R [d

B]

0 10 20 30 40 50 60 70 80 900

50

100

Frames

Seg

SN

R [d

B]

Figure13: Eachof the four partsof theplot eachshows(fromtopto bottom)thesegmentalSNRsbetween1stand3rd generations,3rd and 5th generations,5th and 10thgenerations,aswell as10thand25thgenerations.Smallcirclesatabout60dB indicatethatthecorrespondingseg-mentsareidentical

In Fig. 13 the frameby frameSegSNRbetweenvariousgenerationsof the castagnetpieceis given. This is, thesegmentsize is 1152, in synchronicitywith the MPEGframes. While the SegSNR is plotted as a solid line,segmentswherethesignalcontentdoesnot changefromgenerationto generationareindicatedby smallcirclesatabout60dB. It canbeobservedthatwith increasinggen-erations(from the top to thebottomplot) therearemoreandmoresegmentswherethesignaldoesnotchangeany-more. This is perfectlyin accordancewith the observa-tions of the listeningtestthat the signalquality startstobe“stable”startingfrom about3rd to 5thgenerations.Inotherwords,we may saythat the proposedcodectends

0 1 2 3 4 5 6 7 8 9 10

x 104

−1

−0.5

0

0.5

1

x 104

Signal [Samples]

Am

plitu

de

1 2 3 4 5 6 7 8 9 10

x 104

−50

0

50

Signal [Samples]

Diff

eren

ce

Figure14: Theplot showsthewaveformof thecastagnetpiece(bottom).Above,thedifferencesignalbetween5thand10thgenerationsis given. Theupperhorizontalbarsindicatepositionswhereembeddingcouldbeperformedwhile thelowerbarsindicatesegmentswherenoembed-dingwasperformed.It is obviousthatthedifferencesig-nal vanishesat the embeddingpositions(Note that themagnitudeof the differencesignal is much lower thanthatof thesignal.).

to map the signal partially to somefixed point signal.Comparingtheembeddingsegmentsto thosefixedpointpositionswithin the signal, we oberve a match,as de-picted in Fig. 14. The lower part of the plot shows thewaveform of the castagnetpiece. The differencesignalbetween5th and10th generationsis given in the upperpart. Theupperhorizontalbarsindicatepositionswhereembeddingcouldbeperformedwhile thelowerbarsindi-catesegmentswhereno embeddingwasperformed.It isobviousthatthedifferencesignalvanishesat theembed-ding positions. Measurementswith differenttestpiecesandothermeasures( Q R -distance,relativesegmentalSNR)show similar results.

In view of somenaturalextensionsof theproposedcodecsaswell asothersteganographicapplications,weexaminetheembeddingcapacitiesobtainableby our apporach.Incaseof thecastagnetpiece,Fig. 15shows thevariousbitdemandsandcapacitiesfor eachframe. The solid linegivesthebit demandfor theembeddingof thecodingpa-rametersandthe dashedline shows the total embeddingcapacity. The dottedsolid line givesthe net embeddingcapacity, i.e., thetotalembeddingcapacityminustheca-pacity neededfor the error correctionand the markers.Embeddingis only performedfor frameswherethesolidline is plotted below the dottedsolid line. One clearlyobservestheoverheadproducedby themarkersanderrorcorrection.Furthermore,somesignalpartsallow a veryhugeamountof embedding(about4000-6000bits per

AES17S T InternationalConferenceonHigh QualityAudio Coding 10

Page 11: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

0 10 20 30 40 50 60 70 80 900

2000

4000

6000

8000 Bit Demand Bits/Frame Net Embedding capacity Total embedding capacity

Frames

Bits

0 1 2 3 4 5 6 7 8 9 10

x 104

−1

−0.5

0

0.5

1

x 104

Signal

Figure 15: Embeddingcapacitiesand demandsfor thecastagnetpiece. The solid line gives the bit demandfor embeddingof thecodingparameters(perframe),thedashedline shows the total embeddingcapacity, andthedottedsolid line givesthenetembeddingcapacity.

20 40 60 80 100 120

0

5000

10000

Frames

Bits

140 160 180 200 220 2400

5000

10000

15000

Frames

Bits

260 280 300 320 340 360 380

0

5000

10000

Bits

380 400 420 440 460 480 500 520 540 560

0

5000

10000

Bits

Figure 16: Embeddingcapacitiesand demandsfor alongersegmentof a pieceof popmusic.

frame)which is really a considerablequantity. Note thatthis is nothingreallyexceptional,asillustratedin Fig.16.Theplot is analogousto theupperpartof Fig. 15 exceptthata considerablylongersignalpieceis shown. For thispieceof popmusic,therearesomepartswith netembed-ding capacitiesof about4000bits per frame,in this casecorrespondingto a totalof 10000bitsperframe.

5. CONCLUSIONS AND FUTURE WORK

We have presenteda novel audiocodecschemesuitablefor multiple generationsaudiocompressionwithout lossof perceptualquality. As an implementationof the newconceptwe decribedan MPEG-1Layer II basedcodec.A thoroughcodecevaluationincluding extensive listen-ing testsandobjective measurementsshows the proper

functionalityof the proposedcodec.Thecodecconceptis genericandmaybecombinedwith a widerangeof to-daysaudiocodecs(and,naturally, video/imagecodecs).Furtherresearchdescribingan improved secondproto-typebasedonpopulartransformcodingtechniquesis re-portedelsewhere.Wealsoconsideredsteganographicapplications.Thepos-siblehugeembeddingcapacitiesasillustratedin Fig. 16show the flexibility of our embeddingapproachin viewof otherapplications.Amongthoseis oneof ourcurrentprojectswhich is concernedwith the synchronousinte-grationof textual or scoreinformationinto the decodedPCMbitstream.Naturally, thereis muchroom for further improvementsof ourprototypicimplementationincludingthereductionof FEC/markeroverheadusinga techniquesimilar to theMPEG-1LayerIII bit reservoir, or introductionof robust-nessw.r.t. transmissionerrors. The applicationof theintroducedtechniqueto heterogeneouscodeccascadesisalsoof greatinterestfor further studies. Recentdevel-opmentson a VQ embeddingframework allowing forcodecswhich,undersomemild conditions,maybeprovento keeptheperceptualquality of thefirst generationsig-nal,will bereportedelsewhere.

6. ACKNOWLEDGEMENT

Theauthorwouldlike to expresshisgratitudeto MichaelClausenfor supervisingthiswork andto MeinardMullerfor giving several valuablecomments.The researchac-tivities have beenpartially supportedby DeutscheFor-schungsgemeinschaftundergrantCL 64/3-1,CL 64/3-2,andCL 64/3-3.

REFERENCES

[1] John Fletcher, “Iso/mpeg layer 2 - optimum re-encodingof decodedaudiousinga molesignal,” inProc.104thAESConvention,Amsterdam, 1998.

[2] FrankKurth, VermeidungvonGenerationseffekteninderAudiocodierung(in german), Ph.D.thesis,Insti-tut fur InformatikV, UniversitatBonn,1999.

[3] Michael Keyhl, HaraldPopp,ErnstEberlein,Karl-Heinz Brandenburg, Heinz Gerhauser, and Chris-tian Schmidmer, “Verfahren zum kaskadiertenCodierenund Decodierenvon Audiodaten,” 1994,PatentschriftbeimDeutschenPatentamtDE 4405659C1.

[4] RossJ.AndersonandFabienA.P. Petitcolas,“On thelimits of steganography,” IEEE Journal on selectedareasin communications, vol. 16,no.4,pp.474–482,May 1998.

AES17U V InternationalConferenceonHigh QualityAudio Coding 11

Page 12: An Audio Codec for Multiple Generations Compression ...Department of Computer Science V, University of Bonn, Bonn, Germany frank@cs.uni-bonn.de We describe a generic audio codec allowing

FrankKurth An Audio Codecfor Multiple GenerationsCompressionwithoutLossof PerceptualQuality

[5] A. J. Mason,A. K. McParland,and N. J. Yeadon,“The ATLANTIC audiodemonstrationequipment,”in 106thAESConvention,Munich, Germany, 1999.

[6] ISO/IEC11172-3, “Coding of moving picturesandassociatedaudiofor digital storagemediaatupto 1.5mbits/s-audiopart,” 1992,InternationalStandard.

[7] Technical Centre of the European BroadcastingUnion, “SoundQualityAssessmentMaterialRecod-ingsfor SubjectiveTests,” 1988.

AES17W X InternationalConferenceonHigh QualityAudio Coding 12


Recommended