Base Paper (ITDSP02)

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 3, MARCH 2013 675

Audio Watermarking Via EMD

Kais Khaldi and Abdel-Ouahab Boudraa, Senior Member, IEEE

Abstract—In this paper a new adaptive audio watermarking algorithm based on Empirical Mode Decomposition (EMD) is introduced. The audio signal is divided into frames and each one is decomposed adaptively, by EMD, into intrinsic oscillatory components called Intrinsic Mode Functions (IMFs). The watermark and the synchronization codes are embedded into the extrema of the last IMF, a low frequency mode stable under different attacks and preserving audio perceptual quality of the host signal. The data embedding rate of the proposed algorithm is 46.9–50.3 b/s. Relying on exhaustive simulations, we show the robustness of the hidden watermark against additive noise, MP3 compression, re-quantization, filtering, cropping and resampling. The comparison analysis shows that our method has better performance than watermarking schemes reported recently.

Index Terms—Empirical mode decomposition, intrinsic mode function, audio watermarking, quantization index modulation, synchronization code.

I. INTRODUCTION

Digital audio watermarking has received a great deal of attention in the literature to provide efficient solutions for copyright protection of digital media by embedding a watermark in the original audio signal [1]–[5]. The main requirements of digital audio watermarking are imperceptibility, robustness and data capacity. More precisely, the watermark must be inaudible within the host audio data to maintain audio quality and robust to signal distortions applied to the host data. Finally, the watermark must be easy to extract to prove ownership. To achieve these requirements, seeking new watermarking schemes is a very challenging problem [5]. Different watermarking techniques of varying complexities have been proposed [2]–[5]. In [5] a watermarking scheme robust to different attacks is proposed but with a limited transmission bit rate. To improve the bit rate, watermarking schemes performed in the wavelet domain have been proposed [3], [4]. A limitation of the wavelet approach is that the basis functions are fixed, and thus they do not necessarily match all real signals. To overcome this limitation, a new signal decomposition method referred to as Empirical Mode Decomposition (EMD) has recently been introduced for analyzing non-stationary signals, derived or not from linear systems, in a totally adaptive way [6]. A major advantage of EMD is that it relies on no a priori choice of filters or basis functions. Compared to classical kernel based approaches, EMD is a fully data-driven method that recursively breaks down any signal into a reduced number of zero-mean AM-FM components with symmetric envelopes, called Intrinsic Mode Functions (IMFs). The decomposition starts from finer scales and proceeds to coarser ones. Any signal $x(t)$ is expanded by EMD as follows:

$$x(t) = \sum_{j=1}^{C} \mathrm{IMF}_j(t) + r_C(t) \qquad (1)$$

Manuscript received March 01, 2012; accepted October 11, 2012. Date of publication November 16, 2012; date of current version January 11, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Chang D. Yoo.

K. Khaldi is with ENIT, U2S, Le Belvédère, 1002 Tunis, Tunisia (e-mail: [email protected]).

A. O. Boudraa is with Ecole navale, IRENav, BCRM Brest, 29240 Brest Cedex 9, France (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASL.2012.2227733

Fig. 1. Decomposition of an audio frame by EMD.

Fig. 2. Data structure $\{m_i\}$.

where $C$ is the number of IMFs and $r_C(t)$ denotes the final residual. The IMFs are nearly orthogonal to each other, and all have nearly zero means. The number of extrema decreases when going from one mode to the next, and the whole decomposition is guaranteed to be completed with a finite number of modes. The IMFs are fully described by their local extrema and thus can be recovered using these extrema [7], [8]. Low frequency components such as higher order IMFs are signal dominated [9] and thus their alteration can lead to degradation of the signal. As a result, these modes can be considered to be good locations for watermark placement. Some preliminary results have appeared recently in [10], [11] showing the interest of EMD for audio watermarking. In [10], the EMD is combined with Pulse Code Modulation (PCM) and the watermark is inserted in the final residual of the subbands in the transform domain. This method supposes that the mean value of the PCM audio signal may no longer be zero. As stated by the authors, the method is not robust to attacks such as band-pass filtering and cropping, and no comparison to watermarking schemes reported recently in the literature is presented. Another strategy is presented in [11] where the EMD is associated with the Hilbert transform and the watermark is embedded into the IMF containing the highest energy. However, why the IMF carrying the highest amount of energy is the best candidate mode to hide the watermark has not been addressed. Further, in practice an IMF with the highest energy can be a high frequency mode and thus it is not robust to attacks.

Watermarks inserted into lower order IMFs (high frequency) are most vulnerable to attacks. It has been argued that for watermarking robustness, the watermark bits are usually embedded in the perceptually significant components, mostly the low frequency components of the host signal [12]. Compared to [10], [11], to simultaneously have better resistance against attacks and imperceptibility, we embed the watermark in the extrema of the last IMF. Further, unlike the schemes introduced in [10], [11], the proposed watermarking is based only on EMD, without domain transform. We choose in our method a watermarking technique in the category of Quantization Index Modulation (QIM) due to its good robustness and blind nature [13]. Parameters of QIM are chosen to guarantee that the embedded watermark in the last IMF is inaudible. The watermark is associated with a synchronization code to facilitate its location. An advantage of using the time domain approach, based on EMD, is the low cost of searching for synchronization codes. The audio signal is first segmented into frames where each one is decomposed adaptively into IMFs. Bits are inserted into the extrema of the last IMF such that the watermarked signal inaudibility is guaranteed. Experimental results demonstrate that the hidden data are robust against

1558-7916/$31.00 © 2012 IEEE


Fig. 3. Watermark embedding.

attacks such as additive noise, MP3 compression, requantization, cropping and filtering. Our method has a high data payload and good performance against MP3 compression compared to audio watermarking approaches reported recently in the literature.

II. PROPOSED WATERMARKING ALGORITHM

The idea of the proposed watermarking method is to hide into the original audio signal a watermark together with a Synchronization Code (SC) in the time domain. The input signal is first segmented into frames and EMD is conducted on every frame to extract the associated IMFs (Fig. 1). Then a binary data sequence consisting of SCs and informative watermark bits (Fig. 2) is embedded in the extrema of a set of consecutive last-IMFs. One bit (0 or 1) is inserted per extremum.

Since the number of IMFs, and hence their number of extrema, depends on the amount of data in each frame, the number of bits to be embedded varies from the last-IMF of one frame to the next. The watermark and SCs are not all embedded in the extrema of the last IMF of a single frame: in general, the number of extrema per last-IMF (one frame) is very small compared to the length of the binary sequence to be embedded. This also depends on the length of the frame. If we denote by $N_1$ and $N_2$ the numbers of bits of the SC and the watermark respectively, the length of the binary sequence to be embedded is equal to $2N_1 + N_2$. Thus, these $2N_1 + N_2$ bits are spread out over the extrema of several last-IMFs of consecutive frames. Further, this sequence of $2N_1 + N_2$ bits is embedded $P$ times. Finally, the inverse transformation ($\mathrm{EMD}^{-1}$) is applied to the modified extrema to recover the watermarked audio signal by superposition of the IMFs of each frame followed by the concatenation of the frames (Fig. 3).

For data extraction, the watermarked audio signal is split into frames and EMD is applied to each frame (Fig. 4). Binary data sequences are extracted from each last IMF by searching for SCs (Fig. 5). We show in Fig. 6 the last IMF before and after watermarking. This figure shows that there is little difference in terms of amplitudes between the two modes. Since EMD is fully data adaptive, it is important to guarantee that the number of IMFs will be the same before and after embedding the watermark (Figs. 1, 4). In fact, if the numbers of IMFs are different, there is no guarantee that the last IMF always contains the watermark information to be extracted. To overcome this problem, the sifting of the watermarked signal is forced to extract the same number of IMFs as before watermarking. The proposed watermarking scheme is blind, that is, the host signal is not required for watermark extraction. An overview of the proposed method is detailed as follows:

A. Synchronization Code

To locate the embedding position of the hidden watermark bits in the host signal, an SC is used. This code is unaffected by cropping and shifting attacks [4].

Let $U$ be the original SC and $V$ be an unknown sequence of the same length. Sequence $V$ is considered as an SC only if the number of different bits between $U$ and $V$, when compared bit by bit, is less than or equal to a predefined threshold $\tau$ [3].
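The acceptance test described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the 16-bit code is the one quoted later in the Results section.

```python
def hamming_distance(u, v):
    """Count positions where two equal-length bit sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(u, v))


def is_sync_code(candidate, sync_code, threshold):
    """Accept `candidate` as the SC if it differs in at most `threshold` bits."""
    return hamming_distance(candidate, sync_code) <= threshold


# 16-bit code quoted in the Results section of this paper.
SYNC = [int(b) for b in "1111100110101110"]

# Hypothetical received window: the SC with three bits flipped by an attack.
corrupted = SYNC.copy()
for pos in (0, 5, 9):
    corrupted[pos] ^= 1

print(is_sync_code(corrupted, SYNC, threshold=4))
```

A larger threshold tolerates more bit errors in the received code but also makes accidental matches more likely, which is the trade-off behind the false positive and false negative errors discussed in Section III.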

Fig. 4. Decomposition of the watermarked audio frame by EMD.

B. Watermark Embedding

Before embedding, SCs are combined with watermark bits to form a binary sequence denoted by $\{m_i\}$, where $m_i \in \{0, 1\}$ is the $i$-th bit (Fig. 2). Basics of our watermark embedding are shown in Fig. 3 and detailed as follows:

Step 1: Split the original audio signal into frames.

Step 2: Decompose each frame into IMFs.

Step 3: Embed $P$ times the binary sequence $\{m_i\}$ into the extrema of the last IMF ($\mathrm{IMF}_C$) by QIM [13]:

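$$e_i^* = \begin{cases} \lfloor e_i / S \rfloor \cdot S + \mathrm{sgn}(3S/4) & \text{if } m_i = 1 \\ \lfloor e_i / S \rfloor \cdot S + \mathrm{sgn}(S/4) & \text{if } m_i = 0 \end{cases} \qquad (2)$$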

where $e_i$ and $e_i^*$ are the extrema of $\mathrm{IMF}_C$ of the host audio signal and the watermarked signal respectively. The sgn function is equal to "$+$" if $e_i$ is a maximum, and "$-$" if it is a minimum. $\lfloor \cdot \rfloor$ denotes the floor function, and $S$ denotes the embedding strength chosen to maintain the inaudibility constraint.

Step 4: Reconstruct the frame ($\mathrm{EMD}^{-1}$) using the modified $\mathrm{IMF}_C$ and concatenate the watermarked frames to retrieve the watermarked signal.
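The QIM idea behind Step 3 can be illustrated on a handful of scalar values. The sketch below is a simplified assumption, not the authors' code: it places each value at the 3/4 point (bit 1) or 1/4 point (bit 0) of its quantization cell of width $S$, and it omits the maxima/minima sign convention described above. The function names and the sample extrema are hypothetical; $S = 0.98$ is the value quoted in the Results section.

```python
import math


def qim_embed(value, bit, step):
    """Move `value` to the 3/4 (bit 1) or 1/4 (bit 0) point of its quantization cell."""
    base = math.floor(value / step) * step
    return base + (0.75 if bit else 0.25) * step


def qim_detect(value, step):
    """Read a bit back: upper half of the cell means 1, lower half means 0."""
    residual = value - math.floor(value / step) * step
    return 1 if residual >= step / 2 else 0


S = 0.98  # embedding strength quoted in the Results section
extrema = [1.30, -0.42, 0.07, -2.90]  # hypothetical last-IMF extrema
bits = [1, 0, 1, 0]

marked = [qim_embed(e, b, S) for e, b in zip(extrema, bits)]
recovered = [qim_detect(e, S) for e in marked]
print(recovered)  # [1, 0, 1, 0]
```

In this simplified form each marked value sits a quarter of a cell away from the decision boundary, so a perturbation smaller than $S/4$ does not change the detected bit; a larger $S$ buys robustness at the cost of a larger change to the extremum.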

C. Watermark Extraction

For watermark extraction, the watermarked signal is split into frames and EMD is performed on each one as in embedding. We extract binary data using the rule given by (3). We then search for SCs in the extracted data. This procedure is repeated by shifting the selected segment (window) one sample at a time until an SC is found. With the position of the SC determined, we can then extract the hidden information bits, which follow the SC. Let $y = \{m_i^*\}$ denote the binary data to be extracted and $U$ denote the original SC. To locate the embedded watermark we search for the SCs in the sequence $\{m_i^*\}$ bit by bit. The extraction is performed without using the original audio signal. Basic steps involved in the watermark extraction, shown in Fig. 5, are given as follows:

Step 1: Split the watermarked signal into frames.

Step 2: Decompose each frame into IMFs.

Step 3: Extract the extrema $\{e_i^*\}$ of $\mathrm{IMF}_C$.


Fig. 5. Watermark extraction.

Fig. 6. Last IMF of an audio frame before and after watermarking.

Step 4: Extract $m_i^*$ from $e_i^*$ using the following rule [3]:

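$$m_i^* = \begin{cases} 1 & \text{if } e_i^* - \lfloor e_i^* / S \rfloor \cdot S \ge \mathrm{sgn}(S/2) \\ 0 & \text{if } e_i^* - \lfloor e_i^* / S \rfloor \cdot S < \mathrm{sgn}(S/2) \end{cases} \qquad (3)$$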

Step 5: Set the start index of the extracted data, $y$, to $I = 1$ and select $L = N_1$ samples (sliding window size).

Step 6: Evaluate the similarity between the extracted segment $V$ (the $L$ bits of $y$ starting at index $I$) and $U$ bit by bit. If the two agree to within the threshold $\tau$ (at most $\tau$ differing bits), then $V$ is taken as the SC and go to Step 8. Otherwise proceed to the next step.

Step 7: Increase $I$ by 1 and slide the window to the next $L = N_1$ samples and repeat Step 6.

Step 8: Evaluate the similarity between the second extracted segment $V'$ (the $N_1$ bits that follow the $N_2$ watermark bits) and $U$ bit by bit.

Step 9: Set $I \leftarrow I + N_1 + N_2$. If the new value of $I$ is equal to the length of the bit sequence, go to Step 10; otherwise repeat Step 7.

Step 10: Extract the $P$ watermarks and compare these marks bit by bit, for correction, and finally extract the desired watermark.

Watermark embedding and extraction processes are summarized in Fig. 7.
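The search and correction steps can be sketched on a plain bit list. This is an illustration under stated assumptions rather than the authors' procedure: each embedded copy is assumed to be an SC immediately followed by the payload bits (the trailing SC check of Step 8 is omitted), the final correction is assumed to be a bitwise majority vote, and all function names are hypothetical.

```python
def find_sync(bits, sync_code, threshold, start=0):
    """Return the first index >= start where a window matches the SC, or -1."""
    n = len(sync_code)
    for i in range(start, len(bits) - n + 1):
        window = bits[i:i + n]
        if sum(a != b for a, b in zip(window, sync_code)) <= threshold:
            return i
    return -1


def extract_copies(bits, sync_code, payload_len, threshold):
    """Collect every payload that directly follows a detected SC."""
    copies, i, n = [], 0, len(sync_code)
    while True:
        i = find_sync(bits, sync_code, threshold, i)
        if i < 0 or i + n + payload_len > len(bits):
            return copies
        copies.append(bits[i + n:i + n + payload_len])
        i += n + payload_len  # jump past this SC and its payload


def majority_vote(copies):
    """Combine repeated copies bit by bit; ties resolve to 0."""
    return [1 if 2 * sum(col) > len(copies) else 0 for col in zip(*copies)]


SYNC = [int(b) for b in "1111100110101110"]  # SC quoted in the Results section
payload = [1, 0, 1, 1, 0, 0, 1, 0]           # hypothetical watermark bits
stream = [0, 0, 0] + (SYNC + payload) * 3    # three copies after some leading bits
stream[45] ^= 1                              # corrupt one payload bit of the second copy

copies = extract_copies(stream, SYNC, len(payload), threshold=4)
recovered = majority_vote(copies)
print(len(copies), recovered == payload)
```

Because the window advances one position at a time, the search still locks onto the code when samples have been cropped or shifted ahead of it, which is the purpose of the synchronization code.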

III. PERFORMANCE ANALYSIS

We evaluate the performance of our method in terms of data payload, error probability of SC, Signal to Noise Ratio (SNR) between the original and the watermarked audio signals, Bit Error Rate (BER) and Normalized cross-Correlation (NC). According to International Federation of the Phonographic Industry (IFPI) recommendations, a watermarked audio signal should maintain more than 20 dB SNR. To evaluate the watermark detection accuracy after attacks, we used the BER and the NC defined as follows [4]:

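$$\mathrm{BER}(W, \hat{W}) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i,j) \oplus \hat{W}(i,j)}{M \times N} \qquad (4)$$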

where $\oplus$ is the XOR operator and $M \times N$ is the size of the binary watermark image. $W$ and $\hat{W}$ are the original and the recovered watermark respectively. BER is used to evaluate the watermark detection accuracy after signal processing operations. To evaluate the similarity between

Fig. 7. Embedding and extraction processes.

Fig. 8. Binary watermark.

the original watermark and the extracted one we use the NC measure defined as follows:

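$$\mathrm{NC}(W, \hat{W}) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i,j)\, \hat{W}(i,j)}{\sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} W^2(i,j)}\ \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} \hat{W}^2(i,j)}} \qquad (5)$$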

A large NC indicates the presence of the watermark while a low value suggests the lack of a watermark. Two types of errors may occur while searching for the SCs: the False Positive Error (FPE) and the False Negative Error (FNE). These errors are very harmful because they impair the credibility of the watermarking system. The associated probabilities of these errors are given by [3], [4]:

(6)

where $p$ is the SC length and $\tau$ is the threshold. $P_{FPE}$ is the probability that an SC is detected in a false location while $P_{FNE}$ is the probability that a watermarked signal is declared as unwatermarked by the decoder. We also use as a performance measure the payload, which quantifies the amount of information to be hidden. More precisely, the data payload refers to the number of bits that are embedded into the audio signal within a unit of time and is measured in bits per second (b/s).
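The two detection measures of this section can be computed directly on small binary images. The sketch below assumes the standard definitions (BER as the fraction of differing bits, NC as the inner product divided by the product of the norms); the function names and the 2×4 test image are hypothetical.

```python
import math


def bit_error_rate(original, recovered):
    """Fraction of differing bits (XOR) between two equal-size binary images."""
    total = sum(len(row) for row in original)
    errors = sum(a ^ b
                 for row_o, row_r in zip(original, recovered)
                 for a, b in zip(row_o, row_r))
    return errors / total


def normalized_correlation(original, recovered):
    """Inner product of the two images divided by the product of their norms."""
    flat_o = [a for row in original for a in row]
    flat_r = [b for row in recovered for b in row]
    dot = sum(a * b for a, b in zip(flat_o, flat_r))
    norm = math.sqrt(sum(a * a for a in flat_o)) * math.sqrt(sum(b * b for b in flat_r))
    return dot / norm if norm else 0.0


W = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
W_hat = [[1, 0, 1, 1],
         [0, 1, 0, 0]]  # one bit flipped

print(bit_error_rate(W, W_hat))           # 0.125
print(normalized_correlation(W, W_hat))
```

Multiplying the BER by 100 gives the percentage form used when results are reported.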


Fig. 9. A portion of the pop audio signal and its watermarked version.

TABLE I. SNR AND ODG BETWEEN ORIGINAL AND WATERMARKED AUDIO

IV. RESULTS

To show the effectiveness of our scheme, simulations are performed on audio signals including pop, jazz, rock and classic sampled at 44.1 kHz. The embedded watermark, $W$, is a binary logo image of size $M \times N = 1632$ bits (Fig. 8). We convert this 2D binary image into a 1D sequence in order to embed it into the audio signal. The SC used is a 16 bit Barker sequence 1111100110101110. Each audio signal is divided into frames of size 64 samples and the threshold $\tau$ is set to 4. The $S$ value is fixed to 0.98. These parameters have been chosen to have a good compromise between imperceptibility of the watermarked signal, payload and robustness. Fig. 9 shows a portion of the pop signal and its watermarked version. This figure shows that the watermarked signal is visually indistinguishable from the original one.

Perceptual quality assessment can be performed using subjective listening tests by human acoustic perception or using objective evaluation tests by measuring the SNR and Objective Difference Grade (ODG). In this work we use the second approach. ODG and SNR values of the four watermarked signals are reported in Table I. The SNR values are above 20 dB, showing the good choice of the $S$ value and conforming to the IFPI standard. All ODG values of the watermarked audio signals are between $-1$ and 0, which demonstrates their good quality.
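The conversion of the 2D binary logo into a 1D bit sequence mentioned above can be sketched as follows. The scan order is not stated in the text, so row-major order is an assumption here, and the function names are hypothetical.

```python
def flatten_image(image):
    """Row-major scan of a 2D binary image into a 1D bit list (assumed order)."""
    return [bit for row in image for bit in row]


def reshape_bits(bits, width):
    """Inverse of flatten_image for a known image width."""
    if len(bits) % width:
        raise ValueError("bit count is not a multiple of the width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


logo = [[1, 0, 1],
        [0, 1, 0]]  # tiny stand-in for the binary logo
sequence = flatten_image(logo)
print(sequence)  # [1, 0, 1, 0, 1, 0]
print(reshape_bits(sequence, width=3) == logo)
```

The extractor needs the image width to rebuild the logo from the recovered bits, so it has to be agreed on in advance.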

A. Robustness Test

To assess the robustness of our approach, different attacks are performed:

— Noise: White Gaussian Noise (WGN) is added to the watermarked signal until the resulting signal has an SNR of 20 dB.

— Filtering: The watermarked audio signal is filtered using a Wiener filter.

— Cropping: Segments of 512 samples are removed from the watermarked signal at thirteen positions and subsequently replaced by segments of the watermarked signal contaminated with WGN.

Fig. 10. $P_{FPE}$ versus synchronization code length.

Fig. 11. $P_{FNE}$ versus the length of embedding bits.

— Resampling: The watermarked signal, originally sampled at 44.1 kHz, is re-sampled at 22.05 kHz and restored back by sampling again at 44.1 kHz.

— Compression: (64 kb/s and 32 kb/s) Using MP3, the watermarked signal is compressed and then decompressed.

— Requantization: The watermarked signal is re-quantized down to 8 bits/sample and then back to 16 bits/sample.
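Two of the attacks listed above are simple enough to sketch without any audio library. The code below is illustrative only: the noise is rescaled so that the SNR is exactly the requested value, requantization is modeled by dropping the 8 least significant bits of integer samples, and the function names and the 440 Hz test tone are hypothetical.

```python
import math
import random


def snr_db(reference, test):
    """Signal-to-noise ratio of `test` relative to `reference`, in dB."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, test))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


def add_wgn(signal, target_snr_db, rng):
    """Add Gaussian noise scaled so that the result has the target SNR."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal)
    p_noise = sum(n * n for n in noise)
    gain = math.sqrt(p_signal / (p_noise * 10 ** (target_snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]


def requantize_16_to_8_to_16(samples):
    """Drop the 8 least significant bits of 16-bit integer samples, then restore the scale."""
    return [(s >> 8) << 8 for s in samples]


# Hypothetical test signal: 0.1 s of a 440 Hz tone sampled at 44.1 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
noisy = add_wgn(tone, 20.0, random.Random(0))
print(round(snr_db(tone, noisy), 6))

print(requantize_16_to_8_to_16([12345, -12345, 255, -1]))
```

The same `snr_db` helper can be used to check the 20 dB imperceptibility requirement between an original signal and its watermarked version.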

Table II shows the extracted watermarks with the associated NC and BER values for different attacks on the pop audio signal. NC values are all above 0.9482 and most BER values are below 3%. The extracted watermarks are visually similar to the original watermark. These results show the robustness of the watermarking method for the pop audio signal. Even in the case of the WGN attack with an SNR of 20 dB, our approach does not detect any error. This is mainly due to the insertion of the watermark into extrema. In fact, the low frequency subband has high robustness against noise addition [3], [4]. Table III reports similar results for classic, jazz and rock audio files. NC values are all above 0.9964 and BER values are all below 3%, demonstrating the good robustness of our method on these audio files. This robustness is due to the fact that even though the perceptual characteristics of individual audio files vary, the EMD decomposition adapts to each one. Table IV shows comparison results in terms of payload and robustness to the MP3 compression attack of our method to nine recent watermarking schemes


TABLE II. BER AND NC OF EXTRACTED WATERMARK FOR POP AUDIO SIGNAL BY PROPOSED APPROACH

TABLE III. BER AND NC OF EXTRACTED WATERMARK FOR DIFFERENT AUDIO SIGNALS (CLASSIC, JAZZ, ROCK) BY OUR APPROACH

[4], [14]–[20]. Due to the diversity of these embedding approaches, the comparison is sorted by attempted data payload. It can be seen that our method achieves the highest payload for the three audio files. Also, for these signals our scheme has a good performance against MP3 (32 kb/s) compression, where the maximum BER against this attack is 1%.

Fig. 10 plots $P_{FPE}$ versus $p$. We see that $P_{FPE}$ tends to 0 as $p$ grows, which confirms the choice of SC length. Fig. 11 shows that $P_{FNE}$ depends on the length of the watermark bits: it tends to 0 once the embedding bit length is large enough. Since the watermark size used is 1632 bits, the obtained $P_{FNE}$ is very low.
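How such error probabilities behave with the code length can be explored with a simple binomial model. The sketch below is an assumption, not necessarily the exact form used in the paper: a window is accepted as an SC when it differs from the code in at most `threshold` bits, bits of unmarked data are taken as independent and equiprobable, and bit errors on a true SC are taken as independent with rate `ber`. The function names are hypothetical.

```python
from math import comb


def p_false_positive(code_len, threshold):
    """P(a random equiprobable window differs from the SC in <= threshold bits)."""
    return sum(comb(code_len, k) for k in range(threshold + 1)) / 2 ** code_len


def p_false_negative(code_len, threshold, ber):
    """P(more than `threshold` bits of a true SC are corrupted), i.i.d. errors."""
    return sum(comb(code_len, k) * ber ** k * (1 - ber) ** (code_len - k)
               for k in range(threshold + 1, code_len + 1))


for p in (8, 16, 32, 64):
    print(p, p_false_positive(p, threshold=4), p_false_negative(p, threshold=4, ber=0.03))
```

Under this model, with a fixed threshold, a longer code lowers the false positive probability but raises the false negative one, so the code length and the threshold have to be chosen together.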

TABLE IV. COMPARISON OF AUDIO WATERMARKING METHODS, SORTED BY ATTEMPTED PAYLOAD

V. CONCLUSION

In this paper a new adaptive watermarking scheme based on the EMD is proposed. The watermark is embedded in a very low frequency mode (last IMF), thus achieving good performance against various attacks. The watermark is associated with synchronization codes and thus the synchronized watermark has the ability to resist shifting and cropping. Data bits of the synchronized watermark are embedded in the extrema of the last IMF of the audio signal based on QIM. Extensive simulations over different audio signals indicate that the proposed watermarking scheme has greater robustness against common attacks than nine recently proposed algorithms. This scheme has higher payload and better performance against MP3 compression compared to these earlier audio watermarking methods. In all audio test signals, the watermark introduced no audible distortion. Experiments demonstrate that the watermarked audio signals are indistinguishable from the original ones. These performances take advantage of the self-adaptive decomposition of the audio signal provided by the EMD. The proposed scheme achieves very low false positive and false negative error probability rates. Our watermarking method involves easy calculations and does not use the original audio signal. In the conducted experiments the embedding strength $S$ is kept constant for all audio files. To further improve the performance of the method, the parameter $S$ should be adapted to the type and magnitudes of the original audio signal. Our future work includes the design of a solution method for the adaptive embedding problem. As future research we also plan to include the characteristics of the human auditory system and a psychoacoustic model in our watermarking scheme to further improve the performance of the watermarking method. Finally, it would be interesting to investigate whether the proposed method supports various sampling rates with the same payload and robustness, and whether in real applications the method can handle D/A-A/D conversion problems.

REFERENCES

[1] I. J. Cox and M. L. Miller, “The first 50 years of electronic watermarking,” J. Appl. Signal Process., vol. 2, pp. 126–132, 2002.

[2] M. D. Swanson, B. Zhu, and A. H. Tewfik, “Robust audio watermarking using perceptual masking,” Signal Process., vol. 66, no. 3, pp. 337–355, 1998.

[3] S. Wu, J. Huang, D. Huang, and Y. Q. Shi, “Efficiently self-synchronized audio watermarking for assured audio data transmission,” IEEE Trans. Broadcasting, vol. 51, no. 1, pp. 69–76, Mar. 2005.

[4] V. Bhat, K. I. Sengupta, and A. Das, “An adaptive audio watermarking based on the singular value decomposition in the wavelet domain,” Digital Signal Process., vol. 2010, no. 20, pp. 1547–1558, 2010.

[5] D. Kiroveski and S. Malvar, “Robust spread-spectrum audio watermarking,” in Proc. ICASSP, 2001, pp. 1345–1348.

[6] N. E. Huang et al., “The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proc. R. Soc., vol. 454, no. 1971, pp. 903–995, 1998.

[7] K. Khaldi, A. O. Boudraa, M. Turki, T. Chonavel, and I. Samaali, “Audio encoding based on the EMD,” in Proc. EUSIPCO, 2009, pp. 924–928.

[8] K. Khaldi and A. O. Boudraa, “On signals compression by EMD,” Electron. Lett., vol. 48, no. 21, pp. 1329–1331, 2012.

[9] K. Khaldi, M. T.-H. Alouane, and A. O. Boudraa, “Voiced speech enhancement based on adaptive filtering of selected intrinsic mode functions,” J. Adv. in Adapt. Data Anal., vol. 2, no. 1, pp. 65–80, 2010.

[10] L. Wang, S. Emmanuel, and M. S. Kankanhalli, “EMD and psychoacoustic model based watermarking for audio,” in Proc. IEEE ICME, 2010, pp. 1427–1432.

[11] A. N. K. Zaman, K. M. I. Khalilullah, Md. W. Islam, and Md. K. I. Molla, “A robust digital audio watermarking algorithm using empirical mode decomposition,” in Proc. IEEE CCECE, 2010, pp. 1–4.

[12] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, “A secure, robust watermark for multimedia,” LNCS, vol. 1174, pp. 185–206, 1996.

[13] B. Chen and G. W. Wornell, “Quantization index modulation methods for digital watermarking and information embedding of multimedia,” J. VLSI Signal Process. Syst., vol. 27, pp. 7–33, 2001.

[14] W.-N. Lie and L.-C. Chang, “Robust and high-quality time-domain audio watermarking based on low frequency amplitude modification,” IEEE Trans. Multimedia, vol. 8, no. 1, pp. 46–59, Feb. 2006.

[15] I.-K. Yeo and H. J. Kim, “Modified patchwork algorithm: A novel audio watermarking scheme,” IEEE Trans. Speech Audio Process., vol. 11, no. 4, pp. 381–386, Jul. 2003.

[16] D. Kiroveski and H. S. Malvar, “Spread-spectrum watermarking of audio signals,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 1020–1033, Apr. 2003.

[17] R. Tachibana, S. Shimizu, S. Kobayashi, and T. Nakamura, “An audio watermarking method using a two-dimensional pseudo-random array,” Signal Process., vol. 82, no. 10, pp. 1455–1469, 2002.

[18] N. Cvejic and T. Seppanen, “Spread spectrum audio watermarking using frequency hopping and attack characterization,” Signal Process., vol. 84, no. 1, pp. 207–213, 2004.

[19] W. Li, X. Xue, and P. Lu, “Localised audio watermarking technique robust against time-scale modification,” IEEE Trans. Multimedia, vol. 8, no. 1, pp. 60–69, 2006.

[20] M. F. Mansour and A. H. Tewfik, “Data embedding in audio using time-scale modification,” IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 432–440, May 2005.

[21] S. Xiang, H. J. Kim, and J. Huang, “Audio watermarking robust against time-scale modification and MP3 compression,” Signal Process., vol. 88, no. 10, pp. 2372–2387, 2008.
