[IEEE ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Prague, Czech Republic (2011.05.22-2011.05.27)] 2011 IEEE International

AN IMPROVED SCHEME OF AUDIO WATERMARKING BASED ON TURBO CODES AND CHANNEL EFFECT MODELING

Taoufik Majoul(1,2), Fathi Raouafi(1,3), and Meriem Jaïdane(1)

1 Signals and Systems Research Unit (U2S), École Nationale d'Ingénieurs de Tunis, Tunisia.

2 Institut Supérieur des Technologies Médicales de Tunis, Tunisia.

3 Institut Supérieur d'Informatique, Ariana, Tunisia.

Email: [email protected]

ABSTRACT

Digital audio watermarking is used in many multimedia applications such as copyright management, music indexing, and broadcast monitoring. A watermarking technique must ensure an imperceptible watermark and reliable transmission, with maximal robustness to a wide range of perturbations that depend on the application. This paper presents an improved error correction strategy for digital audio watermarking. It uses a turbo coding algorithm, coupled with Wiener filtering, that takes into account the equivalent corrupting noise of the watermarking channel and works with reliable soft metric values. Experimental tests on real audio signals demonstrate improved bit error rate (BER) performance compared to the reference generic system, which uses a Wiener filter at the detection side.

Index Terms: Audio watermarking, turbo codes, channel modeling

1. INTRODUCTION

Digital audio watermarking was first proposed as a potential solution for copyright protection: an inserted signature serves to identify the owner or distributor of the audio data. The use of audio watermarks has since been extended to many multimedia applications, including advertising for broadcasting, broadcast monitoring, content description, and music indexing.

In addition to transparency and transmission reliability, a watermarking technology must guarantee acceptable robustness to classical perturbations applied to the watermarked signal. Depending on the application, these perturbations include filtering, format change, and dynamics change [1]. There is therefore a trade-off between the main performance criteria of a watermarking system: imperceptibility, bit rate, and robustness to channel perturbations [2].

Many research works focus on improving detection performance in watermarking systems. Each error correction strategy is dedicated to specific robustness requirements. In [3], Kirovski et al. minimize the error probability by adapting the watermark power with respect to the perceptual constraint. The procedure proposed by Miller et al. in [4] exploits the host signal characteristics to iteratively generate the watermark that ensures robust detection at the receiver side. Another informed approach that exploits the properties of the host audio signal was presented in [5]. The approach presented in [6] is based on equalization techniques and proposes efficient detection with a specific Wiener filtering receiver.

Most of the techniques developed for watermark detection are based on signal processing approaches. However, many works in image watermarking introduce coding systems for watermark extraction. Despite their efficiency in communication systems, coding schemes have not yet been extensively employed in audio watermarking. Such approaches are difficult to apply because of the specificities of the watermarking communication channel. Indeed, the host audio signal (the noise) is generally non-Gaussian, non-stationary, correlated, and of high power.

In [7], Cvejic et al. presented an improved watermarking system using turbo codes, but the employed scheme was not described in detail.

The watermarking system introduced in [8] uses a generic receiver that couples classical Wiener filtering with an iterative decoder. Encouraging detection performance is obtained even in the presence of MPEG compression.

In this work, we focus on the watermarking system proposed in [6]. We introduce a turbo code module in addition to the Wiener filter, which is realized as a cascade of a zero-forcing filter and a noise-reducing filter (see footnote 1). We accurately calculate the soft inputs of the iterative decoder by efficiently estimating the distribution of the distances separating the received vectors from the emitted information vectors. Observation reveals that this distribution complies with the generalized Gaussian (GG) model. This idea was motivated by our result in [9], where we demonstrated that the performance of turbo codes in the presence of GG noise is considerably improved when the noise probability density function (PDF) is accurately estimated.

The paper is organized as follows. Section 2 describes the principles of the audio watermarking system. Section 3 explains the reliable calculation of the soft inputs needed for the decoding step. Section 4 presents the simulation results depicting the system performance.

2. PROPOSED AUDIO WATERMARKING SYSTEM STRUCTURE

The proposed audio watermarking system is presented in figure 1. In the system of [6], reception was mainly based on a correlation detector applied to the received watermarked signal after the Wiener filtering step. We incorporate an encoder at the emitter and an iterative decoder after the reception filter. The hard decision is then omitted, and the soft inputs are deduced from the correlation values provided by the demodulator.

Footnote 1: The reception filter is simply called the Wiener filter here.

978-1-4577-0539-7/11/$26.00 ©2011 IEEE. ICASSP 2011

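[Figure 1: block diagram. Labeled blocks: Encoder, Mod. (modulator, codebook $D$), shaping filter $H(z)$ with PAM, addition of the audio signal $x_n$, channel perturbations, Wiener filter ($G(z)$ and $1/H(z)$, with PAM), Dem. (demodulator, codebook $D$), soft inputs, Iterative Decoder. Labeled signals: $b_i$, $a_k$, $v_n$, $t_n$, $y_n$, $\hat{y}_n$, $\hat{v}_n$, $\hat{b}_i$. A bracket marks the Equivalent Watermarking Channel.]

Fig. 1. Proposed audio watermarking system including an encoder and an iterative decoder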

2.1. Embedding process

The binary sequence $b_i$ represents the message to be embedded in the host audio signal $x_n$. This sequence is encoded to form a sequence of symbols $a_k$, where every $a_k$ represents a set of $l$ bits. The codebook $D$ uses $M = 2^l$ vectors $d_k$ of length $N_d$ and of unit power $\sigma_d^2 = \frac{d^t d}{N_d} = 1$ (imposed by the inaudibility constraint). We use a codebook with two antipodal white Gaussian random vectors.

In the modulation step (Mod.), every symbol $a_k$ is associated with a vector $d_k$ from the codebook $D$. The concatenation of the obtained vectors yields the modulated signal $v_n$.
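The codebook construction and modulation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `make_antipodal_codebook` and `modulate`, the seed, and the choice to normalize the drawn vector so that its empirical power is exactly 1 are all assumptions.

```python
import math
import random


def make_antipodal_codebook(n_d, seed=0):
    """Return [d0, d1]: a white Gaussian vector of unit power and its negation.

    Normalizing to an empirical power of exactly 1 is an assumption of this sketch.
    """
    rng = random.Random(seed)
    d0 = [rng.gauss(0.0, 1.0) for _ in range(n_d)]
    power = sum(x * x for x in d0) / n_d          # empirical d^t d / N_d
    scale = 1.0 / math.sqrt(power)
    d0 = [x * scale for x in d0]                  # enforce sigma_d^2 = 1
    d1 = [-x for x in d0]                         # antipodal partner
    return [d0, d1]


def modulate(bits, codebook):
    """Concatenate the codebook vector selected by each bit (l = 1, so a_k = b_i)."""
    v = []
    for b in bits:
        v.extend(codebook[b])
    return v


codebook = make_antipodal_codebook(n_d=1024)
v = modulate([0, 1, 1, 0], codebook)              # modulated signal v_n, 4 * 1024 samples
```

With $M = 2$, each symbol carries one bit, so the modulated signal is simply a sequence of $\pm d_0$ blocks of length $N_d$.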

To ensure the inaudibility of the watermarking signal $t_n$, $v_n$ is processed with a psychoacoustic shaping filter $H(z)$. Its impulse response (IR) is obtained from the psychoacoustic model (PAM, see footnote 2) of the audio signal $x_n$ and is updated in each processed window of length $N_s$.

Finally, the watermarked audio signal $y_n$ is obtained by adding $t_n$ to $x_n$ in the time domain.

2.2. Detection process

The received signal $y_n$ is processed by the Wiener filter, designed to minimize the mean square error $\mathrm{MSE} = E[|v_n - \hat{v}_n|^2]$. The demodulator (Dem.) computes the correlation between the constituent vectors $\hat{v}_k$ of the received signal $\hat{v}_n$ and the vectors $d_k$ of the modulation codebook. The soft inputs needed in the decoding stage are provided by the distances between every received vector and the dictionary vectors, deduced from the correlation values.

We accurately calculate these soft inputs by using a suitable model for the distance distribution in each received information block; the model depends on a shape parameter. The calculation of the decoder inputs is detailed in the next section.
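As a hedged illustration of how a distance can be deduced from a correlation value, the sketch below assumes a Euclidean distance and uses the identity $\|\hat{v} - d\|^2 = \|\hat{v}\|^2 + \|d\|^2 - 2\,\hat{v}^t d$. The paper does not state the exact distance definition, and the function names are hypothetical.

```python
import math


def correlation(v_hat, d):
    """Normalized correlation (1 / N_d) * sum_n v_hat[n] * d[n]."""
    return sum(a * b for a, b in zip(v_hat, d)) / len(d)


def distance_from_correlation(v_hat, d):
    """Euclidean distance ||v_hat - d|| computed from the correlation value.

    Assumption of this sketch: the distance is Euclidean.
    """
    n = len(d)
    energy_v = sum(a * a for a in v_hat)
    energy_d = sum(b * b for b in d)
    squared = energy_v + energy_d - 2.0 * n * correlation(v_hat, d)
    return math.sqrt(max(squared, 0.0))           # clamp tiny negative rounding errors


d0 = [1.0, -1.0, 1.0, -1.0]
d1 = [-x for x in d0]
v_hat = [0.5 * x for x in d0]                     # attenuated copy of d0
r0 = distance_from_correlation(v_hat, d0)         # 1.0
r1 = distance_from_correlation(v_hat, d1)         # 3.0
```

An attenuated copy of $d_0$ is closer to $d_0$ than to $d_1$, which is the property the soft inputs rely on.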

3. COMPUTATION OF THE DECODER SOFT INPUTS THROUGH CHANNEL EFFECT MODELING

The proposed decoding process is based on the maximum a posteriori (MAP) algorithm. The decoder requires knowledge of the received bit probabilities $P(\hat{b}_i \mid b_i)$. If we consider a modulation dictionary with two antipodal vectors ($d_0$ and $d_1$), every symbol $a_k$ represents one information bit $b_i$. Denoting a received vector by $\hat{v}$, we can write:

$$P(\hat{b}_i \mid b_i = k) = P(\hat{v} \mid d_k), \quad k \in \{0, 1\} \qquad (1)$$

Therefore, we should take into account the effect of the equivalent watermarking channel on the modulus and the phase displacement of the vectors.

Footnote 2: The PAM used is a modified version of the MPEG-1 model, adapted to the watermarking context.

3.1. Equivalent watermarking channel effect on the transmitted vectors and estimation of the conditional received bit probabilities

As explained before, every transmitted symbol $a_k$ is associated with a vector $d_k$. During processing, this vector undergoes three main operations: shaping filtering, addition of the audio signal in the time domain, and Wiener filtering. A noisy version of this vector is therefore available at the decoding step. Figure 2 shows the constellation of received vectors for an audio excerpt. It reveals that every vector is received with a distortion affecting its modulus and its initial phase (geometric angle) position.

As a consequence, a received vector $\hat{v}$ can be expressed as:

$$\hat{v} = \alpha \exp(j\phi_k)\, d_k, \quad k \in \{0, 1\} \qquad (2)$$

where $\alpha$ is a real coefficient with $0 < \alpha < 1$ (since the amplitude of the transmitted vector $d_k$ is attenuated) and $\phi_k$ is the phase displacement of the received vector from the transmitted vector $d_k$ chosen from the embedding codebook.

Therefore, to account for the total channel effect on every received vector $\hat{v}$, we calculate the distance $r_k$ that separates it from its initial position (the initially transmitted vector $d_k$).

[Figure 2: scatter plot of the received vector constellation; both axes range from $-0.2$ to $0.2$.]

Fig. 2. Constellation of the normalized received vectors from an embedding codebook with two antipodal vectors, for a 1000-bit random binary watermark. Processing window length: $N_s = N_d = 1024$.

The conditional received bit probability is then deduced as:

$$P(\hat{b}_i \mid b_i = k) = P(\hat{v} \mid d_k) = f(r_k), \quad k \in \{0, 1\} \qquad (3)$$

where $f(\cdot)$ is the PDF of the distance distribution.

Hence, an estimate of this PDF is needed so that the distribution of the received distance values can be modeled efficiently.


3.2. Distance distribution: the Generalized Gaussian model

Figure 3 shows the histograms of the distance $r_0$ for an "all 0" embedded message of 2000 bits and for different host audio signals.

[Figure 3: four histogram panels, one per host audio file. Horizontal axis: $r_0$ (from 0 to about 30 or 40); vertical axis: frequency.]

Fig. 3. Generalized Gaussian aspect of the histogram of the distance $r_0$ between the received vectors and the dictionary vector $d_0$ for an "all 0" embedded message of 2000 bits. Four different host audio files are considered. Processing window length: $N_s = N_d = 1024$.

The histograms range from the Gaussian case to a model with some impulsive characteristics. We therefore expect a model such as the generalized Gaussian PDF to fit the observed distributions quite well. The distribution is controlled by a shape parameter $\lambda$. Thus, the PDF of the distance can be written as:

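$$f(r_k) = \frac{\lambda}{2\eta_k \Gamma(1/\lambda)} \exp\left(-\left|\frac{r_k - \mu_k}{\eta_k}\right|^{\lambda}\right), \quad k \in \{0, 1\} \qquad (4)$$

where $\Gamma(\cdot)$ is the Gamma function and $\eta_k = \sigma_k \left[\frac{\Gamma(1/\lambda)}{\Gamma(3/\lambda)}\right]^{1/2}$ is a scale parameter ($\sigma_k$ and $\mu_k$ are respectively the standard deviation and the mean value of the distribution).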

When $\lambda = 2$ (resp. $\lambda = 1$, $\lambda < 1$, $\lambda \to \infty$), $f(\cdot)$ in (4) is the Gaussian (resp. Laplacian, impulsive, uniform) distribution.
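The generalized Gaussian PDF of eq. (4), with the scale parameter $\eta_k = \sigma_k [\Gamma(1/\lambda)/\Gamma(3/\lambda)]^{1/2}$, can be evaluated directly with the standard library. The sketch below is illustrative only (the function names are not from the paper); it also checks numerically that $\lambda = 2$ reduces to the ordinary Gaussian density.

```python
import math


def gg_pdf(r, mu, sigma, lam):
    """Generalized Gaussian PDF with mean mu, standard deviation sigma, shape lam."""
    eta = sigma * math.sqrt(math.gamma(1.0 / lam) / math.gamma(3.0 / lam))
    norm = lam / (2.0 * eta * math.gamma(1.0 / lam))
    return norm * math.exp(-abs((r - mu) / eta) ** lam)


def gaussian_pdf(r, mu, sigma):
    """Ordinary Gaussian PDF, used only as a reference for the lam = 2 case."""
    z = (r - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# lam = 2 reproduces the Gaussian density
gap = abs(gg_pdf(0.3, 0.0, 1.5, 2.0) - gaussian_pdf(0.3, 0.0, 1.5))

# crude Riemann sum: the Laplacian case (lam = 1) integrates to about 1
step = 0.001
area = sum(gg_pdf(-20.0 + i * step, 0.0, 1.0, 1.0) for i in range(40001)) * step
```

Smaller values of $\lambda$ give a sharper peak and heavier tails, which is why the shape parameter matters for the reliability of the soft inputs.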

3.3. Shape parameter estimation procedure

Since the variance $\sigma_k^2$ and the mean value $\mu_k$ (of the PDF of the distances on the considered frame) can be estimated empirically, we are concerned with an efficient estimation of the shape parameter $\lambda$ that gives acceptable results at a low computational cost. To this end, we adopt the simple and reliable procedure presented by Regazzoni et al. in [10], based on higher order statistics. This procedure was used for multilevel signal estimation in non-Gaussian impulsive noise and demonstrated good results. As will be seen in the following, this method can be adopted successfully in our context.

The shape parameter of a GG distribution can be easily calculated from its normalized kurtosis. The relation between the normalized kurtosis $K_b$ ($K_b = E(b_k^4)/\sigma_b^4$, with $\sigma_b^2 = E(b_k^2)$) of a GG distribution $b$ and its shape parameter $\lambda$ is given by:

$$K_b = K(\lambda) = \frac{\Gamma(5/\lambda)\,\Gamma(1/\lambda)}{\left(\Gamma(3/\lambda)\right)^2} \qquad (5)$$

The shape parameter can then be estimated as:

$$\hat{\lambda} = \frac{\log\left(5^5/3^6\right)}{\log\left[\frac{\sqrt{5}}{3}\left(K_b - \frac{2}{9}\right)\right]} \qquad (6)$$

Various tests on the distance distribution, for different audio signals, revealed that the shape parameter varies between 0.5 and 2.

In addition, the presented estimator converges well for this range of shape parameter values of GG signals. Acceptable estimates are obtained from sample sizes of about 1000.
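The kurtosis relation of eq. (5) and the estimator of eq. (6), read here as $\hat{\lambda} = \log(5^5/3^6) / \log[\frac{\sqrt{5}}{3}(K_b - \frac{2}{9})]$, can be checked numerically. The sketch below is illustrative: the function names are hypothetical, and centering the samples on their empirical mean before computing the kurtosis is an assumption made because the distances have a non-zero mean $\mu_k$.

```python
import math
import random


def gg_kurtosis(lam):
    """Normalized kurtosis K(lam) of a generalized Gaussian, eq. (5)."""
    return math.gamma(5.0 / lam) * math.gamma(1.0 / lam) / math.gamma(3.0 / lam) ** 2


def estimate_shape(kurtosis):
    """Closed-form shape estimate from the normalized kurtosis, eq. (6)."""
    argument = math.sqrt(5.0) / 3.0 * (kurtosis - 2.0 / 9.0)
    if argument <= 1.0:
        raise ValueError("kurtosis too small for the estimator to be defined")
    return math.log(5.0 ** 5 / 3.0 ** 6) / math.log(argument)


def normalized_kurtosis(samples):
    """Empirical E[b^4] / (E[b^2])^2 after removing the empirical mean (assumption)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    m2 = sum(c ** 2 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return m4 / (m2 * m2)


# The estimator is exact for the Gaussian case and close for lam = 1 and lam = 0.5
exact = {lam: estimate_shape(gg_kurtosis(lam)) for lam in (2.0, 1.0, 0.5)}

# Estimation from Gaussian samples (true shape parameter: 2)
rng = random.Random(0)
samples = [rng.gauss(5.0, 2.0) for _ in range(20000)]
lam_hat = estimate_shape(normalized_kurtosis(samples))
```

For $K_b = 3$ (Gaussian), the argument of the logarithm equals $25\sqrt{5}/27$, whose square is exactly $5^5/3^6$, so the estimator returns $\hat{\lambda} = 2$. For the Laplacian case ($K_b = 6$) it returns a value within about 0.01 of 1.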

4. SIMULATION RESULTS

4.1. Summary of the Embedding/Detection system

Let us summarize how the proposed watermarking system operates. For a given binary watermark, we generate a coded sequence. The associated modulated signal arrives at the receiver side in a corrupted version.

For every received information signal $\hat{v}_n$ associated with a coded binary watermark frame, we first calculate the distances between the constituent vectors and the codebook vectors. We then estimate the shape parameter $\lambda$ of the distribution. Once the shape parameter is available, the conditional received bit probabilities defined in eq. (3) can be computed and passed to the decoding stage.
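One possible way to pass the probabilities of eq. (3) to an iterative decoder is as a log-likelihood ratio (LLR). The sketch below is an assumption-laden illustration, not the authors' procedure: it assumes the same $(\mu, \sigma, \lambda)$ for both hypotheses, an LLR sign convention where positive values favor bit 0, and hypothetical function names. The log-domain form avoids numerical underflow for large distances.

```python
import math


def gg_log_pdf(r, mu, sigma, lam):
    """Logarithm of the generalized Gaussian PDF of eq. (4)."""
    eta = sigma * math.sqrt(math.gamma(1.0 / lam) / math.gamma(3.0 / lam))
    return (math.log(lam) - math.log(2.0 * eta) - math.lgamma(1.0 / lam)
            - abs((r - mu) / eta) ** lam)


def soft_input_llr(r0, r1, mu, sigma, lam):
    """log f(r0) - log f(r1): positive when the vector looks like d0 (assumed convention)."""
    return gg_log_pdf(r0, mu, sigma, lam) - gg_log_pdf(r1, mu, sigma, lam)


# Gaussian case: the LLR reduces to (r1^2 - r0^2) / (2 sigma^2) when mu = 0
llr_gauss = soft_input_llr(1.0, 3.0, mu=0.0, sigma=1.0, lam=2.0)     # about 4.0
# Laplacian case: the same distances give a smaller LLR, (r1 - r0) * sqrt(2)
llr_laplace = soft_input_llr(1.0, 3.0, mu=0.0, sigma=1.0, lam=1.0)
```

The same pair of distances yields different soft inputs under different shape parameters, which illustrates why assuming a Gaussian distribution when the data are more impulsive can mislead the decoder.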

4.2. Simulation set up

The system performance is evaluated in terms of the average BER of the detection of random binary watermarks. The average was calculated over a set of 10 audio signals of different genres, sampled at $F_e = 44.1$ kHz with a resolution of 16 bits. The watermark bits were divided into frames of 1020 bits and encoded using an encoder made up of two parallel recursive systematic codes (RSC) using the same generator polynomials (37, 25) and separated by a random interleaver. Since the transmission bit rate is multiplied by the coding rate, we choose a coding rate of 1/2 with a block length of 1020 bits as a compromise between the computational cost of the iterative process and the bit rate reduction due to coding.
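A parallel concatenation of two RSC encoders with generators (37, 25) can be sketched as follows. Several details are not given in the text and are assumptions of this sketch: the generators are read as octal, 37 is taken as the feedback polynomial and 25 as the feedforward polynomial, the rate 1/2 is obtained by alternately puncturing the two parity streams, and no trellis termination is applied. The function names are hypothetical.

```python
import random


def rsc_parity(bits, feedback=0o37, forward=0o25, memory=4):
    """Parity stream of a recursive systematic convolutional encoder.

    Assumed convention: octal generators, MSB is the tap on the current input.
    """
    fb = [(feedback >> (memory - i)) & 1 for i in range(memory + 1)]
    ff = [(forward >> (memory - i)) & 1 for i in range(memory + 1)]
    state = [0] * memory                      # state[0] holds the most recent value
    parity = []
    for b in bits:
        a = b                                 # recursive (feedback) sum
        for i in range(1, memory + 1):
            a ^= fb[i] & state[i - 1]
        p = ff[0] & a                         # feedforward sum gives the parity bit
        for i in range(1, memory + 1):
            p ^= ff[i] & state[i - 1]
        parity.append(p)
        state = [a] + state[:-1]              # shift the register
    return parity


def turbo_encode(bits, interleaver):
    """Rate-1/2 output: systematic bit, then parity 1 or parity 2 alternately (assumed puncturing)."""
    p1 = rsc_parity(bits)
    p2 = rsc_parity([bits[i] for i in interleaver])
    coded = []
    for k, b in enumerate(bits):
        coded.append(b)
        coded.append(p1[k] if k % 2 == 0 else p2[k])
    return coded


rng = random.Random(1)
frame = [rng.randint(0, 1) for _ in range(1020)]     # one 1020-bit watermark frame
interleaver = list(range(1020))
rng.shuffle(interleaver)                             # random interleaver
coded = turbo_encode(frame, interleaver)             # 2040 coded bits
```

A 1020-bit frame produces 2040 coded bits, so at a fixed modulation rate the information bit rate is halved, which is the bit rate cost mentioned above.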

The modulation codebook is composed of $M = 2$ antipodal white Gaussian random vectors of variable length $N_d$. The binary transmission rate is $R = \frac{\log_2(M)\, F_e}{N_d}$.
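As a worked example of the rate formula $R = \log_2(M) F_e / N_d$, the sketch below computes the binary transmission rate for two vector lengths and the rate left for information bits once a coding rate is applied. The function names and the value $N_d = 441$ are illustrative choices, not values taken from the paper.

```python
import math


def transmission_rate(m, fe_hz, n_d):
    """Binary transmission rate R = log2(M) * Fe / Nd, in bits/s."""
    return math.log2(m) * fe_hz / n_d


def information_rate(m, fe_hz, n_d, coding_rate):
    """Transmission rate multiplied by the coding rate."""
    return coding_rate * transmission_rate(m, fe_hz, n_d)


r_441 = transmission_rate(2, 44100, 441)              # 100.0 bits/s
r_1024 = transmission_rate(2, 44100, 1024)            # about 43.07 bits/s
r_info = information_rate(2, 44100, 441, 0.5)         # 50.0 bits/s
```

Shorter codebook vectors increase the bit rate but leave fewer samples per symbol for the correlation, which is consistent with the BER growing with the bit rate.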

In addition to a situation with no attack, we consider situations with different channel attacks by using the automated evaluation tool (see footnote 3) proposed in [1]. These perturbations include echo addition, high-pass and low-pass filtering, amplitude attenuation, and dynamics compression. We also consider MP3 compression (see footnote 4) and noise addition.

4.3. Listening tests

The quality of the watermarked signals was evaluated through subjective listening tests. These tests showed that the considered audio watermarking system produces inaudible watermarks: no audible distortion was introduced by the watermark.

Footnote 3: We keep the provided default parameters.

Footnote 4: Performed with the "LAME" digital encoder tool.


4.4. Robustness

Figure 4 presents the BER performance of different detection strategies when the channel is free from perturbations. We consider Wiener filter based detection, and a Wiener filter coupled with a turbo code with and without PDF estimation (in the latter case, the distance distribution is assumed to be Gaussian).

The results show that significant improvements are achieved when a turbo code is coupled with the Wiener filter. For example, at a bit rate $R$ of 100 bits/s, we obtain a BER of $10^{-6}$ with a turbo code operating with the PDF estimation procedure, whereas the achieved BERs are about $10^{-5}$ with a turbo code without PDF estimation and $10^{-4}$ with Wiener filter based detection.

The PDF estimation procedure does not provide further detection improvement (over the proposed strategy with turbo codes) when working at a bit rate $R > 200$ bits/s.

[Figure 4: BER (logarithmic axis, $10^{-6}$ to $10^{-1}$) versus bit rate in bits/s (50 to 450). Curves: Wiener filtering; Turbo code without PDF estimation; Turbo code with PDF estimation.]

Fig. 4. BER versus transmission bit rate with no channel perturbations, for different detection strategies. Turbo code: results at the 7th iteration.

Figure 5 gives the BER values obtained in the presence of channel perturbations, with the reference system (Wiener filtering) and with the proposed detection system, for a fixed bit rate $R = 150$ bits/s.

[Figure 5: BER (logarithmic axis, $10^{-4}$ to $10^{0}$) versus perturbation index (1 to 9). Series: Wiener filtering; Turbo code without PDF estimation; Turbo code with PDF estimation.]

Fig. 5. BER performance of the reference and the proposed strategies in the presence of perturbations, for a fixed bit rate $R = 150$ bits/s. Perturbations: 1: none, 2: echo, 3: high-pass (200 Hz), 4: low-pass (9 kHz), 5: amplitude attenuation (50%), 6: compressor, 7: MP3 (64 kbps), 8: MP3 (96 kbps), and 9: noise addition (30 dB). Attacks 2 to 6 are performed with the StirMark Audio tool.

We note that the reference system is affected by the noise addition and MP3 compression attacks, and that no improvement is obtained even with the proposed detection strategies. This degradation is most likely observed because the GG model cannot capture the distortion introduced by these perturbations. However, the proposed detection scheme is globally more robust than the reference system for the other considered channel perturbations.

5. CONCLUSION

In this paper, an error correction strategy for audio watermarking has been presented. The proposed detection method is based on a reliable calculation of the soft inputs needed at the decoding stage. It enables the constituent decoders to work with the best approximation of the "watermarking channel" effect. This method demonstrates good detection results and significantly improves the transmission reliability, especially when there are no channel perturbations. Further investigations should focus on optimizing the emitter, and in particular on exploiting (in the embedding and decoding processes) the a priori knowledge of the host audio signal, which represents the main corrupting noise in this communication system.

6. REFERENCES

[1] M. Steinbach, F. Petitcolas, F. Raynal, J. Dittmann, C. Fontaine, S. Seibel, N. Fates, and L. C. Ferri, StirMark benchmark: audio watermarking attacks, Int. Conf. on Information Technology: Coding and Computing, pp. 49-54, 2001.

[2] C. Neubauer and J. Herre, Advanced Watermarking and its Applications, in Proc. 109th Audio Engineering Society Convention, Los Angeles, 2000. Preprint 5176.

[3] D. Kirovski and S. Malvar, Robust Spread-Spectrum Audio Watermarking, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2001.

[4] M. Miller, I. Cox, and J. Bloom, Informed embedding: exploiting image and detector information during watermark insertion, IEEE Int. Conf. on Image Processing, September 2000.

[5] C. Baras, N. Moreau, and P. Dymarski, Controlling the Inaudibility and Maximizing the Robustness in an Audio Annotation Watermarking System, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 5, September 2006.

[6] S. Larbi, M. Jaïdane, and N. Moreau, A New Wiener Filtering Based Detection Scheme for Time Domain Perceptual Audio Watermarking, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Montreal, 2004.

[7] N. Cvejic, D. Tujkovic, and T. Seppänen, Increasing Robustness of an Audio Watermark Using Turbo Codes, IEEE Int. Conf. on Multimedia and Expo, Baltimore, pp. 1217-1220, 2003.

[8] T. Majoul, F. Raouafi, and M. Jaïdane, Audio Data Hiding: Improving Detection Using Turbo Codes, 30th AES Conference on Intelligent Audio Environments, Saariselkä, Finland, 2007.

[9] T. Majoul, F. Raouafi, and M. Jaïdane, Semi-Blind Turbo Decoding in Impulsive Noise Channels, 3rd Int. Symp. on Communications, Control and Signal Processing, Malta, March 2008.

[10] C. S. Regazzoni, C. Sacchi, A. Teschioni, and S. Giulini, Higher Order Statistics based sharpness evaluation of a generalized Gaussian PDF model in impulsive noisy environments, 9th IEEE Workshop on Statistical Signal and Array Processing, pp. 411-414, 1998.
