
Signal Processing 89 (2009) 218–225

Fast communication

A substitution-by-interpolation algorithm for watermarking audio

Atul Deshpande*, K.M.M. Prabhu

Signal Processing Laboratory, Department of Electrical Engineering, Indian Institute of Technology, Madras, Chennai 600036, India

* Corresponding author. Tel.: +9144 22575471; fax: +9144 22574402.

E-mail addresses: [email protected] (A. Deshpande), prabhu@ee.iitm.ac.in (K.M.M. Prabhu).

doi:10.1016/j.sigpro.2008.07.015

Article info

Article history: Received 30 January 2008; received in revised form 2 July 2008; accepted 17 July 2008; available online 31 July 2008

Keywords: Audio watermarking; Interpolation; Spline

Abstract

Interpolation of a sample set from a signal gives a close mathematical, and hence, a perceptually similar approximation to the original samples. In this paper, an audio watermarking technique in the temporal domain using spline interpolation is proposed. Test results for imperceptibility of the watermark and its robustness against MP3 compression and resampling attacks are presented. The simulation study shows that the watermark performance is satisfactory against the two forms of attacks mentioned above.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Watermarking is the process of hiding data [1], e.g., copyright information, in a media file, which can later be retrieved from the media to ascertain the ownership of the file. It has been recognized as an important tool in applications varying from digital rights management to testing the integrity of digital media, viz. video, image and audio signals. A good watermark should be imperceptible, secure and robust to common signal alterations. Watermarking techniques exploit the imperfections of the human sensory systems to add more data to the existing signal. Audio watermarking is considered to be much more difficult to implement than image watermarking, since the human auditory system (HAS) is more sensitive to changes in the signal than the human visual system (HVS) [1].

Audio watermarking techniques exploit the insensitivity of the HAS to minute amplitude changes, either in the temporal or the frequency domain. Temporal watermarks include low-bit coding, echo data hiding [1], quantization index modulation [2] and interpolation-based watermarking [3], whereas frequency domain watermarks include spread spectrum watermarks [4], and watermarks based on phase manipulations [5,6].

Interpolation [7] is used to generate a mathematical approximation of unknown samples from known sample values, resulting in a perceptually equivalent signal in the process. Hence, it is usually used as a part of attacks on watermarked signals [8].

Conversely, interpolation has also been used in the generation of imperceptible audio [3] and image [9] watermarks, where the interpolation error is used as a metric in extracting the watermark from the audio signal. The methods in [3,9] propose comparison with a signal-dependent threshold in the presence of attack. However, the signal properties can vary locally, which results either in reduced performance [3] or in more complex procedures [9] to attempt accurate extraction. This paper proposes an improved method of spline interpolation-based watermarking with increased robustness.

Nomenclature

$x$: original audio
$y$: watermarked audio
$z$: received audio
$m$: message
$\hat{m}$: message estimate
$u$: any input vector
$\tilde{u}$: result of interpolation on $u$
$\epsilon(u)$: interpolation error on $u$
$G$: interpolation grid
$S$: sample set interpolated from $G$

The following convention will be followed throughout the paper. Let $x$ be the original audio signal, $w$ be the watermark and $y = x + w$ be the watermarked audio. Let $m$ be the embedded message, and $\hat{m}$ be the estimate of the message after extraction. The watermarked audio signal is transmitted and possibly subjected to attacks, and the resulting signal will be denoted by $z$. Under the additive white Gaussian noise (AWGN) assumption for the distortion, $z = y + n$, where $n \sim \mathcal{N}(0, \sigma_n^2 I)$. Signal-to-watermark ratio (SWR), in decibels (dB), is defined as

$$\mathrm{SWR} \triangleq 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_w^2}\right), \tag{1}$$

where $\sigma_x^2$ is the variance of the original audio $x$ and $\sigma_w^2$ is the variance of the interpolation-based watermark, $w$. SWR is a measure of the strength of the watermark, and hence an objective indicator of its imperceptibility.
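As a small illustration of Eq. (1), the sketch below computes the SWR from an original and a watermarked sample sequence. The function name `swr_db` and the toy signals are assumptions made for this example; population variances are used, and the function does not guard against a zero-variance watermark.

```python
import math
from statistics import pvariance


def swr_db(x, y):
    """Signal-to-watermark ratio of Eq. (1), in dB, with w = y - x."""
    w = [yi - xi for xi, yi in zip(x, y)]
    # Ratio of the variance of the original audio to that of the watermark.
    return 10.0 * math.log10(pvariance(x) / pvariance(w))


# Toy data (hypothetical): the watermark is one tenth of the signal amplitude,
# so the variance ratio is 100.
x_demo = [1.0, -1.0, 1.0, -1.0]
y_demo = [1.1, -1.1, 1.1, -1.1]
print(round(swr_db(x_demo, y_demo), 6))
```

A larger SWR means a weaker watermark relative to the host signal.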

The paper is organized as follows: Section 2 explains the watermark embedding and extraction procedures, with analysis and methods to improve its performance. Section 3 deals with the imperceptibility of the watermark, and presents an experimental study of the robustness of the proposed watermark.

2. Interpolation-based watermarking scheme

This section describes the watermarking algorithm, and compares the theoretical performance of the proposed algorithm with the optimal detector proposed in [9].

2.1. Embedding algorithm

Fig. 1 illustrates the embedding algorithm. The audio signal to be watermarked, $x$, is divided into distinct frames of length $N$. The $i$-th frame will thus be called $x_i$. Each frame is further divided into non-overlapping sample sets, $G$ the interpolation grid and $S$ the samples whose values will be interpolated from the samples in $G$. $\tilde{x}_i|_S$ represents the interpolated values of the sample set $S$ from the values of $x_i|_G$ belonging to $G$. The sample set $S$ is further divided into non-overlapping sets $S_1$ and $S_0$.

Fig. 1. Embedding algorithm.

To embed a message bit $m_i = 1$, the samples in $S_1$ are substituted by the corresponding values from $\tilde{x}_i|_S$, while the samples in $S_0$ are not modified. Similarly, when $m_i = 0$, only the samples in $S_0$ are substituted by their interpolated values from $\tilde{x}_i|_S$. In either case, the samples in $G$ are left unchanged. The embedding rule is thus specified as

$$y_i|_G = x_i|_G,$$

$$y_i|_S(k) = \begin{cases} \tilde{x}_i|_S(k),\ \forall k \in S_1; \quad x_i|_S(k),\ \forall k \in S_0, & m_i = 1, \\ x_i|_S(k),\ \forall k \in S_1; \quad \tilde{x}_i|_S(k),\ \forall k \in S_0, & m_i = 0. \end{cases} \tag{2}$$

Eq. (2) can be rewritten as

$$y_i|_S(k) = \begin{cases} x_i|_S(k) - m_i \left( x_i|_S(k) - \tilde{x}_i|_S(k) \right), & \forall k \in S_1, \\ \tilde{x}_i|_S(k) + m_i \left( x_i|_S(k) - \tilde{x}_i|_S(k) \right), & \forall k \in S_0. \end{cases} \tag{3}$$

Watermarked audio $y$ is retrieved by concatenating these modified frames. This paper uses cubic splines to interpolate the samples in $S$. The selection of $G$ and $S$ is open to design, although within the scope of this paper, the same pattern as proposed in [3] has been used. This helps in comparing the performance of the two algorithms under similar circumstances.
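The substitution rule of Eq. (3) can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: piecewise-linear interpolation stands in for the cubic spline to keep the example dependency-free, and the grid layout (`GRID`, `S1`, `S0`), the frame values and the function names are hypothetical rather than the pattern of [3].

```python
from bisect import bisect_right


def interp_linear(grid_idx, grid_val, k):
    """Piecewise-linear estimate at index k from the grid samples.

    Stand-in for the cubic spline used in the paper (assumption).
    """
    j = bisect_right(grid_idx, k)
    j = min(max(j, 1), len(grid_idx) - 1)  # clamp to a valid segment
    k0, k1 = grid_idx[j - 1], grid_idx[j]
    v0, v1 = grid_val[j - 1], grid_val[j]
    return v0 + (v1 - v0) * (k - k0) / (k1 - k0)


def embed_bit(frame, bit, grid_idx, s1_idx, s0_idx):
    """Return the watermarked frame for one message bit, following Eq. (3)."""
    grid_val = [frame[k] for k in grid_idx]
    out = list(frame)  # samples in G (and the untouched subset) stay as they are
    target = s1_idx if bit == 1 else s0_idx
    for k in target:
        out[k] = interp_linear(grid_idx, grid_val, k)
    return out


# Hypothetical layout for a frame of N = 9 samples.
GRID = [0, 2, 4, 6, 8]  # interpolation grid G
S1 = [1, 5]             # substituted when the bit is 1
S0 = [3, 7]             # substituted when the bit is 0

frame = [0.0, 0.3, 0.2, 0.5, 0.4, 0.1, 0.0, -0.3, -0.2]
y1 = embed_bit(frame, 1, GRID, S1, S0)
y0 = embed_bit(frame, 0, GRID, S1, S0)
print(y1)
print(y0)
```

In both cases the grid samples are returned unchanged, so the extractor can recompute the same interpolated values from the received frame.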


2.2. Extraction algorithm

Fig. 2 illustrates the extraction algorithm. The received audio, $z$, is divided into distinct frames of length $N$. Each frame $z_i$ is divided into non-overlapping sample sets, $G$ and $S$, as defined in Section 2.1. $\tilde{z}_i|_S$ is interpolated from the samples in set $G$. $S$ is further divided into $S_1$ and $S_0$, as in Section 2.1.

For each frame $z_i$, the mean-squared errors $r_{i1}^2$ and $r_{i0}^2$ between $z_i|_S$ and $\tilde{z}_i|_S$ are calculated, over the sets $S_1$ and $S_0$, respectively, as follows:

$$r_{i1}^2 = \frac{1}{|S_1|} \sum_{k \in S_1} \epsilon(z_i|_S(k))^2, \tag{4}$$

$$r_{i0}^2 = \frac{1}{|S_0|} \sum_{k \in S_0} \epsilon(z_i|_S(k))^2, \tag{5}$$

where $\epsilon(z_i|_S(k))$ is the interpolation error for the received signal $z$ over $S$. $|S_1|$ and $|S_0|$ represent the cardinality of the sample sets $S_1$ and $S_0$, respectively. At the extraction stage, the magnitudes of $r_{i1}^2$ and $r_{i0}^2$ are compared and the bit $\hat{m}_i$ estimated according to the following detection rule:

$$\hat{m}_i = \begin{cases} 1 & \text{if } r_{i1}^2 < r_{i0}^2, \\ 0 & \text{if } r_{i1}^2 \ge r_{i0}^2. \end{cases} \tag{6}$$
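The detection rule of Eqs. (4)-(6) can be sketched for one received frame as below. As before, this is only an illustration under assumptions: piecewise-linear interpolation replaces the cubic spline, and the layout (`GRID`, `S1`, `S0`), the two test frames and the function names are hypothetical.

```python
from bisect import bisect_right


def interp_linear(grid_idx, grid_val, k):
    """Piecewise-linear estimate at index k (stand-in for a cubic spline)."""
    j = bisect_right(grid_idx, k)
    j = min(max(j, 1), len(grid_idx) - 1)
    k0, k1 = grid_idx[j - 1], grid_idx[j]
    v0, v1 = grid_val[j - 1], grid_val[j]
    return v0 + (v1 - v0) * (k - k0) / (k1 - k0)


def extract_bit(frame, grid_idx, s1_idx, s0_idx):
    """Estimate one bit by comparing the errors of Eqs. (4) and (5), as in Eq. (6)."""
    grid_val = [frame[k] for k in grid_idx]

    def mse(idx):
        # Mean-squared interpolation error over one sample subset.
        return sum(
            (frame[k] - interp_linear(grid_idx, grid_val, k)) ** 2 for k in idx
        ) / len(idx)

    r1, r0 = mse(s1_idx), mse(s0_idx)
    return 1 if r1 < r0 else 0


GRID = [0, 2, 4, 6, 8]
S1 = [1, 5]
S0 = [3, 7]

# Frame whose S1 samples lie on the interpolant (a 1 was embedded).
z1 = [0.0, 0.1, 0.2, 0.5, 0.4, 0.2, 0.0, -0.3, -0.2]
# Frame whose S0 samples lie on the interpolant (a 0 was embedded).
z0 = [0.0, 0.3, 0.2, 0.3, 0.4, 0.1, 0.0, -0.1, -0.2]

print(extract_bit(z1, GRID, S1, S0), extract_bit(z0, GRID, S1, S0))  # prints "1 0"
```

No threshold appears anywhere in the decision: the two subsets of the same frame are compared against each other, which is the design choice the analysis in Section 2.3 relies on.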

2.3. Detection analysis

Assuming $m_i = 1$ at the embedding stage,

$$\epsilon(z_i|_S(k)) = \begin{cases} \epsilon(n_i|_S(k)), & k \in S_1, \\ \epsilon(n_i|_S(k)) + \epsilon(x_i|_S(k)), & k \in S_0. \end{cases} \tag{7}$$

Here, the interpolation error $\epsilon(x_i|_S(k))$ is assumed to be zero-mean Gaussian with variance $\sigma^2_{\epsilon(x)}$. Since $n$ is assumed to be AWGN and the interpolation function is linear, $\epsilon(n_i|_S(k))$ is also Gaussian with variance $\sigma^2_{\epsilon(n)} = (1 + \Delta)\sigma_n^2$, where $\Delta$ is a function of the interpolation function used and the interpolation grid $G$. Since $x$ and $n$ are supposed to be uncorrelated, $\epsilon(x_i|_S(k))$ and $\epsilon(n_i|_S(k))$ are also uncorrelated and their sum is Gaussian with variance $\sigma^2_{\epsilon(x)} + \sigma^2_{\epsilon(n)}$. Let the cardinality of sets $S$, $S_1$ and $S_0$ be $|S|$, $|S_1|$ and $|S_0|$, respectively, with the assumption that $|S_1| = |S_0| = |S|/2$.

Fig. 2. Extraction algorithm.

Earlier works [3,9] proposed the substitution of all the samples in set $S$ with their interpolated values when the message bit is 1. On the other hand, if the message bit is 0, the samples are not modified. At the extraction stage, the interpolation error is calculated over the set $S$. This process would yield a zero interpolation error for 1 and a finite non-zero interpolation error for 0 in the absence of distortion. The detection rule was based on the comparison of the mean-squared interpolation error with a signal-dependent threshold. If the error is below the threshold, the bit is detected as 1. But these algorithms require precise knowledge of the signal and noise variance, and also assume that these factors are constant throughout the document.

This paper aims at overcoming the drawback mentioned above by proposing the substitution of values in one set out of either $S_1$ or $S_0$, depending on the message bit. Since both the sets are equally exposed to distortion, the set substituted with the interpolated values will have a lower mean-squared interpolation error than the one which retains the original values.

For example, it is assumed that $m_i = 1$ at the embedding stage. In the absence of any form of attack to the watermarked audio, the following statements hold true:

$$z_i|_S = y_i|_S,$$

$$\therefore\ r_{i1}^2 = \frac{1}{|S_1|} \sum_{k \in S_1} \epsilon(\tilde{x}_i|_S(k))^2 = 0, \tag{8}$$

$$r_{i0}^2 = \frac{1}{|S_0|} \sum_{k \in S_0} \epsilon(x_i|_S(k))^2,$$

$$\therefore\ E[r_{i0}^2] = \sigma^2_{\epsilon(x)}, \tag{9}$$

where $E[\cdot]$ is an expectation operator.

Clearly, if the distortions $r_{i1}^2$ and $r_{i0}^2$ are compared, $r_{i1}^2 < r_{i0}^2$, and hence the embedded bit is correctly estimated to be 1. Let an attack on the watermarked audio add a distortion of variance $\sigma_n^2$ to the audio. This results in additional interpolation error $\sigma^2_{\epsilon(n)}$. Now, in the presence of this additional distortion

$$E[r_{i1}^2] = \sigma^2_{\epsilon(n)}, \tag{10}$$

$$E[r_{i0}^2] = \sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)}. \tag{11}$$

Therefore, despite the added distortion due to the attack, the condition

$$r_{i1}^2 < r_{i0}^2 \tag{12}$$

should hold good. The sample sets $S_1$ and $S_0$ must be sufficiently large so as to assure statistical accuracy of the above statements. Hence, at the extraction stage, if Eq. (12) is satisfied, the bit is detected as 1. A similar argument can be presented to arrive at the following decision rule for $\hat{m}_i = 0$:

$$r_{i1}^2 > r_{i0}^2. \tag{13}$$

Table 1. Probability of correct detection ($P_d$) for mis-specified noise variance

| $\Delta\sigma^2_{\epsilon(n)}$ | $P_d$ (proposed) | $P_d$ (w-interp) | $\Delta\sigma^2_{\epsilon(n)}$ | $P_d$ (proposed) | $P_d$ (w-interp) |
|---|---|---|---|---|---|
| 0 | 0.9553 | 0.9570 | 0 | 0.9553 | 0.9570 |
| −10% | 0.9663 | 0.9535 | +10% | 0.9437 | 0.9379 |
| −20% | 0.9762 | 0.9336 | +20% | 0.9318 | 0.8943 |
| −30% | 0.9847 | 0.9013 | +30% | 0.9197 | 0.8326 |

2.4. Theoretical performance study

The detection of the embedded bit can be posed as a binary hypothesis test, where

$$H_0:\ E[r_{i1}^2] = \sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)},\quad E[r_{i0}^2] = \sigma^2_{\epsilon(n)} \quad \text{when } m_i = 0,$$

$$H_1:\ E[r_{i1}^2] = \sigma^2_{\epsilon(n)},\quad E[r_{i0}^2] = \sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)} \quad \text{when } m_i = 1.$$

For simplification of the proof, implementation of regular interpolation has been assumed. Since the interpolation error is assumed to be Gaussian, the metrics $r_{i1}^2$ and $r_{i0}^2$, as defined in Eqs. (4) and (5), are chi-square distributed (represented by $\chi^2_{|S_1|}$ and $\chi^2_{|S_0|}$) with degrees of freedom $|S_1|$ and $|S_0|$. The scale parameters of $r_{i1}^2$ and $r_{i0}^2$ are defined by the variance of the respective interpolation errors over $S_1$ and $S_0$, respectively, which in turn are dependent on $m_i$.

The performance of the proposed detection rule against the optimal detector in [9] is compared under similar conditions.

The watermark algorithm w-interp in [9] proposes the following detection rule:

$$\hat{m}_i = \begin{cases} 1 & \text{if } r_i^2 < n_{\mathrm{th}}, \\ 0 & \text{if } r_i^2 \ge n_{\mathrm{th}}, \end{cases} \tag{14}$$

where $n_{\mathrm{th}}$ is a signal-dependent threshold calculated from the optimum detector described in [9]. The probability of correct detection ($P_d$) in the scenario of $\sigma^2_{\epsilon(n)} = \sigma^2_{\epsilon(x)}$, and $|S| = 50$ for the aforementioned optimal detector is calculated and found to be 0.9570.

The theoretical performance of the proposed detector under the same circumstances, and for the same $|S|$, is derived as follows:

The detection rule defined in (6) can be rewritten as

$$\hat{m}_i = \begin{cases} 1 & \text{if } \dfrac{r_{i1}^2}{r_{i0}^2} < 1, \\[2ex] 0 & \text{if } \dfrac{r_{i1}^2}{r_{i0}^2} \ge 1. \end{cases} \tag{15}$$

Under $H_1$, we have

$$\frac{r_{i1}^2}{r_{i0}^2} = \left( \frac{\sigma^2_{\epsilon(n)}}{\sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)}} \right) r, \tag{16}$$

where $r$ is an F-distributed random variable, i.e. $r \sim F(|S_1|, |S_0|)$. The probability of correct detection ($P_d|H_1$) thus can be defined as

$$P_d|H_1 = 1 - \Pr\left[\, r > \frac{\sigma^2_{\epsilon(n)} + \sigma^2_{\epsilon(x)}}{\sigma^2_{\epsilon(n)}} \,\middle|\, H_1 \right], \tag{17}$$

which results in $P_d|H_1 = 0.9553$ under the assumed conditions. Similarly, $P_d|H_0 = 0.9553$, hence the average probability of detection assuming equal probability of embedding 1 and 0 is $P_d = 0.9553$. The performance of the proposed detector is thus very close to that of the optimal detector in [9], despite not needing to know the noise variance, which is required by the optimal detector. It may be noted that the degrees of freedom of $\chi^2_{|S_1|}$ and $\chi^2_{|S_0|}$ are set to half of $\chi^2_{|S|}$ (used in the optimal detector), to be consistent with the fact that only half of the samples in $S$ are actually used for embedding the proposed watermark.
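The value of $P_d|H_1$ can be sanity-checked numerically. The sketch below is a Monte Carlo approximation under the same Gaussian assumptions, not the closed-form F-distribution calculation: it draws $|S_1| = |S_0| = 25$ zero-mean Gaussian interpolation errors per set with the variances of hypothesis $H_1$ and counts how often the rule of Eq. (6) decides 1. The function name, trial count and seed are assumptions of this example.

```python
import random


def simulate_pd(n_half=25, var_noise=1.0, var_signal=1.0, trials=20000, seed=1):
    """Monte Carlo estimate of P_d under H1 for the rule r_i1^2 < r_i0^2."""
    rng = random.Random(seed)
    sd1 = var_noise ** 0.5                 # S1 under H1: noise error only
    sd0 = (var_noise + var_signal) ** 0.5  # S0 under H1: noise plus signal error
    correct = 0
    for _ in range(trials):
        r1 = sum(rng.gauss(0.0, sd1) ** 2 for _ in range(n_half)) / n_half
        r0 = sum(rng.gauss(0.0, sd0) ** 2 for _ in range(n_half)) / n_half
        correct += r1 < r0
    return correct / trials


pd_h1 = simulate_pd()
print(pd_h1)
```

With equal noise and signal error variances, the estimate should land near the 0.9553 quoted above, up to Monte Carlo error. Only the ratio of the two variances matters, which reflects why the detector does not need the noise variance itself.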

Since the noise variance may vary locally within the document in practical cases, the detector performances have been further compared for $\sigma^2_{\epsilon(n)}$ mis-specified by ±10%, ±20% and ±30%, respectively, and the values of $P_d$ are tabulated in Table 1.

When the noise variance is exactly specified, the optimal detector performs better than the proposed detector. However, even slight mis-specifications lead to performance degradation of the optimal detector. We can observe from Table 1 that as the noise decreases ($\Delta\sigma^2_{\epsilon(n)} < 0$), the performance of the proposed detector expectedly improves, since it is now affected by less distortion. At the same time, the optimal detector performance worsens because its detection threshold is dependent on exact noise specification. When the noise increases, the performance of both the detectors degrades, but the performance degradation in the optimal detector is much more amplified, since it is simultaneously affected by increased distortion as well as noise mis-specification. As a result, the performance of the proposed detector is superior to the optimal detector when the signal characteristics are varying within the document. The proposed detector would likewise outperform the optimal detector for any variations in $\sigma^2_{\epsilon(x)}$ as well.

2.5. Redundancy

The performance of the watermark can be improved by introducing more redundancy. This can be achieved by increasing the frame size $N$, but the drawback is that in case of irregular patterns for $G$ and $S$ (as in [3]), the distortion may increase with increasing $N$. An alternative way of increasing redundancy is to keep the frame length $N$ fixed, and to assign the same bit to $R$ frames ($R > 1$). During detection, the samples belonging to sets $S_1$ and $S_0$ of the $R$ frames are pooled together and the collective mean-squared errors $\sum_R r_{i1}^2$ and $\sum_R r_{i0}^2$ are calculated. The detection rule defined in (6) can be applied by substituting $r_{i1}^2$ and $r_{i0}^2$ with $\sum_R r_{i1}^2$ and $\sum_R r_{i0}^2$, respectively. This process increases the number of samples used, and hence assures better detection accuracy. On the other hand, it also reduces the embedding capacity, as more frames are used to embed one bit. The choice of factor $R$, therefore, is a compromise between robustness and embedding rate. Varying the redundancy factor $R$ does not change the perceptibility of the watermark, and only affects the embedding rate. The theoretical value of the embedding rate can be arrived at, using the following formula:

$$\text{Embedding rate} = \frac{SR}{RN}, \tag{18}$$

where $SR$ is the sample rate of the audio. Assuming $SR$ to be 44,100 samples/s, the embedding rates for different values of $R$ and $N$ are presented in Table 2.

Table 2. Embedding rate (in bps)

| $N$ | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|
| No redundancy | 3150 | 2940 | 2756 | 2594 | 2450 | 2321 | 2205 |
| $R = 5$ | 630 | 588 | 551 | 518 | 490 | 464 | 441 |
| $R = 10$ | 315 | 294 | 275 | 259 | 245 | 232 | 220 |
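The entries of Table 2 follow directly from Eq. (18). The short sketch below recomputes them; the function name is hypothetical, and truncating to whole bits per second is an assumption of this example that happens to match the tabulated values.

```python
def embedding_rate_bps(sample_rate, frame_len, redundancy=1):
    """Embedding rate of Eq. (18), truncated to whole bits per second."""
    return sample_rate // (redundancy * frame_len)


SAMPLE_RATE = 44100  # samples/s, as assumed for Table 2

# One row per redundancy factor R (R = 1 means no redundancy), N = 14..20.
rates = {
    r: [embedding_rate_bps(SAMPLE_RATE, n, r) for n in range(14, 21)]
    for r in (1, 5, 10)
}
for r, row in rates.items():
    print(r, row)
```

For example, $N = 16$ with $R = 5$ gives 551 bps, the operating point of roughly 550 bps referred to in the conclusion.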

3. Simulations and results

The sound quality assessment material (SQAM) database [10] is used to test the watermark for imperceptibility and robustness to MP3 compression and resampling attacks. Audio samples of 5 s duration are embedded with the watermark. The watermarked audio is subjected to MP3 and resampling attacks, and the bit-detection error rate is noted after extraction of the watermark. All the results and figures explained below correspond to the Sopranos audio from the SQAM database. Fig. 3 shows a segment of the original Sopranos audio, its watermarked version, and the corresponding watermark signal.

Fig. 3. Original Sopranos audio, watermarked Sopranos audio and the watermark signal. [Three waveform panels titled "Original Sopranos Audio", "Watermarked Sopranos audio" and "Watermark signal"; time axis from 0 s to 5 s, amplitude axis from −0.5 to 0.5.]

3.1. Imperceptibility

3.1.1. Objective evaluation

SWR is an objective measure of the imperceptibility of the watermark. The SWR depends on the audio signal to be watermarked, since the interpolation error depends on the original signal. Fig. 4 shows the SWR for the Sopranos audio as frame length $N$ increases. SWR is not a complete measure of imperceptibility and subjective tests have to be taken into account to completely ascertain the imperceptibility of the watermark.

In the interpolation pattern [3] used, as $N$ is increased, the size of $G$ is kept constant. As a result, the percentage of watermarked samples in the audio increases. Fig. 4 illustrates the SWR for the Sopranos audio as frame length $N$ increases. The percentage of watermarked samples in the audio is given in parentheses after the values of $N$. The figure shows that as frame length $N$ is increased, SWR decreases, implying an increase in distortion due to the watermark. The increasing distortion when $N$ increases is due to two separate factors:

(1) Increase in the watermarked sample percentage over the document.

(2) Increase in the interpolation error on each sample, since the interpolation is more difficult for new samples that are far from the grid.

Fig. 5. Robustness to MP3 compression (no redundancy). [Plot of BER (%) against MP3 compression rate (kbps) at 80, 96, 112, 128, 160, 192 and 320 kbps, with curves for N = 14, 16, 18 and 20.]

The individual effect of the watermarked sample percentage can be studied by using a regular interpolation grid, and studying the imperceptibility for different percentages of watermarked samples.

3.1.2. Subjective evaluation

Preliminary subjective tests do not present any audible difference between the original and the watermarked audio in all cases except in the case of the Glockenspiel audio. The watermark is perceptible in this case, since this particular audio consists mainly of high frequency components, and interpolation, being effectively a low pass filtering process, distorts the signal. Hence, intuitively we can say that this method in its current form will not pass the imperceptibility test for high frequency signals.

Subjective evaluation was conducted using the double-stimulus hidden reference test. In this test, the original (A) and watermarked (B) audio files are played to a subject. A third audio (X) is played, which can be either A or B. The subject is asked to identify whether X is A or B. A higher percentage of correct detections on the part of the subject indicates a clearer perceptibility of the watermark. A detection percentage of 50% suggests that the difference was imperceptible and the subject could be guessing. Table 3 presents the results obtained from the subjective tests for the Sopranos audio, averaged over five subjects within the age group of 23–27 years.

Table 3. Subjective analysis: detectability of watermark vs. N

| N (watermarked sample percentage) | Detectability (%) |
|---|---|
| 12 (8%) | 50 |
| 14 (14%) | 54.67 |
| 16 (19%) | 75 |
| 18 (22%) | 100 |
| 20 (25%) | 100 |

Table 3 illustrates the increased perceptibility of the watermark as $N$ increases. For $N \ge 18$, the watermark is easily perceptible.

Fig. 4. SWR vs. N. [Plot of SWR (dB) against frame length N (watermarked sample percentage): 12 (8%), 14 (14%), 16 (19%), 18 (22%), 20 (25%), 22 (27%), 24 (29%), 26 (31%).]

3.2. Robustness to attacks

3.2.1. MP3 compression

The LAME MP3 encoder [11] is used to compress the watermarked audio into MP3 format at different bit rates. The audio signal is recovered from the compressed MP3 files and the extraction is attempted. Fig. 5 shows the percentage error in watermark extraction under MP3 attack without any added redundancy. Fig. 6 shows the performance when redundancy is added, with (a) $R = 5$ and (b) $R = 10$. Figs. 6(a) and (b) show a drastic improvement in performance of the watermark with added redundancy. In Figs. 6(a) and (b), for a constant $R$, the robustness of the watermark increases as $N$ is increased.

Table 4 compares the performance of the proposed watermark with that presented in [3] and other common algorithms available in the literature, against MP3 compression at 128 kbps. The proposed watermark is observed to be much more robust than [3], and compares favorably with the other algorithms. The performance figures for the other algorithms are obtained from [12].

Table 4. Performance of watermarking algorithms against 128 kbps MP3 compression

| Watermarking scheme | BER (%) |
|---|---|
| Phase alteration [6] | 15 |
| Phase alteration [5] | 5 |
| Echo coding | 6 |
| DSSS | 3 |
| Interpolation based [3] (N = 15) | 40 |
| Proposed (N = 16, no redundancy) | 13.88 |
| Proposed (N = 16, R = 5) | 0.8595 |

Fig. 6. Robustness to MP3 compression with added redundancy. (a) R = 5 and (b) R = 10. [Two plots of BER (%) against MP3 compression rate (kbps) at 80, 96, 112, 128, 160, 192 and 320 kbps, each with curves for N = 14, 16, 18 and 20.]

3.2.2. Resampling

Resampling attack involves downsampling the signal to another sampling rate, in an attempt to corrupt the watermark. The attacked audio signal is resampled to the original rate, re-interpolated and the watermark extracted from this audio. Fig. 7 shows the detection performance of the watermark as frame length $N$ is increased, for different levels of redundancy. The watermark performs reasonably well in the presence of an interpolation-based attack because of the irregular pattern selected for $G$ and $S$ (Table 3).

Fig. 7. Robustness to resampling attack. [Plot of BER (%) against N = 14 to 20, with curves for no redundancy, R = 5 and R = 10.]

4. Conclusion

This paper presents a temporal audio watermark based on cubic-spline interpolation. The watermark is imperceptible for all audio samples of the SQAM database, except the Glockenspiel audio, since the interpolation function does not perform well for high frequency signals. Theoretical performance of the proposed algorithm is shown to closely follow that of the optimal detector [9] without a priori knowledge of the noise variance. The watermark performance for the case of the Sopranos audio is presented and it is found to be satisfactorily robust to MP3 compression and resampling. A choice of frame length of 16 with an embedding redundancy of 5 yields a detection accuracy of 99.14% for 128 kbps MP3 compression and 93.46% in the presence of resampling attack at an embedding rate of approximately 550 bps.

Future work could include experimentation with different patterns for the interpolation grid, and different interpolation functions to limit the distortion in high frequency signals, while maintaining the robustness. The proposed watermark needs to be tested for robustness against other forms of attack. Issues related to security of the watermark have to be dealt with. Possible solutions include choosing random sample locations for interpolation and different interpolation schemes.

Acknowledgments

The authors wish to acknowledge the two anonymous reviewers for their critical comments and suggestions, which have greatly improved the presentation of the paper. The first author wishes to acknowledge Mr. Arun Ayyar's valuable inputs during the course of the research.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.sigpro.2008.07.015.

References

[1] W. Bender, D. Gruhl, N. Morimoto, A. Lu, Techniques for data hiding, IBM Systems J. 35 (3/4) (1996) 313–336.

[2] B. Chen, G. Wornell, Quantization index modulation: a class of provably good methods for digital watermarking and information embedding, IEEE Trans. Inform. Theory 47 (4) (May 2001) 1423–1443.

[3] R. Fujimoto, M. Iwaki, T. Kiryu, A method of high bit-rate data hiding in music using spline interpolation, in: Proceedings of International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Pasadena, 18–20 December 2006, pp. 11–14.

[4] D. Kirovski, H.S. Malvar, Spread-spectrum watermarking of audio signals, IEEE Trans. Signal Process. 51 (4) (April 2003) 1020–1033.

[5] P.Y. Liew, M.A. Armand, Inaudible watermarking via phase manipulation of random frequencies, Multimedia Tools and Applications 35 (3) (December 2007) 357–377.

[6] R. Ansari, H. Malik, A. Khokhar, Data-hiding in audio using frequency-selective phase alteration, in: Proceedings of IEEE ICASSP, Montreal, 17–21 May 2004, pp. 389–392.

[7] E. Meijering, A chronology of interpolation: from ancient astronomy to modern signal and image processing, Proc. IEEE 90 (March 2002) 319–342.

[8] A. Giannoula, N. Boulgouris, D. Hatzinakos, K. Plataniotis, Watermark detection for noisy interpolated images, IEEE Trans. Circuits and Systems 53 (5) (2006) 359–363.

[9] V. Martin, M. Chabert, B. Lacaze, An interpolation-based watermarking scheme, Signal Processing 88 (3) (March 2008) 539–557.

[10] ⟨http://sound.media.mit.edu/mpeg4/audio/sqam/⟩.

[11] ⟨http://lame.sourceforge.net/index.php⟩.

[12] J.D. Gordy, L.T. Bruton, Performance evaluation of digital audio watermarking algorithms, in: Proceedings of 43rd IEEE Midwest Symposium on Circuits and Systems, Lansing, 8–11 August 2000, pp. 456–459.