

Research Article
Optimization of a Blind Speech Watermarking Technique against Amplitude Scaling

Mohammad Ali Nematollahi,1 Chalee Vorakulpipat,2 and Hamurabi Gamboa Rosales3

1 Department of Computer Engineering, Islamic Azad University, Safadasht Branch, Tehran, Iran
2 Cybersecurity Laboratory, National Electronics and Computer Technology Center, 112 Phahonyothin Road, Klong 1, Klong Luang, Pathumthani 12120, Thailand
3 Department of Electronics Engineering, Universidad Autónoma de Zacatecas, 98000 Zacatecas, DF, Mexico

Correspondence should be addressed to Chalee Vorakulpipat; chalee.vorakulpipat@nectec.or.th

Received 31 August 2016; Revised 26 November 2016; Accepted 18 December 2016; Published 20 February 2017

Academic Editor: Emanuele Maiorana

Copyright © 2017 Mohammad Ali Nematollahi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents a gain invariant speech watermarking technique based on quantization of the Lp-norm. In this scheme, first, the original speech signal is divided into different frames. Second, each frame is divided into two vectors based on odd and even indices. Third, quantization index modulation (QIM) is used to embed the watermark bits into the ratio of the Lp-norm between the odd and even indices. Finally, the Lagrange optimization technique is applied to minimize the embedding distortion. By applying a statistical analytical approach, the embedding distortion and error probability are estimated. Experimental results not only confirm the accuracy of the derived statistical analytical approach but also demonstrate the robustness of the proposed technique against common signal processing attacks.

1. Introduction

Hiding a secret message in an object has a long history, possibly dating back thousands of years. The rapid growth of computer and communication transmissions has inspired the idea of digital data hiding. Digital watermarking, as a major branch of data hiding, has attracted many researchers [1]. The importance of speech watermarking is gradually increasing because of significant speech transmission through insecure communication channels. There are many approaches for speech watermarking, including spread spectrum (SS), auditory masking, patchwork, transformation, and parametric modeling [2]. In the SS approach, a pseudorandom sequence is used to spread the spectrum of the watermark data and add it to the frequency spectrum of the host signal. However, auditory masking uses unimportant perceptual components of the signal to embed the watermark bits. By contrast, the patchwork approach embeds the watermark data by manipulating two sets of the signal to determine the difference between them. The transformation approach embeds the watermark data into the transformation domains, for example, discrete cosine transform, discrete wavelet transform (DWT), and discrete Fourier transform (DFT). Finally, in the parametric modeling approach, the watermark is embedded by modifying the coefficients of the autoregressive (AR) model.

In addition to speech watermarking approaches, four main embedding strategies are widely applied for watermarking: least significant bit (LSB) replacement, quantization, addition, and multiplication. Among these strategies, quantization has attracted much attention because of blindness, robustness, controlled distortion, and payload. For this purpose, a set of quantizers that are associated with various watermark data are used. However, the quantization strategy suffers from amplitude scaling. To rectify this problem, rational dither modulation (RDM) [3] was proposed to enhance the robustness of quantization index modulation (QIM) [4, 5]; however, it degraded the imperceptibility of the watermarked signal. Hence, hyperbolic RDM [6] was proposed to improve the robustness against power law and gain attacks. Another attempt was made by embedding a watermark into the angle of the signal, known as angle QIM (AQIM) [7]. However, this technique was very sensitive to additive white Gaussian noise (AWGN). In [8], normalized cross-correlation between the original signal and a random sequence was quantized based on dither modulation (known as NC-DM) to embed the watermark data. However, applying the random sequence degraded the security of this technique. Lastly, other efforts, such as projection quantization [9], logarithmic quantization index modulation (LQIM) [10], and Lp-norm QIM [11], have been studied for a gain invariant image watermarking technique.

Hindawi, Security and Communication Networks, Volume 2017, Article ID 5454768, 13 pages, https://doi.org/10.1155/2017/5454768

This paper attempts to mitigate the limitations of previous research by quantizing the ratio between the Lp-norms of even and odd indices. After quantization, the Lagrange optimization method is applied to compute the best watermarked sample that minimizes the embedding distortion and improves imperceptibility. By assuming Laplacian and Gaussian distributions for the speech and noise signals, respectively, the embedding distortion and error probability are derived analytically and validated by performing a simulation. Moreover, experimental results show that the proposed speech watermarking technique outperforms state-of-the-art watermarking techniques.

Generally, speech watermarking should preserve the identity of the speaker, which is important for certain security applications [12, 13]. To preserve speaker-specific information, some investigations have been conducted to embed the watermark into special frequency subbands that have less speaker-specific information [5, 14, 15]. Further discussion can be found in [16].

The remainder of this paper is organized as follows. In Section 2, the proposed model for the speech watermarking technique is presented. Additionally, the watermark embedding and extraction processes are described. The performance of the developed watermarking technique is analytically studied in Section 3 and validated by performing a simulation in Section 4. The experimental results are explained in Section 5. Finally, the conclusion and future work are discussed.

2. Proposed Speech Watermarking Technique

In this section, a blind speech watermarking technique is developed based on quantization of the Lp-norm ratio between two blocks of even and odd indices. Assume that $S$ represents an original speech signal that consists of $N$ samples. Two subsets $X$ and $Y$ are formed with respect to even and odd indexed terms, respectively, so that both $X$ and $Y$ have approximately the same energy, which causes less embedding distortion. Moreover, synchronization between the transmitter and receiver is most efficient in this case. Figure 1 shows the formation of the subsequences $X$ and $Y$ from the even and odd indices of the original signal, respectively.

Then, the Lp-norms of the subsequences $X$ and $Y$ are computed, respectively, as follows:

$$L_X = \sqrt[P]{\frac{2}{N} \times \sum_{i=1}^{N/2} \left|S_{2i}\right|^P} \tag{1}$$

$$L_Y = \sqrt[P]{\frac{2}{N} \times \sum_{i=1}^{N/2} \left|S_{2i-1}\right|^P} \tag{2}$$

The ratio ($Z$) between $L_X$ and $L_Y$, given as

$$Z = \frac{L_X}{L_Y} \tag{3}$$

is quantized to embed the watermark bit. Although embedding the watermark into the ratio of Lp-norms can provide high robustness against various attacks, imperceptibility can be seriously degraded.
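As a minimal sketch of (1) to (3), the ratio can be computed for one frame and checked for gain invariance. The function names `lp_norm` and `norm_ratio`, the sample values, and the gain of 3.5 are illustrative assumptions and do not come from the paper; the 1-based even indices of the text map to the 0-based slice `frame[1::2]`.

```python
def lp_norm(values, p):
    """Normalized Lp-norm as in (1)-(2): the P-th root of the mean of |v|^P."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def norm_ratio(frame, p):
    """Ratio Z = L_X / L_Y of (3) for one frame."""
    x = frame[1::2]  # S_2, S_4, ... (even 1-based indices)
    y = frame[0::2]  # S_1, S_3, ... (odd 1-based indices)
    return lp_norm(x, p) / lp_norm(y, p)


# Hypothetical frame of N = 8 samples, used only for illustration.
frame = [0.12, -0.40, 0.33, 0.25, -0.18, 0.09, 0.51, -0.27]

z = norm_ratio(frame, p=2)
# Amplitude scaling multiplies L_X and L_Y by the same gain, so Z is unchanged.
z_scaled = norm_ratio([3.5 * s for s in frame], p=2)
print(z, z_scaled)
```

Because the gain cancels between numerator and denominator, any quantity embedded in $Z$ is, up to floating-point error, unaffected by amplitude scaling.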

To resolve this limitation, the variation between the original ratio ($Z$) and the quantized ratio ($Z_Q$) should be minimized. Therefore, the Lagrange optimization method is used to minimize this variation; that is, the Lagrange optimization method decreases the embedding distortion after quantization to improve the imperceptibility of the watermarked speech signal. As a result, the Lagrange optimization problem can be formulated as follows:

$$\begin{aligned}
\text{Minimize} \quad & J(X) = \sum_{j=1}^{N/2} \left(X_j^Q - X_j\right)^P \\
\text{Subject to} \quad & C(Y) = \sqrt[P]{\frac{2}{N} \sum_{j=1}^{N/2} \left(X_j^Q\right)^P} - Z_Q \times L_Y = 0
\end{aligned} \tag{4}$$

To solve this optimization problem, the Lagrange method should estimate the optimized values of the equation system as follows:

$$\nabla J(X) - \lambda \nabla C(Y) = 0 \tag{5}$$

These optimized values are simply computed by solving the following:

$$X_j^{Q_{\text{opt}}} = \lambda_{\text{opt}} \times X_j \tag{6}$$

$$\lambda_{\text{opt}} = \frac{Z_Q \times L_Y}{L_X} \tag{7}$$
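The closed form in (6) and (7) amounts to rescaling the even subsequence by a single factor. The following sketch, under the assumption of a hypothetical target ratio `z_q = 1.0` and illustrative sample values, checks numerically that the scaled subsequence satisfies the constraint in (4); the helper name `scale_to_target` is not from the paper.

```python
def lp_norm(values, p):
    """Normalized Lp-norm as in (1)-(2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def scale_to_target(x, y, z_q, p):
    """Apply (7) to get lambda_opt, then (6) to scale every sample of X."""
    lam = z_q * lp_norm(y, p) / lp_norm(x, p)
    return [lam * v for v in x], lam


# Illustrative subsequences (assumed values, not taken from the paper).
x = [-0.40, 0.25, 0.09, -0.27]
y = [0.12, 0.33, -0.18, 0.51]

x_q, lam = scale_to_target(x, y, z_q=1.0, p=3)
# After scaling, the Lp-norm ratio equals the requested quantized ratio.
new_ratio = lp_norm(x_q, 3) / lp_norm(y, 3)
print(lam, new_ratio)
```

Since the Lp-norm is homogeneous, multiplying $X$ by $\lambda_{\text{opt}}$ multiplies $L_X$ by the same factor, which is why the constraint holds for any $P$.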

Figure 1: Formation of two odd and even subsets from the original speech signal. [Diagram: the samples $S_1, S_2, S_3, S_4, \ldots, S_{N-1}, S_N$ are split into an even set $X_1, X_2, \ldots, X_{N/2}$ (samples $S_{2i}$) and an odd set $Y_1, Y_2, \ldots, Y_{N/2}$ (samples $S_{2i-1}$).]

2.1. Speech Watermarking Algorithm. The details of the proposed embedding and extraction processes are described in the following algorithms.

Embedding Process

(a) Segment the input speech signal ($S$) into different frames ($S_i$) with size $N$.

(b) Form two subsequences $X$ and $Y$, each of length $N/2$, based on the even and odd indices of $S_i$, respectively.

(c) Compute the Lp-norms $L_X$ and $L_Y$ of the $X$ and $Y$ subsequences based on (1) and (2), respectively.

(d) Apply the QIM technique to embed the watermark bit into the ratio between the Lp-norms of $X$ and $Y$ ($Z = L_X / L_Y$) as follows:

$$Z_Q = \left\lfloor \frac{Z + W_i \times \Delta}{2\Delta} \right\rfloor \times 2\Delta + W_i \times \Delta, \quad W_i \in \{0, 1\} \tag{8}$$

where $\Delta$ represents the quantization step, $W_i$ is the watermark bit, and $Z_Q$ is the modified ratio of the Lp-norms between $X$ and $Y$. Choosing a large quantization step increases the robustness but reduces the imperceptibility, and vice versa.

(e) Apply the Lagrange method to optimize the values of $X^Q_{2i}$.

(f) Reposition the even and odd subsequences based on $X^Q_{2i}$ and $Y$, respectively.

(g) Rearrange the watermarked speech signal based on the modified frames ($S_i$).

Figure 2 shows the block diagram of the proposed embedding process.
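Steps (b) to (f) of the embedding process can be sketched for a single frame as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the frame values, and the choices $\Delta = 0.1$ and $P = 2$ are hypothetical, and (8) is implemented literally with the floor operation.

```python
import math


def lp_norm(values, p):
    """Normalized Lp-norm as in (1)-(2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def quantize_ratio(z, bit, delta):
    """Equation (8): even multiples of delta carry bit 0, odd multiples bit 1."""
    return math.floor((z + bit * delta) / (2 * delta)) * 2 * delta + bit * delta


def embed_frame(frame, bit, delta, p):
    """Embed one watermark bit into one frame (steps (b) to (f))."""
    x, y = frame[1::2], frame[0::2]          # even / odd 1-based indices
    l_x, l_y = lp_norm(x, p), lp_norm(y, p)  # step (c)
    z_q = quantize_ratio(l_x / l_y, bit, delta)  # step (d)
    lam = z_q * l_y / l_x                    # step (e), equation (7)
    marked = list(frame)
    marked[1::2] = [lam * v for v in x]      # step (f): only X is modified
    return marked


# Hypothetical frame and parameters, for illustration only.
frame = [0.12, -0.40, 0.33, 0.25, -0.18, 0.09, 0.51, -0.27]
delta, p = 0.1, 2
marked = embed_frame(frame, bit=1, delta=delta, p=p)

ratio = lp_norm(marked[1::2], p) / lp_norm(marked[0::2], p)
print(ratio / delta)  # close to an odd integer when bit 1 is embedded
```

One property of the literal form of (8) is worth noting: for bit 0 and $Z < 2\Delta$ the quantized ratio is zero, which would silence the even samples, so this sketch is only meaningful when $\Delta$ is small relative to $Z$.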

Extraction Process

(a) Segment the input watermarked speech signal ($\hat{S}$) into different frames ($\hat{S}_i$) with size $N$.

(b) Form two subsequences $\hat{X}$ and $\hat{Y}$, each of length $N/2$, based on the even and odd indices of $\hat{S}_i$, respectively.

(c) Compute the Lp-norms $L_X$ and $L_Y$ of the $\hat{X}$ and $\hat{Y}$ subsequences based on (1) and (2), respectively.

(d) Extract the $k$th binary watermark bit from the $k$th frame of the watermarked speech signal by selecting the minimum Euclidean distance (nearest quantization step) from the ratio $Z_k = L_X / L_Y$ as follows:

$$\hat{W}_k = \arg\min_{b_k \in \{0, 1\}} \sqrt{\left(Z_k - Q_{b_k}(Z_k)\right)^2} \tag{9}$$

where $Q_{b_k}$ is the quantization function that meets the requirements of the watermark bit $b_k \in \{0, 1\}$.

Figure 3 shows the block diagram of the proposed extraction process.
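The extraction steps can be illustrated with a round trip through a pure gain attack. In this sketch, `nearest_lattice_point` is an assumed realization of the quantization function $Q_{b_k}$ (nearest even or odd multiple of $\Delta$); the names, frame values, gain of 0.3, and parameters are illustrative assumptions.

```python
import math


def lp_norm(values, p):
    """Normalized Lp-norm as in (1)-(2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def norm_ratio(frame, p):
    """Z = L_X / L_Y, with X on even and Y on odd 1-based indices."""
    return lp_norm(frame[1::2], p) / lp_norm(frame[0::2], p)


def embed_frame(frame, bit, delta, p):
    """Quantize Z with (8), then scale the even samples as in (6)-(7)."""
    z = norm_ratio(frame, p)
    z_q = math.floor((z + bit * delta) / (2 * delta)) * 2 * delta + bit * delta
    marked = list(frame)
    marked[1::2] = [(z_q / z) * v for v in frame[1::2]]
    return marked


def nearest_lattice_point(z, bit, delta):
    """Assumed Q_b: nearest point of the lattice {bit*delta + 2*k*delta}."""
    return round((z - bit * delta) / (2 * delta)) * 2 * delta + bit * delta


def extract_bit(frame, delta, p):
    """Minimum-distance decoding of one frame."""
    z = norm_ratio(frame, p)
    d0 = abs(z - nearest_lattice_point(z, 0, delta))
    d1 = abs(z - nearest_lattice_point(z, 1, delta))
    return 0 if d0 <= d1 else 1


# Hypothetical frame; the receiver only needs delta and p (blind extraction).
frame = [0.12, -0.40, 0.33, 0.25, -0.18, 0.09, 0.51, -0.27]
marked = embed_frame(frame, bit=1, delta=0.1, p=2)
attacked = [0.3 * s for s in marked]  # amplitude scaling attack
recovered = extract_bit(attacked, delta=0.1, p=2)
print(recovered)
```

The decoder never sees the original frame, and the scaling attack leaves the decoded bit unchanged because it cancels in the ratio.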

3. Statistical Analysis of the Proposed Technique

Generally, the Laplacian distribution is the best distribution for modeling speech signals within the frame range of 5–50 ms [17, 18]. The Laplacian distribution is expressed as

$$f(x) = \frac{b}{2} e^{-b|x - \mu|}, \quad b = \frac{L}{\sum_{i=1}^{L} \left|x_i - \mu\right|} \tag{10}$$

where $L$ is the sample size and $\mu$ is the mean of the random variables. If the subsequences $X$ and $Y$ are considered as independent identically distributed (iid) variables, then each of them can be assumed to follow a Laplacian distribution, $X \sim \mathcal{L}(\mu_X, 2b_X^2)$ and $Y \sim \mathcal{L}(\mu_Y, 2b_Y^2)$, respectively. Based on (3), the ratio ($Z$) between $X$ and $Y$ should be computed. However, the ratio between two Laplacian distributions cannot be computed exactly because the mean and variance are not actually finite in either the Gaussian or Laplacian case. The problem arises because the denominator has nonzero density in the neighborhood of zero. If the denominator is bounded away from zero (in which case it is no longer strictly the ratio of two Laplacian distributions or two normals), then a Taylor expansion should converge to estimate the ratio between two Laplacian distributions. According to Appendix A, the parameters of the ratio can be derived as follows:

$$\mu_Z = \frac{\mu_X}{\mu_Y} \left(1 + \frac{2b_Y^2}{\mu_Y^2}\right) \tag{11}$$

$$\sigma_Z^2 = \frac{\mu_X^2}{\mu_Y^2} \left(\frac{2b_Y^2}{\mu_Y^2} - \frac{4b_Y^4}{\mu_Y^4}\right) + \frac{2b_X^2}{\mu_Y^2} \tag{12}$$
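For a quick numerical reading of (11) and (12), the two expressions can be evaluated directly. The sketch below takes the variances as inputs and follows the notation of the text in writing them as $2b_X^2$ and $2b_Y^2$; the function name and the numerical values are assumptions chosen so that the denominator mean is far from zero.

```python
def ratio_moments(mu_x, mu_y, var_x, var_y):
    """Evaluate (11) and (12) with var_x = 2*b_X**2 and var_y = 2*b_Y**2."""
    mean_z = (mu_x / mu_y) * (1.0 + var_y / mu_y ** 2)
    var_z = (mu_x ** 2 / mu_y ** 2) * (
        var_y / mu_y ** 2 - var_y ** 2 / mu_y ** 4
    ) + var_x / mu_y ** 2
    return mean_z, var_z


# Hypothetical parameters: b_X = b_Y = 0.5, so both variances equal 0.5.
b_x = b_y = 0.5
mean_z, var_z = ratio_moments(mu_x=5.0, mu_y=10.0,
                              var_x=2 * b_x ** 2, var_y=2 * b_y ** 2)
print(mean_z, var_z)
```

With these values the correction to the naive mean $\mu_X / \mu_Y = 0.5$ is small, which is the regime in which the Taylor expansion mentioned above is expected to be accurate.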

To estimate the embedding distortion, quantization noise ($\Delta$) should be considered between the original and watermarked speech signals as follows:

$$\hat{S}_i - S_i = \left(\hat{X}_{2i} - \hat{Y}_{2i-1}\right) - \left(X_{2i} - Y_{2i-1}\right) \tag{13}$$

As in (4) to (6), $\hat{Y}_{2i-1} = Y_{2i-1}$; thus, (13) can be expressed as

$$\hat{S}_i - S_i = \lambda_{\text{opt}} \times X_{2i} - X_{2i} = X_{2i} \times \left(\lambda_{\text{opt}} - 1\right) \tag{14}$$

If $Z_{Q_i} = (L_X + \varepsilon) / L_Y$, then $\lambda_{\text{opt}}$ can be expressed as

$$\lambda_{\text{opt}} = \left(\frac{L_X + \varepsilon}{L_Y}\right) \times \frac{L_Y}{L_X} = \left(1 + \frac{\varepsilon}{L_X}\right) \tag{15}$$

Figure 2: Block diagram of the proposed embedding process. [Blocks: input signal, framing, division into odd/even subsequences ($X$, $Y$), Lp-norm ($L_X$, $L_Y$), $Z = L_X / L_Y$, quantization with the watermark ($Z_Q$), Lagrange optimization, repositioning of the odd/even subsequences, rearrangement, watermarked signal.]

Figure 3: Block diagram of the proposed extraction process. [Blocks: watermarked signal, framing, division into odd/even subsequences ($X$, $Y$), Lp-norm ($L_X$, $L_Y$), $Z = L_X / L_Y$, quantization for bit = 0 and bit = 1, minimum Euclidean distance decoding, watermark.]

Thus, (13) can be approximately estimated by

$$\hat{S}_i - S_i \approx \left(1 + \frac{\varepsilon}{L_X}\right) \times X_{2i} \tag{16}$$

Therefore, the expected value of (13) can be estimated as

$$E\left[\left\|\hat{S} - S\right\|^2\right] \cong E\left(\left(1 + \frac{\varepsilon}{L_X}\right)^2\right) E\left(X_{2i}^2\right) = \left[E\left(\frac{1}{L_X^2}\right) E\left(\varepsilon^2\right) + E(\varepsilon)\, E\left(\frac{2}{L_X}\right) + 1\right] E\left(X_{2i}^2\right) \tag{17}$$

If the quantization noise ($\varepsilon$) is considered as a uniform distribution in $[-\Delta/2, \Delta/2]$, then $E(\varepsilon) = 0$ and $E(\varepsilon^2) = \Delta^2/48$. Additionally, as the mean value of the speech signal is considered to be zero, the zero mean Laplacian distribution is used to model the speech signal as $E(X_{2i}) = 0$. As a result, $E(X_{2i}^2) = 2b_{X_i}^2$. To model $E(1/L_X^2)$, the absolute moment of the Laplacian distribution should be estimated using Appendix B as follows:

$$E\left(|X|^P\right) = \left(\frac{e^{\mu b} b^n}{2}\right) \left[(-1)^n \cdot I_n + n\right] \tag{18}$$

where $I_n = n I_{n-1}$ and $I_n = \int_{-\infty}^{0} t^n e^{-t}\, dt$. Thus, we can derive the mean and variance for the $P$th absolute moment of the Laplacian distribution as

$$L_X^P = \sum_{j=1}^{N/2} \left|X_j\right|^P \sim \mathcal{L}\left(\mu_{X^P}, \frac{2\left(\mu_{X^{(2P)}} - \mu_{X^P}^2\right)}{N}\right) \tag{19}$$

Now, based on (1) and (19), we can compute $E(1/L_X^2) = E\left(1/\left(\sqrt[P]{L_X^P}\right)^2\right) = \mu_{X^{(2P)}}$. Therefore, the signal-to-watermark ratio (SWR) can be estimated as

$$\text{SWR} = \frac{E\left[S^2\right]}{E\left[\left\|\hat{S} - S\right\|^2\right]} \cong \frac{2b_X^2 + 2b_Y^2}{\left(\left(\Delta^2/48\right) \times \mu_{X^{(2P)}} + 1\right) \times \left(2b_X^2\right)} \tag{20}$$

Because both the $X$ and $Y$ sets have been selected from neighboring samples, it can be assumed that $2b_X^2 \cong 2b_Y^2$. As a result, (20) can be expressed based on the quantization step as

$$\Delta = \sqrt{\frac{-2\left(1 + 12 \times \text{SWR}\right)}{\text{SWR} \times \mu_{X^{(2P)}}}} \tag{21}$$

To model the error probability, it is assumed that the watermarked speech signal passes through an AWGN channel with zero mean Gaussian noise $\mathcal{N}(0, \sigma_n^2)$. Therefore, (3) must be rewritten as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left|X_j + N_{X_j}\right|^P}{\sum_{j=1}^{N/2} \left|Y_j + N_{Y_j}\right|^P} \tag{22}$$

where $N_{Y_j}$ and $N_{X_j}$ correspond to the odd and even components of the AWGN, respectively. Because the term $\sum_{j=1}^{N/2} |X_j|^P$ is a known parameter, it is not possible to estimate $\hat{Z}$ using a chi-square distribution with $N$ degrees of freedom, $\chi^2(N)$. To compute the distribution of $\hat{Z}$, it should be decomposed and estimated as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left(\left|X_j\right|^P + P \left|X_j\right|^{P-1} N_{X_j} + \frac{P(P-1)}{2} \left|X_j\right|^{P-2} N_{X_j}^2 + \cdots + N_{X_j}^P\right)}{\sum_{j=1}^{N/2} \left(\left|Y_j\right|^P + P \left|Y_j\right|^{P-1} N_{Y_j} + \frac{P(P-1)}{2} \left|Y_j\right|^{P-2} N_{Y_j}^2 + \cdots + N_{Y_j}^P\right)} \tag{23}$$

Equation (23) can be expressed as

$$\hat{Z} \approx \text{Original } Z + \overbrace{\gamma_1 + \gamma_2 + \gamma_3}^{\text{Noise}} \tag{24}$$

where each part of $\hat{Z}$ is estimated as follows:

$$\begin{aligned}
\text{Original } Z &= \frac{\sum_{j=1}^{N/2} \left|X_j\right|^P}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \\
\gamma_1 &= \frac{\sum_{j=1}^{N/2} P \left|X_j\right|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \\
\gamma_2 &= -\frac{\sum_{j=1}^{N/2} \left|X_j\right|^P}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \times \frac{\sum_{j=1}^{N/2} P \left|Y_j\right|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \\
\gamma_3 &= \frac{\sum_{j=1}^{N/2} P \left|X_j\right|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \times \frac{\sum_{j=1}^{N/2} P \left|Y_j\right|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} P \left|Y_j\right|^{P-1} N_{Y_j}}
\end{aligned} \tag{25}$$

To estimate the probability of error, the noise term can be analyzed because it moves the original $Z$ into a wrong region. Therefore, the distribution of each term of (24) can be estimated by the central limit theorem (CLT) because of the large number of samples in each block. Regardless of the type of the original speech signal distribution, and because of the independence between the signal and noise samples, the mean and variance of the noise can be computed as

$$\begin{aligned}
\mu_{\text{Noise}} &= \mu_{\gamma_1} + \mu_{\gamma_2} + \mu_{\gamma_3} \\
\sigma_{\text{Noise}}^2 &= \sigma_{\gamma_1}^2 + \sigma_{\gamma_2}^2 + \sigma_{\gamma_3}^2
\end{aligned} \tag{26}$$

By assuming equal probabilities for both the zero and one bits of the watermark data, the probability of error for a fixed quantization step ($\Delta$) can be estimated as

$$P_e = \sum_{i=1}^{\infty} \frac{1}{2} \Pr\left\{T_{(i-1)/2} < Z^P < T_{(i+1)/2}\right\} \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \Pr\left\{V_{2j+i} < \hat{Z}^P < V_{2j+i+1}\right\} \tag{27}$$

A closed-form solution for (27) is computed as

$$P_e = \sum_{i=1}^{\infty} \left(Q\left(\frac{T^P_{(i-1)/2} - \mu_Z}{\sigma_Z}\right) - Q\left(\frac{T^P_{(i+1)/2} - \mu_Z}{\sigma_Z}\right)\right) \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \left(Q\left(\frac{V^P_{(i+2j)/2} - \mu_{\hat{Z}^P}}{\sigma_{\hat{Z}^P}}\right) - Q\left(\frac{V^P_{(i+2j+1)/2} - \mu_{\hat{Z}^P}}{\sigma_{\hat{Z}^P}}\right)\right) \tag{28}$$

where $Q(\cdot)$ is the complementary error function defined as $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du$, $T_i = i\Delta$, $V_i = \left(T_{i/2} + T_{(i+1)/2}\right)/2$, and $\mu_Z$ and $\sigma_Z$ can be computed as in (11) and (12), respectively.
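Each factor in (28) is a difference of two $Q(\cdot)$ values, that is, the probability that a Gaussian variable falls inside an interval. A small sketch of this building block is given below; it relies on the identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, and the helper name `interval_probability` and the numerical arguments are illustrative assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), computed from math.erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def interval_probability(lower, upper, mean, std):
    """Pr{lower < G < upper} for a Gaussian G with the given mean and std."""
    return q_function((lower - mean) / std) - q_function((upper - mean) / std)


# Probability mass within one standard deviation of the mean.
print(interval_probability(-1.0, 1.0, mean=0.0, std=1.0))
```

A numerical evaluation of (28) would sum such terms over $i$ and $j$, truncating the infinite sums once the interval probabilities become negligible.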

4. Discussion on the Experimental Results

To validate the performance of the developed watermarking technique, a simulation was performed on the TIMIT database to verify the robustness, imperceptibility, and capacity of the technique. The TIMIT database included 630 speakers (438 males and 192 females) with a sampling frequency of 16 kHz [19]. Each speaker pronounced 10 sentences, for a total of 6300 sentences. For the experimental results, the average results of 630 speech signals with durations of 1 s to 3 s from 630 speakers were used.

Figure 4 shows the bit error rate (BER) with respect to different $P$ for various frame lengths under a watermark-to-noise ratio (WNR) of 40 dB. In this figure, each curve is also plotted separately to make the changes visible. As can be observed, the frame size was inversely correlated with the BER: whenever the frame size decreased, the BER increased. Additionally, it seems that $P$ was not highly correlated with the BER for $P$ values greater than two. Only a small fluctuation can be observed for the BER when $P$ changed.

Figure 5 shows the BER with respect to different $P$ for various quantization steps. As expected, whenever the quantization step increased, the BER decreased. Furthermore, the variation of $P$ did not seriously change the BER. It must be mentioned that, because watermark detection is perfect under clean conditions, a small AWGN was added to the watermarked signals for the experiments shown in Figures 4 and 5.
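The kind of experiment behind such BER curves can be imitated with a toy Monte Carlo run. The sketch below is not the paper's setup: it uses synthetic zero mean Laplacian frames instead of TIMIT speech, a noise standard deviation (`noise_std`) instead of a WNR in dB, and parity decoding of the rounded ratio as an assumed equivalent of minimum-distance decoding; all names and parameter values are hypothetical.

```python
import math
import random


def lp_norm(values, p):
    """Normalized Lp-norm as in (1)-(2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def norm_ratio(frame, p):
    return lp_norm(frame[1::2], p) / lp_norm(frame[0::2], p)


def embed_frame(frame, bit, delta, p):
    """Quantize the ratio with (8) and scale the even samples, as in (6)-(7)."""
    z = norm_ratio(frame, p)
    z_q = math.floor((z + bit * delta) / (2 * delta)) * 2 * delta + bit * delta
    marked = list(frame)
    marked[1::2] = [(z_q / z) * v for v in frame[1::2]]
    return marked


def extract_bit(frame, delta, p):
    """Nearest multiple of delta: even multiples decode to 0, odd to 1."""
    return round(norm_ratio(frame, p) / delta) % 2


def simulate_ber(n_frames, frame_len, delta, p, noise_std, seed=0):
    """Fraction of wrongly decoded bits over synthetic Laplacian frames."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_frames):
        # The difference of two unit exponentials is a unit Laplacian sample.
        frame = [0.1 * (rng.expovariate(1.0) - rng.expovariate(1.0))
                 for _ in range(frame_len)]
        bit = rng.randint(0, 1)
        marked = embed_frame(frame, bit, delta, p)
        noisy = [s + rng.gauss(0.0, noise_std) for s in marked]
        errors += int(extract_bit(noisy, delta, p) != bit)
    return errors / n_frames


clean_ber = simulate_ber(200, 100, delta=0.25, p=2, noise_std=0.0)
noisy_ber = simulate_ber(200, 100, delta=0.25, p=2, noise_std=0.05)
print(clean_ber, noisy_ber)
```

Without added noise the decoder recovers every bit, which matches the remark above that a small AWGN is needed before any errors can be observed.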

Figure 6 shows the variation of the signal-to-noise ratio (SNR) with respect to different $P$ values for different frame lengths. There was not a significant difference in the SNR when the frame size increased. As can be observed, whenever the frame size increased, the energy level between the two sets of $L_X$ and $L_Y$ increased. Consequently, the ratio between them increased, which caused a lower SNR. Additionally, it seems that changing $P$ was not highly correlated with the SNR for different frame lengths.

Figure 4: (a) BER versus Lp-norms for different frame lengths under WNR = 40 dB; (b–f) each curve separately. [Plots: bit error rate versus Lp-norm ($P$ = 1 to 7) for frame sizes of 40, 100, 200, 300, and 400 samples.]

Figure 7 illustrates different SNRs with respect to different $P$ for various quantization steps. As observed, $P$ did not highly affect the SNR. However, the quantization step highly affected the SNR. As the quantization step increased, the SNR decreased.

Figure 5: BER versus Lp-norms for different quantization steps under WNR = 40 dB. [Plot: bit error rate versus Lp-norm ($P$ = 1 to 7) for $\Delta$ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 6: SNR versus Lp-norms for different frame lengths. [Plot: SNR (dB) versus Lp-norm ($P$ = 1 to 7) for frame sizes of 40, 100, 200, 300, and 400.]

To compute the payload of the proposed watermark, a memoryless binary symmetric channel (BSC) capacity ($C_{\text{BSC}}$), defined as

$$C_{\text{BSC}} = R \times \left[1 + H(P_e)\right] \tag{29}$$

where

$$H(P_e) = P_e \times \log_2(P_e) + (1 - P_e) \times \log_2(1 - P_e) \tag{30}$$

was applied to estimate the capacity of the channel with bitrate ($R$) for error-free watermark transmission [20].
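Equations (29) and (30) can be evaluated directly, as in the following sketch. The function names are assumptions, the limit $p \log_2 p \to 0$ is applied at $P_e = 0$ and $P_e = 1$, and the rate of 64000 bit/s and the error probability of 0.05 are example inputs only.

```python
import math


def entropy_term(pe):
    """H(Pe) of (30); note that it is non-positive as written."""
    if pe <= 0.0 or pe >= 1.0:
        return 0.0  # limit of p * log2(p) as p approaches 0
    return pe * math.log2(pe) + (1.0 - pe) * math.log2(1.0 - pe)


def bsc_capacity(rate, pe):
    """C_BSC of (29) for a channel bitrate `rate` and error probability `pe`."""
    return rate * (1.0 + entropy_term(pe))


# Example inputs (assumed): a 64 kbit/s channel and a 5% bit error rate.
print(bsc_capacity(64000.0, 0.05))
```

The capacity equals the full bitrate for an error-free channel and drops to zero at $P_e = 0.5$, where the extracted bits carry no information.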

Because the sampling rate of TIMIT was 16 kHz, $R$ was assumed to be 64 kbps (8 kHz for speech bandwidth × 8 bits per sample = 64 kbps) for a telephony channel, and $P_e$ was assumed to be equal to the BER in the watermark detection process. Figure 8 shows the BSC capacity for different WNRs for various quantization steps. As observed, the capacity increased whenever the WNR increased. This is because the watermark was extracted with a lower BER when the WNR increased. Moreover, it can be inferred that the BSC capacity increased as the quantization step increased, because the watermark was embedded with high intensity when the quantization step increased. As observed, the BSC capacity for small quantization steps ($\Delta \le 0.25$) was approximately zero under a very noisy channel.

Figure 7: SNR versus Lp-norms for various quantization steps. [Plot: SNR (dB) versus Lp-norm ($P$ = 1 to 7) for $\Delta$ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 9 shows the variation of the BSC capacity withrespect to different WNRs for different frame lengths Asobserved it seems that under serious noise the frame sizewas not a significant factor for the BSC capacity Despite thisthe frame size was likely to be important whenever theWNRincreasedThus for a largeWNR it is obvious that wheneverthe frame size increased the BER in the watermark detectionprocess decreased which caused an improvement in the BSCcapacity

To demonstrate the efficiency and performance of theproposed speech watermarking technique the robustnesscapacity and inaudibility of the proposed technique must becompared with other state-of-the-art speech watermarkingtechniques

Table 1 describes the benchmark for simulating the results for the robustness test. Many of these attacks are based on the StirMark Benchmark for Audio (SMBA) [24].

8 Security and Communication Networks

Table 1: Benchmark for speech watermarking.

| Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label |
|---|---|---|---|---|---|
| Additive noise | AddBrumm | Adds a buzz or low-frequency sinus tone to the watermarked signal to simulate the impact of a power supply | ⟨STRENGTH⟩, ⟨FREQUENCY⟩ | 2500, 55 to 3000, 75 | A |
| Additive noise | AddDynNoise | Adds dynamic white noise to the watermarked signal | ⟨STRENGTH⟩ | 20 to 40 | B |
| Additive noise | AddFFTNoise | Adds white noise to the watermarked signal in the frequency domain | ⟨FFTSIZE⟩, ⟨STRENGTH⟩ | 256, 1000 to 1024, 3000 | C |
| Additive noise | AddNoise | Contaminates the watermarked signal with white Gaussian noise to simulate ambient distortion | ⟨STRENGTH⟩ | 35 dB level to 5 dB | D |
| Additive noise | AddSinus | Adds a sinus signal to the watermarked signal | ⟨AMPLITUDE⟩, ⟨FREQUENCY⟩ | 120, 3000 to 150, 3500 | E |
| Conversion | Resampling | The sampling rate of the watermarked signal is converted to ⟨SAMPLERATE1⟩ and then reconverted to ⟨SAMPLERATE2⟩ | ⟨SAMPLERATE1⟩, ⟨SAMPLERATE2⟩ | 4 kHz, 16 kHz to 8 kHz, 16 kHz | F |
| Conversion | Requantization | The samples of the watermarked signal are quantized to ⟨QUANTIZATION1⟩ and then requantized to ⟨QUANTIZATION2⟩ | ⟨QUANTIZATION1⟩, ⟨QUANTIZATION2⟩ | 8 bits and 16 bits | G |
| Conversion | Invert | Inverts all samples in the watermarked signal, like a 180-degree phase shift | No parameter required | None | H |
| Ambience | Echo | An echo with a delay ⟨DELAY⟩ and decay ⟨DECAY⟩ is added to the watermarked signal | ⟨DELAY⟩, ⟨DECAY⟩ | 20 ms and 10 to 100 ms and 50 | I |
| Sample permutations | Cut samples | ⟨REMOVENUMBER⟩ samples are removed from the watermarked signal in every ⟨REMOVEDIST⟩ period | ⟨REMOVEDIST⟩, ⟨REMOVENUMBER⟩ | 1 and 1000 to 7 and 1000 | J |
| Sample permutations | Copy samples | Some of the samples of the watermarked signal are copied between the sample values | ⟨PERIOD⟩, ⟨COPYDIST⟩, ⟨COPYCOUNT⟩ | 1000, 100, 30 to 1000, 200, 60 | K |
| Sample permutations | LSB Zero | Sets the least significant bits of all samples of the watermarked signal to zero | No parameter required | None | L |
| Sample permutations | Smooth | The new sample value depends on the samples before and after the modifying point | No parameter required | None | M |
| Sample permutations | Stat1 | Averages each sample with its next neighbors | No parameter required | None | N |
| Dynamics | Amplify | The amplitude of the watermarked signal is increased up to ⟨FACTOR1⟩ and decreased down to ⟨FACTOR2⟩, respectively | ⟨FACTOR1⟩, ⟨FACTOR2⟩ | 150 and 75, 200 and 50 | O |
| Dynamics | Denoising | The watermarked signal is denoised by ⟨FACTOR⟩ | ⟨FACTOR⟩ | −80 dB to −60 dB | P |
| Filters | Low Pass Filter (LPF) | The watermarked signal is filtered by an elliptic LPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 5 kHz to 4 kHz | Q |
| Filters | Band Pass Filter (BPF) | The watermarked signal is filtered by an elliptic filter with bandwidth from ⟨FREQUENCY1⟩ to ⟨FREQUENCY2⟩ to simulate a narrowband telephony channel | ⟨FREQUENCY1⟩, ⟨FREQUENCY2⟩ | 500 Hz & 4000 Hz to 300 Hz & 3400 Hz | R |
| Filters | High Pass Filter (HPF) | The watermarked signal is filtered by an elliptic HPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 500 Hz to 800 Hz | S |
| Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T |
| Time stretch and pitch shift | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U |
| Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then decoded back to the original format | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V |
| Compression | MP3 compression | The watermarked signal is compressed by MP3 at different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W |
| Compression | G.711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X |

Figure 8: Variation of the BSC capacity with respect to different WNRs for different quantization steps. [Plot: $C_{\mathrm{BSC}}$ (×10^4) against WNR (dB) for $\Delta$ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Table 2 compares the BER with state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them for the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique. It can be measured based on the mean opinion score (MOS) (as proposed by the International Telecommunication Union (ITU-T) [23]) and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal, as presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The average of these reports was computed for MOS music and MOS speech and is presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths. [Plot: $C_{\mathrm{BSC}}$ (×10^4) against WNR (dB) for frame lengths of 40, 100, 200, 300, and 400 samples.]

An objective evaluation technique, such as SWR and SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_{n} S^2}{\sum_{n} (\hat{S} - S)^2}, \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
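The SNR in (31) can be evaluated directly from two sample sequences. The sketch below is illustrative only; the function name and the toy signals are assumptions, not the paper's implementation or data.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB as in (31): signal energy over the energy of the difference."""
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0.0:
        return math.inf  # identical signals: no embedding distortion at all
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    # Toy example: every sample is scaled by 1.1, i.e. a 10% amplitude change.
    original = [1.0, -1.0, 1.0, -1.0]
    watermarked = [1.1, -1.1, 1.1, -1.1]
    print(snr_db(original, watermarked))
```

A 10% amplitude change corresponds to an energy ratio of 100, that is, roughly 20 dB.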

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, the proposed speech watermarking technique outperformed the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than the proposed technique, the capacity and robustness of the proposed technique are greater than those for formant tuning [21] and Analysis-by-Synthesis [22].


Table 2: Comparison of the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4, each entry was bounded between two values that related a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. FPP is defined as the case in which an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, FNP is defined as the case in which the watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

$$P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{\mathrm{FN}}^{\,i} (1 - P_{\mathrm{FN}})^{(N-i)}}_{\mathrm{FNP}} + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{\mathrm{FP}}^{\,i} (1 - P_{\mathrm{FP}})^{(N-i)}}_{\mathrm{FPP}}, \tag{32}$$

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, and $P_{\mathrm{FP}}$ is the probability of a false positive, which is assumed to be 0.5.


Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

Figure 10: FPP with respect to various total number of watermark bits for different BER. [Plot: false positive probability (×10^−4) against the number of watermark bits (50 to 400) for BER = 0.19, BER = 0.20 (shifted by adding 0.000015), and BER = 0.21 (shifted by adding 0.00009).]

$P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil. \tag{33}$$
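The two binomial sums in (32) and the threshold in (33) are straightforward to evaluate numerically. The following Python sketch is a hedged illustration: the function names are hypothetical, and the inputs in the usage section merely reuse the values quoted in the text (0.5 and 0.0919).

```python
import math


def threshold(n_bits: int, ber: float) -> int:
    """T = ceil((1 - BER) * N) as in (33)."""
    return math.ceil((1.0 - ber) * n_bits)


def binomial_sum(n: int, p: float, first: int, last: int) -> float:
    """Sum of C(n, i) * p^i * (1 - p)^(n - i) for i = first..last (inclusive)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(first, last + 1))


def false_negative_probability(n: int, p_fn: float, t: int) -> float:
    """FNP term of (32): i runs from 0 to T - 1."""
    return binomial_sum(n, p_fn, 0, t - 1)


def false_positive_probability(n: int, p_fp: float, t: int) -> float:
    """FPP term of (32): i runs from T to N."""
    return binomial_sum(n, p_fp, t, n)


if __name__ == "__main__":
    for n in (50, 100, 200):
        t = threshold(n, 0.20)  # BER = 0.20 is one of the plotted cases
        print(n, t,
              false_positive_probability(n, 0.5, t),
              false_negative_probability(n, 0.0919, t))
```

With $P_{\mathrm{FP}} = 0.5$, the FPP term shrinks quickly as $N$ grows, because reaching the threshold by chance requires about 80% of the bits to match.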

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.

Figure 11: FNP with respect to various total number of watermark bits for different BER. [Plot: false negative probability (0 to 0.035) against the number of watermark bits (50 to 400) for BER = 0.19, BER = 0.20 (shifted by adding 0.015), and BER = 0.21 (shifted by adding 0.025).]

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only proved that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$f(x, y) = f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y) + \frac{1}{2}\left\{ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right\} + \text{remainder}. \tag{A1}$$


Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{ f''_{xx}(\theta)\,\mathrm{var}(X) + 2 f''_{xy}(\theta)\,\mathrm{cov}(X, Y) + f''_{yy}(\theta)\,\mathrm{var}(Y) \right\} + O(n^{-1}). \tag{A2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^3$. Then, the mean and variance of the ratio between $R$ and $S$ can be estimated, respectively, as follows:

$$E\left(\frac{R}{S}\right) \equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\mathrm{cov}(R, S)}{E(S)^2} + \frac{\mathrm{var}(S)\,E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right),$$

$$\mathrm{var}\left(\frac{R}{S}\right) \approx \frac{1}{E(S)^2}\,\mathrm{var}(R) + 2\,\frac{-E(R)}{E(S)^3}\,\mathrm{cov}(R, S) + \frac{E(R)^2}{E(S)^4}\,\mathrm{var}(S) = \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - 2\,\frac{\mathrm{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right] = \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2}. \tag{A3}$$
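As a numerical companion to (A3), the sketch below evaluates the first (general) forms of the approximations for the mean and variance of a ratio. The function names and the sample moments are hypothetical; the final simplified forms in (A3) are not used here.

```python
def ratio_mean(mu_r: float, mu_s: float, var_s: float, cov_rs: float = 0.0) -> float:
    """E(R/S) ~ E(R)/E(S) - cov(R,S)/E(S)^2 + var(S) E(R)/E(S)^3, as in (A3)."""
    return mu_r / mu_s - cov_rs / mu_s ** 2 + var_s * mu_r / mu_s ** 3


def ratio_variance(mu_r: float, mu_s: float, var_r: float, var_s: float,
                   cov_rs: float = 0.0) -> float:
    """var(R/S) ~ var(R)/E(S)^2 - 2 E(R) cov(R,S)/E(S)^3 + E(R)^2 var(S)/E(S)^4."""
    return (var_r / mu_s ** 2
            - 2.0 * mu_r * cov_rs / mu_s ** 3
            + mu_r ** 2 * var_s / mu_s ** 4)


if __name__ == "__main__":
    # Assumed toy moments: uncorrelated R and S with a denominator well away from zero.
    print(ratio_mean(2.0, 4.0, 0.2))
    print(ratio_variance(2.0, 4.0, 0.1, 0.2))
```

The approximation is only meaningful when the denominator variable stays well away from zero, so a small variance relative to the squared mean is assumed in the toy values.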

B. Compute the Absolute Moment of the Laplacian Distribution

The moment of the Laplacian distribution is expressed as follows:

$$E(|X|^n) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-((X-\mu)/b)}\,dx = \frac{1}{2b}\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X-\mu)/b)}\,dx. \tag{B1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$E(|X|^n) = \begin{cases} \dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X-\mu)/b)}\,dx, & \text{if } X \ge \mu, \\[2ex] \dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu-X)/b)}\,dx, & \text{if } X < \mu. \end{cases} \tag{B2}$$

For the first case, when $X \ge \mu$,

$$E(|X|^n) = \frac{1}{2b}\left[\int_{-\infty}^{0} (-X)^n \cdot e^{-((X-\mu)/b)}\,dx + \int_{0}^{\infty} X^n \cdot e^{-((X-\mu)/b)}\,dx\right] = \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\,dx}_{I}\right]. \tag{B3}$$

If $t = -X/b$, then $I$ can be expressed as

$$I = b^{n+1}\int_{0}^{\infty} t^n e^{-t}\,dt = b^{n+1} \cdot n!. \tag{B4}$$

$I_n$ can also be expressed as

$$I_n = \int_{-\infty}^{0} X^n e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^n e^{-t}\,dt = \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1}. \tag{B5}$$

Substituting (B4) and (B5) into (B3), the absolute moment of the Laplacian distribution can be computed based on

$$E(|X|^n) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right]. \tag{B6}$$
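For the zero-mean case ($\mu = 0$), the absolute moment of a Laplacian variable with scale $b$ has the standard closed form $E(|X|^n) = b^n \cdot n!$. The sketch below checks this special case against a simple trapezoidal integration. It is a sanity check under that assumption, not an implementation of the general expression (B6), and the function names are hypothetical.

```python
import math


def abs_moment_closed_form(n: int, b: float) -> float:
    """E(|X|^n) = b^n * n! for a zero-mean Laplacian with scale b."""
    return b ** n * math.factorial(n)


def abs_moment_numeric(n: int, b: float, steps: int = 100_000,
                       span: float = 40.0) -> float:
    """Trapezoidal estimate of the integral of |x|^n * exp(-|x|/b) / (2b).

    The integrand is even, so the two-sided integral equals twice the
    integral over [0, span * b]; the tail beyond span * b is negligible.
    """
    upper = span * b
    h = upper / steps

    def integrand(x: float) -> float:
        return x ** n * math.exp(-x / b) / b  # equals 2 * (1 / (2b)) * x^n * e^(-x/b)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    for n, b in ((1, 2.0), (2, 0.5), (3, 1.0)):
        print(n, b, abs_moment_closed_form(n, b), abs_moment_numeric(n, b))
```

For $n = 2$ this reduces to the familiar variance $2b^2$ of a zero-mean Laplacian.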

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, “An overview of digital speech watermarking,” International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, “A DWT-based rational dither modulation scheme for effective blind audio watermarking,” Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, “Semi-fragile digital speech watermarking for online speaker recognition,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, “Hyperbolic RDM for nonlinear valumetric distortions,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, “Difference angle quantization index modulation scheme for image watermarking,” Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.

[8] X. Zhu and S. Peng, “A novel quantization watermarking scheme by modulating the normalized correlation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, “Blind image watermarking using a sample projection approach,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, “A logarithmic quantization index modulation for perceptually better data hiding,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, “A novel gain invariant quantization-based watermarking approach,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker verification security improvement by means of speech watermarking,” Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker identification security improvement by means of speech watermarking,” Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, “Robust digital speech watermarking for online speaker recognition,” Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, “Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition,” Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, “Speaker frame selection for digital speech watermarking,” National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, “Speech probability distribution,” IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, “Robust audio and speech watermarking using Gaussian and Laplacian modeling,” Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, “A general formula for channel capacity,” IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, “Speech watermarking method based on formant tuning,” IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, “Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization,” Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., “StirMark benchmark: audio watermarking attacks,” in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, “An audio watermarking scheme using singular value decomposition and dither-modulation quantization,” Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.


(AQIM) [7]. However, this technique was very sensitive to additive white Gaussian noise (AWGN). In [8], normalized cross-correlation between the original signal and a random sequence was quantized based on dither modulation (known as NC-DM) to embed the watermark data. However, applying the random sequence degraded the security of this technique. Lastly, other efforts, such as projection quantization [9], logarithmic quantization index modulation (LQIM) [10], and Lp-norm QIM [11], have been studied for a gain invariant image watermarking technique.

This paper attempts to mitigate the limitations of previous research by quantizing the ratio between the Lp-norms of even and odd indices. After quantization, the Lagrange optimization method is applied to compute the best watermarked sample that minimizes the embedding distortion and improves imperceptibility. By assuming Laplacian and Gaussian distributions for the speech and noise signals, respectively, the embedding distortion and error probability are derived analytically and validated by performing a simulation. Moreover, experimental results show that the proposed speech watermarking technique outperforms state-of-the-art watermarking techniques.

Generally, speech watermarking should preserve the identity of the speaker, which is important for certain security applications [12, 13]. To preserve speaker-specific information, some investigations have been conducted to embed the watermark into special frequency subbands that have less speaker-specific information [5, 14, 15]. Further discussion can be found in [16].

The remainder of this paper is organized as follows. In Section 2, the proposed model for the speech watermarking technique is presented. Additionally, the watermark embedding and extraction processes are described. The performance of the developed watermarking technique is analytically studied in Section 3 and validated by performing a simulation in Section 4. The experimental results are explained in Section 5. Finally, the conclusion and future work are discussed.

2. Proposed Speech Watermarking Technique

In this section, a blind speech watermarking technique is developed based on quantization of the Lp-norm ratio between two blocks of even and odd indices. Assume that $S$ represents an original speech signal that consists of $N$ samples. Two subsets $X$ and $Y$ are formed with respect to even and odd indexed terms, respectively, so that $X$ and $Y$ have approximately the same energy, which causes less embedding distortion. Moreover, synchronization between the transmitter and receiver is most efficient in this case. Figure 1 shows the formation of the subsequences $X$ and $Y$ from the even and odd indices of the original signal, respectively.

Then, the Lp-norms of the subsequences $X$ and $Y$ are computed, respectively, as follows:

$$L_X = \sqrt[P]{\frac{2}{N} \times \sum_{i=1}^{N/2} \left|S_{2i}\right|^P}, \tag{1}$$

$$L_Y = \sqrt[P]{\frac{2}{N} \times \sum_{i=1}^{N/2} \left|S_{2i-1}\right|^P}. \tag{2}$$

The ratio ($Z$) between $L_X$ and $L_Y$, given as

$$Z = \frac{L_X}{L_Y}, \tag{3}$$

is quantized to embed the watermark bit. Although embedding the watermark into the ratio of Lp-norms can provide high robustness against various attacks, imperceptibility can be seriously degraded.
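The ratio in (3) is what makes the scheme insensitive to amplitude scaling: multiplying every sample by the same gain scales $L_X$ and $L_Y$ equally, so $Z$ is unchanged. The following Python sketch illustrates this with a toy frame. The function names and the sample values are assumptions for illustration only.

```python
def lp_norm(values, p):
    """Normalized Lp-norm of a subsequence, as in (1) and (2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def lp_ratio(frame, p):
    """Z = L_X / L_Y as in (3) for one frame with an even number of samples."""
    y = frame[0::2]  # S_1, S_3, ... (odd 1-based indices)
    x = frame[1::2]  # S_2, S_4, ... (even 1-based indices)
    return lp_norm(x, p) / lp_norm(y, p)


if __name__ == "__main__":
    frame = [1.0, 2.0, -3.0, 4.0, 0.5, -1.5]  # toy frame, N = 6
    scaled = [0.37 * s for s in frame]         # amplitude scaling attack
    print(lp_ratio(frame, 2), lp_ratio(scaled, 2))
```

Both printed ratios agree up to floating-point rounding, regardless of the gain applied.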

To resolve this limitation, the variation between the original ratio ($Z$) and the quantized ratio ($Z^Q$) should be minimized. Therefore, the Lagrange optimization method is used to minimize this variation; that is, the Lagrange optimization method decreases the embedding distortion after quantization to improve the imperceptibility of the watermarked speech signal. As a result, the Lagrange optimization problem can be formulated as follows:

$$\begin{aligned} \text{Minimize} \quad & J(X) = \sum_{j=1}^{N/2} \left(X_j^Q - X_j\right)^P \\ \text{Subject to} \quad & C(Y) = \sqrt[P]{\frac{2}{N} \sum_{j=1}^{N/2} \left(X_j^Q\right)^P} - Z^Q \times L_Y = 0. \end{aligned} \tag{4}$$

To solve this optimization problem, the Lagrange method should estimate the optimized values of the equation system as follows:

$$\nabla J(X) - \lambda \nabla C(Y) = 0. \tag{5}$$

These optimized values are simply computed by solving the following:

$$X_j^{Q_{\mathrm{opt}}} = \lambda_{\mathrm{opt}} \times X_j, \tag{6}$$

$$\lambda_{\mathrm{opt}} = \frac{Z^Q \times L_Y}{L_X}. \tag{7}$$

2.1. Speech Watermarking Algorithm. The details of the proposed embedding and extraction processes are described in the following algorithms.

Embedding Process

(a) Segment the input speech signal ($S$) into different frames ($S_i$) with size $N$.

(b) Form two subsequences $X$ and $Y$, each of length $N/2$, based on the even and odd indices of $S_i$, respectively.

(c) Compute the Lp-norms $L_X$ and $L_Y$ of the $X$ and $Y$ subsequences based on (1) and (2), respectively.

(d) Apply the QIM technique to embed the watermark bit into the ratio between the Lp-norms of $X$ and $Y$ ($Z = L_X / L_Y$) as follows:

$$Z^Q = \left\lfloor \frac{Z + W_i \times \Delta}{2\Delta} \right\rfloor \times 2\Delta + W_i \times \Delta, \quad W_i \in \{0, 1\}, \tag{8}$$

where $\Delta$ represents the quantization step, $W_i$ is the watermark bit, and $Z^Q$ is the modified ratio of the Lp-norms between $X$ and $Y$. Choosing a large quantization step increases the robustness but results in less imperceptibility, and vice versa.

(e) Apply the Lagrange method to optimize the values of $X_{2i}^Q$.

(f) Reposition the even and odd subsequences based on $X_{2i}^Q$ and $Y$, respectively.

(g) Rearrange the watermarked speech signal based on the modified frames ($\hat{S}_i$).

Figure 1: Formation of two odd and even subsets from the original speech signal. [Diagram: the samples $S_1, S_2, \ldots, S_N$ are split into the even set $X_1, X_2, \ldots, X_{N/2}$ and the odd set $Y_1, Y_2, \ldots, Y_{N/2}$.]

Figure 2 shows the block diagram of the proposed embedding process.
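The embedding steps for a single frame can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the frame is assumed to have an even number of samples with non-zero norms, the quantizer follows (8) as written, and the scaling factor follows (7).

```python
import math


def lp_norm(values, p):
    """Normalized Lp-norm of a subsequence, as in (1) and (2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def quantize_ratio(z, bit, delta):
    """QIM quantizer written as in (8): even multiples of delta carry bit 0,
    odd multiples of delta carry bit 1."""
    return math.floor((z + bit * delta) / (2.0 * delta)) * 2.0 * delta + bit * delta


def embed_bit(frame, bit, delta, p=2):
    """Embed one watermark bit into one frame; returns (watermarked frame, Z^Q)."""
    y = frame[0::2]  # odd 1-based indices, left untouched
    x = frame[1::2]  # even 1-based indices, scaled by lambda_opt
    l_x, l_y = lp_norm(x, p), lp_norm(y, p)
    z_q = quantize_ratio(l_x / l_y, bit, delta)
    lam = z_q * l_y / l_x  # lambda_opt from (7)
    watermarked = list(frame)
    watermarked[1::2] = [lam * v for v in x]  # X^Q = lambda_opt * X, as in (6)
    return watermarked, z_q


if __name__ == "__main__":
    frame = [1.0, 2.0, -3.0, 4.0, 0.5, -1.5]  # toy frame, not real speech
    marked, z_q = embed_bit(frame, bit=1, delta=0.25)
    print(z_q, marked)
```

Because only the even-indexed samples are multiplied by a single factor, the ratio of the watermarked frame lands exactly on the chosen quantization point. Note that, with (8) as written, a small ratio combined with bit 0 can quantize to zero, which would silence the even-indexed samples. A practical system would need to guard against that case.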

Extraction Process

(a) Segment the input watermarked speech signal ($\hat{S}$) into different frames ($\hat{S}_i$) with size $N$.

(b) Form two subsequences $\hat{X}$ and $\hat{Y}$, each of length $N/2$, based on the even and odd indices of $\hat{S}_i$, respectively.

(c) Compute the Lp-norms $L_{\hat{X}}$ and $L_{\hat{Y}}$ of the $\hat{X}$ and $\hat{Y}$ subsequences based on (1) and (2), respectively.

(d) Extract the $k$th binary watermark data from the $k$th frame of the watermarked speech signal by selecting the minimum Euclidean distance (nearest quantization step) from the ratio $Z_k = L_{\hat{X}} / L_{\hat{Y}}$ as follows:

$$\hat{W}_k = \arg\min_{b_k \in \{0, 1\}} \sqrt{\left(Z_k - Q_{b_k}(Z_k)\right)^2}, \tag{9}$$

where $Q_{b_k}$ is the quantization function that meets the requirements of the watermark bit $b_k \in \{0, 1\}$.

Figure 3 shows the block diagram of the proposed extraction process.
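A minimal sketch of the blind extraction for one frame follows. It assumes that the quantization points for bit 0 are the even multiples of $\Delta$ and those for bit 1 are the odd multiples, and it finds the nearest point with rounding. That nearest-point search is an assumption made here for illustration and is not spelled out in the text. The function names are hypothetical.

```python
def lp_norm(values, p):
    """Normalized Lp-norm of a subsequence, as in (1) and (2)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def nearest_quantization_point(z, bit, delta):
    """Closest point to z on the lattice {2 * delta * k + bit * delta}."""
    k = round((z - bit * delta) / (2.0 * delta))
    return 2.0 * delta * k + bit * delta


def extract_bit(frame, delta, p=2):
    """Decide the embedded bit from the Lp-norm ratio of one frame."""
    y = frame[0::2]  # odd 1-based indices
    x = frame[1::2]  # even 1-based indices
    z = lp_norm(x, p) / lp_norm(y, p)
    distances = [abs(z - nearest_quantization_point(z, b, delta)) for b in (0, 1)]
    return 0 if distances[0] <= distances[1] else 1


if __name__ == "__main__":
    marked = [2.0, 3.5, 2.0, 3.5]           # toy frame with ratio 1.75 = 7 * 0.25
    attacked = [0.3 * s for s in marked]    # amplitude scaling attack
    print(extract_bit(marked, 0.25), extract_bit(attacked, 0.25))
```

Since the decision depends only on the ratio, the same bit is recovered before and after the gain change in this toy example.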

3 Statistical Analysis ofthe Proposed Technique

Generally Laplacian distribution is the best distributionapproach for modeling speech signals within the frame rangeof 5ndash50ms [17 18] Laplacian distribution is expressed as

119891 (119909) = 1198872119890(minus119887|119909minus120583|) 119887 = 119871sum119871119894=1 1003816100381610038161003816119909119894 minus 1205831003816100381610038161003816 (10)

where 119871 is the sample size and 120583 is the mean of the randomvariables If the subsequences of 119883 and 119884 are considered asindependent identically distributed (iid) variables then thedistribution of each of them can be assumed to be Laplaciandistributions119883 = ∁L(120583119883 21198872119883) and119884 = ∁L(120583119884 21198872119884) respec-tively Based on (3) the ratio (119885) between X and Y shouldbe computed However the ratio between two Laplacian dis-tributions cannot be computed exactly because the mean andvariance are not actually finite in either theGaussian or Lapla-cian case The problem arises because the denominator hasnonzero density in the neighborhood of zero If the denomi-nator is bounded away from zero (immediately it no longerhas the ratio of two Laplacian distributions or two normals)then a Taylor expansion should converge to estimate theratio between two Laplacian distributions According toAppendix A the parameters of the ratio can be derived as fol-lows

120583119885 = 120583119883120583119884 (1 + 211988721198841205832119884 ) (11)

1205902119911 = 12058321198831205832119884 (211988721198841205832119884 minus 411988741198841205834119884 ) + 211988721198831205832119884 (12)

To estimate the embedding distortion, quantization noise ($\Delta$) should be considered between the original and watermarked speech signals as follows:

$$\hat{S}_i - S_i = (\hat{X}_{2i} - \hat{Y}_{2i-1}) - (X_{2i} - Y_{2i-1}) \tag{13}$$

As in (4) to (6), $\hat{Y}_{2i-1} = Y_{2i-1}$; thus, (13) can be expressed as

$$\hat{S}_i - S_i = \lambda_{\mathrm{opt}} \times X_{2i} - X_{2i} = X_{2i} \times (\lambda_{\mathrm{opt}} - 1) \tag{14}$$

If $Z_{Q_i} = (L_X + \varepsilon)/L_Y$, then $\lambda_{\mathrm{opt}}$ can be expressed as

$$\lambda_{\mathrm{opt}} = \left(\frac{L_X + \varepsilon}{L_Y}\right) \times \frac{L_Y}{L_X} = \left(1 + \frac{\varepsilon}{L_X}\right) \tag{15}$$


Figure 2: Block diagram of the proposed embedding process. [Blocks: input signal, framing, division into odd/even subsequences ($X$, $Y$), Lp-norm of each subsequence ($L_X$, $L_Y$), ratio $Z = L_X / L_Y$, quantization with the watermark ($Z_Q$), Lagrange optimization, repositioning of the odd/even subsequences, rearrangement, watermarked signal.]

Figure 3: Block diagram of the proposed extraction process. [Blocks: watermarked signal, framing, division into odd/even subsequences ($X$, $Y$), Lp-norm of each subsequence ($L_X$, $L_Y$), ratio $Z = L_X / L_Y$, quantization for bit = 0 and bit = 1, minimum Euclidean distance decoding, watermark.]

Thus, (13) can be approximately estimated by

$$\hat{S}_i - S_i \approx \left(1 + \frac{\varepsilon}{L_X}\right) \times X_{2i} \tag{16}$$

Therefore, the expected values of (13) can be estimated as

$$E\left[\left\|\hat{S} - S\right\|^2\right] \cong E\left(\left(1 + \frac{\varepsilon}{L_X}\right)^2\right) E\left(X_{2i}^2\right) = \left[E\left(\frac{1}{L_X^2}\right) E\left(\varepsilon^2\right) + E(\varepsilon)\, E\left(\frac{2}{L_X}\right) + 1\right] E\left(X_{2i}^2\right) \tag{17}$$

If the quantization noise ($\varepsilon$) is considered to be uniformly distributed in $[-\Delta/2, \Delta/2]$, then $E(\varepsilon) = 0$ and $E(\varepsilon^2) = \Delta^2/48$. Additionally, because the mean value of the speech signal is considered to be zero, the zero-mean Laplacian distribution is used to model the speech signal, so that $E(X_{2i}) = 0$. As a result, $E(X_{2i}^2) = 2b_{X_i}^2$. To model $E(1/L_X^2)$, the absolute moment of the Laplacian distribution should be estimated using Appendix B as follows:

$$E\left(|X|^P\right) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right] \tag{18}$$

where $n = P$, $I_n = nI_{n-1}$, and $I_n = \int_{-\infty}^{0} t^n e^{-t}\,dt$. Thus, we can derive the mean and variance for the $P$th absolute moment of the Laplacian distribution as

$$L_X^P = \sum_{j=1}^{N/2} \left|X_j\right|^P \sim \mathcal{L}\left(\mu_{X^P},\ \frac{2\left(\mu_{X^{(2P)}} - \mu_{X^P}^2\right)}{N}\right) \tag{19}$$

Now, based on (1) and (19), we can compute $E(1/L_X^2) = E\left(1/\left(\sqrt[P]{L_X^P}\right)^2\right) = \mu_{X^{(2P)}}$. Therefore, the signal-to-watermark ratio (SWR) can be estimated as

$$\mathrm{SWR} = \frac{E\left[S^2\right]}{E\left[\left\|\hat{S} - S\right\|^2\right]} \cong \frac{2b_X^2 + 2b_Y^2}{\left(\left(\Delta^2/48\right) \times \mu_{X^{(2P)}} + 1\right) \times \left(2b_X^2\right)} \tag{20}$$

Because both the $X$ and $Y$ sets have been selected from neighboring samples, it can be assumed that $2b_X^2 \cong 2b_Y^2$. As a result, (20) can be expressed based on the quantization step as

$$\Delta = \sqrt{\frac{-2\,(1 + 12 \times \mathrm{SWR})}{\mathrm{SWR} \times \mu_{X^{(2P)}}}} \tag{21}$$

To model the error probability, it is assumed that the watermarked speech signal passes through an AWGN channel with zero-mean Gaussian noise $\mathcal{N}(0, \sigma_n^2)$. Therefore, (3) must be rewritten as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left|X_j + N_{X_j}\right|^P}{\sum_{j=1}^{N/2} \left|Y_j + N_{Y_j}\right|^P} \tag{22}$$

where $N_{Y_j}$ and $N_{X_j}$ correspond to the odd and even components of the AWGN, respectively. Because the term $\sum_{j=1}^{N/2} |X_j|^P$ is a known parameter, it is not possible to estimate $\hat{Z}$ using a chi-square distribution with $N$ degrees of freedom, $\chi^2(N)$. To compute the distribution of $\hat{Z}$, it should be decomposed and estimated as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2}\left(|X_j|^P + P|X_j|^{P-1}N_{X_j} + \frac{P(P-1)}{2}|X_j|^{P-2}N_{X_j}^2 + \cdots + N_{X_j}^P\right)}{\sum_{j=1}^{N/2}\left(|Y_j|^P + P|Y_j|^{P-1}N_{Y_j} + \frac{P(P-1)}{2}|Y_j|^{P-2}N_{Y_j}^2 + \cdots + N_{Y_j}^P\right)} \tag{23}$$

Equation (23) can be expressed as

$$\hat{Z} \approx \text{Original } Z + \overbrace{\gamma_1 + \gamma_2 + \gamma_3}^{\text{Noise}} \tag{24}$$

where each part of $\hat{Z}$ is estimated as follows:

$$\begin{aligned}
\text{Original } Z &= \frac{\sum_{j=1}^{N/2}|X_j|^P}{\sum_{j=1}^{N/2}|Y_j|^P},\\
\gamma_1 &= \frac{\sum_{j=1}^{N/2} P|X_j|^{P-1}N_{X_j}}{\sum_{j=1}^{N/2}|Y_j|^P},\\
\gamma_2 &= -\frac{\sum_{j=1}^{N/2}|X_j|^P}{\sum_{j=1}^{N/2}|Y_j|^P} \times \frac{\sum_{j=1}^{N/2} P|Y_j|^{P-1}N_{Y_j}}{\sum_{j=1}^{N/2}|Y_j|^P},\\
\gamma_3 &= \frac{\sum_{j=1}^{N/2} P|X_j|^{P-1}N_{X_j}}{\sum_{j=1}^{N/2}|Y_j|^P} \times \frac{\sum_{j=1}^{N/2} P|Y_j|^{P-1}N_{Y_j}}{\sum_{j=1}^{N/2} P|Y_j|^{P-1}N_{Y_j}}.
\end{aligned} \tag{25}$$

To estimate the probability of error, the noise term can be analyzed because it moves the original $Z$ into a wrong decision region. The distribution of each term of (24) can be estimated by the central limit theorem (CLT) because of the large number of samples in each block. Regardless of the type of original speech signal distribution, and because of the independence between the signal and noise samples, the mean and variance of the noise can be computed as

$$\mu_{\mathrm{Noise}} = \mu_{\gamma_1} + \mu_{\gamma_2} + \mu_{\gamma_3}, \qquad \sigma_{\mathrm{Noise}}^2 = \sigma_{\gamma_1}^2 + \sigma_{\gamma_2}^2 + \sigma_{\gamma_3}^2 \tag{26}$$

By assuming equal probabilities for the zero and one bits of the watermark data, the probability of error for a fixed quantization step ($\Delta$) can be estimated as

$$P_e = \sum_{i=1}^{\infty} \frac{1}{2} \Pr\left\{T_{(i-1)/2} < Z^P < T_{(i+1)/2}\right\} \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \Pr\left\{V_{2j+i} < \hat{Z}^P < V_{2j+i+1}\right\} \tag{27}$$

A closed-form solution for (27) is computed as

$$P_e = \sum_{i=1}^{\infty}\left(Q\left(\frac{T^P_{(i-1)/2} - \mu_Z}{\sigma_Z}\right) - Q\left(\frac{T^P_{(i+1)/2} - \mu_Z}{\sigma_Z}\right)\right) \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty}\left(Q\left(\frac{V^P_{(i+2j)/2} - \mu_{\hat{Z}^P}}{\sigma_{\hat{Z}^P}}\right) - Q\left(\frac{V^P_{(i+2j+1)/2} - \mu_{\hat{Z}^P}}{\sigma_{\hat{Z}^P}}\right)\right) \tag{28}$$

where $Q(\cdot)$ is the Gaussian Q-function (the tail probability of the standard normal distribution), defined as $Q(x) = (1/\sqrt{2\pi}) \int_{x}^{\infty} e^{-u^2/2}\,du$, $T_i = i\Delta$, $V_i = (T_{i/2} + T_{(i+1)/2})/2$, and $\mu_Z$ and $\sigma_Z$ can be computed as in (11) and (12), respectively.
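The $Q(\cdot)$ terms above are straightforward to evaluate numerically through the identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$. The following minimal sketch uses Python's standard `math.erfc`; the helper `interval_probability` is a hypothetical name that mirrors the bracketed differences of two $Q$ values in (28) for a Gaussian variable.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def interval_probability(lower, upper, mean, std):
    """Pr{lower < V < upper} for V ~ N(mean, std^2), written as a difference of Q values."""
    return q_function((lower - mean) / std) - q_function((upper - mean) / std)
```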

4. Discussion on the Experimental Results

To validate the performance of the developed watermarking technique, a simulation was performed on the TIMIT database to verify the robustness, imperceptibility, and capacity of the technique. The TIMIT database includes 630 speakers (438 males and 192 females) with a sampling frequency of 16 kHz [19]. Each speaker pronounced 10 sentences, for a total of 6300 sentences. For the experimental results, the average results of 630 speech signals with durations of 1 s to 3 s from the 630 speakers were used.

Figure 4 shows the bit error rate (BER) with respect to different $P$ for various frame lengths under a Watermark to Noise Ratio (WNR) of 40 dB. In this figure, each curve is also plotted separately so that the changes are visible. As can be observed, the frame size was negatively correlated with the BER: whenever the frame size decreased, the BER increased. Additionally, it seems that $P$ was not highly correlated with the BER for $P$ values greater than two. Only a small fluctuation can be observed in the BER when $P$ changed.

Figure 5 shows the BER with respect to different $P$ for various quantization steps. As expected, whenever the quantization step increased, the BER decreased. Furthermore, the variation of $P$ did not seriously change the BER. It must be mentioned that, because watermark detection is perfect under clean conditions, a small amount of AWGN was added to the watermarked signals for the experiments shown in Figures 4 and 5.

Figure 6 shows the variation of the signal-to-noise ratio (SNR) with respect to different $P$ values for different frame lengths. There was not a significant difference in the SNR when the frame size increased. As can be observed, whenever the frame size increased, the energy level between the two sets $L_X$ and $L_Y$ increased. Consequently, the ratio between them increased, which caused a lower SNR. Additionally, it seems that changing $P$ was not highly correlated with the SNR for different frame lengths.

Figure 4: (a) BER versus Lp-norms for different frame lengths under WNR = 40 dB; (b–f) each curve separately. [Plots: bit error rate versus $P$ of the Lp-norm, $P$ = 1 to 7; curves for frame lengths of 40, 100, 200, 300, and 400 samples.]

Figure 7 illustrates the SNR with respect to different $P$ for various quantization steps. As observed, $P$ did not highly affect the SNR. However, the quantization step highly affected the SNR: as the quantization step increased, the SNR decreased.


Figure 5: BER versus Lp-norms for different quantization steps under WNR = 40 dB. [Plot: bit error rate versus $P$ of the Lp-norm, $P$ = 1 to 7; curves for $\Delta$ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 6: SNR versus Lp-norms for different frame lengths. [Plot: SNR (dB) versus $P$ of the Lp-norm, $P$ = 1 to 7; curves for frame lengths of 40, 100, 200, 300, and 400 samples.]

To compute the payload of the proposed watermark, the capacity of a memoryless binary symmetric channel (BSC) ($C_{\mathrm{BSC}}$), defined as

$$C_{\mathrm{BSC}} = R \times \left[1 + H(P_e)\right] \tag{29}$$

where

$$H(P_e) = P_e \times \log_2(P_e) + (1 - P_e) \times \log_2(1 - P_e) \tag{30}$$

was applied to estimate the capacity of the channel with bitrate ($R$) for error-free watermark transmission [20].
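Equations (29) and (30) can be evaluated with a few lines of Python. This is a minimal sketch; the function name `bsc_capacity` is hypothetical, and the convention $0 \cdot \log_2 0 = 0$ is an assumption added so that $P_e = 0$ and $P_e = 1$ are handled. Note that $H(P_e)$ as written in (30) is non-positive, so the bracket in (29) lies between 0 and 1.

```python
import math


def bsc_capacity(rate_bps, p_e):
    """Capacity R * [1 + H(P_e)] with H(P_e) as defined in (30)."""

    def p_log2_p(p):
        # Convention 0 * log2(0) = 0 (an assumption for the boundary cases).
        return p * math.log2(p) if p > 0.0 else 0.0

    h = p_log2_p(p_e) + p_log2_p(1.0 - p_e)
    return rate_bps * (1.0 + h)
```

With a bit error probability of 0.5 the capacity is zero, and with error-free detection it equals the bitrate $R$.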

Because the sampling rate of the TIMIT database was 16 kHz, $R$ was assumed to be 64 kbps (8 kHz for speech bandwidth × 8 bits per sample = 64 kbps) for a telephony channel, and $P_e$ was assumed to be equal to the BER in the watermark detection process. Figure 8 shows the BSC capacity for different WNRs for various quantization steps. As observed, the capacity increased whenever the WNR increased. This is because the watermark was extracted with a minimum BER when the WNR increased. Moreover, it can be inferred that the BSC capacity increased as the quantization step increased because the watermark was embedded with higher intensity when the quantization step increased. As observed, the BSC capacity for smaller quantization steps ($\Delta \le 0.25$) was approximately zero under a highly noisy channel.

Figure 7: SNR versus Lp-norms for various quantization steps. [Plot: SNR (dB) versus $P$ of the Lp-norm, $P$ = 1 to 7; curves for $\Delta$ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 9 shows the variation of the BSC capacity with respect to different WNRs for different frame lengths. As observed, under serious noise, the frame size was not a significant factor for the BSC capacity. However, the frame size became important as the WNR increased. Thus, for a large WNR, whenever the frame size increased, the BER in the watermark detection process decreased, which improved the BSC capacity.

To demonstrate the efficiency and performance of the proposed speech watermarking technique, its robustness, capacity, and inaudibility must be compared with those of other state-of-the-art speech watermarking techniques.

Table 1 describes the benchmark for simulating the results for the robustness test. Many of these attacks are based on the StirMark Benchmark for Audio (SMBA) [24].


Table 1: Benchmark for speech watermarking.

| Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label |
|---|---|---|---|---|---|
| Additive noise | AddBrumm | Adds a buzz or low-frequency sinus tone to the watermarked signal to simulate the impact of a power supply | ⟨STRENGTH⟩, ⟨FREQUENCY⟩ | 2500, 55 to 3000, 75 | A |
| Additive noise | AddDynNoise | Adds dynamic white noise to the watermarked signal | ⟨STRENGTH⟩ | 20 to 40 | B |
| Additive noise | AddFFTNoise | Adds white noise to the watermarked signal in the frequency domain | ⟨FFTSIZE⟩, ⟨STRENGTH⟩ | 256, 1000 to 1024, 3000 | C |
| Additive noise | AddNoise | White Gaussian noise contaminates the watermarked signal to simulate ambient distortion | ⟨STRENGTH⟩ | 35 dB level to 5 dB | D |
| Additive noise | AddSinus | Adds a sinus signal to the watermarked signal | ⟨AMPLITUDE⟩, ⟨FREQUENCY⟩ | 120, 3000 to 150, 3500 | E |
| Conversion | Resampling | The sampling rate of the watermarked signal is converted to ⟨SAMPLERATE1⟩ and then reconverted to ⟨SAMPLERATE2⟩ | ⟨SAMPLERATE1⟩, ⟨SAMPLERATE2⟩ | 4 kHz, 16 kHz to 8 kHz, 16 kHz | F |
| Conversion | Requantization | The samples of the watermarked signal are quantized to ⟨QUANTIZATION1⟩ and then requantized to ⟨QUANTIZATION2⟩ | ⟨QUANTIZATION1⟩, ⟨QUANTIZATION2⟩ | 8 bits and 16 bits | G |
| Conversion | Invert | Inverts all samples in the watermarked signal, like a 180 degree phase shift | No parameter required | None | H |
| Ambience | Echo | An echo with a delay ⟨DELAY⟩ and decay ⟨DECAY⟩ is added to the watermarked signal | ⟨DELAY⟩, ⟨DECAY⟩ | 20 ms and 10 to 100 ms and 50 | I |
| Sample permutations | Cut samples | ⟨REMOVENUMBER⟩ samples are removed from the watermarked signal in every ⟨REMOVEDIST⟩ period | ⟨REMOVEDIST⟩, ⟨REMOVENUMBER⟩ | 1 and 1000 to 7 and 1000 | J |
| Sample permutations | Copy samples | Some of the samples of the watermarked signal are copied between the sample values | ⟨PERIOD⟩, ⟨COPYDIST⟩, ⟨COPYCOUNT⟩ | 1000, 100, 30 to 1000, 200, 60 | K |
| Sample permutations | LSB Zero | Sets the least significant bit of every sample of the watermarked signal to zero | No parameter required | None | L |
| Sample permutations | Smooth | The new sample value depends on the samples before and after the modifying point | No parameter required | None | M |
| Sample permutations | Stat1 | Averages each sample with its next neighbors | No parameter required | None | N |
| Dynamics | Amplify | The amplitude of the watermarked signal is increased up to ⟨FACTOR1⟩ and decreased down to ⟨FACTOR2⟩, respectively | ⟨FACTOR1⟩, ⟨FACTOR2⟩ | 150 and 75, 200 and 50 | O |
| Dynamics | Denoising | The watermarked signal is denoised by ⟨FACTOR⟩ | ⟨FACTOR⟩ | −80 dB to −60 dB | P |
| Filters | Low Pass Filter (LPF) | The watermarked signal is filtered by an elliptic LPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 5 kHz to 4 kHz | Q |
| Filters | Band Pass Filter (BPF) | The watermarked signal is filtered by an elliptic filter with a bandwidth from ⟨FREQUENCY1⟩ to ⟨FREQUENCY2⟩ to simulate a narrowband telephony channel | ⟨FREQUENCY1⟩, ⟨FREQUENCY2⟩ | 500 Hz & 4000 Hz to 300 Hz & 3400 Hz | R |
| Filters | High Pass Filter (HPF) | The watermarked signal is filtered by an elliptic HPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 500 Hz to 800 Hz | S |
| Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T |
| Time stretch and pitch shift | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U |
| Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then decoded to the original one | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V |
| Compression | MP3 compression | The watermarked signal is compressed by MP3 at different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W |
| Compression | G711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X |

Figure 8: Variation of the BSC capacity with respect to different WNRs for different quantization steps. [Plot: $C_{\mathrm{BSC}}$ (×10⁴) versus WNR (dB), 5 to 55 dB; curves for $\Delta$ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Table 2 compares the BER with state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique. It can be measured based on the mean opinion score (MOS) (as proposed by the International Telecommunications Union (ITU-T) [23]) and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal, as presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths. [Plot: $C_{\mathrm{BSC}}$ (×10⁴) versus WNR (dB), 5 to 55 dB; curves for frame lengths of 40, 100, 200, 300, and 400 samples.]

An objective evaluation technique, such as the SWR or SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_n S^2}{\sum_n \left(\hat{S} - S\right)^2} \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
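Equation (31) translates directly into code. The following is a minimal sketch operating on two equal-length sequences of samples; the function name `snr_db` and the choice to return infinity for identical signals are assumptions made for illustration.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB between an original signal S and a watermarked signal S_hat, as in (31)."""
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0.0:
        return math.inf  # identical signals: no embedding distortion
    return 10.0 * math.log10(signal_energy / noise_energy)
```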

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, the proposed speech watermarking technique appears to outperform the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those for formant tuning [21] and Analysis-by-Synthesis [22].


Table 2: Comparison with the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4, each entry is bounded between two values that relate a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, the imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. A false positive occurs when an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, a false negative occurs when a watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

$$P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{\mathrm{FN}}^{\,i} \left(1 - P_{\mathrm{FN}}\right)^{N-i}}_{\mathrm{FNP}} + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{\mathrm{FP}}^{\,i} \left(1 - P_{\mathrm{FP}}\right)^{N-i}}_{\mathrm{FPP}} \tag{32}$$

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, $P_{\mathrm{FP}}$ is the probability of a false positive, which is assumed to be 0.5, $P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil \tag{33}$$

Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

Figure 10: FPP with respect to the total number of watermark bits for different BERs. [Plot: false positive probability (×10⁻⁴) versus number of watermark bits, 50 to 400; curves for BER = 0.19, BER = 0.20 (shifted by adding 0.000015), and BER = 0.21 (shifted by adding 0.00009).]

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.
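The FPP term of (32), combined with the threshold of (33), can be sketched in a few lines of Python. This is a minimal illustration under the stated assumption $P_{\mathrm{FP}} = 0.5$; only the FPP sum is shown, and the function names are hypothetical. It relies on `math.comb`, which is available in Python 3.8 and later.

```python
import math


def threshold(n_bits, ber):
    """Threshold T = ceil((1 - BER) * N) from (33)."""
    return math.ceil((1.0 - ber) * n_bits)


def false_positive_probability(n_bits, ber, p_fp=0.5):
    """FPP term of (32): sum over i = T..N of C(N, i) * p^i * (1 - p)^(N - i)."""
    t = threshold(n_bits, ber)
    return sum(
        math.comb(n_bits, i) * p_fp ** i * (1.0 - p_fp) ** (n_bits - i)
        for i in range(t, n_bits + 1)
    )
```

Because the sum covers only the upper tail of a binomial distribution with success probability 0.5, it shrinks rapidly as the number of watermark bits grows.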

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.

Figure 11: FNP with respect to the total number of watermark bits for different BERs. [Plot: false negative probability versus number of watermark bits, 50 to 400; curves for BER = 0.19, BER = 0.20 (shifted by adding 0.015), and BER = 0.21 (shifted by adding 0.025).]

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only showed that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$\begin{aligned}
f(x, y) ={}& f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y)\\
&+ \frac{1}{2}\left\{f''_{xx}(\theta)(x - \theta_x)^2 + 2f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2\right\}\\
&+ \text{remainder}.
\end{aligned} \tag{A1}$$


Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{f''_{xx}(\theta)\operatorname{var}(X) + 2f''_{xy}(\theta)\operatorname{cov}(X, Y) + f''_{yy}(\theta)\operatorname{var}(Y)\right\} + O(n^{-1}) \tag{A2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^3$. Then, the mean and variance of the ratio between $R$ and $S$ can be estimated, respectively, as follows:

$$\begin{aligned}
E\left(\frac{R}{S}\right) &\equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\operatorname{cov}(R, S)}{E(S)^2} + \frac{\operatorname{var}(S)\,E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right),\\
\operatorname{var}\left(\frac{R}{S}\right) &\approx \frac{1}{E_S^2}\operatorname{var}(R) + 2\,\frac{-E_R}{E_S^3}\operatorname{cov}(R, S) + \frac{E_R^2}{E_S^4}\operatorname{var}(S)\\
&= \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - 2\,\frac{\operatorname{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right]\\
&= \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2}.
\end{aligned} \tag{A3}$$

B. Computation of the Absolute Moment of the Laplacian Distribution

The moment of the Laplacian distribution is expressed as follows:

$$E\left(|X|^n\right) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-((X - \mu)/b)}\,dx = \frac{1}{2b}\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx \tag{B1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$E\left(|X|^n\right) = \begin{cases}
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx, & \text{if } X \ge \mu,\\[2ex]
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu - X)/b)}\,dx, & \text{if } X < \mu.
\end{cases} \tag{B2}$$

For the first case, when $X \ge \mu$:

$$\begin{aligned}
E\left(|X|^n\right) &= \frac{1}{2b}\left[\int_{-\infty}^{0} (-X)^n \cdot e^{-((X - \mu)/b)}\,dx + \int_{0}^{\infty} X^n \cdot e^{-((X - \mu)/b)}\,dx\right]\\
&= \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\,dx}_{I}\right]
\end{aligned} \tag{B3}$$

If $t = -X/b$, then $I$ can be expressed as

$$I = b^{n+1}\int_{0}^{\infty} t^n e^{-t}\,dt = b^{n+1} \cdot n! \tag{B4}$$

$I_n$ can also be expressed as

$$\begin{aligned}
I_n &= \int_{-\infty}^{0} X^n e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^n e^{-t}\,dt\\
&= \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + nI_{n-1}
\end{aligned} \tag{B5}$$

Substituting (B4) and (B5) into (B3), the absolute moment of the Laplacian distribution can be computed based on

$$E\left(|X|^n\right) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right] \tag{B6}$$
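As a sanity check on absolute moments of this kind, the zero-mean case has a well-known closed form: for the density $(1/2b)\,e^{-|x|/b}$, $E(|X|^n) = n!\,b^n$. The sketch below compares that closed form with a simple midpoint-rule integration. It covers only the zero-mean case with scale parameter $b$ as in (B1); the function names, the integration range, and the step count are assumptions chosen for illustration.

```python
import math


def laplace_abs_moment(n, b):
    """Closed form E|X|^n = n! * b^n for a zero-mean Laplacian with scale b."""
    return math.factorial(n) * b ** n


def laplace_abs_moment_numeric(n, b, upper=60.0, steps=200_000):
    """Midpoint-rule estimate of the integral of |x|^n * exp(-|x|/b) / (2b) over the real line."""
    # By symmetry, the integral equals (1/b) times the integral of x^n * exp(-x/b) over x >= 0.
    h = upper * b / steps  # integrate up to x = upper * b, where the tail is negligible
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** n * math.exp(-x / b)
    return total * h / b
```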

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, “An overview of digital speech watermarking,” International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, “A DWT-based rational dither modulation scheme for effective blind audio watermarking,” Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, “Semi-fragile digital speech watermarking for online speaker recognition,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, “Hyperbolic RDM for nonlinear valumetric distortions,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, “Difference angle quantization index modulation scheme for image watermarking,” Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.

Security and Communication Networks 13

[8] X. Zhu and S. Peng, “A novel quantization watermarking scheme by modulating the normalized correlation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, “Blind image watermarking using a sample projection approach,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, “A logarithmic quantization index modulation for perceptually better data hiding,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, “A novel gain invariant quantization-based watermarking approach,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker verification security improvement by means of speech watermarking,” Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker identification security improvement by means of speech watermarking,” Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, “Robust digital speech watermarking for online speaker recognition,” Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, “Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition,” Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, “Speaker frame selection for digital speech watermarking,” National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, “Speech probability distribution,” IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, “Robust audio and speech watermarking using Gaussian and Laplacian modeling,” Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, “A general formula for channel capacity,” IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, “Speech watermarking method based on formant tuning,” IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, “Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization,” Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal et al., “StirMark benchmark: audio watermarking attacks,” in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, “An audio watermarking scheme using singular value decomposition and dither-modulation quantization,” Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.


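Figure 1: Formation of two odd and even subsets from the original speech signal. The frame samples $S_1, S_2, \ldots, S_N$ are split into an even set $X_1, X_2, \ldots, X_{N/2}$ (taken from $X_{(2 \times i)}$) and an odd set $Y_1, Y_2, \ldots, Y_{N/2}$ (taken from $Y_{(2 \times i - 1)}$).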

where $\Delta$ represents the quantization step, $W_i$ is the watermark bit, and $Z^{Q}$ is the modified ratio of the Lp-norms between X and Y. Choosing a large quantization step increases the robustness but reduces imperceptibility, and vice versa.

(e) Apply the Lagrange method to optimize the values of $X^{Q}_{2i}$.

(f) Reposition the even and odd subsequences based on $X^{Q}_{2i}$ and Y, respectively.

(g) Rearrange the watermarked speech signal based on the modified frames ($S_i$).

Figure 2 shows the block diagram of the proposed embedding process.

Extraction Process

(a) Segment the input watermarked speech signal ($\hat{S}$) into different frames ($S_i$) of size N.

(b) Form two subsequences $\hat{X}$ and $\hat{Y}$, each of length N/2, based on the even and odd indices of $S_i$, respectively.

(c) Compute the Lp-norms $L_X$ and $L_Y$ of the $\hat{X}$ and $\hat{Y}$ subsequences, respectively, based on (1) and (2).

(d) Extract the kth binary watermark bit from the kth frame of the watermarked speech signal by selecting the minimum Euclidean distance (nearest quantization step) from the ratio $Z_k = L_X / L_Y$ as follows:

$$\hat{W}_k = \min\left(\sqrt{Z_k^{2} + Q_0(Z_k)^{2}},\ \sqrt{Z_k^{2} + Q_1(Z_k)^{2}}\right) \tag{9}$$

where $Q_{b_k}$ is the quantization function associated with the watermark bit $b_k \in \{0, 1\}$.

Figure 3 shows the block diagram of the proposed extraction process.
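The embedding and extraction steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `quantize` is a standard dither-QIM lattice (offset by Δ/2 for bit 1) standing in for the embedding quantizer; the decision rule takes the distance to the nearest quantization point, |Z − Q_b(Z)|, which is one reading of "nearest quantization step"; and the embedder simply scales the even-indexed samples by λ = Z^Q / Z, as suggested by (15). All function names are hypothetical.

```python
import math
import random


def lp_norm(values, p):
    """Lp-norm of a sequence: (sum |v|^p)^(1/p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def quantize(z, bit, delta):
    """Assumed dither-QIM quantizer: the lattice for bit 1 is shifted by delta/2."""
    offset = bit * delta / 2.0
    return delta * round((z - offset) / delta) + offset


def norm_ratio(frame, p):
    """Z = L_X / L_Y, with X the even and Y the odd samples (1-based indexing)."""
    y = frame[0::2]  # S1, S3, ... (odd 1-based indices)
    x = frame[1::2]  # S2, S4, ... (even 1-based indices)
    return lp_norm(x, p) / lp_norm(y, p)


def embed_bit(frame, bit, delta, p):
    """Scale the even samples by lambda = Z_Q / Z so the ratio lands on the lattice."""
    z = norm_ratio(frame, p)
    lam = quantize(z, bit, delta) / z
    out = list(frame)
    out[1::2] = [lam * v for v in frame[1::2]]
    return out


def extract_bit(frame, delta, p):
    """Pick the bit whose quantizer output is closest to the observed ratio."""
    z = norm_ratio(frame, p)
    d0 = abs(z - quantize(z, 0, delta))
    d1 = abs(z - quantize(z, 1, delta))
    return 0 if d0 <= d1 else 1
```

Because both Lp-norms scale by the same factor, multiplying a watermarked frame by any positive gain leaves Z, and hence the extracted bit, unchanged.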

3. Statistical Analysis of the Proposed Technique

Generally, the Laplacian distribution is the best approach for modeling speech signals within the frame range of 5–50 ms [17, 18]. The Laplacian distribution is expressed as

$$f(x) = \frac{b}{2}\, e^{-b|x-\mu|}, \qquad b = \frac{L}{\sum_{i=1}^{L} |x_i - \mu|} \tag{10}$$

where L is the sample size and μ is the mean of the random variables. If the subsequences X and Y are considered as independent identically distributed (i.i.d.) variables, then each of them can be assumed to follow a Laplacian distribution, $X = \mathcal{L}(\mu_X, 2b_X^2)$ and $Y = \mathcal{L}(\mu_Y, 2b_Y^2)$, respectively. Based on (3), the ratio (Z) between X and Y should be computed. However, the ratio between two Laplacian distributions cannot be computed exactly, because the mean and variance of the ratio are not actually finite in either the Gaussian or the Laplacian case. The problem arises because the denominator has nonzero density in the neighborhood of zero. If the denominator is bounded away from zero (in which case it is, strictly speaking, no longer the ratio of two Laplacian or two normal distributions), then a Taylor expansion should converge and can be used to estimate the ratio between two Laplacian distributions. According to Appendix A, the parameters of the ratio can be derived as follows:

$$\mu_Z = \frac{\mu_X}{\mu_Y}\left(1 + \frac{2b_Y^2}{\mu_Y^2}\right) \tag{11}$$

$$\sigma_Z^2 = \frac{\mu_X^2}{\mu_Y^2}\left(\frac{2b_Y^2}{\mu_Y^2} - \frac{4b_Y^4}{\mu_Y^4}\right) + \frac{2b_X^2}{\mu_Y^2} \tag{12}$$
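As a quick numerical check, the two approximations can be evaluated directly. The sketch below simply transcribes (11) and (12); the function names and the sample parameter values are hypothetical, and the result is only meaningful when the denominator mean is well away from zero, as the text requires.

```python
def ratio_mean(mu_x, mu_y, b_y):
    """Approximate mean of Z = X / Y, transcribed from (11)."""
    return (mu_x / mu_y) * (1.0 + 2.0 * b_y ** 2 / mu_y ** 2)


def ratio_variance(mu_x, mu_y, b_x, b_y):
    """Approximate variance of Z = X / Y, transcribed from (12)."""
    r = 2.0 * b_y ** 2 / mu_y ** 2  # relative variance of the denominator
    return (mu_x ** 2 / mu_y ** 2) * (r - r ** 2) + 2.0 * b_x ** 2 / mu_y ** 2


# Example with assumed parameters: mu_X = 2, mu_Y = 4, b_X = b_Y = 0.5
mean_z = ratio_mean(2.0, 4.0, 0.5)
var_z = ratio_variance(2.0, 4.0, 0.5, 0.5)
```

Note that the term `r ** 2` equals $4b_Y^4/\mu_Y^4$, so the code matches (12) term by term.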

To estimate the embedding distortion, the quantization noise ($\Delta$) should be considered between the original and watermarked speech signals as follows:

$$\hat{S}_i - S_i = (\hat{X}_{2i} - \hat{Y}_{2i-1}) - (X_{2i} - Y_{2i-1}) \tag{13}$$

As in (4) to (6), $\hat{Y}_{2i-1} = Y_{2i-1}$; thus (13) can be expressed as

$$\hat{S}_i - S_i = \lambda_{\mathrm{opt}} \times X_{2i} - X_{2i} = X_{2i} \times (\lambda_{\mathrm{opt}} - 1) \tag{14}$$

If $Z^{Q}_i = (L_X + \varepsilon)/L_Y$, then $\lambda_{\mathrm{opt}}$ can be expressed as

$$\lambda_{\mathrm{opt}} = \left(\frac{L_X + \varepsilon}{L_Y}\right) \times \frac{L_Y}{L_X} = \left(1 + \frac{\varepsilon}{L_X}\right) \tag{15}$$

Figure 2: Block diagram of the proposed embedding process. Blocks: input signal, framing, division into odd/even subsequences (X, Y), Lp-norm of each subsequence ($L_X$, $L_Y$), ratio $Z = L_X / L_Y$, quantization with the watermark ($Z^{Q}$), Lagrange optimization, repositioning of the odd/even subsequences, rearrangement, watermarked signal.

Figure 3: Block diagram of the proposed extraction process. Blocks: watermarked signal, framing, division into odd/even subsequences (X, Y), Lp-norm of each subsequence ($L_X$, $L_Y$), ratio $Z = L_X / L_Y$, quantization for bit = 0 and for bit = 1, minimum Euclidean distance decoding, watermark.

Thus (13) can be approximately estimated by

$$\hat{S}_i - S_i \approx \left(1 + \frac{\varepsilon}{L_X}\right) \times X_{2i} \tag{16}$$

Therefore the expected value of (13) can be estimated as

$$E\left[\|\hat{S} - S\|^{2}\right] \cong E\left(\left(1 + \frac{\varepsilon}{L_X}\right)^{2}\right) E\left(X_{2i}^{2}\right) = \left[E\left(\frac{1}{L_X^{2}}\right) E\left(\varepsilon^{2}\right) + E(\varepsilon)\, E\left(\frac{2}{L_X}\right) + 1\right] E\left(X_{2i}^{2}\right) \tag{17}$$

If the quantization noise ($\varepsilon$) is considered to be uniformly distributed in $[-\Delta/2, \Delta/2]$, then $E(\varepsilon) = 0$ and $E(\varepsilon^{2}) = \Delta^{2}/48$. Additionally, as the mean value of the speech signal is considered to be zero, the zero-mean Laplacian distribution is used to model the speech signal, so that $E(X_{2i}) = 0$. As a result, $E(X_{2i}^{2}) = 2b_{X_i}^{2}$. To model $E(1/L_X^{2})$, the absolute moment of the Laplacian distribution should be estimated using Appendix B as follows:

$$E(|X|^{P}) = \left(\frac{e^{\mu/b}\, b^{n}}{2}\right)\left[(-1)^{n}\cdot I_n + n!\right] \tag{18}$$

where $I_n = n I_{n-1}$ and $I_n = \int_{-\infty}^{0} t^{n} e^{-t}\,dt$. Thus we can derive the mean and variance for the Pth absolute moment of the Laplacian distribution as

$$L_X^{P} = \sum_{j=1}^{N/2} |X_j|^{P} \sim \mathcal{L}\left(\mu_{X^{P}},\ \frac{2\left(\mu_{X^{(2P)}} - \mu_{X^{P}}^{2}\right)}{N}\right) \tag{19}$$

Now, based on (1) and (19), we can compute $E(1/L_X^{2}) = E\left(1/(\sqrt[P]{L_X^{P}})^{2}\right) = \mu_{X^{(2P)}}$. Therefore the signal-to-watermark ratio (SWR) can be estimated as

$$\mathrm{SWR} = \frac{E[S^{2}]}{E\left[\|\hat{S} - S\|^{2}\right]} \cong \frac{2b_X^{2} + 2b_Y^{2}}{\left((\Delta^{2}/48) \times \mu_{X^{(2P)}} + 1\right) \times (2b_X^{2})} \tag{20}$$

Because both the X and Y sets have been selected from neighboring samples, it can be assumed that $2b_X^{2} \cong 2b_Y^{2}$. As a result, (20) can be expressed in terms of the quantization step as

$$\Delta = \sqrt{\frac{-2\,(1 + 12 \times \mathrm{SWR})}{\mathrm{SWR} \times \mu_{X^{(2P)}}}} \tag{21}$$

To model the error probability, it is assumed that the watermarked speech signal passes through an AWGN channel with zero-mean Gaussian noise $\mathcal{N}(0, \sigma_n^{2})$. Therefore (3) must be rewritten as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left|X_j + N_{X_j}\right|^{P}}{\sum_{j=1}^{N/2} \left|Y_j + N_{Y_j}\right|^{P}} \tag{22}$$

where $N_{Y_j}$ and $N_{X_j}$ correspond to the odd and even components of the AWGN, respectively. Because the term $\sum_{j=1}^{N/2} |X_j|^{P}$ is a known parameter, it is not possible to estimate $\hat{Z}$ using a chi-square distribution with N degrees of freedom, $\chi^{2}(N)$. To compute the distribution of $\hat{Z}$, it should be decomposed and estimated as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left(|X_j|^{P} + P\,|X_j|^{P-1} N_{X_j} + \frac{P(P-1)}{2}\,|X_j|^{P-2} N_{X_j}^{2} + \cdots + N_{X_j}^{P}\right)}{\sum_{j=1}^{N/2} \left(|Y_j|^{P} + P\,|Y_j|^{P-1} N_{Y_j} + \frac{P(P-1)}{2}\,|Y_j|^{P-2} N_{Y_j}^{2} + \cdots + N_{Y_j}^{P}\right)} \tag{23}$$

Equation (23) can be expressed as

$$\hat{Z} \approx \text{Original } Z + \overbrace{\gamma_1 + \gamma_2 + \gamma_3}^{\text{Noise}} \tag{24}$$

where each part of $\hat{Z}$ is estimated as follows:

$$\begin{aligned}
\text{Original } Z &= \frac{\sum_{j=1}^{N/2} |X_j|^{P}}{\sum_{j=1}^{N/2} |Y_j|^{P}}, \\
\gamma_1 &= \frac{\sum_{j=1}^{N/2} P\,|X_j|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} |Y_j|^{P}}, \\
\gamma_2 &= -\frac{\sum_{j=1}^{N/2} |X_j|^{P}}{\sum_{j=1}^{N/2} |Y_j|^{P}} \times \frac{\sum_{j=1}^{N/2} P\,|Y_j|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} |Y_j|^{P}}, \\
\gamma_3 &= \frac{\sum_{j=1}^{N/2} P\,|X_j|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} |Y_j|^{P}} \times \frac{\sum_{j=1}^{N/2} P\,|Y_j|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} P\,|Y_j|^{P-1} N_{Y_j}}
\end{aligned} \tag{25}$$

To estimate the probability of error, the noise term must be analyzed, because it can push the original Z into a wrong decision region. The distribution of each term of (24) can be estimated by the central limit theorem (CLT) because of the large number of samples in each block. Regardless of the distribution of the original speech signal, and because of the independence between the signal and noise samples, the mean and variance of the noise can be computed as

$$\mu_{\text{Noise}} = \mu_{\gamma_1} + \mu_{\gamma_2} + \mu_{\gamma_3}, \qquad \sigma_{\text{Noise}}^{2} = \sigma_{\gamma_1}^{2} + \sigma_{\gamma_2}^{2} + \sigma_{\gamma_3}^{2} \tag{26}$$

By assuming equal probabilities for the zero and one bits of the watermark data, the probability of error for a fixed quantization step ($\Delta$) can be estimated as

$$P_e = \sum_{i=1}^{\infty} \frac{1}{2} \Pr\left\{T_{(i-1)/2} < Z^{P} < T_{(i+1)/2}\right\} \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \Pr\left\{V_{2j+i} < \hat{Z}^{P} < V_{2j+i+1}\right\} \tag{27}$$

A closed-form solution for (27) is computed as

$$P_e = \sum_{i=1}^{\infty} \left(Q\left(\frac{T^{P}_{(i-1)/2} - \mu_Z}{\sigma_Z}\right) - Q\left(\frac{T^{P}_{(i+1)/2} - \mu_Z}{\sigma_Z}\right)\right) \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \left(Q\left(\frac{V^{P}_{(i+2j)/2} - \mu_{\hat{Z}^{P}}}{\sigma_{\hat{Z}^{P}}}\right) - Q\left(\frac{V^{P}_{(i+2j+1)/2} - \mu_{\hat{Z}^{P}}}{\sigma_{\hat{Z}^{P}}}\right)\right) \tag{28}$$

where $Q(\cdot)$ is the Gaussian tail (Q) function, defined as $Q(x) = (1/\sqrt{2\pi}) \int_{x}^{\infty} e^{-u^{2}/2}\,du$, $T_i = i\Delta$, $V_i = (T_{i/2} + T_{(i+1)/2})/2$, and $\mu_Z$ and $\sigma_Z$ can be computed as in (11) and (12), respectively.
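Each factor in (28) is a difference of two Q-function values, that is, the probability that a Gaussian variable falls between two thresholds. A minimal sketch of this building block is given below; it uses the standard identity Q(x) = erfc(x/√2)/2, and the helper names are hypothetical rather than taken from the paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def interval_probability(lower, upper, mean, std):
    """P(lower < V < upper) for V ~ N(mean, std^2), written with Q as in (28)."""
    return q_function((lower - mean) / std) - q_function((upper - mean) / std)
```

For example, `interval_probability(-1.0, 1.0, 0.0, 1.0)` gives the familiar one-sigma probability of about 0.6827.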

4. Discussion on the Experimental Results

To validate the performance of the developed watermarking technique, a simulation was performed on the TIMIT database to verify the robustness, imperceptibility, and capacity of the technique. The TIMIT database includes 630 speakers (438 males and 192 females) recorded at a sampling frequency of 16 kHz [19]. Each speaker pronounced 10 sentences, giving 6300 sentences in total. For the experimental results, the average results of 630 speech signals with durations of 1 s to 3 s from the 630 speakers were used.

Figure 4 shows the bit error rate (BER) with respect to different P for various frame lengths under a watermark-to-noise ratio (WNR) of 40 dB. In this figure, each curve is also plotted separately to make the changes visible. As can be observed, the frame size was negatively correlated with the BER: whenever the frame size decreased, the BER increased. Additionally, it seems that P was not highly correlated with the BER for P values greater than two. Only a small fluctuation can be observed in the BER when P changed.

Figure 5 shows the BER with respect to different P for various quantization steps. As expected, whenever the quantization step increased, the BER decreased. Furthermore, the variation of P did not seriously change the BER. It must be mentioned that, because watermark detection is perfect under clean conditions, a small amount of AWGN was added to the watermarked signals for the experiments shown in Figures 4 and 5.

Figure 6 shows the variation of the signal-to-noise ratio (SNR) with respect to different P values for different frame lengths. There was no significant difference in the SNR when the frame size increased. As can be observed, whenever the frame size increased, the energy level between the two sets of $L_X$ and $L_Y$ increased. Consequently, the ratio between them increased, which caused a lower SNR. Additionally, it seems that changing P was not highly correlated with the SNR for different frame lengths.

Figure 4: (a) BER versus Lp-norms for different frame lengths under WNR = 40 dB; (b–f) each curve separately. Axes: bit error rate versus Lp-norm (1 to 7). Legend: frame sizes of 40, 100, 200, 300, and 400 samples.

Figure 7 illustrates different SNRs with respect to different P for various quantization steps. As observed, P did not highly affect the SNR. However, the quantization step highly affected the SNR: as the quantization step increased, the SNR decreased.

Figure 5: BER versus Lp-norms for different quantization steps under WNR = 40 dB. Axes: bit error rate versus Lp-norm (1 to 7). Legend: Δ = 0.1, 0.25, 0.5, 0.75, and 0.95.

Figure 6: SNR versus Lp-norms for different frame lengths. Axes: SNR (dB) versus Lp-norm (1 to 7). Legend: frame sizes of 40, 100, 200, 300, and 400.

To compute the payload of the proposed watermark, a memoryless binary symmetric channel (BSC) capacity ($C_{\mathrm{BSC}}$), defined as

$$C_{\mathrm{BSC}} = R \times [1 + H(P_e)] \tag{29}$$

where

$$H(P_e) = P_e \times \log_2(P_e) + (1 - P_e) \times \log_2(1 - P_e) \tag{30}$$

was applied to estimate the capacity of the channel with bitrate (R) for error-free watermark transmission [20].
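The capacity formula can be evaluated with a short helper. The sketch below transcribes (29) and (30); note that H as defined in (30) is the negative of the usual binary entropy, which is why it is added rather than subtracted. The function names are hypothetical, and the convention 0 · log2(0) = 0 at the endpoints is an assumption made so the function is defined for P_e = 0 and P_e = 1.

```python
import math


def neg_binary_entropy(pe):
    """H(Pe) as written in (30): Pe*log2(Pe) + (1-Pe)*log2(1-Pe), always <= 0."""
    if pe <= 0.0 or pe >= 1.0:
        return 0.0  # assumed convention: 0 * log2(0) = 0
    return pe * math.log2(pe) + (1.0 - pe) * math.log2(1.0 - pe)


def bsc_capacity(rate_bps, pe):
    """C_BSC = R * [1 + H(Pe)], as in (29)."""
    return rate_bps * (1.0 + neg_binary_entropy(pe))
```

With R = 64000 bps, an error-free channel keeps the full 64000 bps, whereas a BER of 0.5 drives the capacity to zero.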

Because the sampling rate of TIMIT was 16 kHz, R was assumed to be 64 kbps (8 kHz for speech bandwidth × 8 bits per sample = 64 kbps) for a telephony channel, and $P_e$ was assumed to be equal to the BER in the watermark detection process. Figure 8 shows the BSC capacity for different WNRs and various quantization steps. As observed, the capacity increased whenever the WNR increased, because the watermark was extracted with a lower BER at a higher WNR. Moreover, it can be inferred that the BSC capacity increased as the quantization step increased, because the watermark was embedded with higher intensity. As observed, the BSC capacity for small quantization steps (Δ ≤ 0.25) was approximately zero under a highly noisy channel.

Figure 7: SNR versus Lp-norms for various quantization steps. Axes: SNR (dB) versus Lp-norm (1 to 7). Legend: Δ = 0.1, 0.25, 0.5, 0.75, and 0.95.

Figure 9 shows the variation of the BSC capacity with respect to different WNRs for different frame lengths. As observed, under serious noise the frame size was not a significant factor for the BSC capacity. Despite this, the frame size became important as the WNR increased. Thus, for a large WNR, whenever the frame size increased, the BER in the watermark detection process decreased, which improved the BSC capacity.

To demonstrate the efficiency and performance of the proposed speech watermarking technique, its robustness, capacity, and inaudibility must be compared with those of other state-of-the-art speech watermarking techniques.

Table 1 describes the benchmark used to simulate the results for the robustness test. Many of these attacks are based on the StirMark Benchmark for Audio (SMBA) [24].

Table 1: Benchmark for speech watermarking.

| Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label |
|---|---|---|---|---|---|
| Additive noise | AddBrumm | Adds a buzz or low-frequency sinus tone to the watermarked signal to simulate the impact of a power supply | ⟨STRENGTH⟩, ⟨FREQUENCY⟩ | 2500, 55 to 3000, 75 | A |
| Additive noise | AddDynNoise | Adds dynamic white noise to the watermarked signal | ⟨STRENGTH⟩ | 20 to 40 | B |
| Additive noise | AddFFTNoise | Adds white noise to the watermarked signal in the frequency domain | ⟨FFTSIZE⟩, ⟨STRENGTH⟩ | 256, 1000 to 1024, 3000 | C |
| Additive noise | AddNoise | White Gaussian noise contaminates the watermarked signal to simulate ambient distortion | ⟨STRENGTH⟩ | 35 dB level to 5 dB | D |
| Additive noise | AddSinus | Adds a sinus signal to the watermarked signal | ⟨AMPLITUDE⟩, ⟨FREQUENCY⟩ | 120, 3000 to 150, 3500 | E |
| Conversion | Resampling | The sampling rate of the watermarked signal is converted to ⟨SAMPLERATE1⟩ and then reconverted to ⟨SAMPLERATE2⟩ | ⟨SAMPLERATE1⟩, ⟨SAMPLERATE2⟩ | 4 kHz, 16 kHz to 8 kHz, 16 kHz | F |
| Conversion | Requantization | Each sample of the watermarked signal is quantized to ⟨QUANTIZATION1⟩ and then requantized to ⟨QUANTIZATION2⟩ | ⟨QUANTIZATION1⟩, ⟨QUANTIZATION2⟩ | 8 bits and 16 bits | G |
| Conversion | Invert | Inverts all samples in the watermarked signal, like a 180-degree phase shift | No parameter required | None | H |
| Ambience | Echo | An echo with delay ⟨DELAY⟩ and decay ⟨DECAY⟩ is added to the watermarked signal | ⟨DELAY⟩, ⟨DECAY⟩ | 20 ms and 10 to 100 ms and 50 | I |
| Sample permutations | Cut samples | ⟨REMOVENUMBER⟩ samples are removed from the watermarked signal in every ⟨REMOVEDIST⟩ period | ⟨REMOVEDIST⟩, ⟨REMOVENUMBER⟩ | 1 and 1000 to 7 and 1000 | J |
| Sample permutations | Copy samples | Some of the samples of the watermarked signal are copied between the sample values | ⟨PERIOD⟩, ⟨COPYDIST⟩, ⟨COPYCOUNT⟩ | 1000, 100, 30 to 1000, 200, 60 | K |
| Sample permutations | LSB Zero | Sets the least significant bit of every sample of the watermarked signal to zero | No parameter required | None | L |
| Sample permutations | Smooth | The new sample value depends on the samples before and after the modifying point | No parameter required | None | M |
| Sample permutations | Stat1 | Averages each sample with its next neighbors | No parameter required | None | N |
| Dynamics | Amplify | The amplitude of the watermarked signal is increased up to ⟨FACTOR1⟩ and decreased down to ⟨FACTOR2⟩, respectively | ⟨FACTOR1⟩, ⟨FACTOR2⟩ | 150 and 75 to 200 and 50 | O |
| Dynamics | Denoising | The watermarked signal is denoised by ⟨FACTOR⟩ | ⟨FACTOR⟩ | −80 dB to −60 dB | P |
| Filters | Low Pass Filter (LPF) | The watermarked signal is filtered by an elliptic LPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 5 kHz to 4 kHz | Q |
| Filters | Band Pass Filter (BPF) | The watermarked signal is filtered by an elliptic filter with bandwidth from ⟨FREQUENCY1⟩ to ⟨FREQUENCY2⟩ to simulate a narrowband telephony channel | ⟨FREQUENCY1⟩, ⟨FREQUENCY2⟩ | 500 Hz & 4000 Hz to 300 Hz & 3400 Hz | R |
| Filters | High Pass Filter (HPF) | The watermarked signal is filtered by an elliptic HPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 500 Hz to 800 Hz | S |
| Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T |
| Time stretch and pitch shift | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U |
| Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then decoded back to the original format | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V |
| Compression | MP3 compression | The watermarked signal is compressed by MP3 at different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W |
| Compression | G711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X |

Figure 8: Variation of the BSC capacity with respect to different WNRs for different quantization steps. Axes: $C_{\mathrm{BSC}}$ (×10^4) versus WNR (dB, 5 to 55). Legend: Δ = 0.1, 0.25, 0.5, 0.75, and 0.95.

Table 2 compares the BER with state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique. It can be measured based on the mean opinion score (MOS), as proposed by the International Telecommunication Union (ITU-T) [23], and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal, with the grades presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths. Axes: $C_{\mathrm{BSC}}$ (×10^4) versus WNR (dB, 5 to 55). Legend: frame sizes of 40, 100, 200, 300, and 400 samples.

An objective evaluation technique, such as SWR or SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_n S^{2}}{\sum_n (\hat{S} - S)^{2}} \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
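The SNR of (31) is straightforward to compute from the two sample sequences. The sketch below is a minimal illustration; the function name is hypothetical, and returning infinity for identical signals is an assumed convention for the zero-distortion case.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB as in (31): 10*log10(sum S^2 / sum (S_hat - S)^2)."""
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0.0:
        return math.inf  # assumed convention: no embedding distortion at all
    return 10.0 * math.log10(signal_energy / noise_energy)
```

For instance, an original of `[3.0, 4.0]` (energy 25) and a watermarked version of `[3.0, 4.5]` (distortion energy 0.25) give an energy ratio of 100, that is, 20 dB.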

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, it seems that the proposed speech watermarking technique outperformed the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those of formant tuning [21] and Analysis-by-Synthesis [22].

Table 2: Comparison with the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4, each entry is bounded between two values that relate a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. FPP is the probability that an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, FNP is the probability that a watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

$$P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{FN}^{\,i} \left(1 - P_{FN}\right)^{(N-i)}}_{\text{FNP}} + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{FP}^{\,i} \left(1 - P_{FP}\right)^{(N-i)}}_{\text{FPP}} \tag{32}$$

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, $P_{FP}$ is the probability of a false positive, which is assumed to be 0.5,


Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps)
Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50
Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50
DWPT+ multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125
The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400

Figure 10: FPP with respect to various total numbers of watermark bits for different BERs. Curves: BER = 0.21 (shifted by adding 0.00009), BER = 0.20 (shifted by adding 0.000015), and BER = 0.19. Horizontal axis: number of watermark bits (50 to 400); vertical axis: false positive probability (×10^−4).

$P_{FN}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil \tag{33}$$

Figure 10 shows the FPP with respect to various total numbers of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to various total numbers of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.
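The FPP term of (32), together with the threshold in (33), can be evaluated directly. The following is a minimal sketch rather than the authors' implementation: the function names are hypothetical, $P_{FP} = 0.5$ is taken from the text, and the rounding step before the ceiling is an added assumption that only guards against floating-point error in $(1 - \mathrm{BER}) \times N$.

```python
from math import ceil, comb


def threshold(n_bits, ber):
    """Threshold T = ceil((1 - BER) * N), as in (33).

    Rounding to 9 decimals first is an assumption of this sketch; it keeps
    values such as 81.00000000000001 from being pushed up to 82.
    """
    return ceil(round((1.0 - ber) * n_bits, 9))


def false_positive_probability(n_bits, ber, p_fp=0.5):
    """FPP term of (32): sum over i = T..N of C(N, i) p^i (1 - p)^(N - i)."""
    t = threshold(n_bits, ber)
    return sum(
        comb(n_bits, i) * p_fp ** i * (1.0 - p_fp) ** (n_bits - i)
        for i in range(t, n_bits + 1)
    )


if __name__ == "__main__":
    # Sweep the watermark lengths and BER values used in Figure 10.
    for ber in (0.19, 0.20, 0.21):
        for n_bits in (50, 100, 200, 400):
            print(ber, n_bits, false_positive_probability(n_bits, ber))
```

With $P_{FP} = 0.5$, every term reduces to $\binom{N}{i} / 2^N$, so the sum shrinks quickly as $N$ grows, which is the trend described for Figure 10.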

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.

Figure 11: FNP with respect to various total numbers of watermark bits for different BERs. Curves: BER = 0.21 (shifted by adding 0.025), BER = 0.20 (shifted by adding 0.015), and BER = 0.19. Horizontal axis: number of watermark bits (50 to 400); vertical axis: false negative probability (0 to 0.035).

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only proved that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation on synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$f(x, y) = f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y) + \frac{1}{2}\left\{ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right\} + \text{remainder} \tag{A1}$$


Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{ f''_{xx}(\theta)\,\mathrm{var}(X) + 2 f''_{xy}(\theta)\,\mathrm{cov}(X, Y) + f''_{yy}(\theta)\,\mathrm{var}(Y) \right\} + O(n^{-1}) \tag{A2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^3$. Then, the mean and variance of the ratio between $R$ and $S$, respectively, can be estimated as follows:

$$E\left(\frac{R}{S}\right) \equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\mathrm{cov}(R, S)}{E(S)^2} + \frac{\mathrm{var}(S)\,E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right)$$

$$\mathrm{var}\left(\frac{R}{S}\right) \approx \frac{1}{E^2(S)}\,\mathrm{var}(R) + 2\,\frac{-E(R)}{E^3(S)}\,\mathrm{cov}(R, S) + \frac{E^2(R)}{E^4(S)}\,\mathrm{var}(S) = \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - 2\,\frac{\mathrm{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right] = \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2} \tag{A3}$$
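The approximations in (A3) are simple enough to evaluate numerically. The sketch below codes the mean approximation and the bracketed (first-order) form of the variance; the function names are hypothetical, and the covariance defaults to zero, which is an assumption matching the simplified right-hand side of the mean in (A3).

```python
def ratio_mean_approx(mu_r, mu_s, var_s, cov_rs=0.0):
    """Second-order Taylor approximation of E(R/S), first line of (A3)."""
    return mu_r / mu_s - cov_rs / mu_s ** 2 + var_s * mu_r / mu_s ** 3


def ratio_var_approx(mu_r, mu_s, var_r, var_s, cov_rs=0.0):
    """Approximation of var(R/S) in the bracketed form of (A3)."""
    return (mu_r ** 2 / mu_s ** 2) * (
        var_r / mu_r ** 2 - 2.0 * cov_rs / (mu_r * mu_s) + var_s / mu_s ** 2
    )


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(ratio_mean_approx(2.0, 4.0, 1.6))
    print(ratio_var_approx(2.0, 4.0, 0.4, 1.6))
```

When the variance of $S$ is zero, the mean approximation reduces to $\mu_R / \mu_S$, which is a quick sanity check on the formula.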

B. Compute the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

$$E(|X|^n) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-(|X - \mu|/b)}\,dx = \frac{1}{2b} \int_{-\infty}^{\infty} |X|^n \cdot e^{-(|X - \mu|/b)}\,dx \tag{B1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$E(|X|^n) = \begin{cases} \dfrac{1}{2b} \displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx, & \text{if } X \ge \mu \\[2ex] \dfrac{1}{2b} \displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu - X)/b)}\,dx, & \text{if } X < \mu \end{cases} \tag{B2}$$

For the first case, when $X \ge \mu$:

$$E(|X|^n) = \frac{1}{2b}\left[\int_{-\infty}^{0} -X^n \cdot e^{-((X - \mu)/b)}\,dx + \int_{0}^{\infty} X^n \cdot e^{-((X - \mu)/b)}\,dx\right] = \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\,dx}_{I}\right] \tag{B3}$$

If $t = -X/b$, then $I$ can be expressed as

$$I = b^{n+1} \int_{0}^{\infty} t^n e^{-t}\,dt = b^{n+1} \cdot n! \tag{B4}$$

$I_n$ can also be expressed as

$$I_n = \int_{-\infty}^{0} X^n e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1} \int_{-\infty}^{0} t^n e^{-t}\,dt = \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1} \tag{B5}$$

Substituting (B4) and (B5) into (B3), the absolute moment of the Laplacian distribution can be computed based on

$$E(|X|^n) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right] \tag{B6}$$
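For the zero-mean case used in the distortion analysis, the standard closed form of the absolute moment of a Laplacian variable with scale $b$ is $E(|X|^n) = b^n \cdot n!$. The following sketch checks this closed form against a midpoint-rule integration of the density; it assumes $\mu = 0$, and the function names, integration range, and step count are illustrative choices rather than anything specified in the paper.

```python
from math import exp, factorial


def abs_moment_closed_form(n, b):
    """E(|X|^n) = b^n * n! for a zero-mean Laplacian with scale b."""
    return b ** n * factorial(n)


def abs_moment_numeric(n, b, upper_mult=60.0, steps=100_000):
    """Midpoint-rule estimate of E(|X|^n) for a zero-mean Laplacian.

    By symmetry, E(|X|^n) equals the integral over [0, inf) of
    x^n * exp(-x / b) / b. The range is truncated at upper_mult * b,
    where the integrand is negligible.
    """
    upper = upper_mult * b
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** n * exp(-x / b) / b
    return total * h


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, abs_moment_closed_form(n, 0.5), abs_moment_numeric(n, 0.5))
```

For $n = 2$ this gives $2b^2$, the variance of a zero-mean Laplacian variable.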

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, “An overview of digital speech watermarking,” International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, “A DWT-based rational dither modulation scheme for effective blind audio watermarking,” Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, “Semi-fragile digital speech watermarking for online speaker recognition,” Eurasip Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, “Hyperbolic RDM for nonlinear valumetric distortions,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, “Difference angle quantization index modulation scheme for image watermarking,” Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.


[8] X. Zhu and S. Peng, “A novel quantization watermarking scheme by modulating the normalized correlation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, “Blind image watermarking using a sample projection approach,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, “A logarithmic quantization index modulation for perceptually better data hiding,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, “A novel gain invariant quantization-based watermarking approach,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker verification security improvement by means of speech watermarking,” Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker identification security improvement by means of speech watermarking,” Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, “Robust digital speech watermarking for online speaker recognition,” Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, “Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition,” Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, “Speaker frame selection for digital speech watermarking,” National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, “Speech probability distribution,” IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, “Robust audio and speech watermarking using Gaussian and Laplacian modeling,” Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, “A general formula for channel capacity,” IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, “Speech watermarking method based on formant tuning,” IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, “Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization,” Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., “StirMark benchmark: audio watermarking attacks,” in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, “An audio watermarking scheme using singular value decomposition and dither-modulation quantization,” Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.



Figure 2: Block diagram of the proposed embedding process. The input signal is framed and divided into odd/even subsequences $X$ and $Y$; their Lp-norms $L_X$ and $L_Y$ give the ratio $Z = L_X / L_Y$, which is quantized according to the watermark to obtain $Z_Q$; Lagrange optimization, repositioning of the odd/even subsequences, and rearrangement then produce the watermarked signal.

Figure 3: Block diagram of the proposed extraction process. The watermarked signal is framed and divided into odd/even subsequences $X$ and $Y$; the ratio $Z = L_X / L_Y$ of their Lp-norms is quantized with the quantizers for bit = 0 and bit = 1, and minimum Euclidean distance decoding recovers the watermark.
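The embedding and extraction chains of Figures 2 and 3 can be illustrated with a simplified sketch. This is not the proposed method itself: instead of the Lagrange optimization, the sketch simply rescales one subsequence so that the Lp-norm ratio lands on the chosen quantizer, which is an assumption made for brevity. The function names, the mapping of list positions 0, 2, 4, ... to subsequence $X$, and the quantizer offset of $\Delta/2$ for bit 1 are likewise assumptions. Frames are assumed to be non-silent so that both norms are nonzero.

```python
def lp_norm(values, p):
    """Lp-norm of a sequence: (sum |v|^p)^(1/p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def qim_quantize(z, bit, delta):
    """Nearest point to z on the lattice delta * k + bit * delta / 2."""
    offset = bit * delta / 2.0
    return delta * round((z - offset) / delta) + offset


def embed_bit(frame, bit, delta, p=2):
    """Embed one bit by moving the ratio Z = L_X / L_Y onto the bit's lattice.

    Simplification: only subsequence X is rescaled; the paper instead
    distributes the change with Lagrange optimization.
    """
    x, y = frame[0::2], frame[1::2]
    z = lp_norm(x, p) / lp_norm(y, p)
    z_q = qim_quantize(z, bit, delta)
    scale = z_q / z
    marked = list(frame)
    marked[0::2] = [v * scale for v in x]
    return marked


def extract_bit(frame, delta, p=2):
    """Blind extraction: choose the quantizer whose lattice is closest to Z."""
    x, y = frame[0::2], frame[1::2]
    z = lp_norm(x, p) / lp_norm(y, p)
    d0 = abs(z - qim_quantize(z, 0, delta))
    d1 = abs(z - qim_quantize(z, 1, delta))
    return 0 if d0 <= d1 else 1


if __name__ == "__main__":
    frame = [0.3, -0.5, 0.8, 0.1, -0.4, 0.7, 0.2, -0.6]
    for bit in (0, 1):
        marked = embed_bit(frame, bit, delta=0.25)
        attacked = [0.3 * v for v in marked]  # amplitude scaling attack
        print(bit, extract_bit(marked, 0.25), extract_bit(attacked, 0.25))
```

Because a common gain multiplies $L_X$ and $L_Y$ by the same factor, the ratio $Z$ is unchanged by amplitude scaling, so the extracted bit in the usage example does not depend on the gain.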

Thus, (13) can be approximately estimated by

$$\hat{S}_i - S_i \approx \left(1 + \frac{\varepsilon}{L_X}\right) \times X_{2i} \tag{16}$$

Therefore, the expected values of (13) can be estimated as

$$E\left[\left\|\hat{S} - S\right\|^2\right] \cong E\left(\left(1 + \frac{\varepsilon}{L_X}\right)^2\right) E\left(X_{2i}^2\right) = \left[E\left(\frac{1}{L_X^2}\right) E\left(\varepsilon^2\right) + E(\varepsilon)\, E\left(\frac{2}{L_X}\right) + 1\right] E\left(X_{2i}^2\right) \tag{17}$$

If the quantization noise ($\varepsilon$) is considered to be uniformly distributed in $[-\Delta/2, \Delta/2]$, then $E(\varepsilon) = 0$ and $E(\varepsilon^2) = \Delta^2/48$. Additionally, as the mean value of the speech signal is considered to be zero, the zero-mean Laplacian distribution is used to model the speech signal, with $E(X_{2i}) = 0$. As a result, $E(X_{2i}^2) = 2b_{X_i}^2$. To model $E(1/L_X^2)$, the absolute moment of the Laplacian distribution should be estimated using Appendix B as follows:

$$E(|X|^P) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right] \tag{18}$$

where $I_n = n I_{n-1}$ and $I_n = \int_{-\infty}^{0} t^n e^{-t}\,dt$. Thus, we can derive the mean and variance for the $P$th absolute moment of the Laplacian distribution as

$$L_X^P = \sum_{j=1}^{N/2} \left|X_j\right|^P \sim \mathcal{CL}\left(\mu_{X_P},\ \frac{2\left(\mu_{X_{(2P)}} - \mu_{X_P}^2\right)}{N}\right) \tag{19}$$

Now, based on (1) and (19), we can compute $E(1/L_X^2) = E\left(1/\left(\sqrt[P]{L_X^P}\right)^2\right) = \mu_{X_{(2P)}}$. Therefore, the signal-to-watermark ratio (SWR) can be estimated as

$$\mathrm{SWR} = \frac{E\left[S^2\right]}{E\left[\left\|\hat{S} - S\right\|^2\right]} \cong \frac{2b_X^2 + 2b_Y^2}{\left(\left(\Delta^2/48\right) \times \mu_{X_{(2P)}} + 1\right) \times \left(2b_X^2\right)} \tag{20}$$

Because both the $X$ and $Y$ sets have been selected from neighboring samples, it can be assumed that $2b_X^2 \cong 2b_Y^2$. As a result, (20) can be expressed based on the quantization step as

$$\Delta = \sqrt{\frac{-2\,(1 + 12 \times \mathrm{SWR})}{\mathrm{SWR} \times \mu_{X_{(2P)}}}} \tag{21}$$

To model the error probability, it is assumed that the watermarked speech signal passes through an AWGN channel with zero-mean Gaussian noise $\mathcal{N}(0, \sigma_n^2)$. Therefore, (3) must be rewritten as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left|X_j + N_{X_j}\right|^P}{\sum_{j=1}^{N/2} \left|Y_j + N_{Y_j}\right|^P} \tag{22}$$

where $N_{Y_j}$ and $N_{X_j}$ correspond to the odd and even components of the AWGN, respectively. Because the term $\sum_{j=1}^{N/2} |X_j|^P$ is a known parameter, it is not possible to estimate $\hat{Z}$ using a chi-square distribution with $N$ degrees of freedom, $\chi^2(N)$. To compute the distribution of $\hat{Z}$, it should be decomposed and estimated as

$$\hat{Z} = \frac{\sum_{j=1}^{N/2} \left(\left|X_j\right|^P + P\left|X_j\right|^{P-1} N_{X_j} + \left(P(P-1)/2\right)\left|X_j\right|^{P-2} N_{X_j}^2 + \cdots + N_{X_j}^P\right)}{\sum_{j=1}^{N/2} \left(\left|Y_j\right|^P + P\left|Y_j\right|^{P-1} N_{Y_j} + \left(P(P-1)/2\right)\left|Y_j\right|^{P-2} N_{Y_j}^2 + \cdots + N_{Y_j}^P\right)} \tag{23}$$

Equation (23) can be expressed as

$$\hat{Z} \approx \text{Original } Z + \overbrace{\gamma_1 + \gamma_2 + \gamma_3}^{\text{Noise}} \tag{24}$$

where each part of $\hat{Z}$ is estimated as follows:

$$\begin{aligned}
\text{Original } Z &= \frac{\sum_{j=1}^{N/2} \left|X_j\right|^P}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \\
\gamma_1 &= \frac{\sum_{j=1}^{N/2} P\left|X_j\right|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \\
\gamma_2 &= -\frac{\sum_{j=1}^{N/2} \left|X_j\right|^P}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \times \frac{\sum_{j=1}^{N/2} P\left|Y_j\right|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \\
\gamma_3 &= \frac{\sum_{j=1}^{N/2} P\left|X_j\right|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} \left|Y_j\right|^P} \times \frac{\sum_{j=1}^{N/2} P\left|Y_j\right|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} P\left|Y_j\right|^{P-1} N_{Y_j}}
\end{aligned} \tag{25}$$

To estimate the probability of error, the noise term can be analyzed because it moves the original $Z$ into a wrong decision region. Therefore, the distribution of each term of (24) can be estimated by the central limit theorem (CLT) because of the large number of samples in each block. Regardless of the type of original speech signal distribution, and because of the independence between the signal and noise samples, the mean and variance of the noise can be computed as

$$\mu_{\text{Noise}} = \mu_{\gamma_1} + \mu_{\gamma_2} + \mu_{\gamma_3}, \qquad \sigma_{\text{Noise}}^2 = \sigma_{\gamma_1}^2 + \sigma_{\gamma_2}^2 + \sigma_{\gamma_3}^2 \tag{26}$$

By assuming equal probabilities for both the zero and one bits of the watermark data, the probability of error for a fixed quantization step ($\Delta$) can be estimated as

$$P_e = \sum_{i=1}^{\infty} \frac{1}{2} \Pr\left\{T_{(i-1)/2} < Z^P < T_{(i+1)/2}\right\} \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \Pr\left\{V_{2j+i} < \hat{Z}^P < V_{2j+i+1}\right\} \tag{27}$$

A closed-form solution for (27) is computed as

$$P_e = \sum_{i=1}^{\infty} \left(Q\left(\frac{T^P_{(i-1)/2} - \mu_Z}{\sigma_Z}\right) - Q\left(\frac{T^P_{(i+1)/2} - \mu_Z}{\sigma_Z}\right)\right) \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \left(Q\left(\frac{V^P_{(i+2j)/2} - \mu_{\hat{Z}^P}}{\sigma_{\hat{Z}^P}}\right) - Q\left(\frac{V^P_{(i+2j+1)/2} - \mu_{\hat{Z}^P}}{\sigma_{\hat{Z}^P}}\right)\right) \tag{28}$$

where $Q(\cdot)$ is the Gaussian tail function (a scaled complementary error function) defined as $Q(x) = (1/\sqrt{2\pi}) \int_x^{\infty} e^{-u^2/2}\,du$, $T_i = i\Delta$, $V_i = (T_{i/2} + T_{(i+1)/2})/2$, and $\mu_Z$ and $\sigma_Z$ can be computed as in (11) and (12), respectively.
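Each factor in (28) is a difference of two $Q$ values, that is, the probability that a Gaussian variable falls between two thresholds. A minimal sketch of that building block is given below, using the identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names are hypothetical, and the full double sum of (28) is not reproduced here.

```python
from math import erfc, sqrt


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))


def interval_probability(lower, upper, mean, std):
    """Pr{lower < G < upper} for a Gaussian G with the given mean and std.

    This is the form Q((lower - mean) / std) - Q((upper - mean) / std)
    that appears in each factor of (28).
    """
    return q_function((lower - mean) / std) - q_function((upper - mean) / std)


if __name__ == "__main__":
    # Probability mass within one standard deviation of the mean.
    print(interval_probability(-1.0, 1.0, 0.0, 1.0))
```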

4. Discussion on the Experimental Results

To validate the performance of the developed watermarking technique, a simulation was performed on the TIMIT database to verify the robustness, imperceptibility, and capacity of the technique. The TIMIT database includes 630 speakers (438 males and 192 females) with a sampling frequency of 16 kHz [19]. Each speaker pronounced 10 sentences, for a total of 6300 sentences. For the experimental results, the average results of 630 speech signals with durations of 1 s to 3 s from the 630 speakers were used.

Figure 4 shows the bit error rate (BER) with respect to different $P$ for various frame lengths under a watermark-to-noise ratio (WNR) of 40 dB. In this figure, each curve is also plotted separately so that the changes are visible. As can be observed, the frame size was inversely related to the BER: whenever the frame size decreased, the BER increased. Additionally, it seems that $P$ was not highly correlated with the BER for $P$ values greater than two. Only a small fluctuation can be observed for the BER when $P$ changed.

Figure 5 shows the BER with respect to different $P$ for various quantization steps. As expected, whenever the quantization step increased, the BER decreased. Furthermore, the variation of $P$ did not seriously change the BER. It must be mentioned that, because of perfect watermark detection under clean conditions, a small AWGN was induced on the watermarked signals for the experiments shown in Figures 4 and 5.

Figure 6 shows the variation of the signal-to-noise ratio (SNR) with respect to different $P$ values for different frame lengths. There was not a significant difference in the SNR when the frame size increased. As can be observed, whenever the frame size increased, the energy level between the two sets of $L_X$ and $L_Y$ increased. Consequently, the ratio between them increased, which caused a lower SNR. Additionally, it seems that changing $P$ was not highly correlated with the SNR for different frame lengths.

Figure 4: (a) BER versus Lp-norms for different frame lengths under WNR = 40 dB; (b–f) each curve separately. Curves: frame lengths of 40, 100, 200, 300, and 400 samples. Horizontal axis: Lp-norm (1 to 7); vertical axis: bit error rate (%).

Figure 7 illustrates different SNRs with respect to different $P$ for various quantization steps. As observed, $P$ did not highly affect the SNR. However, the quantization step highly affected the SNR. As the quantization step increased, the SNR decreased.
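The SNR values discussed for Figures 6 and 7 measure the embedding distortion. As a minimal sketch, assuming the usual definition (host-signal energy divided by the energy of the difference between the watermarked and host signals, in dB), which this section does not spell out, the computation could look as follows; the function name is hypothetical.

```python
from math import log10


def snr_db(host, marked):
    """SNR in dB between a host signal and its watermarked version.

    Assumed definition: 10 * log10(sum(s^2) / sum((m - s)^2)).
    Returns infinity when the two signals are identical.
    """
    signal_power = sum(s * s for s in host)
    noise_power = sum((m - s) ** 2 for s, m in zip(host, marked))
    if noise_power == 0.0:
        return float("inf")
    return 10.0 * log10(signal_power / noise_power)


if __name__ == "__main__":
    host = [1.0, -1.0, 1.0, -1.0]
    marked = [1.1, -1.0, 1.0, -1.0]
    print(snr_db(host, marked))
```

A larger quantization step moves the Lp-norm ratio further on average, which increases the difference energy and therefore lowers this value, in line with the trend reported for Figure 7.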


Figure 5: BER versus Lp-norms for different quantization steps under WNR = 40 dB. Curves: Δ = 0.1, 0.25, 0.5, 0.75, and 0.95. Horizontal axis: Lp-norm (1 to 7); vertical axis: bit error rate (%).

Figure 6: SNR versus Lp-norms for different frame lengths. Curves: frame lengths of 40, 100, 200, 300, and 400 samples. Horizontal axis: Lp-norm (1 to 7); vertical axis: SNR (dB).

To compute the payload of the proposed watermark, a memoryless binary symmetric channel (BSC) ($C_{BSC}$), defined as

$$C_{BSC} = R \times \left[1 + H(P_e)\right] \tag{29}$$

where

$$H(P_e) = P_e \times \log_2(P_e) + (1 - P_e) \times \log_2(1 - P_e) \tag{30}$$

was applied to estimate the capacity of the channel with bitrate ($R$) for error-free watermark transmission [20].
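Equations (29) and (30) translate directly into code. The sketch below follows the sign convention of (30), in which $H(P_e)$ is the negative of the binary entropy so that it is added to 1 in (29). The function names are hypothetical, and treating $0 \times \log_2 0$ as 0 is an assumption needed for the error-free case.

```python
from math import log2


def h_term(pe):
    """H(Pe) as written in (30): Pe*log2(Pe) + (1 - Pe)*log2(1 - Pe).

    Terms with probability 0 are skipped, i.e. 0 * log2(0) is taken as 0.
    """
    total = 0.0
    for q in (pe, 1.0 - pe):
        if q > 0.0:
            total += q * log2(q)
    return total


def bsc_capacity(rate_bps, pe):
    """C_BSC = R * [1 + H(Pe)], as in (29)."""
    return rate_bps * (1.0 + h_term(pe))


if __name__ == "__main__":
    # R = 64 kbps is the telephony-channel bitrate assumed in the text.
    for pe in (0.0, 0.05, 0.2, 0.5):
        print(pe, bsc_capacity(64000.0, pe))
```

At $P_e = 0$ the capacity equals $R$, and at $P_e = 0.5$ it falls to zero, which brackets the range of the curves in Figure 8.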

Because the sampling rate of the TIMIT was 16 kHz, $R$ was assumed to be 64 kbps (8 kHz for speech bandwidth × 8 bits per sample = 64 kbps) for a telephony channel, and $P_e$ was assumed to be equal to the BER in the watermark detection process. Figure 8 shows the BSC capacity for different WNRs and various quantization steps. As observed, the capacity increased whenever the WNR increased. This is because the watermark was extracted with a minimum BER when the WNR increased. Moreover, it can be inferred that the BSC capacity increased while the quantization step increased because the watermark was embedded with high intensity when the quantization step increased. As observed, the BSC capacity for smaller quantization steps ($\Delta \le 0.25$) was approximately zero under a highly noisy channel.

Figure 7: SNR versus Lp-norms for various quantization steps. Curves: Δ = 0.1, 0.25, 0.5, 0.75, and 0.95. Horizontal axis: Lp-norm (1 to 7); vertical axis: SNR (dB).

Figure 9 shows the variation of the BSC capacity with respect to different WNRs for different frame lengths. As observed, it seems that under serious noise, the frame size was not a significant factor for the BSC capacity. Despite this, the frame size was likely to be important whenever the WNR increased. Thus, for a large WNR, it is obvious that whenever the frame size increased, the BER in the watermark detection process decreased, which caused an improvement in the BSC capacity.

To demonstrate the efficiency and performance of the proposed speech watermarking technique, the robustness, capacity, and inaudibility of the proposed technique must be compared with other state-of-the-art speech watermarking techniques.

Table 1 describes the benchmark for simulating the results for the robustness test. Many of these attacks are based on the StirMark Benchmark for Audio (SMBA) [24].


Table 1: Benchmark for speech watermarking.

Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label
Additive noise | AddBrumm | It adds buzz or a low-frequency sinus tone to the watermarked signal to simulate the impact of a power supply | ⟨STRENGTH⟩, ⟨FREQUENCY⟩ | 2500, 55 to 3000, 75 | A
Additive noise | AddDynNoise | It adds a dynamic white noise to the watermarked signal | ⟨STRENGTH⟩ | 20 to 40 | B
Additive noise | AddFFTNoise | It adds white noise to the watermarked signal in the frequency domain | ⟨FFTSIZE⟩, ⟨STRENGTH⟩ | 256, 1000 to 1024, 3000 | C
Additive noise | AddNoise | A white Gaussian noise contaminates the watermarked signal to simulate ambient distortion | ⟨STRENGTH⟩ | 35 dB level to 5 dB | D
Additive noise | AddSinus | It adds a sinus signal to the watermarked signal | ⟨AMPLITUDE⟩, ⟨FREQUENCY⟩ | 120, 3000 to 150, 3500 | E
Conversion | Resampling | The sampling rate of the watermarked signal is converted to ⟨SAMPLERATE1⟩ and then is reconverted to ⟨SAMPLERATE2⟩ | ⟨SAMPLERATE1⟩, ⟨SAMPLERATE2⟩ | 4 kHz, 16 kHz to 8 kHz, 16 kHz | F
Conversion | Requantization | The sample of the watermarked signal is quantized to ⟨QUANTIZATION1⟩ and then is requantized to ⟨QUANTIZATION2⟩ | ⟨QUANTIZATION1⟩, ⟨QUANTIZATION2⟩ | 8 bits and 16 bits | G
Conversion | Invert | It inverts all samples in the watermarked signal, like a 180-degree phase shift | No parameter required | None | H
Ambience | Echo | An echo with a delay ⟨DELAY⟩ and decay ⟨DECAY⟩ is added to the watermarked signal | ⟨DELAY⟩, ⟨DECAY⟩ | 20 ms and 10 to 100 ms and 50 | I
Sample permutations | Cut samples | ⟨REMOVENUMBER⟩ samples are removed from the watermarked signal from every ⟨REMOVEDIST⟩ period | ⟨REMOVEDIST⟩, ⟨REMOVENUMBER⟩ | 1 and 1000 to 7 and 1000 | J
Sample permutations | Copy samples | Some of the samples of the watermarked signal are copied between the sample values | ⟨PERIOD⟩, ⟨COPYDIST⟩, ⟨COPYCOUNT⟩ | 1000, 100, 30 to 1000, 200, 60 | K
Sample permutations | LSB Zero | Set all samples of the watermarked signal to zero | No parameter required | None | L
Sample permutations | Smooth | The new sample value depends on the samples before and after the modifying point | No parameter required | None | M
Sample permutations | Stat1 | It averages the sample with its next neighbors | No parameter required | None | N
Dynamics | Amplify | The amplitude of the watermarked signal is increased up to ⟨FACTOR1⟩ and is decreased down to ⟨FACTOR2⟩, respectively | ⟨FACTOR1⟩, ⟨FACTOR2⟩ | 150 and 75; 200 and 50 | O
Dynamics | Denoising | The watermarked signal is denoised by ⟨FACTOR⟩ | ⟨FACTOR⟩ | −80 dB to −60 dB | P
Filters | Low Pass Filter (LPF) | The watermarked signal is filtered by an elliptic LPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 5 kHz to 4 kHz | Q
Filters | Band Pass Filter (BPF) | The watermarked signal is filtered by an elliptic filter with bandwidth from ⟨FREQUENCY1⟩ to ⟨FREQUENCY2⟩ to simulate a narrowband telephony channel | ⟨FREQUENCY1⟩, ⟨FREQUENCY2⟩ | 500 Hz & 4000 Hz to 300 Hz & 3400 Hz | R
Filters | High Pass Filter (HPF) | The watermarked signal is filtered by an elliptic HPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 500 Hz to 800 Hz | S
Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T
Time stretch and pitch shift | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U
Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then is decoded to the original one | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V
Compression | MP3 compression | The watermarked signal is compressed by MP3 with different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W
Compression | G711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X

Figure 8: Variation of the BSC capacity with respect to different WNRs for different quantization steps. [Plot: $C_{\mathrm{BSC}}$ ($\times 10^4$) versus WNR (dB, 5 to 55) for Δ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Table 2 compares the BER with state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique. It can be measured based on the mean opinion score (MOS), as proposed by the International Telecommunication Union (ITU-T) [23], and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal; the grades are presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths. [Plot: $C_{\mathrm{BSC}}$ ($\times 10^4$) versus WNR (dB, 5 to 55) for frame lengths of 40, 100, 200, 300, and 400 samples.]

An objective evaluation technique, such as SWR and SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_{n} S^{2}}{\sum_{n} (\hat{S} - S)^{2}} \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
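As a minimal sketch of (31), the SNR can be computed directly from two equal-length sample sequences. The function name `snr_db` and the toy signals are illustrative assumptions and are not taken from the experiments.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB following (31): signal energy over embedding-distortion energy."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0:
        # Identical signals: no distortion at all.
        return math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy example (hypothetical values): every sample is perturbed by 0.1.
host = [1.0, -1.0, 1.0, -1.0]
marked = [1.1, -0.9, 1.1, -0.9]
print(round(snr_db(host, marked), 2))
```

A higher value means that the watermarked signal is closer to the original signal.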

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, it seems that the proposed speech watermarking technique outperformed the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those for formant tuning [21] and Analysis-by-Synthesis [22].


Table 2: Comparison with the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4, each entry was bounded between two values that related a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, the imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. A false positive occurs when an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, a false negative occurs when a watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

$$P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{FN}^{\,i} \left(1 - P_{FN}\right)^{N-i}}_{\text{FNP}} + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{FP}^{\,i} \left(1 - P_{FP}\right)^{N-i}}_{\text{FPP}} \tag{32}$$

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, and $P_{FP}$ is the probability of a false positive, which is assumed to be 0.5.


Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

Figure 10: FPP with respect to various total numbers of watermark bits for different BERs. [Plot: false positive probability ($\times 10^{-4}$) versus number of watermark bits (50 to 400) for BER = 0.19, BER = 0.20 (shifted by adding 0.000015), and BER = 0.21 (shifted by adding 0.00009).]

In (32), $P_{FN}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil \tag{33}$$
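The two binomial sums in (32) and the threshold in (33) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the example values (100 watermark bits, BER of 0.25) are chosen for convenience rather than taken from the experiments.

```python
from math import ceil, comb


def threshold(ber, n_bits):
    """Threshold T of (33): the smallest integer not below (1 - BER) * N."""
    return ceil((1.0 - ber) * n_bits)


def binomial_pmf(n_bits, i, p):
    """Probability of exactly i successes in n_bits independent Bernoulli(p) trials."""
    return comb(n_bits, i) * p ** i * (1.0 - p) ** (n_bits - i)


def lower_sum(n_bits, t, p):
    """First sum of (32): i runs from 0 to T - 1 (the FNP term, evaluated with p = P_FN)."""
    return sum(binomial_pmf(n_bits, i, p) for i in range(0, t))


def upper_sum(n_bits, t, p):
    """Second sum of (32): i runs from T to N (the FPP term, evaluated with p = P_FP)."""
    return sum(binomial_pmf(n_bits, i, p) for i in range(t, n_bits + 1))


# Illustrative values only: N = 100 watermark bits, BER = 0.25, P_FP = 0.5.
n_bits = 100
t = threshold(0.25, n_bits)
fpp = upper_sum(n_bits, t, 0.5)
print(t, fpp)
```

For a fixed per-bit probability, the two sums partition all outcomes from 0 to $N$, so they add up to one. This makes a convenient sanity check when experimenting with different thresholds.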

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.
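The gain invariance of the ratio and the nearest-quantizer detection summarized above can be illustrated with a short sketch. It assumes a generic binary QIM with two quantizer lattices offset by half a step, and it omits the Lagrange optimization entirely; the function names, the toy frame, and the step size are hypothetical and do not reproduce the exact embedding rule of the proposed technique.

```python
def lp_norm_ratio(frame, p):
    """Ratio of sum(|x|^p) over the two interleaved halves of a frame.

    Assumption: frame[0::2] and frame[1::2] play the role of the odd- and
    even-indexed sample sets.
    """
    numerator = sum(abs(x) ** p for x in frame[0::2])
    denominator = sum(abs(y) ** p for y in frame[1::2])
    return numerator / denominator


def qim_quantize(value, bit, step):
    """Generic binary QIM: snap value to the lattice that encodes the given bit."""
    offset = bit * step / 2.0
    return step * round((value - offset) / step) + offset


def qim_detect(value, step):
    """Blind detection: choose the bit whose lattice point is nearest to value."""
    d0 = abs(value - qim_quantize(value, 0, step))
    d1 = abs(value - qim_quantize(value, 1, step))
    return 0 if d0 <= d1 else 1


# Toy frame (hypothetical values): scaling every sample by the same gain
# multiplies numerator and denominator by gain**p, so the ratio is unchanged.
frame = [0.3, -0.5, 0.8, 0.1, -0.4, 0.6]
scaled = [2.5 * x for x in frame]
print(lp_norm_ratio(frame, 2), lp_norm_ratio(scaled, 2))
```

Because amplitude scaling cancels in the ratio, a bit detected from the ratio is unaffected by a constant gain, which is the property the technique relies on.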

Figure 11: FNP with respect to various total numbers of watermark bits for different BERs. [Plot: false negative probability (0 to 0.035) versus number of watermark bits (50 to 400) for BER = 0.19, BER = 0.20 (shifted by adding 0.015), and BER = 0.21 (shifted by adding 0.025).]

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only showed that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$f(x, y) = f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y) + \frac{1}{2}\left\{ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right\} + \text{remainder} \tag{A.1}$$


Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{ f''_{xx}(\theta)\,\mathrm{var}(X) + 2 f''_{xy}(\theta)\,\mathrm{cov}(X, Y) + f''_{yy}(\theta)\,\mathrm{var}(Y) \right\} + O(n^{-1}) \tag{A.2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^3$. Then, the mean and variance of the ratio between $R$ and $S$, respectively, can be estimated as follows:

$$
\begin{aligned}
E\!\left(\frac{R}{S}\right) &\equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\mathrm{cov}(R, S)}{E(S)^2} + \frac{\mathrm{var}(S)\,E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right), \\
\mathrm{var}\!\left(\frac{R}{S}\right) &\approx \frac{1}{E^2(S)}\,\mathrm{var}(R) + 2\,\frac{-E(R)}{E^3(S)}\,\mathrm{cov}(R, S) + \frac{E^2(R)}{E^4(S)}\,\mathrm{var}(S) \\
&= \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - 2\,\frac{\mathrm{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right] \\
&= \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2}.
\end{aligned}
\tag{A.3}
$$
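The Taylor-based approximations in (A.3) can be checked numerically with a small Monte Carlo experiment. The sketch below is only illustrative: it assumes independent $R$ and $S$ (so that the covariance term vanishes), draws them from Gaussian distributions whose means are far from zero rather than from the distributions used in the analysis, and uses hypothetical function names and parameter values.

```python
import random


def ratio_mean_approx(mu_r, mu_s, var_s, cov_rs=0.0):
    """Second-order approximation of E(R/S) from (A.3)."""
    return mu_r / mu_s - cov_rs / mu_s ** 2 + var_s * mu_r / mu_s ** 3


def ratio_var_approx(mu_r, mu_s, var_r, var_s, cov_rs=0.0):
    """First-order approximation of var(R/S), bracketed form of (A.3)."""
    bracket = var_r / mu_r ** 2 - 2.0 * cov_rs / (mu_r * mu_s) + var_s / mu_s ** 2
    return (mu_r ** 2 / mu_s ** 2) * bracket


def simulate_ratio(mu_r, sd_r, mu_s, sd_s, n_samples=100_000, seed=0):
    """Sample mean and variance of R/S for independent Gaussian R and S."""
    rng = random.Random(seed)
    ratios = [rng.gauss(mu_r, sd_r) / rng.gauss(mu_s, sd_s) for _ in range(n_samples)]
    mean = sum(ratios) / n_samples
    var = sum((z - mean) ** 2 for z in ratios) / (n_samples - 1)
    return mean, var


# Illustrative parameters: R ~ N(5, 1) and S ~ N(10, 1), independent.
mc_mean, mc_var = simulate_ratio(5.0, 1.0, 10.0, 1.0)
print(mc_mean, ratio_mean_approx(5.0, 10.0, 1.0))
print(mc_var, ratio_var_approx(5.0, 10.0, 1.0, 1.0))
```

The approximation is only expected to be close when the denominator stays well away from zero, which is why the example keeps the mean of $S$ large relative to its standard deviation.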

B. Compute the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

$$E(|X|^n) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-((X - \mu)/b)}\,dx = \frac{1}{2b}\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx \tag{B.1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$
E(|X|^n) =
\begin{cases}
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx, & \text{if } X \ge \mu, \\[2ex]
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu - X)/b)}\,dx, & \text{if } X < \mu.
\end{cases}
\tag{B.2}
$$

For the first case, when $X \ge \mu$,

$$
\begin{aligned}
E(|X|^n) &= \frac{1}{2b}\left[\int_{-\infty}^{0} -X^n \cdot e^{-((X - \mu)/b)}\,dx + \int_{0}^{\infty} X^n \cdot e^{-((X - \mu)/b)}\,dx\right] \\
&= \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\,dx}_{I}\right].
\end{aligned}
\tag{B.3}
$$

If $t = -X/b$, then $I$ can be expressed as

$$I = b^{n+1}\int_{0}^{\infty} t^n e^{-t}\,dt = b^{n+1} \cdot n! \tag{B.4}$$

$I_n$ can also be expressed as

$$
\begin{aligned}
I_n &= \int_{-\infty}^{0} X^n e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^n e^{-t}\,dt \\
&= \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1}.
\end{aligned}
\tag{B.5}
$$

Substituting (B.4) and (B.5) into (B.3), the absolute moment of the Laplacian distribution can be computed based on

$$E(|X|^n) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right] \tag{B.6}$$
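As a numerical cross-check for the zero-mean special case only ($\mu = 0$), the absolute moment of a Laplacian variable with scale $b$ has the standard closed form $E(|X|^n) = b^n\, n!$. The sketch below compares this closed form with a Simpson-rule integration; the function names, the integration range, and the step count are illustrative assumptions, and the general case $\mu \neq 0$ is not covered.

```python
import math


def laplace_abs_moment_numeric(n, b, upper_multiple=60.0, steps=20000):
    """Simpson-rule estimate of E|X|^n for a zero-mean Laplacian with scale b.

    By symmetry, E|X|^n equals the integral over [0, inf) of x^n * exp(-x/b) / b.
    The upper limit is truncated at upper_multiple * b; steps must be even.
    """
    upper = upper_multiple * b
    h = upper / steps

    def integrand(x):
        return x ** n * math.exp(-x / b) / b

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


def laplace_abs_moment_closed_form(n, b):
    """Standard closed form for the zero-mean case: b^n * n!."""
    return b ** n * math.factorial(n)


# Illustrative check with n = 3 and b = 1.5.
print(laplace_abs_moment_numeric(3, 1.5), laplace_abs_moment_closed_form(3, 1.5))
```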

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, "An overview of digital speech watermarking," International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, "A DWT-based rational dither modulation scheme for effective blind audio watermarking," Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, "Semi-fragile digital speech watermarking for online speaker recognition," Eurasip Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, "Hyperbolic RDM for nonlinear valumetric distortions," IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, "Difference angle quantization index modulation scheme for image watermarking," Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.

[8] X. Zhu and S. Peng, "A novel quantization watermarking scheme by modulating the normalized correlation," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, "Blind image watermarking using a sample projection approach," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, "A logarithmic quantization index modulation for perceptually better data hiding," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, "A novel gain invariant quantization-based watermarking approach," IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker verification security improvement by means of speech watermarking," Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker identification security improvement by means of speech watermarking," Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, "Robust digital speech watermarking for online speaker recognition," Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, "Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition," Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, "Speaker frame selection for digital speech watermarking," National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, "Speech probability distribution," IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, "Robust audio and speech watermarking using Gaussian and Laplacian modeling," Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, "A general formula for channel capacity," IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, "Speech watermarking method based on formant tuning," IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, "Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization," Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., "StirMark benchmark: audio watermarking attacks," in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, "An audio watermarking scheme using singular value decomposition and dither-modulation quantization," Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.


$\sum_{j=1}^{N/2} |X_j|^P$ is a known parameter, it is not possible to estimate $\hat{Z}$ using a chi-square distribution with $N$ degrees of freedom, $\chi^2(N)$. To compute the distribution of $\hat{Z}$, it should be decomposed and estimated as

$$
\hat{Z} = \frac{\sum_{j=1}^{N/2}\left(|X_j|^P + P\,|X_j|^{P-1} N_{X_j} + \frac{P(P-1)}{2}\,|X_j|^{P-2} N_{X_j}^2 + \cdots + N_{X_j}^P\right)}{\sum_{j=1}^{N/2}\left(|Y_j|^P + P\,|Y_j|^{P-1} N_{Y_j} + \frac{P(P-1)}{2}\,|Y_j|^{P-2} N_{Y_j}^2 + \cdots + N_{Y_j}^P\right)}
\tag{23}
$$

Equation (23) can be expressed as

$$\hat{Z} \approx \text{Original } Z + \overbrace{\gamma_1 + \gamma_2 + \gamma_3}^{\text{Noise}} \tag{24}$$

where each part of $\hat{Z}$ is estimated as follows:

$$
\begin{aligned}
\text{Original } Z &= \frac{\sum_{j=1}^{N/2} |X_j|^P}{\sum_{j=1}^{N/2} |Y_j|^P}, \\
\gamma_1 &= \frac{\sum_{j=1}^{N/2} P\,|X_j|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} |Y_j|^P}, \\
\gamma_2 &= -\frac{\sum_{j=1}^{N/2} |X_j|^P}{\sum_{j=1}^{N/2} |Y_j|^P} \times \frac{\sum_{j=1}^{N/2} P\,|Y_j|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} |Y_j|^P}, \\
\gamma_3 &= \frac{\sum_{j=1}^{N/2} P\,|X_j|^{P-1} N_{X_j}}{\sum_{j=1}^{N/2} |Y_j|^P} \times \frac{\sum_{j=1}^{N/2} P\,|Y_j|^{P-1} N_{Y_j}}{\sum_{j=1}^{N/2} P\,|Y_j|^{P-1} N_{Y_j}}.
\end{aligned}
\tag{25}
$$

To estimate the probability of error, the noise term can be analyzed because it moves the original $Z$ into a wrong region. Therefore, the distribution of each term of (24) can be estimated by the central limit theorem (CLT) because of the large number of samples in each block. Regardless of the type of the original speech signal distribution, and because of the independence between the signal and noise samples, the mean and variance of the noise can be computed as

$$
\begin{aligned}
\mu_{\text{Noise}} &= \mu_{\gamma_1} + \mu_{\gamma_2} + \mu_{\gamma_3}, \\
\sigma^2_{\text{Noise}} &= \sigma^2_{\gamma_1} + \sigma^2_{\gamma_2} + \sigma^2_{\gamma_3}.
\end{aligned}
\tag{26}
$$

By assuming equal probabilities for both the zero and one bits of the watermark data, the probability of error for a fixed quantization step ($\Delta$) can be estimated as

$$
P_e = \sum_{i=1}^{\infty} \frac{1}{2}\,\Pr\left\{T_{(i-1)/2} < Z_P < T_{(i+1)/2}\right\} \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty} \Pr\left\{V_{2j+i} < \hat{Z}_P < V_{2j+i+1}\right\}
\tag{27}
$$

A closed-form solution for (27) is computed as

$$
P_e = \sum_{i=1}^{\infty}\left(Q\!\left(\frac{T^{P}_{(i-1)/2} - \mu_Z}{\sigma_Z}\right) - Q\!\left(\frac{T^{P}_{(i+1)/2} - \mu_Z}{\sigma_Z}\right)\right) \times \sum_{j=-\lfloor i/2 \rfloor}^{\infty}\left(Q\!\left(\frac{V^{P}_{(i+2j)/2} - \mu_{\hat{Z}_P}}{\sigma_{\hat{Z}_P}}\right) - Q\!\left(\frac{V^{P}_{(i+2j+1)/2} - \mu_{\hat{Z}_P}}{\sigma_{\hat{Z}_P}}\right)\right)
\tag{28}
$$

where $Q(\cdot)$ is the complementary error function defined as $Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-u^2/2}\,du$, $T_i = i\Delta$, $V_i = (T_{i/2} + T_{(i+1)/2})/2$, and $\mu_Z$ and $\sigma_Z$ can be computed as in (11) and (12), respectively.
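Each difference of two $Q$ terms in (28) is the probability that a Gaussian variable falls inside an interval. The sketch below evaluates $Q(x)$ through the identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ and computes one such interval probability. The function names and the numeric values are illustrative assumptions, and the double sum of (28) is not reproduced.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = (1/sqrt(2*pi)) * integral_x^inf exp(-u^2/2) du."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def interval_probability(lower, upper, mean, std):
    """Pr{lower < Z < upper} for Gaussian Z, written as a difference of Q terms."""
    return q_function((lower - mean) / std) - q_function((upper - mean) / std)


# Illustrative values: probability that a standard Gaussian lies within one
# standard deviation of its mean.
print(interval_probability(-1.0, 1.0, 0.0, 1.0))
```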

4. Discussion on the Experimental Results

To validate the performance of the developed watermarking technique, a simulation was performed on the TIMIT database to verify the robustness, imperceptibility, and capacity of the technique. The TIMIT database included 630 speakers (438 males and 192 females) with a sampling frequency of 16 KHz [19]. Each speaker pronounced 10 sentences, for a total of 6300 sentences. For the experimental results, the average results of 630 speech signals with durations of 1 s to 3 s from the 630 speakers were used.

Figure 4 shows the bit error rate (BER) with respect to different $P$ for various frame lengths under a watermark-to-noise ratio (WNR) of 40 dB. In this figure, each curve is also plotted separately to make the changes visible. As can be observed, the BER was inversely related to the frame size: whenever the frame size decreased, the BER increased. Additionally, it seems that $P$ was not highly correlated with the BER for $P$ values greater than two. Only a small fluctuation can be observed for the BER when $P$ changed.

Figure 5 shows the BER with respect to different $P$ for various quantization steps. As expected, whenever the quantization step increased, the BER decreased. Furthermore, the variation of $P$ did not seriously change the BER. It must be mentioned that, because watermark detection is perfect under clean conditions, a small AWGN was added to the watermarked signals for the experiments shown in Figures 4 and 5.

Figure 6 shows the variation of the signal-to-noise ratio (SNR) with respect to different $P$ values for different frame lengths. There was not a significant difference in the SNR when the frame size increased. As can be observed, whenever the frame size increased, the energy level between the two sets of $L_X$ and $L_Y$ increased. Consequently, the ratio between them increased, which caused a lower SNR. Additionally, it seems that changing $P$ was not highly correlated with the SNR for different frame lengths.

Figure 4: (a) BER versus Lp-norms for different frame lengths under WNR = 40 dB; (b–f) each curve separately. [Plots: bit error rate (%) versus Lp-norm (1 to 7) for frame lengths of 40, 100, 200, 300, and 400 samples.]

Figure 7 illustrates the SNR with respect to different $P$ for various quantization steps. As observed, $P$ did not strongly affect the SNR. However, the quantization step strongly affected the SNR: as the quantization step increased, the SNR decreased.

Figure 5: BER versus Lp-norms for different quantization steps under WNR = 40 dB. [Plot: bit error rate (%) versus Lp-norm (1 to 7) for Δ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 6: SNR versus Lp-norms for different frame lengths. [Plot: SNR (dB) versus Lp-norm (1 to 7) for frame lengths of 40, 100, 200, 300, and 400 samples.]

To compute the payload of the proposed watermark, the capacity of a memoryless binary symmetric channel (BSC), $C_{\mathrm{BSC}}$, defined as

$$C_{\mathrm{BSC}} = R \times \left[1 + H(P_e)\right] \tag{29}$$

where

$$H(P_e) = P_e \times \log_2(P_e) + (1 - P_e) \times \log_2(1 - P_e) \tag{30}$$

was applied to estimate the capacity of the channel with bitrate ($R$) for error-free watermark transmission [20].
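Equations (29) and (30) translate directly into a few lines of code. The sketch below uses hypothetical function names and treats $0 \times \log_2(0)$ as zero, which is the usual convention for the boundary cases $P_e = 0$ and $P_e = 1$.

```python
import math


def neg_binary_entropy(p_e):
    """H(P_e) as defined in (30); note that it is non-positive."""
    total = 0.0
    for p in (p_e, 1.0 - p_e):
        if p > 0.0:
            # Terms with p == 0 are skipped, following 0 * log2(0) := 0.
            total += p * math.log2(p)
    return total


def bsc_capacity(bitrate, p_e):
    """BSC capacity of (29) for a channel bitrate R and error probability P_e."""
    return bitrate * (1.0 + neg_binary_entropy(p_e))


# Illustrative values: R = 64 kbps with error probabilities 0, 0.11, and 0.5.
for p_e in (0.0, 0.11, 0.5):
    print(p_e, bsc_capacity(64000.0, p_e))
```

The capacity equals the full bitrate for an error-free channel and drops to zero when $P_e = 0.5$, where the received bits carry no information about the transmitted bits.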

Because the sampling rate of the TIMIT database was 16 KHz, $R$ was assumed to be 64 Kbps (8 KHz for speech bandwidth × 8 bits per sample = 64 Kbps) for a telephony channel, and $P_e$ was assumed to be equal to the BER in the watermark detection process. Figure 8 shows the BSC capacity for different WNRs for various quantization steps. As observed, the capacity increased whenever the WNR increased. This is because the watermark was extracted with a minimum BER when the WNR increased. Moreover, it can be inferred that the BSC capacity increased as the quantization step increased because the watermark was embedded with high intensity when the quantization step increased. As observed, the BSC capacity for smaller quantization steps (Δ ≤ 0.25) was approximately zero under a highly noisy channel.

Figure 7: SNR versus Lp-norms for various quantization steps. [Plot: SNR (dB) versus Lp-norm (1 to 7) for Δ = 0.1, 0.25, 0.5, 0.75, and 0.95.]

Figure 9 shows the variation of the BSC capacity with respect to different WNRs for different frame lengths. As observed, it seems that under serious noise, the frame size was not a significant factor for the BSC capacity. Despite this, the frame size was likely to be important whenever the WNR increased. Thus, for a large WNR, whenever the frame size increased, the BER in the watermark detection process decreased, which caused an improvement in the BSC capacity.

To demonstrate the efficiency and performance of the proposed speech watermarking technique, the robustness, capacity, and inaudibility of the proposed technique must be compared with other state-of-the-art speech watermarking techniques.

Table 1 describes the benchmark for simulating the resultsfor the robustness test Many of these attacks are based on theStirMark Benchmark for Audio (SMBA) [24]


Table 1: Benchmark for speech watermarking.

| Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label |
|---|---|---|---|---|---|
| Additive noise | AddBrumm | Adds buzz or a low-frequency sinus tone to the watermarked signal to simulate the impact of a power supply | ⟨STRENGTH⟩, ⟨FREQUENCY⟩ | 2500, 55 to 3000, 75 | A |
| | AddDynNoise | Adds dynamic white noise to the watermarked signal | ⟨STRENGTH⟩ | 20 to 40 | B |
| | AddFFTNoise | Adds white noise to the watermarked signal in the frequency domain | ⟨FFTSIZE⟩, ⟨STRENGTH⟩ | 256, 1000 to 1024, 3000 | C |
| | AddNoise | White Gaussian noise contaminates the watermarked signal to simulate ambient distortion | ⟨STRENGTH⟩ | 35 dB level to 5 dB | D |
| | AddSinus | Adds a sinus signal to the watermarked signal | ⟨AMPLITUDE⟩, ⟨FREQUENCY⟩ | 120, 3000 to 150, 3500 | E |
| Conversion | Resampling | The sampling rate of the watermarked signal is converted to ⟨SAMPLERATE1⟩ and then reconverted to ⟨SAMPLERATE2⟩ | ⟨SAMPLERATE1⟩, ⟨SAMPLERATE2⟩ | 4 kHz, 16 kHz to 8 kHz, 16 kHz | F |
| | Requantization | Each sample of the watermarked signal is quantized to ⟨QUANTIZATION1⟩ and then requantized to ⟨QUANTIZATION2⟩ | ⟨QUANTIZATION1⟩, ⟨QUANTIZATION2⟩ | 8 bits and 16 bits | G |
| | Invert | Inverts all samples in the watermarked signal, like a 180-degree phase shift | No parameter required | None | H |
| Ambience | Echo | An echo with delay ⟨DELAY⟩ and decay ⟨DECAY⟩ is added to the watermarked signal | ⟨DELAY⟩, ⟨DECAY⟩ | 20 ms and 10 to 100 ms and 50 | I |
| Sample permutations | Cut samples | ⟨REMOVENUMBER⟩ samples are removed from the watermarked signal in every ⟨REMOVEDIST⟩ period | ⟨REMOVEDIST⟩, ⟨REMOVENUMBER⟩ | 1 and 1000 to 7 and 1000 | J |
| | Copy samples | Some of the samples of the watermarked signal are copied between the sample values | ⟨PERIOD⟩, ⟨COPYDIST⟩, ⟨COPYCOUNT⟩ | 1000, 100, 30 to 1000, 200, 60 | K |
| | LSB Zero | Set all samples of the watermarked signal to zero | No parameter required | None | L |
| | Smooth | The new sample value depends on the samples before and after the modifying point | No parameter required | None | M |
| | Stat1 | Averages each sample with its next neighbors | No parameter required | None | N |
| Dynamics | Amplify | The amplitude of the watermarked signal is increased up to ⟨FACTOR1⟩ and decreased down to ⟨FACTOR2⟩, respectively | ⟨FACTOR1⟩, ⟨FACTOR2⟩ | 150 and 75, 200 and 50 | O |
| | Denoising | The watermarked signal is denoised by ⟨FACTOR⟩ | ⟨FACTOR⟩ | −80 dB to −60 dB | P |
| Filters | Low Pass Filter (LPF) | The watermarked signal is filtered by an elliptic LPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 5 kHz to 4 kHz | Q |
| | Band Pass Filter (BPF) | The watermarked signal is filtered by an elliptic filter with a bandwidth from ⟨FREQUENCY1⟩ to ⟨FREQUENCY2⟩ to simulate a narrowband telephony channel | ⟨FREQUENCY1⟩, ⟨FREQUENCY2⟩ | 500 Hz & 4000 Hz to 300 Hz & 3400 Hz | R |
| | High Pass Filter (HPF) | The watermarked signal is filtered by an elliptic HPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 500 Hz to 800 Hz | S |
| Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T |
| | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U |
| Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then decoded to the original one | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V |
| | MP3 compression | The watermarked signal is compressed by MP3 at different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W |
| | G711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X |

Figure 8: Variation of the BSC capacity ($C_{\mathrm{BSC}}$, ×10^4) with respect to different WNRs (dB) for different quantization steps (Δ = 0.1, 0.25, 0.5, 0.75, 0.95).

Table 2 compares the BER of the proposed technique with those of state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique; it can be measured based on the mean opinion score (MOS) (as proposed by the International Telecommunication Union (ITU-T) [23]) and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal, as presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity ($C_{\mathrm{BSC}}$, ×10^4) with respect to different WNRs (dB) for different frame lengths (40, 100, 200, 300, and 400 samples).

An objective evaluation technique, such as the SWR or SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_{n} S^{2}}{\sum_{n} (\hat{S} - S)^{2}}, \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
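The SNR in Eq. (31) can be computed directly from the two sample sequences. The following Python sketch is only an illustration; the function name `snr_db` and the choice to return infinity for identical signals are assumptions made for this example and are not part of the original evaluation code.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], watermarked: Sequence[float]) -> float:
    """SNR in dB as in Eq. (31): 10 * log10(sum S^2 / sum (S_hat - S)^2)."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in original)
    distortion_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if distortion_energy == 0.0:
        # Assumed convention: identical signals give an unbounded SNR.
        return math.inf
    return 10.0 * math.log10(signal_energy / distortion_energy)


if __name__ == "__main__":
    host = [1.0, -1.0, 1.0, -1.0]
    marked = [1.1, -1.1, 1.1, -1.1]  # every sample perturbed by 10 percent
    print(f"SNR = {snr_db(host, marked):.2f} dB")
```

In this toy case the distortion energy is one hundredth of the signal energy, so the SNR is about 20 dB.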

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. The results suggest that the proposed speech watermarking technique outperformed the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those of formant tuning [21] and Analysis-by-Synthesis [22].


Table 2: Comparison of the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4, each entry is bounded between two values that relate a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. A false positive occurs when an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, a false negative occurs when a watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

$$P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{\mathrm{FN}}^{\,i} \left(1 - P_{\mathrm{FN}}\right)^{N-i}}_{\mathrm{FNP}} + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{\mathrm{FP}}^{\,i} \left(1 - P_{\mathrm{FP}}\right)^{N-i}}_{\mathrm{FPP}}, \tag{32}$$

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, $P_{\mathrm{FP}}$ is the probability of a false positive, which is assumed to be 0.5, $P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil. \tag{33}$$

Figure 10: FPP (×10^−4) with respect to the total number of watermark bits for different BERs (BER = 0.19; BER = 0.20, shifted by adding 0.000015; BER = 0.21, shifted by adding 0.00009).

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.
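The two binomial sums in Eq. (32) and the threshold in Eq. (33) can be evaluated numerically as sketched below. This is a minimal illustration that follows the equations as written; the function names (`threshold`, `false_negative_probability`, `false_positive_probability`) are hypothetical, and the sketch is not the code used to produce Figures 10 and 11.

```python
import math


def threshold(ber: float, n_bits: int) -> int:
    """Threshold T = ceil((1 - BER) * N) from Eq. (33)."""
    return math.ceil((1.0 - ber) * n_bits)


def binomial_sum(n_bits: int, p: float, start: int, stop: int) -> float:
    """Sum of C(N, i) * p^i * (1 - p)^(N - i) for i = start..stop inclusive."""
    return sum(
        math.comb(n_bits, i) * p**i * (1.0 - p) ** (n_bits - i)
        for i in range(start, stop + 1)
    )


def false_negative_probability(n_bits: int, t: int, p_fn: float) -> float:
    """First term of Eq. (32): i runs from 0 to T - 1."""
    return binomial_sum(n_bits, p_fn, 0, t - 1)


def false_positive_probability(n_bits: int, t: int, p_fp: float = 0.5) -> float:
    """Second term of Eq. (32): i runs from T to N."""
    return binomial_sum(n_bits, p_fp, t, n_bits)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        t = threshold(0.20, n)
        print(n, t, false_positive_probability(n, t))
```

With $P_{\mathrm{FP}} = 0.5$, the second sum shrinks rapidly as $N$ grows, which agrees with the observation that the FPP is close to zero for larger numbers of watermark bits.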

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.

Figure 11: FNP with respect to the total number of watermark bits for different BERs (BER = 0.19; BER = 0.20, shifted by adding 0.015; BER = 0.21, shifted by adding 0.025).

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only showed that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion of $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$f(x, y) = f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y) + \frac{1}{2}\left\{ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right\} + \text{remainder}. \tag{A1}$$

Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{ f''_{xx}(\theta)\,\mathrm{var}(X) + 2 f''_{xy}(\theta)\,\mathrm{cov}(X, Y) + f''_{yy}(\theta)\,\mathrm{var}(Y) \right\} + O(n^{-1}). \tag{A2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^{3}$. Then, the mean and variance of the ratio between $R$ and $S$ can be estimated, respectively, as follows (the last equality for the mean holds when $\mathrm{cov}(R, S) = 0$):

$$\begin{aligned}
E\left(\frac{R}{S}\right) &\equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\mathrm{cov}(R, S)}{E(S)^{2}} + \frac{\mathrm{var}(S)\,E(R)}{E(S)^{3}} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^{2}}{\mu_S^{2}}\right), \\
\mathrm{var}\left(\frac{R}{S}\right) &\approx \frac{1}{E(S)^{2}}\,\mathrm{var}(R) + 2\,\frac{-E(R)}{E(S)^{3}}\,\mathrm{cov}(R, S) + \frac{E(R)^{2}}{E(S)^{4}}\,\mathrm{var}(S) \\
&= \frac{\mu_R^{2}}{\mu_S^{2}}\left[\frac{\sigma_R^{2}}{\mu_R^{2}} - 2\,\frac{\mathrm{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^{2}}{\mu_S^{2}}\right] \\
&= \frac{\mu_R^{2}}{\mu_S^{2}}\left(\frac{\sigma_S^{2}}{\mu_S^{2}} - \frac{\sigma_S^{4}}{\mu_S^{4}}\right) + \frac{\sigma_R^{2}}{\mu_S^{2}}.
\end{aligned} \tag{A3}$$
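The general Taylor-series estimates in (A3), before the final simplification, translate into a few lines of code. The sketch below is illustrative only; the function name `ratio_mean_var` and its argument names are hypothetical, and it implements the expressions that keep the covariance term explicit rather than the last simplified line of (A3).

```python
from typing import Tuple


def ratio_mean_var(
    mean_r: float,
    mean_s: float,
    var_r: float,
    var_s: float,
    cov_rs: float = 0.0,
) -> Tuple[float, float]:
    """Taylor-series estimates of E(R/S) and var(R/S), general forms in (A3)."""
    if mean_s == 0.0:
        raise ValueError("E(S) must be nonzero for the expansion to apply")
    # E(R/S) ~ E(R)/E(S) - cov(R,S)/E(S)^2 + var(S) * E(R)/E(S)^3
    mean_ratio = (
        mean_r / mean_s - cov_rs / mean_s**2 + var_s * mean_r / mean_s**3
    )
    # var(R/S) ~ var(R)/E(S)^2 - 2 E(R) cov(R,S)/E(S)^3 + E(R)^2 var(S)/E(S)^4
    var_ratio = (
        var_r / mean_s**2
        - 2.0 * mean_r * cov_rs / mean_s**3
        + mean_r**2 * var_s / mean_s**4
    )
    return mean_ratio, var_ratio


if __name__ == "__main__":
    # Toy values chosen only to make the arithmetic easy to follow.
    print(ratio_mean_var(mean_r=2.0, mean_s=4.0, var_r=1.0, var_s=4.0))
```

For the toy values shown, the mean estimate is 0.5 + 0.125 = 0.625 and the variance estimate is 0.0625 + 0.0625 = 0.125.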

B. Compute the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

$$E(|X|^{n}) = \int_{-\infty}^{\infty} |x|^{n} \cdot \frac{1}{2b} \cdot e^{-|x - \mu|/b}\,dx = \frac{1}{2b}\int_{-\infty}^{\infty} |x|^{n} \cdot e^{-|x - \mu|/b}\,dx. \tag{B1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$E(|X|^{n}) = \begin{cases} \dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |x|^{n} \cdot e^{-(x - \mu)/b}\,dx, & \text{if } X \ge \mu, \\[2ex] \dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |x|^{n} \cdot e^{-(\mu - x)/b}\,dx, & \text{if } X < \mu. \end{cases} \tag{B2}$$

For the first case, when $X \ge \mu$,

$$E(|X|^{n}) = \frac{1}{2b}\left[\int_{-\infty}^{0} (-x)^{n} \cdot e^{-(x - \mu)/b}\,dx + \int_{0}^{\infty} x^{n} \cdot e^{-(x - \mu)/b}\,dx\right] = \frac{e^{\mu/b}}{2b}\left[(-1)^{n} \underbrace{\int_{-\infty}^{0} x^{n} e^{x/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} x^{n} e^{-x/b}\,dx}_{I}\right]. \tag{B3}$$

If $t = -x/b$, then $I$ can be expressed as

$$I = b^{n+1}\int_{0}^{\infty} t^{n} e^{-t}\,dt = b^{n+1} \cdot n!. \tag{B4}$$

$I_n$ can also be expressed as

$$I_n = \int_{-\infty}^{0} x^{n} e^{x/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^{n} e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^{n} e^{-t}\,dt = \left.\frac{t^{n} e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1}. \tag{B5}$$

Substituting (B4) and (B5) into (B3), the absolute moment of the Laplacian distribution can be computed based on

$$E(|X|^{n}) = \left(\frac{e^{\mu/b}\, b^{n}}{2}\right)\left[(-1)^{n} \cdot I_n + n!\right]. \tag{B6}$$
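As a numerical sanity check on the definition in (B1), the following Python sketch integrates $|x|^n$ against the Laplacian density with a simple midpoint rule. For the zero-mean case ($\mu = 0$) it compares the result with the standard closed form $b^{n}\,n!$, which is stated here as a known property of the Laplacian distribution rather than as a result taken from this appendix. All function names and the integration settings (`half_width`, `steps`) are assumptions made for this example.

```python
import math


def laplacian_abs_moment_numeric(
    n: int,
    b: float,
    mu: float = 0.0,
    half_width: float = 60.0,
    steps: int = 200_000,
) -> float:
    """Midpoint-rule estimate of E(|X|^n) for a Laplacian with location mu, scale b.

    The integral in (B1) is truncated to [mu - half_width*b, mu + half_width*b];
    the neglected tails are of order exp(-half_width).
    """
    lo = mu - half_width * b
    hi = mu + half_width * b
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx  # midpoint of the k-th subinterval
        total += abs(x) ** n * math.exp(-abs(x - mu) / b) / (2.0 * b)
    return total * dx


def laplacian_abs_moment_zero_mean(n: int, b: float) -> float:
    """Closed form b^n * n! for the zero-mean Laplacian (standard result)."""
    return math.factorial(n) * b**n


if __name__ == "__main__":
    for n in (1, 2, 3):
        numeric = laplacian_abs_moment_numeric(n, b=0.5)
        closed = laplacian_abs_moment_zero_mean(n, b=0.5)
        print(n, numeric, closed)
```

The numerical and closed-form values should agree to several decimal places for small $n$; for nonzero $\mu$ only the numerical routine applies.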

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, "An overview of digital speech watermarking," International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, "A DWT-based rational dither modulation scheme for effective blind audio watermarking," Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, "Semi-fragile digital speech watermarking for online speaker recognition," Eurasip Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, "Hyperbolic RDM for nonlinear valumetric distortions," IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, "Difference angle quantization index modulation scheme for image watermarking," Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.

[8] X. Zhu and S. Peng, "A novel quantization watermarking scheme by modulating the normalized correlation," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, "Blind image watermarking using a sample projection approach," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, "A logarithmic quantization index modulation for perceptually better data hiding," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, "A novel gain invariant quantization-based watermarking approach," IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker verification security improvement by means of speech watermarking," Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker identification security improvement by means of speech watermarking," Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, "Robust digital speech watermarking for online speaker recognition," Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, "Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition," Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, "Speaker frame selection for digital speech watermarking," National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, "Speech probability distribution," IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, "Robust audio and speech watermarking using Gaussian and Laplacian modeling," Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, "A general formula for channel capacity," IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, "Speech watermarking method based on formant tuning," IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, "Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization," Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., "StirMark benchmark: audio watermarking attacks," in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, "An audio watermarking scheme using singular value decomposition and dither-modulation quantization," Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.


Figure 4: (a) BER versus Lp-norms for different frame lengths (40, 100, 200, 300, and 400 samples) under WNR = 40 dB; (b–f) each curve separately.

the frame size increased, the energy level between the two sets $L_X$ and $L_Y$ increased. Consequently, the ratio between them increased, which caused a lower SNR. Additionally, changing $P$ did not appear to be highly correlated with the SNR for different frame lengths.

Figure 7 illustrates the SNR with respect to different $P$ for various quantization steps. As observed, $P$ did not highly affect the SNR. However, the quantization step highly affected the SNR: as the quantization step increased, the SNR decreased.

Figure 5: BER versus Lp-norms for different quantization steps (Δ = 0.1, 0.25, 0.5, 0.75, 0.95) under WNR = 40 dB.

Figure 6: SNR (dB) versus Lp-norms for different frame lengths (40, 100, 200, 300, and 400 samples).

To compute the payload of the proposed watermark amemoryless binary symmetric channel (BSC) (119862BSC) definedas

119862BSC = 119877 times [1 + 119867 (119875119890)] (29)

where

119867(119875119890) = 119875119890 times log(119875119890)2 + (1 minus 119875119890) times log(1minus119875119890)2 (30)

was applied to estimate the capacity of the channel withbitrate (119877) for error-free watermark transmission [20]

Because the sampling rate of the TIMIT was 16KHz 119877was assumed to be 64Kbps (8 KHz for speech bandwidth times8 bits per sample = 64Kbps) for a telephony channel and119875119890 was assumed to be equal to the BER in the watermarkdetection process Figure 8 shows the amount of the BSC fordifferent WNRs for various quantization steps As observedthe capacity increased whenever the WNR increased This isbecause the watermark was extracted with a minimum BERwhen the WNR increased Moreover it can be inferred thatthe amount of the BSC increased while the quantization step

Different quantization rate

Δ = 095

Δ = 05

Δ = 025

Δ = 01 Δ = 075

2 3 4 5 6 7132343638404244464850

SNR

(dB)

Lp-norm

Figure 7 SNR versus Lp-norms for various quantization steps

increased because the watermark was embedded with highintensity when the quantization step increased As observedthe BSC capacity for fewer quantization steps (Δ le 025) wasapproximately zero under a high noisy channel

Figure 9 shows the variation of the BSC capacity withrespect to different WNRs for different frame lengths Asobserved it seems that under serious noise the frame sizewas not a significant factor for the BSC capacity Despite thisthe frame size was likely to be important whenever theWNRincreasedThus for a largeWNR it is obvious that wheneverthe frame size increased the BER in the watermark detectionprocess decreased which caused an improvement in the BSCcapacity

To demonstrate the efficiency and performance of theproposed speech watermarking technique the robustnesscapacity and inaudibility of the proposed technique must becompared with other state-of-the-art speech watermarkingtechniques

Table 1 describes the benchmark for simulating the resultsfor the robustness test Many of these attacks are based on theStirMark Benchmark for Audio (SMBA) [24]

8 Security and Communication Networks

Table 1 Benchmark for speech watermarking

Attack type Attack name Description Parameter(s) Defaultvalue(s)

Additive Noise

AddBrummIt adds buzz or low frequency sinustone to the watermarked signal to

simulate the impact of a power supply⟨STRENGTH⟩ ⟨FREQUENCY⟩ 2500 55 to

3000 75 A

AddDynNoise It adds a dynamic white noise to thewatermarked signal ⟨STRENGTH⟩ 20 to 40 B

AddFFTNoiseIt adds white noise to the

watermarked signal in the frequencydomain

⟨FFTSIZE⟩ ⟨STRENGTH⟩ 256 1000 to1024 3000 C

AddNoiseA white Gaussian noise is

contaminated the watermarked signalto simulate ambient distortion

⟨STRENGTH⟩ 35 dB level to5 dB D

AddSinus It adds a sinus signal to thewatermarked signal ⟨AMPLITUDE⟩ ⟨FREQUENCY⟩ 120 3000 to

150 3500 E

Conversion

Resampling

The sampling rate of the watermarkedsignal is converted to⟨SAMPLERATE1⟩ and then is

reconverted to ⟨SAMPLERATE2⟩⟨SAMPLERATE1⟩ ⟨SAMPLERATE2⟩ 4KHz 16 KHz

to8KHz 16 KHz

F

Requantization

The sample of the watermarked signalis quantized to ⟨QUANTIZATION1⟩

and then is requantized to⟨QUANTIZATION2⟩⟨QUANTIZATION1⟩ ⟨QUANTIZATION2⟩ 8 bits and 16

bits G

InvertIt inverts all samples in the

watermarked signal like a 180 degreephase shift

NO PARAMETER REQUIRED None H

Ambience EchoAn echo with a delay ⟨DELAY⟩ anddecay ⟨DECAY⟩ is added to the

watermarked signal⟨DELAY⟩ ⟨DECAY⟩ 20ms and 10

to 100ms and50

I

Samplepermutations

Cut samples⟨REMOVENUMBER⟩ samples are

removed from the watermarked signalfrom every ⟨REMOVEDIST⟩ period ⟨REMOVEDIST⟩ ⟨REMOVENUMBER⟩ 1 and 1000 to 7

and 1000 J

Copy samplesSome of the samples of the

watermarked signal are copiedbetween the samples values

⟨PERIOD⟩ ⟨COPYDIST⟩ ⟨COPYCOUNT⟩ 1000 100 30to

1000 200 60K

LSB Zero Set all samples of the watermarkedsignal to zero NO PARAMETER REQUIRED None L

SmoothThe new sample value depends on the

samples before and after themodifying point

NO PARAMETER REQUIRED None M

Stat1 It averages the sample with its nextneighbors NO PARAMETER REQUIRED None N

DynamicsAmplify

The amplitude of the watermarkedsignal is increased up to ⟨FACTOR1⟩

and is decreased down to⟨FACTOR2⟩ respectively⟨FACTOR1⟩ ⟨FACTOR2⟩ 150 and 75

200 and 50 O

Denoising The watermarked signal is denoisedby ⟨FACTOR⟩ ⟨FACTOR⟩ minus80 dB tominus60 dB P

Filters

Low Pass Filter(LPF)

The watermarked signal is filtered byan elliptic LPF with cutoff frequency

of ⟨FREQUENCY⟩ ⟨FREQUENCY⟩ 5KHz to4KHz Q

Band PassFilter (BPF)

The watermarked signal is filtered byan elliptic filter with bandwidth from⟨FREQUENCY1⟩ to⟨FREQUENCY2⟩ to simulate a

narrowband telephony channel

⟨FREQUENCY1⟩ ⟨FREQUENCY2⟩500Hz amp4000Hz to300Hz amp3400Hz

R

High PassFilter (HPF)

The watermarked signal is filtered byan elliptic HPF with cutoff frequency

of ⟨FREQUENCY⟩ ⟨FREQUENCY⟩ 500Hz to800Hz S

Security and Communication Networks 9

Table 1 Continued

Attack type Attack name Description Parameter(s) Defaultvalue(s)

Time stretchand pitch shift

Pitch scaleThe pitch of the watermarked signal isnonlinearly scaled without changing

the time⟨SCALEFACTOR⟩ 105 to 1 10 T

Time stretch The time of the watermarked signal isnonlinearly stretched ⟨TEMPOFACTOR⟩ 105 to 110 U

Compression

CELP coding

The watermarked signal is coddedwith rate of ⟨BITRATE⟩ by CELPcodecs and then is decoded to

original one

⟨BITRATE⟩ 16 Kbps to96 kbps V

MP3compression

The watermarked signal iscompressed by MP3 with different

rate ⟨BITRATE⟩ ⟨BITRATE⟩ 128 to 32 W

G711 The watermarked signal is codded bystandard 64 kbps A120583-law PCM NO PARAMETER REQUIRED None X

times104

Δ = 095

Δ = 05

Δ = 025

Δ = 01 Δ = 075

10 15 20 25 30 35 40 45 50 555WNR (dB)

0

1

2

3

4

5

6

7

CBS

C

Figure 8 Variation of the BSC capacity with respect to differentWNRs for different quantization steps

Table 2 compares the BER of the proposed method with that of state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique. It can be measured based on the mean opinion score (MOS) (as proposed by the International Telecommunication Union (ITU-T) [23]) and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal, with the grades presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths (curves for frame lengths of 40, 100, 200, 300, and 400 samples; x-axis: WNR (dB); y-axis: C_BSC, ×10^4).

An objective evaluation technique, such as the SWR or SNR, attempts to quantify this amount based on the following formula:

\[
\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_{n} S^{2}}{\sum_{n} (\hat{S} - S)^{2}}, \tag{31}
\]

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
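As a minimal sketch of Equation (31), the following Python function computes the SNR between an original signal and its watermarked version. The function name `snr_db` and the toy sine signal are illustrative assumptions and are not part of the proposed technique.

```python
import math

def snr_db(original, watermarked):
    """Equation (31): 10 * log10(sum(S^2) / sum((S_hat - S)^2))."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0:
        return math.inf    # identical signals: no embedding distortion
    if signal_energy == 0:
        return -math.inf   # silent host signal
    return 10.0 * math.log10(signal_energy / noise_energy)

# Toy data (hypothetical): a short sine "host" plus a small alternating perturbation.
host = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
marked = [s + 0.01 * (-1) ** n for n, s in enumerate(host)]
print(round(snr_db(host, marked), 2))
```

In practice the two lists would hold the samples of one speech file before and after embedding.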

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, the proposed speech watermarking technique appears to outperform the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those of formant tuning [21] and Analysis-by-Synthesis [22].


Table 2: Comparison of the robustness of different speech watermarking techniques in terms of BER (%).

Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22]
No attack | 0.00 | 0.00 | 0.04 | 0.06
A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65
B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23
C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32
D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98
E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32
F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06
G | 3.23 | 2.65 | 19.32 | 23.87
H | 0.23 | 0.00 | 9.43 | 12.45
I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43
J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87
K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43
L | 0.92 | 1.98 | 1.50 | 4.04
M | 3.12 | 5.76 | 10.34 | 21.68
N | 4.10 | 4.23 | 6.65 | 9.54
O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54
P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45
Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32
R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98
S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76
T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32
U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43
V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32
W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32
X | 23.11 | 24.17 | 48.32 | 50.65
Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48

Table 3: MOS grades [23].

MOS | Quality | Quality scale | Effort required to understand meaning scale
5 | Excellent | Imperceptible | No effort required
4 | Good | Perceptible but not annoying | No appreciable effort required
3 | Fair | Slightly annoying | Moderate effort required
2 | Poor | Annoying | Considerable effort required
1 | Bad | Very annoying | No meaning was understood

As observed in Table 4, each entry is bounded between two values that relate a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, the imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps)
Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50
Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50
DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125
The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. A false positive occurs when an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, a false negative occurs when a watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

\[
P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{\mathrm{FN}}^{\,i} \left(1 - P_{\mathrm{FN}}\right)^{N-i}}_{\mathrm{FNP}}
    + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{\mathrm{FP}}^{\,i} \left(1 - P_{\mathrm{FP}}\right)^{N-i}}_{\mathrm{FPP}}, \tag{32}
\]

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, $P_{\mathrm{FP}}$ is the probability of a false positive, which is assumed to be 0.5, $P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

\[
T = \lceil (1 - \mathrm{BER}) \times N \rceil. \tag{33}
\]

Figure 10: FPP with respect to the total number of watermark bits for different BERs (curves for BER = 0.19, 0.20, and 0.21; the BER = 0.21 and BER = 0.20 curves are shifted by adding 0.00009 and 0.000015, respectively; x-axis: number of watermark bits; y-axis: false positive probability, ×10^−4).
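The threshold in Equation (33) and the FPP term of Equation (32) can be evaluated numerically. The sketch below is one possible way to do so in Python. The function names are illustrative assumptions, only the FPP sum is evaluated, and the rounding step before the ceiling is an implementation choice to guard against floating-point noise.

```python
import math

def threshold(n_bits, ber):
    """Equation (33): T = ceil((1 - BER) * N)."""
    # Round first so that float noise such as 79.00000000000001 does not bump T.
    return math.ceil(round((1.0 - ber) * n_bits, 9))

def binomial_sum(n_bits, p, first, last):
    """Sum of C(N, i) * p^i * (1 - p)^(N - i) for i = first..last (inclusive)."""
    return sum(
        math.comb(n_bits, i) * p**i * (1.0 - p) ** (n_bits - i)
        for i in range(first, last + 1)
    )

def false_positive_probability(n_bits, ber, p_fp=0.5):
    """FPP term of Equation (32): i runs from T to N, with P_FP = 0.5 by default."""
    return binomial_sum(n_bits, p_fp, threshold(n_bits, ber), n_bits)

for n_bits in (50, 100, 200, 400):
    print(n_bits, false_positive_probability(n_bits, ber=0.20))
```

The printed values shrink quickly as the number of watermark bits grows, which matches the qualitative trend reported for the FPP.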

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.
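The idea summarized above, quantizing the ratio of Lp-norms of the odd and even samples and detecting the bit from the nearest quantization point, can be illustrated with a small Python sketch. This is not the authors' implementation: the embedding step below simply rescales the odd-indexed samples instead of using the Lagrange optimization, and the function names, the binary dither lattice, and the values p = 2 and Δ = 0.5 are all assumptions made for illustration.

```python
import math

def lp_norm(values, p):
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def norm_ratio(frame, p):
    """Ratio of Lp-norms: odd-indexed samples over even-indexed samples."""
    return lp_norm(frame[1::2], p) / lp_norm(frame[0::2], p)

def qim_quantize(x, bit, delta):
    """Nearest point of the lattice delta * k + bit * delta / 2 (binary QIM)."""
    offset = bit * delta / 2.0
    return delta * round((x - offset) / delta) + offset

def embed_bit(frame, bit, p=2, delta=0.5):
    """Naive embedding (assumption): rescale odd-indexed samples to hit the lattice."""
    current = norm_ratio(frame, p)
    target = qim_quantize(current, bit, delta)
    if target <= 0:
        target += delta  # stay on the same lattice but avoid zeroing the samples
    scale = target / current
    return [v * scale if i % 2 else v for i, v in enumerate(frame)]

def detect_bit(frame, p=2, delta=0.5):
    """Blind detection: choose the bit whose lattice is nearest to the ratio."""
    x = norm_ratio(frame, p)
    return min((0, 1), key=lambda b: abs(x - qim_quantize(x, b, delta)))

# Toy frame (hypothetical data) and an amplitude scaling attack with gain 0.3.
host_frame = [math.sin(0.37 * n) + 0.5 for n in range(40)]
for bit in (0, 1):
    marked = embed_bit(host_frame, bit)
    attacked = [0.3 * v for v in marked]
    print(bit, detect_bit(attacked))
```

Because a common gain multiplies both norms equally, the ratio, and therefore the detected bit, is unaffected by amplitude scaling in this sketch.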

Figure 11: FNP with respect to the total number of watermark bits for different BERs (curves for BER = 0.19, 0.20, and 0.21; the BER = 0.21 and BER = 0.20 curves are shifted by adding 0.025 and 0.015, respectively; x-axis: number of watermark bits; y-axis: false negative probability).

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, the experimental results not only showed that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

\[
\begin{aligned}
f(x, y) = {} & f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y) \\
& + \frac{1}{2}\left[ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right] \\
& + \text{remainder}.
\end{aligned} \tag{A.1}
\]

Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

\[
E(f(X, Y)) = f(\theta) + \frac{1}{2}\left[ f''_{xx}(\theta)\,\mathrm{var}(X) + 2 f''_{xy}(\theta)\,\mathrm{cov}(X, Y) + f''_{yy}(\theta)\,\mathrm{var}(Y) \right] + O(n^{-1}). \tag{A.2}
\]

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^{3}$. Then, the mean and variance of the ratio between $R$ and $S$, respectively, can be estimated as follows:

\[
\begin{aligned}
E\!\left(\frac{R}{S}\right) \equiv E(f(R, S)) & \approx \frac{E(R)}{E(S)} - \frac{\mathrm{cov}(R, S)}{E(S)^{2}} + \frac{\mathrm{var}(S)\,E(R)}{E(S)^{3}} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^{2}}{\mu_S^{2}}\right), \\
\mathrm{var}\!\left(\frac{R}{S}\right) & \approx \frac{1}{E^{2}(S)}\,\mathrm{var}(R) + 2\,\frac{-E(R)}{E^{3}(S)}\,\mathrm{cov}(R, S) + \frac{E^{2}(R)}{E^{4}(S)}\,\mathrm{var}(S) \\
& = \frac{\mu_R^{2}}{\mu_S^{2}}\left[\frac{\sigma_R^{2}}{\mu_R^{2}} - 2\,\frac{\mathrm{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^{2}}{\mu_S^{2}}\right] \\
& = \frac{\mu_R^{2}}{\mu_S^{2}}\left(\frac{\sigma_S^{2}}{\mu_S^{2}} - \frac{\sigma_S^{4}}{\mu_S^{4}}\right) + \frac{\sigma_R^{2}}{\mu_S^{2}},
\end{aligned} \tag{A.3}
\]

where the last equality for the mean holds when $\mathrm{cov}(R, S) = 0$.
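The Taylor-series estimates can be compared with a Monte Carlo simulation. The following Python sketch does this for two independent Laplacian variables, so that cov(R, S) = 0. It uses the mean from (A.3) and the bracketed variance expression from (A.3) with the covariance term dropped. The parameter values, the sample size, and the function names are assumptions chosen for illustration, and the denominator is kept far from zero so that the ratio is well behaved.

```python
import random
import statistics

def laplace_sample(rng, mu, b):
    """Laplace(mu, b) draw: mu plus the difference of two exponential draws with mean b."""
    return mu + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def ratio_moments_taylor(mu_r, var_r, mu_s, var_s):
    """Taylor-series estimates of E(R/S) and var(R/S), assuming cov(R, S) = 0."""
    mean = (mu_r / mu_s) * (1.0 + var_s / mu_s**2)
    variance = (mu_r**2 / mu_s**2) * (var_r / mu_r**2 + var_s / mu_s**2)
    return mean, variance

rng = random.Random(0)                         # fixed seed for repeatability
MU_R, B_R, MU_S, B_S = 2.0, 0.3, 10.0, 0.5     # hypothetical parameters
ratios = [
    laplace_sample(rng, MU_R, B_R) / laplace_sample(rng, MU_S, B_S)
    for _ in range(100_000)
]

# The variance of a Laplace(mu, b) variable is 2 * b^2.
taylor_mean, taylor_var = ratio_moments_taylor(MU_R, 2 * B_R**2, MU_S, 2 * B_S**2)
print("mean:", taylor_mean, statistics.fmean(ratios))
print("variance:", taylor_var, statistics.pvariance(ratios))
```

The approximation is expected to degrade when the denominator has appreciable probability mass near zero, which this example deliberately avoids.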

B. Compute the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

\[
E(|X|^{n}) = \int_{-\infty}^{\infty} |X|^{n} \cdot \frac{1}{2b} \cdot e^{-(|X - \mu|/b)}\,dx = \frac{1}{2b}\int_{-\infty}^{\infty} |X|^{n} \cdot e^{-(|X - \mu|/b)}\,dx. \tag{B.1}
\]

There are two cases, $X \ge \mu$ and $X < \mu$:

\[
E(|X|^{n}) =
\begin{cases}
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^{n} \cdot e^{-((X - \mu)/b)}\,dx, & \text{if } X \ge \mu, \\[2ex]
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^{n} \cdot e^{-((\mu - X)/b)}\,dx, & \text{if } X < \mu.
\end{cases} \tag{B.2}
\]

For the first case, when $X \ge \mu$:

\[
\begin{aligned}
E(|X|^{n}) & = \frac{1}{2b}\left[\int_{-\infty}^{0} (-X)^{n} \cdot e^{-((X - \mu)/b)}\,dx + \int_{0}^{\infty} X^{n} \cdot e^{-((X - \mu)/b)}\,dx\right] \\
& = \frac{e^{\mu/b}}{2b}\left[(-1)^{n}\underbrace{\int_{-\infty}^{0} X^{n} e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^{n} e^{-X/b}\,dx}_{I}\right].
\end{aligned} \tag{B.3}
\]

If $t = -X/b$, then $I$ can be expressed as

\[
I = b^{n+1}\int_{0}^{\infty} t^{n} e^{-t}\,dt = b^{n+1} \cdot n!. \tag{B.4}
\]

$I_n$ can also be expressed as

\[
\begin{aligned}
I_n & = \int_{-\infty}^{0} X^{n} e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^{n} e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^{n} e^{-t}\,dt \\
& = \left.\frac{t^{n} e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1}.
\end{aligned} \tag{B.5}
\]

Substituting (B.4) and (B.5) into (B.3), the absolute moment of the Laplacian distribution can be computed based on

\[
E(|X|^{n}) = \left(\frac{e^{\mu/b}\, b^{n}}{2}\right)\left[(-1)^{n} \cdot I_n + n!\right]. \tag{B.6}
\]
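For the zero-mean case, $\mu = 0$, the absolute moment has the standard closed form $E(|X|^{n}) = b^{n} \cdot n!$, because $|X|$ is then exponentially distributed with mean $b$. The Python sketch below checks that special case by numerically integrating the Laplacian density. The function name, the midpoint rule, and the integration limits are illustrative assumptions, and the check covers only $\mu = 0$, not the general expression in (B.6).

```python
import math

def laplace_abs_moment_numeric(n, b, mu=0.0, half_width=40.0, steps=100_000):
    """Midpoint-rule estimate of E(|X|^n) for X ~ Laplace(mu, b).

    The integral is truncated to mu +/- half_width * b, where the density is negligible.
    """
    lower = mu - half_width * b
    h = 2.0 * half_width * b / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += abs(x) ** n * math.exp(-abs(x - mu) / b) / (2.0 * b)
    return total * h

b = 0.7  # hypothetical scale parameter
for n in range(1, 5):
    closed_form = b**n * math.factorial(n)
    print(n, closed_form, laplace_abs_moment_numeric(n, b))
```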

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, "An overview of digital speech watermarking," International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, "A DWT-based rational dither modulation scheme for effective blind audio watermarking," Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, "Semi-fragile digital speech watermarking for online speaker recognition," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, "Hyperbolic RDM for nonlinear valumetric distortions," IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, "Difference angle quantization index modulation scheme for image watermarking," Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.

[8] X. Zhu and S. Peng, "A novel quantization watermarking scheme by modulating the normalized correlation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, "Blind image watermarking using a sample projection approach," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, "A logarithmic quantization index modulation for perceptually better data hiding," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, "A novel gain invariant quantization-based watermarking approach," IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker verification security improvement by means of speech watermarking," Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker identification security improvement by means of speech watermarking," Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, "Robust digital speech watermarking for online speaker recognition," Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, "Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition," Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, "Speaker frame selection for digital speech watermarking," National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, "Speech probability distribution," IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, "Robust audio and speech watermarking using Gaussian and Laplacian modeling," Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, "A general formula for channel capacity," IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, "Speech watermarking method based on formant tuning," IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, "Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization," Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., "StirMark benchmark: audio watermarking attacks," in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, "An audio watermarking scheme using singular value decomposition and dither-modulation quantization," Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.



LSB Zero Set all samples of the watermarkedsignal to zero NO PARAMETER REQUIRED None L

SmoothThe new sample value depends on the

samples before and after themodifying point

NO PARAMETER REQUIRED None M

Stat1 It averages the sample with its nextneighbors NO PARAMETER REQUIRED None N

DynamicsAmplify

The amplitude of the watermarkedsignal is increased up to ⟨FACTOR1⟩

and is decreased down to⟨FACTOR2⟩ respectively⟨FACTOR1⟩ ⟨FACTOR2⟩ 150 and 75

200 and 50 O

Denoising The watermarked signal is denoisedby ⟨FACTOR⟩ ⟨FACTOR⟩ minus80 dB tominus60 dB P

Filters

Low Pass Filter(LPF)

The watermarked signal is filtered byan elliptic LPF with cutoff frequency

of ⟨FREQUENCY⟩ ⟨FREQUENCY⟩ 5KHz to4KHz Q

Band PassFilter (BPF)

The watermarked signal is filtered byan elliptic filter with bandwidth from⟨FREQUENCY1⟩ to⟨FREQUENCY2⟩ to simulate a

narrowband telephony channel

⟨FREQUENCY1⟩ ⟨FREQUENCY2⟩500Hz amp4000Hz to300Hz amp3400Hz

R

High PassFilter (HPF)

The watermarked signal is filtered byan elliptic HPF with cutoff frequency

of ⟨FREQUENCY⟩ ⟨FREQUENCY⟩ 500Hz to800Hz S

Security and Communication Networks 9

Table 1 Continued

Attack type Attack name Description Parameter(s) Defaultvalue(s)

Time stretchand pitch shift

Pitch scaleThe pitch of the watermarked signal isnonlinearly scaled without changing

the time⟨SCALEFACTOR⟩ 105 to 1 10 T

Time stretch The time of the watermarked signal isnonlinearly stretched ⟨TEMPOFACTOR⟩ 105 to 110 U

Compression

CELP coding

The watermarked signal is coddedwith rate of ⟨BITRATE⟩ by CELPcodecs and then is decoded to

original one

⟨BITRATE⟩ 16 Kbps to96 kbps V

MP3compression

The watermarked signal iscompressed by MP3 with different

rate ⟨BITRATE⟩ ⟨BITRATE⟩ 128 to 32 W

G711 The watermarked signal is codded bystandard 64 kbps A120583-law PCM NO PARAMETER REQUIRED None X

times104

Δ = 095

Δ = 05

Δ = 025

Δ = 01 Δ = 075

10 15 20 25 30 35 40 45 50 555WNR (dB)

0

1

2

3

4

5

6

7

CBS

C

Figure 8 Variation of the BSC capacity with respect to differentWNRs for different quantization steps

Table 2 compares the BER with state-of-the-art speechwatermarking techniques We implemented all the tech-niques and tested them for the entire TIMIT corpus underdifferent attacks As can be observed the proposed speechwatermarking technique has a lower BER overall comparedwith other techniques

The perceptual quality of the watermarked signal iscritical for the evaluation of the proposed watermarked tech-nique which can be measured based on the mean opinionscore (MOS) (as proposed by the International Telecommu-nicationsUnion (ITU-T) [23]) and SNRTheMOSuses a sub-jective evaluation technique to score the watermarked signalwhich is presented in Table 3 In theMOS evaluationmethod10 people were asked to listen blindly to the original andwatermarked signals Then they reported the dissimilaritiesbetween the quality of the original and watermarked speechsignalsThe average of these reports were computed for MOSmusic and MOS speech and presented in Table 4

times104

Frame rate is 40 samplesFrame rate is 100 samplesFrame rate is 200 samples

Frame rate is 300 samplesFrame rate is 400 samples

10 15 20 25 30 35 40 45 50 555WNR (dB)

0

1

2

3

4

5

6C

BSC

Figure 9 Variation of the BSC capacity with respect to differentWNRs for different frame lengths

An objective evaluation technique such as SWR andSNR attempts to quantify this amount based on the followingformula

SNR = 10 times log10sum119899 1198782sum119899 ( minus 119878)2 (31)

where 119878 and are the original and watermarked signalsrespectively

Table 4 presents a comparison of the proposed techniqueand other techniques in terms of imperceptibility and capac-ity Based on the results it seems that the proposed speechwatermarking technique outperformed the other techniquesin terms of capacity and imperceptibility Although the SNRfor formant tuning [21] is higher than the proposed tech-nique the capacity and robustness of the proposed techniqueare greater than those for formant tuning [21] and Analysis-by-Synthesis [22]

10 Security and Communication Networks

Table 2 Comparison with the robustness of different speech watermarking techniques in terms of BER ()

Attack The proposed method DWPT+ multiplication [14] Formant tuning [21] Analysis-by-Synthesis [22]No attack 000 000 004 006A 191ndash423 209ndash543 365ndash645 796ndash965B 965ndash2177 1045ndash2232 1276ndash2445 1623ndash2523C 1013ndash2043 1243ndash2132 1423ndash2354 1743ndash2632D 1053ndash1923 1033ndash1893 1163ndash2323 1533ndash2598E 032ndash202 0763ndash114 123ndash232 298ndash432F 1354ndash1723 1432ndash1765 2623ndash3783 2945ndash3306G 323 265 1932 2387H 023 000 943 1245I 134ndash465 234ndash511 465ndash1043 823ndash1643J 123ndash254 132ndash467 654ndash1054 1154ndash1887K 132ndash316 178ndash423 751ndash1034 1149ndash1943L 092 198 150 404M 312 576 1034 2168N 410 423 665 954O 121ndash254 000ndash143 597ndash876 898ndash1554P 100ndash354 243ndash543 965ndash1456 1965ndash2645Q 2143ndash2943 2454ndash3143 4054ndash4443 5009ndash5032R 484ndash954 532ndash1032 1665ndash2944 2054ndash3698S 1332ndash1854 1500ndash1943 2043ndash2923 2854ndash3076T 132ndash232 201ndash313 743ndash1043 965ndash1532U 015ndash023 018ndash043 145ndash321 432ndash543V 654ndash954 1143ndash1454 132ndash421 232ndash432W 1043ndash2034 1143ndash2534 3632ndash4565 3343ndash5032X 2311 2417 4832 5065Average 580ndash904 668ndash1004 1295ndash1739 1682ndash2148

Table 3 MOS grades [23]

MOS Quality Quality scale Effort required to understand meaning scale(5) Excellent Imperceptible No effort required(4) Good Perceptible but not annoying No appreciable effort required(3) Fair Slightly annoying Moderate effort required(2) Poor Annoying Considerable effort required(1) Bad Very annoying No meaning was understood

As observed in Table 4 each entity was bounded betweentwo values that related a particular value of imperceptibility(SNR andMOS) to a particular capacity Consequently whenthe capacity increased imperceptibility decreasedThe trade-off value is completely application dependent and should bedetermined by the user

5 Performance Analysis

Generally two types of errors false positive probability (FPP)and false negative probability (FNP)must always be analyzedto validate the security of a watermarking system [25] FPPis defined when an unwatermarked speech signal is declaredas a watermarked speech signal by the watermark extractorSimilarly FNP is defined when the watermarked speechsignal is declared as an unwatermarked speech signal by the

watermark extractor By assuming that the watermark bits areindependent random variables both the FPP and FNP can beformulated based on Bernoulli trials which is expressed asfollows

119875119890 = 119879minus1sum119894=0

(119873119894 )119875119894FN (1 minus 119875FN)(119873minus119894)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟FNP

+ 119873sum119894=119879

(119873119894 )119875119894FP (1 minus 119875FP)(119873minus119894)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟FPP

(32)

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, $P_{\mathrm{FP}}$ is the probability of a false positive, which is assumed to be 0.5, $P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil \tag{33}$$

Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

Figure 10: FPP with respect to the total number of watermark bits for different BERs (x-axis: number of watermark bits, 50 to 400; y-axis: false positive probability, $\times 10^{-4}$; legend: BER = 0.21, shifted by adding 0.00009; BER = 0.20, shifted by adding 0.000015; BER = 0.19).

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.
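The two sums in (32) and the threshold in (33) can be evaluated directly. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the per-bit probabilities are passed in as parameters exactly as they appear in (32), and the small rounding step before the ceiling is an added guard against floating-point noise.

```python
from math import ceil, comb


def threshold(n_bits: int, ber: float) -> int:
    """Threshold T of (33); rounding guards against float noise before ceil."""
    return ceil(round((1.0 - ber) * n_bits, 9))


def binomial_sum(n_bits: int, p: float, lo: int, hi: int) -> float:
    """Sum of C(N, i) * p^i * (1 - p)^(N - i) for i = lo..hi (inclusive)."""
    return sum(
        comb(n_bits, i) * p**i * (1.0 - p) ** (n_bits - i)
        for i in range(lo, hi + 1)
    )


def fnp_term(n_bits: int, p_fn: float, t: int) -> float:
    """First sum of (32): i runs from 0 to T - 1."""
    return binomial_sum(n_bits, p_fn, 0, t - 1)


def fpp_term(n_bits: int, p_fp: float, t: int) -> float:
    """Second sum of (32): i runs from T to N."""
    return binomial_sum(n_bits, p_fp, t, n_bits)


if __name__ == "__main__":
    # FPP term for P_FP = 0.5 and BER = 0.20 at several watermark lengths.
    for n in (50, 100, 200, 400):
        t = threshold(n, ber=0.20)
        print(n, t, fpp_term(n, 0.5, t))
```

With $P_{\mathrm{FP}} = 0.5$, the FPP term shrinks rapidly as $N$ grows, which agrees qualitatively with the behavior described for Figure 10.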

6. Conclusion and Future Work

In this paper, a gain-invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.
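The gain invariance of quantizing a ratio of Lp-norms can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the proposed scheme: instead of the Lagrange optimization, it simply rescales the odd-indexed samples so that the ratio lands on the chosen QIM lattice point, and the function names, the step size `delta`, and the choice `p = 2` are hypothetical.

```python
def lp_norm(values, p):
    """Lp-norm of a sequence of samples."""
    return sum(abs(x) ** p for x in values) ** (1.0 / p)


def qim_quantize(ratio, bit, delta):
    """Nearest point of the QIM lattice for `bit` (lattices offset by delta / 2)."""
    offset = bit * delta / 2.0
    return delta * round((ratio - offset) / delta) + offset


def norm_ratio(frame, p):
    """Ratio of Lp-norms of odd- and even-indexed samples (1-based indexing)."""
    odd, even = frame[0::2], frame[1::2]
    return lp_norm(odd, p) / lp_norm(even, p)  # assumes a nonzero even part


def embed_bit(frame, bit, delta=0.1, p=2):
    """Simplified embedding: rescale odd samples so the ratio hits the lattice.

    This replaces the Lagrange-optimized update with a plain rescaling.
    """
    ratio = norm_ratio(frame, p)
    scale = qim_quantize(ratio, bit, delta) / ratio
    out = list(frame)
    out[0::2] = [x * scale for x in frame[0::2]]
    return out


def extract_bit(frame, delta=0.1, p=2):
    """Blind extraction: pick the bit whose lattice point is nearest the ratio."""
    ratio = norm_ratio(frame, p)
    errors = [abs(ratio - qim_quantize(ratio, b, delta)) for b in (0, 1)]
    return 0 if errors[0] <= errors[1] else 1


if __name__ == "__main__":
    frame = [0.3, -0.5, 0.8, 0.1, -0.4, 0.6, 0.2, -0.7]
    marked = embed_bit(frame, 1)
    attacked = [0.5 * x for x in marked]  # amplitude scaling attack
    print(extract_bit(marked), extract_bit(attacked))
```

Because a common gain multiplies both norms by the same factor, the ratio, and therefore the extracted bit, is unchanged by amplitude scaling.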

Figure 11: FNP with respect to the total number of watermark bits for different BERs (x-axis: number of watermark bits, 50 to 400; y-axis: false negative probability, 0 to 0.035; legend: BER = 0.21, shifted by adding 0.025; BER = 0.20, shifted by adding 0.015; BER = 0.19).

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only showed that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$f(x, y) = f(\theta) + f'_x(\theta)(x - \theta_x) + f'_y(\theta)(y - \theta_y) + \frac{1}{2}\left[ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right] + \text{remainder} \tag{A.1}$$

Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left[ f''_{xx}(\theta)\,\mathrm{var}(X) + 2 f''_{xy}(\theta)\,\mathrm{cov}(X, Y) + f''_{yy}(\theta)\,\mathrm{var}(Y) \right] + O(n^{-1}) \tag{A.2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^3$. Then, the mean and variance of the ratio between $R$ and $S$, respectively, can be estimated as follows:

$$\begin{aligned}
E\left(\frac{R}{S}\right) &\equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\mathrm{cov}(R, S)}{E(S)^2} + \frac{\mathrm{var}(S)\,E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right), \\
\mathrm{var}\left(\frac{R}{S}\right) &\approx \frac{1}{E_S^2}\,\mathrm{var}(R) + 2\,\frac{-E_R}{E_S^3}\,\mathrm{cov}(R, S) + \frac{E_R^2}{E_S^4}\,\mathrm{var}(S) \\
&= \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - \frac{2\,\mathrm{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right] \\
&= \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2}.
\end{aligned} \tag{A.3}$$
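The first-order expressions in (A.3) can be checked numerically. The sketch below is only a sanity check under stated assumptions: it uses the general forms of (A.3) that still contain the covariance term, draws independent Laplacian variables (so the covariance is zero), and picks hypothetical parameter values with the denominator kept well away from zero. The function names are illustrative.

```python
import random


def ratio_mean_approx(mu_r, mu_s, var_s, cov_rs=0.0):
    """Taylor approximation of E(R/S), first line of (A.3)."""
    return mu_r / mu_s - cov_rs / mu_s**2 + var_s * mu_r / mu_s**3


def ratio_var_approx(mu_r, mu_s, var_r, var_s, cov_rs=0.0):
    """Taylor approximation of var(R/S), bracketed form in (A.3)."""
    return (mu_r / mu_s) ** 2 * (
        var_r / mu_r**2 - 2.0 * cov_rs / (mu_r * mu_s) + var_s / mu_s**2
    )


def laplace_sample(rng, mu, b):
    """Laplacian sample: the difference of two unit exponentials, scaled by b."""
    return mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0))


def simulate_ratio(mu_r, b_r, mu_s, b_s, n=100_000, seed=0):
    """Monte Carlo mean and variance of R/S for independent Laplacian R and S."""
    rng = random.Random(seed)
    ratios = [
        laplace_sample(rng, mu_r, b_r) / laplace_sample(rng, mu_s, b_s)
        for _ in range(n)
    ]
    mean = sum(ratios) / n
    var = sum((x - mean) ** 2 for x in ratios) / n
    return mean, var


if __name__ == "__main__":
    # A Laplacian variable with scale b has variance 2 * b**2.
    mu_r, b_r, mu_s, b_s = 10.0, 0.5, 5.0, 0.1
    print(simulate_ratio(mu_r, b_r, mu_s, b_s))
    print(ratio_mean_approx(mu_r, mu_s, 2 * b_s**2),
          ratio_var_approx(mu_r, mu_s, 2 * b_r**2, 2 * b_s**2))
```

The approximation is expected to degrade when the denominator has appreciable probability mass near zero, which is why the example keeps $\mu_S$ many scale units away from the origin.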

B. Computing the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

$$E(|X|^n) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-(|X - \mu|/b)}\,dx = \frac{1}{2b} \int_{-\infty}^{\infty} |X|^n \cdot e^{-(|X - \mu|/b)}\,dx \tag{B.1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$E(|X|^n) = \begin{cases} \dfrac{1}{2b} \displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx, & \text{if } X \ge \mu, \\[2ex] \dfrac{1}{2b} \displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu - X)/b)}\,dx, & \text{if } X < \mu. \end{cases} \tag{B.2}$$

For the first case, when $X \ge \mu$:

$$\begin{aligned}
E(|X|^n) &= \frac{1}{2b}\left[\int_{-\infty}^{0} (-X)^n \cdot e^{-((X - \mu)/b)}\,dx + \int_{0}^{\infty} X^n \cdot e^{-((X - \mu)/b)}\,dx\right] \\
&= \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\,dx}_{I}\right]
\end{aligned} \tag{B.3}$$

If $t = -X/b$, then $I$ can be expressed as

$$I = b^{n+1} \int_{0}^{\infty} t^n e^{-t}\,dt = b^{n+1} \cdot n! \tag{B.4}$$

so that, apart from the factor $b^{n+1}$, $I$ reduces to $n!$.

$I_n$ can also be expressed as

$$\begin{aligned}
I_n &= \int_{-\infty}^{0} X^n e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1} \int_{-\infty}^{0} t^n e^{-t}\,dt \\
&= \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1}
\end{aligned} \tag{B.5}$$

Substituting (B.4) and (B.5) into (B.3), the absolute moment of the Laplacian distribution can be computed based on

$$E(|X|^n) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right] \tag{B.6}$$
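As a numerical cross-check of the absolute moment, the integral in (B.1) can be evaluated directly with the standard Laplacian density $\frac{1}{2b} e^{-|x - \mu|/b}$. The sketch below is not part of the derivation above: it relies on the well-known closed form $E(|X|^n) = n!\, b^n$ for the zero-mean case ($\mu = 0$), and the function names, integration range, and step count are hypothetical choices.

```python
from math import exp, factorial


def laplace_pdf(x, mu, b):
    """Laplacian density with location mu and scale b."""
    return exp(-abs(x - mu) / b) / (2.0 * b)


def abs_moment_numeric(n, mu, b, half_width=60.0, steps=100_000):
    """Midpoint-rule estimate of E(|X|^n) over [mu - 60b, mu + 60b]."""
    lo = mu - half_width * b
    h = 2.0 * half_width * b / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(x) ** n * laplace_pdf(x, mu, b)
    return total * h


def abs_moment_zero_mean(n, b):
    """Closed form for mu = 0: E(|X|^n) = n! * b^n."""
    return factorial(n) * b**n


if __name__ == "__main__":
    for n, b in ((1, 2.0), (2, 0.5), (3, 1.0)):
        print(n, b, abs_moment_numeric(n, 0.0, b), abs_moment_zero_mean(n, b))
```

The same numerical routine accepts a nonzero `mu`, which makes it possible to compare against any closed-form expression for the general case.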

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, "An overview of digital speech watermarking," International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, "A DWT-based rational dither modulation scheme for effective blind audio watermarking," Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, "Semi-fragile digital speech watermarking for online speaker recognition," Eurasip Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, "Hyperbolic RDM for nonlinear valumetric distortions," IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, "Difference angle quantization index modulation scheme for image watermarking," Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.

[8] X. Zhu and S. Peng, "A novel quantization watermarking scheme by modulating the normalized correlation," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, "Blind image watermarking using a sample projection approach," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, "A logarithmic quantization index modulation for perceptually better data hiding," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, "A novel gain invariant quantization-based watermarking approach," IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker verification security improvement by means of speech watermarking," Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker identification security improvement by means of speech watermarking," Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, "Robust digital speech watermarking for online speaker recognition," Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, "Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition," Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, "Speaker frame selection for digital speech watermarking," National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, "Speech probability distribution," IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, "Robust audio and speech watermarking using Gaussian and Laplacian modeling," Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, "A general formula for channel capacity," IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, "Speech watermarking method based on formant tuning," IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, "Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization," Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal et al., "StirMark benchmark: audio watermarking attacks," in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, "An audio watermarking scheme using singular value decomposition and dither-modulation quantization," Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.


Table 1: Benchmark for speech watermarking.

| Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label |
|---|---|---|---|---|---|
| Additive noise | AddBrumm | It adds buzz or a low-frequency sinus tone to the watermarked signal to simulate the impact of a power supply | ⟨STRENGTH⟩, ⟨FREQUENCY⟩ | 2500, 55 to 3000, 75 | A |
| | AddDynNoise | It adds a dynamic white noise to the watermarked signal | ⟨STRENGTH⟩ | 20 to 40 | B |
| | AddFFTNoise | It adds white noise to the watermarked signal in the frequency domain | ⟨FFTSIZE⟩, ⟨STRENGTH⟩ | 256, 1000 to 1024, 3000 | C |
| | AddNoise | White Gaussian noise contaminates the watermarked signal to simulate ambient distortion | ⟨STRENGTH⟩ | 35 dB level to 5 dB | D |
| | AddSinus | It adds a sinus signal to the watermarked signal | ⟨AMPLITUDE⟩, ⟨FREQUENCY⟩ | 120, 3000 to 150, 3500 | E |
| Conversion | Resampling | The sampling rate of the watermarked signal is converted to ⟨SAMPLERATE1⟩ and then reconverted to ⟨SAMPLERATE2⟩ | ⟨SAMPLERATE1⟩, ⟨SAMPLERATE2⟩ | 4 kHz, 16 kHz to 8 kHz, 16 kHz | F |
| | Requantization | The samples of the watermarked signal are quantized to ⟨QUANTIZATION1⟩ and then requantized to ⟨QUANTIZATION2⟩ | ⟨QUANTIZATION1⟩, ⟨QUANTIZATION2⟩ | 8 bits and 16 bits | G |
| | Invert | It inverts all samples in the watermarked signal, like a 180-degree phase shift | No parameter required | None | H |
| Ambience | Echo | An echo with a delay ⟨DELAY⟩ and decay ⟨DECAY⟩ is added to the watermarked signal | ⟨DELAY⟩, ⟨DECAY⟩ | 20 ms and 10 to 100 ms and 50 | I |
| Sample permutations | Cut samples | ⟨REMOVENUMBER⟩ samples are removed from the watermarked signal every ⟨REMOVEDIST⟩ period | ⟨REMOVEDIST⟩, ⟨REMOVENUMBER⟩ | 1 and 1000 to 7 and 1000 | J |
| | Copy samples | Some of the samples of the watermarked signal are copied between the sample values | ⟨PERIOD⟩, ⟨COPYDIST⟩, ⟨COPYCOUNT⟩ | 1000, 100, 30 to 1000, 200, 60 | K |
| | LSB Zero | The least significant bits of all samples of the watermarked signal are set to zero | No parameter required | None | L |
| | Smooth | The new sample value depends on the samples before and after the modifying point | No parameter required | None | M |
| | Stat1 | It averages the sample with its next neighbors | No parameter required | None | N |
| Dynamics | Amplify | The amplitude of the watermarked signal is increased up to ⟨FACTOR1⟩ and decreased down to ⟨FACTOR2⟩, respectively | ⟨FACTOR1⟩, ⟨FACTOR2⟩ | 150 and 75; 200 and 50 | O |
| | Denoising | The watermarked signal is denoised by ⟨FACTOR⟩ | ⟨FACTOR⟩ | −80 dB to −60 dB | P |
| Filters | Low Pass Filter (LPF) | The watermarked signal is filtered by an elliptic LPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 5 kHz to 4 kHz | Q |
| | Band Pass Filter (BPF) | The watermarked signal is filtered by an elliptic filter with bandwidth from ⟨FREQUENCY1⟩ to ⟨FREQUENCY2⟩ to simulate a narrowband telephony channel | ⟨FREQUENCY1⟩, ⟨FREQUENCY2⟩ | 500 Hz & 4000 Hz to 300 Hz & 3400 Hz | R |
| | High Pass Filter (HPF) | The watermarked signal is filtered by an elliptic HPF with a cutoff frequency of ⟨FREQUENCY⟩ | ⟨FREQUENCY⟩ | 500 Hz to 800 Hz | S |
| Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T |
| | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U |
| Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then decoded to the original one | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V |
| | MP3 compression | The watermarked signal is compressed by MP3 at different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W |
| | G711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X |

Figure 8: Variation of the BSC capacity with respect to different WNRs for different quantization steps (x-axis: WNR (dB), 5 to 55; y-axis: $C_{\mathrm{BSC}}$, $\times 10^{4}$; curves for Δ = 0.1, 0.25, 0.5, 0.75, and 0.95).

Table 2 compares the BER of the proposed technique with that of state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique; it can be measured based on the mean opinion score (MOS) (as proposed by the International Telecommunication Union (ITU-T) [23]) and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal, with the grades presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths (x-axis: WNR (dB), 5 to 55; y-axis: $C_{\mathrm{BSC}}$, $\times 10^{4}$; legend: frame rates of 40, 100, 200, 300, and 400 samples).

An objective evaluation technique, such as the SWR or SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_n S^2}{\sum_n (\hat{S} - S)^2} \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
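The SNR in (31) translates directly into a few lines of code. The following is a minimal sketch with a hypothetical function name; the handling of identical signals (infinite SNR) and of an all-zero original (an error) are added assumptions that the formula itself does not specify.

```python
from math import inf, log10


def snr_db(original, watermarked):
    """SNR of (31) in dB: signal energy over embedding-distortion energy."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if signal_energy == 0.0:
        raise ValueError("original signal has zero energy")
    if noise_energy == 0.0:
        return inf  # identical signals: no embedding distortion
    return 10.0 * log10(signal_energy / noise_energy)


if __name__ == "__main__":
    original = [1.0, -1.0, 1.0, -1.0]
    watermarked = [1.1, -1.1, 1.1, -1.1]
    print(snr_db(original, watermarked))  # close to 20 dB
```

In the example, every sample is perturbed by 10% of its amplitude, so the energy ratio is 100 and the SNR is approximately 20 dB.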

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, it seems that the proposed speech watermarking technique outperformed the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those for formant tuning [21] and Analysis-by-Synthesis [22].

Table 2: Comparison with the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4 each entity was bounded betweentwo values that related a particular value of imperceptibility(SNR andMOS) to a particular capacity Consequently whenthe capacity increased imperceptibility decreasedThe trade-off value is completely application dependent and should bedetermined by the user

5 Performance Analysis

Generally two types of errors false positive probability (FPP)and false negative probability (FNP)must always be analyzedto validate the security of a watermarking system [25] FPPis defined when an unwatermarked speech signal is declaredas a watermarked speech signal by the watermark extractorSimilarly FNP is defined when the watermarked speechsignal is declared as an unwatermarked speech signal by the

watermark extractor By assuming that the watermark bits areindependent random variables both the FPP and FNP can beformulated based on Bernoulli trials which is expressed asfollows

119875119890 = 119879minus1sum119894=0

(119873119894 )119875119894FN (1 minus 119875FN)(119873minus119894)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟FNP

+ 119873sum119894=119879

(119873119894 )119875119894FP (1 minus 119875FP)(119873minus119894)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟FPP

(32)

where119873 is the total number ofwatermark bits 119894 is the numberof matching bits (119873119894 ) is a binomial coefficient 119875FP is theprobability of a false positive which is assumed to be 05

Security and Communication Networks 11

Table 4 Comparison of various watermarking techniques in terms of payload and imperceptibility

Technique Quality scale Effort required to understand meaning scale SNR (dB) Theoretical payload (bps)Analysis-by-Synthesis [22] 401ndash380 476ndash395 2808ndash2532 3333ndash50Formant tuning [21] 498ndash432 500ndash455 3032ndash2754 3333ndash50DWPT+ multiplication [14] 432ndash310 500ndash355 3721ndash2008 3125ndash125The proposed method 487ndash365 500ndash405 4211ndash2071 40ndash400

times10minus4

BER = 021 is shifted by adding 000009BER = 020 is shifted by adding 0000015BER = 019

100 150 200 250 300 350 40050Number of watermark bits

0

1

False

pos

itive

pro

babi

lity

Figure 10 FPP with respect to various total number of watermarkbits for different BER

119875FN is the probability of a false negative which is assumedto be 00919 (as in Table 2) and 119879 is the threshold which iscomputed as follows

119879 = lceil(1 minus BER) times 119873rceil (33)

Figure 10 shows the FPP with respect to various totalnumber of watermark bits for different BER For bettervisualization each line was shifted by adding a constantAs observed the FPP was close to zero for 119873 greater than50 There was a small fluctuation for 119873 less than 50 whichdepended on the BER

Figure 11 shows the FNP with respect to various totalnumber of watermark bits for different BER For bettervisualization each line was shifted by adding a constant Ascan be observed the FNPwas close to zero for119873 greater than100 Additionally whenever the BER decreased the fluctua-tion increased

6 Conclusion and Future Work

In this paper a gain invariant speechwatermarking techniquewas developed using the Lagrange optimization method Forthis purpose samples of the signal were separated based onodd and even indices Then the ratio between the Lp-normswas quantized using the QIM method Finally the Lagrangemethod was used to estimate the optimized values In a sim-ilar manner the extraction process detected the watermarkdata blindly by finding the nearest quantization step

BER = 021 is shifted by adding 0025BER = 020 is shifted by adding 0015BER = 019

0

0005

001

0015

002

0025

003

0035

False

neg

ativ

e pro

babi

lity

100 150 200 250 300 350 40050Number of watermark bits

Figure 11 FNP with respect to various total number of watermarkbits for different BER

By assuming Laplacian distribution for the speech signaland Gaussian distribution for the noise signal the probabilityof error and watermarking distortion were modeled based ona statistical analysis of the proposed technique Additionallyexperimental results not only proved that the developedwatermarking technique was highly robust against differentattacks such compression AWGN filtering and resamplingbut also demonstrated the validity of the analytical modelFor future work an investigation on synchronization andadaptive quantization techniques might contribute to theproposed watermarking technique

Appendix

A Estimation of the Mean and Variance ofthe Ratio of Two Laplacian Variables Basedon Taylor Series

In [26] the bivariate second-order Taylor expansion for119891(119909 119910) around 120579 = (119864(119909) 119864(119910)) is expressed as follows

119891 (119909 119910) = 119891 (120579) + 1198911015840119909 (120579) (119909 minus 120579119909) + 1198911015840119910 (120579) (119910 minus 120579119910)+ 12 11989110158401015840119909119909 (120579) (119909 minus 120579119909)2+ 211989110158401015840119909119910 (120579) (119909 minus 120579119909) (119910 minus 120579119910) + 11989110158401015840119910119910 (120579) (119910 minus 120579119909)2+ remainder

(A1)

12 Security and Communication Networks

Therefore 119864[119891(119883 119884)] can be expanded about 120579 = (119864(119883)119864(119884)) to compute the approximate values as follows

119864 (119891 (119883 119884)) = 119891 (120579) + 12 11989110158401015840119909119909 (120579) var (119883)+ 211989110158401015840119909119910 (120579) cov (119883 119884) + 11989110158401015840119910119910 (120579) var (119884)+ 119874 (119899minus1)

(A2)

For 119891 = 119877119878 11989110158401015840119877119877 = 0 11989110158401015840119877119878 = minus119878minus2 and 11989110158401015840119878119878 = 21198771198783 Thenthe mean and variance of the ratio between 119877 and 119878(119864(119877119878))respectively can be estimated as follows

119864(119877119878 ) equiv 119864 (119891 (119877 119878))asymp 119864 (119877)119864 (119878) minus cov (119877 119878)119864 (119878)2 + var (119878) 119864 (119877)119864 (119878)3= 120583119877120583119878 (1 + 12059021198781205832119878)

var(119877119878 ) asymp 11198642119878 var (119877) + 2minus1198641198771198643119878 cov (119877 119878)+ 11986421198771198644119878 var (119878)

= 12058321198771205832119878 [12059021198771205832119877 minus 2cov (119877 119878)120583119877120583119878 + 12059021198781205832119878 ]

= 12058321198771205832119878 (12059021198781205832119878 minus

12059041198781205834119878) + 12059021198771205832119878

(A3)

B Compute the Absolute Moment ofthe Laplacian Distribution

Themoment of Laplacian distribution expressed as follows

119864 (|119883|119899) = intinfinminusinfin

|119883|119899 sdot 12119887 sdot 119890minus((119883minus120583)119887)119889119909= 12119887 intinfin

minusinfin|119883|119899 sdot 119890minus((119883minus120583)119887)119889119909 (B1)

There are two cases119883 ge 120583 and119883 lt 120583119864 (|119883|119899)

= If 119883 ge 120583 then 12119887 intinfin

minusinfin|119883|119899 sdot 119890minus((119883minus120583)119887)119889119909

If 119883 lt 120583 then 12119887 intinfinminusinfin

|119883|119899 sdot 119890minus((120583minus119883)119887)119889119909(B2)

For first case when119883 ge 120583119864 (|119883|119899) = 12119887 [int0

minusinfinminus119883119899 sdot 119890minus((119883minus120583)119887)119889119909

+ intinfin0

119883119899 sdot 119890minus((119883minus120583)119887)119889119909]

= 1198901205831198872119887 [[[[(minus1)119899 int0

minusinfin119883119899119890119883119887119889119909⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟119868119899

+ intinfin0

119883119899119890minus119883119887119889119909⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟119868

]]]](B3)

If 119905 = minus119883119887 then 119868 can be expressed as

119868 = 119887119899+1 intinfin0

119905119899119890minus119905119889119905 = 119887119899+1 sdot 119899 = 119899 (B4)

119868119899 can also be expressed as

119868119899 = int0minusinfin

119883119899119890119883119887119889119909 = int0minusinfin

(119887 sdot 119905)119899 119890minus119905 sdot 119887 sdot 119889119905= 119887119899+1 int0

minusinfin119905119899119890minus119905119889119905

= 119905119899119890minus119905minus11003816100381610038161003816100381610038161003816100381610038160

minusinfin

minus int0minusinfin

119899 sdot 119905119899minus1119890minus119905minus1 119889119905 = 0 + 119899119868119899minus1(B5)

Substituting (B4) and (B5) into (B3) the absolute momentof the Laplacian distribution can be computed based on

119864 (|119883|119899) = (1198901205831198871198871198992 ) [(minus1)119899 sdot 119868119899 + 119899] (B6)

Competing Interests

The authors declare that they have no competing interests

References

[1] M A Nematollahi C Vorakulpipat and H G Rosales DigitalWatermarking Techniques and Trends vol 11 Springer 2016

[2] MANematollahi and S A R Al-Haddad ldquoAn overview of dig-ital speech watermarkingrdquo International Journal of Speech Tech-nology vol 16 no 4 pp 471ndash488 2013

[3] H-T Hu and L-Y Hsu ldquoA DWT-based rational dither modu-lation scheme for effective blind audio watermarkingrdquo CircuitsSystems and Signal Processing vol 35 no 2 pp 553ndash572 2016

[4] B Chen and G W Wornell ldquoQuantization index modulationa class of provably good methods for digital watermarking andinformation embeddingrdquo Institute of Electrical and ElectronicsEngineers Transactions on InformationTheory vol 47 no 4 pp1423ndash1443 2001

[5] M A Nematollahi M A Akhaee S A R Al-Haddad andH Gamboa-Rosales ldquoSemi-fragile digital speech watermarkingfor online speaker recognitionrdquo Eurasip Journal on AudioSpeech andMusic Processing vol 2015 no 1 article no 31 2015

[6] P Guccione and M Scagliola ldquoHyperbolic RDM for nonlin-ear valumetric distortionsrdquo IEEE Transactions on InformationForensics and Security vol 4 no 1 pp 25ndash35 2009

[7] N Cai N Zhu S Weng and B Wing-Kuen Ling ldquoDifferenceangle quantization index modulation scheme for image water-markingrdquo Signal Processing Image Communication vol 34 pp52ndash60 2015

Security and Communication Networks 13

[8] X. Zhu and S. Peng, "A novel quantization watermarking scheme by modulating the normalized correlation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, "Blind image watermarking using a sample projection approach," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, "A logarithmic quantization index modulation for perceptually better data hiding," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, "A novel gain invariant quantization-based watermarking approach," IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker verification security improvement by means of speech watermarking," Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, "Speaker identification security improvement by means of speech watermarking," Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, "Robust digital speech watermarking for online speaker recognition," Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, "Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition," Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, "Speaker frame selection for digital speech watermarking," National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, "Speech probability distribution," IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, "Robust audio and speech watermarking using Gaussian and Laplacian modeling," Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, "A general formula for channel capacity," IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, "Speech watermarking method based on formant tuning," IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, "Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization," Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., "StirMark benchmark: audio watermarking attacks," in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, "An audio watermarking scheme using singular value decomposition and dither-modulation quantization," Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.


Table 1: Continued.

| Attack type | Attack name | Description | Parameter(s) | Default value(s) | Label |
|---|---|---|---|---|---|
| Time stretch and pitch shift | Pitch scale | The pitch of the watermarked signal is nonlinearly scaled without changing the time | ⟨SCALEFACTOR⟩ | 1.05 to 1.10 | T |
| Time stretch and pitch shift | Time stretch | The time of the watermarked signal is nonlinearly stretched | ⟨TEMPOFACTOR⟩ | 1.05 to 1.10 | U |
| Compression | CELP coding | The watermarked signal is coded at a rate of ⟨BITRATE⟩ by CELP codecs and then decoded back to the original format | ⟨BITRATE⟩ | 16 kbps to 9.6 kbps | V |
| Compression | MP3 compression | The watermarked signal is compressed by MP3 at different rates ⟨BITRATE⟩ | ⟨BITRATE⟩ | 128 to 32 | W |
| Compression | G.711 | The watermarked signal is coded by standard 64 kbps A/μ-law PCM | No parameter required | None | X |

Figure 8: Variation of the BSC capacity with respect to different WNRs for different quantization steps (x-axis: WNR (dB), 5 to 55; y-axis: C_BSC, ×10^4; curves for Δ = 0.1, 0.25, 0.5, 0.75, and 0.95).

Table 2 compares the BER with state-of-the-art speech watermarking techniques. We implemented all the techniques and tested them on the entire TIMIT corpus under different attacks. As can be observed, the proposed speech watermarking technique has a lower BER overall compared with the other techniques.

The perceptual quality of the watermarked signal is critical for the evaluation of the proposed watermarking technique. It can be measured based on the mean opinion score (MOS) (as proposed by the International Telecommunications Union (ITU-T) [23]) and the SNR. The MOS uses a subjective evaluation technique to score the watermarked signal; the grades are presented in Table 3. In the MOS evaluation method, 10 people were asked to listen blindly to the original and watermarked signals. Then, they reported the dissimilarities between the quality of the original and watermarked speech signals. The averages of these reports were computed for MOS music and MOS speech and are presented in Table 4.

Figure 9: Variation of the BSC capacity with respect to different WNRs for different frame lengths (x-axis: WNR (dB), 5 to 55; y-axis: C_BSC, ×10^4; curves for frame lengths of 40, 100, 200, 300, and 400 samples).

An objective evaluation technique, such as SWR or SNR, attempts to quantify this amount based on the following formula:

$$\mathrm{SNR} = 10 \times \log_{10} \frac{\sum_{n} S^{2}}{\sum_{n} (\hat{S} - S)^{2}}, \tag{31}$$

where $S$ and $\hat{S}$ are the original and watermarked signals, respectively.
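As a minimal sketch of how (31) can be evaluated on sampled signals, the following Python function computes the SNR from two equal-length sample sequences. The function name `snr_db` and the toy sample values are illustrative assumptions and are not taken from the experiments reported here.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB following (31): signal energy over embedding-noise energy."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same number of samples")
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((w - s) ** 2 for s, w in zip(original, watermarked))
    if noise_energy == 0:
        # Identical signals: the embedding distortion is zero.
        return math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy example (hypothetical samples): every sample is perturbed by 0.05.
original = [0.5, -0.25, 0.75, -1.0]
watermarked = [0.55, -0.20, 0.70, -1.05]
print(snr_db(original, watermarked))
```

A larger value indicates a smaller embedding distortion relative to the host signal energy.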

Table 4 presents a comparison of the proposed technique and other techniques in terms of imperceptibility and capacity. Based on the results, it seems that the proposed speech watermarking technique outperformed the other techniques in terms of capacity and imperceptibility. Although the SNR for formant tuning [21] is higher than that of the proposed technique, the capacity and robustness of the proposed technique are greater than those for formant tuning [21] and Analysis-by-Synthesis [22].


Table 2: Comparison with the robustness of different speech watermarking techniques in terms of BER (%).

| Attack | The proposed method | DWPT + multiplication [14] | Formant tuning [21] | Analysis-by-Synthesis [22] |
|---|---|---|---|---|
| No attack | 0.00 | 0.00 | 0.04 | 0.06 |
| A | 1.91–4.23 | 2.09–5.43 | 3.65–6.45 | 7.96–9.65 |
| B | 9.65–21.77 | 10.45–22.32 | 12.76–24.45 | 16.23–25.23 |
| C | 10.13–20.43 | 12.43–21.32 | 14.23–23.54 | 17.43–26.32 |
| D | 10.53–19.23 | 10.33–18.93 | 11.63–23.23 | 15.33–25.98 |
| E | 0.32–2.02 | 0.763–1.14 | 1.23–2.32 | 2.98–4.32 |
| F | 13.54–17.23 | 14.32–17.65 | 26.23–37.83 | 29.45–33.06 |
| G | 3.23 | 2.65 | 19.32 | 23.87 |
| H | 0.23 | 0.00 | 9.43 | 12.45 |
| I | 1.34–4.65 | 2.34–5.11 | 4.65–10.43 | 8.23–16.43 |
| J | 1.23–2.54 | 1.32–4.67 | 6.54–10.54 | 11.54–18.87 |
| K | 1.32–3.16 | 1.78–4.23 | 7.51–10.34 | 11.49–19.43 |
| L | 0.92 | 1.98 | 1.50 | 4.04 |
| M | 3.12 | 5.76 | 10.34 | 21.68 |
| N | 4.10 | 4.23 | 6.65 | 9.54 |
| O | 1.21–2.54 | 0.00–1.43 | 5.97–8.76 | 8.98–15.54 |
| P | 1.00–3.54 | 2.43–5.43 | 9.65–14.56 | 19.65–26.45 |
| Q | 21.43–29.43 | 24.54–31.43 | 40.54–44.43 | 50.09–50.32 |
| R | 4.84–9.54 | 5.32–10.32 | 16.65–29.44 | 20.54–36.98 |
| S | 13.32–18.54 | 15.00–19.43 | 20.43–29.23 | 28.54–30.76 |
| T | 1.32–2.32 | 2.01–3.13 | 7.43–10.43 | 9.65–15.32 |
| U | 0.15–0.23 | 0.18–0.43 | 1.45–3.21 | 4.32–5.43 |
| V | 6.54–9.54 | 11.43–14.54 | 1.32–4.21 | 2.32–4.32 |
| W | 10.43–20.34 | 11.43–25.34 | 36.32–45.65 | 33.43–50.32 |
| X | 23.11 | 24.17 | 48.32 | 50.65 |
| Average | 5.80–9.04 | 6.68–10.04 | 12.95–17.39 | 16.82–21.48 |

Table 3: MOS grades [23].

| MOS | Quality | Quality scale | Effort required to understand meaning scale |
|---|---|---|---|
| 5 | Excellent | Imperceptible | No effort required |
| 4 | Good | Perceptible but not annoying | No appreciable effort required |
| 3 | Fair | Slightly annoying | Moderate effort required |
| 2 | Poor | Annoying | Considerable effort required |
| 1 | Bad | Very annoying | No meaning was understood |

As observed in Table 4, each entry is bounded between two values that relate a particular value of imperceptibility (SNR and MOS) to a particular capacity. Consequently, when the capacity increased, imperceptibility decreased. The trade-off value is completely application dependent and should be determined by the user.

5. Performance Analysis

Generally, two types of errors, false positive probability (FPP) and false negative probability (FNP), must always be analyzed to validate the security of a watermarking system [25]. A false positive occurs when an unwatermarked speech signal is declared as a watermarked speech signal by the watermark extractor. Similarly, a false negative occurs when the watermarked speech signal is declared as an unwatermarked speech signal by the watermark extractor. By assuming that the watermark bits are independent random variables, both the FPP and FNP can be formulated based on Bernoulli trials, which is expressed as follows:

$$P_e = \underbrace{\sum_{i=0}^{T-1} \binom{N}{i} P_{\mathrm{FN}}^{i} \left(1 - P_{\mathrm{FN}}\right)^{(N-i)}}_{\mathrm{FNP}} + \underbrace{\sum_{i=T}^{N} \binom{N}{i} P_{\mathrm{FP}}^{i} \left(1 - P_{\mathrm{FP}}\right)^{(N-i)}}_{\mathrm{FPP}}, \tag{32}$$

where $N$ is the total number of watermark bits, $i$ is the number of matching bits, $\binom{N}{i}$ is a binomial coefficient, $P_{\mathrm{FP}}$ is the probability of a false positive, which is assumed to be 0.5, $P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

$$T = \lceil (1 - \mathrm{BER}) \times N \rceil. \tag{33}$$

Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-Synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

Figure 10: FPP with respect to various total number of watermark bits for different BER (x-axis: number of watermark bits, 50 to 400; y-axis: false positive probability, ×10^−4; curves for BER = 0.19, BER = 0.20 shifted by adding 0.000015, and BER = 0.21 shifted by adding 0.00009).

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.
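The threshold in (33) and the FPP term of (32) can be evaluated numerically with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the per-bit false positive probability is fixed at the assumed value of 0.5 stated above.

```python
import math


def threshold(n_bits, ber):
    """Threshold T from (33): minimum number of matching bits."""
    return math.ceil((1.0 - ber) * n_bits)


def false_positive_probability(n_bits, t, p_fp=0.5):
    """FPP term of (32): probability that at least t of n_bits match by chance."""
    return sum(
        math.comb(n_bits, i) * p_fp ** i * (1.0 - p_fp) ** (n_bits - i)
        for i in range(t, n_bits + 1)
    )


# Example sweep over watermark lengths for one BER value.
for n_bits in (10, 50, 100, 200):
    t = threshold(n_bits, 0.20)
    print(n_bits, t, false_positive_probability(n_bits, t))
```

Under these assumptions the binomial tail shrinks rapidly as $N$ grows, which agrees qualitatively with the behavior described for Figure 10.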

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, the samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.
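The embedding and extraction steps summarized above can be illustrated with a simplified Python sketch. It is not the optimized scheme of this paper: instead of the Lagrange optimization, it simply rescales all odd-indexed samples so that the Lp-norm ratio lands on the quantizer associated with the watermark bit. The function names, the step size `delta`, and the toy frame are assumptions made for illustration, and the sketch assumes both half-frames have nonzero norm.

```python
def lp_norm(values, p):
    """Lp-norm of a sequence of samples."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def quantize(ratio, bit, delta):
    """QIM: nearest point of the lattice for `bit` (the two lattices are offset by delta/2)."""
    offset = bit * delta / 2.0
    q = round((ratio - offset) / delta) * delta + offset
    return q if q > 0 else q + delta  # keep the target ratio strictly positive


def embed_bit(frame, bit, delta, p=2):
    """Embed one bit by scaling the odd-indexed samples (1-based indexing)."""
    odd, even = frame[0::2], frame[1::2]
    ratio = lp_norm(odd, p) / lp_norm(even, p)
    scale = quantize(ratio, bit, delta) / ratio
    marked = list(frame)
    marked[0::2] = [v * scale for v in odd]
    return marked


def extract_bit(frame, delta, p=2):
    """Blind extraction: choose the bit whose quantizer is nearest to the observed ratio."""
    ratio = lp_norm(frame[0::2], p) / lp_norm(frame[1::2], p)
    distances = [abs(ratio - quantize(ratio, bit, delta)) for bit in (0, 1)]
    return 0 if distances[0] <= distances[1] else 1


frame = [0.3, -0.1, 0.5, 0.2, -0.4, 0.6, 0.1, -0.3]
marked = embed_bit(frame, 1, delta=0.25)
attacked = [0.3 * v for v in marked]  # amplitude scaling attack
print(extract_bit(marked, 0.25), extract_bit(attacked, 0.25))
```

Because a common gain multiplies the odd and even norms equally, the ratio is unchanged by amplitude scaling, which is the property that the gain invariant design relies on.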

Figure 11: FNP with respect to various total number of watermark bits for different BER (x-axis: number of watermark bits, 50 to 400; y-axis: false negative probability, 0 to 0.035; curves for BER = 0.19, BER = 0.20 shifted by adding 0.015, and BER = 0.21 shifted by adding 0.025).

By assuming a Laplacian distribution for the speech signal and a Gaussian distribution for the noise signal, the probability of error and the watermarking distortion were modeled based on a statistical analysis of the proposed technique. Additionally, experimental results not only proved that the developed watermarking technique was highly robust against different attacks, such as compression, AWGN, filtering, and resampling, but also demonstrated the validity of the analytical model. For future work, an investigation of synchronization and adaptive quantization techniques might contribute to the proposed watermarking technique.

Appendix

A. Estimation of the Mean and Variance of the Ratio of Two Laplacian Variables Based on Taylor Series

In [26], the bivariate second-order Taylor expansion for $f(x, y)$ around $\theta = (E(x), E(y))$ is expressed as follows:

$$\begin{aligned} f(x, y) ={}& f(\theta) + f'_{x}(\theta)(x - \theta_x) + f'_{y}(\theta)(y - \theta_y) \\ &+ \frac{1}{2}\left\{ f''_{xx}(\theta)(x - \theta_x)^2 + 2 f''_{xy}(\theta)(x - \theta_x)(y - \theta_y) + f''_{yy}(\theta)(y - \theta_y)^2 \right\} \\ &+ \text{remainder}. \end{aligned} \tag{A1}$$


Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate values as follows:

$$E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{ f''_{xx}(\theta)\operatorname{var}(X) + 2 f''_{xy}(\theta)\operatorname{cov}(X, Y) + f''_{yy}(\theta)\operatorname{var}(Y) \right\} + O(n^{-1}). \tag{A2}$$

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^{3}$. Then, the mean and variance of the ratio between $R$ and $S$ ($E(R/S)$), respectively, can be estimated as follows:

$$\begin{aligned} E\left(\frac{R}{S}\right) &\equiv E(f(R, S)) \approx \frac{E(R)}{E(S)} - \frac{\operatorname{cov}(R, S)}{E(S)^2} + \frac{\operatorname{var}(S)\,E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right), \\ \operatorname{var}\left(\frac{R}{S}\right) &\approx \frac{1}{E_S^2}\operatorname{var}(R) + 2\,\frac{-E_R}{E_S^3}\operatorname{cov}(R, S) + \frac{E_R^2}{E_S^4}\operatorname{var}(S) \\ &= \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - 2\,\frac{\operatorname{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right] \\ &= \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2}. \end{aligned} \tag{A3}$$

B. Compute the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

$$E(|X|^n) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-(|X - \mu|/b)}\,dx = \frac{1}{2b}\int_{-\infty}^{\infty} |X|^n \cdot e^{-(|X - \mu|/b)}\,dx. \tag{B1}$$

There are two cases, $X \ge \mu$ and $X < \mu$:

$$E(|X|^n) = \begin{cases} \dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\,dx, & \text{if } X \ge \mu, \\[2ex] \dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu - X)/b)}\,dx, & \text{if } X < \mu. \end{cases} \tag{B2}$$

For the first case, when $X \ge \mu$:

$$\begin{aligned} E(|X|^n) &= \frac{1}{2b}\left[\int_{-\infty}^{0} -X^n \cdot e^{-((X - \mu)/b)}\,dx + \int_{0}^{\infty} X^n \cdot e^{-((X - \mu)/b)}\,dx\right] \\ &= \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\,dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\,dx}_{I}\right]. \end{aligned} \tag{B3}$$

If $t = -X/b$, then $I$ can be expressed as

$$I = b^{n+1}\int_{0}^{\infty} t^n e^{-t}\,dt = b^{n+1} \cdot n!. \tag{B4}$$

$I_n$ can also be expressed as

$$\begin{aligned} I_n &= \int_{-\infty}^{0} X^n e^{X/b}\,dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^n e^{-t}\,dt \\ &= \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\,dt = 0 + n I_{n-1}. \end{aligned} \tag{B5}$$

Substituting (B4) and (B5) into (B3), the absolute moment of the Laplacian distribution can be computed based on

$$E(|X|^n) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right]. \tag{B6}$$
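As a sanity check that is independent of the derivation above, the absolute moment can be evaluated numerically for the zero-mean case $\mu = 0$, where the standard closed form is $E(|X|^n) = b^n\, n!$. The sketch below uses a plain midpoint rule; the function names, step count, and integration range are illustrative assumptions.

```python
import math


def laplace_abs_moment_numeric(n, b, steps=200_000, span=60.0):
    """Midpoint-rule estimate of E(|X|^n) for a zero-mean Laplacian with scale b."""
    upper = span * b  # the tail beyond this point is negligible
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** n * math.exp(-x / b)
    # The density is symmetric about zero, so doubling the half-line integral
    # cancels the factor 1/2 in 1/(2b).
    return total * h / b


def laplace_abs_moment_closed_form(n, b):
    """Closed form for the zero-mean case: b^n * n!."""
    return b ** n * math.factorial(n)


for n in (1, 2, 3):
    print(n, laplace_abs_moment_numeric(n, 0.5), laplace_abs_moment_closed_form(n, 0.5))
```

The two columns should agree to several decimal places, which gives a quick way to validate an implementation of the moment before it is used in the distortion or error-probability estimates.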

Competing Interests

The authors declare that they have no competing interests.


International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

10 Security and Communication Networks

Table 2 Comparison with the robustness of different speech watermarking techniques in terms of BER ()

Attack The proposed method DWPT+ multiplication [14] Formant tuning [21] Analysis-by-Synthesis [22]No attack 000 000 004 006A 191ndash423 209ndash543 365ndash645 796ndash965B 965ndash2177 1045ndash2232 1276ndash2445 1623ndash2523C 1013ndash2043 1243ndash2132 1423ndash2354 1743ndash2632D 1053ndash1923 1033ndash1893 1163ndash2323 1533ndash2598E 032ndash202 0763ndash114 123ndash232 298ndash432F 1354ndash1723 1432ndash1765 2623ndash3783 2945ndash3306G 323 265 1932 2387H 023 000 943 1245I 134ndash465 234ndash511 465ndash1043 823ndash1643J 123ndash254 132ndash467 654ndash1054 1154ndash1887K 132ndash316 178ndash423 751ndash1034 1149ndash1943L 092 198 150 404M 312 576 1034 2168N 410 423 665 954O 121ndash254 000ndash143 597ndash876 898ndash1554P 100ndash354 243ndash543 965ndash1456 1965ndash2645Q 2143ndash2943 2454ndash3143 4054ndash4443 5009ndash5032R 484ndash954 532ndash1032 1665ndash2944 2054ndash3698S 1332ndash1854 1500ndash1943 2043ndash2923 2854ndash3076T 132ndash232 201ndash313 743ndash1043 965ndash1532U 015ndash023 018ndash043 145ndash321 432ndash543V 654ndash954 1143ndash1454 132ndash421 232ndash432W 1043ndash2034 1143ndash2534 3632ndash4565 3343ndash5032X 2311 2417 4832 5065Average 580ndash904 668ndash1004 1295ndash1739 1682ndash2148

Table 3 MOS grades [23]

MOS Quality Quality scale Effort required to understand meaning scale(5) Excellent Imperceptible No effort required(4) Good Perceptible but not annoying No appreciable effort required(3) Fair Slightly annoying Moderate effort required(2) Poor Annoying Considerable effort required(1) Bad Very annoying No meaning was understood

As observed in Table 4 each entity was bounded betweentwo values that related a particular value of imperceptibility(SNR andMOS) to a particular capacity Consequently whenthe capacity increased imperceptibility decreasedThe trade-off value is completely application dependent and should bedetermined by the user

5 Performance Analysis

Generally two types of errors false positive probability (FPP)and false negative probability (FNP)must always be analyzedto validate the security of a watermarking system [25] FPPis defined when an unwatermarked speech signal is declaredas a watermarked speech signal by the watermark extractorSimilarly FNP is defined when the watermarked speechsignal is declared as an unwatermarked speech signal by the

watermark extractor By assuming that the watermark bits areindependent random variables both the FPP and FNP can beformulated based on Bernoulli trials which is expressed asfollows

119875119890 = 119879minus1sum119894=0

(119873119894 )119875119894FN (1 minus 119875FN)(119873minus119894)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟FNP

+ 119873sum119894=119879

(119873119894 )119875119894FP (1 minus 119875FP)(119873minus119894)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟FPP

(32)

where119873 is the total number ofwatermark bits 119894 is the numberof matching bits (119873119894 ) is a binomial coefficient 119875FP is theprobability of a false positive which is assumed to be 05

Security and Communication Networks 11

Table 4 Comparison of various watermarking techniques in terms of payload and imperceptibility

Technique Quality scale Effort required to understand meaning scale SNR (dB) Theoretical payload (bps)Analysis-by-Synthesis [22] 401ndash380 476ndash395 2808ndash2532 3333ndash50Formant tuning [21] 498ndash432 500ndash455 3032ndash2754 3333ndash50DWPT+ multiplication [14] 432ndash310 500ndash355 3721ndash2008 3125ndash125The proposed method 487ndash365 500ndash405 4211ndash2071 40ndash400

times10minus4

BER = 021 is shifted by adding 000009BER = 020 is shifted by adding 0000015BER = 019

100 150 200 250 300 350 40050Number of watermark bits

0

1

False

pos

itive

pro

babi

lity

Figure 10 FPP with respect to various total number of watermarkbits for different BER

119875FN is the probability of a false negative which is assumedto be 00919 (as in Table 2) and 119879 is the threshold which iscomputed as follows

119879 = lceil(1 minus BER) times 119873rceil (33)

Figure 10 shows the FPP with respect to various totalnumber of watermark bits for different BER For bettervisualization each line was shifted by adding a constantAs observed the FPP was close to zero for 119873 greater than50 There was a small fluctuation for 119873 less than 50 whichdepended on the BER

Figure 11 shows the FNP with respect to various totalnumber of watermark bits for different BER For bettervisualization each line was shifted by adding a constant Ascan be observed the FNPwas close to zero for119873 greater than100 Additionally whenever the BER decreased the fluctua-tion increased

6 Conclusion and Future Work

In this paper a gain invariant speechwatermarking techniquewas developed using the Lagrange optimization method Forthis purpose samples of the signal were separated based onodd and even indices Then the ratio between the Lp-normswas quantized using the QIM method Finally the Lagrangemethod was used to estimate the optimized values In a sim-ilar manner the extraction process detected the watermarkdata blindly by finding the nearest quantization step

BER = 021 is shifted by adding 0025BER = 020 is shifted by adding 0015BER = 019

0

0005

001

0015

002

0025

003

0035

False

neg

ativ

e pro

babi

lity

100 150 200 250 300 350 40050Number of watermark bits

Figure 11 FNP with respect to various total number of watermarkbits for different BER

By assuming Laplacian distribution for the speech signaland Gaussian distribution for the noise signal the probabilityof error and watermarking distortion were modeled based ona statistical analysis of the proposed technique Additionallyexperimental results not only proved that the developedwatermarking technique was highly robust against differentattacks such compression AWGN filtering and resamplingbut also demonstrated the validity of the analytical modelFor future work an investigation on synchronization andadaptive quantization techniques might contribute to theproposed watermarking technique

Appendix

A Estimation of the Mean and Variance ofthe Ratio of Two Laplacian Variables Basedon Taylor Series

In [26] the bivariate second-order Taylor expansion for119891(119909 119910) around 120579 = (119864(119909) 119864(119910)) is expressed as follows

119891 (119909 119910) = 119891 (120579) + 1198911015840119909 (120579) (119909 minus 120579119909) + 1198911015840119910 (120579) (119910 minus 120579119910)+ 12 11989110158401015840119909119909 (120579) (119909 minus 120579119909)2+ 211989110158401015840119909119910 (120579) (119909 minus 120579119909) (119910 minus 120579119910) + 11989110158401015840119910119910 (120579) (119910 minus 120579119909)2+ remainder

(A1)

12 Security and Communication Networks

Therefore, $E[f(X, Y)]$ can be expanded about $\theta = (E(X), E(Y))$ to compute the approximate value as follows:

\[
E(f(X, Y)) = f(\theta) + \frac{1}{2}\left\{ f''_{xx}(\theta)\operatorname{var}(X) + 2 f''_{xy}(\theta)\operatorname{cov}(X, Y) + f''_{yy}(\theta)\operatorname{var}(Y) \right\} + O(n^{-1}).
\tag{A2}
\]

For $f = R/S$, $f''_{RR} = 0$, $f''_{RS} = -S^{-2}$, and $f''_{SS} = 2R/S^3$. Then, the mean and variance of the ratio between $R$ and $S$ can be estimated, respectively, as follows, where the closed-form expressions in $\mu$ and $\sigma$ take $\operatorname{cov}(R, S) = 0$:

\[
\begin{aligned}
E\!\left(\frac{R}{S}\right) \equiv E(f(R, S)) & \approx \frac{E(R)}{E(S)} - \frac{\operatorname{cov}(R, S)}{E(S)^2} + \frac{\operatorname{var}(S)\, E(R)}{E(S)^3} = \frac{\mu_R}{\mu_S}\left(1 + \frac{\sigma_S^2}{\mu_S^2}\right), \\
\operatorname{var}\!\left(\frac{R}{S}\right) & \approx \frac{1}{E^2(S)}\operatorname{var}(R) + 2\left(-\frac{E(R)}{E^3(S)}\right)\operatorname{cov}(R, S) + \frac{E^2(R)}{E^4(S)}\operatorname{var}(S) \\
& = \frac{\mu_R^2}{\mu_S^2}\left[\frac{\sigma_R^2}{\mu_R^2} - 2\,\frac{\operatorname{cov}(R, S)}{\mu_R \mu_S} + \frac{\sigma_S^2}{\mu_S^2}\right] \\
& = \frac{\mu_R^2}{\mu_S^2}\left(\frac{\sigma_S^2}{\mu_S^2} - \frac{\sigma_S^4}{\mu_S^4}\right) + \frac{\sigma_R^2}{\mu_S^2}.
\end{aligned}
\tag{A3}
\]
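The Taylor-series estimates of the ratio statistics can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses independent normal draws for the numerator and denominator (chosen for convenience, not the Lp-norm statistics of the paper), it evaluates the first approximation lines of the mean and variance with the covariance term kept explicit, and the function names and parameter values are hypothetical.

```python
import random


def ratio_mean_approx(mu_r, mu_s, var_s, cov_rs=0.0):
    """Second-order Taylor estimate of E[R/S]."""
    return mu_r / mu_s - cov_rs / mu_s ** 2 + var_s * mu_r / mu_s ** 3


def ratio_var_approx(mu_r, mu_s, var_r, var_s, cov_rs=0.0):
    """First-order Taylor estimate of var(R/S)."""
    return (var_r / mu_s ** 2
            - 2.0 * mu_r * cov_rs / mu_s ** 3
            + mu_r ** 2 * var_s / mu_s ** 4)


def monte_carlo_ratio(mu_r, sd_r, mu_s, sd_s, n=100_000, seed=0):
    """Sample mean and variance of R/S for independent normal R and S.

    Assumption: S stays far from zero (mu_s >> sd_s), so the ratio is
    well behaved; the normal model is illustrative only.
    """
    rng = random.Random(seed)
    ratios = [rng.gauss(mu_r, sd_r) / rng.gauss(mu_s, sd_s) for _ in range(n)]
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / n
    return mean, var


if __name__ == "__main__":
    # Hypothetical parameters: R ~ N(10, 1), S ~ N(5, 0.25^2), independent.
    print(ratio_mean_approx(10.0, 5.0, 0.25 ** 2))
    print(ratio_var_approx(10.0, 5.0, 1.0, 0.25 ** 2))
    print(monte_carlo_ratio(10.0, 1.0, 5.0, 0.25))
```

With a denominator whose standard deviation is small relative to its mean, the sampled mean and variance should land close to the two estimates; the agreement is expected to degrade as the denominator spreads toward zero.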

B. Computing the Absolute Moment of the Laplacian Distribution

The absolute moment of the Laplacian distribution is expressed as follows:

\[
E(|X|^n) = \int_{-\infty}^{\infty} |X|^n \cdot \frac{1}{2b} \cdot e^{-(|X - \mu|/b)}\, dx = \frac{1}{2b}\int_{-\infty}^{\infty} |X|^n \cdot e^{-(|X - \mu|/b)}\, dx.
\tag{B1}
\]

There are two cases, $X \ge \mu$ and $X < \mu$:

\[
E(|X|^n) =
\begin{cases}
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((X - \mu)/b)}\, dx, & \text{if } X \ge \mu, \\[2ex]
\dfrac{1}{2b}\displaystyle\int_{-\infty}^{\infty} |X|^n \cdot e^{-((\mu - X)/b)}\, dx, & \text{if } X < \mu.
\end{cases}
\tag{B2}
\]

For the first case, when $X \ge \mu$,

\[
\begin{aligned}
E(|X|^n) & = \frac{1}{2b}\left[\int_{-\infty}^{0} (-X)^n \cdot e^{-((X - \mu)/b)}\, dx + \int_{0}^{\infty} X^n \cdot e^{-((X - \mu)/b)}\, dx\right] \\
& = \frac{e^{\mu/b}}{2b}\left[(-1)^n \underbrace{\int_{-\infty}^{0} X^n e^{X/b}\, dx}_{I_n} + \underbrace{\int_{0}^{\infty} X^n e^{-X/b}\, dx}_{I}\right].
\end{aligned}
\tag{B3}
\]

If $t = -X/b$, then $I$ can be expressed as

\[
I = b^{n+1}\int_{0}^{\infty} t^n e^{-t}\, dt = b^{n+1} \cdot n!.
\tag{B4}
\]

$I_n$ can also be expressed as

\[
\begin{aligned}
I_n & = \int_{-\infty}^{0} X^n e^{X/b}\, dx = \int_{-\infty}^{0} (b \cdot t)^n e^{-t} \cdot b \cdot dt = b^{n+1}\int_{-\infty}^{0} t^n e^{-t}\, dt \\
& = \left.\frac{t^n e^{-t}}{-1}\right|_{-\infty}^{0} - \int_{-\infty}^{0} \frac{n \cdot t^{n-1} e^{-t}}{-1}\, dt = 0 + n I_{n-1}.
\end{aligned}
\tag{B5}
\]

Substituting (B4) and (B5) into (B3), the absolute moment of the Laplacian distribution can be computed based on

\[
E(|X|^n) = \left(\frac{e^{\mu/b}\, b^n}{2}\right)\left[(-1)^n \cdot I_n + n!\right].
\tag{B6}
\]
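As a numerical sanity check, the zero-mean special case ($\mu = 0$) can be compared against the standard result $E(|X|^n) = n!\, b^n$ for a Laplacian variable with scale $b$. The Python sketch below integrates the definition of the absolute moment with a composite Simpson rule; the function names, the truncation width, and the step count are illustrative assumptions rather than part of the paper.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a Laplacian variable with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def abs_moment_numeric(n, b, mu=0.0, half_width=60.0, steps=20_000):
    """Approximate E(|X|^n) with composite Simpson integration.

    The integral is truncated to [mu - half_width*b, mu + half_width*b];
    both the width and the step count are illustrative choices.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    lo = mu - half_width * b
    hi = mu + half_width * b
    h = (hi - lo) / steps

    def g(x):
        return abs(x) ** n * laplace_pdf(x, mu, b)

    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0


def abs_moment_zero_mean(n, b):
    """Closed form for mu = 0 only: E(|X|^n) = n! * b^n."""
    return math.factorial(n) * b ** n


if __name__ == "__main__":
    for n, b in [(1, 2.0), (2, 0.5), (3, 1.0)]:
        print(n, b, abs_moment_numeric(n, b), abs_moment_zero_mean(n, b))
```

The numeric routine also accepts a nonzero `mu`, which makes it usable for checking other expressions for the absolute moment, although the closed form above applies only to the zero-mean case.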

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. A. Nematollahi, C. Vorakulpipat, and H. G. Rosales, Digital Watermarking: Techniques and Trends, vol. 11, Springer, 2016.

[2] M. A. Nematollahi and S. A. R. Al-Haddad, “An overview of digital speech watermarking,” International Journal of Speech Technology, vol. 16, no. 4, pp. 471–488, 2013.

[3] H.-T. Hu and L.-Y. Hsu, “A DWT-based rational dither modulation scheme for effective blind audio watermarking,” Circuits, Systems, and Signal Processing, vol. 35, no. 2, pp. 553–572, 2016.

[4] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[5] M. A. Nematollahi, M. A. Akhaee, S. A. R. Al-Haddad, and H. Gamboa-Rosales, “Semi-fragile digital speech watermarking for online speaker recognition,” Eurasip Journal on Audio, Speech, and Music Processing, vol. 2015, no. 1, article no. 31, 2015.

[6] P. Guccione and M. Scagliola, “Hyperbolic RDM for nonlinear valumetric distortions,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 1, pp. 25–35, 2009.

[7] N. Cai, N. Zhu, S. Weng, and B. Wing-Kuen Ling, “Difference angle quantization index modulation scheme for image watermarking,” Signal Processing: Image Communication, vol. 34, pp. 52–60, 2015.


[8] X. Zhu and S. Peng, “A novel quantization watermarking scheme by modulating the normalized correlation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, “Blind image watermarking using a sample projection approach,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, “A logarithmic quantization index modulation for perceptually better data hiding,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, “A novel gain invariant quantization-based watermarking approach,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.

[12] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker verification security improvement by means of speech watermarking,” Speech Communication, vol. 48, no. 12, pp. 1608–1619, 2006.

[13] M. Faundez-Zanuy, M. Hagmuller, and G. Kubin, “Speaker identification security improvement by means of speech watermarking,” Pattern Recognition, vol. 40, no. 11, pp. 3027–3034, 2007.

[14] M. A. Nematollahi, H. Gamboa-Rosales, M. A. Akhaee, and S. A. R. Al-Haddad, “Robust digital speech watermarking for online speaker recognition,” Mathematical Problems in Engineering, vol. 2015, Article ID 372398, 12 pages, 2015.

[15] M. A. Nematollahi, H. Gamboa-Rosales, F. J. Martinez-Ruiz, J. I. de la Rosa-Vargas, S. A. R. Al-Haddad, and M. Esmaeilpour, “Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition,” Multimedia Tools and Applications, pp. 1–31, 2016.

[16] M. A. Nematollahi, S. A. R. Al-Haddad, S. Doraisamy, and H. Gamboa-Rosales, “Speaker frame selection for digital speech watermarking,” National Academy Science Letters, vol. 39, no. 3, pp. 197–201, 2016.

[17] S. Gazor and W. Zhang, “Speech probability distribution,” IEEE Signal Processing Letters, vol. 10, no. 7, pp. 204–207, 2003.

[18] M. A. Akhaee, N. Khademi Kalantari, and F. Marvasti, “Robust audio and speech watermarking using Gaussian and Laplacian modeling,” Signal Processing, vol. 90, no. 8, pp. 2487–2497, 2010.

[19] J. S. Garofolo and L. D. Consortium, TIMIT: Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, 1993.

[20] S. Verdu and T. S. Han, “A general formula for channel capacity,” IEEE Transactions on Information Theory, vol. 40, no. 4, pp. 1147–1157, 1994.

[21] S. Wang and M. Unoki, “Speech watermarking method based on formant tuning,” IEICE Transactions on Information and Systems, vol. 98, no. 1, pp. 29–37, 2015.

[22] B. Yan and Y.-J. Guo, “Speech authentication by semi-fragile speech watermarking utilizing analysis by synthesis and spectral distortion optimization,” Multimedia Tools and Applications, vol. 67, no. 2, pp. 383–405, 2013.

[23] I. Rec, P. 800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal et al., “StirMark benchmark: audio watermarking attacks,” in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, “An audio watermarking scheme using singular value decomposition and dither-modulation quantization,” Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.



Table 4: Comparison of various watermarking techniques in terms of payload and imperceptibility.

| Technique | Quality scale | Effort required to understand meaning scale | SNR (dB) | Theoretical payload (bps) |
|---|---|---|---|---|
| Analysis-by-synthesis [22] | 4.01–3.80 | 4.76–3.95 | 28.08–25.32 | 33.33–50 |
| Formant tuning [21] | 4.98–4.32 | 5.00–4.55 | 30.32–27.54 | 33.33–50 |
| DWPT + multiplication [14] | 4.32–3.10 | 5.00–3.55 | 37.21–20.08 | 31.25–125 |
| The proposed method | 4.87–3.65 | 5.00–4.05 | 42.11–20.71 | 40–400 |

Figure 10: FPP with respect to the total number of watermark bits for different BERs. [Plot: x-axis, number of watermark bits (50 to 400); y-axis, false positive probability (0 to 1, scaled by 10^-4); curves for BER = 0.19, BER = 0.20 (shifted by adding 0.000015), and BER = 0.21 (shifted by adding 0.00009).]

$P_{\mathrm{FN}}$ is the probability of a false negative, which is assumed to be 0.0919 (as in Table 2), and $T$ is the threshold, which is computed as follows:

\[
T = \lceil (1 - \mathrm{BER}) \times N \rceil.
\tag{33}
\]
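The threshold in (33) can be evaluated directly, and a false positive probability can then be sketched as a binomial tail. The Python example below is a minimal illustration, not the paper's exact formulation: it assumes that bits extracted from an unwatermarked signal are independent and match the reference watermark with probability 1/2, and the function names are hypothetical. The BER is converted to an exact fraction so that the ceiling is not disturbed by floating-point round-off.

```python
import math
from fractions import Fraction


def detection_threshold(ber, n_bits):
    """T = ceil((1 - BER) * N), evaluated exactly to avoid float round-off."""
    return math.ceil((1 - Fraction(str(ber))) * n_bits)


def false_positive_probability(ber, n_bits):
    """Binomial tail P(matches >= T) with per-bit match probability 1/2.

    Assumption (not taken from the paper): an unwatermarked signal yields
    independent bits that each agree with the watermark half of the time.
    """
    t = detection_threshold(ber, n_bits)
    tail = sum(math.comb(n_bits, k) for k in range(t, n_bits + 1))
    return tail / 2 ** n_bits


if __name__ == "__main__":
    for ber in (0.19, 0.20, 0.21):
        for n_bits in (50, 100, 200, 400):
            print(ber, n_bits, detection_threshold(ber, n_bits),
                  false_positive_probability(ber, n_bits))
```

Under this assumed model the tail shrinks quickly as the number of watermark bits grows, which is the qualitative behavior reported for the FPP.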

Figure 10 shows the FPP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As observed, the FPP was close to zero for $N$ greater than 50. There was a small fluctuation for $N$ less than 50, which depended on the BER.

Figure 11 shows the FNP with respect to the total number of watermark bits for different BERs. For better visualization, each line was shifted by adding a constant. As can be observed, the FNP was close to zero for $N$ greater than 100. Additionally, whenever the BER decreased, the fluctuation increased.

6. Conclusion and Future Work

In this paper, a gain invariant speech watermarking technique was developed using the Lagrange optimization method. For this purpose, samples of the signal were separated based on odd and even indices. Then, the ratio between the Lp-norms was quantized using the QIM method. Finally, the Lagrange method was used to estimate the optimized values. In a similar manner, the extraction process detected the watermark data blindly by finding the nearest quantization step.
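The embedding and extraction steps summarized above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' implementation: instead of the Lagrange optimization, it simply rescales the odd-indexed samples so that the ratio of Lp-norms lands on the quantizer lattice, and the quantization step `delta`, the norm order `p`, and all function names are hypothetical.

```python
def lp_norm(values, p):
    """Lp-norm of a sequence of samples."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def quantize(ratio, bit, delta):
    """Nearest point of the QIM lattice delta * (k + bit/2), k an integer."""
    offset = bit / 2.0
    q = delta * (round(ratio / delta - offset) + offset)
    return q if q > 0 else q + delta  # keep the target ratio positive


def embed_bit(frame, bit, delta=0.1, p=2):
    """Embed one bit by moving the odd/even Lp-norm ratio onto the lattice.

    Simplification: only the odd-indexed samples (1-based, i.e. positions
    0, 2, 4, ... in Python) are rescaled; the paper uses Lagrange
    optimization to distribute the change with minimal distortion.
    """
    odd, even = frame[0::2], frame[1::2]
    norm_odd, norm_even = lp_norm(odd, p), lp_norm(even, p)
    if norm_odd == 0 or norm_even == 0:
        raise ValueError("frame has an all-zero half; cannot form the ratio")
    ratio = norm_odd / norm_even
    gain = quantize(ratio, bit, delta) / ratio
    marked = list(frame)
    marked[0::2] = [gain * v for v in odd]
    return marked


def extract_bit(frame, delta=0.1, p=2):
    """Blind extraction: pick the bit whose lattice is nearest to the ratio."""
    ratio = lp_norm(frame[0::2], p) / lp_norm(frame[1::2], p)
    errors = [abs(ratio - quantize(ratio, b, delta)) for b in (0, 1)]
    return errors.index(min(errors))


if __name__ == "__main__":
    frame = [((i * 37) % 11 - 5) / 3.0 for i in range(64)]
    marked = embed_bit(frame, 1)
    scaled = [0.3 * v for v in marked]  # amplitude scaling attack
    print(extract_bit(marked), extract_bit(scaled))
```

Because the Lp-norm is homogeneous, multiplying every sample by the same gain leaves the ratio unchanged, so the extracted bit in this sketch survives amplitude scaling.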


12 Security and Communication Networks

Therefore 119864[119891(119883 119884)] can be expanded about 120579 = (119864(119883)119864(119884)) to compute the approximate values as follows

119864 (119891 (119883 119884)) = 119891 (120579) + 12 11989110158401015840119909119909 (120579) var (119883)+ 211989110158401015840119909119910 (120579) cov (119883 119884) + 11989110158401015840119910119910 (120579) var (119884)+ 119874 (119899minus1)

(A2)

For 119891 = 119877119878 11989110158401015840119877119877 = 0 11989110158401015840119877119878 = minus119878minus2 and 11989110158401015840119878119878 = 21198771198783 Thenthe mean and variance of the ratio between 119877 and 119878(119864(119877119878))respectively can be estimated as follows

119864(119877119878 ) equiv 119864 (119891 (119877 119878))asymp 119864 (119877)119864 (119878) minus cov (119877 119878)119864 (119878)2 + var (119878) 119864 (119877)119864 (119878)3= 120583119877120583119878 (1 + 12059021198781205832119878)

var(119877119878 ) asymp 11198642119878 var (119877) + 2minus1198641198771198643119878 cov (119877 119878)+ 11986421198771198644119878 var (119878)

= 12058321198771205832119878 [12059021198771205832119877 minus 2cov (119877 119878)120583119877120583119878 + 12059021198781205832119878 ]

= 12058321198771205832119878 (12059021198781205832119878 minus

12059041198781205834119878) + 12059021198771205832119878

(A3)

B Compute the Absolute Moment ofthe Laplacian Distribution

Themoment of Laplacian distribution expressed as follows

119864 (|119883|119899) = intinfinminusinfin

|119883|119899 sdot 12119887 sdot 119890minus((119883minus120583)119887)119889119909= 12119887 intinfin

minusinfin|119883|119899 sdot 119890minus((119883minus120583)119887)119889119909 (B1)

There are two cases119883 ge 120583 and119883 lt 120583119864 (|119883|119899)

= If 119883 ge 120583 then 12119887 intinfin

minusinfin|119883|119899 sdot 119890minus((119883minus120583)119887)119889119909

If 119883 lt 120583 then 12119887 intinfinminusinfin

|119883|119899 sdot 119890minus((120583minus119883)119887)119889119909(B2)

For first case when119883 ge 120583119864 (|119883|119899) = 12119887 [int0

minusinfinminus119883119899 sdot 119890minus((119883minus120583)119887)119889119909

+ intinfin0

119883119899 sdot 119890minus((119883minus120583)119887)119889119909]

= 1198901205831198872119887 [[[[(minus1)119899 int0

minusinfin119883119899119890119883119887119889119909⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟119868119899

+ intinfin0

119883119899119890minus119883119887119889119909⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟119868

]]]](B3)

If 119905 = minus119883119887 then 119868 can be expressed as

119868 = 119887119899+1 intinfin0

119905119899119890minus119905119889119905 = 119887119899+1 sdot 119899 = 119899 (B4)

119868119899 can also be expressed as

119868119899 = int0minusinfin

119883119899119890119883119887119889119909 = int0minusinfin

(119887 sdot 119905)119899 119890minus119905 sdot 119887 sdot 119889119905= 119887119899+1 int0

minusinfin119905119899119890minus119905119889119905

= 119905119899119890minus119905minus11003816100381610038161003816100381610038161003816100381610038160

minusinfin

minus int0minusinfin

119899 sdot 119905119899minus1119890minus119905minus1 119889119905 = 0 + 119899119868119899minus1(B5)

Substituting (B4) and (B5) into (B3) the absolute momentof the Laplacian distribution can be computed based on

119864 (|119883|119899) = (1198901205831198871198871198992 ) [(minus1)119899 sdot 119868119899 + 119899] (B6)

Competing Interests

The authors declare that they have no competing interests

References

[1] M A Nematollahi C Vorakulpipat and H G Rosales DigitalWatermarking Techniques and Trends vol 11 Springer 2016

[2] MANematollahi and S A R Al-Haddad ldquoAn overview of dig-ital speech watermarkingrdquo International Journal of Speech Tech-nology vol 16 no 4 pp 471ndash488 2013

[3] H-T Hu and L-Y Hsu ldquoA DWT-based rational dither modu-lation scheme for effective blind audio watermarkingrdquo CircuitsSystems and Signal Processing vol 35 no 2 pp 553ndash572 2016

[4] B Chen and G W Wornell ldquoQuantization index modulationa class of provably good methods for digital watermarking andinformation embeddingrdquo Institute of Electrical and ElectronicsEngineers Transactions on InformationTheory vol 47 no 4 pp1423ndash1443 2001

[5] M A Nematollahi M A Akhaee S A R Al-Haddad andH Gamboa-Rosales ldquoSemi-fragile digital speech watermarkingfor online speaker recognitionrdquo Eurasip Journal on AudioSpeech andMusic Processing vol 2015 no 1 article no 31 2015

[6] P Guccione and M Scagliola ldquoHyperbolic RDM for nonlin-ear valumetric distortionsrdquo IEEE Transactions on InformationForensics and Security vol 4 no 1 pp 25ndash35 2009

[7] N Cai N Zhu S Weng and B Wing-Kuen Ling ldquoDifferenceangle quantization index modulation scheme for image water-markingrdquo Signal Processing Image Communication vol 34 pp52ndash60 2015

Security and Communication Networks 13

[8] X Zhu and S Peng ldquoA novel quantization watermarkingscheme by modulating the normalized correlationrdquo in Proceed-ings of the IEEE International Conference on Acoustics Speechand Signal Processing (ICASSP rsquo12) pp 1765ndash1768 IEEE KyotoJapan March 2012

[9] M A Akhaee S M E Sahraeian and C Jin ldquoBlind imagewatermarking using a sample projection approachrdquo IEEETrans-actions on Information Forensics and Security vol 6 no 3 pp883ndash893 2011

[10] N K Kalantari and S M Ahadi ldquoA logarithmic quantizationindex modulation for perceptually better data hidingrdquo IEEETransactions on Image Processing vol 19 no 6 pp 1504ndash15172010

[11] M Zareian andH R Tohidypour ldquoA novel gain invariant quan-tization-based watermarking approachrdquo IEEE Transactions onInformation Forensics and Security vol 9 no 11 pp 1804ndash18132014

[12] M Faundez-Zanuy M Hagmuller and G Kubin ldquoSpeakerverification security improvement by means of speech water-markingrdquo Speech Communication vol 48 no 12 pp 1608ndash16192006

[13] M Faundez-Zanuy M Hagmuller and G Kubin ldquoSpeakeridentification security improvement by means of speech water-markingrdquo Pattern Recognition vol 40 no 11 pp 3027ndash30342007

[14] M A Nematollahi H Gamboa-Rosales M A Akhaee andS A R Al-Haddad ldquoRobust digital speech watermarking foronline speaker recognitionrdquo Mathematical Problems in Engi-neering vol 2015 Article ID 372398 12 pages 2015

[15] M A Nematollahi H Gamboa-Rosales F J Martinez-Ruiz JI de la Rosa-Vargas S A R Al-Haddad and M EsmaeilpourldquoMulti-factor authentication model based on multipurposespeech watermarking and online speaker recognitionrdquo Multi-media Tools and Applications pp 1ndash31 2016

[16] M A Nematollahi S A R Al-Haddad S Doraisamy and HGamboa-Rosales ldquoSpeaker frame selection for digital speechwatermarkingrdquo National Academy Science Letters vol 39 no 3pp 197ndash201 2016

[17] S Gazor andW Zhang ldquoSpeech probability distributionrdquo IEEESignal Processing Letters vol 10 no 7 pp 204ndash207 2003

[18] M A Akhaee N Khademi Kalantari and F Marvasti ldquoRobustaudio and speech watermarking using Gaussian and Laplacianmodelingrdquo Signal Processing vol 90 no 8 pp 2487ndash2497 2010

[19] J S Garofolo and L D Consortium TIMIT Acoustic-PhoneticContinuous Speech Corpus Linguistic Data Consortium 1993

[20] S Verdu and T S Han ldquoA general formula for channel capacityrdquoIEEE Transactions on Information Theory vol 40 no 4 pp1147ndash1157 1994

[21] S Wang and M Unoki ldquoSpeech watermarking method basedon formant tuningrdquo IEICETransactions on Information and Sys-tems vol 98 no 1 pp 29ndash37 2015

[22] B Yan and Y-J Guo ldquoSpeech authentication by semi-fragilespeech watermarking utilizing analysis by synthesis and spec-tral distortion optimizationrdquo Multimedia Tools and Applica-tions vol 67 no 2 pp 383ndash405 2013

[23] I. Rec, P.800: Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, Geneva, Switzerland, 1996.

[24] M. Steinebach, F. A. P. Petitcolas, F. Raynal, et al., “StirMark benchmark: audio watermarking attacks,” in Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE, 2001.

[25] K. Vivekananda Bhat, I. Sengupta, and A. Das, “An audio watermarking scheme using singular value decomposition and dither-modulation quantization,” Multimedia Tools and Applications, vol. 52, no. 2-3, pp. 369–383, 2011.

[26] R. C. Elandt-Johnson and N. L. Johnson, Survival Models and Data Analysis, Wiley Classics Library, John Wiley & Sons, New York, NY, USA, 1999.

Security and Communication Networks 13

[8] X. Zhu and S. Peng, “A novel quantization watermarking scheme by modulating the normalized correlation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 1765–1768, IEEE, Kyoto, Japan, March 2012.

[9] M. A. Akhaee, S. M. E. Sahraeian, and C. Jin, “Blind image watermarking using a sample projection approach,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 883–893, 2011.

[10] N. K. Kalantari and S. M. Ahadi, “A logarithmic quantization index modulation for perceptually better data hiding,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1504–1517, 2010.

[11] M. Zareian and H. R. Tohidypour, “A novel gain invariant quantization-based watermarking approach,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 11, pp. 1804–1813, 2014.
