
ARTICLE IN PRESS

Signal Processing: Image Communication 20 (2005) 624–642

www.elsevier.com/locate/image

Real-time frame-dependent video watermarking in VLC domain☆

Chun-Shien Lu a,*, Jan-Ru Chen a,b, Kuo-Chin Fan b

a Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC
b Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan, ROC

Received 5 October 2004; accepted 15 March 2005

Abstract

Digital watermarking is a helpful technology for providing copyright protection for valuable multimedia data. In particular, video watermarking must deal with several issues that are unique among the various types of media watermarking. In this paper, these issues, including compressed domain watermarking, real-time detection, bit-rate control, and resistance to watermark estimation attacks, will be addressed. Since video sequences are usually compressed before they are transmitted over networks, we first describe how watermark signals can be embedded into compressed video while keeping the desired bit-rate nearly unchanged. In the embedding process, our algorithm is designed to operate directly in the variable length codeword (VLC) domain to satisfy the requirement of real-time detection. We describe how suitable positions in the VLC domain can be selected for embedding transparent watermarks. Second, in addition to typical attacks, the attacks that are peculiar to video sequences are investigated. In particular, in order to deal with both collusion and copy attacks, which are fatal to video watermarking, the video frame-dependent watermark (VFDW) is presented. Extensive experimental results verify the excellent performance of the proposed compressed video watermarking system in addressing the aforementioned issues.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Attack; Bit-rate control; Collusion; Video frame-dependent watermark; Real-time detection; Robustness

0923-5965/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.image.2005.03.012

☆ This paper is a significantly modified and extended version of an earlier paper [15] that was published in the Proceedings of the IEEE International Conference on Communications, France, 2004.
* Corresponding author. Tel.: +886 2 27883799.
E-mail address: [email protected] (C.-S. Lu).

1. Introduction

1.1. Literature review

Due to the rapid development of the Internet during the past decade, numerous methods have been proposed for storing and transmitting digital multimedia data. Digital data is much superior to analog data because the quality of copies is not degraded at all. However, this superiority has become a threat to authorized usage because digital data can be easily tampered with, duplicated or distributed without the need for expensive tools. As a consequence, the intellectual property protection problem has become an urgent issue in the digital world.

Recently, an emerging intellectual property protection scheme called "digital watermarking" has been extensively explored [4,7,20]. Digital watermarking is a helpful technique that can be used to protect multimedia data. The underlying concept of digital watermarking is to embed imperceptible signals into digital multimedia data to carry out specific missions. When watermarked multimedia data encounters a digital signal processing operation, the embedded watermark is expected to survive this operation. If someone violates the copyright of watermarked multimedia data, the original owner can prove ownership by extracting the embedded watermark. In this paper, we shall focus on the development of a new video watermarking scheme.

In the past decade, a number of video watermarking schemes have been proposed. The existing video watermarking schemes embed watermarks either in raw video [3,5,6,10,22] or in compressed video [2,6,12–14]. Compressed domain video watermarking is considered more practical and is the objective of this paper because video is usually stored in a compressed format before it is transmitted over networks. In [6], Hartung and Girod proposed a video watermarking scheme that uses MPEG-2 bitstreams as the underlying target data. They arrange a watermark sequence into a 2D format, which is the same size as a video frame. Then, the watermark signals are 8 × 8 DCT transformed and added into their corresponding DCT coefficients. In order to deal with the error propagation problem caused by watermarking, they add a drift compensation signal to the watermarked signal. Their detection process has been shown to be very close to real-time detection. Arena et al. [2] improved Hartung and Girod's work with interleaved encoding. They also make use of the advantageous properties of the human visual system (HVS). In [13], Langelaar et al. proposed a video watermarking scheme which can be used in the compressed domain by modifying the codewords generated by a variable length codeword (VLC). First, they divide run-level pairs into many groups, where each group contains codewords of equal length and the level difference in each group is exactly one. During watermark embedding, a run-level pair is either left unchanged or replaced, depending on the incoming watermark value. Their method is basically a least significant bit (LSB) approach. This implies that quality degradation of a video after embedding can be almost negligible. However, for a commonly adopted signal processing operation, such as decoding followed by re-encoding, achieving robustness becomes impossible. In [12], Langelaar and Lagendijk proposed a differential energy watermarking (DEW) algorithm which can be employed in the DCT domain. They calculate the energy difference of two blocks and remove the high-frequency coefficients of the block whose energy is pre-defined as the smaller one in order to maintain the pre-defined energy relation. The authors mentioned that their method is robust against the re-encoding of video bit streams. However, as mentioned in [1,12], this technique is susceptible to transcoding, especially when the group of pictures (GOP) structure is changed. An earlier paper [22] described a scene-based video watermarking approach that uses a "temporal wavelet transform." One major feature is that the authors were the first to point out the collusion attack, which is a common threat to video watermarking methods. In fact, the collusion attack can be said to be a unique outcome¹ of video watermarking methods. We deeply believe that it would be meaningless to claim that a video watermarking scheme was robust if it could not deal with collusion. Overall, although the aforementioned papers have dealt with problems related to video watermarking, some issues still remain. In order to give the reader a clear idea of the state of the art in video watermarking, Table 1 summarizes selected techniques versus video characteristics. In the following subsections, the issues of compressed domain watermarking, real-time detection, bit-rate control, and resistance to peculiar attacks that video sequences may encounter will be discussed.

¹ It is worth mentioning that if each image unit (e.g., a block or mesh) in an image is treated like a frame in a video sequence, then collusion attacks can also be applied to those image watermarking methods that employ a multiple redundant watermark embedding strategy [16].

Table 1
Video watermarking methods vs. video characteristics

Video characteristic             [22]  [9]  [21]  [6]     [13]    [12]    [1]
Compressed domain watermarking   N     N    N     Y(DCT)  Y(VLC)  Y(DCT)  Y(DCT)
Real-time detection              N     Y    N     Y       Y       Y       Y
Bit-rate (nearly) unchanged      N     N    N     N       Y       N       Y
Drift compensation               N     N    N     Y       N       N       Y
Low bit-rate embedding           N     N    N     N       N       N       Y
Resistance to collusion          Y     N    Y     N       N       N       N
Resistance to copy attack        N     N    N     N       N       N       N

1.2. Compressed domain video watermarking, real-time detection, and bit-rate control

As for watermarking in the compressed domain, most of the existing compressed domain video watermarking methods are, in fact, employed in the "DCT" domain. In other words, both inverse entropy coding and inverse quantization must be performed before watermark embedding or detection. In this work, we shall propose a compressed-domain video watermarking method in which the watermarking process can be directly performed in the VLC domain. In comparison with the existing DCT domain watermarking methods, the places where we propose to embed watermarks are closer to the coded bitstream. The advantage of this embedding strategy is its efficiency in real-time detection. With the watermark embedded in the VLC domain, tedious DCT and inverse DCT transforms can be avoided. On the other hand, since the codeword length can be calculated by looking up the VLC table, the bit-rate control problem can be easily handled.

1.3. Robustness of video watermarking with emphasis on resistance to collusion and copy attacks

As for the problem of resistance to video attacks, it is known that robustness is the critical issue affecting the practicability of any watermarking method employed in a DRM system. The robustness of the current watermarking methods has been frequently examined with respect to removal attacks or geometrical attacks or both. Removal attacks try to eliminate the hidden signal $W$ (originally embedded in the cover data $I$) by manipulating the stego data $I_s$ such that the fidelity of the attacked data $I_a$ is inevitably destroyed (i.e., $\mathrm{PSNR}(I, I_s) \geq \mathrm{PSNR}(I, I_a)$). However, there also exist attacks that can defeat a watermarking system without sacrificing perceptual quality. Typically, the collusion attack [16,21,22], which is one of the removal attacks, can make colluded media data more perceptually similar to its cover version (i.e., $\mathrm{PSNR}(I, I_s) \leq \mathrm{PSNR}(I, I_a)$). The collusion attack, in particular, is fatal to video watermarking. Collusion attacks in video watermarking are divided into two types [22]: Type I collusion attacks (applied to video frames embedded with the same watermark) and Type II collusion attacks (applied to video frames embedded with different watermarks). A Type I collusion attack is conducted by first averaging a set of extracted watermarks (usually obtained by denoising) to estimate the hidden watermark and then subtracting the estimated watermark from all the frames so that the hidden signal is removed, whereas a Type II collusion attack averages perceptually similar frames in order to remove the watermarks directly. However, Type II collusion is less powerful since it is restricted to operating only on a subset of video frames, so the video watermarks cannot be eliminated entirely. Hence, we will only focus on the Type I collusion attack in this paper. It should be noted that the conventional denoising-based removal attack [24], when applied to one single image, is a special case of the collusion attack.

On the other hand, the copy attack [11], which is one of the protocol attacks, is used to create the false positive problem. Initially, the copy attack was proposed in image watermarking and carried out as follows: (i) a watermark is first predicted from a stego image; (ii) the predicted watermark is added into a target image to create a counterfeit stego image; and (iii) from the counterfeit image, a watermark can be detected that wrongly claims rightful ownership. Compared with the collusion attack, the copy attack can be executed on only one video frame or an image and, thus, is more flexible. In this regard, the copy attack must not be ignored when robustness is mentioned. Because watermark estimation is a common step performed to realize the collusion and copy attacks, they will be called watermark-estimation attacks (WEAs).

In the literature, previous collusion-resistant video watermarking methods were either computationally complex [22] or dependent on unstable feature extraction [21]. In addition, the copy attack has been ignored. In this paper, we propose a content-dependent video watermarking scheme to resist both collusion and copy attacks. First, each video-frame hash is extracted and then combined with a hidden message to yield the video frame-dependent watermark (VFDW). The properties of the VFDW will be examined, and mathematical analyses of the VFDW's resistance to WEAs will be discussed.

The remainder of this paper is organized as follows. In Section 2, we shall study how to embed and blindly extract watermarks in the VLC domain, and how to keep the bit-rate of the resultant stego video nearly unchanged. In Section 3, the design and analyses of the proposed VFDW in resisting both collusion and copy attacks will be discussed. Extensive experimental results will be given in Section 4, and concluding remarks will be drawn in Section 5.

Table 2
VLC table (s denotes the sign bit)

(run,level)  Variable length code  Bit length
(0,1)        11s                   3
(0,2)        0100s                 5
(0,3)        0010 1s               6
(0,4)        0000 110s             8
(0,5)        0010 0110s            9
(0,6)        0010 0001s            9
...          ...                   ...
(1,1)        011s                  4
(1,2)        0001 10s              7
(1,3)        0010 0101s            9
...          ...                   ...
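To make the Type I collusion attack of Section 1.3 more concrete, the following Python sketch simulates frames that all carry the same bipolar watermark. It is only an illustration under stated assumptions: the denoising step is abstracted as "true watermark plus independent Gaussian estimation error", and the names `estimate_by_collusion` and `correlation`, as well as the chosen sizes and noise level, are hypothetical and not taken from the paper.

```python
import random


def estimate_by_collusion(extracted):
    """Average the per-frame watermark estimates position by position."""
    n_frames = len(extracted)
    return [sum(column) / n_frames for column in zip(*extracted)]


def correlation(x, w):
    """Mean of x[i] * w[i]; equals 1.0 when x is the bipolar watermark w itself."""
    return sum(a * b for a, b in zip(x, w)) / len(w)


rng = random.Random(0)
n, n_frames, sigma = 1000, 50, 2.0

# The same bipolar watermark is assumed to be embedded in every frame.
w = [rng.choice((-1, 1)) for _ in range(n)]

# Assumed model: denoising one frame yields w plus independent estimation error.
extracted = [[wi + rng.gauss(0.0, sigma) for wi in w] for _ in range(n_frames)]

# Averaging suppresses the frame-specific error but keeps the common watermark.
w_hat = estimate_by_collusion(extracted)

# Watermark component left in a frame after the estimate has been subtracted.
residual = [wi - ei for wi, ei in zip(w, w_hat)]
single = [wi - ei for wi, ei in zip(w, extracted[0])]

print("residual correlation, one-frame estimate:", round(correlation(single, w), 3))
print("residual correlation, 50-frame collusion:", round(correlation(residual, w), 3))
```

Under this simplified model, the error of the averaged estimate shrinks as more frames carrying the same watermark are colluded, which is why a frame-independent watermark is vulnerable.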

2. MPEG-2 bitstream watermarking

The proposed compressed domain MPEG-2 video watermarking scheme will be discussed in this section. We shall start with a discussion of how to embed watermark signals in the VLC domain. Then, a macroblock (MB)-based embedding scheme will be described. We will also investigate the problem of bit-rate control. Finally, we will describe how the hidden watermark can be blindly extracted. In this paper, "cover video" is used to denote an MPEG-2 compressed bitstream, and "stego video" is used to denote a (compressed+embedded) bitstream.

2.1. Video watermarking in the VLC domain

When an MPEG bitstream is to be watermarked, it must be partially decoded. In our method, only inverse entropy coding must be executed since our system is applied in the VLC domain. After a cover (compressed) video is inversely entropy coded, the MPEG compressed bitstream is represented using VLCs, as tabulated in Table 2. In the VLC table, each codeword corresponds to a run-level pair, denoted as $(r, l)$. During the video encoding stage, the pixel values in the spatial domain are transformed by an 8 × 8 discrete cosine transform (DCT). After quantization, the content of each 8 × 8 block is scanned in a zigzag manner. Each scanned non-zero integer is converted into a so-called run-level pair, $(r, l)$. The run, $r$, indicates the number of zeros preceding the current non-zero coefficient. The level, $l$, on the other hand, is an integer that represents the magnitude of the non-zero coefficient after quantization. Having the run-level pairs for all the non-zero quantized coefficients in an 8 × 8 block, one can consult the VLC table and convert them into a bitstream. It is easy to see that if one chooses run-level pairs as potential hiding places, then it is feasible to modify only the level value. Based on the MPEG coding rule, if a run value is modified, then the number of preceding zeros is altered. This kind of update leads to serious consequences because the positions of all subsequent non-zero coefficients are shifted. If these shifted coefficients are decoded back to the spatial domain, the resultant frame will be quite different from its original. However, if we update the level value, then only its magnitude is changed, and the visual degradation due to watermark embedding can be controlled. Therefore, we propose to update the level value instead of the run value during the watermark embedding process.

There is a potential problem if one adopts level values as hiding places. It is known that if the compression ratio is changed, then the size of a video bitstream will be changed and the number of run-level pairs in a frame will be changed as well. As a result, the number of watermark bits and the number of hiding positions (e.g., number of run-level pairs) are not the same; i.e., the number of original watermark bits is different from that of extracted watermark bits. Thus, it is impossible to calculate the correlation between an original watermark and an extracted version, which are of different lengths. In order to tackle this problem (i.e., keep the number of hiding positions unchanged), we propose to use a MB as an embedding unit because the number of MBs within a frame remains unchanged if the frame size is not changed after attacks are applied.

Another issue regarding video watermarking is the selection of a proper color channel for embedding. Usually, the frames of a video sequence will be split into Y, Cb and Cr channels in the MPEG coding stage. According to different color resampling rules, it is possible for the ratio Y:Cb:Cr to be set to 4:2:2 or 4:2:0. In either case, the only channel that is not subsampled is the Y channel. Thus, when embedding watermarks, we prefer to exploit the Y channel as the host channel.

In addition to the above selections, choosing an appropriate frame type among I-frame, P-frame or B-frame for hiding watermarks is also a crucial issue. Usually, a conventional video consists of a number of GOPs. Each GOP is composed of one I-frame and several B-frames and P-frames. A typical I-frame adopts intra coding, which means it does not refer to any other frames. Different from an I-frame, a P-frame only refers to its nearest preceding I- or P-frame. As for a B-frame, it refers to the nearest preceding and succeeding I- or P-frame. In a conventional MPEG format, the content of a B- or P-frame is the so-called residual error between the current frame and the frame to which it refers. Therefore, only an I-frame can hold complete information. In this paper, we choose to embed watermarks into the I-frames of an MPEG compressed video sequence. Due to the inherent referencing effect, we know that watermarks embedded in I-frames will be propagated into succeeding B- or P-frames when re-encoding is performed. In this situation, watermarks can be detected from all the frames. In the next section, we shall present the detailed procedures for embedding and extracting watermarks.
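The argument for modifying levels rather than runs can be illustrated with a small Python sketch. It is a simplified illustration only: the 8-element "block", the `(run, level, sign)` triple representation and the helper names `run_level_pairs` and `coefficients` are assumptions made here, and details of real MPEG-2 coding (separate DC handling, escape codes, end-of-block codes) are ignored.

```python
def run_level_pairs(zigzag):
    """Turn zigzag-scanned quantized coefficients into (run, level, sign) triples."""
    pairs, run = [], 0
    for coeff in zigzag:
        if coeff == 0:
            run += 1  # count the zeros preceding the next non-zero value
        else:
            pairs.append((run, abs(coeff), 1 if coeff > 0 else -1))
            run = 0
    return pairs  # trailing zeros are not represented


def coefficients(pairs, length):
    """Rebuild the zigzag sequence from the triples, zero-padded to `length`."""
    out = []
    for run, level, sign in pairs:
        out.extend([0] * run)
        out.append(sign * level)
    return out + [0] * (length - len(out))


block = [3, 0, -1, 2, 0, 0, 1, 0]  # shortened stand-in for 64 zigzag coefficients
pairs = run_level_pairs(block)

# Changing a level keeps every coefficient at its original position ...
level_changed = list(pairs)
level_changed[1] = (1, 2, -1)

# ... whereas changing a run shifts all subsequent coefficients.
run_changed = list(pairs)
run_changed[1] = (2, 1, -1)

print(pairs)
print(coefficients(level_changed, len(block)))
print(coefficients(run_changed, len(block)))
```

Comparing the last two printed sequences with `block` shows that the level update alters one magnitude only, while the run update moves every later non-zero coefficient.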

2.2. MB-based video watermarking

Regardless of the (spatial/transformed/compressed) domain in which watermarking is performed, embedding synchronization patterns/templates in advance for later recovery of geometrical parameters has been widely used in image watermarking to deal with geometrical distortions, and it offers certain benefits. In the literature, regularly tiled subframes [9] or feature point-based irregular subframes [21] have been proposed as a basic embedding unit to provide a certain degree of resistance to geometrical attacks. In [1], Alattar et al. also implemented synchronization patterns and embedded them into an MPEG-4 bitstream to solve rotation and scaling problems. Of course, we can also adopt similar principles to handle geometrical attacks in our system. However, as we have pointed out in [16] and will describe later in Section 3, embedded synchronization/repetition patterns can be easily removed by means of the collusion attack. As a result, we would rather focus on those attacks that are peculiar to video sequences.

In this work, we shall propose a content-dependent, collusion- and copy-resilient blind video watermarking system, where the original source is not used during the detection process. The rationale behind using blind detection is mainly based on the fact that the intrinsic content of a video can be mostly preserved even when digital operations are applied. Here, the so-called "intrinsic content" of a video is defined as its filtered version since the noisy part is discarded by means of mean filtering. It is known that a watermark signal is usually a high-frequency signal; therefore, watermark embedding and detection steps are operated in the noisy part. In the following, we shall describe in detail the MB-based video watermarking scheme. A block diagram of our method is illustrated in Fig. 1.

2.2.1. Video watermark embedding

Suppose there are, in total, $N$ MBs in a video frame. Let $(r_{ij}, l_{ij})$ be the $j$th run-level pair in the $i$th MB, let $u(i)$ be the mean of the levels in the $i$th MB, and let $n_i$ be the number of levels in the $i$th MB. Under these circumstances, the mean values $u(i)$ $(1 \leq i \leq N)$ will form a 1-D sequence $U$ for each video frame. In fact, these mean values need to be shuffled based on a secret key $K$ before watermark embedding. Shuffling is performed because the mean sequence must be secured in order to enhance the security of the proposed watermark embedding and detection processes. Under this circumstance, even if the watermarking algorithm is known, it will be computationally difficult for a malicious attacker to remove the hidden watermark.

Fig. 1. Block diagram of the proposed watermark embedding process.

Let the $\bar{u}(i)$'s be the mean filtered version obtained from the $u(i)$'s. Since $u(i)$ is the mean level value of a MB, its corresponding mean filtered value $\bar{u}(i)$ in the mean filtered sequence $\bar{U}$ can be derived by averaging the mean level values of its left and right neighbors; i.e.,

$$\bar{u}(i) = \frac{1}{mfs} \sum_{k=i-\lfloor mfs/2 \rfloor - 1}^{i+\lfloor mfs/2 \rfloor} u(k) \tag{1}$$

is obtained, where $mfs$ denotes the mean filtering support. By applying this mechanism to all the MBs, we can obtain a set of magnitude relationships between each pair of $u(i)$ and $\bar{u}(i)$ as

$$mr(i) = \mathrm{sgn}(u(i) - \bar{u}(i)), \tag{2}$$

where $\mathrm{sgn}(\cdot)$ is a sign function and is defined as

$$\mathrm{sgn}(t) = \begin{cases} +1 & \text{if } t \geq 0, \\ -1 & \text{if } t < 0. \end{cases}$$

Through this filtering procedure, the noisy parts of a host signal, i.e., the $(u(i) - \bar{u}(i))$'s, that are used for embedding are obtained. In addition, we assume that the approximate version, $\bar{U}$, of a host signal can be mostly obtained in the watermark detection process in order to not affect robustness under blind detection.

Next, a watermark signal $W = \{w(1), w(2), \ldots, w(N)\}$ is generated based on the secret key $K$ for the purpose of embedding, where $w(i) = +C$ or $-C$, and $C$ denotes a constant. In this paper, watermark embedding is done by using a watermark value $w(i)$ $(1 \leq i \leq N)$ to perturb $u(i)$ as

$$u^h(i) = u(i) + w(i), \tag{3}$$

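where $u^h(i)$ is the modulated version of $u(i)$. After applying Eq. (3) to all the block mean values $u(i)$, each $\bar{u}^h(i)$ can be derived according to Eq. (1) as

$$\begin{aligned}
\bar{u}^h(i) &= \frac{1}{mfs} \sum_{k=i-\lfloor mfs/2 \rfloor - 1}^{i+\lfloor mfs/2 \rfloor} u^h(k) \\
&= \frac{1}{mfs} \sum_{k=i-\lfloor mfs/2 \rfloor - 1}^{i+\lfloor mfs/2 \rfloor} u(k) + \frac{1}{mfs} \sum_{k=i-\lfloor mfs/2 \rfloor - 1}^{i+\lfloor mfs/2 \rfloor} w(k) \approx \bar{u}(i),
\end{aligned} \tag{4}$$

based on the assumption that $\frac{1}{mfs} \sum_{k=i-\lfloor mfs/2 \rfloor - 1}^{i+\lfloor mfs/2 \rfloor} w(k) \approx 0$ due to the fact that the watermark values are constant in magnitude and $mfs$ is an even integer. Therefore, the new magnitude relationship following embedding between $u^h(i)$ and $\bar{u}^h(i)$ is

$$mr^h(i) = \mathrm{sgn}(u^h(i) - \bar{u}^h(i)). \tag{5}$$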

Sequentially substituting Eqs. (3), (4), and (2) into Eq. (5), we get

$$mr^h(i) = \mathrm{sgn}(u^h(i) - \bar{u}^h(i)) = \mathrm{sgn}((u(i) + w(i)) - \bar{u}(i)) = \mathrm{sgn}((u(i) - \bar{u}(i)) + w(i)). \tag{6}$$
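The MB-level relationships in Eqs. (2), (3) and (6) can be checked numerically with a short Python sketch. This is an illustration under assumptions, not the authors' implementation: the window used by `mean_filter` (mfs consecutive samples, clipped at the ends of the sequence) is an assumed reading and may differ from the exact summation limits of Eq. (1), the sample values are invented, and all function names are hypothetical.

```python
def sgn(t):
    """Sign function as defined after Eq. (2): +1 for t >= 0, -1 otherwise."""
    return 1 if t >= 0 else -1


def mean_filter(u, mfs):
    """Moving average over an assumed window of mfs samples, clipped at the borders."""
    half = mfs // 2
    filtered = []
    for i in range(len(u)):
        window = u[max(0, i - half):min(len(u), i + half)]
        filtered.append(sum(window) / len(window))
    return filtered


def magnitude_relationship(u, mfs):
    """mr(i) = sgn(u(i) - u_bar(i)), Eq. (2)."""
    return [sgn(a - b) for a, b in zip(u, mean_filter(u, mfs))]


def embedded_relationship(u, w, mfs):
    """mr^h(i) = sgn((u(i) - u_bar(i)) + w(i)), the approximation of Eq. (6)."""
    return [sgn((a - b) + wi) for a, b, wi in zip(u, mean_filter(u, mfs), w)]


u = [2.0, 2.5, 1.5, 2.2, 1.8, 2.4]   # invented MB mean levels
w = [-1, 1, 1, -1, 1, -1]            # bipolar watermark with C = 1

print("mr   :", magnitude_relationship(u, 2))
print("mr^h :", embedded_relationship(u, w, 2))
print("w    :", w)
```

In this example every $|u(i) - \bar{u}(i)|$ is smaller than 1, so adding $w(i) = \pm 1$ forces the sign of the relationship to that of the watermark bit; where the difference is larger than the watermark magnitude and of opposite sign, the relationship would not flip.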

Comparing Eqs. (2) and (6), it becomes clear that our embedding idea is to change the magnitude relationship, $mr(i)$, between $u(i)$ and $\bar{u}(i)$ according to the incoming watermark bit $w(i)$. In this paper, the goal of our embedding process is to force the modulated magnitude relationship (i.e., $mr^h(i)$) to have the same sign as its corresponding watermark bit, $w(i)$. With this embedding rule, we have the following: (i) when $w(i)$ is positive, $u(i)$ must be modulated by adding $w(i)$ in order to get a positive $mr^h(i)$; (ii) when $w(i)$ is negative, $u(i)$ must be modulated by adding $w(i)$ in order to get a negative $mr^h(i)$.

However, the embedding process is not yet complete at this stage because watermark embedding has only proceeded to the "MB" level. In practice, each run-level pair in a MB still needs to be modulated. Furthermore, we have not yet discussed the fidelity problem that is encountered following watermark embedding. With regard to level-wise embedding, we propose to propagate the embedding quantity $w(i)$ to all the levels of the run-level pairs in MB $i$. This implies that all the original run-level pairs in MB $i$ are modulated with the same quantity; i.e., the level value $l_{ij}$ is modulated as $l^h_{ij}$ by means of

$$l^h_{ij} = l_{ij} + w(i), \tag{7}$$

where $1 \leq j \leq n_i$ and $1 \leq i \leq N$.

As for the fidelity issue, since the embedding step (Eq. (7)) may produce visual defects, the actual watermark value $w(i)$ needs to be further studied here. Usually, a human visual model is adopted to maintain fidelity in digital watermarking. For the VLC domain video watermarking considered here, the maximum quantity change allowed for a "level" value is 1, which corresponds to a fixed quantization interval in the DCT domain. As a consequence, the following transparency constraint needs to be enforced during the embedding process:

$$|l^h_{ij} - l_{ij}| = |w(i)| = 1. \tag{8}$$

Based on Eq. (8), the hidden watermark $W$ is derived as a bipolar signal; i.e., the constant $C$ is set to be 1. The magnitude of the watermark values exactly indicates the maximum distortion that we can impose on the level values of a video bitstream's VLCs.

During the compressed domain video watermarking process, however, we encounter some problems that should be carefully handled. For example, if the modulated run-level pair, $(r_{ij}, l^h_{ij})$, does not exist in the VLC codewords, then this will cause a video encoding and decoding problem. In order to tackle this problem, the modulated $(r_{ij}, l^h_{ij})$ should be forced to possess the level value of its closest run-level pair in the VLC codewords. This also implies that the transparency constraint, specified in Eq. (8), may not be satisfied when the aforementioned problem occurs. On the other hand, if the modulated level $l^h_{ij}$ is less than or equal to zero, then we propose to leave the original run-level pair unchanged to maintain the correctness of re-encoding. Under this circumstance, robustness will be more or less affected. In practice, our extensive results have shown that the watermark detection results obtained from stego/attacked video sequences can be separated very well from those obtained from un-watermarked video sequences.
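The level-wise embedding rule of Eq. (7), together with the two exceptions just described, might be sketched in Python as follows. The sketch rests on several assumptions: `VLC_BITS` contains only the excerpt of Table 2 shown in this paper, "closest run-level pair" is interpreted as the closest existing level for the same run (ties resolved toward the smaller level), and the function names are hypothetical.

```python
# Excerpt of Table 2: (run, level) -> codeword length in bits (sign bit included).
VLC_BITS = {
    (0, 1): 3, (0, 2): 5, (0, 3): 6, (0, 4): 8, (0, 5): 9, (0, 6): 9,
    (1, 1): 4, (1, 2): 7, (1, 3): 9,
}


def embed_level(run, level, w_i, table=VLC_BITS):
    """Return the modulated level for one run-level pair and watermark value w_i."""
    target = level + w_i                      # Eq. (7)
    if target <= 0:
        return level                          # leave the pair unchanged
    if (run, target) in table:
        return target                         # transparency constraint (8) holds
    # Assumed interpretation: fall back to the closest level that exists for this run.
    candidates = [l for (r, l) in table if r == run]
    if not candidates:
        return level
    return min(candidates, key=lambda l: (abs(l - target), l))


def embed_macroblock(pairs, w_i, table=VLC_BITS):
    """Propagate the same watermark value w_i to every level in one MB."""
    return [(run, embed_level(run, level, w_i, table)) for run, level in pairs]


mb = [(0, 3), (1, 1), (0, 1), (0, 6)]
print(embed_macroblock(mb, +1))
print(embed_macroblock(mb, -1))
```

With this excerpt, the pair (0,6) cannot be raised and the pairs with level 1 cannot be lowered, so those positions stay unchanged, which mirrors the remark that robustness is more or less affected in such cases.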

2.3. Bit-rate control

In some applications, the bit-rate of a video bitstream is not allowed to be (dramatically) changed after a watermarking process is performed. Here, we shall explain how the problem of bit-rate control can be dealt with. According to the VLC table (Table 2), the number of bits used to represent a run-level pair satisfies the following inequality:

$$\mathrm{VLClength}(r, l) \leq \mathrm{VLClength}(r, l + 1), \tag{9}$$

where the function $\mathrm{VLClength}(\cdot, \cdot)$ reports the number of bits used to represent the VLC codeword of a run-level pair. Eq. (9) implies that, given the same run $r$, a larger level value is associated with a codeword that is at least as long. Based on the above coding rule, we shall describe in the following how the bit-rate can be kept unchanged. Strictly speaking, our goal is to make the bit-rate difference between a cover and a stego video sequence negligibly small.

During the watermark embedding process, we first modify the level values of those MBs that correspond to negative watermark bits. Since negative watermark bits are to be embedded, the codeword length of a modulated level value will be no longer than that of an original level value, as indicated in Eq. (9). We will calculate the number of bits, $B$, that have been saved after the negative watermark bits are embedded. Then, the increase in the bit-rate that results from embedding positive watermark bits must be kept smaller than or equal to $B$. To this end, we propose to embed the positive watermark bits one by one for as long as the accumulated increase satisfies the above constraint. If there still are positive watermark bits that have not been embedded, we give up on embedding them in order to not increase the bit-rate. Thus, robustness will be affected. However, our extensive results have shown that the watermark detection results obtained from the stego/attacked video sequences can be separated very well from those obtained from un-watermarked video sequences. Through this embedding strategy, the bit-rate ($BR_{stego}$) of a stego video can be guaranteed to be equal to or smaller than that ($BR_{cover}$) of its corresponding cover video, i.e., $BR_{stego} \leq BR_{cover}$. In fact, the difference between $BR_{stego}$ and $BR_{cover}$ is negligibly small, as indicated by our experimental results.
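The bit-budget strategy described above can be sketched in Python. This is a minimal illustration under assumptions: only the excerpt of Table 2 is available in `VLC_BITS`, a level is left unchanged whenever the shifted pair is not in that excerpt, embedding of positive bits stops at the first MB that would exceed the budget $B$, and the names `mb_bits`, `shift_levels` and `embed_with_budget` are hypothetical.

```python
# Excerpt of Table 2: (run, level) -> codeword length in bits (sign bit included).
VLC_BITS = {
    (0, 1): 3, (0, 2): 5, (0, 3): 6, (0, 4): 8, (0, 5): 9, (0, 6): 9,
    (1, 1): 4, (1, 2): 7, (1, 3): 9,
}


def mb_bits(pairs):
    """Total codeword length of all run-level pairs in one MB."""
    return sum(VLC_BITS[p] for p in pairs)


def shift_levels(pairs, w_i):
    """Add w_i to every level; keep a pair unchanged if the result is not in the table."""
    shifted = []
    for run, level in pairs:
        candidate = (run, level + w_i)
        shifted.append(candidate if candidate in VLC_BITS else (run, level))
    return shifted


def embed_with_budget(mbs, w):
    """Embed negative bits first, then positive bits while the saved bits B suffice."""
    stego = [list(mb) for mb in mbs]
    saved = 0
    for i, w_i in enumerate(w):
        if w_i < 0:
            stego[i] = shift_levels(mbs[i], -1)
            saved += mb_bits(mbs[i]) - mb_bits(stego[i])   # bits saved, B
    spent, skipped = 0, []
    for i, w_i in enumerate(w):
        if w_i > 0:
            candidate = shift_levels(mbs[i], +1)
            cost = mb_bits(candidate) - mb_bits(mbs[i])
            if not skipped and spent + cost <= saved:
                stego[i] = candidate
                spent += cost
            else:
                skipped.append(i)                          # give up on this bit
    return stego, saved, spent, skipped


mbs = [[(0, 3), (1, 2)], [(0, 2), (0, 1)], [(0, 4), (1, 1)], [(0, 2)]]
w = [-1, +1, +1, -1]
stego, saved, spent, skipped = embed_with_budget(mbs, w)
print("saved:", saved, "spent:", spent, "skipped MBs:", skipped)
print("cover bits:", sum(mb_bits(m) for m in mbs),
      "stego bits:", sum(mb_bits(m) for m in stego))
```

Because positive bits are only embedded while their accumulated cost stays within the bits saved by the negative ones, the stego total can never exceed the cover total in this sketch.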

2.3.1. Video watermark extraction

The watermark signal extraction process is fast and simple, and is basically an inverse process of embedding. Because watermark extraction is conducted in the VLC domain, inverse quantization, inverse DCT, and decoding+re-encoding are not performed on the incoming video bitstreams. Instead, only inverse entropy coding is required to obtain VLC codewords. Numerical results will be given later to show that real-time detection can be achieved.

In the watermark extraction process, the first step calculates the block-based mean level sequence, $U^s$, of a suspect video, which is then de-shuffled based on the secret key $K$ before watermark detection. Next, its mean filtered version, $\bar{U}^s$, is calculated as described in Section 2.2.1. Since video attacks may have been imposed on a stego video, the resultant sequence, $\bar{U}^s$, will not be the same as its corresponding original, $\bar{U}$. However, as explained previously, what we can rely on is the invariance of the intrinsic content (i.e., $\bar{U}$) of a video. Strictly speaking, this assumption (i.e., $\bar{U} = \bar{U}^s$) cannot be guaranteed and is only correct to some extent. However, our results indicate that this assumption is reasonable. After the two sequences $U^s$ and $\bar{U}^s$ are obtained, each element $w^e(i)$ of an extracted watermark sequence $W^e$ is determined as follows:

$$w^e(i) = \mathrm{sgn}(u^s(i) - \bar{u}^s(i)) \approx \mathrm{sgn}(u^s(i) - \bar{u}(i)) = w(i), \tag{10}$$

where $\bar{u}^s(i) \approx \bar{u}(i)$ is simply assumed to realize blind watermark estimation. In practice, $u^s(i)$ can be further formulated as the result of imposing a fading effect plus a noise component on $u(i)$. Then, the subsequent task is to solve for the parameters of fading and noise, as previously discussed in [23] and other works. Furthermore, it should be noted that both sequences $U^s$ and $\bar{U}^s$ can only be obtained if the secret key $K$ used for shuffling is known. Under this circumstance, pirates cannot easily estimate the watermark bits based on Eq. (10).

In order to determine the presence/absence of a hidden watermark, the normalized correlation value between $W$ and $W^e$ is computed as follows:

$$\delta_{nc} = \frac{W \cdot W^e}{\sqrt{|W||W^e|}} = \frac{W \cdot W^e}{N}, \tag{11}$$

where "$\cdot$" is an inner product operation. The relationship between $\delta_{nc}(\cdot, \cdot)$ and the bit error rate (BER) can be expressed as $\delta_{nc}(\cdot, \cdot) = 1 - 2 \times \mathrm{BER}$. It is said that a watermark has been detected if $\delta_{nc}(\cdot, \cdot)$ is larger than a pre-determined threshold $T$. In this study, $T$ was selected as 0.11 for a desired false positive probability of approximately $10^{-7}$ [4].
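The blind extraction rule of Eq. (10) and the detection statistic of Eq. (11) can be sketched in Python as follows. The sketch is illustrative only: the mean-filter window is an assumed reading of Eq. (1), key-based de-shuffling is omitted, the example sequences are invented, and all function names are hypothetical.

```python
def sgn(t):
    return 1 if t >= 0 else -1


def mean_filter(u, mfs):
    """Moving average over an assumed window of mfs samples, clipped at the borders."""
    half = mfs // 2
    return [sum(u[max(0, i - half):min(len(u), i + half)])
            / len(u[max(0, i - half):min(len(u), i + half)])
            for i in range(len(u))]


def extract_watermark(u_s, mfs):
    """w^e(i) = sgn(u^s(i) - filtered u^s(i)), the blind form of Eq. (10)."""
    return [sgn(a - b) for a, b in zip(u_s, mean_filter(u_s, mfs))]


def normalized_correlation(w, w_e):
    """Inner product of two bipolar sequences divided by N, Eq. (11)."""
    return sum(a * b for a, b in zip(w, w_e)) / len(w)


def bit_error_rate(w, w_e):
    return sum(a != b for a, b in zip(w, w_e)) / len(w)


def detected(w, w_e, threshold=0.11):
    """Declare the watermark present if the correlation exceeds the threshold T."""
    return normalized_correlation(w, w_e) > threshold


w = [1, -1, 1, 1, -1, -1, 1, -1]
w_e = [-1, -1, 1, 1, -1, 1, 1, -1]        # two of the eight bits flipped
unrelated = [1, 1, -1, -1, 1, -1, -1, 1]

print("correlation:", normalized_correlation(w, w_e))
print("1 - 2*BER  :", 1 - 2 * bit_error_rate(w, w_e))
print("detected   :", detected(w, w_e), detected(w, unrelated))
```

The first two printed values coincide, which is the relationship $\delta_{nc} = 1 - 2 \times \mathrm{BER}$ stated above for bipolar sequences of equal length.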

3. Video frame-dependent watermark

In this section, we will describe the proposed VFDW that is embedded in our system. In Section 3.1, we shall first discuss the characteristics of watermark estimation attacks (WEAs). Then, the proposed video frame hash and VFDW will be described in Section 3.2. Finally, the properties of the VFDW will be studied and its resistance to WEAs will be analyzed in Sections 3.3 and 3.4, respectively. Note that in order to better explain the resistance of the VFDW to WEAs, our analyses will be conducted in the spatial domain. This is reasonable because a signal embedded in the transformed domain can be transferred to another equivalent signal in the spatial domain, and watermark estimation by means of denoising [11,24] is intuitively applied in the spatial domain. As shown in Table 1, the resistance of the VFDW to WEAs is the novel contribution of this paper.

Fig. 2. Watermark estimation/removal illustrated with energy variations: (a) original embedded watermark with each white bar indicating the energy of each watermark value; (b) gray bars show the energies of an estimated watermark with all the signs being the same as in the original (a); (c) the residual watermark obtained after removing the estimated watermark (b); (d) the energies of an estimated watermark with most of the signs being opposite to those in (a); (e) the residual watermark derived from (d). In the above examples, sufficiently large correlations between (a) and (c), and between (a) and (e) exist, indicating the presence of a watermark.

3.1. Watermark estimation attack

From an attacker's perspective, the energy of each watermark bit must be accurately predicted so that the previously added watermark energy can be completely subtracted to accomplish effective watermark removal. An estimated watermark's energy is closely related to the accuracy of the removal attack. Several scenarios are shown in Fig. 2, which illustrates the energy variations of (a) an original watermark; (b) and (d) an estimated watermark (illustrated in gray-scale); and (c) and (e) a residual watermark generated by subtracting the estimated watermark from the original watermark. From Figs. 2(a)–(c), we can see that even though the watermark's sign bits are fully obtained, the corresponding energies cannot be completely discarded, and the residual watermark still suffices to reveal the encoded message. Furthermore, if the sign of an estimated watermark bit is different from its original sign (i.e., $\mathrm{sgn}(W(i)) \neq \mathrm{sgn}(W^e(i))$), then any additional energy subtraction will not help to improve removal efficiency. On the contrary, watermark removal by energy subtraction operated in the opposite (wrong) polarity will undesirably damage the media data's fidelity. Actually, this corresponds to adding a watermark with higher energy into the cover data without satisfying the masking constraint, as shown in Fig. 2(d). After subtracting Fig. 2(d) from (a), the resultant residual watermark is that illustrated in Fig. 2(e). By correlating Figs. 2(a) and (e), it is highly possible to reveal the existence of a watermark. Unlike other watermark removal attacks that reduce the quality of the media data, the collusion attack may improve the quality of colluded data. In view of this fact, it is necessary to consider the collusion attack when the robustness of a video watermarking system is evaluated.
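The two scenarios of Fig. 2 can be mimicked with a small numerical sketch. The watermark values and the factor 0.5 (the share of each energy that the attacker recovers) are made-up illustrative numbers, not data from the paper.

```python
def correlation(a, b):
    """Inner product divided by the length (not normalized by energy)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def residual(watermark, estimate):
    """What remains after the attacker subtracts the estimate."""
    return [w - e for w, e in zip(watermark, estimate)]


def energy(seq):
    return sum(x * x for x in seq)


original = [2.0, -3.0, 1.5, -2.5, 3.0, -1.0]            # Fig. 2(a)

# Fig. 2(b): every sign is right, but only half of each energy is recovered.
same_sign = [0.5 * w for w in original]
# Fig. 2(d): most signs are wrong (here, all but the first one).
wrong_sign = [0.5 * original[0]] + [-0.5 * w for w in original[1:]]

res_c = residual(original, same_sign)    # Fig. 2(c)
res_e = residual(original, wrong_sign)   # Fig. 2(e)

print(correlation(original, res_c), correlation(original, res_e))
print(energy(original), energy(res_e))
```

Both residuals keep the original signs, so they still correlate positively with the original watermark. In the wrong-sign case the residual carries more energy than the original, which is the fidelity damage described above.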

3.2. Frame hash and VFDW

In Section 3.1, we found that WEAs are achievable mainly because the hidden watermark behaves like noise, so anyone can reliably utilize all estimated noise-like watermarks. To disguise this prior knowledge and hide it from attackers, the key is to reduce the confidence of the watermark estimation achieved by WEAs. To this end, we propose the VFDW, which must carry information relevant to the video frame itself. Meanwhile, the content-dependent information (called the frame hash herein) must be secured by means of the same secret key $K$ to prevent forgery and must be robust against digital processing [16] so as not to affect watermark detection.

Here, the proposed video frame hash extraction procedure is operated in the VLC domain. For each MB, a piece of representative, robust information is created. It is defined in each MB $i$ as the magnitude relationship between two energies computed from level values:

$$h(i) = \begin{cases} +1 & \text{if } \sum_j |f_j(p_1)| \ge \sum_j |f_j(p_2)|, \\ -1 & \text{otherwise,} \end{cases}$$

where $h(i)$ is an element of a frame hash FH, $j$ is the index of a block belonging to MB $i$, and $f_j(p_1)$ and $f_j(p_2)$ denote the level values at zigzag-scanned positions $p_1$ and $p_2$ in block $j$, respectively. The length of an FH is exactly equal to the number of MBs. In addition, the selected level values should be at lower frequencies because level-run pairs located at high-frequency positions are vulnerable to attacks. We say that this feature value $h(\cdot)$ is robust because this magnitude relationship can be mostly preserved under incidental modifications. Since the robustness of the frame hash is beyond the scope of this paper, the reader may refer to [8,17] for similar robustness analyses.

Next, the frame hash, FH, is merged with the watermark, $W$, to generate the VFDW as

$$\mathrm{VFDW} = S(W, \mathrm{FH}), \qquad (12)$$

where $S(\cdot,\cdot)$ is a mixing function, which is operated based on a secret key (which will be described in the next section) and is used to prevent attackers from forging the VFDW. The sequence VFDW is what we will embed into a video frame.
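The frame hash defined earlier in this section can be sketched as follows. The data layout (a frame as a list of MBs, each MB as a list of blocks, each block as a list of level values in zigzag order) and the default positions `p1=1`, `p2=2` are assumptions made for illustration; the text only requires two low-frequency positions.

```python
def frame_hash(macroblocks, p1=1, p2=2):
    """Return one hash element h(i) in {+1, -1} per macroblock."""
    bits = []
    for mb in macroblocks:
        e1 = sum(abs(block[p1]) for block in mb)   # energy at position p1
        e2 = sum(abs(block[p2]) for block in mb)   # energy at position p2
        bits.append(+1 if e1 >= e2 else -1)
    return bits


# Two toy macroblocks with four blocks each (values are made up).
mb0 = [[50, 9, -3, 0], [48, -7, 2, 1], [52, 6, 4, 0], [47, 8, -1, 0]]
mb1 = [[30, 1, -6, 0], [31, 0, 5, 0], [29, -2, 7, 1], [33, 1, -4, 0]]
frame = [mb0, mb1]

# A crude stand-in for coarser requantization: halve every level value.
coarse = [[[int(v / 2) for v in block] for block in mb] for mb in frame]

print(frame_hash(frame), frame_hash(coarse))
```

In this toy case the magnitude relationship survives the halving, which is the kind of robustness to incidental modification that the hash relies on.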

3.3. Properties of the VFDW

Let a video $V$ be expressed as $\bigcup_{i\in\Omega} F_i$, where all frames $F_i$ are concatenated to form $V$ and $\Omega$ denotes the set of frame indices. Let $\mathrm{VFDW}_i$ be the frame-dependent watermark that is embedded in the frame $F_i$ to form a stego video $V^s$, i.e.,

$$F_i^s = F_i + \mathrm{VFDW}_i; \quad V^s = \bigcup_{i\in\Omega} F_i^s,$$

where $F_i^s$ is a stego frame and $\mathrm{VFDW}_i$, similar to Eq. (12), is defined as

$$\mathrm{VFDW}_i = S(W, \mathrm{FH}_{F_i}). \qquad (13)$$

In Eq. (13), the mixing function $S(\cdot,\cdot)$ is designed for shuffling the frame hash $\mathrm{FH}_{F_i}$ using the same secret key $K$, followed by shuffling of the watermark to enhance security for anti-forgery. Specifically, it is expressed as

$$S(W, \mathrm{FH}_{F_i})(k) = W(k)\,\mathrm{PT}(\mathrm{FH}_{F_i}, K)(k),$$

where PT denotes a shuffling function controlled by a secret key $K$.

The proposed VFDW possesses the characteristics described below. They are useful for proving resistance to WEAs.

Definition 1. Given two frames $F_i$ and $F_j$, their degree of similarity depends on the correlation between $\mathrm{FH}_{F_i}$ and $\mathrm{FH}_{F_j}$, i.e., $d_{nc}(F_i, F_j) \approx d_{nc}(\mathrm{FH}_{F_i}, \mathrm{FH}_{F_j})$. Two extreme cases exist: (i) if $F_i = F_j$, then $d_{nc}(F_i, F_j) = 1$; (ii) if $F_i$ and $F_j$ are visually dissimilar, then $d_{nc}(F_i, F_j) \approx 0$.

Definition 2. Given two frames $F_i$ and $F_j$, $d_{nc}(F_i, F_j)$, and their respectively embedded video frame-dependent watermarks $\mathrm{VFDW}_i$ and $\mathrm{VFDW}_j$ that are assumed to be independent and identically distributed (i.i.d.), the following properties can be established: (i) $d_{nc}(\mathrm{VFDW}_i, \mathrm{VFDW}_j)$ is linearly proportional to $d_{nc}(F_i, F_j)$; (ii) $d_{nc}(\mathrm{VFDW}_i, \mathrm{VFDW}_j) \le d_{nc}(W^2)$; (iii) $d_{nc}(W, \mathrm{VFDW}) = 0$. A definition similar to this has been given in [16] for images. It is essential to emphasize that property (i) of Definition 2 contrasts with the one pointed out in [22], and that the novelty of our scheme is that the concept of the content-dependent watermark is employed.
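The mixing function and the properties in Definition 2 can be checked on a toy example. This is a sketch under stated assumptions: the watermark and the frame hashes are bipolar, and the key-seeded permutation `shuffle_with_key` is a hypothetical stand-in for the shuffling function PT, whose actual construction is not given here.

```python
import random


def shuffle_with_key(seq, key):
    """Stand-in for PT(., K): a permutation fully determined by the key."""
    order = list(range(len(seq)))
    random.Random(key).shuffle(order)
    return [seq[i] for i in order]


def make_vfdw(w, frame_hash, key):
    """S(W, FH)(k) = W(k) * PT(FH, K)(k)."""
    shuffled = shuffle_with_key(frame_hash, key)
    return [a * b for a, b in zip(w, shuffled)]


def d_nc(a, b):
    return sum(x * y for x, y in zip(a, b)) / len(a)


key = 2005                                    # illustrative secret key
w = [+1, -1, -1, +1, +1, -1, +1, -1]
fh_i = [+1, +1, -1, -1, +1, -1, +1, -1]       # balanced toy hash of frame i
fh_j = [+1, +1, -1, -1, +1, -1, -1, +1]       # similar frame j, two bits differ

vfdw_i = make_vfdw(w, fh_i, key)
vfdw_j = make_vfdw(w, fh_j, key)

print(d_nc(fh_i, fh_j), d_nc(vfdw_i, vfdw_j), d_nc(w, vfdw_i))
```

Because $W(k)^2 = 1$ for a bipolar watermark and both hashes pass through the same permutation, the correlation between the two VFDWs equals the correlation between the two hashes in this model, in line with property (i). Property (iii) holds exactly here only because the toy hash contains as many +1's as −1's.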

3.4. Resistance to WEA

By means of a collusion attack, the averaging operation is performed on the stego frames $F_i^s$ of a stego video. From an attacker's perspective, each hidden watermark has to be estimated using a denoising operation [11,24], so deviation in estimation will inevitably occur. Let $W_i^e$ be a watermark denoised from $F_i^s$. In fact, $W_i^e$ can be modeled as a partial hidden watermark plus a noise component, i.e.,

$$W_i^e = a_i \mathrm{VFDW}_i + n_i,$$

where $n_i$ represents a frame-dependent Gaussian noise with zero mean and $a_i$ denotes the proportion of the watermark that has been extracted. Under these circumstances, $1 \ge a_i = d_{nc}(W_i^e, \mathrm{VFDW}_i) > T$ always holds based on the fact that a watermark is a high-frequency signal, which can be efficiently extracted by means of denoising [9,11,24]. Let $C\,(\subseteq \Omega)$ denote the set of frames used for collusion. After frame collusion is performed, the average of all the estimated watermarks, by employing the Central Limit Theorem, can be expressed as

$$\bar{W}^e = \frac{\sqrt{|C|}}{|C|} \sum_{i\in C} W_i^e = \frac{1}{\sqrt{|C|}} \sum_{i\in C} \left(a_i \mathrm{VFDW}_i + n_i\right). \qquad (14)$$

Now, a sufficient and necessary condition for resisting a collusion attack will be given in Proposition 1.

Proposition 1. In a collusion attack, an attacker first estimates $\bar{W}^e$ from a set $C$ of stego frames. Then, a counterfeit unwatermarked video $V^u$ is generated from a stego video $V^s = \bigcup_{i\in\Omega} F_i^s$ according to

$$F_i^u = F_i^s - \bar{W}^e; \quad V^u = \bigcup_{i\in\Omega} F_i^u. \qquad (15)$$

It is said that the collusion attack fails within a frame $F_k^u$, $k \in C$, i.e., $d_{nc}(F_k^u, \mathrm{VFDW}_k) > T$, if and only if $d_{nc}(\bar{W}^e, \mathrm{VFDW}_k) = \sum_{k\in C} a_k / \sqrt{|C|} < 1 - T$.

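Proof. First of all, we need to derive $d_{nc}(\bar{W}^e, \mathrm{VFDW}_k)$. Making use of Eq. (14), we have the following derivation:

$$\begin{aligned}
d_{nc}(\bar{W}^e, \mathrm{VFDW}_k)
&= \frac{1}{\sqrt{|C|}}\, d_{nc}\!\left(\sum_{i\in C} \left(a_i \mathrm{VFDW}_i + n_i\right), \mathrm{VFDW}_k\right) \\
&= \frac{1}{\sqrt{|C|}} \sum_{i\in C} a_i\, d_{nc}(\mathrm{VFDW}_i, \mathrm{VFDW}_k) + \frac{1}{\sqrt{|C|}} \sum_{i\in C} d_{nc}(n_i, \mathrm{VFDW}_k) \\
&= \frac{\sum_{k\in C} a_k}{\sqrt{|C|}},
\end{aligned} \qquad (16)$$

where $\mathrm{VFDW}_k$ represents the content-dependent watermark embedded in $F_k$. Consequently, given Definition 2, and Eqs. (15) and (16), we get:

$$\begin{aligned}
& d_{nc}(F_k^u, \mathrm{VFDW}_k) > T \\
\text{iff}\quad & d_{nc}(F_k + \mathrm{VFDW}_k - \bar{W}^e, \mathrm{VFDW}_k) > T, \\
\text{iff}\quad & d_{nc}(\mathrm{VFDW}_k, \mathrm{VFDW}_k) - d_{nc}(\bar{W}^e, \mathrm{VFDW}_k) > T, \\
\text{iff}\quad & d_{nc}(\bar{W}^e, \mathrm{VFDW}_k) = \frac{\sum_{k\in C} a_k}{\sqrt{|C|}} < 1 - T. \qquad \square
\end{aligned} \qquad (17)$$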
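The intuition behind Proposition 1 can be illustrated with a toy simulation. It is a sketch under several assumptions that are not taken from the paper: the host content is omitted so that only the watermark component is tracked, the attacker combines the estimates with a plain arithmetic mean rather than the normalization of Eq. (14), the frame hashes of different frames are independent random sequences (i.e., dissimilar frames), and the values of `a`, `sigma`, and `frames` are arbitrary.

```python
import random


def d_nc(a, b):
    """Inner product divided by the sequence length, as in Eq. (11)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def residual_correlation(frame_dependent, n=4096, frames=16, a=0.95,
                         sigma=1.0, seed=7):
    """Correlation left in frame 0 after the averaged estimate is subtracted."""
    rng = random.Random(seed)
    w = [rng.choice((-1, 1)) for _ in range(n)]
    marks, estimates = [], []
    for _ in range(frames):
        if frame_dependent:
            # Hypothetical stand-in for a keyed, shuffled frame hash.
            fh = [rng.choice((-1, 1)) for _ in range(n)]
            mark = [x * h for x, h in zip(w, fh)]
        else:
            mark = w
        marks.append(mark)
        # Denoising model of Section 3.4: W_i^e = a * VFDW_i + n_i
        estimates.append([a * m + rng.gauss(0.0, sigma) for m in mark])
    # The attacker averages the estimates (plain mean, an assumption here).
    averaged = [sum(column) / frames for column in zip(*estimates)]
    leftover = [m - e for m, e in zip(marks[0], averaged)]
    return d_nc(leftover, marks[0])


T = 0.11
without_vfdw = residual_correlation(frame_dependent=False)
with_vfdw = residual_correlation(frame_dependent=True)
print(without_vfdw > T, with_vfdw > T)
```

In this model the frame-independent watermark is reinforced by averaging, so the residual correlation is expected to fall near 1 − a, below T. The frame-dependent watermarks largely cancel in the average, so the residual correlation is expected to stay near 1 − a/|C|, well above T.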

Examining the derived result in Proposition 1, we can find from the numerator of $\sum_{k\in C} a_k / \sqrt{|C|}$ that the summation function exists because video collusion is generally conducted using a set of similar video frames such that the watermarks extracted from a pair of similar frames can possess a certain positive correlation. However, this characteristic is not guaranteed to hold in images [16]. This difference leads to the fact that video collusion is relatively easier to accomplish than image collusion.² Our experimental results, as will be shown in Section 4, further confirm this fact. Furthermore, resistance to the copy attack can be similarly derived. Please refer to [16] for more details by treating an image as if it were a video frame.

² More specifically, we gain an interesting result from the collusion-resilient image watermarking [16] and collusion-resilient video watermarking (this paper) methods. In [16], if a collusion attack is carried out in an image watermarking scheme with embedding of multiple redundant watermarks, the anti-collusion performance is proved to be lower bounded by $|C| = 1$. However, if a collusion attack is carried out in a video watermarking scheme, the anti-collusion performance is almost the same no matter what the size of $|C|$ is.

4. Experimental results

A series of experiments was conducted to verify the performance of the proposed method. Three commonly used videos, "Flower-Garden," "Table-Tennis," and "Football," were adopted in our experiments. The number of frames was 375 for both "Flower-Garden" and "Table-Tennis" and 97 for "Football." The frame size of both "Flower-Garden" and "Table-Tennis" was 704 × 576, and it was 720 × 486 for "Football." The MPEG-2 codec [19] was used to generate compressed video sequences as the cover sources. The bit-rate of each source video was fixed at 15 Mbits/s, and the frame-rate was 25 frames/s. In addition, the length of a GOP was 12, and the GOP structure was "IBBPBBPBBPBB." Because watermarks were concealed in I-frames in this study, we present the watermark detection results obtained from I-frames only, even though watermarks could also be found in non-I-frames (this will be shown to be true by using I-frame dropping and transcoding attacks). The performance in terms of bit-rate control, fidelity, resistance to numerous incidental and malicious attacks, and real-time detection was examined in our experiments.

4.1. Bit-rate control

In order to show the bit-rate control performance of our method, the bit-rates, BRcover and BRstego, generated from the MPEG-2 cover video and stego video, respectively, are compared in Table 3. In addition, the ratio, (BRcover − BRstego)/BRcover, of bit-rate reduction achieved by our watermarking method is also given. It can be observed from Table 3 that the bit-rate reduction ratios for these video sequences are all sufficiently small. Numerically, they are on the order of $10^{-4}$ to $10^{-3}$.

Table 3
Results for bit-rate control: the ratios of the bit-rate decrease obtained from three video sequences are within the order of $[10^{-4}, 10^{-3}]$

Video            BRcover (bytes)    BRstego (bytes)    (BRcover − BRstego)/BRcover
Flower-Garden    28,120,582         28,109,649         3.89e−4
Table-Tennis     28,118,904         28,089,882         1.03e−3
Football         7,276,845          7,271,528          7.31e−4
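The ratios reported in Table 3 can be reproduced directly from the listed byte counts. The short script below is only a consistency check on the published figures; the dictionary layout and function name are illustrative.

```python
# (BRcover, BRstego) in bytes, copied from Table 3.
sizes = {
    "Flower-Garden": (28_120_582, 28_109_649),
    "Table-Tennis": (28_118_904, 28_089_882),
    "Football": (7_276_845, 7_271_528),
}


def reduction_ratio(br_cover, br_stego):
    """(BRcover - BRstego) / BRcover."""
    return (br_cover - br_stego) / br_cover


ratios = {name: reduction_ratio(*pair) for name, pair in sizes.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2e}")
```

All three ratios are positive, i.e., the stego streams are slightly smaller than the cover streams.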

4.2. Fidelity of stego video sequences

In order to determine the impact of watermark embedding on the quality of stego video sequences, we compared two PSNR curves that were generated from (i) raw video vs. cover (compressed) video and (ii) raw video vs. stego (compressed+embedded) video. Fig. 3 plots the PSNR curves, where the PSNR values were measured in all frames of the three videos. In Fig. 3(a), the PSNR values of the first 200 frames vary smoothly while the PSNR values of the remaining frames vary significantly. The reason is that the "Flower-Garden" video had no significant motion at the beginning and started to have significant motion after the 200th frame. Statistically, an average PSNR decrease of 3.13 dB was generated. Perceptually, no visual degradations could be sensed when the stego video was played normally. Figs. 3(b) and (c) show similar results that were yielded from the video sequences "Table-Tennis" and "Football." The average PSNR decrease, as shown in Fig. 3(c), was less than 1 dB. The reason for these results was that some watermark bits were not embedded in order to enable video re-encoding and decoding to be performed normally.
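The fidelity comparison can be sketched as follows, assuming the conventional PSNR definition for 8-bit samples (peak value 255) and frames given as flat lists of pixel values. The helper `mean_psnr_drop` and the sample values are illustrative and not taken from the paper.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(test):
        raise ValueError("frames must have the same number of samples")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


def mean_psnr_drop(raw_frames, cover_frames, stego_frames):
    """Average of PSNR(raw, cover) - PSNR(raw, stego) over all frames."""
    drops = [psnr(r, c) - psnr(r, s)
             for r, c, s in zip(raw_frames, cover_frames, stego_frames)]
    return sum(drops) / len(drops)


raw = [[100, 120, 140, 160]]
cover = [[101, 119, 141, 159]]   # every sample off by 1 -> MSE = 1
stego = [[102, 118, 142, 158]]   # every sample off by 2 -> MSE = 4

print(psnr(raw[0], cover[0]), mean_psnr_drop(raw, cover, stego))
```

Quadrupling the MSE lowers the PSNR by 10·log10(4), about 6.02 dB, which is the drop this toy example reports.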

Fig. 3. The PSNR values measured in different frames of three videos: (a) Flower-Garden; (b) Table-Tennis; and (c) Football. Each panel plots the PSNR (dB) against the frame number for the cover (compressed) video and the stego (compressed+watermarking) video. The PSNR decrease (in dB) for each video is indicated statistically in terms of the mean/variance: (a) 3.13/1.35; (b) 2.32/1.26; and (c) 1.11/0.69.


In Fig. 3, it is not difficult to locate the regions where motion was significant. One thing to note is that the lowest PSNR value was always obtained in an I-frame due to embedding. In addition, the PSNR decrease observed in the B- and P-frames occurred because (i) the I-frames they refer to were embedded and (ii) drift compensation was not adopted in the proposed watermarking method to compensate for the effect of (i). Although the overall fidelity of the stego video sequences was more or less degraded, the computational complexity of applying drift compensation was saved. We will show in the following that robustness was not sacrificed even though we did not employ drift compensation.

4.3. Resistance to incidental video attacks

To test robustness against different attacks, we used several attacks, including MPEG-2 re-encoding with lower bit-rates (i.e., changing the original bit-rate from 15 Mbps to 6, 4, and 2 Mbps, respectively), noise addition (the PSNR between a noiseless frame and its noisy version was fixed at 27.05 dB), sharpening, frame averaging, and frame-rate changing (from 25 to 24 frames/s) to verify the performance of our video watermarking algorithm. Since the tested attacks may be applied in normal applications, they are called incidental attacks here. Fig. 4 shows an example of the watermark detection results obtained when Flower-Garden was used as the cover video. In this figure, the horizontal axis indicates the I-frame number, and the vertical axis indicates the correlation value. The correlation values detected in the I-frames of the attacked Flower-Garden video and detected in un-watermarked video I-frames are also provided for the purpose of comparison. In addition, Fig. 4(a) shows the detection results obtained using the proposed method but without using the VFDW, while Fig. 4(b) shows the results obtained using the proposed method with embedding of the VFDW. In both Figs. 4(a) and (b), it is easy to distinguish the attacked watermarked videos from the un-watermarked videos. Furthermore, Fig. 4(b) shows lower correlation values (detected in attacked stego videos) than Fig. 4(a) does. The main reason for these results may have been insufficient randomness of the VFDW (referring to the numbers of 1's and −1's), generated from the contents of video frames (as indicated in Eq. (12)). In particular, when a video frame showed dominant features in either the horizontal or the vertical direction, the resultant video hash was biased. Overall, based on the achieved robustness, our assumption of blind detection accomplished by preserving the intrinsic content of a video, as described in Section 2.3.1, is acceptable. In addition, as mentioned previously, the lack of drift compensation did not apparently harm robustness.

Fig. 4. The correlation values detected from the attacked Flower-Garden video sequences and un-watermarked videos: (a) shows the results obtained using our method but without using the VFDW, while (b) shows the results obtained using our method by embedding the VFDW. Each panel plots the normalized correlation against the I-frame number for the watermarked Flower-Garden video, its attacked versions (frame-rate changing; MPEG re-encoding at 6, 4, and 2 Mbit/s; frame averaging; sharpening; noise adding), and the unwatermarked Susi, Mobile, and Table-Tennis videos. The dashdot line indicates the threshold T = 0.11.

In addition to the above attacks, transcoding is also popularly applied in different applications for video transmission over the network. In the following test, transcoding was used to change the GOP structure of stego video sequences. Recall that the original GOP was "IBBPBBPBBPBB" with a length of 12. Three different GOPs, as shown in Table 4, were used in the robustness test. In particular, two of the GOPs had lengths 13 and 19, respectively, so that the greatest common divisor (GCD) of each of them and 12 (the length of the original GOP) was 1. Our intention was to (i) make the original I-frames inter-coded in the attacked video and (ii) create new I-frames in the attacked video that were originally inter-coded in the stego video. Fig. 5 shows an example of the changes of the frame types after transcoding. Since the watermarks were embedded into the I-frames, we wanted to evaluate whether the watermarks could still be detected in the new I-frames no matter whether they were B- or P-frames in the stego/un-attacked video. More specifically, we evaluated whether the watermarks, propagated from I-frames to B- and P-frames through transcoding, could still be detected. The correlation values detected from the "I-frames" of the three transcoded video sequences are shown in Fig. 6. It can be observed that (i) the correlation values detected from frames with smaller motions clearly exceeded the threshold T and (ii) watermarks were harder to detect in frames with larger motions.

Table 4
Transcoding parameters

GOP's structure    IBBBBBBBBBBBB    IBBPBBPBBPBBPBB    IPPPPPPPPPPPPPPPPPP
GOP's length       13               15                 19
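The effect of choosing GOP lengths that are coprime to 12 can be made concrete with a short sketch. It assumes that every GOP starts with an I-frame, that GOPs are counted from frame 0 in display order, and that the sequence has 375 frames as for Flower-Garden and Table-Tennis; the function names are illustrative.

```python
from math import gcd


def i_frame_indices(total_frames, gop_length):
    """Indices of the frames that open a GOP, i.e., the I-frames."""
    return set(range(0, total_frames, gop_length))


def overlap_report(total_frames, old_len, new_len):
    old_i = i_frame_indices(total_frames, old_len)
    new_i = i_frame_indices(total_frames, new_len)
    return {
        "gcd": gcd(old_len, new_len),
        "new_i_frames": len(new_i),
        "still_i": len(old_i & new_i),         # I-frames before and after
        "promoted_to_i": len(new_i - old_i),   # formerly B- or P-frames
    }


for new_len in (13, 15, 19):
    print(new_len, overlap_report(375, 12, new_len))
```

With a GCD of 1 the I-frame positions coincide only once every 12 × 13 = 156 (or 12 × 19 = 228) frames, so nearly all new I-frames were inter-coded in the stego video.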

Fig. 5. Change of the GOP: (a) a video encoded with an original GOP; (b) a video encoded with a new GOP under transcoding. Light gray shading indicates the frame types that are changed from P to B, while dark gray shading indicates the frame types that are changed from I to non-I or from non-I to I.

Fig. 6. The correlation values detected from transcoded videos using the GOP parameters described in Table 4. Each panel plots the normalized correlation against the I-frame number for the transcoded Table-Tennis and Flower-Garden videos: (a) GOP Type I (|GOP| = 13, IBBB...B); (b) GOP Type II (|GOP| = 15, IBBPBBP...); and (c) GOP Type III (|GOP| = 19, IPPP...P). The dashdot line indicates the threshold T = 0.11.


4.4. Resistance to malicious video attacks

Here, malicious video attacks refer to those of attackers who may have knowledge about the watermarking algorithm and purposely apply this knowledge to remove/destroy the hidden watermarks. In the following experiments, we consider I-frame dropping and WEAs. I-frame dropping was taken into consideration because we assumed that the attackers knew that we had concealed watermarks in I-frames, and that the hidden watermarks would be propagated into B- and P-frames once re-encoding was performed. In addition, watermark estimation attacks, as described and analyzed in Section 3.1, are also considered to be malicious.

4.4.1. Resistance to I-frame dropping

We started by decoding the watermarked video into a still image sequence, and then those video frames (in the spatial domain) corresponding to the I-frames (in the compressed domain) were removed. After the I-frames had been removed, the remaining frames were re-encoded to yield a new compressed video. Fig. 7 shows three sets of experimental results following the I-frame dropping attack. The horizontal axis denotes the new I-frame numbers of the re-encoded videos, and the vertical axis denotes the correlation values detected in the renewed I-frames. We can see that some detected correlation values were quite high, which means that the residual watermark (propagated from the original I-frames) could still be detected in the newly assigned I-frames. However, some residual watermarks were almost destroyed. Serious damage to the residual watermark usually happened in those video frames that had significant motion.

Fig. 7. The correlation values detected after the I-frame dropping attack was applied to (a) Flower-Garden, (b) Table-Tennis, and (c) Football, respectively. Each panel plots the normalized correlation against the renewed I-frame number. The dashdot line indicates the threshold T = 0.11.

In what follows, we discuss why our method is vulnerable to significant motion. Suppose a watermarked I-frame is at frame $t$; then frame $t+1$ is definitely a B- or P-frame that will refer to the preceding I-frame (frame $t$). Let $\mathrm{MB}_{i,t}$ and $\mathrm{MB}_{j,t}$ be the $i$th and the $j$th MB of frame $t$, respectively. According to our algorithm, the watermark bit $w(i)$ will be embedded in $\mathrm{MB}_{i,t}$, and $w(j)$ will be embedded in $\mathrm{MB}_{j,t}$. If $\mathrm{MB}_{j,t+1}$ must refer to $\mathrm{MB}_{i,t}$ during motion estimation, then the content of $\mathrm{MB}_{j,t+1}$ is expected to be similar to that of $\mathrm{MB}_{i,t}$. This referencing mechanism implies that the watermark bit $w(i)$ of $\mathrm{MB}_{i,t}$ of frame $t$ will be propagated to $\mathrm{MB}_{j,t+1}$ of frame $t+1$. Under these circumstances, the watermark bit detected from $\mathrm{MB}_{j,t+1}$ is $w(i)$, not $w(j)$. However, ideally, the expected watermark bit should be $w(j)$. Therefore, it is clear that when there is any significant motion in a video and the I-frame dropping attack is applied, it is difficult to correctly detect the watermark in the newly assigned I-frames. This explains why the Football video produced the worst results (as indicated in Fig. 7(c)).
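The misalignment argument can be illustrated with a toy model in which each MB carries one watermark bit and the motion field is a list giving, for every MB of frame t+1, the MB of frame t that it refers to. The random bits, the seed, and the one-MB global shift are illustrative assumptions; real motion fields are neither global nor MB-aligned.

```python
import random


def d_nc(a, b):
    return sum(x * y for x, y in zip(a, b)) / len(a)


def propagate(bits, reference):
    """Bits observed in frame t+1 when MB j is predicted from MB reference[j]."""
    return [bits[src] for src in reference]


rng = random.Random(3)
n_mb = 44 * 36                     # 16x16 macroblocks in a 704x576 frame
w = [rng.choice((-1, 1)) for _ in range(n_mb)]

static = list(range(n_mb))                        # MB j refers to MB j
panning = [(j + 1) % n_mb for j in range(n_mb)]   # MB j refers to MB j+1

static_score = d_nc(w, propagate(w, static))
panning_score = d_nc(w, propagate(w, panning))
print(static_score, panning_score)
```

Without motion the propagated bits line up with the expected ones and the correlation is 1. With a shift of a single MB the detector compares $w(j)$ against $w(j+1)$, and for a random watermark the correlation collapses toward zero.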

4.4.2. Resistance to watermark estimation attacks

To study the resistance to WEAs, the proposed video watermarking scheme embedded with a video frame-independent watermark was used and denoted as Method I. The combination of the proposed VFDW and Method I was denoted as Method II. We wanted to verify the advantage of using the VFDW by comparing the performance of Methods I and II when WEAs were imposed.

4.4.2.1. VFDW resistance to the collusion attack.

The collusion attack was applied to Method I (without using the VFDW) and Method II (using the VFDW), respectively. The effects of the collusion attack and the VFDW were examined from two viewpoints: (s1) the quality of a colluded video; and (s2) watermark detection after performing collusion. Typical results obtained from the Flower-Garden video are depicted in Figs. 8 and 9, respectively.

Fig. 8. Quality of a colluded Flower-Garden video, plotted as PSNR (dB) against the frame number: (a) without the VFDW, the PSNR values of the colluded frames (top) are higher than those of the stego frames; (b) when the VFDW was applied, the PSNR values of the colluded frames (bottom) became lower than those of the stego frames. This experiment reveals that a collusion attack will fail to improve the fidelity of a colluded video when the VFDW is applied.

Fig. 9. Watermark detection under collusion, plotted as normalized correlation against the I-frame number for Flower-Garden (the dashdot line indicates the threshold T = 0.11). The curves are, in order: without VFDW/without collusion; with VFDW/without collusion; with VFDW/with Type-I collusion (|C| = 16); with VFDW/with Type-I collusion (|C| = 1); without VFDW/with Type-I collusion (|C| = 1); and without VFDW/with Type-I collusion (|C| = 16). Comparing the detection curves obtained using different |C|'s reveals that the anti-collusion capability of our watermarking method is not affected by the size of a collusion set (see the third and fourth curves). If the VFDW is not used, collusion indeed provides effective watermark removal (see the last two curves). These results are consistent with Proposition 1.

As for (s1), it can be found in Fig. 8(a) that collusion improved the quality of the colluded video frames in terms of MSE. However, the VFDW could force collusion to undesirably degrade the quality of the colluded video frames, as shown in Fig. 8(b). This experiment demonstrated that the VFDW was efficient in preventing a collusion attack from achieving perfect cover video recovery.

As for (s2), the watermark detection results for colluded video frames are shown in Fig. 9. We can see from the first two curves of Fig. 9 that when video collusion was absent, the detection values obtained from Method II were slightly smaller than those obtained from Method I. On the other hand, the last two curves in Fig. 9 show that when the VFDW was not employed (i.e., using Method I), none of the watermarks could be extracted from the colluded frames. In addition, once the VFDW was employed in embedding (i.e., using Method II), watermarks could be detected (the third and fourth curves of Fig. 9) from all the I-frames no matter what the size of the collusion set C was. It should be noted that these two curves are very close to each other, which coincides with our result derived in Proposition 1. This experiment verified the resistance of the VFDW to a collusion attack.

In summary, as long as a frame hash is used to construct a watermark, even when a collusion attack is applied, watermarks can still be extracted by owners, and the fidelity of colluded videos cannot be improved. As a result, the merits of the VFDW in resisting collusion have been confirmed.

Fig. 10. Comparison of the time consumed (in seconds) by (1) Flower-Garden video decoding+re-encoding; (2) Table-Tennis video decoding+re-encoding; (3) video decoding+our watermark detection (not optimized for speed); and (4) video decoding, respectively. Note that the amounts of time needed to re-encode Flower-Garden and Table-Tennis were different. However, the amounts of time used in watermark detection+decoding and in decoding for both Flower-Garden and Table-Tennis were almost the same.


4.4.2.2. VFDW resistance to the copy attack.

The copy attack was applied to Method I and Method II, respectively. When the copy attack was performed, one of the videos was first watermarked, and then the watermark was estimated and copied to the other unwatermarked videos to form counterfeit stego videos. By repeating the above procedure, we obtained six counterfeit stego videos in total. The PSNR values (stego video vs. stego+copy attacked video) of the attacked video frames were in the range of 36–55 dB (no masking was used). The normalized correlations obtained by applying the copy attack to Method I fell within the interval [0.487, 0.650] (all were sufficiently larger than T = 0.11), which indicated the presence of watermarks. However, when the VFDW was introduced, these correlations decreased significantly to the interval [−0.056, 0.060], which indicated the absence of watermarks. The experimental results are consistent with the analytic result indicating that the proposed VFDW is able to deter the detection of copied watermarks.
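Why the copied watermark is no longer detected under the VFDW can be sketched with a toy model. The assumptions are ours: host interference is ignored, the attacker recovers a fraction `a` of the embedded signal, the frame hashes of the source and target frames are independent random bipolar sequences, and the hash of the target frame is unchanged by the copied signal.

```python
import random


def d_nc(a, b):
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(11)
n = 4096
a = 0.9   # fraction of the embedded signal recovered by the attacker

w = [rng.choice((-1, 1)) for _ in range(n)]
fh_source = [rng.choice((-1, 1)) for _ in range(n)]   # hash of the watermarked frame
fh_target = [rng.choice((-1, 1)) for _ in range(n)]   # hash of the attacked frame

# Method I: the copied signal is a * W and the detector correlates with W.
copied_independent = [a * x for x in w]
score_method_1 = d_nc(copied_independent, w)

# Method II: the copied signal is a * W * FH_source, but the detector
# rebuilds the expected watermark from the target frame's own hash.
copied_dependent = [a * x * h for x, h in zip(w, fh_source)]
expected_in_target = [x * h for x, h in zip(w, fh_target)]
score_method_2 = d_nc(copied_dependent, expected_in_target)

print(score_method_1, score_method_2)
```

In this model the Method II score equals `a` times the correlation between the two frame hashes, so it stays near zero whenever the two frames are dissimilar, while the Method I score stays at `a`.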

4.5. Real-time detection

As for the real-time detection requirement, the time consumed on the video decoder/watermark detector side in three different situations, including (1) video decoding and re-encoding, (2) video decoding and watermark detection, and (3) video decoding, was compared and is shown in Fig. 10. Our watermarking system was run on a PC with a Pentium-4 2.5 GHz CPU under Windows 2000. The two left bars in Fig. 10 show the time needed to finish video decoding plus re-encoding for "Flower-Garden" and "Table-Tennis," respectively. The difference in the time cost is mainly due to the different contents of the video sequences, which caused the amount of time consumed in motion estimation to differ. However, the decoding time required for both video sequences was almost the same. On the other hand, the average watermark detection time was about 0.117 s/frame. It is obvious that our method (as shown by the third bar in Fig. 10) used much less time than video decoding+re-encoding. Compared with the amount of time used for pure decoding (the bar on the right side of Fig. 10), our method (video decoding+watermark detection) needed nearly the same amount of time. From these comparisons, it is reasonable to conclude that our watermark detection scheme can almost be executed in real-time. In addition, uncompressed domain video watermarking is not feasible for real-time application because the time spent in the decoding and re-encoding process is very long.

5. Conclusion

A digital video watermarking system needs to deal with several critical issues that are peculiar to video sequences. In this paper, we have presented a new video watermarking scheme that takes these issues into consideration. These include compressed domain watermarking, real-time detection, bit-rate control, and resistance to incidental and malicious video attacks. In particular, we have provided a watermarking method that can be performed in the variable length codeword (VLC) domain. We have also designed a video frame-dependent watermark (VFDW), which is able to resist watermark estimation attacks (WEAs) that have been largely ignored in the literature. Resistance to WEAs is indeed indispensable because they are efficient in defeating a video watermarking system while maintaining the visual quality of attacked video sequences. We have conducted extensive experiments to verify the performance of the proposed method.

In this paper, we have not considered resistance to geometrical distortions. As pointed out previously, the embedding of synchronization/repetition patterns is a common way to tolerate geometrical attacks (efficient only to a certain extent). However, the embedded synchronization patterns are easy to remove using the collusion attack. Therefore, we do not think it is sufficient to employ a similar idea to deal with the problem of geometrical distortions. Instead, we propose to exploit a mesh [18] as the basic embedding unit, and to combine it with its mesh-based hash [8,17] to resist both geometric attacks and estimation attacks [16].

Acknowledgements

This paper was supported under NSC Grants 91-2213-E-001-037 and 92-2422-H-001-004.

References

[1] A.M. Alattar, E.T. Lin, M.U. Celik, Digital watermarking of low bit-rate advanced simple profile MPEG-4 compressed video, IEEE Trans. Circuits Syst. Video Technol. 13 (8) (2003) 787–800.

[2] S. Arena, M. Caramma, R. Lancini, Digital watermarking applied to MPEG-2 coded video sequences exploiting space and frequency masking, Proceedings of the IEEE International Conference on Image Processing, 2000.

[3] S. Baudry, P. Nguyen, H. Maita, Channel coding in video watermarking: use of soft decoding to improve the watermark retrieval, Proceedings of the IEEE International Conference on Image Processing, 2000.

[4] I.J. Cox, M.L. Miller, J.A. Bloom, Digital Watermarking, Morgan Kaufmann Publishers, Los Altos, CA, 2002.

[5] J. Dittmann, M. Stabenau, R. Steinmetz, Robust MPEG video watermarking technologies, Proceedings of the ACM Multimedia, Bristol, UK, 1998.

[6] F. Hartung, B. Girod, Watermarking of uncompressed and compressed video, Signal Process. 66 (3) (1998) 283–302.

[7] F. Hartung, M. Kutter, Multimedia watermarking techniques, Proc. IEEE 87 (1999) 1079–1107.

[8] C.Y. Hsu, C.S. Lu, A geometric-resilient image hashing system and its application scalability, Proceedings of the ACM Multimedia and Security Workshop, Magdeburg, Germany, 2004, pp. 81–92.

[9] T. Kalker, G. Depovere, J. Haitsma, M. Maes, A video watermarking system for broadcast monitoring, Proceedings of the SPIE, vol. 3657, 1999, pp. 103–112.

[10] D.R. Kim, S.H. Park, A robust video watermarking method, Proceedings of the IEEE International Conference on Multimedia and Expo, 2000.

[11] M. Kutter, S. Voloshynovskiy, A. Herrigel, The watermark copy attack, Proceedings of the SPIE: Security and Watermarking of Multimedia Contents II, vol. 3971, 2000.

[12] G.C. Langelaar, R.L. Lagendijk, Optimal differential energy watermarking of DCT encoded images and videos, IEEE Trans. Image Process. 10 (1) (2001) 148–158.

[13] G.C. Langelaar, R.L. Lagendijk, J. Biemond, Real-time labeling of MPEG-2 compressed video, J. Visual Commun. Image Represent. 9 (4) (1998) 256–270.

[14] J. Linnartz, J.C. Talstra, MPEG PTY-marks: cheap detection of embedded copyright data in DVD-video, ESORICS98, 1998.

[15] C.S. Lu, J.R. Chen, K.C. Fan, Resistance of content-dependent video watermarking to watermark-estimation attacks, Proceedings of the IEEE International Conference on Communications, France, 2004, pp. 1386–1390.

[16] C.S. Lu, C.Y. Hsu, Content-dependent anti-disclosure image watermark, Proceedings of the International Workshop on Digital Watermarking, Lecture Notes in Computer Science, vol. 2939, Seoul, Korea, 2003, pp. 61–76.

[17] C.S. Lu, C.Y. Hsu, S.W. Sun, P.C. Chang, Robust mesh-based hashing for copy detection and tracing of images, Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, 2004.

[18] C.S. Lu, S.W. Sun, P.C. Chang, Robust mesh-based content-dependent image watermarking with resistance to both geometric attack and watermark-estimation attack, Proceedings of the SPIE: Security, Steganography, and Watermarking of Multimedia Contents VII (E1120), San Jose, CA, USA, 2005.

[19] ftp://ftp.mpegtv.com/pub/mpeg/mssg/mpeg2v12.zip.

[20] F. Petitcolas, R.J. Anderson, M.G. Kuhn, Information hiding: a survey, Proc. IEEE 87 (1999) 1062–1078.

[21] K. Su, D. Kundur, D. Hatzinakos, Statistical invisibility for collusion-resistant digital video watermarking, IEEE Trans. Multimedia 7 (1) (2005) 43–51.

[22] M.D. Swanson, B. Zhu, A.H. Tewfik, Multiresolution scene-based video watermarking using perceptual models, IEEE J. Sel. Area Commun. 16 (4) (1998) 540–550.

[23] S. Voloshynovskiy, F. Deguillaume, S. Pereira, T. Pun, Optimal adaptive diversity watermarking with channel state estimation, Proceedings of the SPIE: Security and Watermarking of Multimedia Contents III, vol. 4314, USA, 2001.

[24] S. Voloshynovskiy, S. Pereira, A. Herrigel, N. Baumgartner, T. Pun, Generalized watermarking attack based on watermark estimation and perceptual remodulation, Proceedings of the SPIE: Security and Watermarking of Multimedia Contents II, vol. 3971, 2000.
