ARTICLE IN PRESS
0923-5965/$ - se
doi:10.1016/j.im
$This paper
of an earlier pap
the IEEE Int
France, 2004.�CorrespondiE-mail addre
Signal Processing: Image Communication 20 (2005) 624–642
www.elsevier.com/locate/image
Real-time frame-dependent video watermarking inVLC domain$
Chun-Shien Lua,�, Jan-Ru Chena,b, Kuo-Chin Fanb
aInstitute of Information Science, Academia Sinica, Taipei, Taiwan, ROCbDepartment of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan, ROC
Received 5 October 2004; accepted 15 March 2005
Abstract
Digital watermarking is a helpful technology for providing copyright protection for valuable multimedia data. In
particular, video watermarking deals with several issues, which are unique to various types of media watermarking.
In this paper, these issues, including compressed domain watermarking, real-time detection, bit-rate control, and
resistance to watermark estimation attacks, will be addressed. Since video sequences are usually compressed before they
are transmitted over networks, we first describe how watermark signals can be embedded into compressed video while
keeping the desired bit-rate nearly unchanged. In the embedding process, our algorithm is designed to operate directly
in the variable length codeword (VLC) domain to satisfy the requirement of real-time detection. We describe how
suitable positions in the VLC domain can be selected for embedding transparent watermarks. Second, in addition to
typical attacks, the peculiar attacks that video sequences encounter are investigated. In particular, in order to deal with
both collusion and copy attacks that are fatal to video watermarking, the video frame-dependent watermark (VFDW)
is presented. Extensive experimental results verify the excellent performance of the proposed compressed video
watermarking system in addressing the aforementioned issues.
r 2005 Elsevier B.V. All rights reserved.
Keywords: Attack; Bit-rate control; Collusion; Video frame-dependent watermark; Real-time detection; Robustness
e front matter r 2005 Elsevier B.V. All rights reserve
age.2005.03.012
is a significantly modified and extended version
er [15] that was published in the Proceedings of
ernational Conference on Communications,
ng author. Tel.: +886 2 27883799.
ss: [email protected] (C.-S. Lu).
1. Introduction
1.1. Literature review
Due to the rapid development of the Internetduring the past decade, numerous methods havebeen proposed for storing and transmitting digitalmultimedia data. Digital data is much superior toanalog data because the quality of copies is not
d.
ARTICLE IN PRESS
1It is worth mentioning that if each image unit (e.g., a block
or mesh) in an image is treated like a frame in a video sequence;
then collusion attacks can also be applied to those image
watermarking methods that employ a multiple redundant
watermark embedding strategy [16].
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 625
degraded at all. However, this superiority hasbecome a threat to authorized usage becausedigital data can be easily tampered with, dupli-cated or distributed without the need for expensivetools. As a consequence, the intellectual propertyprotection problem has become an urgent issue inthe digital world.Recently, an emerging intellectual property
protection scheme called ‘‘digital watermarking’’has been extensively explored [4,7,20]. Digitalwatermarking is a helpful technique that can beused to protect multimedia data. The underlyingconcept of digital watermarking is to embedimperceptible signals into digital multimedia datato carry out specific missions. When watermarkedmultimedia data encounters a digital signal pro-cessing operation, the embedded watermark isexpected to survive this operation. If someoneviolates the copyright of watermarked multimediadata, the original owner can prove ownership byextracting the embedded watermark. In this paper,we shall focus on the development of a new videowatermarking scheme.In the past decade, a number of video water-
marking schemes have been proposed. The existingvideo watermarking schemes embed watermarkseither in raw video [3,5,6,10,22] or in compressedvideo [2,6,12–14]. Compressed domain videowatermarking is considered more practical and isthe objective of this paper because video is usuallystored in a compressed format before it istransmitted over networks. In [6], Hartung andGirod proposed a video watermarking scheme thatuses MPEG-2 bitstreams as the underlying targetdata. They arrange a watermark sequence into a2D format, which is the same size as a video frame.Then, the watermark signals are 8� 8 DCTtransformed and added into their correspondingDCT coefficients. In order to deal with the errorpropagation problem caused by watermarking,they add a drift compensation signal to thewatermarked signal. Their detection process hasbeen shown to be very close to real-time detection.Arena et al. [2] improved Hartung and Girod’swork with interleaved encoding. They also makeuse of the advantageous properties of the humanvisual system (HVS). In [13], Langelaar et al.proposed a video watermarking scheme which can
be used in the compressed domain by modifyingthe codewords generated by a variable lengthcodeword (VLC). First, they divide run-level pairsinto many groups, where each group containscodewords of equal length and the level differencein each group is exactly one. During watermarkembedding, a run-level pair is either left un-changed or replaced, depending on the incomingwatermark value. Their method is basically a leastsignificant bit (LSB) approach. This implies thatquality degradation of a video after embeddingcan be almost negligible. However, for a com-monly adopted signal processing operation, suchas decoding followed by re-encoding, achievingrobustness becomes impossible. In [12], Langelaarand Lagendijk proposed a differential energywatermarking (DEW) algorithm which can beemployed in the DCT domain. They calculate theenergy difference of two blocks and remove thehigh-frequency coefficients of the correspondingblock with energy being pre-defined as smaller inorder to maintain the pre-defined energy relation.The authors mentioned that their method is robustagainst the re-encoding of video bit streams.However, as mentioned in [1,12], this techniqueis susceptible to transcoding, especially when thegroup of picture (GOP) structure is changed. Anearlier paper [22] described a scene-based videowatermarking approach that uses a ‘‘temporalwavelet transform.’’ One major feature is that theauthors first pointed out the collusion attack,which is a common threat to video watermarkingmethods. In fact, the collusion attack can be saidto be a unique outcome1 of video watermarkmethods. We deeply believe that it would bemeaningless to claim that a video watermarkingscheme was robust if it could not deal withcollusion. Overall, although the aforementionedpapers have dealt with problems related to videowatermarking, some issues still remain. In order togive the reader a clear idea of the state of the art invideo watermarking, Table 1 summarizes selected
ARTICLE IN PRESS
Table 1
Video watermarking methods vs. video characteristics
Video characteristic [22] [9] [21] [6] [13] [12] [1]
Compressed domain watermarking N N N Y(DCT) Y(VLC) Y(DCT) Y(DCT)
Real-time detection N Y N Y Y Y Y
Bit-rate (nearly) unchanged N N N N Y N Y
Drift compensation N N N Y N N Y
Low bit-rate embedding N N N N N N Y
Resistance to collusion Y N Y N N N N
Resistance to copy attack N N N N N N N
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642626
techniques versus video characteristics. In thefollowing subsections, the issues of compresseddomain watermarking, real-time detection, bit-ratecontrol, and resistance to peculiar attacks thatvideo sequences may encounter will be discussed.
1.2. Compressed domain video watermarking,
real-time detection, and bit-rate control
As for watermarking in the compressed domain,most of the existing compressed domain videowatermarking methods are, in fact, employed inthe ‘‘DCT’’ domain. In other words, both inverseentropy coding and inverse quantization must beperformed before watermark embedding or detec-tion. In this work, we shall propose a compressed-domain video watermarking method in which thewatermarking process can be directly performed inthe VLC domain. In comparison with the existingDCT domain watermarking methods, the placeswhere we propose to embed watermarks are closerto the coded bitstream. The advantage of thisembedding strategy is its efficiency in real-timedetection. With the watermark embedded in theVLC domain, tedious DCT and inverse DCTtransforms can be avoided. On the other hand,since the codeword length can be calculated bylooking up the VLC table, the bit-rate controlproblem can be easily handled.
1.3. Robustness of video watermarking with
emphasis on resistance to collusion and copy attacks
As for the problem of resistance to videoattacks, it is known that robustness is the criticalissue affecting the practicability of any water-
marking method employed in a DRM system. Therobustness of the current watermarking methodshas been frequently examined with respect toremoval attacks or geometrical attacks or both.Removal attacks try to eliminate the hidden signalW (originally embedded in the cover data I) bymanipulating the stego data Is such that the fidelityof the attacked data Ia is inevitably destroyed(i.e., PSNRðI; IsÞXPSNRðI; IaÞ). However, therealso exist attacks that can defeat a watermarkingsystem without sacrificing perceptual quality.Typically, the collusion attack [16,21,22], whichis one of the removal attacks, can make colludedmedia data more perceptually similar to its coverversion (i.e., PSNRðI; IsÞpPSNRðI; IaÞ). The collu-sion attack, in particular, is fatal to video water-marking. Collusion attacks in video watermarkingare divided into two types [22]: Type I collusionattacks (applied to video frames embedded withthe same watermark) and Type II collusion attacks(applied to video frames embedded with differentwatermarks). A Type I collusion attack is con-ducted first by averaging a set of extractedwatermarks (usually obtained through using de-noising) to estimate the hidden watermark, andthen the estimated watermark is subtracted fromall the frames in order that the hidden signal canbe removed, whereas a Type II collusion attack isoperated by averaging those perceptually similarframes in order to directly remove the watermarks.However, Type II collusion is less powerful since itis restricted to operating only on a subset of videoframes such that the video watermarks cannot beeliminated entirely. Hence, we will only focus onthe Type I collusion attack in this paper. It shouldbe noted that the conventional denoising-based
ARTICLE IN PRESS
Table 2
VLC table (s denotes the sign bit)
(run,level) Variable length code Bit length
(0,1) 11s 3
(0,2) 0100s 5
(0,3) 0010 1s 6
(0,4) 0000 110s 8
(0,5) 0010 0110s 9
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 627
removal attack [24] when applied to one singleimage is a special case of the collusion attack.On the other hand, the copy attack [11], which is
one of the protocol attacks, is used to create thefalse positive problem. Initially, the copy attackwas proposed in image watermarking and carriedout as follows: (i) a watermark is first predictedfrom a stego image; (ii) the predicted watermark isadded into a target image to create a counterfeitstego image; and (iii) from the counterfeit image, awatermark can be detected that wrongly claimsrightful ownership. Compared with the collusionattack, the copy attack can be executed on onlyone video frame or an image and, thus, is moreflexible. In this regard, the copy attack must not beignored when robustness is mentioned. Becausewatermark estimation is a common step performedto realize the collusion and copy attacks, they willbe called watermark-estimation attacks (WEAs).In the literature, previous collusion-resistant
video watermarking methods were either compu-tationally complex [22] or dependent on unstablefeature extraction [21]. In addition, the copy attackhas been ignored. In this paper, we propose acontent-dependent video watermarking scheme toresist both collusion and copy attacks. First, eachvideo-frame hash is extracted and then combinedwith a hidden message to yield the video frame-dependent watermark (VFDW). The properties ofthe VFDW will be examined, and mathematicalanalyses of the VFDW’s resistance to WEAs willbe discussed.The remainder of this paper is organized as
follows. In Section 2, we shall study how to embedand blindly extract watermarks in the VLC domain,and how to keep the bit-rate of the resultant stegovideo nearly unchanged. In Section 3, the designand analyses of the proposed VFDW in resistingboth collusion and copy attacks will be discussed.Extensive experimental results will be given inSection 4, and concluding remarks will be drawnin Section 5.
(0,6) 0010 0001s 9
: : :
(1,1) 011s 4
(1,2) 0001 10s 7
(1,3) 0010 0101s 9
: : :
2. MPEG-2 bitstream watermarking
The proposed compressed domain MPEG-2video watermarking scheme will be discussed in
this section. We shall start with a discussion ofhow to embed watermark signals in the VLCdomain. Then, a macroblock (MB)-based embed-ding scheme will be described. We will alsoinvestigate the problem of bit-rate control. Finally,we will describe how the hidden watermark can beblindly extracted. In this paper, ‘‘cover video’’is used to denote an MPEG-2 compressed bit-stream, and ‘‘stego video’’ is used to denote a(compressed+embedded) bitstream.
2.1. Video watermarking in the VLC domain
When an MPEG bitstream is to be water-marked, it is required to be decoded backwards tosome extent. In our method, only inverse entropycoding must be executed since our system isapplied in the VLC domain. After a cover(compressed) video is inversely entropy coded, anMPEG compressed bitstream is represented usingVLCs, as tabulated in Table 2. In the VLC table,each codeword corresponds to a run-level pair,denoted as ðr; lÞ. During a video encoding stage,the pixel values in the spatial domain are 8� 8discrete cosine transformed (DCT). After quanti-zation, the content of each 8� 8 block is scannedin a zigzag manner. Each scanned non-zero integerhas to be converted into a so-called run-level pair,ðr; lÞ. The run, r, indicates the number of zerospreceding the current non-zero coefficient. Thelevel, l, on the other hand, represents an integerthat is the magnitude of the non-zero coefficientafter quantization. Having the run-level pairs for
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642628
all the non-zero quantized coefficients in an 8� 8block, one can consult the VLC table and convertthem into a bitstream. It is easy to see that if onechooses run-level pairs as potential hiding places,then it will be feasible to modify the level value only.Based on the MPEG coding rule, if a run value ismodulated, then this indicates that the number ofpreceding zeros has been altered. Basically, this kindof update will lead to serious consequences becausethe positions of all subsequent non-zero coefficientswill be shifted. Under these circumstances, if theseshifted coefficients are decoded back to the spatialdomain, the resultant frame will be quite differentfrom its original. However, if we update the levelvalue, then only its magnitude will be changed.Under these circumstances, visual degradation dueto watermark embedding can be controlled. There-fore, we propose to update the level value instead ofupdating the run value during the watermarkembedding process.There is a potential problem if one adopts level
values as hiding places. It is known that if thecompression ratio is changed, then the size of avideo bitstream will be changed and the number ofrun-level pairs in a frame will be changed as well.This will lead to the fact that the number ofwatermark bits and the number of hiding positions(e.g., number of run-level pairs) are not the same;i.e., the number of original watermark bits isdifferent from that of extracted watermark bits.Thus, it is impossible to calculate the correlationbetween an original watermark and an extractedversion, which are of different lengths. In order totackle this problem (i.e., keep the number of hidingpositions un-changed), we propose to use a MB asan embedding unit because the number of MBswithin a frame can be kept un-changed if the framesize is not changed after attacks are applied.Another issue regarding video watermarking is
the selection of a proper color channel forembedding. Usually, the frames of a videosequence will be split into Y, Cb and Cr channelsin the MPEG coding stage. According to differentcolor resampling rules, it is possible for the ratioY:Cb:Cr be set to 4:2:2 or 4:2:0. Under thesecircumstances, the only unchanged channel is theY channel. Thus, when embedding watermarks, weprefer to exploit the Y channel as the host channel.
In addition to the above selections, choosing anappropriate frame type among I-frame, P-frame orB-frame for hiding watermarks is also a crucialissue. Usually, a conventional video consists ofa number of GOPs. Each GOP is composed ofone I-frame and several B-frames and P-frames. Atypical I-frame adopts intra coding, which meansit does not refer to any other frames. Differentfrom an I-frame, a P-frame only refers to itsnearest preceding I- or P-frame. As for a B-frame,it refers to the nearest preceding and succeeding I-or P-frame. In a conventional MPEG format, thecontent of a B- or P-frame is the so-called residualerror between the current frame and the frame towhich it refers. Therefore, only an I-frame canhold complete information. In this paper, wechoose to embed watermarks into the I-frames ofan MPEG compressed video sequence. Due to theinherent referencing effect, we know that water-marks embedded in I-frames will be propagatedinto succeeding B- or P-frames when re-encodingis performed. In this situation, watermarks can bedetected from all the frames. In the next section,we shall present the detailed procedures forembedding and extracting watermarks.
2.2. MB-based video watermarking
In addition to different (spatial/transformed/compressed) domain watermarking methods, inorder to deal with geometrical distortions, theembedding of synchronization patterns/templatesin advance for later recovery of geometricalparameters has been popularly used in imagewatermarking and offers a certain benefits. In theliterature, regularly tiled subframes [9] or featurepoint-based irregular subframes [21] have beenproposed as a basic embedding unit to provide acertain degree of resistance to geometrical attacks.In [1], Alattar et al. also implemented synchroni-zation patterns and embedded them into a MPEG-4 bitstream to solve rotation and scaling problems.Of course, we can also adopt similar principles tohandle geometrical attacks in our system. How-ever, as we have pointed out in [16] and willdescribe later in Section 3, embedded synchroniza-tion/repetition patterns can be easily removed bymeans of the collusion attack. As a result, we
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 629
would rather focus on those attacks that arepeculiar to video sequences.In this work, we shall propose a content-
dependent, collusion- and copy-resilient blindvideo watermarking system, where the originalsource is not used during the detection process.The rationale behind using blind detection ismainly based on the fact that the intrinsic contentof a video can be mostly preserved even whendigital operations are applied. Here, the so-called‘‘intrinsic content’’ of a video is defined as itsfiltered version since the noisy part is discardedby means of mean filtering. It is known thata watermark signal is usually a high-frequencysignal; therefore, watermark embedding and de-tection steps are operated in the noisy part. In thefollowing, we shall describe in detail the MB-basedvideo watermarking scheme. A block diagram ofour method is illustrated in Fig. 1.
2.2.1. Video watermark embedding
Suppose there are, in total, N MBs in a videoframe. Let ðrij ; lijÞ be the jth run-level pair in theith MB, let uðiÞ be the mean of the levels in the ithMB, and let ni be the number of levels in theith MB. Under these circumstances, the meanvalues uðiÞð1pipNÞ will form a 1-D sequence U
for each video frame. In fact, these mean valuesneed to be shuffled based on a secret key K beforewatermark embedding. Shuffling is performedbecause the mean sequence is required to besecured in order to enhance the security of theproposed watermark embedding and detectionprocesses. Under this circumstance, even thewatermarking algorithm is known, the maliciousattacker will be computationally difficult toremove the hidden watermark.
Fig. 1. Block diagram of the proposed watermark embedding
process.
Let uðiÞ’s be the mean filtered version obtainedfrom uðiÞ’s. Since uðiÞ is the mean level value of aMB, its corresponding mean filtered value uðiÞ inthe mean filtered sequence U can be derived byaveraging the mean level values of its left and rightneighbors; i.e.,
uðiÞ ¼1
mf s
Xk¼iþbmf s=2c
k¼i�bmf s=2c�1
uðkÞ (1)
is obtained and mf s denotes mean filteringsupport. By applying this mechanism to all theMBs, we can obtain a set of magnitude relation-ships between each pair of uðiÞ and uðiÞ as
mrðiÞ ¼ sgnðuðiÞ � uðiÞÞ, (2)
where sgnðÞ is a sign function and is defined as
sgnðtÞ ¼þ1 if tX0;
�1 if to0:
�
Through this filtering procedure, the noisy parts ofa host signal, i.e., ðuðiÞ � uðiÞÞ’s, that are used forembedding are obtained. In addition, we assumethat the approximate version, U, of a host signalcan be mostly obtained in the watermark detectionprocess in order to not affect robustness underblind detection.Next, a watermark signal W ¼ fwð1Þ;wð2Þ; . . . ;
wðNÞg is generated based on the secret key K forthe purpose of embedding, where wðiÞ ¼ þC or�C, and C denotes a constant. In this paper,watermark embedding is done by using a water-mark value wðiÞð1pipNÞ to perturb uðiÞ as
uhðiÞ ¼ uðiÞ þ wðiÞ, (3)
where uhðiÞ is the modulated version of uðiÞ. Afterapplying Eq. (3) to all the block mean values uðiÞ’s,each uhðiÞ can be derived according to Eq. (1) as
uhðiÞ ¼1
mfs
Xk¼iþbmfs=2c
k¼i�bmfs=2c�1
uhðkÞ
¼1
mfs
Xk¼iþbmfs=2c
k¼i�bmfs=2c�1
uðkÞ
þ1
mfs
Xk¼iþbmfs=2c
k¼i�bmfs=2c�1
wðkÞ � uðiÞ, ð4Þ
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642630
based on the assumption that 1mfs
Pk¼iþbmfs=2ck¼i�bmfs=2c�1
�
wðkÞ � 0 due to the fact that the watermark valuesare constant and mfs is an even integer. Therefore,the new magnitude relationship following embed-ding between uhðiÞ and uhðiÞ is
mrhðiÞ ¼ sgnðuhðiÞ � uhðiÞÞ. (5)
Sequentially substituting Eqs. (3), (4), and (2) intoEq. (5), we get
mrhðiÞ ¼ sgnðuhðiÞ � uhðiÞÞ ¼ sgnððuðiÞ þ wðiÞÞ
� uðiÞÞÞ ¼ sgnððuðiÞ � uðiÞÞ þ wðiÞÞ. ð6Þ
Comparing Eqs. (2) and (6), it becomes clear thatour embedding idea is to change the magnituderelationship, mrðiÞ, between uðiÞ and uðiÞ accordingto the incoming watermark bit wðiÞ. In this paper,the goal of our embedding process is to force themodulated magnitude relationship (i.e., mrhðiÞ) tohave the ‘‘sign’’ the same as its correspondingwatermark bit, wðiÞ. With this embedding rule, wehave the following: (i) when wðiÞ is positive, uðiÞ
must be modulated by adding wðiÞ in order to getpositive mrhðiÞ; (ii) when wðiÞ is negative, uðiÞ mustbe modulated by adding wðiÞ in order to getnegative mrhðiÞ.However, the entire embedding process at this
stage has not been carried out completely becausewatermark embedding only proceeds to the ‘‘MB’’level. In practice, each run-level pair in a MB stillneeds to be modulated. Furthermore, we have notyet discussed the fidelity problem that is encoun-tered following watermark embedding. With re-gard to level-wise embedding, we propose topropagate the embedding quantity wðiÞ to all thelevels of the run-level pairs in a MB i. This impliesthat all the original run-level pairs in a MB i aremodulated with the same quantity; i.e., the levelvalue lij is modulated as ih
ij by means of
lhij ¼ lij þ wðiÞ, (7)
where 1pjpni and 1pipN.As for the fidelity issue, since the embedding
step (Eq. (7)) may produce visual defects, theactual watermark value wðiÞ needs to be furtherstudied here. Usually, a human visual model is
adopted to maintain fidelity in digital water-marking. For VLC domain video watermarkingconsidered here, the maximum quantity changeallowed for a ‘‘level’’ value is ‘‘1,’’ which corre-sponds to a fixed quantization interval in the DCTdomain. As a consequence, the following trans-parency constraint needs to be enforced during theembedding process:
jlhij � lijj ¼ jwðiÞj ¼ 1. (8)
Based on Eq. (8), the hidden watermark W isderived as a bipolar signal; i.e., the constant C isset to be 1. The magnitude of the watermarkvalues exactly indicates the maximum distortionthat we can impose on the level values of a videobitstream’s VLCs.During the compressed domain video water-
marking process, however, we encounter someproblems that should be carefully handled. Forexample, if the modulated run-level pair, ðrij ; l
hijÞ,
does not exist in the VLC codewords, then this willcause a video encoding and decoding problem. Inorder to tackle this problem, the modulated ðrij ; l
hijÞ
should be forced to possess the level value of itsclosest run-level pair in the VLC codewords. Thisalso implies that the transparency constraint,specified in Eq. (8), may not be satisfied whenthe aforementioned problem occurs. On the otherhand, if the modulated level lh
ij is less than or equalto zero, then we propose to leave the original run-level pair unchanged to maintain the correctness ofre-encoding. Under this circumstance, robustnesswill be more or less affected. In practice, ourextensive results have shown that the watermarkdetection results obtained from stego/attackedvideo sequences can be separated very wellfrom those obtained from un-watermarked videosequences.
2.3. Bit-rate control
In some applications, the bit-rate of a videobitstream is not allowed to be (dramatically)changed after a watermarking process is per-formed. Here, we shall explain how the problemof bit-rate control can be dealt with. According tothe VLC table (Table 2), the number of bits usedto represent a run level pair satisfies the following
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 631
inequality:
VLClengthðr; lÞpVLClengthðr; l þ 1Þ, (9)
where the function VLClengthð; Þ reports thenumber of bits used to represent the VLC code-word of a run level pair. Eq. (9) implies that giventhe same run r, a larger level value is associatedwith a longer codeword. Based on the abovecoding rule, we shall describe in the followinghow the bit-rate can be kept unchanged. Strictlyspeaking, our goal is to make the bit-ratedifference between a cover and a stego videosequence negligibly small.During the watermark embedding process, we
first modify the level values of those MBs thatcorrespond to negative watermark bits. Sincenegative watermark bits are to be embedded, thecodeword length of a modulated level value will beshorter than that of an original level value, asindicated in Eq. (9). We will calculate the numberof bits, B, that have been saved after the negativewatermark bits are embedded. Then, the increasein the bit-rate that results from embedding positivewatermark bits must be kept smaller than or equalto B. To this end, we propose to embed thepositive watermark bits one by one until theincreased bit-rate satisfies the above constraint. Ifthere still are positive watermark bits that have notbeen embedded, we give up on embedding them inorder to not increase the bit-rate. Thus, robustnesswill be affected. However, our extensive resultshave shown that the watermark detection resultsobtained from the stego/attacked video sequencescan be separated very well from those obtainedfrom un-watermarked video sequences. Throughthis embedding strategy, the bit-rate (BRstego)of a stego video can be guaranteed to be equalto or smaller than that (BRcover) of its correspond-ing cover video, i.e., BRstegopBRcover. In fact,the difference between BRstego and BRcover isnegligibly small, as indicated by our experimentalresults.
2.3.1. Video watermark extraction
The watermark signal extraction process is fastand simple, and is basically an inverse process ofembedding. Because watermark extraction is con-ducted in the VLC domain, inverse quantization,
inverse DCT, and decoding+re-encoding are notperformed on the incoming video bitstreams.On the contrary, only inverse entropy coding isrequired to obtain VLC codewords. Numericalresults will be given later to show that real-timedetection can be achieved.In the watermark extraction process, the first
step calculates the block-based mean level se-quence, Us, which is then de-shuffled based on thesecret key K before watermark detection. Next, itsmean filtered version, U
s, as described in Section
2.2.1, of a suspect video is calculated. Since videoattacks may have been imposed on a stego video,the resultant sequence, U
s, will be not the same
as its corresponding original, U. However, asexplained previously, what we can rely on is theinvariance of the intrinsic content (i.e., U) ofa video. Strictly speaking, this assumption (i.e.,U ¼ U
s) cannot be guaranteed and is only correct
to some extent. However, our results indicate thatthis assumption is reasonable. After the twosequences Us and U
sare obtained, each element
weðiÞ of an extracted watermark sequence We isdetermined as follows:
weðiÞ ¼ sgnðusðiÞ � usðiÞÞ � sgnðusðiÞ � uðiÞÞ ¼ wðiÞ,
(10)
where usðiÞ � uðiÞ is simply assumed to realizeblind watermark estimation. In practice, usðiÞ canbe further formulated as the result of imposing afading effect plus a noise component on uðiÞ. Then,the subsequent task is to solve the parameters offading and noising, as previously discussed in [23]and other works. Furthermore, it should be notedthat both sequences Us and U
scan only be
obtained if the secret key K used for shuffling isknown. Under this circumstance, pirates cannoteasily estimate the watermark bits based onEq. (10).In order to determine the presence/absence of a
hidden watermark, the normalized correlationvalue between W and We is computed as follows:
dnc ¼W WeffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffijWkWej
p ¼W We
N, (11)
where ‘‘’’ is an inner product operation. Therelationship between dncð; Þ and bit error rate
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642632
(BER) can be expressed as dncð; Þ ¼ 1� 2� BER.It is said that a watermark has been detected ifdncð; Þ is larger than a pre-determined thresholdT. In this study, T was selected as 0:11 if thedesired false positive probability was approxi-mately 10�7 [4].
3. Video frame-dependent watermark
In this section, we will describe the proposedVFDW that is embedded in our system. In Section3.1, we shall first discuss the characteristicsof watermark estimation attacks (WEAs). Then,the proposed video frame hash and VFDW willbe described in Section 3.2. Finally, the propertiesof the VFDW will be studied and its resistanceWEAs will be analyzed in Sections 3.3 and3.4, respectively. Note that in order to betterexplain the resistance of VFDW to WEAs,our analyses will be conducted in the spatialdomain. This is reasonable because a signalembedded in the transformed domain can betransferred to another equivalent signal in thespatial domain and watermark estimation bymeans of denoising [11,24] is intuitively appliedin the spatial domain. As shown in Table 1, theresistance of the VFDW to WEAs is the novelcontribution of this paper.
Fig. 2. Watermark estimation/removal illustrated with energy varia
indicating the energy of each watermark value; (b) gray bars show the
same as in the original (a); (c) the residual watermark obtained after
estimated watermark with most of the signs being opposite to those i
examples, sufficiently large correlations between (a) and (c), and betw
3.1. Watermark estimation attack
From an attacker’s perspective, the energy ofeach watermark bit must be accurately predictedso that the previously added watermark energy canbe completely subtracted to accomplish effectivewatermark removal. An estimated watermark’senergy is closely related to the accuracy of theremoval attack. Several scenarios are shown inFig. 2, which illustrates the energy variations of (a)an original watermark; (b) and (d) an estimatedwatermark (illustrated in gray-scale); and (c) and(e) a residual watermark generated by subtractingthe estimated watermark from the original water-mark. From Figs. 2(a)–(c), we can see that eventhough watermark’s sign bits are fully obtained,the corresponding energies cannot be completelydiscarded, and the residual watermark still sufficesto reveal the encoded message. Furthermore, if thesign of an estimated watermark bit is differentfrom its original sign (i.e., sgnðW ðiÞÞasgnðW eðiÞÞ),then any additional energy subtraction will not behelpful in improving removal efficiency. On thecontrary, watermark removal in terms of energysubtraction operated in the opposite (wrong)polarity will undesirably damage the media data’sfidelity. Actually, this corresponds to adding awatermark with higher energy into cover datawithout satisfying the masking constraint, as
tions: (a) original embedded watermark with each white bar
energies of an estimated watermark with all the signs being the
removing the estimated watermark (b); (d) the energies of an
n (a); (e) the residual watermark derived from (d). In the above
een (a) and (e) exist, indicating the presence of a watermark.
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 633
shown in Fig. 2(d). After subtracting Fig. 2(d)from (a), the resultant residual watermark is thatillustrated in Fig. 2(e). By correlating Figs. 2(a)and (e), it is highly possible to reveal the existenceof a watermark. Unlike other watermark removalattacks that reduce the quality of the media data,the collusion attack may improve the quality ofcolluded data. In view of this fact, it is necessary toconsider the collusion attack when the robustnessof a video watermarking system is evaluated.
3.2. Frame hash and VFDW
In Section 3.1, we found that WEAs areachievable mainly because the hidden watermarkbehaves like a noise, so anyone can reliably utilizeall estimated noise-like watermarks. To disguisethis prior knowledge and hide it from attackers,the key is to reduce the confidence of watermarkestimation achieved by WEAs. To this end, wepropose the VFDW, which must carry informa-tion relevant to the video frame itself. Meanwhile,the content-dependent information (called theframe hash herein) must be secured by means ofthe same secret key K in order for anti-forgery andmust be robust against digital processing [16] inorder to not affect watermark detection.Here, the proposed video frame hash extraction
procedure is operated in the VLC domain. Foreach MB, a piece of representative, robustinformation is created. It is defined in each MB i
as the magnitude relationship between two ener-gies computed from level values:
hðiÞ ¼þ1 if
Pjjf jðp1ÞjX
Pjjf jðp2Þj;
�1 otherwise;
(
where hðiÞ is an element of a frame hash FH, j isthe index that indicates a block belonging to a MBi, and f jðp1Þ and f jðp2Þ denote level values at zig-zaged positions p1 and p2 in a block j, respectively.The length of an FH is exactly equal to the numberof MBs. In addition, the selected level valuesshould be at lower frequencies because level-runpairs located at high-frequency positions arevulnerable to attacks. We say that this featurevalue hðÞ is robust because this magnituderelationship can be mostly preserved under in-
cidental modifications. Since the robustness of theframe hash is beyond the scope of this paper, thereader may refer to [8,17] for similar robustnessanalyses.Next, the frame hash, FH, is merged with the
watermark, W, to generate the VFDW as
VFDW ¼ SðW;FHÞ, (12)
where Sð; Þ is a mixing function, which is operatedbased on a secret key (which will be described inthe next section) and is used to prevent attackersfrom forging the VFDW. The sequence VFDW iswhat we will embed into a video frame.
3.3. Properties of the VFDW
Let a video V be expressed as [i2OFi, where allframes Fi are concatenated to form V and Odenotes the set of frame indices. Let VFDWi be theframe-dependent watermark that is embedded inthe frame Fi to form a stego video Vs, i.e.,
Fsi ¼ Fi þ VFDWi; Vs ¼ [i2OF
si ,
where Fsi is a stego frame and VFDWi, similar to
Eq. (12), is defined as
VFDWi ¼ SðW;FHFiÞ. (13)
In Eq. (13), the mixing function Sð; Þ is designedfor shuffling the frame hash FHFi
using the samesecret key K, followed by shuffling of the water-mark to enhance security for anti-forgery. Speci-fically, it is expressed as
SðW;FHFiÞðkÞ ¼ W ðkÞPTðFHFi
;KÞðkÞ,
where PT denotes a shuffling function controlledby a secret key K.The proposed VFDW possesses the character-
istics described as below. They are useful forproving resistance to WEAs.
Definition 1. Given two frames Fi and Fj , theirdegree of similarity depends on the correlationbetween FHFi
and FHFj, i.e., dncðFi;FjÞ �
dncðFHFi;FHFj
Þ. Two extreme cases exist: (i) ifFi ¼ Fj, then dncðFi;FjÞ ¼ 1; (ii) if Fi and Fj arevisually dissimilar, then dncðFi;FjÞ � 0.
Definition 2. Given two frames Fi and Fj,dncðFi;FjÞ, and their, respectively, embedded video
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642634
frame-dependent watermarks VFDWi and VFDWj
that are assumed to be independent and identicallydistributed (i.i.d.), the following properties can beestablished: (i) dncðVFDWi;VFDWjÞ is linearlyproportional to dncðFi;FjÞ; (ii) dncðVFDWi;VFDWjÞpdncðW2Þ; (iii) dncðW;VFDWÞ ¼ 0. Adefinition similar to this has been given in [16]for images. It is essential to emphasize thatproperty (i) of Definition 2 contrasts with theone pointed out in [22], and that the novelty of ourscheme is that the concept of the content-dependent watermark is employed.
3.4. Resistance to WEA
By means of a collusion attack, the averagingoperation is performed on stego frames Fs
i ’s of astego video. From an attacker’s perspective, eachhidden watermark has to be estimated using adenoising operation [11,24], so deviation in esti-mation will inevitably occur. Let We
i be a water-mark denoised from Fs
i . In fact, Wei can be
modeled as a partial hidden watermark plus anoise component, i.e.,
Wei ¼ aiVFDWi þ ni,
where ni represents a frame-dependent Gaussiannoise with zero mean and ai denotes the propor-tion that the watermark has been extracted. Underthese circumstances, 1Xai ¼ dncðWe
i ;VFDWiÞ4T
always holds based on the fact that a watermark isa high-frequency signal, which can be efficientlyextracted by means of denoising [9,11,24]. Let Cð�OÞ denote the set of frames used for collusion.After frame collusion is performed, the average ofall the estimated watermarks by employing theCentral Limit Theorem can be expressed as
We¼
ffiffiffiffiffiffiffijCj
p
jCj
Xi2C
Wei ¼
1ffiffiffiffiffiffiffijCj
pXi2C
ðaiVFDWi þ niÞ.
(14)
Now, a sufficient and necessary condition forresisting a collusion attack will be given inProposition 1.
Proposition 1. In a collusion attack, an attacker
first estimates We
from a set C of stego frames.
Then, a counterfeit unwatermarked video Vu is
generated from a stego video Vs ¼ [i2OFsi according
to
Fui ¼ Fs
i � We; Vu ¼ [i2OF
ui . (15)
It is said that the collusion attack fails withina frame Fu
k, k 2 C, i.e., dncðFuk;VFDWkÞ4T , if
and only if dncðWe;VFDWkÞ ¼
Pk2Cak=
ffiffiffiffiffiffiffijCj
p
o1� T .
Proof. First of all, we need to derivedncðW
e;VFDWkÞ. Making use of Eq. (14), we have
the following derivation:
dncðWe;VFDWkÞ
¼1ffiffiffiffiffiffiffijCj
p dncXi2C
ðaiVFDWi þ niÞ;VFDWk
!
¼1ffiffiffiffiffiffiffijCj
pXi2C
aidncðVFDWi;VFDWkÞ
þ1ffiffiffiffiffiffiffijCj
pXi2C
dncðni;VFDWkÞ
¼
Pk2CakffiffiffiffiffiffiffijCj
p , ð16Þ
where VFDWk represents the content-dependentwatermark embedded in Fk. Consequently, givenDefinition 2, and Eqs. (15) and (16), we get:
dncðFuk;VFDWkÞ4T
iff dncðFk þ VFDWk � We;VFDWkÞ4T ,
iff dncðVFDWk;VFDWkÞ
� dncðWe;VFDWkÞ4T ,
iff dncðWe;VFDWkÞ ¼
Pk2CakffiffiffiffiffiffiffijCj
p o1� T : &
ð17Þ
Examining the derived result in Proposition 1,we can find from the numerator of
Pk2Cak=
ffiffiffiffiffiffiffijCj
p
that the summation function exists because videocollusion is generally conducted using a set ofsimilar video frames such that the watermarksextracted from a pair of similar frames can possessa certain positive correlation. However, thischaracteristic is not guaranteed to hold in images[16]. This difference leads to the fact that videocollusion is relatively easier to accomplish than
ARTICLE IN PRESS
Table 3
Results for bit-rate control: the ratios of the bit-rate decrease
obtained from three video sequences are within the order of
½10�4 10�3�
Video BRcover (bytes) BRstego (bytes) BRcover�BRstegoBRcover
Flower-Garden 28,120,582 28,109,649 3:89e�4
Table-Tennis 28,118,904 28,089,882 1:03e�3
Football 7,276,845 7,271,528 7:31e�4
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 635
image collusion.2 Our experimental results, as willbe shown in Section 4, further confirm this fact.Furthermore, resistance to the copy attack can besimilarly derived. Please refer to [16] for moredetails by treating an image as if it were a videoframe.
4. Experimental results
A series of experiments was conducted to verifythe performance of the proposed method. Threecommonly used videos, ‘‘Flower-Garden,’’ ‘‘Table-Tennis,’’ and ‘‘Football,’’ were adopted in ourexperiments. The number of frames was 375 forboth ‘‘Flower-Garden’’ and ‘‘Table-Tennis’’ and 97for ‘‘Football.’’ The frame size of both ‘‘Flower-Garden’’ and ‘‘Table-Tennis’’ was 704� 576, and itwas 720� 486 for ‘‘Football.’’ The MPEG-2 codec[19] was used to generate compressed videosequences as the cover sources. The bit-rate of eachsource video was fixed at 15Mbits/s, and the frame-rate was 25 frames/s. In addition, the length ofa GOP was 12, and the GOP structure was‘‘IBBPBBPBBPBB.’’ Because watermarks wereconcealed in I-frames in this study, we will presentthe watermark detection results obtained fromI-frames only even though watermarks also couldbe found in non-I-frames (this will be shown to betrue by using I-frame dropping and transcodingattacks). The performance in terms of bit-ratecontrol, fidelity, resistance to numerous incidentaland malicious attacks, and real-time detection wereexamined in our experiments.
4.1. Bit-rate control
In order to show the bit-rate control perfor-mance of our method, the bit-rates, BRcover and
2More specifically, we gain an interesting result from the
collusion-resilient image watermarking [16] and collusion-
resilient video watermarking (this paper) methods. In [16], if a
collusion attack is carried out in an image watermarking
scheme with embedding of multiple redundant watermarks, the
anti-collusion performance is proved to be lower bounded by
jCj ¼ 1. However, if a collusion attack is carried out in a video
watermarking scheme, the anti-collusion performance is almost
the same no matter what the size of jCj is.
BRstego, generated from MPEG-2 cover video andstego video, respectively, are compared in Table 3.In addition, the ratio, ðBRcover � BRstegoÞ=BRcover,of bit-rate reduction achieved by our watermark-ing method is also given. It can be observed fromTable 3 that the bit-rate reduction ratios for thesevideo sequences are all sufficiently small. Numeri-cally, they are on the order of 10�4�10�3.
4.2. Fidelity of stego video sequences
In order to determine the impact of watermarkembedding on the quality of stego video se-quences, we compared two PNSR curves thatwere generated from (i) raw video vs. cover(compressed) video and (ii) raw video vs. stego(compressed+embedded) video. Fig. 3 plots thePSNR curves, where the PSNR values weremeasured in all frames of the three videos. InFig. 3 (a), the PSNR values of the first 200 framesvary smoothly while the PSNR values of theremaining frames vary significantly. The reasonfor these results was that the ‘‘Flower-Garden’’video had no significant motion at the beginningand started to have significant motion after the200th frame. Statistically, an average PSNRdecrease of 3:13 dB was generated. Perceptually,no visual degradations could be sensed when thestego video was played normally. Figs. 3(b) and (c)show similar results that were yielded from thevideo sequences ‘‘Table Tennis’’ and ‘‘Football.’’The average PSNR decrease, as shown in Fig. 3(c),was less than 1 dB. The reason for these results wasthat some watermark bits were given up to not beembedded in order to enable video re-encodingand decoding to be normally performed.
ARTICLE IN PRESS
50 100 150 200 250 300 350
34
36
38
40
42
44
Frames
PS
NR
(dB
)
Cover (compressed) Flower − Garden videoStego (compressed+watermarking) Flower − Garden video
50 100 150 200 250 300 350
32
34
36
38
40
42
44
Frames
PS
NR
(dB
)
Cover (compressed) Table − Tennis videoStego (compressed+watermarking) Table − Tennis video
10 20 30 40 50 60 70 80 90
34
36
38
40
42
44
Frames
PS
NR
(dB
)
Cover (compressed) Football videoStego (compressed+watermarking) Football video
(a)
(b)
(c)
Fig. 3. The PSNR values measured in different frames of three
videos: (a) Flower-Garden; (b) Table-Tennis; and (c) Football.
The PSNR decrease (in dB) for each video is indicated
statistically in terms of the mean/variance: (a) 3:13/1:35;(b) 2:32/1:26; and (c) 1:11/0:69.
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642636
In Fig. 3, it is not difficult to locate the regionswhere motion was significant. One thing to note isthat the lowest PSNR value was always obtainedin an I-frame due to embedding. In addition, thePSNR decrease observed in the B- and P-frameswas due to: (i) the referred I-frames wereembedded and (ii) drift compensation was notadopted to compensate for the effect of (i) in theproposed watermarking method. Although theoverall fidelity of the stego video sequences wasmore or less degraded, the computational com-plexity of applying drift compensation was saved.We will show in the following that robustness wasnot sacrificed even we did not employ driftcompensation.
4.3. Resistance to incidental video attacks
To test robustness against different attacks, weused several attacks, including MPEG-2 re-encod-ing with lower bit-rates (i.e., changing the originalbit-rate from 15M bps to 6M, 4M, and 2M bps,respectively), noise addition (the PSNR between anoiseless frame and its noisy version was fixed at27:05 dB), sharpening, frame averaging, and framerate changing (changed from 25 to 24 frames/s) toverify the performance of our video watermarkingalgorithm. Since the tested attacks may be appliedin normal applications, they are called incidentalattacks here. Fig. 4 shows an example of thewatermark detection results obtained when Flow-er-Garden was used as the cover video. In thisfigure, the horizontal axis indicates the I-framenumber, and the vertical axis indicates thecorrelation value. The correlation values detectedin the I-frames of the attacked Flower-Gardenvideo and detected in un-watermarked videoI-frames are also provided for the purpose ofcomparison. In addition, Fig. 4(a) shows thedetection results obtained using the proposedmethod but without using the VFDW, whileFig. 4(b) shows the results obtained using theproposed method with embedding of the VFDW.In both Figs. 4(a) and (b), it is easy to distinguishthe attacked watermarked videos from theun-watermarked videos. Furthermore, Fig. 4(b)shows lower correlation values (detected inattacked stego videos) than Fig. 4(a) does. The
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 637
main reason for these results may have beeninsufficient randomness of the VFDW (referring tothe numbers of 1’s and �1’s), generated from thecontents of video frames (as indicated in Eq. (12)).In particular, when a video frame showed domi-
Table 4
Transcoding parameters
GOP’s structure IBBBBBBBBBBBB I
GOP’s length 13 1
-0.2
0
0.2
0.4
0.6
0.8
Nor
mal
ized
cor
rela
tion
-0.2
0
0.2
0.4
0.6
0.8
Nor
mal
ized
cor
rela
tion
Flower − Garden (watermarked)Flower − Garden (frame − rate changing)Flower − Garden (MPEG: 6 Mbit/s )Flower − Garden (MPEG: 4 Mbit/s)Flower − Garden (MPEG: 2 Mbit/s)Flower − Garden (frame averaging)Flower − Garden (sharpening)Flower − Garden (noise adding)Susi (unwatermarked)Mobile (unwatermarked)Table − Tennis (unwatermarked)
5 10 15 20 25 30I Frames
Flower − Garden (watermarked)Flower − Garden (frame − rate changing)Flower − Garden (MPEG: 6 Mbit/s )Flower − Garden (MPEG: 4 Mbit/s)Flower − Garden (MPEG: 2 Mbit/s)Flower − Garden (frame averaging)Flower − Garden (sharpening)Flower − Garden (noise adding)Susi (unwatermarked)Mobile (unwatermarked)Table − Tennis (unwatermarked)
5 10 15 20 25 30I Frames(a)
(b)
Fig. 4. The correlation values detected from the attacked
Flower-Garden video sequences and un-watermarked videos:
(a) shows the results obtained using our method but without
using the VFDW, while (b) shows the results obtained using our
method by embedding the VFDW. The dashdot line indicates
the threshold T ¼ 0:11.
nant features in either the horizontal or the verticaldirection, the resultant video hash was biased.Overall, based on the achieved robustness, ourassumption of blind detection accomplished bypreserving the intrinsic content of a video, asdescribed in Section 2.3.1, is acceptable. Inaddition, as mentioned previously, the lack ofdrift compensation did not apparently harmrobustness.In addition to the above attacks, transcoding
is also popularly applied to different applicationsfor video transmission over the network. In thefollowing test, transcoding was used to change theGOP structure of stego video sequences. Recallthat the original GOP was ‘‘IBBPBBPBBPBB’’with a length of 12. Three different GOPs, asshown in Table 4, were used in the robustness test.In particular, two of the GOPs had lengths 13 and19, respectively, such that each of them and 12(the length of the original GOP) got the greatestcommon denominator (GCD) 1. Our intentionwas to (i) make the original I-frames inter-coded inthe attacked video; (ii) create new I-frames in theattacked video that were originally inter-coded inthe stego video. Fig. 5 shows an example of thechanges of the frame types after transcoding. Sincethe watermarks were embedded into the I-frames,we wanted to evaluate whether the watermarkscould still be detected in the new I-frames nomatter whether they were B- or P-frames in thestego/un-attacked video. More specifically, weevaluated whether the watermarks, propagatedfrom I-frames to B- and P-frames throughtranscoding, could still be detected. The correla-tion values detected from the ‘‘I-frames’’ of thethree transcoded video sequences are shown inFig. 6. It can be observed that (i) the correlationvalues detected from frames with smaller motionswere clearly exceeded the threshold T and (ii)watermarks were harder to detect in frames withlarger motions.
BBPBBPBBPBBPBB IPPPPPPPPPPPPPPPPPP
5 19
ARTICLE IN PRESS
Fig. 5. Change of the GOP: (a) a video encoded with an
original GOP; (b) a video encoded with a new GOP under
transcoding. Light gray shading indicates the frame types that
are changed from P to B, while dark gray shading indicates the
frame types that are changed from I to non-I or from non-I to I.
5 10 15 20 25 30 -0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
I Frames
Nor
mal
ized
cor
rela
tion
Table − Tennis with transcoding (|GOP|=13, IBBB...B)Flower − Garden with transcoding (|GOP|=13, IBBB...B)
5 10 15 20 25 -0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
I Frames
Nor
mal
ized
cor
rela
tion
Table − Tennis with transcoding (|GOP|=15, IBBPBBP...)Flower − Garden with transcoding (|GOP|=15, IBBPBBP...)
2 4 6 8 10 12 14 16 18 20 -0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
I Frames
Nor
mal
ized
cor
rela
tion
Table − Tennis with transcoding (|GOP|=19, IPPP...P)Flower − Garden with transcoding (|GOP|=19, IPPP...P)
(a)
(b)
(c)
Fig. 6. The correlation values detected from transcoded videos
using the GOP parameters described in Table 4. The dashdot
line indicates the threshold T ¼ 0:11: (a) GOP Type I; (b) GOPType II; and (c) GOP Type III.
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642638
4.4. Resistance to malicious video attacks
Here, malicious video attacks refer to those ofattackers who may have knowledge about thewatermarking algorithm and purposely apply theknown knowledge to remove/destroy the hiddenwatermarks. In the following experiments, we willconsider I-frame dropping and WEAs. I-framedropping was taken into consideration becausewe assumed that the attackers knew that wehad concealed watermarks in I-frames, and thatthe hidden watermarks would be propagated intoB- and P-frames once re-encoding was performed.In addition, watermark estimation attacks, asdescribed and analyzed in Section 3.1, are alsoconsidered to be malicious.
4.4.1. Resistance to I-frame dropping
We started by decoding the watermarked videointo a still image sequence, and then those videoframes (in the spatial domain) corresponding tothe I-frames (in the compressed domain) wereremoved. After the I-frames had been removed,the remaining frames were re-encoded to yield anew compressed video. Fig. 7 shows three sets ofexperimental results following the I-frame drop-ping attack. The horizontal axis denotes the newI-frame numbers of the re-encoded videos, andthe vertical axis denotes the correlation valuesdetected in the renewed I-frames. We can see that
ARTICLE IN PRESS
5 10 15 20 25 -0.2
0
0.2
0.4
0.6
0.8
Renewed I−frame
5 10 15 20 25
Renewed I−frame
5 10 15 20 25Renewed I−frame
Nor
mal
ized
cor
rela
tion
-0.2
0
0.2
0.4
0.6
0.8
Nor
mal
ized
cor
rela
tion
-0.2
0
0.2
0.4
0.6
0.8
Nor
mal
ized
cor
rela
tion
Flower − Garden (I− frame dropping)
Table − Tennis (I − frame dropping)
Football (I − frame dropping)
(a)
(b)
(c)
Fig. 7. The correlation values detected after the I-frame
dropping attack was applied to the (a) Flower-Garden,
(b) Table-Tennis, and (c) Football, respectively. The dashdot
line indicates the threshold T ¼ 0:11.
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 639
some detected correlation values were quite high,which means that the residual watermark (propa-gated from the original I-frames) could still bedetected in the newly assigned I-frames. However,some residual watermarks were almost destroyed.Serious damage to the residual watermark usuallyhappened in those video frames that had signifi-cant motion.In what follows, we will discuss why our method
is vulnerable to significant motion. Suppose awatermarked I-frame is at frame t; then framet þ 1 is definitely a B- or P-frame that will refer tothe preceding I-frame (frame t). Let MBi;t andMBj;t be the ith and the jth MB of frame t,respectively. According to our algorithm, thewatermark bit wðiÞ will be embedded in MBi;t,and wðjÞ will be embedded in MBj;t. If MBj;tþ1
must refer to MBi;t during motion estimation, thenthe content of MBj;tþ1 is expected to be similar tothat of MBi;t. The above-mentioned referencingmechanism implies that the watermark bit wðiÞ ofMBi;t of frame t will be propagated to MBj;tþ1 offrame t þ 1. Under these circumstances, the water-mark bit detected from MBj;tþ1 is wðiÞ, not wðjÞ.However, ideally, the expected watermark bitshould be wðjÞ. Therefore, it is clear that whenthere is any significant motion in a video and theI-frame dropping attack is applied, it is difficult tocorrectly detect the watermark in the newlyassigned I-frames. This explains why the footballvideo produced the worst results (as indicated inFig. 7(c)).
4.4.2. Resistance to watermark estimation attacks
To study the resistance to WEAs, the proposedvideo watermarking scheme embedded with avideo frame-independent watermark was usedand, denoted as Method I. The combination ofthe proposed VFDW and Method I was denotedas Method II. We wanted to verify the advantageof using VFDW by comparing the performance ofMethods I and II when WEAs were imposed.
4.4.2.1. VFDW resistance to the collusion attack.
The collusion attack was applied to Method I(without using the VFDW) and Method II (usingthe VFDW), respectively. The affects of thecollusion attack and VFDW were examined from
ARTICLE IN PRESS
5 10 15 20 25 30 -0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
I Frames
Nor
mal
ized
cor
rela
tion
Flower − Garden without VFDW/without collusionFlower − Garden with VFDW/without collusionFlower − Garden with VFDW/with Type − I collusion (|C|=16)Flower − Garden with VFDW/with Type − I collusion (|C|=1)Flower − Garden without VFDW/with Type − I collusion (|C|=1)Flower − Garden without VFDW/with Type − I collusion (|C|=16)
Fig. 9. Watermark detection under collusion (the dashdot line
indicates the threshold T ¼ 0:11). Comparing the detectioncurves obtained using different jCj’s reveals that the anti-
collusion capability of our watermarking method is not affected
by the size of a collusion set (see the third and fourth curves). If
VFDW is not used, collusion indeed provides effective water-
mark removal (see the last two curves). These results are exactly
consistent with Proposition 1.
50 100 150 200 250 300 350Frames
50 100 150 200 250 300 350Frames
without VFDW/without collusionwithout VFDW/with collusion
34
36
38
40
42
44
46
48
PS
NR
(dB
)
34
36
38
40
42
44
46
48
PS
NR
(dB
)
with VFDW/without collusionwith VFDW/with collusion
(a)
(b)
Fig. 8. Quality of a colluded Flower-Garden video: (a) the
PSNR values of the colluded frames (top) are higher than those
of the stego frames; (b) when VFDW was applied, the PSNR
values of the colluded frames (bottom) became lower than those
of the stego frames. This experiment reveals that a collusion
attack will fail to improve the fidelity of a colluded video when
VFDW is applied.
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642640
two viewpoints: (s1) the quality of a colludedvideo; and (s2) watermark detection after perform-ing collusion. Typical results obtained from theFlower-Garden video are depicted in Figs. 8 and 9,respectively.As for (s1), it can be found in Fig. 8(a) that
collusion improved the quality of the colluded videoframes in terms of MSE. However, the VFDWcould force collusion to undesirably degrade thequality of the colluded video frames, as shown inFig. 8(b). This experiment demonstrated that the
VFDW was efficient in preventing a collusionattack from achieving perfect cover video recovery.As for (s2), the watermark detection results for
colluded video frames are shown in Fig. 9. We cansee from the first two curves of Fig. 9 that whenvideo collusion was absent, the detection valuesobtained from Method II were slightly smallerthan those obtained from Method I. On the otherhand, the last two curves in Fig. 9 show that whenVFDW was not employed (i.e., using Method I),all the watermarks could not be extracted fromcolluded frames. In addition, once the VFDW wasemployed in embedding (i.e., using Method II),watermarks could be detected (the third andfourth curves of Fig. 9) from all the I-frames nomatter what the size of the collusion set C is. Itshould be noted that these two curves are veryclose to each other, which coincide with our resultderived in Proposition 1. This experiment verifiedthe resistance of the VFDW to a collusion attack.In summary, as long as a frame hash is used to
construct a watermark, even when a collusionattack is applied, watermarks still can be extractedby owners, and the fidelity of colluded videoscannot be improved. As a result, the merits ofVFDW in resisting collusion have been confirmed.
ARTICLE IN PRESS
1 2 3 40
50
100
150
200
250
300
350
400
450
500
Tim
e[se
c.]
decoding+
re − encoding
(Flower − Garden)
decoding+
re − encoding
(Table − Tennis)
watermark detection+
decoding
decoding
Fig. 10. Comparison of the time consumed by (1) Flower-
Garden video decoding+re-encoding; (2) Table-Tennis video
decoding+re-encoding; (3) video decoding+our watermark
detection (not optimized for speed), and (4) video decoding,
respectively. Note that the amount of time needed to re-encode
Flower-Garden and Table-Tennis were different. However, the
amount of time used in watermark detection+decoding and
decoding for both Flower-Garden and Table-Tennis were
almost the same.
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642 641
4.4.2.2. VFDW resistance to the copy attack. Thecopy attack was applied on Method I and MethodII, respectively. When the copy attack was per-formed, one of the videos was first watermarked,and then the watermark was estimated and copied tothe other unwatermarked videos to form counterfeitstego videos. By repeating the above procedure, weobtained six counterfeit stego videos in total. ThePSNR values (stego video vs. stego+copy attackedvideo) of the attacked video frames were in the rangeof 36�55 dB (no masking was used). The normalizedcorrelations obtained by applying the copy attack toMethod I fell within the interval [0.487 0.650] (allwere sufficiently larger than T ¼ 0:11), whichindicated the presence of watermarks. However,when VFDW was introduced, these correlationsdecreased significantly to the interval ½�0:056 0:060�,which indicated the absence of watermarks. Theexperimental results are consistent with the analyticresult indicating that the proposed VFDW is able todeter the detection of copied watermarks.
4.5. Real-time detection
As for the real-time detection requirement, thetime consumed on the video decoder/watermarkdetector side in three different situations, including(1) video decoding and re-encoding, (2) videodecoding and watermark detection, and (3) videodecoding, was compared and is shown in Fig. 10.Our watermarking system was run on a PC with aPentium-4 2.5GHz CPU under Windows 2000.The two left bars in Fig. 10 show the time neededto finish video decoding plus re-encoding for‘‘Flower-Garden’’ and ‘‘Table-Tennis,’’ respec-tively. The difference in the time cost is mainlydue to the different contents of the videosequences; therefore the amount of time consumedin motion estimation was different. However, thedecoding time required for both video sequenceswas almost the same. On the other hand, theaverage watermark detection time was about0.117 s/frame. It is obvious that our method (asshown by the third bar in Fig. 10) used much lesstime than video decoding+re-encoding. Compar-ing the amount of time used for pure decoding (thebar on the right side of Fig. 10), our method (videodecoding+watermark detection) needed nearly
the same amount of time. From the comparedresults, it is reasonable to conclude that ourwatermark detection scheme can almost be exe-cuted in real-time. In addition, uncompresseddomain video watermarking is not feasible forreal-time application because the time spent in thedecoding and re-encoding process is very long.
5. Conclusion
A digital video watermarking system needs todeal with several critical issues that are peculiar tovideo sequences. In this paper, we have presented anew video watermarking scheme that takes theseissues into consideration. These include com-pressed domain watermarking, real-time detection,bit-rate control, and resistance to video incidentaland malicious attacks. In particular, we haveprovided a watermarking method that can beperformed in the variable length codeword (VLC)domain. We have also designed a video frame-dependent watermark (VFDW), which is able toresist watermark estimation attacks (WEAs) thathave been largely ignored in the literature.
ARTICLE IN PRESS
C.-S. Lu et al. / Signal Processing: Image Communication 20 (2005) 624–642642
Resistance to WEAs is indeed indispensablebecause they are efficient in defeating a videowatermark system while maintaining the visualquality of attacked video sequences. We haveconducted extensive experiments to verify theperformance of the proposed method.In this paper, we have not considered resistance
to geometrical distortions. As pointed out pre-viously, the embedding of synchronization/repeti-tion patterns is a common way used to tolerategeometrical attacks (only efficient to certainextent). However, the embedded synchronizationpatterns are easy to remove using the collusionattack. Therefore, we do not think it is sufficientto employ the similar idea to deal with the problemof geometrical distortions. On the contrary, wepropose to exploit a mesh [18] as the basicembedding unit, and to combine it with its mesh-based hash [8,17] to resist both the geometricattacks and estimation attacks [16].
Acknowledgements
This paper was supported under NSC Grants91-2213-E-001-037 and 92-2422-H-001-004.
References
[1] A.M. Alattar, E.T. Lin, M.U. Celik, Digital watermarking
of low bit-rate advanced simple profile MPEG-4 com-
pressed video, IEEE Trans. Circuits Syst. Video Technol.
13 (8) (2003) 787–800.
[2] S. Arena, M. Caramma, R. Lancini, Digital watermarking
applied to MPEG-2 coded video sequences exploiting
space and frequency masking, Proceedings of the IEEE
International Conference on Image Processing, 2000.
[3] S. Baudry, P. Nguyen, H. Maita, Channel coding in video
watermarking: use of soft decoding to improve the
watermark retrieval, Proceedings of the IEEE Interna-
tional Conference on Image Processing, 2000.
[4] I.J. Cox, M.L. Miller, J.A. Bloom, Digital Watermarking,
Morgan Kaufmann Publishers, Los Altos, CA, 2002.
[5] J. Dittmann, M. Stabenau, R. Steinmetz, Robust MPEG
video watermarking technologies, Proceedings of the ACM
Multimedia, Bristol, UK, 1998.
[6] F. Hartung, B. Girod, Watermarking of uncompressed and
compressed video, Signal Process. 66 (3) (1998) 283–302.
[7] F. Hartung, M. Kutter, Multimedia watermarking techni-
ques, Proc. IEEE 87 (1999) 1079–1107.
[8] C.Y. Hsu, C.S. Lu, A geometric-resilient image hashing
system and its application scalability, Proceedings of the
ACM Multimedia and Security Workshop, Magdeburg,
Germany, 2004, pp. 81–92.
[9] T. Kalker, G. Depovere, J. Haitsma, M. Maes, A video
watermarking system for broadcast monitoring, Proceed-
ings of the SPIE, vol. 3657, 1999, pp. 103–112.
[10] D.R. Kim, S.H. Park, A robust video watermarking
method, Proceedings of the IEEE International Confer-
ence on Multimedia and Expo, 2000.
[11] M. Kutter, S. Voloshynovskiy, A. Herrigel, The water-
mark copy attack, Proceedings of the SPIE: Security and
Watermarking of Multimedia Contents II, vol. 3971, 2000.
[12] G.C. Langelaar, R.L. Lagendijk, Optimal differential
energy watermarking of DCT encoded images and videos,
IEEE Trans. Image Process. 10 (1) (2001) 148–158.
[13] G.C. Langelaar, R.L. Lagendijk, J. Biemond, Real-time
labeling of MPEG-2 compressed video, J. Visual Commun.
Image Represent. 9 (4) (1998) 256–270.
[14] J. Linnartz, J.C. Talstra, MPEG PTY-marks: cheap
detection of embedded copyright data in dvd-video,
ESORICS98, 1998.
[15] C.S. Lu, J.R. Chen, K.C. Fan, Resistance of content-
dependent video watermarking to watermark-estimation
attacks, Proceedings of the IEEE International Conference
on Communications, France, 2004, pp. 1386–1390.
[16] C.S. Lu, C.Y. Hsu, Content-dependent anti-disclosure
image watermark, Proceedings of the International Work-
shop on Digital Watermarking, Lecture Notes in Compu-
ter Science, vol. 2939, Seoul, Korea, 2003, pp. 61–76.
[17] C.S. Lu, C.Y. Hsu, S.W. Sun, P.C. Chang, Robust mesh-
based hashing for copy detection and tracing of images,
Proceedings of the IEEE International Conference on
Multimedia and Expo, Taipei, Taiwan, 2004.
[18] C.S. Lu, S.W. Sun, P.C. Chang, Robust mesh-based
content-dependent image watermarking with resistance to
both geometric attack and watermark-estimation attack,
Proceedings of the SPIE: Security, Steganography, and
Watermarking of Multimedia Contents VII (E1120),
San Jose, CA, USA, 2005.
[19] ftp://ftp.mpegtv.com/pub/mpeg/mssg/mpeg2v12.zip.
[20] F. Petitcolas, R.J. Anderson, M.G. Kuhn, Information
hiding: a survey, Proc. IEEE 87 (1999) 1062–1078.
[21] K. Su, D. Kundur, D. Hatzinakos, Statistical invisibility
for collusion-resistant digital video watermarking, IEEE
Trans. Multimedia 7 (1) (2005) 43–51.
[22] M.D. Swanson, B. Zhu, A.H. Tewfik, Multiresolution
scene-based video watermarking using perceptual models,
IEEE J. Sel. Area Commun. 16 (4) (1998) 540–550.
[23] S. Voloshynovskiy, F. Deguillaume, S. Pereira, T. Pun,
Optimal adaptive diversity watermarking with channel
state estimation, Proceedings of the SPIE: Security and
Watermarking of Multimedia Contents III, vol. 4314,
USA, 2001.
[24] S. Voloshynovskiy, S. Pereira, A. Herrigel, N. Baumgart-
ner, T. Pun, Generalized watermarking attack based
on watermark estimation and perceptual remodulation,
Proceedings of the SPIE: Security and Watermarking of
Multimedia Contents II, vol. 3971, 2000.