[ACM Press the 21st ACM international conference - Barcelona, Spain (2013.10.21-2013.10.25)] Proceedings of the 21st ACM international conference on Multimedia - MM '13 - Motion compensated

Motion Compensated Compressed Domain Watermarking∗

Tanima Dutta
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati

ABSTRACT

Security has become an important issue in multimedia applications. Embedding watermark bits in the compressed domain is less computationally expensive, because full decoding and re-encoding is not required. Motion coherency is an essential property for resisting attacks based on temporal frame averaging. Designing a motion compensated embedding method in the compressed domain is a challenging task; as far as we know, no such embedding method has been explored yet. In this paper, we propose a motion compensated compressed domain embedding method within a short video neighborhood that gives acceptable visual quality, embedding capacity, and robustness. Simulation results show the effectiveness of the proposed method.

Categories and Subject Descriptors

K.4.1 [Computers and Society]: Public Policy Issues; K.5.1 [Legal Aspects of Computing]: Software Protection: Copyrights; Proprietary rights

General Terms

Legal aspects, Security, Performance

Keywords

Video watermarking, compressed domain embedding, motion compensation, temporal frame averaging

1. INTRODUCTION

Security has become an important research topic in many multimedia applications. Video watermarking hides a secret signature in digital videos. Video watermarking methods must be robust: subsequent processing of the watermarked video should not weaken detection of the embedded data. Videos are mostly stored and transmitted in a compressed format, which makes compressed domain watermarking more important. Embedding watermarks in the compressed domain is less computationally expensive, because full decoding and re-encoding is not required for watermark embedding. Many video compression standards have emerged in the last few years, and each aims to provide higher compression with better visual quality. H.264/AVC is widely regarded as one of the most efficient of these standards and is used in a wide range of applications.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s).
MM'13, October 21–25, 2013, Barcelona, Spain.
ACM 978-1-4503-2404-5/13/10.
http://dx.doi.org/10.1145/2502081.2502211

Many watermarking schemes for H.264/AVC (Advanced Video Coding) have emerged in recent years. Noorkami and Mersereau proposed a non-blind watermarking technique [9] that embeds the watermark in I-frames using the Watson visual model, which is computationally expensive. This non-blind technique was extended to P-frames [8] for better visual quality, and the watermark is embedded in all nonzero quantized residuals of the P-frames. Both methods [9, 8] require full decoding and re-encoding of the compressed video for watermark embedding. A location map is used to reduce synchronization errors. Frame drop or alter attacks are not handled in [9, 8].

Mansouri et al. presented a blind compressed domain method that embeds in the I-frames of H.264/AVC videos [7]; it hides the watermark by changing nonzero coefficients to zero. Esen and Alatan [5] designed a data hiding framework that selects the embedding region using Forbidden Zone Data Hiding and exploits the error correction capability of Reed-Solomon codes for robustness. In [3], frame drop or alter attacks are handled using the error correction capability of Reed-Solomon codes.

1.1 Motivation

In [9, 7], the authors embed in I-frames. Embedding in I-frames degrades the visual quality significantly. An embedding error in an I-frame propagates through intra prediction to the remaining uncoded (not yet encoded) blocks of that I-frame. It also propagates through inter prediction to the P-frames and B-frames of that Group of Pictures (GOP). In P-frames and B-frames, the embedding error propagates only to the uncoded P-frames and B-frames of that GOP, so embedding in these frames gives better visual quality than I-frame embedding [8]. An I-frame does have a higher embedding capacity than a P-frame or B-frame. However, in H.264/AVC compressed videos each GOP contains one I-frame and several P-frames or B-frames, so the combined embedding capacity of these frames can be exploited. For higher compression, a GOP usually contains more B-frames than P-frames. However, embedding in B-frames may be vulnerable to frame drop or alter attacks.

Frame drop or alter attacks are quite common nowadays and can be intentional or unintentional. Some B-frames are dropped because of limited channel bandwidth or delay. Attackers may also drop B-frames deliberately to make exact extraction of the watermark difficult. Handling such attacks in the compressed domain is a genuinely complex task. In a watermark estimation attack (WEA), the attacker tries to remove the watermark completely from the watermarked video. The WEA is mainly performed using temporal frame averaging, and motion coherency is a desirable property for resisting it.

Pankajakshan et al. designed a tool that uses features extracted from the frame prediction error to check whether a video sequence contains any motion-incoherent component [10]. Motion coherency has recently been identified as a desirable property for embedding watermarks in videos, because it helps them withstand temporal frame averaging along the motion axis [10]. The impact of changing motion information in a compressed video is difficult to predict, so resisting frame drop or alter attacks and collusion attacks is challenging. As far as we know, no compressed domain embedding method based on B-frames that handles frame drop or alter attacks has been well explored yet. Moreover, none of the existing methods handles the watermark estimation attack [10] in the compressed domain.

1.2 Our Contribution

The aim of this paper is to design a compressed domain embedding method with acceptable visual quality and robustness. The method should also withstand frame drop or alter attacks and watermark estimation attacks in the compressed domain without using error correcting codes (unlike [5, 3]). We propose a compressed domain embedding method that embeds the watermark in a motion compensated temporally averaged low pass frame obtained from the B-frames of each GOP. Motion coherent 4 × 4 blocks are detected, and the trajectories of both moving and static blocks are determined. The low pass frame is extracted from the B-frames of a GOP based on these motion coherent blocks, and the high pass frames are also estimated. The watermark is embedded with spread spectrum techniques in all nonzero quantized coefficients of the low pass frames. The B-frames are then reconstructed from the watermarked low pass frames and the unwatermarked high pass frames. The computational complexity of the proposed method is also discussed. Simulation results show acceptable embedding capacity, visual quality, and robustness against re-compression error.

The rest of the paper is organized as follows. The proposed method is described in Section 2. Simulation results are given in Section 3. Finally, Section 4 concludes the paper.

2. PROPOSED METHOD

The goal of this paper is to design a motion compensated compressed domain watermarking method for low bit-rate compressed videos such as H.264/AVC. Within a short video neighborhood (SVN), most video frames are visually coherent and no scene change is detected. Motion compensated blocks are those that have coherent motion within an SVN. These blocks are detected in the compressed domain, which avoids further decoding and re-encoding of the video sequence. The proposed method is therefore computationally less expensive. To obtain a more compressed low bit-rate video, each GOP is coded using only an I-frame and B-frames.

Motion compensated compressed domain watermarking is performed in the following steps. First, motion compensated blocks are detected (Section 2.1). Next, the low pass frame and the high pass frames of each GOP are extracted based on these blocks (Sections 2.2 and 2.3, respectively). Finally, the embedding and extraction methods are described in Section 2.4, and the watermark similarity measure in Section 2.5.

2.1 Motion Compensated Block Detection

In this section, motion compensated blocks are detected within an SVN. The method for finding them is given in Procedure : Motion Compensated Block Detection. Unlike [4], motion coherent blocks are estimated for B-frames in this paper. The motion vector of each non-overlapping block of a B-frame is normalized so that it points directly to the corresponding location in the immediately preceding frame [6]. H.264/AVC supports multiple reference frames, so this normalization simplifies the referencing relationship. The normalized motion vectors are smoothed using a 3 × 3 median filter. Intra-coded blocks in B-frames have zero motion vectors, so their motion vectors are estimated from neighboring blocks [13]. An accumulation operator [6] is applied to all motion vectors to enhance coherent motion and suppress noisy motion. Accumulated motion vectors whose magnitude was zero before accumulation are removed. Discontinuities in motion magnitude and direction are checked using spatial and temporal confidence measures [12]. Figure 1 and Figure 2 show the motion compensated blocks in the hall monitor and foreman videos, respectively.
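The first three per-block steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper:

- Motion vectors are available as a `(rows, cols, 2)` array of `(dy, dx)` pixel displacements, one per block.
- Normalization divides each vector by its temporal distance to the referenced picture, which assumes linear motion.
- The 3 × 3 median is taken component-wise.
- The intra-block fill, which takes the median of inter-coded 4-neighbours, is a hypothetical simplification of the neighbour-based estimation in [13].

All function names are hypothetical.

```python
import numpy as np


def normalize_mvs(mvs, ref_dist):
    """Scale each block's MV to a one-frame displacement (assumes linear motion).

    mvs:      (rows, cols, 2) array of (dy, dx) per block, in pixels.
    ref_dist: (rows, cols) array of frame distances to each block's
              reference picture (assumed >= 1).
    """
    return np.asarray(mvs, dtype=float) / np.asarray(ref_dist, dtype=float)[..., None]


def median_filter_3x3(mvs):
    """Component-wise 3x3 median over the block motion field, edge-padded."""
    mvs = np.asarray(mvs, dtype=float)
    rows, cols, _ = mvs.shape
    padded = np.pad(mvs, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Stack the nine shifted copies of the field; the median runs over axis 0.
    windows = np.stack([padded[dy:dy + rows, dx:dx + cols]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)


def fill_intra_mvs(mvs, is_intra):
    """Give intra-coded blocks (zero MV) the median MV of their inter-coded
    4-neighbours; a hypothetical stand-in for the estimation in [13]."""
    out = np.asarray(mvs, dtype=float).copy()
    rows, cols, _ = out.shape
    for r, c in zip(*np.nonzero(is_intra)):
        neigh = [out[rr, cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                 if 0 <= rr < rows and 0 <= cc < cols and not is_intra[rr, cc]]
        if neigh:  # leave the block at zero if no inter-coded neighbour exists
            out[r, c] = np.median(np.stack(neigh), axis=0)
    return out
```

The functions are applied in the order of the procedure: `fill_intra_mvs(median_filter_3x3(normalize_mvs(mvs, dist)), is_intra)`. A single outlier vector inside an otherwise uniform 3 × 3 neighbourhood is replaced by the neighbourhood value. This is the noise suppression that motivates median filtering before accumulation.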

Figure 1: Red boxes indicate two motion compensated blocks in (a) the 2nd frame of the 7th GOP and (b) the 13th frame of the 9th GOP of the hall monitor video.

2.2 Extraction of Low Pass Frames

Embedding is done in each low pass frame, which is extracted by motion compensated temporal frame averaging (MC-TFA) [10] over the B-frames of a GOP. MC-TFA uses the accumulated motion vectors of each block in the B-frames, obtained in Procedure : Motion Compensated Block Detection. All intra and inter predicted luminance blocks have an accumulated motion vector. After motion coherent blocks are detected in the B-frames, every block has a motion trajectory, whether its motion vector is zero (static) or nonzero (moving). Based on these trajectories, MC-TFA is performed block-wise over all B-frames of a GOP to extract one low pass frame. Let N be the number of B-frames in a GOP and L the low pass frame extracted by MC-TFA. Following [10], L is obtained as follows:

$$L = \frac{\text{motion compensated temporal summation}(B_1, \ldots, B_N)}{N} \qquad (1)$$


Figure 2: Red box shows a motion compensated block in (a) the 2nd frame, (b) the 5th frame, (c) the 8th frame, and (d) the 13th frame of the 1st GOP in the foreman video.

Procedure : Motion Compensated Block Detection

Data: Unwatermarked video
Result: Motion compensated blocks
foreach short video neighborhood (SVN) do   /* SVNs not yet accessed */
    foreach frame in the SVN do   /* frames not yet accessed */
        Divide B-frames into non-overlapping blocks;
        Calculate motion vectors for all blocks in B-frames;
        foreach block in B-frames do   /* blocks yet to be accessed */
            Normalize each motion vector [6];   /* motion vector normalization */
            Smooth all normalized motion vectors using a 3 × 3 median filter [2];   /* median filtering */
            Estimate motion vectors for intra-coded blocks from neighboring blocks [13];   /* intra-coded blocks */
    Assign motion vectors to I-frames by interpolating motion vectors of the nearest B-frame;   /* I-frame motion vectors */
    foreach block within the SVN do   /* blocks yet to be accessed */
        Apply the accumulation operator [6] to all motion vectors;   /* accumulation process */
        Use spatial and temporal confidence measures [12];   /* confidence measure */
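The accumulation operator of [6] and the confidence measure of [12] are not defined in this paper. The sketch below is a rough, hypothetical illustration of the trajectory idea only, not the operator of [6]. It follows one block backward through the B-frames of a GOP using normalized, smoothed motion fields. At each step it snaps the displaced position to the nearest 4 × 4 block and sums the followed vectors. The block size, the snapping rule, and the function name are all assumptions.

```python
import numpy as np


def trace_trajectory(mv_fields, start, block=4):
    """Follow one block backward through a GOP's B-frames (hypothetical sketch).

    mv_fields: list of (rows, cols, 2) arrays of normalized (dy, dx) pixel
               vectors, one per B-frame in encoding order; each vector points
               into the immediately previous frame.
    start:     (row, col) block index in the LAST B-frame.
    Returns the block index in every frame (first frame first) and the
    accumulated displacement (sum of the followed vectors, in pixels).
    """
    rows, cols, _ = mv_fields[0].shape
    r, c = start
    path = [(r, c)]
    acc = np.zeros(2)
    # Frame k's vector points to frame k-1, so walk from the last frame back to the first.
    for field in reversed(mv_fields[1:]):
        dy, dx = field[r, c]
        acc += (dy, dx)
        # Snap the displaced pixel position to the nearest block, clamped to the frame.
        r = int(np.clip(np.rint((r * block + dy) / block), 0, rows - 1))
        c = int(np.clip(np.rint((c * block + dx) / block), 0, cols - 1))
        path.append((r, c))
    return path[::-1], acc
```

Take three frames whose motion fields are all `(0, 4)`, that is, one block to the right in the previous frame. Tracing block `(1, 1)` of the last frame gives the path `(1, 3), (1, 2), (1, 1)` and an accumulated displacement of 8 pixels. A static block gives the same position in every frame and a zero accumulated vector.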

In Equation (1), $B_1, \ldots, B_N$ denote the B-frames of a GOP in the order in which they are encoded.
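Read literally, Eq. (1) averages each block over the N B-frames along its motion trajectory. The minimal NumPy sketch below makes these assumptions:

- Decoded luma B-frames are available as an `(N, H, W)` array, with H and W multiples of the block size. This holds for QCIF (176 × 144) with 4 × 4 blocks.
- Each output block comes with a list of N block positions, one per frame. How these trajectories are obtained is outside this sketch, and blocks without one are treated as static.

The paper works in the compressed domain; pixel-domain arithmetic is used here only for clarity, and the function name is hypothetical.

```python
import numpy as np


def mc_temporal_average(frames, trajectories, block=4):
    """Sketch of Eq. (1): block-wise average of the N B-frames along trajectories.

    frames:       (N, H, W) array of B-frame luma in encoding order.
    trajectories: dict {(row, col): [(r_1, c_1), ..., (r_N, c_N)]} giving, for a
                  block of the low pass frame, its block position in each B-frame.
                  Blocks missing from the dict are treated as static.
    """
    frames = np.asarray(frames, dtype=float)
    n, h, w = frames.shape
    low = np.empty((h, w))
    for r in range(h // block):
        for c in range(w // block):
            path = trajectories.get((r, c), [(r, c)] * n)
            # Motion compensated temporal summation, then division by N.
            total = sum(frames[k, pr * block:(pr + 1) * block, pc * block:(pc + 1) * block]
                        for k, (pr, pc) in enumerate(path))
            low[r * block:(r + 1) * block, c * block:(c + 1) * block] = total / n
    return low
```

Consider a bright 4 × 4 block of value 9 that moves one block to the right in each of three frames. With its trajectory supplied, the low pass frame keeps the block at 9. A plain, non-compensated average would instead smear it to 3 at each of the three positions. This difference is why the averaging follows the motion axis.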

2.3 Extraction of High Pass Frames

High pass frames are extracted from the motion compensated temporal difference between every two consecutive B-frames in a GOP. The motion compensated temporal difference is estimated block-wise. The number of high pass frames extracted per GOP is N − 1. Let $H[k]$ denote the motion compensated temporal difference between the kth B-frame $B_k$ and the (k+1)th B-frame $B_{k+1}$ of a GOP, where $k \in \{1, \ldots, N-1\}$. Following [10], the high pass frame is extracted as follows:

$$H[k] = \text{motion compensated temporal difference}(B_k - B_{k+1}) \qquad (2)$$
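Eq. (2) can be sketched under similar assumptions. Decoded luma frames form an `(N, H, W)` array. Per-block trajectories are given as lists of N block positions, and a block without a trajectory is static. The arithmetic is done in the pixel domain for clarity, and the function name is hypothetical. Consider the setup of Section 3 (intra period 16), and assume each GOP is one I-frame followed by 15 B-frames. Then N = 15, and 14 high pass frames would be produced per GOP.

```python
import numpy as np


def mc_temporal_difference(frames, trajectories, block=4):
    """Sketch of Eq. (2): H[k] = B_k - B_{k+1}, block-wise along trajectories.

    frames:       (N, H, W) array of B-frame luma in encoding order.
    trajectories: dict {(row, col): [(r_1, c_1), ..., (r_N, c_N)]}; blocks
                  missing from the dict are treated as static.
    Returns an (N - 1, H, W) array; slice k (0-based) is B-frame k minus k + 1.
    """
    frames = np.asarray(frames, dtype=float)
    n, h, w = frames.shape
    highs = np.empty((n - 1, h, w))
    for r in range(h // block):
        for c in range(w // block):
            path = trajectories.get((r, c), [(r, c)] * n)
            # Gather this block's co-trajectory samples from every B-frame.
            blocks = [frames[k, pr * block:(pr + 1) * block, pc * block:(pc + 1) * block]
                      for k, (pr, pc) in enumerate(path)]
            for k in range(n - 1):
                highs[k, r * block:(r + 1) * block, c * block:(c + 1) * block] = blocks[k] - blocks[k + 1]
    return highs
```

For three static frames with constant values 1, 2, and 3, both high pass frames are −1 everywhere. Any block that moves exactly along its trajectory with unchanged content gives a zero high pass block.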

2.4 Watermark Embedding and Extraction

A bipolar invisible watermark sequence $W \in \{-1, +1\}$ is used for embedding, with watermark bits 0 and 1 mapped to −1 and +1. The watermark is embedded with spread spectrum techniques [1] in all nonzero quantized residuals of the intra and inter predicted blocks of the motion compensated temporally averaged low pass frame. Zero residuals are left untouched, because perturbing them would increase the video bit rate. The watermark embedding is performed as follows:

$$C_W = C\,(1 + \alpha W), \qquad C \neq 0 \qquad (3)$$

where $\alpha$, $C$, and $C_W$ denote the watermark strength, the original coefficient, and the watermarked coefficient, respectively. The value of $\alpha$ lies between 0 and 1. The B-frames of a GOP are then reconstructed from that GOP's watermarked motion compensated temporally averaged low pass frame $L$ and its high pass frames $H[1], \ldots, H[N-1]$.
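The text does not spell out how the B-frames are recovered. Assume that the motion compensated operators in Eqs. (1) and (2) reduce to a plain average and plain differences of co-trajectory samples, and ignore rounding and requantization. Under that assumption, the B-frames follow from $L$ and $H[1], \ldots, H[N-1]$ as:

```latex
\begin{aligned}
B_k &= B_1 - \sum_{j=1}^{k-1} H[j], \qquad k = 1, \ldots, N,\\
N L = \sum_{k=1}^{N} B_k &= N B_1 - \sum_{j=1}^{N-1} (N - j)\, H[j],\\
\Rightarrow\quad B_1 &= L + \frac{1}{N} \sum_{j=1}^{N-1} (N - j)\, H[j].
\end{aligned}
```

For N = 2 this gives $B_1 = \frac{B_1 + B_2}{2} + \frac{B_1 - B_2}{2}$, as expected. Now suppose embedding changes $L$ to $L + \Delta L$ while the $H[j]$ stay unwatermarked. Then every reconstructed frame becomes $B_k + \Delta L$. Under this simplification, the same watermark pattern is carried along each motion trajectory in all B-frames of the GOP, which is consistent with the motion coherency motivation in Section 1.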

The watermark is extracted at the decoder after entropy decoding. The extraction procedure is the exact reverse of watermark embedding. The watermark is extracted as follows:

$$W^* = \frac{C_W - C}{\alpha\, C}, \qquad C_W \neq 0,\; C \neq 0 \qquad (4)$$

where $W^*$, $C_W$, and $C$ denote the extracted watermark, the coefficient of the watermarked video, and the original coefficient, respectively.
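Eqs. (3) and (4) can be checked with a short NumPy round trip. This is a sketch with the following assumptions:

- Coefficients are real-valued; the rounding that requantization would apply to $C_W$ in an actual codec is not modelled.
- The bipolar watermark supplies one value per nonzero coefficient.
- `embed` and `extract` are hypothetical names.

```python
import numpy as np


def embed(coeffs, w, alpha=0.1):
    """Eq. (3): C_W = C (1 + alpha W) on nonzero coefficients only.

    coeffs: 1-D array of quantized coefficients C (zeros are left untouched).
    w:      bipolar watermark values in {-1, +1}, one per nonzero coefficient.
    """
    coeffs = np.asarray(coeffs)
    cw = coeffs.astype(float)
    nz = np.flatnonzero(coeffs)
    cw[nz] = coeffs[nz] * (1.0 + alpha * np.asarray(w, dtype=float))
    return cw


def extract(cw, coeffs, alpha=0.1):
    """Eq. (4): W* = (C_W - C) / (alpha C) where C != 0 and C_W != 0."""
    cw = np.asarray(cw, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    mask = (coeffs != 0) & (cw != 0)
    return (cw[mask] - coeffs[mask]) / (alpha * coeffs[mask])
```

For example, `embed([0, 3, -2, 0, 5], [1, -1, 1], alpha=0.2)` gives approximately `[0, 3.6, -1.6, 0, 6.0]`, and `extract` recovers `[1, -1, 1]`. Because $0 < \alpha < 1$, the factor $1 \pm \alpha$ is positive. So $C(1 + \alpha W)$ keeps the sign of $C$ and is never zero, which means the condition $C_W \neq 0$ in Eq. (4) holds at every embedded position.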

2.5 Watermark Similarity Measure

The extracted watermark $W^*$ is unlikely to be identical to the original watermark $W$. Even re-compressing the watermarked video for delivery will cause $W^*$ to deviate from $W$. Following [1], we measure the similarity of $W^*$ and $W$ as

$$\mathrm{sim}(W, W^*) = \frac{W^* \cdot W}{\sqrt{W^* \cdot W^*}} \qquad (5)$$

where $\mathrm{sim}(W, W^*)$ denotes the similarity between $W$ and $W^*$, and $\cdot$ denotes the inner (dot) product. A higher similarity means the watermark has survived better, so $\mathrm{sim}(W, W^*)$ serves as the measure of robustness of the proposed method.
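Eq. (5) is a one-liner in NumPy. Below is a minimal sketch; the function name `sim` mirrors the equation and is otherwise hypothetical.

```python
import numpy as np


def sim(w, w_star):
    """Eq. (5), following Cox et al. [1]: (W* . W) / sqrt(W* . W*)."""
    w = np.asarray(w, dtype=float)
    w_star = np.asarray(w_star, dtype=float)
    return float(np.dot(w_star, w) / np.sqrt(np.dot(w_star, w_star)))
```

For a perfect extraction of an n-element bipolar watermark ($W^* = W$), the value is $\sqrt{n}$. For example, `sim([1, -1, 1, -1], [1, -1, 1, -1])` is 2.0. An uncorrelated $W^*$ gives a value near 0. The measure therefore grows with watermark length, and [1] compares it against a detection threshold rather than reading it as an absolute score.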

3. EXPERIMENTAL RESULTS

The proposed method is implemented using the H.264/AVC reference software JM 17.2 [11] and evaluated on the foreman and hall monitor videos. The video sequences are in Quarter Common Intermediate Format (QCIF), with a frame resolution of 176 × 144 and a frame rate of 30 frames per second. The intra period is 16, and the Quantization Parameter (QP) is 24 for both I-frames and B-frames. The fast full search in JM is employed, and all intra and inter prediction sizes are enabled. Video sequences of 160 frames and payloads of {200, 250, 300} are used, with α ∈ {0.1, 0.2}.

Figure 3 shows the number of coefficients available for embedding in the foreman and hall monitor videos. It shows that the embedding capacity, i.e., the number of coefficients available for embedding, is acceptable.

[Figure 3 plot: rate of nonzero coefficients (y-axis, 0 to 1) versus Group of Pictures (GOP) index (x-axis, 0 to 9); left panel: Foreman Video; right panel: Hall Monitor Video.]

Figure 3: GOP-wise embedding capacity in the foreman video (left) and hall monitor video (right).

The visual quality of the watermarked video is measured using the Peak Signal-to-Noise Ratio (PSNR) for the Foreman and Hall Monitor sequences. The bit error rate is used to evaluate the robustness of the proposed scheme against image processing attacks.
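Both metrics are standard, and a minimal sketch for 8-bit luma follows. It makes two assumptions:

- Per-sequence PSNR is computed from the MSE pooled over all frames. The text does not say whether per-frame PSNR values are averaged instead.
- Extracted watermark values are turned into bits by their sign. This is a hypothetical decision rule.

The function names are also hypothetical.

```python
import numpy as np


def psnr(ref, test, peak=255.0):
    """PSNR in dB: 10 log10(peak^2 / MSE), with MSE pooled over all samples."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def bit_error_rate(w, w_star):
    """Fraction of watermark bits whose extracted sign differs from the embedded one."""
    w = np.sign(np.asarray(w, dtype=float))
    w_star = np.sign(np.asarray(w_star, dtype=float))
    return float(np.mean(w != w_star))
```

For reference, an MSE of 1 on 8-bit data corresponds to about 48.13 dB. Comparing the embedded watermark `[1, -1, 1, 1]` with the extracted values `[0.9, 0.2, 1.1, -0.7]` gives a bit error rate of 0.5.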

Figure 4 and Table 1 show that as α increases, the PSNR decreases and the robustness increases. Nevertheless, the results in Figure 4 show acceptable visual quality. At α = 0.1, the PSNR and robustness results are better than those of [7] and [8], while at α = 0.2 the results are comparable. Table 1 reports robustness against image processing attacks at α = 0.1 and α = 0.2.

Figure 4: PSNR for (a) α = 0.1 and (b) α = 0.2.

Table 1: Robustness against three different attacks ([PM] denotes the proposed method, with α in parentheses).

Video          Attack  [7]  [8]  [PM](0.1)  [PM](0.2)
Foreman        (a)     72   65   75         78
               (b)     74   69   74         79
               (c)     71   67   73         76
Hall Monitor   (a)     70   66   73         75
               (b)     72   68   73         76
               (c)     73   69   75         78

Attacks: (a) salt and pepper noise; (b) circular averaging filter; (c) Gaussian filter.

4. CONCLUSIONS

In this paper, a compressed domain watermarking method based on motion coherency is proposed for low bit-rate compressed videos such as H.264/AVC. A low pass frame and high pass frames are extracted for each short video neighborhood, and the watermark is embedded in the low pass frame. Experimental results show the effectiveness of the proposed method. In future work, the proposed method can be further evaluated against watermark estimation attacks and frame drop or alter attacks.

5. ACKNOWLEDGMENTS

I would like to thank Tata Consultancy Services (TCS) for supporting my Ph.D. studies through the TCS Research Scholar Program.

6. REFERENCES

[1] I. Cox, J. Kilian, F. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12):1673–1687, 1997.

[2] P. Dong, Y. Xia, L. Zhuo, and D. Feng. Real-time moving object segmentation and tracking for H.264/AVC surveillance videos. In Proceedings of ICIP, pages 2309–2312, 2011.

[3] T. Dutta, A. Sur, and S. Nandi. A Robust Compressed Domain Video Watermarking in P-frames with Controlled Bit Rate Increase. In Proceedings of NCC, pages 1–5, 2013.

[4] T. Dutta, A. Sur, and S. Nandi. MCRD: Motion Coherent Region Detection in H.264 Compressed Video. In Proceedings of ICME, 2013.

[5] E. Esen and A. Alatan. Robust Video Data Hiding Using Forbidden Zone Data Hiding and Selective Embedding. IEEE Transactions on Circuits and Systems for Video Technology, 21(8):1130–1138, 2011.

[6] Z. Liu, Y. Lu, and Z. Zhang. Real-time spatiotemporal segmentation of video objects in the H.264 compressed domain. Journal of Visual Communication and Image Representation, 18(3):275–290, 2007.

[7] A. Mansouri, A. Aznaveh, F. Torkamani, and F. Kurugollu. A Low Complexity Video Watermarking in H.264 Compressed Domain. IEEE Transactions on Information Forensics and Security, 5(4):649–657, 2010.

[8] M. Noorkami and R. Mersereau. Digital Video Watermarking in P-Frames With Controlled Video Bit-Rate Increase. IEEE Transactions on Information Forensics and Security, 3(3):441–455, 2008.

[9] M. Noorkami and R. M. Mersereau. A Framework for Robust Watermarking of H.264-Encoded Video With Controllable Detection Performance. IEEE Transactions on Information Forensics and Security, 2(1):14–23, 2007.

[10] V. Pankajakshan, G. Doerr, and P. Bora. Detection of motion-incoherent components in video streams. IEEE Transactions on Information Forensics and Security, 4(1):49–58, 2009.

[11] K. Sühring. H.264 Reference Software Group, 2008.

[12] R. Wang, H. Zhang, and Y. Zhang. A confidence measure based moving object extraction system built for compressed domain. In Proceedings of ISCAS, volume 5, pages 21–24, 2000.

[13] T. Wiegand and G. Sullivan. Advanced video coding for generic audiovisual services. International Telecommunication Union, 2003.
