A Tunable Encryption Scheme and Analysis of Fast Selective Encryption for CAVLC and CABAC in H.264/AVC

Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


Yongsheng Wang, Student Member, IEEE, Maire O’Neill, Senior Member, IEEE, and Fatih Kurugollu, Senior Member, IEEE

Abstract—Recently, two fast selective encryption methods for CAVLC and CABAC in H.264/AVC were proposed by Shahid et al. In this work, it is demonstrated that these two methods are not as efficient as only encrypting the sign bits of nonzero coefficients. Experimental results show that without encrypting the sign bits of nonzero coefficients, these two methods cannot provide a perceptual scrambling effect. If a much stronger scrambling effect is required, intra prediction modes and the sign bits of motion vectors can be encrypted together with the sign bits of nonzero coefficients. For practical applications, the required encryption scheme should be customized according to a user’s specified requirements on the perceptual scrambling effect and the computational cost. Thus, a tunable encryption scheme combining these three methods is proposed for H.264/AVC. To simplify its implementation and reduce the computational cost, a simple control mechanism is proposed to adjust the control factors. Experimental results show that this scheme can provide different scrambling levels by adjusting three control factors with no or very little impact on the compression performance. The proposed scheme can run in real time and its computational cost is minimal. The security of the proposed scheme is also discussed. It is secure against the replacement attack when all three control factors are set to 1.

Index Terms—Selective encryption, sign bits, nonzero coefficients, intra prediction mode, motion vectors, tunable, H.264/AVC.

I. INTRODUCTION

In the past decade, with the rapid development of digital video and network technology, a series of video compression standards have been proposed to meet the increasing requirements of video applications. At the same time, a significant amount of research has been carried out on protecting the video stream. Selective encryption is one of the most promising techniques for practical applications [1]–[3], as it can meet real-time processing requirements and provide an effective perceptual scrambling effect with no or minimal impact on the compression performance. With some selective encryption methods, the encrypted video stream remains format-compliant [4] [5] to a general decoder, which preserves the original functionality of the video stream, such as transcoding, network friendliness and error resiliency.

Manuscript received September 27, 2012; revised January 11, 2013; accepted January 16, 2013. This research was supported by the Queen’s University, Belfast. This paper was recommended by Associate Editor M. Barni.

The authors are with the Centre for Secure Information Technologies, Queen’s University, Belfast, BT3 9DT UK (e-mail: [email protected], [email protected], [email protected]).

The latest video compression standard, H.264/AVC (MPEG-4 Part 10), was jointly issued by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG in 2003, and has recently been updated by the JVT [6]–[8]. In terms of compression performance, H.264/AVC has been reported to significantly outperform previous standards [7]–[10]. H.264/AVC offers two entropy coding methods for nonzero quantized transform coefficients: CAVLC (Context-Adaptive Variable Length Coding) and CABAC (Context-Adaptive Binary Arithmetic Coding). At the price of additional computational cost, CABAC generally offers an average bitrate saving of 6% to 10% [10] [11].

Several selective encryption methods for H.264/AVC have been proposed in the literature. Ahn et al. [12] first proposed the encryption of the intra prediction mode in H.264/AVC. This method is format compliant. For CAVLC, it has no effect on the compression ratio since it does not change the bit length that represents the intra prediction mode; for CABAC, the compression ratio is only very slightly affected. Kwon et al. [13] proposed to permute motion vectors and the slice data; however, this method incurs a longer encoding delay since the relocation takes place within the range of a whole slice. In the context of MPEG-1, Shi and Bhargava [14]–[16] first proposed the encryption of the sign bits of nonzero quantized DCT coefficients. They also indicated that this method can be combined with the encryption of the sign bits of motion vectors. Their method can effectively degrade the perceptual quality without impacting the compression performance. Zeng and Lei [17] proposed to integrate this method into a scheme for MPEG-4, together with block shuffling and rotation of the transform coefficients and motion vectors. However, the latter operations decrease the compression ratio and lead to a longer encoding delay and higher computational complexity. Lian [18] [19] extended the encryption of the sign bits of nonzero coefficients into a scheme for H.264/AVC, which included the encryption of intra DC coefficients, the sign bits of intra AC coefficients, intra prediction modes and the sign bits of motion vectors. Shahid et al. [20] [21] proposed to encrypt nonzero quantized coefficients under CAVLC in H.264/AVC by permuting codewords that have the same suffix length. In [21] [22], they also proposed a similar scheme for CABAC, which encrypts the sub-suffix in the suffix of nonzero coefficients and their sign bits. Compared with only encrypting the sign bits of nonzero coefficients, these two methods encrypt many more bits, which seems to provide a better scrambling effect and a higher security level. However, as will be shown in this paper, the perceptual scrambling effect of these two methods is mainly produced by encrypting the sign bits of nonzero coefficients; they cannot provide a higher security level and are not as efficient as only encrypting the sign bits of nonzero coefficients.

The permutation used in the methods proposed by Shahid et al. [20]–[22] is equivalent to XORing the plaintext (here, the chosen bit string) with a random bit string of the same length as the plaintext. Often, a random bit sequence is generated by a stream cipher and then XORed with the selected portion of the video information. Since this involves only an XOR operation, the corresponding computational cost is relatively low, which makes stream ciphers very suitable for video encryption. In this paper, the stream cipher Rabbit [23], which was developed for the ECRYPT Stream Cipher Project, is adopted to generate the random bit sequence. It is one of the Profile 1 finalists, which are suitable for software implementation. To date, there has been no effective attack against this cipher [24].
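The XOR-based encryption described above can be illustrated with the following minimal Python sketch. It is not the implementation used in this paper: Rabbit is not available in the Python standard library, so the sketch assumes a stand-in keystream expanded from a key and a nonce with SHAKE-128 from hashlib, and the function names are hypothetical. Because XOR is its own inverse, the same function both encrypts and decrypts the selected bits.

```python
import hashlib


def keystream_bits(key: bytes, nonce: bytes, nbits: int) -> list[int]:
    """Stand-in keystream (an assumption, NOT Rabbit): SHAKE-128 of key || nonce."""
    stream = hashlib.shake_128(key + nonce).digest((nbits + 7) // 8)
    bits = [(byte >> shift) & 1 for byte in stream for shift in range(7, -1, -1)]
    return bits[:nbits]  # MSB-first bits, truncated to the requested length


def xor_selected_bits(bits: list[int], key: bytes, nonce: bytes) -> list[int]:
    """XOR the selected bits with an equally long keystream (encrypts or decrypts)."""
    ks = keystream_bits(key, nonce, len(bits))
    return [b ^ k for b, k in zip(bits, ks)]


if __name__ == "__main__":
    sign_bits = [0, 1, 1, 0, 1, 0, 0, 1]      # e.g. sign bits collected from one slice
    key, nonce = b"hypothetical-key", b"frame-0"
    cipher = xor_selected_bits(sign_bits, key, nonce)
    assert xor_selected_bits(cipher, key, nonce) == sign_bits  # XOR is an involution
    print(cipher)
```

In an encoder, the selected bits would be taken in bitstream order, and the decoder would regenerate the same keystream from the shared key to undo the XOR.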

A tunable video encryption scheme for MPEG-1, possibly the first in the literature, was proposed by Meyer and Gadegast [25]. Their scheme provides four security levels by encrypting different parts of the video stream. However, this scheme affects the compression performance and is not format compliant. In [26], Alattar et al. proposed the encryption of every N-th I-macroblock and/or header information for MPEG-1. Pazarci and Dipcin [27] proposed a scrambling technique for MPEG-2 that applies four linear transforms to the original RGB video content. This technique has been criticized for its unrecoverable quality loss and significant influence on the compression ratio [28]. Wang et al. [29] employed a control factor to adjust the scrambling effect by operating on the DCT coefficients (also for MPEG-2), but this technique still affects the compression ratio [28] and can only provide a limited scrambling effect. For MPEG-4, Li et al. [28] extended the encryption of the sign bits of nonzero coefficients to form another tunable scheme, which can provide different scrambling levels by adjusting three control factors. This tunability makes their scheme much more flexible and convenient for meeting the varied requirements of practical applications. Hong et al. [30] proposed a quality-controllable encryption scheme for H.264/AVC, which provides four scrambling levels by encrypting different parts of intra blocks. However, it affects the compression ratio and often provides a relatively weak scrambling effect.

In this paper, a tunable encryption scheme for H.264/AVC is proposed that improves upon this previous work [28]. The proposed scheme combines the encryption of intra prediction modes, the sign bits of nonzero coefficients and the sign bits of motion vectors, where the proportion of each syntax element that is encrypted is set by a corresponding control factor. To simplify the implementation, a simple control mechanism is proposed: when a control factor is 1/N (N > 0), every N-th instance of the corresponding syntax element is encrypted. By adjusting these three control factors, a wide range of scrambling effects can be produced.
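The every-N-th control mechanism can be sketched with a simple counter, as below. This is a minimal illustration under stated assumptions: a control factor of 1/N is represented by the integer N, the first selected element is assumed to be the N-th one encountered (the starting offset is not specified here), and the class name is hypothetical. One selector would be kept for each of the three syntax elements.

```python
class EveryNthSelector:
    """Select every N-th syntax element, i.e. a control factor of 1/N (N >= 1)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("N must be a positive integer")
        self.n = n
        self.count = 0

    def should_encrypt(self) -> bool:
        # Called once per candidate syntax element, in encoding order.
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


if __name__ == "__main__":
    # Hypothetical settings: all intra modes, every 2nd coefficient sign, every 4th MV sign.
    selectors = {"intra_mode": EveryNthSelector(1),
                 "coeff_sign": EveryNthSelector(2),
                 "mv_sign": EveryNthSelector(4)}
    picks = [selectors["mv_sign"].should_encrypt() for _ in range(8)]
    print(picks)  # [False, False, False, True, False, False, False, True]
```

Because the selection depends only on a counter, a decoder that keeps identical counters in the same order can locate the encrypted elements without extra side information.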

The rest of this paper is organized as follows. In Section II, the method proposed by Shahid et al. [20] [21] for encrypting the suffix of nonzero coefficients under CAVLC is analyzed; statistical and experimental results show that, without encrypting the sign bits of nonzero coefficients, encrypting the other parts of the suffix cannot provide an effective scrambling effect. The method is also shown to be less efficient than only encrypting the sign bits, since it consumes many more random bits to encrypt more bits of the video stream. In Section III, the second method proposed by Shahid et al. [21] [22], for encrypting nonzero quantized DCT coefficients under CABAC, is analyzed, and the same conclusion is drawn from statistical and experimental results. Thus, the encryption of the sign bits of nonzero coefficients is recommended for practical applications. In Section IV, an improved tunable encryption scheme for H.264/AVC is proposed, which combines the encryption of intra prediction modes, the sign bits of nonzero coefficients and the sign bits of motion vectors, and a simple control mechanism to implement this scheme is demonstrated. Experimental results show that the scrambling effect of this scheme can be effectively adjusted in real time by changing three control factors, with no or minimal impact on the compression ratio. Conclusions and future work are provided in Section V.

II. ANALYSIS OF ENCRYPTING NONZERO QUANTIZED COEFFICIENTS UNDER CAVLC

Under CAVLC, directly encrypting the sign bits of DCT coefficients [16] has almost no impact on the compression ratio. Shahid et al. [20] [21] proposed to encrypt the suffix of nonzero quantized coefficients, which has no impact at all on the compression ratio and can achieve an effective perceptual scrambling effect. In this section, statistical and experimental results show that if only the suffix is encrypted, without encrypting the sign bits of nonzero coefficients, an effective perceptual scrambling effect cannot be produced. In addition, encrypting the suffix requires many more random bits, so its computational cost is higher than that of only encrypting the sign bits. Thus, the encryption of the sign bits is preferable to the method in [20] [21].

A. Encoding of DCT coefficients under CAVLC

In CAVLC, the quantized coefficients of a block are grouped into nonzero coefficients and runs of zeros, which are encoded separately. As shown in Fig. 1, the nonzero coefficients of a block are scanned in reverse order (from high frequency to low frequency), the run of zeros before each nonzero coefficient is also recorded in reverse order, and the information is finally represented by five syntax elements. The first syntax element, Coeff_token, indicates the number of nonzero coefficients and the number of trailing ±1 values (at most three, referred to as trailing ones, or T1s). The next element indicates the sign of each T1 using exactly one bit per T1. The remaining nonzero coefficients are encoded into the third syntax element using seven VLC (Variable Length Coding) tables [6] [7]. Finally, the total number of zeros and then the run of zeros before each nonzero coefficient are encoded in the last two syntax elements.

[Figure: a block is scanned in reverse order and split into the T1s, the remaining nonzero coefficients and the run of zeros before each nonzero coefficient; these are encoded as the syntax elements Coeff_token, signs of T1s, remaining nonzero coefficients, the total number of zeros and each run of zeros.]

Fig. 1. The encoding procedure of CAVLC in H.264/AVC.
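To make the five syntax elements concrete, the following Python sketch extracts their values from a 4×4 block that has already been zig-zag scanned into 16 coefficients. It is a hypothetical helper, not part of any reference encoder: it stops short of the VLC table lookups, and it lists a run for every nonzero coefficient, whereas the standard omits runs that the decoder can infer.

```python
def cavlc_syntax_elements(scanned: list[int]) -> dict:
    """Values (not codewords) of the five CAVLC syntax elements of one scanned block."""
    nonzero = [(pos, v) for pos, v in enumerate(scanned) if v != 0]
    rev = nonzero[::-1]                      # high frequency -> low frequency
    total_coeff = len(rev)

    trailing_ones = 0                        # at most three +/-1 values at the end
    for _, v in rev:
        if abs(v) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    t1_signs = [1 if v < 0 else 0 for _, v in rev[:trailing_ones]]  # 1 = negative
    remaining = [v for _, v in rev[trailing_ones:]]
    total_zeros = rev[0][0] + 1 - total_coeff if rev else 0  # zeros before last nonzero

    runs = []                                # zeros immediately before each coefficient
    for k, (pos, _) in enumerate(rev):
        prev_pos = rev[k + 1][0] if k + 1 < total_coeff else -1
        runs.append(pos - prev_pos - 1)

    return {"total_coeff": total_coeff, "trailing_ones": trailing_ones,
            "t1_signs": t1_signs, "remaining_levels": remaining,
            "total_zeros": total_zeros, "run_before": runs}


if __name__ == "__main__":
    block = [0, 3, -1, 0, 0, -1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(cavlc_syntax_elements(block))
```

Coeff_token carries total_coeff and trailing_ones together; the remaining levels are then coded with the levelCode mapping described next.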

Following the T1s, each of the remaining nonzero coefficients is first mapped to a non-negative integer called levelCode: if the nonzero coefficient is negative, levelCode is 2 × |level| − 1; otherwise, it is 2 × |level| − 2. Here, |level| denotes the absolute value of a nonzero coefficient, level. In the video stream, levelCode is represented by a prefix and a suffix as in Equation (1): the value of the prefix, denoted prefix, is the smaller of 15 and (levelCode >> L), and the value of the suffix is levelCode − (prefix << L). Here, ‘<<’ and ‘>>’ are the logical left and right shift operators, respectively. L is generally initialized to 0, but when there are more than 10 nonzero coefficients and fewer than 3 trailing ones, it is initialized to 1, a special case with a very small probability of occurrence. After each coefficient is encoded, L is increased by 1 if the magnitude of the coefficient is greater than the corresponding threshold in Table I. The prefix is coded as consecutive leading zero bits followed by a ‘1’ bit, where the number of leading zero bits is the value of the prefix. The suffix is coded as an unsigned integer of L bits; when L = 0, it is inferred to be 0. Each remaining nonzero quantized coefficient following the T1s is therefore represented by a bitstream similar to that shown in Fig. 2. This bitstream consists of two parts: a prefix, which corresponds to the MSBs (most significant bits) of its magnitude, and a suffix, which holds the LSBs (least significant bits) of its magnitude followed by its sign bit.

$$
\begin{cases}
\text{Value of the prefix} = \min\left(15,\ \text{levelCode} \gg L\right) \\
\text{Value of the suffix} = \text{levelCode} - \left(\text{prefix} \ll L\right)
\end{cases}
\qquad (1)
$$

TABLE I
THRESHOLDS TO INCREMENT L.

L          0    1    2    3    4    5    6
Threshold  0    3    6    12   24   48   ∞

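[Figure: each remaining nonzero coefficient is coded as a prefix 0 0 · · · 0 1 (the MSBs of the magnitude) followed by an L-bit suffix × × · · · × s, whose first L − 1 bits are the LSBs of the magnitude and whose last bit s is the sign bit.]

Fig. 2. The bitstream format for each remaining nonzero coefficient after T1s under CAVLC.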
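The prefix/suffix construction of Equation (1) and the L update of Table I can be sketched in Python as follows. This follows the simplified model described above rather than the full standard: the helper names are hypothetical, and the special cases of the standard (the 4-bit suffix used when L = 0 and the prefix reaches 14, the escape when the prefix saturates at 15, and the adjustment of the first level when there are fewer than three T1s) are not modelled, so the sketch raises an error instead.

```python
# Table I: after coding a coefficient, L is incremented if |level| > THRESHOLD[L].
THRESHOLD = [0, 3, 6, 12, 24, 48, float("inf")]


def level_code(level: int) -> int:
    """levelCode = 2|level| - 2 for positive levels, 2|level| - 1 for negative ones."""
    if level == 0:
        raise ValueError("level must be nonzero")
    return 2 * abs(level) - 1 if level < 0 else 2 * abs(level) - 2


def encode_level(level: int, L: int) -> str:
    """Prefix and suffix bits of Equation (1) for one remaining nonzero level."""
    code = level_code(level)
    prefix = code >> L
    if prefix >= 15 or (L == 0 and prefix >= 14):
        raise NotImplementedError("escape/special cases of the standard not modelled")
    suffix = code - (prefix << L)
    bits = "0" * prefix + "1"              # prefix: 'prefix' zeros, then a '1'
    if L > 0:
        bits += format(suffix, f"0{L}b")   # suffix: L bits, the last one is the sign
    return bits


def update_L(level: int, L: int) -> int:
    """Increment L according to Table I after coding one coefficient."""
    return L + 1 if abs(level) > THRESHOLD[L] else L


if __name__ == "__main__":
    L = 0
    for level in [2, -5, 4, -1]:           # remaining levels in reverse scan order
        print(level, L, encode_level(level, L))
        L = update_L(level, L)
```

Since levelCode is even for positive and odd for negative levels, its LSB is the sign bit, which is why the sign appears as the last suffix bit whenever L > 0.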

B. Encryption of quantized coefficients under CAVLC

As mentioned, two methods have previously been proposed to encrypt quantized coefficients: one encrypts the sign bit of each nonzero coefficient [14]–[16], and the other encrypts the suffix of each nonzero coefficient [20] [21]. As shown in Fig. 2, the sign bit is the last bit of the suffix when L > 0, because levelCode is even for positive levels and odd for negative ones. Since the suffix contains more bits than the sign bit alone, encryption of the suffix appears to provide a much better scrambling effect and a higher security level than encryption of only the sign bit. However, as demonstrated below, this assumption is incorrect.

Before comparing these two methods, the statistical distribution of the maximum value of L in each coefficient block is studied. The statistics are obtained by encoding the first 30 frames of ‘foreman’ at QCIF resolution under the baseline profile with an IPP...P structure, QP = 12 and the 4:2:0 sampling format. Here, a coefficient block may be a 4×4 luma DC, luma AC or chroma AC block, or a 2×2 chroma DC block. Because most luma blocks have many more nonzero coefficients after transformation and quantization than chroma blocks, the distribution of the maximum value of L is shown separately for luma and chroma blocks in Fig. 3 and Fig. 4, respectively. The corresponding percentages are listed in Table II and Table III.

From Table II, it is clear that for the luma component the maximum L value of most blocks is less than 4 in the I frame (the first frame; about 89%) and less than 3 in P frames (the frames following the I frame; about 96%). Since the last bit of the suffix is occupied by the sign bit, for most luma blocks the suffix represents at most the 2 LSBs of the magnitude in the I frame, or 1 LSB in P frames. If these bits are always set to the middle value of their range, the error between the middle value and the correct value will not exceed half of the range. Thus, if the sign bit is not encrypted and only the remaining portion of the suffix is encrypted, only a very small error is incurred for most luma coefficients: within ±2 for the I frame and ±1 for P frames. For the chroma component, much smaller values of L are used, as shown in Table III. For the I frame, about 94% of blocks have a maximum L value of less than 3; for P frames, almost all blocks have a maximum L of 0 or 1. Since a suffix with L ≤ 1 contains no bits other than the sign bit, encryption of the suffix excluding the sign bit will not produce any error for most chroma coefficients.
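To make this error bound concrete, the following Python sketch models an attacker who sees the prefix and the correct (unencrypted) sign bit but replaces the L − 1 encrypted suffix bits with the middle of their range. The helper names are hypothetical and the escape cases of the standard are ignored. The computed worst-case magnitude error is 0 for L ≤ 1, 1 for L = 2 and 2 for L = 3, in line with the ±1 and ±2 figures above.

```python
def level_code(level: int) -> int:
    # Mapping of Section II-A: even for positive levels, odd for negative ones.
    return 2 * abs(level) - 1 if level < 0 else 2 * abs(level) - 2


def attacker_estimate(level: int, L: int) -> int:
    """Estimate a level from its prefix and correct sign bit when the other
    L - 1 suffix bits are encrypted, by replacing them with the middle value."""
    if L <= 1:
        return level                        # the suffix holds at most the sign bit
    code = level_code(level)
    sign = code & 1                         # last suffix bit, left unencrypted
    known = (code >> L) << L                # part of levelCode carried by the prefix
    mid = 1 << (L - 2)                      # middle of the range [0, 2^(L-1) - 1]
    guess = known | (mid << 1) | sign
    magnitude = (guess >> 1) + 1            # invert levelCode = 2(|level| - 1) + sign
    return -magnitude if sign else magnitude


def worst_case_error(L: int, max_level: int = 256) -> int:
    """Largest |estimate - level| over all nonzero levels up to max_level."""
    return max(abs(attacker_estimate(v, L) - v)
               for v in range(-max_level, max_level + 1) if v != 0)


if __name__ == "__main__":
    for L in range(7):
        print(L, worst_case_error(L))
```

The error grows as 2^(L−2) for L ≥ 2, so it stays small precisely because most coefficients are coded with a small L.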

[Figure: 3-D bar chart; axes: Frame No., L (the length of the suffix), and the number of coefficient blocks.]

Fig. 3. The statistical distribution of the maximum L in each coefficient block in terms of the number of coefficient blocks in the luma component.

The statistics of L over all nonzero coefficients are also investigated, as shown in Fig. 5 and Fig. 6 for the luma and chroma components, respectively. The conclusion above holds again: most nonzero coefficients are encoded with a small L value, up to 3 and 2 for the luma component in the I frame and P frames, respectively, and up to 2 and 1 for the chroma component in the I frame and P frames, respectively.

[Figure: 3-D bar chart; axes: Frame No., L (the length of the suffix), and the number of coefficient blocks.]

Fig. 4. The statistical distribution of the maximum L in each coefficient block in terms of the number of coefficient blocks in the chroma component.

TABLE II
FOR THE LUMA COMPONENT, PERCENTAGE OF THE BLOCKS WITH A SPECIFIED L AS THEIR MAXIMUM L RELATIVE TO THE TOTAL NUMBER OF NONZERO COEFFICIENT BLOCKS IN EACH FRAME.

Frame  Percentage for each L (%)
No.    L=0     L=1     L=2     L=3     L=4     L=5     L=6
1      10.72   37.52   24.02   17.14   7.21    3.04    0.33
2      52.37   34.52   10.66   2.37    0.08    0       0
3      53.17   34.65   10.38   1.57    0.22    0       0
4      55.93   31.54   10.44   1.94    0.15    0       0
5      51.52   35.92   10.30   1.96    0.29    0       0
6      49.82   38.65   9.48    1.98    0.07    0       0
7      50.14   38.55   8.33    2.62    0.36    0       0
8      53.10   38.19   7.21    1.28    0.21    0       0
9      47.64   42.29   8.57    1.50    0       0       0
10     54.01   36.89   6.88    1.79    0.29    0.14    0
11     52.38   38.66   6.58    1.95    0.43    0       0
12     52.74   37.78   6.81    2.22    0.44    0       0
13     46.72   41.32   9.22    2.52    0.22    0       0
14     45.39   40.25   10.98   2.67    0.63    0.07    0
15     44.58   38.94   12.52   3.55    0.28    0.14    0
16     46.99   38.84   11.10   2.81    0.21    0.07    0
17     49.76   40.18   8.25    1.68    0.07    0.07    0
18     50.61   40.22   8.10    0.86    0.22    0       0
19     54.83   38.08   5.87    1.15    0.07    0       0
20     53.05   41.46   4.75    0.59    0.15    0       0
21     56.06   38.71   4.66    0.50    0.07    0       0
22     54.07   39.06   5.90    0.82    0.15    0       0
23     54.55   37.93   5.27    2.02    0.22    0       0
24     59.23   33.38   5.54    1.54    0.31    0       0
25     56.49   36.4    5.87    1.09    0.15    0       0
26     56.56   36.21   5.53    1.62    0.07    0       0
27     60.96   34.3    4.08    0.66    0       0       0
28     65.18   31.39   3.01    0.42    0       0       0
29     61.76   36.16   1.76    0.32    0       0       0
30     67.01   30.36   2.47    0.17    0       0       0

Moreover, the statistical results above consider only blocks that contain nonzero coefficients. Some blocks are predicted from neighbouring blocks without any residual data. Since these blocks effectively have an L value of 0, a block is even more likely to be encoded with a small L.

As shown in Fig. 2, the remaining portion of the suffix, excluding the last (sign) bit, represents only the L − 1 LSBs of the coefficient magnitude. Thus, encrypting the suffix without the sign bit can only scramble the L − 1 LSBs of the coefficient. At most, this is equivalent to increasing the quantization parameter, QP, which may lower the perceptual quality to a level similar to that of a low-bitrate video but keeps the scene fully intelligible.

TABLE III
FOR THE CHROMA COMPONENT, PERCENTAGE OF THE BLOCKS WITH A SPECIFIED L AS THEIR MAXIMUM L RELATIVE TO THE TOTAL NUMBER OF NONZERO COEFFICIENT BLOCKS IN EACH FRAME.

Frame  Percentage for each L (%)
No.    L=0     L=1     L=2     L=3     L=4     L=5     L=6
1      49.24   31.09   13.76   5.09    0.83    0       0
2      92.83   6.75    0.42    0       0       0       0
3      89.61   9.32    1.08    0       0       0       0
4      91.78   6.91    1.32    0       0       0       0
5      89.09   9.45    1.09    0.36    0       0       0
6      93.27   6.41    0.32    0       0       0       0
7      88.10   10.78   1.12    0       0       0       0
8      90.57   9.09    0.34    0       0       0       0
9      92.18   7.14    0.68    0       0       0       0
10     91.64   7.16    1.19    0       0       0       0
11     91.28   7.85    0.87    0       0       0       0
12     92.17   7.54    0.29    0       0       0       0
13     90.27   9.73    0       0       0       0       0
14     86.06   12.06   1.07    0.54    0.27    0       0
15     87.14   11.67   0.95    0.24    0       0       0
16     89.12   10.36   0.52    0       0       0       0
17     91.17   8.83    0       0       0       0       0
18     93.26   6.46    0.28    0       0       0       0
19     94.01   5.99    0       0       0       0       0
20     95.17   4.83    0       0       0       0       0
21     95.60   4.40    0       0       0       0       0
22     94.03   5.97    0       0       0       0       0
23     91.94   8.06    0       0       0       0       0
24     93.35   6.65    0       0       0       0       0
25     90.67   8.80    0.53    0       0       0       0
26     91.45   8.55    0       0       0       0       0
27     96.42   3.58    0       0       0       0       0
28     94.96   5.04    0       0       0       0       0
29     94.76   5.24    0       0       0       0       0
30     96.20   3.80    0       0       0       0       0

[Figure: 3-D bar chart; axes: L (the length of the suffix), Frame No., and the number of nonzero coefficients.]

Fig. 5. The statistical distribution of L in terms of the number of nonzero coefficients in the luma component.

[Figure: 3-D bar chart; axes: L (the length of the suffix), Frame No., and the number of nonzero coefficients.]

Fig. 6. The statistical distribution of L in terms of the number of nonzero coefficients in the chroma component.

These statistical results on L imply that encryption of the suffix without the sign bit cannot produce an effective perceptual scrambling effect. Since the suffix is encrypted by XORing it with random bits, the random bits spent on the remaining portion of the suffix are effectively wasted. It is therefore fair to say that encryption of the suffix is not as efficient as encryption of only the sign bit in the suffix. From an attacker’s point of view, if a brute-force attack can break the encryption of the sign bit, it also effectively breaks the encryption of the suffix, because the remaining suffix bits can simply be set to the middle of their range. Thus, encryption of the suffix cannot provide a higher security level than encryption of the sign bit.

C. Experimental results for encrypting the suffix without the sign bit under CAVLC

For a suffix of L bits, if the sign bit is left unencrypted, only the other L − 1 bits of the suffix are encrypted. The resulting encrypted video has almost no scrambling effect, as shown in Fig. 7 for the tenth frame of ‘foreman’ with QP = 12. In fact, if the L − 1 bits are set to the middle value of their range, the perceptual quality can be recovered further. Experimental results for the ‘foreman’ sequence at different QP values are listed in Table IV in terms of PSNR and SSIM (Structural SIMilarity) [31]–[33]. SSIM is a recently proposed metric for image quality assessment that has been reported to agree more closely with subjective observation than PSNR [31]–[33]. It is clear that encrypting the remaining portion of the suffix does not produce an effective scrambling effect. This supports the conclusion of the previous subsection: the scrambling effect of suffix encryption actually relies on encryption of the sign bit. Thus, it is not necessary to encrypt the whole suffix; it is better to encrypt only the sign bit of each nonzero coefficient.

Fig. 7. The 10th frame of ‘foreman’: (a) the original; (b) the encrypted version in which the suffix is encrypted excluding the sign bit.

TABLE IV
COMPARISON OF THE PERCEPTUAL QUALITY OF THE RECONSTRUCTED VIDEO AND THE ENCRYPTED VERSION IN WHICH THE SUFFIX IS ENCRYPTED EXCLUDING THE SIGN BIT.

Metric     Version        Comp.  QP=12   QP=18   QP=24   QP=30   QP=36
PSNR (dB)  Encrypted      Y      21.06   28.30   22.62   29.35   36.37
                          U      48.06   46.71   38.98   40.13   38.39
                          V      47.43   50.57   51.42   41.76   38.84
           Reconstructed  Y      49.60   44.37   39.63   35.45   31.39
                          U      49.36   45.58   42.34   40.13   38.39
                          V      50.46   47.42   44.32   41.76   38.84
SSIM       Encrypted             0.956   0.970   0.947   0.938   0.897
           Reconstructed         0.996   0.989   0.973   0.945   0.898

D. Encrypting the sign bits of nonzero coefficients under CAVLC

When the first of the remaining nonzero coefficients is encoded, its value is generally very small and L is initialized to 0. In this case, the value of the prefix equals levelCode, and the bit length used to represent this coefficient, denoted l1, is

$$
l_1 = \text{value of the prefix} + 1 =
\begin{cases}
2|\text{level}| - 1, & \text{if level} > 0 \\
2|\text{level}|, & \text{if level} < 0
\end{cases}
\qquad (2)
$$

After the first nonzero coefficient following the T1s is encoded, L increases to 1 (see Table I). For each of the other remaining coefficients, the bit length, denoted lo, is

$$
l_o = \text{value of the prefix} + 1 + L = \big((|\text{level}| - 1) \gg (L - 1)\big) + 1 + L
\qquad (3)
$$

where L ≥ 1. Clearly, lo depends only on the magnitude of the coefficient: the sign occupies the LSB of levelCode, which is shifted out when the prefix is computed. Thus, randomly flipping the sign bits of the other remaining coefficients does not change the compression ratio.

However, Equation (2) shows that l1 depends on the sign of the coefficient. This means that randomly flipping the sign of this coefficient when L = 0 may slightly affect the compression performance: when a positive coefficient is encoded as a negative one with the same magnitude, l1 increases by 1, and when a negative one is encoded as positive, l1 decreases by 1. Since the residual data is the difference between the original block and its prediction, the sign of a coefficient can be assumed to be approximately uniformly distributed, so these increases and decreases roughly cancel out. In this case, encrypting the sign bit when L = 0 has only a very slight impact on the compression performance.
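The following Python sketch checks Equations (2) and (3) numerically. It computes the coded length of a level directly from Equation (1), with hypothetical helper names and escape codes ignored as in the equations, and confirms that flipping the sign changes the length by exactly one bit when L = 0 and leaves it unchanged when L ≥ 1.

```python
def level_code(level: int) -> int:
    # levelCode as defined in Section II-A.
    return 2 * abs(level) - 1 if level < 0 else 2 * abs(level) - 2


def coded_length(level: int, L: int) -> int:
    """Bits for prefix + suffix under Equation (1), ignoring escape codes."""
    return (level_code(level) >> L) + 1 + L


def l_o(level: int, L: int) -> int:
    """Closed form of Equation (3), valid for L >= 1."""
    return ((abs(level) - 1) >> (L - 1)) + 1 + L


if __name__ == "__main__":
    for mag in (1, 2, 3):
        # L = 0: negative levels cost one bit more (Equation (2)).
        print(mag, coded_length(mag, 0), coded_length(-mag, 0))
        # L >= 1: the sign does not change the length (Equation (3)).
        assert coded_length(mag, 2) == coded_length(-mag, 2) == l_o(mag, 2)
```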

In practical tests, when the sign bits of all nonzero coefficients are encrypted, the compression ratio fluctuates within the range of −0.05% to 0.05%, which is caused by encrypting the sign bits of nonzero coefficients with L = 0. This small fluctuation is negligible. Thus, for implementation convenience, encrypting the sign bits of all nonzero coefficients is recommended, since the value of L then does not need to be checked.

III. ANALYSIS OF ENCRYPTING NONZERO QUANTIZED COEFFICIENTS UNDER CABAC

A. Encoding of DCT coefficients under CABAC

In CABAC, significance map coding is used to specify the positions of nonzero quantized DCT coefficients, unlike CAVLC, where run-length coding of zeros is adopted; the nonzero quantized DCT coefficients are encoded by a binary arithmetic coding (BAC) module with adaptive context models. The generic block diagram of CABAC is shown in Fig. 8; it consists of three elementary stages: binarization, context modeling and binary arithmetic coding.

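[Figure: a syntax element with a non-binary value is binarized into a bin string, while a syntax element with a binary value enters the binary arithmetic coder directly; bins are coded either through the context model and arithmetic coder (each bin value is fed back for context update) or through the bypass path, producing the bitstream.]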

Fig. 8. The generic block diagram of CABAC in H.264/AVC.




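[Figure: a nonzero coefficient is split into its sign bit, which is bypass-coded, and Magnitude − 1, which is binarized by a TU prefix and, if Magnitude > 14, an EG0 suffix; the prefix bins pass through the context model and arithmetic coder, and the suffix bins are bypassed, producing the bitstream.]

Fig. 9. The procedure for encoding a nonzero quantized coefficient under CABAC in H.264/AVC.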

For a nonzero quantized coefficient, the sign bit and the magnitude are coded separately: the sign bit only passes through the bypass branch, whereas the magnitude is first binarized by UEG0 [6] [11], the concatenation of a truncated unary (TU) code and a 0th-order Exp-Golomb (EG0) code, and then coded by BAC. Fig. 9 shows the block diagram of the encoding of a nonzero coefficient. Since zero coefficients are already signalled by the significance map [6] [11], the binarization and coding procedure is applied to the magnitude minus 1. As shown in Fig. 10, the bin string generated by binarizing the magnitude has at most two parts, a prefix and a suffix. The prefix is coded by the TU code and the suffix by the EG0 code. If the magnitude is not greater than 14, it is represented by a prefix only; otherwise, a suffix is concatenated to the prefix. The suffix in turn consists of a sub-prefix of L ‘1’ bits followed by a ‘0’ bit, and a sub-suffix of L bits. L is given by Equation (4), where ⌊·⌋ denotes the largest integer not greater than its argument. The value of the sub-suffix is given by Equation (5) and corresponds to the L LSBs of (Magnitude − 14).

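[Figure: if Magnitude ≤ 14, the bin string consists of Magnitude − 1 ‘1’ bits followed by a ‘0’ (prefix only); if Magnitude > 14, it consists of 14 ‘1’ bits (prefix) followed by a suffix of L ‘1’ bits, a ‘0’ and L sub-suffix bits x…x.]

Fig. 10. Bin string after binarizing the magnitude of a nonzero quantized coefficient under CABAC in H.264/AVC.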

$$
L = \left\lfloor \log_2(x + 1) \right\rfloor, \qquad x = \text{Magnitude} - 15
\qquad (4)
$$

$$
\text{Value of the sub-suffix} = \text{Magnitude} - 14 - 2^{L}
\qquad (5)
$$

From Equation (4), the range of magnitudes for a given L can be calculated. Table V lists the largest magnitude for each L; each range begins just above the bound for the previous L.

TABLE V
THE MAGNITUDE RANGE FOR A GIVEN L.

Magnitude  ≤ 15  ≤ 17  ≤ 21  ≤ 29  ≤ 45  ≤ 77  ≤ 141  ...
L          0     1     2     3     4     5     6      ...
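A minimal Python sketch of this binarization follows, written from the description above. The function names are hypothetical, and only the bin strings are produced; context modelling and arithmetic coding are omitted. The EG0 loop and the closed forms of Equations (4) and (5) yield the same L and sub-suffix, and (x + 1).bit_length() − 1 is used for ⌊log2(x + 1)⌋ to avoid floating-point rounding.

```python
def eg0(x: int) -> str:
    """0th-order Exp-Golomb bin string, as used for the UEG0 suffix."""
    bins = ""
    k = 0
    while x >= (1 << k):            # sub-prefix: one '1' per step
        bins += "1"
        x -= 1 << k
        k += 1
    bins += "0"                      # terminating '0' of the sub-prefix
    if k:
        bins += format(x, f"0{k}b")  # sub-suffix: k = L bins
    return bins


def ueg0_magnitude(magnitude: int) -> str:
    """Bin string for Magnitude - 1: TU prefix (cut-off 14) plus EG0 suffix."""
    if magnitude < 1:
        raise ValueError("magnitude must be >= 1")
    value = magnitude - 1
    if value < 14:
        return "1" * value + "0"
    return "1" * 14 + eg0(magnitude - 15)


def suffix_params(magnitude: int) -> tuple[int, int]:
    """L and sub-suffix value from Equations (4) and (5), for Magnitude > 14."""
    x = magnitude - 15
    L = (x + 1).bit_length() - 1     # floor(log2(x + 1)) without floating point
    return L, magnitude - 14 - (1 << L)


if __name__ == "__main__":
    for m in (1, 14, 15, 16, 17, 18, 141, 142):
        print(m, ueg0_magnitude(m), suffix_params(m) if m > 14 else None)
```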

B. Encryption of nonzero coefficients under CABAC

The method of encrypting the sign bits of nonzero quantized coefficients [14] [15] [16] can also be extended to CABAC in H.264/AVC. Since sign bits are bypass-coded without further processing, encrypting them has no impact on the compression performance. Shahid et al. [21] [22] proposed to encrypt the sign bit and the sub-suffix in the suffix of nonzero coefficients. Since the sub-suffix is simply XORed with a secret bit stream, its length does not change, so this method also maintains the compression ratio. Compared with encrypting only the sign bits, this method encrypts more bits in the sub-suffix, and therefore it appears to be much more secure and could possibly provide a better scrambling effect. However, as indicated previously, the sub-suffix is only the L LSBs of (Magnitude − 14). As with CAVLC, encryption of the sub-suffix is in fact not as efficient as encrypting only the sign bits, and it cannot provide higher security from the point of view of the perceptual scrambling effect.
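The following sketch illustrates both points: XORing the L-bit sub-suffix keeps the magnitude inside the Table V range for the same L, so the bypass-coded bin string keeps its length, but the magnitude can move by at most 2^L − 1. The function names and the integer key_bits argument (standing in for stream-cipher output) are assumptions for illustration only.

```python
def sub_suffix_fields(magnitude: int) -> tuple[int, int]:
    """(L, sub-suffix) from Equations (4) and (5); requires Magnitude > 14."""
    if magnitude <= 14:
        raise ValueError("coefficients with Magnitude <= 14 have no suffix")
    L = (magnitude - 14).bit_length() - 1   # floor(log2(x + 1)) with x = Magnitude - 15
    return L, magnitude - 14 - (1 << L)


def xor_sub_suffix(magnitude: int, key_bits: int) -> int:
    """XOR the L-bit sub-suffix with the low L bits of key_bits (stand-in keystream).

    Magnitudes <= 14 have no sub-suffix and are returned unchanged; applying the
    function twice with the same key_bits restores the original magnitude.
    """
    if magnitude <= 14:
        return magnitude
    L, sub = sub_suffix_fields(magnitude)
    return 14 + (1 << L) + (sub ^ (key_bits & ((1 << L) - 1)))


if __name__ == "__main__":
    for mag in (10, 15, 20, 100):
        enc = xor_sub_suffix(mag, key_bits=0b101101)   # hypothetical keystream bits
        print(mag, enc, sub_suffix_fields(enc)[0] if enc > 14 else None)
```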

TABLE VI
THE PERCENTAGE FOR EACH L IN TERMS OF THE NUMBER OF NONZERO COEFFICIENTS WITH A GIVEN L RELATIVE TO THE TOTAL NUMBER OF NONZERO COEFFICIENTS, FOR THE LUMA COMPONENT IN EACH FRAME.

Frame  Percentage for each L (%)
No.    L=0     L=1     L=2     L=3     L=4     L=5     L=6     L≥7
1      97.24   0.66    0.8     0.71    0.38    0.2     0.02    0
2      99.39   0.18    0.16    0.2     0.07    0       0       0
3      99.94   0.06    0       0       0       0       0       0
4      99.28   0.14    0.24    0.13    0.19    0.01    0       0
5      99.92   0.03    0.06    0       0       0       0       0
6      99.35   0.16    0.22    0.13    0.1     0.03    0       0
7      99.97   0       0.03    0       0       0       0       0
8      99.58   0.09    0.09    0.15    0.09    0       0       0
9      99.91   0       0.05    0.05    0       0       0       0
10     99.33   0.21    0.19    0.23    0.04    0       0       0
11     99.79   0.05    0.12    0.05    0       0       0       0
12     99.54   0.09    0.11    0.18    0.08    0       0       0
13     99.98   0       0.02    0       0       0       0       0
14     99.42   0.23    0.17    0.11    0.07    0       0       0
15     99.79   0.1     0.12    0       0       0       0       0
16     99.36   0.16    0.25    0.19    0.04    0       0       0
17     99.84   0.08    0.06    0.02    0       0       0       0
18     99.92   0.01    0.05    0       0.01    0       0       0
19     99.98   0.02    0       0       0       0       0       0
20     99.89   0.04    0.04    0.03    0       0       0       0
21     100     0       0       0       0       0       0       0
22     99.43   0.1     0.16    0.21    0.1     0       0       0
23     99.9    0.02    0.07    0       0       0       0       0
24     99.36   0.14    0.15    0.21    0.14    0       0       0
25     99.85   0.1     0.05    0       0       0       0       0
26     99.63   0.15    0.09    0.1     0.03    0       0       0
27     99.98   0       0.02    0       0       0       0       0
28     99.95   0       0.04    0.02    0       0       0       0
29     100     0       0       0       0       0       0       0
30     100     0       0       0       0       0       0       0

TABLE VII
THE PERCENTAGE FOR EACH L IN TERMS OF THE NUMBER OF NONZERO COEFFICIENTS WITH A GIVEN L RELATIVE TO THE TOTAL NUMBER OF NONZERO COEFFICIENTS, FOR THE CHROMA COMPONENT IN EACH FRAME.

Frame  Percentage for each L (%)
No.    L=0     L=1     L=2     L=3     L=4     L=5     L≥6
1      99.01   0.31    0.31    0.23    0.11    0.04    0
2      99.58   0.14    0.14    0       0.14    0       0
4      99.76   0       0.12    0       0.12    0       0
14     99.82   0       0.09    0.09    0       0       0
other  100     0       0       0       0       0       0

The statistical distribution of L was studied as follows. Under the main profile with an IBPBP... coding structure, the first 30 frames of ‘foreman’ at QCIF resolution were tested with QP=12 and the 4:2:0 sampling format. For the luma component, Table VI lists, for each frame, the percentage of nonzero coefficients coded with a given L relative to the total number of nonzero coefficients. Most of the nonzero coefficients in the luma component are coded with L = 0: above 97% for the I frame (the first frame) and above 99% for the P and B frames (the frames following the I frame). Of the coefficients with L > 0, most have a very small L. Similarly, Table VII lists the percentage for each L in the chroma component, where again most of the nonzero coefficients are coded with L = 0. Since the percentage of coefficients with L > 0 is so small, encrypting the sub-suffix can be expected to provide no effective scrambling effect.

In addition, for a given L, an attacker can replace the sub-suffix with the middle value of its range under that L. The error between this middle value and the correct sub-suffix is then at most half of the range, which further helps to recover the perceptual quality. As with CAVLC, an attacker can concentrate directly on breaking the encryption of the sign bits without considering the sub-suffix. Thus, encryption of the sub-suffix is believed to be unnecessary.
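The half-range bound can be checked numerically. In the hypothetical helpers below, an attacker reads L from the unencrypted unary part of the suffix and substitutes 2^(L−1) for the sub-suffix (one reasonable choice of "middle value"; the exact choice is an assumption). The worst-case magnitude error over all 2^L possible sub-suffixes is then computed; it equals 2^(L−1), half of the range, and it is 0 for L = 0, which Tables VI and VII show to cover the vast majority of nonzero coefficients.

```python
def mid_value_guess(L: int) -> int:
    """Replacement attack: estimate |level| from L alone, using the middle sub-suffix.

    |level| - 14 = 2**L + sub_suffix with 0 <= sub_suffix < 2**L, and L is
    readable from the unencrypted unary part of the suffix.
    """
    mid = (1 << L) >> 1            # 2**(L - 1) for L >= 1, and 0 for L == 0
    return (1 << L) + mid + 14


def worst_case_error(L: int) -> int:
    """Largest |true - guessed| magnitude error over every possible sub-suffix."""
    guess = mid_value_guess(L)
    return max(abs((1 << L) + sub + 14 - guess) for sub in range(1 << L))


for L in range(6):
    print(L, worst_case_error(L), 1 << L)   # L, worst-case error, range size
```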

C. Experimental results for encrypting the sub-suffix without the sign bit under CABAC

For each nonzero coefficient, only the sub-suffix is encrypted, while the sign bit is left unencrypted (i.e., it remains correct). Fig. 11, the tenth frame of ‘foreman’ with QP=12, shows that this has no scrambling effect. If the sub-suffix is further set to the middle value of its range, the perceptual quality of the encrypted video can be improved even more. Table VIII lists test results for the ‘foreman’ sequence under different QPs, confirming that encrypting the sub-suffix cannot produce effective scrambling. This means that the method proposed by Shahid et al. [21] [22] in fact relies only on the encryption of the sign bits.

(a) (b)

Fig. 11. The 10th frame in ‘foreman’: (a) the original; (b) encryption of the sub-suffix without the sign bit.

TABLE VIII
COMPARISON OF THE PERCEPTUAL QUALITY OF THE RECONSTRUCTED VIDEO AND THE ONE ENCRYPTING THE SUB-SUFFIX WITHOUT THE SIGN BIT UNDER CABAC.

QP                              12      18      24      30      36
PSNR (dB)  Encrypted       Y    23.62   21.88   36.50   35.49   31.41
                           U    52.47   45.65   42.51   40.31   38.45
                           V    46.20   55.54   65.45   41.95   39.02
           Reconstructed   Y    49.50   44.31   39.73   35.49   31.41
                           U    49.25   45.65   42.51   40.31   38.45
                           V    50.37   47.52   44.44   41.95   39.02
SSIM       Encrypted            0.962   0.966   0.972   0.946   0.900
           Reconstructed        0.996   0.989   0.973   0.946   0.900

D. Encrypting sign bits of nonzero coefficients under CABAC

Since the sign bits of nonzero coefficients are coded in the bypass mode under CABAC, randomly flipping these sign bits does not affect the compression ratio. Based on the above analysis, encrypting the sign bits of nonzero coefficients is recommended over the method proposed by Shahid et al. [21] [22] for practical applications.

IV. A TUNABLE SCHEME FOR VIDEO ENCRYPTION IN H.264/AVC

Since video applications range widely, from online video programmes to video surveillance systems and from mobile video players to video conferencing, the scrambling effect required to protect the video stream is likely to vary significantly between scenarios. For some entertainment applications, providing a preview by lightly scrambling the video content is of commercial value, whereas for some sensitive applications a more complete scrambling effect is mandatory. Thus, in practice it is better to provide a parameterizable scrambling method that achieves different scrambling effects by changing its parameters according to the given application. A user can then easily adjust the scrambling effect to the specific application at hand. However, most existing methods [2]–[5] [12]–[22] can only provide a fixed scrambling effect. Although some tunable schemes [25]–[27] [29] [30] claim to provide different scrambling levels, their practical scrambling effect is very limited, and some of them achieve this tunable feature at the cost of reduced compression efficiency and/or reduced video quality. The tunable scheme for MPEG-4 Part 2 proposed by Li et al. [28] can achieve a wide range of scrambling effects. Unlike MPEG-4 Part 2, H.264/AVC introduced spatial intra prediction modes and a more powerful entropy coding method, CABAC, both of which contribute substantially to its improved compression ratio [7]–[10]. In this section, the tunable concept of Li et al. [28] is extended to H.264/AVC by combining the three main selective encryption methods for H.264/AVC. To simplify the implementation of this tunable feature and reduce the computational cost, a simple control mechanism for adjusting the scrambling effect is also proposed. A full investigation of the proposed tunable scheme is carried out in CAVLC under the baseline profile and in CABAC under the main profile, whereas most of the related previous work was tested only in CAVLC under the baseline or main profile.

A. Improved Tunable Encryption Scheme

In the previous two sections, encrypting the sign bits of nonzero coefficients (referred to as SNC) was analyzed in detail. Compared with the two fast selective encryption methods for nonzero coefficients proposed by Shahid et al. [20]–[22], SNC is more efficient while offering equivalent security, and it is recommended for practical applications. Two further important selective encryption methods for H.264/AVC have been reported in the literature: encrypting the intra prediction mode (referred to as IPM), proposed by Ahn et al. [12], and encrypting the sign bits of motion vectors (referred to as EMV), proposed by Shi and Bhargava [14] [15]. These two methods and SNC all maintain format compliance and compression performance, and it has been proposed to combine them into a single scheme for H.264/AVC [18] [19].

Based on MPEG-4 Part 2, Li et al. [28] proposed a tunable encryption scheme with three categories of syntax elements to be encrypted: intra DC coefficients, the sign bits of nonzero coefficients (except intra DC) and the sign bits of motion vectors. These three categories correspond to three different dimensions of the visual quality: the low-resolution spatial view, the high-resolution spatial view and the temporal motion information. The number of elements encrypted in each category is controlled by one control factor, so the resulting perceptual scrambling effect can be adjusted by changing the three control factors.

[Fig. 12 diagram: Macroblock/Frame = Prediction Information (Intra Prediction Modes, Motion Vectors) + Residual Data (Quantized DCT Coefficients); three encryption options, labelled 1 to 3, mapped to the Low-resolution Spatial View, the High-resolution Spatial View and the Temporal Motion Information.]

Fig. 12. The three encryption options for H.264/AVC.

In this paper, it is proposed to extend this concept to H.264/AVC. In H.264/AVC, the video information is represented by three syntax elements: the quantized DCT coefficients of the residual data, the intra prediction modes, and the motion vectors for inter prediction. Taking format compliance and compression performance into account, Fig. 12 shows the encryption options for H.264/AVC that target these syntax elements. The intra prediction modes, the sign bits of nonzero coefficients and the sign bits of motion vectors correspond to three different dimensions in H.264/AVC, namely the low-resolution spatial view, the high-resolution spatial view and the temporal motion information, which can be encrypted by IPM, SNC and EMV, respectively. With three control factors, the procedure is as follows:

1) encrypting the intra prediction mode by IPM with probability pipm;

2) encrypting the sign bits of nonzero coefficients by SNC with probability psnc;

3) encrypting the sign bits of motion vectors by EMV with probability pemv.
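Read literally, "with probability p" means an independent draw per syntax element, as in the probabilistic control of [28] described in Section IV-B. The following Python sketch illustrates this reading only; the ControlFactors and should_encrypt names are hypothetical, and it uses the test r < p with r uniform on [0, 1) (rather than r ≤ p) so that p = 0 and p = 1 behave exactly.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFactors:
    """(pipm, psnc, pemv): one factor per encryption option."""
    pipm: float = 1.0
    psnc: float = 1.0
    pemv: float = 1.0

    def __post_init__(self):
        for name in ("pipm", "psnc", "pemv"):
            p = getattr(self, name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {p}")


def should_encrypt(kind: str, factors: ControlFactors, rng: random.Random) -> bool:
    """Independent per-element draw: encrypt with probability p for this kind."""
    p = {"ipm": factors.pipm, "snc": factors.psnc, "emv": factors.pemv}[kind]
    return rng.random() < p        # rng.random() is uniform on [0, 1)


rng = random.Random(2013)
f = ControlFactors(pipm=1.0, psnc=0.5, pemv=0.0)
picked = sum(should_encrypt("snc", f, rng) for _ in range(10_000))
print(picked)   # expected value 5000
```

Each call consumes one pseudo-random number, which is the per-element cost that the counter-based mechanism of Section IV-B avoids.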

The perceptual scrambling effect of each control factor is expected to be as follows:

• pipm = 1 → 0 (psnc = 0 and pemv = 0): the low-resolution spatial perceptual quality changes from “roughly imperceptible” to “very perceptible”, but motion information can be observed;

• psnc = 1 → 0 (pipm = 0 and pemv = 0): the high-resolution spatial perceptual quality changes from “almost imperceptible” to “very perceptible”, but motion information can be observed;

• pemv = 1 → 0 (pipm = 0 and psnc = 0): the motion information changes from “almost imperceptible” to “very perceptible”, but the video content and quality remain clear in the first few frames, because the first frame is an I frame and the spatial perceptual quality is not affected.

Here, pipm = 1 → 0 indicates that pipm decreases from 1 to 0 (both inclusive); likewise for psnc and pemv.

Although changing both pipm and psnc could adjust the

spatial perceptual quality, it is not recommended to control

them by using just one control factor because each corresponds

to a different syntax element and will influence the perceptual

quality differently. By separately controlling these two factors,

many more levels of scrambling effects can be achieved.

The choice of values for the control factors is also an important issue, which was analyzed in [28]. A very small value for a control factor means that only a small number of the related syntax elements are encrypted, which would enable an attacker to guess the encrypted syntax elements one by one. The analysis in [28] showed that a control factor greater than 0.09 ensures sufficient complexity to prevent such an attack.

B. Implementation Issues

In the work by Li et al. [28], probabilistic quality control with a decimal factor p is realized by generating a pseudo-random decimal r ∈ [0, 1] and comparing it with p: the current syntax element is encrypted only when r ≤ p. In fact, this random number generation is not necessary and can be skipped to reduce the computational cost. Alattar et al. [26] proposed encrypting every N-th I macroblock and/or the header of every N-th predicted macroblock for secure transmission of MPEG-1 video streams. This method can be extended to carry out probabilistic quality control in a simplified fashion. The proposed process is as follows:

1) The control factor p can only take the value 0 or 1/N, where N is an integer greater than zero;

2) If p = 0, no encryption is carried out;

3) If p = 1/N, every N-th related syntax element is encrypted. In particular, when N = 1, p = 1 means that all related syntax elements are encrypted.
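A minimal sketch of this counter-based rule in Python is given below. It assumes that the "N-th" element is counted from the first related syntax element in coding order (the starting point is not specified above), and the EveryNthSelector name is hypothetical. Fractions are used so that 1/N is represented exactly. Note that the decoder must run an identical selector over the same element order for decryption to stay in step.

```python
from fractions import Fraction
from typing import Optional


class EveryNthSelector:
    """Counter-based selection for a control factor p = 0 or p = 1/N."""

    def __init__(self, p):
        p = Fraction(p)
        self.n: Optional[int]
        if p == 0:
            self.n = None
        elif p > 0 and p.numerator == 1:
            self.n = p.denominator
        else:
            raise ValueError("p must be 0 or 1/N for a positive integer N")
        self.count = 0

    def select(self) -> bool:
        """Call once per related syntax element; True means 'encrypt it'."""
        if self.n is None:
            return False
        self.count += 1
        if self.count == self.n:     # the N-th, 2N-th, ... element
            self.count = 0
            return True
        return False


# N in {1, 2, 4, 8} gives p >= 0.125, above the 0.09 lower bound reported in [28].
for n in (1, 2, 4, 8):
    assert Fraction(1, n) > Fraction(9, 100)

sel = EveryNthSelector(Fraction(1, 4))
print([i for i in range(16) if sel.select()])   # indices 3, 7, 11, 15
```

Compared with the per-element random draw, each decision here costs one increment and one comparison.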

For simplicity, and to meet the lower bound on the control factor, N is chosen from 1, 2, 4 and 8 in the experiments (so that p ≥ 0.125 > 0.09). The resulting scrambling effect is not linear in the control factor, although a higher control factor generally produces a much stronger degradation of the perceptual quality. It is also unnecessary to adjust the scrambling effect in small steps, since the human visual system is not sensitive to such small changes.

For the encryption operation, a stream cipher can be used

to generate a random bit sequence, which is XORed with the

bits of the related syntax elements.
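The XOR step for sign bits can be sketched in Python as follows. The experiments use the Rabbit stream cipher [23], which is not in the Python standard library, so this sketch substitutes SHAKE-128 over a key and nonce purely as a stand-in keystream; the function names and the key/nonce layout are hypothetical. A keystream bit of 1 flips the sign of the corresponding nonzero value, magnitudes and zeros are untouched, and applying the same call again decrypts. Whether the coded length is preserved depends on the entropy coder: sign bits are bypass-coded in CABAC, whereas the text reports a small fluctuation for SNC in CAVLC.

```python
import hashlib


def keystream_bits(key: bytes, nonce: bytes, n_bits: int) -> list:
    """Stand-in keystream from SHAKE-128(key || nonce); NOT the Rabbit cipher."""
    data = hashlib.shake_128(key + nonce).digest((n_bits + 7) // 8)
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(n_bits)]


def xor_sign_bits(values: list, key: bytes, nonce: bytes) -> list:
    """XOR the sign bit of every nonzero value with one keystream bit.

    A keystream bit of 1 flips the sign; magnitudes and zeros are unchanged.
    Because XOR is its own inverse, the same call decrypts.
    """
    nonzero = [i for i, v in enumerate(values) if v != 0]
    bits = keystream_bits(key, nonce, len(nonzero))
    out = list(values)
    for i, b in zip(nonzero, bits):
        if b:
            out[i] = -out[i]
    return out


coeffs = [7, -3, 0, 1, 0, -1, 2, 0]        # toy quantized coefficients
enc = xor_sign_bits(coeffs, b"demo key", b"frame-0")
assert [abs(v) for v in enc] == [abs(v) for v in coeffs]
assert xor_sign_bits(enc, b"demo key", b"frame-0") == coeffs
```

The same routine applies unchanged to the sign bits of motion vector differences for EMV, combined with a selector such as the one above to honour the control factor.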

C. Experimental results

Experiments were set up based on the reference software of

H.264/AVC, JM17.2 [34]. A stream cipher, Rabbit [23], was

used to produce the random bit sequence.

[Fig. 13 plots: SSIM (0 to 1) versus frame number (0 to 30), one curve per control-factor value (0.125, 0.25, 0.5 and 1). Panels: (a) foreman encrypted by IPM; (b) foreman encrypted by SNC; (c) foreman encrypted by EMV; (d) coastguard encrypted by IPM; (e) coastguard encrypted by SNC; (f) coastguard encrypted by EMV; (g) mobile encrypted by IPM; (h) mobile encrypted by SNC; (i) mobile encrypted by EMV.]

Fig. 13. The tunable scrambling effect of IPM, SNC and EMV when adjusting the control factors. The three video sequences are tested in CAVLC under the baseline profile with QP=18.

[Fig. 14 plots: SSIM (0 to 1) versus frame number (0 to 30), panels (a) to (i).]

Fig. 14. The tunable scrambling effect of IPM, SNC and EMV when adjusting the control factors. The three video sequences are tested in CABAC under the main profile with QP=18.

[Fig. 15 plots: average SSIM (0 to 1) versus QP (12, 18, 24, 30, 36), nine panels.]

Fig. 15. The tunable scrambling effect of IPM, SNC and EMV for different QP values in CAVLC under the baseline profile, when adjusting the control factors. The SSIM values of 30 frames are averaged for each test.

[Fig. 16 plots: average SSIM (0 to 1) versus QP (12, 18, 24, 30, 36), nine panels.]

Fig. 16. The tunable scrambling effect of IPM, SNC and EMV for different QP values in CABAC under the main profile, when adjusting the control factors. The SSIM values of 30 frames are averaged for each test.

1) The scrambling effect: To investigate the scrambling effect, three test video sequences (‘foreman’, ‘coastguard’ and ‘mobile’) were chosen from a standard video library [35]. Experiments were carried out on these three sequences under the baseline profile with CAVLC as the entropy coding method and under the main profile with CABAC as the entropy coding method. For each sequence, the first 30 frames were encrypted and then decoded without decryption. The coding structure is IPP...P in the baseline profile and IBPBP...BP in the main profile. To evaluate the scrambling effect, only SSIM (Structural SIMilarity) [31] is used in this section since, as previously mentioned, it reflects subjective observation much better than PSNR [32].
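For reference, the SSIM index of [31] between two co-located image windows x and y combines luminance, contrast and structure comparisons, and the per-frame value is the mean over the M local windows. Here μ and σ are the local means, standard deviations and covariance, and D is the dynamic range of the pixel values (255 for 8-bit video); D is written in place of the L used in [31] to avoid a clash with the sub-suffix length L above. The exact windowing used in these experiments (e.g., the 11×11 Gaussian window suggested in [31]) is an assumption, as it is not stated here.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 D)^2,\quad C_2 = (K_2 D)^2,\quad K_1 = 0.01,\quad K_2 = 0.03

\mathrm{MSSIM}(X, Y) = \frac{1}{M} \sum_{j=1}^{M} \mathrm{SSIM}(x_j, y_j)
```

SSIM equals 1 only for identical frames, so lower values in Figs. 13 to 16 indicate stronger scrambling.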

In Fig. 13, the video sequences at QCIF resolution are encoded in CAVLC under the baseline profile with QP=18. The scrambling effect of IPM, SNC and EMV can clearly be adjusted effectively by changing the corresponding control factor. The relationship between each control factor and the scrambling effect is nonlinear and may differ between test sequences. Generally, a higher control factor produces a much stronger scrambling effect, and vice versa. On the other hand, a smaller control factor means that fewer related syntax elements are encrypted, so the corresponding computational cost is lower, and vice versa. In Fig. 14, the same sequences are encoded in CABAC under the main profile with QP=18, and the same conclusion can be drawn as from Fig. 13.

Figs. 15 and 16 show further experimental results for the scrambling effect under different QP values. Generally, a high QP means a high compression ratio and relatively low video quality. Here, for each test, the SSIM values of the 30 frames are averaged to represent the scrambling effect. Under all QP settings, the above conclusion is confirmed: a higher control factor produces a much stronger scrambling effect, and vice versa.

By combining these three methods and adjusting the three control factors separately, the tunable scheme can provide various scrambling levels. According to the user’s requirement, the three control factors can be adjusted to reach a compromise between the computational cost and the scrambling effect. The scrambling effect is strongest when (pipm, psnc, pemv) = (1, 1, 1). Fig. 17 presents the 15th frame of the ‘foreman’ sequence, encoded in CABAC under the main profile, for different settings of the control factors. Similar results are obtained when the sequence is encoded in CAVLC under the baseline profile.

Among the three encryption methods, SNC contributes much more to the scrambling effect than the other two when the three control factors are set to the same value, as can be concluded by comparing the three columns of Figs. 13, 14, 15 and 16. This is because the nonzero coefficients represent the high-resolution spatial information of the video and occupy a very high percentage of the video stream. IPM only affects I macroblocks, most of which are located in the I frame, so its scrambling effect is weaker than that of SNC. To give a perceptual sense, Fig. 17(e) and (h) show one frame of ‘foreman’ encrypted by SNC and by IPM, respectively. Combining both yields a more heavily scrambled frame, as shown in Fig. 17(c). However, SNC and IPM can only scramble the texture of the video and cannot encrypt the motion information in a sequence. EMV encrypts only the motion information and generally affects only the moving objects in the video, as shown in Fig. 17(i), where most of the static background is preserved while the face (the moving object) is scrambled. From Fig. 13(c), (f) and (i) and Fig. 14(c), (f) and (i), when only EMV is applied, the first few frames are left clear and of good perceptual quality. Thus, EMV should be combined with one or both of the other two methods when a relatively strong scrambling effect is required.

(a) (1, 1, 1) (b) (1, 0.5, 0.5)

(c) (1, 1, 0) (d) (0.5, 0.5, 0.125)

(e) (0, 1, 0) (f) (0.5, 0.25, 0)

(g) (0.25, 0.125, 0) (h) (1, 0, 0)

(i) (0, 0, 1) (j) (0.125, 0, 0)

Fig. 17. The 15th frame in ‘foreman’ encrypted with different control factors (pipm, psnc, pemv).

2) Compression ratio: In the above tests, when IPM is used in CABAC under the main profile, the compression ratio is slightly affected, mostly within the range of −1% to 1%, and reaching −2% or 2% only occasionally, when QP=36. This small change is due to the variable length coding used for the intra prediction mode in CABAC. When SNC is applied in CAVLC under the baseline profile, there is also a small fluctuation in the compression ratio, in the range of −0.05% to 0.05%. Under the other test conditions, the compression ratio is unaffected.

3) Computational cost: For each of the three techniques in the proposed scheme, the computational cost for different values of the control factor is evaluated under the baseline and main profiles. In the proposed scheme, the encryption procedure is almost the same as the decryption procedure. Since decoding is much faster than encoding, the computational time incurred by the encryption/decryption procedure is compared only with the normal decoding time, which gives the more conservative (larger) relative overhead. The tests were run on a PC with an Intel Xeon CPU @ 3.20 GHz and 3 GB RAM. Tables IX and X present the computational overhead times for the first 30 frames of the three test sequences with QP=18, together with their values in per mille (‰) relative to the normal decoding time. Here, the second column, N, is the reciprocal of the corresponding control factor. Whichever profile is used, the computational time incurred by encryption/decryption is minimal and the scheme can meet real-time processing requirements. Among the three techniques, SNC incurs a much larger time overhead than the other two. Thus, if faster encryption/decryption is required, combining IPM and EMV without SNC would be a possible choice for further reducing the computational cost.
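The per-mille columns of Tables IX and X are simply the overhead divided by the normal decoding time, times 1000. The hypothetical helpers below make this explicit and back-calculate the decoding time implied by one table row, assuming that all three columns of a row share the same decoding time. Small discrepancies in the third decimal place are expected, because the tabulated times are themselves rounded.

```python
def permille(overhead_ms: float, decode_ms: float) -> float:
    """Overhead in parts per thousand (‰) of the normal decoding time."""
    return 1000.0 * overhead_ms / decode_ms


def implied_decode_ms(overhead_ms: float, permille_value: float) -> float:
    """Back-calculate the decoding time from a (time, ‰) pair in Table IX or X."""
    return 1000.0 * overhead_ms / permille_value


# 'foreman', CAVLC, N = 1 (Table IX): SNC costs 2.788 ms, listed as 6.128 ‰
decode_ms = implied_decode_ms(2.788, 6.128)
print(f"implied decoding time: {decode_ms:.1f} ms")
print(f"EMV in the same row: {permille(0.405, decode_ms):.3f} ‰")
```

The implied decoding time for the first 30 frames of ‘foreman’ is therefore roughly 455 ms, against which even the largest SNC overhead in this row is only a few milliseconds.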

TABLE IX
COMPUTATIONAL COSTS OF IPM, SNC AND EMV FOR DIFFERENT VALUES OF THE CONTROL FACTOR UNDER THE BASELINE PROFILE WITH CAVLC AS THE ENTROPY CODING METHOD AND QP=18.

Video        N    Time cost (ms)              Per thousand (‰)
                  IPM     SNC      EMV        IPM     SNC      EMV
foreman      1    0.089   2.788    0.405      0.196   6.128    0.891
             2    0.049   1.659    0.239      0.109   3.646    0.525
             4    0.030   1.187    0.169      0.066   2.609    0.372
             8    0.021   0.972    0.135      0.047   2.136    0.296
coastguard   1    0.086   6.803    0.255      0.178   14.084   0.528
             2    0.047   4.048    0.150      0.098   8.381    0.311
             4    0.029   2.897    0.106      0.060   5.997    0.220
             8    0.021   2.372    0.085      0.043   4.910    0.175
mobile       1    0.087   11.914   0.368      0.146   19.989   0.617
             2    0.048   7.089    0.217      0.080   11.895   0.364
             4    0.029   5.073    0.153      0.049   8.511    0.257
             8    0.021   4.154    0.122      0.035   6.969    0.205

TABLE X
COMPUTATIONAL COSTS OF IPM, SNC AND EMV FOR DIFFERENT VALUES OF THE CONTROL FACTOR UNDER THE MAIN PROFILE WITH CABAC AS THE ENTROPY CODING METHOD AND QP=18.

Video        N    Time cost (ms)              Per thousand (‰)
                  IPM     SNC      EMV        IPM     SNC      EMV
foreman      1    0.103   2.499    0.400      0.225   5.492    0.880
             2    0.059   1.471    0.244      0.129   3.233    0.535
             4    0.035   1.052    0.172      0.077   2.312    0.377
             8    0.025   0.866    0.133      0.055   1.903    0.293
coastguard   1    0.088   5.889    0.253      0.181   12.192   0.523
             2    0.048   3.511    0.146      0.099   7.269    0.302
             4    0.029   2.507    0.107      0.061   5.191    0.221
             8    0.021   2.062    0.081      0.043   4.268    0.168
mobile       1    0.087   10.149   0.320      0.146   17.029   0.537
             2    0.048   6.052    0.185      0.080   10.155   0.310
             4    0.029   4.326    0.134      0.049   7.258    0.225
             8    0.021   3.548    0.107      0.035   5.952    0.180

4) Security: From a cryptographic viewpoint, the security

of the proposed scheme relies on the adopted stream cipher,

Rabbit, which to date has no known weaknesses [24].

Video encryption is also vulnerable to other attacks, such as the replacement attack [36], in which the encrypted syntax elements are set to fixed values to improve the video quality. Whether this attack succeeds, however, often depends on the application. For example, in an entertainment scenario that provides a video preview, only a light scrambling effect is required. In such a scenario, even though the replacement attack can improve the quality of the scrambled video, a high-quality copy still cannot be obtained, so the attack is unsuccessful for such applications. However, for sensitive video content, which often requires a strong scrambling effect, the replacement attack may reveal some useful information. In the proposed scheme, when all three control factors are 1, the scheme becomes equivalent to the scheme in [19], which has been verified to be secure against the replacement attack [19]. Thus, for such applications, it is suggested that all three control factors be set to 1 to achieve the strongest scrambling effect. As shown in Fig. 18(a), applying the replacement attack to the ‘foreman’ sequence in the same setting as the test in Fig. 17(a) improves the video quality to some extent, but the video remains unintelligible, as shown by the 15th frame of the sequence in Fig. 18(b). Here, the replacement attack is conducted by setting all intra prediction modes to the predicted mode and all sign bits of nonzero coefficients and motion vectors to positive values.
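The sign-bit part of this replacement attack can be sketched as follows (the function names and the toy data are hypothetical). Forcing every sign to positive discards the ciphertext signs entirely, so the resulting sign error rate equals the fraction of negative nonzero values in the original, regardless of the key; if positive and negative values are roughly equally likely, about half of the signs remain wrong, which is consistent with the frame in Fig. 18(b) staying unintelligible.

```python
def replacement_attack(encrypted: list) -> list:
    """Replacement attack on SNC/EMV: force every sign to positive."""
    return [abs(v) for v in encrypted]


def sign_error_rate(original: list, estimate: list) -> float:
    """Fraction of nonzero values whose sign the estimate gets wrong."""
    pairs = [(o, e) for o, e in zip(original, estimate) if o != 0]
    if not pairs:
        return 0.0
    return sum((o > 0) != (e > 0) for o, e in pairs) / len(pairs)


original = [3, -1, 0, 2, -5, 4, -2, 0]                      # toy coefficients
attacked = replacement_attack([-3, 1, 0, 2, 5, -4, -2, 0])  # signs scrambled
print(sign_error_rate(original, attacked))                  # 0.5
```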

[Fig. 18 panels: (a) SSIM of ‘foreman’ (SSIM 0 to 1 versus frame number 0 to 30; curves: encrypted by (1,1,1), and under the replacement attack); (b) the 15th frame.]

Fig. 18. The replacement attack on the ‘foreman’ sequence encrypted by (1,1,1) under the main profile in CABAC with QP=18.

V. CONCLUSIONS AND FUTURE WORK

In this paper, two fast selective encryption methods for CAVLC and CABAC in H.264/AVC are analyzed. These two methods can be regarded as extensions of sign-bit encryption of nonzero transformed coefficients: for CAVLC, the suffix of each nonzero coefficient is encrypted by XORing it with a random bit sequence of the same length as the suffix; for CABAC, the sub-suffix in the suffix of each nonzero coefficient, together with its sign bit, is encrypted by an XOR operation. Although these two methods encrypt many more bits and seem to be more secure, it is demonstrated that they are in fact not as efficient as encrypting only the sign bits of nonzero coefficients. Experimental results indicate that without encrypting the sign bits, these two methods cannot provide a perceptual scrambling effect. Thus, these two methods cannot enhance the security as claimed. Based on this analysis, encrypting the sign bits is recommended as much more suitable for practical applications.

For practical applications, providing different scrambling effects can be convenient for meeting the requirements of different users. Thus, a tunable encryption scheme is proposed that encrypts intra prediction modes and the sign bits of motion vectors together with the sign bits of nonzero coefficients. To simplify the implementation, a simple control mechanism is adopted: when a control factor is set to 1/N (N is an integer greater than 0), every N-th corresponding syntax element is chosen to be encrypted. The perceptual scrambling effect can be adjusted to the user’s requirement by changing the three control factors. Experimental results show that the proposed scheme provides this tunable feature with no or minimal impact on the compression ratio. The computational costs of the proposed scheme for different values of the control factor, evaluated under the baseline and main profiles, show that it can meet real-time processing requirements. A security analysis of the proposed scheme is also given. For sensitive video content, it is suggested that all three control factors be set to 1 to achieve the strongest scrambling effect, which is secure against the replacement attack.

Future work aims at designing an automatically tunable

scheme which can automatically adjust the three control

factors to achieve an expected psychovisual or security cri-

terion. Other possible work would be to further investigate

the security of existing selective video encryption methods

using some new types of attack derived from advanced image

and video processing techniques. Applying existing video

encryption techniques for privacy region protection in video

surveillance systems is another research focus [37] [38].

REFERENCES

[1] T. Lookabaugh and D. Sicker, “Selective Encryption for ConsumerApplications,” IEEE Communications Magazine, vol. 42, no. 5, pp. 124–129, 2004.

[2] A. Massoudi, F. Lefebvre, C. De Vleeschouwer, B. Macq, andJ. Quisquater, “Overview on selective encryption of image and video:challenges and perspectives,” EURASIP J. Info. Secur., vol. 2008, pp.1–18, 2008.

[3] T. Stutz and A. Uhl, “Survey of H.264 AVC/SVC encryption,” IEEE

Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 325–339, 2011.[4] J. Wen, M. Severa, W. Zeng, M. Luttrell, and W. Jin, “A Format-

compliant Configurable Encryption Framework for Access Control ofMultimedia,” in IEEE 4th Workshop on Multimedia Signal Processing,

2001, pp. 435–440.

[5] ——, “A Format-compliant Configurable Encryption Framework forAccess Control of Video,” IEEE Trans. Circuits Syst. Video Tech.,vol. 12, no. 6, pp. 545–557, 2002.

[6] ITU-T. Rec.(ISO/IEC 14496-10):2010, “Advanced Video Coding forGeneric Audio-visual Services,”.

[7] I. Richardson, The H.264 Advanced Video Compression Standard. Wi-ley, 2010.

[8] Y. Shi and H. Sun, Image and Video Compression for Multimedia

Engineering: Fundamentals, Algorithms, and Standards. CRC Pr ILlc, 2000.

[9] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, “Overview ofThe H.264/AVC Video Coding Standard,” IEEE Trans. Circuits Syst.

Video Technol., vol. 13, no. 7, pp. 560–576, 2003.[10] A. Puri, X. Chen, and A. Luthra, “Video Coding Using The

H.264/MPEG-4 AVC Compression Standard,” Signal processing: Image

communication, vol. 19, no. 9, pp. 793–849, 2004.[11] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based Adaptive Binary

Arithmetic Coding in The H.264/AVC Video Compression Standard,”IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620–636,2003.

[12] J. Ahn, H. Shim, B. Jeon, and I. Choi, “Digital Video ScramblingMethod Using Intra Prediction Mode,” PCM 2004, Springer, LNCS, vol.3333, pp. 386–393.

[13] S. Kwon, W. Choi, and B. Jeon, “Digital Video Scrambling UsingMotion Vector and Slice Relocation,” in Proc. 2nd Int. Conf. Image

Analysis and Recognition, Springer, LNCS, 2005, vol. 3656, pp. 207–214.

[14] C. Shi and B. Bhargava, “An Efficient MPEG Video Encryption Algo-rithm,” in Proc. 17th IEEE Symp. Reliable Distributed Systems, 1998,pp. 381–386.

[15] B. Bhargava, C. Shi, and S. Wang, “MPEG Video Encryption Algo-rithms,” Multimedia Tools and Applications, vol. 24, no. 1, pp. 57–79,2004.

[16] C. Shi and B. Bhargava, “A Fast MPEG Video Encryption Algorithm,”in Proc. 6th ACM Int. Conf. Multimedia, 1998, pp. 81–88.

[17] W. Zeng and S. Lei, “Efficient Frequency Domain Selective Scramblingof Digital Video,” IEEE Trans. Multimedia, vol. 5, no. 1, pp. 118–129,2003.

[18] S. Lian, Z. Liu, Z. Ren, and Z. Wang, “Selective Video EncryptionBased on Advanced Video Coding,” PCM 2005, Springer, LNCS, vol.3768, pp. 281–290.

[19] S. Lian, Z. Liu, Z. Ren, and H. Wang, “Secure Advanced Video CodingBased on Selective Encryption Algorithms,” IEEE Trans. Consumer

Electron., vol. 52, no. 2, pp. 621–629, 2006.[20] Z. Shahid, M. Chaumont, and W. Puech et al., “Fast Protection of

H.264/AVC by Selective Encryption,” in Proc. SinFra IPAL Symp., 2009,pp. 11-21.

[21] ——, “Fast Protection of H.264/AVC by Selective Encryption ofCAVLC and CABAC for I & P Frames,” IEEE Trans. Circuits Syst.

Video Technol., no. 99, pp. 565–576, 2011.[22] ——, “Fast Protection of H.264/AVC by Selective Encryption of

CABAC,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2009, pp.1038–1041.

[23] M. Boesgaard, M. Vesterager, T. Christensen, and E. Zenner, “TheStream Cipher Rabbit,” ECRYPT Stream Cipher Project Report, vol. 6,2005.

[24] S. Babbage, C. Canniere, A. Canteaut, C. Cid, H. Gilbert, T. Johansson,M. Parker, B. Preneel, V. Rijmen, and M. Robshaw, “The eSTREAMPortfolio,” eSTREAM, ECRYPT Stream Cipher Project, 2008.

[25] J. Meyer and F. Gadegast, “Security Mechanisms for Multimedia Data with The Example MPEG-1 Video,” Project Description of SECMPEG, Technical University of Berlin, Germany, 1995.

[26] A. Alattar, G. Al-Regib, and S. Al-Semari, “Improved Selective Encryption Techniques for Secure Transmission of MPEG Video Bit-streams,” in Proc. ICIP, 1999, pp. 256–260.

[27] M. Pazarci and V. Dipcin, “A MPEG2-transparent Scrambling Technique,” IEEE Trans. Consumer Electron., vol. 48, no. 2, pp. 345–355, 2002.

[28] S. Li, G. Chen, A. Cheung, B. Bhargava, and K. Lo, “On The Design of Perceptual MPEG-video Encryption Algorithms,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 214–223, 2007.

[29] C. Wang, H. Yu, and M. Zheng, “A DCT-based MPEG-2 Transparent Scrambling Algorithm,” IEEE Trans. Consumer Electron., vol. 49, no. 4, pp. 1208–1213, 2003.

[30] G. Hong, C. Yuan, Y. Wang, and Y. Zhong, “A Quality-controllable Encryption for H.264/AVC Video Coding,” PCM 2006, Springer, LNCS, vol. 4261, pp. 510–517.



[31] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.

[32] Z. Wang and A. Bovik, “Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, 2009.

[33] A. Bovik, The Essential Guide to Image Processing. Academic Press, 2009.

[34] JM reference software, ver. 17.2. [Online]. Available: http://iphome.hhi.de/suehring/tml, 2011.

[35] Video trace library. [Online]. Available: http://trace.eas.asu.edu/yuv/index.html, 2011.

[36] M. Podesser, H. Schmidt, and A. Uhl, “Selective Bitplane Encryption for Secure Transmission of Image Data in Mobile Environments,” in Proc. 5th IEEE Nordic Signal Process. Symp., 2002, pp. 4–6.

[37] F. Dufaux and T. Ebrahimi, “Scrambling for Privacy Protection in Video Surveillance Systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 8, pp. 1168–1174, 2008.

[38] F. Dai, L. Tong, Y. Zhang, and J. Li, “Restricted H.264/AVC Video Coding for Privacy Protected Video Scrambling,” J. Visual Commun. Image Represent., vol. 22, no. 6, pp. 479–490, 2011.
