Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
A Tunable Encryption Scheme and Analysis of Fast
Selective Encryption for CAVLC and CABAC in
H.264/AVC
Yongsheng Wang, Student Member, IEEE, Maire O’Neill, Senior Member, IEEE,
and Fatih Kurugollu, Senior Member, IEEE
Abstract—Recently, two fast selective encryption methods for CAVLC and CABAC in H.264/AVC were proposed by Shahid et al. In this work, it is demonstrated that these two methods are not as efficient as only encrypting the sign bits of nonzero coefficients. Experimental results show that without encrypting the sign bits of nonzero coefficients, these two methods cannot provide a perceptual scrambling effect. If a much stronger scrambling effect is required, intra prediction modes and the sign bits of motion vectors can be encrypted together with the sign bits of nonzero coefficients. For practical applications, the required encryption scheme should be customized according to a user’s specified requirement on the perceptual scrambling effect and the computational cost. Thus, a tunable encryption scheme combining these three methods is proposed for H.264/AVC. To simplify its implementation and reduce the computational cost, a simple control mechanism is proposed to adjust the control factors. Experimental results show that this scheme can provide different scrambling levels by adjusting three control factors with no or very little impact on the compression performance. The proposed scheme can run in real-time and its computational cost is minimal. The security of the proposed scheme is also discussed. It is secure against the replacement attack when all three control factors are set to 1.
Index Terms—Selective encryption, sign bits, nonzero coefficients, intra prediction mode, motion vectors, tunable, H.264/AVC.
I. INTRODUCTION
IN the past decade, with the rapid development of digital
video and network technology, a series of video compres-
sion standards have been proposed to meet the increasing
requirements of video applications. At the same time, a sig-
nificant amount of research has been carried out on protecting
the video stream. Selective encryption is one of the most
promising techniques for practical applications [1]–[3], as it
can meet real-time processing requirements and provide an
effective perceptual scrambling effect with no or very minimal
impact on the compression performance. Using some selective
encryption methods, the encrypted video stream will remain
format-compliant [4] [5] to a general decoder, which preserves
the original functionality of the video stream, like transcoding,
network friendliness, error resiliency, and so on.
Manuscript received September 27, 2012; revised January 11, 2013; accepted January 16, 2013. This research was supported by the Queen’s University, Belfast. This paper was recommended by Associate Editor M. Barni.
The authors are with the Centre for Secure Information Technologies, Queen’s University, Belfast, BT3 9DT UK (e-mail: [email protected], [email protected], [email protected])
The latest video compression standard, H.264/AVC (Part
10 of MPEG-4), was collaboratively issued by the joint video
team (JVT) of ISO/IEC MPEG and ITU-T VCEG in 2003,
and was recently updated by JVT [6]–[8]. From the point
of view of compression performance, it has been reported
that H.264/AVC can significantly outperform previous stan-
dards [7]–[10]. H.264/AVC offers two entropy coding methods
to encode nonzero quantized transform coefficients, CAVLC
(Context-Adaptive Variable Length Coding) and CABAC
(Context-Adaptive Binary Arithmetic Coding). At the price of
additional computational cost, CABAC generally offers an
average bitrate saving of 6% to 10% [10] [11].
Several selective encryption methods for H.264/AVC have
been proposed in the literature. Ahn et al. [12] first proposed
the encryption of the intra prediction mode in H.264/AVC.
This method is format compliant. For CAVLC, it has no effect
on the compression ratio since it does not change the bit length
that represents the intra prediction mode; for CABAC, the
compression ratio is only very slightly affected. Kwon et al.
[13] proposed to permute motion vectors and the slice data;
however, this method incurs a longer delay when encoding
since the relocation happens within the range of a slice. In
the context of MPEG-1, Shi and Bhargava [14]–[16] first
proposed the encryption of the sign bits of nonzero quantized
DCT coefficients. They also indicated that this method can
be combined with the encryption of the sign bits of motion
vectors. Their method can effectively degrade the perceptual
quality without impacting the compression performance. Zeng
and Lei [17] proposed to integrate this method into a scheme
for MPEG-4, with block shuffling and rotation of the trans-
form coefficients and motion vectors. But the latter operation
decreases the compression ratio and leads to a longer delay
and higher computational complexity when encoding. Lian
[18] [19] extended the encryption of the sign bits of nonzero
coefficients into a scheme for H.264/AVC, which included the
encryption of intra DC coefficients, the sign bits of intra AC
coefficients, intra prediction modes and the sign bits of motion
vectors. Shahid et al. [20] [21] proposed to encrypt nonzero
quantized coefficients by using a permutation of codes with
the same suffix length for CAVLC in H.264/AVC. In [21]
[22], they also proposed a similar scheme for CABAC, which
encrypted the sub-suffix in the suffix of nonzero coefficients
and their sign bits. Compared with only encrypting the sign
bits of nonzero coefficients, these two methods encrypt much
more bits, which seems to provide a better scrambling effect
and higher security level. However, as will be shown in this
paper, the perceptual scrambling effect of these two methods
is mainly produced by encrypting the sign bits of nonzero
coefficients, and they cannot provide a higher security level
and are not as efficient as only encrypting the sign bits of
nonzero coefficients.
The permutation used in the methods proposed by Shahid
et al. [20]–[22] is equivalent to XORing the plaintext (here
the plaintext is the chosen bitstring) with a random bit string
of the same length as the plaintext. Often, a random bit
sequence is generated by a stream cipher and then XORed
with the selective portion of video information. Since this
operation only includes an XOR operation, the corresponding
computational cost is relatively low. This makes the use of a
stream cipher very suitable for video encryption. In this paper,
a stream cipher, Rabbit [23], which was developed for the
ECRYPT Stream Cipher Project, is adopted to generate the
random bit sequence. It is one of the Profile 1 finalists which
are suitable for software implementation. To date, there has
been no effective attack against this cipher [24].
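The XOR step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keystream is drawn from SHAKE-128 in the standard `hashlib` module purely as a stand-in for Rabbit, and the names `keystream_bits`, `xor_bits` and the example bit list are hypothetical.

```python
import hashlib

def keystream_bits(key: bytes, n: int) -> list:
    """Return n pseudo-random bits. Stand-in generator (SHAKE-128), NOT Rabbit."""
    raw = hashlib.shake_128(key).digest((n + 7) // 8)
    return [(raw[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]

def xor_bits(bits: list, stream: list) -> list:
    """XOR each selected bit with the matching keystream bit."""
    return [b ^ k for b, k in zip(bits, stream)]

# Hypothetical example: eight bits selected from the video stream (e.g. sign bits).
selected = [1, 0, 0, 1, 1, 0, 1, 0]
ks = keystream_bits(b"demo key", len(selected))
cipher = xor_bits(selected, ks)
recovered = xor_bits(cipher, ks)  # applying the same keystream again decrypts
```

Because XOR is its own inverse, the same routine serves for encryption and decryption, and the ciphertext has exactly as many bits as the selected plaintext.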
A tunable video encryption scheme for MPEG-1, which was
possibly the first proposed in the literature, was authored by
Meyer and Dadegast [25]. Their scheme provides four security
levels by encrypting different parts of the video stream.
However, this scheme affects the compression performance
and is not format compliant. In [26], Alattar et al. proposed
the encryption of every N-th I-macroblock and/or header
information for MPEG-1. Pazarci and Dipcin [27] proposed
a scrambling technique based on MPEG-2, which applied
four linear transforms to the original RGB-formatted video
content. This technique is criticized for its unrecoverable
quality loss and significant influence on compression ratio
[28]. Wang et al. [29] employed a control factor to adjust the
scrambling effect by operating on the DCT coefficients (also
for MPEG-2). But this technique still affects the compression
ratio [28] and can only provide a limited scrambling effect.
Based on MPEG-4, Li et al. [28] extended the encryption of
the sign bits of nonzero coefficients to form another tunable
scheme, which could provide different scrambling levels by
adjusting three control factors. The tunable feature of their
scheme is much more flexible and convenient in meeting the
various requirements of practical applications. Hong et al.
[30] proposed a quality-controllable encryption scheme for
H.264/AVC, which could provide four scrambling levels by
encrypting different parts of intra blocks. However, it affects
the compression ratio and often provides a relatively weak
scrambling effect.
In this paper, a tunable encryption scheme for H.264/AVC
is proposed, which improves upon this previous work [28].
The proposed tunable scheme combines the encryption of intra
prediction modes, the sign bits of nonzero coefficients and
the sign bits of motion vectors, where the number of each to
be encrypted is controlled by a corresponding control factor.
To simplify the implementation, a simple control mechanism
is proposed such that when the control factor is 1/N (N >
0), every N-th corresponding syntax element is chosen to be
encrypted. By adjusting these three control factors, a wide
range of scrambling effects can be produced.
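The control mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_every_nth` is hypothetical, counting is assumed to start at the first syntax element of the sequence passed in, and whether the counter is reset per frame or per slice is not specified here.

```python
def select_every_nth(num_elements: int, n: int) -> list:
    """0-based indices of the syntax elements chosen for a control factor of 1/n."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    # Every n-th element: the n-th, 2n-th, 3n-th, ... in 1-based counting.
    return [i for i in range(num_elements) if (i + 1) % n == 0]

# Hypothetical example: ten sign bits with a control factor of 1/3.
chosen = select_every_nth(10, 3)
# A control factor of 1 (n = 1) selects every element.
all_chosen = select_every_nth(4, 1)
```

One such selector would be applied independently to intra prediction modes, sign bits of nonzero coefficients and sign bits of motion vectors, each with its own N.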
The rest of this paper is arranged as follows. In Section II,
the method of encrypting the suffix of nonzero coefficients for
CAVLC as proposed by Shahid et al. [20] [21] is analyzed,
and statistical and experimental results show that without
encrypting the sign bits of nonzero coefficients, encrypting
other parts of the suffix cannot provide an effective scrambling
effect. Also the method is shown not to be as efficient as
only encrypting the sign bits since it requires many more
random bits to encrypt more video stream bits. In Section
III, a second method proposed by Shahid et al. [21] [22] for
encryption of nonzero quantized DCT coefficients for CABAC
is analyzed. The same conclusion as above can be obtained
from statistical and experimental results. Thus, the encryption
of the sign bits of nonzero coefficients is recommended for
practical applications. In Section IV, an improved tunable
encryption scheme for H.264/AVC is proposed, which com-
bines the encryption of intra prediction modes, sign bits of
nonzero coefficients and sign bits of motion vectors. A simple
control mechanism to implement this tunable scheme is also
demonstrated. Experimental results show that the resulting
scrambling effect of this scheme can be effectively adjusted by
changing three control factors in real-time with no or minimal
impact on the compression ratio. Conclusions and future work
are provided in Section V.
II. ANALYSIS OF ENCRYPTING NONZERO QUANTIZED
COEFFICIENTS UNDER CAVLC
Under CAVLC, directly applying encryption to the sign bits
of DCT coefficients [16] has almost no impact on the com-
pression ratio. Shahid et al. [20] [21] proposed to encrypt the
suffix of nonzero quantized coefficients, which has no impact
at all on the compression ratio and can achieve an effective
perceptual scrambling effect. In this section, statistical and
experimental results show that if only the suffix is encrypted
without encrypting the sign bits of nonzero coefficients, then
an effective perceptual scrambling effect cannot be produced.
In addition, encrypting the suffix needs many more random
bits, which means that its computational cost is relatively
greater than only encrypting the sign bits. Thus, the encryption
of the sign bits is better than the method in [20] [21].
A. Encoding of DCT coefficients under CAVLC
In CAVLC, the quantized coefficients of a block are grouped
into nonzero coefficients and runs of zeros, which are encoded
separately. As shown in Fig. 1, the nonzero coefficients of a
block are scanned in reverse order (from high frequency to low
frequency), the run of zeros before each nonzero coefficient is
also recorded in reverse order, and finally the information is
represented by five syntax elements. The first syntax element,
Coeff_token, indicates the number of nonzero coefficients and
the number of trailing ±1 values (referred to as trailing ones,
or T1s). The next element indicates the sign of each T1 using
exactly one bit per T1. The remaining nonzero coefficients
are encoded into the third syntax element by using seven
VLC (Variable Length Coding) tables [6] [7]. Finally, the total
number of zeros and then the run of zeros before each nonzero
coefficient are encoded in the final two syntax elements.
[Figure 1: flow diagram. The coefficients of a block are scanned in reverse order and separated into T1s, remaining nonzero coefficients, and the run of zeros before each nonzero coefficient. These are encoded into five syntax elements: Coeff_token, signs of T1s, remaining nonzero coefficients, the total number of zeros, and each run of zeros.]
Fig. 1. The encoding procedure of CAVLC in H.264/AVC.
Following the T1s, each of the remaining nonzero coefficients
is first mapped to a non-negative integer, called a
levelCode: if the nonzero coefficient is negative, the value
of the levelCode is 2 × |level| − 1; otherwise, it equals
2 × |level| − 2. Here, |level| is the absolute value of a
nonzero coefficient, level. In the video stream, the levelCode
is represented by a prefix and a suffix as in Equation (1):
the value of the prefix, denoted as prefix, is the smaller
of 15 and (levelCode >> L); the value of the suffix
is levelCode − (prefix << L). Here, ‘<<’ and ‘>>’ are
the logical left and right shift operators, respectively. L is
generally initialized to 0, but when there are more than
10 nonzero coefficients and fewer than 3 trailing ones, it is set
to 1, which is a special case with a very small probability
of occurrence. After encoding each coefficient, L is increased
by 1 if the magnitude of the coefficient is greater than the
corresponding threshold according to Table I. The prefix is
coded as consecutive leading zero bits followed by a ‘1’
bit, where the number of leading zero bits is the value of
the prefix. The suffix is coded as an unsigned integer of L
bits, and when L = 0, it is inferred to be 0. Each remaining
nonzero quantized coefficient following the T1s is represented by
a bitstream similar to that shown in Fig. 2. The bitstream
used to represent a nonzero coefficient includes two parts: a
prefix, which corresponds to the MSBs (most significant bits)
of its magnitude, and a suffix, which holds the LSBs (least
significant bits) of its magnitude and its sign bit.

$$
\begin{cases}
\text{Value of the prefix} = \min(15,\ \text{levelCode} \gg L) \\
\text{Value of the suffix} = \text{levelCode} - (\text{prefix} \ll L)
\end{cases}
\qquad (1)
$$
TABLE I
THRESHOLDS TO INCREMENT L.
L          0   1   2   3   4   5   6
Threshold  0   3   6   12  24  48  ∞
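[Figure 2: bitstream layout under CAVLC for each remaining nonzero coefficient. The prefix (0 0 ··· 0 1) carries the MSBs of the magnitude; the suffix (× × ··· × s) carries L − 1 bits of magnitude LSBs followed by the sign bit s.]
Fig. 2. The bitstream format for each remaining nonzero coefficient after T1s under CAVLC.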
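The mapping in Equation (1) can be traced with a short Python sketch. It follows only the description given in this section; the function names are hypothetical, and the escape handling that the standard applies to very large levels is deliberately not modelled.

```python
def level_to_code(level: int) -> int:
    """Map a nonzero coefficient to levelCode as described in the text."""
    if level == 0:
        raise ValueError("level must be nonzero")
    return 2 * abs(level) - 1 if level < 0 else 2 * abs(level) - 2

def split_level_code(level_code: int, L: int) -> tuple:
    """Equation (1): split levelCode into (prefix, suffix) for suffix length L."""
    if (level_code >> L) > 15:
        raise ValueError("escape case not modelled in this sketch")
    prefix = min(15, level_code >> L)
    suffix = level_code - (prefix << L)
    return prefix, suffix

def level_bits(level: int, L: int) -> str:
    """Bit string: `prefix` zeros, a '1', then the suffix as an L-bit integer."""
    prefix, suffix = split_level_code(level_to_code(level), L)
    suffix_bits = format(suffix, "0{}b".format(L)) if L > 0 else ""
    return "0" * prefix + "1" + suffix_bits

# With L = 1, +3 and -3 differ only in the last bit of the suffix (the sign bit).
plus_three = level_bits(3, 1)
minus_three = level_bits(-3, 1)
```

For L > 0 the two strings have the same length and differ only in their final bit, which is the property that sign-bit encryption relies on.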
B. Encryption of quantized coefficients under CAVLC
As mentioned, two methods have previously been proposed
to encrypt quantized coefficients: one is to encrypt the sign
bit of each nonzero coefficient [14]–[16], and another is to
encrypt the suffix of each nonzero coefficient [20] [21]. As
shown in Fig. 2, the sign bit is the last bit in the suffix when
L > 0. Since the suffix includes many more bits than the
sign bit, it appears that encryption of the suffix can provide a
much better scrambling effect and a higher security level than
encryption of only the sign bit. However, this assumption is
incorrect as demonstrated below.
Before comparing these two methods, the statistical distri-
bution of the maximum value of L in each coefficient block
is studied. This is obtained by encoding the first 30 frames
of ‘foreman’ at QCIF resolution under the baseline profile,
IPP...P, with QP=12 and 4:2:0 sampling format. First, the
distribution of the maximum value of L in each coefficient
block is shown in Fig. 3 and Fig. 4. Here, the coefficient blocks
may be a luma DC, luma AC or chroma AC block with a 4×4
block size, or a chroma DC block with a 2×2 block size. Because the
majority of luma blocks have many more nonzero coefficients
after the DCT transformation and quantization than the chroma
blocks, the distribution of the maximum value of L for each
type of block is shown separately in Fig. 3 and Fig. 4. The
corresponding percentages are listed in Table II and Table III.
From Table II, for the luma component it is clear that the
maximum L value of most blocks in the I frame (the first
frame) is less than 4 (about 89%) and in P frames (the frames
following the I frame) less than 3 (about 96%). Since the
last bit in the suffix is occupied by the sign bit, this means
that for the luma component in most blocks, the sufffix at
most represents the 2 LSB bits for the I frame or 1 LSB bit
for P frames, and if the suffix is always set to the medium
value in its range, the error between the medium value and
the correct suffix will not be beyond half of its range. Thus,
without encryption of the sign bit and with encryption of the
remaining portion of the suffix, only a very small error will be
incurred for most luma coefficients, which is expected to be
under ±2 for the I frame and ±1 for P frames. For the chroma
component, a much smaller value of L is used as shown in
Table III. For the I frame, about 94% of blocks will have a
maximum L value of less than 3; for P frames, most of them
can be encoded with an L value of less than 2 or 1. This
means that encryption of the suffix excluding the sign bit for
most chroma coefficients will not produce an error.
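The error bounds quoted above can be checked by enumeration. The following Python sketch assumes, as the text does, that the L − 1 non-sign suffix bits are replaced by the middle value of their range; the function name `worst_case_guess_error` is hypothetical.

```python
def worst_case_guess_error(L: int) -> int:
    """Largest magnitude error when the L - 1 non-sign suffix bits are
    replaced by the middle value of their range."""
    free_bits = max(L - 1, 0)          # suffix bits other than the sign bit
    if free_bits == 0:
        return 0                       # nothing besides the sign bit to guess
    guess = 1 << (free_bits - 1)       # middle of 0 .. 2**free_bits - 1
    return max(abs(v - guess) for v in range(1 << free_bits))

# Worst-case error for each suffix length listed in Table I.
errors = {L: worst_case_guess_error(L) for L in range(7)}
```

Under these assumptions, L = 3 gives an error of at most 2 and L = 2 an error of at most 1, matching the ±2 and ±1 bounds stated for the I frame and P frames, while L ≤ 1 gives no error at all.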
[Figure 3: 3-D bar chart. Axes: frame number; L, the length of the suffix; the number of coefficient blocks.]
Fig. 3. The statistical distribution of the maximum L in each coefficient block in terms of the number of coefficient blocks in the luma component.
The statistical result of L for all nonzero coefficients is also
investigated as shown in Fig. 5 and Fig. 6. These two figures
correspond to the luma and chroma components, respectively.
It can be observed that the conclusion above holds again: most
nonzero coefficients can be encoded using a small L value, 3
and 2 for the luma component in the I frame and P frames
respectively, and 2 and 1 for the chroma component in the I
frame and P frames respectively.
[Figure 4: 3-D bar chart. Axes: frame number; L, the length of the suffix; the number of coefficient blocks.]
Fig. 4. The statistical distribution of the maximum L in each coefficient block in terms of the number of coefficient blocks in the chroma component.
TABLE II
FOR THE LUMA COMPONENT, PERCENTAGE OF THE BLOCKS WITH A SPECIFIED L AS ITS MAXIMUM L RELATIVE TO THE TOTAL NONZERO COEFFICIENT BLOCKS IN EACH FRAME.
Frame    Percentage for each L (%)
No.      0      1      2      3      4     5     6
1        10.72  37.52  24.02  17.14  7.21  3.04  0.33
2        52.37  34.52  10.66  2.37   0.08  0     0
3        53.17  34.65  10.38  1.57   0.22  0     0
4        55.93  31.54  10.44  1.94   0.15  0     0
5        51.52  35.92  10.30  1.96   0.29  0     0
6        49.82  38.65  9.48   1.98   0.07  0     0
7        50.14  38.55  8.33   2.62   0.36  0     0
8        53.10  38.19  7.21   1.28   0.21  0     0
9        47.64  42.29  8.57   1.50   0     0     0
10       54.01  36.89  6.88   1.79   0.29  0.14  0
11       52.38  38.66  6.58   1.95   0.43  0     0
12       52.74  37.78  6.81   2.22   0.44  0     0
13       46.72  41.32  9.22   2.52   0.22  0     0
14       45.39  40.25  10.98  2.67   0.63  0.07  0
15       44.58  38.94  12.52  3.55   0.28  0.14  0
16       46.99  38.84  11.10  2.81   0.21  0.07  0
17       49.76  40.18  8.25   1.68   0.07  0.07  0
18       50.61  40.22  8.10   0.86   0.22  0     0
19       54.83  38.08  5.87   1.15   0.07  0     0
20       53.05  41.46  4.75   0.59   0.15  0     0
21       56.06  38.71  4.66   0.50   0.07  0     0
22       54.07  39.06  5.90   0.82   0.15  0     0
23       54.55  37.93  5.27   2.02   0.22  0     0
24       59.23  33.38  5.54   1.54   0.31  0     0
25       56.49  36.4   5.87   1.09   0.15  0     0
26       56.56  36.21  5.53   1.62   0.07  0     0
27       60.96  34.3   4.08   0.66   0     0     0
28       65.18  31.39  3.01   0.42   0     0     0
29       61.76  36.16  1.76   0.32   0     0     0
30       67.01  30.36  2.47   0.17   0     0     0
Moreover, all of the statistical results above just consider
blocks with some nonzero coefficients. Some blocks are di-
rectly predicted from neighbouring blocks without residual
data. Since these blocks have an L value of 0, a block
is even more likely to be encoded with a small L.
As shown previously in Fig. 2, the remaining portion
of the suffix, excluding the final sign bit, represents only the
L − 1 LSBs of the full coefficient. Thus, encrypting the
suffix without the sign bit can only scramble the L − 1 LSBs
of the coefficient. At most, this is equivalent to increasing the
quantization parameter, QP, which may result in lowering the
perceptual quality to a similar level seen in a low bitrate video
TABLE III
FOR THE CHROMA COMPONENT, PERCENTAGE OF THE BLOCKS WITH A SPECIFIED L AS ITS MAXIMUM L RELATIVE TO THE TOTAL NONZERO COEFFICIENT BLOCKS IN EACH FRAME.
Frame    Percentage for each L (%)
No.      0      1      2      3     4     5  6
1        49.24  31.09  13.76  5.09  0.83  0  0
2        92.83  6.75   0.42   0     0     0  0
3        89.61  9.32   1.08   0     0     0  0
4        91.78  6.91   1.32   0     0     0  0
5        89.09  9.45   1.09   0.36  0     0  0
6        93.27  6.41   0.32   0     0     0  0
7        88.10  10.78  1.12   0     0     0  0
8        90.57  9.09   0.34   0     0     0  0
9        92.18  7.14   0.68   0     0     0  0
10       91.64  7.16   1.19   0     0     0  0
11       91.28  7.85   0.87   0     0     0  0
12       92.17  7.54   0.29   0     0     0  0
13       90.27  9.73   0      0     0     0  0
14       86.06  12.06  1.07   0.54  0.27  0  0
15       87.14  11.67  0.95   0.24  0     0  0
16       89.12  10.36  0.52   0     0     0  0
17       91.17  8.83   0      0     0     0  0
18       93.26  6.46   0.28   0     0     0  0
19       94.01  5.99   0      0     0     0  0
20       95.17  4.83   0      0     0     0  0
21       95.60  4.40   0      0     0     0  0
22       94.03  5.97   0      0     0     0  0
23       91.94  8.06   0      0     0     0  0
24       93.35  6.65   0      0     0     0  0
25       90.67  8.80   0.53   0     0     0  0
26       91.45  8.55   0      0     0     0  0
27       96.42  3.58   0      0     0     0  0
28       94.96  5.04   0      0     0     0  0
29       94.76  5.24   0      0     0     0  0
30       96.20  3.80   0      0     0     0  0
[Figure 5: 3-D bar chart. Axes: L, the length of the suffix; frame number; the number of nonzero coefficients.]
Fig. 5. The statistical distribution of L in terms of the number of nonzero coefficients in the luma component.
[Figure 6: 3-D bar chart. Axes: L, the length of the suffix; frame number; the number of nonzero coefficients.]
Fig. 6. The statistical distribution of L in terms of the number of nonzero coefficients in the chroma component.
but keeping the scene fully intelligible.
The statistical results and observations on L
imply that encryption of the suffix without the sign bit
cannot produce an effective perceptual scrambling effect.
Since the suffix is encrypted by XORing it with random bits,
encrypting the remaining portion of the suffix wastes
these random bits. It is therefore fair to say that encryption of the
suffix is not as efficient as encryption of only the sign bit in
the suffix. From an attacker’s point of view, if a brute-force
attack can break encryption of the sign bit, it will also break
encryption of the suffix. Thus, encryption of the suffix cannot
provide a higher security level than encryption of the sign bit.
C. Experimental results for encrypting the suffix without the
sign bit under CAVLC
For a suffix with L bits, if the sign bit is left unencrypted,
only the remaining L − 1 bits in the suffix are encrypted. The resulting
encrypted video has almost no scrambling effect. For example,
the tenth frame in ‘foreman’ with QP=12 is shown in Fig. 7. In
fact, if the L − 1 bits are set to the middle value of their range,
the perceptual quality can be further recovered. Experimental
results for the ‘foreman’ sequence at different QP values are
tabulated in Table IV in terms of PSNR and SSIM (Structural
SIMilarity) [31]–[33]. SSIM is a recently proposed metric for
image quality evaluation, which has been reported to agree
more closely with subjective observation than PSNR [31]–[33].
It is clear that encrypting the remaining portion of the suffix
does not produce an effective scrambling effect. This supports
the conclusion in the previous section: the scrambling effect of
encrypting the suffix actually relies on encryption of the sign bit.
Thus, it is not necessary to encrypt the whole suffix; it is better
to encrypt only the sign bit of each nonzero coefficient.
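For readers unfamiliar with the metric, PSNR can be computed from the mean squared error between two sample arrays. The sketch below uses the standard definition for 8-bit samples (peak value 255); the function name and the flat-list input format are assumptions made for illustration, not the evaluation code behind Table IV.

```python
import math

def psnr(original: list, distorted: list, peak: int = 255) -> float:
    """PSNR in dB between two equally sized lists of sample values."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical example: every sample is off by one grey level.
value = psnr([100, 100, 100, 100], [101, 101, 101, 101])
```

A higher PSNR means the encrypted frame is closer to the original, so an effective scrambling method should yield a low value.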
[Figure 7: two frames, panels (a) and (b).]
Fig. 7. The 10th frame in ‘foreman’: (a) the original; (b) the encrypted version in which the suffix is encrypted excluding the sign bit.
TABLE IV
COMPARISON OF THE PERCEPTUAL QUALITY OF THE RECONSTRUCTED VIDEO AND THE ENCRYPTED VERSION IN WHICH THE SUFFIX IS ENCRYPTED EXCLUDING THE SIGN BIT.
Metric     Video          Component  QP=12  QP=18  QP=24  QP=30  QP=36
PSNR (dB)  Encrypted      Y          21.06  28.30  22.62  29.35  36.37
PSNR (dB)  Encrypted      U          48.06  46.71  38.98  40.13  38.39
PSNR (dB)  Encrypted      V          47.43  50.57  51.42  41.76  38.84
PSNR (dB)  Reconstructed  Y          49.60  44.37  39.63  35.45  31.39
PSNR (dB)  Reconstructed  U          49.36  45.58  42.34  40.13  38.39
PSNR (dB)  Reconstructed  V          50.46  47.42  44.32  41.76  38.84
SSIM       Encrypted                 0.956  0.970  0.947  0.938  0.897
SSIM       Reconstructed             0.996  0.989  0.973  0.945  0.898
D. Encrypting the sign bits of nonzero coefficients under
CAVLC
When the first of the remaining nonzero coefficients is
encoded, its value is generally very small and L is initialized
to 0. In this case, the value of the prefix equals levelCode.
The bit length used to represent this coefficient is denoted as l1:

$$
l_1 = \text{value of the prefix} + 1 =
\begin{cases}
2 \times |level| - 1 & \text{if } level > 0 \\
2 \times |level| & \text{if } level < 0
\end{cases}
\qquad (2)
$$
After encoding the first nonzero coefficient following T1s,
L increases to 1. For the other remaining coefficients, the bit
length of a coefficient is denoted as lo:
$$
l_o = \text{value of the prefix} + 1 + L = \big((|level| - 1) \gg (L - 1)\big) + 1 + L \qquad (3)
$$
where L ≥ 1. It is obvious that lo is only related to
the magnitude of a coefficient. Thus, randomly flipping the
sign bits of other remaining coefficients will not change the
compression ratio.
However, Equation (2) implies that l1 is affected by the
sign of the coefficient. This means that randomly flipping
the sign of this coefficient when L = 0 may slightly affect
the compression performance. When a positive coefficient is
encoded as negative with the same magnitude, l1 increases by
1; or when a negative one is encoded as positive, it decreases
by 1. Since the residual data is the difference between the
original block and its prediction, the sign of a coefficient can
be assumed to be approximately uniformly distributed. In
this case, the one-bit increases and decreases largely cancel
out, so encrypting the sign bit when L = 0 has only a very slight
impact on the compression performance.
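Equations (2) and (3) can be checked numerically. The Python sketch below computes the coded length from the levelCode mapping described in Section II-A; the function name `coded_length` is hypothetical and escape handling for very large levels is not modelled.

```python
def coded_length(level: int, L: int) -> int:
    """Bits used for one remaining coefficient: prefix zeros + '1' + L suffix bits."""
    level_code = 2 * abs(level) - 1 if level < 0 else 2 * abs(level) - 2
    prefix = min(15, level_code >> L)
    return prefix + 1 + L

# L = 0 (Equation (2)): flipping the sign changes the length by one bit.
len_pos_l0 = coded_length(3, 0)    # 2*|3| - 1
len_neg_l0 = coded_length(-3, 0)   # 2*|3|

# L >= 1 (Equation (3)): the length depends only on the magnitude.
len_pos_l1 = coded_length(3, 1)
len_neg_l1 = coded_length(-3, 1)
```

For L = 0 the negative coefficient costs one bit more than the positive one, while for L ≥ 1 both signs produce the same length, which is why only the first remaining coefficient can affect the compression ratio.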
From practical tests, when encrypting the sign bits of all
nonzero coefficients, the compression ratio fluctuates in the
range of -0.05% to 0.05%, which is caused by encrypting
the sign bits of nonzero coefficients with L = 0. This small
fluctuation is negligible. Thus, for implementation conve-
nience, encrypting the sign bits of all nonzero coefficients
is recommended, since the value of L does not need to be
checked.
III. ANALYSIS OF ENCRYPTING NONZERO QUANTIZED
COEFFICIENTS UNDER CABAC
A. Encoding of DCT coefficients under CABAC
In CABAC, significance map coding is used to specify the
positions of nonzero quantized DCT coefficients, unlike in
CAVLC where zero-run coding is adopted; the nonzero
quantized DCT coefficients are encoded by a binary arithmetic
coding (BAC) module with an adaptive context model.
The generic block diagram of CABAC is shown in Fig. 8. It
consists of three elementary stages: binarization, context
model updating and binary arithmetic coding.
[Figure 8: block diagram. Labelled blocks: Binarization, Context model, Arithmetic coder and Bypass (the last three inside the Binary Arithmetic Coder). Labelled signals: syntax element, syntax element with a non-binary value, syntax element with a binary value, bin string, bin value for context update, bitstream.]
Fig. 8. The generic block diagram of CABAC in H.264/AVC.
[Figure 9: block diagram. Labelled blocks: Prefix TU binarization, Suffix EG0 binarization (taken when Magnitude > 14), Context model, Arithmetic Coder and Bypass. Labelled signals: a nonzero coefficient, Magnitude − 1, sign bit, bitstream.]
Fig. 9. The procedure for encoding a nonzero quantized coefficient under CABAC in H.264/AVC.
For a nonzero quantized coefficient, the sign bit and the
magnitude are coded separately: the sign bit only passes
through the bypass branch; its magnitude is first binarized by
UEG0 [6] [11], which is the concatenation of the truncated
unary (TU) code and the 0th Exp-Golomb (EG0) code, and
then coded by BAC. Fig. 9 shows the block diagram depicting
the encoding of a nonzero coefficient. Since zero coefficients
are coded by a significance map [6] [11], the binarizing and
coding procedure is applied to the magnitude minus 1. As
shown in Fig. 10, after the binarization of the magnitude, the
generated binary string has, at most, two parts, a prefix and a
suffix. The prefix is coded by the truncated unary (TU) code
and the suffix is coded by the 0th Exp-Golomb (EG0) code.
If the magnitude is not greater than 14, it is only represented
by a prefix; otherwise, a suffix is concatenated to the prefix. The
suffix consists of a sub-prefix of L ‘1’ bits followed by a ‘0’
bit, and a sub-suffix of L bits. L is determined by Equation (4),
where ⌊•⌋ denotes the largest integer no greater than •. The
value of the sub-suffix is calculated according to Equation (5),
and corresponds to the L LSBs of (Magnitude − 14).
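[Figure 10: bin string layout. If Magnitude ≤ 14, the bin string is a prefix only: (Magnitude − 1) ‘1’ bits followed by a ‘0’ bit. If Magnitude > 14, the prefix is 14 ‘1’ bits and the suffix is L ‘1’ bits, a ‘0’ bit, and L sub-suffix bits (x…x).]
Fig. 10. Bin string after binarizing the magnitude of a nonzero quantized coefficient under CABAC in H.264/AVC.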
$$
\begin{cases}
L = \lfloor \log_2(x + 1) \rfloor \\
x = \text{Magnitude} - 15
\end{cases}
\qquad (4)
$$

$$
\text{Value of the sub-suffix} = \text{Magnitude} - 14 - 2^{L} \qquad (5)
$$
From equation (4), the range of the magnitude for a given
L can be calculated, as listed in Table V.
TABLE V
THE MAGNITUDE RANGE FOR A GIVEN L.
Magnitude  ≤ 15  ≤ 17  ≤ 21  ≤ 29  ≤ 45  ≤ 77  ≤ 141  ...
L          0     1     2     3     4     5     6      ...
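Equations (4) and (5) and the ranges in Table V can be reproduced with a small Python sketch of the binarization described in this section. The function names are hypothetical, and the sketch covers only the bin string of the magnitude, not the subsequent arithmetic coding.

```python
def suffix_length(magnitude: int) -> int:
    """Equation (4): L = floor(log2(x + 1)) with x = Magnitude - 15."""
    x = magnitude - 15
    if x < 0:
        raise ValueError("no suffix when Magnitude <= 14")
    return (x + 1).bit_length() - 1

def ueg0_bins(magnitude: int) -> tuple:
    """Return (prefix, suffix) bin strings for the magnitude of a nonzero coefficient."""
    if magnitude <= 14:
        # Truncated unary prefix only: (Magnitude - 1) ones followed by a zero.
        return "1" * (magnitude - 1) + "0", ""
    L = suffix_length(magnitude)
    sub_suffix = magnitude - 14 - (1 << L)          # Equation (5)
    sub_bits = format(sub_suffix, "0{}b".format(L)) if L > 0 else ""
    return "1" * 14, "1" * L + "0" + sub_bits

# Example: Magnitude = 21 lies in the L = 2 range of Table V.
prefix_21, suffix_21 = ueg0_bins(21)
```

Since the sub-suffix holds only the L LSBs of (Magnitude − 14), changing it moves the magnitude within the range for that L in Table V but never outside it.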
B. Encryption of nonzero coefficients under CABAC
The method of encrypting the sign bits of nonzero quantized
coefficients [14]–[16] can also be extended to CABAC
in H.264/AVC. Since sign bits are bypassed without further
processing, the encryption of the sign bits has no impact on
the compression performance. Shahid et al. [21] [22] proposed
to encrypt the sign bit and the sub-suffix in the suffix of the
nonzero coefficients. Since the sub-suffix is just XORed with a
secret bit stream, its length will not change. Thus, this method
can also maintain the compression ratio. Compared with only
encrypting the sign bits, this method encrypts many more bits
in the sub-suffix and therefore appears to be much more
secure and to provide a better scrambling effect.
However, as indicated previously, the sub-suffix is only the L
LSBs of (Magnitude − 14). As with CAVLC, encryption
of the sub-suffix is in fact not as efficient as only encrypting
the sign bits and cannot provide higher security from the point
of view of the perceptual scrambling effect.
TABLE VI
THE PERCENTAGE FOR EACH L IN TERMS OF THE NUMBER OF NONZERO COEFFICIENTS WITH A GIVEN L RELATIVE TO THE TOTAL NUMBER OF NONZERO COEFFICIENTS, FOR THE LUMA COMPONENT IN EACH FRAME.
Frame    The percentage for each L (%)
No.      0      1     2     3     4     5     6     ≥ 7
1        97.24  0.66  0.8   0.71  0.38  0.2   0.02  0
2        99.39  0.18  0.16  0.2   0.07  0     0     0
3        99.94  0.06  0     0     0     0     0     0
4        99.28  0.14  0.24  0.13  0.19  0.01  0     0
5        99.92  0.03  0.06  0     0     0     0     0
6        99.35  0.16  0.22  0.13  0.1   0.03  0     0
7        99.97  0     0.03  0     0     0     0     0
8        99.58  0.09  0.09  0.15  0.09  0     0     0
9        99.91  0     0.05  0.05  0     0     0     0
10       99.33  0.21  0.19  0.23  0.04  0     0     0
11       99.79  0.05  0.12  0.05  0     0     0     0
12       99.54  0.09  0.11  0.18  0.08  0     0     0
13       99.98  0     0.02  0     0     0     0     0
14       99.42  0.23  0.17  0.11  0.07  0     0     0
15       99.79  0.1   0.12  0     0     0     0     0
16       99.36  0.16  0.25  0.19  0.04  0     0     0
17       99.84  0.08  0.06  0.02  0     0     0     0
18       99.92  0.01  0.05  0     0.01  0     0     0
19       99.98  0.02  0     0     0     0     0     0
20       99.89  0.04  0.04  0.03  0     0     0     0
21       100    0     0     0     0     0     0     0
22       99.43  0.1   0.16  0.21  0.1   0     0     0
23       99.9   0.02  0.07  0     0     0     0     0
24       99.36  0.14  0.15  0.21  0.14  0     0     0
25       99.85  0.1   0.05  0     0     0     0     0
26       99.63  0.15  0.09  0.1   0.03  0     0     0
27       99.98  0     0.02  0     0     0     0     0
28       99.95  0     0.04  0.02  0     0     0     0
29       100    0     0     0     0     0     0     0
30       100    0     0     0     0     0     0     0
TABLE VII
THE PERCENTAGE FOR EACH L IN TERMS OF THE NUMBER OF NONZERO
COEFFICIENTS WITH A GIVEN L RELATIVE TO THE TOTAL NUMBER OF
NONZERO COEFFICIENTS, FOR THE CHROMA COMPONENT IN EACH
FRAME.

Frame   The percentage for each L (%)
No.     0       1      2      3      4      5      ≥ 6
1       99.01   0.31   0.31   0.23   0.11   0.04   0
2       99.58   0.14   0.14   0      0.14   0      0
4       99.76   0      0.12   0      0.12   0      0
14      99.82   0      0.09   0.09   0      0      0
other   100     0      0      0      0      0      0
The statistical distribution of L is studied. Under the main
profile, IBPBP..., the first 30 frames of ‘foreman’ at QCIF
resolution are tested with QP=12 and 4:2:0 sampling format.
For the luma component, the percentage for each L in terms of
the number of nonzero coefficients with a given L relative to
the total number of nonzero coefficients in each frame is listed
in Table VI. It is clear that most of the nonzero coefficients in
the luma component are coded with L=0, above 97% for the I
frame (the first frame) and above 99% for P and B frames (the
frames following the I frame). For coefficients with L > 0,
most have a very small L. Similarly, the percentage for each L
in the chroma component is listed in Table VII. It can be seen
that most of the nonzero coefficients in the chroma component
are coded with L = 0. Since the percentage of coefficients with
L > 0 is so small, encryption of the sub-suffix cannot be
expected to provide an effective scrambling effect.
In addition, an attacker can set the sub-suffix for a given L
to the middle value of its range under that L. The error
between this middle value and the correct sub-suffix is then
at most half of the range, which further
helps to recover the perceptual quality. As with CAVLC,
an attack can concentrate directly on breaking the encryption
of the sign bits without considering the sub-suffix. Thus, it is
believed that encryption of the sub-suffix is not necessary.
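The bound on the error of this middle-value replacement can be illustrated with a small sketch. It assumes the sub-suffix for a given L takes integer values in [0, 2^L − 1], as implied by Equation (5); the function names are hypothetical and the choice of 2^(L−1) as the middle value is one reasonable reading of the text.

```python
def middle_guess(L):
    """Middle of the sub-suffix range [0, 2**L - 1] for a given L."""
    return (1 << L) // 2


def worst_case_error(L):
    """Largest absolute error when every sub-suffix is replaced by the guess."""
    guess = middle_guess(L)
    return max(abs(value - guess) for value in range(1 << L))


if __name__ == "__main__":
    for L in range(7):
        # The worst-case error never exceeds half of the 2**L possible values.
        print(L, middle_guess(L), worst_case_error(L), (1 << L) / 2)
```

For example, with L = 3 the magnitudes 22 to 29 are all reconstructed as 26 under this guess, so the magnitude error is at most 4.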
C. Experimental results for encrypting the sub-suffix without
the sign bit under CABAC
For each nonzero coefficient, the sign bit is left correct and
only the sub-suffix is encrypted. Fig. 11, which shows the tenth frame
of ‘foreman’ with QP=12, indicates that this has no scrambling
effect. By setting the sub-suffix to the middle value of its range,
the perceptual quality of the encrypted video can be improved
further. Table VIII lists test results under different QPs for
the ‘foreman’ video sequence. It verifies that encrypting the
sub-suffix cannot produce effective scrambling. This means
that the method proposed by Shahid et al. [21] [22] in fact relies
only on the encryption of the sign bits.
[Figure 11: two images, (a) and (b).]
Fig. 11. The 10th frame in ‘foreman’: (a) the original; (b) encryption of the sub-suffix without the sign bit.
TABLE VIII
COMPARISON OF THE PERCEPTUAL QUALITY OF THE RECONSTRUCTED
VIDEO AND THE ONE ENCRYPTING THE SUB-SUFFIX WITHOUT THE SIGN
BIT UNDER CABAC.

Metric   Video           Component   QP=12   QP=18   QP=24   QP=30   QP=36
PSNR     Encrypted       Y           23.62   21.88   36.50   35.49   31.41
PSNR     Encrypted       U           52.47   45.65   42.51   40.31   38.45
PSNR     Encrypted       V           46.20   55.54   65.45   41.95   39.02
PSNR     Reconstructed   Y           49.50   44.31   39.73   35.49   31.41
PSNR     Reconstructed   U           49.25   45.65   42.51   40.31   38.45
PSNR     Reconstructed   V           50.37   47.52   44.44   41.95   39.02
SSIM     Encrypted                   0.962   0.966   0.972   0.946   0.900
SSIM     Reconstructed               0.996   0.989   0.973   0.946   0.900
D. Encrypting sign bits of nonzero coefficients under CABAC
Since the sign bits of nonzero coefficients are coded in the
bypass mode under CABAC, randomly flipping these sign bits
will not affect the compression ratio. Based on the analysis
above, compared with the method proposed by Shahid et al.
[21] [22], encrypting the sign bits of nonzero coefficients is
recommended for practical applications.
IV. A TUNABLE SCHEME FOR VIDEO ENCRYPTION IN
H.264/AVC
Since there exists a wide range of video-related applications,
from on-line video programmes to video surveillance systems and
from mobile video players to video conferencing, the scrambling
effect required to protect the video stream is very likely to
vary significantly between scenarios. For some
entertainment applications, providing a preview by lightly
scrambling the video content is of commercial value, but
for some sensitive applications a more complete scrambling
effect is mandatory. Thus, in practical applications, it is
better to provide a parameterizable scrambling method which
achieves different scrambling effects by changing the provided
parameters according to the given application. In this case,
a user can very easily adjust the scrambling effect to suit
the specific application at hand. However, most existing
methods [2]–[5] [12]–[22] can only provide a fixed scrambling
effect. Although some tunable schemes [25]–[27]
[29] [30] claim to provide different scrambling levels,
the practical scrambling effect is very limited, and some of
them achieve this tunable feature at the cost of an increased
compression ratio and/or a reduction of the video quality. The
tunable scheme for MPEG-4 Part 2 proposed by Li et al. [28]
can achieve a wide range of scrambling effects. Unlike
MPEG-4 Part 2, H.264/AVC adopts intra prediction
modes and a more powerful entropy coding method, CABAC,
both of which contribute a large improvement in the
compression ratio [7]–[10]. In this section, the tunable concept
of Li et al. [28] is extended to H.264/AVC by combining
the three main selective encryption methods for H.264/AVC. To
simplify the implementation of this tunable feature and reduce
the computational cost, a simple control mechanism to adjust
the scrambling effect is also proposed. A full investigation
of the proposed tunable scheme is carried out in CAVLC
under the baseline profile and in CABAC under the main
profile, whereas most of the related previous work is only tested
in CAVLC under the baseline or main profile.
A. Improved Tunable Encryption Scheme
In the previous two sections, encrypting the sign bits of
nonzero coefficients (referred to as SNC) is analyzed in detail.
Compared to the two fast selective encryption methods for
nonzero coefficients proposed by Shahid et al. [20]–[22], SNC
is more efficient while offering equivalent security, and it is
recommended for practical applications. In previous literature,
two further important selective encryption methods have been
reported for H.264/AVC: one involves encrypting the intra
prediction mode (referred to as IPM) as proposed by Ahn et al.
[12], and the other involves encrypting the sign bits of motion
vectors (referred to as EMV) as proposed by Shi and Bhargava
[14] [15]. These two methods and SNC can maintain format
compliance and compression performance, and it has been
proposed that they can be combined together into a scheme
for H.264/AVC [18] [19].
Based on MPEG-4 Part 2, Li et al. [28] proposed a tunable
encryption scheme, which included three categories of syntax
element to be encrypted: intra DC coefficients, the sign bits
of nonzero coefficients (except intra DC) and the sign bits
of motion vectors. These three categories of syntax element
correspond to three different dimensions of the visual quality:
the low-resolution spatial view, the high-resolution spatial
view and the temporal motion information. The number of
elements in each category to be encrypted is controlled by
one control factor. Thus, the resulting perceptual scrambling
effect can be adjusted by changing all three control factors.
[Figure 12: a macroblock/frame is composed of prediction information (intra prediction modes and motion vectors) plus residual data (quantized DCT coefficients). Three encryption options, numbered 1 to 3, are marked on these elements and associated with the low-resolution spatial view, the high-resolution spatial view and the temporal motion information.]
Fig. 12. The three encryption options for H.264/AVC.
In this paper, it is proposed to extend this concept into
H.264/AVC. In H.264/AVC, the video information can be
represented by three syntax elements: the quantized DCT
coefficients of the residual data, intra prediction modes and the
motion vectors for inter prediction. When considering format
compliance and compression performance, Fig. 12 shows the
encryption options that can be employed for H.264/AVC,
which focus on these syntax elements. Furthermore, the intra
prediction modes, the sign bits of nonzero coefficients and
the sign bits of motion vectors correspond to three different
dimensions in H.264/AVC: the low-resolution spatial view,
the high-resolution spatial view and the temporal motion
information, each of which can be encrypted by IPM, SNC and
EMV, respectively. With three control factors, the procedure
is described as follows:
1) encrypting the intra prediction mode by IPM with prob-
ability pipm;
2) encrypting the sign bits of nonzero coefficients by SNC
with probability psnc;
3) encrypting the sign bits of motion vectors by EMV with
probability pemv.
The perceptual scrambling effect of each control factor can
be expected as follows:
• pipm = 1 → 0 (psnc = 0 and pemv = 0): the low spatial
perceptual quality will change from “roughly impercep-
tible” to “very perceptible”, but motion information can
be observed;
• psnc = 1 → 0 (pipm = 0 and pemv = 0): the high spatial
perceptual quality will change from “almost impercepti-
ble” to “very perceptible”, but motion information can be
observed;
• pemv = 1 → 0 (pipm = 0 and psnc = 0): the motion
information will change from “almost imperceptible” to
“very perceptible”, but in the first few frames, the video
content and quality is clear as the first frame is an I frame
and the spatial perceptual quality is not affected.
Here, pipm = 1 → 0 indicates that the value of pipm is
changing from 1 to 0, inclusive, similarly for psnc and pemv.
Although changing both pipm and psnc could adjust the
spatial perceptual quality, it is not recommended to control
them by using just one control factor because each corresponds
to a different syntax element and will influence the perceptual
quality differently. By separately controlling these two factors,
many more levels of scrambling effects can be achieved.
The choice of values for the control factors is also an important issue,
which was analyzed in [28]. Using a very small value for
each control factor means that only a small number of the
related syntax elements are encrypted, which would enable an
attacker to guess the encrypted syntax elements one by one.
Analysis in [28] showed that a control factor greater than 0.09
can ensure sufficient complexity to prevent such an attack.
B. Implementation Issues
In the work by Li et al. [28], a probabilistic quality control
with a decimal factor p is realized by generating a pseudo-
random decimal, r ∈ [0, 1], which is compared with p: the
current syntax element is only encrypted when r ≤ p. But in
fact, this random number generation procedure is not necessary
and can be skipped to reduce the computational cost. Alattar et
al. [26] proposed the encryption of every N -th I macroblock
and/or the header of every N -th predicted macroblock for
secure transmission of MPEG-1 video streams. This method
can be extended to carry out the probabilistic quality control
process in a simplified fashion. The proposed process is as
follows:
1) The control factor, p, can only take a value of 0 or 1/N,
where N is an integer greater than zero;
2) If p = 0, no encryption is carried out;
3) If p = 1/N, every N-th related syntax element is encrypted.
In particular, when N = 1, p = 1 means that all
related syntax elements are encrypted.
For simplicity and to meet the lower bound of the control
factor, in practical experiments, N is chosen from 1, 2, 4 and
8, so that a nonzero control factor is never below 0.125, which
exceeds the bound of 0.09 from [28]. The resulting scrambling
effect is not linear in the control factor (although a higher
control factor generally produces a much more effective
degradation of the perceptual quality), and it is not necessary
to adjust the scrambling effect in small steps since the human
visual system is not sensitive to such small changes.
For the encryption operation, a stream cipher can be used
to generate a random bit sequence, which is XORed with the
bits of the related syntax elements.
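The control mechanism and the XOR operation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the keystream is produced by SHA-256 in counter mode purely as a stand-in for a stream cipher such as Rabbit, the function names are hypothetical, and counting the N-th, 2N-th, ... elements (rather than the 1st, (N+1)-th, ...) is an assumption about how "every N-th" is indexed.

```python
import hashlib


def keystream_bits(key):
    """Yield pseudo-random bits; SHA-256 in counter mode stands in for Rabbit."""
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(8):
                yield (byte >> shift) & 1
        counter += 1


def encrypt_every_nth(bits, n, key):
    """XOR every n-th bit with the keystream; n == 0 represents p = 0."""
    if n == 0:
        return list(bits)            # p = 0: no encryption is carried out
    stream = keystream_bits(key)
    out = []
    for index, bit in enumerate(bits, start=1):
        if index % n == 0:           # p = 1/n: only every n-th element
            bit ^= next(stream)
        out.append(bit)
    return out


if __name__ == "__main__":
    sign_bits = [0, 1, 1, 0, 1, 0, 0, 1]
    scrambled = encrypt_every_nth(sign_bits, 2, b"demo key")
    # Decryption repeats the same operation with the same key and n.
    print(scrambled, encrypt_every_nth(scrambled, 2, b"demo key"))
```

Because the selection depends only on a counter, no random number has to be drawn per syntax element, which is the saving over the comparison of r with p described at the start of this subsection.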
C. Experimental results
Experiments were set up based on the reference software of
H.264/AVC, JM17.2 [34]. A stream cipher, Rabbit [23], was
used to produce the random bit sequence.
1) The scrambling effect: To investigate the scrambling
effect, three test video sequences were chosen from a standard
video library [35]. Experiments were implemented on these
three sequences under the baseline profile with CAVLC as
the entropy coding method and under the main profile with
CABAC as the entropy coding method. For each sequence,
the first 30 frames were encrypted and then decoded without
decryption. In the baseline profile and the main profile, the
coding sequence is IPP...P and IBPBP...BP, respectively. To
evaluate the scrambling effect, only SSIM (Structural SIMilarity)
[31] is utilized in this section since, as previously
mentioned, it acts as a much better metric than PSNR for the
subjective observation [32].

[Figure 13: nine plots of SSIM (0 to 1) against frame number (0 to 30), each with one curve per control factor value (0.125, 0.25, 0.5 and 1). Panels: (a) foreman encrypted by IPM; (b) foreman encrypted by SNC; (c) foreman encrypted by EMV; (d) coastguard encrypted by IPM; (e) coastguard encrypted by SNC; (f) coastguard encrypted by EMV; (g) mobile encrypted by IPM; (h) mobile encrypted by SNC; (i) mobile encrypted by EMV.]
Fig. 13. The tunable scrambling effect of IPM, SNC and EMV, when adjusting the control factors; The three video sequences are tested in CAVLC under the baseline profile with QP=18.

[Figure 14: nine plots of SSIM (0 to 1) against frame number (0 to 30), with the same panels (a) to (i) and control factor values as Fig. 13.]
Fig. 14. The tunable scrambling effect of IPM, SNC and EMV, when adjusting the control factors; The three video sequences are tested in CABAC under the main profile with QP=18.

[Figure 15: nine plots of average SSIM (0 to 1) against QP (12, 18, 24, 30, 36), each with one curve per control factor value (0.125, 0.25, 0.5 and 1). Panels: (a) foreman encrypted by IPM; (b) foreman encrypted by SNC; (c) foreman encrypted by EMV; (d) coastguard encrypted by IPM; (e) coastguard encrypted by SNC; (f) coastguard encrypted by EMV; (g) mobile encrypted by IPM; (h) mobile encrypted by SNC; (i) mobile encrypted by EMV.]
Fig. 15. The tunable scrambling effect of IPM, SNC and EMV for different QP values in CAVLC under the baseline profile, when adjusting the control factors; The SSIM values of 30 frames are averaged for each test.

[Figure 16: nine plots of average SSIM (0 to 1) against QP (12, 18, 24, 30, 36), with the same panels (a) to (i) and control factor values as Fig. 15.]
Fig. 16. The tunable scrambling effect of IPM, SNC and EMV for different QP values in CABAC under the main profile, when adjusting the control factors; The SSIM values of 30 frames are averaged for each test.
In Fig. 13, the video sequences at QCIF resolution are
encoded in CAVLC under the baseline profile with QP=18.
It is clear that the scrambling effect of IPM, SNC and EMV
can be effectively adjusted by changing the corresponding
control factor. The relationship between each control factor
and the scrambling effect is nonlinear and may be different
for different test sequences. Generally, a higher control factor
will produce a much more effective scrambling effect, and
vice versa. On the other hand, a smaller control factor means
that fewer related syntax elements are encrypted and the
corresponding computational cost is relatively lower, and vice
versa. In Fig. 14, the same sequences are encoded in CABAC
under the main profile with QP=18. The same conclusion can
be obtained as in Fig. 13.
In Figs. 15 and 16, more experimental results for the scrambling
effect are shown under different QP values. Generally, a
high QP means a high compression ratio and a relatively low
video quality. Here, for each test, the SSIM values of the 30
frames are averaged to represent the scrambling effect. It is
clear that under different QP settings, the above conclusion is
further confirmed: a higher control factor will produce a much
more effective scrambling effect, and vice versa.
Combining these three methods by separately adjusting the
three control factors, the tunable scheme can provide various
scrambling levels. According to the user’s requirement, these
three control factors can be adjusted to reach a compromise
between the computational cost and the scrambling effect.
It is also clear that when (pipm, psnc, pemv) = (1, 1, 1), the
resulting scrambling effect is the most effective. The 15-th
frame in the “foreman” sequence under different settings of
the control factors is presented in Fig. 17, which is encoded
in CABAC under the main profile. When encoded in CAVLC
under the baseline profile, similar results are obtained.
Among the three encryption methods, SNC contributes
much more to the scrambling effect than the other two when
the three control factors are set to the same value. This can be
concluded by comparing the three columns in Figs.
13, 14, 15 and 16. This is because the nonzero coefficients
represent the high-resolution spatial information of the video
and occupy a very high percentage of the video stream. IPM
only applies to I macroblocks, most of which are
located in the I frame. Thus, the resulting scrambling effect
of IPM is weaker than that of SNC. To give a perceptual sense,
one frame of ‘foreman’ encrypted by SNC and by IPM is shown
in Fig. 17(e) and (h), respectively. Combining both, a
more heavily degraded frame is obtained, as shown in Fig.
17(c). However, both SNC and IPM can only scramble the
texture in the video and cannot encrypt the motion information
in a sequence. EMV encrypts only the motion information
and generally affects only the moving objects in the video, as
shown in Fig. 17(i), where most of the static background is
preserved well while the face (the moving object) is scrambled.
From Fig. 13(c), (f) and (i), and Fig. 14(c),
(f) and (i), it can be observed that when only EMV is applied,
the first few frames are left clear and are of good perceptual
quality. Thus, EMV should be combined with one or both of the
other two methods when a relatively strong scrambling effect is
required.
[Figure 17: ten decoded images of the same frame, one per control factor setting (pipm, psnc, pemv): (a) (1, 1, 1); (b) (1, 0.5, 0.5); (c) (1, 1, 0); (d) (0.5, 0.5, 0.125); (e) (0, 1, 0); (f) (0.5, 0.25, 0); (g) (0.25, 0.125, 0); (h) (1, 0, 0); (i) (0, 0, 1); (j) (0.125, 0, 0).]
Fig. 17. The 15th frame in ‘foreman’ encrypted by different control factors (pipm, psnc, pemv).
2) Compression ratio: From the above tests, when IPM
is used in CABAC under the main profile, the compression
ratio is slightly affected, mostly in the range of -1% to 1%,
and reaching -2% or 2% only occasionally when QP=36. This small
change is due to the variable length coding used for the intra
prediction mode in CABAC. Also when SNC is applied in
CAVLC under the baseline profile, there is a small fluctuation
in the compression ratio, in the range of -0.05% to 0.05%. In
other test conditions, the compression ratio is unaffected.
3) Computational cost: For each of the three techniques
in the proposed scheme, the computational costs for different
values of the control factor are evaluated under the baseline
and main profiles. In the proposed scheme, the encryption
procedure is almost the same as the decryption procedure.
Since a decoder is much faster than the encoder, the
computational time incurred by the encryption/decryption
procedure is compared only with the normal decoding time. The
test is run on a PC (Intel Xeon CPU @ 3.20 GHz, 3 GB
RAM). In Table IX and Table X, the computational overhead
times to encode the first 30 frames in the three test sequences
with QP=18 are presented, and the permillages relative to
the normal decoding time are also listed. Here, the second
column, N, is the reciprocal of the corresponding control
factor. It is clear that, no matter which profile is utilized,
the computational time incurred by the encryption/decryption
is minimal and the scheme can meet real-time processing
requirements. Among the three techniques, SNC incurs
a much larger time overhead than the other two. Thus, if
faster encryption/decryption is required, the combination of
IPM and EMV without SNC would be a possible choice to
make further savings on the computational cost.
TABLE IX
COMPUTATIONAL COSTS OF IPM, SNC AND EMV FOR DIFFERENT
VALUES OF THE CONTROL FACTOR UNDER THE BASELINE PROFILE WITH
CAVLC AS THE ENTROPY CODING METHOD AND QP=18.

                   Time cost (ms)             Per thousand (‰)
Video        N     IPM     SNC      EMV       IPM     SNC      EMV
foreman      1     0.089   2.788    0.405     0.196   6.128    0.891
foreman      2     0.049   1.659    0.239     0.109   3.646    0.525
foreman      4     0.030   1.187    0.169     0.066   2.609    0.372
foreman      8     0.021   0.972    0.135     0.047   2.136    0.296
coastguard   1     0.086   6.803    0.255     0.178   14.084   0.528
coastguard   2     0.047   4.048    0.150     0.098   8.381    0.311
coastguard   4     0.029   2.897    0.106     0.060   5.997    0.220
coastguard   8     0.021   2.372    0.085     0.043   4.910    0.175
mobile       1     0.087   11.914   0.368     0.146   19.989   0.617
mobile       2     0.048   7.089    0.217     0.080   11.895   0.364
mobile       4     0.029   5.073    0.153     0.049   8.511    0.257
mobile       8     0.021   4.154    0.122     0.035   6.969    0.205
TABLE X
COMPUTATIONAL COSTS OF IPM, SNC AND EMV FOR DIFFERENT
VALUES OF THE CONTROL FACTOR UNDER THE MAIN PROFILE WITH
CABAC AS THE ENTROPY CODING METHOD AND QP=18.

                   Time cost (ms)             Per thousand (‰)
Video        N     IPM     SNC      EMV       IPM     SNC      EMV
foreman      1     0.103   2.499    0.400     0.225   5.492    0.880
foreman      2     0.059   1.471    0.244     0.129   3.233    0.535
foreman      4     0.035   1.052    0.172     0.077   2.312    0.377
foreman      8     0.025   0.866    0.133     0.055   1.903    0.293
coastguard   1     0.088   5.889    0.253     0.181   12.192   0.523
coastguard   2     0.048   3.511    0.146     0.099   7.269    0.302
coastguard   4     0.029   2.507    0.107     0.061   5.191    0.221
coastguard   8     0.021   2.062    0.081     0.043   4.268    0.168
mobile       1     0.087   10.149   0.320     0.146   17.029   0.537
mobile       2     0.048   6.052    0.185     0.080   10.155   0.310
mobile       4     0.029   4.326    0.134     0.049   7.258    0.225
mobile       8     0.021   3.548    0.107     0.035   5.952    0.180
4) Security: From a cryptographic viewpoint, the security
of the proposed scheme relies on the adopted stream cipher,
Rabbit, which to date has no known weaknesses [24].
Video encryption is also vulnerable to other attacks, such
as the replacement attack [36]. In the replacement attack, the
encrypted syntax elements are set to fixed values to improve
the video quality. But this often depends on the specified
application. For example, in an entertainment scenario, which
provides a video preview, a light scrambling effect is required.
In such a scenario, even though the use of the replacement
attack can improve the quality of the scrambled video, a high
quality copy can still not be obtained, and thus the replacement
attack is unsuccessful for such applications. However, for sen-
sitive video content which often requires a strong scrambling
effect, some useful information may be revealed using the
replacement attack. In the proposed scheme, when all three
control factors are 1, it becomes equivalent to the scheme
in [19], which has been verified to be secure against the
replacement attack [19]. Thus, it is suggested that for such
applications, all three control factors should be set to 1 to
achieve the most effective scrambling effect. As shown in
Fig. 18(a), by using the replacement attack on the “foreman”
sequence in the same setting as the test in Fig. 17(a), the
video quality can be improved to some extent. But it is still
unintelligible as shown in Fig. 18(b), which is the 15th frame
of the sequence. Here, the replacement attack is conducted by
setting all the intra prediction modes to the predicted one, and
all the sign bits of nonzero coefficients and motion vectors to
positive values.
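The sign-bit part of this replacement attack can be sketched on a toy list of coefficients. The function names and sample values below are hypothetical; a real attack would operate on the syntax elements parsed from the bitstream, and the intra prediction mode and motion vector replacements are not modelled here.

```python
def replacement_attack(coefficients):
    """Treat every coefficient as positive, ignoring the encrypted sign bits."""
    return [abs(c) for c in coefficients]


def sign_matches(original, attacked):
    """Return (number of nonzero coefficients with a correct sign, total nonzero)."""
    pairs = [(o, a) for o, a in zip(original, attacked) if o != 0]
    correct = sum((o > 0) == (a > 0) for o, a in pairs)
    return correct, len(pairs)


if __name__ == "__main__":
    original = [7, -3, 0, 2, -1, -4]
    encrypted = [-7, -3, 0, 2, 1, -4]   # some sign bits flipped by SNC
    attacked = replacement_attack(encrypted)
    print(attacked, sign_matches(original, attacked))
```

In this toy case only the originally positive coefficients end up with the correct sign, which gives some intuition for why the attack can raise the quality somewhat without restoring the content.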
[Figure 18: (a) SSIM (0 to 1) against frame number (0 to 30) for “foreman”, with one curve for the video encrypted by (1,1,1) and one for the video under the replacement attack; (b) the 15th frame.]
Fig. 18. The replacement attack on the “foreman” sequence encrypted by (1,1,1) under the main profile in CABAC with QP=18.
V. CONCLUSIONS AND FUTURE WORK
In this paper, two fast selective encryption methods for
CAVLC and CABAC in H.264/AVC are analyzed. These two
methods can be thought of as extensions of the sign bit
encryption of nonzero transformed coefficients: for CAVLC,
the suffix of each nonzero coefficient is encrypted by XORing
with a random byte of the same length as the suffix; for
CABAC, the sub-suffix in the suffix of each nonzero coef-
ficient together with its sign bit is encrypted by an XORing
operation. Although these two methods encrypt many more
bits and seem to be more secure, it is demonstrated that they
are in fact not as efficient as only encrypting the sign bits of
nonzero coefficients. Experimental results indicate that without
encrypting the sign bits, these two methods cannot provide
a perceptual scrambling effect. Thus, these two methods
cannot enhance the security as claimed. Based on the analysis,
encryption of the sign bits alone is recommended as much more
suitable for practical applications.
For practical applications, providing different scrambling
effects may be convenient to meet various users’ requirements.
Thus, a tunable encryption scheme is proposed to encrypt
intra prediction modes and the sign bits of motion vectors
together with the sign bits of nonzero coefficients. To simplify
the implementation, a simple control mechanism is adopted:
when the control factor is set to 1/N (where N is an integer greater
than 0), every N-th corresponding syntax element is chosen
to be encrypted. According to the user’s requirement, the
perceptual scrambling effect can be adjusted by changing
the three control factors. Experimental results show that the
proposed scheme can provide this tunable feature by changing
the three control factors with no or minimal impact on
the compression ratio. The computational costs of the proposed
scheme for different values of the control factor are evaluated
under the baseline and main profiles, which show that it can
meet real-time processing requirements. A security analysis
of the proposed scheme is also given. It is suggested that for
sensitive video content, all three control factors should be set
to 1 to achieve the most effective scrambling effect, which is
secure against the replacement attack.
Future work aims at designing a scheme that automatically
adjusts the three control factors to achieve an expected
psychovisual or security criterion. Other possible work would
be to further investigate the security of existing selective
video encryption methods using new types of attack derived
from advanced image and video processing techniques.
Applying existing video encryption techniques to privacy
region protection in video surveillance systems is another
research focus [37], [38].
REFERENCES
[1] T. Lookabaugh and D. Sicker, “Selective Encryption for Consumer Applications,” IEEE Communications Magazine, vol. 42, no. 5, pp. 124–129, 2004.
[2] A. Massoudi, F. Lefebvre, C. De Vleeschouwer, B. Macq, and J. Quisquater, “Overview on selective encryption of image and video: challenges and perspectives,” EURASIP J. Info. Secur., vol. 2008, pp. 1–18, 2008.
[3] T. Stutz and A. Uhl, “Survey of H.264 AVC/SVC encryption,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 325–339, 2011.
[4] J. Wen, M. Severa, W. Zeng, M. Luttrell, and W. Jin, “A Format-compliant Configurable Encryption Framework for Access Control of Multimedia,” in IEEE 4th Workshop on Multimedia Signal Processing, 2001, pp. 435–440.
[5] ——, “A Format-compliant Configurable Encryption Framework for Access Control of Video,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 545–557, 2002.
[6] ITU-T Rec. (ISO/IEC 14496-10):2010, “Advanced Video Coding for Generic Audio-visual Services.”
[7] I. Richardson, The H.264 Advanced Video Compression Standard. Wiley, 2010.
[8] Y. Shi and H. Sun, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards. CRC Press LLC, 2000.
[9] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of The H.264/AVC Video Coding Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, 2003.
[10] A. Puri, X. Chen, and A. Luthra, “Video Coding Using The H.264/MPEG-4 AVC Compression Standard,” Signal Processing: Image Communication, vol. 19, no. 9, pp. 793–849, 2004.
[11] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based Adaptive Binary Arithmetic Coding in The H.264/AVC Video Compression Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620–636, 2003.
[12] J. Ahn, H. Shim, B. Jeon, and I. Choi, “Digital Video Scrambling Method Using Intra Prediction Mode,” PCM 2004, Springer, LNCS, vol. 3333, pp. 386–393.
[13] S. Kwon, W. Choi, and B. Jeon, “Digital Video Scrambling Using Motion Vector and Slice Relocation,” in Proc. 2nd Int. Conf. Image Analysis and Recognition, Springer, LNCS, 2005, vol. 3656, pp. 207–214.
[14] C. Shi and B. Bhargava, “An Efficient MPEG Video Encryption Algorithm,” in Proc. 17th IEEE Symp. Reliable Distributed Systems, 1998, pp. 381–386.
[15] B. Bhargava, C. Shi, and S. Wang, “MPEG Video Encryption Algorithms,” Multimedia Tools and Applications, vol. 24, no. 1, pp. 57–79, 2004.
[16] C. Shi and B. Bhargava, “A Fast MPEG Video Encryption Algorithm,” in Proc. 6th ACM Int. Conf. Multimedia, 1998, pp. 81–88.
[17] W. Zeng and S. Lei, “Efficient Frequency Domain Selective Scrambling of Digital Video,” IEEE Trans. Multimedia, vol. 5, no. 1, pp. 118–129, 2003.
[18] S. Lian, Z. Liu, Z. Ren, and Z. Wang, “Selective Video Encryption Based on Advanced Video Coding,” PCM 2005, Springer, LNCS, vol. 3768, pp. 281–290.
[19] S. Lian, Z. Liu, Z. Ren, and H. Wang, “Secure Advanced Video Coding Based on Selective Encryption Algorithms,” IEEE Trans. Consumer Electron., vol. 52, no. 2, pp. 621–629, 2006.
[20] Z. Shahid, M. Chaumont, and W. Puech et al., “Fast Protection of H.264/AVC by Selective Encryption,” in Proc. SinFra IPAL Symp., 2009, pp. 11–21.
[21] ——, “Fast Protection of H.264/AVC by Selective Encryption of CAVLC and CABAC for I & P Frames,” IEEE Trans. Circuits Syst. Video Technol., no. 99, pp. 565–576, 2011.
[22] ——, “Fast Protection of H.264/AVC by Selective Encryption of CABAC,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2009, pp. 1038–1041.
[23] M. Boesgaard, M. Vesterager, T. Christensen, and E. Zenner, “The Stream Cipher Rabbit,” ECRYPT Stream Cipher Project Report, vol. 6, 2005.
[24] S. Babbage, C. Canniere, A. Canteaut, C. Cid, H. Gilbert, T. Johansson, M. Parker, B. Preneel, V. Rijmen, and M. Robshaw, “The eSTREAM Portfolio,” eSTREAM, ECRYPT Stream Cipher Project, 2008.
[25] J. Meyer and F. Gadegast, “Security Mechanisms for Multimedia Data with The Example MPEG-1 Video,” Project Description of SECMPEG, Technical University of Berlin, Germany, 1995.
[26] A. Alattar, G. Al-Regib, and S. Al-Semari, “Improved Selective Encryption Techniques for Secure Transmission of MPEG Video Bit-streams,” in Proc. ICIP, 1999, pp. 256–260.
[27] M. Pazarci and V. Dipcin, “A MPEG2-transparent Scrambling Technique,” IEEE Trans. Consumer Electron., vol. 48, no. 2, pp. 345–355, 2002.
[28] S. Li, G. Chen, A. Cheung, B. Bhargava, and K. Lo, “On The Design of Perceptual MPEG-video Encryption Algorithms,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 214–223, 2007.
[29] C. Wang, H. Yu, and M. Zheng, “A DCT-based MPEG-2 Transparent Scrambling Algorithm,” IEEE Trans. Consumer Electron., vol. 49, no. 4, pp. 1208–1213, 2003.
[30] G. Hong, C. Yuan, Y. Wang, and Y. Zhong, “A Quality-controllable Encryption for H.264/AVC Video Coding,” PCM 2006, Springer, LNCS, vol. 4261, pp. 510–517.
[31] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
[32] Z. Wang and A. Bovik, “Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, 2009.
[33] A. Bovik, The Essential Guide to Image Processing. Academic Press, 2009.
[34] JM reference software, ver. 17.2. [Online]. Available: http://iphome.hhi.de/suehring/tml, 2011.
[35] Video trace library. [Online]. Available: http://trace.eas.asu.edu/yuv/index.html, 2011.
[36] M. Podesser, H. Schmidt, and A. Uhl, “Selective Bitplane Encryption for Secure Transmission of Image Data in Mobile Environments,” in Proc. 5th IEEE Nordic Signal Process. Symp., 2002, pp. 4–6.
[37] F. Dufaux and T. Ebrahimi, “Scrambling for Privacy Protection in Video Surveillance Systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 8, pp. 1168–1174, 2008.
[38] F. Dai, L. Tong, Y. Zhang, and J. Li, “Restricted H.264/AVC Video Coding for Privacy Protected Video Scrambling,” J. Visual Commun. Image Represent., vol. 22, no. 6, pp. 479–490, 2011.