Influence of HEVC Compression on Event Detection in Security Video Sequences
Stanislav Vitek, Lukas Krasula, Milos Klima, Vojtech Hvezda
Czech Technical University in Prague
Technicka 2, 166 27 Prague 6, Czech Republic
viteks,krasuluk,klima@fel.cvut.cz
Marcelo Herrera Martinez
Universidad de San Buenaventura
Carrera 8H, Bogota, Colombia
mherrera@usbbog.edu.co
Abstract-This paper studies how the degree of compression by High Efficiency Video Coding (HEVC) affects observers' ability to detect certain events in videos captured by outdoor CCTV cameras. The study is based on extensive subjective testing. The tests were also performed for H.264/MPEG-4 Part 10 AVC compression at similar bit rates, and the detection capabilities were compared, demonstrating the superiority of HEVC. A threshold for the encoder setting is also proposed, beyond which increasing the bit rate or decreasing the quantization parameter (QP) does not improve the observers' ability to detect the events.
Keywords-Closed Circuit Television (CCTV), Event Detection, High Efficiency Video Coding (HEVC), Security, Video Compression, H.264/MPEG-4 Part 10 AVC
I. INTRODUCTION
In recent years the importance of Closed Circuit Television (CCTV) has grown significantly, and it is used for various purposes. Its first generation was analogue and recorded on video cassettes. This required a lot of storage space, and the tapes had to be changed manually, which was prone to mistakes (e.g. overwriting). The true potential of CCTV could not be utilized until the era of digitalization, which brought entirely new possibilities: recordings could be stored on hard drives, the resolution was no longer strictly limited, searching through the data became much faster and easier, the system could be connected to the internet, etc.
These new advantages led to the development of so-called smart CCTV [1]. Such systems are able to analyze the video, decide whether an event is unusual, unexpected or possibly dangerous, and notify the operator. The demands on the operator's concentration are thus significantly lowered. Smart CCTV systems can focus on various events. Nam and Tewfik [2] proposed a system for tampering detection, methods for detecting human activity in video were proposed by Lv et al. [3] and by Ribeiro and Santos-Victor [4], and Kettnaker and Zabih developed an algorithm for counting people from recordings obtained by multiple cameras [5]. Some systems are also able to detect several of these events at the same time (e.g. the system proposed by Nam et al. [6]).
The biggest problem of CCTV videos is their quality and how to evaluate it. Traditional methods of video quality evaluation are not sufficient, because in security applications the required quality depends on the intended task. A video does not need good subjectively perceived quality as long as its quality is sufficient for the specific task (e.g. face identification, license plate recognition, event detection).
The main factors influencing the quality are the lighting of the scene, the camera, video compression, the quality of the transmission channel, and the display. All of them were investigated in detail by Smith et al. [7]. Within the scope of this paper, only compression is taken into account.
The choice of compression is very important, because security cameras usually record for very long periods, which produces a huge amount of data to be stored. For that reason CCTV recordings are often heavily compressed, which causes severe quality degradation. Some of the compression techniques used in the field of security were evaluated by Klima and Fliegel [8]. The impact of CCTV video compression on the ability to identify a person's face was investigated by Kovesi [9], who proposes the use of DCT-based rather than wavelet-based compression for these purposes, and by Keval and Sasse [10]. Kovesi also points out that color information is distorted by quantization, so pigmentation can no longer be relied on.
So far, the most advanced compression technique in use has been H.264/MPEG-4 Part 10 [11]. In this paper, the authors investigate the influence of the High Efficiency Video Coding (HEVC) compression standard, developed by the Joint Collaborative Team on Video Coding (JCT-VC), on event detection in CCTV videos by means of extensive subjective tests. The superiority of HEVC over H.264 in terms of subjective quality was verified by Hanhart et al. [12].
The paper is organized as follows: Section II describes the subjective test procedure, Section III describes the dataset, Section IV discusses the test environment and equipment, Section V presents the results, and Section VI concludes the paper.
II. TESTING PROCEDURE
A. Subjects
Thirty-two observers participated in the subjective test. All participants were students of the Faculty of Electrical Engineering, Czech Technical University in Prague, most of them male. Every subject had normal or corrected-to-normal vision. None of them had previous experience with security video surveillance, but all had experience with watching videos with various degrees of compression.
Untrained, non-professional observers could be used because CCTV surveillance systems are no longer exclusively the domain of the police and professional security companies. The current cost of these systems makes them available even to private owners, and sales of security equipment grow every year.
Plenty of CCTV videos are also available online (e.g. some towns and cities provide security camera recordings on their websites 24/7). US web users, for example, are encouraged to watch the Texas-Mexico border via their web browsers and report any suspicious situations¹. A similar example can be found in east London, where subscribers to the community safety channel are able to view digital CCTV video on their televisions².
For these reasons, it is necessary to conduct subjective tests not only with trained professionals but also with naive observers.
B. Procedure
Many methodologies for subjective testing have been standardized. Probably the most important ones, such as the Double Stimulus Impairment Scale (DSIS) and the Double Stimulus Continuous Quality Scale (DSCQS), can be found in ITU-R Recommendation BT.500-12 [13]. However, all of these procedures are defined for assessing perceived quality in the classical sense. Since this paper is concerned with the ability to detect events rather than with perceived quality, a different method was needed.
Before the test began, an introductory presentation was given, in which the purpose of the study was explained, the parameters of the test and the procedure were summarized, and the rules on what was allowed and prohibited were stressed. Every observer was to watch 20 videos, each 10 seconds long. All subjects were given an evaluation sheet, corresponding to the group they were assigned to, with 20 situation descriptions, each followed by questions.
Before watching each video, the participant read the description of the situation. Typically, the person requiring the observer's attention was described (e.g. "person in a black coat crossing the road"). The subject then watched the video and afterwards answered a few questions (e.g. "is the person carrying something?"). In this way they answered all the questions for all the videos.
Subjects were not allowed to watch any video more than once or to pause it. They were also instructed not to guess and to enter an answer only when they were sure about it; otherwise they were to mark the situation as undetectable.
¹ BBC News, "Web users to patrol US border," http://news.bbc.co.uk/1/hi/world/americas/5040372.stm (2006).
² BBC News, "Rights group criticises Asbo TV," http://news.bbc.co.uk/1/hi/england/london/4597990.stm (2006).
The test lasted about ten minutes, so a drop in the observers' attention was not expected.
C. Test Parameters
The authors chose 20 videos from a security surveillance system and 5 degrees of compression for both H.264 and HEVC, giving a total of 200 videos (20 videos × 10 compression settings). To avoid bias, however, every participant was to see each video only once. The observers were therefore divided into 10 groups. The videos were assigned to the groups randomly, and the order of the videos in the test was also randomized. The assignment of videos to groups is given in Table I. The row labels denote the video number and the column letters denote the compression. Letters A-E represent the degrees of HEVC compression in order of decreasing quality (for more information about the compression settings, refer to Section III); F-J represent the same for H.264/MPEG-4. Cell 1A therefore refers to the 1st video compressed by HEVC with the lowest Quantization Parameter, which was evaluated by group 5 in this experiment.
Table I SERIAL NUMBERS OF GROUPS EVALUATING EACH VIDEO
Video   A   B   C   D   E   F   G   H   I   J
    1   5   1   3  10   9   6   4   8   7   2
    2   6   4   3   7   9  10   1   5   2   8
    3   1   3   9   2   6   7   5   8   4  10
    4   2   5   7   3   8   9   1  10   4   6
    5   3   2   5   4   8   6   9   1   7  10
    6   6   2   8   3   1  10   9   7   5   4
    7  10   5   1   3   9   4   7   2   6   8
    8  10   6   3   4   1   2   7   8   9   5
    9   6   2  10   1   4   8   5   3   7   9
   10   7   5   4   1   2   8  10   6   9   3
   11   2   6   1   8   3   4  10   7   5   9
   12   3   6   5   1   7   8   4   2  10   9
   13   6   4   2   8   3   9  10   7   1   5
   14   1   2   4   9   8   5  10   6   7   3
   15   4   6   7   3   2   1   5   9   8  10
   16   9  10   8   7   4   5   2   1   6   3
   17   1   9   4   3   2   5   7  10   6   8
   18  10   4   2   3   6   1   7   8   9   5
   19   5   6   9   2   8   4   7   1  10   3
   20   8   5  10   9   7   6   4   2   3   1
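The group assignment described above can be sketched in Python as follows. This is a minimal illustration, not the authors' tool: it assumes that each video's row was drawn as an independent random permutation of the ten groups (consistent with Table I, where every row contains each group number exactly once) and that each group's viewing order was then shuffled. The function names `assign_groups` and `playlists` are hypothetical.

```python
import random

N_VIDEOS = 20
METHODS = "ABCDEFGHIJ"   # A-E: HEVC, F-J: H.264, each from best to worst quality
N_GROUPS = len(METHODS)  # 10 groups, so every row of the table is a permutation


def assign_groups(seed=None):
    """Return {video: {method: group}}; each video's row is a random permutation of groups."""
    rng = random.Random(seed)
    table = {}
    for video in range(1, N_VIDEOS + 1):
        groups = rng.sample(range(1, N_GROUPS + 1), N_GROUPS)
        table[video] = dict(zip(METHODS, groups))
    return table


def playlists(table, seed=None):
    """Return {group: [(video, method), ...]} with a randomized viewing order per group."""
    rng = random.Random(seed)
    lists = {g: [] for g in range(1, N_GROUPS + 1)}
    for video, row in table.items():
        for method, group in row.items():
            lists[group].append((video, method))
    for items in lists.values():
        rng.shuffle(items)
    return lists


table = assign_groups(seed=1)
lists = playlists(table, seed=2)
print(lists[5][:3])  # first three (video, method) pairs shown to group 5
```

Because every row is a permutation, each group sees each of the 20 source videos exactly once, in one of its ten compressed versions, which is the property the test design relies on.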
III. DATASET
As stated above, 20 videos from a CCTV surveillance system were used. The recordings were obtained within a joint project of the TNO Physics and Electronics Laboratory (TNO-FEL) in The Hague and the Czech Technical University in Prague, in which the effectiveness of CCTV was studied [14].
Every video was 10 seconds long and contained a particular event that the subjects were supposed to detect. Typically, they were asked about the number of people riding a bicycle, whether a person was carrying something, whether they were able to read a number on a bus or tram, how many people were getting out of a car, and so on.
The original videos had a resolution of 768x576 pixels, 25 frames per second and YUV 4:2:0 color sampling. They were then compressed by HEVC and H.264/MPEG-4. The HM 9.0 encoder³ was used for HEVC compression and the JM 18.4 encoder⁴ for H.264. The Group of Pictures was set to 4 and the Intra Period to one frame for both encoders. The coding order was 0 1 2 3 4. For HEVC, the authors chose the Low Delay (LD) configuration over the Random Access (RA) configuration. More information about the encoder settings can be found in Table II.
Table II DETAILED SETTINGS FOR BOTH ENCODERS
Codec                           AVC/H.264   HEVC/H.265
Encoder                         JM 18.4     HM 9.0
Profile                         Main        Main
Reference Frames                4           4
RD Optimization                 on          on
Motion Estimation               EPZS        EPZS
Weighted Prediction             off         -
Search Range                    64          64
Group of Pictures               4           4
Hierarchical Encoding           on          on
Temporal Levels                 4           4
Intra Period                    1 frame     1 frame
Deblocking                      off         off
Rate Control                    on          off
8x8 Transform                   on          -
Adaptive Loop Filter            -           off
Coding Unit size / depth        -           64 / 4
Transform Unit size min / max   -           4 / 32
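The last two rows of Table II describe a quadtree: a 64x64 largest coding unit split down to a depth of 4 gives CU sizes from 64x64 to 8x8, and transform units range from 4x4 to 32x32. The sketch below only illustrates that arithmetic; it assumes the HM convention in which the depth counts the 64x64 level itself, and the helper names are hypothetical.

```python
def cu_sizes(max_cu=64, depth=4):
    """CU edge lengths reachable by quadtree splitting: each level halves the edge."""
    return [max_cu >> d for d in range(depth)]


def tu_sizes(min_tu=4, max_tu=32):
    """Power-of-two TU edge lengths between the maximum and minimum (inclusive)."""
    sizes = []
    size = max_tu
    while size >= min_tu:
        sizes.append(size)
        size //= 2
    return sizes


print(cu_sizes())  # [64, 32, 16, 8]
print(tu_sizes())  # [32, 16, 8, 4]
```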
Five different degrees of HEVC compression were applied to the videos. They differed in the Quantization Parameter (QP); therefore, no rate control was used. The QPs were 37, 42, 46, 49 and 51. Column A in Table I stands for videos compressed by HEVC with QP = 37, column B for QP = 42, etc.
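To give a feel for how far apart the five QP settings are, the sketch below uses the widely cited approximation for H.264/AVC and HEVC that the quantizer step size doubles every 6 QP steps, Qstep ≈ 2^((QP - 4)/6). It is only an approximation (the encoders use integer scaling tables internally), and the helper `qstep` is hypothetical.

```python
QPS = {"A": 37, "B": 42, "C": 46, "D": 49, "E": 51}  # HEVC settings used in the tests


def qstep(qp):
    """Approximate quantizer step size: Qstep(4) = 1, doubling every 6 QP."""
    return 2.0 ** ((qp - 4) / 6.0)


for label, qp in QPS.items():
    ratio = qstep(qp) / qstep(QPS["A"])
    print(f"{label}: QP {qp}, step size x{ratio:.2f} relative to A")
```

Relative to setting A (QP = 37), setting D (QP = 49) quantizes with four times the step size, which is in line with the steep drop in bit rate across the settings.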
To compare the performance of both compression methods, average bit rates were calculated from the encoder output. The specific values are given in Table III. These values were used as targets in the H.264 encoder to create videos with bit rates similar to those produced by HEVC; here, rate control was employed. The videos in column F in Table I are therefore compressed by H.264 with a target bit rate of 46.7 kbps, column G with 25.9 kbps, etc.
³ https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/HM-9.0-dev/
⁴ http://iphome.hhi.de/suehring/tml/
Figure 1. Multimedia Technology Group's post-processing lab.
Table III AVERAGE BIT RATES FOR VIDEOS COMPRESSED BY HEVC WITH DIFFERENT QP
Quantization Parameter   Average Bit Rate (kbps)
37                       46.7
42                       25.9
46                       15.7
49                       10.8
51                       8.5
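The average bit rates in Table III were derived from the encoder output, but the paper does not spell out the calculation. A plausible sketch, assuming that the bit rate of each 10-second clip (250 frames at 25 fps) is its bitstream size in bits divided by its duration, and that the value per QP is the arithmetic mean over the 20 clips, is shown below. The function names and the byte counts in the usage line are hypothetical.

```python
def bitrate_kbps(size_bytes, n_frames, fps=25.0):
    """Average bit rate of one encoded bitstream in kbit/s."""
    duration_s = n_frames / fps
    return size_bytes * 8 / duration_s / 1000


def average_bitrate_kbps(sizes_bytes, n_frames=250, fps=25.0):
    """Arithmetic mean of the per-clip bit rates for bitstreams encoded with the same QP."""
    rates = [bitrate_kbps(s, n_frames, fps) for s in sizes_bytes]
    return sum(rates) / len(rates)


# Hypothetical bitstream sizes (bytes) of a few 10-second clips encoded with one QP.
print(average_bitrate_kbps([58375, 60100, 55900]))
```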
In some cases, the H.264 encoder was not able to reach the target bit rate. This mostly concerned the two lowest target bit rates (10.8 kbps and 8.5 kbps), where the compression is extremely heavy and it is almost impossible for the H.264 encoder to achieve these bit rates.
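The rate-control mismatch can be quantified from the paper's own numbers. The sketch below compares the target bit rates (Table III) with the actual H.264 bit rates measured for video No. 2 (Table V); it is a simple illustration, and the names are hypothetical.

```python
# Target H.264 bit rates (from Table III) and actual bit rates for video No. 2 (Table V), in kbps.
TARGET_KBPS = {"F": 46.7, "G": 25.9, "H": 15.7, "I": 10.8, "J": 8.5}
ACTUAL_VIDEO2_KBPS = {"F": 51.01, "G": 27.84, "H": 17.18, "I": 12.90, "J": 12.40}


def overshoot_percent(actual, target):
    """Relative deviation of the achieved bit rate from the rate-control target, in percent."""
    return 100.0 * (actual - target) / target


for method, target in TARGET_KBPS.items():
    actual = ACTUAL_VIDEO2_KBPS[method]
    print(f"{method}: target {target} kbps, actual {actual} kbps, "
          f"{overshoot_percent(actual, target):+.1f} %")
```

For video No. 2 this gives an overshoot below +10 % for F to H, but about +19 % for I and +46 % for J, which matches the observation that the two lowest targets were the hardest to reach.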
IV. ENVIRONMENT AND EQUIPMENT
The subjective tests were conducted in the Multimedia Technology Group's⁵ post-processing lab at the Faculty of Electrical Engineering, Czech Technical University in Prague. The laboratory layout can be seen in Figure 1. Ten workspaces were available for evaluation, each equipped with a color-calibrated LCD display with a resolution of 1600x1200 pixels.
No special software for displaying videos was used. All the content was played using the standard Windows Media Player.
V. RESULTS
Since most of the questions in the sheet had three parts, the maximum score for every video was 3. If the participant answered every part of the question correctly (i.e. was able to reliably detect and recognize the event), the score for that question was 3. The result of every question could therefore be 3, 2, 1 or 0.
⁵ http://www.multimediatech.cz
Figure 3. Two-sample right-tailed t-test results (both axes: compression method).
The results were processed according to ITU-R Recommendation BT.500-12 [13]. The mean scores for every applied compression, with the corresponding confidence intervals at a significance level of 0.05, are shown in Figure 2.
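BT.500 computes, for each test condition, the mean score over the N observers and a 95 % confidence interval [u - δ, u + δ] with δ = 1.96·S/√N, where S is the sample standard deviation. A minimal sketch of that calculation for one compression method follows; the function name is hypothetical and the scores in the usage lines are invented for illustration.

```python
import math
import statistics


def mean_and_ci(scores, z=1.96):
    """Return (mean, delta): BT.500-style mean score and 95 % CI half-width.

    delta = z * S / sqrt(N), where S is the sample standard deviation (N - 1 denominator).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    s = statistics.stdev(scores)
    return mean, z * s / math.sqrt(n)


# Hypothetical 0-3 scores from eight observers for one compression method.
scores = [3, 2, 3, 1, 3, 0, 2, 3]
mean, delta = mean_and_ci(scores)
print(f"mean = {mean:.2f}, 95 % CI = [{mean - delta:.2f}, {mean + delta:.2f}]")
```

With only a few observers per condition, replacing 1.96 by the corresponding Student t quantile would give a somewhat wider, more conservative interval.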
For a proper comparison of the influence of the compression methods, a two-sample right-tailed t-test was employed. It decides whether the mean score for one compression method is significantly higher than for another (i.e. whether the event detection capability of the observers differs). The t-test results are visualized in Figure 3. Where detection in videos compressed by the method on the y axis was statistically significantly more successful than in videos compressed by the method on the x axis, the corresponding square is white; otherwise it is black.
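The comparison behind Figure 3 can be sketched as follows. The paper does not say whether a pooled-variance (Student) or a Welch test was used; this sketch assumes the pooled-variance version and computes the right-tail p-value with the regularized incomplete beta function, so that it needs only the standard library. In recent SciPy versions, `scipy.stats.ttest_ind(y, x, alternative="greater")` performs the same pooled-variance test. All function names here are hypothetical.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Right-tail probability P(T > t) of Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def right_tailed_ttest(y, x):
    """Pooled-variance two-sample t-test of H1: mean(y) > mean(x). Returns (t, p).

    Raises ZeroDivisionError if both samples have zero variance.
    """
    ny, nx = len(y), len(x)
    df = ny + nx - 2
    pooled = ((ny - 1) * statistics.variance(y) + (nx - 1) * statistics.variance(x)) / df
    se = math.sqrt(pooled * (1.0 / ny + 1.0 / nx))
    t = (statistics.fmean(y) - statistics.fmean(x)) / se
    return t, t_sf(t, df)


def significance_matrix(scores, alpha=0.05):
    """{(y_method, x_method): True} where y is significantly better than x ('white' square)."""
    return {(my, mx): my != mx and right_tailed_ttest(scores[my], scores[mx])[1] < alpha
            for my in scores for mx in scores}
```

Here `scores` would map each compression label (A to J) to the list of per-question scores collected for that method.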
The average percentage of successful detection for each compression method is given in Table IV.
Table IV AVERAGE PERCENTAGE OF SUCCESSFUL DETECTION
Compression Method⁶   Average Percentage (%)
A                     88.5
B                     88.3
C                     70.1
D                     51.3
E                     54.8
F                     74.2
G                     61.8
H                     31.9
I                     38.9
J                     40.2
⁶ A-E: HEVC from best to worst quality; F-J: H.264/MPEG-4 from best to worst quality.
The results reveal several interesting findings. First, detection in videos compressed by method B (HEVC with QP = 42, average bit rate 25.9 kbps) was more successful than in videos compressed by method F (H.264 with a target bit rate of 46.7 kbps). This shows that videos compressed by HEVC are better suited for detection than videos compressed by H.264 at almost double the bit rate. However, this hypothesis was not confirmed in every case (C did not outperform G, and D did not outperform I and J). This is probably caused by the wider confidence intervals at these lower bit rates, a consequence of massive artifacts that complicate reliable detection. Some events were easier to detect than others, and this difference is much more pronounced when the videos are heavily distorted.
Another, perhaps even more important, outcome of the study is that there is no significant difference in detection capability between methods A and B. This means that using HEVC with a QP lower than 42 (i.e. bit rates higher than 25.9 kbps) does not improve event detection and is thus redundant for CCTV surveillance purposes.
The quality indexes of videos No. 2 and 12 measured by objective video quality metrics, together with the actual bit rates of the individual videos, are given in Tables V and VI, respectively. A MATLAB-based framework developed by Murthy and Karam [15] was employed to assess the quality by four metrics: VQM [16], averaged PSNR, averaged SSIM [17] and averaged VSNR [18]. Note that, unlike the other metrics, VQM decreases as quality improves. Also note that, for video 12, the H.264 encoder reached the same bit rate for the target bit rates of 10.8 and 8.5 kbps.
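The "averaged" metrics are per-frame values averaged over the sequence. As an illustration only, the sketch below computes averaged PSNR for 8-bit frames; it assumes the frames are given as NumPy arrays (e.g. luma planes) and that the average is the arithmetic mean of the per-frame PSNR values, which may differ in detail from the framework of Murthy and Karam [15]. The function names are hypothetical.

```python
import numpy as np


def frame_psnr(ref, dist, peak=255.0):
    """PSNR in dB between two 8-bit frames; infinite when the frames are identical."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def averaged_psnr(ref_frames, dist_frames, peak=255.0):
    """Arithmetic mean of per-frame PSNR values over a sequence."""
    values = [frame_psnr(r, d, peak) for r, d in zip(ref_frames, dist_frames)]
    return float(np.mean(values))
```

Averaging per-frame PSNR in dB, rather than computing one PSNR from the sequence-wide MSE, gives every frame equal weight regardless of how badly it is distorted.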
VI. CONCLUSION
In this work, the authors studied the suitability of HEVC compression for CCTV surveillance purposes. Twenty video sequences obtained by outdoor security cameras were compressed using 5 different parameter settings. The same videos were also compressed by H.264/MPEG-4 Part 10 AVC at comparable bit rates. These video sequences were then shown to 32 observers, who were asked to detect particular events in the videos.
The results showed that, in most cases, HEVC enables more reliable detection than H.264 even at about half the bit rate.
It was also shown that bit rates higher than 26 kbps do not improve the observers' ability to detect the events, so this bit rate is sufficient for security video compressed by HEVC. This bit rate was obtained with QP set to 42.
ACKNOWLEDGMENT
This work was supported by grant No. P102/10/1320 "Research and modeling of advanced methods of image quality evaluation" of the Grant Agency of the Czech Republic.
Figure 2. The subjective test results with confidence intervals at significance level 0.05 (mean score per compression method A to J; legend: HEVC, H.264).
Table V OBJECTIVE QUALITY METRICS RESULTS AND ACTUAL BIT RATES FOR VIDEO No. 2
Method  VQM     Averaged PSNR  Averaged SSIM  Averaged VSNR  Actual Bit Rate (kbps)
A       0.4720  33.3712        0.9299         25.5473        44.0960
B       0.6271  30.2172        0.8844         20.6540        25.0996
C       0.7796  27.4695        0.8221         16.6779        15.4718
D       0.9068  25.6711        0.7641         14.1261        10.8579
E       0.9678  24.4582        0.7222         12.5501        8.6315
F       0.7698  25.8306        0.8727         17.6400        51.0100
G       0.9004  25.0564        0.8272         15.6801        27.8400
H       1.0019  24.1259        0.7807         13.6564        17.1800
I       1.0154  23.6163        0.7626         12.8200        12.9000
J       1.0172  23.5519        0.7591         12.6695        12.4000
Table VI OBJECTIVE QUALITY METRICS RESULTS AND ACTUAL BIT RATES FOR VIDEO No. 12
Method  VQM     Averaged PSNR  Averaged SSIM  Averaged VSNR  Actual Bit Rate (kbps)
A       0.4664  33.6295        0.9323         25.6088        43.4193
B       0.6382  30.3134        0.8832         20.3115        23.9281
C       0.7934  27.6619        0.8217         16.4305        14.3311
D       0.9027  25.7214        0.7626         13.8593        9.91111
E       0.9733  24.6455        0.7212         12.3980        7.91037
F       0.6944  29.5971        0.8710         19.9469        48.2200
G       0.8257  27.2824        0.8270         16.4995        28.8000
H       0.9144  25.7122        0.7893         14.3492        16.5000
I       0.9576  24.6300        0.7641         13.0417        11.7600
J       0.9576  24.6300        0.7641         13.0417        11.7600
REFERENCES
[1] C. Held, J. Krumm, P. Markel and R. P. Schenke, "Intelligent Video Surveillance," in Computer, Vol. 45, No. 3, pp. 83-84, March 2012.
[2] J. Nam and A. H. Tewfik, "Detection of gradual transitions in video sequences using B-spline interpolation," in IEEE Transactions on Multimedia, Vol. 7, No. 4, pp. 667-679, August 2005.
[3] F. Lv, J. Kang, R. Nevatia, I. Cohen and G. Medioni, "Automatic tracking and labeling of human activities in a video sequence," in Proceedings of the 6th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, 2004.
[4] P. C. Ribeiro and J. Santos-Victor, "Human activity recognition from video: modeling, feature selection and classification architecture," in International Workshop on Human Activity Recognition and Modeling, pp. 61-70, 2005.
[5] V. Kettnaker and R. Zabih, "Counting people from multiple cameras," in IEEE International Conference on Multimedia Computing and Systems, Vol. 2, pp. 267-271, July 1999.
[6] Y. Nam, S. Rho and J. H. Park, "Intelligent video surveillance system: 3-tier context-aware surveillance system with metadata," in Multimedia Tools and Applications, Vol. 57, No. 2, pp. 315-334, March 2012.
[7] R. A. Smith, K. MacLennan-Brown, J. F. Tighe, N. Cohen, S. Triantaphillidou and L. W. MacDonald, "Colour analysis and verification of CCTV images under different lighting conditions," in Image Quality and System Performance, Proc. SPIE, Vol. 6808, 2008.
[8] M. Klima and K. Fliegel, "Image compression techniques in the field of security technology: examples and discussion," in 38th Annual 2004 International Carnahan Conference on Security Technology, pp. 278-284, October 2004.
[9] P. Kovesi, "Video Surveillance: Legally Blind?" in Digital Image Computing: Techniques and Applications (DICTA 2009), pp. 204-211, 2009.
[10] H. U. Keval and M. A. Sasse, "Can we ID from CCTV: image quality in digital CCTV and face identification performance," in Mobile Multimedia/Image Processing, Security, and Applications, Proc. SPIE, Vol. 6982, 2008.
[11] ISO, "Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding," in Tech. Rep. ISO/IEC 14496-10:2005, ISO/IEC, 2005.
[12] P. Hanhart, M. Rerabek, F. De Simone and T. Ebrahimi, "Subjective quality evaluation of the upcoming HEVC video compression standard," in Applications of Digital Image Processing XXXV, Proc. SPIE, Vol. 8499, 2012.
[13] ITU-R Recommendation BT.500-12, "Methodology for the subjective assessment of the quality of television pictures," September 2009.
[14] G. van Voorthuijsen, H. van Hoof, M. Klima, K. Roubik, M. Bernas, et al., "CCTV Effectiveness Study," in Proc. of 39th IEEE ICCST, Piscataway: IEEE, pp. 105-108, 2005.
[15] A. V. Murthy and L. J. Karam, "A MATLAB-based framework for image and video quality evaluation," in Proceedings QoMEX 2010, 2010.
[16] M. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," in IEEE Transactions on Broadcasting, Vol. 50, No. 3, pp. 312-322, September 2004.
[17] Z. Wang et al., "Image quality assessment: From error visibility to structural similarity," in IEEE Transactions on Image Processing, Vol. 13, No. 4, pp. 600-612, April 2004.
[18] D. M. Chandler and S. S. Hemami, "VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images," in IEEE Transactions on Image Processing, Vol. 16, No. 9, pp. 2284-2298, September 2007.
Stanislav Vitek graduated from the Czech Technical University in Prague in 2002 and received his PhD in 2008. He is currently an assistant professor with the Dept. of Radioelectronics at the Faculty of Electrical Engineering at the Czech Technical University in Prague. His main research interests are assistive technologies, multimedia processing and database systems.
Lukas Krasula graduated from the Czech Technical University in Prague in 2013. He is currently a Ph.D. student at the Faculty of Electrical Engineering at the Czech Technical University in Prague. His research interests focus on image processing and image compression for security and multimedia imaging systems.
Milos Klima graduated from the Czech Technical University in Prague in 1974 and received his PhD in 1978. He has been a full professor since 2000. He is currently the head of the Dept. of Radioelectronics at the Faculty of Electrical Engineering at the Czech Technical University in Prague and the leader of the Multimedia Technology Group. His research interests focus on image sensing, image processing and image compression for security and multimedia imaging systems. He has participated in the ICCST since 1991.
Vojtech Hvezda graduated from the Czech Technical University in Prague in 2013. His research interests focus on image processing and image compression for security and multimedia imaging systems.
Marcelo Herrera Martinez graduated from the Czech Technical University in Prague in 2003 and received his PhD in 2010. His research topics are psychoacoustics, noise control and digital signal processing (especially perceptual compression and broadcasting technology). He is currently a full-time research professor at Universidad de San Buenaventura, Bogota.