An Efficient Method For Gradual Transition Detection In Presence Of Camera Motion

International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 1, Issue 6, June 2014. ISSN 2348 - 4853

16 | © 2014, IJAFRC All Rights Reserved www.ijafrc.org

Salim A. Chavan1, Amol A. Alkari2, Dr. Sudhir G. Akojwar3

1Department of E&TC, Associate Prof and Vice Principal, DBNCOET, Yavatmal.

2Department of E&TC, PG Student, DBNCOET, Yavatmal.

3Department of E&TC, Professor and Head, RGCER&T, Chandrapur.


ABSTRACT

Gradual transition detection is one of the most important issues in the field of video indexing and

retrieval. Among the various types of gradual transitions, fades and dissolves are the most common, but they are also the most difficult to detect. In most of the

existing fade and dissolve detection algorithms, the false detection problem caused by motion is

very serious. In this paper, we present a novel gradual transition detection algorithm that uses local key points integrated with the twin comparison method and can correctly distinguish fades and dissolves from object and camera motion.

Index Terms: Gradual transition Detection, Fades, Dissolve, Recall, Precision and Retrieval

Success Index.

I. INTRODUCTION

The necessity for intelligent processing and analysis of multimedia information has been rising on a

regular basis. Researchers have built a number of technologies for intelligent video management, including shot transition detection, key frame extraction, video summarization, video retrieval and more. Among these, gradual transition detection is considered one of the most difficult problems and one of the most significant in practical value.

Videos have become a popular means of entertainment over the years. Traditionally, videos were created

only by a limited number of producers. Now ordinary users can also afford video capturing devices and use them with ease, and as a result the amount of user-generated video has increased. A large collection of videos is readily available on various video sharing websites.

Searching for videos with desired content from such a large collection has become a tedious task. Also,

viewers want to have a better control over the video data. As a result, many video browsing, indexing and

summarization applications are being developed.

Altogether the amount of video data available to anyone increases more rapidly than anybody can handle

on his or her own. Computers and video search engines are needed to make it possible to find and access

relevant information from this huge amount of data. Most of the video search engines available to the

public are still based on textual search and are thus dependent on manual annotation of the data.

However, manual annotation is slow, expensive and sometimes inaccurate and heterogeneous since the

annotations are always subjective and dependent on the annotator’s cultural background, language and

opinions. Automatic annotation methods are needed to make all of the available video data accessible. Such methods help users to efficiently retrieve the video segments they want from a vast video database, based on the video content and aided by user interaction. In general, a video retrieval system can be divided into two principal constituents: a module that extracts representative features from video segments, and a suitable similarity model that locates similar video clips in the video database. A wide variety of features have been employed to represent a video sequence. One such approach is the local key point approach, which is used in this paper.

A common first step, shot boundary detection, segments a video into elementary shots, each comprising a sequence of frames that is continuous in time and space. During video sorting or editing, these elementary shots are joined into a video sequence with either cut transitions or gradual transitions produced by visual effects such as fades, dissolves and wipes. Shot boundaries are typically found by computing an image-based distance between adjacent frames of the video and noting when this distance exceeds a certain threshold. The distance between adjacent frames can be based on statistical properties of pixels, compression algorithms or edge differences.
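The thresholded frame-distance idea described above can be illustrated with a minimal sketch. It is not the method of this paper: the grey-level histogram distance, the bin count and the threshold value are assumptions chosen only for the example, and the function names are hypothetical.

```python
import numpy as np


def histogram_distance(frame_a, frame_b, bins=16):
    """Sum of absolute differences between normalised grey-level histograms."""
    hist_a, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    hist_a = hist_a / hist_a.sum()
    hist_b = hist_b / hist_b.sum()
    return float(np.abs(hist_a - hist_b).sum())


def find_boundaries(frames, threshold=0.5):
    """Return the index of every frame whose distance to the previous frame
    exceeds the threshold (threshold value is an illustrative assumption)."""
    return [
        i + 1
        for i in range(len(frames) - 1)
        if histogram_distance(frames[i], frames[i + 1]) > threshold
    ]


# Synthetic example: three dark frames followed by two bright frames.
dark = np.full((8, 8), 20, dtype=np.uint8)
bright = np.full((8, 8), 220, dtype=np.uint8)
video = [dark, dark, dark, bright, bright]
boundaries = find_boundaries(video)
print(boundaries)
```

A single global threshold of this kind reacts to any large change between frames, which is why motion inside a shot can trigger false detections.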

There are two basic types of shot transition: abrupt transitions and gradual transitions. Gradual shot change detection is one of the most important issues in the field of video processing. Studies show that hard transitions are comparatively easy to detect, whereas the detection of gradual transitions remains challenging. In this paper, key points are detected in each frame, and the key points of adjacent frames are then matched to detect gradual transitions. This works efficiently under scaling and rotation of frames. The approach gives an efficient algorithm for detecting abrupt and gradual transitions in the presence of camera motion as well as object motion.

II. RELATED WORK

An effective algorithm for dissolve detection in the presence of camera and object motion is proposed in [1]. It addresses misdetection and false detection by modeling the dissolve transition and selecting a proper threshold. Although illumination changes remain a problem, the algorithm works with high efficiency. An approach that detects gradual transitions and their type using B-spline interpolation is explained in [2]; special detectors are used to determine the type of transition. The problem of detecting fades in film cadence is addressed in [3], which explains the difference between video and film cadence and presents an approach for detecting fades in film cadence. A method that detects fades by observing changes in the histogram is explained in [4]. There is an increased need to extract key information automatically from video for the purposes of indexing, fast retrieval, and scene analysis, and reliable scene change detection algorithms are required to support this. Algorithms have been proposed for both sudden and gradual scene change detection in uncompressed and compressed video. The authors of [4] use the properties of the fading operation and extract the corresponding features from the luminance histogram. The image is divided into four parts, and the histograms of all four parts are considered to detect the fade type. Their results show that the algorithm can be used in both uncompressed and compressed video to detect fade regions with high reliability. An approach that detects dissolves based on the accumulating histogram difference (AHD) with the support point is explained in [5].

Two different algorithms for detecting dissolves and wipes based on the image histogram are proposed in [6]. A fuzzy logic approach that integrates hybrid features for detecting shot boundaries in general videos is explained in [7]. The approach contains two processing modes: one is dedicated to the detection of abrupt shot cuts, including short dissolved shots, and the other to the detection of gradual shot cuts. The two modes are unified by a mode selector that decides which mode the scheme should work in to achieve the best possible detection. The hybrid features integrated in [7] are color histogram, texture change and edge variance. The partition of video into shots and the detection of abrupt and gradual transitions using a foveated representation of video are explained in [8]. Foveated imaging is a technique in which the image has different resolutions in different parts. A method to partition a video into shots is proposed in [9], where the motion of the image at each time instant is represented by two-dimensional models. Shot boundary detection using an adaptive time window is explained in [10]. Local color information is used to eliminate false detections caused by abrupt changes of illumination such as camera flashes and thunder.

Detection of gradual transitions, i.e. fade-ins, fade-outs and dissolves, with a threshold selected through an adaptive, robust and analytical study of a mathematical model of the transition is proposed in [11]. The objective there is to accurately classify the type of transition (fade-in, fade-out or dissolve) and to precisely locate its boundaries. False detections are removed by motion transition removal. A neural network algorithm for dissolve detection is explained in [12], in which a large number of dissolve examples are produced with a dissolve synthesizer. The material used by the synthesizer is taken from a video database, and all of the resulting examples are provided as a training set to the neural network for dissolve detection. An approach that detects shot boundaries using edge complexity is explained in [13]. The image quality of archived films often degrades gradually, which causes false detections during shot segmentation; the problem of scene change detection in archived films is addressed in [14]. For abrupt shot change detection, the difference between the block-based histograms of adjacent frames and across frames, computed from brightness information, is used; the similar-histogram problem is treated and the error caused by flicker is eliminated. Gradual shot changes are then detected based on the fact that the inter-frame variance usually increases monotonically in a fade-in sequence and decreases monotonically in a fade-out sequence. A fuzzy logic approach to detecting shots in sports video is explained in [15]. The fuzzy logic approach is chosen for two reasons: there is no fixed, hard limit for threshold selection (a hard threshold makes the decision binary), and there is no need for a large training set. In the fuzzy logic approach, shot boundary detection is formulated in terms of the features used to detect the shot boundary, each with a certain value of the membership function. In [16] the authors propose a method for shot change detection that is less sensitive to object or camera motion owing to the robustness of the feature tracking algorithm. A method for finding the type of transition is also proposed. The detection problem is solved using object recognition techniques rather than overall features, so that shot changes can be distinguished from object or background motion in a scene. The advantage of [16] is that the method does not depend entirely on a threshold on the number of matched points; the threshold varies with the local maxima and minima of the number of matches, which handles variations between transitions better. In addition, the method does not only match neighboring frames or frames a fixed period apart, but also matches nonadjacent frames inferred by shot-change interval estimation, which can further increase the detection accuracy.

III. METHODOLOGY

The methodology consists of two main parts: the first is finding key points and defining the contrast context histogram (CCH) descriptor, and the second is applying the twin comparison approach. The important steps involved in the gradual transition detection method are described below.

A. Defining a CCH Descriptor

In this method, instead of considering all features, only high-level features are considered. These high-level feature points are extracted from an image by considering its RGB values. The main issue in developing invariant local descriptors is how to represent a region more effectively and discriminatively. The color histogram is one option for describing a region, but it is sensitive to illumination changes. Instead, we consider a technique that computes the contrast values of points within a region with respect to a salient corner (key point). We assume that many key points have already been extracted from an image. For each key




point Pc at the image coordinate (Uc, Vc), we locate an n × n local region R surrounding Pc. Let P denote a pixel at the image coordinate (u, v) in region R. We compute the contrast value C(P) of P in R as

$$C(P) = I(P) - I(P_c)$$

where I(P) and I(Pc) are the intensity values of P and Pc respectively. We then construct a descriptor of Pc based on these contrast values and separate R into several non-overlapping regions R1, R2, ..., Rt. Without loss of generality, we use a log-polar coordinate system (r, θ) to perform the division, as shown in Fig. 1.

Figure 1. Log Polar Diagram of CCH descriptor

To ensure that the descriptor is invariant to image rotations, the direction of θ = 0 in the log-polar coordinate system is set to coincide with the edge orientation of Pc. Considering the importance of representing a sub region Ri efficiently and discriminatively, we use a histogram-based representation, because a histogram is relatively insensitive to non-uniform deformations of a region. An intuitive way to employ the histogram feature is to gather all the contrast values in a sub region into a single histogram bin. However, summing positive and negative contrast values together may reduce the discriminating response of the bin.

Thus, to improve the discriminative ability of the descriptor, we use both positive and negative histogram bins of contrast values for each sub region, as described below. For the sub region Ri, we define the positive contrast histogram bin with respect to Pc as

$$H_{R_i^+}(P_c) = \frac{\sum \{\, C(P) \mid P \in R_i,\ C(P) > 0 \,\}}{\#R_i^+}$$

where #Ri+ is the number of positive contrast values in Ri. In a similar manner, the negative contrast histogram bin is defined as

$$H_{R_i^-}(P_c) = \frac{\sum \{\, C(P) \mid P \in R_i,\ C(P) < 0 \,\}}{\#R_i^-}$$

where #Ri- is the number of negative contrast values in Ri.

By combining the contrast histograms of all the sub regions into a single vector, the CCH descriptor of Pc in association with its local region R can be defined as follows:

$$CCH(P_c) = \left( H_{R_1^+}, H_{R_1^-}, \ldots, H_{R_t^+}, H_{R_t^-} \right)$$




This CCH descriptor was evaluated by using a large set of images undergoing various geometric

transformations. The evaluation results show that the CCH descriptor is efficient and highly accurate in

determining feature correspondence.
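A minimal sketch of how such a descriptor could be computed is given here for illustration. It assumes a grey-level image, a circular region of a given radius, and a simple logarithmic quantization of the radius; the function name `cch_descriptor` and the parameters `radius`, `n_rings`, `n_sectors` and `orientation` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cch_descriptor(image, keypoint, radius=8, n_rings=3, n_sectors=8,
                   orientation=0.0):
    """Mean positive and mean negative contrast for each log-polar sub region.

    keypoint is (column, row); orientation (radians) is the direction that
    is mapped to theta = 0, e.g. the edge orientation at the key point.
    """
    uc, vc = keypoint
    img = image.astype(np.float64)
    rows, cols = img.shape
    n_bins = n_rings * n_sectors
    pos_sum, neg_sum = np.zeros(n_bins), np.zeros(n_bins)
    pos_cnt, neg_cnt = np.zeros(n_bins), np.zeros(n_bins)
    log_rmax = np.log(radius + 1.0)

    for v in range(max(0, vc - radius), min(rows, vc + radius + 1)):
        for u in range(max(0, uc - radius), min(cols, uc + radius + 1)):
            du, dv = u - uc, v - vc
            r = np.hypot(du, dv)
            if r == 0 or r > radius:
                continue  # skip the key point itself and the square corners
            # Quantize the log-radius and the angle relative to the orientation.
            ring = min(int(np.log(r + 1.0) / log_rmax * n_rings), n_rings - 1)
            theta = (np.arctan2(dv, du) - orientation) % (2 * np.pi)
            sector = min(int(theta / (2 * np.pi) * n_sectors), n_sectors - 1)
            idx = ring * n_sectors + sector
            contrast = img[v, u] - img[vc, uc]  # C(P) = I(P) - I(Pc)
            if contrast > 0:
                pos_sum[idx] += contrast
                pos_cnt[idx] += 1
            elif contrast < 0:
                neg_sum[idx] += contrast
                neg_cnt[idx] += 1

    # Average within each bin; bins without samples stay at zero.
    pos = np.divide(pos_sum, pos_cnt, out=np.zeros(n_bins), where=pos_cnt > 0)
    neg = np.divide(neg_sum, neg_cnt, out=np.zeros(n_bins), where=neg_cnt > 0)
    # Interleave as (H_R1+, H_R1-, ..., H_Rt+, H_Rt-).
    return np.stack([pos, neg], axis=1).ravel()


# Synthetic example: a key point that is darker than its whole neighbourhood.
image = np.full((21, 21), 100, dtype=np.uint8)
image[10, 10] = 40
descriptor = cch_descriptor(image, (10, 10))
print(descriptor.shape)
```

Because every bin stores a difference relative to the key point intensity, adding a constant brightness offset to the whole region leaves the descriptor unchanged in this sketch.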

B. Locating Transitions by Matching Adjacent Frames

At this stage a set of local descriptors and key points is ready. We observe that objects or scenes are

replaced during transitions, even though they may be moving or rotating within the shot. Most methods

of shot change detection produce many false alarms when objects or cameras move, as they can only

detect changes in some overall features between the same image locations of adjacent frames in a video.

Although such features will change dramatically during transitions, they will also change when

something moves in a single shot. The advantage of feature matching is that it is invariant to

transformations; thus, we can even match objects after they have moved. An additional advantage is that

we do not have to design a detector for each kind of transition. Since a shot change indicates a change of

objects in the scene, multiple kinds of shot changes can be detected in a unified manner.

Figure 2. Feature matching results between adjacent frames.

In our algorithm, each frame is pre-processed by key point detectors. The key points are extracted as mentioned above. A salient key point is selected by detecting the local maximum in an n × n region. Fig. 1 illustrates the contrast context histogram of a salient key point in the log-polar coordinate system. The local region is divided into several sub regions by quantizing it in the log-polar coordinate system. For each sub region, the 2-bin contrast histogram introduced above is constructed. Each key point in the first frame is matched to the key point in the next frame that has the shortest distance to it. However, if the shortest distance is longer than a predefined threshold, the key point is left unmatched. When the key points of adjacent frames fail to match in this way, the frame pair is declared a non-gradual (abrupt) transition. For gradual transitions, the minimum amount of matching between adjacent frames is considered, and the threshold used is lower than the one used for abrupt transitions. Thus we define two thresholds in our method, upper and lower, with which we separate abrupt and gradual transitions.
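The nearest-descriptor matching step can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance between descriptors and the value of `max_distance` are choices made for the example, and `count_matches` is a hypothetical helper, not a function from the paper.

```python
import numpy as np


def count_matches(desc_a, desc_b, max_distance=0.5):
    """Count key points of frame A whose nearest descriptor in frame B
    lies within max_distance (Euclidean distance, assumed here)."""
    if len(desc_a) == 0 or len(desc_b) == 0:
        return 0
    # Pairwise distances: entry (i, j) is the distance between a_i and b_j.
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    nearest = dist.min(axis=1)
    return int((nearest <= max_distance).sum())


# Toy 2-D descriptors: two key points survive into frame B, none into frame C.
frame_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
frame_b = np.array([[0.0, 0.1], [1.0, 1.2], [9.0, 9.0]])
frame_c = np.array([[20.0, 20.0]])
print(count_matches(frame_a, frame_b), count_matches(frame_a, frame_c))
```

The per-pair match count produced this way is the kind of signal to which upper and lower thresholds can then be applied.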

C. Intervals of Transitions

Once we determine that a gradual transition occurs in a particular video, it is also necessary to find the interval of the transition. Our method for finding the intervals is also based on feature matching. Shot changes are likely to occur when the number of matched objects decreases; thus there should not be any transition when several objects in adjacent frames are matched. In our method, the local maxima of the number of matches to the left and right of the candidate transition are the possible start and end frames of that transition. We add the further condition that the video sequence before and after the shot change should also be stable, resulting in stable numbers of matched key points. The search for the start and end points begins at the two maxima and continues until the number of matched key points is stable.
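One possible reading of this interval search is sketched below. The stability test (consecutive match counts differing by at most `tolerance`) is an assumption introduced for the example, since the text does not define stability numerically; `transition_interval` is a hypothetical name.

```python
def transition_interval(match_counts, candidate, tolerance=2):
    """Estimate (start, end) indices of a transition around a candidate index.

    match_counts[i] is the number of matched key points for frame pair i.
    The search climbs to the local maximum on each side of the candidate,
    then keeps moving outward while consecutive counts still differ by
    more than `tolerance` (an assumed stability criterion).
    """
    last = len(match_counts) - 1

    start = candidate
    while start > 0 and match_counts[start - 1] > match_counts[start]:
        start -= 1
    while start > 0 and abs(match_counts[start - 1] - match_counts[start]) > tolerance:
        start -= 1

    end = candidate
    while end < last and match_counts[end + 1] > match_counts[end]:
        end += 1
    while end < last and abs(match_counts[end + 1] - match_counts[end]) > tolerance:
        end += 1

    return start, end


# Match counts dip to a minimum at index 6 during a synthetic transition.
counts = [50, 50, 50, 49, 30, 12, 3, 10, 28, 47, 48, 48]
interval = transition_interval(counts, 6)
print(interval)
```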




D. The Twin Comparison Approach

In this paper, the twin comparison approach is implemented to detect shot boundaries. This method was first proposed by Zhang. In the twin comparison method, the difference between two successive frames is computed using a histogram difference metric. The next step is to pick two thresholds, high (TH) and low (TL), which are applied to the computed differences. More precisely, a cut transition is detected if the difference is greater than the high threshold TH. A gradual transition is considered if the difference is greater than the low threshold TL: the differences are then accumulated until the frame-to-frame difference falls below TL. Finally, the accumulated difference is compared with the high threshold TH in order to declare a gradual shot boundary. In general, the twin comparison approach can be applied with several difference metrics to detect gradual transitions. Here, the CCH histogram comparison difference metric is used to detect the gradual changes in video.
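The two-threshold logic can be sketched as follows, assuming the frame-to-frame differences have already been computed by some metric. The threshold values and the handling of a cut that interrupts a candidate (the candidate is simply discarded) are simplifying assumptions of this example, and `twin_comparison` is a hypothetical name.

```python
def twin_comparison(diffs, t_high, t_low):
    """Classify frame-to-frame differences with a high and a low threshold.

    diffs[i] is the difference between frame i and frame i + 1.
    Returns (cuts, graduals): cut indices and (start, end) index pairs.
    """
    cuts, graduals = [], []
    start, accumulated = None, 0.0
    for i, d in enumerate(diffs):
        if d >= t_high:
            cuts.append(i)
            start, accumulated = None, 0.0  # assumption: a cut cancels a candidate
        elif d >= t_low:
            if start is None:
                start = i  # potential start of a gradual transition
            accumulated += d
        else:
            # Difference fell below TL: test the accumulated difference against TH.
            if start is not None and accumulated >= t_high:
                graduals.append((start, i - 1))
            start, accumulated = None, 0.0
    if start is not None and accumulated >= t_high:
        graduals.append((start, len(diffs) - 1))
    return cuts, graduals


# One sharp cut at index 2 and one slow change spanning indices 4 to 7.
diffs = [0.02, 0.03, 0.9, 0.02, 0.15, 0.2, 0.25, 0.2, 0.03, 0.02]
cuts, graduals = twin_comparison(diffs, t_high=0.6, t_low=0.1)
print(cuts, graduals)
```

None of the differences inside the slow change reaches the high threshold on its own; only their accumulated value does, which is what allows a gradual transition to be separated from a cut.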

IV. PERFORMANCE MEASUREMENT AND MANUAL DATA SET

A. Performance measurement

Recall and Precision are the two most widely used performance metrics for evaluating shot boundary detection techniques. To demonstrate the performance of our method, both recall and precision are computed. They are defined by the following equations.

$$\text{Recall} = \frac{\text{Correct}}{\text{Correct} + \text{Missed}} \times 100$$

$$\text{Precision} = \frac{\text{Correct}}{\text{Correct} + \text{False Positive}} \times 100$$

where 'Correct' refers to the number of correct detections, 'Missed' refers to the number of missed detections, and 'False Positive' denotes the number of false detections. The lower the numbers of missed and false positive transitions, the more accurate the algorithm. 'Correct + Missed' is the total number of real transitions in the video under test, while 'Correct + False Positive' is the total number of transitions detected by the algorithm.

Recall tells us what proportion of the real transitions the algorithm detected, and precision tells us what proportion of the detected transitions were truly shot boundaries.

Usually both recall and precision are given to fully describe the performance of a system since the values

are somewhat interlinked. Usually the system parameters can be altered to gain high precision values,

but this decreases recall. On the other hand, the parameters can be tweaked to gain high recall values, but

this often decreases precision. The goal is to find a compromise that gives sufficiently good precision and

recall at the same time.

There exist various ways to combine the precision and recall values to one single performance score. One

of these ways is the F1 measure which is a harmonic mean value that gives equal weight to both precision

and recall.




$$F_1 = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$

In the evaluation, precision and recall values were calculated for gradual transitions. In addition, total

precision and recall were calculated by combining the two detection results. One more performance

measure to check the accuracy of the algorithm is Retrieval Success Index (RSI) given by the following

equation.

$$\text{RSI} = \frac{\text{Correct}}{\text{Correct} + \text{Missed} + \text{False Positive}} \times 100$$
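The four measures can be computed directly from the three counts. The short sketch below uses the fade counts reported for Video 2 in Table 2 (83 correct, 12 missed, 11 false) as a worked example; the function name `evaluate` is hypothetical.

```python
def evaluate(correct, missed, false_positive):
    """Recall, precision, F1 and RSI in percent, from detection counts."""
    recall = 100.0 * correct / (correct + missed)
    precision = 100.0 * correct / (correct + false_positive)
    f1 = 2.0 * recall * precision / (recall + precision)
    rsi = 100.0 * correct / (correct + missed + false_positive)
    return {"recall": recall, "precision": precision, "f1": f1, "rsi": rsi}


# Fade counts for Video 2 as listed in Table 2.
metrics = evaluate(correct=83, missed=12, false_positive=11)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For these counts the computed values agree with the Video 2 fade entries of Table 3 and Table 4 up to rounding in the last digit.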

B. Manual Data Set

The proposed algorithm was applied to and tested on a data set of approximately one hour and forty-eight minutes of varied videos. The data set consists of six videos of different types. The frames were extracted from the videos using the Free Studio tool, giving 146452 frames in total. All video frames were stored in JPG format, occupying about 4 GB of space. The fade and dissolve transitions were identified by a human observer. Table 1 lists the transitions in each video; in total there are 2005 fades and 936 dissolves. The six selected videos were:

Video 1: A remix song from the movie Ashiqui 2 with a high amount of light effects. It contains 7425 frames at a frame rate of 25 frames per second and lasts four minutes fifty-seven seconds.

Video 2: The song Jeene laga hoon from the movie Ramaiya Vastavaiya. It contains 1905 frames at a frame rate of 15 frames per second and lasts two minutes seven seconds.

Video 3: The song Dil Tu Hi Bata from the movie Krish 3, with camera and object motion throughout. It contains 4475 frames at a frame rate of 25 frames per second and lasts two minutes fifty-nine seconds.

Video 4: The song I am in love from the movie Once upon a time in Mumbai. It contains 4825 frames at a frame rate of 25 frames per second and lasts three minutes thirteen seconds.

Video 5: The movie Jumper. It contains 122110 frames at a frame rate of 29 frames per second and lasts one hour thirty minutes.

Video 6: The song Tu Jane Na from the movie Ajab Prem ki Gajab Kahani. It contains 5712 frames at a frame rate of 24 frames per second and lasts three minutes fifty-eight seconds.

Table 1. The Manual Data Set for Gradual transitions.

Video Clips    No. of frames    Fades    Dissolve
Video 1        7425             521      167
Video 2        1905             95       15
Video 3        4475             116      192
Video 4        4825             747      303
Video 5        122110           412      92
Video 6        5712             114      167
Total          146452           2005     936

V. RESULTS AND ANALYSIS

To measure the accuracy of shot change detection, the shot boundaries detected manually were compared with those detected by the twin comparison algorithm. The correct, missed and false detections given by the algorithm are presented in Table 2. The results for the various performance metrics are presented in Table 3 and Table 4 below.

Table 2. Results of correct, missed and false detected transitions.

                     Fades                                Dissolve
Video Collection     Correct    Missed    False           Correct    Missed    False
Video 1              500        21        46              160        7         20
Video 2              83         12        11              12         3         3
Video 3              91         25        0               173        19        10
Video 4              688        59        75              231        72        30
Video 5              383        29        32              78         14        11
Video 6              107        7         11              146        21        18

Table 3. Results in terms of performance metrics (%).

                     Fades                    Dissolve
Video Collection     Recall    Precision      Recall    Precision
Video 1              95.97     93.63          95.80     92.21
Video 2              87.36     88.29          80.00     80.00
Video 3              78.44     100            90.10     94.53
Video 4              92.10     90.17          76.23     88.50
Video 5              92.96     92.28          84.78     87.64
Video 6              93.85     90.67          87.42     89.02

Table 4. Results of F1 measure and RSI

                     Fades                    Dissolve
Video Collection     F1 measure    RSI        F1 measure    RSI
Video 1              94.78         88.18      92.21         85.56
Video 2              87.82         78.30      80.00         66.66
Video 3              87.91         78.44      92.26         85.64
Video 4              91.12         83.69      81.90         69.36
Video 5              92.61         86.26      86.18         75.72
Video 6              92.23         85.6       88.21         78.92

We have used the local key point approach integrated with the twin comparison method, which constructs the histogram from relative intensities instead of direct intensities. From the tabulated results we obtain average recall and precision values of 90% and 92% respectively. 60% of the above videos contain a large amount of camera motion or light effects. The F1 measure and RSI obtained by our method are also high compared with previous approaches, and recall and precision are well balanced. The following graphs show the trade-off between recall and precision for fade and dissolve detection.

Figure 3 Trade-off between Recall and Precision for Fades

Figure 4 Trade-off between Recall and Precision for Dissolve

VI. CONCLUSION

This paper focused on the detection of gradual transitions in different video sequences. Because of the large number of capturing, storing and processing devices, the volume of video is such that analyzing it manually is almost impractical.

In this paper we used a local key point approach to detect gradual transitions. In this method, key points are extracted, a local descriptor is built around each of them, and the positive and negative histograms of relative intensities are computed and used for the detection of gradual transitions. For the detection of dissolves, the twin comparison method is integrated with the local key point approach.

In total, 146452 frames were examined, of which around 16 thousand contain camera motion. The average values of recall and precision are found to be 90% and 92% respectively. The maximum values of F1 measure and RSI are 93.49% and 86.87% respectively, which is high compared with previous methods.




VII. REFERENCES

[1] Chih-Wen Su, Hong-Yuan Mark Liao, Hsiao-Rong Tyan and Kuo-Chin Fan, “A Motion-Tolerant Dissolve Detection Algorithm,” IEEE Transactions on Multimedia, vol. 7, no. 6, December 2005.

[2] Jeho Nam and Ahmed H. Tewfik, “Detection of Gradual Transitions in Video Sequences Using B-Spline Interpolation,” IEEE Transactions on Multimedia, vol. 7, no. 4, p. 667, August 2005.

[3] Joe Diggins, “Detecting Cross-Fades in Interlaced Video with 3:2 Film Cadence,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, p. 1063, July 2009.

[4] W.A.C. Fernando, C.N. Canagarajah and D. R. Bull, “Fade-in and fade-out detection in video sequences using histograms,” ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland.

[5] Qing-Ge Ji, Jian-Wei Feng, Jie Zhao and Zhe-Ming Lu, “Effective Dissolve Detection Based on Accumulating Histogram Difference and the Support Point,” 978-0-7695-4180-8/10 © 2010.

[6] Robert A. Joyce and Bede Liu, “Temporal Segmentation of Video Using Frame and Histogram Space,” IEEE Transactions on Multimedia, vol. 8, no. 1, February 2006.

[7] Hui Fang, Jianmin Jiang and Yue Feng, “Fuzzy logic approach for detection of video shot boundaries,” Pattern Recognition, 2006. Published by Elsevier Ltd. doi:10.1016/j.patcog.2006.04.044

[8] G. Boccignone, A. Chianese, V. Moscato and A. Picariello, “Foveated shot detection for video segmentation,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 365–377, Mar. 2005.

[9] P. Bouthemy, M. Gelgon and F. Ganansia, “A unified approach to shot change detection and camera motion characterization,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 7, pp. 1030–1044, 1999.

[10] Mee-Sook Lee, Yun-Mo Yang and Seong-Whan Lee, “Automatic video parsing using shot boundary detection and camera operation analysis,” 1999, 0031-3203/01.

[11] Ba Tu Truong, Chitra Dorai and Svetha Venkatesh, “Improved fade and dissolve detection for reliable video segmentation,” 0-7803-6297-7/00 © 2000 IEEE.

[12] Rainer Lienhart and Andre Zaccarin, “A System for Reliable Dissolve Detection in Videos,” 0-7803-6725-1/01 © 2001 IEEE.

[13] Chen Xu and Liu Wei, “Study on Shot Boundary Detection Based on Fuzzy Subset Hood Theory,” 978-0-7695-4212-6/10 © 2010 IEEE. DOI 10.1109/ISDEA.2010.201

[14] Zhang Xiaona, Qi Guoqing, Wang Qiang and Zhang Tao, “An Improved Approach of Scene Change Detection in Archived Films,” 978-1-4244-5900-1/10 © 2010 IEEE.

[15] Mohammed A. Refaey, Khaled M. Elsayed, Sanaa M. Hanafy and Larry S. Davis, “Concurrent transition and shot detection in football videos using Fuzzy logic,” 978-1-4244-5654-3/09 © 2009 IEEE.

[16] Chun-Rong Huang, Huai-Ping Lee and Chu-Song Chen, “Shot Change Detection via Local Keypoint Matching,” IEEE Transactions on Multimedia, vol. 10, no. 6, p. 1097, October 2008.
