
REAL-TIME KEYFRAME EXTRACTION TOWARDS VIDEO CONTENT IDENTIFICATION

Maria Chatzigiorgaki and Athanassios N. Skodras

School of Science and Technology, Hellenic Open University, 26222 Patras, Greece

[email protected], [email protected]

Motivation

Keyframe extraction constitutes a fundamental unit in many video retrieval-related applications and the first stage of a Video Copy Detection (VCD) application. Even if the video database keyframe extraction is performed manually to ensure optimal video representation, the problem of automated query video processing remains, as video copies and original videos should result in visually similar sets of keyframes (a prerequisite of similarity matching for effective video copy detection).

Figure 1: Video copy definition: Segment of the original video (first row) and a copy of it subjected to format and content variations (second row), while the third row corresponds to a visually similar (to the original) video, but NOT a copy (near-duplicate).

The requirements regarding keyframe extraction in a VCD application can be summarized as follows:

• The extracted keyframes should capture the content variation along a video

– At least one keyframe per shot

– At least one keyframe per sub-shot

• The extracted keyframes should preserve their temporal position

• Distinct keyframes

• Real-time performance

– Use compressed-domain features directly extracted from the MPEG bitstream (exploiting motion vectors, macroblock type information and DCT coefficients)

• Compact set of keyframes

Although keyframe extraction is content-dependent (leading to an unconstrained number of keyframes), a keyframe percentage of up to 2%-3% of the total number of video frames is reasonable.

Approach

We propose a real-time sequential keyframe extraction algorithm oriented towards video content identification applications that bypasses the process of temporal video segmentation into shots (shot boundary detection).

DCT-domain feature extraction

• Real-time performance is accomplished by exploiting the DCT coefficients of each I-frame in feature extraction. A further gain in speed is obtained by considering only the DC coefficients of I-frames.

• Two different features (and their corresponding measures of similarity between two feature vectors) have been selected and evaluated in the proposed keyframe extraction algorithm.

– Chang et al. [1] DC difference-based feature, used in JPEG image retrieval

– Kim et al. [2] YCbCr color layout (λY) feature, used for frame-to-frame similarity in the Video Linkage framework towards video copy detection

• Depending on the feature used to represent each I-frame, we refer to the proposed algorithm as:

– C-Algorithm using Chang et al. [1] feature

– K-Algorithm using Kim et al. [2] feature
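As a hedged illustration only (not the exact feature of [1] or [2], whose formulations are given in the references), a DC-based feature vector and a dissimilarity measure d(·,·) might look as follows; the names dc_feature and d, the L1 normalisation and the L1 distance are all assumptions:

```python
import numpy as np

def dc_feature(dc_luma):
    """Build a coarse feature vector from the per-block DC coefficients
    of an I-frame's luminance channel (one DC value per 8x8 block).
    The L1 normalisation (an assumption) adds robustness to global
    brightness scaling."""
    f = np.asarray(dc_luma, dtype=np.float64).ravel()
    return f / (np.abs(f).sum() + 1e-12)

def d(f1, f2):
    """Dissimilarity between two feature vectors: L1 distance here
    (an assumption; [1] and [2] define their own measures)."""
    return float(np.abs(f1 - f2).sum())
```

By construction, d of identical frames is 0 and the value grows with content change, which is all the ratio test below requires.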

Proposed algorithm

• Get DCT coefficients of the i-th I-frame by partial decoding of the MPEG stream

• Feature extraction (fi) of the currently decoded I-frame

• Calculate current ratio Rc (Rc ≥ 1.0):

Rc = max{d(fi−2, fi−1), d(fi−2, fi)} / min{d(fi−2, fi−1), d(fi−2, fi)}

• Get binary values of Rc and Rp:

(Rx)bin = 1 if Rx > T, 0 otherwise

• Calculate kflag:

kflag = (Rc)bin ⊕ (Rp)bin

As a result of the logical xor operation, kflag = 0 if both ratios Rc and Rp follow the same behavior, i.e. either both have values greater than, or both less than, T. A transition is detected if kflag = 1, at which point a keyframe is selected.

• Store updated values of the ratio (Rp ← Rc) and feature vectors (fi−2 ← fi−1 and fi−1 ← fi) and continue

Remark: The index of each candidate keyframe is examined, and only if it is greater than the index of the last detected keyframe is it stored as a keyframe.
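The steps above can be sketched as a short sequential loop. This is a minimal sketch, not the authors' implementation: the default L1 distance, the initial value of Rp and the guard for zero distances are assumptions the poster does not specify; T = 1.2 matches the threshold of Fig. 5.

```python
import numpy as np

def extract_keyframes(features, T=1.2,
                      d=lambda a, b: float(np.abs(a - b).sum())):
    """Sequential keyframe extraction over a list of I-frame feature
    vectors, following the triplet/ratio scheme described above."""
    keyframes = []
    last_kf = -1      # index of the last detected keyframe (see remark)
    Rp = 1.0          # previous ratio, initialised below the threshold
    for i in range(2, len(features)):
        a = d(features[i - 2], features[i - 1])
        b = d(features[i - 2], features[i])
        num, den = max(a, b), min(a, b)
        Rc = 1.0 if num == 0.0 else num / max(den, 1e-12)  # Rc >= 1.0
        if (Rc > T) != (Rp > T):   # kflag: logical xor of binarised ratios
            cand = i - 2           # the (i-2)-th I-frame is the candidate
            if cand > last_kf:     # store only forward-moving indices
                keyframes.append(cand)
                last_kf = cand
        Rp = Rc                    # Rp <- Rc for the next triplet
    return keyframes
```

On a toy sequence of constant feature vectors with one abrupt change, the detections cluster at the I-frames adjacent to the change, as the ratio crosses T once on the way into the transition and once on the way out.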

Figure 2: Flowchart of the proposed keyframe extractor

The goal is to detect two keyframes per shot, one at the beginning and one at the end of each shot (as well as for each sub-shot in complex scenes involving camera/object motion), which will be used as indicators for effective similarity matching between original videos and copies. Triplets of I-frames are examined on the basis of change detection, either when a shot transition occurs or when camera/object motion is involved (sub-shots).

Assuming that Rp < T and Rc > T ((Rp)bin = 0 and (Rc)bin = 1 respectively) results in kflag = 1 (transition detected). The (i−2)-th I-frame (t = tn) is stored as a keyframe according to the flowchart.
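Plugging illustrative numbers into this case (the specific ratio values are invented for the example; T = 1.2 matches the threshold used in Fig. 5):

```python
T = 1.2                          # threshold, as in Fig. 5
Rp, Rc = 1.05, 1.80              # invented values: Rp < T, Rc > T
Rp_bin, Rc_bin = Rp > T, Rc > T  # binarised ratios -> 0, 1
k_flag = Rc_bin != Rp_bin        # logical xor: 1 -> transition detected
```

Had both ratios fallen on the same side of T (e.g. 1.05 and 1.10), the xor would give kflag = 0 and no keyframe would be stored.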

Figure 3: Key-idea of the proposed keyframe extractor: a) A simple example of a shot cut is presented for better understanding of how the proposed algorithm works in practice. b) Illustration of the keyframe extraction process along a video segment consisting of 3 shots.

Evaluation

A subset of eight MPEG-1 video sequences from the TRECVid 2007 test video collection [3] was selected for evaluation. The evaluation of the proposed algorithm is carried out in terms of the number and the percentage of the extracted keyframes and the processing time.

(a) (b) (c)

Figure 4: Evaluation of the proposed keyframe extractor in terms of: a) # of extractedkeyframes, b) Keyframe percentage (%) and c) Processing time (sec)

A typical example of how the algorithm works in a real-world application scenario is presented in Fig. 5. A popular TV spot was used as a query, and many versions of it (copies and near-duplicates) were downloaded from YouTube and converted to MPEG-1 format.

Figure 5: Extracted keyframes of seven different versions of the video “BMW 1-series commercial with Kermit” using the C-Algorithm (T = 1.2)

The proposed algorithm results in a compact set of keyframes, where the keyframe percentage is limited to a maximum of 2.5% (1.5% on average) (Fig. 4b). Real-time performance is achieved (Fig. 4c), not only due to DCT-domain feature extraction but also by bypassing the process of temporal video segmentation.

References

[1] C.-C. Chang, J.-C. Chuang, and Y.-S. Hu, “Retrieving digital images from a JPEG compressed image database,” Image and Vision Computing, vol. 22, pp. 471–484, 2004.

[2] H. Kim, J. Lee, H. Liu, and D. Lee, “Video Linkage: Group based copied video detection,” in Proc. ACM International Conference on Image and Video Retrieval (CIVR’08), Niagara Falls, Canada, July 7–9, 2008.

[3] A.F. Smeaton, P. Over, and W. Kraaij, “Evaluation campaigns and TRECVid,” in Proc. of the 8th ACM International Workshop on Multimedia Information Retrieval (MIR’06), Santa Barbara, California, USA, 2006, pp. 321–330.

16th IEEE International Conference on Digital Signal Processing (DSP’09), July 5–7, 2009, Santorini, Greece