
Event retrieval in large video collections with circulant temporal encoding

CVPR 2013 Oral

Outline

• 1. Introduction
• 2. EVVE: an event retrieval dataset
• 3. Frame description
• 4. Circulant temporal aggregation
• 5. Indexing strategy and complexity
• 6. Experiments
• 7. Conclusion

1. Introduction

* This paper introduces an approach for specific event retrieval

* Related tasks: video copy detection [13] and event category recognition [16]

* Searching for specific events is related to video copy detection, whose goal is to find deformed copies of a video

* Measures the similarity between two sequences for all possible alignments

* Frame descriptors are jointly encoded in the frequency domain; the key ingredients are:

Frequency pruning

Regularization

Quantization

1. Introduction

* Contributions

1. encode the frame descriptors into a temporal representation

2. exploit the properties of circulant matrices to compare videos in the frequency domain

3. a dataset named EVVE for specific event retrieval in large user-generated video content

2. EVVE: an event retrieval dataset

* EVVE also includes events for which relevant videos might not correspond to the same instance in place or time.

* Several of the events are precisely localized in time and space

* The human annotators have marked the videos as either positive or negative. Ambiguous videos were removed.

* We have also retrieved a set of 100,000 distractor videos

2. EVVE: an event retrieval dataset

3. Frame description

* Pre-processing

Frames are sampled at a fixed rate of 15 fps and resized to a maximum of 120k pixels

* Local description

SIFT descriptors are extracted on a dense grid

We square-root the SIFT components and reduce each descriptor to 32 dimensions with PCA

* Descriptor aggregation

The SIFT descriptors of a frame are encoded using MultiVLAD

(VLAD: Vector of Locally Aggregated Descriptors)

Two VLAD descriptors are obtained from two different codebooks of size 128, and concatenated.

Power-law normalization is applied to the vector, which is then reduced by PCA to dimension d
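Where the deck only names the steps, a minimal numpy sketch of this aggregation may help; it assumes pre-trained codebooks and a PCA matrix (`codebook1`, `codebook2`, `pca` are hypothetical placeholders, not names from the paper):

```python
import numpy as np

def vlad(descriptors, codebook):
    """VLAD: sum of residuals between local descriptors and their
    nearest codebook centroid, one block per centroid."""
    k, d = codebook.shape
    # assign each descriptor to its nearest centroid
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            v[i] = (members - codebook[i]).sum(axis=0)
    return v.ravel()

def frame_descriptor(sifts, codebook1, codebook2, pca, alpha=0.5):
    # MultiVLAD: concatenate two VLADs from different codebooks of size 128
    v = np.concatenate([vlad(sifts, codebook1), vlad(sifts, codebook2)])
    # signed power-law normalization, then L2 normalization
    v = np.sign(v) * np.abs(v) ** alpha
    v /= np.linalg.norm(v) + 1e-12
    # PCA reduction to dimension d (rows of `pca` are principal axes)
    return pca @ v
```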

4. Circulant temporal aggregation

The query q = [q_1, ..., q_m] and the database video b = [b_1, ..., b_n] are sequences of d-dimensional frame descriptors, with the convention q_t = 0 if t < 1 or t > m, and b_t = 0 if t < 1 or t > n

Assumption 1: There is no (or limited) temporal acceleration.

Assumption 2: The inner product is a good similarity between individual frames

Assumption 3: The sum of similarities between the frame descriptors reflects the similarity of the sequences

(In practice, this assumption is not well satisfied, because the videos are very self-similar in time)
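Under these assumptions, the score for a temporal shift is a sum of frame-level inner products. A direct, quadratic-time numpy sketch of this all-shifts similarity, before any circulant machinery (the shift convention s_δ = Σ_t ⟨q_t, b_{t+δ}⟩ is an assumption for illustration):

```python
import numpy as np

def shift_scores_naive(q, b):
    """q: (m, d) query frame descriptors, b: (n, d) database frames.
    Returns s[delta] = sum_t <q_t, b_{t+delta}> for all shifts,
    treating out-of-range frames as zero vectors."""
    m, n = len(q), len(b)
    scores = np.zeros(m + n - 1)
    for delta in range(-(m - 1), n):
        s = 0.0
        for t in range(m):
            u = t + delta
            if 0 <= u < n:
                s += q[t] @ b[u]
        scores[delta + m - 1] = s
    return scores
```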

4. Circulant temporal aggregation

• Circulant encoding of vector sequences

* Column notation: q^j = [q_1^j, ..., q_m^j] denotes the sequence of the j-th components of the frame descriptors

* The score vector over all shifts is then a sum of d one-dimensional correlations, which can be expressed with the convolution operator

* Assuming sequences of equal length (n = m), the convolution theorem maps this to the Fourier domain: F(s(q, b)) = Σ_j F(q^j)* ⊙ F(b^j), where ⊙ denotes the element-wise multiplication of two vectors
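A sketch of the same computation carried out in the Fourier domain; it reproduces the naive scores in O(d n log n). Zero-padding to twice the length, so that circular correlation equals linear correlation, is an implementation choice here, not the paper's exact indexing:

```python
import numpy as np

def shift_scores_fft(q, b):
    """Same scores as the naive version, computed via FFTs.
    q, b: (n, d) arrays, zero-padded to a common length n."""
    n = len(q)
    size = 2 * n                        # pad so circular == linear correlation
    Q = np.fft.rfft(q, size, axis=0)    # one FFT per dimension j
    B = np.fft.rfft(b, size, axis=0)
    # cross-correlation: conj(F(q^j)) * F(b^j), summed over dimensions
    S = (np.conj(Q) * B).sum(axis=1)
    s = np.fft.irfft(S, size)
    # reorder so index 0 corresponds to shift -(n-1), as in the naive sketch
    return np.concatenate([s[size - (n - 1):], s[:n]])
```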

4. Circulant temporal aggregation

• Regularized comparison metric

* We consider hereafter that the sequences have been preprocessed to have the same length m = n = 2^k (which is more efficient to compute)

* Because of the temporal consistency and, more generally, the self-similarity of frames in videos, the values of the score vector s(q, b) are noisy and its peak is not precisely localized

4. Circulant temporal aggregation

• Additional filter

* We compute the filter W_i (in the Fourier domain) assuming that the contributions are shared equally across dimensions

4. Circulant temporal aggregation

* One major drawback is that the denominator in Eqn. 8 may be close to zero

* To solve this issue, a regularization coefficient λ is added to the denominator
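The exact filter of Eqn. 8 is in the paper; the numpy sketch below only shows the structure of such a regularized, MOSSE-style comparison, with the query's per-frequency energy in the denominator and λ as the regularization coefficient (the equal-sharing normalization by d is our assumption):

```python
import numpy as np

def shift_scores_regularized(q, b, lam=0.1):
    """q, b: (n, d) sequences, preprocessed to the same length n = 2**k.
    Frequency-domain scores, with the query's per-frequency energy
    (regularized by lam) in the denominator."""
    n, d = q.shape
    Q = np.fft.fft(q, axis=0)
    B = np.fft.fft(b, axis=0)
    num = (np.conj(Q) * B).sum(axis=1)
    # contributions assumed shared equally across the d dimensions
    den = (np.abs(Q) ** 2).sum(axis=1) / d + lam
    return np.real(np.fft.ifft(num / den))
```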

4. Circulant temporal aggregation

• Boundary detection

* Time shift: the shift that maximizes the score gives the optimal alignment

In some applications such as video alignment (see Section 6), we also need the boundaries of the matching segments

The matching sequence is defined as a set of contiguous t for which the scores S_t are high enough.

For this, the database descriptors are reconstructed in the temporal domain from their frequency-domain representation.
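A minimal sketch of such boundary detection, assuming per-frame scores S_t at the optimal shift are available and that the threshold is chosen externally:

```python
import numpy as np

def matching_segment(frame_scores, threshold):
    """frame_scores: per-frame scores S_t at the optimal shift.
    Returns (start, end) of the contiguous run around the peak
    where S_t stays above `threshold`."""
    peak = int(np.argmax(frame_scores))
    start = peak
    while start > 0 and frame_scores[start - 1] > threshold:
        start -= 1
    end = peak
    while end + 1 < len(frame_scores) and frame_scores[end + 1] > threshold:
        end += 1
    return start, end
```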

5. Indexing strategy and complexity

• Frequency domain representation

* The goal is to implement the method presented in Section 4 in an approximate manner

A video b of length n is represented in the frequency domain by a complex matrix, i.e., the Fourier transforms F(b^j) of its d component sequences

* Frequency pruning: reduce the video representation by keeping only a fraction of the low-frequency vectors
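A one-function sketch of the pruning step, assuming the per-dimension spectra are stored with low frequencies first (as numpy's rfft returns them); the keep ratio is a placeholder, not the paper's setting:

```python
import numpy as np

def prune_frequencies(B, keep_ratio=1 / 32):
    """B: (n, d) complex matrix of per-dimension Fourier coefficients
    (rows = frequencies, low frequencies first). Keep the lowest ones."""
    n = B.shape[0]
    n_keep = max(1, int(n * keep_ratio))
    return B[:n_keep]
```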

5. Indexing strategy and complexity

• Complex PQ-codes and metric optimization

* Descriptor sizes

We pre-compute a Fourier descriptor for different zero-padded versions of the query, by noticing that the Fourier descriptor of the concatenation of a signal with itself is the original spectrum interleaved with zeros (up to a factor 2)
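This property is easy to check numerically: the DFT of a signal concatenated with itself has the doubled original spectrum at even bins and zeros at odd bins, so the longer query descriptors come almost for free:

```python
import numpy as np

a = np.random.randn(8)
A = np.fft.fft(a)
AA = np.fft.fft(np.concatenate([a, a]))
assert np.allclose(AA[0::2], 2 * A)          # even bins: doubled spectrum
assert np.allclose(AA[1::2], 0, atol=1e-10)  # odd bins vanish
```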

We use the product quantization technique [9], a compression technique that enables efficient compressed-domain comparison and search.

5. Indexing strategy and complexity

* The database vector is split into p sub-vectors (j = 1…p)

The sub-vectors are separately quantized using k-means quantizers

This produces a vector of indexes

* The comparison between a query descriptor x and the database vectors is performed in two stages (see the sketch below):

First, the squared distances between each sub-vector x_j and all the possible centroids are computed and stored in a table.

Second, the squared distance between x and a database vector y is approximated as the sum of the corresponding table entries, which only requires p look-ups and additions.
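A real-valued numpy sketch of this two-stage asymmetric comparison (the paper uses complex PQ-codes; 256 centroids per sub-quantizer is an assumption here):

```python
import numpy as np

def adc_search(x, codes, codebooks):
    """Asymmetric distance computation.
    x: (D,) query;  codes: (N, p) centroid indexes per database vector;
    codebooks: (p, 256, D//p) k-means centroids per sub-vector."""
    p = codes.shape[1]
    sub = x.reshape(p, -1)                    # split query into p sub-vectors
    # stage 1: table of squared distances, sub-vector x centroid
    tables = ((codebooks - sub[:, None, :]) ** 2).sum(-1)   # (p, 256)
    # stage 2: p look-ups and additions per database vector
    return tables[np.arange(p), codes].sum(axis=1)          # (N,)
```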

5. Indexing strategy and complexity

• Summary of search procedure and complexity


1. The video is pre-processed and each frame is described as a d-dimensional Multi-VLAD descriptor.

2. The descriptor sequence is padded with zeros to the next power of two, and mapped to the Fourier domain using d independent 1-dimensional FFTs.

5. Indexing strategy and complexity

3. High frequencies are pruned: only a fraction of the low-frequency vectors are kept. After this step, the video is represented by a reduced set of d-dimensional complex vectors.

4. These vectors are separately encoded with a complex product quantizer, producing a compressed representation of a fixed number of bytes for the whole video.

* Complexity

6. Experiments

For TRECVID the measure is NDCR (lower = better)

7. Conclusion

• This video representation provides an efficient search scheme

• Extensive experiments on two video copy detection benchmarks show that our approach improves over the state of the art with respect to accuracy, search time and memory usage.
