Understanding Near-Duplicate Videos: A User-Centric Approach

DESCRIPTION

Popular content on video-sharing websites (e.g., YouTube) is usually duplicated. Most scholars define near-duplicate video clips (NDVC) based on non-semantic features (e.g., different image/audio quality), while a few also include semantic features (different videos of similar content). However, it is unclear which features contribute to the human perception of similar videos. Findings from two large-scale online surveys (N = 1003) confirm the relevance of both types of features. While some of our findings confirm the adopted definitions of NDVC, others are surprising. For example, videos that vary in visual content (by overlaying or inserting additional information) may not be perceived as near-duplicate versions of the original videos. Conversely, two different videos with distinct sounds, people, and scenarios were considered to be NDVC because they shared the same semantics (none of the pairs had additional information). Furthermore, the exact role played by semantics in relation to the features that make videos alike is still an open question. In most cases, participants preferred to see only one of the NDVC in the results of a video search query, and they were more tolerant of changes in the audio track than in the video track. Finally, we propose a user-centric NDVC definition and present implications for how video-sharing websites should deal with duplicate content.

Citation preview

Page 1: Understanding Near-Duplicate Videos: A User-Centric Approach

Near-Duplicate Videos

Page 2: Understanding Near-Duplicate Videos: A User-Centric Approach

Let’s say you’re looking for the Bush attack video…

Page 3: Understanding Near-Duplicate Videos: A User-Centric Approach

…and you get

11,100 results.

Page 4: Understanding Near-Duplicate Videos: A User-Centric Approach

…after 40 minutes…

watching the videos listed on the first page you notice

> 50% are similar, i.e., NDVC

27% on average [Wu et al., 2007]

Page 5: Understanding Near-Duplicate Videos: A User-Centric Approach

NDVC technical definition

• Identical or approximately identical videos, that differ in some feature:
– file formats, encoding parameters
– photometric variations (color, lighting changes)
– overlays (caption, logo, audio commentary)
– editing operations (frames add/remove)
– semantic similarity

NDVC are videos that are “essentially the same”

Page 6: Understanding Near-Duplicate Videos: A User-Centric Approach

…like this

Page 7: Understanding Near-Duplicate Videos: A User-Centric Approach

Two challenges:

1. There is no agreement on a single definition of NDVC

2. NDVC are mostly considered as redundant content that has to be removed from the system

Page 8: Understanding Near-Duplicate Videos: A User-Centric Approach

Human Perception of Near-Duplicate Videos

Mauro Cherubini, Rodrigo de Oliveira, Nuria Oliver

Page 9: Understanding Near-Duplicate Videos: A User-Centric Approach

What kind of NDVC?

Malicious (i.e., spam produced by a single user)

Copyright infringement (e.g., pirated music videos)

User-edited content: videos that complement the original material with additional information

Page 10: Understanding Near-Duplicate Videos: A User-Centric Approach

Recently

NDVC detection algorithm

Page 11: Understanding Near-Duplicate Videos: A User-Centric Approach

Recently

NDVC detection algorithm

Page 12: Understanding Near-Duplicate Videos: A User-Centric Approach

Why not?

NDVC detection algorithm

?

Page 13: Understanding Near-Duplicate Videos: A User-Centric Approach

Methodology

• 2 large-scale online surveys (n=1003)
• 7 pairs of NDVC (differing in 1 feature)
• Subjects were asked about:
– Similarity
– Preference

Page 14: Understanding Near-Duplicate Videos: A User-Centric Approach

NDVC technical definition

• Identical or approximately identical videos, that differ in some features:
– photometric variations (color, lighting changes)
– overlays (caption, logo, audio commentary)
– editing operations (frames add/remove)
And …
– semantic similarity (e.g., two deer grazing grass in two different forests)

Page 15: Understanding Near-Duplicate Videos: A User-Centric Approach

Audio Quality

NDVC

Preference

Stereo, 44 kHz

Mono, 11 kHz

Page 16: Understanding Near-Duplicate Videos: A User-Centric Approach

Image Quality

NDVC

Preference

Page 17: Understanding Near-Duplicate Videos: A User-Centric Approach

Audio content (overlay)

Preference

NDVC

Page 18: Understanding Near-Duplicate Videos: A User-Centric Approach

Visual + audio content (length)

Preference

Not NDVC

Page 19: Understanding Near-Duplicate Videos: A User-Centric Approach

Visual content (editing)

Not NDVC

Want both

Page 20: Understanding Near-Duplicate Videos: A User-Centric Approach

Similar semantics, different videos (similar visual info)

NDVC

Want both

Page 21: Understanding Near-Duplicate Videos: A User-Centric Approach

Similar semantics, different videos (similar audio info)

Not NDVC

Preference

Page 22: Understanding Near-Duplicate Videos: A User-Centric Approach

Implications for Design

1. User-centric NDVC definition

NDVC are approximately identical videos that might differ in audio/image quality or overlays. Conversely, identical videos with relevant complementary information (changing clip length or scenes) are not considered NDVC.

Furthermore, users perceive as near-duplicates videos that are not alike but that are visually similar and semantically related.
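The user-centric definition above can be read as a small set of rules. The following is a minimal sketch of that reading, not the authors' method: the `PairDifferences` type, its field names, and the `perceived_as_ndvc` function are all hypothetical, and the sketch assumes the differences between the two videos have already been identified by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairDifferences:
    """How two videos differ. Every field name here is hypothetical."""
    same_source: bool                       # both clips derive from the same original
    quality_changed: bool = False           # audio or image quality differs
    overlay_added: bool = False             # caption, logo, or audio commentary
    length_or_scenes_changed: bool = False  # complementary content added or removed
    visually_similar: bool = False          # only meaningful for different videos
    semantically_related: bool = False      # only meaningful for different videos


def perceived_as_ndvc(d: PairDifferences) -> bool:
    """Apply the user-centric definition as plain rules (an assumed reading)."""
    if d.same_source:
        # Approximately identical videos stay NDVC under quality or overlay
        # changes, but not when relevant complementary information changes
        # the clip length or scenes.
        return not d.length_or_scenes_changed
    # Videos that are not alike count as NDVC only when they are both
    # visually similar and semantically related.
    return d.visually_similar and d.semantically_related
```

For example, a re-encoded copy with lower audio quality would be classified as NDVC, while an extended cut of the same clip would not.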

Page 23: Understanding Near-Duplicate Videos: A User-Centric Approach

Implications for Design

2. Clustering
– Groups sharing video, audio, semantic content
– Ranking based on user-submitted query
– Highlight the most representative
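One way the clustering idea could look in practice is sketched below. It is an illustration under stated assumptions rather than the approach proposed in the talk: `cluster_results`, its parameters, and the relevance scores are hypothetical, the NDVC pairs are assumed to come from some existing detector, and "most representative" is simplified here to "most relevant to the query".

```python
from collections import defaultdict


def cluster_results(video_ids, ndvc_pairs, relevance):
    """Group search results into NDVC clusters with one representative each.

    video_ids:  list of result ids (hypothetical)
    ndvc_pairs: iterable of (id_a, id_b) judged near-duplicate by some detector
    relevance:  dict mapping id -> query relevance score (assumed given)
    """
    parent = {v: v for v in video_ids}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge every pair of near-duplicates into the same group.
    for a, b in ndvc_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for v in video_ids:
        groups[find(v)].append(v)

    clusters = []
    for members in groups.values():
        # Within a group, the most relevant video is highlighted.
        members.sort(key=lambda v: relevance[v], reverse=True)
        clusters.append({"representative": members[0], "others": members[1:]})

    # Groups are ranked by the relevance of their representative.
    clusters.sort(key=lambda c: relevance[c["representative"]], reverse=True)
    return clusters
```

A results page built this way would show one entry per group, with the remaining near-duplicates available behind the representative.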

Page 24: Understanding Near-Duplicate Videos: A User-Centric Approach

Implications for Design

3. Feature and user adaptation
– Boost ranking based on general observations
• More content
• Better image/audio quality
• …
– Boost ranking based on personalization
• Abilities (e.g., auditory skills)
• Task (e.g., video producer vs. movie enthusiast)
• Search query
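The boosting idea above could be sketched as a score adjustment on top of the search engine's base relevance. Everything in this sketch is an assumption made for illustration: the `boosted_score` function, the dictionary keys, and the weights are hypothetical and are not values reported in the study.

```python
def boosted_score(base_relevance, video, user=None):
    """Adjust a base relevance score with general and personalized boosts.

    video: dict with optional "extra_content", "image_quality", and
           "audio_quality" values in [0, 1] (hypothetical keys)
    user:  optional dict with "audio_sensitivity" in [0, 1] and "task"
           (hypothetical keys); all weights below are illustrative only
    """
    user = user or {}

    # Personalization: listeners with weaker auditory skills gain less from
    # better audio; video producers are assumed to care more about image quality.
    audio_weight = 0.1 * user.get("audio_sensitivity", 1.0)
    image_weight = 0.2 if user.get("task") == "video_producer" else 0.1

    # General observation: more content and better quality rank higher.
    content_weight = 0.1

    return (base_relevance
            + content_weight * video.get("extra_content", 0.0)
            + image_weight * video.get("image_quality", 0.0)
            + audio_weight * video.get("audio_quality", 0.0))
```

Within a group of near-duplicates, the version with the highest boosted score would be the natural candidate to show first.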

Page 25: Understanding Near-Duplicate Videos: A User-Centric Approach

Future Work

• NDVCs differing in more than 1 low-level feature
• Propose ways to visualize the NDVCs
• Study effects of users’ goals while searching videos

Page 26: Understanding Near-Duplicate Videos: A User-Centric Approach

A Human-Centric stance in Multimedia research

Biomimetics

Crowdsourcing

Psychophysical experiments

Page 27: Understanding Near-Duplicate Videos: A User-Centric Approach

Thank you!

Mauro Cherubini, Rodrigo de Oliveira, Nuria Oliver
