Video Classification

By: Maryam S. Mirian
For: Multimedia & Pattern Recognition Joint Courses Project

Outline

What is Video Classification?
Is it straightforward or difficult?
What are its applications?
What are its methods?
Review of video classification methods
What exactly is my own project?

What is Video Classification?

Classify a video (shot) into one of Nc predefined classes:

Indoor / Outdoor
News / Sports
…

Is Video Classification Difficult? Why?

YES, because:
The data stream is a multi-dimensional signal.
It has a subjective nature.

Classification

Required Steps for Classification

Object Observations → Feature Extraction → Feature Reduction → Classification → Class Labels

Note on the diagram: the most important and the most difficult part
Feature Reduction: using methods like PCA, LDA
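As a rough illustration of the feature reduction step, the sketch below projects feature vectors onto their top principal components with PCA. It uses only NumPy; the function name `pca_reduce` and the toy observations are assumptions made for illustration and do not come from the slides.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    features = np.asarray(features, dtype=float)
    # Center the data so the principal axes pass through the mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy data (hypothetical): 6 observations with 3 strongly correlated features.
observations = np.array([
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 6.0],
    [3.0, 6.0, 9.2],
    [4.0, 8.2, 12.1],
    [5.0, 9.9, 15.0],
    [6.0, 12.1, 18.1],
])
reduced = pca_reduce(observations, n_components=1)
print(reduced.shape)  # (6, 1)
```

The reduced vectors would then be passed to whichever classifier is chosen in the next step.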

Methods of Classification

Bayesian Classification
kNN Classification
Neural Classification
    MLP
    RBF
Classification based on Support Vector Machines
Rule-based Classification

Bayesian Decision Making

So, x belongs to w2
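A minimal sketch of the Bayesian decision rule follows: pick the class with the largest posterior P(w | x). The Gaussian likelihoods, the priors, and the value of x are all made-up numbers, chosen only so that the decision comes out as w2, as on the slide.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_decide(x, classes):
    """Return the class with the largest posterior P(w | x), plus all posteriors.

    `classes` maps a class name to (prior, mean, std) of a Gaussian likelihood.
    """
    joint = {name: prior * gaussian_pdf(x, mean, std)
             for name, (prior, mean, std) in classes.items()}
    evidence = sum(joint.values())
    posteriors = {name: value / evidence for name, value in joint.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Hypothetical two-class problem: priors, means, and standard deviations are invented.
classes = {"w1": (0.5, 0.0, 1.0), "w2": (0.5, 3.0, 1.0)}
label, posteriors = bayes_decide(2.5, classes)
print(label)  # w2
```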

kNN Decision Making

k = 5: 2 red neighbors versus 3 black neighbors, so X should be Black!
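The vote described above can be sketched in a few lines. The point coordinates are hypothetical, arranged so that the 5 nearest neighbors of X are 2 red and 3 black; `knn_classify` is an illustrative name.

```python
from collections import Counter
import math

def knn_classify(query, samples, k):
    """Label `query` by majority vote among its k nearest labeled samples."""
    by_distance = sorted(samples, key=lambda s: math.dist(query, s[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points: the first five are the nearest neighbors of X.
samples = [
    ((1.0, 0.0), "red"),
    ((0.0, 1.2), "red"),
    ((-1.0, 0.0), "black"),
    ((0.0, -1.1), "black"),
    ((0.9, 0.9), "black"),
    ((5.0, 5.0), "red"),
    ((6.0, 5.0), "red"),
]
x = (0.0, 0.0)
print(knn_classify(x, samples, k=5))  # black
```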

MLP Classifier

Video Content Analysis

Applications of Automatic Video Classification

Automatic video segmentation
Content-based retrieval
Browsing and retrieving digitized video
Identifying close-up video frames before running a computationally expensive face recognizer
Effective management of the ever-increasing amount of broadcast news video: personalization of news video

Classify Shot or Video?

One effective way to organize the video is to segment the video into small, single-story units and classify these units according to their semantics.

A shot represents a contiguous sequence of visually similar frames. It is a syntactical representation and does not usually convey any coherent semantics to the users.

Looking @ Video Classification

Ide et al. [1998]

Problem domain: news video
Features:
    Videotext
    Motion
    Face
Segmented the video into shots
Used clustering techniques
Classified each shot into 1 of 5 classes: Speech/report, Anchor, Walking, Gathering, and Computer graphics shots
Quite simple, but seems effective for this restricted class of problems.

Huang et al. [1999]

Problem domain: TV programs
    News report
    Weather forecast
    Commercials
    Basketball games
    Football games
Features:
    Audio
    Color
    Motion

Chen and Wong [2001]

Problem domain: news video
    News
    Weather reporting
    Commercials
    Basketball
    Football
Features:
    Motion
    Color
    Text caption
    Cut rate
Used a rule-based approach

Looking @ Lekha Chaisorn et al. [2002] in More Detail

Basic Ideas

Proposes a two-level, multi-modal framework.
The video is analyzed at the shot and story unit (or scene) levels.
At the shot level, a Decision Tree is employed to classify the shot into one of 13 pre-defined categories.
At the scene level, HMM (Hidden Markov Model) analysis is used to eliminate shot classification errors.
Results indicate that a high accuracy of over 95% for shot classification can be achieved.
The use of HMM analysis helps to improve the accuracy of the shot classification and achieve over 89% accuracy on story segmentation.
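To give a feel for how an HMM can clean up isolated shot classification errors, here is a small Viterbi sketch. It is not the model from the paper: the two states, the "sticky" transition probabilities, and the emission probabilities are all invented for illustration. The decision-tree labels are treated as noisy observations of the true shot class.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical model: shot classes tend to persist, and the classifier is right 80% of the time.
states = ["anchor", "report"]
start_p = {"anchor": 0.5, "report": 0.5}
trans_p = {"anchor": {"anchor": 0.9, "report": 0.1},
           "report": {"anchor": 0.1, "report": 0.9}}
emit_p = {"anchor": {"anchor": 0.8, "report": 0.2},
          "report": {"anchor": 0.2, "report": 0.8}}
noisy = ["anchor", "anchor", "report", "anchor", "anchor"]
smoothed = viterbi(noisy, states, start_p, trans_p, emit_p)
print(smoothed)  # the isolated "report" in the middle is smoothed to "anchor"
```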

Predefined Classes

Features at the Shot Level

Low-level visual content feature
    Color histogram
Temporal features
    Background scene change
    Speaker change
    Audio
    Motion activity
    Shot duration
High-level object-based features
    Face
    Shot type
    Videotext
    Centralized videotext

Feature vector of a shot

Si = (a, m, d, f, s, t, c)
a: the class of audio, a ∈ {t=speech, m=music, s=silence, n=noise, tn=speech+noise, tm=speech+music, mn=music+noise}
m: the motion activity, m ∈ {l=low, m=medium, h=high}
d: the shot duration, d ∈ {s=short, m=medium, l=long}
f: the number of faces, f ∈ ℕ
s: the shot type, s ∈ {c=close-up, m=medium, l=long, u=unknown}
t: the number of lines of text in the scene, t ∈ ℕ
c: set to "true" if the videotexts present are centralized, c ∈ {t=true, f=false}
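One possible way to hold the feature vector Si = (a, m, d, f, s, t, c) in code is a small validated record. The class name `ShotFeatures` and the choice of a dataclass are assumptions for illustration; the allowed codes are the ones listed on the slide, with c stored as a Python boolean.

```python
from dataclasses import dataclass, astuple

AUDIO = {"t", "m", "s", "n", "tn", "tm", "mn"}
MOTION = {"l", "m", "h"}
DURATION = {"s", "m", "l"}
SHOT_TYPE = {"c", "m", "l", "u"}

@dataclass(frozen=True)
class ShotFeatures:
    a: str   # audio class
    m: str   # motion activity
    d: str   # shot duration
    f: int   # number of faces
    s: str   # shot type
    t: int   # number of lines of text in the scene
    c: bool  # True if the videotext present is centralized

    def __post_init__(self):
        # Reject codes outside the sets defined for each categorical feature.
        if self.a not in AUDIO:
            raise ValueError(f"unknown audio class: {self.a!r}")
        if self.m not in MOTION:
            raise ValueError(f"unknown motion activity: {self.m!r}")
        if self.d not in DURATION:
            raise ValueError(f"unknown shot duration: {self.d!r}")
        if self.s not in SHOT_TYPE:
            raise ValueError(f"unknown shot type: {self.s!r}")
        if self.f < 0 or self.t < 0:
            raise ValueError("face and text-line counts must be non-negative")

# Hypothetical shot: speech, low motion, medium duration, one face,
# close-up, two lines of centralized text.
shot = ShotFeatures(a="t", m="l", d="m", f=1, s="c", t=2, c=True)
print(astuple(shot))  # ('t', 'l', 'm', 1, 'c', 2, True)
```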

Decision Tree for Shot Classification

Having read these papers, I decided on my own project…

About the Problem Domain…

Sports classification seems OK
Interesting enough
It is helpful for sports lovers

About Extracting Features…

Features used in video analysis: color, texture, shape, motion vector, …
Criterion for choosing features: they should have similar statistical behavior across time
Color histogram: simple and robust
Motion vectors: invariant to color and light
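As a sketch of the color histogram feature, the code below builds a normalized per-channel histogram with NumPy and compares two frames by histogram intersection. The bin count, the function names, and the synthetic frames are illustrative assumptions.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Concatenated per-channel histogram of an H x W x 3 uint8 frame, normalized to sum to 1."""
    channels = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
                for c in range(3)]
    hist = np.concatenate(channels).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

# Synthetic frames (hypothetical): one random-colored frame and one all-black frame.
rng = np.random.default_rng(0)
noisy_frame = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
black_frame = np.zeros((48, 64, 3), dtype=np.uint8)

h_noisy = color_histogram(noisy_frame)
h_black = color_histogram(black_frame)
print(histogram_intersection(h_noisy, h_noisy))  # identical frames give the maximum similarity
print(histogram_intersection(h_noisy, h_black))  # very different frames give a low similarity
```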

So, My Own Project Is

Sports video classification: Football, Basketball, …
(Those well-defined sports that I can find video on!)
Steps I should take:
    Finding or gathering a video collection
    Shot detection
    Feature extraction:
        Key frame(s) extraction:
            Selecting the middle I-frame of the shot
            Use of clustering
            …
        Motion vector-based features
        Straight line detection
    Design a classifier
    Test the approach
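The shot detection and key frame steps could look roughly like the sketch below. It assumes already-decoded grayscale frames rather than MPEG I-frames, detects a cut when the histogram difference between consecutive frames exceeds a hypothetical threshold, and picks the middle frame of each shot as its key frame.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a grayscale uint8 frame."""
    hist = np.histogram(frame, bins=bins, range=(0, 256))[0].astype(float)
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.5):
    """Indices i at which a new shot starts (histogram difference above threshold)."""
    boundaries = []
    previous = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i])
        # The L1 distance between two normalized histograms lies in [0, 2].
        if np.abs(current - previous).sum() > threshold:
            boundaries.append(i)
        previous = current
    return boundaries

def middle_key_frames(n_frames, boundaries):
    """Index of the middle frame of every shot."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_frames]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]

# Synthetic clip (hypothetical): 5 dark frames followed by 4 bright frames.
dark = [np.full((24, 32), 30, dtype=np.uint8) for _ in range(5)]
bright = [np.full((24, 32), 220, dtype=np.uint8) for _ in range(4)]
frames = dark + bright
cuts = detect_shot_boundaries(frames)
key_frames = middle_key_frames(len(frames), cuts)
print(cuts)        # [5]
print(key_frames)  # [2, 6]
```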

Looking @ Ekin, Tekalp [2003]

One research work on football video classification

Features

Cinematic
    Result from common video composition and production rules
    Shot types, camera motions, and replays
Object-based
    Described by their spatial features (e.g., color, texture, and shape) and spatio-temporal features (such as object motions and interactions)

Robust Dominant Color Region Detection

A soccer field has one distinct dominant color (a tone of green) that may vary from stadium to stadium, and also due to weather and lighting conditions within the same stadium.

The statistics of this dominant color, in the HSI space, are learned by the system at start-up, and then automatically updated to adapt to temporal variations.
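A simplified sketch of learning and then adapting the dominant color is given below. It works on the hue channel only, assumes the HSI conversion has already been done elsewhere, and uses a plain exponential update; the paper's actual statistics and update rule may differ. The bin count, window, and update rate are hypothetical.

```python
import numpy as np

def learn_dominant_hue(hue, bins=36, window=20.0):
    """Estimate the dominant hue (degrees) as the mean of pixels near the histogram peak.

    Hue wrap-around at 0/360 is ignored, which is acceptable for green tones near 120.
    """
    hue = np.asarray(hue, dtype=float).ravel()
    counts, edges = np.histogram(hue, bins=bins, range=(0.0, 360.0))
    peak_bin = counts.argmax()
    peak = 0.5 * (edges[peak_bin] + edges[peak_bin + 1])
    near_peak = hue[np.abs(hue - peak) <= window]
    return float(near_peak.mean())

def update_dominant_hue(current, frame_hue, rate=0.1):
    """Blend the running estimate with the estimate from a new frame."""
    return (1.0 - rate) * current + rate * learn_dominant_hue(frame_hue)

# Synthetic hue maps (hypothetical): a field around 120 degrees at start-up,
# then a frame whose grass tone has drifted to about 130 degrees.
rng = np.random.default_rng(0)
startup_hue = rng.normal(120.0, 5.0, size=(60, 80))
later_hue = rng.normal(130.0, 5.0, size=(60, 80))

dominant = learn_dominant_hue(startup_hue)
adapted = update_dominant_hue(dominant, later_hue)
print(round(dominant), round(adapted))
```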

Shot Classification

Long Shot
    A long shot displays the global view of the field.
In-Field Medium Shot
    A whole human body is usually visible.
Close-Up Shot
    Shows the above-waist view of one person.
Out-of-Field Shot
    The audience, coach, and other shots.

How to Extend from a Frame to a Shot?

For computational simplicity, they find the class of every frame in a shot and assign the shot the label of the majority of its frames.
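The majority rule is short enough to write out directly. The label strings and the function name `shot_label` are illustrative.

```python
from collections import Counter

def shot_label(frame_labels):
    """Label a shot with the most frequent label among its frames."""
    return Counter(frame_labels).most_common(1)[0][0]

# Hypothetical per-frame labels for one shot.
frame_labels = ["long", "long", "medium", "long", "close-up"]
print(shot_label(frame_labels))  # long
```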

Decision Schema based on G

The first stage uses G value and two thresholds, TcloseUp and Tmedium to determine the frame view label.
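A hedged sketch of this first stage follows. The slide does not define G or say how the thresholds map to labels, so the code assumes that G is the ratio of dominant-color (grass) pixels in the frame, that a low G means a close-up or out-of-field view, a high G a long view, and anything in between a medium view. The default threshold values are placeholders, not values from the slides.

```python
def frame_view_label(g, t_close_up=0.1, t_medium=0.4):
    """First-stage view label from the assumed dominant-color pixel ratio g in [0, 1]."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must be a ratio between 0 and 1")
    if g < t_close_up:
        return "close-up or out-of-field"  # very little field color visible
    if g <= t_medium:
        return "medium"
    return "long"  # the field dominates the frame

for ratio in (0.05, 0.25, 0.8):
    print(ratio, frame_view_label(ratio))
```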

Soccer Event Detection

Goal Detection
Referee Detection
    Controversial calls, such as red-yellow cards and penalties
Penalty Box Detection

Goal Detection

The occurrence of a goal is generally followed by a special pattern of cinematic features:
    A goal event leads to a break in the game.
    One or more close-up views of the actors of the goal event are shown.
    One or more replays are shown.
    The restart of the game is usually captured by a long shot.
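The cinematic pattern above can be turned into a simple check over a sequence of classified shots. This is only a sketch under assumptions: each shot is represented as a hypothetical `(view, is_replay)` pair, and the test covers just the presence of a close-up, a replay, and a closing long shot, with no timing constraints.

```python
def looks_like_goal(shots):
    """Check a shot sequence, given as (view, is_replay) pairs, for the goal pattern.

    Pattern assumed here: the break contains at least one close-up and at
    least one replay, and the sequence ends with a long shot (game restart).
    """
    if len(shots) < 3 or shots[-1] != ("long", False):
        return False
    break_shots = shots[:-1]
    has_close_up = any(view == "close-up" for view, _ in break_shots)
    has_replay = any(is_replay for _, is_replay in break_shots)
    return has_close_up and has_replay

# Hypothetical shot sequences following a candidate event.
after_goal = [("close-up", False), ("out-of-field", False),
              ("medium", True), ("long", False)]
normal_play = [("long", False), ("medium", False), ("long", False)]
print(looks_like_goal(after_goal))   # True
print(looks_like_goal(normal_play))  # False
```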

Referee Detection

It is assumed that there is a single referee in a medium, out-of-field, or close-up shot.
So there is no search for a referee in a long shot.

Penalty Box Detection

Field lines in a long view can be used to localize the view and/or register the current frame on the standard field model

Interesting Summaries

Goal summaries
Summaries with referee and penalty box objects

Adaptation of Parameters

Parameters:
    Tcolor in dominant color region detection
    TcloseUp and Tmedium in shot classification
    Referee color statistics
The training stage can be performed in a very short time to find the mean and variance of a normal pdf.
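Fitting the mean and variance of a normal pdf from a handful of training samples is indeed cheap, as this sketch suggests. The sample values are invented, and the use of the unbiased (n - 1) variance estimate is an assumption.

```python
import math
import numpy as np

def fit_normal(samples):
    """Return the sample mean and unbiased sample variance."""
    samples = np.asarray(samples, dtype=float)
    return float(samples.mean()), float(samples.var(ddof=1))

def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

# Hypothetical training values, e.g. a color component measured on a few sample pixels.
training = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance = fit_normal(training)
print(mean, variance)
print(normal_pdf(5.0, mean, variance) > normal_pdf(9.0, mean, variance))  # True
```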

Results for High-Level Analysis and Summarization

Goal detection results

Results for High-Level Analysis and Summarization(2)

Referee detection results

Results for High-Level Analysis and Summarization(3)

Penalty box detection results

References

"Automatic soccer video analysis and summarization", in Symp. Electronic Imaging: Science and Technology: Storage and Retrieval for Image and Video Databases IV, IS&T/SPI03, Jan. 2003, CA.
"The Segmentation and Classification of Story Boundaries in News Video", Proceedings of the 6th IFIP Working Conference on Visual Database Systems (VDB6 2002), Australia, 2002.
Pattern Classification, by Duda, Hart, and Stork, 2000.

Thanks for Your Attention

Any Questions or Comments?