Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM)


Guo-Jun Qi
Beckman Institute
University of Illinois at Urbana-Champaign

LSCOM (Large Scale Concept Ontology for Multimedia)

A broadcast news video dataset
200+ news videos / 170 hours
61,901 shots
Language
◦ English/Arabic/Chinese

Why a broadcast news ontology?

Critical mass of users, content providers, applications
Good content availability (TRECVID, LDC, FBIS)
Shares a large set of core concepts with other domains

LSCOM Provides

Richly annotated video content for accomplishing required access and analysis functions over massive amounts of video content
Large-scale, useful, well-defined semantic lexicon
◦ More than 3000 concepts
◦ 374 annotated concepts
◦ Bridging the semantic gap from low-level features to high-level concepts

An LSCOM concept

000 - Parade
Concept ID: 000
Name: Parade
Definition: Multiple units of marchers, devices, bands, banners or music.
Labeled: Yes

LSCOM Hierarchy

http://www.lscom.org/ontology/index.html

Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalance
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device
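The dotted listing can be read mechanically: the number of leading dots gives the depth of a concept, and its parent is the closest preceding concept that is one level shallower. A minimal sketch of that reading, assuming one concept per line; the function name `parse_dotted_hierarchy` is hypothetical and not part of any LSCOM tooling:

```python
def parse_dotted_hierarchy(lines):
    """Turn lines such as '..Dangerous_Thing' into (parent, child) edges.

    Assumption: the count of leading dots is the depth of the concept.
    """
    edges = []
    stack = []  # stack[d] holds the most recent concept seen at depth d
    for line in lines:
        name = line.lstrip(".")
        depth = len(line) - len(name)
        del stack[depth:]  # forget concepts at this depth or deeper
        if stack:
            edges.append((stack[-1], name))
        stack.append(name)
    return edges


# A few entries copied from the hierarchy slide.
listing = [
    "Thing",
    ".Individual",
    "..Dangerous_Thing",
    "...Dangerous_Situation",
    "....Emergency_Incident",
    "...Dangerous_Tangible_Thing",
    "....Cutting_Device",
]
edges = parse_dotted_hierarchy(listing)
```

Each edge is a parent/child pair, so the result can be loaded into any tree or graph structure.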

Definition: What is an ontology? (Wikipedia)

An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.

Ontology

Represents the visual knowledge base in a structured way
◦ Graph structure
◦ Tree (hierarchy) structure
Images/videos can be effectively learned and retrieved by exploiting the coherence between concepts
◦ Logical coherence
◦ Statistical coherence

An Ontology Hierarchy: Military Vehicle

An example from Wikipedia

Ontology Tree for LSCOM

A Light Scale Concept Ontology for Multimedia Understanding (LSCOM-Lite)

The aim is to break down the semantic space using a few concepts (39 concepts).
Selection Criteria
◦ Semantic Coverage: As many as possible of the semantic concepts in news videos should be covered by the light concept set.
◦ Compactness: These concepts should not semantically overlap.
◦ Modelability: These concepts can be modeled with a smaller semantic gap.

Selected concept dimensions

Divide the semantic space into a multi-dimensional space, where each dimension is nearly orthogonal
◦ Program Category
◦ Setting/Scene/Site
◦ People
◦ Objects
◦ Activities
◦ Events
◦ Graphics

Histogram of LSCOM-Lite Concepts

Some example keyframes

Applications

Application I: Conceptual Fusion (most basic: early fusion)
Application II: Cross-Category Classification (inter-class relation)
Application III: Event Dynamics in Concept Space

Application I: Conceptual Fusion

[Figure: diagram with blocks labeled Video, Concept 1, Concept 2, Concept 3, ..., Concept n, Visual Features, and Classifier]

LSCOM 374 Models

374 LIBSVM models
◦ http://www.ee.columbia.edu/ln/dvmm/columbia374/
◦ Features used (MPEG-7 descriptors): Color Moments, Edge Histogram, Wavelet Texture
◦ LIBSVM: a library for support vector machines at http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Application II: Cross-category classification with concept transfer

G.-J. Qi et al., Towards Cross-Category Knowledge Propagation for Learning Visual Concepts, in CVPR 2011.

Instance-Level Concept Correlation

[Figure: +1/-1 labels for the concepts Mountain and Castle, with regions marked "Mountain and castle", "Castle only", and "Mountain only"]

Transfer Function

[Figure: four cases labeled "Mountain, Castle", "Mountain", "Castle", and "None of them"]

Model Concept Relations

Automatically construct ontology in a data-driven manner

Application III: Event Dynamics in Concept Space

Event Detection with Concept Dynamics

W. Jiang et al., Semantic event detection based on visual concept prediction, ICME, Germany, 2008.

Open Problems

Cross-Dataset Gap
◦ Generalize the LSCOM dataset to other datasets (e.g., non-news video datasets)
Cross-Domain Gap
◦ Can the text scripts associated with news videos help information extraction for visual concepts?
Automatic ontology construction
◦ Task dependent vs. task independent
◦ Data driven vs. prior knowledge (e.g., WordNet)
◦ Incorporate prior human knowledge (logic relations, etc.)

TRECVID Competition

Task I: High-Level Feature Extraction
◦ Input: subshot
◦ Output: detection results for the 39 LSCOM-Lite concepts in the subshot

High-Level Feature Extraction

Each concept is assumed to be binary (absent or present) in each subshot
Submission: Find subshots that contain a certain concept, rank them by detection confidence score, and submit the top 2000.
Evaluation: NIST evaluated 20 medium-frequency concepts out of the 39 concepts, using a 50% random sample of all the submission pools
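The submission step amounts to sorting subshots by confidence and keeping the first 2000. A minimal sketch, assuming the detector output for one concept is a mapping from subshot id to score; the ids, scores, and the helper name `top_ranked` are made up for illustration:

```python
def top_ranked(scores, k=2000):
    """Return the ids of the k subshots with the highest confidence score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [shot_id for shot_id, _ in ranked[:k]]


# Hypothetical detector scores for a single concept.
scores = {"shot_a": 0.12, "shot_b": 0.91, "shot_c": 0.55}
submission = top_ranked(scores, k=2)
```

With `k` left at its default, the list holds at most 2000 subshot ids in decreasing order of confidence.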

20 Evaluated Concepts

Evaluation Metric: Average Precision

Relevant subshots should be ranked higher than the irrelevant ones.

$$\text{Average Precision} = \frac{1}{R} \sum_{j=1}^{N} \frac{R_j}{j} \, I_j$$

R is the total number of relevant images, R_j is the number of relevant images among the top j images, I_j indicates whether the j-th image is relevant (I_j = 1) or not (I_j = 0), and N is the length of the ranked list.
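A small worked example may help with the metric: average precision sums the precision at rank j (relevant items in the top j, divided by j) over the ranks j that hold a relevant item, then divides by the total number of relevant items R. The sketch below assumes the ranked list is given as 0/1 relevance flags; when `total_relevant` is omitted, R is taken to be the number of relevant items in the list itself, which is an assumption of this example and not something the slides specify.

```python
def average_precision(relevance, total_relevant=None):
    """relevance[j-1] is 1 if the j-th ranked item is relevant, else 0."""
    if total_relevant is None:
        total_relevant = sum(relevance)  # assumption: all relevant items are in the list
    if total_relevant == 0:
        return 0.0
    hits = 0      # running count of relevant items in the top j
    total = 0.0
    for j, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / j  # precision at rank j
    return total / total_relevant


# Ranked list: relevant, irrelevant, relevant, irrelevant.
ap = average_precision([1, 0, 1, 0])  # (1/1 + 2/3) / 2
```

Moving a relevant item up the list raises the score, which is why the metric rewards ranking relevant subshots first.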

Results

TRECVID Competition

Task II: Video Search
◦ Input: 24 text-based topics
◦ Output: relevant subshots in the database

Topics to search

Topics to search (cont’d)

Topics to search

Three Types of Search Systems

Results: Automatic Runs

Results: Manual Runs

Results: Interactive Runs

Machine Problem 7: Shot Boundary Detection in Videos

Goals

Detect the abrupt content changes between consecutive frames.
◦ Scene changes
◦ Scene cuts

Steps

Step 1: Measure the change of content between video frames
◦ Visual/acoustic measurements
Step 2: Compare the content distance between successive frames. If the distance is larger than a certain threshold, then a shot boundary may exist.
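The two steps can be sketched as follows, assuming each frame has already been reduced to a feature vector. The Euclidean distance and the toy feature values are choices made for this illustration only; the assignment leaves both the distance measure and the threshold to you.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_boundaries(features, threshold):
    """Return every index i where frame i differs from frame i-1 by more than threshold."""
    return [
        i
        for i in range(1, len(features))
        if euclidean(features[i - 1], features[i]) > threshold
    ]


# Toy per-frame features: the content jumps between frame 1 and frame 2.
frames = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.1], [1.0, 0.0]]
boundaries = detect_boundaries(frames, threshold=0.5)
```

A lower threshold reports more candidate boundaries, and a higher one reports fewer.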

Measuring Content based on Visual Information

256-dimensional color histogram
◦ In RGB space, normalize r, g, b to [0,1]
◦ Color space: (nr, ng)
◦ 8×8 histogram

Color Histograms

Divide each image into four parts; each part has an 8×8 histogram (64 bins), giving 4 × 64 = 256-dimensional features in total.
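One possible reading of the 256-dimensional feature is sketched below. Several details are assumptions, since the slides do not spell them out: nr and ng are taken to be the normalized chromaticities r/(r+g+b) and g/(r+g+b), the four parts are taken to be the image quadrants, and each 8×8 histogram is normalized by the number of pixels in its part. The function names are hypothetical.

```python
def rg_histogram(pixels, bins=8):
    """8x8 histogram over (nr, ng), flattened to 64 values and normalized."""
    hist = [0.0] * (bins * bins)
    for r, g, b in pixels:
        total = r + g + b
        if total == 0:
            nr = ng = 1.0 / 3.0  # assumption: treat black pixels as neutral gray
        else:
            nr, ng = r / total, g / total  # assumed definition of nr and ng
        i = min(int(nr * bins), bins - 1)  # clamp nr == 1.0 into the last bin
        j = min(int(ng * bins), bins - 1)
        hist[i * bins + j] += 1.0
    count = len(pixels)
    return [h / count for h in hist] if count else hist


def frame_feature(image):
    """image: list of rows of (r, g, b) tuples. Returns 4 * 64 = 256 values."""
    height, width = len(image), len(image[0])
    feature = []
    # Assumed layout of the four parts: top-left, top-right, bottom-left, bottom-right.
    for rows in (range(0, height // 2), range(height // 2, height)):
        for cols in (range(0, width // 2), range(width // 2, width)):
            block = [image[y][x] for y in rows for x in cols]
            feature.extend(rg_histogram(block))
    return feature


# Toy 4x4 image: top half pure red, bottom half pure green.
image = [[(255, 0, 0)] * 4 for _ in range(2)] + [[(0, 255, 0)] * 4 for _ in range(2)]
feature = frame_feature(image)
```

The resulting vectors can be compared between successive frames with whatever distance measure you choose.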

Acoustic Features

12 cepstral coefficients
Energy (sum of squares of the raw signal samples)
Zero crossing rate (ZCR)
ZCR = sum(|sign(S(2:N))-sign(S(1:N-1))|)
Hint: normalize the energy so that it does not dominate when computing distances between successive frames
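The energy and ZCR definitions translate directly into code. The sketch below is a plain-Python rendering of the slide formula and leaves the cepstral coefficients out; the sample values are made up. Note that with this formula each sign change from +1 to -1 (or back) contributes 2 to the sum.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def zero_crossing_rate(signal):
    """sum(|sign(S(2:N)) - sign(S(1:N-1))|) over consecutive samples."""
    return sum(abs(sign(b) - sign(a)) for a, b in zip(signal, signal[1:]))


def energy(signal):
    """Sum of squares of the raw samples."""
    return sum(x * x for x in signal)


# Toy audio frame that changes sign at every sample.
samples = [1.0, -1.0, 1.0, -1.0]
zcr = zero_crossing_rate(samples)
frame_energy = energy(samples)
```

Because raw energy can be orders of magnitude larger than the other features, one option is to divide it by the largest energy in the clip before computing distances; the exact normalization is up to you.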

Datasets

Two videos, each a little over one minute long
Manually label the shot boundaries

What to submit

Source code
Report
◦ Compare the shot boundary detection results returned by your algorithm with the manually labeled boundaries
◦ Compare
◦ Explain your choice of threshold
◦ Explain the differences between the acoustic-based and visual-based detection results
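For the comparison with the manual labels, one option is to count a detection as correct when it falls within a few frames of a labeled boundary, then report precision and recall. This is only a suggestion: the metric, the tolerance of one frame, and the frame numbers below are assumptions of this sketch and not requirements of the assignment.

```python
def match_boundaries(detected, labeled, tolerance=1):
    """Greedily match detections to labels within `tolerance` frames.

    Returns (precision, recall). Each labeled boundary is matched at most once.
    """
    unmatched = list(labeled)
    hits = 0
    for d in detected:
        for t in unmatched:
            if abs(d - t) <= tolerance:
                unmatched.remove(t)
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(labeled) if labeled else 0.0
    return precision, recall


# Hypothetical frame indices: detections vs. manual labels.
precision, recall = match_boundaries([30, 95, 200], [31, 150, 200])
```

Running the same comparison for the acoustic-based and the visual-based detector gives two pairs of numbers to discuss in the report.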

Where and when to submit

Email to ece.ece.ece.417@gmail.com

Due: May 2nd

Thanks! Q&A
