Multi-Level Annotation of Natural Scenes Using Dominant Image Compounds and Semantic Concepts

Jianping Fan, Yuli Gao, Hangzai Luo

Department of Computer Science

University of North Carolina at Charlotte

Outline of Presentation

• Research Motivation

• Semantic Image Representation

• Semantic Image Concept Modeling

• Adaptive EM Algorithm for Classifier Training

• Multi-Level Image Annotation

• Conclusions

1. Research Motivation

Google indexes images by using the associated text, but the associated text may not describe the semantics of the image contents correctly!

(a) Semantic image indexing should be done by using the semantics of image contents!

(b) The semantics of image contents should be captured along the dimensions that humans use to judge image similarity!

• Three Types of Image Similarity Judgment:

(a) Image Similarity with same Dominant Image Compound

Dog



(b) Image Similarity with same Semantic Image Concept

Mountain View



(c) Image Similarity with same Semantic Image Event

Sailing


• Image Similarity Judgment: Conclusions

Image similarity can be judged at multiple levels, so image annotation should also be performed at multiple levels:

Dominant Image Compounds & Semantic Image Concepts & Events


• Semantic Image Classification is widely used to enable Automatic Image Annotation, but its performance largely depends on three issues:

(a) The ability of the underlying Image Patterns used for image representation and feature extraction to capture the Middle-Level Semantics of Images!

(b) The ability of visual features to discriminate among different semantic image concepts!

(c) The performance of Classifier Training Algorithms.


• Challenges for Semantic Image Classification

[Figure: the Semantic Gap between Low-Level Image Signals and Semantic Image Concepts or Events]

What image patterns are suitable for enhancing the quality of features and narrowing the semantic gap?

2. Image Content Representation

[Figure: Semantic Image Concepts; Image Patterns to Capture Middle-Level Image Semantics; the Semantic Gap split into Gap 1 and Gap 2, crossed by Semantic Bridge 1 and Semantic Bridge 2]


• Basic Requirements of Image Patterns:

(a) They should capture the middle-level semantics of images, narrow the semantic gap, and be meaningful to human beings!

(b) They should enhance the quality of features and improve classifier performance!


[Figure: Semantic Image Concepts; Semantic-Sensitive Salient Objects; the Semantic Gap split into Gap 1 and Gap 2, crossed by Semantic Bridge 1 and Semantic Bridge 2]


• Three Issues for Salient Objects

a. What are the salient objects?

b. What is the basic vocabulary?

c. How can we detect them automatically?

---The same type of salient object may appear in different images with very different visual properties!

---Requirement of salient objects: they should capture the middle-level semantics of images without performing semantic object detection, and they should capture the dominant visual properties of the relevant semantic objects!

---The number of “key” salient objects in a specific image domain should be limited!

• Definition of Salient Objects

Salient Objects are defined as the dominant image compounds that are meaningful to human beings!


• WordNet or Domain Knowledge can be used to define the Basic Vocabulary of Salient Objects

Natural Images
    Sky: Blue Sky, Cloudy Sky
    Ground: Floor, Sand, Grass
    Foliage: Green Foliage, Floral Foliage
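Because the vocabulary is a small concept hierarchy, it maps naturally onto a tree. The sketch below is a hypothetical encoding, assuming the hierarchy reads Sky → {Blue Sky, Cloudy Sky}, Ground → {Floor, Sand, Grass}, Foliage → {Green Foliage, Floral Foliage}. The nested-dict layout and the helpers `leaf_concepts` and `path_to` are illustrative names and are not part of the original system.

```python
# Hypothetical encoding of the salient-object vocabulary for natural images.
# Each key is a concept; its value holds the narrower concepts (empty = leaf).
VOCABULARY = {
    "Natural Images": {
        "Sky": {"Blue Sky": {}, "Cloudy Sky": {}},
        "Ground": {"Floor": {}, "Sand": {}, "Grass": {}},
        "Foliage": {"Green Foliage": {}, "Floral Foliage": {}},
    }
}


def leaf_concepts(tree):
    """Return the most specific concepts (leaves) in depth-first order."""
    leaves = []
    for name, children in tree.items():
        if children:
            leaves.extend(leaf_concepts(children))
        else:
            leaves.append(name)
    return leaves


def path_to(tree, target, prefix=()):
    """Return the root-to-target path as a tuple, or None if target is absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = path_to(children, target, path)
        if found:
            return found
    return None


print(leaf_concepts(VOCABULARY))
print(path_to(VOCABULARY, "Sand"))  # -> ('Natural Images', 'Ground', 'Sand')
```

The leaves correspond to the salient-object types that would each need a detector. `path_to` recovers the broader categories above a given type.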


• Automatic Salient Object Detection Function

a. Homogeneous image regions are first detected!

b. Support Vector Machines (SVM) are used for binary region classification, followed by similarity-based region merging!

---Classifier training is performed on multiple images to capture the different visual properties of the same type of salient object under different vision conditions!

---We boost mean shift, edgeflow and seeded region growing for image segmentation!
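A minimal sketch of step b follows. It assumes that each homogeneous region is already summarized by a feature vector and that the trained SVM reduces to a linear decision function. The weights `w`, the bias `b`, the Euclidean feature distance and `merge_threshold` are all hypothetical stand-ins, not the authors' trained classifier or similarity measure. Adjacent regions that are both classified as the object and have similar features are merged with a union-find structure.

```python
import math


def svm_decision(x, w, b):
    """Linear decision value w.x + b (hypothetical stand-in for a trained SVM)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def detect_salient_object(features, adjacency, w, b, merge_threshold):
    """Return groups of region indices that form salient-object instances.

    features:  one feature vector per homogeneous region
    adjacency: pairs (i, j) of spatially adjacent regions
    """
    # Step 1: binary classification of every region.
    positive = [svm_decision(x, w, b) > 0 for x in features]
    # Step 2: merge adjacent positive regions whose features are similar.
    parent = list(range(len(features)))
    for i, j in adjacency:
        if positive[i] and positive[j] and math.dist(features[i], features[j]) < merge_threshold:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i, is_positive in enumerate(positive):
        if is_positive:
            groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


# Hypothetical 2-D region features and adjacency for a four-region image.
regions = [(0.9, 0.1), (0.85, 0.15), (0.1, 0.9), (0.8, 0.2)]
print(detect_salient_object(regions, [(0, 1), (1, 2), (2, 3)], w=(1.0, -1.0), b=0.0, merge_threshold=0.2))
# -> [[0, 1], [3]]
```

Regions 0 and 1 are both positive, adjacent and similar, so they merge into one instance. Region 3 is positive but touches only the negative region 2, so it stays separate.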


• Average Performance of Detection Functions

Object            Precision   Recall
Brown Horse       95.6%       100%
Grass             92.9%       94.8%
Purple Flower     96.1%       95.2%
Red Flower        87.8%       86.4%
Sand Field        98.8%       96.6%
Rock              98.7%       100%
Water             86.7%       89.5%
Human Skin        86.2%       85.4%
Yellow Flower     87.4%       89.3%
Sunset/Sunrise    92.5%       95.2%
Sky               87.6%       94.5%
Snow              86.7%       87.5%
Waterfall         88.5%       87.1%
Sail Cloth        96.3%       94.9%
Forest            85.4%       84.8%
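The table reports precision and recall separately. When a single score per object is wanted, the usual choice is their harmonic mean (F1). The sketch below is not part of the original evaluation. It simply combines a few rows copied from the table.

```python
# Precision/recall pairs (in %) copied from the detection table for a few objects.
RESULTS = {
    "Brown Horse": (95.6, 100.0),
    "Red Flower": (87.8, 86.4),
    "Forest": (85.4, 84.8),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


for name, (p, r) in RESULTS.items():
    print(f"{name}: F1 = {f1(p, r):.1f}%")
```

Because the harmonic mean stays close to the smaller of the two values, objects with unbalanced precision and recall do not score as well as their better metric alone would suggest.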


• Salient Object Detection Results

Observation: Salient objects capture the dominant visual properties of the relevant semantic objects!


• Advantages & Benefits:

(a) It reaches a good balance between semantic sufficiency and segmentation cost!

(b) It captures the middle-level semantics of images, and thus it can enhance the quality of features and improve classifier performance!

(c) It enables compound-level image annotation, so users have more choices of compound-level keywords to specify their query concepts!

3. Semantic Image Concept Modeling

[Figure: three-level hierarchy of Semantic Concepts 1, …, i, …, Nc; Salient Object Types 1, …, k, …, Ns; and Color/Texture Patterns 1, …, j, …, Nr]

How can we model semantic image concepts by using salient objects?


How can we quantify the semantic relationship (i.e., image context) between one specific image concept and the relevant salient objects?


• Finite Mixture Model for Image Concept Modeling

(a) The semantic relationship (i.e., image context) between a semantic image concept and the relevant salient objects is modeled by a finite mixture model!

(b) Different semantic image concepts are relevant to different types and different numbers of salient objects, with different importance!

(c) The class distribution of each type of relevant salient object is modeled by multiple mixture components to capture its different visual properties under different conditions!

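$$P(X, C_j \mid \kappa, \omega, \Theta) = \sum_{i=1}^{\kappa} P(X \mid S_i, \theta_{s_i})\, \omega_{s_i}$$

where κ is the number of mixture components, S_i is the salient object type associated with the i-th component, θ_{s_i} are its parameters and ω_{s_i} is its weight (relative importance).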
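The sketch below illustrates the finite mixture model, a weighted sum of per-component densities, by evaluating the concept likelihood for one feature vector. The slides do not say which density each component uses, so diagonal-covariance Gaussians are an assumption here. The `mountain_view` parameters are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance `var` at feature vector `x`."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-((xi - mi) ** 2) / (2.0 * vi)) / math.sqrt(2.0 * math.pi * vi)
    return density


def concept_likelihood(x, components):
    """Finite mixture: sum over components of omega_i * P(x | S_i, theta_i).

    components: list of (omega, mean, var) tuples, one per mixture component;
    the omegas are assumed to be non-negative and to sum to 1.
    """
    return sum(omega * gaussian_pdf(x, mean, var) for omega, mean, var in components)


# Hypothetical two-component model for one concept over 2-D color/texture features.
mountain_view = [
    (0.6, (0.2, 0.7), (0.01, 0.02)),  # component for one salient object type
    (0.4, (0.5, 0.3), (0.02, 0.02)),  # component for another salient object type
]
print(concept_likelihood((0.2, 0.7), mountain_view))
```

With one such model per concept, a feature vector can be scored against every concept. The weights ω express how strongly each component contributes to the concept.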

4. Adaptive EM Algorithm

Problems of the traditional EM algorithm:

a. Knowledge of K is required, and K is often predefined based on experience!

b. It can converge to a local maximum!

c. It is sensitive to the initial values of the parameters!

Unknowns: κ = ?, ω = ?, θ = ?


• We start from a large value of K to capture the essential relationships between a semantic image concept and the relevant salient objects!

• We automatically merge, split and eliminate mixture components to search for the optimal model structure and parameters!
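The slides do not spell out the parameter updates that run between the structural operations. The sketch below is a reference point only: ordinary EM iterations for a 1-D Gaussian mixture, started from a deliberately large K. The Gaussian components, the toy data and the variance floor `min_var` are assumptions. The merge, split and elimination steps of the adaptive algorithm are not included.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, min_var=1e-2):
    """One standard EM iteration for a 1-D Gaussian mixture with a variance floor."""
    k = len(weights)
    # E-step: responsibility of each component for each sample.
    resp = []
    for x in data:
        joint = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: re-estimate weights, means and (floored) variances.
    n = len(data)
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        r = [row[i] for row in resp]
        n_i = sum(r)
        mean = sum(ri * x for ri, x in zip(r, data)) / n_i
        var = sum(ri * (x - mean) ** 2 for ri, x in zip(r, data)) / n_i
        new_w.append(n_i / n)
        new_m.append(mean)
        new_v.append(max(var, min_var))
    return new_w, new_m, new_v


# Start from a deliberately large K = 4 for toy data with two obvious clusters.
data = [0.10, 0.20, 0.15, 2.00, 2.10, 1.90]
params = ([0.25] * 4, [0.0, 0.5, 1.5, 2.5], [1.0] * 4)
history = [log_likelihood(data, *params)]
for _ in range(10):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
print(history)
```

With K larger than the number of clusters, each iteration still does not decrease the log-likelihood. What plain EM cannot do is change K, and that is the gap the merge, split and elimination operations are meant to fill.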

When should we perform merging, splitting and elimination?


Merging: Two mixture components over-populate the same sample region!

Jensen-Shannon divergence is used to quantify the overlap!

[Figure: Original Mixture Components → Merged Mixture Component]
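The slides name Jensen-Shannon (JS) divergence as the overlap measure but do not show how it is computed for continuous components. One simple approximation, sketched below as an assumption, discretizes both component densities on a common grid and applies the discrete JS formula. The 1-D Gaussian components, the grid range and the resolution are illustrative choices. A small JS value indicates heavily overlapping components, which is the situation the merging rule targets.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def discretize(mean, var, lo=-10.0, hi=10.0, steps=2001):
    """Sample a 1-D Gaussian on a uniform grid and renormalise it to sum to 1."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    weights = [normal_pdf(x, mean, var) for x in grid]
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint (0 <= JS <= ln 2)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


near = js(discretize(0.0, 1.0), discretize(0.2, 1.0))  # heavily overlapping components
far = js(discretize(0.0, 1.0), discretize(5.0, 1.0))   # well-separated components
print(near, far)
```

JS is symmetric and bounded by ln 2, so scores from different component pairs are directly comparable.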


Splitting: One specific component under-populates the relevant sample region!

[Figure: Original Mixture Component vs. Local Sample Distribution]

Elimination: Two mixture components from two concept models overlap too much!

[Figure: Mixture Components for Concept 1 and Mixture Components for Concept 2]


(a) How can we make the margins among different concepts large enough?

(b) How can negative samples help us maximize the margins?


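• Criteria for Merging, Splitting and Elimination

Merge:

$$J_{\mathrm{merge}}(l, k, \Theta) = \frac{\mathrm{JS}\big(P(X, C_j \mid S_l, \theta_l),\, P(X, C_j \mid S_k, \theta_k)\big)}{\Phi(\Theta)}$$

Split:

$$J_{\mathrm{split}}(l, \Theta) = \frac{\mathrm{JS}\big(P(X, C_j \mid S_l, \theta_l),\, P(X, C_j \mid \Theta)\big)}{\Phi(\Theta)}$$

Elimination:

$$J_{\mathrm{elimination}}(l, m, \Theta) = \frac{\mathrm{JS}\big(P(X, C_j \mid S_l, \theta_l),\, P(X, C_i \mid S_m, \theta_m)\big)}{\Phi(\Theta)}$$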


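• The normalization factor Φ(Θ) is determined by:

$$\sum_{l=1}^{\kappa_j} J_{\mathrm{split}}(l, \Theta) + \sum_{l=1}^{\kappa_j} \sum_{k=l+1}^{\kappa_j} J_{\mathrm{merge}}(l, k, \Theta) + \sum_{l=1}^{\kappa_j} \sum_{m=1}^{\kappa_i} J_{\mathrm{elimination}}(l, m, \Theta) = 1$$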
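This step assumes every criterion is a raw Jensen-Shannon divergence divided by the same Φ(Θ). It also assumes the normalized criteria must sum to one over all candidate splits (l = 1, …, κ_j), merges (l < k ≤ κ_j) and eliminations (m = 1, …, κ_i). Under those assumptions, multiplying the constraint through by Φ(Θ) shows that Φ(Θ) is simply the sum of all the raw divergences. The derivation depends on the reconstructed form of the criteria:

$$\Phi(\Theta) = \sum_{l=1}^{\kappa_j} \mathrm{JS}\big(P(X, C_j \mid S_l, \theta_l),\, P(X, C_j \mid \Theta)\big) + \sum_{l=1}^{\kappa_j} \sum_{k=l+1}^{\kappa_j} \mathrm{JS}\big(P(X, C_j \mid S_l, \theta_l),\, P(X, C_j \mid S_k, \theta_k)\big) + \sum_{l=1}^{\kappa_j} \sum_{m=1}^{\kappa_i} \mathrm{JS}\big(P(X, C_j \mid S_l, \theta_l),\, P(X, C_i \mid S_m, \theta_m)\big)$$

Each normalized criterion is therefore the share of one candidate operation in the total divergence.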

• Acceptance Probability to prevent poor operations

$$P_{\mathrm{accept}} = \min\left\{ \exp\left( \frac{L(X, \Theta_2) - L(X, \Theta_1)}{\tau} \right),\; 1 \right\}$$

where L(X, Θ_1) and L(X, Θ_2) are the penalized likelihood functions before and after performing merging, splitting or elimination!
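Below is a minimal sketch of the acceptance test. It assumes the rule reads P_accept = min{exp((L(X, Θ_2) − L(X, Θ_1))/τ), 1}, with the penalized likelihoods L(X, Θ_1) and L(X, Θ_2) before and after the operation treated as log-scale values. The slides do not show how the penalized likelihood itself is computed, so the functions take the two values as inputs. `acceptance_probability` and `accept` are hypothetical names.

```python
import math
import random


def acceptance_probability(loglik_before, loglik_after, tau):
    """Assumed rule: min(exp((L(X, Theta_2) - L(X, Theta_1)) / tau), 1)."""
    delta = (loglik_after - loglik_before) / tau
    if delta >= 0:
        return 1.0  # the operation does not lower the penalized likelihood
    return math.exp(delta)


def accept(loglik_before, loglik_after, tau, rng=random):
    """Randomly accept or reject one merge/split/elimination proposal."""
    return rng.random() < acceptance_probability(loglik_before, loglik_after, tau)


print(acceptance_probability(-100.0, -90.0, 1.0))   # improvement: always accepted
print(acceptance_probability(-90.0, -100.0, 10.0))  # drop of 10 with tau = 10: exp(-1)
```

An operation that raises the penalized likelihood is always accepted. One that lowers it is still accepted occasionally, with a probability that shrinks as the drop grows relative to τ.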


• Advantages of the Adaptive EM Algorithm

(a) It can search for the optimal model structure K and the model parameters automatically in a single probabilistic scheme!

(b) It can avoid the local maximum problem by reorganizing the distribution of mixture components in feature space!

(c) It can integrate negative samples into classifier training, so it supports margin-based classifier training like SVM!

Convergence of our Adaptive EM Algorithm


5. Semantic Image Classification

6. Multi-Level Image Annotation




7. Benchmark: Image Representation

• Comparison with Blobs under the same classifier (P = precision, R = recall)

Concept          Salient Objects (P / R)    Blobs (P / R)
Mountain View    81.7% / 84.3%              78.5% / 75.5%
Beach            80.5% / 84.7%              74.6% / 75.9%
Garden           80.6% / 90.6%              73.3% / 78.2%
Sailing          87.6% / 85.5%              79.5% / 77.3%
Skiing           85.4% / 83.7%              79.3% / 78.2%
Desert           89.6% / 82.8%              76.6% / 78.5%

7. Benchmark: Image Annotation

[Figure: image annotation using Image Blobs vs. Salient Objects]

It is very easy to use salient objects to specify query concepts!

7. Benchmark: Image Classification

• Comparison with SVM Classifiers

As with SVM, negative samples are also integrated into FMM classifier training!

7. Benchmark: Text Classification

• Comparison with SVM Classifiers

SVM classifiers are better than FMM classifiers for high-dimensional text data classification!

FMM needs a large number of labeled samples in such a high-dimensional space!

8. Conclusions

• A multi-level approach for image annotation is proposed. It has two advantages:

(1) it captures middle-level image semantics, enhances the quality of features and improves classifier performance!

(2) users have more choices to specify their query concepts using keywords at both the compound and concept levels!


• An adaptive EM algorithm has been proposed for more effective model selection and model parameter estimation!

• Negative samples are integrated to enable margin-based classifier training in an FMM framework!

---our FMM classifier can achieve the same advantage as SVM!

Acknowledgement

• Two Wonderful Papers:

Prof. R. Jain, IEEE Multimedia, 2001
Journal of Electronic Imaging, 2000

Ear: Huge Fan
Body: Wall
Leg: Pillar
Tail: Snake
Trunk: Tree
Tusk: Spear

• Solutions: Compound-Based Chinese Characters

Chinese Character: People + People (people follow people)
English Meaning: Follow up

Chinese Character: People + People + People (multiple people together)
English Meaning: Crowd

We just transform this idea into a statistical image modeling framework!

Acknowledgement

• Prof. Ying Wu at Northwestern University for useful discussions!

• Prof. Edward Chang at UCSB, mentor for the final version of this paper!

• Reviewers for the information on SMEM and CEM!

Q/A

Online demo is available at:

http://www.cs.uncc.edu/~jfan
