Multi-Level Annotation of Natural Scenes Using Dominant Image Compounds and Semantic Concepts

Jianping Fan, Yuli Gao, Hangzai Luo

Department of Computer Science

University of North Carolina at Charlotte

Outline of Presentation

• Research Motivation

• Semantic Image Representation

• Semantic Image Concept Modeling

• Adaptive EM Algorithm for Classifier Training

• Multi-Level Image Annotation

• Conclusions

Google indexes images by using associated text, but the associated text may not describe the semantics of the image contents correctly!

(a) Semantic Image Indexing should be done by using the semantics of image contents!

(b) Semantics of Image Contents should be captured based on the human perception dimensions for judgment of image similarity!

1. Research Motivation

• Three Types of Image Similarity Judgment:

(a) Image Similarity with same Dominant Image Compound (example: Dog)

(b) Image Similarity with same Semantic Image Concept (example: Mountain View)

(c) Image Similarity with same Semantic Image Event (example: Sailing)

1. Research Motivation

• Image Similarity Judgment: Conclusions

Image Similarity can be judged on Multiple Levels, so Image Annotation should also be on Multiple Levels:

Dominant Image Compounds, Semantic Image Concepts, and Semantic Image Events

1. Research Motivation

• Semantic Image Classification is widely used to enable Automatic Image Annotation, but its performance largely depends on three issues:

(a) The ability of the underlying Image Patterns (used for image representation and feature extraction) to capture the Middle-Level Semantics of Images!

(b) The ability of the visual features to discriminate among different semantic image concepts!

(c) The performance of the Classifier Training Algorithms!

1. Research Motivation

• Challenges for Semantic Image Classification

[Figure: the Semantic Gap between Low-Level Image Signals and Semantic Image Concepts or Events]

What are the suitable image patterns that can be used to enhance the quality of features and narrow the semantic gap?

2. Image Content Representation

[Figure: three layers: Semantic Image Concepts; Image Patterns to Capture Middle-Level Image Semantics; Low-Level Image Signals. Labels: Semantic Gap, Semantic Bridge 1, Semantic Bridge 2, Gap 1, Gap 2]

2. Image Content Representation

• Basic Requirements of Image Patterns:

(a) Be able to capture middle-level semantics of images, narrow the semantic gap, and be semantically meaningful to human beings!

(b) Be able to enhance the quality of features and improve classifier performance!

2. Image Content Representation

[Figure: three layers: Semantic Image Concepts; Semantic-Sensitive Salient Objects; Low-Level Image Signals. Labels: Semantic Gap, Semantic Bridge 1, Semantic Bridge 2, Gap 1, Gap 2]

2. Image Content Representation

• Three Issues for Salient Objects

a. What are the salient objects?

b. What is the basic vocabulary?

c. How can we detect them automatically?

---The same type of salient object may appear in different images with very different visual properties!

---Requirement of salient objects: they should be able to capture middle-level semantics of images without performing semantic object detection, and they should be able to capture the dominant visual properties of the relevant semantic objects!

---The number of “key” salient objects in a specific image domain should be limited!

2. Image Content Representation

• Definition of Salient Objects

Salient Objects are defined as the dominant image compounds that are semantically meaningful to human beings!

2. Image Content Representation

• WordNet or Domain Knowledge can be used to define the Basic Vocabulary of Salient Objects

Natural Images
    Sky: Blue Sky, Cloudy Sky
    Ground: Floor, Sand, Grass
    Foliage: Green Foliage, Floral Foliage
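The vocabulary tree on this slide could be held in a small nested structure. This is only a sketch: the grouping of the leaves under Sky, Ground and Foliage is read off the flattened slide text, and the names `VOCABULARY` and `leaf_types` are hypothetical, not taken from the authors' system.

```python
# Hypothetical encoding of the salient-object vocabulary tree from the slide.
# The parent/child grouping is an assumption recovered from the slide layout.
VOCABULARY = {
    "Natural Images": {
        "Sky": ["Blue Sky", "Cloudy Sky"],
        "Ground": ["Floor", "Sand", "Grass"],
        "Foliage": ["Green Foliage", "Floral Foliage"],
    }
}


def leaf_types(tree):
    """Return the leaf salient-object types in depth-first order."""
    if isinstance(tree, list):
        return list(tree)
    leaves = []
    for child in tree.values():
        leaves.extend(leaf_types(child))
    return leaves


if __name__ == "__main__":
    for name in leaf_types(VOCABULARY):
        print(name)
```

Each leaf would correspond to one detection function, which keeps the number of "key" salient objects per image domain small.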

2. Image Content Representation

• Automatic Salient Object Detection Function

a. Homogeneous Image Regions are first detected!

---we boost mean shift, edgeflow, and seeded region growing for image segmentation!

b. Support Vector Machines (SVM) are used for binary region classification, followed by similarity-based Region Merging!

---classifier training is performed on multiple images to capture the different visual properties of the same type of salient object under different vision conditions!
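The classify-then-merge step can be illustrated with a short sketch. The slides do not state which similarity measure drives the merging, so this example makes a simplifying assumption: adjacent regions that the binary classifier accepts are merged into one salient object. The function name `merge_regions` and its inputs are hypothetical.

```python
def merge_regions(labels, adjacency):
    """Group adjacent positively classified regions into salient objects.

    labels: dict mapping region id -> bool (True if the binary classifier
            accepted the region as the target salient-object type).
    adjacency: iterable of (a, b) pairs of neighbouring region ids.
    Returns a list of sets of region ids, ordered by smallest id.
    """
    # Union-find over the accepted regions only (assumed merging rule).
    parent = {r: r for r, positive in labels.items() if positive}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in adjacency:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for r in parent:
        groups.setdefault(find(r), set()).add(r)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    labels = {1: True, 2: True, 3: False, 4: True}
    adjacency = [(1, 2), (2, 3), (3, 4)]
    print(merge_regions(labels, adjacency))
```

In a real pipeline the boolean labels would come from the trained SVM, and the merge test would also compare region features rather than adjacency alone.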

2. Image Content Representation

• Average Performance of Detection Functions

Object            Precision    Recall
Brown Horse       95.6%        100%
Grass             92.9%        94.8%
Purple Flower     96.1%        95.2%
Red Flower        87.8%        86.4%
Sand Field        98.8%        96.6%
Rock              98.7%        100%
Water             86.7%        89.5%
Human Skin        86.2%        85.4%
Yellow Flower     87.4%        89.3%
Sunset/Sunrise    92.5%        95.2%
Sky               87.6%        94.5%
Snow              86.7%        87.5%
Waterfall         88.5%        87.1%
Sail Cloth        96.3%        94.9%
Forest            85.4%        84.8%
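For reference, precision and recall are conventionally computed as below. The slides do not say whether detections are counted per region, per pixel or per image, so this sketch only shows the standard definitions; the counts in the usage example are hypothetical and do not come from the table.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard precision and recall from detection counts.

    Returns 0.0 for a measure whose denominator is zero.
    """
    detected = true_pos + false_pos   # everything the detector reported
    relevant = true_pos + false_neg   # everything that should be found
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r = precision_recall(true_pos=90, false_pos=10, false_neg=5)
    print(f"precision={p:.3f} recall={r:.3f}")
```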

2. Image Content Representation

• Salient Object Detection Results

Observation: Salient Objects provide the dominant visual properties of the relevant semantic objects!

2. Image Content Representation

• Advantages & Benefits:

(a) It is able to reach a good balance between semantic sufficiency and segmentation cost!

(b) It is able to capture the middle-level semantics of images, and thus to enhance the quality of features and improve classifier performance!

(c) It enables compound-level image annotation, so users have more choices of keywords at the compound level to specify their query concepts!

3. Semantic Image Concept Modeling

How can we model semantic image concepts by using salient objects?

[Figure: three-layer hierarchy. Semantic Concepts 1, ..., i, ..., Nc; Salient Object Types 1, ..., k, ..., Ns; Color/Texture Patterns 1, ..., j, ..., Nr]

3. Semantic Image Concept Modeling

How can we quantify the semantic relationship (i.e., image context) between one specific image concept and the relevant salient objects?

3. Semantic Image Concept Modeling

• Finite Mixture Model for Image Concept Modeling

(a) The semantic relationship (i.e., image context) between a semantic image concept and the relevant salient objects is modeled by a finite mixture model!

(b) Different semantic image concepts are relevant to different types and different numbers of salient objects with different importance!

(c) The class distribution for each type of relevant salient objects is modeled by using multiple mixture components to capture the different visual properties under different conditions!

$$P(X, S_j, \theta_{s_j}) = \sum_{i=1}^{\kappa} P(X \mid R_i, \theta_{r_i})\, \varpi_{r_i}$$
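A finite mixture of this kind is a weighted sum of component densities. The sketch below evaluates such a sum numerically. It assumes one-dimensional Gaussian components purely for illustration (the slides do not name the component family), and `gaussian_pdf` and `mixture_density` are hypothetical helper names.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumed component family)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Weighted sum of component densities; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


if __name__ == "__main__":
    # Two equally weighted components centred at -1 and +1 (made-up values).
    print(mixture_density(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```

The weights play the role of the relative importance of each component, and the number of components is the model-structure quantity that the adaptive EM algorithm has to select.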

4. Adaptive EM Algorithm

Problems of the traditional EM algorithm:

a. Knowledge of K is required, and K is often predefined based on experience!

b. Local maximum!

c. Sensitive to initial values of parameters!

κ, θ, ϖ: ???

4. Adaptive EM Algorithm

• We start from a large value of K to capture the essential relationships between a semantic image concept and the relevant salient objects!

• We perform automatic merging, splitting and elimination of mixture components to search for the optimal model structure and parameters!

When should we perform merging, splitting and elimination?

4. Adaptive EM Algorithm

Merging: Two mixture components overpopulate the relevant sample regions!

Jensen-Shannon Divergence is used to quantify the overlap!

[Figure: Original Mixture Components; Merged Mixture Component]
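The Jensen-Shannon divergence itself is easy to compute once the two distributions are discretized. The following is a minimal sketch for discrete distributions over the same bins; applying it to continuous mixture components would first require sampling or binning their densities, which is not shown. The function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical inputs."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    overlapping = js_divergence([0.5, 0.5, 0.0], [0.4, 0.6, 0.0])
    disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
    print(overlapping, disjoint)
```

A small value means the two distributions overlap heavily; the value is bounded above by log 2 (natural logarithm), reached when the distributions do not overlap at all.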

4. Adaptive EM Algorithm

Splitting: One specific component under-populates the relevant sample region!

[Figure: Original Mixture Component; Local Sample Distribution]

4. Adaptive EM Algorithm

Elimination: Two mixture components from two different concept models overlap too much!

[Figure: Mixture Components for Concept 1; Mixture Components for Concept 2]

(a) How can we make the margins among different concepts large enough?

(b) What can the negative samples do for us to maximize the margins?

4. Adaptive EM Algorithm

• Criteria for Merging, Splitting and Elimination

Merge:

$$J_{merge}(l, k, \Theta) = \frac{\Phi(\Theta)}{JS\big(P(X, C_j \mid S_{lk}, \theta_{lk}),\; P(X, C_j \mid \Theta)\big)}$$

Split:

$$J_{split}(l, \Theta) = \frac{JS\big(P(X, C_j \mid S_l, \theta_l),\; P(X, C_j \mid \Theta)\big)}{\Phi(\Theta)}$$

Elimination:

$$J_{elimination}(l, m, \Theta) = \frac{\Phi(\Theta)}{JS\big(P(X, C_j \mid S_l, \theta_l),\; P(X, C_i \mid S_m, \theta_m)\big)}$$

4. Adaptive EM Algorithm

• Normalization Factor Φ(Θ) is determined by:

$$\sum_{l=1}^{\kappa_j} J_{split}(l, \Theta) + \sum_{l=1}^{\kappa_j} \sum_{k=l+1}^{\kappa_j} J_{merge}(l, k, \Theta) + \sum_{l=1}^{\kappa_j} \sum_{m=1}^{\kappa_i} J_{elimination}(l, m, \Theta) = 1$$

• Acceptance Probability to prevent poor operations

$$P_{accept} = \min\left\{ \exp\left( -\frac{L(X, \Theta_1) - L(X, \Theta_2)}{\tau} \right),\; 1 \right\}$$

$L(X, \Theta_1)$ and $L(X, \Theta_2)$ are the penalized likelihood functions before and after performing merging, splitting or elimination!
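The acceptance test can be sketched as follows. The sketch assumes that a larger penalized likelihood is better, so an operation that improves it is always accepted and a worse one is accepted with probability exp(-(L1 - L2)/τ); this sign convention is an assumption, since the formula on the slide is only partly legible. The names `acceptance_probability` and `accept` are hypothetical.

```python
import math
import random


def acceptance_probability(l_before, l_after, tau):
    """min(exp(-(l_before - l_after) / tau), 1) for temperature tau > 0."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    exponent = -(l_before - l_after) / tau
    if exponent >= 0:
        return 1.0  # the operation did not reduce the penalized likelihood
    return math.exp(exponent)


def accept(l_before, l_after, tau, rng=random.random):
    """Draw a uniform number and decide whether to keep the operation."""
    return rng() < acceptance_probability(l_before, l_after, tau)


if __name__ == "__main__":
    # Made-up likelihood values, for illustration only.
    print(acceptance_probability(-100.0, -90.0, 1.0))   # improvement
    print(acceptance_probability(-90.0, -100.0, 10.0))  # degradation
```

A larger τ makes the search more willing to keep a merge, split or elimination that temporarily lowers the penalized likelihood, which is how the reorganization of components can move the model out of a local maximum.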

4. Adaptive EM Algorithm

• Advantages of Adaptive EM Algorithm

(a) It is able to search for the optimal model structure K and the model parameters automatically in a single probabilistic scheme!

(b) It is able to avoid the local maximum problem by re-organizing the distribution of mixture components in feature space!

(c) It is able to integrate negative samples for classifier training, thus it is able to support margin-based classifier training like SVM!

4. Adaptive EM Algorithm

• Convergence of our Adaptive EM Algorithm


5. Semantic Image Classification

6. Multi-Level Image Annotation

7. Benchmark: Image Representation

• Comparison with Blobs under same classifier (p = precision, r = recall)

Concept          Salient Objects           Blobs
Mountain View    81.7% (p), 84.3% (r)      78.5% (p), 75.5% (r)
Beach            80.5% (p), 84.7% (r)      74.6% (p), 75.9% (r)
Garden           80.6% (p), 90.6% (r)      73.3% (p), 78.2% (r)
Sailing          87.6% (p), 85.5% (r)      79.5% (p), 77.3% (r)
Skiing           85.4% (p), 83.7% (r)      79.3% (p), 78.2% (r)
Desert           89.6% (p), 82.8% (r)      76.6% (p), 78.5% (r)

7. Benchmark: Image Annotation

[Figure panels: Image Blobs; Salient Objects]

It is very easy to use salient objects to specify query concepts!

7. Benchmark: Image Classification

• Comparison with SVM Classifiers

Negative samples are also integrated for finite mixture model (FMM) classifier training, as in SVM!

7. Benchmark: Text Classification

• Comparison with SVM Classifiers

SVM classifiers are better than FMM classifiers for high-dimensional text data classification!

FMM needs a large number of labeled samples in such a high-dimensional space!

8. Conclusions

• A multi-level approach for image annotation is proposed. Two advantages of this approach are:

(1) it is able to capture middle-level image semantics, enhance quality of features and improve classifier performance!

(2) users will have more choices to specify their query concepts using the keywords at both compound and concept levels!

8. Conclusions

• An adaptive EM algorithm has been proposed for more effective model selection and model parameter estimation!

• Negative Samples are integrated to enable margin-based classifier training in an FMM framework!

---our FMM classifier can achieve the same advantage as SVM!

Acknowledgement

• Two Wonderful Papers:

@ Prof. R. Jain, IEEE Multimedia, 2001

@ Journal of Electronic Imaging, 2000

[Figure labels: Ear: Huge Fan; Body: Wall; Leg: Pillar; Tail: Snake; Trunk: Tree; Tusk: Spear]

Acknowledgement

• Solutions: Compound-Based Chinese Characters

Chinese Character: [People] + [People] = [Follow up]

English Meaning: People follow people: Follow up

Acknowledgement

• Solutions: Compound-Based Chinese Characters

Chinese Character: [People] + [People] + [People] = [Crowd]

English Meaning: Multiple people together: Crowd

We just transform this idea into a statistical image modeling framework!

Acknowledgement

• Prof. Ying Wu at Northwestern University for useful discussion!

• Prof. Edward Chang at UCSB, mentor for the final version of this paper!

• Reviewers for the information on SMEM, CEM!


Q/A

Online demo is available at:

http://www.cs.uncc.edu/~jfan