CBIR in the Era of Deep Learning -- A Perspective from Feature Representation
Lei Wang
School of Computing and Information Technology, University of Wollongong, Australia
15-Oct-2016
Outline
• Introduction to CBIR
• Evolution of CBIR
– Early days (before 2000)
– Days of the BoF model (2000 ~ 2012)
– Era of Deep Learning (after 2012)
• Conclusion
Images courtesy of related papers and authors
Introduction
• Retrieval
– Getting back information that has been stored in a database
• Image Retrieval
Introduction
• Text-based image retrieval (TBIR, since the late 1970s)
– Manually associate images with text annotations
– Interpret images with high-level semantics
– Retrieval by matching the associated text annotations
Retrieval result of Google Images for “Airplane”
Introduction
• Issues with text-based image retrieval
– Annotation is time-consuming and labour-intensive
– Annotations only partially describe the visual content
– Subjectivity of human perception
– No support for query by example
(Example images: “Drouin Post Office, front desks”; “Iron Ore Fashion”)
Introduction
• Content-based image retrieval
– Human annotators are replaced by computers
– Text annotations are replaced by visual features
– Retrieval by comparing the associated visual features
Introduction
• The National Science Foundation (NSF) organised a special workshop on the topic of visual information management (Feb 1992, San Jose, CA)
• "It would be impossible to cope with this explosion of image information, unless the images were organized for retrieval. The fundamental problem is that images, video, and other similar data differ from numeric data and text data format, and hence they require a totally different technique of organization, indexing and query processing."
Introduction
• CBIR categorisation
– No query: randomly browse similar images
– Query by text (typing “airplane” or a description)
– Query by example
• using an image, sketch or graphic of an airplane
Introduction
• CBIR categorisation
– Find images of similar colour, texture or shape
– Find images of similar object, scene, place, event, etc.
Introduction
CBIR is closely related to other tasks: image matching, image recognition, image segmentation, object detection, image annotation, and more …
Introduction
• Applications of CBIR
– Archival photo collection management
– Personal album management
– Crime investigation
– Fashion and design
– Education and entertainment
– Localisation and navigation
– Medical image analysis
– …
Introduction
• CBIR systems: QBIC, Virage, Photobook, VisualSEEk, MARS, etc.
Source: http://vismod.media.mit.edu/vismod/demos/photobook/
Source: http://www.cse.unsw.edu.au/~jas/talks/curveix/notes.html
Early days
A new research problem received great interest
(Figure: aspects of CBIR research: application, semantic gap, domain knowledge, user model, query mode, visual features, similarity measure, interaction, learning from data, system, evaluation)
• Hand-crafted features
– Colour, texture, shape, structure, etc.
– Goal: “invariant and discriminative”
• Similarity or distance measure
– Euclidean distance, Manhattan distance, etc.
– Measures designed specifically for particular features
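A minimal sketch of this early-days pipeline, assuming an image is given as a list of (R, G, B) pixel tuples: a joint colour histogram serves as the hand-crafted feature, and Euclidean or Manhattan distance ranks the database. The function names and the 4-bins-per-channel quantisation are illustrative choices, not taken from any particular CBIR system.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Joint RGB histogram with `bins` levels per channel, normalised to sum to 1."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantise each channel to 0..bins-1, then flatten (r, g, b) into one bin index.
        qr, qg, qb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(qr * bins + qg) * bins + qb] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def rank(query, database, distance) -> List[int]:
    """Indices of database histograms, most similar (smallest distance) first."""
    return sorted(range(len(database)), key=lambda i: distance(query, database[i]))
```

With a query made of reddish pixels, an all-red image ranks ahead of a half-red, half-blue image, which in turn ranks ahead of an all-blue one, under either distance.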
Early days
• Relevance feedback
– Bring the user into the loop of CBIR to handle the “semantic gap”
– A key focus of machine learning research in CBIR
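Relevance feedback can be illustrated with Rocchio's query-point movement, a classic technique from text retrieval that these slides do not name; it is used here only as an assumed, simple example. Feature vectors are plain lists of floats, and the weights alpha, beta and gamma are illustrative defaults.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Mean of the given feature vectors (a zero vector if none were marked)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query: Sequence[float],
                   relevant: Sequence[Sequence[float]],
                   non_relevant: Sequence[Sequence[float]],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query towards images the user marked relevant and away from the others."""
    dim = len(query)
    pos = centroid(relevant, dim)
    neg = centroid(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
```

After each feedback round the updated query vector is used to re-rank the database, so the user's judgements gradually pull the results towards the intended semantics.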
Early days
• Relevance feedback
– Learning from small samples
– Semi-supervised learning
– Transductive learning
– Feature selection, dimensionality reduction
– Kernel based learning
– Manifold learning
– Relation learning
– …
Early days
• Achievements
– Researched CBIR from various perspectives
– Identified the key issues and obstacles
– Many initial but insightful observations and attempts
– Machine learning started playing an important role
• To be improved
– Basic hand-crafted features with limited invariance
– Considerable dependence on domain theory
– Small databases for evaluation
Days of the BoF model
Local Invariant Features
• SIFT, HOG, SURF, CENTRIST, filter-based, …
– Invariant (to varying degrees) to view angle, rotation, scale, illumination, ...
http://www.robots.ox.ac.uk/~vgg/software/
Image courtesy of David Lowe, IJCV04
SIFT (Scale-Invariant Feature Transform)
Days of the BoF model
Local Invariant Features
(Figures: three examples comparing local invariant features in Image A and Image B)
Source: http://www.robots.ox.ac.uk/~vgg/research/affine/#software/
Source: http://ivt.sourceforge.net/examples.html
Source: http://www.robots.ox.ac.uk/~vgg/share/SearchPractical2012.html
Days of the BoF model
Interest point detection or dense sampling → the cropped detected regions
The bag-of-features model is borrowed from text analysis
Days of the BoF model
Generated “visual words”
(Figure: cropped regions grouped into Word 1, Word 2, Word 3, Word 4, …, Word k)
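A minimal sketch of codebook generation, assuming the visual words are obtained with plain Lloyd's k-means over local descriptors (real SIFT vectors are 128-D; short tuples are used here). The helper names and the random initialisation are hypothetical; practical systems cluster millions of descriptors with faster variants.

```python
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the visual word (cluster centre) closest to descriptor x."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], x))


def kmeans_codebook(descriptors: Sequence[Sequence[float]], k: int,
                    iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means: the k cluster centres become the visual words."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(list(descriptors), k)]
    for _ in range(iters):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for d in descriptors:                      # assignment step
            clusters[nearest_word(codebook, d)].append(d)
        for j, members in enumerate(clusters):     # update step
            if members:  # keep the old centre if a cluster becomes empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook
```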
Days of the BoF model
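From an image to a histogram
$[n_1, n_2, \ldots, n_k] \in \mathbb{R}^k$, where $n_1$ is the number of occurrences of the 1st “word” in this image
Each local feature is assigned to one word and coded as a one-hot vector, e.g. $[0, 1, 0, \ldots, 0] \in \mathbb{R}^k$, $[1, 0, 0, \ldots, 0] \in \mathbb{R}^k$, $[0, 0, 1, \ldots, 0] \in \mathbb{R}^k$, …; summing these vectors gives the histogram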
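The histogram construction can be sketched as hard assignment followed by summation of one-hot codes. This assumes a codebook has already been built (any list of centre vectors); `bof_histogram` is a hypothetical name, and the L1 normalisation at the end is an optional, common choice rather than part of the definition.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bof_histogram(descriptors: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]],
                  normalise: bool = True) -> List[float]:
    """Hard-assign each descriptor to its nearest visual word and sum the one-hot codes."""
    k = len(codebook)
    hist = [0.0] * k
    for d in descriptors:
        j = min(range(k), key=lambda c: sq_dist(codebook[c], d))
        hist[j] += 1.0  # adding the one-hot vector e_j increments bin j
    total = sum(hist)
    if normalise and total > 0:
        hist = [h / total for h in hist]
    return hist
```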
Days of the BoF model
Classifying, clustering or retrieving images in $\mathbb{R}^k$, e.g. with a linear classifier $y = \mathbf{w}^\top \mathbf{x} + b$
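A hedged sketch of the linear decision function y = w^T x + b applied to BoF histograms, used both to classify an image and to rank a database. The weights and bias below are hand-picked placeholders; in practice they would come from a trained classifier such as a linear SVM.

```python
from typing import List, Sequence


def linear_score(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """y = w^T x + b for a BoF histogram x in R^k."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Positive class if the score is above zero."""
    return linear_score(w, x, b) > 0.0


def retrieve(w: Sequence[float], b: float, database: Sequence[Sequence[float]]) -> List[int]:
    """Rank database histograms by decreasing score (most confidently positive first)."""
    return sorted(range(len(database)),
                  key=lambda i: linear_score(w, database[i], b), reverse=True)
```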
Days of the BoF model
A Bag-of-Features Image Analysis System
Image database → feature extraction → codebook generation → feature coding → feature pooling → classification, clustering or retrieval
Days of the BoF model
– 1999: Local Invariant Features, such as SIFT (Lowe, ICCV99)
– 2003: Video Google (Sivic, ICCV03)
– 2004: Bag-of-keypoints (Csurka, SLCV@ECCV04)
– 2005: Pyramid Match Kernel (Grauman, ICCV05); Dense sampling (Jurie, ICCV05); Compact Codebook (Winn, ICCV05)
– 2006: Vocabulary tree (Nister, CVPR06); Randomized Clustering Forests (Moosmann, NIPS06); Spatial Pyramid Matching (Lazebnik, CVPR06)
– 2007: Comparative Study (Zhang, IJCV07); Coding with Fisher Kernels (Perronnin, CVPR07)
– 2008: Kernel Codebook (van Gemert, ECCV08); In Defense of Nearest Neighbor Classifier (Boiman, CVPR08)
– 2009: Sparse coding for BoF (Yang, CVPR09); Local Coordinate Coding (Yu, NIPS09)
– 2010: Locality-constrained Linear Coding for BoF (Wang, CVPR10); Coding & pooling scheme comparison (Boureau, CVPR10)
– 2011: Local Soft-assignment Coding & Mix-order pooling (Liu, ICCV11); Comparative Study on BoF model (Chatfield, BMVC11)
Days of the BoF model
Key issues of CBIR with the BoF model
• How to quickly create a large visual codebook
– Hierarchical k-means clustering
– Approximate k-means clustering
Source: Nister and Stewenius, CVPR06
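A minimal sketch of a vocabulary tree in the spirit of hierarchical k-means (Nister and Stewenius, CVPR06): k-means with a small branching factor is applied recursively, and a descriptor is quantised by descending the tree, so it is compared with only branching × depth centres instead of every leaf word. The class and function names, the stopping rule and the fixed seed are assumptions for illustration; the scoring with inverted files and node weights used in the original work is omitted.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, rng: random.Random,
           iters: int = 10) -> List[List[float]]:
    """Plain Lloyd's k-means, run independently at every internal node."""
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[list] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(centres[j], p))].append(p)
        for j, g in enumerate(groups):
            if g:
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres


class Node:
    def __init__(self, centre: List[float]):
        self.centre = centre
        self.children: List["Node"] = []
        self.word_id: Optional[int] = None  # only leaves carry a visual-word id


def build_vocab_tree(points: Sequence[Sequence[float]], branching: int, depth: int,
                     seed: int = 0) -> Tuple[Node, int]:
    """Hierarchical k-means: split into `branching` clusters, then recurse in each one."""
    rng = random.Random(seed)
    root = Node(centre=[])
    n_words = 0

    def grow(node: Node, pts: list, level: int) -> None:
        nonlocal n_words
        if level == depth or len(pts) < branching:
            node.word_id = n_words  # leaf: one visual word
            n_words += 1
            return
        centres = kmeans(pts, branching, rng)
        groups: List[list] = [[] for _ in centres]
        for p in pts:
            groups[min(range(len(centres)), key=lambda j: sq_dist(centres[j], p))].append(p)
        for c, g in zip(centres, groups):
            child = Node(c)
            node.children.append(child)
            grow(child, g, level + 1)

    grow(root, list(points), 0)
    return root, n_words


def quantise(root: Node, x: Sequence[float]) -> Optional[int]:
    """Descend the tree, comparing x only with the children of the current node."""
    node = root
    while node.children:
        node = min(node.children, key=lambda n: sq_dist(n.centre, x))
    return node.word_id
```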
Days of the BoF model
Key issues of CBIR with the BoF model
• How to incorporate spatial information
– The BoF model ignores the spatial information of SIFT features
– Spatial pyramid matching
– Re-ranking with spatial verification
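Spatial pyramid matching can be sketched as concatenating BoF histograms computed over 1×1, 2×2, …, 2^L×2^L grids and comparing them with histogram intersection. This assumes each keypoint already has an (x, y) position and a visual-word id; the level weights follow the usual spatial pyramid weighting (1/2^L for level 0, 1/2^(L-l+1) for level l), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def spatial_pyramid(points: Sequence[Tuple[float, float]], words: Sequence[int],
                    k: int, width: float, height: float, levels: int = 2) -> List[float]:
    """Concatenate per-cell BoF histograms over 1x1, 2x2, ..., 2^L x 2^L grids."""
    feature: List[float] = []
    for l in range(levels + 1):
        cells = 2 ** l
        # Level weights: 1/2^L for level 0, 1/2^(L-l+1) for finer levels.
        weight = 1.0 / 2 ** levels if l == 0 else 1.0 / 2 ** (levels - l + 1)
        hist = [0.0] * (cells * cells * k)
        for (x, y), word in zip(points, words):
            cx = min(int(x * cells / width), cells - 1)   # grid column of this keypoint
            cy = min(int(y * cells / height), cells - 1)  # grid row of this keypoint
            hist[(cy * cells + cx) * k + word] += weight
        feature.extend(hist)
    return feature


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(min(x, y) for x, y in zip(a, b))
```

Two images with the same words in swapped positions get identical level-0 (plain BoF) histograms, but the finer levels lower their intersection score.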
Days of the BoF model
Key issues of CBIR with the BoF model
• Re-ranking with spatial verification
(Figure: retrieval result for a query before spatial verification)
(Figure: 25 points matched under a consistent spatial relationship vs. only 4 points matched under a consistent spatial relationship)
(Figure: retrieval result for the same query after spatial verification)
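Spatial verification can be sketched with a deliberately simplified geometric model: each tentative match between a query keypoint and a candidate keypoint proposes a translation, and the score is the largest number of matches agreeing with one proposal within a tolerance. The translation-only model is an assumption for illustration; systems such as Philbin et al. (CVPR07) fit richer transformations with RANSAC-style hypothesis testing. Candidates from the BoF shortlist are then re-ranked by this inlier count.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (keypoint in query, keypoint in candidate image)


def count_consistent_matches(matches: Sequence[Match], tol: float = 5.0) -> int:
    """Largest number of matches agreeing on one translation (simplified geometric check)."""
    best = 0
    for (ax, ay), (bx, by) in matches:
        tx, ty = bx - ax, by - ay  # translation hypothesised by this single match
        inliers = sum(1 for (px, py), (qx, qy) in matches
                      if abs(qx - px - tx) <= tol and abs(qy - py - ty) <= tol)
        best = max(best, inliers)
    return best


def rerank(candidates: Sequence[Tuple[str, Sequence[Match]]], tol: float = 5.0) -> List[str]:
    """Re-order a BoF shortlist by spatially consistent matches (stable for ties)."""
    ranked = sorted(candidates, key=lambda c: count_consistent_matches(c[1], tol),
                    reverse=True)
    return [image_id for image_id, _ in ranked]
```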
Days of the BoF model
Key issues of CBIR with the BoF model
• Large-scale image retrieval
– Memory, time, precision
– Approximate nearest-neighbor search
(Figure: how to map a feature vector $[x_1, x_2, \ldots, x_d]$ to a compact binary code such as 0100101100…?)
Days of the BoF model
Key issues of CBIR with the BoF model
• Locality-sensitive hashing (LSH)
– Random projection; data-independent; unsupervised
• Learning compact binary codes
– Preserving sample similarities; data-dependent
(Figure: LSH assigning bits 1 and 0 to data points)
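Random-projection LSH can be sketched with random hyperplanes: each bit records which side of a Gaussian random hyperplane a vector falls on, and database items are compared with the query by Hamming distance. This sign-of-projection scheme is one common LSH family (suited to angular similarity); the number of bits, the seed and the function names are assumptions.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits hyperplane normals with i.i.d. Gaussian entries (data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(x: Sequence[float], planes: Sequence[Sequence[float]]) -> int:
    """Pack one bit per hyperplane: 1 if x lies on its positive side, else 0."""
    code = 0
    for normal in planes:
        projection = sum(p * v for p, v in zip(normal, x))
        code = (code << 1) | (1 if projection >= 0.0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def search(query_code: int, db_codes: Sequence[int], top: int = 3) -> List[int]:
    """Indices of the database codes closest to the query in Hamming distance."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))[:top]
```

Because only the sign of each projection is kept, scaling a vector by a positive constant leaves its code unchanged, while negating it flips every bit.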
Days of the BoF model
Key issues of CBIR with the BoF model
Retrieval examples from the “Oxford5K” data set
Source: Philbin et al., Object retrieval with large vocabularies and fast spatial matching, CVPR07
Days of the BoF model (Summary)
• Achievements
– Local invariant features play a fundamental role
– Visual codebook creation, feature coding and feature pooling are extensively studied
– Multiple benchmark data sets are established
– Large-scale image retrieval is also researched
• To be improved
– Feature representation and recognition are handled separately
– Focus more on object-level retrieval than on semantic-level retrieval
Era of Deep Learning
(Figure: domains of deep learning: visual (images, videos), audio (speech, music), text (natural language), planning, …)
Era of Deep Learning
• Image recognition
– Faces, objects, poses, scenes, …
• Video content analysis
– Actions, activities, events, summarization, …
• Visual information management
– Search, retrieval, indexing, browsing, …
• Potential outcome: AI
– Computers can see and understand visual information
– Robotics, self-driving cars, surveillance
– …
Era of Deep Learning
Object detection (Source: Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014)
Face Recognition (Source: DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR 2014)
Era of Deep Learning
Pose estimation (Source: DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR 2014)
Image Segmentation (Source: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, IEEE TPAMI 2016)
Era of Deep Learning
• Fine-grained image recognition
• Human attribute classification
[Ning Zhang et al. CVPR 2014]
[Branson et al. arXiv 2014]
Era of Deep Learning
• Action Recognition
• Large-scale Video Classification
[Karpathy et al. CVPR 2014]
[Simonyan et al. arXiv 2014]
Era of Deep Learning
• Invariant and discriminative features
Feature representation: feature extraction (prior knowledge, experience) → classification → “Panda”?
Challenges: pose, occlusion, multiple objects, inter-class similarity
Image courtesy of M. Ranzato
Era of Deep Learning
• Directly learn feature representations from data
• Jointly learn the feature representation and the classifier
Low-level features → mid-level features → high-level features → classifier → “Panda”? (more abstract representations at higher layers)
Deep learning: train layers of features so that the classifier works well
Image courtesy of M. Ranzato
Era of Deep Learning
• Deep learning
– Inspired by the way the human brain processes information
– Many layers of non-linear information processing stages
Era of Deep Learning
Have we been here before?
• Yes
– Basic ideas are common to past neural network research
– Standard machine learning strategies are still relevant
• No
– Computational power, large-scale data and new algorithms → deep learning
Era of Deep Learning
Convolutional Neural Networks (CNNs)
• A special multi-stage architecture inspired by the visual system
Era of Deep Learning
Convolutional Neural Networks (CNNs)
– Fukushima 1980: Neocognitron
– Rumelhart, Hinton, Williams 1986: “T” versus “C” problem
– LeCun et al. 1989-1998: hand-written digit reading
– ...
– Krizhevsky, Sutskever, Hinton 2012: ImageNet classification breakthrough, “SuperVision” CNN
Source: slide by Girshick
Era of Deep Learning
CNNs: ImageNet Breakthrough
● Krizhevsky et al. won the 2012 ImageNet classification challenge with a much bigger ConvNet
○ deeper: 7 stages vs. 3 before
○ larger: 60 million parameters vs. 1 million before
○ 16.4% top-5 error vs. 26.2% for the next best entry
● This was made possible by:
○ fast hardware: GPU-optimized code
○ big dataset: 1.2 million images vs. thousands before
○ better regularization: dropout, etc.
[Krizhevsky et al. NIPS 2012]
Image courtesy of Deng et al.
Era of Deep Learning
CBIR: From SIFT to CNNs
• Three main approaches
– Directly use pre-trained CNN models to extract feature representations
– Fine-tune pre-trained CNN models with side information (pairwise or triplet similarity)
– Apply the bag-of-features model to CNN features (“Deep SIFT”)
Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– Which layer?
– How to pool the features in a convolutional layer?
– How to select the features in a convolutional layer?
Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– Which layer? Fully connected layer vs. convolutional layer
Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– How to pool the features in a convolutional layer?
(Figure: a Height × Width × Depth convolutional feature map pooled into a vector $[x_1, x_2, \ldots, x_n]$)
• Sum-pooling
• Max-pooling
• Grid-based max-pooling
• Region-based pooling
• Mixed sum & max pooling
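Three of the pooling options listed above can be sketched on a feature map stored as `fmap[y][x][channel]` (Height × Width × Depth): sum-pooling, max-pooling, and grid-based max-pooling, which max-pools each cell of a regular grid and concatenates the results. The nested-list layout, the function names and the final L2 normalisation (a common step before Euclidean or cosine comparison) are assumptions; region-based and mixed pooling are not shown, and the grid version assumes the map is at least `grid` cells high and wide.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # fmap[y][x][channel], shape H x W x D


def sum_pool(fmap: FeatureMap) -> List[float]:
    """Add up each channel over all spatial positions."""
    depth = len(fmap[0][0])
    out = [0.0] * depth
    for row in fmap:
        for cell in row:
            for c in range(depth):
                out[c] += cell[c]
    return out


def max_pool(fmap: FeatureMap) -> List[float]:
    """Keep the strongest response of each channel over all positions."""
    depth = len(fmap[0][0])
    return [max(cell[c] for row in fmap for cell in row) for c in range(depth)]


def grid_max_pool(fmap: FeatureMap, grid: int = 2) -> List[float]:
    """Max-pool each cell of a grid x grid partition and concatenate (length grid^2 * D)."""
    h, w = len(fmap), len(fmap[0])
    out: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            region = [row[x0:x1] for row in fmap[y0:y1]]
            out.extend(max_pool(region))
    return out


def l2_normalise(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]
```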
Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– How to select the features in a convolutional layer?
• Weighting
• Activation magnitude
• Region detection
Source: Cao et al., Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps
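Feature weighting by activation magnitude can be sketched as follows: each spatial location is weighted by the sum of its activations across channels (L2-normalised over the map), and the weighted vectors are sum-pooled, so weakly activated, often background, locations contribute less. This is a simplified illustration loosely in the spirit of spatially weighted pooling such as CroW; it is not the query-adaptive method of Cao et al., and the function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # fmap[y][x][channel]


def spatial_weights(fmap: FeatureMap) -> List[List[float]]:
    """Weight each location by its total activation, L2-normalised over the map."""
    s = [[sum(cell) for cell in row] for row in fmap]
    norm = math.sqrt(sum(v * v for row in s for v in row)) or 1.0
    return [[v / norm for v in row] for row in s]


def weighted_sum_pool(fmap: FeatureMap) -> List[float]:
    """Sum-pool the map after down-weighting weakly activated locations."""
    weights = spatial_weights(fmap)
    depth = len(fmap[0][0])
    out = [0.0] * depth
    for row, wrow in zip(fmap, weights):
        for cell, w in zip(row, wrow):
            for c in range(depth):
                out[c] += w * cell[c]
    return out
```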
Era of Deep Learning
2. Fine-tune pre-trained CNNs
• To incorporate extra information from a new image data set
– Side information (pairwise or triplet similarity)
– Distance metric learning
(Figure: image pairs marked as similar √ or dissimilar X)
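Side information can be turned into training losses; the sketch below evaluates a triplet hinge loss and a pairwise contrastive loss on embedding vectors. These are standard formulations assumed for illustration (the cited works use their own formulations; MatchNet, for example, learns the metric with a separate network). Actual fine-tuning would back-propagate such a loss through the CNN with a deep learning framework, which is not shown.

```python
import math
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss: zero once the negative is at least `margin` farther (squared) than the positive."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


def contrastive_loss(x1: Sequence[float], x2: Sequence[float], similar: bool,
                     margin: float = 1.0) -> float:
    """Pairwise loss: pull similar pairs together, push dissimilar pairs beyond `margin`."""
    if similar:
        return sq_dist(x1, x2)
    d = math.sqrt(sq_dist(x1, x2))
    return max(0.0, margin - d) ** 2
```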
Era of Deep Learning
2. Fine-tune pre-trained CNNs
Source: MatchNet, CVPR 2015
Source: Learning Fine-Grained Image Similarity with Deep Ranking, CVPR 2014
Era of Deep Learning
3. Bag-of-features model on “Deep SIFT”
SIFT (Scale-Invariant Feature Transform)
Source: Multi-scale Orderless Pooling of Deep Convolutional Activation Features, ECCV 2014
Era of Deep Learning
3. Bag-of-features model on “Deep SIFT”
SIFT (Scale-Invariant Feature Transform) vs. “Deep SIFT”
Source: Cao et al., Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps
Era of Deep Learning
3. Bag-of-features model on “Deep SIFT”
Deep local features (“Deep SIFT”) → codebook generation → feature coding → feature pooling → classification, clustering or retrieval
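The “Deep SIFT” idea can be sketched by treating the D-dimensional activation vector at every position of a convolutional feature map as a local descriptor, then applying the BoF steps: soft-assignment coding against a codebook (in the style of the kernel codebook listed in the BoF timeline) followed by sum- or max-pooling. The codebook, the smoothing parameter `beta` and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # fmap[y][x][channel], H x W x D


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_descriptors(fmap: FeatureMap) -> List[List[float]]:
    """Treat the D-dim activation vector at each spatial position as one local descriptor."""
    return [list(cell) for row in fmap for cell in row]


def soft_assign(x: Sequence[float], codebook: Sequence[Sequence[float]],
                beta: float = 1.0) -> List[float]:
    """Kernel-codebook style soft assignment: weights decay with squared distance."""
    dists = [sq_dist(x, c) for c in codebook]
    d_min = min(dists)  # subtract the minimum so exp() cannot underflow to all zeros
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def deep_bof(fmap: FeatureMap, codebook: Sequence[Sequence[float]],
             beta: float = 1.0, pooling: str = "sum") -> List[float]:
    """Code every conv-layer descriptor, then pool the codes into one k-dim image vector."""
    codes = [soft_assign(d, codebook, beta) for d in local_descriptors(fmap)]
    if pooling == "max":
        return [max(col) for col in zip(*codes)]
    return [sum(col) for col in zip(*codes)]
```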
Era of Deep Learning
– 2012: Image Classification with DCNN (Krizhevsky, NIPS12)
– 2014: CNN Features off-the-shelf (Razavian, CVPRW14); Neural codes (Babenko, ECCV14); Deep ranking (Wang, CVPR14); Multi-scale orderless pooling (Gong, ECCV14); Encoding High Dimensional Local Features (Liu, NIPS14); Survey: Deep learning for CBIR (Wan, ACMMM14)
– 2015: Deep filter banks (Cimpoi, CVPR15); Exploiting Local Features from DNN (Ng, CVPRW15); SPoC (Babenko, ICCV15); MatchNet (Han, CVPR15)
– 2016: R-MAC (Tolias, ICLR16); CNN IR Learns from BoW (Radenovic, ECCV16); CroW (Kalantidis, ECCVW16); Where to focus (Cao, 2016)
Some papers appeared on arXiv
Summary
• A very limited (and biased) account of CBIR
• CBIR has made significant progress over the past two decades
• The development of feature representation plays a key role
• Issues to be resolved
– How to transfer the benefits of deep learning?
– How to deal with the unsupervised learning case?
– How to better handle the semantic gap?
– …